An Efficient Clustering Algorithm For Very Large Databases

Sajjad Fallah; Ghorban Kheradmandian

Published in: National Conference on Software Engineering

Publication year: 1388 (Solar Hijri)

Document type: Conference paper

Language: English

Length: 7 pages (PDF)

Permanent link to this paper:

https://civilica.com/doc/61448

National scientific document ID:

NCSE01_040

Indexing date: 14 Aban 1387 (Solar Hijri)

Abstract:

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. In this paper, a new clustering algorithm for very large databases is proposed. The proposed algorithm loads the entire data set into memory at most three times. In phase 1, the space encompassing the data set is identified and partitioned into several sub-spaces, the number of which depends on the amount of available memory. Next, the data set is loaded into memory piece by piece, each data point is assigned to a sub-space, and the average of the points assigned to each sub-space is stored. In a later phase, some of the small sub-clusters corresponding to the sub-spaces are hierarchically merged to form larger clusters. Our clustering algorithm is independent of the order in which the training samples appear, and our experimental results on complex data indicate that it performs better than the well-known k-means algorithm.
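The phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the space is divided or which merging criterion is used, so the sketch assumes a uniform grid with `bins` cells per dimension and merges the two sub-clusters with the closest centroids until `k` clusters remain. The names `read_chunks`, `bins`, and `k` are hypothetical, and the sketch reads the data only twice, whereas the paper allows up to three passes.

```python
from math import dist


def bounding_box(chunks):
    """Pass 1: per-dimension minimum and maximum over all points."""
    lo = hi = None
    for chunk in chunks:
        for p in chunk:
            if lo is None:
                lo, hi = list(p), list(p)
            else:
                lo = [min(a, x) for a, x in zip(lo, p)]
                hi = [max(b, x) for b, x in zip(hi, p)]
    return lo, hi


def cell_of(p, lo, hi, bins):
    """Index of the grid cell (sub-space) that contains point p."""
    idx = []
    for x, a, b in zip(p, lo, hi):
        width = (b - a) / bins or 1.0  # guard against a zero-width dimension
        idx.append(min(int((x - a) / width), bins - 1))
    return tuple(idx)


def cell_summaries(chunks, lo, hi, bins):
    """Pass 2: keep only a running sum and a count per non-empty cell."""
    sums, counts = {}, {}
    for chunk in chunks:
        for p in chunk:
            c = cell_of(p, lo, hi, bins)
            if c in sums:
                sums[c] = [s + x for s, x in zip(sums[c], p)]
                counts[c] += 1
            else:
                sums[c], counts[c] = list(p), 1
    # Sorting by cell index makes the result independent of input order.
    return [([s / counts[c] for s in sums[c]], counts[c]) for c in sorted(sums)]


def merge_subclusters(subclusters, k):
    """Assumed merge rule: repeatedly join the two closest centroids."""
    clusters = [(list(m), n) for m, n in subclusters]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(clusters[ab[0]][0],
                                              clusters[ab[1]][0]))
        (m1, n1), (m2, n2) = clusters[i], clusters[j]
        # The merged centroid is the count-weighted mean of the two centroids.
        merged = ([(x * n1 + y * n2) / (n1 + n2) for x, y in zip(m1, m2)],
                  n1 + n2)
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return clusters


def cluster(read_chunks, bins, k):
    """read_chunks() must return a fresh iterator over chunks of points."""
    lo, hi = bounding_box(read_chunks())
    subclusters = cell_summaries(read_chunks(), lo, hi, bins)
    return sorted(merge_subclusters(subclusters, k))
```

Because only one sum and one count are kept per non-empty cell, memory use in this sketch is bounded by the number of cells rather than by the number of points, which is the property the abstract ties to the amount of available memory. The per-cell sums do not depend on the order in which points arrive (up to floating-point rounding), which is one plausible reading of the order-independence claim.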

Keywords:

Clustering, Large Database, Hierarchical, Space Dividing, Merging

Authors

Sajjad Fallah

Department of ICT, Shahid Beheshti Medical University, Tehran, Iran

Ghorban Kheradmandian

Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
