Presenting an Extended Density-Based Clustering Algorithm for Big Data

Authors	Ghaemi Reza, Arad Yaghoob, Hajghazi Fereshteh
Source	Information Management (مديريت اطلاعات) - 1401 - Volume: 8 - Issue: 2 - Pages: 21-41
Abstract	Today, data generation by smart devices, including mobile phones, is growing remarkably, and clustering is one of the most widely used knowledge-discovery techniques for big data. Density-based clustering (DBSCAN) is an efficient clustering algorithm in data mining; despite its advantages, it has drawbacks, including the difficulty of choosing its input parameters and its inability to discover clusters of differing density. The algorithm proposed in this paper takes inspiration from the K-DBSCAN algorithm for grouping large volumes of data, with the aim of reducing clustering run time. In addition, the k-means and H-DBSCAN algorithms are used to detect the different densities in the dataset, an eps radius is determined for each density, and the proposed extended density-based clustering algorithm is then applied to the data with the matching parameters. The novelty of this paper is thus the use of k-means clustering and the estimation of different densities within the DBSCAN clustering method. The proposed algorithm is compared with plain DBSCAN and with two extended algorithms, K-DBSCAN and H-DBSCAN, on four standard datasets: Image Segmentation, Pendigit, Letters, and Shuttle Control. The results show that when both run time and accuracy are the criteria for clustering, the proposed algorithm outperforms the other algorithms.
Keywords	big data, clustering, k-means, H-DBSCAN, K-DBSCAN, DBSCAN
Address	Islamic Azad University, Quchan Branch, Department of Computer Engineering, Iran; Islamic Azad University, Science and Research Branch, Department of Computer Engineering, Iran; Islamic Azad University, Science and Research Branch, Department of Computer Engineering, Iran
Email	f.hajghazi@iau-neyshabur.ac.ir

An Extended Density-Based Clustering Algorithm in Big Data

Authors	Ghaemi Reza, Arad Yaghoob, Hajghazi Fereshteh
Abstract	Today, data generation through smart equipment, including mobile phones, has grown significantly, and clustering is one of the most widely used knowledge discovery techniques in big data. Density-based clustering (DBSCAN) is one of the most efficient clustering algorithms in data mining; despite its advantages, it also has problems, such as the difficulty of determining the input parameters and the inability to detect clusters with different densities. The algorithm proposed in this article is inspired by the K-DBSCAN algorithm for grouping large data, with the aim of reducing the clustering execution time. In addition, by using the k-means and H-DBSCAN algorithms, the different densities of the data set are identified and an eps radius is determined for each density; the proposed extended density-based clustering algorithm is then applied to the data with the matching parameters. In fact, the innovation of this article is the use of k-means clustering and the estimation of different densities in the DBSCAN clustering method. The proposed algorithm has been compared with the simple DBSCAN clustering algorithm and two extended algorithms, K-DBSCAN and H-DBSCAN, on four standard data sets: Image Segmentation, Pendigit, Letters, and Shuttle Control. The results show that the proposed algorithm is superior to the other algorithms when both time and accuracy are criteria in clustering.
Keywords	big data, clustering, DBSCAN, K-DBSCAN, H-DBSCAN, k-means
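
The pipeline outlined in the abstract (partition the data with k-means, determine one eps radius per partition, then run DBSCAN with the matching parameters) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the farthest-first k-means initialisation, and the eps heuristic (mean distance to the min_pts-th nearest neighbour) are all assumptions chosen for illustration, and the H-DBSCAN step mentioned in the abstract is not reproduced.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def estimate_eps(points, min_pts):
    """Hypothetical heuristic: mean distance to the min_pts-th nearest neighbour."""
    kth = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points)  # d[0] is the point itself
        kth.append(d[min(min_pts, len(d) - 1)])
    return sum(kth) / len(kth)


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels


def density_aware_dbscan(points, k, min_pts):
    """Run DBSCAN separately in each k-means group with a group-specific eps."""
    groups = kmeans(points, k)
    labels = [-1] * len(points)
    offset = 0
    for g in range(k):
        idx = [i for i, a in enumerate(groups) if a == g]
        if len(idx) <= min_pts:
            continue  # too few points to form a cluster; leave them as noise
        sub = [points[i] for i in idx]
        sub_labels = dbscan(sub, estimate_eps(sub, min_pts), min_pts)
        for i, lab in zip(idx, sub_labels):
            if lab != -1:
                labels[i] = lab + offset  # keep cluster ids unique across groups
        offset += max(sub_labels) + 1
    return labels


# Two synthetic blobs with very different densities.
rng = random.Random(42)
dense = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(30)]
sparse = [(rng.gauss(20, 1.0), rng.gauss(20, 1.0)) for _ in range(30)]
labels = density_aware_dbscan(dense + sparse, k=2, min_pts=4)
print(sorted(set(labels)))
```

A single global eps would have to be either too large for the dense blob or too small for the sparse one; estimating eps inside each k-means group is what lets both blobs be clustered in the same run. Restricting each DBSCAN call to one group also shrinks the neighbour searches, which is the intuition behind the run-time reduction the abstract mentions.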