Improving a density-based clustering algorithm by modifying the density definitions and the input parameter



Authors	Pahlevanzadeh Alireza, Niknafs Aliakbar
Source	Signal and Data Processing (پردازش علائم و داده ها) - 1398 - Issue: 2 - Pages: 105-120
Abstract	Density-based clustering is one of the notable methods in data mining, and DBSCAN is a widely used example of it. Alongside its advantages, DBSCAN has drawbacks; for example, its input parameters are difficult for the user to determine. This paper modifies a density-based algorithm called ISBDBSCAN. Like ISBDBSCAN, the proposed method uses a single input parameter k, the number of nearest neighbors. Because choosing k may itself be difficult for the user, a method based on a genetic algorithm is also presented for determining k automatically. To evaluate the proposed methods, experiments were carried out on eleven standard data sets and the clustering accuracy of each method was measured. Compared with other existing methods, the proposed method obtained better results on a variety of data sets.
Keywords	Density-based clustering, neighborhood parameter, clustering with different densities
Address	Shahid Bahonar University of Kerman, Department of Computer Engineering, Iran, Shahid Bahonar University of Kerman, Department of Computer Engineering, Iran

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Authors	Pahlevanzadeh Alireza, Niknafs Aliakbar
Abstract	Clustering, one of the main tasks in data mining, groups similar samples together. Clustering algorithms fall into several categories, one of which is density-based clustering. Various density-based algorithms have been proposed; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes and determines the number of clusters automatically. It also has disadvantages: its input parameters are difficult for the user to determine, and it cannot detect clusters with different densities in a data set. ISBDBSCAN is another density-based algorithm, designed to address these disadvantages. It reduces the input parameters of DBSCAN to a single parameter k, the number of nearest neighbors. ISBDBSCAN can identify clusters of different densities but, because of its definition of a core point, it cannot identify some clusters in certain data sets. This paper presents a method for improving ISBDBSCAN. Like ISBDBSCAN, the proposed approach uses a single input parameter k as the number of nearest neighbors, but it provides a new definition of a core point. The method performs clustering in three steps and, unlike ISBDBSCAN, can create a new cluster in the final step. It also uses a new criterion, involving the number of dataset dimensions, to detect noise in the data set. Because determining k may be difficult for the user, a method based on a genetic algorithm is also proposed for estimating k automatically. To evaluate the proposed methods, experiments were carried out on 11 standard data sets and the clustering accuracy of each method was measured. The results show that the proposed method achieves better results on different data sets than the other available methods. The automatic determination of k also obtained acceptable results.
Keywords	Density-based clustering, neighborhood parameter, clustering with different densities
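
Note	As an illustration of how a neighborhood defined by a neighbor count k adapts to local density, the following minimal Python sketch computes each point's distance to its k-th nearest neighbor. It is not the algorithm proposed in the paper, whose core-point definition and noise criterion are not given in the abstract; the function name `k_distance` and the toy data are assumptions made for this example only.

```python
import math


def k_distance(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor.

    Hypothetical helper for illustration; brute force, O(n^2 log n).
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Assumed toy data: one tight group and one loose group, far apart.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
points = dense + sparse

kd = k_distance(points, k=2)
for p, d in zip(points, kd):
    print(p, round(d, 3))
```

With k = 2, the k-distance is about 0.1 for every point in the tight group and about 2.0 for every point in the loose group. The same value of k therefore yields a neighborhood radius that differs by a factor of roughly 20 between the two groups, which is the kind of local adaptation a fixed radius parameter does not provide.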