Finding optimal parameters for the ADBSCAN clustering algorithm using a genetic algorithm



Authors	Entezami Motahareh, Shakiba Ali
Source	9th International Conference on Web Research - 1402 - Edition: 9 - Conference code: 02221-97364 - Pages: 0-0
Abstract	Clustering is a process that partitions a set of objects into disjoint groups, each of which is called a cluster. In a clustering, it is desirable for the members of each cluster to be similar to one another in terms of their features, and for the similarity between samples in different clusters to be low. In general, clustering algorithms use a partitioning, hierarchical, density-based, or model-based approach, or a combination of them. ADBSCAN is a density-based algorithm for clustering data. It presents a new method for identifying local high-density samples using the inherent properties of the nearest-neighbor graph. The algorithm uses two parameters: k (the number of nearest neighbors) and the percentage of noise in the data set. These two parameters strongly affect the result of the computation and the quality of the output, so they should be set as close to optimally as possible. Exhaustive search is one way to find the optimal values. To reduce search time, this paper uses a genetic search method to find optimal values of these parameters. With the proposed method, an average improvement of 11.46 percent in the ARI (adjusted Rand index) measure was obtained.
Keywords	density-based clustering, ADBSCAN, genetic algorithm
Address	, iran, , iran
Email	ali.shakiba@vru.ac.ir

Finding optimal parameters for the ADBSCAN clustering algorithm using a genetic algorithm

Authors
Abstract	Clustering is the process of partitioning a set of objects into disjoint groups, each of which is called a cluster. Intuitively, it is desirable that the members of each cluster be very similar to one another in terms of their characteristics, and that members of different clusters have a low degree of similarity. In general, clustering algorithms follow a partitioning, hierarchical, density-based, or model-based approach, or a combination of these. The ADBSCAN algorithm is a density-based clustering algorithm that presents a new method for identifying high-density local instances using the properties of the nearest-neighbor graph. The algorithm uses two parameters: k, the number of nearest neighbors, and the percentage of noise in the data set. These parameters have a significant effect on the quality of the output as well as on the required time, so it is necessary to find optimal values for them. Brute-force search is one naive way to do this. However, evolutionary algorithms such as genetic search can make the search easier and more efficient. In this paper, we apply a genetic algorithm to find optimal values of the parameters. On average, the proposed method led to an 11.46% improvement in the ARI (adjusted Rand index) criterion.
Keywords	density-based clustering, ADBSCAN, genetic algorithm
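
The parameter search described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the genetic operators, population size, or parameter ranges, so `genetic_search`, its defaults, and the tournament/crossover/mutation choices are all hypothetical. The `fitness` callable stands in for "run ADBSCAN with (k, noise percentage) and return the ARI against reference labels"; since no ADBSCAN implementation is assumed here, a toy fitness function is used in its place.

```python
import random
from typing import Callable, List, Tuple

Params = Tuple[int, float]  # (k, noise_percent); both names are illustrative


def genetic_search(
    fitness: Callable[[int, float], float],
    k_range: Tuple[int, int] = (2, 50),               # assumed search range for k
    noise_range: Tuple[float, float] = (0.0, 30.0),   # assumed range, in percent
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Maximise fitness(k, noise_percent) with a simple genetic algorithm."""
    rng = random.Random(seed)

    def random_individual() -> Params:
        return rng.randint(*k_range), rng.uniform(*noise_range)

    def crossover(a: Params, b: Params) -> Params:
        # Take k from either parent and average the noise percentages.
        return rng.choice((a[0], b[0])), (a[1] + b[1]) / 2.0

    def mutate(ind: Params) -> Params:
        k, noise = ind
        if rng.random() < mutation_rate:
            k = min(max(k + rng.randint(-3, 3), k_range[0]), k_range[1])
        if rng.random() < mutation_rate:
            noise = min(max(noise + rng.gauss(0.0, 2.0), noise_range[0]), noise_range[1])
        return k, noise

    def tournament(scored: List[Tuple[float, Params]]) -> Params:
        # Pick the fittest of three randomly chosen individuals.
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    population = [random_individual() for _ in range(pop_size)]
    best, best_score = population[0], float("-inf")
    for _ in range(generations):
        scored = [(fitness(k, noise), (k, noise)) for k, noise in population]
        gen_score, gen_best = max(scored, key=lambda s: s[0])
        if gen_score > best_score:
            best, best_score = gen_best, gen_score
        # Elitism: carry the generation's best forward unchanged.
        population = [gen_best] + [
            mutate(crossover(tournament(scored), tournament(scored)))
            for _ in range(pop_size - 1)
        ]
    return best, best_score


def toy_fitness(k: int, noise: float) -> float:
    # Placeholder objective with its maximum (0.0) at k=12, noise=5.0.
    # In practice this would cluster the data and return the ARI.
    return -((k - 12) ** 2) / 100.0 - ((noise - 5.0) ** 2) / 100.0


if __name__ == "__main__":
    params, score = genetic_search(toy_fitness)
    print("best (k, noise %):", params, "fitness:", score)
```

Compared with a brute-force grid over both parameters, a search like this evaluates only `pop_size * generations` candidates, which is where the reduction in search time mentioned in the abstract would come from.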