A Genetic Algorithm-Based Method for Finding the Most Stable Clusters in Ensemble Clustering

Authors	Samimi Behbahan Navid, Nejatian Samad, Parvin Hamid, Bagheri Fard Karamolah, Rezaei Vahideh
Source	Signal and Data Processing (پردازش علائم و داده ها) - 1403 (Solar Hijri) - Issue: 3 - Pages: 111-136
Abstract	Clustering plays a vital role in information retrieval, where it organizes large collections into a small number of meaningful clusters. One of the most important motivations for using clustering is to determine and reveal the intrinsic, hidden structure of a dataset. Because human users differ in taste and ways of thinking, they are unable to discover the intrinsic structure of large text datasets on their own. Ensemble clustering algorithms combine several clustering algorithms to arrive at a single overall clustering system. They seek better solutions by extracting information from several initial partitions of the data. Because different clustering algorithms look at different aspects of the data, they can produce different partitions of the same data; by combining the partitions obtained from different algorithms, a high-quality partition can be built even when the clusters are densely packed together. This paper introduces a new method that, instead of using all of the generated initial clusters, uses only the most stable of them, with the initial clusters produced by six different methods. A consensus function based on the correlation matrix is used to select the more stable clusters. The more stable clusters are selected according to a cluster stability criterion based on the Fisher criterion; the resulting clusters are then evaluated by a genetic algorithm, which selects the most stable ones. Finally, the correlation matrix obtained from the consensus of the optimal clusters is treated as a similarity matrix. A hierarchical clustering algorithm serves as the final aggregation function: it takes this correlation matrix as input and returns the final consensus clustering. Experimental results on several datasets show that the proposed method produces diverse and highly stable clusters. Specifically, it improves on the best previous methods by 12% in NMI and 5% in ARI. This indicates the superiority of the proposed ensemble clustering method based on cluster stability and genetic algorithms.
Keywords	ensemble clustering, cluster stability, Fisher criterion, correlation matrix, genetic algorithm
Address	Islamic Azad University, Yasuj Branch, Department of Computer Engineering, Iran; Islamic Azad University, Yasuj Branch, Department of Electrical Engineering, Iran; Islamic Azad University, Nourabad Mamasani Branch, Department of Computer Engineering, Iran; Islamic Azad University, Yasuj Branch, Department of Computer Engineering, Iran; Islamic Azad University, Yasuj Branch, Department of Mathematics, Iran
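
The final consensus step described in the abstract (a correlation matrix built from the selected clusters, treated as a similarity matrix and passed to a hierarchical clustering algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the normalisation of the matrix or the linkage rule, so the "both / either" ratio and average linkage used here are assumptions, and the names `co_association` and `average_linkage` are hypothetical. In the ensemble clustering literature this kind of matrix is often called a co-association matrix.

```python
from itertools import combinations


def co_association(clusters, n_points):
    """Similarity of points i and j: among the selected clusters that contain
    i or j, the fraction that contain both (an assumed normalisation)."""
    sim = [[0.0] * n_points for _ in range(n_points)]
    for i in range(n_points):
        sim[i][i] = 1.0
        for j in range(i + 1, n_points):
            both = sum(1 for c in clusters if i in c and j in c)
            either = sum(1 for c in clusters if i in c or j in c)
            sim[i][j] = sim[j][i] = both / either if either else 0.0
    return sim


def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix: repeatedly merge the
    two groups with the highest mean pairwise similarity until k remain."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            total = sum(sim[i][j] for i in groups[a] for j in groups[b])
            score = total / (len(groups[a]) * len(groups[b]))
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        groups[a] += groups[b]  # a < b, so deleting b keeps index a valid
        del groups[b]
    labels = [0] * len(sim)
    for label, members in enumerate(groups):
        for i in members:
            labels[i] = label
    return labels


# Toy input: five "selected" clusters over six points (made-up data).
clusters = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {4, 5}, {0, 1, 2, 3}]
sim = co_association(clusters, 6)
labels = average_linkage(sim, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Point 3 appears in one cluster with points 0 to 2 and in one with points 4 and 5, and average linkage assigns it to the second group because its mean similarity to that group is higher.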

Presenting a method based on genetic algorithm for finding the most stable clusters in ensemble clustering

Authors	Samimi Navid, Nejatian Samad, Parvin Hamid, Bagheri Fard Karamolah, Rezaei Vahideh
Abstract	Clustering is one of the fundamental tools in data analysis and data mining, enabling the extraction of hidden and meaningful structures from large datasets by grouping data based on intrinsic similarities. However, selecting optimal clusters in conventional clustering algorithms poses challenges, especially when clusters are dense or heterogeneous. In this study, a novel genetic algorithm-based method is proposed to identify the most stable clusters in ensemble clustering. By leveraging cluster stability criteria and a correlation matrix, the proposed approach improves the accuracy and stability of the final clustering results. The method first generates initial partitions of the data using six different clustering algorithms. Next, the Fisher criterion is applied to identify the more stable clusters. These selected clusters are then evaluated and optimized using a genetic algorithm to construct an optimized correlation matrix. This matrix is subsequently fed into a hierarchical clustering algorithm, which produces the final consensus clustering. The proposed method was tested on standard datasets. Results showed improvements of 12% and 5% in the NMI and ARI metrics, respectively, compared to previous methods. The use of a genetic algorithm enabled the identification of clusters with higher stability and diversity, reducing the impact of noise and increasing the accuracy of the final clustering. Moreover, the method outperformed the individual base clustering algorithms in providing more precise clustering results. Because it enhances the accuracy and stability of clustering, the proposed method holds potential for applications in domains such as big data analysis, machine learning, and information retrieval. The use of the Fisher criterion for selecting stable clusters and of genetic algorithms for optimization are among the strengths of this research. The method not only preserves diversity among clusters but also significantly enhances clustering accuracy. Future studies could explore combining this approach with more advanced algorithms to assess its applicability to more complex datasets.
Keywords	ensemble clustering, cluster stability, Fisher criterion, correlation matrix, genetic algorithm
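
The two evaluation metrics named in the abstract, NMI (normalized mutual information) and ARI (adjusted Rand index), both compare a predicted labelling against a reference labelling and do not depend on how the clusters are numbered. The sketch below shows one common way to compute them with the Python standard library only. NMI has several normalisations in use; the geometric-mean variant chosen here is an assumption, since the abstract does not say which one the authors used. The sample labellings are made up for illustration.

```python
from collections import Counter
from math import comb, log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Mutual information divided by the geometric mean of the entropies."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    t_count, p_count = Counter(truth), Counter(pred)
    mi = sum((c / n) * log(c * n / (t_count[t] * p_count[p]))
             for (t, p), c in joint.items())
    h_t, h_p = entropy(truth), entropy(pred)
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labelling has a single cluster.
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)


def ari(truth, pred):
    """Adjusted Rand index; assumes at least two points."""
    n = len(truth)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_t = sum(comb(c, 2) for c in Counter(truth).values())
    sum_p = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_t * sum_p / comb(n, 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]  # same partition, different label names
rough = [0, 0, 1, 1, 1, 1]       # one point assigned to the wrong cluster

print(nmi(truth, relabelled), ari(truth, relabelled))  # both approximately 1.0
print(nmi(truth, rough), ari(truth, rough))
```

A relabelled copy of the reference partition scores 1 on both metrics, while a partition with one misplaced point scores lower on both.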