یک الگوریتم مبتنی بر افرازبندی گراف برای خوشه‌بندی سامانه‌‌های نرم‌افزاری با ابعاد بزرگ

Fa | Ar | En

یک الگوریتم مبتنی بر افرازبندی گراف برای خوشه‌بندی سامانه‌‌های نرم‌افزاری با ابعاد بزرگ


نویسنده	پوراصغر بابک ,ایزدخواه حبیب ,لطفی شهریار ,صالحی خیام
منبع	پردازش علائم و داده ها - 1400 - شماره : 4 - صفحه:37 -48
چکیده	از روش‌های خوشه‌بندی برای بازیابی ساختار نرم‌افزار جهت فهم درست آن و همچنین بازسازی نرم‌افزار استفاده می‌شود. در ادبیات موضوع، بیشتر الگوریتم‌های ارائه‌شده برای خوشه‌بندی سامانه‌های نرم‌افزاری به دو دسته الگوریتم‌های مبتنی بر جستجو و الگوریتم‌های سلسله‌ مراتبی طبقه‌بندی می‌شوند و الگوریتمی ‌از رده مبتنی بر افراز برای خوشه‌بندی یک سامانه نرم‌افزاری ارائه نشده ‌است. این روش‌ها سعی دارند که گراف وابستگی موجودیت به‌دست‌آمده از کد منبع سامانه نرم‌افزاری را به چند مجموعه راسی افراز کنند. در سامانه‌های نرم‌افزاری، موجودیت می‌تواند رده، تابع و یا یک فایل باشد. با توجه به چندجمله‌ای غیر قطعی، سخت‌بودن مساله خوشه‌بندی، در سال‌های اخیر از روش‌های تکاملی و مبتنی بر جستجو مانند الگوریتم ژنتیک برای این حل این مساله، زیاد استفاده شده است. هر چند این الگوریتم‌ها در برخی موارد می‌توانند ساختار مناسبی از نرم‌افزار را به‌دست آورند، اما برای نرم‌افزار‌های با ابعاد بزرگ، با توجه به زمان اجرا و حافظه مصرفی زیاد، قابل اجرا نیستند؛ همچنین، این روش‌ها از اطلاعات و دانش گرافی موجود در گراف وابستگی موجودیت استفاده‌ی چندانی نمی‌کنند. در این مقاله یک الگوریتم مبتنی بر افراز ارائه شده است که بتوان از آن در خوشه‌بندی نرم افزار نیز استفاده کرد. همچنین، یک نوع فاصله جدید برای قیاس تشابه و عدم تشابه ارائه شده ‌است. انتظار می‌رود روش پیشنهادی بتواند در قیاس با سایر روش‌های موجود، خوشه‌بندی‌هایی با کیفیت بالاتر و نزدیک به خوشه‌بندی فرد خبره، تولید کند. برای بررسی صحت اجرای الگوریتم، آن را بر روی نرم‌افزار موزیلا فایرفاکس اجرا کرده و نتایج را با الگوریتم‌های مطرح این حوزه، مقایسه کرده‌ایم.
کلیدواژه	مهندسی نرم‌افزار، مهندسی معکوس، خوشه‌بندی نرم‌افزار، الگوریتم k-means
آدرس	دانشگاه تبریز, دانشکده علوم ریاضی, گروه علوم کامپیوتر, ایران, دانشگاه تبریز, دانشکده علوم ریاضی, گروه علوم کامپیوتر, ایران, دانشگاه تبریز, دانشکده علوم ریاضی, گروه علوم کامپیوتر, ایران, دانشگاه شهرکرد, دانشکده علوم ریاضی, گروه علوم کامپیوتر, ایران
پست الکترونیکی	khayyam.salehi@gmail.com

A partition-based algorithm for clustering large-scale software systems

Authors	Pourasghar Babak ,Izadkhah Habib ,Lotfi Shahriar ,Salehi Khayyam
Abstract	Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and searchbased techniques. In the former, clustering is a process of merging (splitting) similar (nonsimilar) clusters. These techniques suffered from the drawbacks such as finiteness criterion and arbitrary decisions occurred in the process. Because of the NPhardness of clustering software systems, evolutionary and searchbased algorithms are more commonly used algorithm than hierarchical ones. In evolutionary algorithms, the clustering of software systems is considered as a problem of searching over some possible clustering candidates. Although these algorithms are often able to achieve an appropriate structure of the software, they are not applicable in clustering largescale software. Furthermore, these algorithms are unable to consider the knowledge in the artifact dependency graph, which extracted from the source code of the software. In software systems, an artifact can be everything like a class, a function, or a file. In this paper, a new partitionbased clustering algorithm is presented. This algorithm attempts to partition the artifact dependency graph considering the knowledge therein. Moreover, a new distance criterion is presented to measure the similarity and dissimilarity of the artifacts. The proposed algorithm starts with the artifact dependency graph and creates the similarity matrices of the artifacts. So, it attempts to refine the partition candidate until a fixed point is reached. We expect that the proposed method compared with other methods could lead to achieve the clustering with high quality and similar to the expert #39;s clustering based on MoJoFM measure. To demonstrate the applicability and validity of the proposed algorithm, a largescale case study, Mozilla Firefox, is employed. The results demonstrate that the proposed algorithm outperforms the commonly used evolutionary methods in the literature.
Keywords	Software Engineering ,Reverse Engineering ,Software Clustering ,K-means algorithm