A Graph-Based Algorithm for Clustering the Surahs of the Holy Qur'an



Authors	Minaei Behrouz, Mottaghi Maryam Sadat
Source	Signal and Data Processing - 1403 - Issue: 1 - Pages: 71-88
Abstract	The Qur'an is the book revealed by God, and to this day many scholars and researchers have worked to know and understand it. The availability of computers is a valuable opportunity: by speeding researchers along their path, it can help them reach higher peaks. Clustering is one of the methods used to understand the structure of data. This article clusters the surahs of the Holy Qur'an based on the co-occurrence of words in them, using an existing graph-based approach. In the present study, we first represent each surah as a weighted, undirected graph, then build a vector for each surah from its graph, and finally cluster the surahs. We use the silhouette score to evaluate clustering quality. By this measure, the best clustering among the different runs achieved a silhouette score of 0.91. This research provides a suitable structural foundation for describing the semantic layer of the Qur'an's surahs and verses for researchers in computational linguistics working in the domain of Qur'anic studies.
Keywords	text clustering, network representation of text, text graph, frequent subgraph, computational Qur'an mining
Address	Iran University of Science and Technology, School of Computer Engineering, Iran; Shahid Beheshti University, Research Institute for the Miracle of the Qur'an, Iran
Email	m.motaghi88@chmail.ir

A Graph-Based Algorithm for Clustering Qur'anic Surahs

Authors	Minaei Behrouz, Mottaghi Maryam Sadat
Abstract	The Holy Qur'an was revealed by God Almighty, and many scholars and researchers have worked to understand it. The availability of computer systems is a great opportunity to help researchers reach higher peaks by speeding up their work. Clustering is one of the methods used to understand the structure of data: it divides data samples into groups so that the members of each cluster are similar to one another and different from the members of other clusters. Clustering of Qur'anic surahs has been the subject of several computational studies of the Qur'an, and these studies have taken different approaches to vectorizing the surahs. Thabet formed a vector for each surah by taking the stems of some Qur'anic words as features and the normalized probability of their occurrence in the surah as feature values, and clustered only 24 surahs because the resulting data matrix was sparse. With a similar approach to vectorization, Moisl used concepts from statistical sampling theory to calculate a minimum surah length threshold per feature, which addressed the problem of shorter surahs and allowed more surahs to be clustered. Instead of using words as features, Sharaf considered 13 features, including whether the surah refers to the story of Adam and Iblis and the number of occurrences of the phrase «یا اَیُّهَا الَّذینَ آمَنُوا» (O you who believe), and defined how each feature is measured. He then formed the data matrix and clustered the Qur'anic surahs. In another study, Sufi et al. took the topics identified for each verse in the Tafsir Rahnama as features, constructed a binary data matrix based on the presence or absence of each topic in the tafsir of each surah, and applied clustering. In this article, we cluster the surahs of the Holy Qur'an based on the co-occurrence of words in them. To achieve this goal, we use an existing graph-based approach.
In the present study, we first represent each surah as a weighted undirected graph. We then form the vector of each surah by taking closed frequent subgraphs as features and their relative occurrence in each surah as feature values, and finally cluster the surahs. We use the silhouette score to evaluate the quality of clustering. By this criterion, the best clustering among the different runs achieved a silhouette score of 0.91. This research provides a suitable structural foundation for specifying the semantic layer of the surahs of the Holy Qur'an for computational linguistics researchers in the domain of Qur'anic studies.
Keywords	document clustering, text graph, frequent subgraph, computational Qur'an mining
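
The pipeline outlined in the abstract (co-occurrence graph per surah, feature vector per surah, silhouette evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the toy token lists are placeholders rather than Qur'anic text; the sliding-window size and the function names `cooccurrence_graph`, `edge_vectors`, and `silhouette` are hypothetical; single edges shared by at least `min_support` documents stand in for the closed frequent subgraphs named in the abstract; and the cluster labels are assigned by hand, since the abstract does not say which clustering algorithm was used.

```python
from collections import Counter
import math


def cooccurrence_graph(tokens, window=2):
    """Weighted undirected graph as a Counter: {frozenset({u, v}): weight}.

    The weight counts how often u and v fall inside the same sliding
    window of `window` consecutive tokens (an assumed definition).
    """
    edges = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:  # skip self-loops
                edges[frozenset((tokens[i], tokens[j]))] += 1
    return edges


def edge_vectors(graphs, min_support=2):
    """One vector per graph. Features are edges present in at least
    `min_support` graphs (a simplified stand-in for closed frequent
    subgraphs); values are the edge weight relative to the graph's
    total edge weight."""
    support = Counter(edge for g in graphs for edge in g)
    features = sorted((e for e, s in support.items() if s >= min_support),
                      key=sorted)
    vectors = []
    for g in graphs:
        total = sum(g.values()) or 1  # avoid division by zero
        vectors.append([g[e] / total for e in features])
    return features, vectors


def silhouette(vectors, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        same = [math.dist(v, w) for j, w in enumerate(vectors)
                if labels[j] == lab and j != i]
        if not same:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to nearest other cluster
            sum(math.dist(v, w) for j, w in enumerate(vectors)
                if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy "documents" (placeholder tokens, not real surah text).
docs = ["a b c a b".split(), "a b c b a".split(),
        "x y z x y".split(), "x y z y x".split()]
graphs = [cooccurrence_graph(d) for d in docs]
features, vectors = edge_vectors(graphs)
labels = [0, 0, 1, 1]  # hand-assigned clusters, for illustration only
score = silhouette(vectors, labels)
print(f"silhouette = {score:.2f}")
```

A silhouette score close to 1 means each item lies much nearer to its own cluster than to any other, which is how a value such as the reported 0.91 is read. Swapping the edge features for mined closed frequent subgraphs would change only `edge_vectors`; the graph construction and the evaluation step would stay the same.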