Automatic Multi-Document Summarization Techniques for Persian Texts Based on Metaheuristic Algorithms




Authors	Ahangari Fatemeh, Karbasi Soheila, Yaghoubi Mehdi
Source	Librarianship and Information Organization Studies, 1398 SH, Vol. 30, No. 2, pp. 58-80
Abstract	Purpose: To present a standard summarization model for Persian texts by recasting summarization as an optimization problem solved by suitable metaheuristic algorithms. Methodology: The standard documents of the multi-document corpus "Pasokh", which covers 50 different topics drawn from various news genres published by widely read Iranian news agencies, were used for evaluation. Each topic contains 20 documents together with 5 abstractive and 5 extractive summaries. First, the input texts were preprocessed and initial summaries were generated using the TF-ISF measure, sentence readability and cohesion criteria, and three features: similarity to the title, position of the sentence in the text, and sentence length. Based on each of these criteria, a weight was assigned to every summary sentence and a similarity matrix was built. Next, the output of the extraction system was processed by two metaheuristic algorithms, a genetic algorithm and cuckoo search, to obtain the final summary. Finally, the resulting output was analyzed with the ROUGE evaluation toolkit by comparison against human-written summaries. Findings: The averages of all ROUGE scores, which measure the overlap of shared units between the human summaries and the machine summary, were higher for cuckoo search than for the genetic algorithm and for the online summarizer Ijaz. Of the eight measures in this toolkit, longest common subsequence (0.33) and the number of matching words in the text (0.40) gave better results than the others. Conclusion: Comparing the two algorithms shows that cuckoo search performs better on every ROUGE measure. In addition, a timing comparison shows that the proposed system takes less time on average to summarize when it uses cuckoo search.
Keywords	Automatic text summarization, extractive summary, metaheuristic algorithms, genetic algorithm, cuckoo search algorithm, ROUGE evaluation toolkit
Address	Golestan University, Faculty of Engineering, Iran; Golestan University, Faculty of Engineering, Department of Computer Engineering, Iran; Golestan University, Faculty of Engineering, Department of Computer Engineering, Iran
Email	m.yaghoubi@gu.ac.ir
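The pipeline described in the abstract (sentence weighting, a similarity matrix, a metaheuristic search for the final extract, then ROUGE scoring) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes pre-tokenized sentences, uses TF-ISF as the only sentence weight (the paper also uses readability, cohesion, title similarity, position and length), scores a candidate summary as total weight minus pairwise TF-ISF cosine redundancy, caps the summary at a hypothetical `k` sentences, and runs a simplified binary cuckoo search in which a Lévy-distributed step (Mantegna's method) sets how many selection bits to flip. All function names and parameter values (`nests`, `pa`, `iters`) are illustrative assumptions, and `rouge_1_recall` approximates only one ROUGE measure rather than reproducing the ROUGE toolkit.

```python
import math
import random
from collections import Counter

# Hypothetical sketch, not the authors' implementation. Assumes pre-tokenized
# sentences and TF-ISF as the only sentence weight; k caps the summary length.


def isf_table(sentences):
    """ISF(w) = log(N / n_w), where n_w is the number of sentences containing w."""
    n = len(sentences)
    sf = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / df) for w, df in sf.items()}


def tfisf_scores(sentences):
    """Sentence weight: sum of TF * ISF over its words, divided by its length."""
    isf = isf_table(sentences)
    return [sum(c * isf[w] for w, c in Counter(s).items()) / len(s) if s else 0.0
            for s in sentences]


def similarity_matrix(sentences):
    """Cosine similarity between the TF-ISF vectors of every pair of sentences."""
    isf = isf_table(sentences)
    vecs = [{w: c * isf[w] for w, c in Counter(s).items()} for s in sentences]

    def cos(a, b):
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    return [[cos(a, b) for b in vecs] for a in vecs]


def fitness(sel, scores, sim, k):
    """Total weight minus pairwise redundancy; -inf if empty or longer than k."""
    chosen = [i for i, bit in enumerate(sel) if bit]
    if not chosen or len(chosen) > k:
        return float("-inf")
    redundancy = sum(sim[i][j] for a, i in enumerate(chosen) for j in chosen[a + 1:])
    return sum(scores[i] for i in chosen) - redundancy


def levy_flips(rng, n, beta=1.5):
    """Lévy-distributed step (Mantegna's method), used as a number of bits to flip."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    step = abs(u) / max(abs(v), 1e-12) ** (1 / beta)
    return min(n, max(1, int(step)))


def random_solution(rng, n, k):
    """Random 0/1 selection vector with between 1 and k sentences switched on."""
    sel = [0] * n
    for i in rng.sample(range(n), rng.randint(1, min(k, n))):
        sel[i] = 1
    return sel


def cuckoo_search(scores, sim, k, nests=15, pa=0.25, iters=200, seed=0):
    """Simplified binary cuckoo search; returns the indices of the chosen sentences."""
    rng = random.Random(seed)
    n = len(scores)
    pop = [random_solution(rng, n, k) for _ in range(nests)]
    fit = [fitness(s, scores, sim, k) for s in pop]
    for _ in range(iters):
        # New egg: flip a Lévy-distributed number of bits of a random nest.
        cand = pop[rng.randrange(nests)][:]
        for i in rng.sample(range(n), levy_flips(rng, n)):
            cand[i] ^= 1
        j = rng.randrange(nests)  # compare with a randomly chosen nest
        f = fitness(cand, scores, sim, k)
        if f > fit[j]:
            pop[j], fit[j] = cand, f
        # Abandon the worst fraction pa of nests and rebuild them at random.
        for t in sorted(range(nests), key=fit.__getitem__)[: int(pa * nests)]:
            pop[t] = random_solution(rng, n, k)
            fit[t] = fitness(pop[t], scores, sim, k)
    best = max(range(nests), key=fit.__getitem__)
    return [i for i, bit in enumerate(pop[best]) if bit]


def rouge_1_recall(candidate, reference):
    """Clipped unigram overlap divided by reference length (ROUGE-1-style recall)."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum(min(cnt, c[w]) for w, cnt in r.items())
    return overlap / sum(r.values()) if r else 0.0


if __name__ == "__main__":
    docs = ["cuckoo search finds good summaries", "genetic algorithm finds summaries",
            "cuckoo search finds good summaries quickly", "the weather is sunny today"]
    sentences = [d.split() for d in docs]
    picked = cuckoo_search(tfisf_scores(sentences), similarity_matrix(sentences), k=2)
    print([" ".join(sentences[i]) for i in picked])
```

Because the fitness function is separate from the search loop, a genetic-algorithm variant could reuse `fitness` unchanged and replace only the nest updates with selection, crossover and mutation over the same 0/1 vectors.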

Automatic Persian Multi-Text Summarization Techniques Based on Metaheuristic Algorithms

Authors	Ahangari Fatemeh, Karbasi Soheila, Yaghoubi Mehdi
Abstract	Purpose: The main objective of this study is to present a model for standard summarization of Persian texts by converting the summarization problem into an optimization problem solved by suitable metaheuristic algorithms. Methodology: The standard multi-document "Pasokh" corpus, which contains 50 topics covering different types of news from the most popular news agencies in Iran, was used for evaluation; each topic contains 20 documents as well as 5 abstractive and 5 extractive summaries. First, the input texts were preprocessed and initial summaries were generated using the TF-ISF measure, sentence readability and cohesion criteria, similarity to the title, position of the sentence in the text, and sentence length. Based on each of these criteria, a weight was assigned to each extracted sentence and a similarity matrix was created. Then the output of the extraction system was processed by a genetic algorithm and a cuckoo search algorithm to produce the final summary. Finally, the output of the previous step was analyzed with the ROUGE evaluation toolkit by comparison with human summaries. Findings: The averages of all ROUGE values measuring the overlap between human summaries and machine summaries were higher for the cuckoo search algorithm than for the genetic algorithm and the Ijaz online summarizer. Among the eight measures, longest common subsequence (0.33) and the number of common words in the text (0.40) gave better results than the rest. Conclusion: The comparison of the two algorithms indicates that the cuckoo search algorithm performs better on all measures. In addition, a comparison of running times shows that the average summarization time of the proposed system is lower with cuckoo search.
Keywords