A Model for Detecting Persian Rumors Based on the Analysis of Content Features in Social Network Texts

Authors	Jahanbakhsh-Nagadeh Zoleikha, Feizi-Derakhshi Mohammad-Reza, Sharifi Arash
Source	Signal and Data Processing, 1400 (Solar Hijri), No. 1, pp. 29-50
Abstract	A rumor is a collective effort in which the power of words is used to interpret an ambiguous but attractive situation; therefore, identifying the language of rumors can help detect them. To solve the rumor detection problem, previous research has focused mostly on the textual information in retweets and users' reply tweets, and less on the original text of the rumor. Most of this research has addressed English, and only limited work has been done on Persian. Hence, by focusing only on the original text of Persian rumors and introducing features with high content-information value, this paper presents a model based on physical and non-physical content features for detecting Persian rumors published on Twitter and Telegram. The proposed model detected rumors in the Persian Twitter dataset with an F-measure of 0.848, rumors in the Kermanshah earthquake dataset with an F-measure of 0.952, and Telegram rumors with an F-measure of 0.867, which shows that the proposed model can identify rumors by focusing only on the content features of the source rumor text.
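The results above are reported as F-measure values. As a reminder of how that metric is computed, here is a minimal sketch of the standard F-beta score from confusion-matrix counts, F = (1 + beta^2) * P * R / (beta^2 * P + R). It assumes F1 (beta = 1) on the rumor class; the abstract does not say whether the reported scores are per-class, macro-averaged, or weighted, and the counts in the example are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    beta = 1 gives the usual F1, the harmonic mean of precision and recall.
    """
    if tp == 0:
        # No correct positive predictions: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a rumor class, not taken from the paper.
print(round(f_measure(tp=80, fp=10, fn=20), 3))  # 0.842
```

For beta = 1 this reduces to 2·TP / (2·TP + FP + FN), here 160 / 190.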
Keywords	Persian rumor detection, content analysis, physical and non-physical content features, text processing
Affiliations	Islamic Azad University, Science and Research Branch, Tehran, Department of Computer Engineering, Iran; University of Tabriz, Faculty of Electrical and Computer Engineering, Department of Computer Engineering, Iran; Islamic Azad University, Science and Research Branch, Tehran, Department of Computer Engineering, Iran

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

Authors	Jahanbakhsh-Nagadeh Zoleikha, Feizi-Derakhshi Mohammad-Reza, Sharifi Arash
Abstract	A rumor is a collective attempt to interpret an ambiguous but attractive situation through the power of words; therefore, identifying the language of rumors can help detect them. To address the rumor detection problem, previous research has focused mainly on the contextual information in reply tweets and much less on the content features of the original rumor. Most studies have targeted English, and only limited work has been done on detecting rumors in Persian. This study analyzes the content of the original rumor and introduces informative content features to identify Persian rumors on Twitter and Telegram early (i.e., when a rumor has been published on news media but has not yet spread on social media). The proposed model is therefore based on physical and non-physical content features in three categories: lexical, syntactic, and pragmatic. These features combine common content features with newly proposed content-based features. Since no social context information is available at the time a rumor is posted, the proposed model does not depend on propagation-based features and relies only on the content-based information of the original rumor. Although the model ignores much information (including user information, users' reactions to the rumor, and propagation structures), content analysis of the original rumor still yields useful information for classification. Several experiments were performed on various combinations of feature sets (i.e., common and proposed content features) to explore how well the features distinguish rumors from non-rumors, separately and jointly. To this end, three machine learning algorithms, Random Forest (RF), AdaBoost, and Support Vector Machine (SVM), were used as strong classifiers to evaluate the accuracy of the proposed model.
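The abstract names three content-feature categories (lexical, syntactic, pragmatic) but does not list the features themselves. Assuming that "physical" features are directly observable surface properties of the text, the sketch below counts a few such cues for a single post. The feature names and regular expressions are hypothetical choices for illustration, not the authors' feature set.

```python
import re

# Hypothetical, illustration-only cues; NOT the paper's feature set.
TOKEN_RE = re.compile(r"\w+")  # Unicode-aware in Python 3: also matches Persian letters
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def surface_features(text: str) -> dict[str, float]:
    """Count a few directly observable properties of one post."""
    tokens = TOKEN_RE.findall(text)
    n_tokens = len(tokens)
    return {
        "n_tokens": float(n_tokens),
        "n_chars": float(len(text)),
        "mean_token_len": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        # Count both the Latin "?" and the Arabic-script question mark "؟".
        "n_question": float(text.count("?") + text.count("؟")),
        "n_exclaim": float(text.count("!")),
        "n_hashtags": float(len(HASHTAG_RE.findall(text))),
        "n_urls": float(len(URL_RE.findall(text))),
    }


print(surface_features("آیا این خبر درست است؟"))
```

Vectors of this kind could then be passed to classifiers such as scikit-learn's `RandomForestClassifier`, `AdaBoostClassifier`, or `SVC`. The abstract does not state which library the authors used.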
To achieve the best performance of the classification algorithms on the training dataset, feature selection is necessary; in this study, the Sequential Forward Floating Search (SFFS) approach was used to select valuable features. In addition, t-test results (p-value ≤ 0.05) show that most of the new features proposed in this study differ significantly between rumor and non-rumor documents. The experimental results show that the newly proposed features improve rumor detection accuracy. The F-measure of the proposed model for detecting Persian rumors was 0.848 on the Twitter dataset, 0.952 on the Kermanshah earthquake dataset, and 0.867 on the Telegram dataset, which indicates that the proposed method can identify rumors by focusing only on the content features of the original rumor text. The evaluation on Twitter rumors shows that, despite the short length of tweets and the limited content information that can be extracted from them, the proposed model detects Twitter rumors with acceptable accuracy. Hence, the results demonstrate the ability of content features to distinguish rumors from non-rumors.
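SFFS alternates a greedy inclusion step with conditional exclusion steps that drop a feature whenever the reduced subset beats the best subset previously found at that smaller size, which lets the search escape some nesting effects of plain forward selection. The sketch below is a generic, dependency-free version over feature indices. `score` is a hypothetical callback; in a setup like the one described it might return a cross-validated F-measure of RF, AdaBoost, or SVM on the chosen columns. The toy score in the usage example is invented only to show a case where floating helps.

```python
from typing import Callable, FrozenSet, Iterable

Subset = FrozenSet[int]


def sffs(features: Iterable[int], score: Callable[[Subset], float],
         k_max: int) -> tuple[Subset, float]:
    """Sequential Forward Floating Search over feature indices.

    Returns the best-scoring subset of size <= k_max seen during the search.
    """
    pool = frozenset(features)
    k_max = min(k_max, len(pool))
    if k_max < 1:
        raise ValueError("need at least one feature and k_max >= 1")
    best_score: dict[int, float] = {}
    best_subset: dict[int, Subset] = {}
    current: Subset = frozenset()

    def record(subset: Subset, value: float) -> bool:
        """Keep `subset` if it beats the best subset of the same size."""
        k = len(subset)
        if k not in best_score or value > best_score[k]:
            best_score[k], best_subset[k] = value, subset
            return True
        return False

    while len(current) < k_max:
        # Inclusion: add the feature whose addition gives the highest score.
        add = max(sorted(pool - current), key=lambda f: score(current | {f}))
        current = current | {add}
        record(current, score(current))
        # Conditional exclusion: drop the least useful feature while the
        # reduced subset beats the best one found earlier at that size.
        while len(current) > 2:
            drop = max(sorted(current), key=lambda f: score(current - {f}))
            reduced = current - {drop}
            if not record(reduced, score(reduced)):
                break
            current = reduced

    # Sizes are recorded in increasing order, so ties favor smaller subsets.
    k_best = max(best_score, key=lambda k: best_score[k])
    return best_subset[k_best], best_score[k_best]


# Hypothetical toy score: features 1 and 2 are complementary,
# and feature 0 becomes redundant once both are present.
WEIGHTS = {0: 30, 1: 20, 2: 18, 3: 10}


def toy_score(subset: Subset) -> float:
    s = sum(WEIGHTS[f] for f in subset)
    if {1, 2} <= subset:
        s += 30
    if {0, 1, 2} <= subset:
        s -= 35
    return s


subset, value = sffs(range(4), toy_score, k_max=3)
print(sorted(subset), value)  # [1, 2, 3] 78
```

On this toy function, plain sequential forward selection would end at `{0, 1, 2}` (score 63); the exclusion step discards feature 0 and the search then reaches `{1, 2, 3}` (78). Libraries such as mlxtend expose the same search as `SequentialFeatureSelector(estimator, forward=True, floating=True)`.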
Keywords	Persian rumor detection, Content analysis, Physical and non-physical content features, Text processing