   ParsNER-Social: A Corpus for Named Entity Recognition in Persian Social Media Texts
   
Authors: Asgari-Bidhendi, Majid; Janfada, Behrooz; Roshani Talab, Omid Reza; Minaei-Bidgoli, Behrouz
Source: Journal of AI and Data Mining, 2021, Volume 9, Issue 2, Pages 181-192
Abstract: Named entity recognition (NER) is one of the essential prerequisites for many natural language processing tasks. All public corpora for Persian named entity recognition, such as ParsNERCorp and ArmanPersoNERCorpus, are based on the Bijankhan corpus, which originates from the Hamshahri newspaper in 2004. Correspondingly, most published named entity recognition models for Persian are tuned specifically for news data and are not flexible enough to be applied to other text categories, such as social media texts. This study introduces ParsNER-Social, a corpus for training named entity recognition models in the Persian language, built from social media sources. The corpus consists of 205,373 tokens and their NER tags, crawled from social media content, including 10 Telegram channels in 10 different categories. Furthermore, three supervised methods are trained and evaluated on the ParsNER-Social corpus: two conditional random field models as baselines and one state-of-the-art deep learning model with six different configurations. The experiments show that the mono-lingual Persian models based on Bidirectional Encoder Representations from Transformers (MLBERT) outperform the other approaches on the ParsNER-Social corpus. Among the different configurations of MLBERT models, the ParsBERT+BERT-TokenClass model obtained an F1-score of 89.65%.
Keywords: named entity recognition, natural language processing, social media corpus, Persian language
Address: Iran University of Science and Technology, Computer Engineering School, Iran; Iran University of Science and Technology, Computer Engineering School, Iran; Iran University of Science and Technology, Computer Engineering School, Iran; Iran University of Science and Technology, School of Computer Engineering, Iran
Email: b_minaei@iust.ac.ir
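
The abstract reports an F1-score over tokens paired with NER tags but does not say how the score was computed. As a rough illustration only, the following sketch computes an entity-level F1 under the assumption that tags follow a BIO scheme and that a predicted entity counts as correct only when its span and type both match the gold annotation. The tag names (`PER`, `LOC`) and the function names are hypothetical and are not taken from the paper.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence.

    Assumption: tags look like "B-PER", "I-PER", or "O". An "I-" tag that does
    not continue an open span of the same type is treated as starting a new one.
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == etype and start is not None
        if not continues and start is not None:
            spans.add((start, i, etype))  # close the currently open span
            start, etype = None, None
        if tag != "O" and start is None:
            start, etype = i, tag[2:]     # open a new span
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)   # exact span and type matches
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one of two gold entities is recovered.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)
print(round(score, 4))
```

In the toy example, precision is 1.0 and recall is 0.5, so the F1 is 2/3. A token-level F1, or a different tagging scheme, would give different numbers for the same predictions.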
 
     
   
  
 
 
