A Transformer-based Approach for Persian Text Chunking
   
Authors: Parsa Kavehzadeh, Mohammad Mahdi Abdollah Pour, Saeedeh Momtazi
Source: Journal of AI and Data Mining, 2022, Volume 10, Issue 3, Pages 373-383
Abstract: Over the last few years, text chunking has played a significant role in sequence labeling tasks. Although a large variety of methods have been proposed for shallow parsing in English, most approaches proposed for text chunking in the Persian language rely on simple and traditional concepts. In this paper, we propose using state-of-the-art transformer-based contextualized models, namely BERT and XLM-RoBERTa, as the major structure of our models. A conditional random field (CRF), a combination of bidirectional long short-term memory (BiLSTM) and CRF, and a simple dense layer are employed after the transformer-based models to enhance performance in predicting chunk labels. Moreover, we provide a new dataset for noun-phrase chunking in Persian, consisting of annotated Persian news text. Our experiments reveal that XLM-RoBERTa achieves the best performance among all the architectures tried on the proposed dataset. The obtained results also show that a single CRF layer yields better results than a dense layer, and even than the combination of BiLSTM and CRF.
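The abstract describes placing a single CRF layer on top of a transformer encoder to predict chunk labels. The sketch below illustrates that idea only; it is not the authors' released code. The library choices (Hugging Face transformers, the pytorch-crf package), the "xlm-roberta-base" checkpoint, and the three-tag IOB noun-phrase scheme are illustrative assumptions.

```python
# Minimal sketch of an XLM-RoBERTa encoder + single CRF head for chunking.
# Assumptions: Hugging Face transformers for the encoder, pytorch-crf for
# the CRF layer, and a 3-tag IOB noun-phrase scheme (B-NP, I-NP, O).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from torchcrf import CRF  # pip install pytorch-crf


class XLMRobertaCRFChunker(nn.Module):
    def __init__(self, num_tags: int, model_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)     # single CRF layer on top

    def forward(self, input_ids, attention_mask, tags=None):
        # Contextualized token representations from the transformer encoder.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        emissions = self.classifier(hidden_states)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi-decoded chunk labels for each sentence.
        return self.crf.decode(emissions, mask=mask)


# Hypothetical usage on a Persian sentence.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaCRFChunker(num_tags=3)
batch = tokenizer(["این یک نمونه است"], return_tensors="pt")
with torch.no_grad():
    predicted_tags = model(batch["input_ids"], batch["attention_mask"])
```

The CRF scores transitions between adjacent chunk tags and decodes the whole label sequence jointly, which is the usual motivation for preferring it over an independent per-token dense (softmax) layer, in line with the comparison reported in the abstract.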
Keywords: Persian text chunking, sequence labeling, deep learning, contextualized word representation
Affiliation (all authors): Computer Engineering Department, Amirkabir University of Technology, Iran
Email: momtazi@aut.ac.ir
 
     
   