A comparative analysis of text classification for Turkish language
|
Authors
|
Yildirim Savaş, Yildiz Tuğba
|
Source
|
Pamukkale University Journal of Engineering Sciences - 2018 - Volume: 24 - Issue: 5 - Pages: 879-886
|
Abstract
|
Text categorization plays an important role in natural language processing (NLP). The rapid growth in the amount of textual data and the need for automatic annotation have made the problem more important. Among traditional methods, the bag-of-words approach has been applied successfully to text categorization for years. More recently, neural network language models (NNLM) have achieved successful results on various NLP problems. Their most important advantage is that they provide effective word and document representations. These representations are lower dimensional and have been found to be more effective than traditional methods; they have been exploited successfully for semantic and syntactic analysis. On the other hand, traditional bag-of-words approaches that use long one-hot vector representations are still considered powerful in terms of their accuracy in document classification. However, these approaches have not previously been compared for the Turkish language. In this study, we compared them through a variety of analyses. We observed that the traditional bag-of-words representation, combined with effective feature selection and a suitable machine learning algorithm, performs comparably to new-generation vector-based methods, namely word embeddings. We conducted various experiments comparing these approaches and identified an effective text categorization architecture for the Turkish language.
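The two kinds of representation contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy corpus, the three-dimensional `toy_embeddings` values, and the helper names `build_vocab`, `bow_vector`, and `mean_embedding` are all assumptions made for illustration, and no feature selection or classifier is included.

```python
from collections import Counter

# Toy corpus (hypothetical, not taken from the study).
docs = [
    "futbol maçı bu akşam oynanacak",
    "borsa bugün yükselişle kapandı",
]


def build_vocab(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Count vector with one dimension per vocabulary word."""
    counts = Counter(doc.split())
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical 3-dimensional word vectors. Real embeddings would be
# learned by a neural network language model on a large corpus.
toy_embeddings = {
    "futbol": [0.9, 0.1, 0.0],
    "maçı": [0.8, 0.2, 0.1],
    "borsa": [0.0, 0.9, 0.3],
    "kapandı": [0.1, 0.7, 0.4],
}


def mean_embedding(doc, embeddings, dim=3):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [embeddings[tok] for tok in doc.split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


vocab = build_vocab(docs)
for doc in docs:
    # The bag-of-words vector grows with the vocabulary; the averaged
    # embedding keeps a fixed, low dimensionality.
    print(len(bow_vector(doc, vocab)), len(mean_embedding(doc, toy_embeddings)))
```

With a realistic corpus the bag-of-words vector would have tens of thousands of dimensions, which is why the abstract pairs it with feature selection, while the embedding-based document vector stays at the fixed size of the word vectors.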
|
Keywords
|
Text classification, Machine learning, Artificial neural network
|
Address
|
Istanbul Bilgi University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Turkey; Istanbul Bilgi University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Turkey
|
Email
|
tugba.dalyan@bilgi.edu.tr
|