A novel feature extraction approach for text-based language identification: Binary patterns [Doküman dili tanıma için yeni bir öznitelik çıkarım yaklaşımı: İkili desenler]
Authors
|
Kaya Y., Ertuğrul Ö.F.
|
Source
|
Journal of the Faculty of Engineering and Architecture of Gazi University - 2016 - Volume: 31 - Issue: 4 - Pages: 1085-1094
|
Abstract
|
Language identification (LI), a major task in natural language processing, is the process of determining the language of a given content. In this paper, a novel approach is proposed that is based on the probability of use of characters that have similar orders with respect to their UTF-8 values. To evaluate and validate the proposed approach, four datasets containing texts in different numbers of languages were employed. In the proposed approach, the features extracted by the one-dimensional local binary pattern (1D-LBP) method were classified by various machine learning methods. The LI accuracies achieved on the four datasets were 86.20%, 92.75%, 100%, and 89.77%, respectively. The results show that the proposed approach yields high success rates and is an efficient way of identifying language.
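
To make the 1D-LBP idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a neighborhood of four characters on each side of the center, a "greater than or equal to" comparison, and Unicode code points as a stand-in for the UTF-8 values; none of these details are given in the abstract. The function names `lbp_1d_codes` and `lbp_1d_histogram` are hypothetical.

```python
from collections import Counter


def lbp_1d_codes(text, half_window=4):
    """Return one binary-pattern code per interior character of `text`.

    Assumption: each character is represented by its Unicode code point,
    used here as a stand-in for the UTF-8 value mentioned in the abstract.
    """
    values = [ord(ch) for ch in text]
    codes = []
    # Only positions with a full neighborhood on both sides get a code.
    for i in range(half_window, len(values) - half_window):
        center = values[i]
        neighbors = values[i - half_window:i] + values[i + 1:i + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbors):
            # Assumption: a neighbor contributes a 1 bit when it is >= the center.
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_1d_histogram(text, half_window=4):
    """Return the relative frequency of each binary pattern as a feature vector."""
    codes = lbp_1d_codes(text, half_window)
    n_bins = 1 << (2 * half_window)  # 256 bins for the assumed 8 neighbors
    histogram = [0.0] * n_bins
    if not codes:
        return histogram  # text too short to contain a full neighborhood
    for code, count in Counter(codes).items():
        histogram[code] = count / len(codes)
    return histogram
```

Under these assumptions, each text yields a fixed-length vector of pattern frequencies, which could then be passed to any standard classifier, in line with the abstract's statement that the extracted features were classified by various machine learning methods.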
|
Keywords
|
Feature extraction; Natural language processing; One dimensional local binary patterns; Text-based language identification
|
Address
|
Siirt Üniversitesi, Bilgisayar Mühendisliği Bölümü, Kezer Kampüsü, Siirt, Turkey; Batman Üniversitesi, Elektrik-Elektronik Mühendisliği Bölümü, Batı Raman Kampüsü, Batman, Turkey