Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language

Authors
Namdar Saman, Faili Hesham, Khadivi Shahram

Source
International Journal of Information and Communication Technology Research, 2012, Vol. 5, No. 1, pp. 39-52

Abstract
Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. In this paradigm, model parameters are derived from an analysis of a parallel corpus, and SMT quality depends on the ability to learn word translations. Enriching SMT with a suitable morphological analyser dramatically reduces both the number of out-of-vocabulary words and the dictionary size. The effect can be even more pronounced for a highly inflectional, low-resource language such as Persian. Defining a suitable granularity for word segments may improve alignment quality in the parallel corpus. In this paper, different segmentation schemes and combinations of word segments are explored in Persian-to-English SMT experiments, and the scheme yielding the best one-to-one alignment, called the EN-like scheme, is proposed. Using this scheme improves Persian-to-English translation quality by about 3 BLEU points over a phrase-based SMT baseline.
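
The abstract's claim that morphological segmentation shrinks the dictionary and reduces out-of-vocabulary (OOV) words can be illustrated with a toy sketch. It assumes a hypothetical suffix list for transliterated Persian (plural -ha, possessive clitics -am, -at, -ash) and a naive greedy suffix stripper; the names `SUFFIXES`, `segment`, `seg_all` and `vocab_and_oov` are illustrative only. This is not the paper's morphological analyser or its EN-like scheme, and real Persian morphology (orthography, zero-width non-joiners, irregular plurals) is far richer.

```python
# Hypothetical, illustrative suffix inventory for transliterated Persian:
# plural "ha" and possessive clitics "am", "at", "ash". Not the paper's analyser.
SUFFIXES = ["ha", "am", "at", "ash"]


def segment(word, suffixes=SUFFIXES):
    """Greedily strip known suffixes from the end of a word.

    Returns the stem followed by suffix morphemes, each marked with a leading '+'.
    """
    morphs = []
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so "ash" is preferred over shorter matches.
        for suf in sorted(suffixes, key=len, reverse=True):
            # Keep a stem of at least 3 characters to avoid over-splitting short words.
            if word.endswith(suf) and len(word) - len(suf) >= 3:
                morphs.insert(0, "+" + suf)
                word = word[: -len(suf)]
                changed = True
                break
    return [word] + morphs


def seg_all(tokens):
    """Segment every token and flatten the result into one morpheme sequence."""
    return [m for w in tokens for m in segment(w)]


def vocab_and_oov(train_tokens, test_tokens):
    """Return (training vocabulary size, OOV rate of the test tokens)."""
    vocab = set(train_tokens)
    oov = sum(1 for t in test_tokens if t not in vocab)
    return len(vocab), oov / len(test_tokens)


if __name__ == "__main__":
    train = "ketab ketabha ketabam ketabash daftar daftarha daftaram".split()
    test = "ketabha daftarash daftar".split()
    print(vocab_and_oov(train, test))                    # (7, 0.3333333333333333)
    print(vocab_and_oov(seg_all(train), seg_all(test)))  # (5, 0.0)
```

On this toy data, word-level tokens give a 7-entry vocabulary and leave the unseen inflected form "daftarash" as OOV, while the segmented stream needs only 5 entries and covers every test morpheme, because "daftar" and "+ash" were each seen separately in training.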

Keywords
Statistical Machine Translation, Segmentation Schemes, Lexical Granularities, Morpheme, Persian Language

Address
University of Tehran, NLP Lab, School of ECE, Iran; University of Tehran, NLP Lab, School of ECE, Iran; Amirkabir University of Technology, NLP Lab, Computer Engineering & IT Department, Iran

Email
khadivi@aut.ac.ir