Establishing a New and Efficient Framework for Persian Speech Recognition


Author	BabaAli Bagher
Source	Signal and Data Processing, 1395 (Solar Hijri), vol. 13, no. 3, pp. 51-62
Abstract	Despite a thirty-year history of research on Persian speech recognition in Iran and notable progress, the results of most of this work cannot be compared or assessed precisely because no common experimental framework exists. Such a framework mainly consists of a recognition system and a speech database with clearly defined training, development, and evaluation sets. The open-source Kaldi toolkit, although relatively new, has distinctive features that in recent years have attracted the attention of most leading speech processing laboratories worldwide; all things considered, it is the best available choice for establishing such a framework for any language, including Persian. In this paper, after reviewing the characteristics, capabilities, and components of Kaldi, we select the FarsDat database as the other part of the framework, because it is officially registered and accessible to researchers worldwide, and, following the split used for the TIMIT database, we define its training, development, and evaluation sets. Finally, nearly all of the techniques and methods available in Kaldi were tested on FarsDat according to this definition. The best phone error rate obtained was 20.3% on the development set and 19.8% on the test set. The code written to build this framework is available in Kaldi; since Kaldi is open source, the results reported in this paper can be readily reproduced by anyone with access to the FarsDat database.
Keywords	Persian continuous speech recognition, FarsDat database, Kaldi open-source toolkit
Address	University of Tehran, School of Mathematics, Statistics and Computer Science, Iran
Email	bagher.babaali@gmail.com

Establishing a New and Efficient Framework for Persian Speech Recognition

Authors	BabaAli Bagher
Abstract	Although research on Persian speech recognition in Iran has a thirty-year history and has made considerable progress, the lack of a well-defined experimental framework means that the outcomes of many of these studies are not comparable with one another and cannot be assessed accurately. The experimental framework includes an ASR toolkit and a speech database with defined training, development, and test sets. In recent years, Kaldi, a state-of-the-art open-source ASR toolkit, has been very well received by leading speech research groups around the world. Considering all aspects, Kaldi is the best available option among ASR toolkits for establishing a research framework for any language, including Persian. In this paper, we chose FarsDat as the speech database; it is the Persian counterpart of TIMIT, it has a standard form, and it is accessible to researchers around the world. Following the TIMIT recipe, we defined three sets on FarsDat: training, development, and test. After a survey of Kaldi's components and features, we applied most of the state-of-the-art ASR techniques available in Kaldi to FarsDat using this three-set definition. The best phone error rates on the development and test sets were 20.3% and 19.8%, respectively. All of the code and the recipe written by the author have been submitted to the Kaldi repository and are freely accessible, so the reported results can be easily replicated by anyone with access to the FarsDat database.
Keywords	Persian Continuous Speech Recognition, FarsDat Database, Kaldi Toolkit
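
Note on the metric: the abstract reports results as phone error rate (PER). As a minimal sketch of how such a figure is commonly computed, the Python below takes the Levenshtein (edit) distance between reference and hypothesized phone sequences and divides by the number of reference phones. This is an illustration under stated assumptions, not the scoring code used in the paper or in Kaldi; the function names and the toy phone labels are hypothetical and do not come from FarsDat.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phone labels."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis (i deletions)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # phone in ref missing from hyp
            insertion = curr[j - 1] + 1  # extra phone in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """PER in percent over a set of utterances: total edits / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


if __name__ == "__main__":
    # Toy example with made-up labels: one deleted phone out of five.
    reference = [["s", "a", "l", "a", "m"]]
    hypothesis = [["s", "a", "l", "m"]]
    print(phone_error_rate(reference, hypothesis))
```

Errors are pooled over the whole set before dividing, so longer utterances carry proportionally more weight than in a per-utterance average.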