Using Prominent Regions of Printed Farsi Subwords to Reduce the Search Space in Their Recognition

Authors	Davoudi, Homa; Kabir, Ehsanollah
Source	Iranian Journal of Electrical and Computer Engineering (مهندسي برق و مهندسي كامپيوتر ايران) - 1393 (Solar Hijri) - Volume 12 - Issue 1 - Pages 1-11
Abstract	In common methods for reducing lexicon size, the set of words is usually clustered based on holistic shape features. Each input word is then classified into one of these clusters. Because this stage directly affects the final result of the recognition system, lexicon reduction must be carried out with high accuracy. To this end, this paper presents a verification method that determines the level of confidence in the selected cluster. This confidence is determined from local shape features: local feature vectors are extracted from the shape of the input subword and compared with the prominent regions corresponding to the selected cluster. The prominent regions of a cluster are the regions of the shape that distinguish the subwords of that cluster from those of other clusters. Finally, the proposed verification method is used together with a set of rules to reduce the lexicon size. Experiments on a set of Farsi subword shapes show that the proposed method can considerably reduce the search space while preserving accuracy.
Keywords	Verification, shape descriptor, printed subwords, word shape, classification, prominent region.
Address	Tarbiat Modares University, Faculty of Electrical and Computer Engineering, Iran; Tarbiat Modares University, Faculty of Electrical and Computer Engineering, Iran
Email	kabir@modares.ac.ir

Using Prominent Regions in Search Space Reduction for Recognition of Printed Farsi Subwords

Authors	Kabir E., Davoudi H.
Abstract	In the most common lexicon reduction methods, lexicon words are clustered based on their holistic shape features, and each query word image is then classified into the closest cluster. Because errors at this stage propagate to subsequent stages, the relevant clusters must be selected with a high degree of accuracy. In this paper we present a novel verification method that decides on the validity of the recognized clusters based on a proposed confidence measure. In the verification phase, the level of confidence in the selected cluster is measured using local shape features, and this determines whether the selected cluster is acceptable. For this purpose, local shape features of the input subword image are compared to the “prominent regions” of the corresponding cluster. The prominent regions of a cluster are local regions that discriminate the members of that cluster from those of other clusters. The proposed verification method, along with some predefined rules, is used to reduce the lexicon size for Farsi subwords. Experiments conducted on a set of 6895 common Farsi subwords show that the proposed method significantly reduces the search space while keeping accuracy at an acceptable level.
Keywords
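
The pipeline outlined in the abstract (classify a query into the closest cluster using holistic features, verify that choice with a confidence measure computed from local features at the cluster's prominent regions, then apply rules to produce the reduced lexicon) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the confidence measure or the rules, so the exponential similarity score, the `threshold` value, and the "fall back to the top `fallback_k` clusters" rule are all hypothetical, as are every function and variable name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_clusters(holistic, centroids):
    """Return cluster ids ordered from closest to farthest centroid."""
    return sorted(centroids, key=lambda cid: euclidean(holistic, centroids[cid]))


def confidence(local_features, prominent_regions, scale=1.0):
    """Hypothetical confidence measure (an assumption, not from the paper).

    Averages exp(-distance / scale) over the cluster's prominent regions,
    so a perfect match on every region gives 1.0 and poor matches approach 0.
    Both arguments map a region id to a local feature vector.
    """
    if not prominent_regions:
        return 0.0
    total = 0.0
    for region_id, template in prominent_regions.items():
        feature = local_features.get(region_id)
        if feature is not None:
            total += math.exp(-euclidean(feature, template) / scale)
    return total / len(prominent_regions)


def reduce_lexicon(holistic, local_features, centroids, prominent, members,
                   threshold=0.5, fallback_k=2):
    """Return the candidate subwords left after lexicon reduction.

    Assumed rule: keep only the closest cluster when verification succeeds,
    otherwise keep the fallback_k closest clusters.
    """
    ranked = rank_clusters(holistic, centroids)
    best = ranked[0]
    if confidence(local_features, prominent[best]) >= threshold:
        chosen = [best]
    else:
        chosen = ranked[:fallback_k]
    return [word for cid in chosen for word in members[cid]]


# Toy data, invented purely for illustration.
centroids = {"A": [0.0, 0.0], "B": [10.0, 10.0]}
prominent = {"A": {"top": [1.0, 0.0]}, "B": {"top": [0.0, 1.0]}}
members = {"A": ["a1", "a2"], "B": ["b1"]}

# Local features match cluster A's prominent region: only A is kept.
verified = reduce_lexicon([1.0, 1.0], {"top": [1.0, 0.0]},
                          centroids, prominent, members)

# Local features disagree with A's prominent region: fall back to A and B.
unverified = reduce_lexicon([1.0, 1.0], {"top": [0.0, 5.0]},
                            centroids, prominent, members)
```

In this toy run, `verified` contains only the members of cluster A, while `unverified` keeps the members of both clusters, which illustrates the trade-off the abstract describes: a confident verification shrinks the search space, and a failed one widens it to protect accuracy.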