Empirical Evaluation of Well-Known Farsi OCR Engines on the IDPL-PFOD Dataset

Authors

Fatemeh-Sadat Hosseini, Elham Shabaninia, Hossein Nezamabadi-pour

Source

Power, Control, and Data Processing Systems, 2025, Vol. 2, No. 1, pp. 35-47

Abstract

Optical character recognition (OCR), also called text recognition, extracts text from scanned documents, camera images, and similar sources. OCR has numerous applications, including reading forms and cheques, converting archived documents into digital files, and reading books and papers. An accurate OCR system speeds up these processes by removing time-consuming manual tasks. However, OCR is challenging, especially for languages such as Farsi, because of the intrinsic characteristics of the language and the limited resources, such as suitable datasets, for evaluating the effectiveness and efficiency of proposed methods. IDPL-PFOD is a new synthetic printed Farsi dataset that offers a wide range of variations, including different backgrounds, font types, distortions, and blurs. In this paper, two OCR engines, Tesseract and EasyOCR, are evaluated on the IDPL-PFOD dataset to show the limitations of existing OCR engines for Farsi. Evaluations using standard metrics show that Tesseract and EasyOCR achieve overall accuracies of 84.41% and 73.28%, respectively, on this dataset. Furthermore, the robustness of the two engines is evaluated against variations such as textured backgrounds, salt-and-pepper noise, Gaussian blur, and distortions. By reviewing the current challenges of deep learning methods for Farsi OCR, this paper offers valuable insights to the community and serves as a foundation for further research.
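The abstract reports overall accuracy under "standard metrics" without defining them in this record. As an illustration only, the sketch below assumes a corpus-level character accuracy of 1 - (total edit distance / total reference characters), a metric commonly used for OCR; it is not claimed to be the paper's exact definition, and the function names `levenshtein` and `character_accuracy` are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # matching ref[:i] against an empty hyp prefix costs i edits
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref char missing from hyp
                            curr[j - 1] + 1,      # extra char in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed metric: 1 - (total edits / total reference characters),
    clipped at 0 because many insertions can push the raw value below zero."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference texts are empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return max(0.0, 1.0 - total_edits / total_chars)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(character_accuracy(["abcd"], ["abed"]))  # 0.75
```

For Farsi specifically, it may be worth normalizing visually identical code points (for example Arabic yeh U+064A versus Farsi yeh U+06CC) before scoring, since otherwise an engine that outputs the Arabic form is penalized with a substitution even though the rendered text looks the same.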

Keywords

IDPL-PFOD dataset, optical character recognition (OCR), EasyOCR, Tesseract, OCR engines, Farsi OCR

Affiliations

Department of Electrical Engineering, Shahid Bahonar University of Kerman, Iran
Department of Applied Mathematics, Graduate University of Advanced Technology, Iran
Department of Electrical Engineering, Shahid Bahonar University of Kerman, Iran

Email

nezam_h@yahoo.com