Automatic correction of spelling errors in Persian sonography texts using neural networks

Authors	Dashti Mohammad Sadegh, Khatibi Bardsiri Amid, Jafari Shahbazzadeh Mehdi
Source	Payavard Salamat - 1403 - Volume: 18 - Issue: 1 - Pages: 19-31
Abstract	Background and aim: Medical reports and electronic health records are of great importance for the diagnosis and treatment of patients and for medical research. Correcting spelling errors in medical texts is essential to ensure that the information is interpreted correctly. This study was carried out to automatically correct Persian-language medical texts with the help of neural networks. Methods: In this study, conducted in 1402 (2023), a new computational model based on artificial neural networks and a dual embedding technique was developed using the Python programming language in a Windows environment. The dual word-embedding model was tuned specifically for spelling correction in the domain of Persian sonography texts. The proposed model uses several techniques for automatic error detection, including matching against a lexicon and computing textual similarity. To automatically select the most suitable replacement for a misspelled word, specific features such as edit distance are used together with a similarity score. The training and test data for the model are part of the text collection of the sonography clinic of Imam Khomeini Hospital in Tehran. Results: The proposed model is built on artificial neural networks and uses a new dual word-embedding architecture to select the best candidate words to replace spelling and semantic errors. In the evaluation on Persian sonography texts, the F-measure of the proposed model in automatically detecting and correcting semantic (real-word) errors was 90.5% and 90%, respectively. In addition, an accuracy of 90.8% was obtained in correcting form (non-word) errors. Conclusion: According to the evaluation results, the proposed method can effectively handle a wide range of form and semantic errors in medical texts, including substitution, transposition, insertion, and deletion. Combining the edit-distance measure with the textual similarity score extracted from the dual embedding model considerably increased the accuracy of spelling correction in Persian sonography texts, which in turn supports greater correctness of the content of such documents. In the authors' view, the proposed model is a significant advance in the detection and correction of spelling errors in Persian-language sonography texts.
Keywords	error correction, neural embedding, neural networks, sonography texts, Persian language processing
Address	Islamic Azad University, Kerman Branch, Faculty of Basic Sciences, Iran; Islamic Azad University, Kerman Branch, Faculty of Basic Sciences, Department of Computer Engineering, Iran; Islamic Azad University, Kerman Branch, Faculty of Engineering, Department of Electrical Engineering, Iran
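
The abstract describes detecting errors by lexicon lookup and choosing a replacement by combining edit distance with a similarity score. The following is a minimal sketch of that general idea, not the authors' implementation: the toy English lexicon, the two-dimensional vectors, the `alpha` weight, and the `max_dist` cutoff are all hypothetical, and plain cosine similarity stands in for the dual embedding model, whose details the abstract does not give. It uses ordinary Levenshtein distance, so a transposition counts as two edits, and it handles only non-word errors.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def rank_candidates(word, context_vec, lexicon, vectors, max_dist=2, alpha=0.5):
    """Score lexicon words within max_dist edits; alpha weights spelling vs. context."""
    scored = []
    for cand in lexicon:
        d = edit_distance(word, cand)
        if d > max_dist:
            continue
        closeness = 1.0 - d / (max(len(word), len(cand)) or 1)
        similarity = cosine(context_vec, vectors[cand])
        scored.append((alpha * closeness + (1 - alpha) * similarity, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]

def correct(word, context_vec, lexicon, vectors):
    """Lexicon lookup for detection; best-ranked candidate for correction."""
    if word in lexicon:
        return word  # treated as correct (real-word errors are not handled here)
    ranked = rank_candidates(word, context_vec, lexicon, vectors)
    return ranked[0] if ranked else word

# Hypothetical toy data: a three-word lexicon with made-up 2-d vectors.
LEXICON = {"liver", "river", "kidney"}
VECTORS = {"liver": [0.9, 0.1], "river": [0.1, 0.9], "kidney": [0.8, 0.3]}
CONTEXT = [1.0, 0.0]  # stand-in for a vector summarising the surrounding words

print(correct("lever", CONTEXT, LEXICON, VECTORS))
```

With this toy data, "liver" is preferred over "river" because it is both closer in spelling and closer to the context vector; a purely edit-distance ranking would need the context only to break ties.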

Automatic Spelling Correction in Persian Sonography Text with Neural Networks

Authors	Dashti Mohammad Sadegh, Khatibi Bardsiri Amid, Jafari Shahbazzadeh Mehdi
Abstract	Background and aim: Medical reports and electronic health records are critically important for diagnosis, treatment, patient protection, and medical research. Correcting spelling errors in medical texts is essential to ensure accurate interpretation of information. This research was conducted to automatically correct spelling mistakes in Persian medical texts using neural networks. Material and methods: In this study, conducted in 2023, a computational model based on artificial neural networks and dual embedding techniques was developed using Python in a Windows environment. The dual embedding model was fine-tuned for correcting spelling errors in Persian sonography texts. The proposed model employs several techniques for automatic error detection, including dictionary lookup and contextual similarity coefficients. Furthermore, text-processing features such as edit distance, along with similarity coefficients, were used to automatically select the most appropriate substitute for a misspelled word. The training and testing data were drawn from a collection of sonography texts from the sonography clinic of Imam Khomeini Hospital in Tehran. Results: The proposed model, which is based on artificial neural networks, uses a novel dual embedding architecture to select the best candidate words for correcting both non-word and real-word errors. In the evaluation on Persian sonography text, the model achieved an F-measure of 90.5% in detecting real-word errors and 90% in automatically correcting them. Additionally, it achieved 90.8% accuracy in correcting non-word errors. Conclusion: Based on the evaluation results, the proposed method is robust against various changes in word forms and can effectively manage a wide range of morphological and semantic errors in medical texts, including replacements, transpositions, insertions, and deletions. Integrating edit distance with textual similarity coefficients extracted from the dual embedding model significantly enhanced the accuracy of spelling correction in Persian sonography texts, ensuring greater validity of such documents. The authors believe that the proposed model represents a significant advancement in the detection and correction of spelling errors in Persian sonography texts.
Keywords	spelling correction, neural embeddings, neural networks, radiology reporting, Persian language processing
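
The results above are reported as F-measure values. As a reminder of how that metric is conventionally computed, here is a small sketch of the standard definition (the weighted harmonic mean of precision and recall). The abstract does not give the underlying counts, so the numbers in the example are invented purely to exercise the function and are not the study's data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 90 errors correctly flagged, 10 false alarms, 10 missed.
score = f_measure(tp=90, fp=10, fn=10)
print(f"{score:.3f}")
```

With `beta=1.0` this is the usual F1 score; whether the paper used F1 or another weighting is not stated in the abstract.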