A Genetic-Based Method for Disambiguating the Names of Article Authors



Author	Mozafari Niloofar
Source	Iranian Journal of Information Processing and Management - 1400 (Solar Hijri) - Volume: 36 - Issue: 3 - Pages: 791-816
Abstract	Today, with the ever-growing volume of articles on the one hand and the use of the Internet and search engine services on the other, methods for disambiguating researchers' names have received considerable attention. Various methods have been proposed to solve this problem, each with its own advantages and disadvantages. The aim of this paper is to present an approach for identifying the multiple records that belong to a single author. To this end, after extracting the internal and external features of authors, a new measure is presented for determining the degree of similarity between two records. The importance of each proposed feature is determined by a genetic-based algorithm with two different fitness functions, so that the optimal coefficients are obtained by learning from the available samples. The proposed method was evaluated and compared on experimental data with both fitness functions, and the results show that, with either fitness function, its accuracy is higher than that of the previous method.
Keywords	Author name ambiguity, Levenshtein distance, genetic algorithm, fitness function
Address	Islamic World Science Citation Center (ISC), Iran. Regional Information Center for Science and Technology (RICeST), Iran
Email	mozafari@ricest.ac.ir
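
The abstract describes a similarity measure between two author records in which each field contributes according to a coefficient, and the keywords name the Levenshtein distance. The following is a minimal sketch of that idea, not the paper's actual measure: the field names (`name`, `affiliation`, `coauthors`), the normalisation by the longer string, and the weighted-average combination are all assumptions made for illustration.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) by dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; two empty strings are treated as identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def record_similarity(r1: Dict[str, str], r2: Dict[str, str],
                      weights: Dict[str, float]) -> float:
    """Weighted average of per-field similarities (assumed combination rule)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(w * field_similarity(r1.get(f, ""), r2.get(f, ""))
                for f, w in weights.items())
    return score / total


# Hypothetical records and hand-picked weights, for illustration only.
rec_a = {"name": "Mozafari N.", "affiliation": "RICeST", "coauthors": ""}
rec_b = {"name": "Mozafari Niloofar", "affiliation": "RICeST", "coauthors": ""}
weights = {"name": 0.5, "affiliation": 0.3, "coauthors": 0.2}
print(record_similarity(rec_a, rec_b, weights))
```

In this sketch the weights are fixed by hand; in the approach the abstract describes, they would instead be learned from labelled samples.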

A Genetic-based Approach for Author Name Disambiguation Problem

Authors	Mozafari Niloofar
Abstract	In recent years, with the growing volume of articles and the widespread use of the Internet and search engine services, the author name disambiguation problem has received considerable attention. Name ambiguity arises when one seeks the publication list of an author who has used different name variations, and also when multiple authors share the same name. Various methods have been proposed to solve this problem, each with its own advantages and disadvantages, yet despite years of research it remains largely unresolved. In this study, we propose an algorithm to identify the multiple records that belong to one author. For this purpose, a new criterion is proposed to determine the similarity between two records. Since this study addresses the approximate matching of authors' records, the importance of each field in a record is expressed by a coefficient. To obtain the optimal coefficients, we propose a genetic algorithm that learns them from the available samples. The proposed method was evaluated with two fitness functions on experimental data, and with both functions it achieved higher accuracy than the previous method.
Keywords	Name Disambiguation Problem, Levenshtein Distance, Genetic Algorithm, Fitness Function
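
The abstract states that the field coefficients are learned from available samples by a genetic algorithm with a fitness function. A minimal sketch of how such a search might look is given below. Everything specific here is an assumption rather than the paper's design: the fitness function (plain classification accuracy at a 0.5 threshold), the operators (truncation selection, one-point crossover, Gaussian mutation), the parameter values, and the toy data.

```python
import random
from typing import List, Sequence, Tuple

# One labelled pair: (per-field similarities, 1 = same author, 0 = different).
Sample = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], sims: Sequence[float],
            threshold: float = 0.5) -> int:
    """Classify a record pair from its weighted average field similarity."""
    total = sum(weights)
    if total == 0:
        return 0
    score = sum(w * s for w, s in zip(weights, sims)) / total
    return 1 if score >= threshold else 0


def fitness(weights: Sequence[float], samples: List[Sample]) -> float:
    """Assumed fitness: fraction of labelled pairs classified correctly."""
    correct = sum(predict(weights, sims) == label for sims, label in samples)
    return correct / len(samples)


def evolve(samples: List[Sample], n_fields: int, pop_size: int = 30,
           generations: int = 40, mutation_rate: float = 0.2,
           seed: int = 0) -> List[float]:
    """Search for field weights in [0, 1]; pop_size should be at least 4."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_fields)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, samples),
                        reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_fields) if n_fields > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]     # Gaussian mutation, clamped to [0, 1]
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: fitness(w, samples))


# Toy data: field 0 is informative, field 1 is mostly noise.
samples = [([0.9, 0.1], 1), ([0.8, 0.9], 1), ([0.2, 0.9], 0),
           ([0.1, 0.2], 0), ([0.95, 0.3], 1), ([0.3, 0.8], 0)]
best = evolve(samples, n_fields=2)
print(best, fitness(best, samples))
```

Because the best half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. A second fitness function, such as an F-measure over the same labelled pairs, could be swapped in by replacing `fitness` alone.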