Implementation of an expert retrieval model using latent semantic analysis and a temporal graph

Authors	Rezvani Shahla, Naghshineh Nader, Khalili Jafarabad Ahmad
Source	پژوهشنامه كتابداري و اطلاع رساني - 1402 - Volume: 13 - Issue: 1 - Pages: 226-245
Abstract	Introduction: Expert finding is the identification of people with sufficient knowledge and skill in a particular field and their introduction as experts in that field. Expert retrieval is a subfield of information retrieval that aims to provide a ranking of people who have knowledge in a particular field. Automated expert finding is challenging because of the abundance of expertise information and data sources. The aim of this study was to compare the expert-finding performance of the latent semantic analysis information retrieval model, and of the temporal graph, with a baseline model. Methodology: The research method is experimental, supplemented by the library method. The method used to retrieve articles in this study is LSA, or latent semantic retrieval, which was implemented on the articles of a test collection prepared from Web of Science. These documents comprise English-language articles in knowledge and information science that were indexed in the Web of Science database under the category of knowledge and information science from 1989 to 2018. The total number of these articles was 126924, and queries constructed by users were run against all of them. The retrieved documents were judged for relevance, and after the study participants had made their relevance judgments, the performance of the information retrieval model was measured with evaluation measures for information retrieval systems. The evaluation measures used in this study were mean average precision, mean reciprocal rank, and precision at the first five retrieved results. The computed values were compared with the value of each measure for the baseline model. A temporal graph was used to incorporate the time factor. After the time factor was incorporated, the authors with the most relevant work and the highest micro-level social network indicators were identified as experts. Then ten queries from the present model and the baseline model were selected by simple random sampling and given for judgment to eight people nominated by the second population, and the results were compared. 
Findings: On each of the information retrieval measures, namely precision at the first five results, mean average precision (MAP), and mean reciprocal rank (MRR), with values of 0.895, 0.839, and 0.909 respectively, the latent semantic analysis retrieval model performed better than the baseline model; this is attributed to retrieval by dimensionality reduction performing better than keyword matching. The method uses latent semantic indexing, a kind of conceptual indexing that relies on the statistical method of least squares, and the indexing is derived by applying that statistical method. According to the researchers' definition, an expert is someone who has produced the most work relevant to the set of queries in the last ten years and has the highest values of degree, closeness, betweenness, and eigenvector centrality. Ten queries from each study, 20 queries in total, were selected at random, and the experts identified by each study were given a score of zero or one by the third statistical population. The total scores show that incorporating the time factor and using the temporal graph outperformed the baseline model by 3 points in the judgment of the first assessor, also by 3 points in the judgment of the second assessor, and ... Conclusion: The results showed that the LSA model performed better than the baseline model in retrieving relevant documents, and that the use of the temporal graph also performed better than the baseline model.
Keywords	latent semantic analysis, temporal graph, expert retrieval model, time, information system
Address	University of Tehran, Iran, University of Tehran, Iran, University of Tehran, Iran
Email	ahmad.khalili@ut.ac.ir
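
The three evaluation measures named in the abstract (precision at the first five results, mean average precision, and mean reciprocal rank) can be illustrated with a minimal sketch. It assumes binary relevance judgments per ranked result. The function names and the sample judgments are hypothetical and are not taken from the study. Average precision is normalised here by the number of relevant results retrieved, which is an assumption because the abstract does not state the normalisation.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int = 5) -> float:
    """Fraction of the top-k results judged relevant (1) rather than not (0)."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of the precision values at each rank where a relevant result appears.

    Assumption: normalised by the number of relevant results retrieved.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean over queries."""
    return sum(values) / len(values)


# Hypothetical binary relevance judgments for the ranked results of three queries.
judged_runs = [
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
]

p_at_5 = mean([precision_at_k(run, 5) for run in judged_runs])
map_score = mean([average_precision(run) for run in judged_runs])
mrr_score = mean([reciprocal_rank(run) for run in judged_runs])
print(f"P@5={p_at_5:.3f}  MAP={map_score:.3f}  MRR={mrr_score:.3f}")
```

Averaging each per-query measure over all queries gives the system-level figures that would be compared between a candidate model and a baseline.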

Implementation of an expert retrieval model using the latent semantic analysis (LSA) method and a temporal graph

Authors	Rezvani Shahla, Naghshineh Nader, Khalili Jafarabad Ahmad
Abstract	Introduction: Expert retrieval is a subfield of information retrieval that aims to rank people who have knowledge in a particular field. Automated expert finding is challenging because of the abundance of expertise information and data sources. Many expert-finding approaches have been proposed in both industry and academia, drawing on techniques from information retrieval, data mining, knowledge discovery, statistical modeling, probabilistic modeling, and complex networks. These approaches estimate the relationship between a query and an expert candidate's supporting documents from the occurrence of query words in those documents, and they make up the main body of work in the area. Such models cannot capture semantic relationships. This research therefore adopted a document-oriented method that combines the LSA retrieval model with a temporal graph. Methodology: The research method is experimental, supplemented by survey and library methods. Articles were retrieved with LSA (latent semantic analysis), applied to a test collection prepared from Web of Science. The collection comprises English-language articles from 1989 to 2018 indexed in Web of Science under the category of information science and librarianship. There were 126924 articles in total, and user-constructed queries were run against all of them. The retrieved documents were judged for relevance by the study participants, and the performance of the information retrieval model was then measured with evaluation measures for information retrieval systems. The computed values were compared with the corresponding values for the baseline model. A temporal graph was used to incorporate the time factor. 
The authors who had the most relevant work and the highest micro-level social network indicators were then identified as experts. Ten queries from the present model and the baseline model were selected at random and given for judgment to eight people nominated by the second population, and the results were compared. Findings: The first innovation of this research was the application of the latent semantic analysis retrieval model, whose output was ultimately used to retrieve expert authors. On the information retrieval metrics, namely precision at the first five results (P@5), mean average precision (MAP), and mean reciprocal rank (MRR), with values of 0.895, 0.839, and 0.909 respectively, the latent semantic analysis retrieval model performed better than the baseline model. This is attributed to retrieval by dimensionality reduction performing better than keyword matching. The method uses latent semantic indexing (LSI), a kind of conceptual indexing that is derived by applying the statistical method of least squares. A concept can be expressed by many different words (synonymy), so query words may not match the words of a relevant document. In addition, most words have multiple meanings (polysemy), so retrieving information based on the concept and meaning of a document is a better approach. LSI assumes that there is latent structure in word usage that is partially obscured by variability in word choice. Singular value decomposition (SVD) is used to estimate this structure. The resulting vectors are statistically more robust indicators of meaning than individual words. Other studies likewise indicate that retrieving documents by matching query keywords against documents is a comparatively weak method. 
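
The LSI idea described above, estimating latent structure with SVD and matching queries to documents in the reduced space, can be illustrated with a minimal sketch. It assumes NumPy is available. The toy corpus, the choice of k = 2 latent dimensions, raw term counts as weights, and all function names are hypothetical and are not taken from the study.

```python
import numpy as np

# Hypothetical toy corpus; the study itself used 126924 Web of Science records.
docs = [
    "expert finding in academic social networks",
    "expert retrieval with language models",
    "latent semantic indexing for document retrieval",
    "library classification and cataloguing rules",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text: str) -> np.ndarray:
    """Raw term-count vector over the corpus vocabulary (unknown words ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-document matrix A (terms x documents).
A = np.column_stack([term_vector(d) for d in docs])

# Truncated SVD: keep the k largest singular values (k is an assumed setting).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k, :].T  # each row is a document in the k-dimensional latent space


def fold_in(query: str) -> np.ndarray:
    """Project a query into the latent space: q_k = q^T U_k S_k^{-1}."""
    return term_vector(query) @ U_k / s_k


def rank(query: str) -> list[int]:
    """Document indices ordered by cosine similarity to the query."""
    q = fold_in(query)
    sims = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    )
    return [int(i) for i in np.argsort(-sims)]


print(rank("expert retrieval"))
```

Because documents and queries are compared in the reduced space rather than term by term, a document can score well without sharing every query word, which is the advantage over keyword matching that the abstract describes.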
The LSA retrieval model also performs better on a large document collection than on a small one. The second innovation of this research was the incorporation of the time factor into expert search; together with social network indicators and the final relevance judgment, the results showed that this method performs significantly better than the baseline model. The time factor was included so that people who are no longer alive, or whose last publication in a given field appeared long ago, are not retrieved. Given the useful life of publications in knowledge and information science, a ten-year window was applied. After publication time, the next determining factor was the number of relevant works an author had published, followed by micro-level social network indicators (degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality), which are widely used in scientometric research and, more recently, in expert retrieval research. The ten queries were sent to the 8 people who made up the second statistical population of the research, and the results indicated that expert finding with the temporal graph performed better when the factor of most relevant published works and the micro-level social network indicators were used. Conclusion: The results showed that the LSA model performed better than the baseline model in retrieving relevant documents, and that the use of the temporal graph also performed better than the baseline model.
Keywords	latent semantic analysis, temporal graph, expert finding model, time, information system
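
The expert-selection rule described in the abstract (a ten-year window, then the number of relevant works, then social network centrality) could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The `Paper` record, the `rank_experts` function, the sample data, and the reference year are all hypothetical. Each paper is treated as a set of co-authorship edges stamped with its publication year. Only degree centrality is computed here; closeness, betweenness, and eigenvector centrality are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Paper:
    authors: tuple[str, ...]
    year: int        # time stamp carried by the co-authorship edges
    relevant: bool   # judged relevant to the query set


def rank_experts(papers, current_year, window=10):
    """Rank authors active within the window by relevant output, then degree centrality."""
    # Time factor: keep only papers published within the last `window` years.
    recent = [p for p in papers if current_year - p.year < window]

    relevant_count = defaultdict(int)
    neighbours = defaultdict(set)  # co-authorship edges among recent papers
    for p in recent:
        for a in p.authors:
            neighbours[a]  # ensures solo authors still appear as nodes
            if p.relevant:
                relevant_count[a] += 1
        for a, b in combinations(p.authors, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Degree centrality: number of distinct co-authors divided by (n - 1).
    n = len(neighbours)
    degree = {a: (len(nb) / (n - 1) if n > 1 else 0.0) for a, nb in neighbours.items()}

    # Sort by relevant works first, then centrality, then name for a stable order.
    return sorted(neighbours, key=lambda a: (-relevant_count[a], -degree[a], a))


# Hypothetical sample data.
papers = [
    Paper(("author_a", "author_b"), 2016, True),
    Paper(("author_a", "author_c"), 2012, True),
    Paper(("author_b", "author_c", "author_d"), 2015, True),
    Paper(("author_e",), 1995, True),   # falls outside the ten-year window
    Paper(("author_d",), 2017, False),
]

experts = rank_experts(papers, current_year=2018)
print(experts)
```

Applying the time window before counting relevant works means an author whose only relevant output is old never enters the candidate set, which matches the stated purpose of the time factor.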