Implementation of an expert retrieval model using latent semantic analysis and a temporal graph
|
Authors
|
Rezvani Shahla, Naghshineh Nader, Khalili Jafarabad Ahmad
|
Source
|
پژوهشنامه كتابداري و اطلاع رساني (literally, Research Journal of Librarianship and Information Science) - 1402 - Volume 13 - Issue 1 - Pages 226-245
|
Abstract
|
Introduction: Expert finding is the identification of people with sufficient knowledge and skill in a specific field and their introduction as experts in that field. Expert retrieval is a subfield of information retrieval whose goal is to provide a ranking of people who have knowledge in a particular field. Automatic expert finding is challenging because of the abundance of expertise information and data sources. The aim of this study was to compare the expert-finding performance of the latent semantic analysis information retrieval model, and of a temporal graph, with that of a base model. Methodology: The research method is experimental, supplemented by library research. The method used in this study to retrieve articles is LSA, or latent semantic retrieval, which was implemented on the articles of a test collection prepared from Web of Science. These documents are English-language articles in information science and knowledge studies, indexed from 1989 to 2018 in the Web of Science database under the category of information science and knowledge studies. There were 126,924 articles in total, and the queries constructed by users were submitted against all of them. The retrieved documents were judged for relevance by the study participants, and the performance of the information retrieval model was then measured with evaluation measures for information retrieval systems. The measures used in this study were mean average precision, mean reciprocal rank, and precision at the first five retrieved results. Each computed measure was compared with its value for the base model. A temporal graph was used to incorporate the time factor. After the time factor was incorporated, the authors who had the most relevant work and the highest social-network micro indicators were introduced as experts. Ten queries from the present study's model and the base model were then selected by simple random sampling and given for judgment to eight people introduced by the second population, and the results were compared.
Findings: On each of the information retrieval measures, namely precision at the first five results, mean average precision (MAP), and mean reciprocal rank (MRR), with values of 0.895, 0.839, and 0.909 respectively, the latent semantic analysis retrieval model performed better than the base model. This is because retrieval by dimensionality reduction performs better than keyword matching. The method uses latent semantic indexing, a kind of conceptual indexing that draws on the statistical method of least squares, and the indexing is derived by applying this statistical method. According to the researchers' definition, an expert is someone who has the most work relevant to the full set of queries in the last ten years and who has the highest values of degree, closeness, betweenness, and eigenvector centrality. Ten queries were randomly selected from each study, 20 queries in total, and the experts identified by each study were given a score of zero or one by the third statistical population. The total scores show that incorporating the time factor and using a temporal graph outperformed the base model by 3 points according to the first judge, by 3 points according to the second judge, and so on. Conclusion: The results showed that the LSA model performed better than the base model in retrieving relevant documents, and that the use of a temporal graph also performed better than the base model.
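The three evaluation measures named above can be illustrated with a minimal Python sketch. It shows how precision at five, average precision, and reciprocal rank are commonly computed from binary relevance judgments, and how averaging over queries yields MAP and MRR. The function names and the sample judgments are hypothetical and do not come from the study. The sketch also assumes that average precision is normalized by the number of relevant results retrieved.

```python
from statistics import mean

def precision_at_k(rels, k=5):
    """Fraction of the top-k results judged relevant.
    rels holds 0/1 relevance judgments in rank order."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """Mean of the precision values at each rank holding a relevant result.
    Assumption: normalized by the relevant results retrieved."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(rels):
    """1 / rank of the first relevant result, or 0 if none is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# Hypothetical judgments for two queries (not data from the study).
judgments = {"q1": [1, 0, 1, 1, 0], "q2": [0, 1, 1, 0, 0]}

p5 = mean(precision_at_k(r) for r in judgments.values())
map_score = mean(average_precision(r) for r in judgments.values())
mrr = mean(reciprocal_rank(r) for r in judgments.values())
```

Each measure is computed per query and then averaged, so a single poorly answered query lowers the overall score without dominating it.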
|
Keywords
|
latent semantic analysis, temporal graph, expert retrieval model, time, information system
|
Address
|
University of Tehran, Iran; University of Tehran, Iran; University of Tehran, Iran
|
Email
|
ahmad.khalili@ut.ac.ir
|
|
Implementation of an Expert Retrieval Model Using Latent Semantic Analysis (LSA) and a Temporal Graph
|
Authors
|
Rezvani Shahla, Naghshineh Nader, Khalilijafarabad Ahmad
|
Abstract
|
Introduction: Expert retrieval is a subfield of information retrieval that aims to provide a ranking of people who have knowledge in a particular field. Automated expert finding is challenging because of the abundance of expertise information and data sources. Many expert-finding approaches have been proposed in both industry and academia, using techniques from information retrieval, data mining, knowledge discovery, statistical modeling, probabilistic modeling, and complex networks. These are important lines of work, but they all estimate the relationship between a query and an expert candidate's supporting documents from the occurrence of query words in those documents, so they cannot capture semantic relationships. This research therefore took a document-oriented approach, using the LSA retrieval model together with a temporal graph. Methodology: The research method is experimental, supplemented by survey and library methods. The method used to retrieve articles is LSA, or latent semantic analysis, applied to the articles of a test collection prepared from Web of Science. These documents are English-language articles in information science and librarianship, indexed from 1989 to 2018 under the information science and librarianship category of Web of Science. There were 126,924 articles in total, and the queries made by users were run against all of them. The retrieved documents were judged for relevance by the study participants, and the performance of the information retrieval model was then measured with evaluation measures for information retrieval systems. Each computed measure was compared with its value for the base model. A temporal graph was used to incorporate the time factor.
The authors who had the most relevant work and the highest social-network micro-indicator values were then introduced as experts. Ten queries from the present research model and the base model were randomly selected and given for judgment to eight people introduced by the second population, and the results were compared. Findings: The first innovation of this research was applying the latent semantic analysis information retrieval model to the retrieval of expert authors. On the information retrieval metrics, namely precision at the first five results (P@5), mean average precision (MAP), and mean reciprocal rank (MRR), the model obtained values of 0.895, 0.839, and 0.909 respectively and performed better than the base model. This is attributed to the better performance of retrieval by dimensionality reduction compared with keyword matching. The method uses latent semantic indexing (LSI), a kind of conceptual indexing that draws on the statistical method of least squares. A concept can be expressed by many different words (synonymy), so query words may not match the words of a relevant document. In addition, most words have multiple meanings (polysemy), so retrieving information based on the concepts and meaning of a document is a better approach. LSI assumes that there is some latent structure in word usage that is partially obscured by variability in word choice. Singular value decomposition (SVD) is used to estimate this structure. The resulting vectors are statistically more robust indicators of meaning than individual words. The results of other studies also indicate that retrieving documents by matching query keywords against documents is a relatively weaker method.
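The abstract does not state the decomposition explicitly. As a point of reference, the standard LSI formulation is given here, with notation that is an assumption and not necessarily the authors' exact setup:

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\operatorname{sim}(\hat{q}, d_j) = \frac{\hat{q} \cdot d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert}
```

Here $X$ is the term-by-document matrix, $k$ is the number of retained dimensions, $q$ is the query's term vector, and $d_j$ is the $j$-th row of $V_k$, that is, document $j$ in the reduced space. $X_k$ is the best rank-$k$ approximation of $X$ in the least-squares (Frobenius norm) sense, which is where the least-squares property mentioned above enters.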
Also, the LSA retrieval model performs better on a large document collection than on a small one. The next innovation of this research was incorporating the time factor into expert search. Based on the social network indicators and the final relevance judgments, the results showed that this method performed significantly better than the base model. The time factor was included so that people who are no longer alive, or for whom a long time has passed since their last publication in a given field, are not retrieved. Considering the useful life of publications in knowledge and information science, a ten-year period was used. After publication time, the next determining factor was who had published the most relevant work, followed by the micro indicators of the social network: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality, which are widely used in scientometric research and, more recently, in expert retrieval research. The ten queries proposed in this research were sent to the 8 people who made up the second statistical population, and the results indicated that expert finding with the temporal graph performed better when the factors of most relevant published work and social-network micro indicators were used. Conclusion: The results showed that the LSA model performed better than the base model at retrieving relevant documents, and that the use of the temporal graph also performed better than the base model.
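The selection order described in the abstract (publication time first, then amount of relevant work, then network indicators) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the input format and the function name `rank_experts` are hypothetical, the co-authorship graph is built only from papers inside the time window, and degree centrality alone stands in for the four centrality measures listed above.

```python
from itertools import combinations

def rank_experts(papers, reference_year, window=10):
    """Rank authors by number of relevant papers inside the time window,
    breaking ties by degree centrality in the co-authorship graph.

    papers: list of (year, [authors]) for documents retrieved as relevant
            (hypothetical input format).
    """
    # Step 1: keep only papers published in the last `window` years.
    recent = [(y, a) for y, a in papers
              if reference_year - window < y <= reference_year]

    counts = {}      # author -> number of relevant recent papers
    neighbors = {}   # author -> set of co-authors (graph adjacency)
    for _, authors in recent:
        unique = set(authors)
        for a in unique:
            counts[a] = counts.get(a, 0) + 1
            neighbors.setdefault(a, set())
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)

    def degree_centrality(author):
        # Degree divided by the maximum possible degree (n - 1).
        return len(neighbors[author]) / (n - 1) if n > 1 else 0.0

    # Step 2 and 3: most relevant work first, then higher centrality;
    # the author name is a final tie-breaker to keep the order deterministic.
    return sorted(counts, key=lambda a: (-counts[a], -degree_centrality(a), a))

# Hypothetical example: the 2001 paper falls outside the ten-year window.
papers = [
    (2018, ["A", "B"]),
    (2016, ["A", "C"]),
    (2015, ["B"]),
    (2001, ["D", "A"]),
]
ranking = rank_experts(papers, reference_year=2018)
```

In this example author D is never returned, because D's only paper predates the window, which mirrors the stated purpose of the time factor.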
|
Keywords
|
latent semantic analysis, temporal graph, expert finding model, time, information system
|