Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP
|
Authors
|
Khajeh Zadeh Mahsa, Zaifar Meisam
|
Source
|
2nd National Conference on Digital Transformation and Intelligent Systems - 1402 - Volume: 2 - Conference code: 02231-67491 - Pages: 0-0
|
Abstract
|
Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools focus mostly on lexical matching and miss the semantic and syntactic aspects of plagiarism. A particularly challenging area of plagiarism detection is semantic plagiarism, which combines lexical and syntactic transformations. NLP can be used to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven to be more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare plagiarism levels across documents. Results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria: accuracy, precision, recall, and F-measure. Moreover, the document plagiarism score reflects well the amount of plagiarism detected in the documents. Our tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
|
Keywords
|
semantic similarity, syntactic similarity, plagiarism, NLP
|
Address
|
Iran, Iran