Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis
|
Authors
|
Flores Enrique, Barrón-Cedeño Alberto, Moreno Lidia, Rosso Paolo
|
Source
|
Journal of Universal Computer Science - 2015 - Volume: 21 - Issue: 13 - Pages: 1708-1725
|
Abstract
|
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple web search, so there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, such traditional approaches can handle only the programming languages supported by the compiler. We assume that a source code is a piece of text, with its own syntax and structure, so we aim at applying models for free-text re-use detection to source code. In this paper we compare a latent semantic analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.
|
Keywords
|
Cross-Language Re-Use Detection, Source Code, Plagiarism, Latent Semantic Analysis
|
Affiliations
|
Universitat Politècnica de Valencia, Spain; Hamad Bin Khalifa University (HBKU), Qatar Computing Research Institute, Qatar; Universitat Politècnica de Valencia, Spain; Universitat Politècnica de Valencia, Spain
|
Email
|
albarron@qf.org.qa, albarron@gmail.com