A Joint Semantic Vector Representation Model for Text Clustering and Classification
   
Authors: Momtazi S., Rahbar A., Salami D., Khanijazani I.
Source: Journal of AI and Data Mining, 2019, Vol. 7, No. 3, pp. 443-450
Abstract: Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their weakness in capturing the semantic concepts of text has motivated researchers to use semantic models for document vector representation. Latent Dirichlet allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this paper, we first study the conceptual difference between the two models and show that they behave differently and capture semantic features of texts from different perspectives. We then propose a hybrid approach to document vector representation that benefits from the advantages of both models. Experimental results on the 20 Newsgroups dataset show the superiority of the proposed model over each of the baselines on both text clustering and text classification. Compared to the best baseline model, we achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification.
Keywords: text mining, semantic representation, topic modeling, neural document embedding
Affiliation (all authors): Amirkabir University of Technology, Computer Engineering and Information Technology Department, Iran
Email: imankhanijazani@gmail.com
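The abstract does not say exactly how the LDA and doc2vec vectors are combined. As a minimal sketch, assuming the hybrid vector is a weighted concatenation of a document's LDA topic distribution and its doc2vec embedding, the idea could look like the code below. The function names and the weight `alpha` are hypothetical illustrations, not the authors' method, and the input vectors are assumed to come from already trained LDA and doc2vec models that are not shown here.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def joint_representation(topic_dist: Sequence[float],
                         doc_embedding: Sequence[float],
                         alpha: float = 0.5) -> List[float]:
    """Hypothetical hybrid document vector (an assumption, not the paper's exact method).

    Both views are L2-normalized so that neither dominates because of its scale,
    then weighted by alpha (LDA part) and 1 - alpha (doc2vec part) and concatenated.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lda_part = [alpha * x for x in l2_normalize(topic_dist)]
    d2v_part = [(1.0 - alpha) * x for x in l2_normalize(doc_embedding)]
    return lda_part + d2v_part


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, a common way to compare document vectors when clustering."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Toy inputs: a 3-topic LDA distribution and a 4-dimensional doc2vec vector per document.
    doc_a = joint_representation([0.7, 0.2, 0.1], [0.3, -0.1, 0.4, 0.2])
    doc_b = joint_representation([0.6, 0.3, 0.1], [0.2, 0.0, 0.5, 0.1])
    print(len(doc_a))  # 7 = 3 topic dimensions + 4 embedding dimensions
    print(round(cosine_similarity(doc_a, doc_b), 3))
```

With `alpha = 0.5` both views contribute equally to the joint vector; treating `alpha` as a tunable hyperparameter is one plausible way to trade topical information against embedding information before feeding the vectors to a clustering algorithm or classifier.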
Copyright 2023
Islamic World Science Citation Center
All Rights Reserved