Solving the difficult problem of topic extraction in Thai tweets
|
Authors
|
Nararatwong R., Legaspi R., Cooharojananone N., Okada H., Maruyama H.
|
Source
|
Journal of Telecommunication, Electronic and Computer Engineering - 2016 - Volume: 8 - Issue: 6 - Pages: 141-145
|
Abstract
|
In this study, we tackled the difficult problem of topic extraction in Thai tweets on the country's historic flood in 2011. After using Latent Dirichlet Allocation (LDA) to extract the topics, the first difficulty we faced was the inaccuracy of the word segmentation task, which affected our interpretation of the LDA result. To solve this, we refined the stop word list using the LDA result, removing uninformative words caused by the word segmentation, which resulted in a more relevant and comprehensible outcome. With the improved results, we then constructed a rule-based categorization model and used it to categorize all the collected tweets on a per-week scale to observe changes in tweeting trends. Not only did the categories reveal the most relevant and compelling topics that people raised at that time, they also allowed us to understand how people perceived the situations as they unfolded over time.
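The last two steps described in the abstract (filtering with a refined stop word list, then rule-based categorization counted per week) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract does not give the actual rules, categories, or stop words, so `RULES`, `STOP_WORDS`, `categorize`, and `weekly_counts` are hypothetical, and English tokens stand in for segmented Thai words.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword rules; the paper's real categories are not in the abstract.
RULES = {
    "water_level": {"water", "level", "rising"},
    "evacuation": {"evacuate", "shelter"},
}

# Hypothetical refined stop word list (e.g. fragments left by word segmentation).
STOP_WORDS = {"the", "is", "to", "rt"}


def categorize(tokens):
    """Return the sorted categories whose keywords appear in the tokens."""
    content = {t for t in tokens if t not in STOP_WORDS}
    matched = sorted(cat for cat, keywords in RULES.items() if keywords & content)
    return matched or ["other"]


def weekly_counts(tweets):
    """Count categories per ISO (year, week) for (date, tokens) pairs."""
    counts = defaultdict(Counter)
    for day, tokens in tweets:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        for category in categorize(tokens):
            counts[week][category] += 1
    return counts


if __name__ == "__main__":
    sample = [
        (date(2011, 10, 24), ["water", "rising", "rt"]),
        (date(2011, 10, 25), ["evacuate", "to", "shelter"]),
        (date(2011, 10, 31), ["hello"]),
    ]
    for week, counter in sorted(weekly_counts(sample).items()):
        print(week, dict(counter))
```

Grouping by ISO week is only one possible reading of "per-week scale"; the study may have counted weeks from a different reference date.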
|
Keywords
|
LDA; Thai tweets; Topic extraction
|
Affiliations
|
Graduate University for Advanced Studies, Kanagawa, Japan; National Institute of Informatics, Tokyo, Japan; Research Organization of Information and Systems, Transdisciplinary Research Integration Center, Institute of Statistical Mathematics, Tokyo, Japan; Chulalongkorn University, Thailand; National Institute of Informatics, Tokyo, Japan; Research Organization of Information and Systems, Transdisciplinary Research Integration Center, Institute of Statistical Mathematics, Tokyo, Japan