   A Look at "Text Mining" in Linguistic Research: A Computational Approach to Text Analysis  
   
Authors Masjedy Hadi, Adel Mohammad Reza, Amirian Mohammad Reza, Zareian Gholamreza
Source Jostarha-ye Zabani - 1400 - Volume: 12 - Issue: 6 - Pages: 499-531
Abstract    "Text mining" refers to the computational process of analyzing unstructured texts and extracting the hidden linguistic layers and themes they contain. This method is of particular importance in the content or thematic analysis of descriptive and interpretive research. In this process, plain texts are first structured, and their latent concepts and patterns are then summarized, classified, modeled, evaluated and interpreted. Given that this method counts as an interdisciplinary innovation, especially in discourse studies, its use deserves to be pursued more seriously in the country's academic studies. Nevertheless, despite the quantitative and qualitative breadth of international research in this area, the absence of such research in Persian and English articles published inside the country is keenly felt. Accordingly, through a theoretical and practical exploration of text mining methods and an evaluation of its main tools and methods in Persian and English, this article intends to prepare a suitable ground for drawing on the capacities of this methodology in language studies.
Keywords text mining, unstructured texts, content analysis, thematic analysis, natural language processing.
Address Hakim Sabzevari University, Iran, Hakim Sabzevari University, Iran, Hakim Sabzevari University, Iran, Hakim Sabzevari University, Iran
 
   An Overview of Text Mining in Language Studies: The Computational Approach to Text Analytics  
   
Authors Zareian Gholamreza, Masjedy Hadi, Amirian Seyed Mohammad Reza, Adel Seyyed Mohammad Reza
Abstract    'Text mining' refers to the computational process of unstructured text analytics for extracting latent linguistic layers and themes. It is especially significant as content or thematic analysis in descriptive and interpretive studies. This process begins with structuring simple texts and proceeds with summarizing, classifying, modelling, evaluating and interpreting the inherent textual concepts and patterns. Given that this method counts as an interdisciplinary innovation, especially in discourse studies, it deserves to be pursued more intensively in academic studies. Despite the multitude of international studies in this area, there has been little interest to date in text mining amongst Iranian researchers, as evidenced by the critically limited number of local Persian and English studies. Thus, by looking into the theory and practice of text mining and its major analytic tools and methods in Persian and English, this paper aims to prepare the ground for utilizing this methodology in language studies.
The last two decades have seen a major increase in the rate and accuracy of knowledge generation in language studies due to advances in interdisciplinary studies of applied linguistics and computer sciences. At the heart of methodological innovations, especially in discourse studies, lies 'text mining', whose merits have only recently been appreciated by researchers. 'Text mining', 'text data mining' or 'text analysis' is the use of different data mining algorithms and methods, such as natural language processing and linguistic as well as statistical techniques, to derive linguistic features, significant patterns and valuable themes from unstructured texts by collecting unstructured data, preprocessing and cleansing them to detect and remove anomalies, and processing and controlling operations (Zhou et al, 2012).
These processes are further broken down into feature extraction, structural analysis, text summary, text classification, text clustering, and association analysis. Text mining is a complicated procedure of extracting valuable, significant patterns and trends from large volumes of textual data, used for such functions as product suggestion analysis, social media opinion mining, and sentiment or trend analysis (He, 2013).
Dating back to Feldman and Dagan (1995), text mining is an innovative methodology with a relatively short history which is often integrated with corpus analysis to computationally analyze a large body of unstructured texts as a potentially informative source of insight. As a subfield of data mining in computer sciences and an interdisciplinary method, text mining borrows from corpus and computational linguistics, and its main purpose is to extract the metacharacters representing textual features (Pons-Porrata et al, 2007). Zhou et al (2017) believe that, despite its short history, text mining has evolved remarkably into a mainstream research methodology in many interdisciplinary areas in the wake of increasingly rapid developments in data mining.
Hashimi et al (2015) explained the steps involved in text mining as a semi-automated process of collecting, structuring and then analyzing textual data as follows: (a) collecting unstructured data from a variety of sources like textual documents, social media, web pages, emails, blogs, etc., using specialized corpora for organization, (b) preprocessing and cleansing the data to remove anomalies and unveil latent valuable information using text mining tools, (c) converting the unstructured data into relevant structured formats, (d) discovering the underlying data patterns using word structures, sequences and frequency, and (e) extracting useful knowledge and storing it in a secure database for evaluation, later retrieval, trend analysis and possible decision-making.
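Steps (b) to (d) of this process can be illustrated with a minimal Python sketch. It is not the procedure used by the cited authors: the stop-word list, the sample documents and the function names (`preprocess`, `to_term_matrix`, `top_terms`) are all assumptions made up for illustration, and a real study would typically rely on a dedicated text mining toolkit.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}


def preprocess(text):
    """Step (b): lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_term_matrix(docs):
    """Step (c): turn raw texts into a structured document-term count table."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def top_terms(vocab, matrix, n=3):
    """Step (d): rank terms by total frequency across the corpus."""
    totals = [sum(column) for column in zip(*matrix)]
    ranked = sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


# Made-up sample corpus (assumption, not data from the paper).
docs = [
    "Text mining is the analysis of unstructured text.",
    "Unstructured text hides patterns; mining reveals the patterns.",
]
vocab, matrix = to_term_matrix(docs)
print(top_terms(vocab, matrix))
```

Each row of `matrix` corresponds to one document and each column to one vocabulary term, which is the kind of structured format that later classification or clustering steps would take as input.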
Text mining also makes use of lexicometrics, which deals with frequency and co-occurrence analysis of vocabulary to derive structures from texts; sentiment analysis is an application of lexicometrics that looks for positive or negative emotions in documents and has been used in social media analysis to evaluate public opinion (Shangzhen Lemen, 2016).
Text mining is an area of inquiry that in itself deserves to be pursued more intensively in future studies, and this paper is thus an attempt to review its basic principles, procedures and top analytic tools and to raise researchers' awareness of the virtues of text mining.
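The lexicometric operations mentioned above, co-occurrence counting and lexicon-based sentiment scoring, can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the tiny `POSITIVE` and `NEGATIVE` word sets, the sample sentences and the function names are hypothetical, and published studies use curated sentiment lexicons and larger corpora.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"good", "great", "useful"}
NEGATIVE = {"bad", "poor", "useless"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrences(sentences):
    """Count unordered pairs of distinct words that share a sentence."""
    pairs = Counter()
    for sentence in sentences:
        words = sorted(set(tokenize(sentence)))
        pairs.update(combinations(words, 2))
    return pairs


def sentiment_score(text):
    """Positive-word count minus negative-word count for one text."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


# Made-up sample sentences (assumption, not data from the paper).
sentences = ["The tool is useful and great", "The tool is poor", "Great tool"]
pairs = cooccurrences(sentences)
print(pairs[("great", "tool")])
print([sentiment_score(s) for s in sentences])
```

Pairs are stored in alphabetical order, so `("great", "tool")` and `("tool", "great")` are counted as the same co-occurrence.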
Keywords Text mining, unstructured texts, content analysis, thematic analysis, natural language processing
 
 
