Determining the number of groups in geochemical data sets using pattern recognition indices based on the separation and compactness of clusters

Authors	Esmaeiloghli Saeid, Tabatabaei Hassan, Asadi Haroni Hooshang
Source	Analytical and Numerical Methods in Mining Engineering - 1398 (Solar Hijri) - Issue: 18 - Pages: 61-76
Abstract	Partitioning a data set into homogeneous subsets is a fundamental goal in the analysis of geochemical data, and clustering tools are often used to achieve it. The most important practical challenge in this regard is estimating the true number of groups hidden in the data set, which has traditionally been addressed using descriptive geochemical information, expert knowledge, or a single statistical index. The outputs of these methods are often unstable and subject to uncertainty. The approach this paper proposes for determining the number of clusters is therefore to run a wide range of existing indices, generate a distribution of possible answers, and extract the final answer from that distribution. The indices used are based on pattern recognition relations and rely on maximizing the between-group separation parameter and minimizing the within-group compactness (dispersion) parameter. To test the proposed approach, a simulated two-dimensional data set with four artificial clusters was generated; when 30 widely used indices were run on it, the highest frequency in the response distribution matched the true answer. The same procedure was applied to a real multivariate geochemical data set consisting of soil data from the North Dalli Cu-Au deposit in Markazi Province, and the results show that the final answer is meaningful and consistent with the geological and mineralization processes of the area.
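As a concrete illustration of an index that rewards between-group separation and penalizes within-group dispersion, the sketch below computes the Calinski-Harabasz index in plain Python. It is only one common member of this family; the abstract does not list the 30 indices that were used, so the choice of this index, the function name `calinski_harabasz`, and the toy points are assumptions for illustration.

```python
def calinski_harabasz(points, labels):
    """Between-cluster separation divided by within-cluster dispersion,
    each scaled by its degrees of freedom. Larger means better clusters.
    Assumes at least 2 clusters and more points than clusters."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])

    def centroid(group):
        return [sum(p[d] for p in group) / len(group) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # separation: size-weighted spread of centroids about the grand mean
    within = 0.0   # compactness: spread of points about their own centroid
    for c in clusters:
        group = [p for p, lab in zip(points, labels) if lab == c]
        centre = centroid(group)
        between += len(group) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in group)
    return (between / (k - 1)) / (within / (n - k))


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 200.0 (nearby pairs kept together)
print(calinski_harabasz(points, [0, 1, 0, 1]))  # 0.02  (nearby pairs split apart)
```

Grouping the two nearby pairs together gives a large value, while splitting them drives the index toward zero, which is the behavior an index of this kind uses to rank candidate partitions.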
Keywords	geochemical data, clustering, number of groups, cluster separation, cluster compactness, North Dalli deposit
Address	Isfahan University of Technology, Department of Mining Engineering, Iran (all three authors)

Determining the number of groups in geochemical data sets using pattern recognition indices based on the separation and compactness of clusters

Authors	Esmaeiloghli Saeid, Tabatabaei Seyed Hassan, Asadi Haroni Hooshang
Abstract	Summary: This paper presents an innovative approach for determining the correct number of groups in geochemical data sets. The proposed method reduces the uncertainty of traditional methods, which are often based on expert knowledge or on a single index. Thirty pattern recognition indices, based on the separation and compactness of clusters, are used to produce a distribution of responses, and the optimal solution is taken as the answer with the maximum frequency in that distribution. Applied to a simulated data set, the procedure correctly identified the true number of artificial clusters. Applied to a real geochemical data set, it estimated three clusters as the optimum number of groups, and these three groups are fully consistent with the geological and geochemical evidence in the study area.
Introduction: Partitioning a heterogeneous data set into homogeneous subsets is an important goal of geochemical data processing, and clustering tools are usually used to achieve it. The most important practical challenge, however, is estimating the actual number of underlying groups in the data set. Traditionally this relies on descriptive geochemical information, expert knowledge, or a single statistical index. Because these approaches are unstable and uncertain, we recommend implementing the whole range of indices, creating a distribution of possible responses, and extracting the best answer from it.
Methodology and Approaches: To evaluate the performance of the proposed approach, we generated a two-dimensional simulated data set containing four artificial clusters.
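The selection rule summarized above, taking the most frequent recommendation in the distribution of index responses, can be sketched with `collections.Counter`. The function name `choose_k_by_vote`, the tie-breaking rule (prefer the smaller number of groups), and the eight recommendations are hypothetical; the abstract only states that the answer with the maximum frequency is selected.

```python
from collections import Counter


def choose_k_by_vote(recommendations):
    """Return the most frequent recommended k and the full response distribution.

    recommendations: one recommended number of clusters per index.
    Ties are broken in favour of the smaller k (an assumption, not from the source).
    """
    distribution = Counter(recommendations)
    top = max(distribution.values())
    best = min(k for k, count in distribution.items() if count == top)
    return best, dict(sorted(distribution.items()))


# Hypothetical recommendations from eight indices (illustrative values only).
best, dist = choose_k_by_vote([3, 3, 2, 4, 3, 5, 2, 3])
print(best, dist)  # 3 {2: 2, 3: 4, 4: 1, 5: 1}
```

Returning the whole distribution, not just its mode, keeps visible how strongly the indices agree, which is the information a single index cannot provide.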
The real geochemical data set used in this research comprises 149 soil samples collected from the North Dalli porphyry Cu-Au deposit in Markazi Province. Thirty indices were used to determine the optimal number of groups in the data set. These indices derive from pattern recognition and work by maximizing between-group separation and minimizing within-group dispersion, that is, by favoring well-separated, compact clusters.
Results and Conclusions: All indices were implemented in the R programming environment. For the simulated data, the mode of the response distribution matched the true number of artificial clusters. For the geochemical data set of the North Dalli Cu-Au deposit, three clusters were identified. Clustering the geochemical data into these three groups revealed a clear geochemical zonation that corresponds to the geological and mineralogical evidence in the study area.
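The simulation test described above can be sketched end to end under several assumptions: the four cluster centres, the spread, and the sample size in the hypothetical `simulate` function are invented; k-means with farthest-first initialization stands in for whatever clustering algorithm the study used; and only three indices (Calinski-Harabasz, Davies-Bouldin, and the mean silhouette width) vote, instead of the 30 indices the study ran in R. Each index nominates the k between 2 and 7 that it scores best, and the mode of those nominations is returned.

```python
import math
import random
from collections import Counter


def simulate(seed=1, n_per=50, sd=0.6):
    """Hypothetical 2-D test set: four Gaussian clusters at the corners of a square."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)]
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centres for _ in range(n_per)]


def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(data, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centres = [data[0]]
    while len(centres) < k:
        centres.append(max(data, key=lambda p: min(dist2(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in data]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        groups = [[p for p, l in zip(data, labels) if l == j] for j in range(k)]
        centres = [mean(g) if g else centres[j] for j, g in enumerate(groups)]
    return labels, centres


def calinski_harabasz(data, labels, centres):
    n, k = len(data), len(centres)
    overall, sizes = mean(data), Counter(labels)
    between = sum(sizes[j] * dist2(centres[j], overall) for j in range(k))
    within = sum(dist2(p, centres[l]) for p, l in zip(data, labels))
    return (between / (k - 1)) / (within / (n - k))  # larger is better


def davies_bouldin(data, labels, centres):
    k = len(centres)
    s = []  # mean distance of each cluster's points to its centre
    for j in range(k):
        d = [math.sqrt(dist2(p, centres[j])) for p, l in zip(data, labels) if l == j]
        s.append(sum(d) / max(len(d), 1))
    return sum(max((s[i] + s[j]) / math.sqrt(dist2(centres[i], centres[j]))
                   for j in range(k) if j != i) for i in range(k)) / k  # smaller is better


def silhouette(data, labels, k):
    total = 0.0
    for p, own in zip(data, labels):
        sums, counts = [0.0] * k, [0] * k
        for q, l in zip(data, labels):
            sums[l] += math.sqrt(dist2(p, q))
            counts[l] += 1
        if counts[own] > 1:  # singleton clusters contribute 0 by convention
            a = sums[own] / (counts[own] - 1)
            b = min(sums[j] / counts[j] for j in range(k) if j != own and counts[j])
            total += (b - a) / max(a, b)
    return total / len(data)  # larger is better


def vote_for_k(data, k_values=range(2, 8)):
    ch, db, sil = {}, {}, {}
    for k in k_values:
        labels, centres = kmeans(data, k)
        ch[k] = calinski_harabasz(data, labels, centres)
        db[k] = davies_bouldin(data, labels, centres)
        sil[k] = silhouette(data, labels, k)
    votes = {"CH": max(ch, key=ch.get), "DB": min(db, key=db.get),
             "silhouette": max(sil, key=sil.get)}
    return Counter(votes.values()).most_common(1)[0][0], votes  # mode of the votes


best_k, votes = vote_for_k(simulate())
print(best_k, votes)
```

With well-separated simulated clusters the three indices should agree; on real data individual indices may disagree, which is the situation the response-distribution approach is meant to handle.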
Keywords