Online Streaming Feature Selection Using the Geometric Series of the Feature Adjacency Matrix
|
Author
|
Eskandari, Sadegh
|
Source
|
Signal and Data Processing - 1399 - Issue 4 - Pages 3-14
|
Abstract
|
Feature selection is one of the important preprocessing steps in machine learning and data mining. All traditional feature selection algorithms assume that the entire feature space is available from the start of the selection process; however, many real-world applications involve a streaming-features scenario, in which the number of features grows over time. In this paper, the problem of online streaming feature selection is studied from the perspective of the geometric series of the feature relationship graph, and a new algorithm named OSFSGS is proposed. Using the concept of the geometric series of the adjacency graph, this algorithm removes redundant features online. In addition, the proposed algorithm uses a mechanism for retaining redundant features, which makes it possible to reconsider very good features that were removed earlier. The proposed algorithm is applied to eight high-dimensional datasets, and the results indicate that it achieves high accuracy at different time instances.
|
Keywords
|
Streaming features, feature selection, geometric series
|
Address
|
University of Guilan, Faculty of Mathematical Sciences, Department of Computer Science, Iran
|
Email
|
eskandari@guilan.ac.ir
|
|
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
|
|
|
Authors
|
|
Abstract
|
Feature selection (FS) is an important preprocessing step in machine learning and data mining. All traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows over time as new features stream in. For instance, in semantic segmentation of images using texture-based features, the number of features can grow without bound. In such dynamically growing scenarios, a rudimentary approach is to wait until all features become available and then run a feature selection method. However, because good decisions are needed at every time step, a more rational approach is to design an online streaming feature selection (OSFS) method that selects the best feature subset from the information seen so far and updates that subset on the fly as new features stream in. Any OSFS method must satisfy three critical conditions. First, it should not require any domain knowledge about the feature space, because the full feature space is unknown or inaccessible. Second, it should allow efficient incremental updates to the selected features. Third, it should be as accurate as possible at each time instance, so that classification and learning tasks at that instance are reliable. In this paper, OSFS is considered from the perspective of the geometric series of the feature adjacency matrix, and a new OSFS algorithm called OSFSGS is proposed. This algorithm ranks features based on path integrals and the centrality concept on an online feature adjacency graph.
The most appealing characteristics of the proposed algorithm are: 1) all possible subsets of features are considered when evaluating the rank of a given feature; 2) it is extremely efficient, as it converts the feature ranking problem into simply calculating the geometric series of an adjacency matrix; and 3) besides the selected feature subset, it maintains a redundant feature subset that allows good features to be reconsidered at different time instances. The algorithm is compared with three state-of-the-art OSFS algorithms, namely information-investing, fast-OSFS and OSFSMI. The information-investing algorithm is an embedded online feature selection method that treats feature selection as part of the learning process. It selects a new incoming feature if the feature reduces the model entropy by more than the cost of coding the feature. The fast-OSFS algorithm is a filter method that gradually builds a Markov blanket of the feature space using causality-based measures. For each new incoming feature, it executes two processes: an online relevance analysis followed by an online redundancy analysis. OSFSMI is similar to fast-OSFS but uses information theory for feature analysis. The algorithms are extensively evaluated on eight high-dimensional datasets in terms of compactness, classification accuracy and runtime. To simulate the OSF scenario, features are presented one by one. Moreover, to strengthen the comparison, the results are averaged over 30 random streaming orders. Experimental results demonstrate that the OSFSGS algorithm achieves better accuracy than the three existing OSFS algorithms.
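The abstract states that feature ranking reduces to calculating the geometric series of an adjacency matrix. The following is only a minimal sketch of that general idea, not the authors' implementation: the function name `geometric_series_scores`, the `scale` parameter, the choice of damping factor and the use of row sums as scores are all assumptions made for illustration. It relies on the standard identity that, for a matrix A and a factor r with r times the spectral radius of A below 1, the sum of (rA)^l over l = 1, 2, ... equals (I - rA)^{-1} - I.

```python
import numpy as np


def geometric_series_scores(A, scale=0.9):
    """Score each node by the row sums of sum_{l>=1} (r*A)^l.

    A     : square, non-negative feature adjacency matrix (the edge
            weights are hypothetical; the abstract does not define them).
    scale : assumed damping level in (0, 1); r = scale / spectral_radius(A)
            guarantees that the series converges.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius of A
    if rho == 0:
        return np.zeros(n)  # no edges, so every path sum is zero
    r = scale / rho
    identity = np.eye(n)
    # Closed form of the convergent series: (I - rA)^{-1} - I
    series = np.linalg.inv(identity - r * A) - identity
    return series.sum(axis=1)


# Toy path graph 0 - 1 - 2: the middle node lies on the most paths.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
scores = geometric_series_scores(A)
ranking = np.argsort(-scores)  # indices from highest to lowest score
```

In an online setting one would presumably recompute or update such scores each time a new feature adds a row and column to the adjacency matrix; how OSFSGS does this, and how it manages the redundant feature subset, is not specified in the abstract.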
|
Keywords
|
Streaming Features, Feature Selection, Geometric Series