Enhanced Data Point Importance for Subset Selection in Partial Least Squares Regression: A Comparative Study with the Kennard-Stone Method
   
Authors: Vazifeh Solout Mahya, Vali Zade Somaye, Abdollahi Hamid, Ghasemi Jahan Bakhsh
Source: 9th Iranian Biennial National Chemometrics Seminar - 1402 - Volume: 9 - Conference code: 02230-81220 - Pages: 0-0
Abstract    When multivariate analysis is applied to a dataset, whether it involves single-block data (PCA, MCR, SIMCA) or multi-block data (PCR, PLS), choosing a subset of samples from the complete dataset becomes essential. This procedure is referred to as subset selection. A subset is a smaller, representative portion of the entire dataset that is used to build, refine, or validate the model. The characteristics of the subset chosen for the calibration model depend on the specific goals and requirements of the calibration process. The subset should accurately represent the overall characteristics of the entire dataset and capture the various patterns, trends, and variations present in the data. The choice of subset within a calibration model is therefore a critical step. We propose a new method for subset selection based on data point importance (DPI) in partial least squares regression. In PLS space, data points can be categorized into essential and non-essential points. Essential points (EPs) are the vertices of the convex hull built from the data points in a normalized space, and they form a representative set of the data. Non-essential points, on the other hand, lie inside the convex hull. Recently, an algorithm called data point importance (DPI) was introduced [1] to determine the order of importance of the EPs, enabling the sorting of information and the selection of samples within the dataset. DPI provides an easily calculable value that reflects the impact of each data point on preserving the pattern of the data structure. This research extends the concept of DPI to non-essential points, establishing the order of importance for all data points and sorting the information carried by each of them.
The study evaluates the idea of enhanced DPI (EDPI) and its application in selecting important points for subset selection in PLS regression. The algorithm we present analyzes data points through layered convex hulls and assesses their relative importance. All data points (samples) in the training set are ranked using EDPI, which determines their relevance in maintaining the integrity of the data structure within the row space. The study also compares the outcomes of sample selection using the EDPI strategy with those obtained via the Kennard-Stone (KS) method. Figure 1 depicts the ranking of data points (samples) obtained with the enhanced DPI strategy and shows that the proposed data-splitting method performs comparably to the KS approach on the corn data.
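As a rough illustration of the two ideas compared in the abstract, the sketch below peels a 2-D point set into layered convex hulls and, separately, runs the classic Kennard-Stone max-min selection. It is not the authors' EDPI algorithm: no DPI values are computed, points are only grouped by hull layer, and the restriction to two dimensions (for example, two PLS score vectors) is an assumption made to keep the example short. The function names `convex_hull`, `hull_layers`, and `kennard_stone` are hypothetical.

```python
from math import dist


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices only (collinear
    boundary points are dropped). Assumes 2-D points given as tuples."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_layers(points):
    """Peel the point set into layered convex hulls, outermost layer first.
    Layer 0 holds the essential points; deeper layers hold points that
    become hull vertices only after the outer layers are removed."""
    remaining = list(dict.fromkeys(points))  # drop duplicates, keep order
    layers = []
    while remaining:
        hull = set(convex_hull(remaining))
        layers.append([p for p in remaining if p in hull])
        remaining = [p for p in remaining if p not in hull]
    return layers


def kennard_stone(points, k):
    """Classic Kennard-Stone selection: start from the two most distant
    points, then repeatedly add the point whose minimum distance to the
    already selected points is largest. Returns indices into `points`."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(points)")
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    selected = [i0, j0]
    while len(selected) < k:
        rest = [i for i in range(n) if i not in selected]
        nxt = max(rest, key=lambda i: min(dist(points[i], points[s]) for s in selected))
        selected.append(nxt)
    return selected
```

Calling `hull_layers(scores)` on a list of 2-D score tuples gives a coarse outer-to-inner ordering of the samples, and `kennard_stone(scores, k)` gives the indices of a KS subset of size `k`, so the two selections can be compared on the same data. Ordering the points within each layer would require the DPI measure from [1], which is not reproduced here.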
Address: Iran, Iran, Iran, Iran

Copyright 2023
Islamic World Science Citation Center
All Rights Reserved