Enhanced data point importance for efficient data splitting in classification models: Application to olive oil authentication
|
Authors
|
Zare Zahra, Vali Zade Somaye, Abdollahi Hamid
|
Source
|
9th Iranian Biennial National Chemometrics Seminar - 1402 (Solar Hijri) - Volume: 9 - 9th Iranian Biennial National Chemometrics Seminar - Conference code: 02230-81220 - Pages: 0-0
|
Abstract
|
In the realm of data science, classification models are vital for predicting or identifying classes within datasets. The success of a classification model hinges on accurately selecting samples for both the training and test sets. Proper data splitting during preprocessing directly influences the effectiveness and efficiency of the final classification model. In PCA space, data points can be categorized into essential and non-essential points. Essential points (EPs) are the vertices of the convex hull built from the data points in a normalized space, and they form a representative set of the data. Non-essential points, on the other hand, are located inside the convex hull. Recently, an algorithm called data point importance (DPI) was introduced [1] to determine the order of importance of EPs, enabling the sorting of information and the selection of samples within the dataset. DPI provides an easily calculable value that reflects the impact of each data point on preserving the pattern of the data structure. This research extends the concept of DPI to non-essential points, establishing the sequence of importance for all data points and sorting the information for each of them. The study evaluates the idea of enhanced DPI (EDPI) and its application in selecting important points that affect the efficiency of class modeling. In the proposed algorithm, data points are examined in the form of layered convex hulls, and their order of importance is evaluated. EDPI is used to rank all data points (samples) in the row space of the training set of the target class based on their significance in preserving the data structure. The approach is applied in class modeling (DD-SIMCA) for authenticating extra virgin olive oil samples. The research also compares the results of sample selection using the EDPI strategy with those of the Kennard-Stone (KS) method.
The study utilizes Raman spectra of pure samples and samples adulterated with various oils to develop one-class models for evaluating the authenticity and adulteration of extra virgin olive oil. Figure 1 illustrates the ranking of data points (samples) based on the enhanced DPI strategy, demonstrating that the proposed method for data splitting outperforms the KS method in many cases [1].
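The layered convex hull idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' EDPI algorithm and it computes no DPI values; it only peels successive convex hulls from a set of two-dimensional points, assuming the samples have already been projected onto two PCA scores. The function names (`convex_hull`, `hull_layers`) and the toy coordinates are hypothetical and chosen for illustration only.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2-D points using Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_layers(points):
    """Peel successive convex hulls; layer 0 holds the essential points."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


# Toy 2-D "scores" (hypothetical data): a square, a triangle inside it,
# and one point inside the triangle.
scores = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (2, 3), (2, 2)]
layers = hull_layers(scores)
for depth, layer in enumerate(layers):
    print(depth, layer)
```

In this sketch the layer index gives only a coarse ordering from the outermost points inward. Ranking the points within and across layers, as the abstract describes for EDPI, would require the importance measure defined in [1].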
|
Keywords
|
Essential points (EP), data point importance (DPI), enhanced DPI (EDPI)
|
Address
|
, Iran, , Iran, , Iran
|
Email
|
abd@iasbs.ac.ir