|
|
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
|
Authors
|
Roozbeh Mahdi, Maanavi Monireh, Babaie-Kafaki Saman
|
Source
|
Iranian Journal of Health Sciences - 2020 - Volume: 8 - Issue: 2 - Pages: 9-22
|
Abstract
|
Background and purpose: As science, knowledge, and technology evolve, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation and interpretation of the coefficients. For high-dimensional problems, classical methods are not reliable because of the large number of predictor variables. In addition, classical methods are affected by the presence of outliers and collinearity. Methods: Many real-world data sets are both high-dimensional and contaminated by outliers. In the regression context, an outlier is a point that fails to follow the main linear pattern of the data. The ordinary least-squares estimator is sensitive to outliers, which motivates the study of robust estimators. To handle these problems, we combined the least absolute shrinkage and selection operator (LASSO) with the least trimmed squares (LTS) estimator. Results: Given the flexibility and applicability of the semiparametric model to medical data, a penalized optimization approach for semiparametric regression models was proposed to simultaneously combat high dimensionality and outliers in the data set. Based on the numerical study, the proposed model was found to be quite efficient in the sense that it achieved a favorable goodness-of-fit value (MSE = 1.3807). Conclusion: We have proposed an optimization approach for semiparametric models to combat outliers in the data set. Specifically, based on a LASSO penalization scheme, we have formulated the semiparametric model as a nonlinear integer programming problem that can be solved effectively by an evolutionary algorithm. We have also studied a real-world application related to riboflavin production.
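To make the idea of combining LASSO with LTS concrete, the following is a minimal Python sketch, not the authors' method. It covers only the linear part: the differencing step for the nonparametric component is omitted, and the integer program over which observations to keep is handled here by a simple alternating (C-step style) heuristic instead of the evolutionary algorithm mentioned in the abstract. All function names, the penalty `lam`, the subset size `h`, and the simulated data are assumptions made for illustration; the toy problem is deliberately small and is not the riboflavin data.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                      # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]    # remove coordinate j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]    # add the updated contribution back
    return beta

def sparse_lts(X, y, lam, h, n_starts=5, n_csteps=20, seed=0):
    """Heuristic for a lasso-penalized trimmed objective (illustrative only).

    Alternates between fitting the lasso on the current h-subset and
    re-selecting the h observations with the smallest squared residuals.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_csteps):
            beta = lasso_cd(X[subset], y[subset], lam)
            r2 = (y - X @ beta) ** 2
            new_subset = np.sort(np.argsort(r2)[:h])
            if np.array_equal(new_subset, subset):
                break                     # subset is stable
            subset = new_subset
        beta = lasso_cd(X[subset], y[subset], lam)
        r2 = (y - X @ beta) ** 2
        keep = np.sort(np.argsort(r2)[:h])
        obj = r2[keep].sum() / (2 * h) + lam * np.abs(beta).sum()
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, keep
    return best_beta, best_subset

# Simulated toy data (an assumption): sparse signal plus shifted responses.
rng = np.random.default_rng(1)
n, p, n_out = 80, 10, 8
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:n_out] += 20.0                         # first n_out responses are outliers

beta_hat, kept = sparse_lts(X, y, lam=0.05, h=60)
```

In this sketch `kept` holds the indices of the observations retained by the trimming step, so comparing it with the known outlier positions gives a quick check of whether the contaminated points were set aside.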
|
Keywords
|
high-dimensional data set; ordinary least squares method; outliers; robust regression
|
Address
|
Semnan University, Faculty of Mathematics, Statistics and Computer Science, Iran; Semnan University, Faculty of Mathematics, Statistics and Computer Science, Iran; Semnan University, Faculty of Mathematics, Statistics and Computer Science, Iran
|
Email
|
sbk@semnan.ac.ir
|