Feature Correlation and Importance Analysis in Cardiovascular Health
|
Authors
|
Ghaderi Aida, Sharif Samane
|
Source
|
First National Conference on Artificial Intelligence and Health Technologies in Medicine - 1403 - Volume: 1 - First National Conference on Artificial Intelligence and Health Technologies in Medicine - Conference code: 03241-50950 - Pages: 0-0
|
Abstract
|
Introduction: This study applies feature selection to streamline the data before building predictive models with classification algorithms. The refined feature set is then fed into these classifiers to develop heart disease prediction models, which are used to compare the accuracy of different classifiers. The research combines statistical and machine learning approaches, namely statistical correlation and mutual information (MI). The correlation coefficient between each feature and cardiovascular disease is calculated, and the MI score of each feature is used to determine how strongly it influences the predictive models. Reducing the dataset to its most relevant components makes the analysis more efficient and targeted. The aim is to improve the accuracy and efficiency of heart disease prediction models while also identifying which features are most closely associated with cardiovascular disease.

Methods and materials: Our methodology focuses on identifying the key features relevant to the diagnosis of cardiovascular disease. In the first step, we compute correlations with statistical methods to assess the relationships among the input features related to cardiovascular disease. In the second step, we apply a feature selection algorithm based on mutual information to determine the significance of each feature for the diagnosis. The mutual information method quantifies how much information a feature carries about the output and assigns the feature a score accordingly.

Results: During feature selection, the most relevant features are extracted from the dataset, which helps eliminate redundancy. Because irrelevant features are removed from the input data, feature selection can increase prediction accuracy.
In this study, features were selected based on their statistical correlation and on their mutual information scores. After the dataset was reduced through feature selection, the remaining features were used as input for the classifiers. Analyzing the correlations and the significance of each feature helps improve the accuracy of the classifier model. This approach led us to conclude that chest pain type (cp), old peak, number of major vessels colored by fluoroscopy (ca), and thalassemia (thal) are crucial for diagnosing heart disease.

Table 1. Features selected for the analysis
S/N   Selected feature   MI degree
1     cp                 0.14330327
2     old peak           0.11824184
3     ca                 0.14620273
4     thal               0.13688095

Conclusion and discussion: In this study, the correlation between every feature and the target variable (cardiovascular disease) is calculated. The significance of each feature is then determined with the mutual information approach, which takes into account both linear and nonlinear relationships in the data. As a result, the mutual information values can serve as a measure of the importance of each input feature, and selecting features by these values helps improve the accuracy of the classifier model.
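
The two scoring steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient or which mutual information estimator the authors used, so the code below assumes Pearson correlation and a simple plug-in (count-based) MI estimate for discrete values. The helper names (`pearson`, `mutual_information`, `rank_features`) and the toy data are hypothetical and are not taken from the study; the toy values are chosen only to show that a nonlinear dependence can have zero correlation and still a high MI score.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant sequence has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X; Y) = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def rank_features(features, target):
    """Return (name, correlation, MI) tuples sorted by MI, highest first."""
    rows = [
        (name, pearson(values, target), mutual_information(values, target))
        for name, values in features.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical toy data for illustration only, NOT the study's dataset.
target = [0, 0, 1, 1, 0, 0, 1, 1]
toy = {
    "cp": [0, 1, 2, 3, 0, 1, 2, 3],        # monotone relation to the target
    "u_shaped": [1, 1, 0, 2, 1, 1, 0, 2],  # nonlinear relation to the target
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],     # independent of the target
}

ranked = rank_features(toy, target)
for name, corr, mi in ranked:
    print(f"{name:10s} corr={corr:+.3f}  MI={mi:.3f}")
```

Continuous features such as old peak would need binning or a different estimator before a count-based score like this applies. Libraries such as scikit-learn provide `mutual_info_classif` for that purpose, though whether the authors used it is not stated.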
|
Keywords
|
feature selection, correlation, mutual information, cardiovascular
|
Address
|
Iran, Iran
|