Evaluation of machine learning methods (SVM, GLM, FDA, and RF) in preparing a flood susceptibility map of part of Khuzestan Province



Authors	Ghanavati Royat, Salajegheh Ali, Pourghasemi Hamidreza, Khalighi Sigaroodi Shahram, Keshtkar Hamidreza
Source	Water and Soil Modeling and Management - 1404 (Solar Hijri) - Volume 5 - Issue 1 - Pages 231-246
Abstract	Floods are among the most destructive natural disasters, causing serious damage to natural resources and infrastructure and many human casualties. Machine learning models have received wide attention for identifying and managing flood-prone areas. This study evaluated the performance of four models, support vector machine (SVM), generalized linear model (GLM), flexible discriminant analysis (FDA), and random forest (RF), in modeling the spatial distribution of flood hazard in part of Khuzestan Province. To this end, 13 flood conditioning factors were selected, covering topographic, hydroclimatic, lithological, and anthropogenic factors. The locations of 334 flood and non-flood points were then identified from field visits and available reports; 70% of these points were randomly assigned to model training and the remaining 30% to model validation. Based on the area under the receiver operating characteristic (ROC) curve, the RF, GLM, and FDA models scored above 0.7, and the RF model, with an area under the curve of 98.8%, was more accurate than the other models. According to the flood susceptibility map produced by this model, the probability of flooding is very high in 4.7% and high in 12.4% of the study area. The results can help managers reduce flood-related threats and implement effective management measures to limit flood damage.
Keywords	Khuzestan Province, flood hazard, data mining models, random forest model
Address	University of Tehran, Faculty of Natural Resources, Department of Reclamation of Arid and Mountainous Regions, Iran; University of Tehran, Faculty of Natural Resources, Department of Reclamation of Arid and Mountainous Regions, Iran; Shiraz University, Faculty of Agriculture, Department of Soil Science, Iran; University of Tehran, Faculty of Natural Resources, Department of Reclamation of Arid and Mountainous Regions, Iran; University of Tehran, Faculty of Natural Resources, Department of Reclamation of Arid and Mountainous Regions, Iran
Email	hkeshtkar@ut.ac.ir
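
The abstract above describes a random 70/30 split of the 334 flood and non-flood points and a validation based on the area under the ROC curve. The following is a minimal sketch of those two steps using only the Python standard library. It is not the authors' implementation: the function names, the fixed seed, the use of integer point ids, and the toy scores are all assumptions made for illustration.

```python
import random


def split_points(points, train_fraction=0.7, seed=0):
    """Shuffle the points and split them into training and validation sets.

    The seed is a hypothetical choice so that the split is reproducible.
    """
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen flood point (label 1)
    receives a higher score than a randomly chosen non-flood point (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical point ids standing in for the 334 inventory points.
points = list(range(334))
train, valid = split_points(points)
print(len(train), len(valid))

# Toy labels and susceptibility scores, invented for illustration only.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form of the AUC is quadratic in the number of points, which is acceptable for a validation set of about 100 points; a rank-based formula would be preferable for large samples.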

Evaluation of machine learning techniques (SVM, GLM, FDA, RF) in preparing a flood susceptibility map of part of Khuzestan Province

Authors	Ghanavati Royat, Salajegheh Ali, Pourghasemi Hamidreza, Khalighi Sigaroodi Shahram, Keshtkar Hamidreza
Abstract	Introduction: Developing countries are particularly vulnerable to floods due to inadequate infrastructure, limited financial resources, and a lack of advanced technology for mitigating flood impacts. There is therefore a critical need for high-performance flood forecasting models that delineate flood-prone areas. The frequency, lethality, and economic impact of floods have spurred the scientific community to create sophisticated algorithms and models to manage the inherent complexity of these natural events. Data mining algorithms have transformed scientific research by extracting patterns from vast, unstructured datasets and predicting future trends in complex natural phenomena. Machine learning techniques, an important subset of data mining methods, can make accurate predictions and, when properly configured, cope with data limitations and avoid overfitting. Previous studies have shown that machine learning algorithms significantly improve the speed and accuracy of mapping potential flood risks. Consequently, this study aims to develop a flood susceptibility map for a region in Khuzestan Province using machine learning algorithms. This region has experienced frequent floods, leading to substantial human and financial losses. Notably, during the floods of 2018, villages near the Dez and Karkheh dams encountered severe challenges.

Materials and Methods: The preparation of the flood risk map is based on two key hypotheses: (1) the past is indicative of the future, implying that future hazards will occur under conditions similar to those of past events, and (2) flood risk conditioning factors are spatially related and can be used in forecasting models. To test these hypotheses, the locations of past floods were obtained from the relevant authorities and verified through field visits. These locations were randomly divided into two groups: a training group (70%) and a validation group (30%). Data on flood risk conditioning factors, including topography, hydroclimatic conditions, and geological information, were collected and used to create raster maps of these predictive factors. The flood point locations were treated as the dependent variable. Four machine learning algorithms, support vector machine (SVM), generalized linear model (GLM), flexible discriminant analysis (FDA), and random forest (RF), were applied to generate the flood risk map. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve with the validation group data (30% of the flood points), and the best-performing model was selected. The final flood risk map was then produced from this model.

Results and Discussion: According to the collinearity analysis of the 13 factors influencing floods, all factors had tolerance values greater than 0.1 and variance inflation factors less than 5. Collinearity was therefore not an issue, and no factors needed to be removed. Flood susceptibility modeling was conducted using four models: SVM, GLM, FDA, and RF. The resulting flood hazard maps were classified into five risk categories: very low, low, medium, high, and very high. All four models identified flat lands and surface runoff margins as areas with higher flood susceptibility. In all models, more than half of the study area was classified as having low or very low flood risk. Specifically, the SVM, GLM, FDA, and RF models identified 73.9%, 69%, 72.6%, and 63.9% of the area, respectively, as low or very low risk, with the remainder falling into the medium to very high risk categories. Additionally, the RF and GLM models indicated that a larger portion of the region was at high to very high risk, with 4.7% and 3.9% of the area classified as high risk, respectively. Among the four models, the RF model demonstrated the highest performance, with an area under the curve (AUC) value of 98.8%.

Conclusion: Predicting high-risk areas is crucial for guiding decisions and implementing preventive measures. This study evaluated the performance of four machine learning models (SVM, GLM, FDA, and RF) in preparing a flood hazard map for part of Khuzestan Province, using the area under the ROC curve as the evaluation metric. The RF model achieved the highest accuracy, with an area under the curve of 98.8%, and was identified as the most suitable model for predicting flood risk areas. According to this model, the areas classified as very low, low, medium, high, and very high risk accounted for 34.2%, 29.7%, 18.9%, 12.4%, and 4.7% of the region, respectively. The GLM and FDA models also demonstrated acceptable accuracy, with AUC values of 76.3% and 75.2%, respectively. These results underscore the efficacy of machine learning models in predicting flood risk areas. Given the increasing population, urban development, and infrastructure expansion in mountainous areas and floodplains, it is essential to develop hazard susceptibility maps and multi-hazard maps for sustainable development. Future research should evaluate other machine learning models and create maps for other potential hazards in the region, ultimately leading to comprehensive multi-hazard maps. The findings of this research will assist decision-makers and policymakers in planning current and future land use and infrastructure development.
Keywords	data mining models, flood hazard, random forest model, Khuzestan Province
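
The collinearity screening reported in the abstract (tolerance greater than 0.1 and variance inflation factor less than 5 for every factor) can be illustrated with a small sketch. It assumes the usual definitions, tolerance = 1 - R² and VIF = 1 / tolerance, where R² comes from regressing one factor on all the others by ordinary least squares. The abstract does not state which software or procedure the authors used, so the function names, the pure-Python least squares solver, and the two toy factor columns below are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def collinearity_table(factors):
    """Return {name: (tolerance, vif)} for a dict of equally long numeric columns.

    Perfectly collinear or constant columns are not handled and raise
    ZeroDivisionError in this sketch.
    """
    table = {}
    for name, column in factors.items():
        others = [c for other, c in factors.items() if other != name]
        tolerance = 1.0 - r_squared(column, others)
        table[name] = (tolerance, 1.0 / tolerance)
    return table


def passes_screening(tolerance, vif):
    """Thresholds quoted in the abstract: tolerance > 0.1 and VIF < 5."""
    return tolerance > 0.1 and vif < 5


# Toy factor values, invented for illustration only.
factors = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rainfall": [2.0, 1.0, 4.0, 3.0, 6.0],
}
for name, (tol, vif) in collinearity_table(factors).items():
    print(name, round(tol, 3), round(vif, 3), passes_screening(tol, vif))
```

Because VIF is the reciprocal of tolerance, a VIF below 5 already implies a tolerance above 0.2, so of the two thresholds the VIF condition is the stricter one.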