Modeling PM2.5 particulate matter concentration based on data imputation and the use of machine learning methods

Authors	Haghbayan Sara, Tashayo Behnam, Hosseinii Maryam
Source	Journal of Geomatics Science and Technology - 1402 (Solar Hijri) - Volume: 12 - Issue: 4 - Pages: 77-89
Abstract	Accurate modeling of spatio-temporal changes requires an appropriate method and complete, accurate data. The data are collected from sensors at monitoring stations. The number of these stations is limited, and part of the data is lost due to unavoidable factors. The novelty of this paper is overcoming the limitations of existing methods for imputing missing PM2.5 values. The limitation of existing methods is that they do not account simultaneously for the spatio-temporal mechanism of the missing data. To overcome it, imputation of the missing PM2.5 values was implemented with extra tree and decision tree models, taking into account the relationships between variables while preserving their natural variability and uncertainty. The results showed that the extra tree method, owing to reduced bias, is more accurate (mean R² = 0.80) than the decision tree method (mean R² = 0.64) in imputing missing PM2.5 values. After the missing data were handled with the extra tree method, the XGBoost method was used to model the spatio-temporal changes of the PM2.5 pollutant in different geographical contexts of the Tehran metropolis; it was chosen for its non-linear assessment of the importance of the effective variables, with the aim of increasing accuracy and reducing computational cost. The effective variables considered for imputation and modeling comprise meteorological data and other main pollutants such as O3, PM10, CO, SO2, and NO2. The meteorological variables, namely total precipitation, relative humidity, and temperature, were extracted from the ECMWF model. Using the ECMWF model, in addition to increasing the number of meteorological stations, makes it possible to use one-hour resolution with a negligible amount of missing data, as opposed to a limited number of stations with three-hour resolution and a large amount of missing meteorological data.
Keywords	missing data, machine learning, decision tree, XGBoost, PM2.5
Address	University of Isfahan, Faculty of Civil Engineering and Transportation, Iran; University of Isfahan, Faculty of Civil Engineering and Transportation, Iran; University of Isfahan, Faculty of Civil Engineering and Transportation, Iran
Email	maryam.hosseinii1977@gmail.com
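
The abstract reports imputation accuracy as R² but does not say how it was computed. A minimal sketch, assuming the standard coefficient of determination (one minus the ratio of residual to total sum of squares); the function name and the sample values are hypothetical and are not taken from the paper:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return 1 - ss_res / ss_tot


# Made-up PM2.5 readings and imputed values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(round(r_squared(observed, predicted), 3))  # 0.964
```

Under this definition, a value of 1 means the imputed values reproduce the held-out observations exactly, and 0 means they do no better than always predicting the mean.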

Modeling spatial-temporal changes in PM2.5 concentration based on data imputation and the use of machine learning methods in different geographical contexts of the Tehran metropolis

Authors	Haghbayan Sara, Tashayo Behnam, Hosseinii Maryam
Abstract	Managing exposure to PM2.5 and dealing with the consequences of its concentration in urban environments requires accurate modeling of the pollutant's spatial-temporal changes. Such modeling requires appropriate methods and complete, accurate data. These data are measured by different sensors with different accuracy, have different variability, and are partly lost due to unavoidable factors such as sensor damage. Missing data cause many problems, such as reduced sample size and errors in data analysis; therefore, the missing data must be estimated when modeling PM2.5 concentration. In this study, a method based on extra tree and decision tree models is proposed to impute the missing PM2.5 values, considering the relationships between variables while maintaining their natural variability and uncertainty. Meteorological variables and other main pollutants such as O3, PM10, CO, SO2, and NO2 were considered as effective variables in imputing the missing PM2.5 values. The meteorological variables, including total precipitation, relative humidity, and temperature, were extracted from the model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Using the ECMWF model, in addition to increasing the number of meteorological stations, makes it possible to use hourly resolution with very few missing data, as opposed to a limited number of stations with three-hour resolution and a large number of missing meteorological data. The results showed that the extra tree method, with an average R² = 0.813, is more accurate than the decision tree method, with an average R² = 0.653, in imputing missing PM2.5 values, owing to its reduction of bias. After the missing data were managed with the extra tree method, the XGBoost method was used to model the spatial-temporal changes of the PM2.5 pollutant in different geographical contexts; it was chosen for its non-linear evaluation of the importance of the effective variables, with the aim of increasing accuracy and reducing computational cost.
Keywords	PM2.5, missing data, machine learning, extra tree, decision tree
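
The comparison of tree-based imputers described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes "extra tree" corresponds to scikit-learn's `ExtraTreesRegressor`, uses synthetic data with hypothetical column names in place of the Tehran measurements, removes values completely at random, and ignores any spatial or temporal structure.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical predictor names mirroring the variables listed in the abstract.
FEATURES = ["o3", "pm10", "co", "so2", "no2", "precip", "rel_humidity", "temp"]


def make_synthetic(n_rows=600, seed=0):
    """Return (X, pm25): random predictors and a made-up non-linear PM2.5 signal."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, len(FEATURES)))
    pm25 = (0.6 * X[:, 1] + 0.3 * X[:, 4] - 0.2 * X[:, 6]
            + 0.1 * X[:, 1] * X[:, 7]
            + rng.normal(scale=0.3, size=n_rows))
    return X, pm25


def impute(model, X, pm25_with_gaps):
    """Fit on rows where PM2.5 is observed, then fill the NaN rows."""
    missing = np.isnan(pm25_with_gaps)
    model.fit(X[~missing], pm25_with_gaps[~missing])
    filled = pm25_with_gaps.copy()
    filled[missing] = model.predict(X[missing])
    return filled


X, pm25_true = make_synthetic()
rng = np.random.default_rng(1)
# Assumption: about 20% of PM2.5 values go missing completely at random.
missing_mask = rng.random(pm25_true.size) < 0.2
pm25_with_gaps = np.where(missing_mask, np.nan, pm25_true)

models = {
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    filled = impute(model, X, pm25_with_gaps)
    # Score only the artificially removed values against the known truth.
    scores[name] = r2_score(pm25_true[missing_mask], filled[missing_mask])
    print(f"{name}: R2 on held-out gaps = {scores[name]:.3f}")
```

Scoring only the artificially removed values keeps the comparison honest, since both models see the same observed rows during fitting. The scores on this synthetic data say nothing about the values reported in the abstract.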