Imputation of Missing Meteorological Data with Evolutionary and Machine Learning Methods. Case Study: Long-term Monthly Precipitation and Temperature of Mashhad


Authors	Farzandi Mahboobeh, Sanaeinejad Hossein, Ghahraman Bijan, Sarmad Majid
Source	Water and Soil (آب و خاك) - 1398 - Volume: 33 - Issue: 2 - Pages: 361-377
Abstract	Precipitation and temperature are among the most important variables in meteorology and climatology. The length of the record strongly affects the accuracy of any analysis of these two variables. A sample shorter than 100 years cannot adequately reflect long-term fluctuations. The longest record of monthly temperature and precipitation for Mashhad spans nearly 125 years (from about 1893 to 2017). Unfortunately, this record has gaps. The aim of this study is to impute the missing data and to improve the accuracy of their estimation. Stations in neighboring countries were selected as reference stations. First, the missing data were imputed by fitting ten multiple regression models for monthly precipitation (coefficients of determination from 0.63 to 0.81) and six models for monthly temperature (0.986 to 0.993). Then, to reduce the errors, the parameters of the regression models were optimized with GA and ACO. In addition, ANN and SVR were used to model these data. The results showed that GA and ACO markedly increase the accuracy of estimating missing precipitation data relative to the regression models above. The lowest RMSE among all precipitation regression models is 9.79 mm. This measure falls to 2.560 mm with GA and to 2.559 mm with ACO. The lowest RMSE among the temperature regression models is 0.986. This measure falls to 0.726 with ANN and to 0.551 with SVR. Comparing the imputation of temperature and precipitation shows that evolutionary methods perform better for precipitation and machine learning methods perform better for temperature.
Keywords	Genetic algorithm, missing data, support vector regression, artificial neural network, ant colony
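Address	Ferdowsi University of Mashhad, Faculty of Agriculture, Department of Water Science and Engineering, Iran; Ferdowsi University of Mashhad, Faculty of Agriculture, Department of Water Science and Engineering, Iran; Ferdowsi University of Mashhad, Faculty of Agriculture, Department of Water Science and Engineering, Iran; Ferdowsi University of Mashhad, Faculty of Mathematics, Department of Statistics, Iran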

Imputation of Missing Meteorological Data with Evolutionary and Machine Learning Methods. Case Study: Long-term Monthly Precipitation and Temperature of Mashhad

Authors	Farzandi Mahboobeh, Sanaeinejad Seyed Hossein, Ghahraman Bijan, Sarmad Majid
Abstract	Introduction: Temperature and precipitation are two of the main variables in meteorology and climatology, and they are basic inputs to water resource management. The length of the statistical period plays a pivotal role in the accurate analysis of these variables. Observational data from Iran's first synoptic station, beginning in 1330 (1951), are available on the Iranian Meteorological Organization website. Historical monthly precipitation and temperature records for five stations in Iran (Mashhad, Isfahan, Tehran, Bushehr, and Jask) are available from 1880 onward, with gaps. These data were measured by the embassies of the United States and Britain from the Qajar period onward and were recorded in the World Weather Records books. Most of the missing monthly values fall in the World War II years (1941-1949). Because of these gaps, the accuracy with which the variables are reconstructed is very important. The current research aimed to estimate the missing values of monthly temperature and precipitation at the Mashhad station. Stations in neighboring countries were selected as predictors based on their distance from Mashhad, the strength of their relationship with the Mashhad record, and the completeness of their data since 1880. Monthly precipitation at Ashgabat, Sarakhs, Kooshkah, Bayram Ali, Kerki, and Repetek in Turkmenistan was used as the set of independent variables for reconstructing the missing rainfall at Mashhad. Likewise, the temperatures of Ashgabat, Bayram Ali, Gudan, Sarakhs, and Tajan were selected to restore the monthly temperature of the Mashhad station. Ten multiple regression models were fitted to the monthly rainfall of Mashhad and six to its monthly temperature; the parameters of these models were then optimized with the genetic algorithm and the ant colony algorithm.
In addition, a multilayer perceptron (MLP) artificial neural network and support vector regression were implemented to simulate the monthly precipitation and temperature data of Mashhad. Materials and Methods: In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or "predictors"). The genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection and belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. The ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. It is a member of the ant colony family of swarm intelligence methods and is itself a metaheuristic optimization. Artificial neural networks are one of the main tools used in machine learning. As the "neural" part of their name suggests, they are brain-inspired systems intended to replicate the way humans learn. Neural networks consist of input and output layers and, in most cases, one or more hidden layers whose units transform the input into something the output layer can use. They are well suited to finding patterns that are too complex or too numerous for a human programmer to extract and teach the machine to recognize.
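The idea of refining regression coefficients with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rmse`, `genetic_search`), the GA settings (population size, truncation selection, blend crossover, Gaussian mutation), the starting coefficients, and the six data rows are all made up for the example and do not come from the study.

```python
import math
import random

def rmse(coeffs, rows, targets):
    """RMSE of a linear model y = b0 + b1*x1 + b2*x2 + ... on the given data."""
    total = 0.0
    for xs, y in zip(rows, targets):
        pred = coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], xs))
        total += (pred - y) ** 2
    return math.sqrt(total / len(targets))

def genetic_search(rows, targets, start, pop_size=40, generations=200,
                   sigma=0.5, seed=0):
    """Refine the `start` coefficients with a simple real-coded GA (assumed settings)."""
    rng = random.Random(seed)
    # Initial population: the starting coefficients plus randomly perturbed copies.
    population = [list(start)] + [
        [b + rng.gauss(0.0, sigma) for b in start] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda c: rmse(c, rows, targets))
        parents = population[: pop_size // 2]   # selection: keep the better half
        children = [parents[0]]                 # elitism: best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = [w * p + (1 - w) * q for p, q in zip(a, b)]       # blend crossover
            child = [g + rng.gauss(0.0, sigma * 0.1) for g in child]  # small mutation
            children.append(child)
        population = children
    return min(population, key=lambda c: rmse(c, rows, targets))

# Made-up illustrative data: two neighbouring-station values per month -> target value.
rows = [(10.0, 4.0), (25.0, 12.0), (3.0, 1.0), (40.0, 22.0), (18.0, 9.0), (0.0, 0.5)]
targets = [12.0, 30.0, 4.0, 51.0, 22.0, 1.0]
start = [0.0, 1.0, 0.0]  # rough starting coefficients (assumed, not from the study)

best = genetic_search(rows, targets, start)
print("start RMSE:", round(rmse(start, rows, targets), 3))
print("GA RMSE:   ", round(rmse(best, rows, targets), 3))
```

Because the best individual is carried over unchanged each generation, the returned coefficients can never have a higher RMSE than the starting ones on the data used for the search. Whether they generalize to months outside that data is a separate question.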
In machine learning, support vector machines (SVMs, also called support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). Support vector regression (SVR), used here, is the variant of this approach for continuous targets. Results and Discussion: At the first stage, ten multiple regression models were fitted to monthly precipitation (coefficients of determination from 0.63 to 0.81) and six to monthly temperature (0.986 to 0.993). Afterward, GA and ACO were applied to improve the accuracy of the selected regression models by optimizing their parameters. At the next stage, ANN and SVR were used separately to estimate the missing monthly values. Finally, the results of the previous stages were compared using the root mean square error (RMSE), and the best models were applied to determine the missing values of monthly temperature and precipitation at Mashhad. The results showed that the genetic algorithm and the ant colony algorithm estimate missing rainfall data considerably more accurately than the regression models. The lowest RMSE among the precipitation regression models is 9.8 mm. The genetic algorithm reduces this to 2.56 mm and the ant colony algorithm to 2.559 mm. Conclusion: Comparison of the above methods for restoring temperature and precipitation shows that the evolutionary methods (GA and ACO) performed best for estimating missing monthly precipitation, and the machine learning methods (ANN and SVR) performed best for imputing missing monthly temperature.
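The final step described above, comparing candidate models by RMSE and using the best one to fill the gaps, can be sketched as follows. This is a hedged illustration only: the function `pick_and_fill`, the model names `model_a` and `model_b`, and all numbers are hypothetical and do not come from the study.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )

def pick_and_fill(target, candidates):
    """Score each candidate on months where the target is observed, then
    fill the missing months (None) with the lowest-RMSE candidate's estimates."""
    known = [i for i, v in enumerate(target) if v is not None]
    scores = {
        name: rmse([est[i] for i in known], [target[i] for i in known])
        for name, est in candidates.items()
    }
    best = min(scores, key=scores.get)
    filled = [candidates[best][i] if v is None else v for i, v in enumerate(target)]
    return best, scores, filled

# Made-up monthly series with two gaps, and two hypothetical models' estimates.
target = [12.0, None, 4.0, 51.0, None, 1.0]
candidates = {
    "model_a": [11.0, 29.0, 6.0, 49.0, 20.0, 2.0],
    "model_b": [12.5, 31.0, 4.5, 50.0, 23.0, 1.5],
}

best, scores, filled = pick_and_fill(target, candidates)
print(best, filled)
```

Scoring on the same months used to fit the models tends to give optimistic errors, so in practice one would likely hold out some observed months for this comparison.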
Keywords