ABSTRACT
Due to global climate change, managing water resources is one of the most critical challenges for most countries in the world, especially in the Middle East. The Kurdistan Region of Iraq (KRI) receives a good amount of precipitation and has substantial surface water and groundwater, but these sources are poorly managed. Rainfall is one of the major sources of water in the KRI. Reliable rainfall forecasting models are therefore needed to manage the available water resources and to mitigate natural disasters such as floods and droughts. The current study develops a hybrid model, namely a seasonal autoregressive integrated moving average combined with an artificial neural network (SARIMA-ANN), for forecasting monthly rainfall at Sulaymaniyah City over the period 1938–2012. For comparison, a conventional machine learning model, namely an artificial neural network (ANN), was applied to the same data. Two statistical measures, root mean square error (RMSE) and coefficient of determination (R2), were used to assess the accuracy of the models. According to the findings, SARIMA-ANN (RMSE = 11.5, R2 = 0.98) outperformed ANN (RMSE = 51.002, R2 = 0.43). The findings of the current study could contribute to Sustainable Development Goal (SDG) 6.
HIGHLIGHTS
An innovative hybrid model has been developed for rainfall prediction.
Accurate rainfall prediction will lead to better management of water resources.
Rainfall prediction is an important hydrological tool due to global climate change.
ABBREVIATIONS
INTRODUCTION
Water is one of the basic needs of humans; about 60% of the human body is made of water. Water is used for many purposes, such as washing, cleaning, and cooking. Just as life cannot exist without oxygen, it cannot exist without water, and the abundance of water distinguishes the Earth from other known planets and makes life possible. The Earth's surface is roughly three parts water to one part land, and liquid surface water in comparable abundance has not been found on any other known planet. Because water is essential for all living things, it must be preserved for future generations. Water resource management is the process of planning, protecting, and improving water resources in terms of quality and quantity. It is needed to supply water for domestic use, irrigation, hydropower production, and other purposes, and it also helps to reduce the risks of extreme events such as floods (Akram & Hamid 2013; Vieira et al. 2020; Yang & Liu 2020; Fereshtehpour et al. 2021).
Machine learning (ML) models are now widely used in water management, especially to predict rainfall over both short and long horizons (Latif et al. 2023). Accurate rainfall prediction has become a major challenge because of climate change (Basha et al. 2020). An ML model learns from hydrological variables and historical rainfall records and uses the learned patterns to predict future rainfall. Researchers around the world rely on ML because of the detailed and accurate predictions it can produce from such data (Parmar et al. 2017; Ahmed et al. 2020).
The Kurdistan Region of Iraq (KRI) currently faces water scarcity. The water level has decreased in both the Dokan and Darbandikhan dams, which are the main sources of water supply for Sulaymaniyah City. This scarcity will lead to further problems in the future if the authorities are not able to resolve it. The biggest problem in the KRI is the lack of a good water management plan. A reliable model for predicting rainfall is considered one of the most important tools for managing water resources (Tinti 2017; Mohammed et al. 2018).
A growing population increases water demand. To ensure there is enough water for humans, agriculture, and industry, effective water management is needed. Water resources are significantly affected by climate change, including shifts in precipitation patterns, more severe droughts, and more intense rainfall events (Boretti & Rosa 2019; Zubaidi et al. 2020; Barker et al. 2021). Predicting water resources with ML techniques is increasingly considered an important approach because of their many advantages over conventional modeling techniques (Latif & Ahmed 2023). ML algorithms can process large volumes of data and find patterns that manual analysis might miss. Because they can recognize complex relationships in data, they can make predictions of water availability, demand, and quality more accurate. Their processing speed makes them well suited to real-time applications such as flood forecasting and water management. They are also useful for predicting water resources under changing climate and environmental conditions, since they can adapt and learn from new data. By identifying the best ways to allocate and manage water to meet the requirements of various stakeholders, ML algorithms can improve water management strategies. Finally, by automating data processing and analysis, ML methods can lower the cost of water resource management (Shen 2018).
It can be concluded that ML methods are important for predicting water resource parameters, since they improve accuracy, speed, adaptability, optimization, and cost-effectiveness for the management of water resources in today's rapidly changing world (Li & Sansalone 2022). Many studies have addressed rainfall and runoff forecasting in different regions. For example, Ali & Shahbaz (2020) proposed a method for runoff forecasting that models the relationship between precipitation and runoff to identify the optimal rainfall pattern for forecasting daytime streamflow. They identified different sets of antecedent rainfall components and developed an artificial neural network (ANN) model. Their findings show the potential of ANN-based approaches as an effective alternative for solving hydrological problems. Adede et al. (2015) described an ML experiment that uses ANN to predict rainfall. The dataset was divided into training, validation, and testing subsets. The test was run 100 times with different random distributions of the data, and for each fold the ANN was trained 100 times, resulting in 10,000 prediction ensembles. The goal of repeatedly training the ANN was to build a more robust model that could handle different data variations. Pham et al. (2020) used meteorological variables such as maximum and minimum temperature, wind speed, relative humidity, and solar radiation at different altitudes as inputs to predict daily rainfall in Hoa Province in Vietnam. The models used were adaptive network-based fuzzy inference systems (ANFIS) optimized with particle swarm optimization (PSO), ANN, and support vector machine (SVM). The results showed that all AI models provided adequate forecasts, but SVM proved to be the best method for forecasting rainfall.
In another study, Malki et al. (2020) investigated the relationship between weather variables, particularly temperature and humidity, and the spread of COVID-19. Various ML models were employed to extract this relationship using data on the number of confirmed cases and weather variables in certain regions. The study found that weather variables, particularly temperature, are more relevant than census variables in predicting the mortality rate of COVID-19, which suggests that temperature and humidity are important features for this prediction. Their study also indicates that higher temperatures are associated with lower numbers of infection cases, suggesting a potential relationship between temperature and the spread of COVID-19.
The aim of the current study is to develop a hybrid ML model, namely SARIMA-ANN, for forecasting monthly rainfall in Sulaymaniyah City, located in the north of Iraq. To benchmark the accuracy of the proposed hybrid SARIMA-ANN model, a standalone ANN has also been applied to the same dataset.
MATERIALS AND METHODS
Study area
Data
Artificial neural network
SARIMA-ANN
Input selections
The autocorrelation function (ACF) is a statistical technique that measures how strongly a time series correlates with lagged copies of itself. In other words, it evaluates how closely a data point resembles the values that preceded it. The ACF is calculated as the correlation coefficient between the time series and a lagged version of the same series at various time lags. The outcome is a set of correlation coefficients, one for each lag. Because the ACF reveals patterns and trends in time series data, it can be used to choose suitable lag values for a time series model. In this research, the ACF was used to find the most strongly correlated lagged values. All ANN training inputs were built from the four preceding monthly values (lags 1 to 4), as recommended by the ACF.
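The ACF-based lag screening described above can be sketched as follows. This is a minimal illustration using NumPy on a synthetic monthly series, not the Sulaymaniyah record; the function name `acf`, the synthetic data, and the approximate 95% significance bound of 1.96/sqrt(n) are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean before correlating
    denom = np.dot(x, x)                  # lag-0 sum of squares
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic series with a 12-month cycle (illustrative only)
rng = np.random.default_rng(0)
months = np.arange(240)
rain = 50 + 40 * np.cos(2 * np.pi * months / 12) + rng.normal(0, 5, size=months.size)

r = acf(rain, max_lag=12)
bound = 1.96 / np.sqrt(rain.size)         # approximate 95% bound for white noise
significant_lags = [k for k in range(1, 13) if abs(r[k]) > bound]
print(np.round(r, 3))
print(significant_lags)
```

Lags whose coefficients fall well outside the bound are candidates for model inputs; how many of them to keep remains a modeling decision.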
Evaluation metrics
Evaluation metrics quantify how well an ML model performs. Because rainfall forecasting is a regression task, this study assesses the models with error-based and goodness-of-fit measures rather than classification measures such as precision and recall. Two metrics are used: the root mean square error (RMSE) and the coefficient of determination (R2).
Root mean squared error
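As a hedged illustration, RMSE is commonly defined as the square root of the mean squared difference between observed and predicted values, so lower values indicate better forecasts. The sketch below assumes this standard definition; the function name `rmse` and the sample numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error: sqrt(mean((observed - predicted)^2))."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Hypothetical monthly rainfall values (illustrative only)
obs = [10.0, 0.0, 35.0, 120.0]
pred = [12.0, 1.0, 30.0, 110.0]
error = rmse(obs, pred)
print(error)
```

RMSE is expressed in the same unit as the rainfall data, and large errors in wet months dominate it because the differences are squared.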
R2
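A minimal sketch of the coefficient of determination follows, assuming the common definition R2 = 1 - SS_res / SS_tot. Some hydrological studies instead report the squared Pearson correlation between observed and predicted values; which form was used here is not stated, so the choice below is an assumption. The function name `r_squared` and the sample numbers are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # unexplained variation
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)

# Hypothetical monthly rainfall values (illustrative only)
obs = [10.0, 0.0, 35.0, 120.0]
pred = [12.0, 1.0, 30.0, 110.0]
score = r_squared(obs, pred)
print(score)
```

Values close to 1 indicate that the predictions explain most of the variation in the observations.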
RESULTS AND DISCUSSION
The standalone ANN model could not predict rainfall in Sulaymaniyah accurately on the testing data. Its predictions were especially poor in months with high rainfall. In the next step, the SARIMA-ANN model was developed, and the best model produced accurate rainfall forecasts.
ANN results
ANN models | Input selection | R2 | RMSE |
---|---|---|---|
Model 1 | Rt−1 | 0.335 | 55.05 |
Model 2 | Rt−1, Rt−2 | 0.362 | 54.312 |
Model 3 | Rt−1, Rt−2, Rt−3 | 0.386 | 53.292 |
Model 4 | Rt−1, Rt−2, Rt−3, Rt−4 | 0.431 | 51.002 |
Model 5 | Rt−1, Rt−2, Rt−3, Rt−4, Rt−5 | 0.372 | 55.501 |
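The input scenarios in the table above differ only in how many previous monthly values (Rt−1 to Rt−5) are fed to the network. A minimal sketch of how such lagged input matrices could be built is given below; the helper name `make_lagged` and the sample values are hypothetical, and the study's actual preprocessing is not described in this text.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return (X, y): column j of X holds R(t-1-j), and y holds R(t)."""
    x = np.asarray(series, dtype=float)
    # Lag 1 goes in the first column, lag n_lags in the last column
    X = np.column_stack([x[n_lags - lag:len(x) - lag]
                         for lag in range(1, n_lags + 1)])
    y = x[n_lags:]
    return X, y

# Hypothetical monthly rainfall values (illustrative only)
rain = np.array([5.0, 12.0, 30.0, 48.0, 20.0, 7.0, 0.0])
X, y = make_lagged(rain, n_lags=4)   # the Model 4 scenario: Rt-1 .. Rt-4
print(X)
print(y)
```

Each row of `X` pairs with the entry of `y` at the same index, so the first `n_lags` observations serve only as inputs and never as targets.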
SARIMA-ANN results
Since Model 4 outperforms all other ANN models on both the RMSE and R2 metrics, SARIMA-ANN was developed using the input combination of Model 4. Table 2 shows the results of the developed SARIMA-ANN model for predicting rainfall. With RMSE = 11.5 and R2 = 0.98, it clearly outperforms the most accurate ANN model (Model 4), which achieved RMSE = 51.002 and R2 = 0.431. The SARIMA-ANN model captured the structure of the data and accurately forecasted monthly rainfall in Sulaymaniyah. It forecasted both low and high rainfall levels accurately, which supports sustainable water management.
SARIMA-ANN | Input selection | R2 | RMSE |
---|---|---|---|
Developed model | Rt−1, Rt−2, Rt−3, Rt−4 | 0.98 | 11.5 |
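To put the two result tables side by side, the short sketch below selects the best ANN scenario and computes the relative RMSE reduction achieved by SARIMA-ANN. The numbers are copied from the tables; the variable names and the percentage-reduction summary are illustrative choices and are not part of the original analysis.

```python
# R2 and RMSE values as reported in the ANN and SARIMA-ANN result tables
ann_results = {
    "Model 1": {"r2": 0.335, "rmse": 55.05},
    "Model 2": {"r2": 0.362, "rmse": 54.312},
    "Model 3": {"r2": 0.386, "rmse": 53.292},
    "Model 4": {"r2": 0.431, "rmse": 51.002},
    "Model 5": {"r2": 0.372, "rmse": 55.501},
}
sarima_ann = {"r2": 0.98, "rmse": 11.5}

# Best ANN scenario: lowest RMSE and, independently, highest R2
best_by_rmse = min(ann_results, key=lambda m: ann_results[m]["rmse"])
best_by_r2 = max(ann_results, key=lambda m: ann_results[m]["r2"])

# Relative RMSE reduction of the hybrid over the best ANN, in percent
best = ann_results[best_by_rmse]
rmse_reduction_pct = 100.0 * (best["rmse"] - sarima_ann["rmse"]) / best["rmse"]
print(best_by_rmse, best_by_r2, round(rmse_reduction_pct, 2))
```

Both criteria select the same ANN scenario, which is why its input combination is a natural basis for the hybrid model.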
Based on the findings of the current study, SARIMA-ANN could be very useful for Sulaymaniyah City, since it accurately predicted the monthly rainfall data. Moreover, studies on rainfall prediction in Sulaymaniyah using hybrid models are rarely found in the literature, so the findings of the current study help to fill this gap.
CONCLUSION
Water levels in the sources that supply Sulaymaniyah City fluctuate under climate change. The standalone ANN failed to perform well in predicting monthly rainfall in Sulaymaniyah, and the main cause of its low accuracy was the limited quantity of data. The fourth model was the most accurate of the ANN models. The input selection scenario of the fourth model was then used to develop the SARIMA-ANN model, which obtained a high accuracy of RMSE = 11.5 and R2 = 0.98 (Table 2). The model developed in the current research could be considered a suitable tool for forecasting rainfall in support of water management in Sulaymaniyah City. Future studies are recommended to use daily rainfall data, since a larger dataset will help the ML models train better and produce more accurate predictions.
AVAILABILITY OF DATA AND MATERIAL
Data will be made available on reasonable request.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.