ABSTRACT
Machine learning methods are efficient for predicting floods in areas where complete data are not available. Therefore, this study considers the Adaptive Neuro-Fuzzy Inference System (ANFIS) model combined with evolutionary algorithms, namely Harris Hawks Optimization (HHO) and Arithmetic Optimization Algorithm (AOA), to predict the flood flow of the Shahrchay River in the northwest of Iran. The data comprised daily precipitation, evaporation, and runoff for the years 2016 and 2017, where 70% of the data were used for model training and the rest for testing the models. The results showed that although the ANFIS model produced large errors in several steps, especially in steps with maximum or minimum values, the use of the HHO and AOA optimization algorithms resulted in a significant reduction in the error values. The ANFIS-AOA model, using an input scenario that included the flow in the previous one to three days, produced the most promising results on the test data, with Nash Sutcliffe Efficiency (NSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) of 0.93, 1.34, and 0.69, respectively. According to Taylor's diagram, the ANFIS-AOA hybrid algorithm predicts flood values more accurately than the other models.
HIGHLIGHTS
ANFIS was integrated with the HHO and AOA optimization algorithms.
Flood flow was estimated effectively with limited data.
The models were validated using quantitative assessment metrics.
The optimal input scenario was selected.
Practical implications for water management were examined.
INTRODUCTION
Floods are one of the most harmful and destructive natural disasters, causing significant damage to infrastructure, agriculture, human life, and the socioeconomic system. Heavy and excess rainfall and inadequate drainage systems are typically associated with flooding (Sankaranarayanan et al. 2020). Because of both these problems, governments are under pressure to create precise and trustworthy maps of flood risk areas and to make more plans for flood risk management that emphasize readiness, protection, and prevention (Danso-Amoako et al. 2012; Oborie & Rowland 2023). For this reason, flood forecasting models are a crucial tool for managing extreme events and assessing risk. Reliable and precise forecasting supports policy recommendations and analyses, strategies for managing water resources, and additional discharge modeling (Xie et al. 2017). Thus, in order to minimize damage, it is crucial to provide forecasting systems for both short- and long-term floods and other hydrological events (Pitt 2008).
By forecasting the extent and volume of a flood, it is possible to evaluate its risk, and appropriate measures can be taken to reduce financial and human losses. However, since weather patterns are dynamic, predicting the exact time and location of floods is typically challenging. Major flood forecasting models of today include a variety of simplifying assumptions and are primarily data-specific (Lohani et al. 2014; Farahmand et al. 2023).
Various experimental black-box, event-oriented, deterministic, continuous, and hybrid techniques are employed to simulate the behavior of watersheds (Nayak et al. 2005; Kayhomayoon et al. 2022). As a result, there is very little short-term forecasting. The application of physical models for flood forecasting presents severe limitations in addition to the requirement for in-depth knowledge and proficiency regarding hydrological parameters (van den Honert & McAneney 2011; Costabile & Macchione 2015). According to recent research, physical models are not efficient in predicting hydrological phenomena compared to hydrological models and machine learning methods (Devia et al. 2015; Kayhomayoon et al. 2023). It is crucial to employ numerical methods to forecast hydrological events (Dazzi et al. 2021). These models require a large dataset, including topography, land use, and precipitation intensity and duration, among other things, in order to correctly and accurately predict phenomena like runoff and precipitation. However, in some places, this information might not be easily accessible or available (Dazzi et al. 2021).
The third type of model, the data-driven model, is free of the challenges and issues associated with the physical and conceptual/numerical models. These models only use observational data from the past to make predictions. These machine learning-based models ignore the actual physical environment in which the desired phenomenon occurs (Hashemi et al. 2014). Owing to the algorithm built into their design, they are able to make predictions quickly and efficiently by recognizing the relationships between input and output data (Dazzi et al. 2021; Sadio & Faye 2023). Machine learning models are commonly employed in studies related to hydrology and hydrogeology. These models are derived from many approaches, including Artificial Neural Network (ANN), Support Vector Machine (SVM), decision trees, Fuzzy Inference System (FIS), gene expression programming, Adaptive Neuro-Fuzzy Inference System (ANFIS), and hybrid models that benefit from an optimization scheme such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Harris Hawks Optimization (HHO), Arithmetic Optimization Algorithm (AOA), and Slime Mold Algorithm (SMA) (Javidan et al. 2022).
The ANFIS model is one of the first machine learning models and is a combination of ANN and FIS. This model has attracted the attention of researchers because it can be widely used to account for uncertainty. Recently, evolutionary algorithms have been used to improve the accuracy of these models in prediction problems. The HHO and AOA methods are among the most recent evolutionary algorithms used for optimization problems. Developed by Heidari et al. (2019), the HHO algorithm mimics the way Harris hawks hunt their prey, while the AOA, introduced by Abualigah et al. (2021), works on mathematical operators. Various researchers have recommended evolutionary algorithms for improving prediction accuracy over the last decade, and testing newly introduced algorithms in prediction problems therefore seems necessary. Flood studies and other hydrological phenomena have benefited from the successful application of these models (Rezaeianzadeh et al. 2014; Adib & Mahmoodi 2017; Mosavi et al. 2018; Sharifi Garmdareh et al. 2018; Wu et al. 2019).
For example, Rezaeianzadeh et al. (2014) used ANFIS and other machine learning models in flood forecasting, Adib & Mahmoodi (2017) used genetic algorithms to improve ANN-based flood forecasts, and Sharifi Garmdareh et al. (2018) used machine learning models such as SVM, ANN, and ANFIS to predict flood peak discharge. Although these studies and other similar research show that various machine learning models have been used successfully in flood forecasting, the use of new evolutionary algorithms to improve the accuracy of these models has been limited. In addition, climatic factors influence the amount of flood flow, and considering them can improve flood forecasting (Rahmani-Rezaeieh et al. 2020).
Although the ANFIS model has many benefits (Firat & Güngör 2007; Esmaili et al. 2021), it might perform poorly in some problems because it produces irrational output values at some points or becomes trapped in local minima (Qasem et al. 2017; Seifi & Riahi-Madvar 2019; Kayhomayoon et al. 2023; Milan et al. 2021). One of the best approaches to increase the prediction performance of single models in such scenarios is to employ evolutionary algorithms. These algorithms can be used to boost prediction accuracy and enhance the performance of individual models by optimizing the values of their hyperparameters (Dehghani et al. 2019; Arya Azar et al. 2021a). One of the most recently introduced optimization techniques is the AOA (Abualigah et al. 2021). This algorithm is efficient in terms of exploration–exploitation balance and convergence speed, and it is built on the behavior of the basic arithmetic operators (Abualigah et al. 2021). Numerous studies have employed this algorithm to produce acceptable output data (Zhang et al. 2021a, 2021b; Patil et al. 2022; Kayhomayoon et al. 2023). Thus, in this study, for the first time, the potential of two evolutionary algorithms, namely HHO and AOA, is investigated to boost the prediction performance of ANFIS and present a reliable flood prediction model. The development of these hybrid models for flood forecasting is therefore the main novelty of this research compared to previous work. The results can answer the question of how useful hybrid methods are for improving the performance of base models in flood prediction.
Floods in Iran are primarily caused by either rain or snowmelt. The most frequent cause of flooding in Iran's temperate and cold regions – the north, northwest, and a sizable portion of the country's west – is the result of a combination of rain and snowmelt. A thorough analysis employing historical data and previous occurrences is required to prevent the negative effects of flooding, even though the intensity of flooding depends on a number of factors, including topography, climatic conditions, and vegetation. The Shahrchay River in northwest Iran was chosen as the study area for this research to predict flood discharge using machine learning models.
This area is one of the most important parts of the Lake Urmia basin and has experienced various floods in the last few decades. In addition, several studies on using machine learning models and optimal evolutionary algorithms to achieve the proper accuracy of flood forecasting have been carried out in this field so far.
Various variables were taken into consideration to create several input scenarios for the models to predict floods. Initially, the correlation between the model output (flood amount) and the influencing variables was measured. Subsequently, the input scenarios were defined and the models were assembled using the input scenarios. Ultimately, the best model and scenario for flood forecasting was identified after examining the performance of each model using statistical and graphical evaluation criteria.
MATERIALS AND METHODS
Study area and data
The factors influencing the flow rate should be identified in order to estimate the flood. For the 2 years 2016 and 2017, the daily values of evaporation (E), precipitation (P), and flow (Q) with delays of 1, 2, and 3 days were taken into consideration as input variables. Table 1 displays the statistical properties of the data. For modeling, 730 data samples were taken into account. The mean flow rate was 3.81 m3/day.
Table 1. Statistical parameters of the daily data used in this study
Statistical parameter | Flood flow (m3/day) | Precipitation (mm) | Evaporation (mm)
---|---|---|---
Number of data | 730 | 730 | 730
Max | 17.5 | 52.8 | 3.01
Mean | 3.81 | 1.16 | 3.01
Min | 0.004 | 0.00 | 0.00
Skew | 1.12 | 7.00 | 0.65
SD | 4.96 | 2.56 | 2.76
Mode | 0.103 | 0.00 | 0.00
The maximum discharge of the Shahrchay River at the Urmia Band station.
Formulating efficient input scenarios in which various input variables participate is required to achieve the highest prediction performance. The input scenarios were defined in this work according to the highest correlation coefficient. Table 2 displays the correlation coefficient between the model output (flood amount) and the input variables.
Table 2. Input scenarios for predictive models
Scenario | Q (t–1) | Q (t–2) | Q (t–3) | E | P
---|---|---|---|---|---
Correlation coefficient | 0.94 | 0.83 | 0.71 | 0.27 | −0.026
S1 | * | | | |
S2 | * | * | | |
S3 | * | * | * | |
S4 | * | * | * | * |
S5 | * | * | * | * | *
The table shows that the discharge 1 day and 2 days before the prediction time had the highest correlation with the model output, with correlation coefficients of 0.94 and 0.83, respectively. Table 2 also lists the five input scenarios and their variables, defined in descending order of correlation coefficient. The first scenario contains only the flow with a 1-day delay. The second scenario contains the flow with 1-day and 2-day delays. Scenarios 3–5 were defined in the same way, each adding the next most strongly correlated variable.
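The scenario construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, E and P are assumed to be same-day values, and the 70/30 split is assumed to be chronological.

```python
import numpy as np

# Scenario definitions taken from Table 2; the column labels are illustrative.
SCENARIOS = {
    "S1": ["Q(t-1)"],
    "S2": ["Q(t-1)", "Q(t-2)"],
    "S3": ["Q(t-1)", "Q(t-2)", "Q(t-3)"],
    "S4": ["Q(t-1)", "Q(t-2)", "Q(t-3)", "E"],
    "S5": ["Q(t-1)", "Q(t-2)", "Q(t-3)", "E", "P"],
}


def lagged_inputs(q, e, p, max_lag=3):
    """Align the target Q(t) with lagged flows and (assumed same-day) E and P."""
    q, e, p = (np.asarray(a, dtype=float) for a in (q, e, p))
    target = q[max_lag:]
    inputs = {
        f"Q(t-{k})": q[max_lag - k:len(q) - k] for k in range(1, max_lag + 1)
    }
    inputs["E"] = e[max_lag:]
    inputs["P"] = p[max_lag:]
    return target, inputs


def correlations(target, inputs):
    """Pearson correlation between each candidate input and the target."""
    return {
        name: float(np.corrcoef(col, target)[0, 1])
        for name, col in inputs.items()
    }


def build_scenario(target, inputs, name, train_fraction=0.7):
    """Stack the columns of one scenario and split chronologically (assumed)."""
    X = np.column_stack([inputs[c] for c in SCENARIOS[name]])
    n_train = int(train_fraction * len(target))
    return (X[:n_train], target[:n_train]), (X[n_train:], target[n_train:])
```

With the daily series loaded as arrays, `correlations` would give values comparable to the first row of Table 2, and `build_scenario(target, inputs, "S3")` would return the training and test sets for the third scenario.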
Machine learning models
In this study, ANFIS as well as its hybrid models with HHO and AOA algorithms have been used for flood forecasting. Implementing these modeling approaches was done in the MATLAB R2021 programming environment.
Adaptive Neuro-Fuzzy Inference System
ANFIS was introduced to combine the strengths of ANN and FIS and thereby increase simulation performance. The model is composed of layers, much like a neural network; however, it makes use of rule formulation, much like a fuzzy logic system (Kayhomayoon et al. 2023).
Fuzzy membership functions, such as bell, trapezoidal, Gaussian, and triangular functions, are employed in numerous studies (Zhu et al. 2010; Milan et al. 2018). The weight of each rule is determined in the second layer, or rule node, by multiplying together the input values arriving at that node. The AND operator is used in this research, although the OR operator can also be used in this layer. The weights of the rules are normalized by the nodes in the third layer. The rule outputs are obtained in the result nodes, which are referred to as the rules layer. In the de-fuzzification process, this layer converts each fuzzy rule's output into a non-fuzzy one.
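The layer-by-layer computation described above can be illustrated with a short Python sketch of a forward pass through a first-order Sugeno ANFIS. It assumes Gaussian membership functions and the product form of the AND operator; the function name, array shapes, and parameter values are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def anfis_forward(x, centers, sigmas, consequents):
    """One forward pass of a first-order Sugeno ANFIS (illustrative sketch).

    x           : (n_inputs,) input vector
    centers     : (n_rules, n_inputs) Gaussian membership centers
    sigmas      : (n_rules, n_inputs) Gaussian membership widths
    consequents : (n_rules, n_inputs + 1) linear coefficients plus bias
    """
    x = np.asarray(x, dtype=float)
    # Layer 1: membership degree of each input in each rule's fuzzy set.
    membership = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)
    # Layer 2: rule firing strength, using the product as the AND operator.
    firing = membership.prod(axis=1)
    # Layer 3: normalize the firing strengths (small constant avoids 0/0).
    normalized = firing / (firing.sum() + 1e-12)
    # Layer 4: first-order (linear) output of each rule.
    rule_output = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5: weighted sum gives the crisp (de-fuzzified) prediction.
    return float(np.sum(normalized * rule_output))
```

In a hybrid model, the entries of `centers`, `sigmas`, and `consequents` are the quantities an optimizer would adjust to reduce the training error.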
Harris Hawks Optimization
Arithmetic Optimization Algorithm
Introduced by Abualigah et al. (2021), the AOA is a new meta-heuristic optimization algorithm that makes use of the distribution behavior of the primary arithmetic operators in mathematics. The AOA is designed to carry out optimization procedures in a variety of search spaces through mathematical modeling and implementation. Without computing the derivatives of optimization problems, this population-based algorithm can resolve them. This optimization process begins with the generation of a population or random set of potential solutions. The main four arithmetic operators estimate the feasible positions of the near-optimal solution during each iteration.
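The procedure described above can be sketched in Python as follows. This is a simplified illustration in the spirit of the AOA rather than a faithful reproduction of the published algorithm or of the authors' MATLAB code: the function name, the default parameter values (`alpha`, `mu`, `moa_min`, `moa_max`), and the greedy replacement rule are assumptions. Division and multiplication drive exploration, and subtraction and addition drive exploitation.

```python
import numpy as np


def aoa_minimize(objective, lower, upper, n_agents=30, n_iter=200,
                 alpha=5.0, mu=0.499, moa_min=0.2, moa_max=1.0, seed=0):
    """Derivative-free, population-based minimization (AOA-style sketch)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    eps = np.finfo(float).eps

    # Random initial population of candidate solutions.
    pop = rng.uniform(lower, upper, size=(n_agents, lower.size))
    fit = np.array([objective(x) for x in pop])
    best = int(fit.argmin())
    best_x, best_f = pop[best].copy(), float(fit[best])
    history = [best_f]
    scale = (upper - lower) * mu + lower

    for t in range(1, n_iter + 1):
        # Accelerator: grows over time, shifting from exploration to exploitation.
        moa = moa_min + t * (moa_max - moa_min) / n_iter
        # Probability coefficient: shrinks the step size over time.
        mop = 1.0 - (t / n_iter) ** (1.0 / alpha)
        for i in range(n_agents):
            r1, r2, r3 = rng.random((3, lower.size))
            explore = np.where(r2 > 0.5,
                               best_x / (mop + eps) * scale,   # division
                               best_x * mop * scale)           # multiplication
            exploit = np.where(r3 > 0.5,
                               best_x - mop * scale,           # subtraction
                               best_x + mop * scale)           # addition
            cand = np.clip(np.where(r1 > moa, explore, exploit), lower, upper)
            f = float(objective(cand))
            if f < fit[i]:  # keep the candidate only if it improves the agent
                pop[i], fit[i] = cand, f
                if f < best_f:
                    best_x, best_f = cand.copy(), f
        history.append(best_f)
    return best_x, best_f, history
```

To tune ANFIS with such an optimizer, `objective` would map a flattened parameter vector to the training error of the fuzzy model; that coupling is an assumption about how the hybrid is assembled, not a detail given in the text.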
Performance evaluation criteria
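The three criteria reported in this study (NSE, RMSE, and MAPE) can be computed as in the following Python sketch. It assumes the standard textbook definitions, since the exact formulas used by the authors are not reproduced in this text; in particular, MAPE is returned here as a fraction, and the scaling used in the study may differ.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / variance)


def rmse(observed, simulated):
    """Root Mean Squared Error, in the units of the observations."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((observed - simulated) ** 2)))


def mape(observed, simulated):
    """Mean Absolute Percentage Error as a fraction; needs nonzero observations."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.mean(np.abs((observed - simulated) / observed)))
```

Higher NSE and lower RMSE and MAPE indicate better agreement between observed and simulated flows.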


RESULTS AND DISCUSSION
ANFIS includes several hyperparameters whose values should be optimized for better prediction tasks. For example, the number of fuzzy rules or the type of fuzzy membership function can produce various values for the model output. Model training established the proper architecture of the ANFIS hybrid model considering the optimized parameter values (Table 3). According to the table, the Gaussian membership function was identified as the most suitable fuzzy membership function for the ANFIS model. The first-order linear output function was identified as the best type because the Sugeno fuzzy type is used in ANFIS and produces a function as a result. The HHO algorithm specifies the proper values of the parameters for the given problem by running the model with various values until it reaches the appropriate values for each parameter.
Table 3. Optimized parameters of ANFIS and HHO–ANFIS models
Model | Parameter | Value
---|---|---
ANFIS | Fuzzy structure | Sugeno-type
ANFIS | Initial FIS for training | Genfis3
ANFIS | The type of membership functions | Gaussian
ANFIS | The membership function of the output | Linear
ANFIS | Optimization method | Hybrid
ANFIS | Maximum number of fuzzy rules | 10
ANFIS | The maximum number of epochs | 2,000
HHO | Number of search agents | 40
HHO | Iteration number | 2,000
HHO | β | 1.5
HHO | Range partitions (weights and biases) | [3, −3]
HHO | Population size | 30
Table 4 presents the results of using each scenario. In comparison with the hybrid models, the ANFIS model performed poorly in forecasting river flow. For the test data, its RMSE values for all scenarios except scenario 3 are 2 or greater, while these values are less than 2 for the other two models. The ANFIS model has a relatively large error on the test data, despite predicting the training data with relatively high performance. The ANFIS model produced its best test results with scenario 3, where the RMSE, MAPE, and NSE for the test data are 1.8, 1.07, and 0.89, respectively. Its best performance on the training data was achieved with scenario 5 (RMSE = 1.2, MAPE = 0.59, and NSE = 0.95). However, for the test data of the same scenario, the RMSE and MAPE increase to 2.25 and 1.04, and the NSE drops to 0.79.
Table 4. Results of the error evaluation criteria
Model | Scenario | RMSE (m3/day), training | RMSE (m3/day), test | MAPE (m3/day), training | MAPE (m3/day), test | NSE, training | NSE, test
---|---|---|---|---|---|---|---
ANFIS | 1 | 1.60 | 2.30 | 0.8 | 1.11 | 0.90 | 0.80
ANFIS | 2 | 1.40 | 2.00 | 0.71 | 1.06 | 0.93 | 0.86
ANFIS | 3 | 1.58 | 1.80 | 0.88 | 1.07 | 0.91 | 0.89
ANFIS | 4 | 1.30 | 2.12 | 0.75 | 1.26 | 0.93 | 0.83
ANFIS | 5 | 1.20 | 2.25 | 0.59 | 1.04 | 0.95 | 0.79
ANFIS–HHO | 1 | 1.86 | 1.74 | 0.94 | 0.95 | 0.87 | 0.90
ANFIS–HHO | 2 | 1.49 | 1.84 | 0.8 | 0.82 | 0.92 | 0.88
ANFIS–HHO | 3 | 1.62 | 1.60 | 0.88 | 0.81 | 0.9 | 0.91
ANFIS–HHO | 4 | 1.68 | 1.40 | 0.88 | 0.74 | 0.90 | 0.92
ANFIS–HHO | 5 | 1.57 | 1.60 | 0.81 | 0.87 | 0.91 | 0.90
ANFIS–AOA | 1 | 1.86 | 1.9 | 0.91 | 0.96 | 0.88 | 0.87
ANFIS–AOA | 2 | 1.5 | 1.8 | 0.74 | 0.89 | 0.91 | 0.88
ANFIS–AOA | 3 | 1.67 | 1.34 | 0.86 | 0.69 | 0.90 | 0.93
ANFIS–AOA | 4 | 1.70 | 1.37 | 0.92 | 0.76 | 0.89 | 0.93
ANFIS–AOA | 5 | 1.60 | 1.59 | 0.85 | 0.86 | 0.91 | 0.91
Note: The best-identified scenario for each model is scenario 3 for ANFIS, scenario 4 for ANFIS–HHO, and scenario 3 for ANFIS–AOA.
These findings suggest that the ANFIS network gets trapped in local minima while training. Using the ANFIS–HHO hybrid model based on scenario 4, the highest prediction accuracy was achieved for the test data, where the RMSE, MAPE, and NSE values were equal to 1.4, 0.74, and 0.92, respectively. These values were 1.68, 0.88, and 0.90 for the training data, respectively. Using the third scenario for test data (RMSE = 1.34, MAPE = 0.69, NSE = 0.93) and for training data (RMSE = 1.67, MAPE = 0.86, NSE = 0.9), the ANFIS–AOA model was able to produce the most promising results.
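The comparison above can be reproduced from the test-set RMSE values of Table 4 with a few lines of Python. The sketch below selects the scenario with the lowest test RMSE for each model and expresses the improvement of the hybrids as a relative reduction; the dictionary layout and function names are illustrative choices, not part of the study.

```python
# Test-set RMSE values copied from Table 4 (model -> scenario -> RMSE).
TEST_RMSE = {
    "ANFIS": {1: 2.30, 2: 2.00, 3: 1.80, 4: 2.12, 5: 2.25},
    "ANFIS-HHO": {1: 1.74, 2: 1.84, 3: 1.60, 4: 1.40, 5: 1.60},
    "ANFIS-AOA": {1: 1.90, 2: 1.80, 3: 1.34, 4: 1.37, 5: 1.59},
}


def best_scenario(table):
    """Return the scenario with the lowest test RMSE for each model."""
    return {model: min(scores, key=scores.get) for model, scores in table.items()}


def relative_reduction(baseline, improved):
    """Fractional reduction of an error value relative to a baseline."""
    return (baseline - improved) / baseline


best = best_scenario(TEST_RMSE)
# best == {"ANFIS": 3, "ANFIS-HHO": 4, "ANFIS-AOA": 3}
hho_gain = relative_reduction(TEST_RMSE["ANFIS"][3], TEST_RMSE["ANFIS-HHO"][4])
aoa_gain = relative_reduction(TEST_RMSE["ANFIS"][3], TEST_RMSE["ANFIS-AOA"][3])
```

Relative to the best ANFIS scenario, this gives a test RMSE reduction of roughly 22% for ANFIS–HHO and roughly 26% for ANFIS–AOA.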
Time-series graph of observed and simulated data: (a) ANFIS, (b) ANFIS–HHO, and (c) ANFIS–AOA.
Scatter plot of observed and simulated data: (a) ANFIS, (b) ANFIS–HHO, and (c) ANFIS–AOA.
DISCUSSION
According to the results, the hybrid models introduced in this study performed well with precipitation, evaporation, and the flow values one to three days before the prediction time as input variables. The results show that machine learning models are suitable for hydrological problems, which is in line with the results of Arya Azar et al. (2023) on predicting evaporation and inflow to dams, Tayfur et al. (2018) on predicting flood flows, and Adnan Ikram et al. (2022) on estimating runoff. The improvement of the ANFIS model by hybrid approaches was also consistent with the results of Samantaray et al. (2023) on using the ANFIS–PSO and ANFIS–SMA models in flood estimation.
A closer look at some of this research, for example, Arya Azar et al. (2023), shows that the HHO was used successfully to improve the ANFIS in predicting the longitudinal dispersion coefficient of rivers. In the research of Adnan Ikram et al. (2021), hybrid models performed well in improving the prediction accuracy of machine learning models, which is consistent with the results of this research. Also, Tayfur et al. (2018) used various evolutionary algorithms such as genetic algorithm, PSO, and ACO to improve the accuracy of ANN in flood forecasting. Similar to the results of the current research, they reported that evolutionary algorithms improved prediction accuracy. Therefore, it can be concluded that basic machine learning approaches, such as neuron-based and kernel-based algorithms, can become trapped in local optima, while evolutionary algorithms are less prone to this weakness. ANFIS might be very accurate in predicting the training data (Tayfur et al. 2018), but the test data might not be predicted with such accuracy, while the combined ANFIS–HHO and ANFIS–AOA models had similar accuracy on both parts of the data, which is one of the advantages of these models. For this reason, novel optimization algorithms are recommended for improving predictive accuracy in time-series and regression problems, and specifically in flood forecasting.
Among the input scenarios developed, the hydraulic parameters that have significant effects on the flood level have not been considered. Despite the favorable results, the introduced models have a major limitation due to not considering the physical properties of the system and its basic information such as vegetation, the structure of the river channel, and the slope of the area, since they can have more significant effects on the occurrence of floods than variables such as evaporation. Therefore, it is necessary to consider these factors in future research. Also, due to the data-driven nature of these models, caution should always be taken in using their outputs as their values sometimes might not be hydrologically logical. This approach is suitable and recommended for regions where it is not possible to collect sufficient information on hydrological parameters and we have to estimate the amount of flood flow with the minimum possible information.
Access to a large number of machine learning models is a great advantage for researchers, who can study various models to find the most suitable one. Although the results of this research were promising, confirming them and recommending the new hybrid models for other similar research requires applying these models in other basins, which is suggested for future research. In this research, a limited set of variables was used, with the aim of achieving a suitable flood estimation with the fewest possible input variables. In future research, more input variables, such as temperature and precipitation with various delays, can be considered, which may improve the performance of the models. Also, the daily data used in this study covered only 2 years (2016 and 2017). Increasing the length of the statistical period and using more data should lead to better training of the models and, as a result, higher accuracy.
CONCLUSION
Accurate river flow prediction is crucial for managing water resources effectively. To forecast the amount of flow, a number of models, including conceptual, physical, data-oriented, and experimental models, have been put forth. Considering the complexity of physical, experimental, and conceptual models, data-oriented methods have become increasingly popular among researchers in recent years. The effectiveness of the ANFIS, ANFIS–AOA, and ANFIS–HHO models was assessed in this study for flood prediction. The Shahrchay River in Urmia was chosen as the study area for this purpose. Five scenarios were defined to introduce the inputs to the models. The findings can be summarized as follows:
The flow was most strongly correlated with its own values in the previous one to three days. The highest correlation coefficient, 0.94, was obtained for the flow with a 1-day delay.
The ANFIS–HHO and ANFIS–AOA hybrid models addressed the limitation of the ANFIS model in producing irrational values, such as negative values, in certain steps.
The ANFIS and ANFIS–AOA models produced their best test results with the third scenario, which includes the flow in the previous one to three days, while the ANFIS–HHO model produced its best results with the fourth scenario, which adds evaporation as an input.
The performance of the ANFIS model was improved remarkably by using the HHO and AOA.
The Taylor diagrams showed how accurately the models predicted floods. Comparing the models on this diagram showed that the ANFIS–AOA hybrid model performed best in the prediction tasks.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.