Machine learning methods are efficient for predicting floods in areas where complete data are not available. Therefore, this study combines the Adaptive Neuro-Fuzzy Inference System (ANFIS) model with two evolutionary algorithms, namely Harris Hawks Optimization (HHO) and the Arithmetic Optimization Algorithm (AOA), to predict the flood of the Shahrchay River in the northwest of Iran. The data comprised daily precipitation, evaporation, and runoff for the years 2016 and 2017; 70% of the data were used for model training and the rest for testing the models. The results showed that although the ANFIS model produced high errors in several time steps, especially those with maximum or minimum values, the HHO and AOA optimization algorithms significantly reduced the error values. The ANFIS-AOA model, using an input scenario comprising the flow in the previous one to three days, produced the most promising results on the test data, with Nash–Sutcliffe Efficiency (NSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) of 0.93, 1.34, and 0.69, respectively. According to Taylor's diagram, the ANFIS-AOA hybrid model predicts flood values more accurately than the other models.

  • Integration of ANFIS methods with optimization algorithms was done.

  • Effective flood estimation with limited data was done.

  • Quantitative assessment metrics and model validation were performed.

  • Optimal input pattern selection was done.

  • Practical implications for water management were studied.

Floods are one of the most harmful and destructive natural disasters, causing significant damage to infrastructure, agriculture, human life, and the socioeconomic system. Heavy and excess rainfall and inadequate drainage systems are typically associated with flooding (Sankaranarayanan et al. 2020). Because of both these problems, governments are under pressure to create precise and trustworthy maps of flood risk areas and to make more plans for flood risk management that emphasize readiness, protection, and prevention (Danso-Amoako et al. 2012; Oborie & Rowland 2023). For this reason, flood forecasting models are a crucial tool for managing extreme events and assessing risk. Reliable and precise forecasting supports policy recommendations and analyses, strategies for managing water resources, and additional discharge modeling (Xie et al. 2017). Thus, in order to minimize damage, it is crucial to provide forecasting systems for both short- and long-term floods and other hydrological events (Pitt 2008).

By forecasting the extent and volume of a flood, it is possible to evaluate its risk, and appropriate measures can be taken to reduce financial and human losses. However, since weather patterns are dynamic, predicting the exact time and location of floods is typically challenging. Major flood forecasting models of today include a variety of simplifying assumptions and are primarily data-specific (Lohani et al. 2014; Farahmand et al. 2023).

Various experimental black-box, event-oriented, deterministic, continuous, and hybrid techniques are employed to simulate the behavior of watersheds (Nayak et al. 2005; Kayhomayoon et al. 2022). As a result, there is very little short-term forecasting. The application of physical models for flood forecasting presents severe limitations in addition to the requirement for in-depth knowledge and proficiency regarding hydrological parameters (van den Honert & McAneney 2011; Costabile & Macchione 2015). According to recent research, physical models are not efficient in predicting hydrological phenomena compared to hydrological models and machine learning methods (Devia et al. 2015; Kayhomayoon et al. 2023). It is crucial to employ numerical methods to forecast hydrological events (Dazzi et al. 2021). These models require a large dataset, including topography, land use, and precipitation intensity and duration, among other things, in order to correctly and accurately predict phenomena like runoff and precipitation. However, in some places, this information might not be easily accessible or available (Dazzi et al. 2021).

The third type of model, known as the data-driven models, are devoid of the challenges and issues associated with the physical and conceptual/numerical models. These models only use observational data from the past to make predictions. These machine learning-based models ignore the actual physical environment in which the desired phenomenon occurs (Hashemi et al. 2014). Owing to the algorithm built into their design, they are able to quickly and efficiently make predictions by recognizing the relationships between input and output data (Dazzi et al. 2021; Sadio & Faye 2023). Machine learning models are commonly employed in studies related to hydrology and hydrogeology. Artificial Neural Network (ANN), Support Vector Machine (SVM), decision trees, Fuzzy Inference System (FIS), gene expression programming, Adaptive Neuro-Fuzzy Inference System (ANFIS), and hybrid models that benefit from an optimization scheme, like Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Harris Hawks Optimization (HHO), Arithmetic Optimization Algorithm (AOA), and Slime Mold Algorithm (SMA) are a few of the many approaches from which these models are derived (Javidan et al. 2022).

The ANFIS model is one of the first machine learning models, which is a combination of ANN and FIS. This model has attracted the attention of researchers since it can be widely used to consider uncertainty. Recently, evolutionary algorithms have been used to improve the prediction accuracy of these models in prediction problems. The HHO and AOA methods are among the most recent evolutionary algorithms used for optimization problems. Developed by Heidari et al. (2019), the HHO algorithm works based on mimicking the hunting of prey by Harris hawks, while the AOA works on mathematical operators and is introduced by Abualigah et al. (2021). The use of evolutionary algorithms to improve prediction accuracy has been recommended by various researchers since the last decade, which seems necessary considering the introduction of new algorithms in prediction problems. Flood studies and other hydrological phenomena have benefited from the successful application of these models (Rezaeianzadeh et al. 2014; Adib & Mahmoodi 2017; Mosavi et al. 2018; Sharifi Garmdareh et al. 2018; Wu et al. 2019).

Examples include the use of ANFIS and other machine learning models in flood forecasting by Rezaeianzadeh et al. (2014), the use of genetic algorithms to improve ANN-based flood forecasting by Adib & Mahmoodi (2017), and the use of machine learning models such as SVM, ANN, and ANFIS to predict flood peak discharge by Sharifi Garmdareh et al. (2018). Although these and similar studies show that various machine learning models have been successfully used in flood forecasting, the use of new evolutionary algorithms to improve the accuracy of these models has been limited. In addition, climatic factors influence flood magnitude, and considering them can improve flood forecasting (Rahmani-Rezaeieh et al. 2020).

Although the ANFIS model has many benefits (Firat & Güngör 2007; Esmaili et al. 2021), it might perform poorly in some problems because it produces irrational output values at some points or becomes trapped in local minima (Qasem et al. 2017; Seifi & Riahi-Madvar 2019; Kayhomayoon et al. 2023; Milan et al. 2021). One of the best approaches to increase the prediction performance of single models in such scenarios is to employ evolutionary algorithms. These algorithms can be used to boost prediction accuracy and enhance the performance of individual models by optimizing the values of their hyperparameters (Dehghani et al. 2019; Arya Azar et al. 2021a). One of the most recently introduced optimization techniques is the AOA (Abualigah et al. 2021). This algorithm is efficient in terms of exploration–exploitation balance and convergence speed, and it is built on the basic arithmetic operators (Abualigah et al. 2021). Numerous studies have employed this algorithm to produce acceptable output data (Zhang et al. 2021a, 2021b; Patil et al. 2022; Kayhomayoon et al. 2023). Thus, in this study, for the first time, the potential of two evolutionary algorithms, namely HHO and AOA, is investigated to boost the prediction performance of ANFIS and present a reliable flood prediction model. The development of these hybrid models for flood forecasting is therefore the main novelty of this research compared to previous work. The results can answer the question of how useful hybrid methods are for improving the performance of base models in flood prediction.

Floods in Iran are primarily caused by either rain or snowmelt. The most frequent cause of flooding in Iran's temperate and cold regions – the north, northwest, and a sizable portion of the country's west – is the result of a combination of rain and snowmelt. A thorough analysis employing historical data and previous occurrences is required to prevent the negative effects of flooding, even though the intensity of flooding depends on a number of factors, including topography, climatic conditions, and vegetation. The Shahrchay River in northwest Iran was chosen as the study area for this research to predict flood discharge using machine learning models.

This area is one of the most important areas of the Lake Urmia's basin, which has been involved in various floods in the last few decades. In addition, several studies on using machine learning models and optimal evolutionary algorithms to achieve the proper accuracy of flood forecasting have been carried out in this field so far.

Various variables were taken into consideration to create several input scenarios for the models to predict floods. Initially, the correlation between the model output (flood amount) and the influencing variables was measured. Subsequently, the input scenarios were defined and the models were assembled using the input scenarios. Ultimately, the best model and scenario for flood forecasting was identified after examining the performance of each model using statistical and graphical evaluation criteria.

Study area and data

Shahrchay Basin (44°17′–44°35′ E, 37°19′–37°35′ N) is one of Lake Urmia's western basins. The most significant river in the basin, the Shahrchay River, rises in the Iran–Turkey border region. This river enters Lake Urmia on the lower side of Keshtiban after flowing through the south side of Urmia City. The Shahrchay River is 60 km long, with a catchment area of 960 km². After drinking water branches off, the river's average yearly volume is 168 million cubic meters (MCM). Water from the Shahrchay River is used by ca. 900 villages along its path. The volume of runoff from this river is 260 MCM per year. The study area's location is depicted in Figure 1.
Figure 1

The location of the study area.


The factors influencing the flow rate should be identified in order to estimate the flood. For the 2 years 2016 and 2017, the daily values of evaporation (E), precipitation (P), and flow (Q) with delays of 1, 2, and 3 days were considered as input variables. Table 1 displays the statistical properties of the data. For modeling, 730 data samples were used. The mean flow rate was 3.81 m3/day.

Table 1

Statistical parameters of the daily data used in this study

| Statistical parameter | Flood flow (m³/day) | Precipitation (mm) | Evaporation (mm) |
|---|---|---|---|
| Number of data | 730 | 730 | 730 |
| Max | 17.5 | 52.8 | 3.01 |
| Mean | 3.81 | 1.16 | 3.01 |
| Min | 0.004 | 0.00 | 0.00 |
| Skew | 1.12 | 7.00 | 0.65 |
| SD | 4.96 | 2.56 | 2.76 |
| Mode | 0.103 | 0.00 | 0.00 |

Various input scenarios were created by combining the aforementioned variables, and machine learning models were used to predict the river flow rate. In this study, 70% of the data was used for model training while the rest was used to test the models. The research methodology is shown in Figure 2. The Shahrchay River's maximum discharge at the Band station between 1986 and 2017 is depicted in Figure 3. The figure indicates that the maximum discharge in the majority of the studied years falls between 0 and 50 m3/s. The flow reached its maximum values of 290 and 340 m3/day in 1991 and 1992, respectively.
Figure 2

The proposed research methodology.

Figure 3

The maximum discharge of the Shahrchay River at the Urmia Band station.


Formulating efficient input scenarios in which various input variables participate is required to achieve the highest prediction performance. The input scenarios were defined in this work according to the highest correlation coefficient. Table 2 displays the correlation coefficient between the model output (flood amount) and the input variables.

Table 2

Input scenarios for predictive models

| Scenario | Q (t–1) | Q (t–2) | Q (t–3) | E | P |
|---|---|---|---|---|---|
| Correlation coefficient | 0.94 | 0.83 | 0.71 | 0.27 | −0.026 |
| S1 | ✓ | | | | |
| S2 | ✓ | ✓ | | | |
| S3 | ✓ | ✓ | ✓ | | |
| S4 | ✓ | ✓ | ✓ | ✓ | |
| S5 | ✓ | ✓ | ✓ | ✓ | ✓ |

The table shows that the discharge 1 day and 2 days prior to the model output had the highest correlation with it, with correlation coefficients of 0.94 and 0.83, respectively. Table 2 also displays the five input scenarios, whose variables were selected based on the correlation coefficients. The first scenario includes only the flow with a 1-day delay. The second scenario includes the flow with delays of 1 and 2 days. Scenarios 3–5 were defined based on a similar approach.
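The scenario construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB workflow: the function names, the synthetic series, the choice of taking E and P on the prediction day, and the chronological (unshuffled) 70/30 split are all assumptions, and the content of S5 (adding P) is inferred from the order of the variables in Table 2.

```python
import numpy as np

# Input scenarios: S1-S4 follow the text; S5 (adding P) is an inferred assumption.
SCENARIOS = {
    "S1": ["Q(t-1)"],
    "S2": ["Q(t-1)", "Q(t-2)"],
    "S3": ["Q(t-1)", "Q(t-2)", "Q(t-3)"],
    "S4": ["Q(t-1)", "Q(t-2)", "Q(t-3)", "E"],
    "S5": ["Q(t-1)", "Q(t-2)", "Q(t-3)", "E", "P"],
}


def build_columns(q, e, p, max_lag=3):
    """Align lagged flow, evaporation, and precipitation with the target Q(t)."""
    q, e, p = (np.asarray(a, dtype=float) for a in (q, e, p))
    n = len(q)
    target = q[max_lag:]  # Q(t), starting at the first day with a full lag history
    cols = {f"Q(t-{k})": q[max_lag - k:n - k] for k in range(1, max_lag + 1)}
    cols["E"] = e[max_lag:]  # assumption: same-day evaporation
    cols["P"] = p[max_lag:]  # assumption: same-day precipitation
    return cols, target


def correlations(cols, target):
    """Pearson correlation between each candidate input and the target."""
    return {name: float(np.corrcoef(v, target)[0, 1]) for name, v in cols.items()}


def scenario_split(cols, target, scenario, train_fraction=0.7):
    """Stack the inputs of one scenario and split them chronologically."""
    X = np.column_stack([cols[name] for name in SCENARIOS[scenario]])
    n_train = int(round(train_fraction * len(target)))
    return X[:n_train], X[n_train:], target[:n_train], target[n_train:]


# Synthetic stand-in for two years of daily data (not the study's data).
rng = np.random.default_rng(0)
q = np.abs(np.cumsum(rng.normal(size=730)))
e = rng.random(730) * 10.0
p = rng.random(730) * 5.0

cols, target = build_columns(q, e, p)
print(correlations(cols, target))
X_train, X_test, y_train, y_test = scenario_split(cols, target, "S3")
print(X_train.shape, X_test.shape)
```

Because three lags are used, the first three days have no complete input vector, so 730 daily samples yield 727 usable rows in this sketch.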

Machine learning models

In this study, ANFIS as well as its hybrid models with HHO and AOA algorithms have been used for flood forecasting. Implementing these modeling approaches was done in the MATLAB R2021 programming environment.

Adaptive Neuro-Fuzzy Inference System

ANFIS was introduced to combine the strengths of ANN and FIS and thereby increase simulation performance. It is composed of layers, much like a neural network; however, it also makes use of rule formulation, much like a fuzzy logic system (Kayhomayoon et al. 2023).

Similar to the FIS, the output of ANFIS can be specified using two types of fuzzy inference: Mamdani and Sugeno. The Sugeno type is used in ANFIS because its output is a linear function of the inputs (Jang 1993). For two input variables x1 and x2 and output y, the Sugeno-type structure with fuzzy If–Then rules can be expressed as Equations (1) and (2)
$$\text{Rule 1: if } x_1 \text{ is } A_1 \text{ and } x_2 \text{ is } B_1, \text{ then } y_1 = p_1 x_1 + q_1 x_2 + r_1 \tag{1}$$
$$\text{Rule 2: if } x_1 \text{ is } A_2 \text{ and } x_2 \text{ is } B_2, \text{ then } y_2 = p_2 x_1 + q_2 x_2 + r_2 \tag{2}$$
where p, q, and r are the consequent parameters, which are estimated during the training phase, and A and B are the fuzzy sets. The ANFIS structure includes five layers. In the first layer, the inputs pass through the membership functions, which determine the degree of membership of each input in the various fuzzy intervals.

Fuzzy membership functions, such as bell, trapezoidal, Gaussian, and triangular functions, are employed in numerous studies (Zhu et al. 2010; Milan et al. 2018). The weight of each rule is determined in the second layer, or rule node, by multiplying together the membership values arriving at that node. The ‘AND’ operator is used in this research, although either the AND or the OR operator can be used in this layer. The weights of the rules are normalized by the nodes in the third layer. The rule outputs are obtained in the result nodes, which are referred to as the rules layer. In the de-fuzzification process, this layer converts each fuzzy rule's output into a non-fuzzy one.
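The layer-by-layer computation described above can be illustrated with a short forward pass. This is a sketch under stated assumptions rather than the authors' MATLAB model: it assumes one Gaussian membership function per input and rule, the product form of the AND operator, and first-order linear consequents; the function names and the toy parameter values are hypothetical.

```python
import numpy as np


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x for a fuzzy set with the given centre and width."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)


def anfis_forward(x, centres, sigmas, consequents):
    """Forward pass of a first-order Sugeno ANFIS.

    x:           (n_inputs,) input vector
    centres:     (n_rules, n_inputs) membership-function centres
    sigmas:      (n_rules, n_inputs) membership-function widths
    consequents: (n_rules, n_inputs + 1) linear coefficients and a constant per rule
    """
    x = np.asarray(x, dtype=float)
    memberships = gaussian_mf(x, centres, sigmas)                # layer 1: fuzzification
    firing = memberships.prod(axis=1)                            # layer 2: AND as product
    weights = firing / firing.sum()                              # layer 3: normalization
    rule_outputs = consequents[:, :-1] @ x + consequents[:, -1]  # layer 4: linear rule outputs
    return float(weights @ rule_outputs)                         # layer 5: weighted sum


# Toy example with two inputs and two rules (illustrative values only).
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.ones((2, 2))
consequents = np.array([[1.0, 1.0, 0.0],   # y1 = x1 + x2
                        [2.0, 0.0, 1.0]])  # y2 = 2*x1 + 1
print(anfis_forward([0.5, 0.5], centres, sigmas, consequents))
```

An input far from every membership-function centre can drive all firing strengths toward zero, so a practical implementation would guard the normalization step against division by zero.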

Harris Hawks Optimization

HHO is inspired by the way Harris hawks hunt their prey (Heidari et al. 2019). It includes two besiege stages, soft and hard. In the soft besiege, the prey still has enough energy and tries to escape with random, misleading jumps, and the hawks gently surround it to tire it further. In the hard besiege, the prey is exhausted and has little escape energy, so the hawks encircle it tightly before delivering the surprise pounce. In this algorithm, the Harris hawks move randomly to find prey, and their position is determined by Equation (3)
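$$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right| & q \ge 0.5 \\ \left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( LB + r_4 (UB - LB) \right) & q < 0.5 \end{cases} \tag{3}$$
where X(t) and X(t + 1) are the positions of hawks at iterations t and t + 1, respectively, Xrabbit(t) is the prey's position, r1, r2, r3, r4, and q are random numbers that are updated in each iteration, UB and LB are the upper and lower limits of the variables, Xrand(t) is the position of an arbitrary hawk, and Xm is the average position of the population, which is obtained using Equation (4)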
$$X_m(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t) \tag{4}$$
where N is the total number of hawks, and Xi(t) is the position of each hawk in iteration t. The prey's energy during the escape is defined by Equation (5)
$$E = 2 E_0 \left( 1 - \frac{t}{T} \right) \tag{5}$$
where T is the maximum iteration number, E is the prey's energy, and E0 is the initial energy. The E parameter is utilized to enable the algorithm to use soft and hard besiege processes to trap the prey. Soft and hard besieging occur when |E| ≥ 0.5 and |E| < 0.5, respectively (Figure 4).
Figure 4

Flowchart of the HHO algorithm.

When |E| ≥ 0.5, the hawks encircle the prey softly to make the prey more exhausted and then perform the surprise pounce (Equations (6) and (7))
$$X(t+1) = \Delta X(t) - E \left| J X_{rabbit}(t) - X(t) \right| \tag{6}$$
$$\Delta X(t) = X_{rabbit}(t) - X(t) \tag{7}$$
where ΔX(t) is the difference between the position of the prey and the current position of hawks in iteration t, and J represents the strength of the prey's random jumps. In the hard besiege, the hawks tightly encircle the prey before performing the surprise pounce. Equation (8) shows how the current positions are updated in the hard besiege.
$$X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right| \tag{8}$$
To perform a soft besiege, the hawks are assumed to evaluate (decide) their next move based on the following rule.
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X(t) \right| \tag{9}$$
The hawks are then assumed to dive following Lévy flight (LF)-based patterns, using the following rule.
$$Z = Y + S \times LF(D) \tag{10}$$
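where D is the dimension of the problem, S is a random vector, and LF is calculated using Equation (11)
$$LF(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \qquad \sigma = \left( \frac{\Gamma(1+\beta) \times \sin\left( \frac{\pi \beta}{2} \right)}{\Gamma\left( \frac{1+\beta}{2} \right) \times \beta \times 2^{\left( \frac{\beta - 1}{2} \right)}} \right)^{1/\beta} \tag{11}$$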
where u and v are random values between 0 and 1, and β is a default constant set to 1.5. Hence, Equation (12) shows the final strategy for updating the positions of hawks in the soft besiege phase.
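$$X(t+1) = \begin{cases} Y & \text{if } F(Y) < F(X(t)) \\ Z & \text{if } F(Z) < F(X(t)) \end{cases} \tag{12}$$
where F denotes the fitness (objective) function.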
If r < 0.5 and |E| < 0.5, the prey does not have enough energy to escape, and a hard besiege is constructed before the surprise pounce to catch the prey (Figure 4). Therefore, the following rule is applied in the hard besiege condition
$$X(t+1) = \begin{cases} Y & \text{if } F(Y) < F(X(t)) \\ Z & \text{if } F(Z) < F(X(t)) \end{cases} \tag{13}$$
where Y and Z are obtained using Equations (14) and (15)
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X_m(t) \right| \tag{14}$$
$$Z = Y + S \times LF(D) \tag{15}$$
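The phase logic of HHO described in this section can be sketched in a few helper functions. The sketch is illustrative only and assumes the standard formulation of Heidari et al. (2019): the escaping energy E = 2·E0·(1 − t/T), the plain soft and hard besiege updates, and a Lévy flight step with β = 1.5. The function names are hypothetical, and two details are assumptions taken from that formulation rather than from the text above: exploration is selected when |E| ≥ 1, and r ≥ 0.5 selects the plain besiege while r < 0.5 selects the variant with rapid dives.

```python
import math

import numpy as np


def escaping_energy(t, max_iter, e0):
    """Prey energy at iteration t: E = 2 * E0 * (1 - t / T)."""
    return 2.0 * e0 * (1.0 - t / max_iter)


def select_phase(energy, r):
    """Choose the HHO phase from |E| and the random escape chance r."""
    magnitude = abs(energy)
    if magnitude >= 1.0:  # assumption from the standard HHO formulation
        return "exploration"
    kind = "soft" if magnitude >= 0.5 else "hard"
    return f"{kind} besiege" if r >= 0.5 else f"{kind} besiege with rapid dives"


def soft_besiege(x, x_rabbit, energy, jump):
    """X(t+1) = dX - E * |J * X_rabbit - X|, with dX = X_rabbit - X."""
    return (x_rabbit - x) - energy * np.abs(jump * x_rabbit - x)


def hard_besiege(x, x_rabbit, energy):
    """X(t+1) = X_rabbit - E * |X_rabbit - X|."""
    return x_rabbit - energy * np.abs(x_rabbit - x)


def levy_flight(dim, rng, beta=1.5):
    """Levy flight step: 0.01 * u * sigma / |v|**(1 / beta)."""
    sigma = (
        math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
        / (math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0))
    ) ** (1.0 / beta)
    u = rng.random(dim)        # uniform in [0, 1), as stated in the text
    v = 1.0 - rng.random(dim)  # uniform in (0, 1], avoids division by zero
    return 0.01 * u * sigma / np.abs(v) ** (1.0 / beta)


rng = np.random.default_rng(1)
energy = escaping_energy(t=600, max_iter=2000, e0=0.5)
print(energy, select_phase(energy, r=rng.random()))
print(levy_flight(3, rng))
```

The energy magnitude shrinks linearly with the iteration count, so early iterations favor exploration and later iterations favor the besiege phases.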

Arithmetic Optimization Algorithm

Introduced by Abualigah et al. (2021), the AOA is a new meta-heuristic optimization algorithm that makes use of the distribution behavior of the primary arithmetic operators in mathematics. The AOA is designed to carry out optimization procedures in a variety of search spaces through mathematical modeling and implementation. Without computing the derivatives of optimization problems, this population-based algorithm can resolve them. This optimization process begins with the generation of a population or random set of potential solutions. The main four arithmetic operators estimate the feasible positions of the near-optimal solution during each iteration.

Every solution modifies its position based on the optimal solution found. By increasing linearly from 0.2 to 0.9, the parameter r strikes a balance between exploration and exploitation. When r is small, candidate solutions aim to deviate from the near-optimal solution; when r is large, they converge toward the near-optimal solution. When the end criterion is satisfied, the AOA comes to an end (Figure 5). Three main factors determine the computational complexity of the proposed AOA: updating of solutions, fitness function evaluation, and initialization processes.
Figure 5

Flowchart of the AOA (Abualigah et al. 2021).
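The AOA loop described above can be sketched as a compact minimizer. This is an illustrative sketch, not the configuration used in the study: it assumes the update rules of the original AOA formulation (division and multiplication for exploration, subtraction and addition for exploitation, with greedy replacement of each solution), the control parameters α = 5 and μ = 0.499, and an accelerator that rises linearly from 0.2 to 0.9 as stated in the text. The function name, its arguments, and the test objective are hypothetical.

```python
import numpy as np


def aoa_minimize(objective, lower, upper, n_agents=20, n_iter=200,
                 moa_min=0.2, moa_max=0.9, alpha=5.0, mu=0.499, seed=0):
    """Minimal Arithmetic Optimization Algorithm for box-constrained minimization."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    eps = np.finfo(float).eps

    # Random initial population and its fitness.
    pop = lower + rng.random((n_agents, dim)) * (upper - lower)
    fitness = np.array([objective(x) for x in pop])
    i_best = int(fitness.argmin())
    best, best_fit = pop[i_best].copy(), float(fitness[i_best])
    history = [best_fit]
    scale = (upper - lower) * mu + lower  # search-range term shared by all operators

    for t in range(1, n_iter + 1):
        moa = moa_min + t * (moa_max - moa_min) / n_iter  # linear accelerator
        mop = 1.0 - (t / n_iter) ** (1.0 / alpha)         # math optimizer probability
        r1, r2, r3 = rng.random((3, n_agents, dim))

        explore = r1 > moa  # a small accelerator favors exploration
        by_division = best / (mop + eps) * scale
        by_multiplication = best * mop * scale
        by_subtraction = best - mop * scale
        by_addition = best + mop * scale
        candidate = np.where(
            explore,
            np.where(r2 > 0.5, by_division, by_multiplication),
            np.where(r3 > 0.5, by_subtraction, by_addition),
        )
        candidate = np.clip(candidate, lower, upper)

        # Greedy replacement: keep a candidate only if it improves its own solution.
        cand_fit = np.array([objective(x) for x in candidate])
        improved = cand_fit < fitness
        pop[improved] = candidate[improved]
        fitness[improved] = cand_fit[improved]

        i_best = int(fitness.argmin())
        if fitness[i_best] < best_fit:
            best, best_fit = pop[i_best].copy(), float(fitness[i_best])
        history.append(best_fit)

    return best, best_fit, history


def shifted_sphere(x):
    """Hypothetical test objective with its minimum at (3, 4)."""
    return float(np.sum((x - np.array([3.0, 4.0])) ** 2))


best, best_fit, history = aoa_minimize(shifted_sphere, [0.0, 0.0], [10.0, 10.0], n_iter=100)
print(best, best_fit)
```

In a hybrid such as ANFIS–AOA, the objective would typically be the training error of the ANFIS model as a function of its tunable parameters; that coupling is not shown here.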

Performance evaluation criteria

Root mean square error (RMSE) (Equation (16)), Nash–Sutcliffe efficiency (NSE) index (Equation (17)), and mean absolute error (MAE) (Equation (18)) were measured to evaluate the performance of scenarios and machine learning models used in this study.
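$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_{o,i} - x_{p,i} \right)^2} \tag{16}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n} \left( x_{o,i} - x_{p,i} \right)^2}{\sum_{i=1}^{n} \left( x_{o,i} - \bar{x}_o \right)^2} \tag{17}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| x_{o,i} - x_{p,i} \right| \tag{18}$$
where $x_o$ is the observed (measured) value, $x_p$ is the predicted (estimated) value, and n is the number of samples. $\bar{x}_p$ and $\bar{x}_o$ represent the average of the predicted and observed data, respectively.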
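The three criteria can be computed as in the following sketch, which uses their standard definitions. The function names and the four-sample example are hypothetical and serve only to make the arithmetic easy to check by hand.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    residual = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / spread)


def mae(observed, predicted):
    """Mean absolute error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(observed - predicted)))


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted), nse(observed, predicted), mae(observed, predicted))
```

In this example only one of the four predictions is off, by one unit, so the squared-error sum is 1 and the observed spread around its mean of 2.5 is 5, which gives RMSE = 0.5, NSE = 0.8, and MAE = 0.25.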

ANFIS includes several hyperparameters whose values should be optimized for better prediction tasks. For example, the number of fuzzy rules or the type of fuzzy membership function can produce various values for the model output. Model training established the proper architecture of the ANFIS hybrid model considering the optimized parameter values (Table 3). According to the table, the Gaussian membership function was identified as the most suitable fuzzy membership function for the ANFIS model. The first-order linear output function was identified as the best type because the Sugeno fuzzy type is used in ANFIS and produces a function as a result. The HHO algorithm specifies the proper values of the parameters for the given problem by running the model with various values until it reaches the appropriate values for each parameter.

Table 3

Optimized parameters of ANFIS and HHO–ANFIS models

| Model | Parameter | Value |
|---|---|---|
| ANFIS | Fuzzy structure | Sugeno type |
| | Initial FIS for training | Genfis3 |
| | The type of membership functions | Gaussian |
| | The membership function of the output | Linear |
| | Optimization method | Hybrid |
| | Maximum number of fuzzy rules | 10 |
| | The maximum number of epochs | 2,000 |
| HHO | Number of search agents | 40 |
| | Iteration number | 2,000 |
| | β | 1.5 |
| | Range partitions (weights and biases) | [3, −3] |
| | Population size | 30 |

Table 4 presents the results of using each scenario. In comparison with the hybrid models, the ANFIS model performed poorly in forecasting river flow. For the test data, the RMSE values of the ANFIS model are 2 or greater for all scenarios except scenario 3, while these values are less than 2 for the other two models. The ANFIS model has a relatively large error on the test data, despite predicting the training data with relatively high performance. The ANFIS model produced its best test results with scenario 3, for which the RMSE, MAPE, and NSE values are 1.8, 1.07, and 0.89, respectively. The best performance on the training data was achieved with scenario 5 (RMSE = 1.2, MAPE = 0.59, and NSE = 0.95). For the test data under this scenario, however, the RMSE and MAPE rise to 2.25 and 1.04, respectively, and the NSE falls to 0.79.

Table 4

Results of the error evaluation criteria

| Model | Scenario | RMSE (m³/day), training | RMSE (m³/day), test | MAPE (m³/day), training | MAPE (m³/day), test | NSE, training | NSE, test |
|---|---|---|---|---|---|---|---|
| ANFIS | 1 | 1.60 | 2.30 | 0.8 | 1.11 | 0.90 | 0.80 |
| | 2 | 1.40 | 2.00 | 0.71 | 1.06 | 0.93 | 0.86 |
| | **3** | **1.58** | **1.80** | **0.88** | **1.07** | **0.91** | **0.89** |
| | 4 | 1.30 | 2.12 | 0.75 | 1.26 | 0.93 | 0.83 |
| | 5 | 1.20 | 2.25 | 0.59 | 1.04 | 0.95 | 0.79 |
| ANFIS–HHO | 1 | 1.86 | 1.74 | 0.94 | 0.95 | 0.87 | 0.90 |
| | 2 | 1.49 | 1.84 | 0.8 | 0.82 | 0.92 | 0.88 |
| | 3 | 1.62 | 1.60 | 0.88 | 0.81 | 0.9 | 0.91 |
| | **4** | **1.68** | **1.40** | **0.88** | **0.74** | **0.90** | **0.92** |
| | 5 | 1.57 | 1.60 | 0.81 | 0.87 | 0.91 | 0.90 |
| ANFIS–AOA | 1 | 1.86 | 1.9 | 0.91 | 0.96 | 0.88 | 0.87 |
| | 2 | 1.5 | 1.8 | 0.74 | 0.89 | 0.91 | 0.88 |
| | **3** | **1.67** | **1.34** | **0.86** | **0.69** | **0.90** | **0.93** |
| | 4 | 1.70 | 1.37 | 0.92 | 0.76 | 0.89 | 0.93 |
| | 5 | 1.60 | 1.59 | 0.85 | 0.86 | 0.91 | 0.91 |

Note: The best-identified scenario for each model is highlighted in bold.

These findings suggest that the ANFIS network gets trapped in local minima while training. Using the ANFIS–HHO hybrid model based on scenario 4, the highest prediction accuracy was achieved for the test data, where the RMSE, MAPE, and NSE values were equal to 1.4, 0.74, and 0.92, respectively. These values were 1.68, 0.88, and 0.90 for the training data, respectively. Using the third scenario for test data (RMSE = 1.34, MAPE = 0.69, NSE = 0.93) and for training data (RMSE = 1.67, MAPE = 0.86, NSE = 0.9), the ANFIS–AOA model was able to produce the most promising results.

Figure 6 displays the time-series graph of the simulated and observed test data for each model considering its best-identified scenario. This graph demonstrates how well the models performed in forecasting the river flow. A normal distribution is also fitted to the error values for each model to measure the range of error with respect to their normal distribution. Errors following a normal distribution show an acceptable prediction to some extent (Kayhomayoon et al. 2022). The ANFIS–AOA model yields output values that are more in line with the observational data among the models. The two hybrid models largely address the limitation of the ANFIS model in estimating the peak and trough points in certain time steps.
Figure 6

Time-series graph of observed and simulated data: (a) ANFIS, (b) ANFIS–HHO, and (c) ANFIS–AOA.

The scatterplots of simulated and observed data for each model based on its best-identified scenario are displayed in Figure 7. The data are denser near the regression line in all three models. Compared to the other two models, the ANFIS–AOA model has produced fewer negative values for the flood.
Figure 7

Scatter plot of observed and simulated data: (a) ANFIS, (b) ANFIS–HHO, and (c) ANFIS–AOA.

Figure 8 displays Taylor's diagram, which compares the observational data with the output of the predictive models. The standard deviation of the data is displayed on the x and y axes of this graph. The correlation coefficient between the simulated and observed data, which ranges from 0 to 1, is displayed on the arc of the quadrant. The root mean square difference (RMSD) is shown by the arcs inside the quadrant. These diagrams were created for the selected input patterns. The data positions of all three models are in close proximity to the observational data. In the ANFIS diagram, the model output is positioned close to RMSD = 2, and the correlation coefficient is close to 0.95. The correlation coefficient of both hybrid models falls between 0.95 and 0.99. The diagrams show that the ANFIS–AOA data has a slightly higher correlation coefficient than the ANFIS–HHO data. As a result, the ANFIS–AOA model is more accurate than the other two models.
Figure 8

Taylor's diagram: (a) ANFIS, (b) ANFIS–HHO, and (c) ANFIS–AOA.
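The three quantities read from a Taylor diagram can be computed as in the sketch below. It is a generic illustration with hypothetical names and sample values, not the study's data. Here the RMSD is the centered root mean square difference, which is linked to the two standard deviations and the correlation coefficient R by the law of cosines: RMSD² = σ_o² + σ_p² − 2·σ_o·σ_p·R.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Return the standard deviations, correlation, and centered RMSD of two series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sd_obs = float(observed.std())    # population standard deviation (ddof=0)
    sd_pred = float(predicted.std())
    corr = float(np.corrcoef(observed, predicted)[0, 1])
    centred_diff = (predicted - predicted.mean()) - (observed - observed.mean())
    centred_rmsd = float(np.sqrt(np.mean(centred_diff ** 2)))
    return sd_obs, sd_pred, corr, centred_rmsd


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.3]
sd_obs, sd_pred, corr, centred_rmsd = taylor_statistics(observed, predicted)
print(sd_obs, sd_pred, corr, centred_rmsd)
```

A model point lies closer to the observation point on the diagram when its standard deviation matches the observed one and its correlation is high, which is equivalent to a small centered RMSD.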


According to the results, the hybrid models introduced in this study performed well when given precipitation, evaporation, and the flow values one to three days before the prediction time as input variables. The results confirm that machine learning models can perform appropriately in hydrological problems, which is in line with the results of Arya Azar et al. (2023) on predicting evaporation and inflow to dams, Tayfur et al. (2018) on predicting flood flows, and Adnan Ikram et al. (2022) on estimating runoff. The improvement of the ANFIS model by hybrid approaches observed here is also consistent with the results of Samantaray et al. (2023), who used the ANFIS–PSO and ANFIS–SMA models in flood estimation.

Some of this research is worth examining in more detail. For example, Arya Azar et al. (2023) used the HHO successfully to improve the ANFIS in predicting the longitudinal dispersion coefficient of rivers. In the research of Adnan Ikram et al. (2021), hybrid models improved the prediction accuracy of machine learning models, which is consistent with the results of this research. Also, Tayfur et al. (2018) used various evolutionary algorithms, such as the genetic algorithm, PSO, and ACO, to improve the accuracy of ANN in flood forecasting. Similar to the current research, they reported that evolutionary algorithms improved prediction accuracy. It can therefore be concluded that basic machine learning approaches, such as neuron-based and kernel-based algorithms, can become trapped in local optima, whereas evolutionary algorithms are less prone to this weakness. ANFIS might be very accurate in predicting the training data (Tayfur et al. 2018) and yet predict the test data less accurately, whereas the combined ANFIS–HHO and ANFIS–AOA models had similar accuracy on both parts of the data, which is one of the advantages of these models. For this reason, in order to improve predictive accuracy in time-series and regression problems, and specifically in flood forecasting, the use of novel algorithms is recommended.

Among the input scenarios developed, the hydraulic parameters that have significant effects on the flood level have not been considered. Despite the favorable results, the introduced models have a major limitation due to not considering the physical properties of the system and its basic information such as vegetation, the structure of the river channel, and the slope of the area, since they can have more significant effects on the occurrence of floods than variables such as evaporation. Therefore, it is necessary to consider these factors in future research. Also, due to the data-driven nature of these models, caution should always be taken in using their outputs as their values sometimes might not be hydrologically logical. This approach is suitable and recommended for regions where it is not possible to collect sufficient information on hydrological parameters and we have to estimate the amount of flood flow with the minimum possible information.

Access to a large number of machine learning models is a great advantage for researchers, who can study various models to find the most suitable one. Although the results of this research were promising, these models need to be applied in other basins before the results can be confirmed and the new hybrid models recommended for similar research; this is suggested for future research. In this research, a limited set of variables was used with the aim of achieving a suitable flood estimation with the lowest number of input variables. In future research, more input variables, such as temperature and precipitation with various delays, can be considered, which may improve the performance of the models. Also, the data used in this study covered only 2 years on a daily basis. Increasing the length of the statistical period and using more data should lead to better training and, as a result, more accurate models.

Accurate river flow prediction is crucial for the effective management of water resources. A number of models, including conceptual, physical, data-oriented, and experimental models, have been put forth to forecast the amount of flow. Given the complexity of physical, experimental, and conceptual models, data-oriented methods have become increasingly popular among researchers in recent years. This study assessed the effectiveness of the ANFIS, ANFIS–AOA, and ANFIS–HHO models for flood prediction, with the Shahrchay River in Urmia as the study area. Five scenarios were defined to introduce the inputs to the models. The findings can be summarized as follows:

  • The flow showed the strongest correlation with its values in the previous one to three days, with a correlation coefficient of 0.94.

  • The ANFIS–HHO and ANFIS–AOA hybrid models addressed the limitation of the ANFIS model in producing irrational values, such as negative values, in certain steps.

  • The ANFIS model produced its best results with the third scenario, which included the flow in the previous one to three days, whereas the ANFIS–HHO and ANFIS–AOA models produced their best results with the fourth scenario, which added evaporation as an input.

  • The performance of the ANFIS model was improved remarkably by using the HHO and AOA.

  • The Taylor diagrams illustrated the accuracy of the models in predicting floods; comparing the models on these diagrams showed that the ANFIS–AOA hybrid model performed better than the others.
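Two of the quantities behind these findings, the correlation of the flow with its own earlier values and the statistics plotted on a Taylor diagram, can be computed as sketched below. This is a minimal illustration with hypothetical function names; it uses population (divide-by-n) standard deviations, which is an assumption about the convention. The three Taylor statistics are linked by the identity E'^2 = sigma_obs^2 + sigma_sim^2 - 2 * sigma_obs * sigma_sim * R, where E' is the centered root mean square difference and R the correlation coefficient.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def lag_correlation(flow, lag):
    """Correlation between the flow on day t and the flow on day t - lag."""
    if lag < 1 or lag >= len(flow):
        raise ValueError("lag must be between 1 and len(flow) - 1")
    return pearson(flow[lag:], flow[:-lag])

def taylor_statistics(obs, sim):
    """Standard deviations, correlation, and centered RMS difference for a Taylor diagram."""
    n = len(obs)
    mo, ms = _mean(obs), _mean(sim)
    std_obs = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    crmsd = math.sqrt(sum(((s - ms) - (o - mo)) ** 2 for o, s in zip(obs, sim)) / n)
    return {"std_obs": std_obs, "std_sim": std_sim,
            "corr": pearson(obs, sim), "crmsd": crmsd}
```

On a Taylor diagram, a model plots closer to the observation point when its standard deviation is close to `std_obs`, its correlation is high, and its `crmsd` is small.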

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abualigah L., Diabat A., Mirjalili S., Abd Elaziz M. & Gandomi A. H. 2021 The arithmetic optimization algorithm. Computer Methods in Applied Mechanics and Engineering 376, 113609.

Adnan Ikram R. M., Jaafari A., Milan S. G., Kisi O., Heddam S. & Zounemat-Kermani M. 2022 Hybridized adaptive neuro-fuzzy inference system with metaheuristic algorithms for modeling monthly pan evaporation. Water 14 (21), 3549.

Arya Azar N., Ghordoyee Milan S. & Kayhomayoon Z. 2021a Predicting monthly evaporation from dam reservoirs using LS-SVR and ANFIS optimized by Harris Hawks optimization algorithm. Environmental Monitoring and Assessment 193 (11), 1–14.

Costabile P. & Macchione F. 2015 Enhancing river model set-up for 2-D dynamic flood modeling. Environmental Modelling & Software 67, 89–107.

Danso-Amoako E., Scholz M., Kalimeris N., Yang Q. & Shao J. 2012 Predicting dam failure risk for sustainable flood retention basins: A generic case study for the wider Greater Manchester area. Computers, Environment and Urban Systems 36 (5), 423–433.

Devia G. K., Ganasri B. P. & Dwarakish G. S. 2015 A review on hydrological models. Aquatic Procedia 4, 1001–1007.

Esmaili M., Aliniaeifard S., Mashal M., Asefpour Vakilian K., Ghorbanzadeh P., Azadegan B., Seif M. & Didaran F. 2021 Assessment of adaptive neuro-fuzzy inference system (ANFIS) to predict production and water productivity of lettuce in response to different light intensities and CO2 concentrations. Agricultural Water Management 258, 107201.

Farahmand G., Samet K., Golmohammadi H., Ashrafi M., Patel N. & Soufi M. 2023 Numerical study of Shahrchay dam break and locating the flood prone areas of Urmia city led from it. Modeling Earth Systems and Environment 9 (4), 4573–4582.

Firat M. & Güngör M. 2007 River flow estimation using adaptive neuro fuzzy inference system. Mathematics and Computers in Simulation 75 (3–4), 87–96.

Hashemi A., Asefpour Vakilian K., Khazaei J. & Massah J. 2014 An artificial neural network modeling for force control system of a robotic pruning machine. Journal of Information and Organizational Sciences 38 (1), 35–41.

Heidari A. A., Mirjalili S., Faris H., Aljarah I., Mafarja M. & Chen H. 2019 Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems 97, 849–872.

Jang J. S. 1993 ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics 23 (3), 665–685.

Javidan S. M., Banakar A., Asefpour Vakilian K. & Ampatzidis Y. 2022 A feature selection method using slime mould optimization algorithm in order to diagnose plant leaf diseases. In: 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 28 December 2022, Mazandaran, Iran.

Kayhomayoon Z., Naghizadeh F., Malekpoor M., Arya Azar N., Ball J. & Ghordoyee Milan S. 2022 Prediction of evaporation from dam reservoirs under climate change using soft computing techniques. Environmental Science and Pollution Research 1–24.

Kayhomayoon Z., Naghizadeh F., Malekpoor M., Arya Azar N., Ball J. & Ghordoyee Milan S. 2023 Prediction of evaporation from dam reservoirs under climate change using soft computing techniques. Environmental Science and Pollution Research 30 (10), 27912–27935.

Lohani A. K., Goel N. & Bhatia K. 2014 Improving real-time flood forecasting using fuzzy inference system. Journal of Hydrology 509, 25–41.

Mosavi A., Bathla Y. & Varkonyi-Koczy A. 2018 Predicting the future using web knowledge: state of the art survey. In: Recent Advances in Technology Research and Education: Proceedings of the 16th International Conference on Global Research and Education Inter-Academia 2017 (Luca, D., Sirghi, L. & Costin, C., eds.). Springer International Publishing, Cham, pp. 341–349.

Nayak P., Sudheer K., Rangan D. & Ramasastri K. 2005 Short-term flood forecasting with a neurofuzzy model. Water Resources Research 41, 1–16.

Patil P. R., Tanavade S. & Dinesh M. N. 2022 Analysis of power loss in forward converter transformer using a novel machine learning-based optimization framework. Soft Computing 27 (7), 1–17.

Pitt M. 2008 Learning Lessons From the 2007 Floods. Cabinet Office, London, UK.

Qasem S. N., Ebtehaj I. & Riahi Madavar H. 2017 Optimizing ANFIS for sediment transport in open channels using different evolutionary algorithms. Journal of Applied Research in Water and Wastewater 4 (1), 290–298.

Rahmani-Rezaeieh A., Mohammadi M. & Danandeh Mehr A. 2020 Climate change impacts on floodway and floodway fringe: A case study in Shahrchay River Basin, Iran. Arabian Journal of Geosciences 13 (12), 494.

Rezaeianzadeh M., Tabari H., Arabi Yazdi A., Isik S. & Kalin L. 2014 Flood flow forecasting using ANN, ANFIS and regression models. Neural Computing and Applications 25, 25–37.

Sadio C. A. A. S. & Faye C. 2023 Evaluation of extreme flow characteristics in the Casamance watershed upstream of Kolda using the IHA/RVA method. International Journal of Sustainable Energy and Environmental Research 12 (2), 31–45.

Samantaray S., Sahoo P., Sahoo A. & Satapathy D. P. 2023 Flood discharge prediction using improved ANFIS model combined with hybrid particle swarm optimisation and slime mould algorithm. Environmental Science and Pollution Research 30 (35), 83845–83872.

Sankaranarayanan S., Prabhakar M., Satish S., Jain P., Ramprasad A. & Krishnan A. 2020 Flood prediction based on weather parameters using deep learning. Journal of Water and Climate Change 11 (4), 1766–1783.

Sharifi Garmdareh E., Vafakhah M. & Eslamian S. S. 2018 Regional flood frequency analysis using support vector regression in arid and semi-arid regions of Iran. Hydrological Sciences Journal 63 (3), 426–440.

Tayfur G., Singh V. P., Moramarco T. & Barbetta S. 2018 Flood hydrograph prediction using machine learning methods. Water 10 (8), 968.

Van den Honert R. C. & McAneney J. 2011 The 2011 Brisbane floods: Causes, impacts and implications. Water 3, 1149–1173.

Xie K., Ozbay K., Zhu Y. & Yang H. 2017 Evacuation zone modeling under climate change: A data-driven method. Journal of Infrastructure Systems 2017 (23), 04017013.

Zhang G., Band S. S., Jun C., Bateni S. M., Chuang H. M., Turabieh H., Mafarja M., Mosavi A. & Moslehpour M. 2021a Solar radiation estimation in different climates with meteorological variables using Bayesian model averaging and new soft computing models. Energy Reports 7, 8973–8996.

Zhu A. X., Yang L., Li B., Qin C., Pei T. & Liu B. 2010 Construction of membership functions for predictive soil mapping under fuzzy logic. Geoderma 155 (3–4), 164–174.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).