Abstract

This research uses the multi-layer perceptron–artificial neural network (MLP-ANN), radial basis function–ANN (RBF-ANN), least squares support vector machine (LSSVM), adaptive neuro-fuzzy inference system (ANFIS), M5 model tree (M5T), gene expression programming (GEP), genetic programming (GP) and Bayesian network (BN), both alone and combined with five mother wavelet functions (MWFs: coif4, db10, dmey, fk6 and sym7), and selects the best model by the TOPSIS method. The input time series were decomposed by the discrete wavelet transform (DWT) using these MWFs. The case study is the Navrood watershed in the north of Iran, and the considered parameters are daily flow discharge, temperature and precipitation from 1991 to 2018. The results show that the best method is the hybrid of the M5T model with the sym7 wavelet function. At the testing stage, combining the AI models with MWFs improves the correlation coefficient of MLP, RBF, LSSVM, ANFIS, GP, GEP, M5T and BN by 8.05%, 4.6%, 8.14%, 8.14%, 22.97%, 7.5%, 5.75% and 10%, respectively.

HIGHLIGHTS

  • Eight AI-based models were used for estimation of daily flow discharge.

  • Hybrid of AI-based models with MWFs improved their performance.

  • The stepwise regression method selected the best combination of hydrometric and climatic data.

  • The TOPSIS method was used to rank the models and select the best one.

  • Hybrid M5T with sym7 is the best model for estimation of daily flow discharge.

INTRODUCTION

Prediction and estimation of daily flow discharge are necessary for short-term planning of water resources, and different methods can be applied for this purpose. In recent years, AI-based models have become a common approach for forecasting daily flow discharge, and combining them with mother wavelet functions (MWFs) is one way to improve their performance.

Previous research for forecasting and estimation of flow discharge by AI-based models can be divided into two categories, as follows.

A number of researchers used AI-based models and selected the best model according to several performance criteria. Adib et al. (2017), Adnan et al. (2019), Erdal & Karakurt (2013), Hamaamin et al. (2016), Rezaie-Balf et al. (2019), Shamshirband et al. (2020), Tongal & Booij (2018) and Wagena et al. (2020) used different AI-based methods for estimating and predicting daily or monthly flow discharges. The applied methods differ in nature (linear, nonlinear, bilinear, probabilistic, regression or classification). The methods used included the M5 model tree (M5T), Bayesian network (BN), gene expression programming (GEP), genetic programming (GP), least squares support vector machine (LSSVM), classification and regression tree (CART) and adaptive neuro-fuzzy inference system (ANFIS).

Other researchers combined AI-based models with MWFs and identified the best hybrid model. Abdollahi et al. (2017), Alizadeh et al. (2018), Nourani et al. (2012, 2019), Dalkiliç & Hashimi (2020), Shafaei & Kisi (2017), Santos et al. (2019) and Yaseen et al. (2018) combined AI models and MWFs for forecasting daily or monthly flow discharge, and these hybrid models improved on the results of the standalone AI-based models.

This research addresses the following four basic questions:

  1. Which data must be introduced to the AI-based models;

  2. Which method can determine the appropriate data to introduce to the AI-based models;

  3. Which mother wavelet has the greatest effect on the results of the AI-based models;

  4. Which method can identify the best hybrid of AI-based model and MWF.

The meteorological and hydrologic data introduced to the AI-based methods must be related to daily flow discharge. The flow discharge, temperature and precipitation of previous days, together with the precipitation and temperature of the current day, are suitable data for this purpose. To select the appropriate data, this study used the autocorrelation function (ACF) and partial ACF (PACF), which can also determine a suitable lag time for each meteorological and hydrologic variable.

This study considers eight AI-based models, multilayer perceptron–artificial neural network (MLP-ANN), radial basis function–ANN (RBF-ANN), ANFIS, LSSVM, M5T, GP, GEP and BN, and combines them with five mother wavelet functions (coif4, db10, dmey, fk6 and sym7), which are used to decompose the input time series into several levels. To select the most accurate method, it uses different performance criteria and then the technique for order of preference by similarity to ideal solution (TOPSIS).

The eight AI-based models considered in this study have different structures. The M5T uses simple linear equations, the GP and GEP generally provide nonlinear equations and the BN uses a conditional probability table (CPT). ANFIS combines ANN and fuzzy logic principles and uses linear and nonlinear functions, while LSSVM is a kernel-based, least squares variant of the support vector machine that is used for classification and regression analysis. MLP is a nonlinear feedforward ANN trained by back propagation, while RBF uses radial basis functions and its output is a linear combination of these functions. The main objective of this research is to identify the best structure of AI-based model for estimating daily flow discharge in rivers of mountainous watersheds.

MATERIALS AND METHODS

The case study

The mountainous and forested Navrood Watershed is situated in the north of Iran (48°34′57″ to 49°0′53″E and 37°36′35″ to 37°45′19″N).

Data from two hydrometric stations on the Navrood River were utilized: Kharajgil Station at the watershed outlet (48°53′44″E, 37°42′40″N; altitude 137 m) and Khalian Station at the center of the watershed (48°45′13″E, 37°40′54″N; altitude 715 m), together with the Nav rainfall gauging station (48°41′27″E, 37°39′1″N; altitude 1,000 m) (Adib et al. 2019). The data used in this study are the daily flow discharge of Kharajgil hydrometric station and the daily precipitation and temperature of three gauging stations (Nav, Khalian and Kharajgil) from 1991 to 2018. Table 1 lists the characteristics of the Navrood watershed, and Figure 1 shows the watershed and its location in Iran (Adib et al. 2019).

Figure 1

The Navrood Watershed.


Data analysis

The Thiessen polygon method was applied to compute the mean (areal) precipitation and temperature over the watershed from the gauging stations.
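The Thiessen areal mean is an area-weighted average of the station values, with each station weighted by the area of its Thiessen polygon. The minimal Python sketch below assumes the polygon areas are already known; the areas and precipitation values in the example are hypothetical placeholders, not values from this study.

```python
def thiessen_mean(values, areas):
    """Area-weighted (Thiessen) mean of one day's station observations."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


# Hypothetical polygon areas (km2) for Nav, Khalian and Kharajgil, chosen only
# so that they add up to the 267 km2 watershed area; they are not study values.
polygon_areas_km2 = [120.0, 97.0, 50.0]
precip_mm = [12.0, 8.0, 4.0]  # hypothetical daily precipitation at the three gauges
areal_precip_mm = thiessen_mean(precip_mm, polygon_areas_km2)
print(round(areal_precip_mm, 3))
```

The same function applies unchanged to daily temperature, using the same polygon areas.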

Several meteorological parameters were considered for predicting daily flow discharge: daily precipitation, temperature, sunshine hours, evaporation and relative humidity. Because the correlation of daily flow discharge with daily sunshine hours, evaporation and relative humidity was very low, these parameters were not included in the partial autocorrelation function (PACF) analysis.

The same-day correlation between daily flow discharge and daily precipitation and temperature was also low, so lagged values must be considered. The PACF diagram identifies appropriate lag times in a time series, and it shows that a suitable lag time is three days.

The Navrood Watershed is a small, forested watershed in which most rainfall infiltrates into the soil, so the river flow discharge depends strongly on groundwater flow. Because groundwater moves much more slowly than surface runoff, a three-day lag time is acceptable for this small watershed.
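The PACF at lag k can be computed from the sample autocorrelations with the Durbin–Levinson recursion, and lags whose PACF lies outside the approximate ±1.96/√N band are significant at the 5% level. The sketch below is an illustrative Python implementation (the function names are hypothetical); statistical packages such as statsmodels offer equivalent routines, and this is not necessarily the procedure used by the authors.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags (mean removed, biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def pacf_durbin_levinson(x, nlags):
    """Partial autocorrelation phi_kk for k = 0..nlags (requires nlags >= 1)."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi[1, 1] = pacf[1] = r[1]
    for k in range(2, nlags + 1):
        prev = phi[k - 1, 1:k]                       # phi_{k-1,1..k-1}
        num = r[k] - np.dot(prev, r[1:k][::-1])      # r_k - sum_j phi_{k-1,j} r_{k-j}
        den = 1.0 - np.dot(prev, r[1:k])             # 1 - sum_j phi_{k-1,j} r_j
        phi[k, k] = pacf[k] = num / den
        phi[k, 1:k] = prev - phi[k, k] * prev[::-1]  # update the lower-order coefficients
    return pacf


def significant_lags(x, nlags):
    """Lags whose PACF falls outside the approximate 95% band +/-1.96/sqrt(N)."""
    pacf = pacf_durbin_levinson(x, nlags)
    bound = 1.96 / np.sqrt(len(x))
    return [k for k in range(1, nlags + 1) if abs(pacf[k]) > bound]
```

Applied to a daily discharge array, `significant_lags(q, nlags=10)` lists the candidate lags to examine alongside the physical reasoning above.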

Taking lag time into account, Figure 2 shows the PACF of daily flow discharge with 5% significance limits and the correlation coefficients between daily flow discharge and daily precipitation and temperature.

Figure 2

(a) PACF of daily flow discharge at Kharajgil Station. (b) Correlation coefficient between daily flow discharge at Kharajgil Station and precipitation in the Navrood Watershed. (c) Correlation coefficient between daily flow discharge at Kharajgil Station and temperature in the Navrood Watershed.


To select the best combination of inputs, two criteria must be balanced: a high correlation coefficient (R) and a small number of inputs. For this purpose, this study used stepwise regression at the 99% confidence level in SPSS v.25.

As a result, Qt-1 and Pt formed an appropriate combination of inputs (R = 0.81). Although some combinations have a higher R, they require too many inputs. For example, the combinations Qt-1, Qt-2, Qt-3, Pt, Pt-1, Pt-2, Pt-3, Tt, Tt-1, Tt-3 and Qt-1, Qt-2, Qt-3, Pt, Pt-1, Pt-2, Tt, Tt-1, Tt-3 both give R = 0.826, a negligible improvement over the selected combination. Here Q is daily flow discharge, P is daily precipitation, T is daily temperature, and the subscript t-k denotes a lag of k days.
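The construction of lagged candidate inputs and a greedy input search can be sketched as follows. This is a simplified stand-in for SPSS stepwise regression: it adds, one at a time, the candidate that most increases the multiple correlation R and stops when the gain falls below a hypothetical threshold `min_gain`, rather than applying F-tests at the 99% level or removing variables. All function names are illustrative.

```python
import numpy as np


def lagged_inputs(series, lags):
    """Build aligned lagged columns such as 'Qt-1' or 'Pt' (lag 0 = current day).

    series: dict name -> 1-D array of equal length, e.g. {"Q": q, "P": p, "T": temp}.
    lags:   dict name -> list of lags; use lags >= 1 for the target variable.
    Returns (columns, first): column values correspond to days first..N-1.
    """
    first = max(max(ks) for ks in lags.values())
    n = len(next(iter(series.values())))
    cols = {}
    for name, ks in lags.items():
        x = np.asarray(series[name], dtype=float)
        for k in ks:
            cols[f"{name}t" if k == 0 else f"{name}t-{k}"] = x[first - k : n - k]
    return cols, first


def multiple_r(X, y):
    """Multiple correlation coefficient of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(A @ coef, y)[0, 1]


def forward_selection(cols, y, min_gain=0.005):
    """Greedily add the column that raises R most; stop when the gain is below min_gain."""
    selected, best_r = [], 0.0
    remaining = list(cols)
    while remaining:
        scores = {c: multiple_r(np.column_stack([cols[s] for s in selected + [c]]), y)
                  for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] - best_r < min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_r = scores[cand]
    return selected, best_r
```

With `cols, first = lagged_inputs({"Q": q, "P": p, "T": temp}, {"Q": [1, 2, 3], "P": [0, 1, 2, 3], "T": [0, 1, 2, 3]})`, the target is `q[first:]`, so every column and the target refer to the same days.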

THEORY/CALCULATION

M5 decision tree

The M5 model tree (M5T), or cubist model, is a data-driven model developed by Quinlan (1992) that derives equations between independent and dependent parameters. It is based on a binary decision tree: the data are split recursively into subsets, and linear regression equations are fitted in the leaves (terminal nodes) (see Kisi 2015). The Waikato Environment for Knowledge Analysis (Weka) software was used in this study to develop the M5T model.
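M5 chooses each split by maximising the standard deviation reduction (SDR) of the target and then fits a linear model in each leaf. The Python sketch below shows only a single split followed by two leaf regressions; it omits the tree growth, pruning and smoothing steps of a full M5 implementation such as Weka's M5P, and all names and the `min_leaf` value are illustrative assumptions.

```python
import numpy as np


def sdr(y, left):
    """Standard deviation reduction of splitting target y by the boolean mask `left`."""
    parts = (y[left], y[~left])
    return np.std(y) - sum(len(p) / len(y) * np.std(p) for p in parts)


def best_split(X, y, min_leaf=5):
    """Search every attribute/threshold pair and return (attribute, threshold, SDR)."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            score = sdr(y, left)
            if score > best[2]:
                best = (j, thr, score)
    return best


def linear_fit(X, y):
    """Least-squares linear model with intercept, as fitted in an M5 leaf."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def fit_stump(X, y, min_leaf=5):
    """One SDR split with a linear regression in each of the two leaves."""
    j, thr, _ = best_split(X, y, min_leaf)
    if j is None:
        raise ValueError("no admissible split")
    left = X[:, j] <= thr
    return j, thr, linear_fit(X[left], y[left]), linear_fit(X[~left], y[~left])


def predict_stump(model, X):
    j, thr, coef_left, coef_right = model
    A = np.column_stack([np.ones(len(X)), X])
    return np.where(X[:, j] <= thr, A @ coef_left, A @ coef_right)
```

A full model tree applies `best_split` recursively to each subset until the leaves are small or the SDR gain is negligible.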

Bayesian network construction

A Bayesian network (BN) has two components: a qualitative component (a directed acyclic graph describing the dependencies between variables) and a quantitative component (the conditional probability tables). BN structure learning can combine Bayesian search approaches and constraint-based search algorithms (see Garcia-Prats et al. 2018). The BN structure applied in this study is illustrated in Figure 3.

Figure 3

The selected BN structure for the prediction of daily flow discharge in this study.


This study used GeNIe 2.0 software for the BN method, and the network was learned with the PC algorithm, a prototypical constraint-based algorithm. The PC algorithm was selected because it does not limit the number of variables or cases in the input; the maximum adjacency size was set to 8.
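The quantitative component of a BN can be illustrated by estimating a CPT from discretised data. The sketch below assumes a hypothetical structure in which Qt depends on Qt-1 and Pt and uses hypothetical bin edges and data; it estimates probabilities by counting with an add-one (Laplace) prior and does not implement the PC structure-learning algorithm or reproduce Figure 3.

```python
import numpy as np


def fit_cpt(parent_a, parent_b, child, n_a, n_b, n_child, alpha=1.0):
    """Estimate P(child | parent_a, parent_b) from discrete states by counting.

    alpha is an add-alpha (Laplace) prior, so unseen parent combinations still
    get a valid distribution. Returns an array of shape (n_a, n_b, n_child).
    """
    counts = np.full((n_a, n_b, n_child), float(alpha))
    for a, b, c in zip(parent_a, parent_b, child):
        counts[a, b, c] += 1.0
    return counts / counts.sum(axis=2, keepdims=True)


def expected_value(cpt, a, b, child_centres):
    """Point forecast: expectation of the child's bin centres under P(child | a, b)."""
    return float(cpt[a, b] @ np.asarray(child_centres, dtype=float))


# Hypothetical discretisation: 4 discharge states and 3 precipitation states.
q_edges, p_edges = [1.0, 3.0, 6.0], [0.5, 10.0]
q = np.array([0.8, 2.5, 4.0, 7.5, 3.2, 1.1])   # hypothetical daily discharge (m3/s)
p = np.array([0.0, 12.0, 3.0, 25.0, 0.2, 0.0])  # hypothetical daily precipitation (mm)
q_prev = np.digitize(q[:-1], q_edges)            # state of Qt-1
q_now = np.digitize(q[1:], q_edges)              # state of Qt
p_now = np.digitize(p[1:], p_edges)              # state of Pt
cpt = fit_cpt(q_prev, p_now, q_now, n_a=4, n_b=3, n_child=4)
```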

GEP

GEP is an evolutionary algorithm related to the genetic algorithm (GA): it evolves a population of individuals selected according to their fitness and introduces genetic variation through genetic operators. In GEP, the individuals are encoded as simple linear strings (chromosomes) of fixed length, which are expressed as expression trees (nonlinear entities) of different sizes and shapes (see Ferreira 2006; Abdollahi et al. 2017).

This study tested four GEP models with different combinations of head size and operator weights. The parameters of the best model are given in Table 2.
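In GEP, a gene is read breadth-first (as a Karva expression, or K-expression) to build its expression tree, and in a multigenic chromosome the sub-trees are combined by the linking function (addition in Table 2). The Python sketch below illustrates this decoding with a subset of the operators in Table 2; the protected division and square root, the example genes and all names are assumptions for illustration. For a head of size h and maximum arity nmax, the tail has h(nmax − 1) + 1 symbols, so a gene with h = 4 and binary operators has 9 symbols.

```python
import math

# Function symbols and their implementations; anything else is a terminal.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,  # protected division
    "sqrt": lambda a: math.sqrt(abs(a)),                  # protected square root
    "sin": math.sin,
}
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1, "sin": 1}


def decode(gene):
    """Translate a K-expression into an expression tree by reading it breadth-first."""
    nodes = [[symbol, []] for symbol in gene]
    next_free, i = 1, 0
    while i < next_free:  # only the coding region of the gene is visited
        for _ in range(ARITY.get(nodes[i][0], 0)):
            nodes[i][1].append(nodes[next_free])
            next_free += 1
        i += 1
    return nodes[0]


def evaluate(node, env):
    """Evaluate a decoded tree; terminals are variable names in env or numeric constants."""
    symbol, children = node
    if symbol in OPS:
        return OPS[symbol](*(evaluate(c, env) for c in children))
    return env[symbol] if symbol in env else float(symbol)


def evaluate_chromosome(genes, env):
    """Multigenic chromosome: the sub-trees are linked by addition."""
    return sum(evaluate(decode(g), env) for g in genes)


# Head "+ sqrt * q", tail "p q q p q": decodes to sqrt(q) + p * q.
gene_1 = ["+", "sqrt", "*", "q", "p", "q", "q", "p", "q"]
gene_2 = ["*", "p", "p", "q", "q", "p", "q", "p", "q"]  # decodes to p * p
```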

Table 1

The characteristics of the Navrood Watershed

| Characteristic | Value |
|---|---|
| Area | 267 km2 |
| Perimeter | 84 km |
| Maximum height | 3,006 m |
| Mean height | 1,182 m |
| Minimum height | 137 m |
| Mean annual precipitation | 1,000 mm |
| Mean annual temperature | 13.74 °C |
| Length of the Navrood River | 35.613 km |
| Mean slope of the Navrood River | 7% |
| Mean annual flow discharge | 4.28 m3/s |
Table 2

The values of parameters of the GEP model used in this study

| Parameter | Value |
|---|---|
| Number of chromosomes | 30 |
| Head size |  |
| Number of genes |  |
| Linking function | Addition |
| Fitness function | MSE |
| Mutation rate | 0.041 |
| Inversion rate | 0.1 |
| One-point recombination | 0.2 |
| Two-point recombination | 0.3 |
| Gene recombination | 0.2 |
| Gene transposition | 0.1 |
| IS transposition | 0.1 |
| RIS transposition | 0.1 |
| Operators | +, −, ×, /, Pow, Sqrt, Exp, Ln, Atan, sin |

GP

GP was developed on the basis of the GA method and evolves solutions using genetic operators. It was developed by Cramer (1985) and later extended by Koza (1994). Whereas the GA finds optimized values for a set of model parameters, GP derives the structure of the relationship between the inputs and outputs. The parameter values of GP are reported in Table 3.
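The GP building blocks (tree representation, subtree crossover and tournament selection) can be sketched in Python as below. This is a minimal illustration, not the software used in the study: it uses only +, − and × from the function set in Table 3, a hypothetical depth limit of 6 and small default population and generation sizes (Table 3 reports 250 and 450), and it keeps the best individual in each generation (elitism).

```python
import random

FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["q", "p", 1.0, 2.0]
MAX_DEPTH = 6  # hypothetical limit on tree depth


def random_tree(rng, depth):
    """Grow a random tree: tuples (op, left, right) for functions, leaves for terminals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, env):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices (1 or 2)."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, mum, dad, max_depth=MAX_DEPTH):
    """Graft a random subtree of `dad` onto a random point of `mum` (rejected if too deep)."""
    path, _ = rng.choice(list(subtrees(mum)))
    _, donor = rng.choice(list(subtrees(dad)))
    child = replace_at(mum, path, donor)
    return child if depth(child) <= max_depth else mum


def tournament(rng, population, errors, k=3):
    """Return the individual with the lowest error among k randomly drawn ones."""
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=lambda i: errors[i])]


def evolve(samples, targets, pop_size=50, generations=20, seed=0):
    rng = random.Random(seed)

    def mse(tree):
        return sum((evaluate(tree, s) - t) ** 2 for s, t in zip(samples, targets)) / len(targets)

    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        errors = [mse(t) for t in pop]
        elite = pop[min(range(pop_size), key=errors.__getitem__)]  # elitism
        pop = [elite] + [crossover(rng, tournament(rng, pop, errors), tournament(rng, pop, errors))
                         for _ in range(pop_size - 1)]
    errors = [mse(t) for t in pop]
    best = min(range(pop_size), key=errors.__getitem__)
    return pop[best], errors[best]
```

Mutation, which replaces a random subtree with a newly grown one, is omitted for brevity.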

Table 3

The values of parameters of the GP model used in this study

| Parameter | Value |
|---|---|
| Population size | 250 |
| Generation number | 450 |
| Maximum tree depth |  |
| Total nodes | inf |
| Function set | +, −, ×, power, log, ln, tan, sin |
| Tournament size |  |
| Maximum number of genes |  |
| Range of constant input numbers | [−10, 10] |

Wavelet transform

To analyse signals under uncertain conditions, a multiresolution analysis method is appropriate. This is beneficial for most real-world signals, in which high-frequency components are short-lived while low-frequency components persist over long periods: multiresolution analysis provides fine time resolution at high frequencies and fine frequency resolution at low frequencies. Based on multiresolution analysis, the wavelet transform (WT) is applied to different time portions of a signal. The continuous wavelet transform (CWT) of a signal x(t) can be formulated as:

$$W(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \qquad (1)$$
where τ and s are the translation and scale parameters, respectively; ψ is a window function, the so-called mother wavelet; and * denotes the complex conjugate. In CWT, wavelet coefficients are calculated for every possible scale, which is time-consuming and produces a large amount of redundant information. Hence, in many problems, especially in water resources studies, the discrete wavelet transform (DWT) is preferred. In this approach, the translation and scale parameters are discretized, typically on a dyadic grid, as follows:

$$s = s_0^{m}, \qquad \tau = n\, \tau_0\, s_0^{m} \qquad (2)$$
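where s0 > 1 is the fixed dilation step and τ0 > 0 is the translation step (the dyadic case uses s0 = 2 and τ0 = 1), and m and n are integers. Hence, the DWT can be formulated as:

$$W(m, n) = s_0^{-m/2} \sum_{t} x(t)\, \psi^{*}\!\left(s_0^{-m}\, t - n\, \tau_0\right) \qquad (3)$$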
Following earlier studies, the minimum decomposition level (L) is given by:

$$L = \mathrm{int}\left[\log_{10}(N_s)\right] \qquad (4)$$

where Ns is the number of data in the time series (Nourani et al. 2009). Here Ns = 28 × 365 = 10,220, so L = int[4.01] = 4.

More than 15 mother wavelet functions were evaluated in this study, and the five most appropriate for predicting daily flow discharge were selected: coif4 (W1), db10 (W2), dmey (W3), fk6 (W4) and sym7 (W5). Based on Equation (4) and the number of data, the DWT has four levels (Figure 4).
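Equation (4) and an additive multilevel decomposition can be sketched as follows. The sketch uses the Haar wavelet only because its filters fit in a few lines; it is a stand-in for the coif4, db10, dmey, fk6 and sym7 wavelets of this study, which would normally come from a wavelet library (for example PyWavelets' `pywt.wavedec`). Each component a4, d4, d3, d2 and d1 is reconstructed to the original length, so the components sum back to the series; all function names are hypothetical.

```python
import math

import numpy as np

SQRT2 = math.sqrt(2.0)


def decomposition_level(n_samples):
    """Equation (4): L = int[log10(Ns)]."""
    return int(math.log10(n_samples))


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    return (x[0::2] + x[1::2]) / SQRT2, (x[0::2] - x[1::2]) / SQRT2


def haar_inverse_step(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / SQRT2
    x[1::2] = (a - d) / SQRT2
    return x


def reconstruct(approx, details):
    """Inverse multilevel DWT; details = [d1, d2, ..., dL]."""
    x = approx
    for d in reversed(details):
        x = haar_inverse_step(x, d)
    return x


def haar_components(x, level):
    """Return {'aL', 'd1', ..., 'dL'}: time-domain components that sum back to x."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2 ** level:
        raise ValueError("series length must be divisible by 2**level")
    approx, details = x, []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    zeros = [np.zeros_like(d) for d in details]
    comps = {f"a{level}": reconstruct(approx, zeros)}
    for j in range(1, level + 1):
        only_j = [details[i] if i == j - 1 else zeros[i] for i in range(level)]
        comps[f"d{j}"] = reconstruct(np.zeros_like(approx), only_j)
    return comps
```

Because this Haar version requires a length divisible by 2^L, a 10,220-day series would need to be trimmed or padded before use.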

Figure 4

Decomposition to four levels in this study.


Figure 5

Main concept of the TOPSIS approach (A+: ideal point; A−: negative ideal point) (Balioti et al. 2018).


ANFIS

ANFIS is an adaptive fuzzy inference system that uses the learning ability of artificial neural networks (ANN) to improve learning and adaptation. ANFIS was developed by Jang (1993). It uses a set of fuzzy if–then rules and membership functions (MFs) to reproduce the given input–output pairs.

The applied ANFIS in this study uses the Takagi–Sugeno–Kang (TSK) inference system. The characteristics of ANFIS and different MWFs–ANFIS are provided in Table 4.
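The five ANFIS layers of a first-order TSK system can be illustrated with a forward pass for two inputs (for example Qt-1 and Pt) and Gaussian membership functions. In this Python sketch, the membership parameters and consequent coefficients are hypothetical inputs; the hybrid training (least squares for the consequents and gradient descent for the premise parameters) is not shown.

```python
import numpy as np


def gauss_mf(x, centre, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)


def tsk_forward(x1, x2, mf1, mf2, consequents):
    """First-order TSK (Sugeno) forward pass with a full grid of rules.

    mf1, mf2: lists of (centre, sigma) for inputs x1 and x2.
    consequents: array of shape (len(mf1) * len(mf2), 3); row i * len(mf2) + j
    holds (p, q, r) for the rule "x1 is A_i and x2 is B_j", whose output is
    p * x1 + q * x2 + r.
    """
    mu1 = np.array([gauss_mf(x1, c, s) for c, s in mf1])  # layer 1: fuzzification
    mu2 = np.array([gauss_mf(x2, c, s) for c, s in mf2])
    w = np.outer(mu1, mu2).ravel()                         # layer 2: firing strengths (product)
    w_norm = w / w.sum()                                   # layer 3: normalisation
    f = np.asarray(consequents) @ np.array([x1, x2, 1.0])  # layer 4: linear rule outputs
    return float(w_norm @ f)                               # layer 5: weighted sum
```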

Table 4

The number of MFs and training methods of ANFIS and different MWFs–ANFIS for prediction of daily flow discharge

| Model | No. of MFs | Training method |
|---|---|---|
| ANFIS |  | Hybrid |
| W1ANFIS |  | Hybrid |
| W2ANFIS |  | Hybrid |
| W3ANFIS |  | Hybrid |
| W4ANFIS |  | Back propagation |
| W5ANFIS |  | Hybrid |

LSSVM

The LSSVM method is used for classification and regression problems. Vapnik (2000) introduced support vector regression (SVR) based on statistical learning theory, and Suykens et al. (2002) later developed the LSSVM. The LSSVM obtains its solution from a set of linear equations, whereas SVR requires solving a quadratic programming problem; for this reason, the LSSVM is computationally more efficient. The parameter values of the LSSVM and the different MWFs–LSSVM models in this study are given in Table 5.
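The linear equations of LSSVM regression form a single system, [[0, 1ᵀ], [1, K + I/γ]] · [b; α] = [0; y], solved once for the bias b and the coefficients α. The Python sketch below assumes an RBF kernel of the form K(x, z) = exp(−‖x − z‖²/σ²); toolboxes parameterise σ differently, so the values in Table 5 may not transfer directly to this sketch.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2) for all rows of A and B (assumed convention)."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system; returns (bias b, coefficients alpha)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma  # regularised kernel matrix
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]


def lssvm_predict(X_train, alpha, b, X_new, sigma):
    """y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```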

Table 5

The parameter values of the LSSVM and different MWFs–LSSVM for prediction of daily flow discharge

| Model | Kernel function | γ | σ | Bias |
|---|---|---|---|---|
| LSSVM | RBF | 3.5 | 11.3 | 5.1 |
| W1LSSVM | RBF | 35.9 | 45.7 | 3.9 |
| W2LSSVM | RBF | 8.4 | 97.1 | 3.6 |
| W3LSSVM | RBF | 15.6 | 25.6 | 2.9 |
| W4LSSVM | RBF | 11.4 | 61.5 | 3.5 |
| W5LSSVM | RBF | 132.3 | 107.9 | 3.4 |

This study also uses the MLP-ANN and RBF-ANN methods. The parameter values of these methods and of the different MWFs–MLP and MWFs–RBF models are shown in Tables 6 and 7.

Table 6

The values of parameters of the MLP-ANN and different MWFs–MLP for prediction of daily flow discharge

| Model | No. of nodes of hidden layers | Transfer functions of hidden layers | Transfer function of output layer | Training method |
|---|---|---|---|---|
| MLP | 1–5 | Tansig–logsig | Linear | LM^a |
| W1MLP |  | Logsig | Linear | LM |
| W2MLP |  | Logsig | Linear | LM |
| W3MLP |  | Tansig | Linear | LM |
| W4MLP |  | Tansig | Linear | LM |
| W5MLP |  | Tansig | Linear | LM |

^a Levenberg–Marquardt algorithm.

Table 7

The parameter values of the RBF-ANN and different MWFs–RBF for prediction of daily flow discharge

| Model | Spread | No. of hidden units |
|---|---|---|
| RBF | 13 | 11 |
| W1RBF | 31 | 13 |
| W2RBF | 30 | 25 |
| W3RBF | 23 | 13 |
| W4RBF | 34 | 15 |
| W5RBF | 42 | 25 |

TOPSIS

The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is a method of multi-criteria decision analysis. This method was proposed by Hwang & Yoon (1981).

The steps of the TOPSIS method are:

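Step 1: Construct a decision matrix of m alternatives and n criteria:

$$D = \left[a_{ij}\right]_{m \times n} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \qquad (5)$$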
Step 2: Normalize the decision matrix:

$$r_{ij} = \frac{a_{ij}}{\sqrt{\sum_{i=1}^{m} a_{ij}^{2}}} \qquad (6)$$

where aij and rij are the elements of the original and normalized decision matrices, respectively.
Step 3: Determine the criteria weights wj (with Σj wj = 1) and multiply them by the normalized matrix:

$$v_{ij} = w_j\, r_{ij} \qquad (7)$$
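Step 4: Determine the positive ideal solution A+ and the negative ideal solution A−:

$$A^{+} = \{v_1^{+}, \ldots, v_n^{+}\} \qquad (8.1)$$

$$A^{-} = \{v_1^{-}, \ldots, v_n^{-}\} \qquad (8.2)$$

$$v_j^{+} = \begin{cases} \max_i v_{ij} & \text{for benefit criteria} \\ \min_i v_{ij} & \text{for cost criteria} \end{cases} \qquad (8.3)$$

$$v_j^{-} = \begin{cases} \min_i v_{ij} & \text{for benefit criteria} \\ \max_i v_{ij} & \text{for cost criteria} \end{cases} \qquad (8.4)$$

In this study, R, ST and NSE are benefit criteria and RSR and MAE are cost criteria.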
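Step 5: Calculate the separation of each alternative from the positive ideal (Si+) and the negative ideal (Si−):

$$S_i^{+} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{+}\right)^{2}} \qquad (9.1)$$

$$S_i^{-} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{-}\right)^{2}} \qquad (9.2)$$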
Step 6: Calculate the relative closeness of each alternative to the ideal solution from Si+ and Si−:

$$C_i^{*} = \frac{S_i^{-}}{S_i^{+} + S_i^{-}} \qquad (10)$$
Step 7: Rank the alternatives in descending order of Ci*; Ci* = 1 is the best and Ci* = 0 is the worst possible value.
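Steps 2 to 6 translate directly into array operations. The Python sketch below is illustrative: its example uses the testing-stage values of W5M5T, M5T and GP from Table 9 and the criterion weights reported in this study (R 0.01, ST 0.24, NSE 0.28, RSR 0.16, MAE 0.31), so its scores differ from Table 10, which ranks all models together.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Relative closeness C_i* of each alternative (Equations 5 to 10)."""
    a = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    r = a / np.sqrt((a ** 2).sum(axis=0))                    # Eq. 6: vector normalisation
    v = r * w                                                # Eq. 7: weighted matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))  # Eq. 8.3: A+
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))   # Eq. 8.4: A-
    s_plus = np.sqrt(((v - ideal) ** 2).sum(axis=1))         # Eq. 9.1
    s_minus = np.sqrt(((v - anti) ** 2).sum(axis=1))         # Eq. 9.2
    return s_minus / (s_plus + s_minus)                      # Eq. 10


# Columns: R, ST, NSE, RSR, MAE (testing stage, Table 9).
models = ["W5M5T", "M5T", "GP"]
scores = topsis(
    [[0.92, 0.86, 0.85, 0.39, 0.59],
     [0.87, 0.76, 0.75, 0.50, 0.74],
     [0.74, 0.03, -0.02, 1.01, 1.99]],
    weights=[0.01, 0.24, 0.28, 0.16, 0.31],
    benefit=[True, True, True, False, False],
)
ranking = [models[i] for i in np.argsort(-scores)]  # Step 7: descending C_i*
```

Because W5M5T is best and GP is worst on every criterion in this three-row example, they coincide with the ideal and negative ideal solutions and receive C* = 1 and C* = 0.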

In this study, the criteria weights in the TOPSIS method were obtained with the Shannon entropy method, which measures the uncertainty (dispersion) of each criterion across the alternatives. Figure 5 shows the main concept of the TOPSIS approach.
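The Shannon entropy weights can be sketched as follows: each column of the decision matrix is turned into proportions pij, its entropy Ej = −(1/ln m) Σi pij ln pij is computed, and the weights are proportional to 1 − Ej, so criteria that vary little across alternatives receive small weights. This Python sketch assumes strictly positive entries; criteria that can be negative, such as NSE, would first need a transformation (for example a min–max shift), which is an assumption not described in this study.

```python
import numpy as np


def entropy_weights(matrix):
    """Shannon entropy weights for the columns (criteria) of a decision matrix."""
    a = np.asarray(matrix, dtype=float)
    if np.any(a <= 0):
        raise ValueError("entropy weighting needs strictly positive entries")
    m = a.shape[0]
    p = a / a.sum(axis=0)                            # share of each alternative per criterion
    e = -(p * np.log(p)).sum(axis=0) / np.log(m)     # normalised entropy E_j in [0, 1]
    d = 1.0 - e                                      # degree of diversification
    return d / d.sum()
```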

The performance criteria

The applied performance criteria in this study are:

  1. Taylor skill score (ST)

$$ST = \frac{4\,(1 + R)^{k}}{\left(\sigma + 1/\sigma\right)^{2} (1 + R_0)^{k}} \qquad (11)$$

where R is the correlation coefficient between observed data and calculated values, R0 is the maximum attainable correlation coefficient, σ is the ratio of the standard deviation of calculated values (σm) to the standard deviation of observed data (σO), and k is the formulation degree. Following Zamani & Berndtsson (2019), k was set to 4 for temperature and 2 for discharge in this study. The score equals 1 for a perfect match (R = σ = 1, with R0 = 1) and 0 when R = −1.
  2. RSR

The RSR is the ratio of the root mean square error (RMSE) to the standard deviation of the observed data:

$$RSR = \frac{\sqrt{\sum_{i=1}^{n} \left(Q_i^{Obs} - Q_i^{Cal}\right)^{2}}}{\sqrt{\sum_{i=1}^{n} \left(Q_i^{Obs} - \bar{Q}^{Obs}\right)^{2}}} \qquad (12)$$

where QiObs is the ith observed daily flow discharge, QiCal is the ith calculated daily flow discharge, Q̄Obs is the mean of the observed daily flow discharges and n is the total number of observed daily flow discharges.

The optimum value of RSR is 0, and RSR > 0.7 indicates unsatisfactory model performance (Hamaamin et al. 2016).

  3. Nash–Sutcliffe efficiency (NSE)

The NSE is one minus the ratio of the residual variance to the variance of the observed data:

$$NSE = 1 - \frac{\sum_{i=1}^{n} \left(Q_i^{Obs} - Q_i^{Cal}\right)^{2}}{\sum_{i=1}^{n} \left(Q_i^{Obs} - \bar{Q}^{Obs}\right)^{2}} \qquad (13)$$

NSE ranges from −∞ to 1; NSE = 1 indicates a perfect match between calculated and observed values, and values between 0 and 1 are acceptable levels of performance.

  4. Mean absolute error (MAE)

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Q_i^{Obs} - Q_i^{Cal} \right| \qquad (14)$$

MAE should be close to 0.

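  5. Pearson's correlation coefficient (R)

$$R = \frac{\sum_{i=1}^{n} \left(Q_i^{Obs} - \bar{Q}^{Obs}\right)\left(Q_i^{Cal} - \bar{Q}^{Cal}\right)}{\sqrt{\sum_{i=1}^{n} \left(Q_i^{Obs} - \bar{Q}^{Obs}\right)^{2} \sum_{i=1}^{n} \left(Q_i^{Cal} - \bar{Q}^{Cal}\right)^{2}}} \qquad (15)$$

R should be close to 1.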

In this study, the weights of R, ST, NSE, RSR and MAE in the TOPSIS method were 0.01, 0.24, 0.28, 0.16 and 0.31 respectively.

RESULTS AND DISCUSSION

Based on the data analysis, the daily flow discharge at Kharajgil Station depends on Qt-1 at this station and on the watershed mean of Pt. Therefore, the AI models use Qt = f(Pt, Qt-1). The AI-based models considered are MLP, RBF, ANFIS, LSSVM, GP, GEP, BN and M5T.

For the hybrid AI-based models and MWFs, stepwise regression shows that Qt = f(Pt(d2), Qt-1(a4), Qt-1(d1), Qt-1(d2), Qt-1(d3), Qt-1(d4)) is an appropriate combination (R = 0.881), where a4 is the fourth-level approximation and d1 to d4 are the detail components. Although some combinations have a higher R, they require too many inputs. For example, Qt = f(Pt(a4), Pt(d1), Pt(d2), Pt(d3), Pt(d4), Qt-1(a4), Qt-1(d1), Qt-1(d2), Qt-1(d3), Qt-1(d4)), which uses all detail and approximation components of both inputs, gives R = 0.889, a negligible difference from the selected combination. Therefore, this study uses all detail and approximation components of Qt-1 and only one detail (d2) of Pt.

The R of the selected combination differs by only 0.9% from that of the combination using all components of all inputs, whereas its run time is much shorter; for the M5T model, the run-time reduction is almost 30%. Therefore, this study selected the effective detail and approximation components instead of all components of all inputs.

In this study, 70% of the data were used for training and 30% for testing. Tables 8 and 9 give the values of the performance criteria at the training and testing stages for the standalone AI-based models and for the best hybrid of each model with MWFs.

Table 8

The values of performance criteria (training stage)

| AI-based model | R | ST | NSE | RSR | MAE (m3/s) | Hybrid AI-based model and MWF | R | ST | NSE | RSR | MAE (m3/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.86 | 0.80 | 0.75 | 0.50 | 0.73 | W5MLP | 0.93 | 0.91 | 0.87 | 0.36 | 0.58 |
| RBF | 0.86 | 0.78 | 0.73 | 0.52 | 0.76 | W2RBF | 0.94 | 0.92 | 0.88 | 0.35 | 0.58 |
| LSSVM | 0.86 | 0.78 | 0.75 | 0.50 | 0.72 | W5LSSVM | 0.96 | 0.95 | 0.92 | 0.28 | 0.51 |
| ANFIS | 0.85 | 0.77 | 0.72 | 0.53 | 0.72 | W2ANFIS | 0.93 | 0.91 | 0.87 | 0.36 | 0.63 |
| GP | 0.80 | 0.01 | 0.04 | 0.98 | 2.15 | W5GP | 0.91 | 0.88 | 0.83 | 0.41 | 0.65 |
| GEP | 0.84 | 0.78 | 0.71 | 0.54 | 0.81 | W3GEP | 0.87 | 0.77 | 0.75 | 0.50 | 0.79 |
| M5T | 0.86 | 0.78 | 0.73 | 0.52 | 0.73 | W5M5T | 0.94 | 0.91 | 0.88 | 0.35 | 0.54 |
| BN | 0.82 | 0.70 | 0.63 | 0.61 | 1.25 | W5BN | 0.88 | 0.82 | 0.77 | 0.48 | 0.86 |
Table 9

The values of performance criteria (testing stage)

| AI-based model | R | ST | NSE | RSR | MAE (m3/s) | Hybrid AI-based model and MWF | R | ST | NSE | RSR | MAE (m3/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.87 | 0.77 | 0.76 | 0.49 | 0.69 | W3MLP | 0.94 | 0.93 | 0.88 | 0.35 | 0.59 |
| RBF | 0.87 | 0.72 | 0.74 | 0.51 | 0.69 | W2RBF | 0.91 | 0.85 | 0.83 | 0.41 | 0.60 |
| LSSVM | 0.86 | 0.83 | 0.73 | 0.52 | 0.71 | W3LSSVM | 0.93 | 0.87 | 0.86 | 0.37 | 0.66 |
| ANFIS | 0.86 | 0.78 | 0.74 | 0.51 | 0.77 | W3ANFIS | 0.93 | 0.92 | 0.87 | 0.36 | 0.61 |
| GP | 0.74 | 0.03 | −0.02 | 1.01 | 1.99 | W5GP | 0.91 | 0.88 | 0.83 | 0.41 | 0.58 |
| GEP | 0.80 | 0.62 | 0.64 | 0.60 | 0.87 | W1GEP | 0.86 | 0.83 | 0.75 | 0.50 | 0.73 |
| M5T | 0.87 | 0.76 | 0.75 | 0.50 | 0.74 | W5M5T | 0.92 | 0.86 | 0.85 | 0.39 | 0.59 |
| BN | 0.80 | 0.67 | 0.57 | 0.65 | 1.22 | W2BN | 0.88 | 0.83 | 0.77 | 0.48 | 0.70 |

The ranking of the different methods according to the TOPSIS method is provided in Table 10, which shows that the selected hybrid models generally rank highest. Nevertheless, the standalone M5T performs better than the hybrid LSSVM, ANFIS, MLP, RBF and GEP models. The main advantage of the M5T over the other methods is that it produces explicit equations that can easily be used in practical applications. The GP and GEP also produce explicit equations; however, the M5T uses simple linear equations, while the GP and GEP generally produce nonlinear equations.

Table 10

The ranking of different methods according to the TOPSIS method (training, testing and general rankings; models are listed in ascending order of the relative closeness Ci*, so the highest-ranked model is in the last row)

| Training: model | Ci* | Testing: model | Ci* | General: model | Ci* |
|---|---|---|---|---|---|
| GP |  | GP |  | GP |  |
| ANFIS | 0.035 | ANFIS | 0.072 | ANFIS | 0.053 |
| RBF | 0.039 | LSSVM | 0.109 | RBF | 0.079 |
| MLP | 0.12 | RBF | 0.119 | LSSVM | 0.116 |
| LSSVM | 0.122 | MLP | 0.202 | MLP | 0.161 |
| W4RBF | 0.443 | W1LSSVM | 0.38 | W1LSSVM | 0.508 |
| W4ANFIS | 0.447 | W4LSSVM | 0.5 | W4RBF | 0.541 |
| W4MLP | 0.521 | W5LSSVM | 0.56 | W4ANFIS | 0.548 |
| W1RBF | 0.526 | BN | 0.598 | W4MLP | 0.564 |
| W5ANFIS | 0.554 | W4MLP | 0.607 | W1RBF | 0.587 |
| W1ANFIS | 0.56 | W4RBF | 0.639 | BN | 0.601 |
| W3MLP | 0.561 | W3RBF | 0.643 | W4LSSVM | 0.63 |
| W3ANFIS | 0.563 | W1RBF | 0.647 | W5ANFIS | 0.635 |
| W3LSSVM | 0.564 | W4ANFIS | 0.649 | W1ANFIS | 0.639 |
| W1MLP | 0.565 | W5RBF | 0.655 | W3RBF | 0.656 |
| W2MLP | 0.596 | W2RBF | 0.661 | W5RBF | 0.669 |
| BN | 0.604 | W2LSSVM | 0.713 | W2RBF | 0.691 |
| W1LSSVM | 0.636 | W5ANFIS | 0.716 | W3LSSVM | 0.702 |
| W3RBF | 0.669 | W1ANFIS | 0.718 | W2MLP | 0.708 |
| W2ANFIS | 0.671 | GEP | 0.755 | W5MLP | 0.73 |
| W5MLP | 0.682 | W5MLP | 0.778 | W3ANFIS | 0.746 |
| W5RBF | 0.682 | W4GEP | 0.791 | W1MLP | 0.75 |
| W2RBF | 0.72 | W5GEP | 0.798 | W2ANFIS | 0.751 |
| W4LSSVM | 0.76 | W2MLP | 0.82 | W3MLP | 0.773 |
| W5GEP | 0.783 | W2ANFIS | 0.831 | W2LSSVM | 0.775 |
| W3BN | 0.795 | W2GEP | 0.836 | W5LSSVM | 0.78 |
| W1BN | 0.799 | W3LSSVM | 0.84 | GEP | 0.786 |
| W2BN | 0.808 | W3GEP | 0.86 | W5GEP | 0.791 |
| W2GEP | 0.811 | M5T | 0.869 | W4GEP | 0.802 |
| W4BN | 0.812 | W1GEP | 0.884 | W2GEP | 0.824 |
| W4GEP | 0.812 | W4M5T | 0.893 | W3BN | 0.847 |
| GEP | 0.817 | W4BN | 0.899 | W3GEP | 0.848 |
| W5BN | 0.818 | W3BN | 0.9 | W1BN | 0.851 |
| W1GEP | 0.833 | W1BN | 0.902 | W4BN | 0.856 |
| W3GEP | 0.837 | W5BN | 0.904 | W2BN | 0.857 |
| W2LSSVM | 0.838 | W2BN | 0.906 | W1GEP | 0.859 |
| M5T | 0.854 | W2M5T | 0.925 | M5T | 0.861 |
| W4GP | 0.894 | W3ANFIS | 0.93 | W5BN | 0.861 |
| W2GP | 0.905 | W1MLP | 0.936 | W4GP | 0.915 |
| W1GP | 0.917 | W4GP | 0.937 | W4M5T | 0.921 |
| W3GP | 0.921 | W3M5T | 0.952 | W2GP | 0.933 |
| W5GP | 0.93 | W2GP | 0.961 | W1GP | 0.94 |
| W4M5T | 0.949 | W1GP | 0.964 | W3GP | 0.946 |
| W3M5T | 0.951 | W1M5T | 0.964 | W3M5T | 0.952 |
| W1M5T | 0.959 | W3GP | 0.971 | W5GP | 0.956 |
| W2M5T | 0.99 | W5M5T | 0.979 | W2M5T | 0.957 |
| W5M5T | 0.995 | W5GP | 0.982 | W1M5T | 0.962 |
| W5LSSVM |  | W3MLP | 0.986 | W5M5T | 0.987 |

The Taylor diagram shows a comparison between different methods. Figure 6(a) compares the performance of the AI-based models and Figure 6(b) compares the performance of the best hybrid AI-based models and MWFs.

Figure 6

The Taylor diagrams for the testing stage. (a) Comparison between AI-based models. (b) Comparison between the hybrid AI-based models and MWFs.


The Taylor diagram shows that the GP, GEP and BN methods have the lowest performance. Although combining these methods with MWFs increases their performance, the hybrids of the other methods with MWFs still perform better than those of GP, GEP and BN. The performances of the ANFIS, LSSVM, M5T, MLP and RBF methods are almost equal, and the same holds for their hybrid models. Among these five hybrid models, the hybrid of LSSVM and MWFs has the highest performance at the testing stage, but its performance is lower at the training stage. The TOPSIS method ranks models by considering their performance at the training and testing stages together, and it discriminates between models more finely than the Taylor diagram. TOPSIS can also include any performance criteria selected by the researcher and adjust their importance through weights, whereas the Taylor diagram considers only the correlation coefficient, the standard deviation and the centered root mean square difference, without weighting.

Therefore, the TOPSIS method distinguishes between models more clearly and is an appropriate tool for selecting the best model. On this basis, W5M5T is the best model for predicting daily flow discharge, and among the single AI-based methods, M5T is the most suitable. Figure 7(a) shows that both W5M5T and M5T simulate daily flow discharge with appropriate accuracy, and that W5M5T simulates peak discharges more accurately than the alternatives. Figure 7(b) shows that GP cannot simulate peak discharges, whereas W5GP improves the accuracy of GP substantially.

Figure 7

Comparison between models in simulation of daily flow discharge. (a) W5M5T and M5T models. (b) W5GP and GP models.

CONCLUSION

This study used eight AI-based models (BN, GP, GEP, LSSVM, MLP, RBF, ANFIS and M5T) and five MWFs (coif4 (W1), db10 (W2), dmey (W3), fk6 (W4) and sym7 (W5)). The stepwise regression method identified Qt-1 (the previous day's discharge) and Pt (the current day's precipitation) as the most suitable combination of inputs to the AI-based models for estimating daily flow discharge. Using MWFs improved the performance of all the AI-based models. At the training stage, MWFs increased the R of MLP, RBF, LSSVM, ANFIS, GP, GEP, M5T and BN by 8.14%, 9.3%, 11.63%, 9.41%, 13.75%, 3.57%, 9.3% and 7.32%, respectively, and reduced the MAE of these models by 20.55%, 23.68%, 29.17%, 12.5%, 69.77%, 2.47%, 26.03% and 31.2%, respectively. At the testing stage, MWFs increased the R of MLP, RBF, LSSVM, ANFIS, GP, GEP, M5T and BN by 8.05%, 4.6%, 8.14%, 8.14%, 22.97%, 7.5%, 5.75% and 10%, respectively, and reduced the MAE of these models by 14.49%, 13.04%, 7.04%, 20.78%, 70.85%, 16.09%, 20.27% and 42.62%, respectively.
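The percentages above compare each hybrid with its single model. The minimal Python sketch below computes R, MAE and the relative change. It assumes, although this section does not say so explicitly, that each percentage is expressed relative to the single-model value. The function names and the example scores are hypothetical.

```python
import math


def mae(observed, simulated):
    """Mean absolute error between two equal-length series."""
    return sum(abs(s - o) for o, s in zip(observed, simulated)) / len(observed)


def pearson_r(observed, simulated):
    """Pearson correlation coefficient R between two equal-length series."""
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def relative_change_pct(single_value, hybrid_value):
    """Change from the single model to its hybrid, in % of the single-model value.

    Positive for an increase (desirable for R), negative for a decrease
    (desirable for MAE, whose magnitude the text reports as a reduction).
    """
    return (hybrid_value - single_value) / single_value * 100.0


if __name__ == "__main__":
    # Hypothetical single-model and hybrid-model scores, for illustration only.
    print(round(relative_change_pct(0.80, 0.92), 2))  # R: 15.0
    print(round(relative_change_pct(2.0, 1.5), 2))    # MAE: -25.0, i.e. a 25% reduction
```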

MWFs had the largest effect on the GP model, and the TOPSIS ranking confirmed this. Although the single GP model ranks low, the hybrid of GP and MWFs ranks second, behind only the hybrid of M5T and MWFs.

Among the single AI-based models, M5T has the highest ranking, and among the hybrid models, W5M5T has the highest ranking. The results also show that sym7 is the best MWF, whereas Nourani et al. (2019) reported db10 as the best MWF. This research therefore concludes that W5M5T is the best method for estimating daily flow discharge. In addition, M5T has a shorter run time than the other methods. In general, combining AI-based models with MWFs improves the performance and accuracy of single AI-based models, and this can help designers simulate and monitor daily flow discharges in different watersheds.
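The general idea of a wavelet hybrid can be illustrated as follows. Each input series is split into a smooth (approximation) component and a detail component, and these sub-series are fed to the AI-based model in place of the raw input. The Python sketch below swaps in a one-level Haar multiresolution split because it needs no filter tables. It is not the coif4, db10, dmey, fk6 or sym7 decomposition used in the study; those need wavelet filter coefficients, for example from PyWavelets. The decomposition level and input layout here are also assumptions, and the function names are hypothetical. The inputs follow the Qt-1 and Pt combination chosen by stepwise regression.

```python
def haar_mra_level1(series):
    """One-level Haar multiresolution split of an even-length series.

    Returns (approx, detail), both as long as `series`, with
    approx[i] + detail[i] == series[i] (up to floating-point rounding).
    """
    if len(series) % 2:
        raise ValueError("this sketch needs an even-length series")
    approx, detail = [], []
    for k in range(0, len(series), 2):
        mean = (series[k] + series[k + 1]) / 2.0       # low-pass (smooth) part
        half_diff = (series[k] - series[k + 1]) / 2.0  # high-pass (detail) part
        approx += [mean, mean]
        detail += [half_diff, -half_diff]
    return approx, detail


def hybrid_inputs(discharge, precip):
    """Build feature rows [A(Q)_{t-1}, D(Q)_{t-1}, A(P)_t, D(P)_t] with target Q_t."""
    if len(discharge) != len(precip):
        raise ValueError("discharge and precipitation must have equal length")
    aq, dq = haar_mra_level1(discharge)
    ap, dp = haar_mra_level1(precip)
    rows, targets = [], []
    for t in range(1, len(discharge)):
        rows.append([aq[t - 1], dq[t - 1], ap[t], dp[t]])
        targets.append(discharge[t])
    return rows, targets


if __name__ == "__main__":
    q = [4.0, 2.0, 5.0, 7.0, 6.0, 3.0]  # hypothetical daily discharge
    p = [0.0, 1.0, 3.0, 0.0, 0.0, 2.0]  # hypothetical daily precipitation
    X, y = hybrid_inputs(q, p)
    for row, target in zip(X, y):
        print(row, "->", target)
```

Each Haar pair is formed from day t and day t+1, so the decomposition sees one step ahead. A strictly causal forecasting setup would need a different decomposition scheme.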

DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

REFERENCES

Abdollahi, S., Raeisi, J., Khalilianpour, M., Ahmadi, F. & Kisi, O. 2017 Daily mean streamflow prediction in perennial and non-perennial rivers using four data driven techniques. Water Resources Management 31 (15), 4855–4874. https://doi.org/10.1007/s11269-017-1782-7.

Adib, A., Kalaee, M. M. K., Shoushtari, M. M. & Khalili, K. 2017 Using of gene expression programming and climatic data for forecasting flow discharge by considering trend, normality, and stationarity analysis. Arabian Journal of Geosciences 10 (9), 208. https://doi.org/10.1007/s12517-017-2995-z.

Adib, A., Lotfirad, M. & Haghighi, A. 2019 Using uncertainty and sensitivity analysis for finding the best rainfall-runoff model in mountainous watersheds (case study: the Navrood watershed in Iran). Journal of Mountain Science 16 (3), 529–541. https://doi.org/10.1007/s11629-018-5010-6.

Adnan, R. M., Liang, Z., Trajkovic, S., Zounemat-Kermani, M., Li, B. & Kisi, O. 2019 Daily streamflow prediction using optimally pruned extreme learning machine. Journal of Hydrology 577, 123981. https://doi.org/10.1016/j.jhydrol.2019.123981.

Alizadeh, M. J., Nourani, V., Mousavimehr, M. & Kavianpour, M. R. 2018 Wavelet-IANN model for predicting flow discharge up to several days and months ahead. Journal of Hydroinformatics 20 (1), 134–148. https://doi.org/10.2166/hydro.2017.142.

Balioti, V., Tzimopoulos, C. & Evangelides, C. 2018 Multi-criteria decision making using TOPSIS method under fuzzy environment: application in spillway selection. Proceedings 2 (11), 637. https://doi.org/10.3390/proceedings2110637.

Cramer, N. L. 1985 A representation for the adaptive generation of simple sequential programs. In: Proceedings of an International Conference on Genetic Algorithms and Their Applications (J. Grefenstette, ed.). Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp. 183–187.

Erdal, H. I. & Karakurt, O. 2013 Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms. Journal of Hydrology 477, 119–128. https://doi.org/10.1016/j.jhydrol.2012.11.015.

Ferreira, C. 2006 Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2nd edn. Springer, Berlin, Germany. https://doi.org/10.1007/3-540-32849-1.

Garcia-Prats, A., González-Sanchis, M., Del Campo, A. D. & Lull, C. 2018 Hydrology-oriented forest management trade-offs. A modeling framework coupling field data, simulation results and Bayesian Networks. Science of the Total Environment 639, 725–741. https://doi.org/10.1016/j.scitotenv.2018.05.134.

Hamaamin, Y. A., Nejadhashemi, A. P., Zhang, Z., Giri, S. & Woznicki, S. 2016 Bayesian regression and neuro-fuzzy methods reliability assessment for estimating streamflow. Water 8 (7), 287. https://doi.org/10.3390/w8070287.

Hwang, C. L. & Yoon, K. 1981 Multiple Attribute Decision Making: Methods and Applications. Springer-Verlag, New York, USA. https://doi.org/10.1007/978-3-642-48318-9.

Jang, J.-S. R. 1993 ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics 23 (3), 665–685. https://doi.org/10.1109/21.256541.

Koza, J. R. 1994 Genetic programming as a means for programming computers by natural selection. Statistics and Computing 4, 87–112. https://doi.org/10.1007/BF00175355.

Nourani, V., Komasi, M. & Mano, A. 2009 A multivariate ANN-wavelet approach for rainfall-runoff modeling. Water Resources Management 23 (14), 2877–2894.

Nourani, V., Komasi, M. & Alami, M. T. 2012 Hybrid wavelet–genetic programming approach to optimize ANN modeling of rainfall–runoff process. Journal of Hydrologic Engineering 17 (6), 724–741. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000506.

Nourani, V., Tajbakhsh, A. D., Molajou, A. & Gokcekus, H. 2019 Hybrid wavelet-M5 model tree for rainfall-runoff modeling. Journal of Hydrologic Engineering 24 (5), 04019012. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001777.

Quinlan, J. R. 1992 Learning with continuous classes. In: AI'92: Proceedings of the 5th Australian Joint Conference on Artificial Intelligence (A. Adams & L. Sterling, eds). World Scientific, Singapore, pp. 343–348.

Rezaie-Balf, M., Kim, S., Fallah, H. & Alaghmand, S. 2019 Daily river flow forecasting using ensemble empirical mode decomposition based heuristic regression models: application on the perennial rivers in Iran and South Korea. Journal of Hydrology 572, 470–485. https://doi.org/10.1016/j.jhydrol.2019.03.046.

Santos, C. A. G., Freire, P. K. M. M., da Silva, R. M. & Akrami, S. A. 2019 Hybrid wavelet neural network approach for daily inflow forecasting using tropical rainfall measuring mission data. Journal of Hydrologic Engineering 24 (2), 04018062. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001725.

Shamshirband, S., Hashemi, S., Salimi, H., Samadianfard, S., Asadi, E., Shadkani, S., Kargar, K., Mosavi, A., Nabipour, N. & Chau, K. W. 2020 Predicting standardized streamflow index for hydrological drought using machine learning models. Engineering Applications of Computational Fluid Mechanics 14 (1), 339–350. https://doi.org/10.1080/19942060.2020.1715844.

Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B. & Vandewalle, J. 2002 Least Squares Support Vector Machines. World Scientific, Singapore. https://doi.org/10.1142/5089.

Tongal, H. & Booij, M. J. 2018 Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. Journal of Hydrology 564, 266–282. https://doi.org/10.1016/j.jhydrol.2018.07.004.

Vapnik, V. N. 2000 The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA. https://doi.org/10.1007/978-1-4757-3264-1.

Wagena, M. B., Goering, D., Collick, A. S., Bock, E., Fuka, D. R., Buda, A. & Easton, Z. M. 2020 Comparison of short-term streamflow forecasting using stochastic time series, neural networks, process-based, and Bayesian models. Environmental Modelling & Software 126, 104669. https://doi.org/10.1016/j.envsoft.2020.104669.

Yaseen, Z. M., Awadh, S. M., Sharafati, A. & Shahid, S. 2018 Complementary data-intelligence model for river flow simulation. Journal of Hydrology 567, 180–190. https://doi.org/10.1016/j.jhydrol.2018.10.020.

Zamani, R. & Berndtsson, R. 2019 Evaluation of CMIP5 models for west and southwest Iran using TOPSIS-based method. Theoretical and Applied Climatology 137 (1–2), 533–543. https://doi.org/10.1007/s00704-018-2616-0.