## Abstract

Accurate estimating of daily streamflow forecasting is one of the prominent topics in water resources activities. In this paper, an integrated method including decomposition technique based on the ensemble empirical mode decomposition (EEMD) combined with multivariate adaptive regression spline (MARS) was carried out to predict daily streamflow values. Daily streamflow value datasets collected from two stations in Iran (Gachsar and Kordkheyl) were selected. After dividing into calibration and validation datasets, each of them was decomposed by EEMD. Crow search algorithm (CSA) was used to optimize the MARS parameters (MARS-CSA). The performance of the integrated model (EEMD-MARS-CSA) was investigated by error indices (correlation coefficient (R), root mean squared error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), as well as RMSE to standard deviation ratio (RSR)). From the results, EEMD was an important tool for increasing model accuracy and EEMD-MARS-CSA outperformed other alternative methods for daily streamflow estimation. According to one-day-ahead flow forecasting, EEMD-MARS-CSA (R = 0.94, RMSE = 5.94 m^{3}/s (Kordkheyl) and R = 0.98, RMSE = 0.71 m^{3}/s (Gachsar)) outperformed EEMD-MT/MARS, MT, and MARS models. Furthermore, RSR criterion of EEMD-MARS-CSA was reduced by 18%, 16%, and 17% for 3-days, 1-week, and 2-weeks-ahead streamflow forecasting compared to MARS-CSA model, respectively, for Gachsar station.

## INTRODUCTION

Water is essential for irrigation and is becoming scarcer due to climate change and human activities (Duan *et al.* 2017; Zou *et al.* 2019). It is widely acknowledged that the number and diversity of water-related challenges are large and are expected to increase in the future. Therefore, hydrological modeling can be efficient in order to analyze, understand, and explore solutions for sustainable water management in order to support decision-makers and operational water managers (Yaseen *et al.* 2018; Yuan *et al.* 2018; Li *et al.* 2019). In this sense, determining a proper predictive approach to discharge both long- and short-term flow in various rivers is helpful in water resources management, because various parameters including spatial and temporal variations are influential in predicting streamflow. Moreover, some parameters, such as rainfall, which are used to develop predictive methods to estimate flow discharge, have stochastic and uncertain features and this is a challenging problem. In particular, when the methods are applied for estimating a purpose for some days/months ahead, the complexity of the problems highly increases (Rezaie-Balf *et al.* 2019b).

The estimation of hydrological variables such as flow discharge, suspended sediment loads, water quality parameters, etc., due to their complex (nonlinear) and non-deterministic characteristics, is a difficult and challenging task (Duan *et al.* 2013, 2016). In recent decades, artificial intelligence (AI) models (e.g., evolutionary polynomial regression (EPR) (Balf *et al*. 2018; Najafzadeh *et al.* 2018), adaptive neuro fuzzy inference system (ANFIS) (Dariane & Azimi 2016; Vetrivel & Elangovan 2017), gene expression programming (GEP) (Najafzadeh *et al.* 2016; Shahrara *et al.* 2017), extreme learning machine (ELM) (Adnan *et al.* 2017; Rezaie-Balf & Kisi 2018), model tree (MT) (Rahimikhoob 2016; Wang *et al.* 2017), and support vector machine (SVM) (Adnan *et al.* 2017; Kisi *et al.* 2017; Li *et al*. 2016) have been widely developed for solving dozens of environmental and water engineering problems. Moreover, different studies have been conducted by AI approaches for streamflow estimating (Kisi 2015; Humphrey *et al.* 2016; Partal 2016; Zhang *et al.* 2016; Karimi *et al.* 2017; Yaseen *et al.* 2017; Rezaie-Balf *et al*. 2019a, 2019b, 2019c). Recently, many studies have acclaimed multivariate adaptive regression spline (MARS) as a successful non-parametric regression procedure in various fields such as streamflow prediction (Adnan *et al.* 2019a, 2019b, 2019c; Al-Sudani *et al.* 2019), electricity demand forecasting (Al-Musaylh *et al.* 2018), prediction of scour depth under horizontal jets (Rezaie-Balf 2019), and suspended sediment load prediction (Adnan *et al.* 2019a, 2019b, 2019c). In addition, the crow search algorithm (CSA) is a nature-inspired metaheuristic approach to tackle difficult problems, as suggested by Askarzadeh (2016), and employed in Liu *et al.* (2017), Elaziz *et al.* (2019), Rezaie-Balf *et al*. (2019b), and Sun *et al.* (2017). In this study, the CSA is used to optimize the internal parameters as a hyperparameter tuning tool during the training stage. However, applying the robust techniques with acceptable accuracy to forecast streamflow is an important challenge because the time series streamflow dataset is generally nonlinear. In this way, using the pre-processing approaches may enhance the model performance (Wu *et al.* 2010).

More recently, a number of data pre-processing methods (e.g., continuous wavelet transform (CWT), maximum entropy spectral analysis (MESA), maximum overlap discrete wavelet transform (MODWT), wavelet multi-resolution analysis (WMRA), principal component analysis (PCA)) have been effective for enhancing the estimation accuracy (Hu *et al.* 2007; Wu *et al.* 2010; Sang *et al.* 2013; Benedetto *et al.* 2015; Deo *et al.* 2017; Rezaie-Balf *et al.* 2017, 2019d; Zakhrouf *et al.* 2018; Ghaemi *et al.* 2019). Moreover, three noise-assisted data analysis methods, that include empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), have been introduced by Wu & Huang (2009) and Huang *et al.* (1998), respectively, as self-adaptive data processing methods developed for non-stationary and non-linear and signal sequences (Huang & Wu 2008; Hu *et al.* 2013). However, there is a serious mode mixing phenomenon in EMD that causes other algorithms to pre-process time series datasets. In this study, to solve the mode mixing problem, a noise-assisted data analysis method called ensemble empirical mode decomposition (EEMD) is adopted instead of EMD (Rezaie-Balf *et al*. 2019b). Various studies have used EMD and EEMD in predicting hydrologic time series phenomenon such as participation and streamflow (Napolitano *et al.* 2011; Karthikeyan & Kumar 2013; Wang *et al.* 2013; Barge & Sharif 2016). Several successful applications of EMD and EEMD have been reported to forecast river flow (Karthikeyan & Kumar 2013; Barge & Sharif 2016). For instance, Napolitano *et al.* (2011) explored several aspects of ANN for forecasting daily streamflow using EMD. They showed that the analysis using the EMD technique increases ANN prediction accuracy and reliability. Likewise, Wang *et al.* (2013) applied EEMD as an adaptive data analysis tool for decomposing an annual rainfall dataset based on a SVM model. They reported the developed model's efficiency and capability using optimal SVM parameters for runoff forecasting. According to the reviewed literature, it is clearly observed that most studies used sole models for streamflow forecasting purposes, in which many of those models could not provide a reliable solution for long-term forecasting of sreamflow. In addition, using heuristic algorithms in order to optimize models' parameters is still scarce. Based on the present survey, having a proper course of action to forecast river flow reliably, especially a long-term period, can be a necessity.

Applying hybrid approaches for daily streamflow forecasting has become enormously popular since they have the advantages of different methods. Consequently, the purpose of this study was to employ a decomposition-based model tree algorithm and multivariate adaptive regression spline for daily river flow forecasting at two different stations, Kordkheyl and Gachsar, in Mazandaran and Karaj basin, respectively. This work is one of the first attempts, to the authors' knowledge, that uses the ensemble empirical mode decomposition as the primary decomposition technique to overcome and to diminish noise, trend, and stochastic behaviors of the data in order to enhance the prediction of daily streamflow records with random patterns and fluctuations. The main contribution of this work is to utilize an ensemble hybrid model for forecasting short-term (i.e., single-day-ahead) and long-term daily flow of the foresaid stations. For this purpose, the CSA was considered to optimize the parameters of the MARS to increase forecasting accuracy and prevent time consumption. Then, to achieve a more reliable model, input/output variables were decomposed by the proposed pre-processing approach (e.g., EEMD) for better frequency resolution within the time series data. By incorporating a long-term measured database of climatic predictor parameters as the input, the new hybrid models were evaluated for streamflow prediction at the mentioned stations. The contribution of this research is coupling the EEMD method with the MARS-CSA model to forecast daily streamflow, which is first applied for multi-step-ahead forecasting of streamflow at daily scale.

The rest of this research has been divided into different sections. First, the methodology of EEMD, MARS, and MARS-CSA models are explained. Then, some statistical performance criteria to evaluate the accuracy of proposed models are introduced. The case study and used dataset are described in the next step. Finally, the results of the proposed models are compared for streamflow predicting, and a conclusion of the outline results is given.

## METHODS

### Ensemble empirical mode decomposition (EEMD)

The ensemble empirical mode decomposition (EEMD) as an enhanced EMD data pre-processing technique is a kind of mathematical function which indicates a non-stationary and non-linear signal of the original dataset, introduced for the first time by Wu & Huang (2009). Decomposing the original data to a small and finite number of oscillatory modes based on the local characteristic time scales, is one of the prominent EEMD features (Huang *et al.* 1998). The oscillatory modes are expressed by components of intrinsic mode functions (IMFs) embedded in the data. The EMD is a self-adaptive time-frequency method, without initial information about the nature, and each IMF represents a signal as a sum of mean-zero well-behaved fast and slow oscillation modes relevant to IMFs provided by two strict principles as (Huang *et al.* 1998; Wu & Huang 2009): (i) throughout the whole length, the number of extrema and zero-crossings should either be equal or vary from one; and (ii) in each point, the average value of the local extrema points (e.g., minimum and maximum) equals zero. Based on the aforementioned characteristics, some of the important IMFs can be well expressed. Generally, an IMF represents a simple oscillatory mode compared to the simple harmonic function. According to this description, a shifting process of original daily streamflow time series dataset can be expressed using six steps (Huang *et al.* 1998):

Step 1: Compute local extrema points of the provided time series y(t).

Step 2: Join these extrema points to form an upper (e

_{max}(t)) and a bottom (e_{min}(t)) envelope with spline interpolation.Step 5: If h(t) meets the two IMF properties of the IMF regarding the stopping benchmark, h(t) is selected as the first IMF written as c1(t) in which 1 is its index; else, y(t) is replaced with h(t) and steps 1–4 are repeated until h(t) meets the two IMF features.

- Step 6: The remaining r
_{1}(t) = y(t) − c_{1}(t) is selected as the new dataset, subjected to the same shifting procedure expressed for other IMF from r_{1}(t). In this step, the shifting procedure is stopped if the residue r(t) has a steady behavior or has one local extrema point from which no more IMF may be produced (Huang et al*.*2003). Finally, in this procedure, the original signal y(t) can be built from the IMFs and the remaining aggregation is as follows:

_{n}(t) shows the final residual, and c

_{i}(t) are nearly orthogonal to each other, in which the mean of them is zero. More information about the EMD approach and the stopping benchmark have been provided in Huang

*et al.*(1998, 2003). Therefore, the results present that EMD is still unstable due to the mode mixing disadvantage (Wu & Huang 2009). Mode mixing is either a single IMF containing various components of disparate scales or a similar scale residing in different IMFs (Lei

*et al.*2009). In order to overcome this disadvantage in EMD, the EEMD as an enhanced technique, is modified. Based on this case, the whole space of time-frequency becomes uniformly filled by adding the white noise which can simplify a natural separation of the frequency scales and decay the mode mixing occurrence.

### Multivariate adaptive regression spline (MARS)

*et al.*2017; Kisi

*et al.*2017; Najafzadeh & Ghaemi 2019).

### Crow search algorithm (CSA)

Crows are one of the most intelligent creatures among the category of birds and animals. They have the largest brain relative to their body size and due to this ratio, their brain is slightly inferior than that owned by humans. Crows have shown self-awareness in mirror tests, and also, can make tools. They are capable of memorizing faces, using tools, communicating in sophisticated ways, and hiding and retrieving food during different seasons. This behavior allows them to discover where the others hide their food, and steal it when the owner leaves. If a crow realizes that it is being followed by another crow, it tries to mislead the follower by flying to a different location. Considering this, Askarzadeh (2016) proposed a novel EA named the crow search algorithm (CSA) for solving complex problems of optimization. The principles of this algorithm can be given as follows:

they (crows) live in the form of flocks;

they memorize the places that they have hidden their food;

they follow each other to conduct thievery;

crows preserve their caches from being pilfered by a probability awareness.

Similar to other population-based algorithms, CSA starts the optimization process with a dimensional environment, comprising a number of crows. The crow number (or population size) is *N* and the crow position *i* at time (iteration) in the search space is determined by a vector , where and . *iter _{max}* is the maximum number of iterations. Each crow can memorize the position of its hiding place. In fact, the best hiding place that each crow has experienced is saved in the crow's memory. Then, the position of the crow's hiding place

*i*at iteration

*iter*, is the memory of that crow and is shown by

*m*

^{i}^{,iter}. Two situations may happen when at iteration

*iter*, crow

*j*flies to its hiding place (

*m*

^{j}^{,iter}) and crow

*i*wants to follow crow

*j*to find crow

*j*'s hiding place:

- If crow
*j*does not know that crow*i*is following it, crow*i*will find the hiding place of crow*j*and a new position of crow*i*is obtained as follows:where*r*is a random number with the uniform distribution between 0 and 1,_{i}*fl*^{i}^{,iter}is the flight length of crow*i*in iteration*iter*. Selecting the value of*fl*less than 1 leads to a local search and obtains the next position of crow*i*between*x*^{i}^{,iter}and*m*; and by selecting the value of^{j,iter}*fl*greater than 1, a global search will be expected that leads to the next position of crow*i*obtained far from*x*and which may exceed^{i,iter}*m*.^{j,iter} If crow

*j*knows that crow*i*is following it, to protect the hiding place of its food source, it will mislead crow*i*by going to another position in the environment.

*r*is a random number with a uniform distribution between 0 and 1 and

_{j}*AP*is the awareness probability of crow

^{i,iter}*j*at iteration

*iter*. The role of this parameter in the CSA is to balance the intensification and diversification in order to increase the intensification using small values for awareness probability by searching a local space and by increasing the awareness probability value, CSA tends to explore the searching space on the global scale. More details on the implementation of the CSA can be found in Askarzadeh (2016).

### Multivariate adaptive regression splines trained by crow search algorithm (MARS-CSA)

In computer science, choosing the parameters of data-driven methods (e.g., ANN, ANFIS, and SVM) is a crucial trend for improving the performance of methods. For instance, the number of hidden layers or the weight and bias parameters are important parameters for optimizing ANN models. Even if the model performs well in an adverse problem, because of incorrect choice of optimized parameters, the result may be a worse performance than that expected. This technique to find optimal parameters is combining prior experiences with a limited heuristic search of possibly optimal solutions which is time-consuming for the user and may not address the issue. Therefore, using a meta-heuristic method can be a useful tool, as one no longer has to count on in-depth experience regarding the application of each model to the problem at hand.

MARS as a data-driven model depends on parameters, namely, penalty parameter *d,* maximum basis function *M*_{max}, and interaction *m _{i}*. However, it is difficult to find the optimal parameters which can improve the accuracy of the MARS model simultaneously because of the large range of choices, and suitable values may still lie outside suggested ranges. In this way, this study attempts to exhibit the usage of integrated MARS and crow search algorithm (MARS-CSA) as a method for assisting users encountering this challenging issue (Figure 1). The CSA was preferable in the present study due to its advantages compared to some metaheuristic algorithms. One of the important reasons for using the CSA is that like other well-known algorithms such as genetic algorithm, particle swarm optimization, and harmony search, the CSA makes use of a population of seekers to explore the search space. By use of a population, the probability of finding a good solution and escaping from local optima increases. In addition to population size and maximum number of iterations (generations), optimization algorithms have some other parameters which should be adjusted. Parameter setting is one of the drawbacks of optimization algorithms since it is a time-consuming matter. In this regard, as mentioned by Askarzadeh (2016), unlike other algorithms, this algorithm works with just two parameters, flight length and awareness probability, which is one of the greatest advantages of the CSA.

_{calibration}and E

_{validation}indicate calibration and validation errors, respectively. In the aforementioned equation, root mean square error (RMSE) as the prediction error criteria (Equation (8)) is used. It is worth pointing out that the fitness function in Equation (9) is the trade-off between model complexity and its generalization. It is clear that over-fitting in models may increase from good training data fitness, so, the combination of calibration and validation error can create an optimal model that balances minimum calibration error and model generalizability. After that, the CSA is used to search for the best parameter setting values, including:

*M*

_{max},

*mi*, and

*d*. The optimization process is terminated when the stop criterion is desirable, and in this study, generation number is selected as the stop criterion. Finally, the optimal MARS model with the best fit of parameter settings is found when the stop criterion is performed. It explains that the trend of building MARS-CSA has been completed in the training process and is ready to verify for the testing dataset.

### Model assessment criteria

^{3}/s) and (m

^{3}/s) are the observed and forecasted values of daily streamflow, respectively; and are the average of observed and predicted values, respectively. STDEV denotes standard deviation of the observed values. The underlying performance criteria can be found as common statistical criteria from previous studies (Chang

*et al.*2017; Asteris

*et al.*2019; Kim

*et al.*2019; Rezaie-Balf

*et al*. 2019b), and each of them has been proposed for a specific purpose. These metrics can provide useful information on how models forecast short- and long-term streamflow.

### Case study and data analysis

Providing the hydro-meteorological dataset is one of the most prominent problems for hydrologists and water resources engineers (Jajarmizadeh *et al.* 2016). Applying data from different areas with various meteorological conditions (e.g., dry and temperate climate) can guarantee and improve the robustness of the proposed approaches. Hence, the daily streamflow measurements used were acquired from two stations, namely, Kordkheyl (Tajan River) and Gachsar (Karaj River) in Iran, that are important ones, and should be addressed for several reasons such as flood control for urban management, agriculture, and reservoir operation and allocation (see Figure 2).

Karaj, one of the river watersheds in Iran, has an 850 km^{2} area and a 146 km circumference on the southern slope of Mount Alborz. It is located between 52°2′ to 51°32′E and 35°52′ to 36°11′N. The area of this basin is about 84,894 hectares and reported average annual water flow for Karaj Dam is 472 million cubic meters. The catchment elevation varies between 1,600 and 4,200 m. The measured average daily discharge is 154.54 m^{3}/s. To measure the discharge of this watershed, ten hydrometry stations are used, and among the ten stations, one of them, namely, Gachsar, located above the Karaj dam, is considered for flow forecasting. In this study, the data are available from 1982 to 2008 for Gachsar station.

The Tajan River, one of the river basins of Iran with about 4,147 km^{2} area and 170 km length, is located between latitude 36°48′N and longitude 53°06′E. In the current research, the daily rainfall and runoff data of this station were used. The measured annual average streamflow of this catchment is 207.4 m^{3}/s. The elevation of the catchment differs between 26 and 3,728 m. The climate is semi and cold-humid and the annual rainfall and discharge are equal to 539 mm and 20 m^{3}/s, respectively (Rezaie-Balf *et al.* 2017). The meteorological data available for this station were downloaded from the Meteorological Organization of Mazandaran Province (MOMP, http://www.mazmet.ir/en/ contact-us), according to which, ten years, from 2003 to 2013, were considered for analysis in this study. Additionally, other details of parameter statistics for the two stations used for prediction purposes are provided in Table 1.

Station . | Crd . | El (m) . | X_{min}
. | X_{max}
. | X_{mean}
. | STDEV . | C_{v}
. | C_{sx}
. |
---|---|---|---|---|---|---|---|---|

Kordkheyl | 36° 25′ N, 53° 03′ E | 234 | 0 | 391 | 11.26 | 21.75 | 473.24 | 5.12 |

Gachsar | 36° 06′ N, 51° 19′ E | 2,258 | 0 | 61 | 3.93 | 3.99 | 15.96 | 2.34 |

Station . | Crd . | El (m) . | X_{min}
. | X_{max}
. | X_{mean}
. | STDEV . | C_{v}
. | C_{sx}
. |
---|---|---|---|---|---|---|---|---|

Kordkheyl | 36° 25′ N, 53° 03′ E | 234 | 0 | 391 | 11.26 | 21.75 | 473.24 | 5.12 |

Gachsar | 36° 06′ N, 51° 19′ E | 2,258 | 0 | 61 | 3.93 | 3.99 | 15.96 | 2.34 |

Crd, El, X_{min}, X_{max}, X_{mean}, STDEV, C_{sx}, and K_{x} denote the coordinates, elevation, minimum, maximum, mean, standard deviation, variation coefficient, and skewness coefficient, respectively.

The available data were divided into two stages: roughly 75% of the dataset was selected for the calibration stage and the remainder for the validation stage. To date, the public agencies have not applied integration of EEMD with the MARS model, with its parameters optimized by the CSA algorithm (MARS-CSA) to forecast daily streamflow at the two aforementioned stations in Iran.

## RESULTS AND DISCUSSION

### Input selection and model development

The main aim of this study is using the integrated EEMD-MARS-CSA model to predict daily streamflow at two stations, Kordkheyl and Gachsar. Therefore, the performance of the proposed models-based lag time variables for daily streamflow estimation was evaluated.

First, for developing EEM-based MARS-CSA, streamflow was selected as the target variable based on the lag time values for Kordkheyl and Gachsar stations. In this way, the partial autocorrelation function (PACF) and auto-correlation function (ACF) values of the time series dataset were calculated for determining the important input variables which have the highest effect on the target variable. The PACF analysis for the daily streamflow time series is shown in Figure 3.

After that, for developing the EEMD-based MARS-CSA model, the non-linear and non-stationary daily streamflow time series dataset was selected as calibration and validation datasets and decomposed into different components of IMFs and one residual values, in which high details to low frequencies of input data are provided. Finally, the precision of integrated MARS-CSA with EEMD method is investigated in prediction streamflow by assessment criteria for calibration and validation datasets, separately. Moreover, design parameters of the proposed techniques were considered constant and are shown in Table 2.

. | Design parameters . | |||||
---|---|---|---|---|---|---|

Model . | Max function . | GCV . | Self- interactions . | Max interactions . | Threshold . | Prune . |

MARS | 21–40 | 0, 2–4 | No | 3 | 1.0000 × 10^{−4} | Yes |

CSA | Population | Awareness probability | Flight length | – | – | – |

20 | 0.1 | 2 | – | – | – |

. | Design parameters . | |||||
---|---|---|---|---|---|---|

Model . | Max function . | GCV . | Self- interactions . | Max interactions . | Threshold . | Prune . |

MARS | 21–40 | 0, 2–4 | No | 3 | 1.0000 × 10^{−4} | Yes |

CSA | Population | Awareness probability | Flight length | – | – | – |

20 | 0.1 | 2 | – | – | – |

To summarize, the hybrid EEMD-MARS and MARS-CSA predictive models are applied based on the ‘decomposition and ensemble’ idea. The decomposition is used to simplify the prediction skill, while the ensemble is applied for formulating a consensus prediction on the original data. In this way, in this study, for verifying and making the pattern of the provided IMFs and residual components (e.g., IMF1, IMF2, IMF3 … Residual) to reflect the prediction model and for enhancing the accuracy of prediction, two daily streamflow time series datasets from two stations, Gachsar and Kordkheyl, in Iran, are applied for testing purpose.

Accordingly, a total of 3,653 and 9,672 daily streamflow time series datasets, respectively, from Kordkheyl and Gachsar stations, are employed for achieving this aim. Among total daily streamflow datasets, 75% was selected as calibration dataset and the remainder as validation dataset. A diagram of the developed EEMD decomposition-based MARS and MARS-CSA models is presented in Figure 4.

### One-day-ahead daily streamflow prediction

As demonstrated in Table 3, in the case of Kordkheyl station, during the calibration stage, the standalone MARS model had not outperformed the MARS-CSA, as evidenced by the lower accuracy values for the MARS model (*MAE**=* 4.439; *RSR**=* 0.499; *NSE**=* 0.75) compared to *MAE**=* 0.899, RSR = 0.447, and *NSE**=* 0.79 for the MARS-CSA model; whereas integration with EEMD leads to improving the accuracy significantly. For example, EEMD-MARS-CSA has lower error in terms of RMSE (16.24%) and RSR (16.33%).

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.86 | 0.88 | 0.92 | 0.92 | 0.89 | 0.91 | 0.92 | 0.94 |

RMSE (m^{3}/s) | 11.55 | 7.97 | 9.08 | 6.91 | 10.35 | 7.21 | 8.67 | 5.94 |

MAE (m^{3}/s) | 4.43 | 3.75 | 3.47 | 3.28 | 3.63 | 3.65 | 3.71 | 3.35 |

NSE | 0.75 | 0.77 | 0.84 | 0.83 | 0.79 | 0.82 | 0.85 | 0.87 |

RSR | 0.49 | 0.6 | 0.39 | 0.41 | 0.44 | 0.42 | 0.37 | 0.35 |

Gachsar station | ||||||||

R | 0.87 | 0.93 | 0.96 | 0.96 | 0.97 | 0.97 | 0.99 | 0.98 |

RMSE (m^{3}/s) | 1.84 | 1.47 | 0.95 | 1.01 | 0.84 | 0.92 | 0.53 | 0.71 |

MAE (m^{3}/s) | 0.51 | 0.64 | 0.25 | 0.35 | 0.23 | 0.31 | 0.13 | 0.29 |

NSE | 0.76 | 0.86 | 0.93 | 0.93 | 0.95 | 0.94 | 0.98 | 0.97 |

RSR | 0.48 | 0.36 | 0.25 | 0.25 | 0.22 | 0.23 | 0.12 | 0.17 |

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.86 | 0.88 | 0.92 | 0.92 | 0.89 | 0.91 | 0.92 | 0.94 |

RMSE (m^{3}/s) | 11.55 | 7.97 | 9.08 | 6.91 | 10.35 | 7.21 | 8.67 | 5.94 |

MAE (m^{3}/s) | 4.43 | 3.75 | 3.47 | 3.28 | 3.63 | 3.65 | 3.71 | 3.35 |

NSE | 0.75 | 0.77 | 0.84 | 0.83 | 0.79 | 0.82 | 0.85 | 0.87 |

RSR | 0.49 | 0.6 | 0.39 | 0.41 | 0.44 | 0.42 | 0.37 | 0.35 |

Gachsar station | ||||||||

R | 0.87 | 0.93 | 0.96 | 0.96 | 0.97 | 0.97 | 0.99 | 0.98 |

RMSE (m^{3}/s) | 1.84 | 1.47 | 0.95 | 1.01 | 0.84 | 0.92 | 0.53 | 0.71 |

MAE (m^{3}/s) | 0.51 | 0.64 | 0.25 | 0.35 | 0.23 | 0.31 | 0.13 | 0.29 |

NSE | 0.76 | 0.86 | 0.93 | 0.93 | 0.95 | 0.94 | 0.98 | 0.97 |

RSR | 0.48 | 0.36 | 0.25 | 0.25 | 0.22 | 0.23 | 0.12 | 0.17 |

Cal and Val denote the calibration and validation phases, respectively.

At Gachsar station, based on the calibration dataset, integrated EEMD, MARS model, and CSA algorithm (EEMD-MARS-CSA) with R (0.99), RMSE (0.533), and NSE (0.99) had better accuracy than other models such as the MARS-CSA model with R = 0.97, RMSE = 0.84, and NSE = 0.95 in daily streamflow prediction.

In the validation stage, the evaluation benchmarks for daily streamflow forecasting for Kordkheyl station using EEMD-MARS-CSA were found to indicate the maximum Nash–Sutcliffe efficiency (*NSE**=* 0.877) and correlation coefficient (*R**=* 0.942) as belonging to this model. Conversely, MARS with *R**=* 0.88 and *NSE**=* 0.778 did not lead to the same accuracy when compared with the other approaches (Table 3).

Like the Kordkheyl station, the EEMD-MARS-CSA model provided the lowest values of RMSE (0.716) and MAE (0.29) and these benchmarks demonstrate the EEMD-MARS-CSA had higher accuracy in forecasting daily streamflow at the Gachsar station. Figure 5 indicates the scatter plots of daily streamflow estimated and observed values for Kordkheyl and Gachsar stations. It is obvious that the EEMD and CSA algorithm have remarkable influence on the model performances and the slopes of EEMD-MARS-CSA are closer to the perfect line compared to the alternative models and most of the daily streamflow estimations are also underestimated. Figure 6(a) and 6(b) show the time series plot of observed and predicted values of daily flow river for the validation period; it can be said that the daily streamflow peaks are predicted much better by EEMD-MARS-CSA than other models.

### Multi-day-ahead daily streamflow prediction

It seems that the EEMD pre-processing approach could enhance the performances of the MARS-CSA model effectively for various time steps at both stations. The hybrid techniques depicted reliable values for estimation of daily streamflow up to three-day-ahead using permissible values which can provide a good indication for one- and two-weeks-ahead. Therefore, the performance of the proposed models is evaluated in the sections below:

#### Three-day-ahead daily streamflow prediction

At Kordkheyl station, it can be seen from MARS-CSA and the standalone MARS model that the performance of the MARS model was roughly similar to that of the MARS-CSA model (R = 0.79, NSE = 0.61, and RSR = 0.61) in streamflow forecasting in the calibration stage. Furthermore, the hybrid approaches (EEMD-MARS-CSA and EEMD-MARS) performed better in comparison with MARS-CSA and MARS, respectively. For instance, RMSE and RSR criteria of EEMD-MARS was reduced by 10.44%, and 10.85% (Table 4).

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.79 | 0.81 | 0.83 | 0.85 | 0.795 | 0.83 | 0.9 | 0.91 |

RMSE (m^{3}/s) | 14.27 | 9.94 | 12.78 | 9.31 | 14.28 | 9.76 | 9.98 | 7.21 |

MAE (m^{3}/s) | 5.89 | 5.39 | 7.39 | 5.85 | 7.84 | 5.84 | 4.28 | 3.66 |

NSE | 0.62 | 0.65 | 0.69 | 0.69 | 0.62 | 0.67 | 0.81 | 0.82 |

RSR | 0.61 | 0.58 | 0.55 | 0.54 | 0.617 | 0.57 | 0.43 | 0.42 |

Gachsar station | ||||||||

R | 0.91 | 0.89 | 0.96 | 0.93 | 0.95 | 0.92 | 0.98 | 0.95 |

RMSE (m^{3}/s) | 1.25 | 1.88 | 1.11 | 1.47 | 1.19 | 1.55 | 0.7 | 1.25 |

MAE (m^{3}/s) | 0.35 | 0.94 | 0.42 | 0.67 | 0.45 | 0.71 | 0.22 | 0.61 |

NSE | 0.87 | 0.779 | 0.91 | 0.86 | 0.9 | 0.85 | 0.96 | 0.9 |

RSR | 0.56 | 0.47 | 0.29 | 0.36 | 0.31 | 0.38 | 0.18 | 0.31 |

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.79 | 0.81 | 0.83 | 0.85 | 0.795 | 0.83 | 0.9 | 0.91 |

RMSE (m^{3}/s) | 14.27 | 9.94 | 12.78 | 9.31 | 14.28 | 9.76 | 9.98 | 7.21 |

MAE (m^{3}/s) | 5.89 | 5.39 | 7.39 | 5.85 | 7.84 | 5.84 | 4.28 | 3.66 |

NSE | 0.62 | 0.65 | 0.69 | 0.69 | 0.62 | 0.67 | 0.81 | 0.82 |

RSR | 0.61 | 0.58 | 0.55 | 0.54 | 0.617 | 0.57 | 0.43 | 0.42 |

Gachsar station | ||||||||

R | 0.91 | 0.89 | 0.96 | 0.93 | 0.95 | 0.92 | 0.98 | 0.95 |

RMSE (m^{3}/s) | 1.25 | 1.88 | 1.11 | 1.47 | 1.19 | 1.55 | 0.7 | 1.25 |

MAE (m^{3}/s) | 0.35 | 0.94 | 0.42 | 0.67 | 0.45 | 0.71 | 0.22 | 0.61 |

NSE | 0.87 | 0.779 | 0.91 | 0.86 | 0.9 | 0.85 | 0.96 | 0.9 |

RSR | 0.56 | 0.47 | 0.29 | 0.36 | 0.31 | 0.38 | 0.18 | 0.31 |

In the case of Gachsar station, despite lower computed MAE value for the MARS model (MAE = 0.35) than MARS-CSA (MAE = 0.499), the MARS model did not perform as well as the MARS-CSA model for the calibration dataset. MARS-CSA had a lower percentage error for daily streamflow prediction in terms of RSR (44.12%) and RMSE (5.02%) than the MARS model (RSR = 0.562 and RMSE = 1.253).

In the validation stage, results from Table 4 demonstrate the efficiency of the EEMD-based MARS-CSA model in improving the forecasting accuracy compared to MARS-CSA and MARS models for both proposed stations. On the other hand, the error evaluation NSE of the EEMD-MARS-CSA model was increased by 22.60%, and 6.24% compared to the MARS-CSA model for Kordkheyl and Gachsar stations, respectively.

The scatter plots and the time series between forecasted and observed daily streamflow values for Gachsar and Kordkheyl stations in the validation stage are provided by Figures 7 and 8. According to the best fit line, the integration of EEMD with MARS and MARS-CSA models causes the estimation accuracy to rise significantly and the slope is close to the ideal line. The three-day-ahead EEMD-MARS-CSA and EEMD-MARS methods underperform while the predicted extreme values of the observed time series compared to the corresponding one-day-ahead ones. The observation of this analysis was that the three-day-ahead MARS model does not have an acceptable performance as good as EEMD-based MARS and MARS-CSA models (Figure 8).

#### One-week-ahead daily streamflow prediction

This trend has been done for daily streamflow prediction for one- and two-week ahead in calibration and validation stages. According to Table 5, for the calibration dataset, among the proposed models, the MARS model provided the lowest accuracy in daily streamflow predicting at Kordkheyl station (R = 0.73, MAE = 6.98, and NSE = 0.526) compared to other models such as the EEMD-MARS model (R = 0.8, MAE = 7.506, and NSE = 0.638).

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.73 | 0.76 | 0.8 | 0.845 | 0.76 | 0.8 | 0.845 | 0.89 |

RMSE (m^{3}/s) | 15.92 | 11.09 | 13.908 | 9.2 | 15.228 | 10.26 | 12.359 | 7.93 |

MAE (m^{3}/s) | 6.98 | 6.165 | 7.506 | 5.65 | 7.96 | 6.013 | 4.64 | 3.88 |

NSE | 0.526 | 0.572 | 0.638 | 0.705 | 0.566 | 0.633 | 0.714 | 0.78 |

RSR | 0.688 | 0.653 | 0.601 | 0.542 | 0.658 | 0.604 | 0.53 | 0.467 |

Gachsar station | ||||||||

R | 0.88 | 0.81 | 0.914 | 0.88 | 0.9 | 0.86 | 0.929 | 0.912 |

RMSE (m^{3}/s) | 1.766 | 2.433 | 1.525 | 1.879 | 1.652 | 2.043 | 1.458 | 1.714 |

MAE (m^{3}/s) | 0.832 | 1.225 | 0.725 | 1.241 | 0.725 | 1.121 | 0.728 | 1.06 |

NSE | 0.781 | 0.63 | 0.837 | 0.78 | 0.808 | 0.741 | 0.85 | 0.81 |

RSR | 0.467 | 0.605 | 0.403 | 0.467 | 0.437 | 0.508 | 0.385 | 0.426 |

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.73 | 0.76 | 0.8 | 0.845 | 0.76 | 0.8 | 0.845 | 0.89 |

RMSE (m^{3}/s) | 15.92 | 11.09 | 13.908 | 9.2 | 15.228 | 10.26 | 12.359 | 7.93 |

MAE (m^{3}/s) | 6.98 | 6.165 | 7.506 | 5.65 | 7.96 | 6.013 | 4.64 | 3.88 |

NSE | 0.526 | 0.572 | 0.638 | 0.705 | 0.566 | 0.633 | 0.714 | 0.78 |

RSR | 0.688 | 0.653 | 0.601 | 0.542 | 0.658 | 0.604 | 0.53 | 0.467 |

Gachsar station | ||||||||

R | 0.88 | 0.81 | 0.914 | 0.88 | 0.9 | 0.86 | 0.929 | 0.912 |

RMSE (m^{3}/s) | 1.766 | 2.433 | 1.525 | 1.879 | 1.652 | 2.043 | 1.458 | 1.714 |

MAE (m^{3}/s) | 0.832 | 1.225 | 0.725 | 1.241 | 0.725 | 1.121 | 0.728 | 1.06 |

NSE | 0.781 | 0.63 | 0.837 | 0.78 | 0.808 | 0.741 | 0.85 | 0.81 |

RSR | 0.467 | 0.605 | 0.403 | 0.467 | 0.437 | 0.508 | 0.385 | 0.426 |

In terms of Gachsar station, regarding Table 5, the model built by integration of EEMD and MARS-CSA could forecast the daily streamflow calibration values with the highest accuracy in terms of R = 0.929, RMSE = 1.458, and RSR = 0.85. Moreover, EEMD caused the performance of the MARS model (NSE = 0.837, RMSE = 1.525) to be higher than the MARS-CSA model (NSE = 0.808, RMSE = 1.652).

During the validation stage, the EEMD and CSA techniques have high effectiveness on the model accuracy. EEMD and CSA-based MARS approach outperformed the other models in forecasting daily streamflow at the Kordkheyl and Gachsar stations. At Kordkheyl station, the EEMD-MARS-CSA (*MAE* = 3.88, *NSE* = 0.780) had the best performance in daily streamflow prediction. After that, EEMD-MARS with a higher percentage error compared to EEMD-MARS-CSA (*MAE**=* 45.61%, *NSE**=* 9.61%) was next in rank.

For the validation stage at Gachsar station, the EEMD-MARS-CSA also created the lowest error values of RSR (0.426) and MAE (1.06) for daily streamflow, which indicated the ability of EEMD-MARS-CSA as a suitable technique to forecast daily streamflow. Moreover, the MARS model with R = 0.81 and RSR = 0.605 did not necessarily estimate daily streamflow as accurately as the others.

Figure 9 presents the scatter plots of forecasted daily streamflow time series against the observed daily streamflow time series of one-week-ahead models pertaining to Gachsar and Koedkheyl stations. The scatter and line plots portray the superiority of the EEMD-MARS-CSA model. Scatter plots illustrate that the EEMD-MARS-CSA model approximately has the best scattered and the lowest error compared to EEMD-MARS, MARS-CSA, and MARS models for daily streamflow forecasting. It should be noted that due to avoiding the high volume of content of DRF prediction, the time series plot for one- and two-week-ahead have not been explained.

#### Two-week-ahead daily streamflow prediction

Likewise, exhibiting more accurate daily streamflow prediction using EEMD-MARS-CSA for one-, three-, and seven-day-ahead, the accuracy of this model in daily streamflow predicting for two-weeks-ahead, was evaluated. Based on the result, it can be concluded that the MARS model had poorer performance for the calibration dataset at both the proposed stations. The evaluation indices *R*, *RMSE*, and *RSR* of the MARS were, respectively, 0.72, 16.24, and 0.702 for Kordkheyl station and 0.83, 2.165, and 0.572 for Gachsar station (Table 6).

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.72 | 0.74 | 0.78 | 0.82 | 0.744 | 0.77 | 0.85 | 0.876 |

RMSE (m^{3}/s) | 16.24 | 11.404 | 14.501 | 9.912 | 16.023 | 10.85 | 12.206 | 8.99 |

MAE (m^{3}/s) | 6.97 | 6.501 | 7.345 | 5.866 | 7.621 | 6.33 | 4.787 | 4.24 |

NSE | 0.5 | 0.55 | 0.607 | 0.658 | 0.52 | 0.59 | 0.721 | 0.72 |

RSR | 0.702 | 0.67 | 0.626 | 0.583 | 0.692 | 0.639 | 0.527 | 0.529 |

Gachsar station | ||||||||

R | 0.83 | 0.766 | 0.87 | 0.84 | 0.84 | 0.816 | 0.91 | 0.88 |

RMSE (m^{3}/s) | 2.165 | 2.679 | 1.83 | 2.218 | 2.026 | 2.401 | 1.589 | 1.99 |

MAE (m^{3}/s) | 1.269 | 1.748 | 0.94 | 1.48 | 1.091 | 1.554 | 0.554 | 1.501 |

NSE | 0.671 | 0.574 | 0.76 | 0.708 | 0.712 | 0.657 | 0.823 | 0.763 |

RSR | 0.572 | 0.65 | 0.484 | 0.54 | 0.536 | 0.584 | 0.421 | 0.486 |

Models . | Statistical error indices . | |||||||
---|---|---|---|---|---|---|---|---|

MARS . | EEMD-MARS . | MARS-CSA . | EEMD-MARS-CSA . | |||||

Cal . | Val . | Cal . | Val . | Cal . | Val . | Cal . | Val . | |

Kordkheyl station | ||||||||

R | 0.72 | 0.74 | 0.78 | 0.82 | 0.744 | 0.77 | 0.85 | 0.876 |

RMSE (m^{3}/s) | 16.24 | 11.404 | 14.501 | 9.912 | 16.023 | 10.85 | 12.206 | 8.99 |

MAE (m^{3}/s) | 6.97 | 6.501 | 7.345 | 5.866 | 7.621 | 6.33 | 4.787 | 4.24 |

NSE | 0.5 | 0.55 | 0.607 | 0.658 | 0.52 | 0.59 | 0.721 | 0.72 |

RSR | 0.702 | 0.67 | 0.626 | 0.583 | 0.692 | 0.639 | 0.527 | 0.529 |

Gachsar station | ||||||||

R | 0.83 | 0.766 | 0.87 | 0.84 | 0.84 | 0.816 | 0.91 | 0.88 |

RMSE (m^{3}/s) | 2.165 | 2.679 | 1.83 | 2.218 | 2.026 | 2.401 | 1.589 | 1.99 |

MAE (m^{3}/s) | 1.269 | 1.748 | 0.94 | 1.48 | 1.091 | 1.554 | 0.554 | 1.501 |

NSE | 0.671 | 0.574 | 0.76 | 0.708 | 0.712 | 0.657 | 0.823 | 0.763 |

RSR | 0.572 | 0.65 | 0.484 | 0.54 | 0.536 | 0.584 | 0.421 | 0.486 |

In terms of validation stage and considering Table 6, EEMD-MARS-CSA and MARS models had the maximum and minimum level of accuracy, respectively, at both Kordkheyl and Gachsar stations. For instance, at Koedkheyl station, the EEMD-MARS-CSA model outperformed all other models in the case of *RMSE* (8.99) and *NSE* (0.72). The MARS model had the lowest accurate RMSE (11.4) and RSR (0.67) compared to others such as the MARS-CSA model (RMSE = 10.85, RSR = 0.639).

In addition, MARS with R = 0.766 and NSE = 0.574 could not estimate daily streamflow values as well as EEMD-MARS (R = 0.84 and NSE = 0.708), MARS-CSA (R = 0.816 and NSE = 0.657) and EEMD-MARS-CSA (R = 0.88 and NSE = 0.763) at Gachsar station (Table 6). Figure 10 presents the scatter plots for the forecasted versus daily observed streamflow at the Gachsar and Kordkheyl stations for the proposed models. According to this figure, MARS has more dispersed scattered estimates than other proposed methods concerning daily streamflow forecasting; whereas, daily streamflow predicted values by the hybrid EEMD process with the MARS-CSA model were closer to the perfect line compared with the standalone MARS model.

As discussed, streamflow is characterized by complex phenomena and it is very challenging to comprehend perfectly with regard to forecasting the future pattern due to its dynamical, non-linear, and non-stationarity behavior. However, a large amount of forecasting approaches have been proposed for streamflow prediction; all of these methods have improved the precision of streamflow forecasting to some extent. According to the above study results, it is manifest that CSA, by determining the optimum values of the internal parameters of the MARS, could highly improve short-term (one- and three-day-ahead) forecasting. A remarkable performance improvement can be observed for all indices, in which the MARS-CSA model for Kordkheyl and Gachsar can decrease prediction error by 10% and 37%, respectively, in terms of RMSE for one-day-ahead streamflow forecasting compared with standalone MARS. This value for three-day-ahead forecasting reached about 2% (Kordkheyl) and 18% (Gachsar) for the validation stage. Although the proposed MARS-CSA was capable of one- and three-day-ahead flow forecasting, the models have limited capacity and are not suitable for long-term (one and two-week-ahead) flow forecasting. In this sense, the EEMD, an advanced and effective decomposition technique, was applied to decompose raw streamflow series into several IMF signals for easy analysis and forecasting. The comparison results of EEMD-MARS-CSA and other models are given, which illustrate the superiority of the EEMD algorithm compared with the non-decomposed models.

According to the scatter plots, MARS and MARS-CSA models show an under-predicted performance in the calibration stage for both Kordkheyl and Gachsar stations, although this drawback improved by considering decomposition process of streamflow integratated with the EEMD algorithm.

The whole study revealed that the developed forecasting system produces great improvements when compared with the competitive models (MARS, MARS-CSA), which represents the developed decomposition strategy's contribution to the final success of the developed short- and long-term forecasting system, meaning that the developed decomposition strategy was superior to other DDMs. Hence, the hybrid EEMD-DDMs could increase the performance of sole DDMs up to 10%.

## CONCLUSION

In this study, the performance of a hybrid technique the ensemble empirical mode decomposition (EEMD) as a pre-processing method combined with one of the artificial intelligence methods multivariate adaptive regression spline (MARS) was applied to estimate daily streamflow at two stations, namely, Kordkheyl and Gachsar located in Iran, using daily streamflow dataset. Moreover, for improving the accuracy of the EEMD-MARS model, the crow search algorithm was used to optimize the MARS parameters (MARS-CSA). To achieve this aim, first, the PACF was employed to find the significant lag times to select input variables. For decomposing daily streamflow dataset by EEMD, they were categorized into calibration and validation datasets. In the next step, the accuracy of the proposed method (EEMD-MARS-CSA) with EEMD-MARS, MARS-CSA, and standalone MARS model was investigated in terms of R, RMSE, RSR, MAE, and NSE. According to the results, EEMD and CSA have significant influence to forecast daily streamflow values, especially for long-term forecasting of daily streamflow, thus the short- and long-term daily streamflow values given by EEMD-MARS-CSA were more accurate than other models.

Although this study presented new EEMD-M5tree/MARS models for daily river flow forecasting, various limitations should be mentioned for future studies. As can be seen from analyses and figures, M5tree and MARS approaches could improve the accuracy of single and multi-step flow forecasting when the EEMD technique is integrated to decompose data time series. One of the main disadvantages of EEMD is that its implementation takes a long time as it produces different numbers of IMFs; each IMF of the input variables should be considered with the same IMF of the target variables in order to forecast. Moreover, this study only used an antecedent quantity of river discharge in order to forecast single and multi-day-ahead, and is not reliable for more than two-week-ahead forecasting based on its accuracy, but it may forecast further if other climate variables such as rainfall, max/min temperature, and evaporation are available for the aforementioned period. Generally speaking, river flow, rainfall, and temperature are in close relationship, which follows that acceptable river discharge is low in summer and high in winter. In this regard, it would be valuable and helpful for future research to develop a predictive model utilizing both the seasonal daily river flow and the related climate input data.

For future research, the hybrid model, coupling a data-driven approach, data pre-processing technique, and physically based hydrologic model, can be suggested as a potential alternative to enhance the forecasting accuracy of hydrological processes (i.e., streamflow, rainfall, water stage, groundwater, etc.). To attain this goal, additional data pre-processing techniques, such as complete ensemble empirical mode decomposition (CEEMD), improved CEEMD, and variational mode decomposition (VMD), can be addressed. In addition, it can be suggested that two-phase decomposition can be applied to enhance IMF1 forecasting values, which consist of high pass signals for future works. Another possibility is for optimum selection of input configurations for a more reliable model using feature selection algorithms such as Mallow's Cp, genetic algorithm, PCA, etc. Also, uncertainty analysis for streamflow data can be suggested for future studies, as flow uncertainty analysis is critical for good water management decisions. Uncertainty analysis may also be perceived as arduous and expensive, with results that could add doubt to reported conclusions. If the imprecision and inaccuracy of the model were better quantified, users would be able to better use the flow data in their predictions and decision-making.