## Abstract

In this study, daily river stage–discharge relationship was predicted using different modeling scenarios. Ensemble empirical mode decomposition (EEMD) algorithm and wavelet transform (WT) were used as hybrid pre-processing approach. In the WT-EEMD approach, first temporal features were decomposed using WT. Furthermore, the decomposed sub-series were further broken down into intrinsic mode functions via EEMD to obtain features with higher stationary properties. Mutual information was used to select dominant sub-series and determine efficient input dataset. Relevance vector machine (RVM) was applied to forecast river discharge. Three scenarios were developed to predict river stage–discharge process. First, a successive-station form of forecasting was proposed by incorporating geomorphological features into the modeling process. Subsequently, an integrated RVM (I-RVM) was trained based on the concept of the cascade of reservoirs and the meta-learning approach. The proposed I-RVM had the semi-distributed characteristics of the river discharge model. Finally, a multivariate RVM was trained to predict discharge for different points of the river. For this reason Westhope station's features were used as input to predict discharge at downstream of the river. Results were compared with rating curve and capability of proposed models were approved in prediction of short-term river stage–discharge.

## INTRODUCTION

Water management policy is a crucial global issue that intends to govern water acquisition and determine proper ways to minimize the misapplication and spoil of attainable water resources. River discharge is very important in the hydrological cycle and hydro-environmental management and plays a basic designation in designing and operating irrigation schemes in a watershed scale. One of the most important problems in analysis and management of river discharge is the modeling/simulation. For this purpose, the complexity and uncertainty of the process is widely modeled using artificial intelligence (AI) by hydrologists and scientists.

The collection of continuous discharge measurements is a very hard task and, therefore, a stage–discharge (S-D) relationship is commonly used to estimate stream discharges as measured stage values. Rating curves are widely used to determine the S-D relationship, although they are not able to provide sufficiently accurate results. A S-D rating curve (RC) is a relationship between stream stage (water level) and discharge for a particular section of a stream. Such a relationship is usually established by a regression-based analysis, and the curves are generally expressed in the form of a power equation (Kisi & Cobenar 2009). However, using the S-D relationship for different river conditions might not be sufficiently capable. In other words, to capture precise results, various researchers suggested applying AI-based models that proved to provide more accurate outcome in comparison to the S-D relationship (Ghorbani *et al.* 2016).

AI has been successfully applied in a number of diverse fields including water resources (e.g., Coulibaly *et al.* 1999; Anmala *et al.* 2000; Tokar & Markus 2000; Chou 2013; Singh *et al.* 2015), whereas there are many AI-based approaches that are driven by knowledge discovery (Babovic 2009; Meshgi *et al.* 2015). Likewise, machine learning approaches are remarkable AI methods that have been implemented in the river modeling field in recent years (e.g., Yu *et al.* 2004; Azamathulla *et al.* 2009; Chen *et al.* 2010; Nourani *et al.* 2016). Also, some researchers have performed reviews on good state-of-the-art application of AI and some pre-processing approaches to categorize different researches and their advantages (Labat 2005; Abrahart *et al.* 2012; Sang 2013).

The relevance vector machine (RVM), was first proposed by Tipping (2001), and is unique in its form. RVM was designed to enhance some of the shortcomings of the original support vector machine (SVM), including (a) probabilistic prediction issues, (b) solution solving problems, and (c) less tuning parameters (kernel or covariance functions). Similar to the SVM, RVM is a kernel-based method that parameterizes the unknown function like a weighted sum of non-linear basis functions in the feature space. Opposite to the SVM, RVM supposes that the weights are random variables and employs a Bayesian framework to evaluate the posterior distribution of weights using data. A considerable work on the use of RVM is the statistical downscaling of climate model outputs to predict the runoff for several Indian river basins (Ghosh & Mujumdar 2008).

Wavelet transform (WT) and ensemble empirical decomposition mode (EEMD) are pre-processing methods that donate remarkable vision into the physical form of the data by representing information in both time and frequency domains (Roushangar & Alizadeh 2018). EEMD has been proven to be a quite effective method for extracting signals from data generated in noisy non-linear and non-stationary processes (Huang *et al.* 1998). A time series in the WT breaks down into a series of linearly independent detailed signals and one approximation signal by using a specific wavelet function. Mallat (1998) presented a complete theory for wavelet multi-resolution signal decomposition (also mentioned as a pyramid decomposition algorithm). Research proved that a proper data pre-processing by using the WT and EEMD can lead the models to adequately illustrate the real specifications of the basic system. WT along with EEMD decomposes a non-stationary signal into a given quantity of stationary sub-signals. Then, AI methods can be combined with WT and EEMD to improve preciseness of the prediction. In the present research, feature extraction and learning of historical data are the key to improve forecasting accuracy. For this reason, much of the actual effort is put into the processes of feature extraction, data pre-processing and transformations. Maheswaran & Khosa (2013) also used wavelet technique to improve non-linear model forecasting performances in Krishna River. Bai *et al.* (2015) developed a multiscale deep feature learning method with hybrid models to deal with daily reservoir inflow forecasting. Precise hybrid models were employed in recent years to forecast hydrological and hydrogeological processes (e.g., Partal 2009; Tiwari & Chatterjee 2010; Wang *et al.* 2015).

In the modeling process via AI approaches, some of the input variables might present correlation, noise, or have no significant relationship with target variables and generally are not equally informative. Therefore, one important step is to determine dominant input variables that are independent, informative, and effective. Mutual information (MI) as a non-linear measure of information content can be useful in obviating the selection of effective inputs among huge numbers of WT-based or WT-EEMD sub-series (Babovic & Keijzer 2000; Sannasiraja *et al.* 2004; Sun *et al.* 2010; Roushangar *et al.* 2016). Shannon entropy-based measures are applied in this study to extract dominant inputs of the proposed models for discharge–stage modeling. Since the entropy measures arbitrary dependencies between random variables, it is suitable to be applied to complex classification tasks, where methods based on linear relations are prone to mistakes. Entropy can be regarded as a measure of information, disorder, chaos, and uncertainty (Shannon 1948; Roushangar *et al.* 2016). Meyer *et al.* (2008) presents an original filter approach for effective feature selection in microarray data characterized by a large number of input variables and a few samples.

In most former studies of AI-based hydrological modeling techniques, temporal variables (as historical time series) of the process were proposed as inputs of the model to predict runoff (Anmala *et al.* 2000). On the other hand, in some studies, the AI-based and conceptual models have been combined to estimate the river flow (Tokar & Markus 2000; Ashu & Sanaga 2006; Chen & Adams 2006; Corzo *et al.* 2009; Roushangar *et al.* 2018a). These models, which combine conceptual hydrological models and AI-based model usually include two parts, one is a conceptual model for runoff of every sub-catchment and the other is the AI-based model to ensemble the runoff from each catchment. Nourani *et al.* (2013) developed a geomorphological model to handle Eel River watershed runoff. The quality of results supported the beneficial application of the approach in spatio-temporal modeling of the hydrological processes. Multi-station modeling is a conceptual approach whose capability has been confirmed, and in some cases, imposing geomorphological features on the model caused an improvement in results (Zhang & Govindaraju 2003; Sarangi *et al.* 2005).

By considering the aforementioned issues in prediction of discharge in rivers, the main purpose of the present study is to predict the daily river stage–discharge time series and enhance the capability of modeling scenarios by using hybrid WT-EEMD-MI and RVM for: (1) investigating capability of WT-EEMD-MI based multi-station model along with physical characters to forecast discharge and find improvements; (2) reconstructing integrated RVMs (I-RVMs) by a hybrid use of WT-EEMD-MI and find progress; (3) multivariate prediction of river discharge via multivariate RVM (MRVM) and enhance its capability by the proposed pre-processing approach. Results are going to be compared with classic RC to investigate the improvements of results.

## MATERIAL AND METHODS

### Study area and used data

The Souris River is a river in central North America. It is about 700 km in length and drains about 61,100 km^{2}. It rises in the Yellow Grass Marshes north of Weyburn, Saskatchewan. It wanders south through North Dakota beyond Minot to its most southern point in the city of Velva, and then back north into Manitoba. The river passes through the communities of Melita, Hartney, Souris, and Wawanesa and on to its confluence with the Assiniboine River at Treesbank, about 40 km south east of Brandon. The main tributaries which flow into the Souris in Manitoba are the Antler River, the Gainsborough and Plum Creeks. Much of its drainage basin is fertile silt and clay deposited by former glacial Lake Souris. The channel capacity of the river in Manitoba varies from about 4.2 m^{3}/s near the border, to about 40 m^{3}/s through Melita, to about 31 m^{3}/s near Lauder and 48 m^{3}/s near Hartney. North of Hartney the capacity increases to more than 85 m^{3}/s. The drop between the border and Hartney is only about 9.5 cm/km. Two large dams in Saskatchewan, Rafferty Dam and Alameda Dam, were built, in part, to reduce flood peaks on the Souris River. Table 1 demonstrates the characteristics of the Souris River and used data.

Hydrometric station | Stations discharge (m^{3}/s) | Stage (m) Max. | Max. discharge (m^{3}/s) | Min. discharge (m^{3}/s) | Area (m^{2}) | Distance from next hydrometric station (km) |
---|---|---|---|---|---|---|

Sherwood | 4.1 | 1.02 | 10.8 | 0.36 | 23,154.4 | – |

Foxholm | 6.74 | 2.15 | 11.44 | 0 | 24,527.2 | 145 |

Minot | 7.28 | 1.65 | 12.32 | 0.07 | 27,453.9 | 42 |

Verendrye | 13.74 | 1.96 | 14.78 | 0.31 | 29,266.9 | 126 |

Bantry | 14.13 | 2.28 | 17.28 | 0.42 | 31.856.8 | 124 |

Westhope | 15.66 | 3.05 | 14.8 | 0.12 | 43,770.8 | 121 |

Hydrometric station | Stations discharge (m^{3}/s) | Stage (m) Max. | Max. discharge (m^{3}/s) | Min. discharge (m^{3}/s) | Area (m^{2}) | Distance from next hydrometric station (km) |
---|---|---|---|---|---|---|

Sherwood | 4.1 | 1.02 | 10.8 | 0.36 | 23,154.4 | – |

Foxholm | 6.74 | 2.15 | 11.44 | 0 | 24,527.2 | 145 |

Minot | 7.28 | 1.65 | 12.32 | 0.07 | 27,453.9 | 42 |

Verendrye | 13.74 | 1.96 | 14.78 | 0.31 | 29,266.9 | 126 |

Bantry | 14.13 | 2.28 | 17.28 | 0.42 | 31.856.8 | 124 |

Westhope | 15.66 | 3.05 | 14.8 | 0.12 | 43,770.8 | 121 |

Selected stations (Sherwood, Foxholm, Minot, Verendry, Bantry, and Westhope) as shown in Figure 1, are consecutive which is suitable for use in the proposed models. Also, the dataset was separated into two parts: 70% of total data as calibration dataset and the rest was considered as verification dataset. The historical daily discharge data and morphology of the river were available for six stations on the Souris River at the USGS website (http://co.water.usgs.gov/sediment).

### Rating-curves

*Q*is the stream discharge and H is the water level. The values of

*a*and

*b*for a particular stream are determined from data via a linear regression between log

*H*and log

*Q*. A major limitation of this approach is that it is not able to consider the hysteresis effect. In this study, the values of

*a*and

*b*are computed by using the least squares method.

### Concept of modeling via AI-based approaches

With respect to the reviewed researches, most of the deficiencies about application of AI-based models can be related to its structure. A simple AI model might not be able to handle the natural uncertainty of specific hydrological processes completely. As an instance, an ad hoc AI model is not designed to predict the variable of interest in an arbitrary point across the river. Therefore, it is required to design models which have semi-distributed models' characteristics. To this end, in this research, different scenarios in river discharge modeling context were proposed. Multi-station modeling via RVM, I-RVMs, and MRVMs are the conceptual models used in this study, which will be discussed. Figure 2 shows the modeling flowchart in this study.

On the other hand, discharge data include a broad domain of values and fluctuations and have to be pre-processed. As is demonstrated in Figure 2, it is proposed to use the WT to decompose the original time series into two components: the approximation components and the detailed components. Secondary decompositiom was performed by applying the EEMD to the detailed components obtained from WT into a new sub-series of intrinsic mode functions (IMFs). The goal of this step is to additionally decrease the non-stationarity of the detailed components captured from WT. Although decomposing twice might lead to the required outcome, reproduction of sub-series might defect the RVM performance. For this reason, MI was used to select dominant sub-series. Such a feature selection could increase the performance of RVM.

### Scenarios of modeling via RVM

#### Multi-station modeling of river discharge

The multi-station modeling of river discharge used in this study was designed to catch the non-linearity, uncertainty, and complexity of the river discharge. For this purpose, the multi-station model was established according to the observed discharge time series and spatially varying physical parameters like upstream drainage area of sub-basins. In other words, the model inputs consisted of the temporal features and geomorphological characteristics of the sub-basins. A schematic of the proposed model is shown in Figure 3. The physical characters are used as input data of the model since most physical models take advantage of this parameter (e.g., Melesse *et al.* 2003). Likewise, these characteristics are the prime features used by several experimental equations (e.g., SCS and rational equations). Upstream drainage area of the sub-basins is used in most physically based models. Hence, the spatial feature was used along with other features to improve results.

Since the model is in the multi-station form, the input matrix was determined in a way to cover the spatio-temporal variation of the river discharge uncertainties by using temporal features and geomorphological characteristics of the hydrometric stations. Therefore, multiple inputs were set in a way that all temporal and spatial features could establish a unit matrix. The input variables comprised different sets of antecedent and current gauge S-D values and upstream drainage area of the stations to forecast the discharge values (*Q _{i}*(

*t*),

*i*

*=*1, 2, 3…

*n*, where

*n*is the number of hydrometric stations). The multi-station model presents a single model which is capable enough to be employed instead of several models within the watershed. The proposed model has spatio-temporal characteristics which can be applied to forecast entire river discharge using only one model. Selection of an appropriate input combination for the model is the most important step in the modeling process (Nourani

*et al.*2013). In the multi-station model, the input variables of the model are selected among the temporal and spatial features of river to forecast the discharge values for each station (

*Qi*(

*t*

*+*1)) by assuming the Markovian property of the process. In addition, the proposed model is able to predict runoff in stations with lack of data or any desired point within the river due to employing the spatial and temporal features of the sub-basins as the model inputs. A MATLAB code was developed to address the model.

#### Integrated RVMs

Spatial variations in a river or watershed heterogeneity is common in nature and can be a conclusive factor affecting the modeling performance. Generally, the river discharge can be predicted using a simple AI model by employing time series of discharge and possibly other relevant features (Huo *et al.* 2012). Nevertheless, such a model cannot be trained using other stations' features which might lead to a weak outcome. To overcome this issue, this study planned a conceptual model inspired by the cascade of reservoirs model by the integrated form of the RVMs. The I-RVMs strategy used herein utilizes the concept of representing river as a cascade of reservoirs with different runoff capacity (Nash & Sutcliffe 1970) as presented in Figure 4. Similarly, for *n* reservoirs placed as a cascade, representing each of the stations, the outflow from *n*th reservoir can be specified based on the channel drainage network. In the proposed model, a watershed is divided into different sub-basins, on the basis of approximately the same interval lines passing through the main channel reach joints, as shown in Figure 4. Then, each sub-basin, accompanied by its main channel, is represented by a cascade of linear reservoirs and a linear channel. On the other hand, to transform the hydrologic model to AI-based model, the I-RVMs were designed to take advantage of the meta-learning system. Meta-learning is a privileged technique to unify or aggregate the results of multiple learners (Liao & Feng 2014).

Meta-learning is a general technique to aggregate the results of multiple learners, and it can be loosely defined as learning from information generated by a learner(s) (Chan & Stolfo 1993). Meta-RVM model is achieved by a basic configuration that has several base-RVMs and one meta-learner. The meta-learner learns from the outputs of base-RVMs, as presented in Figure 4. The meta-RVM trains the base-RVMs on disjoint subsets of a dataset and a ‘top’ RVM on the whole dataset where hierarchical architecture forms.

#### Multivariate prediction of river discharge

The proposed methodology was designed as a multi-input multi-output method in which the MRVM was trained based on an input dataset consisting of temporal features (S-D) of Westhope station for prediction of river discharge for the stations located at upstream of Westhope station (based on cross-correlation values captured from MI values). Therefore, by training only one MRVM for Souris River, the discharge values for all hydrometric stations could be predicted based on the S-D dataset of the Westhope station. The great difference between multi-station model and multivariate model is that multi-station model predicts the discharge time series as a united time series, whereas the proposed MRVM model predicts the discharge values of all stations separately. In such multi-output models, the interaction between the discharge values of all hydrometric stations on the river must be considered through the training process of the MRVM and, therefore, the trained system can be employed to fill the missing, or to predict the future discharge values of all hydrometric stations on the river using the hydrological data observed in the Westhope station.

Selecting an appropriate input for MRVM is a very substantial process due to multi-output structure. Therefore, employing the WT-EEMD-MI could increase capability of the MRVM in the modeling of non-linear and non-stationary features of discharge. Also, MI was used to select dominant subcomponents in order to improve MRVM performance.

#### Relevance vector machine

RVM has good generalization performance, as attaining sparsity in the exposure. The RVM is a kind of Bayesian regression structure, in which the weights of each input feature are governed by a set of hyperparameters. These hyperparameters demonstrate the posterior diffusion of the weights which are estimated frequently during training. Most hyperparameters near infinity, causing the posterior diffusions and effectively setting the corresponding weights to zero. The remaining features with non-zero weights are called *relevance vectors*.

Tipping's formulation (Tipping 2001) was designed to allow regression from a multivariate input to a univariate output feature. In multiple modeling, one solution is to use a single RVM for each output dimension. This solution leaves the problem that several RVMs must be trained and might face some issues. MRVM (Thayananthan *et al.* 2006) extended the RVM structure to multivariate outputs, establishing a general regression tool. This formulation allows designation of the same set of patterns for all output targets. In addition, MRVM is optimized based on the weights without the use of hyperparameters. A data likelihood is acquired as a function of weight variables and hyperparameters. The weight variables are then analytically consolidated to obtain a marginal likelihood as a function of the hyperparameters. An optimal set of hyperparameters is obtained by maximizing the marginal likelihood over the hyperparameters using a version of the fast marginal likelihood maximization algorithm. The optimal weight matrix is obtained using the optimal set of hyperparameters.

**W**

*and*

^{K}**S**

*. The weight matrix must be determined to elude overfitting. According to Tipping's relevance vector approach (Tipping 2001) and assuming a Gaussian prior for the weights of each basis function, , where each element*

^{K}*αj*is a hyperparameter that determines the

*relevance*of the associated basis function. The prior distribution over the weights is then where is the element at (

*r, j*) of the weight matrix

**W**

*.*

^{k}### Proposed pre-processing approach

#### Discrete wavelet transform

*et al.*1998; Nourani

*et al.*2013; Farajzadeh & Alizadeh 2018; Roushangar

*et al.*2018a). While the general theory behind WT is quite analogous to that of the short time Fourier transform, WT allows for a completely flexible window function (called the mother wavelet), which can be changed over time based on the shape and compactness of the signal. Given this property, WT can be used to analyze the time–frequency characteristics of any kind of time series. In recent years, WT has been widely used for the analysis of many hydro-meteorological time series (Nourani

*et al.*2013; Roushangar

*et al.*2018a, 2018c). As the mother wavelet moves across the time series during the WT process, it generates several coefficients that represent the similarity between the time series and the mother wavelet (at any specific scale). There are two main types of WT: continuous and discrete. Use of the continuous wavelet transform can generate a large number of (often unnecessary) coefficients, making its use and interpretation more complicated. On the other hand, the discrete wavelet transform (DWT) method simplifies the transformation process while still providing a very effective and precise analysis, since DWT is normally based on the dyadic calculation. DWT coefficients can be calculated by the following equation: where 2

^{a}represents the dyadic scale of the DWT. Applying DWT to a time series decomposes that time series into two ancillary time series shape components, called the approximation (A) and detailed (D) components. Component A comprises the large-scale, low-frequency component of the time series, while component D represents the small-scale, high-frequency component.

#### Ensemble empirical mode decomposition

EEMD was proposed to solve the mode mixing issue of empirical mode decomposition (EMD) which specifies the true IMF as the mean of an ensemble of trials (Wu & Huang 2009). Each trial consists of the decomposition results of the signal plus a white noise of finite amplitude. This new method is based on the insight from recent studies of the statistical properties of white noise (Wu & Huang 2004), which showed that the EMD method is an effective self-adaptive dyadic filter bank when applied to the white noise. On the other hand, studies demonstrated that noise could help data analysis in the EMD method. All these investigations promote the advent of the EEMD method.

The principle of the EEMD algorithm is as follows: the added white noise will populate the whole time–frequency space uniformly with the constituting components of different scales. When a signal is added to this uniformly distributed white noise background, the components in different scales of the signal are automatically projected onto proper scales of reference established by the white noise in the background. Because each of the noise-added decompositions consists of the signal and the added white noise, each individual trial may certainly produce very noisy results. However, the noise in each trial is different in separate trials (Wang *et al.* 2013; Roushangar & Alizadeh 2018). Thus, it can be decreased or even completely canceled out in the ensemble mean of enough trials. The ensemble mean is treated as the true answer because the only persistent part is the signal as more and more trials are added in the ensemble.

#### Shannon entropy and MI and selecting dominant sub-series

*X*with sample size of

*N*(bin number), which takes values

*x*

_{1};

*x*

_{2}; …;

*x*

_{N}with probabilities of

*p*

_{1},

*p*

_{2}, …,

*p*

_{N}, respectively, as (Shannon 1948): Feature selection is more appropriate by MI than other linear correlation measuring tools since sometimes there may be a weak linear but strong non-linear relationship between input(s) and output(s). Such a usage of MI along with RVM is favorable since both tools are beneficial in finding non-linear properties of features. MI between two random variables X and Y is defined as (Yang

*et al.*2000): with

*H*(

*A*) and

*H*(

*B*) being the entropy of

*A*and

*B*, respectively, and

*H*(

*A*,

*B*) their joint entropy as:

*et al.*2016). Using a trial and error process to obtain the optimum decomposition level ended at level 5 in six sub-series (one approximation and five detailed). As was studied by Aussem

*et al.*(1998), on finding a correct decomposition level, initially the following formula which offers a minimum level of decomposition was introduced: where

*L*is decomposition level and

*N*is the number of time series features (Nourani

*et al.*2013; Farajzadeh & Alizadeh 2018; Roushangar

*et al.*2018b). In the present study,

*L*is equal to 3. This experimental equation was derived for fully autoregressive signals and only considering the time series length without considering the seasonal efficacy of the process. In order to handle the seasonality of the process, decomposition level up to 5 (2

^{5}; approximately monthly mode) was selected as the optimum level. Therefore, the seasonality of the process up to one month could be handled by the model. Due to the proportional relationship between the amount of discharge and river stage, these signals were supposed to have the same seasonality level and both time series were decomposed at the same level (i.e., level 5). On the other hand, since daily time series include various properties, non-linearity and non-stationarity, EEMD was applied to further breakdown previously decomposed time series (by WT).

In order to prevent probable issues from overtraining or other problems, it was tried to select the dominant sub-series from the decomposed time series prior to modeling. In this study, feature selection was performed using MI approach. MI is an unsupervised measuring tool computed with respect to input and target values. Input matrix was determined by considering general ranking of sub-series according to MI.

### Efficiency criteria

In this study, the total data were separated into calibration and verification sets. Three different criteria were selected to measure the revenue of the proposed forecasting methods; the root mean square error (RMSE), mean absolute error (MAE), and the determination coefficient (DC). The RMSE, MAE, and DC were applied to exhibit discrepancies between predicted and observed values (Adamowski & Chan 2011). The RMSE and MAE were used to quantify the modeling accuracy by generating a positive value. The RMSE and MAE grow from zero for perfect forecasts through large positive values as the differences between forecasts and observations become increasingly large. Obviously, a high value for DC (up to 1) and small value for RMSE indicate high efficiency of the model. Legates & McCabe (1999) indicated that a hydrological model can be sufficiently evaluated by these statistics. These measures are not oversensitive to extreme values (outliers) and are sensitive to additive and proportional differences between model predictions and observations. Therefore, correlation-based measures (e.g., DC statistic) can indicate that a model is a good predictor (Legates & McCabe 1999).

Input and output variables are normalized by scaling between 0 and 1 to eliminate their dimensions and to ensure that all variables receive equal attention during calibration of the model. Therefore, RMSE values are to be transformed between 0 and 1 (Nourani *et al.* 2016).

## RESULTS AND DISCUSSION

Results of modeling are demonstrated in Figure 5(c). Based on the results it was observed that results need to be strengthened in terms of DC, RMSE, and MAE. We discuss the results of modeling using RVM in the next section.

### Optimizing of Gamma values

In this study, RVM was applied to predict discharge time series for the Souris River. An accurate use of RVM is to select an appropriate kernel function and tune related hyperparameters. A number of kernels were discussed by researchers, but studies suggested the effectiveness of radial basis kernel (RBF) function in the case of machine learning approaches in the majority of civil engineering applications (Gill *et al.* 2006; Goel & Pal 2009; Nourani *et al.* 2016). For a fair comparison of results, a RBF, where *γ* (kernel width) is a kernel specific, was used by the RVM algorithm in this study.

### Results of multi-station model

*Q*(

*t*

*+*1) denotes the river discharge in any cross section of river system,

*ɛ*

_{t+1}is the random error,

*x*(

*t*

*+*1) is the input vector, which may include many variables such as antecedent flow data and other relevant variables at various time lags, but in the present application, it consists of combinations of antecedent daily S-D records. It is evident that the training datasets should cover all the characteristicss of the problem in order to capture effective estimation. One of the most important steps in developing a satisfactory forecasting model is the selection of the input variables. Hence, different combinations of the antecedent S-D are used to construct the appropriate input structure for all scenarios.

In order to determine input variables for AI-based models, the input data were extracted based on the principle underlying the RC, in which different model structures were selected by a combination of stage and/or discharge variables including: *H _{t}*,

*H*

_{t − 1},

*H*

_{t − 2}

_{, …}and

*H*

_{t − 7}representing stage values at times (

*t*), (

*t*− 1), (

*t*− 2), … and (

*t*− 7) and

*Q*

_{t − 1},

*Q*

_{t − 2…}and

*Q*

_{t − 7}representing discharge values at times (

*t*− 1), (

*t*− 2), … and (

*t*− 7). Models 1–3 extract information from stage values alone; Models 4–6 extract information from discharge values alone with a structure similar to autocorrelation; and Models 7–10 extract information from both stage and discharge values. While it is clear that the RC is not the only technique, Table 2 shows the possible different choices of dependent variables for each modeling technique but these are required to have optimized structures by avoiding over-fitting. In the present study, only the best structures are used to report the performance of the proposed models. The performance measures outlined in ‘Performance criteria’ were used to select the particular model structure but other techniques will be used to assist the choice, e.g., scatter diagram, or plotting difference between individual model performances.

Input number | Input definition |
---|---|

1 | H_{(t)} |

2 | H_{(t)}, H_{(t−1)} |

3 | H_{(t)}, H_{(t−1)}, H_{(t−2)} |

4 | Q_{(t)} |

5 | Q_{(t)}, Q_{(t−1)} |

6 | Q_{(t)}, Q_{(t−1)}, Q_{(t−2)} |

7 | Q_{(t)}, H_{(t)} |

8 | Q_{(t)}, Q_{(t−1)}, H_{(t)} |

9 | Q_{(t)}, Q_{(t−1)}, H_{(t)}, H_{(t−1)} |

10 | Q_{(t)}, Q_{(t−1)}, Q_{(t−2)}, H_{(t)}, H_{(t−1)} |

Input number | Input definition |
---|---|

1 | H_{(t)} |

2 | H_{(t)}, H_{(t−1)} |

3 | H_{(t)}, H_{(t−1)}, H_{(t−2)} |

4 | Q_{(t)} |

5 | Q_{(t)}, Q_{(t−1)} |

6 | Q_{(t)}, Q_{(t−1)}, Q_{(t−2)} |

7 | Q_{(t)}, H_{(t)} |

8 | Q_{(t)}, Q_{(t−1)}, H_{(t)} |

9 | Q_{(t)}, Q_{(t−1)}, H_{(t)}, H_{(t−1)} |

10 | Q_{(t)}, Q_{(t−1)}, Q_{(t−2)}, H_{(t)}, H_{(t−1)} |

*et al.*2013). In this way, the dimensionless geomorphological parameters were applied in the model. Therefore, the model construction for each sub-basin could be represented as: In Equation (10),

*i*refers to the sub-basin number (

*i*

*=*1, 2, 3…6).

*H*(

_{i}*t*−

*α*) is river stage values with 1-day lag time at

*ith*sub-basin.

*A*is the drainage area for each sub-basin and

_{i}*A*is the total watershed area. In order to develop the multi-station model for the entire watershed, the data of all six stations were imposed onto the RVM framework. Since the S-D values are sorely affected by recent conditions of watershed at daily time scale, only values with 2-days' lag times (i.e.,

_{t}*α*= 2) was used in the multi-station model. By considering the aforementioned issues for the RVM modeling, the multi-station model was calibrated and verified.

In order to find the efficient structure of the proposed multi-station model, sensitivity analysis was performed. For this end, the six best input combinations were considered to create the input matrix (Table 3). Table 3 shows the multi-station model's performance for different input combinations. In order to explore the efficiency of physical parameters, Comb. (1) was designed using only temporal variables (i.e., discharge-gauge stage data).

Combination No. | Input matrix variables | Calibration | Verification | ||||
---|---|---|---|---|---|---|---|

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | ||

1 | Q(t), Q(t− 1), H(t) | 0.82 | 0.024 | 0.015 | 0.7 | 0.031 | 0.021 |

2 | Q(t), Q(t− 1), H(t), A | 0.84 | 0.021 | 0.014 | 0.74 | 0.029 | 0.19 |

3 | Q(t), H(t), A | 0.83 | 0.022 | 0.014 | 0.74 | 0.03 | 0.019 |

4 | Q(t), Q(t− 1), H(t), H(t− 1), A | 0.85 | 0.02 | 0.013 | 0.79 | 0.027 | 0.17 |

5 | Qd3, Qd4, Qd5, Qa, A | 0.88 | 0.019 | 0.012 | 0.82 | 0.025 | 0.016 |

6 | Qd1(imf8,9), Qd2(imf 6,7), Qd3, Qd4, Qd5, Qa, Ha, A | 0.90 | 0.017 | 0.011 | 0.85 | 0.021 | 0.014 |

Combination No. | Input matrix variables | Calibration | Verification | ||||
---|---|---|---|---|---|---|---|

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | ||

1 | Q(t), Q(t− 1), H(t) | 0.82 | 0.024 | 0.015 | 0.7 | 0.031 | 0.021 |

2 | Q(t), Q(t− 1), H(t), A | 0.84 | 0.021 | 0.014 | 0.74 | 0.029 | 0.19 |

3 | Q(t), H(t), A | 0.83 | 0.022 | 0.014 | 0.74 | 0.03 | 0.019 |

4 | Q(t), Q(t− 1), H(t), H(t− 1), A | 0.85 | 0.02 | 0.013 | 0.79 | 0.027 | 0.17 |

5 | Qd3, Qd4, Qd5, Qa, A | 0.88 | 0.019 | 0.012 | 0.82 | 0.025 | 0.016 |

6 | Qd1(imf8,9), Qd2(imf 6,7), Qd3, Qd4, Qd5, Qa, Ha, A | 0.90 | 0.017 | 0.011 | 0.85 | 0.021 | 0.014 |

^{a}RMSE and MAE values are dimensionless due to normalization.

According to the obtained results, imposing dimensionless physical parameters along with temporal dataset caused an increase in modeling accuracy (Combs. (2)–(6)). Generally, Combs. (2)–(6) provided higher efficiency in comparison to Comb. (1). Moreover, replacing spatially varying parameters (i.e., *A _{i}*) instead of temporal variables in the input combination increased precision of outcome. Conjugating WT to temporal features in Comb. 5 gave an incredible insight into temporal features which caused Comb. 5 to have a better outcome. By considering improvement in Comb. 5, WT-EEMD-MI pre-processing approach was applied to find more dominant sub-series. For this reason,

*detail 1*and

*2*of

*db*(

*4*,

*5*) was further decomposed using EEMD. MI was used to select appropriate IMFs from decomposed sub-series. According to Comb. 6,

*imf 8*and

*9*from

*detail subseries 1*and

*imf 6*and

*7*from

*detail subseries 2*was selected via the proposed pre-processing approach. Figure 6 presents decomposed time series of the discharge values by db(4,5)-EEMD-MI for the multiple station format. By comparing the obtained results for Combs. (1)–(6) (Table 3), it could be inferred that Comb. (6) was more efficient than others. The input structure of this combination consisted of two temporal variables and one spatial variable (i.e.,

*A*). In Table 3,

_{i}*Qd1*means

*detail subseries 1*and

*Qd2*(

*imf 6, 7*) means decomposed

*detail subseries 2 via*EEMD and selected

*imf 6*and

*7*via MI and the rest are alike.

It must be mentioned that the drainage area (A) is probably one of the most important physical characteristics of the watershed. Drainage area reflects the volume of water that can be generated from rainfall. Considering this fact, the volume of available water for runoff would be the product of rainfall depth and the drainage area. Results approved the importance of using such a variable in structure of the proposed multi-station model.

### Integrated RVMs

For assessing the overall performance of I-RVMs in forecasting monthly discharge for the Westhope station, first temporal features of each station were pre-processed by WT-EEMD to improve the ability of base-RVMs. Then, each base-RVM was used to forecast target discharge values. Outcome of these models was aggregated to form a single time series. In the second stage, decomposed aggregated discharge time series were pre-processed by WT-EEMD-MI and then imposed into meta-learner to forecast Westhope's discharge values. Such a modeling strategy has the advantage of enjoying all datasets so that meta-learner can handle characteristics of the variable of interest (i.e., *Q*(*t* + 1) for Westhope station). One of the great differences between multi-station modeling and I-RVMs is using spatial data of the watershed. I-RVMs did not use spatial data, where the multi-station model was designed based on employing spatial data. However, I-RVMs were designed to explore spatial variation in S-D process and without requiring physical characteristics of the watershed. This model can operate as a non-linear time–space regression tool rather than a fully lumped model. Table 4 represents the input features of each base-RVM and meta-learner. According to obtained results from Table 4, I-RVMs performed well and in comparison to the multi-station RVM, outperformed it relatively. However, still both these strategies are unique and can be applied for different spatial and temporal conditions. In Table 4, for each variable, *i* represents the stations used in the modeling process. Scatterplots of predicted values of discharge via I-RVM are demonstrated in Figure 7.

Input Comb. (base learner) | Input Comb. (meta learner) | Calibration | Verification | ||||
---|---|---|---|---|---|---|---|

DC | RMSE | MAE^{a} | DC | RMSE | MAE^{a} | ||

Qid1(imf 8,9), Qid2(imf 6,7), Qid3, Qid4, Qid5, Qia,Hia | Σ Qid1(imf 8,9), Qid2(imf 6,7), Qid3, Qid4, Qd5, Qia,Hia | 0.88 | 0.032 | 0.021 | 0.83 | 0.036 | 0.024 |

Input Comb. (base learner) | Input Comb. (meta learner) | Calibration | Verification | ||||
---|---|---|---|---|---|---|---|

DC | RMSE | MAE^{a} | DC | RMSE | MAE^{a} | ||

Qid1(imf 8,9), Qid2(imf 6,7), Qid3, Qid4, Qid5, Qia,Hia | Σ Qid1(imf 8,9), Qid2(imf 6,7), Qid3, Qid4, Qd5, Qia,Hia | 0.88 | 0.032 | 0.021 | 0.83 | 0.036 | 0.024 |

^{a}RMSE and MAE values are dimensionless due to normalization.

### Multivariate prediction of river discharge

MRVM, which was first proposed by Thayananthan *et al.* (2006), was designed to model multivariate processes. Unlike I-RVM modeling process, MRVM was capable to be extended according to the river discharging system. It was concluded that using Westhope station's dataset was suitable to be used as input to handle other stations' discharge values and forecast them so the outputs lead to appropriate results. According to MI, the highest correlation was detected among Westhope and other stations time series (according to the cross MI values (not reported here)), accordingly, it was selected as input of MRVM. Since such an application of MRVM might be very sensitive to input combinations, pre-processing was carried out to capture the seasonality and possible non-stationarity issues. Similar to other scenarios, WT-EEMD pre-processing was performed and MI was used to select the most dominant sub-series. Table 5 shows results of three different input combinations via sensitivity analysis and the outcome was presented in terms of RMSE and DC. Also, it is noteworthy to say that the MRVM was designed in a way to find its optimized structure without requiring pre-defined parameters (Thayananthan *et al.* 2006). Comb. 1 was designed based on the best classic input structure (Table 2) which led to moderate results. Utilizing WT-EEMD-MI by employing *db*(*4,5*) mother wavelet and further decomposing of detail 1 and 2, caused improvement in results. A recovery was also obtained by further decomposing of all detailed signals obtained from *db*(*4,5*). In other words, all detailed sub-series of WT were imposed to EEMD and different numbers of IMFs were obtained. MI was then applied to select the dominant IMFs and determined input dataset was fed into MRVM. Such a decomposition method, which was used for the first time in this research, caused improvement in outcome (20–40%). Outcomes of modeling for Minot and Bantry stations are demonstrated in Figure 8.

Station | Comb. 3 | Comb. 2 | Comb. 1 | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Qd1(imf 8,9), Qd2(imf 6,7), Qd3(imf 6,7), Qd4(imf 6,7), Qd5(imf 6,7), Qa, Ha | Qd1(imf 8,9), Qd2(imf 6,7), Qd3, Qd4, Qd5, Qa, Ha | Q(t), Q(t − 1), Q(t− 2), H(t), H(t − 1) | ||||||||||||||||

Calibration | Verification | Calibration | Verification | Calibration | Verification | |||||||||||||

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | |

Sherwood | 0.90 | 0.017 | 0.012 | 0.82 | 0.024 | 0.016 | 0.86 | 0.019 | 0.012 | 0.75 | 0.028 | 0.018 | 0.71 | 0.034 | 0.023 | 0.66 | 0.44 | 0.32 |

Foxholm | 0.86 | 0.02 | 0.014 | 0.80 | 0.026 | 0.017 | 0.82 | 0.025 | 0.017 | 0.73 | 0.03 | 0.02 | 0.7 | 0.037 | 0.026 | 0.63 | 0.048 | 0.034 |

Minot | 0.90 | 0.021 | 0.014 | 0.85 | 0.024 | 0.016 | 0.84 | 0.033 | 0.21 | 0.75 | 0.039 | 0.26 | 0.72 | 0.038 | 0.027 | 0.66 | 0.047 | 0.33 |

Vendry | 0.90 | 0.024 | 0.016 | 0.86 | 0.036 | 0.024 | 0.83 | 0.034 | 0.22 | 0.76 | 0.041 | 0.29 | 0.69 | 0.049 | 0.035 | 0.62 | 0.06 | 0.04 |

Bantry | 0.88 | 0.025 | 0.17 | 0.81 | 0.034 | 0.22 | 0.81 | 0.042 | 0.29 | 0.69 | 0. 046 | 0.33 | 0.67 | 0.052 | 0.4 | 0.59 | 0.068 | 0.051 |

Westhope | 0.91 | 0.03 | 0.02 | 0.84 | 0.034 | 0.22 | 0.83 | 0.039 | 0.26 | 0.7 | 0.0.44 | 0.31 | 0.7 | 0.053 | 0.41 | 0.65 | 0.067 | 0.05 |

Station | Comb. 3 | Comb. 2 | Comb. 1 | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Qd1(imf 8,9), Qd2(imf 6,7), Qd3(imf 6,7), Qd4(imf 6,7), Qd5(imf 6,7), Qa, Ha | Qd1(imf 8,9), Qd2(imf 6,7), Qd3, Qd4, Qd5, Qa, Ha | Q(t), Q(t − 1), Q(t− 2), H(t), H(t − 1) | ||||||||||||||||

Calibration | Verification | Calibration | Verification | Calibration | Verification | |||||||||||||

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | |

Sherwood | 0.90 | 0.017 | 0.012 | 0.82 | 0.024 | 0.016 | 0.86 | 0.019 | 0.012 | 0.75 | 0.028 | 0.018 | 0.71 | 0.034 | 0.023 | 0.66 | 0.44 | 0.32 |

Foxholm | 0.86 | 0.02 | 0.014 | 0.80 | 0.026 | 0.017 | 0.82 | 0.025 | 0.017 | 0.73 | 0.03 | 0.02 | 0.7 | 0.037 | 0.026 | 0.63 | 0.048 | 0.034 |

Minot | 0.90 | 0.021 | 0.014 | 0.85 | 0.024 | 0.016 | 0.84 | 0.033 | 0.21 | 0.75 | 0.039 | 0.26 | 0.72 | 0.038 | 0.027 | 0.66 | 0.047 | 0.33 |

Vendry | 0.90 | 0.024 | 0.016 | 0.86 | 0.036 | 0.024 | 0.83 | 0.034 | 0.22 | 0.76 | 0.041 | 0.29 | 0.69 | 0.049 | 0.035 | 0.62 | 0.06 | 0.04 |

Bantry | 0.88 | 0.025 | 0.17 | 0.81 | 0.034 | 0.22 | 0.81 | 0.042 | 0.29 | 0.69 | 0. 046 | 0.33 | 0.67 | 0.052 | 0.4 | 0.59 | 0.068 | 0.051 |

Westhope | 0.91 | 0.03 | 0.02 | 0.84 | 0.034 | 0.22 | 0.83 | 0.039 | 0.26 | 0.7 | 0.0.44 | 0.31 | 0.7 | 0.053 | 0.41 | 0.65 | 0.067 | 0.05 |

^{a}RMSE and MAE values are dimensionless due to normalization.

The proposed pre-processing approach comprised WT, EEMD, and MI methods. WT decomposed the time series into sub-series grown in a dyadic form. Further breakdown was performed via EMMD and caused sub-series to be obtained with more stationary properties and less noise. Since decomposing into two levels led to a number of sub-series (about 40 to 50 sub-series), feeding them into RVM might have caused over-training and other issues. Therefore, MI was used to decrease the number of sub-series and omit noise and select dominant sub-series. Using WT-EEMD-WT caused improvement in results of about 2–20% in average in comparison other models.

With regard to the importance of river discharge predictions in hydro-environmental management, in the present study, the ability of the proposed models was also examined for multi-step-ahead predictions in daily scale. Three- and seven-days (weekly) ahead predictions were performed by taking advantage of each model's best input dataset. The multi-station model did not provide accurate predictions and sometimes results were very poor. Considering the outcome, lead time forecast for the multi-station model was not reported here. The obtained results for I-RVMs (for Westhope station) and MRVM (all stations except Westhope) are demonstrated in Table 6. Results showed that employing three-step-ahead prediction led to an acceptable outcome by the means of RMSE, MAE, and DC values. Despite the obtained results for 3-days-ahead prediction, outcome of prediction for 7-days-ahead weakened relatively. It is indicated from Table 6 that in multi-days-ahead predictions, selected input led to relatively poor performance in comparison to one-step-ahead modeling and missed its precision by increasing the prediction horizon. According to the results from Table 6, results of prediction using the best structure were relatively weak for some stations. The reason is because in multi-step-ahead prediction, Markovian property weakens by the time step increase.

Station | Three-days-ahead | Seven-days-ahead | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Calibration | Verification | Calibration | Verification | |||||||||

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | |

Sherwood | 0.84 | 0.02 | 0.013 | 0.72 | 0.032 | 0.024 | 0.7 | 0.035 | 0.028 | 0.65 | 0.046 | 0.036 |

Foxholm | 0.80 | 0.028 | 0.019 | 0.71 | 0.033 | 0.026 | 0.67 | 0.04 | 0.032 | 0.58 | 0.059 | 0.46 |

Minot | 0.77 | 0.034 | 0.024 | 0.72 | 0.037 | 0.029 | 0.69 | 0.043 | 0.034 | 0.57 | 0.062 | 0.049 |

Vendry | 0.79 | 0.039 | 0.029 | 0.71 | 0.045 | 0.034 | 0.64 | 0.05 | 0.039 | 0.53 | 0.068 | 0.052 |

Bantry | 0.74 | 0.04 | 0.031 | 0.69 | 0. 046 | 0.035 | 0.65 | 0.05 | 0.038 | 0.55 | 0.073 | 0.058 |

Westhope | 0.83 | 0.039 | 0.027 | 0.7 | 0.044 | 0.32 | 0.7 | 0.053 | 0.04 | 0.65 | 0.067 | 0.053 |

Station | Three-days-ahead | Seven-days-ahead | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Calibration | Verification | Calibration | Verification | |||||||||

DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | DC | RMSE^{a} | MAE^{a} | |

Sherwood | 0.84 | 0.02 | 0.013 | 0.72 | 0.032 | 0.024 | 0.7 | 0.035 | 0.028 | 0.65 | 0.046 | 0.036 |

Foxholm | 0.80 | 0.028 | 0.019 | 0.71 | 0.033 | 0.026 | 0.67 | 0.04 | 0.032 | 0.58 | 0.059 | 0.46 |

Minot | 0.77 | 0.034 | 0.024 | 0.72 | 0.037 | 0.029 | 0.69 | 0.043 | 0.034 | 0.57 | 0.062 | 0.049 |

Vendry | 0.79 | 0.039 | 0.029 | 0.71 | 0.045 | 0.034 | 0.64 | 0.05 | 0.039 | 0.53 | 0.068 | 0.052 |

Bantry | 0.74 | 0.04 | 0.031 | 0.69 | 0. 046 | 0.035 | 0.65 | 0.05 | 0.038 | 0.55 | 0.073 | 0.058 |

Westhope | 0.83 | 0.039 | 0.027 | 0.7 | 0.044 | 0.32 | 0.7 | 0.053 | 0.04 | 0.65 | 0.067 | 0.053 |

^{a}RMSE and MAE values are dimensionless due to normalization.

### Comparison of results

Previously used AI-based models or physically based models were different from the proposed models used herein and the results are discussed here in terms of quality and quantity. First, we tried to explain the capability of proposed models under different conditions.

In order to demonstrate the capability of the proposed models, it is necessary to investigate in two separate parts: proposed hydrological models and proposed pre-processing approach. Three different strategies were proposed to study Q-H relation of the river, namely, multi-station model, I-RVMs and MRVM. All proposed models have unique capabilities and some deficiencies to be discussed below.

**Rating curve:**RC is a physically based model which is used to extract information from river stage dataset. The best outcome was captured for Verendry station (DC = 0.81, RMSE = 0.042, and MAE = 0.034), whereas the worst result was captured for Westhope station (DC = 0.69, RMSE = 0.046, and MAE = 0.0.36).**Multi-station model:**The proposed model is in continuous and multi-station form and since this model took advantage of watershed morphology data (upstream drainage area of hydrometric stations) it behaved like a spatio-temporal model that could identify and predict the discharge values more accurately when employed with the proposed WT-EEMD-MI pre-processing approach. The multi-station model could handle the uncertainty and non-linearity of the Souris River by exploring the interaction of stations by using the spatial and temporal variables and it is capable of being applied to predict discharge values at an arbitrary point of the watershed because of its structure (input data, form of prediction) (DC = 0.83, RMSE = 0.32, MAE = 0.21, in verification period).**Integrated RVMs:**The integrated form of RVMs was designed to predict the discharge time series of downstream (Westhope station) without using its discharge values which is an advantage in this model. In the proposed model, there are no spatial features and temporal features were pre-processed by the WT-EEMD-MI approach. The mathematical formulation of I-RVMs was inspired from Nash's cascade of linear reservoirs' model and was transformed into AI-based model by using meta-learner system (DC = 0.85, RMSE = 0.21, MAE = 0.014 in verification period (best scenario based on results)).**MRMV:**The MRVM took advantage of MRVM's multi input-multi output structure to predict the river discharge via one input dataset. The main difference between the multi-station model and MRVM is the form of prediction. In the multi-station model, the discharge values of various stations are predicted as a united time series, whereas MRVM predicts the values separately for each station. The obtained results proved to be more accurate when they are applied in conjugation with WT-EEMD-MI.**WT-EEMD-MI:**The proposed pre-processing approach consisted of WT, EEMD, and MI approaches. WT decomposes the time series into sub-series which grow in a dyadic form. The sub-series carry the seasonal properties which strengthen any AI-based model to predict the time series more accurately. Further breakdown via EMMD led to obtaining sub-series with more stationary properties and less noise in the data. Since decomposing in two levels led to a number of sub-series (about 40 to 50 sub-series), feeding them into RVM might lead to over-training or other issues. Therefore, MI was used to select the dominant sub-series and decrease the number of sub-series and omit noise. Using the WT-EEMD-MI caused improvement in results (about 2–20% in comparison to other models. Table 7 demonstrates the properties of the proposed models. Quantitative comparison of overall results are demonstrated in Figure 9.

Strategy of modeling | Characteristics | Advantages and disadvantages | Pre-processing method | Multistep-ahead-predictions |
---|---|---|---|---|

Multi-station | Multi-station form | High accuracy | EEMD-WT-MI | No |

Spatio-temporal properties | Fill the missing values | |||

Predict entire river discharge by training only one RVM | ||||

Using geomorphological features as input | ||||

I-RVMs | Integrated form of AI | High accuracy | EEMD-WT-MI | Yes |

Meta-learning system | Take advantage of semi-distributed specification to predict precisely | |||

Using geomorphological unit hydrograph based on cascade of linear reservoirs | Fill the missing values of target station | |||

Combined form GUHCR and meta-learning system | Unable to predict or fill missing values of other stations in integrated form | |||

Semi distributed AI model | ||||

MRVM | Multi input multi output | Fill the missing values of target stations | EEMD-WT-MI | Yes |

Unable to predict or fill missing values of other stations |

Strategy of modeling | Characteristics | Advantages and disadvantages | Pre-processing method | Multistep-ahead-predictions |
---|---|---|---|---|

Multi-station | Multi-station form | High accuracy | EEMD-WT-MI | No |

Spatio-temporal properties | Fill the missing values | |||

Predict entire river discharge by training only one RVM | ||||

Using geomorphological features as input | ||||

I-RVMs | Integrated form of AI | High accuracy | EEMD-WT-MI | Yes |

Meta-learning system | Take advantage of semi-distributed specification to predict precisely | |||

Using geomorphological unit hydrograph based on cascade of linear reservoirs | Fill the missing values of target station | |||

Combined form GUHCR and meta-learning system | Unable to predict or fill missing values of other stations in integrated form | |||

Semi distributed AI model | ||||

MRVM | Multi input multi output | Fill the missing values of target stations | EEMD-WT-MI | Yes |

Unable to predict or fill missing values of other stations |

## CONCLUDING REMARKS

In this research, capabilities of RVM by using conceptual modeling scenarios were verified to discover their quantitative and qualitative aspects in prediction of river S-D process. The reason for selecting RVM and MRVM was due to their structure and least parameters required to be tuned. Captured results could be separated into different categories. Pre-processing and proposed scenarios are the other subjects which are inferred from this research.

EEMD and WT gave incredible vision in the time and frequency domain of data to capture the non-linear and seasonal properties. WT was first applied to decompose both S-D time series. Since first and second detailed sub-series had a weak correlation with target data, EEMD was used to decompose the sub-series obtained from DWT (details and approximations) again into IMFs. By capturing more correlated first and second detailed sub-series decompositions by MI and imposing them on RVM, an increase in performance was observed. Especially in multivariate modeling, because of its sensitive structure, EEMD was applied for all detailed sub-series and obtained results showed a good agreement with observed time series.

Due to issues in hydrological predictions by using simple AI-based models, the present study took advantage of conceptual models. Multi-station model was designed to model river discharge in multiple-station form by training only one RVM. This strategy was capable of forecasting at the point of interest in a river or watershed by including physical properties of the case study. I-RVMs designed based on Nash's cascade of reservoir and meta-learning RVMs system were able to predict the leading point of the river (outlet of watershed) precisely. MRVM, which was the opposite of I-RVMs, used Westhope station's (last station located on the river) temporal features to predict other stations time series. Incorporating WT-EEMD-MI increased its accuracy and led to accurate results.