Cascade-based multi-scale AI approach for modeling rainfall-runoff process

In this paper, the runoff time series of sub-basins arranged in a cascade were decomposed by the Wavelet Transform (WT) to extract their dynamical and multi-scale features for modeling the Multi-Station (MS) rainfall-runoff (R-R) process of the Little River Watershed (LRW) in the USA. A Self-Organizing Map (SOM) clustering technique was also employed to group the extracted sub-series into homogeneous clusters. As a complementary feature extraction criterion, mutual information (MI) was used to choose the agent of each cluster to be imposed on the artificial intelligence (AI) models (Feed Forward Neural Network, FFNN; Extreme Learning Machine, ELM; and Least Square Support Vector Machine, LSSVM) that predict the runoff of the LRW sub-basins. The performance of the wavelet-based runoff prediction was compared with that of the Markovian-based MS model. The proposed method covers not only the prediction of the outlet runoff but also the behavior of the interior sub-basins. The outcomes showed that the proposed AI models combined with the SOM and MI tools enhanced the MS runoff prediction efficiency by up to 23% in comparison with the Markovian-based models. In addition, exploiting the seasonality of the process and reducing the dimension of the inputs could help the AI models to use the pure information of the recorded data.

doi: 10.2166/nh.2017.045

Vahid Nourani (corresponding author), Gholamreza Andalib, Elnaz Sharghi
Department of Water Resources Engineering, Faculty of Civil Engineering, University of Tabriz, P.O. Box: 51666, Tabriz, Iran
E-mail: vnourani@yahoo.com

Vahid Nourani
Department of Civil Engineering, Near East University, P.O. Box: 99138, Nicosia, North Cyprus

Fahreddin Sadikoglu
Department of Electrical and Electronic Engineering, Near East University, P.O. Box: 99138, Nicosia, North Cyprus


INTRODUCTION
The conversion of rainfall to runoff, driven by gravity, vivifies the earth, replenishes groundwater, keeps rivers and lakes full of water, and reshapes the landscape through erosion. R-R modeling provides engineers and decision makers with the information needed to manage, protect, and enhance water resources. The large uncertainties and high non-linearity of the R-R process hinder process-based modeling and motivate the search for a black box relationship between the driving and resultant variables.
So, various black box methods such as AI models have already been presented for R-R simulation (Resdi ). However, to the best of our knowledge, no study has used an MS framework that considers seasonality or periodic properties and their influence on the runoff transmission of the watershed for patterning the R-R process. Accordingly, this paper aims not only to predict the outlet runoff values of the LRW but also to simulate the runoff time series in a cascade manner at particular points inside the watershed. R-R MS modeling is therefore suited to cases where the runoff inside the LRW must be predicted. In cascade modeling, the runoff time series of the upper sub-basins are used to predict the runoff of the interior sub-basins, and the central sub-basins in turn contribute to the prediction of the LRW outlet runoff. The MS model can thus provide a promising platform for estimating the runoff amount at critical places in the LRW.
Two different scenarios are considered for R-R MS modeling to identify an appropriate strategy for hydro-environmental studies. In the first scenario, the Markovian property of the R-R process is taken as the basis of the MS model, where the antecedent rainfall and sub-basin runoff time series are used as inputs (Nourani & Komasi ). Furthermore, the non-linear MI feature extraction criterion, which is a more appropriate measure than the linear Correlation Coefficient (CC), is used to select suitable inputs for the LSSVM, FFNN, and ELM models and so avoid a laborious trial-and-error input selection. Three models were employed and compared in this study: FFNN is the most commonly used AI model in hydrology; LSSVM uses the concept of the SVM classifier as a pre-processing tool, which can sometimes lead to more accurate results; and ELM represents a newer generation of AI models. Other AI-based models (e.g. genetic programming, Ravansalar et al. ) can of course also be applied.
In the second scenario, the seasonality-based or multi-scale property of the R-R process is considered, where the sub-basin runoff time series are decomposed by WT at an appropriate level to clarify the temporal and spectral information of the time series. Then, as a new feature extraction approach, SOM and MI are used, respectively, to cluster homogeneous sub-signals and to choose the proper agent of each cluster to be fed into the LSSVM, FFNN and ELM models for the LRW MS runoff modeling.

STUDY AREA AND DATA SET
The southeast USA is considered an important region from agricultural, social, and economic points of view, because its rapidly increasing population will raise environmental stress and water demand on ecosystems that are already degraded or stressed. The LRW covers 334 km², and agricultural cropland occupies approximately 30% of its northern half and 40% of its southern half. The watershed includes eight sub-basins ranging in area from 2.62 to 334 km² (Bosch et al. , Figure 1(a)). The observed rainfall and runoff time series of the LRW sub-basins (Figure 1(a)) considered in this research range from January 1990 to December 2012 and were compiled from the recorded database (ftp://www.tiftonars.org/databases/LREW). Table 1 shows the statistics of the data used, and Figure 2 shows the runoff time series and recorded rainfall at the LRW outlet. The first 75% of the data (01/Jan/1990-02/Apr/2007, 6,301 days) were used for training and the remaining 25% (03/Apr/2007-31/Dec/2012, 2,100 days) for verification. In this way, the higher maximum and standard deviation values fell within the training data set, because the AI models (LSSVM, ELM and FFNN) act as interpolators and can give accurate predictions for unseen data only if they have been trained on similar patterns. To speed up training, the input and output data were normalized before entering the training step.
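The chronological 75/25 split and the normalization step can be sketched as follows. This is only an illustration: the text does not state which normalization was used, so min-max scaling fitted on the training part is an assumption, and the function names and the runoff numbers are hypothetical.

```python
from datetime import date, timedelta


def chronological_split(values, train_fraction=0.75):
    """Split a time-ordered sequence without shuffling."""
    n_train = round(len(values) * train_fraction)
    return values[:n_train], values[n_train:]


def min_max_params(train_values):
    """Scaling parameters are taken from the training part only."""
    return min(train_values), max(train_values)


def min_max_scale(values, low, high):
    """Map values linearly so that [low, high] becomes [0, 1]."""
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


# Daily calendar of the study period (January 1990 to December 2012).
start, end = date(1990, 1, 1), date(2012, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
train_days, verify_days = chronological_split(days)

# Hypothetical runoff values, used only to show the scaling step.
train_runoff, verify_runoff = [0.5, 2.0, 8.0, 3.5], [1.0, 9.5]
low, high = min_max_params(train_runoff)
train_scaled = min_max_scale(train_runoff, low, high)
verify_scaled = min_max_scale(verify_runoff, low, high)
```

In this toy case the verification value 9.5 exceeds the training maximum and scales above 1, which illustrates why it helps to have the largest events inside the training period.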

Proposed methodology
Because the dynamical parameters that influence the R-R process are non-linear, the non-linear AI-based LSSVM, FFNN, and ELM models were applied in this research under two different scenarios to predict the runoff at the LRW outlet and at some interior points. In black-box R-R modeling with lumped routing, the flow is predicted at specific river locations (the sub-basin outlets). As can be seen in Figure 1, the sub-basin I runoff was predicted from the runoff time series of sub-basins M, K and J. In the next step, the sub-basin F runoff was predicted from the sub-basin I data, and in the final step, the LRW outlet runoff at station B was determined using the data of sub-basins F, N and O. The MS runoff modeling pattern of the LRW was thus completed with the three AI modeling methods arranged as a reservoir cascade.
The two scenarios applied for the LRW MS runoff modeling, Markovian and seasonality-based (multi-scale), are explained in the following sub-sections (see also Figure 3).
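The cascade described above (M, K, J to I; I to F; F, N, O to B) can be written down as a small data structure that fixes the order in which the station models are run. The sketch below is illustrative only: `CASCADE`, `prediction_order` and `run_cascade` are hypothetical names, and the toy "model" that sums its inputs merely stands in for a trained FFNN, ELM or LSSVM.

```python
# Upstream stations feeding each modeled station, as described in the text.
CASCADE = {"I": ["M", "K", "J"], "F": ["I"], "B": ["F", "N", "O"]}


def prediction_order(cascade):
    """Return modeled stations so that every station follows its inputs."""
    order, seen = [], set()

    def visit(station):
        if station in seen:
            return
        seen.add(station)
        for upstream in cascade.get(station, []):
            visit(upstream)
        if station in cascade:  # only stations that have a model
            order.append(station)

    for station in cascade:
        visit(station)
    return order


def run_cascade(cascade, models, gauged):
    """Propagate flows downstream; `models` maps station -> callable."""
    flows = dict(gauged)
    for station in prediction_order(cascade):
        inputs = [flows[upstream] for upstream in cascade[station]]
        flows[station] = models[station](inputs)
    return flows


# Toy stand-in for the trained AI models: each one sums its inputs.
toy_models = {station: sum for station in CASCADE}
gauged = {"M": 1.0, "K": 2.0, "J": 3.0, "N": 4.0, "O": 5.0}
flows = run_cascade(CASCADE, toy_models, gauged)
```

Keeping the topology separate from the models makes it straightforward to swap one AI model for another at any station without touching the routing order.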

Scenario 1
In this scenario, the LRW MS runoff modeling was based on the Markovian property, so that the runoff values at the interior sub-basins I and F, as well as at the LRW outlet, were simulated using the antecedent runoff values of the relevant upstream sub-basins via Equations (1)-(3), respectively.
The sub-basin I runoff was predicted from the runoff values of sub-basins M, K and J and the sub-basin I rainfall (see Figure 1(b)) according to Equation (1), where the interior sub-basin I runoff (Q) at time t is a function of these antecedent runoff and rainfall values. However, selecting suitable inputs among the numerous potential inputs is a crucial step in MS modeling.
Therefore, the supervised MI feature extraction criterion was used here to identify the proper input set instead of a trial-and-error method. Accordingly, among the potential input variables, those with the maximum MI values with respect to the target (model output) were chosen and used in the AI models for the sub-basin I runoff prediction. The superiority of such a non-linear measure over the linear CC measure in ANN-based input selection has been investigated in several previous studies (e.g. Nourani et al. ). Similarly, the sub-basin F runoff was predicted from the sub-basin I runoff via Equation (2), and the outlet runoff at station B was predicted from the runoff values of sub-basins F, N and O via Equation (3).
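As an illustration of MI-based input selection, the sketch below ranks lagged copies of an upstream series by their MI with a downstream target. It is a minimal sketch under stated assumptions: the equal-width binning, the number of bins, the function names and the synthetic data (a target that simply repeats the upstream value one day later) are all hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter


def to_bins(values, n_bins):
    """Equal-width binning (an assumed discretization)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=4):
    """MI = H(X) + H(Y) - H(X, Y) on the binned series."""
    bx, by = to_bins(x, n_bins), to_bins(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


def rank_lagged_inputs(candidates, target, max_lag, n_bins=4):
    """Score every (series name, lag) pair by its MI with the target."""
    scores = {}
    for name, series in candidates.items():
        for lag in range(max_lag + 1):
            x = series[: len(series) - lag]  # value at time t - lag
            y = target[lag:]                 # target at time t
            scores[(name, lag)] = mutual_information(x, y, n_bins)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic example: the target equals the upstream value of the previous day.
rng = random.Random(42)
upstream = [rng.random() for _ in range(300)]
downstream = [0.0] + upstream[:-1]
ranked = rank_lagged_inputs({"M": upstream}, downstream, max_lag=3)
```

Here the 1-day lag of the synthetic upstream series ranks first because the target was constructed that way; with real data the ranking depends on the binning choice and on the record length.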

Scenario 2
In this scenario, the seasonality of the R-R process of the LRW formed the basis of the MS model. The focus of the second scenario was on using the dominant frequencies of the proper sub-basins to remove redundant data.
For the interior sub-basin I runoff prediction, three steps were followed. First, the runoff time series of sub-basins M, J and K were decomposed by WT at level q to handle the seasonal and non-stationary influences of the process, so that the sub-basin I runoff was related to the sub-signals of the upstream sub-basins as in Equation (4), where Q^M_a and Q^M_dq are the approximation and detail runoff sub-signals of sub-basin M at level q, respectively; the sub-signals of sub-basins J and K are denoted in the same way. Second, because of the numerous potential inputs, the SOM clustering tool was applied to group homogeneous sub-signals spatio-temporally. Finally, as in scenario 1, MI was applied to select the dominant input sub-signals from each cluster to be imposed on the AI models for the sub-basin I outlet (Figure 4).
It is worth mentioning that, in the second scenario, MI is not appropriate for choosing the dominant inputs directly as in the first scenario, for two reasons. First, when the numerous potential model inputs are ranked according to their MI with the output, the main problem is how to separate the dominant inputs from the rest of the ranked candidates with methods such as the maximum reduction rate. Applying SOM before MI solves this problem of the number of dominant inputs by clustering the potential inputs into particular groups. Second, the MI criterion can identify suitable model inputs, but it cannot directly deal with redundant inputs. Consequently, MI alone may choose inputs that all follow just one pattern, and the prediction accuracy in the verification step will most likely collapse. The SOM is therefore applied first to cluster the input variables into groups of similar inputs with particular patterns. After this clustering, MI chooses the dominant inputs from each cluster, so that inputs with various patterns are fed into the AI models and the prediction accuracy for unseen data in the verification step improves.
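The selection step of scenario 2, taking the highest-MI sub-signal from each cluster rather than the overall top-ranked sub-signals, can be sketched as below. The SOM itself is not implemented here: the cluster labels are assumed to come from a previously trained SOM, and the sub-signal names, cluster numbers and MI values are hypothetical illustration data, not results from Tables 3 or 4.

```python
def select_cluster_agents(cluster_of, mi_with_target):
    """Pick, for every cluster, the member with the largest MI with the target."""
    agents = {}
    for name, cluster in cluster_of.items():
        current = agents.get(cluster)
        if current is None or mi_with_target[name] > mi_with_target[current]:
            agents[cluster] = name
    return agents


# Hypothetical SOM output: sub-signal name -> cluster label.
cluster_of = {"J_d1": 1, "J_d2": 1, "K_d7": 2, "K_a": 2, "M_a": 2}
# Hypothetical MI of each sub-signal with the target runoff.
mi_with_target = {"J_d1": 0.42, "J_d2": 0.40, "K_d7": 0.18, "K_a": 0.30, "M_a": 0.09}

agents = select_cluster_agents(cluster_of, mi_with_target)

# For contrast: ranking by MI alone, without clustering first.
top_two_by_mi = sorted(mi_with_target, key=mi_with_target.get, reverse=True)[:2]
```

In this toy case the two highest-MI sub-signals both belong to cluster 1, so ranking by MI alone would feed two similar patterns to the model, whereas the per-cluster choice returns one agent from each group.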
Similarly, Equations (5) and (6) were applied for the runoff prediction at stations F and B, respectively (see Figure 4). After training the AI models with the available historical rainfall and runoff data observed at the different stations, the future output of each sub-basin could be provided using only the data from the relevant upstream sub-basins. The necessary tools for the two suggested scenarios are explained in the subsequent sections.

WT and Shannon entropy
To capture the seasonality pattern of the R-R process in the second scenario, WT was applied to decompose the time series into sub-signals at various time scales. The wavelet provides a time-scale localization of the time series, which results from the compact support of its basis function and from the relevance between the wavelet function and the time series. In hydro-environmental fields, the signals are mainly discrete; hence the discrete WT introduced by Mallat () is used (Equation (7)), in which * denotes the complex conjugate, g(t) the wavelet function or mother wavelet (MW), and m and n the wavelet dilation and translation, respectively (a_0 > 1, b_0 > 0).
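A multi-level dyadic decomposition of this kind can be sketched with the Haar wavelet, the simplest mother wavelet. The choice of Haar is an assumption made for brevity (the mother wavelet used in the study is not named in this section), and the function names are hypothetical. With daily data, the detail sub-signal of level j corresponds to the 2^j-day mode; note that this decimated form halves the series length at every level.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approximation = [(a + b) * scale for a, b in pairs]
    detail = [(a - b) * scale for a, b in pairs]
    return approximation, detail


def haar_decompose(signal, level):
    """Return the level-`level` approximation and the details d1..d`level`."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    approximation, details = list(signal), []
    for _ in range(level):
        approximation, detail = haar_step(approximation)
        details.append(detail)  # details[j - 1] is the 2**j-day mode
    return approximation, details


# A constant series has no fluctuation, so all detail coefficients vanish.
flat = [4.0] * 8
flat_approximation, flat_details = haar_decompose(flat, level=3)

# A series with day-to-day variation puts energy into the detail sub-signals.
series = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 2.0]
approximation, details = haar_decompose(series, level=3)
energy_in = sum(v * v for v in series)
energy_out = sum(v * v for v in approximation) + sum(
    v * v for detail in details for v in detail
)
```

Because the transform is orthonormal, the energy of the series equals the summed energy of the approximation and detail coefficients, which gives a quick consistency check on any decomposition routine.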
To select suitable inputs for the AI models with regard to the target in a non-linear process, a robust supervised tool is needed. To this end, the entropy-based feature extraction criterion of MI, briefly described below, was utilized.
Shannon entropy (H), or information content, is defined for a discrete variable X with sample size N (bin number) that takes the values x_1, x_2, …, x_N with probabilities p_1, p_2, …, p_N, respectively (Shannon ). The MI of two variables A and B is calculated (Yang et al. ) from H(A) and H(B), the entropies of A and B, and H(A, B), their joint entropy.
For AI modeling (e.g. FFNN, ELM and LSSVM), the codes were developed in the MATLAB® environment (MathWorks ).
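For reference, the usual textbook forms of these quantities can be written with the symbols defined above. This is a sketch of the standard definitions rather than a transcription of the paper's numbered equations; the logarithm base (2, giving bits) and the joint probability symbol p(a_i, b_j) are assumptions.

```latex
\begin{align}
H(X)    &= -\sum_{i=1}^{N} p_i \log_2 p_i \\
MI(A,B) &= H(A) + H(B) - H(A,B) \\
H(A,B)  &= -\sum_{i}\sum_{j} p(a_i, b_j)\,\log_2 p(a_i, b_j)
\end{align}
```

With these definitions MI is zero when A and B are independent and equals H(A) when B is a one-to-one function of A.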

Evaluation of models precision
The Determination Coefficient (DC) and the Root Mean Square Error (RMSE) were used as two distinct criteria to assess the efficiency of the MS runoff predictions. Both indicate the differences between the predicted and recorded values. Legates & McCabe () showed that hydro-environmental models can be evaluated effectively using these two criteria (Equations (13) and (14)).
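The two criteria can be computed as in the sketch below. It assumes that DC takes the common Nash-Sutcliffe form (one minus the ratio of the squared prediction error to the variance of the observations about their mean); the function names and the sample values are hypothetical.

```python
import math


def determination_coefficient(observed, predicted):
    """DC in the assumed Nash-Sutcliffe form; 1.0 means a perfect fit."""
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_variation = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - squared_error / total_variation


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / len(observed))


observed = [1.0, 2.0, 3.0, 4.0]
perfect = list(observed)
mean_only = [2.5] * 4  # always predicting the observed mean

dc_perfect = determination_coefficient(observed, perfect)
dc_mean = determination_coefficient(observed, mean_only)
rmse_mean = rmse(observed, mean_only)
```

A DC of zero means the model is no better than always predicting the observed mean, which is why DC and RMSE are informative together: one is relative and dimensionless, the other is in runoff units.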

RESULTS AND DISCUSSION
The results of the proposed R-R MS modeling using the LRW sub-basin information are investigated for the two scenarios in the following sub-sections, where the results obtained by the FFNN, ELM and LSSVM models are compared.

Results of scenario 1
In the data-driven FFNN, ELM and LSSVM models, suitable detection of the inputs plays a crucial role in improving the model performance in both the calibration and verification steps; it also prevents over-training of the model. For this purpose, the dominant inputs and their appropriate lag times for predicting the runoff of the sub-basins were selected by MI (Equation (9)). Table 2 shows the results of the sensitivity analysis via MI in selecting the dominant inputs for the MS runoff predictions. From the data statistics (Table 1), it could be deduced that the mismatch between the outlet runoff per unit area and the land cover/use classification has an anthropogenic source.
In sub-basins with a high share of pasture and cropland, water consumption is high, which reduces their outlet runoff, whereas such sub-basins, lacking forested areas, would ordinarily be expected to absorb less runoff per unit area because of their higher bulk densities, lower infiltration rates, and lower water holding capacities than forested land. In addition, the MS model input selection using MI was compatible with the LRW geomorphology. In the first model, for the sub-basin I runoff prediction, the runoff values of sub-basins K and J at time t with no delay were chosen as inputs because of the high slope and short distance of the river channels between the outlet stations of sub-basins K and J and station I. However, the sub-basin M runoff with a 1-day delay was chosen by MI. The results for the runoff values predicted by the MS models (see Table 2) revealed that, although the FFNN, ELM and LSSVM models could all give acceptable results, ELM was more accurate than the FFNN and LSSVM models. An appropriate data pre-processing scheme such as the method proposed in scenario 2 could enhance the modeling performance.
The following sub-section presents the results of MS modeling via scenario 2.

Results of scenario 2
For a better understanding of the R-R process, clarifying the time series in both spectral and temporal terms is helpful.
Therefore, to provide such clarification and to extract the multi-scale features of the process, the sub-basin runoff time series were decomposed by WT at level 8. The clustering results for the decomposed time series of the LRW can be seen in Table 3. The decomposed runoff sub-signals were classified into six groups for the sub-basin I and B modeling, and four groups for sub-basin F. The clustering outcomes of the unsupervised SOM indicated that the sub-signals were mainly grouped according to their frequency scale: the low frequency sub-signals and the approximation fell into the same groups, while high and low frequencies were separated. In addition to the clustering (by SOM), MI was employed to pick the dominant sub-signals from each cluster. As in the first scenario, the sub-signals of each cluster that had the higher non-linear correlation (high MI) with the main time series (target) were entered into the AI models. Table 4 presents the dominant inputs of the MS model in the second scenario picked by MI.
In the first model, for the sub-basin I runoff prediction, the sub-basin J runoff time series participated with four sub-signals, i.e. the 2^8-day, 2^3-day, 2^2-day and 2^1-day modes, together with the station I rainfall time series at time t. It is worth mentioning that sub-basin J has more runoff and a larger area than sub-basins M and K. Sub-basin K, the second most effective area in producing the sub-basin I runoff, contributed its approximation sub-signal and 2^7-day mode. It can be noted that the furthest sub-basin, M, whose runoff is low because of its extensive forested land, was not selected for the runoff prediction of sub-basin I. In the second model, for the sub-basin F runoff prediction, the selected inputs included the approximation and the 2^8-day mode (see Table 4 for scenario 2). The comparison of results showed that LSSVM is slightly more accurate than FFNN. This is perhaps because, first, FFNNs often converge on local minima rather than the global minimum and, second, FFNNs often overfit if training goes on too long, meaning that for any given pattern an FFNN might start to treat the noise as part of the pattern.
Also, ELM proved superior to the LSSVM and FFNN models owing to its good generalization performance. For the second scenario, the observed and predicted runoff values at the LRW outlet obtained via the ELM model can be seen in Figure 8.

Comparison of models
The MS runoff predictions for the LRW indicated differences between the two proposed scenarios of Markovian and seasonality-based modeling (Figure 9, Table 5). The AI model results indicated that scenario 2 provides more precise outcomes because it accounts for the seasonal (multi-scale) pattern of the R-R process. The justification is that in scenario 2 the models were fed with dominant pre-processed data, whereas in scenario 1 the worthiness of the sub-basins was not compared. Meanwhile, scenario 1 lost accuracy because it lacked temporal pre-processing and a robust purge of the data. For comparison, the outlet runoff at station B was also modeled alone using its own antecedent inputs. The results of this antecedent modeling and of the MS modeling via both scenarios can be observed in Table 5. According to the assessment criteria shown in Table 5, it is evident that the cascade-based MS method is more accurate than modeling station B alone.
Moreover, Figure 9 shows a comparison between the model results, and Figure 10 shows the error probability density functions (PDFs), where the scenario 2 error PDF, with lower values of mean and standard deviation, is well-proportioned.

CONCLUDING REMARKS
In this study, three AI models (FFNN, ELM and LSSVM) were applied for cascade-based MS runoff prediction in the LRW. In such a cascade manner, the output runoff was predicted using data from the upper sub-basins. Before modeling, the conversion of rainfall to runoff in the LRW was investigated through land cover classification and data statistics. It was deduced that in the sub-basins with a high share of cropland and pasture, water consumption is high, which reduces the outlet runoff from such sub-basins. In the first scenario, the antecedent R-R time series were used as inputs, and the supervised feature extraction criterion of MI was applied for appropriate input selection to avoid a trial-and-error process. The MI input selection results conformed with the LRW geomorphology (i.e. input sub-basins that are far from the output sub-basin or have a low slope show longer lag times than near or high-slope sub-basins).
In the second scenario, intended to improve on the results of the first, data pre-processing with the feature extraction methods of WT and SOM-MI led to the detection of important hydrological parameters, which proved helpful in enhancing the AI-based MS runoff predictions. The second scenario succeeded because the LRW R-R process follows a multi-scale seasonal pattern that can be captured using WT. The comparison of scenarios indicated that using WT to capture the multi-scale features of the sub-basins could enhance the model accuracy, compared to ad hoc AI models, if it is combined with a promising feature extraction method such as SOM-MI. The runoff time series of the sub-basins were decomposed at level 8, which not only covers the dominant seasonality but also does not mar the clustering. Afterwards, the decomposed sub-signals were clustered via SOM, and the MI feature extraction criterion then picked the effective sub-signals of the clusters as inputs to the AI models. Moreover, it was concluded that for the near upstream sub-basins the higher frequency sub-signals participate in the LRW runoff predictions, while for the far sub-basins it is the low frequency sub-signals.