Flood forecasting using an improved NARX network based on wavelet analysis coupled with uncertainty analysis by Monte Carlo simulations: a case study of Taihu Basin, China

Reliable flood forecasting can provide a scientific basis for flood risk assessment and water resources management, and the Taihu water level forecasting with high precision is essential for flood control in the Taihu Basin. To increase the prediction accuracy, a coupling model (DWT-iNARX) is established by combining the discrete wavelet transformation (DWT) with improved nonlinear autoregressive with exogenous inputs network (iNARX), for predicting the daily Taihu water level during the flood season under different forecast periods. And the DWT-iNARX model is compared with the back-propagation neural network (BP) and iNARX models to assess its capability in prediction. Meanwhile, we propose an uncertainty analysis method based on Monte Carlo simulations (MCS) for quantifying model uncertainty and performing probabilistic water level forecast. The results show that three models achieve good simulation results with higher accuracy when the forecast period is short, such as 1–3 days. In overall performance, iNARX and DWT-iNARX models show superiority in comparison with the BP model, while the DWT-iNARX model yields the best performance among all the other models. The research results can provide a certain reference for the water level forecast of the Taihu Lake.


INTRODUCTION
has flexibility in choosing the mother wavelet (i.e., the transform function) according to the characteristics of time series (Seo et al. 2015). Accordingly, the time series can be decomposed into its subcomponents using wavelet transformation and these multi-scale signals are used as input data for the ANN model, the result of which is a hybrid wavelet-ANN model. A majority of studies have been dedicated to applying the hybrid wavelet-ANN model to predict hydrological elements such as the water level (Seo et al. 2015;Loh et al. 2019), runoff (Shoaib et al. 2016(Shoaib et al. , 2018, and precipitation (Nourani et al. 2009a). Nourani et al. (2014) pointed out that the neural network model coupled with wavelet analysis has dominant advantages in time series de-noising, multiresolution analysis, and hydrological forecasting (Nourani et al. 2009b;Remesan et al. 2009;Nourani & Parhizkar 2013). The hybrid wavelet-ANN model has proven to be superior compared with the single model as it may present more probable forecasting rather than a single pattern input. However, the existing research focused on the coupling of wavelet analysis and feedforward neural network (FNN), while the coupling of wavelet analysis and NARX network has not been exhausted.
Most previous studies performed linear mixture regression for Taihu water level prediction (Lu et al. 2000;Wang et al. 2001;Wang & Dong 2003;Hu 2008). Practically, there is a complex and highly nonlinear relationship between the Taihu water level, and the previous Taihu water levels, rainfall in the basin, and the operation mode of flood control projects. To fill this knowledge gap, in this study, considering the influencing factors of Taihu water level, taking advantage of the NARX network in time series prediction, we combined the discrete wavelet transformation (DWT) with the improved NARX network (iNARX) to establish a coupling model (DWT-iNARX), for Taihu water level prediction. The main objectives of this study are (1) to forecast daily Taihu water level during the flood season under the forecast period of 1-7 days using the DWT-iNARX model, (2) to compare the DWT-iNARX model with BP and iNARX models to assess its capability in predicting water levels, and (3) to present an uncertainty analysis framework for quantifying model uncertainty and performing probabilistic water level forecast.

STUDY AREA AND DATA Study area
As shown in Figure 1, the Taihu Basin is located downstream of the Yangtze River Delta, with a total drainage area of 36,895 km 2 . The Taihu Lake is located in the middle of the basin, whose water surface area is about 2,336.8 km 2 under normal water level. Based on geographic and hydrological features, the Taihu Basin is usually divided into eight hydrological subregions: Hu Xi Region, Zhe Xi Region, Wu Cheng Xi Yu Region, Yang Cheng Dian Mao Region, Lake Region, Hang Jia Hu Region, Pudong Region, and Puxi Region. There is almost no water exchange between Pudong-Puxi regions and Taihu Lake, due to the Qingsong polder dike engineering in Shanghai.
The Taihu Basin lies in the subtropical monsoon climate zone, which is mild and humid, with an annual average rainfall of 1,189 mm. The distribution of the rainfall is uneven during the year, with the most rainfall occurring in the flood season (from May 1 to September 30), accounting for 75% of the annual rainfall. The Taihu water level generally drops at a rate of 2-3 cm/d when there is no rain, rises at a rate of 1-10 cm/d when there is significant rainfall, and rises at a rate of 20-30 cm/d when there is heavy rain. Due to the flat terrain, backwater effect of the tidal downstream and various water conservancy projects preventing water drainage, the Taihu water level often maintains above 3.5 m in the flood season, posing a serious flood threat to surrounding areas and low-lying flat downstream areas.

Data preparation
The data used in this study included: (1) Daily Taihu water level data, daily water level data of the Ganlu, Fengqiao and Pingwang stations from 1990 to 2017. (2) Daily discharge data of the Wangting Hydro-junction and the Taipu Gate from 1990 to 2017.
(3) Daily precipitation data of 65 hydrological stations evenly distributed in the Taihu Basin from 1990 to 2017.
The locations of each station are shown in Figure 1. The area-mean rainfall of six hydrological subregions was calculated using the Thiessen polygon method. All data were obtained from the Taihu Basin Authority of Ministry of Water Resources and had been reviewed to guarantee the completeness, accuracy, and consistency. The hydrological observation data of the flood season from 1990 to 2017 with consistent human activities were used for calibration models, of which the data from 1990 to 2011 were used to train and test the forecast model, while the data from 2012 to 2017 were used to predict the Taihu water level and validate the model.
The input variables were scaled to accelerate the convergence speed and make the training process faster. The following equation was used to normalize the input data: where B t and B t are the observed and normalized values of the input variable at time t, respectively; B max and B min are the maximum and minimum values of the input variable, respectively.

METHODS iNARX model
Model structure NARX models are recurrent dynamic architectures with several hidden layers and are inspired by discrete-time nonlinear models (Leontaritis & Billings 1985), the common definition for which is given by where u(t)eR and y(t)eR represent inputs and outputs of the network at a discrete time step t, respectively; n is the time delay of input and output variables; and 1 t is the model error between the target and prediction. The function f(Á) is a nonlinear function that should be approximated. When this is done by a multilayer perceptron, the resulting topology is called a NARX recurrent neural network.

Corrected Proof
Since different input time series have the same time delay in the conventional NARX model, we improved the NARX model in our proposed method so that different input sequences had different time delays, which could also be different from the time delay of output sequences, thus the iNARX model with k input variables and one output variable is obtained as where H is the number of hidden neurons; w ih is the connection weight between the input neuron and hidden neuron; w jh is the connection weight between the hidden neuron and output feedback neuron; w ho is the connection weight between the hidden neuron and predicted output; b h and b o are the biases of the hidden neuron and predicted output, respectively; f h (Á) is the hidden layer activation function; and f o (Á) is the output layer activation function. The structure of the iNARX network is depicted in Figure 2. Compared with FNN, the iNARX network can feedback outputs during the training procedure, so it contains information about the long sequence and has long-term memory, which can simulate the long-term dynamics of time series. Meanwhile, when an iNARX network is unfolded in time, the output delays will appear as jumpahead connections in the unfolded network. Intuitively, these jump-ahead connections provide a shorter path for propagating gradient information, thus reducing the sensitivity of the network to long-term dependencies (Lin et al. 1996). Consequently, the iNARX network converges much faster and generalizes better than other networks.

Training algorithm
The Bayesian Regularization (BR) algorithm provides an efficient generalization to reduce the negative effects of large weights by modifying the training function of the neural network based on Bayesian inference techniques (Foresee & Hagan 1997), which has a small network structure, high learning efficiency, and good generalization ability (Hirschen & Schäfer 2006). The network performance function is expressed as where k is the number of output nodes; g is the total number of training values; y jf is the expected output value of the training value at node f; y jf is the network actual output value of the training value at node f; E D is the sum of squared errors; E w is the sum of the square of the network weights; and b and a are objective function parameters.
We supposed that the network weights were random variables and the probability density function (PDF) of the weights could be computed according to the Bayes rules: where w is the vector of network weights; N is the iNARX network we used; and D is the training target data sample. P(wja, N) is the prior distribution of weights; P(Djw, b, N) is the likelihood function which is the probability of occurrence, giving the network weights; P(Dja, b, N) is a normalization factor, which guarantees that the total probability is 1.
Assuming that the noise in the training data and the prior distribution of weights are Gaussian, the probabilities can be written as where Z D (b) ¼ (p=b) m=2 and Z w (a) ¼ (p=a) M=2 , with m being the total number of network weights and M the total number of parameters in the network. Then the Gaussian form for the posterior probability is In this Bayesian framework, the optimal weights should maximize the posterior probability P(wjD, a, b, N). Maximizing the posterior probability of w is equivalent to minimizing the regularized objective function F(w).
Then Bayes' rule is applied to optimize the objective function parameters a and b.
If the prior density P(a, bjN) is uniform for the regularization parameters a and b, then maximizing the posterior is achieved by maximizing the likelihood function P(Dja, b, N), which is the normalization factor for Equation (5). We can solve Equation (5) for the normalization factor where the constants Z D (b) and Z w (a) are known from Equations (6) and (7), and Z F (a, b) is the only unknown part, which can be estimated by Taylor series expansion. Since the objective function has the shape of a quadratic in a small area surrounding a minimum point, F(w) can be expanded around the minimum point of the posterior density w MP , where the gradient is zero. Solving for the normalizing constant yields where H ¼ br 2 E D þ ar 2 E w is the Hessian matrix of the objective function. Placing the result into Equation (10), the optimal values for a and b at the minimum point can be solved. We do this by taking the derivative with respect to each log of Equation (10) and set them equal to zero. This yields is the effective number of parameters, which is a measure of how many parameters in the network are effectively used in reducing the error function. The Bayesian optimization of the regularization parameters requires the computation of the Hessian matrix of F(w) at the minimum point w MP . The Gauss-Newton approximation to the Hessian matrix is applied to optimize regularization. Steps are as follows: 1. Initialize a, b, and the weights. After the first training step, the objective function parameters will recover from the initial setting. 2. Take one step of the Levenberg-Marquardt algorithm to minimize the objective function F(w) ¼ bE D þ aE w . 3. Compute the effective number of parameters g ¼ M À 2atr(H) À1 using the Gauss-Newton approximation to the Hessian available in the Levenberg-Marquardt training algorithm: where J is the Jacobian matrix of the training set errors. 4. Compute new estimates for the objective function parameters a ¼ (g=2E w (w)) and b ¼ (m À g)=(2E D (w)). 5. Iterate steps 2 through 4 until convergence.

DWT
There are two types of WTs: continuous wavelet transformation (CWT) and DWT. DWT is generally used in hydrological series, which can be thought of as the dyadic sampling of CWT. The discretely scaled and translated wavelet can be written as where k is a location index; j is referred to as the decomposition level; and c(t) is the wavelet function or mother wavelet, of which the discrete wavelet transform can be written as where c(t) is the complex conjugate functions of c(t).
For a discrete time series x(n), the dyadic wavelet transform becomes which can also be written as where T (t) is the residual term, called the approximation at level J, which represents the background information of data, and the wavelet coefficients W j (t) are details at levels j ¼ 1, 2, . . . , J, providing the detail signals, which can capture the small features of interpretational value in the data.
In this study, we considered the dyadic wavelet transform for discrete time series and used the Mallat algorithm (Mallat 1989) to do discrete wavelets decomposition and reconstruction. Multiple-level decomposition was performed to get high-scale, low-frequency components (approximations) and low-scale, high-frequency components (details) of original time series. Because of the simplicity of approximations and details after the decomposition, some interesting characteristics such as period, hidden period, dependence, and jump can be diagnosed easily through wavelet components.

DWT-iNARX model
The DWT-iNARX model is the coupling model of iNARX and DWT. Since the iNARX model predicts the current value of the time series from its past values, so the predicted time series is both the input and output of the model, to be specific, past values of the sequence are the input and the current value is the output. DWT is employed to decompose the time series into approximation and detail components, and to reconstruct the signal over the predicted frequency components using the wavelet decomposition structure. The decomposed time series are used as both inputs and outputs to the DWT-iNARX model. The DWT-iNARX model consists of the following three-step algorithm: 1. The wavelet decomposition is employed to pre-process the predicted time series. First, the decomposition level is determined and the appropriate wavelet type is selected according to the characteristics of sequences. Then, multilevel DWT is used to decompose the original signal into details (D 1 , D 2 , . . . , D k ) and approximation (A k ), where k is the decomposition level. 2. The iNARX model is trained and tested using the details and approximation as input. In this step, the strategies for modeling the iNARX model can be used, including the selection of time delay for input and output variables, the number of hidden nodes, learning algorithms, etc. The details and approximation of the final time series can be predicted in this step. 3. The wavelet reconstruction is employed to construct the signal, which is the final prediction result, over the predicted details and approximation using the wavelet decomposition structure in Step 1.

Evaluation of performance
In order to evaluate the accuracy of the constructed model, four statistical parameters were calculated. These parameters are qualification rate (QR), Nash-Sutcliffe efficiency coefficient (NSE), root mean square error (RMSE), and mean relative error (MRE). These parameters were calculated by using the model predictions and the corresponding measured data. These methods are expressed as where n represents the total number of forecasts and m represents the number of qualified forecasts. In our study, a forecast with an error of less than 0:1 m is regarded as a qualified forecast. Output i is the predicted water level by the model; Target i is the corresponding measured water level; and Target denotes the average of the measured water level. The closer the value of NSE is to 1, the better the prediction effect of the model. RMSE and MRE reflect the deviation of predicted values from measured values, and the smaller the values of RMSE and MRE are, the better the model predicts.

Uncertainty analysis
This study examined a methodology for the probabilistic water level forecast to cope with uncertainties by combining the DWT-iNARX model and Monte Carlo simulations (MCS). The prediction interval is comprised of upper and lower bounds that bracket a future unknown value with a confidence level. The model uncertainty is defined by the sum of three components: (1) inherent variability due to random fluctuations, (2) model structure uncertainty (Pappenberger et al. 2006;Sharma & Regonda 2021), and (3) input data uncertainty (Mohanty et al. 2020). In our study, model structure uncertainty was reduced through optimization of model structure and parameters. We mainly considered the uncertainty of the area-mean rainfall in the input data, which could not be estimated with absolute certainty by the error type and magnitude of rain gauges, and their density and pattern of space distribution. Uncertainty and inherent variability of the input data are considered through the definition of probability distributions. The expected value of a function g of the uncertain variables of x with a PDF of f X (x) is characterized by the following equation if x is continuous: where x represents the area-mean rainfall and g(x) represents water level predictions based on DWT-iNARX modeling. A Monte Carlo estimate of E[g(x)] can be obtained by taking an n sample of X s (x 1 , x 2 , . . . , x n ) through the following equation: If E[g(x)] exists, then the weak law of large numbers indicates that as n gets large, the probability that e g n (X) deviates from E[g(x)] becomes smaller. In fact, e g n (X) is an unbiased estimate for E[g(x)]. The prediction intervals are estimated by taking a very similar approach: randomly drawing an n sample (x 1 , x 2 , . . . , x n ) from the hypothetical distribution of the uncertain input variables (i.e., the area-mean rainfall), subsequently estimating the water levels g(x i ) for each x i through DWT-iNARX modeling, and finally employing the resulting statistical populations of g(x i ) to estimate the prediction intervals for each water level. The following procedures are performed.
First, the MCS is performed to simulate N samples for each input area-mean rainfall. The normal distributions are assumed for the area-mean rainfall and different variances represent the different levels of uncertainty. In the random sampling process of MCS, the Latin Hypercube Sampling (LHS) is used to separately draw an N sample from the hypothetical distribution of each of the uncertain input area-mean rainfall. LHS is more efficient than the simple random sampling when the sample size is less than a few thousand. In the LHS process, the range of each variable is divided into N nonoverlapping intervals of equal probability of 1=N, where N is the sample size. Subsequently, a value is selected randomly from each interval according to the probability density of the interval. Here, N is set equal to 1,000. And then Taihu water levels are predicted using the DWT-iNARX model which takes the simulated area-mean rainfall samples as input. Thus, we get N predicted water levels at each time, which are used to determine the prediction intervals. The upper and lower bounds of the prediction intervals have been estimated with 95% confidence level, using the percentile bootstrap method. Figure 3 depicts the major modules of overall methodology flow.

Selection of input variables and time delays
The rainfall of the Taihu Basin, water levels at key flood control stations, and discharges of the key flood control projects are the main factors affecting the water level of the Taihu Lake. In this study, based on the physical genesis and priori knowledge of characteristics of potential input variables, the following four types of predictors, i.e., input variables, were selected: (1) early water levels of the Taihu Lake; (2) Table 1. The water level of the Taihu Lake is predicted, which is the output variable. The time delays of input time series were determined by physical genesis combined with statistical analysis. The time delay of the rainfall sequence in each subregion was determined by the concentration time, which is defined as the time needed for water to flow from the most remote point in each subregion to the Taihu Lake. Based on the distance of each subregion from the Taihu Lake, topography, geology, and land use within each subregion, and layout of hydraulic engineering, the confluence time of each subregion is calculated as: 2 days in the Lake Region, 5 days in the Hu Xi Region, 7 days in the Zhe Xi Region, 7 days in the Hang Jia Hu Region, 5 days in the Wu Cheng Xi Yu Region, and 5 days in the Yang Cheng Dian Mao Region (Wang et al. 2001), which are regarded as time delays of rainfall sequences.
The time delays of other input sequences were determined by the autocorrelation and cross-correlation coefficients. The autocorrelation of Taihu water level (Z Taihu ) and its cross-correlation with the water levels at key flood control stations (Z Ganlu , Z Fengqiao , and Z Pingwang ) and discharges of key flood control projects (Q Wangting and Q Taihu ) are displayed in Figure 4. From Figure 4(a), the first to seventh order autocorrelation coefficients of Z Taihu were all greater than 0.9, indicating an obvious correlation between the current Taihu water level and its historical values in the previous 7 days. As shown in Figure 4(b)-4(d), the cross-correlation coefficients between Z Taihu and Z Ganlu , Z Fengqiao and Z Pingwang did not change much with the increase of the order, but they all had a strong correlation. From Figure 4(e) and 4(f), the correlation coefficients between Z Taihu and Q Wangting and Q Taipu did not change significantly as the order increases, but there was a certain correlation between them. Considering all the above, the time delay of each input time series was determined as shown in Table 1.

Optimization of model structure
The methods for multi-step ahead (also called long-term) forecasting of hydrological time series include recursive strategy, direct strategy, multi-input multi-output strategy (Taieb et al. 2012), etc. This study used the direct strategy for single-step prediction, that is, different neural network models were established for different forecast periods. Three types of models, which were BP, iNARX, and DWT-iNARX models, were built and the prediction periods were 1-7 days. Consequently, a total of 21 prediction models had been established to simulate the Taihu water level in different forecast periods. Hornik et al. (1989) introduced that multilayer neural networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficient hidden units are available. Therefore, this study chose a three-layer network with one input layer, one hidden layer, and one output layer. Based on some rule-of-thumb methods, through trial and error, we obtained that networks with 26 neurons in the  The early water level of Taihu Lake 7 2 The area-mean rainfall of Lake Region 2 3 The area-mean rainfall of Hu Xi Region 5 4 The area-mean rainfall of Zhe Xi Region 7 5 The area-mean rainfall of Hang Jia Hu Region 7   6 The area-mean rainfall of Wu Cheng Xi Yu Region 5   7 The area-mean rainfall of Yang Cheng Dian Mao Region 5   8 The water level at Ganlu Station 7 9 The water level at Fengqiao Station 7   10 The water level at Pingwang Station 7   11 The discharge of Wangting Hydro-junction 7 12 The discharge of Taipu Gate 7 hidden layer generally produced the best results. The transfer functions of the hidden and output layer were 'tansig' and 'purelin', respectively. The network was trained for 10,000 epochs to minimize the mean squared error using the BR algorithm. During the establishment of the DWT-iNARX model, the time series of Taihu water level should be decomposed first. The selection of suitable wavelet function and decomposition level should be considered when Corrected Proof using DWT for sequence decomposition. Commonly used wavelet functions include the simple wavelet functions Haar wavelet, the db4 and db8 wavelet functions of most commonly wavelet family Daubeches, sym4 and sym8 wavelet functions with the sharp peak of the Symlet wavelet family and discrete approximation of the Meyer wavelet function. The research of Shoaib et al. (2016) showed that the db8 wavelet function had the best results for hydrological time series, because it had reasonable support width and also contained good time-frequency localization property, which allowed the db8 wavelet function to capture both the underlying trend as well as the short-term variability in the time series data. Consequently, the db8 wavelet function was considered for this study.
The suitable decomposition level is selected using the following formula (Nourani et al. 2011): where L is the level and N is the total data points. In this study, the decomposition level was calculated as 3 according to the length of time series. Level 3 decomposition of the Taihu water level series by db8 wavelet is presented in Figure 5. Figure 5(a) shows the approximation subsignal at Level 3, while Figure 5(b) and 5(c) shows detail subsignals at Levels 1, 2, and 3, respectively.

Predicting water levels
The BP, iNARX, and DWT-iNARX models were validated by comparing the simulation results with 6 years of data that were not previously included in the training set. The performances of three constructed models were assessed by QR, NSE, RMSE, and MRE. The results in terms of four evaluation indexes for the training and prediction data sets in different forecast periods are shown in Table 2 and Figure 6.
It could be seen from Table 2 and Figure 6 that NSEs of iNARX and DWT-iNARX models were higher than 0.9, and their QRs were not less than 80% in all forecast periods. NSE of the BP model was greater than 0.85, and its QR was more than 70% in different forecast periods. In addition, as the forecast period increased, the prediction accuracy of three models decreased in varying degrees. The accuracy of the BP model dropped the most, while the iNARX and DWT-iNARX models were relatively stable, as the prolongation of the forecast period failed to reduce their accuracy rapidly. When the forecast period was short, such as 1-3 days, three models achieved good simulation results with higher accuracy for both training and prediction data sets. This indicated that the neural network model had certain advantages in forecasting hydrological time series because of its universal approximation capability and flexible structure that allowed us to capture complex nonlinear behaviors. When the forecast period was long, such as 6 or 7 days, the performance of the DWT-iNARX model was obviously superior to the other models, showing that it is robust and accurate.
The gap of each evaluation index between training and prediction periods became larger as the forecast period increased, among which the gap of the BP model was the largest. In the training period, evaluation indexes of the iNARX model were not significantly different from those of the BP model, and even the QR of the BP model was sometimes better than the iNARX model, but in the prediction period, the iNARX model was significantly better than the BP model. The reason was that compared with the traditional BP model, the iNARX model had feedback connections between layers of neurons, which could effectively prevent overfitting, and led to a good performance in the prediction period. Moreover, the gap of each evaluation index between training and prediction periods for the DWT-iNARX model was the smallest, indicating that the DWT-iNARX model had the best generalization ability.
Taking the year 2016 as an example, Figure 7 shows the observed Taihu water level and its predicted values of the three models under the forecast periods of 1-7 days during the flood season.
In 2016, affected by the strong El Niño event, spring floods, plum floods, and autumn floods occurred successively in the Taihu Basin. The large amount of rainfall, high water level of the Taihu Lake, and long duration of flood were rare in history. The Taihu water level was 3.51 m on 1 May 2016, which was the highest water level entering the flood season in the same period in history. On the day of entering the plum, the Taihu water level rose to 3.77 m, which was the second highest in history. After entering the plum, the Taihu water level rose rapidly due to heavy rain. On July 3, the Taihu water level reached and exceeded 4.65 m, and a flood exceeding the designed level occurred in the Taihu Basin. On July 6, the Taihu water level rose to 4.80 m, and an extraordinary flood occurred in the basin. On July 8, the Taihu Lake reached a peak water level of 4.87 m, second only to the highest level recorded in 1999.
From Figure 7, when the observed water level changed greatly and extreme values appeared, the water level hydrograph predicted by the BP model showed severe oscillations, and the hydrograph predicted by Corrected Proof the iNARX model showed a jagged shape. Additionally, the DWT-iNARX model simulated the rising trend of water level well, and the hydrograph predicted by it was relatively smoother and closer to the observed water level under the same forecast period. Rainfall in the basin during the flood season, upstream flooding into the lake, dispatch of pumps and sluices in the lake region, and so on would all cause large fluctuations in the Taihu water level, but the forecast factors selected in this study were limited for convenience and efficiency. Thus, the forecast accuracy would be lower when the water level changed greatly. The reason the iNARX model did not oscillate as sharply as the BP model was that the iNARX model had long-term memory capabilities, which could capture the details on the water level's fluctuations. The DWT-iNARX model performed best, proving that the use of wavelet transformation would capture useful information on various resolution levels and significantly improve the ability of a forecasting model.
As shown in Figure 7, with the extension of the forecast period, the deviation between the actual water level process and predicted hydrograph of all three models became larger, among which the BP model was the largest, and the DWT-iNARX model was the smallest. Especially when the forecast period was long, such as 6 or 7 days, the performance of the BP model was obviously inferior to the other two models. This was consistent with the results shown in Table 2 and Figure 6. Figure 8 shows the scatter plot of the observed and forecast daily Taihu water level for three models in different forecast periods during the flood season from 2012 to 2017. With the increase of forecast period, the points of BP and iNARX models became more scattered compared with the DWT-iNARX model, of which the BP model was more obvious. Figure 9 shows the cumulative probability distribution curve of the absolute error for three models in different forecast periods for the prediction data set. It can be clearly seen from Figure 9 that in the same forecast period, the prediction error of the BP model was the largest among the three models, and the performance of the iNARX model was slightly inferior to that of the DWT-iNARX model. As the forecast period increased, the gap between the prediction error of the BP model and that of the other two models also increased.
In overall performance, iNARX and DWT-iNARX models showed superiority in comparison with the BP model, while the DWT-iNARX model yielded the best performance among all other models. The DWT-iNARX model used various scale and frequency components of the wavelet decomposition as predictors, which can reflect more detailed information on hydrological sequences, so its accuracy was the highest in the same forecast period. It proved that the use of wavelet transformation would capture useful information on various resolution levels and significantly improve the ability of a forecasting model. section, we computed prediction intervals of the water level to cope with the uncertainty of the area-mean rainfall in the input data. Results are shown in Figure 10. Water resources managers can identify how the uncertainty in input variables propagates through the model and influences the resulting water level predictions from Figure 10. It can be seen from Figure 10 that the prediction interval covered the observed water level, indicating that the probabilistic daily water level prediction is reliable. Wide PIs are an indication of the presence of a high level of uncertainty in the prediction of the water level with a high chance of confronting an unexpected condition. Under the same variance of uncertain input variable distributions, as the forecast period increased, the width of the PIs Figure 9 | Cumulative probability distribution curve of absolute error for three models with leading time of (a) 1 day, (b) 2 days, (c) 3 days, (d) 4 days, (e) 5 days, (f) 6 days, and (g) 7 days for prediction data set.
Journal of Water and Climate Change Vol 00 No 0, 19 Corrected Proof became larger, which illustrated that the forecast period and the uncertainty of forecast results increased consistently. Near the extreme water level, the PI was wider than usual, indicating that the forecast uncertainty for extreme values was relatively larger.
The innovative work of this study is as follows: 1. This study proposed a new data-mining-based method for the water level prediction in different forecast periods. The proposed method, called DWT-iNARX, incorporating DWT and iNARX networks, is capable of taking advantage of both temporal and spectral information within the signal, and simulating the longterm dynamics of time series, with fast convergence speed and good generalization ability. Research results validated the effectiveness of the proposed approach and showed a considerable improvement in the Corrected Proof prediction precision. This method was simple, affordable, and practical, and it required less information about real-time rain, water and project operation compared with the simulation of hydrodynamic models. It could provide the accurate forecast of Taihu water level for flood control in the Taihu Basin under the circumstance of less real-time hydrological information. 2. Furthermore, this study investigated a methodology of combining DWT-iNARX and MCSs to quantify model uncertainty and provide the probabilistic water level forecast. The probabilistic forecast offered great potential for water level predictions, and facilitated communications between modelers and urban decision makers. Besides this, the uncertainty analysis method proposed in this study was independent of deterministic hydrological forecasting models, and could be integrated with any deterministic hydrological forecasting model without any assumptions, which provided a methodological framework for various probabilistic hydrological forecasting systems.

CONCLUSIONS
Floods are made more likely by more extreme weather patterns caused by long-term global climate change. The Taihu Basin is a typical plain river network area, with extensive rivers, lakes, and polder areas, and is vulnerable to flooding. Within this context, the Taihu water level forecast is of great significance to the flood risk assessment, flood control dispatching, and water resources management in the Taihu Basin. In this study, a coupling model of DWT and the iNARX network was established to predict the daily Taihu water level under different forecast periods, which was compared with BP and iNARX models to verify its effectiveness and applicability. Meanwhile, an uncertainty analysis method was proposed based on MCS for quantifying model uncertainty and performing probabilistic water level forecast. The conclusions were as follows: 1. Three models achieved good simulation results with higher accuracy when the forecast period was short, such as 1-3 days, indicating that the neural network model had certain advantages in forecasting hydrological time series because of its universal approximation capability and flexible structure that allowed us to capture complex nonlinear behaviors. 2. When the forecast period was long, such as 6 or 7 days, the iNARX and DWT-iNARX models showed obvious superiority in comparison with the BP model. This reflected the superiority of the structure of the iNARX network, which had strong generalization ability and long-term memory capability. 3. When the observed water level changed greatly and extreme values appeared, the water level hydrograph predicted by the BP model showed severe oscillations, the hydrograph predicted by the iNARX model showed a jagged shape, and the hydrograph predicted by the DWT-iNARX model was relatively smoother and closer to the observed water level. This proved that the use of wavelet transformation would capture useful information on various resolution levels and significantly improve the ability of a forecasting model. 4. In overall performance, iNARX and DWT-iNARX models showed superiority in comparison with the BP model, while the DWT-iNARX model yielded the best performance among all other models, indicating that the proposed method could not only improve the prediction accuracy but also prolong the forecast period. 5. The forecast period and uncertainty of forecast results increased consistently, and the forecast uncertainty for extreme values was relatively larger.
The results of this study improve the prediction accuracy of the Taihu water level and prolong the forecast period, which can improve the reliability of decision-making basis for flood control dispatching in the Taihu Basin, and provide references for efficient use of water resources. The probabilistic water level forecast facilitated communications between modelers and decision makers. Moreover, the proposed methodology, which is a blackbox model, is the most intuitive and straightforward approach to retrieve useful flood information from observations. It is especially suitable for areas where the physical-based simulation is difficult due to complex hydrodynamic processes. It uses snapshots of the past and cannot directly predict responses to future or changed conditions. This is the limitation of this model compared with the hydrodynamic models. However, if enough data meeting actual requirements can be obtained, then the proposed method is robust and accurate.