Abstract
This study proposes a stochastic artificial neural network (named ANN_GA-SA_MTF), in which the parameters associated with multiple transfer functions are calibrated by a modified genetic algorithm (GA-SA). The model provides real-time forecasts of hydrological variates, and their reliability, from the given observations and predictions (model inputs). The resulting forecasts can also be adjusted through a real-time forecast-error correction method (RTEC_TS&KF) based on the difference between real-time observations and forecasts. The observed 10-days rainfall depths and water levels (i.e., the hydrological estimates) from 2008 to 2018 recorded within the Shangping sub-basin in northern Taiwan are adopted as the study data; their stochastic properties are quantified and used to simulate 1,000 sets of rainfall depths and water levels at 36 10-days periods as the training datasets. The model verification indicates that the observed 10-days rainfall depths and water levels lie within the prediction interval (i.e., the 95% confidence interval), revealing that the proposed ANN_GA-SA_MTF model can capture the temporal behavior of 10-days rainfall depths and water levels within the study area. Although the raw forecasts differ from the observations by an acceptable margin, the real-time corrected forecasts agree closely with the observations, i.e., the adjusted forecasts are highly accurate.
HIGHLIGHTS
This study presents a stochastic ANN model.
The GA-SA method is used to calibrate the ANN weights.
A large training dataset is generated via a non-normal multivariate Monte Carlo simulation approach.
A real-time correction method is used to adjust the hydrological forecasts.
INTRODUCTION
Recently, physically and statistically based hydrological models have frequently been applied to predict rainfall-induced hydrological variates, reflecting the effect of extreme rainfall on runoff dynamics, given observed and forecasted hydrological data (e.g., precipitation and water levels) (e.g., Bertoni et al. 1992; Dawson & Wilby 2001; Xia 2002; Sun & Kim 2014; Chang et al. 2018; Sun et al. 2020; Yang et al. 2020). Accurate and efficient hydrological forecasts play an important role in flood early warning, flood-risk management and water resources management (e.g., Wang et al. 2009; Sayers et al. 2014; Sung et al. 2017; Sankaranarayanan et al. 2019; Yang et al. 2020). Generally speaking, hydrological models that describe hydrological-cycle processes, such as the rainfall-runoff process and rainfall-induced inundation, can be grouped into three types: data-driven, conceptual and distributed models (Kan et al. 2016). For all three types, the reliability and accuracy of the resulting hydrological forecasts may be influenced by uncertainties in the perceptual model, the measurement of input data and parameter calibration (Melching 1995; Gupta et al. 2005; Wagener & Gupta 2005). Furthermore, hydrological models with high resolution in time and space, such as watershed models that simulate the entire hydrological cycle, are implemented as numerical codes; when used for real-time forecasting of hydrological variates (e.g., precipitation, water level and inundation depth), they may require long computation times both to calibrate the optimal model parameters and to produce the forecasts (e.g., Paschalis et al. 2013; Kan et al. 2019; Sun et al. 2020).
That is to say, although such hydrological models can emulate rainfall-runoff-water level behavior through numerical codes built on deterministic and conceptual relationships specified in advance, their performance and effectiveness may be degraded by uncertainties in the complicated model structure and in the many associated parameters, some of which represent spatial scales that differ from those of the observations or have no directly measurable physical meaning (Wagener & Gupta 2005; Melsen et al. 2016).
To overcome these disadvantages, artificial intelligence (AI) models have been widely applied to simulate and predict flood-related hydrological variates (e.g., precipitation, discharge and water level) (e.g., Campolo et al. 1999; Imrie et al. 2000; Cigizoglu & Kisi 2005; Firat 2008; Wang et al. 2009; Shamseldin 2010; Maca et al. 2014; Khan et al. 2016; Sung et al. 2017; Malik et al. 2019). For example, Campolo et al. (1999) used the logistic function as the transfer function (i.e., the activation function) to train an ANN model describing the spatial relationship between rainfall and water levels in order to issue forecasts of the distributed water levels; Shamseldin (2010) proposed an ANN-based rainfall-runoff model built on the multi-layer perceptron with a specific transfer function (i.e., the logistic/sigmoid function) to forecast river runoff, using the weighted average of rainfall, the expectation of the rainfall index and the observed discharge as model inputs. Numerous types of neural-network (NN) and related data-driven models have been proposed to forecast hydrological variables and simulate water-resource processes, such as rainfall-runoff-water level processes in watersheds and urban areas, including the adaptive neuro-fuzzy inference system (ANFIS), artificial neural networks (ANN), generalized regression neural networks (GRNN), feed-forward neural networks (FFNN), support vector machines (SVM) and genetic programming (Firat 2008; Wang et al. 2009). Although the ANFIS model is generally reported to outperform the ANN model in forecasting hydrological estimates, the ANN model can be applied efficiently to complicated phenomena described by nonlinear mathematical relationships, by constructing a multi-layer network of weighted connections over all candidate predictor variables and training it with a suitable algorithm (Tu 1996; Khan et al. 2016).
Since ANN models are formulated by establishing weighted linear connections between the neurons of the input, hidden and output layers, the number of hidden layers and the associated number of neurons must be specified in advance. Sheela & Deepa (2013) summarized a variety of formulae for estimating the number of neurons required in an ANN model for a given number of hidden layers. The coefficients of these connections (named ANN weights) are commonly determined by an optimization method with a known objective function (i.e., loss function) for a specific network structure. Among the well-known optimization approaches, the back-propagation (BP) algorithm combined with the gradient descent method is frequently adopted to calibrate ANN weights (e.g., Campolo et al. 1999; Cigizoglu & Kisi 2005; Sung et al. 2017; Chen et al. 2019). In detail, when the BP algorithm is used to train the ANN model, the ANN weights and biases are adjusted iteratively by the gradient method according to the value of the loss function (e.g., the root mean square error, RMSE), starting from given initial values; once the minimum RMSE is reached, the corresponding ANN weights and biases are regarded as the optimal parameters of the ANN model. However, the resulting ANN weights are likely to become trapped at local optima when inappropriate initial values are given; moreover, an unreasonable learning rate can cause oscillation, making the optimal ANN weights difficult to obtain and possibly slowing the convergence of the BP neural network (Wang et al. 2015). Additionally, although the reliability of the ANN estimates tends to increase with the number of hidden layers and neurons, calibrating the optimal parameters becomes more difficult, especially when the number of model inputs is large (Xu et al. 2015).
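The learning-rate issue can be illustrated with a minimal Python sketch of plain gradient descent on a hypothetical one-parameter loss L(w) = (w − 3)², standing in for the loss surface of a real network. The function name `gradient_descent` and all values are illustrative assumptions, not the BP setup of any cited study.

```python
# Illustrative sketch on a hypothetical toy loss; not a full BP implementation.
def gradient_descent(lr, w0=0.0, target=3.0, steps=50):
    """Minimize the toy loss L(w) = (w - target)**2 by plain gradient descent.

    Returns the whole trajectory of w so that convergence or oscillation
    can be inspected.
    """
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * (w - target)  # dL/dw
        w -= lr * grad             # gradient-descent update
        path.append(w)
    return path


# Each update multiplies the error (w - target) by (1 - 2 * lr).
smooth = gradient_descent(lr=0.1)      # factor  0.8: monotone convergence
zigzag = gradient_descent(lr=0.95)     # factor -0.9: overshoots, oscillates, converges slowly
diverge = gradient_descent(lr=1.05)    # factor -1.1: oscillation grows without bound
```

Because each update multiplies the error by (1 − 2·lr), learning rates between 0.5 and 1 overshoot and oscillate, and rates above 1 diverge; a similar trade-off arises for the weights of a multi-layer network.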
To solve the above problem, genetic algorithms (GA) have been employed to optimize the parameters of the ANN model, including the connection weights and the network structure (i.e., number of hidden layers) (e.g., Xu et al. 2015; Tahboub et al. 2016; Chen et al. 2019). Nevertheless, the accuracy of the resulting optimal model parameters may be affected by uncertainties in the observations and by the type of objective function considered (Wu et al. 2012a, 2012b). The neuron weights obtained from a traditional GA may also become trapped at local optima when the neural network is complex (Chen et al. 2019). To avoid local optima for complex model structures, Wu et al. (2012a, 2012b) modified the GA based on the sensitivity of model outputs to the model parameters (named GA-SA), in which the chromosomes and genes used in crossover and mutation are selected according to parameter sensitivity. Thus, in this study, the GA-SA algorithm with a specific loss function is employed to train the ANN model, i.e., to determine the connection weights (ANN weights) and biases for each transfer function under a specified network structure.
Apart from forecasting, the effect of uncertainty factors on hydrological estimates can be evaluated by integrating numerical hydrological/hydraulic models with uncertainty/risk analysis (e.g., Tung & Yen 2005; Wu et al. 2012a, 2012b, 2015); the results of such analysis enable the reliability of the hydrological estimates to be assessed. Accordingly, a stochastic ANN model can be constructed by integrating statistical analysis with the results of the ANN model (e.g., Wegman & Habib 1990; Yetilmezsoy & Saral 2007; Malik et al. 2019). In general, stochastic ANN models can be classified into two types: the first quantifies the uncertainty of the model inputs, either by adding a bias drawn from a prior probability function (Wegman & Habib 1990) or by using a large number of generated model inputs (Yetilmezsoy & Saral 2007); the second quantifies the stochastic properties by statistically analyzing the forecasts provided by the ANN model (Malik et al. 2019). Therefore, to efficiently provide real-time hydrological forecasts with high reliability and accuracy, stochastic modeling can be achieved by quantifying the statistical features of the results from ANN models whose parameters are calibrated by the GA-SA approach. This accounts for the effect of uncertainties in the ANN parameters on the hydrological estimates.
In addition to the uncertainties in the calibrated ANN weights attributed to the limited range of available observations (Imrie et al. 2000), the selection of the transfer function (i.e., activation function) should be regarded as an uncertainty factor in the development of an ANN model, since it may influence the resulting network (Imrie et al. 2000; Maca et al. 2014). To quantify the effect of selecting among alternative modules/approaches for estimating hydrological variables, Wu et al. (2011) presented a weighted-average estimator for estimating rainfall amounts at ungauged grids via the Kriging method with a variety of semivariogram functions. In detail, after the parameters of each semivariogram function are optimized against the observations, the rainfall amount at an ungauged grid is obtained as the weighted average of the gridded rainfall estimated with the different semivariogram functions of interest. The weight of a specific semivariogram function is defined as the inverse of its objective-function value at the optimal parameters divided by the sum of the inverse objective-function values of all semivariogram functions. Analogously, a weighted-average estimator can account for the effect of uncertainty in the choice of transfer/activation function on the hydrological variables estimated by an ANN model.
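The weighting rule just described can be written down directly. The Python sketch below assumes the objective values are positive losses such as RMSE (smaller is better); the function names and the three RMSE values are hypothetical.

```python
import numpy as np

def inverse_objective_weights(objective_values):
    """Weight of candidate k: (1 / F_k) / sum_j (1 / F_j).

    Smaller objective (loss) values give larger weights; the weights sum to 1.
    """
    f = np.asarray(objective_values, dtype=float)
    if np.any(f <= 0):
        raise ValueError("objective values must be positive")
    inv = 1.0 / f
    return inv / inv.sum()

def weighted_average_estimate(estimates, objective_values):
    """Combine estimates from different candidate functions with inverse-objective weights."""
    w = inverse_objective_weights(objective_values)
    return float(np.dot(w, np.asarray(estimates, dtype=float)))


# Hypothetical example: three transfer functions with RMSEs of 0.2, 0.4 and 0.8 m
w = inverse_objective_weights([0.2, 0.4, 0.8])            # -> 4/7, 2/7, 1/7
est = weighted_average_estimate([209.5, 209.7, 210.1], [0.2, 0.4, 0.8])
```

Halving a candidate's loss doubles its unnormalized weight, so the best-performing function dominates without the others being discarded.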
Recently, real-time observations from the Internet of Things (IoT) have been widely applied in flood early-warning systems to improve the accuracy of hydrological forecasts (Sankaranarayanan et al. 2019; Wu et al. 2020; Yang & Chang 2020). For instance, Sankaranarayanan et al. (2019) employed a deep neural network to predict flood occurrence using temperature and rainfall intensity collected through IoT. Wu et al. (2020) and Yang & Chang (2020) carried out real-time correction of inundation forecasts, provided by machine learning techniques and 2-D hydrodynamic numerical modeling, respectively, using the inundation depths observed by IoT-based roadside water-level sensors. In particular, Wu's method (i.e., RTEC_TS&KF) directly corrects the inundation-depth forecasts using a time series approach and the Kalman filtering technique with the real-time inundation depths recorded at IoT sensors, without adjusting the parameters of the hydrological/hydraulic numerical model; its advantage is that it does not affect the computation time or convergence speed of the numerical modeling. Accordingly, in this study, more accurate hydrological forecasts are expected to be obtained by adjusting the results of the proposed ANN model through Wu's method with real-time observations.
It is well known that, in training an ANN model, the size of the hydrological training dataset (e.g., precipitation, river flow and water level) may affect the performance of parameter optimization (Foody & McCulloch 1995). Furthermore, hydrological variables frequently exhibit high correlation in time and space and are often available at high resolution (Wu et al. 2006; Haas et al. 2018). Accordingly, using a large dataset to train an ANN model with numerous parameters is an important task for avoiding overfitting or underfitting (Chiroma et al. 2018; Chalumuri et al. 2020). Therefore, this study accounts for the effect of uncertainties in the ANN model parameters (the neuron weights for a network structure set up in advance) and in the selection of transfer functions on the reliability and accuracy of the resulting hydrological forecasts, and proposes a stochastic ANN model (called the ANN_GA-SA_MTF model). It is expected that the proposed ANN_GA-SA_MTF model will not only provide hydrological forecasts with acceptable accuracy but also quantify their reliability in terms of confidence intervals at a specified significance level.
METHODOLOGY
Model concept
Generally speaking, a large training dataset can effectively improve the performance of optimizing ANN parameters (Foody & McCulloch 1995). Thus, in this study, generating a large training dataset is the first task in developing the proposed stochastic ANN model. Since hydrological data, especially precipitation and runoff, are correlated in time and space (e.g., Nandakumar & Mein 1997; Wu et al. 2006; McMillan et al. 2018), they are commonly treated as non-normal correlated variates. Therefore, a correlated multivariate Monte Carlo simulation approach is adopted to generate the large volume of hydrological data used to train the proposed model.
As described in Section 1 (Introduction), this study proposes a stochastic ANN model in which a variety of transfer functions (i.e., activation functions) are adopted under a given multi-layer network structure. However, it is generally difficult to determine the many connection weights between neural nodes in different layers (i.e., ANN weights) of a multi-layer network using the back-propagation (BP) algorithm, owing to the vanishing and exploding gradient problems (Hochreiter 1998; Manchev & Spratling 2020). Therefore, in this study, the ANN weights are calibrated by the GA-SA method, a genetic algorithm modified on the basis of the sensitivity of complex hydrological models to their parameters (Wu et al. 2012a, 2012b). The resulting stochastic ANN model, named the ANN_GA-SA_MTF model, uses the GA-SA method with a variety of transfer functions to obtain numerous sets of appropriate parameters, including the ANN weights, biases and adjusting factors. Accordingly, a large number of hydrological estimates can be produced by the proposed ANN_GA-SA_MTF model with these calibrated parameter sets for the given hydrological factors defined as the model inputs. The hydrological forecasts are then calculated from these estimates using the objective-function values associated with the calibrated parameters of each transfer function (i.e., the weighted-average estimator). In addition, the stochastic properties of the hydrological estimates, including the first four statistical moments and the 95% confidence interval, are quantified by statistical analysis of the numerous sets of estimates. Note that the quantified 95% confidence interval is treated as the prediction interval, within which 95% of the hydrological estimates are expected to fall.
In this way, the likelihood that the forecasts approach the observations can be roughly evaluated by comparing the upper and lower bounds of the prediction interval with the observations.
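A simple way to carry out this comparison is to count how many observations fall between the bounds. The Python sketch below is a hypothetical helper (`interval_coverage`) with made-up water levels and bounds; it is not a metric defined by the authors.

```python
import numpy as np

def interval_coverage(observed, lower, upper):
    """Return the fraction of observations inside [lower, upper] and a per-step mask."""
    obs = np.asarray(observed, dtype=float)
    inside = (obs >= np.asarray(lower, dtype=float)) & (obs <= np.asarray(upper, dtype=float))
    return float(inside.mean()), inside


# Hypothetical 10-days water levels (m) and 95% prediction bounds for three periods
coverage, mask = interval_coverage(
    observed=[209.5, 209.9, 210.6],
    lower=[209.3, 209.4, 209.5],
    upper=[209.8, 210.2, 210.4],
)
```

Over many periods, a well-calibrated 95% prediction interval would be expected to contain roughly 95% of the observations.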
Since the proposed ANN_GA-SA_MTF model is intended for forecasting flood-related hydrological variates, such as inundation depths, water levels and runoff, its results are referred to as forecasts. In reality, however, a variety of uncertainties are inherent in the observations (e.g., Nandakumar & Mein 1997; Chen & Wang 2018; McMillan et al. 2018), so the responses of hydrological models may contain variations that affect their accuracy and reliability (Wu et al. 2012a, 2012b, 2015). Therefore, to enhance forecasting performance, the hydrological forecasts from the proposed ANN_GA-SA_MTF model are corrected by incorporating the real-time error correction method RTEC_TS&KF (Wu et al. 2020), which compares them with real-time observations received through IoT; this is the major difference from the well-known stochastic ANN models.
In summary, the development of the proposed ANN_GA-SA_MTF model comprises five parts: (1) generation of hydrological data as the training dataset; (2) calibration of the model parameters; (3) calculation of the weighted average of the resulting estimates; (4) quantification of the stochastic properties of the resulting estimates; and (5) real-time correction of the resulting forecasts. These concepts and methods are introduced as follows.
Introduction to conventional ANN
An artificial neural network (ANN) is an empirical data-driven model that uses a parallel computing system with interconnections analogous to those of a biological neural network. ANN models are well known to be advantageous in situations where the relationship between the dependent and independent variables of a physical phenomenon is difficult to establish mathematically (Khan et al. 2016). Furthermore, ANN models are capable of simulating complicated nonlinear systems without prior assumptions about the relationship between model inputs and outputs (ASCE 2000a, 2000b).
No. of formula | Formula | Reference |
---|---|---|
1 | | Li et al. (1995) |
2 | | Tamura & Tateishi (1997) |
3 | | Zhang et al. (2003) |
4 | | Shibata & Ikeda (2009) |
5 | | Hunter et al. (2012) |
6 | | Sheela & Deepa (2013) |
Function | Name | Formula | Derivative | Suggested normalization equation |
---|---|---|---|---|
TF1 | Logistic (soft step, sigmoid) | | | Equation (2) |
TF2 | Tanh | | | Equation (3) |
TF3 | Arctan | | | Equation (3) |
TF4 | Identity | | f′(x) = 1 | Equation (3) |
TF5 | Rectified linear unit (ReLU) | | | Equation (2) |
TF6 | Parametric rectified linear unit (PReLU, leaky ReLU) | | | Equation (3) |
TF7 | Exponential linear unit (ELU) | | | Equation (3) |
TF8 | Inverse abs (IA) | | | Equation (3) |
TF9 | Rootsig (RS) | | | Equation (3) |
TF10 | Sech function (SF) | | | Equation (3) |
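The formulas of the ten transfer functions in the table above did not survive extraction. As a hedged reference, the Python sketch below implements commonly used textbook definitions; in particular, inverse abs as x/(1 + |x|) and rootsig as x/(1 + √(1 + x²)) are assumptions, and the default slopes `a` and `alpha` are hypothetical values rather than the authors' settings.

```python
import numpy as np

# Assumed common definitions of TF1-TF10; a and alpha are hypothetical defaults.
def logistic(x):                 # TF1
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def tanh(x):                     # TF2
    return np.tanh(x)

def arctan(x):                   # TF3
    return np.arctan(x)

def identity(x):                 # TF4
    return np.asarray(x, dtype=float)

def relu(x):                     # TF5
    return np.maximum(0.0, x)

def prelu(x, a=0.01):            # TF6: slope a for negative inputs (learned in PReLU, fixed here)
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, x, a * x)

def elu(x, alpha=1.0):           # TF7: expm1 of min(x, 0) avoids overflow for large x
    x = np.asarray(x, dtype=float)
    return np.where(x > 0.0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def inverse_abs(x):              # TF8: x / (1 + |x|)
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.abs(x))

def rootsig(x):                  # TF9: x / (1 + sqrt(1 + x^2))
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.sqrt(1.0 + x * x))

def sech(x):                     # TF10: 1 / cosh(x)
    return 1.0 / np.cosh(x)

TRANSFER_FUNCTIONS = {
    "TF1": logistic, "TF2": tanh, "TF3": arctan, "TF4": identity, "TF5": relu,
    "TF6": prelu, "TF7": elu, "TF8": inverse_abs, "TF9": rootsig, "TF10": sech,
}
```

The output ranges explain the normalization column: logistic outputs lie in (0,1) and ReLU outputs are non-negative, whereas most of the others are centered on zero.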
Generation of hydrological training dataset
When training an ANN model, a sufficiently large dataset is required to calibrate reliable ANN weights between neurons in the various layers. Since hydrological data can be regarded as variates correlated in time and space, this study uses the correlated multivariate Monte Carlo simulation approach (named MMCS) (Wu et al. 2006), which simulates spatially and temporally correlated non-normal variables based on their uncertainties, to reproduce a large number of hydrological data for training the proposed ANN_GA-SA_MTF model. The MMCS method is briefly introduced as follows.
In summary, in the proposed ANN_GA-SA_MTF model, the MMCS method is applied to simultaneously generate the hydrological variables as the training dataset.
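The details of MMCS are not reproduced in this section. To show the general idea of simulating correlated non-normal variates, the Python sketch below uses a simpler, openly different technique: correlated standard normals obtained by Cholesky factorization and mapped to lognormal marginals. The function name, the lognormal choice and all parameter values are hypothetical; a full treatment would also adjust the normal-space correlation so that the simulated variates reproduce the observed correlation, which this sketch omits.

```python
import numpy as np

def simulate_correlated_lognormal(mu_log, sigma_log, corr_normal, n_sets, seed=None):
    """Simulate n_sets vectors of correlated, positively skewed (lognormal) variates.

    mu_log, sigma_log : mean and standard deviation of log(X) for each variable
                        (e.g. one entry per 10-days period)
    corr_normal       : correlation matrix of log(X), i.e. in standard-normal space
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(corr_normal, dtype=float))
    # Independent N(0, 1) draws -> correlated N(0, 1) draws with correlation chol @ chol.T
    z = rng.standard_normal((n_sets, chol.shape[0])) @ chol.T
    # Map each normal variate to its (non-normal) lognormal marginal
    return np.exp(np.asarray(mu_log, dtype=float) + z * np.asarray(sigma_log, dtype=float))


# Hypothetical parameters for three consecutive 10-days rainfall depths
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.6],
        [0.3, 0.6, 1.0]]
samples = simulate_correlated_lognormal([4.0, 4.5, 5.0], [0.5, 0.6, 0.7], corr,
                                        n_sets=20000, seed=1)
```

Setting `n_sets=1000` with one column per 10-days period would mirror the size of the training dataset described in the abstract.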
Parameter calibration using the GA-SA method
Since this study intends to consider the effect of uncertainties in the network structure used in training the proposed ANN_GA-SA_MTF model, a multi-layer network structure is adopted. As mentioned in Section 2.1, the gradient method is widely used to determine the ANN weights between neurons in different layers; however, when the gradient in Equation (5) becomes vanishingly small or excessively large as it is propagated back through the layers (the vanishing and exploding gradient problems, respectively), the neuron weights are difficult to optimize (Hochreiter 1998; Manchev & Spratling 2020).
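Why depth causes these problems can be seen from the chain rule: the back-propagated gradient is a product of one factor per layer. The Python sketch below assumes identical logistic units with a single shared, hypothetical weight value, which is a deliberate simplification of a real network.

```python
import numpy as np

def logistic_derivative(x):
    """Derivative of the logistic function; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def backpropagated_factor(n_layers, weight, pre_activation=0.0):
    """Gradient scaling after passing back through n_layers identical logistic units.

    By the chain rule, each layer multiplies the gradient by weight * f'(pre_activation).
    """
    return (weight * logistic_derivative(pre_activation)) ** n_layers


vanishing = backpropagated_factor(10, weight=1.0)   # 0.25**10, about 9.5e-7
exploding = backpropagated_factor(10, weight=8.0)   # 2.0**10 = 1024
```

With ten layers, a per-layer factor of 0.25 shrinks the gradient by roughly six orders of magnitude, while a factor of 2 amplifies it about a thousandfold.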
In addition, for artificial intelligence (AI) methods, appropriate model inputs must be identified to train the AI model successfully (Wang et al. 2009). To overcome the vanishing and exploding gradient problems, this study utilizes the modified genetic algorithm GA-SA, which is based on the sensitivity of the model outputs to the model parameters and can provide stochastic model parameters in response to the uncertainty in the hydrological data (Wu et al. 2012a, 2012b). For the proposed ANN_GA-SA_MTF model, the appropriate parameters (i.e., ANN weights, biases and adjusting factors) are calibrated by the GA-SA method for each transfer function of interest. Accordingly, stochastic information on the hydrological estimates, including the statistical moments and the 95% confidence interval (i.e., prediction interval), can be quantified through the proposed ANN_GA-SA_MTF model using the numerous parameter sets calibrated via the GA-SA method with multiple transfer functions. The GA-SA method is introduced as follows.
The genes selected for the crossover and mutation can be referred to in Figure 3 (Wu et al. 2012a, 2012b).
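The GA-SA operators themselves are not reproduced in this section. For orientation, the Python sketch below shows a plain real-coded GA (truncation selection with elitism, blend crossover, gene-wise Gaussian mutation) calibrating a hypothetical toy model, a single linear neuron, against an RMSE loss. It deliberately omits the sensitivity-based selection of chromosomes and genes that distinguishes GA-SA; all names, operators and settings are illustrative assumptions.

```python
import numpy as np

def rmse_linear_neuron(params, x, y):
    """Toy loss (hypothetical): RMSE of a single linear neuron y_hat = w * x + b."""
    w, b = params
    return float(np.sqrt(np.mean((w * x + b - y) ** 2)))

def ga_calibrate(loss, n_params, x, y, pop_size=40, n_gen=200,
                 bounds=(-5.0, 5.0), mut_rate=0.2, mut_scale=0.3, seed=0):
    """Plain real-coded GA (not GA-SA): truncation selection with elitism,
    blend crossover and gene-wise Gaussian mutation.

    Returns (best_params, best_loss, best_loss_history).
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))
    history = []
    for _ in range(n_gen):
        fitness = np.array([loss(p, x, y) for p in pop])
        order = np.argsort(fitness)
        pop = pop[order]
        history.append(float(fitness[order[0]]))
        parents = pop[: pop_size // 2]               # best half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.choice(len(parents), size=2, replace=False)
            alpha = rng.uniform(size=n_params)
            child = alpha * parents[i] + (1.0 - alpha) * parents[j]   # blend crossover
            mutate = rng.uniform(size=n_params) < mut_rate            # genes chosen uniformly at random
            child = child + mutate * rng.normal(0.0, mut_scale, size=n_params)
            children.append(np.clip(child, lo, hi))
        pop = np.vstack([parents, np.array(children)])
    fitness = np.array([loss(p, x, y) for p in pop])
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best]), history


x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0                                    # synthetic target: w = 2, b = 1
best_params, best_loss, history = ga_calibrate(rmse_linear_neuron, 2, x, y)
```

Because no gradient is computed, the search is unaffected by vanishing or exploding gradients. The `mutate` mask, which here picks genes uniformly at random, marks where a sensitivity-based gene selection such as that of GA-SA would act instead.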
Calculation of weighted average of hydrological estimates
Quantification of stochastic properties of hydrological forecasts
Since the proposed ANN_GA-SA_MTF model provides a large number of estimated hydrological variables from numerous sets of parameters calibrated by the GA-SA method for a variety of transfer functions, the quantiles (i.e., the probability distribution) of the estimated hydrological variables can be derived from these estimates. In general, the best-fit theoretical probability distribution must first be identified via a goodness-of-fit test, e.g., the Kolmogorov-Smirnov (K-S) test or the chi-square test, to establish the quantile relationship (i.e., the cumulative probability distribution) of the variable. Nevertheless, it is difficult to identify the best-fit distribution from a large number of candidate distributions and parameter estimation procedures (Haddad & Rahman 2011). To avoid this problem, this study adopts a nonparametric method, the weighted likelihood sample quantile estimator (Yang & Tung 1996), to calculate the quantiles of the hydrological variables estimated by the proposed ANN_GA-SA_MTF model. The weighted likelihood sample quantile estimator is described as follows.
In summary, the stochastic properties of the hydrological forecasts from the proposed ANN_GA-SA_MTF model, in terms of quantiles at different cumulative probabilities, can be quantified by applying the weighted likelihood sample quantile estimator to the hydrological estimates provided by the model with the various transfer functions of interest. The estimated quantiles thus support the reliability assessment of the hydrological forecasts and the analysis of the effect of climate change on the variability of the observations.
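The weighted likelihood sample quantile estimator of Yang & Tung (1996) is not reproduced here. As a plainly different, generic stand-in, the Python sketch below computes interpolated weighted empirical quantiles of an ensemble of estimates, which is enough to obtain a 95% prediction interval from weighted ensemble members. The function name, estimates and weights are hypothetical.

```python
import numpy as np

def weighted_quantiles(values, weights, probs):
    """Generic nonparametric weighted sample quantiles by linear interpolation.

    Each sorted value x_i is assigned the plotting position
    p_i = (C_i - w_i / 2) / W, where C_i is the cumulative weight up to x_i and
    W the total weight; probabilities outside [p_1, p_n] are clamped to the
    smallest/largest value. Weights must be positive.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w)
    positions = (cum - 0.5 * w) / cum[-1]
    return np.interp(probs, positions, x)


# Hypothetical ensemble of water-level estimates (m) with transfer-function weights
estimates = [209.42, 209.55, 209.61, 209.70, 209.88]
weights = [0.10, 0.30, 0.25, 0.20, 0.15]
lower, median, upper = weighted_quantiles(estimates, weights, [0.025, 0.5, 0.975])
```

With only five members, the 2.5% and 97.5% quantiles are clamped to the smallest and largest estimates; a large ensemble, such as that produced with numerous calibrated parameter sets, is needed before the interval bounds become informative.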
Real-time correction of estimation using RTEC_TS&KF
According to the model concept described above, the proposed ANN_GA-SA_MTF model can quantify stochastic information on the hydrological estimates, including statistical moments and 95% confidence intervals (i.e., prediction intervals). However, the accuracy of the hydrological forecasts from Equation (16) may be affected by uncertainties in the observations used as the training dataset, owing to climate change and the occurrence of extreme hydrological events. Therefore, this study incorporates the real-time error correction method RTEC_TS&KF (Wu et al. 2020), which combines a time-series method with the Kalman filtering approach based on the differences between observations and forecasts at the previous time steps during an event, to improve the accuracy of the hydrological forecasts, i.e., the weighted average of the hydrological estimates provided by the proposed ANN_GA-SA_MTF model. The RTEC_TS&KF method, as applied to the correction of hydrological forecasts, is briefly introduced as follows.
Therefore, in this study, the correction of the resulting hydrological forecast from the proposed ANN_GA-SA_MTF model could be carried out through the updating process within RTEC_TS&KF method as shown in Figure 4.
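The RTEC_TS&KF equations are not reproduced in this section. As a hedged illustration of the general idea (a time-series model of the forecast error updated by a Kalman filter), the Python sketch below assumes the error follows a first-order autoregressive model with coefficient `phi` and tracks it with a scalar Kalman filter. It is not the exact formulation of Wu et al. (2020), and `phi`, `q`, `r` and the example data are hypothetical.

```python
import numpy as np

def correct_forecasts(forecasts, observations, phi=1.0, q=0.01, r=0.01):
    """Correct forecasts with an assumed error model e_t = phi * e_{t-1} + noise.

    The error is tracked by a scalar Kalman filter: q is the process-noise variance,
    r the observation-noise variance (both hypothetical). observations[t] may be
    np.nan when not yet available. corrected[t] uses observations up to t - 1 only.
    """
    e_hat, p = 0.0, 1.0                    # filtered error estimate and its variance
    corrected = []
    for f, y in zip(forecasts, observations):
        e_pred = phi * e_hat               # predict step (time-series model)
        p_pred = phi * phi * p + q
        corrected.append(f + e_pred)       # corrected forecast issued before y arrives
        if np.isnan(y):                    # no observation: carry the prediction forward
            e_hat, p = e_pred, p_pred
        else:                              # update step with the observed error y - f
            gain = p_pred / (p_pred + r)
            e_hat = e_pred + gain * ((y - f) - e_pred)
            p = (1.0 - gain) * p_pred
    return np.array(corrected)


# Hypothetical case: the model under-forecasts a water level by a constant 1.0 m
raw = [10.0] * 20
obs = [11.0] * 19 + [np.nan]               # latest observation not yet received
adjusted = correct_forecasts(raw, obs)
```

As in the approach described above, the model parameters are left untouched; only the output is adjusted, so the correction adds negligible computation time.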
Model framework
To sum up, the framework of the proposed ANN_GA-SA_MTF model can be divided into two parts: model development and model application. Regarding model development, the proposed ANN_GA-SA_MTF model builds on the conventional ANN model, whose parameters, including the ANN weights and biases at neurons within the input-hidden-output layers for each transfer function of interest, are calibrated through the GA-SA method using a large number of hydrological data simulated as the training dataset by the non-normal correlated multivariate Monte Carlo approach (Wu et al. 2006). Regarding model application, for each transfer function, the hydrological variables are estimated by the ANN_GA-SA_MTF model given the observations/predictions. The stochastic properties of the resulting estimates are then quantified by the weighted likelihood sample quantile estimator, and their weighted average is computed as the model output, i.e., the hydrological forecast. Finally, the model outputs are corrected using the RTEC_TS&KF method according to the differences between observations and forecasts at the previous time steps. The development and application framework of the proposed ANN_GA-SA_MTF model can be summarized as follows.
Model development
Step [1] Generate the hydrological data as the training dataset using the non-normal correlated multivariate Monte Carlo simulation approach with the statistical features of the hydrological variables.
Step [2] Identify the appropriate hydrological variates as the model inputs through the sensitivity analysis with the simulated training dataset.
Step [3] Normalize the selected model inputs and outputs into dimensionless variates in the range (0,1) or (−1,1) using Equations (1) and (2), in order to reduce the effect of the different scales of the hydrological data in time and space.
Step [4] Specify the number of neuron nodes at hidden layers through the equations as shown in Table 1.
Step [5] Calibrate the parameters of the proposed ANN_GA-SA_MTF model, i.e., the ANN weights and biases at each layer as well as the adjusting factor, using the GA-SA approach with the objective (loss) function in Equation (12).
Step [6] Calculate the weighting factors of the transfer functions, used in computing the weighted average of the hydrological estimates, using Equation (14).
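The normalization equations referenced in Step [3] are not reproduced in this section. Assuming they are ordinary min-max scaling (a common choice, though the paper's exact formulae may differ), the Python sketch below maps data onto (0,1) or (−1,1) and back; the function names and the water-level range are hypothetical.

```python
import numpy as np

def minmax_scale(x, x_min, x_max, lower=0.0, upper=1.0):
    """Linearly map x from [x_min, x_max] onto [lower, upper] (assumed min-max scaling)."""
    x = np.asarray(x, dtype=float)
    return lower + (upper - lower) * (x - x_min) / (x_max - x_min)

def minmax_unscale(z, x_min, x_max, lower=0.0, upper=1.0):
    """Invert minmax_scale, returning normalized outputs to physical units."""
    z = np.asarray(z, dtype=float)
    return x_min + (x_max - x_min) * (z - lower) / (upper - lower)


# Hypothetical water-level range of the training data (m)
levels = [209.0, 209.6, 211.0]
z01 = minmax_scale(levels, 209.0, 211.0)                # for (0,1)-range functions, e.g. logistic
z11 = minmax_scale(levels, 209.0, 211.0, -1.0, 1.0)     # for (-1,1)-range functions, e.g. tanh
back = minmax_unscale(z11, 209.0, 211.0, -1.0, 1.0)
```

Keeping `x_min` and `x_max` from the training data allows model outputs to be returned to metres during model application.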
Model application
Step [1] Estimate the hydrological variates and the corresponding 95% confidence intervals (i.e., prediction intervals) with the ANN_GA-SA_MTF model, using the numerous sets of calibrated parameters for the transfer functions of interest.
Step [2] Calculate the weighted average of the resulting hydrological estimates, defined as the resulting hydrological forecasts, from the proposed ANN_GA-SA_MTF model via Equation (16).
Step [3] Carry out the real-time correction of the resulting hydrological forecasts (i.e., the weighted estimates) by the RTEC_TS&KF method via Equations (19)–(24). The model framework is illustrated in Figure 5.
RESULTS AND DISCUSSION
To demonstrate the reliability and accuracy of the resulting hydrological forecasts from the proposed ANN_GA-SA_MTF model, 10-days rainfall-induced water levels are regarded as the hydrological variates of interest used in the model development and verification. The results and relevant discussion are described as follows.
Study area and data
In this study, the Shangping Gauge on the Shangping River, a major tributary of the Touqian River in northern Taiwan, is selected as the study area, as shown in Figure 6. The observed 10-days rainfall depths and water levels are adopted as the study data. Note that the 10-days rainfall depth is the areal average of the 10-day rainfall amount, calculated from the gridded quantitative precipitation estimation (QPE) within the study area provided by the Taiwan Central Weather Bureau for 2005 to 2018, as shown in Figure 7. The corresponding water level observations at Shangping Gauge (see Figure 6) were obtained from the Taiwan Water Resources Agency. Since this study focuses on rainfall and the induced water levels every 10 days, the observed 10-days rainfall depths and water levels at 36 10-days periods are used in model development and validation. Accordingly, uncertainty analysis of the 10-days rainfall depths and water levels is carried out to quantify the variation in the hydrological data within the study area. The results of the uncertainty analysis are then used to generate a large number of 10-days rainfall depths and water levels as the training and validation datasets for the proposed ANN_GA-SA_MTF model.
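The 36 10-days periods per year imply three periods per month. The Python sketch below assumes the common convention that days 1 to 10, 11 to 20 and 21 to month end form the three periods, and that daily areal-average rainfall depths are already available; both the convention and the helper names are hypothetical, since the paper's exact aggregation is not given here.

```python
import datetime as dt
from collections import defaultdict

def ten_day_period(date):
    """Return the 10-days period index (1-36) of a date.

    Assumed convention: days 1-10, 11-20 and 21-end of each month form the
    first, second and third periods of that month.
    """
    third = 0 if date.day <= 10 else (1 if date.day <= 20 else 2)
    return 3 * (date.month - 1) + third + 1

def ten_day_totals(daily_rain):
    """Sum daily areal rainfall depths (mm) into {(year, period): depth} totals."""
    totals = defaultdict(float)
    for date, depth in daily_rain.items():
        totals[(date.year, ten_day_period(date))] += depth
    return dict(totals)


# Hypothetical daily areal-average depths (mm)
totals = ten_day_totals({dt.date(2018, 1, 1): 5.0,
                         dt.date(2018, 1, 10): 2.5,
                         dt.date(2018, 1, 11): 1.0})
```

Under this convention the third period of a month spans 8 to 11 days, so rainfall totals are not strictly comparable across periods without normalizing by period length.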
Quantification of uncertainties in 10-days rainfall depth and water level
As introduced in Section 2.7 (Model framework), an uncertainty analysis of the hydrological estimates (i.e., 10-days rainfall depths and water levels) should be conducted in advance to obtain their statistical properties, including the statistical moments (mean and standard deviation), 95% confidence intervals and correlation coefficients, as shown in Figure 8. As Figure 8 shows, the mean 10-days water level ranges between 209.37 and 209.84 m. In the 26th 10-days period, although the mean 10-days water level is only about 209.6 m, the standard deviation (0.47 m) and the 95% confidence interval (209.31–210.9 m) are significantly greater than those in the remaining 10-days periods, indicating greater variation in the 10-days water levels in the 26th 10-days period. Regarding the statistics of the 10-days rainfall depths, the mean ranges from 65 to 435 mm, and the standard deviations and 95% confidence intervals in the 20th to 26th 10-days periods (i.e., the rainy season in Taiwan) clearly exceed those in the remaining periods. That is, similar to the 10-days water level, the variation in the 10-days rainfall depth changes with time, and greater oscillation is found in the 20th–26th 10-days periods. These results reveal that the variation of the 10-days rainfall depth and water level changes markedly with time (10-days period), which may affect the calibration of the parameters (i.e., ANN weights and biases as well as adjusting factors) of the proposed ANN_GA-SA_MTF model. Therefore, a large training dataset of generated 10-days rainfall depths and water levels that reflects the uncertainty in the observations is needed to train the proposed ANN_GA-SA_MTF model.
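To make the per-period statistics of Figure 8 concrete, the following minimal Python sketch summarizes a year-by-period matrix of observations (one row per year, one column per 10-days period). It assumes that the 95% interval is taken as the empirical 2.5th and 97.5th percentiles; the study may instead use a distribution-based interval. The function name `period_statistics` is hypothetical.

```python
import statistics


def period_statistics(records):
    """Summarize each 10-days period across years.

    records: list of yearly rows, each a list with one value per 10-days period.
    Returns one dict per period with the mean, the sample standard deviation and
    an empirical 95% interval (2.5th and 97.5th percentiles, an assumption).
    """
    n_periods = len(records[0])
    summary = []
    for j in range(n_periods):
        values = sorted(row[j] for row in records)
        # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        summary.append({
            "mean": statistics.fmean(values),
            "std": statistics.stdev(values),
            "lower": cuts[0],
            "upper": cuts[-1],
        })
    return summary
```

With 14 years of records and 36 columns, the function returns 36 summaries, one per 10-days period, which is the kind of profile plotted in Figure 8.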
Apart from the statistical moments and 95% confidence intervals, the dependence among the 10-days rainfall depths, and among the 10-days water levels, in various 10-days periods is represented by the correlation coefficients shown in Figure 9. Note that these correlation coefficients are calculated separately between the 10-days rainfall depth (or water level) at a specific time step (i.e., 10-days period) and those at the associated forward time steps. For the 10-days rainfall depths (Figure 9(a)), the correlation coefficient decreases from 0.33 at the forward 1st time step to 0.12 at the forward 3rd time step and then remains constant (0.12) at the forward 4th and 5th time steps. After that, the change in the correlation coefficient between the forward 6th and 9th time steps resembles that between the forward 1st and 5th time steps; that is, the correlation coefficient decreases from 0.12 to −0.03. It can be concluded that the 10-days rainfall depth at the current time step is more strongly related to those at the forward three time steps (10-days periods) than to those further back; that is, every four consecutive 10-days rainfall depths can be regarded as temporally correlated variables with a similar trend in time.
Also, Figure 9(b) shows that the correlation coefficient of the 10-days water level declines with the forward time step, from 0.79 (the forward 1st 10-days period) to 0.31 (the forward 14th 10-days period). In particular, the correlation coefficient of the 10-days water level for the first three forward time steps averages about 0.7, implying that the 10-days water level at the present time step is strongly related to those in the forward three 10-days periods.
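The lagged correlations in Figure 9 can be illustrated with the sketch below, which correlates each value with the value k steps earlier (the "forward k-th time step" in the paper's terminology). It assumes the 10-days values are pooled into one continuous series; the study may instead correlate specific period pairs across years. Both function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_correlations(series, max_lag):
    """Correlation between the value at step t and the value at step t - k, k = 1..max_lag."""
    return [pearson(series[k:], series[:-k]) for k in range(1, max_lag + 1)]
```

Applied to a continuous 10-days rainfall or water-level series, `lag_correlations(series, 14)` would give a curve comparable in shape to Figure 9.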
The above results show that uncertainties of different degrees exist in the 10-days rainfall depths and water levels; furthermore, the 10-days rainfall depths and water levels in consecutive 10-days periods can be grouped according to their high correlation. The results of the uncertainty analysis can therefore be applied to simulate a large number of 10-days rainfall depths and water levels for model training and validation.
Simulation of 10-days rainfall depths and water levels
Since a large training dataset is required to train the ANN model so that it provides more reliable estimates, the multivariate Monte Carlo simulation (MMCS) method (Wu et al. 2006), which simulates multiple correlated non-normal variables based on their statistical moments and correlation structure and has been widely applied in fields including water resources, is used here to reproduce data based on the results of the uncertainty analysis. Using the MMCS method with the statistical properties shown in Figure 8, 1,000 simulations of 10-days rainfall depths and water levels are generated as the training dataset (see Figure 10). Note that the simulations account not only for the dependence among the 10-days rainfall depths and among the 10-days water levels but also for the correlations between the 10-days rainfall depths and water levels in the 36 10-days periods. Accordingly, the parameters of the proposed ANN_GA-SA_MTF model for forecasting the 10-days rainfall depths and water levels are calibrated using the resulting 1,000 simulations.
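The idea of generating correlated non-normal samples from moments and a correlation matrix can be sketched as follows. This is a swapped-in Gaussian-copula (NORTA-style) approach with lognormal marginals, not necessarily the MMCS procedure of Wu et al. (2006): here the correlation is imposed on the underlying normal variates, so the correlation of the generated lognormal variates will differ somewhat from the target, whereas dedicated methods adjust for this. The lognormal marginals and all function names are assumptions for illustration.

```python
import math
import random


def cholesky(corr):
    """Lower-triangular L with L L^T = corr (corr must be positive definite)."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def lognormal_params(mean, std):
    """(mu, sigma) of the underlying normal for a lognormal with the given mean and std."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def simulate(means, stds, corr, n_sims, seed=0):
    """Draw n_sims vectors of correlated lognormal variates (Gaussian-copula style)."""
    rng = random.Random(seed)
    L = cholesky(corr)
    params = [lognormal_params(m, s) for m, s in zip(means, stds)]
    sims = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in means]
        # Correlate the independent standard normals: y = L z
        y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
        # Map each correlated normal to its lognormal marginal
        sims.append([math.exp(mu + sg * yi) for (mu, sg), yi in zip(params, y)])
    return sims
```

For the study's setting, the vector would hold the 36 rainfall depths and 36 water levels of one year (72 variables), and `n_sims=1000` would mirror the size of the training dataset.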
Sensitivity analysis for correlations between the rainfall and water levels in time
According to the results in Section 3.2, the 10-days rainfall depths and water levels are each correlated with those in the forward three 10-days periods. However, rainfall is a key hydrological factor that induces discharge and affects river stages. Thus, this study uses the standardized regression equation introduced in Section 2.4, with the 1,000 simulations, to evaluate the impact of the rainfall depths in the current and forward 10-days periods on the water level in a specific 10-days period. Figure 11 shows the regression coefficients of the rainfall in the current and forward 10-days periods with respect to the water level in a specific 10-days period, ranging approximately from −0.36 to 0.25. The regression coefficients of the 10-days rainfall vary among the forward 10-days periods; on average, however, the regression coefficient declines with the forward 10-days period. Specifically, the regression coefficient decreases from 0.08 in the current 10-days period to 0.02 in the forward 1st 10-days period and then remains nearly constant (about −0.008) from the forward 2nd to 6th 10-days periods. After that, the regression coefficient rises slightly to 0.008. This indicates that the water level in a specific 10-days period is sensitive mainly to the rainfall in the current and forward 1st 10-days periods.
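Standardized regression coefficients of the kind shown in Figure 11 can be obtained by regressing the standardized water level on standardized rainfall predictors. The sketch below solves the equivalent system R_xx b = r_xy, where R_xx is the correlation matrix of the predictors and r_xy their correlations with the response. It is a generic least-squares sketch, not necessarily the exact form of the equation in Section 2.4; the function names are hypothetical.

```python
import statistics


def standardize(values):
    """Convert values to z-scores using the sample mean and standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_coefficients(predictors, response):
    """predictors: list of columns (e.g. rainfall at the current and forward periods)."""
    zs = [standardize(col) for col in predictors]
    zy = standardize(response)
    n = len(zy)
    # Correlation matrix of the predictors and their correlations with the response
    rxx = [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in zs] for zi in zs]
    rxy = [sum(a * b for a, b in zip(zi, zy)) / (n - 1) for zi in zs]
    return solve(rxx, rxy)
```

Because every variable is standardized, the coefficients are directly comparable across lags, which is what allows the rainfall at different forward periods to be ranked by influence.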
Based on the results of the uncertainty and sensitivity analyses, the rainfall and the water level in a specific 10-days period are each related to their own values in the forward three 10-days periods. For the 10-days water level, in addition to the water levels in the forward three 10-days periods, the rainfall in the current 10-days period should be treated as a dominant factor. These findings serve as the reference for establishing the inputs of the proposed ANN_GA-SA_MTF model for forecasting the 10-days rainfall depth and water level in the period of interest.
Establishment of ANN_GA-SA_MTF model
In summary, to forecast the water level in a specific 10-days period through Equation (25), the associated 10-days rainfall depth should first be estimated through Equation (26). Moreover, the parameters of the proposed ANN_GA-SA_MTF model and the corresponding objective function (i.e., loss function) values are determined via the GA-SA method using the training dataset of 1,000 simulations of rainfall depths and water levels in the 36 10-days periods. Specifically, Equations (25) and (26) are derived individually for each of the 36 10-days periods, each with its own calibrated parameters.
Parameter calibration
It is well known that, before training the ANN model (i.e., calibrating its parameters), the number of hidden layers and the associated neurons (and hence ANN weights) must be defined. The network structure affects model complexity and efficiency, and the performance of a neural network can be improved markedly by the number of connections in the various layers (Hunter et al. 2012). In hydrological/hydraulic analysis, a three-layer network structure (i.e., input and output layers plus one hidden layer) is widely applied, in which the neuron weights can be estimated effectively via the back-propagation (BP) algorithm (e.g., Dawson & Wilby 2001; Khan et al. 2016; Sung et al. 2017); nevertheless, a large number of hidden layers, possibly exhibiting long-term dependencies, is a challenge because the training process must handle a complicated network structure (Bengio 1991; Dawson & Wilby 2001). In this study, to enable the proposed ANN_GA-SA_MTF model to respond to the nonlinear characteristics among hydrological variables, a four-layer network structure consisting of the input and output layers and two hidden layers is used. Regarding the number of hidden neurons, several estimation methods (see Table 1) are applied, with the results shown in Figure 12. As Figure 12 shows, the estimated numbers of hidden neurons for the 10-days rainfall-depth and water-level forecasts range from 2 to 39 and from 2 to 17, respectively; on average, the number of hidden neurons is set to 10 (10-days rainfall depth) and 6 (10-days water level).
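A forward pass through the four-layer structure can be sketched as below. The sketch assumes a water-level network with four standardized inputs (the water levels in the forward three 10-days periods plus the current rainfall, following the sensitivity results above), two hidden layers of 3 neurons each (Table 3), a linear output neuron, and initial weights and biases drawn from the Table 3 prior statistics. Only three of the ten transfer functions are shown, the adjusting factor is omitted because its role is defined elsewhere in the paper, and all names are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


# Three illustrative transfer functions (Table 2 lists TF1-TF10)
TRANSFER = {
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}


def init_layer(n_in, n_out, rng):
    """Draw weights from N(1, 5) and biases from N(0, 1), the Table 3 prior statistics."""
    weights = [[rng.gauss(1.0, 5.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, biases


def dense(inputs, weights, biases, act):
    """Fully connected layer: act(W x + b) for each neuron."""
    return [act(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, params, act):
    """Input layer -> hidden layer 1 -> hidden layer 2 -> single linear output."""
    h1 = dense(x, *params[0], act)
    h2 = dense(h1, *params[1], act)
    w_out, b_out = params[2]
    return sum(w * h for w, h in zip(w_out[0], h2)) + b_out[0]
```

Running `forward` once per transfer function with its own calibrated parameters yields the set of estimates that are later combined into a weighted average.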
To reduce the uncertainty in selecting the transfer functions of the ANN model, a variety of transfer functions (Table 2) are adopted in this study. Additionally, a stochastic GA-based optimization method (GA-SA) is used to calibrate the parameters of the proposed ANN_GA-SA_MTF model, including the ANN weights, biases and adjusting factors; the prior statistics (mean and standard deviation) of the calibrated parameters are summarized in Table 3.
| Parameters | Item | Value |
|---|---|---|
| Transfer functions used | | TF1-TF10 |
| Input factors | 10-days rainfall; 10-days water level | |
| Output factors | 10-days rainfall; 10-days water level | |
| Number of hidden layers | | 2 |
| Number of hidden neurons: 10-days rainfall (total 10) | 1st hidden layer | 5 |
| | 2nd hidden layer | 5 |
| Number of hidden neurons: 10-days water level (total 6) | 1st hidden layer | 3 |
| | 2nd hidden layer | 3 |
| Calibration of transfer-function parameters | Number of optimizations | 1,000 |
| Weights of neurons | Mean | 1 |
| | Standard deviation | 5 |
| Bias of function | Mean | 0 |
| | Standard deviation | 1 |
| Adjusting factor | Mean | 0.5 |
| | Standard deviation | 0.35 |
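The calibration step can be illustrated with a generic hybrid of a genetic algorithm and simulated annealing: offspring are produced by tournament selection, arithmetic crossover and Gaussian mutation, and each offspring replaces its parent under a Metropolis acceptance rule with a geometrically cooling temperature. This is a minimal sketch of the general GA-SA idea, not the modified GA-SA of Wu et al. (2012a, 2012b); the population is initialized from the weight prior in Table 3 (mean 1, standard deviation 5), and the function name and default settings are assumptions.

```python
import math
import random


def ga_sa(loss, dim, pop_size=30, generations=200, t0=1.0, cooling=0.97,
          mut_sd=0.3, seed=0):
    """Minimize `loss` over R^dim; returns (best_vector, best_loss)."""
    rng = random.Random(seed)
    # Initial population drawn from the Table 3 weight prior N(1, 5)
    pop = [[rng.gauss(1.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [loss(p) for p in pop]
    i_best = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = pop[i_best][:], fit[i_best]
    temp = t0
    for _ in range(generations):
        for i in range(pop_size):
            # Tournament selection of a mate for individual i
            a, b = rng.sample(range(pop_size), 2)
            mate = pop[a] if fit[a] < fit[b] else pop[b]
            # Arithmetic crossover followed by Gaussian mutation
            alpha = rng.random()
            child = [alpha * x + (1.0 - alpha) * y + rng.gauss(0.0, mut_sd)
                     for x, y in zip(pop[i], mate)]
            f_child = loss(child)
            delta = f_child - fit[i]
            # Metropolis rule: keep improvements, sometimes keep worse children
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                pop[i], fit[i] = child, f_child
            if f_child < best_fit:
                best, best_fit = child[:], f_child
        temp *= cooling  # geometric cooling schedule
    return best, best_fit
```

In an ANN setting, `loss` would flatten the network weights, biases and adjusting factors into one vector and return the training error; the annealing rule lets the search escape poor local optima early while becoming greedy as the temperature falls.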
Note that in this study, 650 simulations randomly extracted from the training dataset of 1,000 simulations of 10-days rainfall depths and water levels (named the development set) are used to train the proposed ANN_GA-SA_MTF model; the remaining 350 simulations (i.e., 35% of the training dataset) are treated as the validation set. Training the proposed ANN_GA-SA_MTF model for forecasting the 10-days rainfall depth and water level thus yields numerous sets of appropriate parameters for the 10 transfer functions in the 36 10-days periods. Tables 4 and 5 list the calibrated ANN_GA-SA_MTF parameters for the 10-days rainfall depth and water level, respectively, in the 20th 10-days period for transfer function TF1. In addition, Figure 13 shows the weighting factors of the transfer functions obtained via Equation (14), which are used to compute the weighted average of the hydrological estimates in the 20th 10-days period; the weighting factors of the ten transfer functions for the 10-days rainfall depth resemble those for the 10-days water level. Among them, TF1 (sigmoid function) and TF5 (ReLU) contribute most to the hydrological estimates, with the largest weighting factors (approximately 0.18); in contrast, TF8 (inverse absolute function) contributes least, with the smallest weighting factor. This reveals that the ANN estimates of hydrological variates are significantly affected by the selected transfer functions and their calibrated parameters. Consequently, the effect of uncertainty in the selection of transfer functions on the ANN estimates of hydrological variables should be considered when training the ANN model.
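Combining the transfer-function estimates into one forecast can be sketched as a normalized weighted average. Equation (14) is not reproduced in this section, so the helper `inverse_loss_weights` shows one hypothetical weighting (proportional to the inverse of each transfer function's loss) purely for illustration; in practice the weighting factors of Figure 13 would be supplied instead. Both function names are hypothetical.

```python
def combine_estimates(estimates, weights):
    """Weighted average of per-transfer-function estimates.

    estimates: dict TF name -> estimate; weights: dict TF name -> non-negative weight.
    Weights are normalized so that they sum to 1 over the TFs present in `estimates`.
    """
    total = sum(weights[k] for k in estimates)
    return sum(estimates[k] * weights[k] for k in estimates) / total


def inverse_loss_weights(losses):
    """Hypothetical weighting proportional to 1 / loss (not necessarily Equation (14))."""
    inv = {k: 1.0 / v for k, v in losses.items()}
    s = sum(inv.values())
    return {k: v / s for k, v in inv.items()}
```

Normalizing inside `combine_estimates` keeps the average valid even if only a subset of the ten transfer functions is used.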
Model validation
(1) 10-days rainfall depth
Figure 16 shows that the estimates from the proposed ANN_GA-SA_MTF model clearly exceed the validation rainfall depths in the 36 10-days periods; also, Figure 18(1) indicates that the root mean square error (RMSE) varies drastically between 25 and 450 mm (128 mm on average) and that the KG value is approximately 1.28, meaning that the temporal change in the estimated 10-days rainfall depth differs somewhat from that of the validation data. These results reveal that the proposed ANN_GA-SA_MTF model may overestimate the 10-days rainfall depths, departing from the validation sets in some 10-days periods, possibly owing to the high variation in the observed 10-days rainfall depths (see Figure 18(1)).
Despite overestimating the 10-days rainfall depths, the prediction intervals of the 10-days rainfall-depth estimates from the proposed ANN_GA-SA_MTF model cover the validation data in the various 10-days periods. In fact, the validation 10-days rainfall depths lie near the lower bound of the prediction interval, implying that the 10-days rainfall-depth estimates can capture the validation data with high likelihood. That is, the proposed ANN_GA-SA_MTF model can provide reliable 10-days rainfall-depth estimates with an acceptable bias.
(2) 10-days water level
For the 10-days water-level estimates, the estimates and quantified statistical properties of the water levels in the various 10-days periods are compared with 350 sets of validation data (i.e., validation sets) (see Figure 17). Similar to the 10-days rainfall-depth estimates, the validation sets lie within the prediction intervals in the 36 10-days periods; however, unlike the rainfall-depth validation sets, which lie close to the lower bound, they are randomly distributed within the intervals. In detail, Figure 18(2) indicates that the RMSE of the 10-days water-level estimates ranges from 0.2 to 0.7 m, with excellent agreement with the validation data (KG of about 1.0), except in the 21st 10-days period, where the error reaches 0.7 m; the remaining estimates approach the validation data with a small error of about 0.4 m. In addition, the validation data lie within the 95% confidence interval, revealing that the proposed ANN_GA-SA_MTF model can provide reliable 10-days water-level estimates.
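Whether validation data fall inside the prediction intervals, and whether they sit near the lower bound (as for rainfall) or spread through the interval (as for water level), can be quantified with two simple measures sketched below: the empirical coverage rate and the relative position of each observation within its interval (0 at the lower bound, 1 at the upper bound). These helpers are illustrative and hypothetical, not indices defined in the paper.

```python
def coverage(observed, lower, upper):
    """Fraction of observations falling inside [lower, upper] (bounds inclusive)."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


def relative_position(y, lo, hi):
    """Position of y within [lo, hi]: 0 at the lower bound, 1 at the upper bound."""
    return (y - lo) / (hi - lo)
```

For a well-calibrated 95% interval the coverage should be close to 0.95, and relative positions clustered near 0 would reflect the systematic overestimation noted for the 10-days rainfall depth.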
(3) Summary
In summary, the estimates (i.e., weighted averages) of the 10-days rainfall depths and water levels provided by the proposed ANN_GA-SA_MTF model may be affected by the uncertainties in parameter calibration that arise from the variation in the training dataset; accordingly, their accuracy changes markedly with the 10-days period. Furthermore, wide prediction intervals (i.e., 95% confidence intervals) of the 10-days rainfall depths and water levels occur in various 10-days periods. This reveals that, even with optimal ANN weights for each transfer function, the estimates obtained with different transfer functions differ significantly. Nevertheless, the proposed ANN_GA-SA_MTF model provides reliable results that capture the observations with high probability despite the variation in the training dataset and the induced uncertainties in the model parameters.
Real-time correction of resulting forecasts
Since climate change and extreme events may increase the uncertainty in hydrological data (e.g., Wu et al. 2012a, 2012b), the accuracy of the 10-days rainfall depths and water levels forecasted by the proposed ANN_GA-SA_MTF model might be affected by such data variation. Thus, to improve the accuracy of the 10-days rainfall-depth and water-level forecasts, a real-time correction method based on the differences between observations and forecasts in past time steps, i.e., the RTEC_TS&KF method developed by Wu et al. (2012a, 2012b), is used to adjust the outputs of the proposed ANN_GA-SA_MTF model in real time.
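Equations (19)–(24) of RTEC_TS&KF are not reproduced in this section, so the following is only a minimal sketch of the general idea of combining a time-series error model with Kalman filtering. It assumes that the forecast error (forecast minus observation) follows an AR(1) process with coefficient `phi`, that a scalar Kalman filter tracks this error from past observed errors, and that future forecasts are corrected by subtracting the propagated error. The values of `phi`, `q` (process-noise variance) and `r` (measurement-noise variance) are placeholders, and the function name is hypothetical.

```python
def kalman_error_correction(past_errors, future_forecasts, phi=0.8, q=1.0, r=1.0):
    """Correct future forecasts with an AR(1) error model tracked by a scalar Kalman filter.

    past_errors: observed errors (forecast - observation) in previous time steps.
    future_forecasts: raw forecasts for the next time steps.
    """
    x, p = 0.0, 1.0  # error estimate and its variance
    for z in past_errors:
        # Predict step of the AR(1) error model
        x, p = phi * x, phi * phi * p + q
        # Update step with the newly observed error z
        k = p / (p + r)
        x, p = x + k * (z - x), (1.0 - k) * p
    corrected = []
    for f in future_forecasts:
        x = phi * x  # propagate the error one step ahead (no observation yet)
        corrected.append(f - x)
    return corrected
```

With `phi` close to 1 a persistent bias is carried forward almost unchanged, whereas smaller values let the correction fade over later forecast steps; the paper's own formulation should be consulted for how these quantities are actually estimated.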
This study illustrates the correction of the forecasted 10-days rainfall depths and water levels using the observed 10-days rainfall depths and water levels at the Shanping Gauge during the 4th–15th 10-days periods in 2018 (see Figure 19). As Figure 19 shows, the observed 10-days rainfall depths are positively related to the 10-days water levels, with a correlation coefficient of approximately 0.72; the maximum values, 209.67 m (10-days water level) and 128.6 mm (10-days rainfall depth), occur in the 15th and 7th 10-days periods, respectively.
Figure 20 compares the observed, forecasted and corrected 10-days rainfall depths and water levels. Note that the rainfall and water levels in the 4th–10th 10-days periods are treated as observations, whereas the observations in the remaining 10-days periods are treated as validation data. Specifically, the rainfall depths and water levels in the 11th–15th 10-days periods are forecasted by the proposed ANN_GA-SA_MTF model and corrected by the RTEC_TS&KF method, respectively, using the observations in the 4th–10th 10-days periods. The corresponding performance indices (root mean square error, RMSE, and correlation coefficient) are listed in Table 6. Figure 20 and Table 6 show that the 10-days rainfall-depth forecasts follow the temporal trend of the observations well, but with an overestimation bias ranging from −0.73 to 165.7 mm and averaging approximately 46.64 mm. Moreover, the observed rainfall in the 8th and 9th 10-days periods lies slightly outside the prediction interval. However, after real-time correction by the RTEC_TS&KF method, the corrected forecasts are much closer to the observations: the RMSE decreases from 46.64 to 15.43 mm and the correlation coefficient rises from 0.67 to 0.98.
| Performance index | 10-days rainfall depth (mm): forecasted | 10-days rainfall depth (mm): corrected | 10-days water level (m): forecasted | 10-days water level (m): corrected |
|---|---|---|---|---|
| Root mean square error (RMSE) | 46.642 | 15.430 | 0.295 | 0.109 |
| Correlation coefficient (dimensionless) | 0.665 | 0.976 | 0.670 | 0.845 |
For the 10-days water level, similar to the 10-days rainfall depth, the observed 10-days water levels in the 8th–12th periods in 2018 are slightly lower than or close to the lower bound of the prediction interval. Thus, the forecasts depart markedly from the observations with a large overestimation bias (RMSE of 0.295 m), but with an acceptable temporal trend (correlation coefficient = 0.67). Although the proposed ANN_GA-SA_MTF model overestimates the 10-days water level and the associated prediction interval, the RTEC_TS&KF method can predict the errors in future 10-days periods and correct the corresponding forecasts; that is, the 10-days water-level forecasts are reduced by a time-varying predicted error averaging 0.62 m. Accordingly, the performance indices improve markedly: the RMSE decreases from 0.295 m to 0.109 m, and the correlation coefficient rises from 0.67 to 0.845.
In summary, the observed 10-days rainfall depths and water levels lie within or near the lower bound of the prediction interval quantified by the proposed ANN_GA-SA_MTF model. Moreover, the temporal change in the forecasts from the proposed model resembles that in the observations, revealing that the model can produce 10-days rainfall-depth and water-level forecasts with a highly comparable temporal trend. Although the forecasts from the proposed ANN_GA-SA_MTF model may exhibit an acceptable bias, combining the model with the RTEC_TS&KF method effectively improves the accuracy of the 10-days rainfall-depth and water-level forecasts based on the differences between the observations and forecasts in the previous 10-days periods.
CONCLUSION
This study develops a stochastic artificial neural network model (ANN_GA-SA_MTF) for forecasting hydrological variates together with their stochastic properties, including statistical moments and probabilities. To train the proposed ANN_GA-SA_MTF model, multivariate Monte Carlo simulation (Wu et al. 2006) is adopted to generate a large amount of hydrological data as the training dataset; these data are then used to identify sensitive model inputs via sensitivity analysis and to calibrate the associated parameters using the modified genetic algorithm (GA-SA) based on the sensitivity to uncertain parameters (Wu et al. 2012a, 2012b). Moreover, multiple transfer functions are used within the proposed model to combine the standardized variates at the input, hidden and output layers. The reliability of the hydrological forecasts, defined as the likelihood of the forecasts approaching the observations, can be evaluated by comparing the upper and lower bounds of the prediction interval (i.e., 95% confidence interval) with the observations, accounting for the uncertainties in the selection of transfer functions and associated parameters. Finally, the hydrological forecasts, in terms of the weighted average of the results from the proposed model, can be corrected by the real-time error correction method (RTEC_TS&KF), which combines time-series and Kalman filtering methods (Wu et al. 2012a, 2012b), to enhance forecast accuracy.
To demonstrate the performance of the proposed ANN_GA-SA_MTF model in real-time forecasting of hydrological variates, the 10-days rainfall depths and water levels recorded from 2008 to 2017 at the Shanping Gauge on the Shanping River, a tributary of the Touquian River, are selected as the study data. Using 1,000 simulations of 10-days rainfall depths and water levels as the training dataset, numerous sets of appropriate parameters of the transfer functions used within the proposed model are calibrated and validated for the 36 10-days periods. The results of model training and validation indicate that the simulated rainfall depths and water levels in the 36 10-days periods mostly lie within the prediction intervals quantified by the proposed model; this implies that the proposed ANN_GA-SA_MTF model can estimate 10-days rainfall depths and water levels that approach the observations with high likelihood. Although the 10-days rainfall-depth and water-level forecasts from the proposed model differ from the observations to varying degrees, these forecasts can be corrected immediately by the RTEC_TS&KF method with real-time observations received through IoT to enhance their accuracy.
The proposed ANN_GA-SA_MTF model can quantify the uncertainties in the forecasted 10-days rainfall depths and water levels, with an acceptable bias, by considering the uncertainties in the types of transfer functions and in the associated connection weights and biases at various layers; however, it does not account for the network structure (i.e., the number of hidden layers). Therefore, to evaluate the effect of network structure on the hydrological forecasts, various network structures could be applied in model training and the results compared with those from a given number of hidden layers. Furthermore, since the coverage probability of a confidence interval is defined as the proportion of true values located within the interval, the 95% confidence intervals quantified by the proposed model can be applied with well-known hydrological/hydraulic models to evaluate the performance of their estimates/forecasts under uncertainties in the observations. Finally, the proposed modeling concept and training framework, including simulating a large training dataset using multivariate Monte Carlo simulation, calibrating parameters through the GA-SA algorithm and adjusting model outputs with the real-time error correction method RTEC_TS&KF, can be employed in stochastic modeling with other artificial intelligence (AI) methods (e.g., CNN, LSTM) to improve forecast reliability and accuracy.