Modeling of hydrological time series is essential for sustainable development and management of lake water resources. This study aims to develop an efficient model for forecasting lake water level variations, exemplified by the Poyang Lake (China) case study. A random forests (RF) model was first applied and compared with artificial neural networks, support vector regression, and a linear model. Three scenarios were adopted to investigate the effect of time lag and previous water levels as model inputs for real-time forecasting. Variable importance was then analyzed to evaluate the influence of each predictor for water level variations. Results indicated that the RF model exhibits the best performance for daily forecasting in terms of root mean square error (RMSE) and coefficient of determination (R2). Moreover, the highest accuracy was achieved using discharge series at 4-day-ahead and the average water level over the previous week as model inputs, with an average RMSE of 0.25 m for five stations within the lake. In addition, the previous water level was the most efficient predictor for water level forecasting, followed by discharge from the Yangtze River. Based on the performance of the soft computing methods, RF can be calibrated to provide information or simulation scenarios for water management and decision-making.
INTRODUCTION
Lake water level forecasting has important applications in identifying the main factors that influence water level fluctuations, determining trends in the watershed hydrological cycle under projected global climate change, integrating reservoir management schemes, and ensuring sufficient freshwater supply (Wantzen et al. 2008; Hu et al. 2008; Kourgialas et al. 2015). However, lake water level variation is a complex nonlinear process that integrates precipitation, discharge from tributaries, topography, and other factors. The variations become even more complex when the lake interacts with a large river (e.g., the interaction of Poyang Lake with the Yangtze River). Reliable and accurate forecasting of lake water level has therefore always been a challenge for hydrologists and water resource managers.
In recent decades, numerous forecasting techniques have been developed to simulate hydrological time series, including physically based hydrodynamic models (e.g., CHAM, MIKE21, and EFDC), time series analysis (e.g., auto-regressive moving average and auto-regressive integrated moving average), and soft computing methods (e.g., artificial neural networks (ANNs), support vector regression (SVR), and model trees) (Belmans et al. 1983; Hsu et al. 1995; Dawson & Wilby 2001; Alvisi et al. 2006; Khan & Coulibaly 2006; Altunkaynak 2007; Lai et al. 2013; Li et al. 2013). In particular, physically based hydrodynamic models exhibit the best performance in forecasting water level. However, these models require detailed terrain data, as well as complex boundaries and parameters as input, and are computationally expensive and limited to restricted durations (Li et al. 2015). Time series analysis is more complex and less reliable than the neural network model (Altunkaynak 2007). In addition, time series analysis does not consider the nonstationary and nonlinear characteristics of the data structure (Kumar & Maity 2008). The nonlinear and complex behavior of the model variables makes it difficult to quantify accurately the uncertainty associated with the predictions, which often misleads water resource managers during decision-making (Aqil et al. 2007; Mustafa et al. 2012). Soft computing methods are capable of capturing complex nonlinear relationships between inputs and outputs without the need for explicit knowledge of the physical process, and they also avoid the creation of extremely complex models in the rare cases when all information is available (Trichakis et al. 2011). Soft computing methods, particularly ANN and SVR, have been successfully applied to solve nonlinear problems in hydrological series simulations, such as groundwater level forecasting (Daliakopoulos et al. 2005; Yoon et al. 2011; Gholami et al. 2015), rainfall prediction (Chau & Wu 2010), and surface water level/discharge forecasting (Altunkaynak 2007; Callegari et al. 2015).
The random forests (RF) model has been proposed as a new soft computing method by Breiman (2001). RF handles nonlinear and non-Gaussian data well, is amenable to model interpretation, and is free of over-fitting problems as the number of trees increases. Furthermore, RF provides a measure of the relative importance of descriptors, which can be further utilized in variable selection (Genuer et al. 2010). In the past few years, RF has been employed to simulate suspended sediment concentration and soil organic carbon stocks (Francke et al. 2008; Were et al. 2015). However, soft computing methods with different algorithms may have different levels of adaptability for diverse problems. For example, Yoon et al. (2011) found that the performance of ANN is better than that of SVM in the model training and testing stages when predicting groundwater level in a coastal aquifer. Rodriguez-Galiano et al. (2015) found that the RF method performs better than ANN and SVM in predicting and mapping mineral prospectivity. Were et al. (2015) concluded that RF has the highest tendency for overestimation, and that SVR is the best model for predicting soil organic carbon stocks. However, few studies have compared the adaptability and accuracy of different soft computing methods for hydrological series forecasting, especially for highly nonlinear water level forecasting. In the present work, the RF model was first utilized for forecasting water level fluctuations and then compared with commonly used ANN, SVR, and a linear model (LM) in terms of accuracy.
Poyang Lake, the largest freshwater lake in China, is fed by five main tributaries and is connected to the Yangtze River, whose blocking effect (even intrusion) greatly affects water level variations in the lake. In recent decades, intensified global climate change and anthropogenic activities have altered Poyang Lake's water regime (Guo et al. 2008), with more frequent floods and droughts that tend to alternate abruptly (Guo et al. 2012; Li & Zhang 2015). Building a dam has been proposed in the downstream area of the lake to alleviate severe droughts and flood risk in Poyang Lake (Huang et al. 2015). A few researchers have studied Poyang Lake water level forecasting (Jiang & Huang 1997; Lan 2014; Li et al. 2015). However, few have taken into account the effects of both time lag and the previous hydrological status of the lake. The time lag effect has been shown to be important for hydrological series forecasting (Aqil et al. 2007; Chau & Wu 2010; Bao et al. 2014). Chau & Wu (2010) used partial autocorrelation and found notable differences at 1-, 2-, and 3-day-ahead horizons for daily rainfall prediction with an ANN model. Cross-correlation has been used to determine lag times of precipitation and discharge (Yoon et al. 2011; Li et al. 2015). The trial-and-error method has also been utilized to obtain the most sensitive time lag (Hipni et al. 2013). Therefore, an accurate water level forecasting model that considers the previous hydrological status and the time lag effect is required to provide suggestions for the development and management of water resources. Such a model can also help identify the main factors that influence water levels in Poyang Lake.
The specific objectives for this paper were: (1) to determine a model of highest accuracy by comparing RF with ANN, SVR, and the LM model, and incorporating discharge from lake catchment tributaries and the Yangtze River, the time lag effect and the previous hydrological status for water level forecasting; and (2) to explore the relative importance of each predictor for different water level stations within the lake. The proposed model provides a useful tool for water resource management and for identifying the major influencing factors for lake water level fluctuations.
MATERIALS AND METHODS
Study area
Data collection
As shown in Figure 1, Jiujiang station is the closest representative of the Yangtze River to affect water level variations within the lake. Meanwhile, given the missing discharge data in Jiujiang station prior to 1988, Hankou station was chosen to be a substitute in the model, as it has significant correlation with Jiujiang station (correlation coefficient = 0.995). The data applied in this study include the following: (1) daily discharge observations of six hydrological stations in the lake's catchment and the Yangtze River from 1955 to 2012, namely, Waizhou (wz), Hushan (hs), Dufengkeng (dfk), Lijiadu (ljd), Meigang (mg), and Wanjiabu (wjb) stations of the upstream tributaries (the Gan, Fu, Xin, Rao, and Xiu Rivers) and Hankou station of the Yangtze River (Table 1); and (2) daily water level observations of five gauge stations within the lake, namely, Hukou, Xingzi, Duchang, Tangyin, and Kangshan stations (from north to south). Observation at Tangyin station started in 1962; thus, only the 1962–2012 data were considered from this station. The observations on water level and discharge were obtained from the Hydrological Bureau of Jiangxi Province. Figure 1 shows the locations of these hydrological stations.
Characteristics of input and output data used in this study
Inputs and outputs | Station | Duration | Drainage area^a (10^4 km^2) |
---|---|---|---|
Gan River | Waizhou | 1955–2012 | 8.12 |
Fu River | Lijiadu | 1955–2012 | 1.58 |
Xin River | Meigang | 1955–2012 | 1.55 |
Rao River | | | |
Chang River | Dufengkeng | 1955–2012 | 1.5 |
Lean River | Hushan | 1955–2012 | |
Xiu River | | | |
Liao River | Wanjiabu | 1955–2012 | 1.48 |
Yangtze River | Hankou | 1955–2012 | – |
Poyang Lake | Hukou | 1955–2012 | – |
Poyang Lake | Xingzi | 1955–2012 | – |
Poyang Lake | Duchang | 1955–2012 | – |
Poyang Lake | Tangyin | 1962–2012 | – |
Poyang Lake | Kangshan | 1955–2012 | – |
^a Data are from the study on Poyang Lake (the Editorial Committee of ‘Study on Poyang Lake’ 1987).
Soft computing methods
RF model
Demonstration of the RF methodology (Malekipirbazari & Aksakalli 2015).
SVR model
An internationally recognized uniform method for SVM parameter optimization has not been established. This study adopted the most commonly used method, in which C and the remaining SVR parameters are calibrated within a certain range by grid search in the R statistical environment. Similarly, 500 pairs of parameters were tried and the set with the best performance was selected.
ANN model
Model training and evaluation
Input–output scenarios
For all of the soft computing methods considered in this study, the daily discharge observations of the seven hydrological stations were used as input variables, and the water level of each of the five gauge stations within Poyang Lake was chosen as the output of each model. These specific variables were selected based on the unique hydrological processes of Poyang Lake, in which water level variations are mainly influenced by discharge from the catchment and the Yangtze River. In addition, describing the hydrological processes through a nonlinear model is necessary (Chen et al. 2015). For simplicity, the daily water level forecasting of Hukou station was used as an example for evaluating the performance of RF, SVR, ANN, and LM. Furthermore, to examine the effect of time lag and previous lake water level on forecasting performance, three input scenarios were developed: (1) the current daily discharge from the seven hydrological stations of the Poyang Lake tributaries and the Yangtze River; (2) the daily discharge of the seven stations between day (t) and (t-5); and (3) on the basis of scenario 2, the average lake water level over the previous week (wl7) was incorporated. The trial-and-error method was then used to identify the most sensitive time lag, defined as the lag that gave the highest accuracy of water level forecasting for Poyang Lake.
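The construction of scenario 3 inputs can be illustrated with a minimal sketch. It is written in Python rather than the R environment used in this study, and the function name `build_scenario3_rows`, the single common lag, and the choice of days t-7 to t-1 as the "previous week" are assumptions made for illustration only.

```python
from statistics import mean


def build_scenario3_rows(discharge, water_level, lag=4, window=7):
    """Build (names, X, y) for a scenario-3 style input set.

    discharge:   dict mapping station name -> list of daily discharge values
    water_level: list of daily water levels at one gauge station
    lag:         number of days by which discharge is lagged (assumed common
                 to the Yangtze River and the tributaries)
    window:      number of previous days averaged into wl7 (assumed t-7..t-1)
    """
    stations = sorted(discharge)
    start = max(lag, window)  # first day with a full lag and a full window
    X, y = [], []
    for t in range(start, len(water_level)):
        row = [discharge[s][t - lag] for s in stations]  # lagged discharge
        row.append(mean(water_level[t - window:t]))      # wl7 predictor
        X.append(row)
        y.append(water_level[t])                         # target: level on day t
    return stations + ["wl7"], X, y
```

Dropping the `wl7` column from each row gives a scenario-2 style input set, and setting `lag=0` as well gives scenario 1.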
Performance evaluation
The popular v-fold cross-validation, which provides a good trade-off between model over-fitting and under-fitting, was employed to evaluate the performance of the candidate models (Yoon et al. 2011; Hipni et al. 2013). The entire data set (daily records from 1955 to 2012) was randomly partitioned into v equal-sized subsets. During each modeling process, one of the partitions was used for validation, while the others were used for training. The modeling process was repeated v times, and the performance metrics were averaged to obtain the final performance. In similar studies, using a v of 5, 10, or 20 resulted in slightly different error estimates, and the differences were often not significant (Feng et al. 2005; El-Shafie & Noureldin 2010; Hipni et al. 2013). Therefore, five-fold cross-validation was used to evaluate the performance of the models considered in this work, which reduces computing time.
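The v-fold procedure described above can be sketched as follows. This is an illustrative Python outline, not the R code used in this study. The helper names (`v_fold_indices`, `cross_validate`, `fit_mean`) are hypothetical, and the trivial mean predictor only stands in for RF, SVR, ANN, or LM.

```python
import math
import random


def v_fold_indices(n, v=5, seed=0):
    """Randomly partition indices 0..n-1 into v near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def cross_validate(X, y, fit, v=5, seed=0):
    """Average testing RMSE over v folds.

    fit(X_train, y_train) must return a function that predicts one row.
    """
    folds = v_fold_indices(len(y), v, seed)
    scores = []
    for k in range(v):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return sum(scores) / v


def fit_mean(X_train, y_train):
    """Placeholder model (assumption): always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda row: m
```

Any model can be plugged in through the `fit` argument, so the same folds can be reused to compare candidate models on equal terms.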
The terms ‘training’ and ‘testing’ of soft computing models correspond to the calibration and validation of physically based hydrodynamic models. Data preparation and analysis were conducted using Microsoft Excel 2007 and R 3.1.3. Specifically, we used the implementation and optimization of RF, SVR, and ANN available in the ‘randomForest’, ‘e1071’, and ‘nnet’ packages, respectively, in the R statistical environment (R Core Team 2014). The model parameter sensitivity analysis was conducted by generating 500 model parameter sets for each model. The model performance for each parameter set was estimated using the Nash–Sutcliffe coefficient of efficiency (NSCE).
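For reference, the commonly used definitions of RMSE and NSCE are given below. These are the standard textbook forms, and the assumption is that the study used them. Here $O_i$ and $P_i$ denote the observed and predicted water levels on day $i$, $\bar{O}$ is the mean of the observations, and $n$ is the number of records; this notation is introduced here for illustration.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2},
\qquad
\mathrm{NSCE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}
```

An NSCE of 1 indicates a perfect fit, whereas a value of 0 means that the model performs no better than predicting the observed mean.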
RESULTS AND DISCUSSION
Comparison of the models for daily water level forecasting
Forecasting performance of RF, SVR, ANN, and LM for forecasting water level.
Comparison of RF and ANN for RMSE variations and model stabilization.
Effect of time lag on daily water level forecasting
In this subsection, a series of time lags is tested for predicting the daily water level at each station using RF and five-fold cross-validation under scenario 2. For simplicity, only the performance during the testing stage is displayed (Table 2). The models with a time lag of 4 days for both the Yangtze River (Hankou station) and the catchment tributaries exhibited the best performance, with the lowest RMSE values of 0.51, 0.55, 0.56, and 0.46 m for Hukou, Xingzi, Duchang, and Tangyin, respectively. Thus, the time lag of discharge from the tributaries and the Yangtze River has a substantial influence on water level variations. The inflow for different intervals of time has a significant influence on the predicted flow/water level (Aqil et al. 2007). For Kangshan station, the R(t-3)T(t-3) time lag performed best, slightly outperforming R(t-4)T(t-4) (Table 2). For simplicity, the R(t-4)T(t-4) time lag was adopted for the forecasting models of all five gauge stations in scenarios 2 and 3.
The performance evaluation for scenario 2 using RF and five-fold cross-validation for daily water level forecasting
Time lag | Hukou RMSE (m) | Hukou R2 | Xingzi RMSE (m) | Xingzi R2 | Duchang RMSE (m) | Duchang R2 | Tangyin RMSE (m) | Tangyin R2 | Kangshan RMSE (m) | Kangshan R2 |
---|---|---|---|---|---|---|---|---|---|---|
R(t)T(t-1) | 0.69 | 0.967 | 0.71 | 0.957 | 0.69 | 0.943 | 0.58 | 0.938 | 0.52 | 0.917 |
R(t)T(t-2) | 0.67 | 0.968 | 0.69 | 0.959 | 0.68 | 0.945 | 0.57 | 0.940 | 0.52 | 0.918 |
R(t)T(t-3) | 0.66 | 0.969 | 0.68 | 0.96 | 0.68 | 0.946 | 0.56 | 0.941 | 0.52 | 0.916 |
R(t)T(t-4) | 0.66 | 0.97 | 0.68 | 0.961 | 0.67 | 0.946 | 0.57 | 0.940 | 0.54 | 0.913 |
R(t)T(t-5) | 0.65 | 0.971 | 0.68 | 0.96 | 0.68 | 0.945 | 0.57 | 0.939 | 0.55 | 0.908 |
R(t-1)T(t-1) | 0.62 | 0.973 | 0.66 | 0.963 | 0.66 | 0.948 | 0.54 | 0.946 | 0.50 | 0.926 |
R(t-1)T(t-2) | 0.62 | 0.974 | 0.64 | 0.965 | 0.65 | 0.950 | 0.53 | 0.947 | 0.49 | 0.927 |
R(t-1)T(t-3) | 0.61 | 0.974 | 0.64 | 0.965 | 0.64 | 0.951 | 0.53 | 0.948 | 0.50 | 0.925 |
R(t-1)T(t-4) | 0.60 | 0.975 | 0.64 | 0.965 | 0.64 | 0.951 | 0.54 | 0.946 | 0.51 | 0.920 |
R(t-1)T(t-5) | 0.60 | 0.975 | 0.60 | 0.975 | 0.65 | 0.950 | 0.55 | 0.944 | 0.53 | 0.916 |
R(t-2)T(t-2) | 0.57 | 0.978 | 0.60 | 0.969 | 0.61 | 0.956 | 0.50 | 0.954 | 0.46 | 0.935 |
R(t-2)T(t-3) | 0.56 | 0.978 | 0.60 | 0.97 | 0.60 | 0.957 | 0.50 | 0.954 | 0.47 | 0.932 |
R(t-2)T(t-4) | 0.56 | 0.978 | 0.60 | 0.969 | 0.61 | 0.955 | 0.51 | 0.952 | 0.49 | 0.927 |
R(t-2)T(t-5) | 0.56 | 0.978 | 0.61 | 0.968 | 0.63 | 0.953 | 0.52 | 0.949 | 0.51 | 0.921 |
R(t-3)T(t-3) | 0.53 | 0.981 | 0.57 | 0.973 | 0.57 | 0.961 | 0.47 | 0.959 | 0.45 | 0.938 |
R(t-3)T(t-4) | 0.53 | 0.98 | 0.57 | 0.972 | 0.59 | 0.960 | 0.48 | 0.957 | 0.47 | 0.933 |
R(t-3)T(t-5) | 0.54 | 0.98 | 0.59 | 0.97 | 0.60 | 0.957 | 0.50 | 0.953 | 0.49 | 0.926 |
R(t-4)T(t-4) | 0.51 | 0.982 | 0.55 | 0.974 | 0.56 | 0.963 | 0.46 | 0.961 | 0.46 | 0.937 |
R(t-4)T(t-5) | 0.53 | 0.98 | 0.58 | 0.972 | 0.59 | 0.959 | 0.49 | 0.956 | 0.48 | 0.930 |
R(t-5)T(t-5) | 0.53 | 0.981 | 0.56 | 0.973 | 0.57 | 0.961 | 0.47 | 0.959 | 0.47 | 0.933 |
R and T represent the Yangtze River and tributaries, respectively, and the lowest RMSE and highest R2 are in bold font.
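The trial-and-error selection of the time lag can be reproduced from the equal-lag rows of Table 2, i.e. R(t-k)T(t-k) for k = 1 to 5. The following Python sketch is illustrative only: the variable names are hypothetical, and using the mean RMSE across stations to pick one common lag is an assumption rather than the stated criterion of this study.

```python
# Testing-stage RMSE (m) copied from the R(t-k)T(t-k) rows of Table 2.
rmse_by_lag = {
    "Hukou":    {1: 0.62, 2: 0.57, 3: 0.53, 4: 0.51, 5: 0.53},
    "Xingzi":   {1: 0.66, 2: 0.60, 3: 0.57, 4: 0.55, 5: 0.56},
    "Duchang":  {1: 0.66, 2: 0.61, 3: 0.57, 4: 0.56, 5: 0.57},
    "Tangyin":  {1: 0.54, 2: 0.50, 3: 0.47, 4: 0.46, 5: 0.47},
    "Kangshan": {1: 0.50, 2: 0.46, 3: 0.45, 4: 0.46, 5: 0.47},
}

# Most sensitive lag per station: the lag with the lowest RMSE.
best_lag = {s: min(d, key=d.get) for s, d in rmse_by_lag.items()}

# One common lag for all stations (assumption: lowest mean RMSE).
lags = [1, 2, 3, 4, 5]
mean_rmse = {k: sum(d[k] for d in rmse_by_lag.values()) / len(rmse_by_lag)
             for k in lags}
common_lag = min(mean_rmse, key=mean_rmse.get)

print(best_lag)
print(common_lag)
```

With these values, the per-station optimum is 4 days everywhere except Kangshan (3 days), and the common lag is 4 days, which agrees with the choice of R(t-4)T(t-4) for all stations.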
Effect of input scenarios on daily water level forecasting
Scenario 3 produced the best results among the three scenarios for all five hydrological stations (Table 3). In the training stage, the values of R2 are close to 1 for all five stations. Although scenario 3 has the highest R2, only a slight difference is observed among the three scenarios. By contrast, RMSE decreased from scenario 1 (0.29 m on average) to scenario 3 (0.14 m on average). The forecasting precision in the testing stage is lower than in the training stage. Similarly, R2 increased, whereas RMSE decreased from scenario 1 to scenario 3, reaching an average RMSE of 0.25 m in scenario 3. Thus, scenario 3 utilizes more information from the input data, indicating that the time lag effect of water level responses to discharge, and the previous water level, should be incorporated in establishing a water level forecasting model. From scenario 1 to scenario 3, the testing RMSE decreased by 63.8%, 64.4%, 60.6%, 64.4%, and 54.7% in Hukou, Xingzi, Duchang, Tangyin, and Kangshan, respectively. Nevertheless, when previous water level data are missing, scenario 2 can also attain a desirable level of precision (Table 3). In addition, for the five hydrological stations within Poyang Lake, the R2 values slightly decreased from north to south (i.e., with longer distance from the Yangtze River) (Table 2). Kangshan has the lowest level of forecasting precision, which indicates that the discharge of the Yangtze River greatly affects the water level within the lake and that its effect gradually decreases in the upstream direction. Similar results were obtained by Li et al. (2015) using the back-propagation neural network. Thus, scenario 3 is considered the best among the three scenarios for all five water level stations. In other words, the RF model evaluated with five-fold cross-validation gives the best daily water level forecasts for Poyang Lake when the inputs include the 4-day-lagged discharge of the Yangtze River and the tributaries and the previous water level within the lake.
Performance of RF in the training and testing sets using five-fold cross-validation
Scenario | Stage | Metric | Hukou | Xingzi | Duchang | Tangyin | Kangshan |
---|---|---|---|---|---|---|---|
Scenario 1 | Training | R2 | 0.994 | 0.992 | 0.989 | 0.988 | 0.984 |
Scenario 1 | Training | RMSE (m) | 0.31 | 0.32 | 0.32 | 0.26 | 0.24 |
Scenario 1 | Testing | R2 | 0.966 | 0.955 | 0.94 | 0.935 | 0.914 |
Scenario 1 | Testing | RMSE (m) | 0.69 | 0.73 | 0.71 | 0.59 | 0.53 |
Scenario 2 | Training | R2 | 0.996 | 0.995 | 0.993 | 0.993 | 0.988 |
Scenario 2 | Training | RMSE (m) | 0.23 | 0.24 | 0.25 | 0.2 | 0.2 |
Scenario 2 | Testing | R2 | 0.982 | 0.974 | 0.963 | 0.961 | 0.937 |
Scenario 2 | Testing | RMSE (m) | 0.51 | 0.55 | 0.56 | 0.46 | 0.36 |
Scenario 3 | Training | R2 | 0.999 | 0.999 | 0.998 | 0.998 | 0.997 |
Scenario 3 | Training | RMSE (m) | 0.11 | 0.11 | 0.13 | 0.09 | 0.11 |
Scenario 3 | Testing | R2 | 0.996 | 0.994 | 0.991 | 0.992 | 0.983 |
Scenario 3 | Testing | RMSE (m) | 0.25 | 0.26 | 0.28 | 0.21 | 0.24 |
Lowest RMSE and highest R2 are in bold font.
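The percentage reductions in testing RMSE from scenario 1 to scenario 3 follow directly from Table 3. The short Python sketch below recomputes them. The variable and function names are hypothetical; only the RMSE values are taken from the table.

```python
stations = ["Hukou", "Xingzi", "Duchang", "Tangyin", "Kangshan"]

# Testing-stage RMSE (m) from Table 3.
test_rmse_s1 = [0.69, 0.73, 0.71, 0.59, 0.53]  # scenario 1
test_rmse_s3 = [0.25, 0.26, 0.28, 0.21, 0.24]  # scenario 3


def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


reductions = {
    s: round(percent_decrease(b, a), 1)
    for s, b, a in zip(stations, test_rmse_s1, test_rmse_s3)
}
mean_rmse_s3 = round(sum(test_rmse_s3) / len(test_rmse_s3), 2)

print(reductions)
print(mean_rmse_s3)
```

The results are 63.8%, 64.4%, 60.6%, 64.4%, and 54.7% for the five stations, with a mean scenario 3 testing RMSE of 0.25 m, matching the values reported in the text.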
Source of uncertainty
RF, SVR, and ANN models for forecasting water level fluctuations were compared based on continuously measured data that were quality controlled by the Hydrological Bureau of Jiangxi Province. The models were calibrated with five-fold cross-validation in order to reduce the uncertainty. In addition, the training/testing data set represents relatively real-time hydrological processes in lake water level fluctuations, incorporating the influence of lake catchment tributaries, the Yangtze River discharge, the time lag effect, and the previous water level, to ensure that the model ‘gives the right answers for the right reasons’ (Kirchner 2006). Furthermore, efforts were made to increase the overall goodness of fit of each model by calibrating optimal parameters. For example, the selection of hidden neurons and learning rate in the ANN model was based on a trial-and-error optimization process. Grid search with cross-validation was implemented to determine the optimal C and other parameters in the SVR model. Moreover, the present work established a water level forecasting model for a specific location, thereby reducing the spatial fluctuation in the output, which consequently reduces the uncertainty (Kourgialas et al. 2015). However, although the soft computing methods in the present study have desirable forecasting precision, they only consider streamflow conditions. The main uncertainty may be attributed to the influence of meteorological factors (e.g., precipitation and evaporation) and local inflows on lake water level variations. Thus, the employed models may have limitations when applied to simulate water level variations under possible climate change scenarios (Panagoulia 2006).
Nevertheless, the proposed model can be used to provide management schemes under streamflow simulation scenarios (e.g., flood control and drought relief for the Poyang Lake region), specifically, through the combined discharge dispatch of upstream reservoirs and the Three Gorges Dam upstream of the Yangtze River to regulate proper water level variations within the lake.
Variability of model performance (NSCE value) for 500 pairs of parameter sets in the sensitivity analysis.
Relative importance of the predictor variables
Relative importance of each predictor as determined from 100 runs of RF models for five water level stations. wz, Waizhou station; mg, Meigang station; wjb, Wanjiabu station (Xiu River); dfk, Dufengkeng station; hankou, Hankou station; ljd, Lijiadu station; hs, Hushan station; wl7, water level of previous 7 days.
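One common way to obtain a relative importance score of this kind is permutation importance: the values of one predictor are shuffled and the resulting increase in prediction error is recorded. The Python sketch below is a generic illustration under that assumption. It is not the routine used in this study, whose importance measure is not specified here, and the function names are hypothetical.

```python
import random


def mse(obs, pred):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled.

    predict: function mapping one input row (list) to a prediction
    X:       list of input rows (lists of equal length)
    y:       list of observed targets
    """
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between predictor j and y
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A predictor that the model does not rely on yields an importance near zero, whereas a dominant predictor such as wl7 would be expected to produce a large increase in error when shuffled.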
CONCLUSIONS
This study aimed to determine the most efficient model by comparing RF with SVR, ANN, and LM, and incorporating the time lag effect as well as previous hydrological status for forecasting the water level within lake stations.
Results demonstrated that for daily water level forecasting, the RF model can obtain more reliable and accurate forecasting results than ANN, SVR, and LM in terms of RMSE and R2. The best forecasting performance was obtained by incorporating input data with a 4-day time lag of discharge from catchment tributaries and the Yangtze River, as well as the water level over the previous week, with RMSEs of 0.25, 0.26, 0.28, 0.21, and 0.24 m for Hukou, Xingzi, Duchang, Tangyin, and Kangshan, respectively.
In addition, variable importance analysis was implemented for each water level station using the most accurate RF model and scenario 3. Results indicated that the previous water level was the most efficient predictor for water level forecasting. Moreover, the discharge from the Yangtze River also has a fundamental effect on water level variations.
Nevertheless, meteorological factors are not included in this study, thereby unavoidably introducing uncertainty to real-time water level forecasting. Future work should fully consider the complex hydrological and hydrodynamic processes of Poyang Lake.
ACKNOWLEDGEMENTS
This work was financially supported by the National Basic Research Program of China (973 Program) (Grant 2012CB417006) and the National Scientific Foundation of China (Grants 41271500 and 41571107). Special thanks to Prof. Dr Qi Zhang of the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, for the valuable suggestions.