Abstract
Daily streamflow modeling is an important tool for water resources management and flood mitigation. This study compared the performance of the Xinanjiang (XAJ) model and random forests (RF) method in a daily streamflow simulation, and proposed several hybrid models based on the XAJ model, wavelet analysis, and RF method (including XAJ-RF model, WRF model, and XAJ-WRF model). The proposed methods were applied to Shiquan station, located in the Upper Han River basin in China. Five performance measures (NSE, RMSE, PBIAS, MAE, and R) were adopted to evaluate the modeling accuracy. Results showed that XAJ-RF model had a relatively higher level of accuracy than that of the XAJ model and the RF model. Compared to the RF and XAJ-RF models, the performance statistics of WRF and XAJ-WRF were better. The results indicated that the coupled XAJ-RF model can be effectively applied and provide a useful alternative for daily streamflow modeling and the application of wavelet analysis contributed to the increasing accuracy of streamflow modeling. Moreover, 14 wavelet functions from various families were tested to analyze the impact of various mother wavelets on the XAJ-WRF model.
HIGHLIGHTS
This study proposed several hybrid models based on the Xinanjiang (XAJ) model, wavelet analysis and the random forests (RF) method (XAJ-RF, WRF and XAJ-WRF model).
The results indicated that the XAJ-RF model can provide a useful alternative and the application of wavelet analysis contributed to the increasing accuracy in streamflow modeling.
INTRODUCTION
Daily streamflow modeling is an important issue for flood mitigation, drought modeling, and water resources management. It can help urban planners and water-related authorities in better and optimal decision-making (Noori & Kalin 2016). In general, hydrological modeling techniques can be divided into three categories: deterministic distributed models (white box models), conceptual hydrological models (grey box models), and data-driven models (black box models) (Brath et al. 2002; Zhang et al. 2018). Each kind of model has its own merits and drawbacks.
Both of the former two models are theory driven (Babovic 2005; Rezaie-Balf et al. 2017) and they can capture dominant catchment dynamics and describe hydrological sub-processes by mathematical formulas (Kavetski et al. 2006). In deterministic distributed models such as MIKE SHE and SWAT, the various processes of the hydrological cycle can be integrated and simulated based on partial differential equations of conservation of mass, energy, and momentum (Feyen et al. 2000). Compared to conceptual models, these models have the obvious advantage of representing well the spatial variability of the underlying surface and precipitation. However, they require a high temporal and spatial resolution of topography, precipitation, evapotranspiration, and other inputs, which have a direct impact on modeling results. Conceptual models such as TANK, HBV, and XAJ use simple mathematical elements such as linear or nonlinear reservoirs and channels to describe the hydrological response in an approximate way (Chadalawada et al. 2017). Such models involve a limited number of parameters and are commonly used in practice. The Xinanjiang model (XAJ) has been one of the most widely applied models since its development in the 1970s, especially in China (Zhao 1977). It is based on the concept of soil moisture replenishment, depletion, and redistribution, and runs with a small number of physically meaningful parameters (Ahirwar et al. 2018). In the past decades, much effort has been devoted to the XAJ model, mostly focused on model calibration (Cheng et al. 2006; Xu et al. 2013; Tian et al. 2019), model improvements (Jayawardena & Zhou 2000; Yao et al. 2012; Meng et al. 2018), and application in ungauged basins (Li et al. 2009; Yao et al. 2014; Bai et al. 2016).
Gan & Biftu (1996) compared the XAJ model with the Nedbor-Afstromnings model (NAM), the Sacramento model (SMA), and the soil moisture accounting and routing model (SMAR) in eight catchments from different parts of the world. They concluded that XAJ seemed more versatile than the other models in handling various catchment conditions because it considered the uneven distribution of the runoff producing area. The XAJ model was employed by Jiang et al. (2007) for streamflow simulations in the Pearl River basin with good results. Nghi et al. (2008) presented hydrological simulations for the Nong Son catchment in central Vietnam with NAM and XAJ models. Their results showed that the XAJ model reproduced the runoff components better. Rahman & Lu (2015) investigated some primary factors affecting spin-up time using the XAJ model for 22 river basins throughout the US. Based on historical climate data and two future precipitation scenarios, Yuan et al. (2016) used the XAJ model for streamflow simulation in the Pearl River basin, China. Ahirwar et al. (2018) investigated the performance of the Xinanjiang model for runoff simulation in six Indian watersheds having different climatic conditions (dry, average, and wet). Their results indicated that the XAJ model was suitable for wet and average catchments but performed poorly in dry climate areas.
With an increase of computation capability and data availability, the data-driven models are more appealing in the field of streamflow modeling. Different to the former two approaches, the data-driven models build a functional relationship between input and output variables directly without considering the internal structure of underlying physical processes (Solomatine & Ostfeld 2008; Chadalawada et al. 2017). These models include traditional statistical techniques such as autoregressive moving average (ARMA) and emerging machine learning methods such as multivariate adaptive regression spline (MARS) (Najafzadeh & Ghaemi 2019; Sadeghi et al. 2019; Najafzadeh & Oliveto 2020), genetic programming (GP) (Liong et al. 2002; Meshgi et al. 2015), model tree (MT) (Homaei & Najafzadeh 2020), chaos theory (Sun et al. 2010; Liang et al. 2019), wavelet-based neural network (Nourani & Parhizkar 2013; Zeinolabedini & Najafzadeh 2019), and support vector machine (SVM) (Yu et al. 2004; Mehrparvar & Asghari 2018). The random forests (RF) model is one of the most commonly used machine learning methods that handles nonlinear and non-Gaussian data well. It is amenable to model interpretation and is free of over-fitting problems as the number of trees increases (Li et al. 2016). Wang et al. (2015) proposed a flood hazard risk assessment method based on RF and applied it in the Dongjiang River basin, China. Muñoz et al. (2018) developed several RF models to forecast discharge based on past precipitation and discharge information. Papacharalampous & Tyralis (2018) assessed the performance of RF and Prophet in forecasting daily streamflow up to 7 days ahead in a river in the US, pointing out that the former model worked better. Li et al. (2019) compared the accuracy of five data-driven models (ELM, ELM-kernel, RF, BPNN, and SVR) and found that the RF model was slightly superior to other models in modeling the peak flows.
As a whole, the conceptual hydrological models are limited to simplified hydrological processes and have difficulty coping with complex nonlinear relationships between environmental variables and runoff (Ren et al. 2018). Data-driven models, in contrast, capture complex nonlinear relationships well, which can compensate for this weakness of the conceptual models. The drawback of data-driven models lies in their overdependence on data and their lack of consideration of physical mechanisms. To overcome the aforementioned drawbacks, there is a need to explore the continuum between theory-based and data-driven models, where both theory and data are used in a synergistic manner (Napolitano et al. 2009; Karpatne et al. 2017). Hybrid models have been investigated in water-related fields, such as flow modeling in wastewater pipes (Zoran et al. 2003), error updating of numerical models (Babovic et al. 2001), and reference evapotranspiration estimation (Raza et al. 2020). However, studies of hybrid models in streamflow modeling are limited so far (Babovic 2009; Corzo & Solomatine 2014; Chadalawada et al. 2017; Ren et al. 2018). Humphrey et al. (2016) used the simulated soil moisture from the GR4J model to represent initial catchment conditions in a Bayesian artificial neural network (ANN) model. Noori & Kalin (2016) established a coupled SWAT and ANN model for enhanced daily streamflow prediction, achieving good results. More recently, Chadalawada et al. (2020) presented a novel machine learning algorithm guided through the incorporation of existing hydrological knowledge and evaluated the capabilities of the proposed hydrologically informed machine learning framework on the Blackwater River basin in the US. Ghaith et al. (2020) proposed a hybrid hydrological data-driven model and found that it improved daily streamflow prediction compared to the HYMOD model.
To the best of the authors’ knowledge, few works have compared the performance of the XAJ model and the wavelet-based RF model (WRF), and the combination of the XAJ model and the WRF model for daily streamflow simulation has not been reported. In this paper, a hybrid approach combining a conceptual hydrological model (XAJ model) and the random forests method (RF) was developed to improve the daily streamflow simulation in the upper Han River basin, China. Moreover, the discrete wavelet transformation technique was further applied to improve the modeling accuracy.
The rest of the paper is organized as follows. The next section describes the Xinanjiang model, RF method, and the proposed hybrid methods. This is followed by a section introducing the study area and data collection. Next, a case study of the daily streamflow simulation and analysis of the modeling results is presented. In the final section, the conclusions are summarized.
METHODOLOGY
In this paper, five models were developed for daily streamflow modeling: (1) Xinanjiang model (XAJ); (2) random forests model (RF) with the inputs of accumulated precipitation and antecedent streamflow; (3) hybrid model (XAJ-RF) with the inputs of antecedent streamflow and simulated streamflow from the XAJ model; (4) wavelet-based RF model (WRF) using subseries of accumulated precipitation and antecedent streamflow as input variables; and (5) hybrid XAJ-WRF model using subseries of antecedent streamflow and simulated streamflow from the XAJ model as input variables. The first two were established as benchmark models for the third hybrid model, while the last two models adopted wavelet transformation technique to further improve the level of accuracy.
Xinanjiang model
The Xinanjiang model is a lumped rainfall-runoff model developed by Zhao et al. (1980). The core of this model is the concept of runoff formation on repletion of storage. That is to say, runoff is not generated until the soil moisture content of the aeration zone reaches field capacity, and thereafter runoff equals the rainfall excess without further loss (Zhao et al. 1980). The model consists of four parts, as shown in the list below: a three-layer evapotranspiration sub-model, a runoff production sub-model, a runoff separation sub-model, and a runoff concentration sub-model (Figure 1). The inputs are daily precipitation (P) and measured evapotranspiration (EM), and the output is daily streamflow (Q). More details about the XAJ model can be found in Zhao (1992).
Evapotranspiration sub-model. This module considers the soil as three vertical layers (upper, lower, and deep layers). The actual evapotranspiration is calculated from potential evapotranspiration while soil storage deficit is calculated in the three-layer soil moisture model (Yao et al. 2014).
Runoff production sub-model. The module introduced a tension water capacity curve to describe the uneven distribution of tension water capacity throughout the catchment. Runoff is produced once the tension water storage reaches the repletion point.
Runoff separation sub-model. The total runoff is separated into three components (surface runoff, interflow, and groundwater) according to the free water storage structure.
Runoff concentration sub-model. The surface runoff directly flows into the outlet of the river, while the interflow and groundwater are routed to river channels through linear reservoirs. The discharge of sub-basins flowing to the outlet of the basin can be calculated by Muskingum routing equation.
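The runoff production step can be illustrated with a minimal sketch of the saturation-excess calculation on a parabolic tension water capacity curve, in the form commonly given for the XAJ model. It is not taken from this study: the function name is hypothetical, the impervious area (IM) is ignored, depths are assumed to be in mm, and the default WM and B are simply the calibrated values listed in Table 2.

```python
def runoff_production(pe, w, wm=169.739, b=0.400):
    """Saturation-excess runoff for one time step (depths assumed in mm).

    pe : net rainfall, i.e. precipitation minus evapotranspiration
    w  : areal mean tension water storage at the start of the step
    wm : areal mean tension water capacity (WM)
    b  : exponent of the tension water capacity curve (B)
    The impervious fraction IM is ignored in this sketch (assumption).
    """
    if pe <= 0.0:
        return 0.0  # no net rainfall, so no runoff is produced
    w = min(max(w, 0.0), wm)
    wmm = wm * (1.0 + b)  # largest point capacity in the catchment
    # Ordinate of the capacity curve that corresponds to storage w.
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # Only part of the catchment reaches capacity and produces runoff.
        return pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    # The whole catchment reaches capacity; the remaining deficit is filled first.
    return pe - (wm - w)


for storage in (80.0, 120.0, 169.739):
    print(f"W = {storage:7.3f}  ->  R = {runoff_production(10.0, storage):.3f}")
```

With the same net rainfall, runoff grows as the catchment gets wetter and equals the net rainfall once the storage reaches WM, which is the "runoff formation on repletion of storage" concept in miniature.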
Random forests
The RF model, proposed by Breiman (2001), is a classification and regression method based on an ensemble of decision trees. The model employs two powerful machine learning techniques: bagging and random feature selection, which are helpful to reduce over-fitting and statistical instability.
Given a training dataset (N records with M features), the RF model grows individual classification and regression trees (CART) from S bootstrap samples, each of size N (the same size as the training dataset) and drawn with replacement (Zhu & Pierskalla 2016). Generally, each bootstrap sample leaves out about one-third of the training data, which are called the out-of-bag (OOB) observations (Xu & Valocchi 2015). Features are also sampled at random: a subset of m (m < M) features is selected, although the final output is not very sensitive to the parameter m. According to the random subspace theory, the number of selected attributes m is generally set to the square root of the number of input variables M (Liang et al. 2017). Therefore, each decision tree is built from a bootstrap sample of size N using m randomly selected features. After a large number of trees has been generated, the prediction for a new data point x is formed by averaging over the trees, which is known as ensemble learning.
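The bootstrap and feature-subset steps can be checked numerically with a small sketch. It uses only the Python standard library and made-up sizes (1,000 records, 16 features, 100 trees); the square-root rule for m is the common rule of thumb and is an assumption here, and no trees are actually grown.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n record indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def oob_fraction(n, n_trees, rng):
    """Average share of records left out of a bootstrap sample."""
    total = 0.0
    for _ in range(n_trees):
        in_bag = set(bootstrap_indices(n, rng))
        total += (n - len(in_bag)) / n
    return total / n_trees


rng = random.Random(42)
n_records, n_features, n_trees = 1000, 16, 100  # illustrative sizes only
frac = oob_fraction(n_records, n_trees, rng)
theory = (1.0 - 1.0 / n_records) ** n_records   # chance a record is never drawn

m = max(1, math.isqrt(n_features))               # square-root rule (assumption)
subset = sorted(rng.sample(range(n_features), m))  # candidate features for one split

print(f"mean OOB fraction: {frac:.3f} (theory: {theory:.3f})")
print(f"m = {m} of M = {n_features}; example subset: {subset}")
```

The simulated OOB share settles near (1 − 1/N)^N, which tends to 1/e ≈ 0.37 for large N, the "about one-third" quoted above.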
In this paper, the accumulated precipitation and antecedent streamflow of the past 1, 2, 3, and 4 days were used as possible inputs to the RF model for daily streamflow simulation. The RF model is easy to tune and implement due to the low number of parameters (Papacharalampous & Tyralis 2018). The main parameter is the number of trees. A larger number of trees generally gives more precise predictions but costs more computation time. Considering the balance between accuracy and computational cost, previous studies found that, for most datasets, 100 trees are sufficient for the RF accuracy to meet the requirement of the method (Papacharalampous & Tyralis 2018). Thus, S was set to 100 in this research, and this setting was kept unchanged in all models involving the RF method. More details of the RF model can be found in Xu & Valocchi (2015) and Schnier & Cai (2014).
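As an illustration of how such candidate inputs can be assembled, the sketch below builds, for each day t, the precipitation accumulated over the past 1 to 4 days and the streamflow of the past 1 to 4 days. The function names and the data are made up, and the accumulation window is assumed to end on day t−1; the exact window used in the study is not stated here.

```python
def accumulated_precip(precip, t, days):
    """Sum of precipitation over the `days` days ending at day t-1 (assumed window)."""
    return sum(precip[t - days:t])


def build_samples(precip, flow, max_lag=4):
    """Return (features, target) pairs for every day with a full history.

    features = [aP over 1..max_lag days, Q(t-1) .. Q(t-max_lag)], target = Q(t).
    """
    samples = []
    for t in range(max_lag, len(flow)):
        acc = [accumulated_precip(precip, t, k) for k in range(1, max_lag + 1)]
        lags = [flow[t - k] for k in range(1, max_lag + 1)]
        samples.append((acc + lags, flow[t]))
    return samples


precip = [0.0, 5.0, 12.0, 0.0, 3.0, 8.0]            # made-up daily precipitation
flow = [100.0, 110.0, 180.0, 150.0, 130.0, 160.0]    # made-up daily streamflow
samples = build_samples(precip, flow)
for features, target in samples:
    print(features, "->", target)
```

Each row pairs eight candidate predictors with the streamflow to be simulated; a predictor-selection step can then keep only the most relevant columns.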
Wavelet transformation
Conventional rainfall-runoff schemes often use hydrological time series data to approximate the dynamic behavior of the hydrological system. In fact, it is difficult to extract the internal mechanism of rainfall-runoff response based on the one-dimensional series because they are controlled and influenced by complex factors (Chou 2013). In recent years, wavelet-based multi-resolution analysis has been widely used to model a hydrological system at various scales.
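A minimal sketch of a multi-resolution decomposition is given below for the Haar wavelet, one of the wavelet functions tested later in this paper. It is an illustration under stated assumptions rather than the procedure used in the study: the function names are hypothetical, the series length is assumed to be a multiple of 2 raised to the number of levels, and each component is returned at the full length of the series so that the approximation and the details add up to the original signal.

```python
def block_means(x, size):
    """Replace each consecutive block of `size` values by the block mean."""
    out = []
    for start in range(0, len(x), size):
        block = x[start:start + size]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def haar_mra(x, levels):
    """Split x into [A_levels, D_levels, ..., D_1], each as long as x.

    For the Haar wavelet the level-j approximation A_j is the mean over
    aligned blocks of 2**j values, and the detail is D_j = A_(j-1) - A_j.
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a multiple of 2**levels")
    approx = list(x)  # A_0 is the signal itself
    details = []
    for j in range(1, levels + 1):
        coarser = block_means(x, 2 ** j)  # A_j
        details.append([a - c for a, c in zip(approx, coarser)])  # D_j
        approx = coarser
    return [approx] + details[::-1]


flow = [120.0, 130.0, 400.0, 380.0, 200.0, 180.0, 150.0, 140.0]  # made-up series
parts = haar_mra(flow, 2)
rebuilt = [sum(values) for values in zip(*parts)]
print("approximation:", parts[0])
print("details:", parts[1:])
print("rebuilt:", rebuilt)
```

Each subseries isolates variability at one time scale, and in a wavelet-based model these subseries, rather than the raw series, are passed to the regression method as inputs.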
Hybrid models
To improve daily streamflow simulation, three hybrid models (XAJ-RF, WRF, and XAJ-WRF) were developed based on the Xinanjiang model, wavelet analysis, and random forests model. In the XAJ-RF model, two input variables, the antecedent streamflow with the highest correlation coefficient and the simulated streamflow from the XAJ model, were used as inputs to the RF model for daily streamflow simulation. In the WRF model, the accumulated precipitation and antecedent streamflow term with the highest correlation coefficient were decomposed by various mother wavelets; thus, eight subseries acted as input variables to the RF model. Compared to the WRF model, the accumulated precipitation was replaced by the simulated streamflow from the XAJ model in the XAJ-WRF model. The input variables of the RF model and hybrid models are shown in Table 1.
Models | aP | aS | Sxaj | wP | wS | wSxaj |
---|---|---|---|---|---|---|
RF | √ | √ |  |  |  |  |
XAJ-RF |  | √ | √ |  |  |  |
WRF |  |  |  | √ | √ |  |
XAJ-WRF |  |  |  |  | √ | √ |
aP: accumulated precipitation; aS: antecedent streamflow; Sxaj: simulated streamflow by XAJ; wP: subseries of accumulated precipitation obtained by wavelet analysis; wS: subseries of antecedent streamflow obtained by wavelet analysis; wSxaj: subseries of simulated streamflow by XAJ obtained by wavelet analysis.
Performance measures
Very good: ;
Good: ;
Satisfactory: ;
Unsatisfactory: ;
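For reference, the five measures named in this paper (NSE, RMSE, PBIAS, MAE, and R) can be computed as in the sketch below. The formulas are the standard textbook definitions, not necessarily the exact forms used in this study; in particular, the sign convention of PBIAS (positive meaning underestimation) is an assumption, and the data are made up.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pbias(obs, sim):
    """Percent bias; positive means underestimation (assumed sign convention)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def mae(obs, sim):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and simulations."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


obs = [100.0, 200.0, 300.0, 400.0]  # made-up observed streamflow, m3/s
sim = [110.0, 190.0, 310.0, 390.0]  # made-up simulated streamflow, m3/s
print(f"NSE = {nse(obs, sim):.3f}, RMSE = {rmse(obs, sim):.1f}, "
      f"PBIAS = {pbias(obs, sim):.2f}%, MAE = {mae(obs, sim):.1f}, "
      f"R = {pearson_r(obs, sim):.3f}")
```

NSE and R are dimensionless, RMSE and MAE carry the units of streamflow, and PBIAS is a percentage, which is why the tables in this paper report RMSE and MAE in m3/s.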
STUDY AREA AND DATA
The Han River originates in the Qinling Mountains in Shaanxi Province and joins the Yangtze River at Wuhan City in Hubei Province, with a total length of 1,577 km. It is the largest tributary of the Yangtze River and covers a total area of approximately 159,000 km2. The Han River basin belongs to the subtropical monsoon climate zone of East Asia. It is worth noting that the Han River basin is not only a vital grain production region and economic center, but also a source of the middle route of the South-to-North Water Transfer Project (SNWTP).
The study area, Shiquan catchment, is a sub-basin situated in the upper Han River basin, spanning an area of 23,805 km2 with an altitudinal range of 357–3,549 m (Figure 2). The catchment receives an annual precipitation of approximately 800 mm and belongs to the humid area of China. Rainfall is distributed quite unevenly over the year, with summer precipitation accounting for over 70% of the annual total.
The presented models were applied to daily streamflow simulation in Shiquan catchment. In this paper, the daily precipitation, evapotranspiration, and streamflow observations from 2004 to 2014 were collected from the China Hydrological Yearbook. There were no missing values in the data series.
RESULTS AND DISCUSSION
XAJ model simulation
The daily observations from 2004 to 2014 were split into two subseries: the data from 2004 to 2012 were used to calibrate the parameters in the XAJ model, while the remaining two years’ data were used for model validation. In this paper, a multi-objective optimization technique was employed to find the optimal values of the associated parameters. First, the measures of NSE, RMSE, PBIAS, MAE, and R were taken into consideration. Then, these measures were assigned equal weights and converted to a comprehensive measure for parameter optimization. The optimized parameters of the XAJ model are shown in Table 2 and the comparison of observation and simulation is displayed in Figure 3. As the figure depicts, the observed daily streamflow and the simulated values from the XAJ model agreed well. The values of NSE, RMSE, PBIAS, MAE, and R under optimized parameters during calibration were 0.85, 239 m3/s, 1.74%, 113 m3/s, and 0.92, respectively, while those during validation were 0.80, 254 m3/s, −8.34%, 133 m3/s, and 0.92, respectively. Overall, the calibration and validation results of the XAJ model applied to Shiquan station were good. Thus, the corresponding parameters and simulation results were adopted in this study.
Sub-model | Parameter | Description | Value |
---|---|---|---|
Evapotranspiration | KC | Ratio of potential evapotranspiration to pan evaporation | 1.030 |
 | UM | Tension water capacity of upper layer | 10.012 |
 | LM | Tension water capacity of lower layer | 66.777 |
 | C | Coefficient of deep evapotranspiration | 0.157 |
Runoff production | WM | Areal mean tension water capacity | 169.739 |
 | B | Exponent of the tension water capacity curve | 0.400 |
 | IM | Ratio of the impervious to the total area of the basin | 0.010 |
Runoff separation | SM | Free water capacity | 19.999 |
 | EX | Exponent of the freewater capacity curve | 1.250 |
 | KG | Outflow coefficient of the freewater storage to groundwater | 0.284 |
 | KI | Outflow coefficient of the freewater storage to interflow | 0.416 |
Runoff concentration | CI | Recession constant of inflow storage | 0.848 |
 | CG | Recession constant of groundwater storage | 0.996 |
 | CS | Recession constant in the lag-and-route method | 0.232 |
Predictor identification
Two types of data, accumulated precipitation and antecedent streamflow, are commonly used in data-driven models for daily streamflow modeling. In this paper, the accumulated precipitation and antecedent streamflow of the past 1, 2, 3, and 4 days were used as possible factors. The correlation coefficient, autocorrelation function (ACF), and partial autocorrelation function (PACF) were adopted to choose the most relevant factors for streamflow simulation (Özger et al. 2020). The results are shown in Table 3 and Figure 4.
Variable | aP(1 d) | aP(2 d) | aP(3 d) | aP(4 d) | Qt−1 | Qt−2 | Qt−3 | Qt−4 | Qt |
---|---|---|---|---|---|---|---|---|---|
aP(1 d) | 1 |  |  |  |  |  |  |  |  |
aP(2 d) | 0.920 | 1 |  |  |  |  |  |  |  |
aP(3 d) | 0.823 | 0.945 | 1 |  |  |  |  |  |  |
aP(4 d) | 0.755 | 0.873 | 0.960 | 1 |  |  |  |  |  |
Qt−1 | 0.478 | 0.625 | 0.671 | 0.682 | 1 |  |  |  |  |
Qt−2 | 0.261 | 0.428 | 0.569 | 0.623 | 0.764 | 1 |  |  |  |
Qt−3 | 0.179 | 0.256 | 0.400 | 0.529 | 0.543 | 0.764 | 1 |  |  |
Qt−4 | 0.170 | 0.198 | 0.262 | 0.386 | 0.440 | 0.543 | 0.764 | 1 |  |
Qt | 0.676 | 0.700 | 0.699 | 0.696 | 0.764 | 0.543 | 0.440 | 0.404 | 1 |
Qt represents the observed daily streamflow at t.
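The autocorrelation used to screen antecedent streamflow terms can be sketched as follows. This is the usual sample ACF estimator written with the standard library only; the function names, the short series, and the 0.2 threshold are made up for illustration and are not taken from this study.

```python
def acf(x, lag):
    """Sample autocorrelation of series x at the given lag (lag >= 0)."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / den


def relevant_lags(x, max_lag, threshold):
    """Lags whose autocorrelation is at least `threshold` (illustrative rule)."""
    return [k for k in range(1, max_lag + 1) if acf(x, k) >= threshold]


# Made-up daily streamflow with a slowly receding flood peak, m3/s.
flow = [100.0, 120.0, 400.0, 900.0, 700.0, 520.0, 390.0, 300.0,
        240.0, 200.0, 170.0, 150.0, 135.0, 125.0, 118.0, 112.0]
for k in range(0, 5):
    print(f"lag {k}: ACF = {acf(flow, k):.3f}")
print("lags kept:", relevant_lags(flow, 4, 0.2))
```

A slowly decaying ACF indicates that several antecedent streamflow values carry information about the current day, while the PACF (not computed here) helps decide where to truncate the list of lags.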
It can be concluded from Table 3 that the accumulated precipitation of the past 2 days and the antecedent streamflow of the past 1 day had the highest correlation coefficients with Qt within their groups (0.700 and 0.764, respectively). Moreover, the relevant antecedent streamflow terms can be identified by the ACF. As shown in Figure 4, the ACF values were high until lag-3, and the PACF was truncated at lag-3, indicating that the current daily streamflow is most strongly related to the streamflow of the previous three days. Thus, the models were first established with one lagged streamflow term, then with two, and finally with all three (Table 4). The performance statistics of different input variables for RF models are presented in Table 5. It is observed that model 7 yielded the highest NSE (0.889) and R (0.945) and the lowest RMSE (204 m3/s) and MAE (71 m3/s) among these RF models. Meanwhile, its PBIAS value was also very low (0.149%). Model 1 also performed better than models 2, 3, and 5 on all evaluation statistics, indicating that Qt−1 was very important in the prediction of Qt. Models 1, 4, and 6 had a similar level of accuracy, while model 7 outperformed models 1–6; thus, the accumulated precipitation of the past 2 days together with Qt−1, Qt−2, and Qt−3 was used for streamflow simulation in this paper.
Considered time lags | Model number | Input variables | Output variable |
---|---|---|---|
One time lag | Model 1 | aP(2 d), Qt−1 | Qt |
 | Model 2 | aP(2 d), Qt−2 | Qt |
 | Model 3 | aP(2 d), Qt−3 | Qt |
Two time lags | Model 4 | aP(2 d), Qt−1, Qt−2 | Qt |
 | Model 5 | aP(2 d), Qt−2, Qt−3 | Qt |
 | Model 6 | aP(2 d), Qt−1, Qt−3 | Qt |
Three time lags | Model 7 | aP(2 d), Qt−1, Qt−2, Qt−3 | Qt |
Statistics | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
---|---|---|---|---|---|---|---|
NSE | 0.858 | 0.852 | 0.839 | 0.860 | 0.835 | 0.858 | 0.889 |
RMSE | 230 m3/s | 235 m3/s | 245 m3/s | 229 m3/s | 249 m3/s | 230 m3/s | 204 m3/s |
PBIAS | 0.093% | −0.477% | −0.699% | −0.190% | 0.162% | −0.467% | 0.149% |
MAE | 81 m3/s | 95 m3/s | 103 m3/s | 79 m3/s | 93 m3/s | 80 m3/s | 71 m3/s |
R | 0.928 | 0.924 | 0.917 | 0.930 | 0.921 | 0.928 | 0.945 |
Analysis of modeling results
The wavelet decomposition method was introduced to improve the accuracy of the developed model. First, the relevant factors, namely accumulated precipitation, antecedent streamflow, and simulated streamflow from the XAJ model, were decomposed into an approximation component and detail components at the selected decomposition level. As a result, the number of input variables increased from 4 to 20 in the hybrid models (WRF and XAJ-WRF models). Quantitative results of the XAJ model, RF model, and hybrid models are presented in Table 6. Because different mother wavelets applied to the same time series can give different results, 14 wavelet functions from five wavelet families with different orders were investigated. The performance statistics of the WRF and XAJ-WRF models are the average values over the 14 cases with different wavelet functions.
Models | NSE | RMSE | PBIAS | MAE | R | F0 |
---|---|---|---|---|---|---|
XAJ | 0.843 | 242 m3/s | 0.058% | 117 m3/s | 0.921 | 1.315 |
RF | 0.889 | 204 m3/s | 0.149% | 71 m3/s | 0.945 | 1.178 |
XAJ-RF | 0.916 | 177 m3/s | −0.907% | 66 m3/s | 0.958 | 0.802 |
WRF | 0.942 | 148 m3/s | 0.531% | 44 m3/s | 0.975 | 0.613 |
XAJ-WRF | 0.947 | 141 m3/s | 0.579% | 44 m3/s | 0.976 | 0.647 |
Table 6 indicates that the XAJ-RF model had a higher level of accuracy (NSE = 0.916, RMSE = 177 m3/s, and MAE = 66 m3/s) than the XAJ model (NSE = 0.843, RMSE = 242 m3/s, and MAE = 117 m3/s) and the RF model (NSE = 0.889, RMSE = 204 m3/s, and MAE = 71 m3/s) in most performance statistics. However, the absolute PBIAS of the XAJ-RF model was slightly larger than those of the XAJ model and RF model. Moreover, compared to the RF and XAJ-RF models, the NSE values of WRF and XAJ-WRF were raised from 0.889 to 0.942 and from 0.916 to 0.947, respectively. Other performance statistics, such as MAE and R, also showed improvements. Relative to the XAJ model, the MAE was reduced by 43 and 62% by the XAJ-RF model and XAJ-WRF model, respectively. The PBIAS of the wavelet-based models (WRF and XAJ-WRF) was within the range of ±6%. With respect to the critical value F0.05,6,49, the F0 values obtained by the five models led to acceptance of the hypothesis, indicating accurate estimation of daily streamflow. Overall, these statistics indicate that the application of wavelet analysis contributed to the increasing accuracy of hydrological modeling.
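The relative improvements quoted from Table 6 can be reproduced with a few lines of arithmetic. The helper name is hypothetical; the MAE and RMSE values are copied from Table 6.

```python
def relative_reduction(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


# Values from Table 6, in m3/s.
mae = {"XAJ": 117.0, "RF": 71.0, "XAJ-RF": 66.0, "WRF": 44.0, "XAJ-WRF": 44.0}
rmse = {"XAJ": 242.0, "RF": 204.0, "XAJ-RF": 177.0, "WRF": 148.0, "XAJ-WRF": 141.0}

for name in ("XAJ-RF", "XAJ-WRF"):
    print(f"{name}: MAE lower than XAJ by {relative_reduction(mae['XAJ'], mae[name]):.1f}%, "
          f"RMSE lower by {relative_reduction(rmse['XAJ'], rmse[name]):.1f}%")
# XAJ-RF: MAE lower than XAJ by 43.6%, RMSE lower by 26.9%
# XAJ-WRF: MAE lower than XAJ by 62.4%, RMSE lower by 41.7%
```

The MAE figures correspond to the 43 and 62% reported in the text, which are therefore reductions relative to the XAJ model rather than remaining fractions.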
The scatter plots between observed and simulated daily streamflow are displayed in Figure 5. It can be observed that the XAJ-WRF model showed the best agreement between observations and simulations among all proposed models, particularly for medium streamflow, but there seemed to be no significant improvement in the simulation of high streamflow over 8,000 m3/s. A possible reason is that, for regression tasks, the RF model always produces its final result by averaging over all trees. Therefore, the RF method performs less well in simulating extreme values. In addition, compared to the other models, the scatter points of observed versus simulated daily streamflow from the XAJ-WRF and WRF models were distributed more symmetrically and were more concentrated around the 1:1 line.
Figures 6 and 7 present the NSE, RMSE, and R values of the XAJ, RF, XAJ-RF, WRF, and XAJ-WRF models at the yearly scale. It was observed from Figure 6 that the XAJ-WRF model yielded the highest NSE values, with NSE exceeding 0.9 in ten years. The NSEs of the WRF and XAJ-RF models were slightly lower than those of the XAJ-WRF model, with NSE exceeding 0.9 in nine and seven years, respectively. In most years, the RF model had higher NSEs than the XAJ model. As depicted in Figure 7, the XAJ model and RF model had higher RMSEs than the hybrid models in most of the period except 2004, 2009, and 2014. Thus, the hybrid models demonstrated better performance in this regard. Compared to the XAJ-RF model, the yearly RMSE values of the XAJ-WRF model decreased by 35% on average. In conclusion, these results indicate that the XAJ-RF model worked better than the benchmark models (XAJ and RF) and that the introduction of the wavelet transform technique can effectively enhance the level of modeling accuracy. The reasons may lie in the fact that wavelet analysis can reveal the temporal variation of the rainfall-runoff relationship and capture small but meaningful characteristics in the datasets.
Comparison of various wavelet functions
To investigate the impact of various wavelet functions on the WRF and XAJ-WRF models, 14 wavelet functions from five wavelet families with different orders were employed. The simulation results are listed in Table 7.
Wavelet function | WRF NSE | WRF RMSE | WRF PBIAS | WRF MAE | WRF R | WRF F0 | XAJ-WRF NSE | XAJ-WRF RMSE | XAJ-WRF PBIAS | XAJ-WRF MAE | XAJ-WRF R | XAJ-WRF F0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
haar | 0.941 | 148 m3/s | 0.333% | 42 m3/s | 0.973 | 0.594 | 0.948 | 140 m3/s | 0.577% | 42 m3/s | 0.976 | 0.517 |
db1 | 0.946 | 142 m3/s | 0.529% | 41 m3/s | 0.976 | 0.552 | 0.949 | 138 m3/s | 0.644% | 41 m3/s | 0.977 | 0.499 |
db2 | 0.941 | 148 m3/s | 0.327% | 42 m3/s | 0.974 | 0.610 | 0.950 | 137 m3/s | 0.504% | 42 m3/s | 0.977 | 0.500 |
db3 | 0.942 | 147 m3/s | 0.383% | 43 m3/s | 0.975 | 0.602 | 0.948 | 140 m3/s | 0.588% | 43 m3/s | 0.977 | 0.527 |
db4 | 0.941 | 148 m3/s | 0.607% | 44 m3/s | 0.975 | 0.621 | 0.946 | 142 m3/s | 0.426% | 44 m3/s | 0.976 | 0.543 |
coif1 | 0.944 | 145 m3/s | 0.739% | 42 m3/s | 0.976 | 0.594 | 0.950 | 136 m3/s | 0.731% | 42 m3/s | 0.978 | 0.504 |
coif2 | 0.941 | 148 m3/s | 0.492% | 43 m3/s | 0.974 | 0.612 | 0.950 | 137 m3/s | 0.500% | 44 m3/s | 0.977 | 0.505 |
coif3 | 0.939 | 150 m3/s | 0.793% | 45 m3/s | 0.975 | 0.650 | 0.945 | 143 m3/s | 0.622% | 45 m3/s | 0.976 | 0.565 |
coif4 | 0.942 | 147 m3/s | 0.315% | 46 m3/s | 0.975 | 0.600 | 0.940 | 150 m3/s | 0.605% | 47 m3/s | 0.973 | 0.621 |
sym2 | 0.941 | 148 m3/s | 0.556% | 42 m3/s | 0.975 | 0.619 | 0.951 | 135 m3/s | 0.622% | 42 m3/s | 0.978 | 0.489 |
sym3 | 0.942 | 147 m3/s | 0.703% | 42 m3/s | 0.976 | 0.618 | 0.948 | 139 m3/s | 0.674% | 43 m3/s | 0.977 | 0.530 |
sym4 | 0.943 | 146 m3/s | 0.590% | 43 m3/s | 0.976 | 0.602 | 0.945 | 143 m3/s | 0.753% | 44 m3/s | 0.976 | 0.565 |
sym5 | 0.942 | 147 m3/s | 0.437% | 44 m3/s | 0.975 | 0.603 | 0.948 | 139 m3/s | 0.499% | 45 m3/s | 0.977 | 0.519 |
dmey | 0.934 | 158 m3/s | 0.635% | 50 m3/s | 0.971 | 0.706 | 0.937 | 153 m3/s | 0.362% | 51 m3/s | 0.971 | 0.647 |
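To illustrate what a wavelet function does as a preprocessor, the sketch below applies a single level of the Haar transform, the simplest of the 14 functions tested, and rebuilds one smooth and one detail subseries of the original length. The decomposition level, the boundary handling, and the software used in the study are not stated in this section, so the single level, the even-length input, and all function names here are assumptions. The other wavelet families use longer filters and would normally be handled with a wavelet library rather than by hand.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input assumed)."""
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    root2 = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / root2)
        out.append((a - d) / root2)
    return out


def haar_subseries(signal):
    """Return (smooth, rough) subseries, each as long as the input.

    Each subseries is reconstructed from one set of coefficients with the
    other set zeroed, so smooth[i] + rough[i] recovers signal[i].
    """
    approx, detail = haar_dwt(signal)
    zeros = [0.0] * len(approx)
    smooth = haar_idwt(approx, zeros)
    rough = haar_idwt(zeros, detail)
    return smooth, rough


# Toy daily flows (hypothetical, m3/s).
flow = [120.0, 180.0, 900.0, 400.0]
smooth, rough = haar_subseries(flow)
```

In a wavelet-based RF setup, subseries such as `smooth` and `rough` would replace the raw series as input variables, so the regression can weight slow and fast components separately.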
As seen in Table 7, the XAJ-WRF model built with the sym2 wavelet function had the best efficiency (NSE = 0.951, RMSE = 135 m3/s, and R = 0.978). Compared to the XAJ-RF model, its NSE and R values increased by 0.035 and 0.02, respectively, whereas its RMSE and MAE values decreased by 42 m3/s and 24 m3/s, respectively. The XAJ-WRF model using the dmey wavelet function performed worst among the 14 mother wavelets (NSE = 0.937, RMSE = 153 m3/s, and R = 0.971), but it was still better than the XAJ-RF model: NSE rose from 0.916 to 0.937, PBIAS moved from −0.907% to 0.362%, and R rose from 0.958 to 0.971. Meanwhile, the RMSE and MAE dropped by 14 and 16%, respectively.
As a whole, the impact of the choice of mother wavelet on the XAJ-WRF model was limited in this research. As shown in Table 8, the RMSE, MAE, and R values varied over a small interval, whereas the NSE and PBIAS values varied over relatively larger intervals. The NSE and PBIAS values of the WRF model ranged from 0.934 to 0.946 and from 0.315% to 0.793%, whereas those of the XAJ-WRF model ranged from 0.937 to 0.951 and from 0.362% to 0.753%, respectively. Overall, the XAJ-WRF model worked better than the XAJ-RF and WRF models. Moreover, the proposed model can be applied to daily streamflow prediction once the future precipitation during the lead time can be obtained. The XAJ-RF model can run for operational streamflow forecasting.
Performance statistics | WRF: Max | WRF: Min | XAJ-WRF: Max | XAJ-WRF: Min |
---|---|---|---|---|
NSE | 0.946 | 0.934 | 0.951 | 0.937 |
RMSE | 158 m3/s | 142 m3/s | 153 m3/s | 135 m3/s |
PBIAS | 0.793% | 0.315% | 0.753% | 0.362% |
MAE | 50 m3/s | 41 m3/s | 51 m3/s | 41 m3/s |
R | 0.976 | 0.971 | 0.978 | 0.971 |
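The ranges in Table 8 follow directly from the columns of Table 7. As a check, the sketch below recomputes the NSE and RMSE ranges from the Table 7 values and identifies the best and worst wavelet functions for the XAJ-WRF model by NSE. The numbers are copied from Table 7; the variable and function names are illustrative only.

```python
# Per wavelet: (WRF NSE, WRF RMSE, XAJ-WRF NSE, XAJ-WRF RMSE); RMSE in m3/s.
TABLE7 = {
    "haar": (0.941, 148, 0.948, 140),
    "db1": (0.946, 142, 0.949, 138),
    "db2": (0.941, 148, 0.950, 137),
    "db3": (0.942, 147, 0.948, 140),
    "db4": (0.941, 148, 0.946, 142),
    "coif1": (0.944, 145, 0.950, 136),
    "coif2": (0.941, 148, 0.950, 137),
    "coif3": (0.939, 150, 0.945, 143),
    "coif4": (0.942, 147, 0.940, 150),
    "sym2": (0.941, 148, 0.951, 135),
    "sym3": (0.942, 147, 0.948, 139),
    "sym4": (0.943, 146, 0.945, 143),
    "sym5": (0.942, 147, 0.948, 139),
    "dmey": (0.934, 158, 0.937, 153),
}
COLUMNS = ("WRF NSE", "WRF RMSE", "XAJ-WRF NSE", "XAJ-WRF RMSE")


def column_range(index):
    """Return (max, min) of one Table 7 column across the 14 wavelets."""
    values = [row[index] for row in TABLE7.values()]
    return max(values), min(values)


ranges = {name: column_range(i) for i, name in enumerate(COLUMNS)}

# Rank wavelets for the XAJ-WRF model by NSE (column index 2).
best_wavelet = max(TABLE7, key=lambda w: TABLE7[w][2])
worst_wavelet = min(TABLE7, key=lambda w: TABLE7[w][2])
```

The spread between the best and worst XAJ-WRF NSE is 0.014, which is consistent with the observation that the choice of mother wavelet had a limited effect.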
CONCLUSIONS
This paper compared the performance of the XAJ model and the RF method in daily streamflow simulation, and proposed several hybrid models based on the XAJ model, wavelet analysis, and the RF method. Five models were developed for daily streamflow modeling: (1) the Xinanjiang model (XAJ); (2) the random forests model (RF), with accumulated precipitation and antecedent streamflow as inputs; (3) the hybrid model (XAJ-RF), with antecedent streamflow and simulated streamflow from the XAJ model as inputs; (4) the wavelet-based RF model (WRF), using subseries of accumulated precipitation and antecedent streamflow as input variables; and (5) the hybrid XAJ-WRF model, using subseries of antecedent streamflow and simulated streamflow from the XAJ model as input variables. The first two were established as benchmark models for the third, hybrid model, while the last two adopted the wavelet transformation technique to further improve accuracy. The proposed methods were applied to Shiquan station, located in the Upper Han River basin in China. Based on the results of the case study, the following conclusions were drawn.
The case study indicated that the XAJ-RF model had a relatively higher level of accuracy (NSE = 0.916, RMSE = 177 m3/s, and MAE = 66 m3/s) than that of the XAJ model (NSE = 0.843, RMSE = 242 m3/s, and MAE = 117 m3/s) and RF model (NSE = 0.889, RMSE = 204 m3/s, and MAE = 71 m3/s). These results indicated that the coupled XAJ-RF model can be effectively applied, providing a useful alternative for daily streamflow simulation.
To investigate the impact of various mother wavelets, 14 wavelet functions from the Haar, Daubechies, Coiflets, Symlets, and DMeyer families were tested as preprocessors for the input variables of the RF model. Relative to the XAJ-RF model, the yearly RMSE values of the XAJ-WRF model decreased by 35% on average. The XAJ-WRF model built with the sym2 wavelet function had the best efficiency (NSE = 0.951, RMSE = 135 m3/s, and R = 0.978). Moreover, even for the worst XAJ-WRF model, which used the dmey wavelet function, NSE rose from 0.916 to 0.937, PBIAS moved from −0.907% to 0.362%, and R rose from 0.958 to 0.971 relative to the XAJ-RF model. Therefore, the introduction of wavelet analysis contributed to the increased accuracy.
With the introduction of numerical weather prediction or precipitation forecasting, the proposed method can be used for daily streamflow prediction with different lead times. It can also be applied for operational forecasting.
ACKNOWLEDGEMENTS
This work was supported by the Special Foundation for National Program on Key Basic Research Project (2016YFC0402703, 2019YFC0409005, 2019YFC0409003), the Fundamental Research Funds for the Central Universities (B200201027), the National Natural Science Foundation of China (51709077), and the Open Research Fund of the Yellow River Sediment Key Laboratory of Ministry of Water Resources (201804).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.