Abstract

Due to the large uncertainties in long-term precipitation prediction and reservoir operation, it is difficult to forecast long-term streamflow for large basins with cascade reservoirs. In this paper, a framework coupling the original Climate Forecasting System (CFS) precipitation with the Soil and Water Assessment Tool (SWAT) was proposed to forecast nine-month streamflow for the Cascade Reservoir System of Han River (CRSHR), which includes the Shiquan, Ankang and Danjiangkou reservoirs. First, CFS precipitation was evaluated against observations and post-processed with two machine learning algorithms, random forest and support vector regression. The correlation coefficients between the post-processed monthly areal CFS precipitation and observations were 0.91–0.96 across all lead times, indicating that machine-learning post-processing of CFS precipitation remained effective as the forecast period was extended. Additionally, two precipitation spatio-temporal distribution models, the original CFS distribution and the most similar historical observation, were adopted to disaggregate the processed monthly areal CFS precipitation into daily subbasin-scale precipitation. Based on the restored reservoir inflows, a regional SWAT model was calibrated for the CRSHR. The Nash–Sutcliffe efficiencies of the flow simulations for the three reservoirs were 0.86, 0.88 and 0.84, respectively, in the calibration period, meeting the accuracy requirement. The experimental forecast showed that, for all three reservoirs, long-term streamflow forecasts using the similar historical observed distribution were more accurate than those using the original CFS distribution.

INTRODUCTION

Long-term streamflow forecasting plays an important role in flood control, drought prediction, reservoir operation and efficient water use. How to construct appropriate long-term hydrological forecasting models that meet accuracy requirements has long been a key research question for hydrologists. A long-term streamflow forecast is defined as a forecast at the monthly, seasonal or yearly scale whose lead time is greater than the maximum watershed confluence time (Liang et al. 2018a). There are many studies on long-term hydrological forecasting (Yang et al. 2005; Li et al. 2014a; Anghileri et al. 2016; Hong et al. 2016; Yaseen et al. 2016; Chu et al. 2017; Dariane et al. 2018). However, research on long-term streamflow forecasting for basins strongly influenced by cascade reservoir systems is limited.

Currently, long-term hydrological forecasting models are mainly based on statistical relationships between predictors and predictands or on the time series behaviour of runoff. The predictors can be circulation characteristics, sea surface temperature or hydro-meteorological factors. However, this approach lacks a physical basis. Another, physically based approach to long-term runoff forecasting couples a hydrological model with numerical weather prediction (NWP) results (Wood et al. 2002; Yang et al. 2005). However, this method has limitations. It is well acknowledged that the direct outputs from NWPs are inadequate as inputs to hydrological models for long-term hydrological forecasting at regional scales, primarily for the two reasons described below.

First, the reliability of some land surface variables output from NWPs is doubtful, particularly basin precipitation and surface runoff, which depend critically on sub-grid-scale processes (Risbey & Stone 1996). Many studies demonstrate that the accuracy of NWP varies with the spatio-temporal scale and the forecast period (Bauer et al. 2015). It is generally accepted that prediction accuracy increases as the spatial scale increases or the lead time decreases; for example, monthly areal precipitation forecasts are more accurate than daily point precipitation forecasts (Hulme 1994). Nevertheless, NWP results still require processing before they can support accurate hydrological simulation.

Second, the spatial resolution of NWPs (for example, Climate Forecasting System (CFS) outputs typically have a spatial resolution of about 10,000 km²) is coarser than that required as input for hydrological models at smaller scales (catchment or basin scales of 10²–10³ km²) (Yuan et al. 2011). Therefore, a coupled atmospheric-hydrological model must bridge the gap between the coarse resolution of NWPs and the fine resolution of hydrological models, a task known as ‘downscaling for hydrological impact studies’ (Wilby et al. 2000; Wood et al. 2004). There are two main downscaling approaches. Some researchers apply regional climate models, i.e. dynamical downscaling, to translate historical reanalysis or future prediction output into local meteorological forcing for hydrological models (Kim et al. 2000; Wilby et al. 2000). The other widely used method is statistical downscaling. The advantages and disadvantages of these two approaches have been thoroughly documented (Wilby & Wigley 1997; Fowler et al. 2007). Compared with dynamical downscaling, the key advantage of the statistical approach is its lower computational requirement (Wilby et al. 1998).

The classical statistical framework, bias correction and spatial disaggregation (BCSD), uses quantile mapping and spatial interpolation to resolve the bias and the spatio-temporal mismatch (Wood et al. 2002; Ines & Hansen 2006; Piani et al. 2010; Jiang et al. 2013). Our study follows this framework, with two modifications, to couple the CFS precipitation with the hydrological model. First, quantile mapping requires the cumulative distribution functions (CDFs) of the measured and forecasted data to be determined. However, CDFs are uncertain and have a significant impact on the correction results (Li et al. 2010; Chen et al. 2015). By contrast, post-processing the raw forecast can be more direct if it is based on a regression relationship between the raw forecast and the observation. Machine learning algorithms can characterize such a regression well whether it is linear or non-linear (Vandal et al. 2017). Therefore, machine learning algorithms such as random forest (RF) (Liang et al. 2018b) and support vector regression (SVR) (Adnan et al. 2017) are introduced; in this study, we apply RF and SVR to process the forecasted precipitation from CFS outputs. Second, the BCSD method usually performs spatio-temporal disaggregation by spatial interpolation or random sampling (Wood et al. 2004). However, research on downscaling approaches that consider the effect of the spatio-temporal distribution of precipitation on the hydrological response is rare. This question also involves the selection of a hydrological model. Currently, many hydrological models have been developed to simulate long-term runoff (Wood et al. 2004; Yang et al. 2008; Li et al. 2014b).
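For contrast with the regression-based post-processing adopted in this study, the quantile-mapping step of BCSD can be sketched in a few lines. This is a minimal empirical version written for illustration only, not the BCSD implementation of the cited studies: it assumes each climatology holds at least two distinct values, uses linear interpolation between order statistics instead of fitted CDFs, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sample, value):
    """Interpolated empirical CDF, placing the k-th order statistic at k / (n - 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    if value <= ordered[0]:
        return 0.0
    if value >= ordered[-1]:
        return 1.0
    k = bisect_right(ordered, value) - 1  # ordered[k] <= value < ordered[k + 1]
    frac = (value - ordered[k]) / (ordered[k + 1] - ordered[k])
    return (k + frac) / (n - 1)


def empirical_quantile(sample, prob):
    """Inverse of empirical_cdf: linear interpolation between order statistics."""
    ordered = sorted(sample)
    pos = min(max(prob, 0.0), 1.0) * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def quantile_map(raw_forecast, forecast_climatology, observed_climatology):
    """Bias-correct one forecast value: x_corrected = F_obs^-1(F_fcst(x))."""
    prob = empirical_cdf(forecast_climatology, raw_forecast)
    return empirical_quantile(observed_climatology, prob)


# Hypothetical climatologies (mm): the model is 5 mm too dry at every quantile.
fcst_clim = [10.0, 20.0, 30.0, 40.0, 50.0]
obs_clim = [15.0, 25.0, 35.0, 45.0, 55.0]
print(quantile_map(25.0, fcst_clim, obs_clim))
```

The sketch makes the dependence on the two CDFs explicit: any error in estimating either distribution propagates directly into the corrected value, which is the sensitivity that motivates the regression alternative.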
For cascade reservoir systems, the inflow to a downstream reservoir consists of two parts: the outflow from the upstream reservoir, which is governed by the reservoir operation strategy and is therefore difficult to predict; and the natural runoff from the intermediate area between adjacent dams, which can be forecasted to assist cascade reservoir operation. In this paper, the Soil and Water Assessment Tool (SWAT), a physically based, semi-distributed watershed hydrological model, was used to simulate the runoff of the different intermediate areas. The SWAT model has been widely applied to runoff simulation (Kardhana et al. 2017). Two advantages of SWAT are that (a) runoff generation and concentration in different intermediate areas can be described with different parameter sets and (b) daily precipitation input at the subbasin scale can describe the spatio-temporal distribution of precipitation over the basin. However, daily subbasin precipitation is difficult to predict. In this study, typical precipitation from two spatio-temporal distribution models, the original CFS forecast and the most similar historical observations, was scaled to forecast the daily precipitation in each subbasin.

This paper describes an exploratory long-term monthly streamflow forecasting model for the Cascade Reservoir System of Han River (CRSHR) that couples CFS outputs, post-processed with two machine learning algorithms (SVR and RF), with the SWAT model. The paper is organized as follows: the next two sections introduce the methodology and the study domain and data, respectively. In the following section, the forecasted monthly CFS precipitation is processed and spatio-temporally disaggregated, and the streamflow forecasts for the CRSHR reservoirs are obtained. The conclusions are presented in the final section.

METHODOLOGY

In our forecast scheme, the original CFS precipitation is first post-processed, and the daily subbasin precipitation is then obtained by downscaling the monthly areal precipitation with typical monthly scaling. Other meteorological elements (temperature, wind speed, relative humidity and solar radiation) are produced with the SWAT weather generator (Neitsch et al. 2011). Together these provide the nine-month meteorological forcings for the SWAT model. Finally, the nine-month streamflow forecast is produced by driving the initialized SWAT model with the predicted meteorological forcings.

CFS precipitation post-processing

The post-processing is based on the assumption that a stable linear or non-linear relationship can be established between the CFS surface forecast fields and the observed areal climatology. To reduce the prediction uncertainties, this study uses two machine learning algorithms (RF and SVR) to process the forecasted CFS precipitation into monthly areal precipitation. Because temperature and precipitation are correlated (Trenberth & Shea 2005) and the temperature forecast is relatively stable, the forecast temperature was treated as a covariate in the processing to reduce anomalies. Since the accuracy of precipitation forecasting decreases as the lead time is extended (Li et al. 2017), separate regression relationships between the forecasts and the corresponding observations are constructed and investigated for each lead time. The principles of the two algorithms used in this study are described as follows:

  1. RF is a machine learning algorithm that trains an ensemble of classification and regression trees (CARTs), each grown on a resampled subset of the data, and uses the ensemble to determine the predictand (Breiman 1996a). First, the algorithm adopts bootstrap re-sampling (Breiman 1996b) to draw the training sample for each tree and out-of-bag (OOB) error estimation (Breiman 1996a) to calculate the generalization error. Second, it uses decision tree analysis, random subspace theory and the Gini impurity index to grow the trees (Zhu & Pierskalla 2016). Finally, the outputs of the individual trees are combined, by voting for classification or by averaging for regression, to obtain the final result.

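  2. Another machine learning algorithm is SVR, which is based on the support vector machine (SVM) algorithm (Vapnik 2000). In SVR, a small number of support vectors can represent the entire sample set (Vapnik 2000). SVM uses slack variables and the error penalty parameter C to balance model complexity against training error (Tripathi et al. 2006), which allows it to solve non-linear regression optimization problems well. Based on the Lagrange function and the Karush–Kuhn–Tucker conditions (Kuhn & Tucker 1951), for a given training dataset {(x_i, y_i)}, i = 1, …, l, the objective function can be expressed as follows:
    $$\max_{\alpha,\alpha^*}\; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(x_i,x_j)-\varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*)+\sum_{i=1}^{l}y_i(\alpha_i-\alpha_i^*)$$
    $$\text{s.t.}\quad \sum_{i=1}^{l}(\alpha_i-\alpha_i^*)=0,\qquad 0\le\alpha_i,\ \alpha_i^*\le C \qquad (1)$$
    where K(x_i, x_j) is a kernel function, ε is the insensitive loss parameter, and α_i and α_i* are the Lagrange multipliers. Solving Equation (1) gives the optimal multipliers; the training samples with non-zero α_i − α_i* are the support vectors, and the regression function is f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b. Specifically, this paper selected a Gaussian radial basis function (RBF) as the kernel function (Scholkopf et al. 1997):
    $$K(x_i,x_j)=\exp\!\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right) \qquad (2)$$
    where σ is the width factor of the RBF. σ and C are the key parameters of the SVR algorithm.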
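The two building blocks described above can be sketched in plain Python: one RF bootstrap draw with its out-of-bag indices, and the RBF kernel of Equation (2) with the SVR regression function. This is an illustrative sketch, not the implementation used in this study. Solving the dual problem of Equation (1) is assumed to be delegated to a quadratic-programming solver, so the multiplier differences and bias passed to `svr_predict` are hypothetical inputs, as are all function names.

```python
import math
import random


def bootstrap_with_oob(n_samples, rng):
    """One RF bootstrap draw: sample n indices with replacement (in-bag) and
    return the indices never drawn (out-of-bag), used for the OOB error."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel of Eq. (2): exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, coef, bias, sigma):
    """SVR regression function f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    coef[i] holds alpha_i - alpha_i* for support vector i, assumed to come
    from solving Eq. (1) with an external solver.
    """
    return bias + sum(c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coef))


rng = random.Random(42)
in_bag, oob = bootstrap_with_oob(10, rng)
print(len(in_bag), oob)
print(svr_predict([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], [2.0, -1.0], 0.5, sigma=1.0))
```

Because every tree sees a different bootstrap sample, roughly a third of the records are out-of-bag for any one tree and can serve as a built-in validation set; in SVR, only the support vectors contribute to a prediction, and σ controls how quickly their influence decays with distance.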

Precipitation spatio-temporal disaggregation

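A typical monthly precipitation scaling method (Liu et al. 2018) based on the water balance principle was used to disaggregate the monthly areal precipitation forecast into daily precipitation in each subbasin:
$$P_o=\frac{1}{A}\sum_{i=1}^{m}A_i\sum_{j=1}^{n}p_{ij} \qquad (3)$$
$$k_{ij}=\frac{p_{ij}}{P_o} \qquad (4)$$
$$p_{ij}^{c}=k_{ij}\,P_c \qquad (5)$$
where A is the total basin area, P_o is the observed monthly areal precipitation of the basin in the typical month, A_i is the area of the ith subbasin (i = 1, …, m), m is the total number of subbasins, p_ij is the precipitation of the ith subbasin on the jth day of the typical month (j = 1, …, n), n is the number of days in the typical month, k_ij is the ratio of the precipitation in the ith subbasin on the jth day to the monthly areal precipitation, P_c is the processed monthly areal CFS precipitation, and p^c_ij is the downscaled CFS precipitation of the ith subbasin on the jth day. Substituting Equations (4) and (5) into the area-weighted sum of Equation (3) gives exactly P_c, so the disaggregation conserves the monthly areal water balance.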
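Equations (3)–(5) can be sketched as follows. This is a minimal illustration with hypothetical function names and a toy two-subbasin, three-day "typical month"; it also checks that the area-weighted total of the disaggregated field equals the target monthly areal precipitation, which is the water-balance property the method relies on.

```python
def areal_monthly(daily_by_subbasin, areas):
    """Eq. (3): area-weighted monthly areal precipitation (mm)."""
    total_area = sum(areas)
    return sum(a * sum(days) for a, days in zip(areas, daily_by_subbasin)) / total_area


def scale_typical_month(typical, areas, target_areal):
    """Eqs (4)-(5): p_c[i][j] = k_ij * P_c with k_ij = p_ij / P_o.

    typical[i][j]  daily precipitation (mm) of subbasin i on day j of the typical month
    areas[i]       area of subbasin i (any consistent unit)
    target_areal   processed monthly areal CFS precipitation P_c (mm)
    """
    typical_areal = areal_monthly(typical, areas)  # P_o
    if typical_areal <= 0:
        raise ValueError("the typical month must contain precipitation")
    # p_ij * P_c / P_o is algebraically identical to k_ij * P_c
    return [[p * target_areal / typical_areal for p in days] for days in typical]


typical = [[0.0, 10.0, 20.0], [5.0, 5.0, 0.0]]  # hypothetical typical month (mm)
areas = [1.0, 3.0]
forecast = scale_typical_month(typical, areas, target_areal=30.0)
print(forecast)
print(areal_monthly(forecast, areas))
```

Dry days in the typical month stay dry after scaling, so the choice of typical month fixes both the timing and the spatial pattern of the forecast rainfall; only the monthly volume comes from the processed CFS forecast.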
In this paper, one spatio-temporal distribution is adapted from the original CFS forecast distribution, while the other is a historical observed distribution obtained by hydrological similarity analysis (Sivapalan et al. 1987). Here, hydrological similarity refers to the similarity of the nine-month precipitation (for the same seasons) between the CFS forecast and the historical observations. For the historical observation model, the typical month is selected with the Euclidean metric (Anton & Rorres 1994), which measures the similarity between the predicted and observed precipitation vectors (Liang et al. 2018b):
$$d_i=\lVert X-Y_i\rVert=\sqrt{\sum_{j=1}^{9}\left(x_j-y_{ij}\right)^2},\qquad i=1,\dots,n \qquad (6)$$
where X = (x_1, …, x_9) is the predicted precipitation vector and Y_i = (y_i1, …, y_i9) is the observed precipitation vector in the ith year; x_j is the predicted monthly precipitation in the jth month, y_ij is the observed monthly precipitation in the jth month of the ith year, d_i is the Euclidean distance between the prediction and the observation in the ith year, and n is the number of years. The Euclidean distance was used because it is general and simple to implement. The smaller the Euclidean distance between the prediction and the observation, the higher their similarity; the observed series with the smallest distance is selected as the typical daily precipitation.
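The selection rule of Equation (6) amounts to a nearest-neighbour search over historical years. A minimal sketch follows; the function names are hypothetical, and the vectors are shortened to three months with invented values for brevity, whereas the study compares nine-month vectors.

```python
import math


def euclidean_distance(predicted, observed):
    """Eq. (6): d_i = sqrt(sum_j (x_j - y_ij)^2)."""
    if len(predicted) != len(observed):
        raise ValueError("vectors must cover the same months")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(predicted, observed)))


def most_similar_year(predicted, observed_by_year):
    """Return (year, distance) for the historical year closest to the forecast."""
    distances = {year: euclidean_distance(predicted, obs) for year, obs in observed_by_year.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical monthly precipitation vectors (mm)
forecast = [40.0, 80.0, 120.0]
history = {
    2013: [43.0, 84.0, 120.0],
    2014: [60.0, 70.0, 150.0],
    2015: [41.0, 80.0, 119.0],
}
print(most_similar_year(forecast, history))
```

Because the distance is unweighted, wet-season months with large absolute totals dominate the match, which is worth keeping in mind when interpreting the selected typical year.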

SWAT hydrological model

The SWAT model can predict water and sediment transport in large basins over long periods (Arnold et al. 1998). It is a semi-distributed, subbasin-scale hydrological model (Arnold et al. 2011). SWAT divides each subbasin into hydrological response units (HRUs), each representing a unique combination of land use/land cover (LULC), soil and slope type, which serve as the runoff units (Neitsch et al. 2011). The runoff parameters of SWAT vary with these conditions. Although the HRU is the runoff unit, the entire subbasin is the input unit for the daily meteorological forcings (precipitation, temperature, wind speed, relative humidity and solar radiation) and the output unit for hydrological variables (flow, sediment concentration, etc.) (Winchell et al. 2011). Therefore, SWAT can simulate the intermediate-area runoff under different LULC conditions in the cascade reservoir system (Nguyen-Tien et al. 2018).

Evaluation criteria

To evaluate the accuracy of the CFS precipitation statistical post-processing, the correlation coefficient (R), relative error (RE) and mean absolute percentage error (MAPE) were used. The RE of a single value is (S_i − O_i)/O_i × 100%, and the MAPE is the mean of the absolute RE between the measured and processed areal precipitation:
$$\mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{S_i-O_i}{O_i}\right|\times 100\% \qquad (7)$$
The Nash–Sutcliffe efficiency (NSE) was used to evaluate the performance of the long-term streamflow simulation:
$$\mathrm{NSE}=1-\frac{\sum_{i=1}^{n}\left(O_i-S_i\right)^2}{\sum_{i=1}^{n}\left(O_i-\bar{O}\right)^2} \qquad (8)$$
where O_i is the ith observed value, S_i is the ith simulated value, Ō is the mean observed value, and n is the total number of values.
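The evaluation criteria above can be sketched directly from Equations (7) and (8) and the usual Pearson definition of R. This is a plain-Python sketch with hypothetical function names and invented values, not code from the study.

```python
import math


def relative_error(obs, sim):
    """RE (%) of one value: (S_i - O_i) / O_i * 100."""
    return (sim - obs) / obs * 100.0


def mape(obs, sim):
    """Eq. (7): mean absolute RE (%) over paired observations and simulations."""
    return sum(abs(relative_error(o, s)) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Eq. (8): 1 - sum (O_i - S_i)^2 / sum (O_i - mean(O))^2."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    """Pearson correlation coefficient R between observations and simulations."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


obs = [100.0, 200.0, 150.0]  # hypothetical observed values
sim = [110.0, 180.0, 150.0]  # hypothetical simulated values
print(mape(obs, sim), nse(obs, sim), pearson_r(obs, sim))
```

R measures only co-variation and is insensitive to systematic bias, which is why it is paired here with RE/MAPE for precipitation and with NSE, which penalizes both bias and timing errors, for streamflow.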

Furthermore, to perform an accuracy assessment (AA) of the precipitation or flow prediction, 20% of the multi-year (1986–2016) variation amplitude was taken as the allowable deviation. A forecast whose error is within the allowable deviation is qualified. The qualified rate (QR) is the percentage of qualified forecasts, and the QR-MAPE is the MAPE calculated from the relative errors of the qualified forecasts only. Together, QR and QR-MAPE indicate the overall forecast accuracy.
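The qualification rule can be sketched as follows. Reading the allowable deviation as 20% of the multi-year variation amplitude, i.e. 20% of (maximum minus minimum) of the historical record, is an assumption of this sketch; the function names and numbers are hypothetical.

```python
def allowable_deviation(history, fraction=0.2):
    """Assumed rule: a fraction (20%) of the multi-year variation amplitude."""
    return fraction * (max(history) - min(history))


def qr_and_qr_mape(obs, fcst, tolerances):
    """Return (QR in %, QR-MAPE in %).

    A forecast is qualified when |fcst - obs| <= its allowable deviation;
    QR-MAPE averages |RE| over the qualified forecasts only.
    """
    qualified_re = [
        abs(f - o) / o * 100.0
        for o, f, tol in zip(obs, fcst, tolerances)
        if abs(f - o) <= tol
    ]
    qr = 100.0 * len(qualified_re) / len(obs)
    qr_mape = sum(qualified_re) / len(qualified_re) if qualified_re else float("nan")
    return qr, qr_mape


# Hypothetical three-month example, one historical record per month (mm)
histories = [[20.0, 120.0, 60.0], [30.0, 90.0, 50.0], [40.0, 140.0, 90.0]]
tolerances = [allowable_deviation(h) for h in histories]
obs = [100.0, 50.0, 80.0]
fcst = [110.0, 80.0, 76.0]
print(qr_and_qr_mape(obs, fcst, tolerances))
```

Under this reading, a forecast with a large relative error can still be qualified in a month whose historical range is wide, so QR and QR-MAPE complement rather than duplicate the MAPE.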

STUDY DOMAIN AND DATA

Study domain

The Han River, approximately 1,577 km long, is the largest tributary of the Yangtze River. It originates from the southern edge of the Qinling Mountains in China and has a basin area of approximately 159,000 km². There are many large water conservancy projects in this basin, including the Shiquan, Ankang and Danjiangkou reservoirs, as shown in Figure 1. The storage capacities of these three reservoirs are 0.44, 2.58 and 29.05 billion cubic metres, respectively. Because the operation of these reservoirs is interconnected, the system is referred to as the CRSHR. The CRSHR consists of three areas, the Shiquan watershed, the Shiquan-Ankang intermediate area and the Ankang-Danjiangkou intermediate area, which are shown in Figure 2. The spatio-temporal distribution of precipitation in the Han River basin is non-uniform, and precipitation during the flood season (roughly July to October) accounts for 65% of the annual total.

Figure 1

Map of the CRSHR and the distribution of CFS grid points covering the basin.

Figure 2

SWAT subbasins and cascade reservoir system along the Han River. I, II and III denote the Shiquan watershed, Shiquan-Ankang intermediate area and Ankang-Danjiangkou intermediate area, respectively.

Data

The second-generation CFS (CFSv2) is a fully coupled ocean-atmosphere-land model developed by the National Centers for Environmental Prediction (NCEP) (Saha et al. 2006, 2014). Its atmospheric component has a T126 spatial resolution (about 10,000 km² per grid cell); Figure 1 shows the 25 mesh points over the Han River basin. Compared with other seasonal forecast models, CFS demonstrated better seasonal climate forecasting performance (Yuan et al. 2011). This NCEP model became operational in March 2011 and runs at 00, 06, 12 and 18 UTC every day (Saha et al. 2014). The forecast period is nine months, with output at six-hour steps. In addition, NCEP provides nine-month retrospective forecasts initialized every fifth day, beginning on 1 January, over the 29-year period from 1982 to 2010. These data provide a larger sample for robust evaluation.

In this study, daily observed meteorological variables (precipitation, temperature, wind speed, solar radiation and relative humidity) for 1982–2016 were provided by the China National Meteorological Center. The monthly average restored inflows of the three reservoirs (2003–2016) were provided by the Yangtze River Waterway Bureau. The restored monthly inflow equals the observed monthly inflow plus the change in storage of the upstream reservoirs during the month. The underlying surface data used as SWAT inputs include elevation, land use and soil type. The 90 m Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) was provided by NASA and NIMA. The land use data were extracted from the 1:100,000-scale WESTDC_Land_Cover_V.1.0 dataset provided by the Chinese Academy of Sciences. The soil data, acquired in 1995, were provided by the Institute of Soil Science from the second national land survey.
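The restoration rule is a simple monthly water balance. The sketch below assumes that upstream storage is available in m³ at the start and end of the month and converts the storage change into an equivalent mean flow; the function name and numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def restored_inflow(observed_inflow, storage_start, storage_end, days_in_month):
    """Monthly mean restored inflow (m^3/s).

    observed_inflow: observed monthly mean inflow (m^3/s)
    storage_start, storage_end: total upstream reservoir storage (m^3)
    A storage increase (upstream filling) is added back; a drawdown is subtracted.
    """
    storage_change = storage_end - storage_start
    return observed_inflow + storage_change / (days_in_month * SECONDS_PER_DAY)


# Upstream reservoirs gained 2.592e9 m^3 over a 30-day month (1,000 m^3/s on average).
print(restored_inflow(1500.0, 1.0e9, 1.0e9 + 2.592e9, 30))
```

Restoring the inflow in this way removes the effect of upstream regulation, so the SWAT model is calibrated against flows that reflect only natural runoff generation.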

RESULTS AND DISCUSSION

Nine-month precipitation forecasting

SVR and RF regression construction

To evaluate the accuracy of the original CFS, the monthly gridded CFS precipitation forecasts with a one-month lead time from 1982 to 2016 were compared with observed ground precipitation at both the grid and basin scales. The correlation coefficients are 0.66 and 0.72, respectively (Figure 3). These moderate correlations indicate that a simple linear relationship between the CFS forecasts and the observations is weak and that the relationship may be non-linear, although it is currently difficult to determine whether it is linear or non-linear. Under this premise, the advantage of RF and SVR is that both algorithms can describe the relationship without assuming either form. Moreover, comparing the two correlation coefficients shows that the agreement between CFS precipitation and observations is better at the areal scale than at the mesh-point scale. Therefore, RF and SVR were used to process the original areal CFS precipitation in the Han River basin. Because the precipitation at each mesh point affects the basin areal precipitation, this study constructed the regression relationship between the CFS forecasts at all mesh points over the Han River basin and the basin areal precipitation.

Figure 3

The relationship between the gridded CFS precipitation and on-site observed precipitation, and the relationship between the areal CFS precipitation and observed areal precipitation.

Because the accuracy of CFS forecasting varies with lead time, separate regression relationships between the historical CFS forecasts and the observations were constructed for each lead time. In these regressions, the predictors are the historical monthly CFS retrospective precipitation and temperature forecasts at the 25 mesh points, and the dependent variable is the monthly observed areal precipitation of the Han River basin. The regression relationships were then used to reprocess the historical monthly areal CFS precipitation retrospective forecasts. The correlation coefficients between the processed areal CFS precipitation and observed precipitation are shown in Figures 4 and 5. Whether the original CFS was processed by RF or SVR, the R of the processed CFS is greater than 0.91 for all lead times, indicating that both RF and SVR are effective. Furthermore, to quantify the effect of the calibration, the RE was calculated to reflect the accuracy of the processed CFS. Taking the processed CFS with a one-month lead time as an example, the RE was calculated from 1982 to 2016, as shown in Figure 6. Although there are several processing anomalies with significant errors, the median RE is nearly zero, especially during the rainy season from March to September. Therefore, the post-processing result meets the accuracy requirement. The two algorithms have similar effects, and a better algorithm could not be determined; therefore, when processing future CFS precipitation forecasts, it is suggested that the results of the two algorithms be combined.
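The per-lead-time training sets described above can be assembled with a small grouping step. This is a sketch under stated assumptions: the record layout and dictionary keys are hypothetical, two mesh points are used instead of 25 for brevity (with 25 points each row would hold 50 predictors), and fitting the RF or SVR regressor on each (X, y) pair is not shown.

```python
from collections import defaultdict


def build_lead_time_datasets(hindcasts, observed_areal):
    """Group hindcasts into one (X, y) training set per lead time.

    hindcasts: iterable of dicts with hypothetical keys
        "lead"   lead time in months (1-9)
        "target" target month as a (year, month) tuple
        "precip" forecast monthly precipitation at each mesh point (mm)
        "temp"   forecast monthly mean temperature at each mesh point (deg C)
    observed_areal: {(year, month): observed basin areal precipitation (mm)}
    """
    datasets = defaultdict(lambda: ([], []))
    for record in hindcasts:
        if record["target"] not in observed_areal:
            continue  # no observation to train against
        X, y = datasets[record["lead"]]
        # Predictors: precipitation at every mesh point plus the temperature covariate
        X.append(list(record["precip"]) + list(record["temp"]))
        y.append(observed_areal[record["target"]])
    return dict(datasets)


hindcasts = [
    {"lead": 1, "target": (2009, 7), "precip": [150.0, 170.0], "temp": [26.1, 25.4]},
    {"lead": 2, "target": (2009, 7), "precip": [140.0, 160.0], "temp": [25.8, 25.0]},
    {"lead": 1, "target": (2009, 8), "precip": [130.0, 120.0], "temp": [26.5, 26.0]},
    {"lead": 1, "target": (2031, 1), "precip": [10.0, 12.0], "temp": [2.0, 1.5]},
]
observed = {(2009, 7): 162.0, (2009, 8): 118.0}
datasets = build_lead_time_datasets(hindcasts, observed)
print(sorted(datasets), datasets[1][1])
```

Keeping the lead times separate lets each regression learn the skill level appropriate to its horizon instead of averaging over forecasts of very different quality.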

Figure 4

The relationships between the monthly areal CFS precipitation processed with the RF regression algorithm and observed precipitation.

Figure 5

The relationships between the monthly areal CFS precipitation processed by the SVR regression algorithm and observed precipitation.

Figure 6

The RE (%) between the processed CFS precipitation and observed precipitation from 1982 to 2016: (1) and (2) utilize RF; (3) and (4) utilize SVR.

To evaluate the performance of the two algorithms, this study produced nine-month retrospective precipitation predictions issued from December 31, 2010 to March 31, 2015. For each hindcast, the regression relationship was reconstructed using only the historical observations and forecasts available before the retrospective prediction period. Tables 1 and 2 show the validation MAPE for different lead times. The MAPE tends to increase with lead time, which is reasonable, although the increase is not monotonic in every month; it is clearest for December. Moreover, the processing performance is better for the rainy season than for the dry season: in Table 1, for example, the MAPE from May to September is roughly 15–30%, whereas from November to February it exceeds 45%.
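The rolling reconstruction of the regression can be sketched as a filter on the training records followed by one refit per issue date. Treating a record as available once its target month has ended is an assumption of this sketch; the record layout is hypothetical, and `fit` stands for a caller-supplied training function (for example, one wrapping RF or SVR).

```python
import calendar
from datetime import date


def month_end(year, month):
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def training_records_before(records, issue_date):
    """Keep records whose target month had ended by the forecast issue date,
    so that each hindcast is trained without information from its own future."""
    return [r for r in records if month_end(*r["target"]) <= issue_date]


def rolling_hindcast(records, issue_dates, fit):
    """Refit once per issue date using only the records available at that date.

    `fit` is a hypothetical caller-supplied training function returning a model.
    """
    return {d: fit(training_records_before(records, d)) for d in issue_dates}


records = [{"target": (2010, 11)}, {"target": (2010, 12)}, {"target": (2011, 1)}]
models = rolling_hindcast(records, [date(2010, 12, 31), date(2011, 1, 31)], fit=len)
print(models)
```

Passing `len` as the training function simply reports how many records each refit would see, which is a quick way to confirm that no future information leaks into a hindcast.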

Table 1

MAPE of CFS processed by RF regression for different forecasting months (%)

Forecasting period Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
1st month 52 65 23 26 18 22 17 19 27 26 54 94 
2nd month 66 52 24 25 17 24 20 21 30 23 98 143 
3rd month 67 50 29 27 17 23 16 23 29 27 92 182 
4th month 61 48 25 24 17 22 15 26 27 28 80 183 
5th month 67 52 23 24 16 20 16 25 28 30 91 155 
6th month 77 47 22 27 18 25 15 25 30 29 93 181 
7th month 64 54 22 28 18 21 15 24 28 29 97 129 
8th month 76 47 27 23 17 23 15 24 26 24 74 187 
9th month 59 59 27 28 17 22 19 22 29 29 81 111 
Table 2

MAPE of CFS processed by SVR regression for different forecasting months (%)

Forecasting period Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
1st month 106 95 45 37 18 11 12 21 37 94 153 
2nd month 128 107 44 42 20 22 13 20 18 31 110 272 
3rd month 135 87 50 45 15 17 11 23 37 56 154 268 
4th month 121 74 36 41 20 28 22 14 25 35 141 314 
5th month 120 87 42 41 24 30 16 21 24 36 100 268 
6th month 113 97 34 37 22 18 13 31 30 111 326 
7th month 109 86 48 43 18 21 12 17 29 36 81 223 
8th month 126 75 39 31 23 19 22 16 32 130 280 
9th month 100 93 45 36 23 25 15 23 30 38 131 323 

Figures 4 and 5 show that, in the model calibration period, the correlation coefficient from RF is equal to or slightly greater than that from SVR for all lead times except the first month. This suggests that RF may be superior to SVR for CFS precipitation post-processing. However, when the two algorithms were tested on the nine-month (January to September 2017) monthly CFS precipitation forecast issued on December 31, 2016, SVR generally gave a better post-processing result than RF, as shown in Table 3. Specifically, the QR from SVR is higher than that from RF (only the SVR forecast for September 2017 is not qualified), and the QR-MAPE from SVR is lower than that from RF. However, for March, May, August and September, the errors of the RF results are smaller than those of the SVR results, while the opposite holds for the other months. Therefore, for any particular month, it is difficult to determine which method gives the better post-processing result. To reduce forecasting uncertainty, averaging the results of the two algorithms may be good practice.

Table 3

The post-processed nine-month CFS monthly precipitation from January to September 2017

Date (yyyy-mm) Obs (mm) RF corrected CFS (mm) RF AA RF RE (%) SVR corrected CFS (mm) SVR AA SVR RE (%)
2017-01 17.7 19.5 Q 10.4 16.7 Q –5.9
2017-02 22 28.7 Q 30.4 25.7 Q 17.0
2017-03 53.9 45.7 Q –15.2 40.9 Q –24.1
2017-04 77.9 99.8 NQ n/a 78.5 Q 0.7
2017-05 80.4 94.0 Q 17.0 100.6 Q 25.2
2017-06 119.8 144.9 Q 20.9 134.0 Q 11.9
2017-07 109.1 157.8 NQ n/a 143.0 Q 31.0
2017-08 140.1 138.5 Q –1.2 143.7 Q 2.6
2017-09 211.6 136.1 NQ n/a 111.4 NQ n/a
QR (%) RF: 67; SVR: 89
QR-MAPE (%) RF: 15.8; SVR: 14.8
AA: Q = qualified, NQ = not qualified; RE is listed only for qualified forecasts.

Typical monthly precipitation scaling

The daily subbasin-scale precipitation forecast was obtained by multiplying the weighting coefficients of the typical month by the processed monthly areal CFS precipitation according to Equations (3)–(5). For the historical observation model, the typical period was selected with the hydrological similarity measure, and the observed daily precipitation from January to September 2015 was identified as the most similar historical spatio-temporal distribution. Figures 7 and 8 show the spatial and temporal distributions, respectively, of the forecasted precipitation in June 2017 obtained from the historical similar observed precipitation and from the original CFS precipitation forecast. These results indicate considerable differences between the similarity-analysis method and the original CFS method in both the temporal and the spatial distribution.

Figure 7

The forecasting spatial distribution in June 2017 based on the typical monthly precipitation scaling: (a) historical similar observed precipitation and (b) original CFS precipitation forecast.

Figure 8

The temporal distribution of daily precipitation in June 2017 based on the typical monthly precipitation scaling: (a) historical similar observed precipitation and (b) original CFS precipitation forecast.

Monthly streamflow forecasting

SWAT model construction

In the SWAT model, the Han River basin was divided into 64 subbasins, which were further subdivided into 367 HRUs according to LULC. The areal average precipitation of a subbasin is the precipitation input unit of the SWAT model. The other meteorological elements of each subbasin, including temperature, wind speed, solar radiation and relative humidity, were obtained by inverse distance weighting interpolation of the data measured at 12 observation gauges.
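Inverse distance weighting can be sketched in a few lines. The weighting power of 2 and the use of planar (projected) coordinates are assumptions of this sketch, since the text does not state them; the gauge locations and values are hypothetical.

```python
import math


def idw(target, gauges, power=2.0):
    """Inverse distance weighted estimate at `target` = (x, y).

    gauges: list of (x, y, value) in the same planar (projected) coordinates.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in gauges:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a gauge
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical gauges (km coordinates) with daily mean temperature (deg C)
gauges = [(0.0, 0.0, 20.0), (10.0, 0.0, 24.0), (0.0, 10.0, 18.0)]
print(idw((5.0, 5.0), gauges))
```

In practice the target would be a subbasin centroid; a higher power makes the estimate follow the nearest gauge more closely, while a lower power smooths toward the regional mean.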

The restored monthly reservoir inflows for 2003–2012 were used to calibrate the SWAT model parameters, and data from 2013 to 2016 were used for validation. The Sequential Uncertainty Fitting algorithm, version 2 (SUFI-2) (Arnold et al. 2012), in the SWAT-CUP program (Noori & Kalin 2016) was used to calibrate the 19 target SWAT parameters automatically (Table 4). To keep some parameters physically meaningful, the automatically calibrated values were then adjusted manually. Finally, the optimal SWAT parameter set for each of the three regions (Shiquan watershed, Shiquan-Ankang intermediate area and Ankang-Danjiangkou intermediate area) was obtained, as shown in Table 5. Figure 9 shows the simulated monthly discharge for the three reservoirs in the calibration and validation periods. To evaluate the accuracy of the model simulation, the NSE was calculated. The NSE values for the Shiquan, Ankang and Danjiangkou reservoirs are 0.86, 0.88 and 0.84, respectively, in the calibration period and 0.73, 0.73 and 0.66 in the validation period. Therefore, the SWAT model provides acceptable accuracy for monthly streamflow simulation in the Han River basin in both the calibration and validation periods.

Table 4

Summary of the 19 calibration parameters of the SWAT model in the Han River basin

Parameter | Input file | Units | Default value | Range | Description
CN2ᵃ | .mgt | – | HRU-specific | ±20% | SCS runoff curve number
ALPHA_BNK | .rte | day | 0.048 | 0–1 | Base flow factor for bank storage
ESCO | .hru | – | 0.95 | 0.01–1 | Soil evaporation compensation factor
SOL_BDᵃ | .sol | Mg/m³ | Soil data | ±50% | Soil bulk density
SOL_Kᵃ | .sol | mm/h | Soil data | ±80% | Saturated hydraulic conductivity of the soil
REVAPMN | .gw | mm | | 0–1000 | Threshold water depth in the shallow aquifer for ‘revap’ to occur
GW_DELAY | .gw | day | 31 | 0–500 | Groundwater delay
CH_N2 | .rte | – | 0.014 | –0.01 to 0.3 | Channel Manning coefficient
CH_K2 | .rte | mm/h | | –0.01 to 500 | Channel effective hydraulic conductivity
TIMP | .bsn | – | | 0.01–1 | Snow pack temperature lag factor
TLAPS | .sub | °C/km | | –10 to 0 | Temperature lapse rate
SURLAG | .bsn | day | | 1–24 | Surface runoff lag time
LAT_TTIME | .hru | day | | 0–180 | Lateral flow travel time
ALPHA_BF | .gw | day | 0.048 | 0–1 | Base flow recession constant
SOL_AWCᵃ | .sol | – | Soil data | ±20% | Available water capacity of the soil layer
EPCO | .hru | – | | 0.01–1 | Plant uptake compensation factor
GW_REVAP | .gw | – | 0.02 | 0.02–0.2 | Groundwater ‘revap’ coefficient
CANMX | .hru | mm | | 0–100 | Maximum canopy storage
GWQMN | .gw | mm | | 0–5000 | Threshold water depth in the shallow aquifer for return flow to occur

ᵃ These parameters are varied as a percentage of their default values to maintain their relative spatial variability.
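
The table footnote means that CN2, SOL_BD, SOL_K and SOL_AWC are calibrated as relative changes rather than absolute values. A minimal sketch, assuming the multiplicative ‘relative’ convention of SWAT-CUP, new value = default × (1 + r); the function name and the HRU curve numbers are hypothetical, and r = –0.0854 is the Shiquan CN2 value from Table 5.

```python
def apply_relative_change(default_values, r):
    """Scale each HRU's default value by (1 + r), as in a relative change.

    Every HRU is multiplied by the same factor, so the ratios between
    HRUs (their relative spatial variability) are preserved.
    """
    factor = 1.0 + r
    return [value * factor for value in default_values]


# Hypothetical default CN2 values for three HRUs in the Shiquan watershed
default_cn2 = [70.0, 75.0, 82.0]
calibrated_cn2 = apply_relative_change(default_cn2, -0.0854)
print([round(cn, 3) for cn in calibrated_cn2])
```

Under this convention, a range of ±20% in Table 4 corresponds to r between –0.2 and 0.2.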

Table 5

The optimal parameter sets of the SWAT model for the three regions

Parameter | Shiquan watershed | Shiquan-Ankang intermediate area | Ankang-Danjiangkou intermediate area
SURLAG | 0.325425 | 9.594075 | 9.857525
CN2 | –0.0854 | 0.1038 | 0.1986
SOL_K | 0.5832 | 0.4376 | 0.7384
SOL_BD | 0.26615 | 0.40805 | 0.14625
SOL_AWC | 0.0979 | 0.3037 | 0.1501
GW_DELAY | 186.75 | 200.75 | 398.75
ALPHA_BF | 0.9575 | 0.6715 | 0.0785
GWQMN | 912.5 | 1307.5 | 987.5
GW_REVAP | 0.09137 | 0.06275 | 0.02675
REVAPMN | 852.5 | 350.5 | 866.5
ALPHA_BNK | 0.6685 | 0.2005 | 0.7705
CH_N2 | 0.249315 | 0.225755 | 0.001625
CH_K2 | 380.7476 | 23.24047 | 275.7455
CH_N1 | 26.56614 | 22.78741 | 24.82673
CH_K1 | 220.35 | 48.15 | 34.95
LAT_TTIME | 5.49 | 149.49 | 1.71
CANMX | 92.35 | 3.35 | 13.65
ESCO | 0.143155 | 0.060985 | 0.233245
EPCO | 0.598555 | 0.680725 | 0.454015
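
SUFI-2 explores the parameter ranges in Table 4 by Latin hypercube sampling: each range is split into as many equal strata as there are simulations, one value is drawn from each stratum, and the strata are paired randomly across parameters. The sketch below shows only this sampling step for three parameters taken from Table 4; the model runs, objective function and iterative range updating of SUFI-2 are omitted, and the function name is hypothetical.

```python
import random


def latin_hypercube(ranges, n_samples, seed=None):
    """Draw `n_samples` parameter sets by Latin hypercube sampling.

    `ranges` maps parameter names to (low, high). Each range is cut into
    `n_samples` equal strata, one value is drawn uniformly inside each
    stratum, and each parameter's values are shuffled independently.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)  # random pairing of strata across parameters
        columns[name] = values
    return [{name: columns[name][k] for name in ranges} for k in range(n_samples)]


# Ranges copied from Table 4 for three of the 19 parameters
ranges = {"GW_DELAY": (0.0, 500.0), "ALPHA_BF": (0.0, 1.0), "CH_N2": (-0.01, 0.3)}
samples = latin_hypercube(ranges, n_samples=5, seed=42)
for parameter_set in samples:
    print({name: round(value, 3) for name, value in parameter_set.items()})
```

Seeding `random.Random` keeps the draw reproducible, which is convenient when the same parameter sets must be rerun.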
Figure 9

Observed and simulated monthly discharge in the three reservoirs in the calibration and validation periods: (a) Shiquan reservoir, (b) Ankang reservoir and (c) Danjiangkou reservoir.

Experimental monthly streamflow forecasting for the CRSHR

Using the post-processed nine-month CFS monthly precipitation forecasts for January–September 2017 (see ‘Nine-month precipitation forecasting’), we evaluated the monthly streamflow forecasts for January–September 2017 for the three areas of the CRSHR with the long-term forecasting scheme proposed in this study. For comparison, the daily subbasin precipitation was derived with two spatio-temporal distribution models: scaling of the original CFS daily precipitation and scaling of similar historical observations. The other forecast meteorological inputs (temperature, wind speed, solar radiation and relative humidity) were produced by the weather generator. All forecast meteorological inputs were fed into each subbasin to produce the monthly streamflow at the three outlets (Shiquan, Ankang and Danjiangkou), as shown in Figure 10. The QR and QR-MAPE values are summarized in Table 6. Compared with the streamflow from the original CFS spatio-temporal distribution, the streamflow from the similar historical observed spatio-temporal distribution had a QR 11 percentage points higher and a QR-MAPE 15 percentage points lower for the Shiquan reservoir; for the Ankang reservoir, the QR was unchanged while the QR-MAPE was 8 percentage points lower; and for the Danjiangkou reservoir, the QR was 23 percentage points higher and the QR-MAPE 28 percentage points lower. Therefore, in this experiment, the streamflow forecast based on the similar historical observed spatio-temporal distribution of precipitation was more accurate than that based on the original CFS spatio-temporal distribution for all three reservoirs.
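
Both distribution models rely on typical monthly precipitation scaling: a template month of daily precipitation (either the original CFS daily forecast or a similar historical observed month) is rescaled so that its total matches the post-processed monthly forecast. A minimal sketch, assuming a single multiplicative factor per subbasin and month; the function name, the short template (a real month would have 28–31 days) and the monthly total are hypothetical.

```python
def scale_template_month(template_daily, monthly_total):
    """Rescale a template month's daily precipitation (mm) to a monthly total (mm).

    The wet/dry day pattern and the relative daily amounts of the template
    are kept; only the magnitude changes.
    """
    template_total = sum(template_daily)
    if template_total <= 0.0:
        # A dry template cannot be scaled multiplicatively.
        raise ValueError("template month has no precipitation")
    factor = monthly_total / template_total
    return [p * factor for p in template_daily]


# Hypothetical short template (sums to 50 mm) and forecast total of 80 mm
template = [0.0, 12.0, 3.0, 0.0, 25.0, 10.0]
daily = scale_template_month(template, 80.0)  # scaling factor 1.6
print([round(p, 1) for p in daily])  # [0.0, 19.2, 4.8, 0.0, 40.0, 16.0]
```

Raising an error for a dry template is a choice made in this sketch; an operational scheme would need its own rule for that case, for example selecting another similar month.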

Table 6

QR and QR-MAPE of the monthly streamflow forecasting in the three reservoirs

Reservoir | Original CFS distribution: QR (%) | Original CFS distribution: QR-MAPE (%) | Similar historical observed distribution: QR (%) | Similar historical observed distribution: QR-MAPE (%)
Shiquan reservoir | 67 | 27 | 78 | 12
Ankang reservoir | 78 | 23 | 78 | 15
Danjiangkou reservoir | 44 | 45 | 67 | 17
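
QR and QR-MAPE are defined earlier in the paper. As a hedged illustration, the sketch below assumes that QR is the share of forecast months whose relative error lies within a ±20% tolerance (an assumed threshold) and computes an ordinary MAPE alongside it; it does not reproduce the paper's QR-MAPE. If QR counts months, the values in Table 6 are consistent with 6, 7 and 4 qualified months out of nine (67%, 78% and 44%). The function names and monthly flows are hypothetical.

```python
def qualified_rate(observed, forecast, tolerance=0.20):
    """Share of forecasts whose relative error is within `tolerance` (assumed 20%)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(
        1 for o, f in zip(observed, forecast) if abs(f - o) <= tolerance * abs(o)
    )
    return hits / len(observed)


def mean_absolute_percentage_error(observed, forecast):
    """Ordinary MAPE as a fraction (multiply by 100 for %)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(f - o) / abs(o) for o, f in zip(observed, forecast)) / len(observed)


# Hypothetical observed and forecast monthly inflows (m³/s), January to September
observed = [50.0, 60.0, 80.0, 120.0, 200.0, 300.0, 400.0, 350.0, 250.0]
forecast = [55.0, 80.0, 70.0, 130.0, 260.0, 280.0, 300.0, 380.0, 240.0]
print(round(100 * qualified_rate(observed, forecast)))  # 67 (6 of 9 months)
print(round(100 * mean_absolute_percentage_error(observed, forecast), 1))  # 15.4
```
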
Figure 10

Monthly streamflow forecast at the three outlets: (a) Shiquan reservoir, (b) Ankang reservoir and (c) Danjiangkou reservoir.

CONCLUSIONS

In this study, a nine-month streamflow forecast for the CRSHR was produced by driving the SWAT model with CFS output. The main findings are summarized as follows:

  1. The NCEP Climate Forecast System outputs were applied to long-term streamflow forecasting. Two machine learning approaches, RF and SVR, were used to post-process the CFS precipitation forecasts at different lead times. Regardless of the method (RF or SVR) and the lead time (up to nine months), the correlation coefficients between the processed CFS and observed precipitation were at least 0.91. Post-processing performed better in the rainy season (May to August) than in the dry season. Moreover, with post-processing, increasing the lead time did not reduce the accuracy of the processed forecast. The QR and MAPE of the test results indicated that both processing methods met the accuracy requirement, but neither was clearly superior. To reduce uncertainty, the final precipitation forecast in this paper is the average of the results from the two algorithms.

  2. To forecast the streamflow of the studied cascade reservoir system, separate SWAT parameter sets for the three regions are recommended so as to reflect their different rainfall-runoff mechanisms. To account for the spatio-temporal distribution of precipitation, the daily precipitation in each subbasin was determined by typical monthly precipitation scaling. The choice of spatio-temporal distribution model had a marked impact on the streamflow forecast results: for the long-term streamflow forecast, the distribution model based on hydrological similarity analysis (similar historical observations) outperformed the original CFS distribution.
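
Item 1 combines the two post-processed precipitation forecasts by averaging and evaluates them by their correlation with observations. A minimal sketch of those two steps follows; the RF and SVR models themselves are not reproduced, and the function names and monthly areal precipitation values are hypothetical.

```python
import math


def ensemble_mean(*forecasts):
    """Element-wise mean of several equal-length forecast series."""
    length = len(forecasts[0])
    if any(len(f) != length for f in forecasts):
        raise ValueError("forecast series must have equal length")
    return [sum(values) / len(values) for values in zip(*forecasts)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly areal precipitation (mm), January to September
observed = [12.0, 20.0, 45.0, 70.0, 110.0, 130.0, 160.0, 140.0, 95.0]
rf_forecast = [15.0, 18.0, 50.0, 65.0, 100.0, 140.0, 150.0, 150.0, 90.0]
svr_forecast = [10.0, 25.0, 40.0, 80.0, 115.0, 120.0, 170.0, 130.0, 100.0]

final_forecast = ensemble_mean(rf_forecast, svr_forecast)
print(round(pearson_r(final_forecast, observed), 3))
```

An equal-weight average is the simplest way to combine the two algorithms; weighted combinations are possible but are not what item 1 describes.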

Finally, a long-term streamflow forecasting scheme for the CRSHR was developed by combining CFS precipitation post-processing with the SWAT model. This scheme offers a new approach to coupled atmospheric-hydrological streamflow forecasting, and the proposed framework could be applied to forecast long-term streamflow for other cascade reservoir systems with similar characteristics.

ACKNOWLEDGEMENTS

This paper was jointly supported by the National Key Research and Development Program of China (2016YFC0402706, 2016YFC0402707), the Fundamental Research Funds for the Central Universities (2017B611X14), the Key Program of National Natural Science Foundation of China (41730750), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0415).

REFERENCES

Adnan, R. M., Yuan, X., Kisi, O. & Yuan, Y. 2017 Streamflow forecasting using artificial neural network and support vector machine models. Am. Sci. Res. J. Eng. Technol. Sci. (ASRJETS) 29, 286–294.
Anghileri, D., Voisin, N., Castelletti, A., Pianosi, F., Nijssen, B. & Lettenmaier, D. P. 2016 Value of long-term streamflow forecasts to reservoir operations for water supply in snow-dominated river catchments. Water Resour. Res. 52, 4209–4225.
Anton, H. & Rorres, C. 1994 Elementary Linear Algebra: Applications Version, 7th edn. John Wiley & Sons, Hoboken, NJ, USA, pp. 170–171.
Arnold, J. G., Srinivasan, R., Muttiah, R. S. & Williams, J. R. 1998 Large-area hydrologic modeling and assessment: part I model development. J. Am. Water Resour. Assoc. 34, 73–89.
Arnold, J. G., Kiniry, J. R., Srinivasan, R., Williams, J. R., Haney, E. B. & Neitsch, S. L. 2011 Soil and Water Assessment Tool Input/Output File Documentation Version 2009. Texas Water Resources Institute Technical Report, Texas, USA.
Arnold, J. G., Moriasi, D. N., Gassman, P. W., Abbaspour, K. C., White, M. J., Srinivasan, R., Santhi, C., Harmel, R. D., van Griensven, A., Van Liew, M. W., Kannan, N. & Jha, M. K. 2012 SWAT: model use, calibration, and validation. Trans. ASABE 55, 1491–1508.
Bauer, P., Thorpe, A. & Brunet, G. 2015 The quiet revolution of numerical weather prediction. Nature 525, 47–55.
Breiman, L. 1996a Out-Of-Bag Estimation. CiteSeer: Technical Report 513. University of California, Department of Statistics, Berkeley, CA, USA.
Breiman, L. 1996b Bagging predictors. Mach. Learn. 24 (2), 123–140.
Chen, J., Brissette, F. P. & Lucas-Picher, P. 2015 Assessing the limits of bias-correcting climate model outputs for climate change impact studies. J. Geophys. Res. Atmos. 120, 1123–1136.
Dariane, A. B., Farhani, M. & Azimi, S. 2018 Long term streamflow forecasting using a hybrid entropy model. Water Resour. Manage. 32, 1439–1451.
Hong, M., Wang, D., Wang, Y., Zeng, X., Ge, S., Yan, H. & Singh, V. P. 2016 Mid- and long-term runoff predictions by an improved phase-space reconstruction model. Environ. Res. 148, 560–573.
Hulme, M. 1994 Validation of large-scale precipitation fields in General Circulation Models. In: Global Precipitations and Climate Change (Desbois, M. & Désalmand, F., eds). NATO ASI Book, Springer Verlag, Berlin, Heidelberg, pp. 387–406.
Ines, A. V. M. & Hansen, J. W. 2006 Bias correction of daily GCM rainfall for crop simulation studies. Agric. For. Meteorol. 138, 44–53.
Kardhana, H., Arya, D. K., Hadihardaja, I. K., Widyaningtyas, E. R. & Lubis, A. 2017 Small hydropower spot prediction using SWAT and a diversion algorithm, case study: Upper Citarum Basin. In: Proceedings of the 3rd International Conference on Construction and Building Engineering (ICONBUILD), American Institute of Physics, Palembang, Indonesia.
Kuhn, H. W. & Tucker, A. W. 1951 Nonlinear programming. In: Proceedings of 2nd Berkeley Symposium. University of California Press, Berkeley, California, pp. 481–492.
Li, B., Yu, Z., Liang, Z. & Acharya, K. 2014a Hydrologic response of a high altitude glacierized basin in the central Tibetan Plateau. Glob. Planet. Change 118, 69–84.
Li, J., Chen, Y., Wang, H., Qin, J., Li, J. & Chiao, S. 2017 Extending flood forecasting lead time in a large watershed by coupling WRF QPF with a distributed hydrological model. Hydrol. Earth Syst. Sci. 21, 1279–1294.
Liang, Z., Li, Y., Hu, Y., Li, B. & Wang, J. 2018a A data-driven SVR model for long-term runoff prediction and uncertainty analysis based on the Bayesian framework. Theor. Appl. Climatol. 133 (1–2), 137–149.
Neitsch, S. L., Arnold, J. G., Kiniry, J. R. & Williams, J. R. 2011 Soil and Water Assessment Tool Theoretical Documentation Version 2009. Texas Water Resources Institute Technical Report, Texas.
Nguyen-Tien, V., Elliott, R. J. R. & Strobl, E. A. 2018 Hydropower generation, flood control and dam cascades: a national assessment for Vietnam. J. Hydrol. 560, 109–126.
Piani, C., Haerter, J. O. & Coppola, E. 2010 Statistical bias correction for daily precipitation in regional climate models over Europe. Theor. Appl. Climatol. 99, 187–192.
Saha, S., Nadiga, S., Thiaw, C. & Wang, J. 2006 The NCEP climate forecast system. J. Clim. 19, 3483–3517.
Saha, S., Moorthi, S., Wu, X., Wang, J., Nadiga, S., Tripp, P., Behringer, D., Hou, Y.-T., Chuang, H.-Y., Iredell, M., Ek, M., Meng, J., Yang, R., Peña Mendez, M., van den Dool, H., Zhang, Q., Wang, W., Chen, M. & Becker, E. 2014 The NCEP climate forecast system version 2. J. Clim. 27, 2185–2208.
Schölkopf, B., Sung, K.-K., Burges, C. J. C., Girosi, F. P., Niyogi, T. P. & Vapnik, V. 1997 Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans. Signal Process. 45, 2758–2765.
Sivapalan, M., Beven, K. & Wood, E. F. 1987 On hydrologic similarity: 2. A scaled model of storm runoff production. Water Resour. Res. 23, 2266–2278.
Trenberth, K. E. & Shea, D. J. 2005 Relationships between precipitation and surface temperature. Geophys. Res. Lett. 32, L14703.
Tripathi, S., Srinivas, V. V. & Nanjundiah, R. S. 2006 Downscaling of precipitation for climate change scenarios: a support vector machine approach. J. Hydrol. 330, 621–640.
Vandal, T., Kodra, E. & Ganguly, A. R. 2017 Intercomparison of machine learning methods for statistical downscaling: the case of daily and extreme precipitation. Available from: https://arxiv.org/abs/1702.04018v1.
Vapnik, V. N. 2000 The Nature of Statistical Learning Theory, 2nd edn. Springer, Heidelberg.
Wilby, R. L., Wigley, T. M. L., Conway, D., Jones, P. D., Hewitson, B. C., Main, J. & Wilks, D. S. 1998 Statistical downscaling of general circulation model output: a comparison of methods. Water Resour. Res. 34, 2995–3008.
Wilby, R. L., Hay, L. E., Gutowski, W. J., Jr., Arritt, R. W., Takle, E. S., Pan, Z., Leavesley, G. H. & Clark, M. P. 2000 Hydrological responses to dynamically and statistically downscaled climate model output. Geophys. Res. Lett. 27, 1199–1202.
Winchell, M., Srinivasan, R., DiLuzio, M. & Arnold, J. 2011 ArcSWAT Interface for SWAT 2009 User's Guide. Texas Agricultural Experiment Station and United States Department of Agriculture, Texas, USA.
Wood, A. W., Maurer, E. P., Kumar, A. & Lettenmaier, D. P. 2002 Long-range experimental hydrologic forecasting for the eastern United States. J. Geophys. Res. 107 (D20), 4429.
Wood, A. W., Leung, L. R., Sridhar, V. & Lettenmaier, D. 2004 Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Clim. Change 62, 189–216.
Yang, J., Abbaspour, K. C., Reichert, P. & Yang, H. 2008 Comparing uncertainty analysis techniques for a SWAT application to Chaohe Basin in China. J. Hydrol. 358 (1–2), 1–23.
Yuan, X., Wood, E. F., Luo, L. & Pan, M. 2011 A first look at Climate Forecast System version 2 (CFSv2) for hydrological seasonal prediction. Geophys. Res. Lett. 38, L13402.