Abstract
To alleviate the computational burden of parameter calibration for the Variable Infiltration Capacity (VIC) model, a stepwise surrogate model (SM) is developed based on AdaBoost. The SM first picks out the parameter sets whose objective function values lie in the range close to the optimization objectives and then approximates the objective function values for these parameter sets. The ɛ-NSGA II (Nondominated Sorting Genetic Algorithm II) algorithm is used to search for the optimal solutions of the SM. The SM is tested in a case study of the upper Brahmaputra River basin, Tibetan Plateau, China. The results show that the stepwise SM performs well, with a rate of misclassification of less than 2.56% in the global simulation step and a root mean square error of less than 0.0056 in the local simulation step. With no large difference in the optimal solutions between VIC and the SM, the SM-based algorithm saves up to 90% of the calibration time.
HIGHLIGHTS
A stepwise surrogate model (SM) for VIC parameter calibration is built with AdaBoost.
The SM first simulates the response surface coarsely over the entire parameter space and then simulates it precisely in the crucial range.
The surrogate-based algorithm improves the efficiency of VIC parameter calibration remarkably.
INTRODUCTION
Distributed hydrological models are widely used to study the effects of climate change on the hydrological cycle (Leng et al. 2015; Lu et al. 2018), streamflow forecasting (Schumann et al. 2013; Mo & Lettenmaier 2014), and water resources assessment (Xu et al. 2015; Sridhar et al. 2019). Calibrating the parameters of a distributed hydrological model automatically usually requires substantial computational power, as the model often has to run thousands of times. A surrogate model (SM) is one of the most important measures for alleviating this computational burden (Maier et al. 2014).
A number of studies (Yan & Minsker 2006; Kourakos & Mantoglou 2009; Teweldebrhan et al. 2019; Xing et al. 2019) have used SMs to handle the problem of parameter optimization for complex models in several engineering fields, and these applications show the efficiency of SMs in optimization. The artificial neural network (ANN) is one of the most commonly used methods to construct SMs (Behzadian et al. 2009; Yalçın et al. 2018; Sayers et al. 2019); it can reduce the number of model simulations required without reducing the accuracy of the optimization results. Besides ANN, the radial basis function (RBF; Christelis et al. 2016), support vector machine (Zhang et al. 2009), and Gaussian process regression (Zhang et al. 2016) are also widely employed as SMs. When data-driven or black-box models are used in this way to estimate the relationship between the input parameters of a physical model and its outputs, the approach is known as a response surface surrogate (Razavi et al. 2012a).
The SMs referred to in most studies are global response surface surrogates, meaning that they simulate the outputs over the entire parameter space. However, global approaches converge slowly when the outputs are relatively flat around the optimal solution (Wang et al. 2004). Local response surface surrogates have therefore been proposed to solve this problem. Bliznyuk et al. (2008) applied RBFs to surrogate the physical model in the high posterior density region when calibrating parameters with Markov Chain Monte Carlo (MCMC). However, that algorithm simulated only one region of the entire response surface, which makes it difficult to apply to model parameter calibration coupled with a global optimization algorithm, such as the Nondominated Sorting Genetic Algorithm II (NSGA II). In contrast, some studies (Regis & Shoemaker 2004; Razavi et al. 2012b) mainly evaluated offspring with approximated functions in a single-objective genetic algorithm (GA) and re-evaluated the most promising ones with the original model. However, the practicality and efficiency of this algorithm, which does not update the approximated functions, remain unclear for more complex cases such as multi-objective optimization.
In distributed hydrological model parameter calibration, several studies (Zhang et al. 2009; Gong et al. 2016; Teweldebrhan et al. 2019) adopted black-box models as SMs for hydrological models and then linked the SMs with optimization algorithms to calibrate the parameters of the distributed models. In these studies, the use of a global response surface surrogate avoids the loss of local optimal solutions, which are usually distributed discretely across the entire parameter space because of the equifinality of different parameter sets in hydrological models (Khatami et al. 2019; Liu et al. 2019). However, the slow convergence of global approaches in the local optimal regions remains unsolved. Adding a physical layer between the input parameters and output objectives to convert the highly nonlinear response into a linear one is one way to address this problem (Razavi & Tolson 2013; Sun et al. 2017), but the added physical layer reduces the efficiency of the SM algorithm.
In our study, a stepwise SM is proposed to improve the efficiency of distributed hydrological model parameter calibration. With global surrogates approximating the response surface over the entire parameter space and local surrogates adding detail to the response surface in the local optimal regions, the approach addresses the problem of slow convergence and avoids the loss of local optimal solutions. A self-adaptive updating scheme is also introduced into the SM-based parameter calibration procedure. This study mainly investigates the improvement in efficiency brought by the SM in multi-objective optimization.
STUDY AREA AND DATA
The upper Brahmaputra River (UBR) basin is located in the southern part of the Tibetan Plateau, Southwest China, with a drainage area of approximately 1.8 × 10⁵ km² controlled by the Yangcun hydrologic station. The UBR is the longest plateau river in China and one of the highest rivers in the world, with a mean basin altitude over 4,000 m. The river originates on the Jiemayangzong Glacier, at an elevation of 5,590 m above sea level on the northern side of the Himalayas (Zeng et al. 2018). The altitudes of the UBR basin range from 3,509 to 7,185 m. Precipitation, groundwater, and snow (including glacier) melt water are the three main sources of water in the UBR basin. The annual average precipitation is about 560 mm and varies only slightly from year to year, while the monthly runoff distribution is highly inhomogeneous, with most of the precipitation occurring from June to September. The annual average temperature in the basin is about 6.27 °C. In general, the highest temperatures of the year occur in June and July and the lowest in January. As a result, heavy precipitation is typically accompanied by large amounts of snow melt water.
Figure 1 shows the basin location, elevations, weather stations, and the two hydrologic outlets used in this study. The gauged meteorological data, including daily precipitation, daily average air temperature, wind speed, relative humidity, and atmospheric pressure, are collected from 23 China National Meteorological Observatory stations located in and around the UBR basin, shown as black points in Figure 1. The daily runoff series of two gauge stations (black triangles), i.e. Nugesha station (NGS) and Yangcun station (YC), are utilized in this study. The digital elevation model data with 90 m resolution are from the NASA Shuttle Radar Topography Mission; the soil data set is the Harmonized World Soil Database version 1.1 (HWSD v1.1; Fischer et al. 2008) at 1 km resolution; the landcover data set is the WESTDC landcover data set version 2.0 (Ran et al. 2010) at 1 km resolution. Due to the intermittence of discharge data at NGS and YC from 2005 to 2009, the meteorological and discharge data of the six years from 1999 to 2004 are employed for calibration and those of the six years from 2010 to 2015 for validation.
METHODS
Hydrological model
The Variable Infiltration Capacity (VIC) Macroscale Hydrological Model was originally developed by Liang et al. (1994, 1996) at the University of Washington for macroscale catchments. The model simulates runoff grid by grid. However, the VIC model itself does not include a routing model to route the runoff of each grid cell to the outlet station. Therefore, the routing model of Lohmann et al. (1996, 1998), a source-to-sink model that solves the linearized Saint-Venant equations, is used to calculate the streamflow at the outlet station; a conceptual sketch of this source-to-sink idea is given below.
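As a minimal illustration of source-to-sink routing, the sketch below convolves each grid cell's runoff with an impulse-response function and sums the responses at the outlet. The exponential response and the `travel_days` inputs are placeholders for illustration, not the actual linearized Saint-Venant solution used by the Lohmann model.

```python
# Conceptual source-to-sink routing sketch (placeholder impulse response).
import numpy as np

def route_to_outlet(cell_runoff, travel_days, k=2.0, n_uh=30):
    """cell_runoff: (n_cells, n_days) runoff; travel_days: per-cell lag to
    the outlet (assumed < n_uh so the response is nonzero)."""
    t = np.arange(n_uh)
    outlet = np.zeros(cell_runoff.shape[1] + n_uh - 1)
    for q, lag in zip(cell_runoff, travel_days):
        uh = np.exp(-(t - lag) / k) * (t >= lag)  # placeholder unit hydrograph
        uh /= uh.sum()                            # conserve mass
        outlet += np.convolve(q, uh)              # source-to-sink convolution
    return outlet[: cell_runoff.shape[1]]
```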
In this study, the land surface simulation uses VIC (version 4.2.d) at a six-hourly time step in water and energy balance modes. Since the study basin is located in the alpine region, the snow (Bowling et al. 2004; Andreadis et al. 2009) and frozen soil (Cherkauer & Lettenmaier 1999; Cherkauer et al. 2003) algorithms are adopted. Since the percentage of glaciers in the study basin is less than 5% (Xuan et al. 2018), the glacier melt water is simulated in the snow module coupled with the snow melt water.
The study area is divided into 0.25° × 0.25° grid cells, in which surface characteristics such as elevation, soil, and vegetation are specified (Nijssen et al. 2001). The gauged meteorological data are interpolated onto the 0.25° × 0.25° grid using the Inverse Distance Weighted (IDW) method coupled with elevation-based lapse rates, set to 0.6 mm/km for precipitation and −6.5 °C/km for temperature (Cuo et al. 2014).
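A minimal sketch of this gridding, assuming hypothetical station and grid arrays: each station value is first moved to the grid-cell elevation with the fixed lapse rate and then averaged with inverse-distance weights.

```python
# IDW interpolation with an elevation lapse-rate correction (sketch).
import numpy as np

def idw_with_lapse(stn_xy, stn_z, stn_val, grid_xy, grid_z, lapse, power=2.0):
    """lapse is per km: 0.6 for precipitation (mm/km), -6.5 for temperature (C/km)."""
    vals = np.empty(len(grid_xy))
    for i, (gxy, gz) in enumerate(zip(grid_xy, grid_z)):
        d = np.linalg.norm(stn_xy - gxy, axis=1) + 1e-9
        w = 1.0 / d**power                         # inverse-distance weights
        adjusted = stn_val + lapse * (gz - stn_z) / 1000.0  # elevation correction
        vals[i] = np.sum(w * adjusted) / np.sum(w)
    return vals
```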
There are numerous parameters in the VIC model. However, a large proportion of them are usually set to default values and some adopt measured values (such as most vegetation parameters). As a result, the parameters selected for calibration are usually soil parameters that are not easy to measure. Generally, the parameters selected for calibration are the thickness of the second and third soil moisture layers (d2, d3), the fraction of maximum soil moisture where nonlinear baseflow occurs (Ws), the maximum velocity of baseflow (Dsmax), the fraction of Dsmax where nonlinear baseflow begins (Ds), and the variable infiltration curve parameter (binf) (Lohmann et al. 1996; Xie et al. 2007; Mishra et al. 2010; Voisin et al. 2011; Lu et al. 2013; Zhao et al. 2013; Kang et al. 2014; Hengade & Eldho 2016; Liu et al. 2017). Some studies also calibrated additional parameters according to catchment conditions, such as routing parameters (Parada et al. 2003), snow parameters (Isenstein et al. 2015), and vegetation parameters (Newman et al. 2017). In our work, the calibration period is close to the period of the Global Land Data Assimilation System (GLDAS) data, which were used to estimate default values of the leaf area index and minimum stomatal resistance instead of calibrating them. Following the suggestions of previous work, eight VIC parameters are selected for calibration in our study: d2, d3, binf, Ws, Ds, Dsmax, the surface roughness of the snowpack (Sr), and the mean flow velocity (v). The feasible ranges of these parameters are given in Table 1.
Table 1 | Descriptions and feasible ranges of the VIC parameters selected for calibration

| Type | Parameter | Description | Unit | Range |
| --- | --- | --- | --- | --- |
| Soil | binf | Variable infiltration curve parameter | – | 0–0.4 |
| Soil | Dsmax | Maximum velocity of baseflow | mm/day | 0–30 |
| Soil | Ds | Fraction of Dsmax where nonlinear baseflow begins | Fraction | 0–1 |
| Soil | Ws | Fraction of maximum soil moisture where nonlinear baseflow occurs | Fraction | 0–1 |
| Soil | d2 | Thickness of the second soil moisture layer | m | 0.1–1.5 |
| Soil | d3 | Thickness of the third soil moisture layer | m | 0.1–1.5 |
| Snow | Sr | Surface roughness of snowpack | m | 0.001–0.03 |
| Routing | v | Mean flow velocity | m/s | 0.5–5 |
Calibration strategy
In this study, ɛ-NSGA II (Kollat & Reed 2006), a global multi-objective optimization algorithm improved from NSGA II (Deb et al. 2002), is used to calibrate the parameters of VIC. Liu et al. (2019) successfully applied ɛ-NSGA II to VIC parameter calibration in the same study area. Three features adopted in ɛ-NSGA II make the algorithm efficient, reliable, and easy to use for water resource applications (Kollat & Reed 2007).
First, ɛ-dominance is used in ɛ-NSGA II instead of the original dominance (Laumanns et al. 2002). In ɛ-dominance, the objective space is divided into a grid with cell size ɛ × ɛ, and the solution nearest to the lower left-hand corner of each grid cell is preserved. The value of ɛ determines the precision of the Pareto solutions, and a user can specify ɛ based on the precision goals (Kollat & Reed 2006), which makes the algorithm more efficient. A sketch of this archiving scheme is given below.
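The following minimal sketch keeps one representative solution per ɛ-box, assuming minimization (i.e., objectives already transformed so that smaller is better). A full ɛ-dominance implementation would additionally discard boxes dominated by other boxes, which is omitted here for brevity.

```python
# e-box archiving sketch: one representative per grid cell of size eps.
import numpy as np

def eps_archive(points, eps=0.001):
    best = {}                                     # grid cell -> (dist, point)
    for p in np.asarray(points, dtype=float):
        cell = tuple(np.floor(p / eps).astype(int))
        corner = np.array(cell) * eps             # lower left-hand corner
        d = np.linalg.norm(p - corner)            # distance to the corner
        if cell not in best or d < best[cell][0]:
            best[cell] = (d, p)                   # keep the nearest solution
    return [p for _, p in best.values()]
```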
Second, the population size in each run adapts to the number of nondominated solutions, as recommended by Harik & Lobo (1999), termed the 25% injection scheme. In this scheme, 25% of the population in the next run consists of the nondominated solutions and the remaining 75% is generated randomly. With the injection scheme, ɛ-NSGA II determines the population size more objectively, whereas the population size of other GAs, such as Deb's NSGA II, is specified by users through trial and error. The injection scheme simplifies the parameterization and makes the algorithm easy to use.
Third, the search terminates automatically when the solutions no longer improve or the population reaches the maximum. This spontaneous termination minimizes the number of optimization generations (Reed et al. 2003).
In our work, the initial population size of ɛ-NSGA II was set to 24. The maximum number of generations for adjusting the population size and the maximum cumulative population are specified as 250 and 10,000, respectively. ɛ is specified as 0.001, determined by the smallest parameter search step length, that of Sr.
Surrogate model
AdaBoost algorithm
Boosting is an ensemble algorithm for reducing bias and variance in supervised learning, based on the theory of Kearns & Valiant (1994). In boosting, a weak predictor refers to a simulator that works only slightly better than random guessing, while a strong predictor has near-perfect prediction ability. It is therefore much easier to acquire a weak predictor than a strong one.
AdaBoost (Freund et al. 1996) is a boosting method that assembles several weak predictors into a strong predictor to increase the accuracy of ANN, and it requires no prior guess of a distribution-independent bound on the accuracy of the weak predictors (Schapire 1990). AdaBoost is an iterative algorithm (Zhao et al. 2014). In each iteration, different weights are assigned to the training samples according to the training errors, and the updated weights are used in the next iteration. One weak predictor is saved per iteration until the loop ends, and each weak predictor is assigned a weight according to its prediction error. Finally, the output of the strong predictor is a weighted mean of the outputs of the weak predictors. In this study, the standard backpropagation neural network (BPNN) is used as the weak learner.
The AdaBoost algorithm alleviates the unstable training results caused by the random initialization of weights and thresholds in BPNN. Furthermore, it changes the sample weights automatically to pay more attention to the points that are fitted worst, which improves the accuracy of BPNN.
The range of the number of nodes in the hidden layer, calculated with the empirical formula proposed by Li & Liu (2009), is from 4 to 14. The best number of hidden nodes is usually decided by trial and error. In AdaBoost, however, BPNN is used as the weak learner, and the iterative process stabilizes the final result of the strong learner, so the structure of an individual weak learner is less critical than that of a BPNN used in isolation. As a result, the number of hidden nodes is set to 10 in this study, around the median of the range. The maximum number of iterations and the training rate are set to 100 and 0.01, respectively. The sigmoid activation function, with input and output ranges scaled from −1 to 1, is selected to compute the output of each neuron. The default values of the other parameters suggested by Beale et al. (2010) are employed in training the weak learners. Ten weak predictors are trained and then assembled into a strong predictor.
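A minimal AdaBoost.R2-style regression sketch with small MLPs as the weak BPNN learners, mirroring the settings above (10 hidden nodes, 100 iterations, training rate 0.01, tanh activation, 10 weak predictors). This is an illustrative reimplementation assuming scikit-learn, not the authors' code; the strong prediction is the weighted mean described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_adaboost(X, y, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                       # sample weights
    learners, betas = [], []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, p=w)          # weighted resampling
        m = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                         max_iter=100, learning_rate_init=0.01,
                         random_state=seed).fit(X[idx], y[idx])
        loss = np.abs(m.predict(X) - y)
        loss /= loss.max() + 1e-12                # normalized loss in [0, 1]
        eps = np.sum(w * loss)                    # weighted average loss
        if eps >= 0.5:                            # no better than chance: stop
            break
        beta = max(eps, 1e-12) / (1.0 - eps)
        w *= beta ** (1.0 - loss)                 # shrink weights of easy samples
        w /= w.sum()
        learners.append(m)
        betas.append(beta)
    coef = np.log(1.0 / np.array(betas))
    coef /= coef.sum()                            # weak-learner weights
    return lambda Xq: coef @ np.array([m.predict(Xq) for m in learners])
```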
Algorithm of SM
With the number of optimization generations increasing, the values of the objective functions converge in a range around the optimization objectives, and the parameter sets selected as calibration results are usually distributed in this range. Therefore, we define the range in which the objective function values converge as the crucial range and the remaining range as the trivial range. As more attention is paid to results in the crucial range, a stepwise SM is proposed. The crucial boundaries of NSE at the NGS and YC stations (NSENGS, NSEYC), which are the split lines between the crucial and trivial ranges, were set to the upper quartiles of the training samples, and those of BIAS at the NGS and YC stations (BIASNGS, BIASYC) were set to the lower quartiles.
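A sketch of this boundary rule, assuming hypothetical arrays of objective values for the training samples:

```python
# Crucial boundaries from the training-sample quartiles (sketch).
import numpy as np

def crucial_boundaries(nse_ngs, nse_yc, bias_ngs, bias_yc):
    return {
        "NSE_NGS":  np.quantile(nse_ngs, 0.75),   # crucial: NSE >= boundary
        "NSE_YC":   np.quantile(nse_yc, 0.75),
        "BIAS_NGS": np.quantile(bias_ngs, 0.25),  # crucial: BIAS <= boundary
        "BIAS_YC":  np.quantile(bias_yc, 0.25),
    }
```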
The SM can be divided into two steps: the global simulation step and the local simulation step. In the global simulation step, SM estimates the objective function values over the entire parameter space and screens out the values in the crucial range; the values acquired in this step are relatively coarse. In the local simulation step, SM simulates the objective function values in the crucial range more accurately, which helps to acquire Pareto fronts closer to those of VIC in the crucial range.
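The following sketch shows the two-step prediction logic, assuming a point must satisfy all four boundaries to be in the crucial range (`global_sm`, `local_sm`, and `boundaries` are hypothetical handles to the trained models and the quartile rule above):

```python
# Stepwise SM prediction: coarse global screen, then accurate local estimate.
def stepwise_predict(params, global_sm, local_sm, boundaries):
    coarse = global_sm(params)      # [NSE_NGS, NSE_YC, BIAS_NGS, BIAS_YC]
    in_crucial = (coarse[0] >= boundaries["NSE_NGS"] and
                  coarse[1] >= boundaries["NSE_YC"] and
                  coarse[2] <= boundaries["BIAS_NGS"] and
                  coarse[3] <= boundaries["BIAS_YC"])
    return local_sm(params) if in_crucial else coarse
```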
Training the stepwise SM requires an adequate number of samples in the crucial range. Uniformly distributed sampling algorithms, such as Monte Carlo (MC), Good Lattice Points (GLPs; Gong et al. 2015a, 2015b), and Latin Hypercube (LH; McKay et al. 1979), generate samples distributed uniformly over the whole sample space. However, they cannot generate more samples in the crucial range without adding a large number of samples in the trivial range. In our work, to acquire more samples in the crucial range, ɛ-NSGA II is used to generate the training samples.
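For comparison, a uniformly distributed sample over the Table 1 ranges can be generated in a few lines (a sketch assuming scipy >= 1.7; the authors instead run ɛ-NSGA II to concentrate samples in the crucial range):

```python
# Latin Hypercube sampling over the eight parameter ranges of Table 1.
from scipy.stats import qmc

lower = [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.001, 0.5]  # binf, Dsmax, Ds, Ws, d2, d3, Sr, v
upper = [0.4, 30.0, 1.0, 1.0, 1.5, 1.5, 0.03, 5.0]
sampler = qmc.LatinHypercube(d=8, seed=0)
samples = qmc.scale(sampler.random(n=480), lower, upper)  # 480 = 20 gen x 24 pop
```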
Figure 2 shows the flowchart of the SM-based parameter calibration, and the algorithm is described as follows:
Step 1: Initial sample generation. An initial set of samples S is generated. To acquire sufficient samples in the crucial range to train SM, ɛ-NSGA II coupled with VIC is run for a handful of generations, and all populations generated by ɛ-NSGA II are used as training and testing samples.
Step 2: SM training. After eliminating duplicates in S, SM is trained. Hold-out validation is utilized: 70% of the samples are extracted randomly for model training and the remaining 30% for model testing. There are two steps in training SM. The global simulation step trains SM to simulate the response surface over the entire range (both trivial and crucial ranges) and screen the points in the crucial range. The local simulation step trains SM to simulate the response surface in the crucial range to improve the simulation accuracy. The testing metrics show the performance of SM in both the global and local simulation steps.
Step 3: Calibration with SM. The trained SM is used to calibrate the parameters with ɛ-NSGA II instead of the VIC model. The optimal Pareto fronts Pf and the optimal Pareto solutions Ps of the SM in the crucial range are acquired.
Step 4: Optimal Pareto solution evaluation. The optimal Pareto solutions Ps of the SM in the crucial range are used as the input parameters of VIC to calculate the objective function values Pv. Pv is used as the validation of Pf to confirm whether Pf can replace the optimal Pareto fronts of VIC. The ɛ-indicator (Zitzler et al. 2003; Kollat & Reed 2006) is utilized to evaluate the consistency between Pf and Pv (a computation sketch follows the step list):

$$I_{\varepsilon}(P_v, P_f) = \max_{z \in P_f} \min_{p \in P_v} \max_{1 \le i \le n} \lvert z_i - p_i \rvert$$

where Iɛ(Pv, Pf) is the ɛ-indicator between Pv and Pf; n is the number of objectives; z = [z1, z2, …, zn] is one vector in Pf; p = [p1, p2, …, pn] is one vector in Pv; |·| is the absolute value function.
Step 5: SM updating. New samples consisting of Ps and Pv are mixed into S, and Steps 2–5 are repeated until the ɛ-indicator is less than a specified value.
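A minimal sketch of the ɛ-indicator computation in Step 4, following the formula above (Pf and Pv as arrays of shape (n_points, n_objectives)):

```python
# e-indicator between a validated front Pv and a surrogate front Pf.
import numpy as np

def eps_indicator(Pv, Pf):
    Pv, Pf = np.asarray(Pv), np.asarray(Pf)
    # for each z in Pf: smallest Chebyshev distance to any p in Pv,
    # then take the worst case over Pf
    diffs = np.abs(Pf[:, None, :] - Pv[None, :, :])   # (|Pf|, |Pv|, n)
    return np.max(np.min(np.max(diffs, axis=2), axis=1))
```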
Performance metrics
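Four metrics are used in the following sections: the Nash–Sutcliffe efficiency (NSE) and BIAS evaluate the streamflow simulation, the rate of misclassification (ER) evaluates the global simulation step, and the root mean square error (RMSE) evaluates the local simulation step. Their standard forms, consistent with how they are used in the Results (the exact expressions may differ slightly in detail), are

$$\mathrm{NSE} = 1 - \frac{\sum_{t}\left(Q_{\mathrm{sim},t} - Q_{\mathrm{obs},t}\right)^{2}}{\sum_{t}\left(Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}}\right)^{2}}, \qquad \mathrm{BIAS} = \left|\frac{\sum_{t}\left(Q_{\mathrm{sim},t} - Q_{\mathrm{obs},t}\right)}{\sum_{t} Q_{\mathrm{obs},t}}\right|$$

$$\mathrm{ER} = \frac{N_{\mathrm{mis}}}{N} \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}}$$

where Qsim,t and Qobs,t are the simulated and observed streamflow at time t, Nmis is the number of testing points assigned to the wrong range out of N points, and ŷi and yi are the SM-estimated and VIC-computed objective values. ER counts misclassification in both directions: a trivial-range point classified as crucial (the first type of error) and a crucial-range point classified as trivial (the second type of error), as referred to in the Discussion.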
RESULTS
Sampling for SM training and testing
To investigate the relationship between the performance of SM and the size of the training and testing samples, ɛ-NSGA II with a small number of generations was used to generate the samples. The number of generations varied from 5 to 25, and each setting was repeated 100 times to investigate the uncertainty.
Figure 3 shows the relationship between the ER of the four SM outputs in the global simulation step and the number of sampling generations. In Figure 3(a), with increasing generations, the uncertainty of the ER of NSENGS reduces slightly, and the mean value also changes only slightly. In contrast, the uncertainties and mean values of the ER of the other three outputs, NSEYC, BIASNGS, and BIASYC, decrease remarkably with increasing generations, as shown in Figure 3(b) and 3(c). Generally, the performance of SM in the global simulation step becomes stable when the number of sampling generations increases to around 20.
The variations of the RMSE of the four SM outputs in the local simulation step with increasing sampling generations are shown in Figure 4. As shown in Figure 4(a), the mean value and uncertainty of the RMSE of NSENGS decrease sharply as the number of generations increases from 13 to 20, after which the decrease gradually becomes mild. In Figure 4(c), BIASNGS shows a similar pattern and becomes stable at around 20 sampling generations. Meanwhile, as shown in Figure 4(b) and 4(d), the mean values and uncertainties of the RMSE of NSEYC and BIASYC decline sharply as the number of generations increases from 6 to 10, and they also stabilize around the 20th generation.
With an increasing number of sampling generations, the crucial boundaries of the four outputs also change. As shown in Figure 5(a) and 5(b), the mean values of the crucial boundaries of NSENGS and NSEYC increase sharply as the number of sampling generations rises from 5 to 20 and then become stable, while their uncertainties decrease remarkably. The mean values of the crucial boundaries of BIASNGS and BIASYC decrease remarkably as the number of generations increases from 5 to 10. As shown in Figure 5(c) and 5(d), their mean values change only slightly and their uncertainties reach minimal values when the number of generations is between 11 and 20. After that, although their mean values continue to decrease slightly, their uncertainties begin to grow. This may be caused by the stochastic choice of training samples: when the limited number of samples with small BIAS values is allocated to training, the crucial boundaries become smaller; otherwise, they become larger. When the number of sampling generations is around 20, the variations of the mean values and uncertainties of the crucial boundaries of BIASNGS and BIASYC are not too large compared with those for 5–11 sampling generations, so we did not delve more deeply into this behavior.
Considering the results shown in Figures 3–5, the training and testing samples for SM were generated by ɛ-NSGA II with 20 generations, with which the stepwise SM was trained. The crucial ranges of NSENGS, BIASNGS and NSEYC, BIASYC were ≥0.8385, ≤0.1145 and ≥0.8445, ≤0.2534, respectively.
Figure 6 shows the scatter plots of SM in the global simulation step during training and testing, with SM outputs on the vertical axis against the corresponding VIC outputs on the horizontal axis. The blue line is the 'y = x' line. As shown in Figure 6(a)–6(h), the points are distributed along the 'y = x' lines and most outliers are in the trivial ranges. The ERs of the four SM outputs in the global simulation step are also shown in Figure 6, with a maximum value of 1.10% in training and 2.56% in testing. This means that SM in the global simulation step performs well in classifying the response points into the crucial and trivial ranges.
Figure 7 shows the performance of the SM outputs in the crucial range in the training and testing phases. In Figure 7, the points are distributed close to the 'y = x' lines, except in Figure 7(e) and 7(f), where the points clearly deviate from the 'y = x' lines, with an RMSE of 0.0043 in training and 0.0056 in testing. However, compared with NSEYC simulated in the global simulation step, whose RMSE values in the crucial range are 0.0161 and 0.0152, the simulation of NSEYC in the crucial range by SM in the local simulation step is greatly improved. With a maximum RMSE of 0.0057 in testing in the local simulation step, the SM outputs in the crucial range are reliable replacements for the VIC outputs.
Parameter calibration and validation
The trained stepwise SM was coupled with ɛ-NSGA II and updated until the ɛ-indicator between the optimal Pareto fronts of SM (FSM) and the validated-SM fronts (FVS) in the crucial range was less than 0.0275, where FVS are the objective function values calculated by the original model with the optimal Pareto solutions of SM (PSM). The optimal Pareto solutions of VIC (PVIC) were acquired with ɛ-NSGA II, which terminated when the total population reached the maximum of 10,000.
As shown in Figure 8, green points are the optimal Pareto fronts of VIC (FVIC), and red and gray points are FSM and FVS, respectively. Figure 8(a)–8(c) shows FSM in the three iterations. In the third iteration, the ɛ-indicator between FSM and FVS in the crucial range reaches 0.0261, so the optimization terminates after three iterations. In Figure 8(a) and 8(b), there are large deviations between FSM and FVS, and the red and green points in the crucial range are clearly gathered into separate groups. This phenomenon disappears in Figure 8(c), where the red and green points in the crucial range are mixed. Interestingly, the ɛ-indicator of the second iteration (0.0663) is larger than that of the first (0.0480). This is caused by the further convergence of FSM toward the ideal point, where some of FSM fall outside the range of the training samples and the SM needs to be updated. The ideal point is a four-dimensional objective point with NSENGS of 1, BIASNGS of 0, NSEYC of 1, and BIASYC of 0.
The final results of FVIC, FSM, and FVS are shown in Figure 8(c). In the crucial range, there are noticeable deviations between FVIC and FSM, while FVIC and FVS are more similar. In particular, compared with FSM, larger NSENGS and smaller BIASNGS are obtained in FVIC at the NGS station.
The optimal Pareto solutions of VIC and SM in the crucial range found by ɛ-NSGA II are presented in Figure 9. The optimal Pareto solutions of VIC in the crucial range (gray lines) show a structure and distribution similar to those of SM in the crucial range (orange lines). For the Pareto solutions in the trivial range, there are significant differences between VIC (gray dashes) and SM (orange dashes), especially in the parameter d3. The solutions of SM in the trivial range are also much fewer than those of VIC. This loss of solutions occurs because SM concentrates mainly on the simulation in the crucial range, and the samples for SM updating are also distributed mainly in the crucial range. In the crucial range, the parameter Sr has different distributions in VIC and SM: Sr in VIC ranges from 0.0011 to 0.0147, while in SM it converges around 0.0030. This may be caused by the low sensitivity of Sr (Houle et al. 2017).
As seen from Figure 9, the compromise solution of VIC (P1), whose corresponding front is the closest to the ideal point, is shown as the black line, and that of SM (P2) as the green line. These two solutions were used as input parameters of the VIC model to verify whether P2 could replace P1 in the streamflow simulation.
Simulated daily streamflows at the NGS and YC stations with P1 and P2 during the calibration and validation periods are displayed in the left panels of Figure 10, and the right panels show the corresponding empirical CDF curves. The figure illustrates that the empirical distributions of the simulations with P1 are consistent with those with P2, and the two simulated series are also highly consistent. It is noted that the peak flow simulation with P2 is slightly degraded compared with that with P1.
With the parameter set P1, the values of NSE at the NGS station during the calibration and validation periods are 0.8650 and 0.8609, respectively, while those of BIAS are 0.0294 and 0.0413, respectively. Similarly, the values of NSE at the YC station during the calibration and validation periods are 0.8898 and 0.8933, respectively, and those of BIAS are −0.1443 and −0.1782, respectively, where the absolute symbol of BIAS function is removed to evaluate whether the streamflow is overestimated or underestimated. The performance of the simulation with P2 is slightly worse at the NGS station with NSE of 0.8421 and 0.8429, and BIAS of 0.0433 and 0.0604 during the calibration and validation periods, respectively, but slightly better at the YC station with NSE of 0.8959 and 0.8995, and BIAS of −0.1337 and −0.1663. The values of BIAS at NGS show that the streamflow is overestimated slightly and the negative values at YC mean that the streamflow is underestimated with both P1 and P2. In general, the daily simulation results simulated with the parameter set obtained by the SM at NGS and YC demonstrate decent applicability.
Efficiency evaluation
The efficiency of the SM is also a key point, as our original intention is to save calibration time. One execution of VIC in the study area requires about 24 min on one core of an Intel Core i5-7500 (3.40 GHz) CPU under Ubuntu 16.04 LTS. Hence, calibrating the parameters with the original model takes a long time even in parallel mode, where all four cores were used.
The optimal Pareto fronts of VIC in the final generation of ɛ-NSGA II (when the whole population reaches 10,000) in the crucial range were used as the reference fronts to calculate the ɛ-indicators of the optimal fronts of VIC in the crucial range obtained from each generation and those of the optimal fronts of SM in the crucial range obtained from each iteration. Figure 11 shows the variation of the ɛ-indicator against the time consumed. The time consumed for parameter calibration shrinks sharply when SM is used instead of VIC. The computational time to acquire the solutions of SM is about 58 h with an ɛ-indicator of 0.0164, of which sampling accounts for 78.72%. The ɛ-indicator for the VIC model remains larger than 0.0164 until the optimization reaches 250 generations, which costs about 561 h. The computational saving is thus 89.66% (1 − 58/561 ≈ 0.8966) at the point where the ɛ-indicator for VIC first falls below the one for SM.
DISCUSSION
Comparison of different sampling methods
In our work, sampling with ɛ-NSGA II over a small number of generations is intended to acquire more samples in the crucial range. To show the difference between the ɛ-NSGA II-based sample generator and uniformly distributed sampling algorithms, two popular approaches, GLP and LH, were compared with it.
In this study, ɛ-NSGA II generated the training samples over 20 generations with a population of 24 in each generation, so the size of the samples generated by GLP and LH was set to 480 accordingly. To acquire a large enough number of samples in both the crucial and trivial ranges, the testing samples consisted of two parts: the first part was generated with MC, and the other part was the population in the final generation (the 280th) of ɛ-NSGA II when calibrating VIC parameters without SM. The size of the testing samples was 306.
Figure 12 shows the performance of SM in the global simulation step with the three sample generators. As shown in Figure 12, SM with samples generated by ɛ-NSGA II performs better than SM with samples generated by GLP or LH. In particular, for BIASNGS and BIASYC, the ER of SM with GLP and LH reaches around 50%. The sources of ER also differ among the three sample generators. For SM with the ɛ-NSGA II sample generator, the first type of error makes the major contribution to ER, whereas the ERs of SM with GLP and LH come mainly from the second type of error, meaning that SM with GLP and LH is much more likely to misclassify points in the crucial range into the trivial range. This is caused by insufficient training samples in the crucial range when using the GLP and LH sample generators. If these uniformly distributed sample generators are used to train SM, additional iterations are needed to update SM.
Limitations and prospects
Among multi-objective optimization algorithms, ɛ-NSGA II is a good choice; as shown in many previous studies, it is highly efficient in solving multi-objective problems (Hadka & Reed 2012; Hurford et al. 2014; Wang et al. 2014; Liu et al. 2017). Other methods, such as NSGA II, NSGA III (Deb & Jain 2013), MOCOM-UA (Yapo et al. 1998), and SPEA2 (Zitzler et al. 2001), could also be used instead of ɛ-NSGA II. The efficiency comparison of different algorithms has been studied in many previous works (Reed et al. 2003; Hadka & Reed 2012) and is therefore not the focus of our work. However, the compatibility of the SM with different optimization algorithms is not investigated here and will be studied in our future work.
In this study, the NSE, which describes the skill of the modeled streamflow against the observations, and BIAS, which reflects the error between the simulated and observed streamflow, were selected to evaluate the performance of the model. There are many other evaluation indicators and statistical indices to quantify the accuracy of streamflow estimates against gauge observations, such as the correlation coefficient, probability of detection, mean error (Zhu et al. 2016), and Kling–Gupta efficiency (Gupta et al. 2009). Although different choices of evaluation indicators may influence the calibrated parameters and the performance of VIC, the efficiency of SM is not affected.
The UBR basin is used as a case study here. Figure 10 shows that two large peak flows occurred in the summers of 1999 and 2000, the first two years of the calibration period, while the streamflow in other years of the calibration period is similar to that in the validation period. According to Klemeš (1986), large floods should be included in the calibration period. This contributes to the good performance of both VIC and SM in the validation period, especially at the YC station, where NSEYC in the validation period is larger than that in the calibration period. The SM prediction capabilities are mainly limited by the original hydrological model: if extreme floods occur in the validation period and degrade the performance of VIC, the SM prediction capabilities are also affected.
In terms of the UBR basin, as shown in Figure 10, the high flow is overestimated at the NGS station and the low flow is underestimated at the YC station. The ungauged condition in the upstream part of the UBR basin may cause these deviations. Although the area of glaciers in the study basin is small, calculating the glacier melt water as part of the snow melt water without a dedicated glacier melt module can also introduce small errors. Some studies indicate that remote sensing data, such as the TRMM 3B42 precipitation product (Wang et al. 2017) and GRACE total water storage changes (Chen et al. 2017), can improve the performance of VIC. Besides, embedding an energy-balance-based glacier algorithm into VIC for the study area can also improve its performance (Zhao et al. 2013). In our work, the performance of VIC is acceptable, with the smallest NSE of 0.8421 and the largest BIAS of 0.1782; hence, the improvements mentioned above are not implemented here but can be adopted in our future work.
CONCLUSIONS
In this study, a stepwise SM was proposed for VIC parameter calibration, and the SM-based algorithm was applied successfully in the UBR basin. AdaBoost was used first to screen the parameter sets whose objective function values fall in the crucial range and then to simulate the objective function values in the crucial range more accurately. ɛ-NSGA II was adopted as the calibration strategy. The main findings are as follows:
(1) The AdaBoost algorithm performed well, with a low ER in the global simulation step and a small RMSE in the local simulation step. The response surface interpolated with SM could be used for parameter calibration instead of VIC.

(2) There is no large difference between the optimal Pareto solutions of SM and those of VIC in the crucial range.

(3) The stepwise SM accelerated the calibration process in the UBR basin remarkably, saving up to 90% of the time needed to obtain a solution similar to that of VIC.
ACKNOWLEDGEMENTS
The authors thank the National Natural Science Foundation of China (91547106) and the National Key Research and Development Plan 'Inter-governmental Cooperation in International Scientific and Technological Innovation' (2016YFE0122100) for financial support. The China Meteorological Administration and the Tibet Bureau of Hydrology and Water Resources Survey are also gratefully acknowledged for providing the meteorological and hydrologic data used in this study.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.