Improving the integrated hydrological simulation on a data-scarce catchment with multi-objective calibration

The reliability of simulations with the process-based hydrological model Soil and Water Assessment Tool (SWAT) depends on calibration. Compared with the commonly applied single-objective calibration, multi-objective calibration benefits the spatial parameterization and the simulation of specific processes. However, the need for additional observations and the lack of a practical procedure are among the reasons that have prevented wider application of multi-objective calibration. This study proposes three groups of objectives for calibration: multisite, multi-objective function, and multi-metric. For a catchment with limited observations such as the Yuan River Catchment (YRC) in China, the three groups corresponded, respectively, to discharge at three hydrometric stations; both the Nash-Sutcliffe efficiency (NSE) and the inverse NSE for discharge evaluation; and the MODIS global terrestrial evapotranspiration product and baseflow filtered from discharge as metrics. The applicability of two multi-objective calibration approaches, the Euclidean distance (ED) and the nondominated sorting genetic algorithm II (NSGA-II), was analyzed for calibrating these objectives in the YRC. Results show that multi-objective calibration simultaneously improved the model's performance in terms of the spatial parameterization, the magnitude of the output time series, and the water balance components, and that it also reduced parameter and prediction uncertainty. The study thus leads to a generalized procedure recommended for performing multi-objective calibration in data-scarce catchments.

The classic calibration approach applies a regression-based summary statistic (Gupta et al. ) as the objective function, e.g. the Nash-Sutcliffe efficiency (NSE), to optimize the goodness of fit of an output variable, e.g. the discharge at the catchment outlet. When only one output variable is evaluated, this approach is also referred to as single-objective calibration, and it is the fundamental form of the calibration procedure. However, its limitations are also obvious. It neglects the trade-offs among interacting processes and the responses at interior locations of the watershed (Yen et al. ), and neglecting catchment heterogeneity leads to selected parameter values that are inconsistent with their physical meanings (Zhang et al. ). Moreover, applying only one objective function tends to yield a biased assessment: NSE, for example, is more sensitive to peak values (Krause et al. ) and therefore leads to a less reliable simulation in low-flow periods.
Multi-objective calibration was later proposed as a diagnostic approach (Gupta et al. ) that uses additional data, e.g. time-series or qualitative data, for validation (Yilmaz et al. ). The implementation of the additional observations in the calibration, e.g. mean annual water balances (Pfannerstill et al. ), … Multi-objective calibration optimizes the parameters to meet multiple criteria that control different processes. Because objectives in hydrological modeling often conflict, it is usually not feasible to obtain the best performance for all objectives simultaneously. The classic aggregation approach converts the objectives into a scalar function, e.g. the weighted average of all objectives (Zhang et al. ). Multi-objective evolutionary algorithms (MOEAs) (Schaffer ) combine evolutionary algorithms with the optimization of multiple objectives, e.g. the nondominated sorting genetic algorithm II (NSGA-II) (Deb et al. ).
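The difference between scalar aggregation and Pareto-based optimization can be illustrated with a minimal Python sketch. The two simulations and their standardized objective values (0 = optimal) are hypothetical, and `weighted_sum` is an illustrative helper, not the formulation used by Zhang et al.: because neither simulation is better in both objectives, the "best" simulation under aggregation depends entirely on the chosen weights.

```python
def weighted_sum(objectives, weights):
    """Weighted average of standardized objectives (lower is better, 0 = optimal)."""
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    return sum(w * o for w, o in zip(weights, objectives)) / sum(weights)


def best_by_weighted_sum(simulations, weights):
    """Return the name of the simulation with the lowest weighted average."""
    return min(simulations, key=lambda name: weighted_sum(simulations[name], weights))


# Hypothetical simulations scored on two conflicting standardized objectives,
# e.g. a peak-flow criterion and a low-flow criterion.
simulations = {"A": [0.10, 0.60], "B": [0.40, 0.20]}

print(best_by_weighted_sum(simulations, [0.8, 0.2]))  # emphasizes objective 1 -> A
print(best_by_weighted_sum(simulations, [0.2, 0.8]))  # emphasizes objective 2 -> B
```

An MOEA such as NSGA-II avoids fixing these weights in advance by retaining both A and B as nondominated trade-off solutions.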
Multi-objective calibration has not yet been widely adopted for integrated hydrological models such as SWAT. The reasons include the need for additional observations, which is critical for catchments with limited measurements, and the lack of a practical implementation procedure. We therefore proposed to overcome data scarcity by using satellite-based datasets and metrics extracted from existing observations. To analyze the applicability of multi-objective calibration in a catchment where only measured discharge was available for calibration, we conducted this study using data from the MOD16 ET dataset and the existing measured discharge as additional objectives.
We implemented one representative aggregation approach, the Euclidean distance (ED), and one representative MOEA, NSGA-II. ED aggregates the objectives into a single value (Gupta et al. ; Pfannerstill et al. ), and NSGA-II is a fast and efficient population-based optimization technique (Ercan & Goodall ). The research had two aims: (1) to assess the applicability of the three objective groups with ED and NSGA-II, and (2) to demonstrate the advantages of ED and NSGA-II over single-objective calibration.

METHODOLOGY
The study area and the input data
… Table 1.

The SWAT model
SWAT is an integrated process-based hydrological model for predicting the long-term impact of land management practices on water, sediment, and agricultural chemical yields in catchments (Neitsch et al. ). The computational unit of SWAT is the hydrologic response unit (HRU), which consists of a unique combination of land-use type, soil type, and slope within a sub-watershed. The hydrological cycle is simulated based on the water balance equation. The Soil Conservation Service (SCS) curve number method quantifies the daily surface runoff. Each soil layer is assumed to be horizontally homogeneous. The Green-Ampt infiltration method and the storage routing method calculate the infiltration and percolation rates, respectively. The variable storage method is applied to compute the discharge in the stream. SWAT simplifies the groundwater system into a shallow aquifer and a deep aquifer.
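As an illustration of the SCS curve number method, a minimal Python sketch of the standard daily runoff equation is given below, with retention parameter S = 25.4(1000/CN − 10) mm and initial abstraction Ia = 0.2S. It assumes a fixed curve number such as CN2; SWAT itself adjusts the retention parameter daily with soil moisture or plant evapotranspiration, which this sketch omits. The function name and the storm depth are illustrative.

```python
def scs_surface_runoff(rain_mm: float, cn: float) -> float:
    """Daily surface runoff (mm) from the SCS curve number equation.

    Assumes a constant curve number (e.g. CN2); SWAT's daily adjustment of the
    retention parameter with soil moisture is not reproduced here.
    """
    if not 0.0 < cn <= 100.0:
        raise ValueError("curve number must lie in (0, 100]")
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter (mm)
    ia = 0.2 * s                     # initial abstraction (mm)
    if rain_mm <= ia:
        return 0.0                   # all rainfall is abstracted
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# A higher curve number yields more runoff from the same 50 mm storm.
for cn in (60.0, 75.0, 90.0):
    print(cn, round(scs_surface_runoff(50.0, cn), 2))
```

Because runoff rises steeply with CN, CN2 is a natural candidate for calibration, and its empirical form also limits how well peak flows can be reproduced.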
The objectives for multi-objective calibration

Multisite
In sequential order from upstream to downstream, the subcatchments were grouped into three subcatchment clusters based on the hydrometric stations (see Figure 1). Each parameter could therefore take a different value in each cluster; e.g. parameter CN2 (Table 4) was assigned to the LX, MZ, and JK clusters as CN2_LX, CN2_MZ, and CN2_JK, respectively. The subcatchments downstream of the JK station were ungauged and shared the same land-use types (e.g. paddy field and forest) and soil types (e.g. red soil and paddy soil) as the subcatchments between MZ and JK; thus, the parameters calibrated for the JK cluster upstream of the JK station were assumed to be applicable to the ungauged subcatchments.
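The cluster-specific parameterization can be sketched in Python as follows. Only CN2 and POT_VOL are parameter names taken from this document; the parameter values, the helper names, and the dictionary layout are hypothetical. Ungauged subcatchments fall back to the JK cluster, mirroring the assumption stated above.

```python
CLUSTERS = ("LX", "MZ", "JK")  # gauged clusters, upstream to downstream


def expand_parameters(base_names, clusters=CLUSTERS):
    """One calibration parameter per (parameter, cluster) pair, e.g. CN2_LX."""
    return [f"{name}_{cluster}" for name in base_names for cluster in clusters]


def cluster_parameters(values, cluster, fallback="JK"):
    """Parameter values for one cluster; ungauged clusters reuse the fallback cluster."""
    source = cluster if cluster in CLUSTERS else fallback
    suffix = f"_{source}"
    return {
        name[: -len(suffix)]: value
        for name, value in values.items()
        if name.endswith(suffix)
    }


# Hypothetical calibrated values for two of the parameters.
values = {
    "CN2_LX": 72.0, "CN2_MZ": 76.0, "CN2_JK": 80.0,
    "POT_VOL_LX": 10.0, "POT_VOL_MZ": 15.0, "POT_VOL_JK": 20.0,
}
print(expand_parameters(["CN2", "POT_VOL"]))
print(cluster_parameters(values, "DOWNSTREAM"))  # ungauged: takes the JK values
```

With ten parameters and three clusters, `expand_parameters` yields the 30 calibration parameters mentioned in the calibration procedure.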

Multi-objective function
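The NSE was computed as
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Y_i^{obs} - Y_i^{sim}\right)^2}{\sum_{i=1}^{n}\left(Y_i^{obs} - Y_{mean}^{obs}\right)^2}$$
where $Y_i^{obs}$ and $Y_i^{sim}$ are the observed and simulated values at time step i, respectively, and $Y_{mean}^{obs}$ is the mean observed value over the n time steps.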
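The NSE, together with an inverse NSE, can be sketched in Python as follows. The sketch assumes that NSE_in is the NSE applied to reciprocal flows (1/Y), a common low-flow variant; the study's exact definition is not reproduced here, and the optional offset `eps` for zero flows is a hypothetical addition. The toy series shows why both are combined.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    if den == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - num / den


def inverse_nse(obs, sim, eps=0.0):
    """NSE on reciprocal flows, which emphasizes low-flow errors (assumed definition)."""
    return nse([1.0 / (o + eps) for o in obs], [1.0 / (s + eps) for s in sim])


# Toy daily discharge with one peak (hypothetical values).
obs = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
peak_error = [1.0, 1.0, 1.0, 8.0, 1.0, 1.0]  # peak underestimated by 20%
low_error = [0.5, 0.5, 0.5, 10.0, 0.5, 0.5]  # every low flow halved

for name, sim in (("peak_error", peak_error), ("low_error", low_error)):
    print(name, round(nse(obs, sim), 3), round(inverse_nse(obs, sim), 3))
```

NSE ranks `low_error` above `peak_error` even though every low flow is halved, whereas the inverse NSE of `low_error` becomes strongly negative.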

Multi-metric
Actual ET. The observed actual ET (AET) was obtained from MOD16 ET at a monthly time step from 2008 to 2014. Validation at 46 field-based eddy covariance flux towers yielded a global average mean absolute error (MAE) of 24.1% (Running et al. ). Therefore, an uncertainty band of ±24.1% (the MAE) was applied to MOD16 ET in this study. The evaluation statistic for the AET simulation, AET_cover, is the coverage of the simulated AET by this uncertainty band:
$$\mathrm{AET}_{cover} = \frac{\mathrm{No\_of\_covered}}{N}$$
where N is the total number of simulated AET values in a simulation, and No_of_covered is the number of simulated AET values covered by the MOD16 ET uncertainty band.
The simulated monthly ratio of surface runoff (including lateral flow) or groundwater return flow to water yield was computed for each subcatchment cluster. Percent bias (PBIAS) was used to evaluate the deviation of the simulated ratio from the observed ratio.
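The two water-balance statistics can be sketched in Python as below. The ±24.1% band follows the text; the function names and data are hypothetical, the sketch returns AET_cover as a fraction, and PBIAS uses the convention 100 × Σ(obs − sim)/Σ obs (positive values mean underestimation), which is an assumption rather than the study's stated formula.

```python
def aet_cover(sim_aet, mod16_aet, rel_error=0.241):
    """Fraction of simulated monthly AET values inside the MOD16 ET +/- rel_error band."""
    if len(sim_aet) != len(mod16_aet) or not sim_aet:
        raise ValueError("series must be non-empty and of equal length")
    covered = sum(
        1
        for sim, ref in zip(sim_aet, mod16_aet)
        if ref * (1.0 - rel_error) <= sim <= ref * (1.0 + rel_error)
    )
    return covered / len(sim_aet)


def component_ratio(component, water_yield):
    """Monthly ratio of a flow component (e.g. surface runoff) to water yield."""
    return [c / w for c, w in zip(component, water_yield)]


def pbias(obs, sim):
    """Percent bias, 100 * sum(obs - sim) / sum(obs) (assumed sign convention)."""
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / total_obs


# Hypothetical monthly values (mm) and "observed" ratios.
print(aet_cover([100.0, 130.0, 60.0], [100.0, 100.0, 100.0]))
surface = component_ratio([20.0, 30.0], [50.0, 60.0])
print(pbias([0.45, 0.55], surface))
```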
The calibration approach
Single-objective calibration with SUFI-2
Sequential Uncertainty Fitting ver. 2 (SUFI-2) (Abbaspour et al. ) was the single-objective calibration approach applied in this study, with the simulated discharge at JK evaluated by the NSE as the only objective. Latin hypercube sampling (LHS) generated the parameter sets for every iteration, assuming that the parameters are uniformly distributed within their selected value ranges. A Jacobian matrix was then formed from all the parameter sets, and the Cramér-Rao theorem was used to update the parameter value ranges for the next iteration. In this study, two iterations were performed for the hydrologic calibration, each containing 2,000 simulations.
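The LHS step can be sketched in Python as below: each parameter range is split into as many equal-probability strata as there are simulations, one value is drawn uniformly inside each stratum, and the strata are shuffled independently per parameter. The parameter ranges and the function name are hypothetical, and the Jacobian/Cramér-Rao update of SUFI-2 between iterations is not reproduced.

```python
import random


def latin_hypercube(ranges, n_samples, seed=None):
    """Draw n_samples parameter sets by Latin hypercube sampling (uniform within ranges)."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        # One uniform draw inside each of the n_samples equal-width strata ...
        values = [low + (k + rng.random()) * width for k in range(n_samples)]
        # ... then a random pairing of strata across parameters.
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


# Hypothetical ranges; the study drew 2,000 parameter sets per iteration.
ranges = {"CN2": (35.0, 98.0), "POT_VOL": (0.0, 100.0)}
for parameter_set in latin_hypercube(ranges, n_samples=5, seed=42):
    print(parameter_set)
```

Compared with plain random sampling, this guarantees that every part of each parameter range is visited once per iteration.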

Multi-objective calibration with ED
The ED aggregating the evaluation statistics of all objectives was computed as
$$ED = \sqrt{\sum_{i=1}^{N} Obj_i^{2}}$$
where $Obj_i$ is the standardized evaluation statistic of objective i, and N is the number of objectives.
… from Rank = 1 onward. When Rank = j can only be partially selected, the priority of the simulations in Rank = j is determined by the crowding distance, so that simulations with distinct performances are preferred. After one generation is formed, the parameter sets are updated via crossover and mutation. The diversity of the population is maintained by the crowding distance, and elitism is maintained by merging the current and previous generations for evolution. In this study, NSGA-II was implemented in R, with the code adapted from Whittaker () and Ercan & Goodall ().
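The ED aggregation and the NSGA-II selection step described above can be sketched in Python as follows. This is a minimal, unoptimized illustration for minimized standardized objectives: it uses a simple front-peeling loop instead of the fast nondominated sort of Deb et al., omits crossover and mutation, and is not the R code used in the study. The objective vectors are hypothetical.

```python
import math


def euclidean_distance(objectives):
    """ED of standardized objectives (optimal value 0)."""
    return math.sqrt(sum(o * o for o in objectives))


def dominates(a, b):
    """Pareto dominance for minimized objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_ranks(points):
    """Rank 1 is the Pareto Front; rank j is the front left after removing ranks < j."""
    ranks, remaining, rank = [0] * len(points), set(range(len(points))), 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def crowding_distance(points, front):
    """Crowding distance within one front; boundary members get infinity."""
    distance = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance


def select_population(points, size):
    """Fill the next population rank by rank; split the last rank by crowding distance."""
    ranks, chosen = nondominated_ranks(points), []
    for rank in range(1, max(ranks) + 1):
        front = [i for i, r in enumerate(ranks) if r == rank]
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            cd = crowding_distance(points, front)
            front.sort(key=lambda i: cd[i], reverse=True)  # distinct simulations first
            chosen.extend(front[: size - len(chosen)])
            break
    return chosen


points = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (1.0, 1.0)]
print(nondominated_ranks(points), select_population(points, 2))
print(min(range(len(points)), key=lambda i: euclidean_distance(points[i])))
```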

The evaluation of the convergence. The convergence of the Pareto Front indicates the extent to which the previous generation still matches or surpasses the subsequent one, and it is determined by the population size and the number of generations. The convergence was therefore used in this study as an indicator of whether more generations are needed for evolution. It is quantified by the C function (Zitzler & Thiele ):
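$$C(G_{n-1}, G_n) = \frac{\left|\{\, g \in G_n : \exists\, g' \in G_{n-1},\ g' \succeq g \,\}\right|}{\left|G_n\right|}$$
where $G_n$ and $G_{n-1}$ are the nth and (n − 1)th generations, and $g' \succeq g$ means that g' dominates or equals g; the denominator is the number of simulations in $G_n$, and the numerator is the number of simulations in $G_n$ that are dominated by or equal to simulations in $G_{n-1}$. When $C(G_{n-1}, G_n)$ equals 1, all simulations in $G_n$ are dominated by or equivalent to simulations in $G_{n-1}$.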
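The C function can be sketched in Python as below, with "dominated by or equal to" implemented as weak dominance (no worse in every minimized objective). The function names and the generation contents are hypothetical.

```python
def weakly_dominates(a, b):
    """True if a is no worse than b in every (minimized) objective."""
    return all(x <= y for x, y in zip(a, b))


def coverage(previous, current):
    """C(previous, current): share of `current` weakly dominated by some member of `previous`."""
    if not current:
        raise ValueError("current generation must not be empty")
    covered = sum(1 for b in current if any(weakly_dominates(a, b) for a in previous))
    return covered / len(current)


# Hypothetical Pareto Fronts of two consecutive generations.
g_prev = [(0.2, 0.8), (0.5, 0.5)]
g_curr = [(0.3, 0.9), (0.4, 0.4)]
print(coverage(g_prev, g_curr))  # 0.5: (0.4, 0.4) improves on the previous generation
```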

The standardized evaluation statistics
The original value ranges of the evaluation statistics differed; the statistics were therefore standardized so that they could be optimized by nondominated sorting or ED and so that each has an optimal value of 0, as listed in Table 2.

The summary of the metrics
To ensure the applicability and efficiency of the nondominated sorting algorithm, this study proposed grouping the 13 individual objectives into three categorized objectives: (1) NSE for the three hydrometric stations; (2) NSE_in for the three stations; and (3) the water balance components. Each categorized objective is the ED of the individual objectives it includes.
The categorized objectives are thus the objectives considered by ED and NSGA-II. All objectives used for the YRC are summarized in Table 3.
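The two-level aggregation from individual to categorized objectives can be sketched in Python as follows. The objective names, the members of each group, and the values are hypothetical placeholders for the entries of Table 3.

```python
import math


def euclidean_distance(values):
    """ED of standardized objectives (0 = optimal)."""
    return math.sqrt(sum(v * v for v in values))


def categorized_objectives(individual, groups):
    """Collapse individual objectives into one ED value per categorized objective."""
    return {
        group: euclidean_distance([individual[name] for name in members])
        for group, members in groups.items()
    }


# Hypothetical grouping and standardized values.
groups = {
    "NSE": ["NSE_LX", "NSE_MZ", "NSE_JK"],
    "NSE_in": ["NSE_in_LX", "NSE_in_MZ", "NSE_in_JK"],
    "water_balance": ["AET_cover", "base_LX", "surf_LX"],
}
individual = {
    "NSE_LX": 0.3, "NSE_MZ": 0.4, "NSE_JK": 0.0,
    "NSE_in_LX": 0.2, "NSE_in_MZ": 0.1, "NSE_in_JK": 0.2,
    "AET_cover": 0.25, "base_LX": 0.1, "surf_LX": 0.05,
}
three_objectives = categorized_objectives(individual, groups)
print(three_objectives)
```

ED and NSGA-II then operate on these three values, which keeps the number of objectives small enough for nondominated sorting to discriminate between simulations.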

The calibration procedure
Ten sensitive parameters (Table 4) were selected for calibration and assigned an individual value in each subcatchment cluster (see Figure 1); each approach thus calibrated 30 parameters in total. The initial iteration … another 2,000 simulations as the next iteration or generations. Therefore, 4,000 simulations were performed in total for each calibration approach. The procedure is displayed in Figure 3. The best simulation of SUFI-2 is the one with the lowest standardized NSE at JK, and that of ED is the one with the lowest ED value. NSGA-II derived a Pareto Front of optimal simulations, and its best simulation is likewise defined as the one with the lowest ED value.
In this study, the Pareto Front contained 31 simulations.
Therefore, to enable the comparison, the 31 best-performing simulations were also obtained from SUFI-2 and ED.
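The selection rules above can be sketched in Python as follows. The helper `top_k`, the objective names, and the values are hypothetical; the ranking key is the standardized NSE at JK for SUFI-2 and the ED value for ED and the NSGA-II Pareto Front, and k would be 31 in this study.

```python
import math


def ed(objectives):
    """ED over all standardized objectives of one simulation."""
    return math.sqrt(sum(o * o for o in objectives.values()))


def top_k(simulations, k, key):
    """Ids of the k simulations with the lowest key value (best first)."""
    return sorted(simulations, key=lambda sid: key(simulations[sid]))[:k]


# Hypothetical standardized objectives per simulation id.
simulations = {
    "sim1": {"NSE_JK": 0.10, "NSE_in_JK": 0.90, "AET_cover": 0.60},
    "sim2": {"NSE_JK": 0.20, "NSE_in_JK": 0.30, "AET_cover": 0.20},
    "sim3": {"NSE_JK": 0.15, "NSE_in_JK": 0.40, "AET_cover": 0.50},
}
print(top_k(simulations, 1, key=lambda obj: obj["NSE_JK"]))  # single-objective rule
print(top_k(simulations, 2, key=ed))                         # ED rule
```

In this toy case the single-objective rule picks a simulation that performs poorly on the other objectives, which the ED rule avoids.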

RESULTS
The convergence of the NSGA-II
… RAM, the number of simulations is a critical factor in determining the efficiency of the calibration approaches. Therefore, the population size and the number of generations should be analyzed first.
To be consistent with SUFI-2 and ED, a population size that reached convergence within 2,000 simulations was preferred. Figure … proposed to consider convergence reached if the C function equals 1 in 10 consecutive generations. In this study, because a lower number of simulations was preferred, convergence was defined as reached at the generation after which the C function values always remained above 0.8.
In Figure 4, although the C function values fluctuate as the generation increases, population sizes of 50 and 100 … A population size of 50 was thus selected for the study.
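The relaxed convergence criterion can be sketched in Python as follows: given the C function values of successive generations, it returns the first position from which all later values stay above 0.8. The function name and the C values are hypothetical; the threshold follows the text.

```python
def convergence_generation(c_values, threshold=0.8):
    """First index from which every C value stays above threshold, or None if never."""
    start = None
    for index, c in enumerate(c_values):
        if c > threshold:
            if start is None:
                start = index  # a new run above the threshold begins
        else:
            start = None       # the run is broken; convergence not yet reached
    return start


# Hypothetical C(G_{n-1}, G_n) values for successive generations.
c_values = [0.40, 0.85, 0.70, 0.82, 0.95, 0.90, 1.00]
print(convergence_generation(c_values))  # 3
```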
The parameters for calibration

The best simulation comparison
The radar chart (Figure 9) displays the best simulations of SUFI-2, ED, and NSGA-II. The bias of the SUFI-2 result is evident: it obtained larger (i.e. worse, since each standardized statistic has an optimal value of 0) values for most objectives. In contrast, although ED and NSGA-II had higher NSE_JK, JK_s, and AET cover values than SUFI-2, they performed considerably better for the remaining objectives. ED and NSGA-II generated equivalent simulations of the discharge at all stations, and NSGA-II had a lower bias in the baseflow and surface runoff ratios at LX and JK.
The water balance components were generated for each subcatchment cluster (see Figure 10) … thus the higher threshold for baseflow to occur and, consequently, a lower simulated baseflow than the other two approaches. The higher POT_VOL and CN2 at JK in NSGA-II would also contribute to the higher surface runoff rate.

DISCUSSION
The evaluation of the objectives
The multi-metric objectives, excluding AET, were derived from the existing measured discharge, and the multi-objective function ensures that both the low and peak values of the time series are represented. These two categories were applied in the calibration without the effort of collecting additional observations. Fundamental observations such as discharge are therefore likely sources of objectives for catchments without additional observations. The AET obtained from MOD16 ET was previously applied as a reference to assess the model's performance … uncertainty. Therefore, when evaluating the calibration results, a higher tolerance for the multi-metric objectives in terms of uncertainty and performance should be considered.

The evaluation of the three calibration approaches
The trade-off among objectives

Sorting algorithm comparison
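Compared with NSGA-II, in which the parameters are optimized after each generation, ED is a post-processing procedure in this study. Moreover, the value ranges of the objectives are not identical, so objectives with wider ranges implicitly carry more weight in the ED value, whereas nondominated sorting compares each objective separately and is unaffected by such differences in scale. Therefore, in comparison to nondominated sorting, ED is a more subjective approach.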
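This scale dependence can be illustrated with a small Python sketch using two hypothetical simulations: stretching the range of one standardized objective by a factor of 10 changes which simulation has the lowest ED, whereas the dominance relation between them (neither dominates the other) is unaffected. All names and values are illustrative.

```python
import math


def ed(objectives):
    """ED of standardized objectives (0 = optimal)."""
    return math.sqrt(sum(o * o for o in objectives))


def dominates(a, b):
    """Pareto dominance for minimized objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def best_by_ed(simulations):
    return min(simulations, key=lambda name: ed(simulations[name]))


def rescale(simulations, factors):
    """Multiply each objective by a per-objective factor."""
    return {name: tuple(f * o for f, o in zip(factors, obj)) for name, obj in simulations.items()}


simulations = {"A": (0.2, 0.30), "B": (0.4, 0.05)}
stretched = rescale(simulations, (1.0, 10.0))  # second objective has a 10x wider range

print(best_by_ed(simulations), best_by_ed(stretched))  # A B
for sims in (simulations, stretched):
    print(dominates(sims["A"], sims["B"]), dominates(sims["B"], sims["A"]))
```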
The scalar function of ED is computed based on the absol… was not comparable to SUFI-2 or ED. However, the larger the population size, the more diverse the parameter values that can be selected. To guarantee convergence and a stable performance of the last generation within 2,000 simulations, a population size of 50 was applied. Consequently, comparatively more constrained parameter value ranges are displayed, and a narrower prediction uncertainty was also obtained by NSGA-II. It should also be mentioned that, although multi-objective calibration reduces parameter uncertainty and thus prediction uncertainty, it cannot overcome the uncertainty introduced by input data scarcity, e.g. the accuracy of the rainfall data or the measured hydrometric data, or by the model itself, e.g. the limitation of the empirical SCS curve number method in capturing surface runoff. This is reflected in the relatively large discrepancy of the simulated peak values in Figure 7, which also indicates the limits of the calibration approaches in improving the model's performance.

CONCLUSION
In terms of the comparison between single-objective and multi-objective calibration, the key findings are summarized as follows:
1. Trade-offs among objectives are unavoidable. However, multi-objective calibration can add extra constraints to the model that ensure the reliability of the internal processes critical for model application.
2. The applicability of NSGA-II is also determined by a proper population size and number of generations. If the computational effort is critical, a smaller population size is preferred.
3. ED and NSGA-II are both suitable multi-objective calibration approaches, as indicated by the constrained parameter and prediction uncertainty.
Therefore, it is highly recommended to apply multi-objective calibration to process-based hydrological models such as SWAT, even in catchments with limited observations. The multi-objective calibration framework applied in this study also provides a practical procedure for calibrating catchments similar to the YRC. The key points include the following:
1. Prepare the observations according to the multisite, multi-objective function, and multi-metric groups, making the best use of already available datasets, e.g. measured discharge, without necessarily using the same objectives as in this study.
2. If ED or NSGA-II is considered as the calibration approach, its applicability and suitability should be analyzed according to the processing time, the number of objectives, the objectivity of the dominance, and equifinality.
The advantage of multi-objective calibration for process-based hydrological models is clearly illustrated in this study and would greatly benefit future applications of the model, e.g. an expected improvement of the nutrient load simulation at inner stations due to a more reliable simulated discharge. However, multi-objective calibration cannot overcome the uncertainty introduced by the model's approach and the input data.