## Abstract

Root zone soil moisture plays an important role in water storage in hydrological processes. The recently launched Soil Moisture Active Passive (SMAP) mission has produced a high-resolution assimilation product of global root zone soil moisture that can be applied to improve the performance of hydrological models. In this study, we compare three calibration approaches in the Beimiaoji watershed. The first approach is single-objective calibration, in which only observed streamflow is used as a benchmark for comparison with the other approaches. The second and third approaches use multi-objective calibration based on SMAP root zone soil moisture and observed streamflow. The difference between the second and third approaches is the metric used to characterize the root zone soil moisture. The second approach applies the mean, which was commonly used in previous studies, whereas the third approach applies the hydrologic complexity *μ*, a dimensionless metric based on information entropy theory. These approaches are implemented to calibrate the distributed hydrological model MIKE SHE. Results show that the root zone soil moisture simulation is clearly improved, whereas streamflow simulation suffers from a slightly negative impact with multi-objective calibration. The hydrologic complexity *μ* performs better than the mean in capturing the features of root zone soil moisture.

## INTRODUCTION

Parameter calibration is an important step in hydrological models. The consistency of observed and simulated streamflows is a common criterion used to determine whether model results are reliable (Hu *et al.* 2005; Li *et al.* 2014a; Liang *et al.* 2017). Multi-objective calibration involving phases of flow such as the flow duration curve, low flow or high flow is employed to accurately simulate streamflow (Pfannerstill *et al.* 2014). Similarly, several performance functions such as Nash–Sutcliffe efficiency (NSE) (Nash & Sutcliffe 1970), mean absolute error (MAE) (Willmott & Matsuura 2005), Kling-Gupta efficiency (KGE) (Gupta *et al.* 2009), and combinations can be employed to produce reasonable streamflow results (Yang *et al.* 2014; Zhang *et al.* 2016).

Hydrological processes are a complex whole consisting of different components (Li *et al.* 2014b), and no evidence exists showing that the use of only streamflow information can produce better estimates for other components such as evapotranspiration and soil moisture. Additional hydrological information can lead to greater constraints for related processes in model calibration (Pfannerstill *et al.* 2017). The constraints can be designed to increase the predictive power and the ability to reproduce hydrological processes (Hrachowitz *et al.* 2015). However, it is nearly impossible to optimize the objective variables of all components when calibrating hydrological models, and trade-offs between hydrologic variables must be considered (Ahmadi *et al.* 2014). Soil moisture plays a key role in the water cycle. Hence, guaranteeing the proportion of soil moisture in the hydrologic model is of great practical importance for the water balance in each hydrological process (Rajib *et al.* 2016).

Soil moisture has widely varying features of temporal and spatial distribution at the watershed scale, and thus, determining how to capture the most prominent features of this distribution is an important problem. Studies on the mean and variance of soil moisture distribution throughout watersheds are abundant. Since soil moisture is a bounded variable, the variance of soil moisture exhibits different relationships with the mean of soil moisture in different climatic regions (Meyles *et al.* 2003; Martínezfernández & Ceballos 2003; Lawrence & Hornberger 2007). According to Mahmood & Vivoni (2008, 2011), the correlation length (a measure of inhomogeneities, which represents the decay rate of correlation functions (Chayes *et al.* 1986; Simonovski *et al.* 2004).) and coefficient of variance together can partially capture the spatial and temporal distribution features of soil moisture. Castillo *et al.* (2015) introduced hydrologic complexity *μ*, a metric characterizing the distribution of soil moisture based on information entropy, which provides richer information than mean and variance alone.

Most studies have focused on the assimilation of *in situ* or satellite products into land models to increase the accuracy of surface soil moisture estimates (Reichle & Koster 2004), but only a few studies have integrated the satellite surface soil moisture product into the calibration of hydrological models (Campo *et al.* 2006; Kunnath-Poovakka *et al.* 2016). None of these studies have revealed evident changes in model outputs such as overland flow, streamflow, and deeper soil moisture (root zone soil moisture, ). The coarse resolution and shallow depth (top 5 cm in depth) of the satellite product and the conceptualization of calculating the surface and values are the reasons for the limited performance of the simulated results (Brocca *et al.* 2011).

The functions of soil moisture in a hydrologic model are rainfall partitioning and water storage. The storage effect of soil moisture on hydrological processes that can last several days or months is primarily reflected in the root zone (Fenicia *et al.* 2007). Albergel *et al.* (2008) showed that the index derived from satellite surface soil moisture product, an empirical filter referred to as the soil water index, was used in model calibration. Chen *et al.* (2011) used an ensemble Kalman filter assimilated into soil moisture observation to improve the simulation of . Although it is limited in sensing depth, the Soil Moisture Active Passive (SMAP) mission (Entekhabi *et al.* 2010), launched in 2015, allows high-resolution estimates of based on land surface data assimilation. These estimates can help hydrological models improve the simulation of related processes through multi-objective calibration.

In this study, the SMAP root zone soil moisture product and the observed streamflow are used to calibrate the distributed hydrological model MIKE SHE. Our objectives are to: (i) evaluate the performance of MIKE SHE using both and streamflow for calibration; (ii) compare two metrics describing the spatial and temporal distribution features of (the mean and hydrologic complexity *μ*); and (iii) ascertain the change sensitivity of relative parameters with different objectives in the calibration process. The remainder of the paper is organized as follows. The section below presents a description of the study area and the data set used. The MIKE SHE setup, the hydrologic complexity *μ*, and the calibration methods and relative parameter sensitivity method are described in the methodology section. The results of calibration and parameter sensitivity analysis are detailed in the results and discussion section, with the final section presenting the conclusions from the study.

## STUDY AREA AND DATA

The Beimiaoji watershed, illustrated in Figure 1, is located in eastern China and belongs to the Huaihe River basin. The basin area covers 1,710 km^{2}, and a drainage system drains the watershed with one main stream and a tributary. The elevation ranges from 27 m to 885 m above mean sea level (a.m.s.l.). The annual precipitation is 1,000–1,400 mm, mainly concentrated in summer, and the average annual temperature is 15 °C. More than 85% of the watershed land use is cropland. The major crop is rice, and the soil texture is clay.

MIKE SHE for the Beimiaoji watershed requires the following data: (i) 1:100,000 land cover database using the Land Cover of China Product from the Environmental and Ecological Science Data Centre for West China (WESTDC) (Ran *et al.* 2010); (ii) the 90 m digital elevation model (DEM) using the SRTM-DEM database v4.1 from the International Centre for Tropical Agriculture (Jarvis *et al.* 2008); (iii) the 1:1,000,000 soil database using the Harmonized World Soil Database (HWSD) v1.1 from the International Soil Reference and Information Centre (Nachtergaele *et al.* 2009); (iv) the observed daily precipitation and streamflow time series from gauge stations; and (v) meteorological data, such as minimum temperature, maximum temperature, solar radiation, and wind speed, obtained from the China Meteorological Data Service Centre (CMDC) and input into the FAO Penman–Monteith equation (Allan *et al.* 1998) to calculate the daily reference evapotranspiration (Mattar 2018). In addition, remotely sensed root zone soil moisture data (top 100 cm in depth) are extracted from the SMAP L-Band brightness temperature data assimilated into a land surface model. The product (SMAP L4 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data (SPL4SMGP) version 3) retrieved by the National Snow and Ice Data Centre following Reichle *et al.* (2017) is used in our study. The SMAP root zone soil moisture product is supplied with volumetric soil moisture (unit: m^{3}/m^{3}) dated from March 31, 2015. The spatial resolution is 9 km × 9 km, and the temporal resolution is 3 h. We used the arithmetic mean method to pre-process the remotely sensed images distributed in 3-h intervals throughout a given day into daily average data.

## METHODOLOGY

### Model setup

In its original version, MIKE SHE is a deterministic, physically based, distributed model for the simulation of all of the major hydrological processes. Currently, this model is developed as an integrated process-based model coupled with conceptual and physically based methods, and we can solve water resource and ecological problems at a variety of spatial and temporal scales according to our project purposes and data availability (Christian *et al.* 2010; Thompson 2012).

We chose certain processes to describe the hydrological process in the MIKE SHE model. Overland flow routing is modeled using the diffusive wave approximation of the 2D Saint-Venant equations. Channel flow is based on the kinematic wave approximation of the 1D Saint-Venant equations. Evapotranspiration is calculated using the Kristensen and Jensen method with the required time series for reference evapotranspiration, leaf area index (*LAI*), and root depth (*R _{d}*). Unsaturated water flow is modeled as a vertical flow using the 1D Richards' equation. The Van Genuchten function is input to simulate the vertical water flow in the root zone, where the evapotranspiration is also calculated. The saturated zone (ground flow) calculation is based on the linear reservoir replacement finite difference of the 3D Darcy equation due to a lack of detailed groundwater data. Use of a linear reservoir method for the saturated zone in MIKE SHE involves the definition of the interflow reservoirs and baseflow reservoirs as well as their parameters, e.g., time constant.

The model domain is defined using a dfs2 grid file, and the grid size is fixed at 600 m × 600 m (after Vázquez *et al.* 2002). Inverse distance method is used to interpolate gridded data. The top 1 m of soil depth below the ground surface is simulated in MIKE SHE. The number of vertical discretization of the soil profile is specified as 10, and each soil layer is 10 cm in depth. The simulation period ranges from April 1, 2015, to October 31, 2017, or a total of 31 months, of which, the first seven months are the warm-up period and the remaining two years are the actual simulation period (estimation period). The time step is one day.

### Hydrologic complexity

*et al.*(2015). This measure assumes that soil moisture plays a leading role in the hydrologic response and is not suitable for special basins such as karst geology. The hydrological complexity metric

*μ*, a dimensionless index, is introduced in this study and is defined as follows: where

*X*is a set of

*x*, , and

*X*is relative soil moisture; is a probability density function; is a uniform distribution function; expresses volumetric soil moisture, and when soil moisture is saturated and when the soil is dry; and

*D*is the Kullback–Liebler divergence.

_{KL}We use the hydrologic complexity *μ* to characterize the distributional features of that evolve over time. The range of *μ* is from 0 to 1. In the simplest case, *μ* is equal to zero, suggesting that is the same and that the function *f*(*x*) is a Dirac delta function. When is uniformly distributed from 0 to and is a uniform distribution function, *μ* is equal to 1 and is the most complex case.

### Multi-variable calibration

We use the daily time series of the observed streamflow and to calibrate the distributed hydrologic model MIKE SHE. Nine parameters involving surface/vegetation, unsaturated zone, and saturated zone (Table 1) are used in calibration and sensitivity analysis based on previous studies (McMichael *et al.* 2006; Hansen *et al.* 2009; Christian *et al.* 2010). The ranges of the parameters are set according to previous studies or prior knowledge. The Manning number (*M*) and detention storage (*D _{s}*) are used to compute the overland flow. Because the majority of the Beimiaoji watershed is covered with rice fields, the

*R*and

_{d}*LAI*of rice are the inputs for computing evapotranspiration. Saturated moisture content (

*θ*), residual moisture content (

_{s}*θ*) and saturated hydraulic conductivity (

_{r}*K*) are used to characterize the property of clay soil types. Time constants

_{s}*K*and

_{i}*K*, respectively, control the interflow and baseflow in MIKE SHE.

_{b}Parameter . | Units . | Lower bound . | Upper bound . | Optimal values . | ||
---|---|---|---|---|---|---|

C1 . | C2 . | C3 . | ||||

D _{s} | mm | 0 | 20 | 4.5 | 3.4 | 1.2 |

M | m^{1/3}/s | 10 | 100 | 26.3 | 45.9 | 53.2 |

R _{d} | mm | 0 | 1,000 | 278 | 432 | 633 |

LAI | – | 0 | 5 | 3.3 | 2.5 | 1.5 |

θ _{s} | – | 0.350 | 0.500 | 0.467 | 0.43 | 0.419 |

θ _{r} | – | 0.01 | 0.25 | 0.05 | 0.11 | 0.15 |

K _{s} | m/s | 1.00 × 10^{−8} | 1.00 × 10^{−5} | 7.80 × 10^{−7} | 3.30 × 10^{−6} | 2.58 × 10^{−6} |

K _{i} | day | 1 | 30 | 5.3 | 8.4 | 11 |

K _{b} | day | 1 | 180 | 23.1 | 42.2 | 32.3 |

Parameter . | Units . | Lower bound . | Upper bound . | Optimal values . | ||
---|---|---|---|---|---|---|

C1 . | C2 . | C3 . | ||||

D _{s} | mm | 0 | 20 | 4.5 | 3.4 | 1.2 |

M | m^{1/3}/s | 10 | 100 | 26.3 | 45.9 | 53.2 |

R _{d} | mm | 0 | 1,000 | 278 | 432 | 633 |

LAI | – | 0 | 5 | 3.3 | 2.5 | 1.5 |

θ _{s} | – | 0.350 | 0.500 | 0.467 | 0.43 | 0.419 |

θ _{r} | – | 0.01 | 0.25 | 0.05 | 0.11 | 0.15 |

K _{s} | m/s | 1.00 × 10^{−8} | 1.00 × 10^{−5} | 7.80 × 10^{−7} | 3.30 × 10^{−6} | 2.58 × 10^{−6} |

K _{i} | day | 1 | 30 | 5.3 | 8.4 | 11 |

K _{b} | day | 1 | 180 | 23.1 | 42.2 | 32.3 |

*et al.*1993) is conducted in the calibration of MIKE SHE, which has been proven as an effective global optimization method for calibrating a hydrological model (Jie

*et al.*2016). To measure the fitness between simulation and observation, the KGE is used as the objective function of SCE-UA and can be expressed as follows: with: where

*r*is correlation, and are, respectively, the bias and variability ratios, and are the respective means of the simulated and observed variables, and and are the respective standard deviation of the simulated and observed variables. The range of KGE is to 1, with values closer to 1 representing a better fit.

^{′}by Abbaspour

*et al.*(2015) with multiple variables. KGE

^{′}is written as follows: where

*n*and

*w*, respectively, represent the number and weight of objective variables, the index

*f*denotes streamflow and the index

*s*denotes , and

*i*and

*j*, respectively, denote the streamflow gauge stations and calibrated sub-basins with . In this study, we set because there is only one streamflow gauge station at Beimiaoji (outlet). Three calibration approaches are compared: (C1) if only streamflow is used as the objective variable, then ; (C2) if the streamflow and the mean of the spatial root zone soil moisture distribution across the watershed are used as the objective variables, then (after Rajib

*et al.*2016); (C3) if the streamflow and the hydrologic complexity

*μ*of root zone soil moisture are used as the objective variables, then the values are set as in C2. All approaches are evaluated after the same number of iterations (1,000) for 31 months which included the warm-up period and estimation period. The coefficient of determination (R

^{2}) and MAE are also used to evaluate the fitness of the observation and optimal simulation.

### Parameter sensitivity analysis

Changes in objective variables have important effects on simulation results such as and streamflow. To ascertain the contribution degree of each parameter to the calibration process, we analyze the sensitivity of parameters in two steps. First, Latin hypercube sampling is used to produce random samples of the parameter values as inputs into MIKE SHE. The upper and lower bounds of the parameters are listed in Table 1. All cases are evaluated after 1,000 iterations. The second step describes the sensitivity of parameters with the quantity *q _{value}*, which equals

*p*-value (Abbaspour

*et al.*2015; Rajib

*et al.*2016). The level of significance

*α*is commonly set to 0.05. The larger the

*q*and the closer it is to 1, the greater the significance of the sensitivity. We can rank the significance degree of parameters' sensitivity using the

_{value}*q*.

_{value}## RESULTS AND DISCUSSION

### Model calibration and temporal patterns

The nine calibrated parameter values for three cases are presented in Table 1. Figures 2–4 compare the observation and three simulation approaches involving streamflow and . Due to the different metrics used to describe , we estimate the temporal distribution of from two aspects (daily and daily ). Fitting values (KGE, R^{2}, MAE) are listed in Table 2 for the three calibration approaches (C1, C2, C3). Generally, the high fit values represent higher KGE and R^{2} and lower MAE, and better simulation results can be achieved.

Goodness of fit criteria . | C1 . | C2 . | C3 . |
---|---|---|---|

Streamflow | |||

KGE | 0.85 | 0.79 | 0.77 |

R^{2} | 0.80 | 0.74 | 0.71 |

MAE | 14.36 | 16.26 | 15.02 |

Root zone soil moisture^{a} | |||

KGE | 0.22 | 0.69 | 0.66 |

R^{2} | 0.15 | 0.64 | 0.57 |

MAE | 0.10 | 0.02 | 0.05 |

Root zone soil moisture^{b} | |||

KGE | −0.95 | 0.23 | 0.69 |

R^{2} | 0.07 | 0.45 | 0.61 |

MAE | 0.08 | 0.05 | 0.03 |

Goodness of fit criteria . | C1 . | C2 . | C3 . |
---|---|---|---|

Streamflow | |||

KGE | 0.85 | 0.79 | 0.77 |

R^{2} | 0.80 | 0.74 | 0.71 |

MAE | 14.36 | 16.26 | 15.02 |

Root zone soil moisture^{a} | |||

KGE | 0.22 | 0.69 | 0.66 |

R^{2} | 0.15 | 0.64 | 0.57 |

MAE | 0.10 | 0.02 | 0.05 |

Root zone soil moisture^{b} | |||

KGE | −0.95 | 0.23 | 0.69 |

R^{2} | 0.07 | 0.45 | 0.61 |

MAE | 0.08 | 0.05 | 0.03 |

^{a}The mean of distribution across the watershed .

^{b}The hydrologic complexity *μ* of .

The calibration approach C1, without using any information in the calibration process, has the best fitness for observed streamflow (Figure 2) with the highest KGE and R^{2} and the lowest MAE (Table 2). Although they are affected negatively by multi-variable calibration, C2 and C3 also achieve acceptable streamflow results. When is used in calibration, the simulation of is effectively improved (Figures 3 and 4), i.e., C2, KGE and R^{2} increase from 0.22 to 0.69 and 0.15 to 0.64, respectively, and MAE decreases from 0.1 to 0.02 compared with C1 for the simulation of (Table 2). and streamflow are both important components in the model, and simultaneous calibration of them can bring more constraints to the related processes. Tradeoffs have to be made between and streamflow in multi-variable calibration, since optimizing all objective variables is almost impossible (Pfannerstill *et al.* 2017). Hence, it is reasonable to find that the simulation accuracy of streamflow is slightly reduced while the accuracy of is improved.

In addition, we can estimate which metrics can effectively track the temporal feature of . C2 and C3 are compared using the estimates of daily and . With as one of the objective variables, C2 performs slightly better than C3 for daily , but with as one of the objective variables, the fitting values of C3 are far higher than those of C2 with KGE, R^{2}, and MAE improved from 0.23 to 0.69, 0.45 to 0.61 and from 0.05 to 0.03, respectively. It is evident that the use of *μ* can provide a more explicit and comprehensive temporal distribution of than the mean in the model calibration.

### Enhanced simulation in spatial patterns of *θ*_{root}

_{root}

The cumulative distribution function (CDF), a quantile-based mapping method, is used to spatially characterize the distributional features of . This approach is quite simple and has been used successfully in hydrological studies (Li *et al.* 2017). The CDF is compared for and three simulated values (Figure 5). Figure 5 shows the CDF of three simulated values for the 25th, 50th, and 75th percentiles and the overall average compared with . Generally, the of C1 is much drier than . C2 and C3 show a better fitness with SMAP, but C3 is superior to C2 with respect to the range of . The range of C2 simulated for is too large. For example, the of Case 2, Case 3, and SMAP ranges from 0.162 to 0.405, 0.181 to 0.343, and 0.250 to 0.336, respectively (Figure 5(b)). It is apparent that the range of C3 using as one of the objective variables shows a fair correspondence with . Therefore, *μ* has a stronger quality than the mean (central tendency) in capturing the spatial distribution features of .

### Evaluation of relative parameter sensitivity

Figure 6 compares the rankings of parameter sensitivity for the three cases based on *q _{value}*. Selected parameters in C2 and C3, such as

*R*,

_{d}*θ*,

_{r}*θ*,

_{s}*LAI*, become more sensitive relative to the unsaturated zone. These parameters have important effects on the simulation of unsaturated flow. Calculation of occurs in the unsaturated zone and can be significantly affected by these parameters. Overall, the sensitivity of these related parameters is enhanced when is used in the calibration process. In addition, compared with C2, C3 makes these parameters more sensitive, such as the change in ranking of

*θ*from fifth to fourth and

_{r}*θ*from eighth to sixth. This increased sensitivity can explain, to a certain extent, why

_{s}*μ*enhances the significance of parameters related to to a greater extent than does the mean.

## CONCLUSION

In this study, three calibration approaches are proposed based on streamflow and/or in the MIKE SHE model. These approaches are tested in the Beimiaoji watershed. To compare the simulation performance of for these approaches, the metrics of temporal and spatial distribution are estimated. The main conclusions are as follows:

- (1)
Among three calibration approaches, the best streamflow simulation is achieved by relying only on streamflow in the model calibration. However, the simulation is the worst.

- (2)
The simulation performance of is evidently improved despite the negative effects for streamflow simulation but is acceptable if applying in the model calibration. For example, the fitting values KGE and R

^{2}, respectively, increase from 0.22 to 0.69 and 0.15 to 0.64, and MAE shows a decrease from 0.10 to 0.04 for the simulation between C1 and C2. - (3)
Although better performance is achieved in the estimate of daily when the mean is a metric describing the features of in the calibration, the metric

*μ*has better performance for both the estimates of and the CDF. The*μ*can also obtain an acceptable performance in the estimate of and is far superior to the mean in the estimate of daily . We conclude that the*μ*is more effective than the mean in enhancing the estimates for . - (4)
The ranking of relative parameter sensitivity is affected when is used in the model calibration. Thus, the objective variables are related to the variety of significance of the parameter sensitivities. In addition, the effects on the significance of parameters related to – such as

*R*,_{d}*θ*,_{r}*θ*, and_{s}*LAI –*are prominent, especially with as one of the objective variables.

## ACKNOWLEDGEMENTS

This study was supported by the National Key Research and Development Program of China (grant no. 2016YFC0402706) and the Major Program of the National Natural Science Foundation of China (grant no. 41730750).