Assessment of the suitability of rainfall–runoff models by coupling performance statistics and sensitivity analysis

Conceptual rainfall–runoff models are widely used to understand the hydrologic responses of catchments of interest. Modellers calculate model performance statistics for the calibration and validation periods to investigate whether these models serve as satisfactory representations of the natural hydrologic phenomenon. Another useful method to investigate model suitability is sensitivity analysis (SA), which investigates structural uncertainty in the models. However, a comprehensive method is needed, which led us to develop a model suitability index (MSI) by combining the results of model performance statistics and SA. Here, we assessed and compared the suitability of three rainfall–runoff models (GR4J, IHACRES and Sacramento) for seven Korean catchments using MSI. MSI showed that the GR4J and IHACRES models are suitable, with MSI values above 0.5, whereas the Sacramento model has MSI values below 0.5, indicating unsuitability for most of the Korean catchments. The MSI developed in this study is a quantitative measure that can be used for the comparison of rainfall–runoff models for different catchments. It uses the results of existing model performance statistics and sensitivity indices; hence, users can easily apply this index to their models and catchments to investigate suitability.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
doi: 10.2166/nh.2016.129

Mun-Ju Shin
Chung-Soo Kim (corresponding author)
Water Resources and River Research Institute, Korea Institute of Civil Engineering and Building Technology (KICT), 283, Goyangdae-ro, Ilsanseo-gu, Goyang-si, Gyeonggi-do 411-712, Republic of Korea
E-mail: alska710@kict.re.kr


INTRODUCTION
Conceptual rainfall-runoff models are widely used to understand and predict the hydrologic responses of catchments of interest. In particular, parsimonious conceptual rainfall-runoff models are easy to use because they require fewer input data, run faster while giving reasonable results and have fewer structural identifiability problems (Shin et al. ). The range of study using the conceptual models is broad, including parameter calibration and validation (Vansteenkiste et al. ). To investigate whether the conceptual models sufficiently replicate the observed hydrographs, a routine method, the split-sampling test (Klemeš ), is applied for the calibration and validation periods, and model performance statistics are calculated for these periods.
Modellers assess model performance using various objective functions and performance statistics to decide the suitability of the models to their study catchments. Hydrologic models with more parameters tend to have higher performance statistics compared to parsimonious models, but more complex models could have a serious interaction between parameters, which causes uncertainty in the results (Shin et al. ).
Another useful method to investigate and compare the suitability of models is SA. If all or many parameters are sensitive, then the models are structurally sound, which means that modellers can trust the results from these models. SA is an essential step to assess model parameters and, therefore, to investigate the model structure (Shin et al. ). The sensitivity of model parameters may vary for different study areas (Shin et al. ); hence, every study that makes use of hydrologic models with new catchment data needs SA.
SA is categorised into global and local methods, and global SA methods are appropriate if there are interactions between model parameters, which results in a non-linear output (Saltelli & Annoni ).
Suitable rainfall-runoff models should have both good model performance and less uncertainty. There have been several studies and discussions about the suitability or acceptability of models in terms of model performance and structural uncertainty (Beven & Freer ; Vrugt et al. ; Wagener ); however, these are not enough for users to determine which model should be selected as the suitable one. The aim of this study is to provide a guide for suitable model selection considering both model performance and uncertainty.
To provide a clearer and easier guide, we developed the model suitability index (MSI) by coupling the results of model performance statistics and global SA. MSI is a quantitative index; hence, users can compare the different rainfall-runoff models using this measure. We compared three conceptual rainfall-runoff models using MSI for seven Korean catchments to investigate which models are suitable or require caution.
The subjects of each section are as follows: the second section describes the SA method, model performance criteria and MSI; the third section briefly describes rainfall-runoff models, catchments, data, the target functions for SA, the calibration method and the method used for this study; the fourth section shows the results of SA, performance statistics and MSI for the three conceptual rainfall-runoff models; and the final section provides a discussion and the conclusions of this study.

As mentioned by Saltelli et al. (), the total sensitivity index (TSI) provides more reliable results than the first-order sensitivity index (FSI) for the investigation of the overall effect of each parameter on the output. Hence, we used TSI only for the investigation of parameter sensitivity, and it can be defined as (Saltelli & Annoni ):
$$\mathrm{TSI}_i = 1 - \frac{V_{X_{\sim i}}\left[E_{X_i}\left(Y \mid X_{\sim i}\right)\right]}{V(Y)}$$
where $Y$ is the model output (here, the target function value), $X_i$ is the $i$-th parameter, $X_{\sim i}$ denotes all parameters except $X_i$, and $V$ and $E$ are the variance and expectation operators.

We applied Saltelli's scheme (Saltelli ) in the R 'sensitivity' package (Pujol et al. ), which calculates TSI while reducing the number of samples from n(2k + 2) to n(k + 2). Here, n represents the initial sample size used (10,000 in this study), and k is the number of the parameters of the hydrological model. Hence, the total number of samples generated is 60,000 for four parameter models, and 70,000 and 150,000 for five and 13 parameter models, respectively. More details of the sampling method are described in the work by Shin et al. ().
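The sample counts quoted above follow directly from n(k + 2), and the idea behind a total sensitivity estimate can be sketched in a few lines. The sketch below is illustrative only: it assumes independent parameters sampled uniformly on [0, 1], uses Jansen's total-effect estimator (a common choice, not necessarily the estimator used inside the R 'sensitivity' package), and `toy_model`, `total_sensitivity` and `n_runs` are hypothetical names introduced here.

```python
import random


def n_runs(n, k):
    """Number of model runs in the reduced sampling scheme, n * (k + 2)."""
    return n * (k + 2)


def total_sensitivity(model, k, n, seed=1):
    """Estimate total sensitivity indices using n * (k + 2) model runs.

    Assumption: k independent parameters, each uniform on [0, 1].
    Estimator: Jansen's total-effect formula (an illustrative choice).
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(n)]
    b = [[rng.random() for _ in range(k)] for _ in range(n)]

    # Base runs on the two independent sample matrices A and B (2n runs).
    y_a = [model(row) for row in a]
    y_b = [model(row) for row in b]
    runs = 2 * n

    # Output variance estimated from all 2n base runs.
    y_all = y_a + y_b
    mean = sum(y_all) / len(y_all)
    var = sum((y - mean) ** 2 for y in y_all) / (len(y_all) - 1)

    tsi = []
    for i in range(k):
        # Rows of A with only parameter i replaced by its value in B (n runs).
        squared_diff = 0.0
        for row_a, row_b, ya in zip(a, b, y_a):
            mixed = list(row_a)
            mixed[i] = row_b[i]
            squared_diff += (ya - model(mixed)) ** 2
        runs += n
        tsi.append(squared_diff / (2 * n) / var)
    return tsi, runs


def toy_model(x):
    # Hypothetical additive test function, not a rainfall-runoff model.
    return x[0] + 2.0 * x[1]


tsi, runs = total_sensitivity(toy_model, k=2, n=20000)
sample_counts = [n_runs(10000, k) for k in (4, 5, 13)]
```

For this additive toy function the analytical total indices are 0.2 and 0.8, so the estimates in `tsi` should be close to those values, and `sample_counts` reproduces the 60,000, 70,000 and 150,000 runs stated in the text.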

Model performance criteria
The value of the objective function for the calibration of parameters can be used as the model performance statistics. We used three objective function values as model performance statistics to compare the performance of conceptual rainfall-runoff models. The first objective function, the Nash-Sutcliffe efficiency (NSE) (Nash & Sutcliffe ), is as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{obs,i} - \bar{Q}_{obs}\right)^2}$$
where $n$ is the number of time steps, $Q_{obs,i}$ is the observed flow at time step $i$ (daily here), $\bar{Q}_{obs}$ is the mean of the observed flow, and $Q_{sim,i}$ is the simulated flow at time step $i$.
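As a small illustration of the NSE definition, the following sketch computes it for two flow series of equal length. The function name `nse` and the sample values are hypothetical and are not data from this study.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated flows against observed flows."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observed and simulated flow.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of observed flow from its mean.
    ssd = sum((o - mean_obs) ** 2 for o in observed)
    if ssd == 0:
        raise ValueError("observed flow is constant; NSE is undefined")
    return 1.0 - sse / ssd


# Hypothetical daily flows (illustrative values only).
q_obs = [1.0, 3.0, 2.0, 5.0, 4.0]
perfect = nse(q_obs, q_obs)          # simulation equal to observation
mean_only = nse(q_obs, [3.0] * 5)    # simulation equal to the observed mean
```

A perfect simulation gives an NSE of 1, and a simulation that only reproduces the observed mean gives 0, which is why values closer to 1 indicate better performance.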
where SR is the sensitivity ratio, that is, the number of sensitive parameters divided by the total number of parameters.

Rainfall-runoff models
We used three well-known conceptual rainfall-runoff models with different complexities from four to 13 parameters. The GR4J model (Perrin et al. ) with four parameters uses two unit hydrographs and two stores for the production and routing of water. The storage of rainfall, evapotranspiration and percolation in the surface soil is controlled by the production store, and the routing of effective rainfall is controlled by the routing store. In the routing store, effective rainfall is separated by the ratio of 0.9 and 0.1, and then two unit hydrographs route the portioned effective rainfall. Quick flow is generated from 10% of effective rainfall, and slow flow is generated from 90% of effective rainfall.

Hydromad, an open-source software package in R, is available at http://hydromad.catchment.org.

Catchments
The study area is composed of seven dam catchments in South Korea (Figure 1). These study sites are unregulated natural mountainous catchments and are the major sources of irrigation and city water. These catchments represent the major hydrologic characteristics of South Korea by having various sizes (Soyanggang, 2,703 km²; Chungju, 6,648 km²; Andong, 1,584 km²; Imha, 1,361 km²; Yongdam, 930 km²; Seomjingang, 763 km²; and Juam, 1,010 km²) and various spatial locations. The 5-year moving average of runoff ratio (%) for the seven catchments is shown in Figure 2. Similar patterns are shown for the runoff ratio between Soyanggang and Chungju, Andong and Imha, and Seomjingang and Juam. These mountainous catchments have relatively large magnitude and long duration of precipitation due to the monsoon climate; therefore, the dominant runoff is relatively large. Figure 2 shows relatively large runoff ratios of at least 35%, which are characteristic of wet catchments.
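The 5-year moving average of runoff ratio can be computed as in the sketch below. It assumes annual rainfall and runoff totals in the same depth unit (mm) and averages the annual ratios over a trailing 5-year window; whether the original figure used this windowing or a ratio of 5-year sums is not stated, and the numbers are hypothetical.

```python
def runoff_ratio_percent(rainfall_mm, runoff_mm):
    """Annual runoff ratio (%) = 100 * runoff depth / rainfall depth."""
    return [100.0 * q / p for p, q in zip(rainfall_mm, runoff_mm)]


def moving_average(values, window=5):
    """Average over each complete window of consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical annual totals in mm (not data from the study catchments).
rainfall = [1200.0, 1000.0, 1500.0, 1300.0, 1100.0, 1400.0]
runoff = [600.0, 450.0, 900.0, 650.0, 495.0, 770.0]

ratios = runoff_ratio_percent(rainfall, runoff)
smoothed = moving_average(ratios, window=5)
all_wet = all(r >= 35.0 for r in smoothed)  # the 35% level quoted in the text
```

Six years of annual values yield two 5-year averages here; a longer record gives one value per complete window.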

Data
The daily areal rainfall and potential evapotranspiration data for the seven catchments were used as input data. The total data periods are different for the seven dam catchments because of the different years of dam construction. To use the data efficiently, the data were split into sub-periods that have the same length within each catchment but differ between catchments (Table 2). Hence, the calibration and validation periods were the same for each catchment but different over the seven catchments. For calibration and validation, a 1-year warm-up period was used, for example for the P1 period of the Soyanggang catchment in Table 2.

The SCE algorithm used for calibration proceeds in the following steps:
• Step 1: Select the initial 'population' randomly throughout the feasible parameter space.
• Step 2: Divide the 'population' into 'complexes', which represent communities that have parents to generate an offspring.
• Step 3: Evolve each complex independently by applying a competitive complex evolution strategy.
• Step 4: Shuffle the evolved complexes into a single, whole population to share the information.
• Step 5: Iterate Steps 2-4 until the results converge to the preselected threshold.
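The five steps above can be sketched as a compact minimiser. This is a simplified illustration, not the implementation used in the study: parents are drawn uniformly rather than with the rank-weighted probabilities of the full algorithm, the evolution step uses only reflection, contraction and random replacement, and the function name, default settings and the quadratic test objective are all assumptions made here.

```python
import random


def sce_minimize(func, bounds, n_complexes=4, complex_size=9,
                 max_loops=100, tol=1e-10, seed=0):
    """Simplified shuffled complex evolution; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def evaluate(x):
        return (func(x), x)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def inside(x):
        return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))

    def rank(points):
        return sorted(points, key=lambda p: p[0])

    # Step 1: random initial population over the feasible parameter space.
    pop = rank(evaluate(random_point())
               for _ in range(n_complexes * complex_size))

    for _loop in range(max_loops):
        # Step 2: deal the ranked points into complexes.
        complexes = [pop[c::n_complexes] for c in range(n_complexes)]

        # Step 3: evolve each complex independently.
        for cx in complexes:
            for _step in range(complex_size):
                parents = sorted(rng.sample(range(len(cx)), dim + 1))
                worst = parents[-1]  # cx is ranked, so the last index is worst
                others = [cx[i][1] for i in parents[:-1]]
                centroid = [sum(col) / dim for col in zip(*others)]
                w = cx[worst][1]
                reflected = [2 * c - v for c, v in zip(centroid, w)]
                contracted = [(c + v) / 2 for c, v in zip(centroid, w)]
                for trial in (reflected, contracted):
                    if inside(trial):
                        candidate = evaluate(trial)
                        if candidate[0] < cx[worst][0]:
                            cx[worst] = candidate
                            break
                else:
                    # Neither move improved the worst parent: resample it.
                    cx[worst] = evaluate(random_point())
                cx[:] = rank(cx)

        # Step 4: shuffle the complexes back into one ranked population.
        pop = rank(p for cx in complexes for p in cx)

        # Step 5: stop once the spread of objective values is below tol.
        if pop[-1][0] - pop[0][0] < tol:
            break

    return pop[0][1], pop[0][0]


def objective(x):
    # Hypothetical quadratic objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_f = sce_minimize(objective, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a calibration setting the objective would be, for example, one minus the NSE of a model run with the candidate parameter set, and the bounds would be the feasible parameter ranges.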

Method used
First, we performed SA by applying the Sobol method for the parameters of the three models using the three target functions.
Lastly, we assessed the suitability of the rainfall-runoff models by applying MSI. We compared the three conceptual models for the seven catchments using the bar plot of MSI. The bar plot is an easy and quick method to investigate which models are suitable and which models require caution.

RESULTS
SA for the parameters of the three conceptual models
For the Sacramento model, parameter nos. 2, 5, 8, 9, 11 and 12 are relatively sensitive (at least one parameter is sensitive) to the NSE* target function with respect to the

Comparison of the three model performances for the three objective functions
In Figure 9, for the NSE objective function, the three models have good model performances over the seven catchments except for the P2 periods of the Seomjingang

Boxplots to assess the overall replicability and predictability of the models
Figures 12 and 13 show the boxplots of the model performances for each objective function using all the results of the seven catchments. Figure 12, which is for calibration, is used to investigate how well the models replicate the natural flow, and Figure 13, which is for validation, is used to investigate the predictability of the models for each objective function.

Comparison of rainfall-runoff models using MSI
We give the same weight to model performance and sensitivity; therefore, the threshold value of good MSI is 0.5.
The MSIs using all the model performances and sensitivity indices in the sub-sections above are shown in Figure 14.

We gave the same weight to the model performance and sensitivity results, but this weight can be adjusted by users with valid reasons. The analysis of sensitivity to the weights given to these two parts is beyond the scope of this study, so we have not done this analysis. However, it would be an interesting study because the MSI changes with different weights; therefore, this work will be researched in the future.
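To make the role of the weights concrete, the sketch below combines a sensitivity ratio and a performance score with an adjustable weight. The exact MSI formula is not reproduced in this text, so the weighted average used here is an assumed form for illustration only; the sensitivity threshold, the performance score scaled to [0, 1] and all function names are hypothetical.

```python
def sensitivity_ratio(tsi_values, threshold):
    """Share of parameters whose TSI reaches the (assumed) threshold."""
    sensitive = sum(1 for t in tsi_values if t >= threshold)
    return sensitive / len(tsi_values)


def suitability_index(sr, performance, weight_sensitivity=0.5):
    """Assumed form: weighted average of SR and a performance score.

    Both inputs and the weight are expected to lie in [0, 1]; the
    default weight of 0.5 mirrors the equal weighting in the text.
    """
    for value in (sr, performance, weight_sensitivity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs and weight must lie in [0, 1]")
    return weight_sensitivity * sr + (1.0 - weight_sensitivity) * performance


# Hypothetical values for a four-parameter model.
tsi = [0.40, 0.25, 0.02, 0.10]
sr = sensitivity_ratio(tsi, threshold=0.05)      # three of four are sensitive
msi_equal = suitability_index(sr, performance=0.8)
msi_shifted = suitability_index(sr, performance=0.8, weight_sensitivity=0.25)
suitable = msi_equal > 0.5                       # 0.5 threshold from the text
```

Changing `weight_sensitivity` shifts the index towards either the sensitivity ratio or the performance score, which is the kind of weight adjustment the paragraph above leaves to users.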
The parameters of the models used in this study were related to the input rainfall and its runoff response; therefore, they can be used for the SA, model performance test and, finally, the MSI analysis. However, if some of the parameters of a conceptual model are related to snow accumulation and melt, and the model is applied to tropical catchments, then the MSI of the model will be low because these parameters will be insensitive, even if the other parameters of the model capture the runoff reasonably. Therefore, the selection of relevant model parameters is an essential prerequisite step before the MSI analysis.
While this paper focused on three conceptual rainfall-runoff models, the method can be applied to other physically based distributed rainfall-runoff models. The detailed analysis for the sensitivity indices in the sub-section 'SA for the parameters of the three conceptual models' can be used to investigate the parametric and structural uncertainty of the model, and the detailed analysis for the model performances in the sub-section 'Comparison of the three model performances for the three objective functions' can be used to investigate the strength of the model, i.e., the extent of capturing the dynamics of rainfall-runoff processes. The overall analysis using all the sensitivity indices or model performances in the sub-sections 'Boxplots to assess the overall sensitivity of the parameters' and 'Boxplots to assess the overall replicability and predictability of the models' can be a useful tool for gaining intuition about model suitability. The MSI analysis in the sub-section 'Comparison of rainfall-runoff models using MSI' can be a pragmatic measure to screen out the suitable model.
To summarise, we have investigated the suitability of three conceptual rainfall-runoff models for the Korean catchments through the comparison of the sensitivity of model parameters, model performances and MSI by coupling the results of SA and the model performance statistics. We used three well-known and widely used hydrological models (GR4J, IHACRES and Sacramento) for the seven unregulated dam catchments, which have data sets of different lengths, from 12 to 40 years. Three target functions for high, low and very low flow were used for SA, and three objective functions for high, medium and low flow were used for calibration. The SCE algorithm was used for parameter estimation, and the split-sample independent data sets were used for the calibration and validation periods.