Exploration on hydrological model calibration by considering the hydro-meteorological variability


The hydrological response of a catchment changes with hydro-meteorological conditions, which is neglected by the traditional calibration approach through its use of time-invariant parameters. This study aims to reproduce the variation of hydrological responses by allowing parameters to vary over clusters with hydro-meteorological similarities. The Fuzzy C-means algorithm is used to partition one-month periods into temperature-based and rainfall-based clusters. One-month periods are also classified based on seasons and random numbers for comparison. This study is carried out in three catchments in the UK, using the IHACRES rainfall-runoff model. Results show that when using time-varying parameters to account for the variation of hydrological processes, it is important to identify the key factors that cause the change of hydrological responses, and the selection of the time-varying parameters should correspond to the identified key factors. In the study sites, temperature plays a more important role in controlling the change of hydrological responses than rainfall. It is found that the number of clusters affects model performance: performance in the calibration period improves as the number of clusters increases; however, the increase of model complexity leads to poor predictive capabilities due to overfitting. It is important to select an appropriate number of clusters to achieve a balance between model complexity and model performance.


INTRODUCTION
Understanding the hydrological response of catchments is crucial for various issues related to water resources management. Hydrological models with varying degrees of complexity have been developed to represent the rainfall-runoff transformation relationship. The accuracy of hydrological models is affected by multiple factors. The error associated with the observed data is one of these factors.
Although improving the measuring technology could reduce the error, for the existing data, noise reduction could be an effective pre-processing method and has been widely explored and applied (Chou 2014; Li et al. 2018). The model's representation of the hydrological process (or model structure) also affects the model performance. Melsen et al. (2016) studied the representation of spatial and temporal variability in large-domain hydrological models by investigating parameter transferability across different temporal and spatial resolutions. Magnusson et al. (2015) demonstrated the usefulness of a multimodel framework for identifying appropriate model structures according to data availability, properties of interest and computational cost. Due to the lack of field measurements, hydrological model parameters are generally estimated through calibration, which is also a key procedure controlling the model capability. Some works investigated the role of multi-objective calibration in improving model accuracy (Zhang et al. 2016; Her & Seong 2018; Zhang et al. 2018b), and some works aimed at the improvement of the optimization algorithm for calibration. Taking into account the variation of the hydrological response during model calibration is another way to improve the model performance, which is also the task of this study.
It is widely accepted that the more stable the catchment conditions are, the better the estimated parameters should represent the hydrological response, and the more similar the calibration data are to the validation data, the better the performance of hydrological models should be. Based on this understanding, many studies attempt to use varying model parameters to capture the variation of the hydrological response caused by climatic variation or land-cover changes in catchments. Efstratiadis et al. (2015) simulated the hydrological process of the Ferson Creek basin (USA), which has experienced growing urbanization over the past 30 years, by employing a lumped conceptual model with one time-varying parameter and a semi-distributed scheme based on two hydrological response units with time-varying surface area. Pathiraja et al. (2016a, 2016b) investigated the potential of data assimilation techniques to detect temporal patterns in hydrological model parameters from streamflow observations, and then applied the proposed method to paired catchment systems in Western Australia that have different extents of deforestation. The results demonstrate that the time-varying model structures are able to improve both predictions and modelling of changing catchments. Sadegh et al. (2019) proposed the Nonstationary Rainfall-Runoff Toolbox (NRRT) to permit time-varying realizations of hydrological models to predict nonstationary hydrological response in watersheds, where the physical changes are manifested in time-varying parameters in a conceptual model. Their case study in the Wights catchment in Australia shows that the decrease of the maximum capacity of the production store (S1max) of the GR4J model adequately represents the loss of near surface storage due to deforestation.
In addition to employing time-varying model parameters to capture the variation of the hydrological response that is caused by land-use changes, time-varying parameters are also used to account for the effect of climatic temporal variations on hydrological processes, including intra-annual variations (or seasonal variations) and inter-annual variations. Studies focusing on seasonal variations of hydrological responses work under the hypothesis that the hydrological response of different seasons can be reproduced by using different parameter sets. Paik et al. (2005) proposed a seasonal tank model to calibrate season-varying parameters for three 4-month seasons. The application of the seasonal tank model to a watershed located in central South Korea indicates that the seasonal tank model has a smaller sum of square errors than the non-seasonal tank model for the calibration period. Lévesque et al. (2008) evaluated the hydrological behaviour of the SWAT model by distinguishing the hydrological dynamics related to winter and summer seasons for two watersheds in southeastern Canada. The summer performance was considerably improved when only summer streamflow was provided for calibration. However, calibration based solely on winter observations resulted in minor improvements in performance. Luo et al. (2012) explored the possible effects of hydrologic non-stationarity by testing ten parameterization schemes at 12 catchments in eastern Australia. Results show that among all parameterization schemes, calibrating the model using the data from each individual month benefits the seasonal streamflow forecasting. Zhang et al. (2018a) configured a season-based probability-distributed model (PDM-CEMADEN) to simulate different hydrological responses during wet and dry seasons. The season-based models constructed for five basins in southeastern Brazil are adequate to reproduce the intra-annual and inter-annual variability of the streamflow.
As for the impact of inter-annual climatic variations on hydrological processes, Klemeš (1986) initially considered the necessity of verifying the hydrological model under different climate conditions. He proposed a differential split-sampling test to identify two periods with different climatic characteristics; the hydrological model was then calibrated and validated on the contrasting periods. This method was applied to 273 catchments in Austria by Merz et al. (2011). They found that parameters representing snow and soil moisture processes have high correlations to changing climatic conditions in the more recent years, such as higher evapotranspiration and drier soil conditions. Under these changing climatic conditions, the simulation errors clearly increase as the time lag between the simulation and calibration periods increases. Brigode et al. (2013) classified the available records into four 3-year sub-periods on the basis of the Aridity Index (here defined as the ratio of mean Penman potential evapotranspiration to mean precipitation): a wet sub-period, two dry ones and an intermediate one. The driest sub-period was used as the validation period and the three others were used as calibration periods separately. The results show that the model performance is the worst when the wet sub-period is used for calibration. Kim et al. (2016) investigated the calibration scheme where one parameter of the IHACRES model is selected to vary against time and climate conditions while other parameters remain fixed. They found that the model that takes into account the nonstationary effects works well for both calibration and validation periods.
When using time-varying parameters to account for the variation of hydrological responses caused by climatic variations, clustering methods are commonly used to identify periods with similar climatic characteristics, during which parameter sets are assumed to be the same. Choi & Beven (2007) classified sub-periods with a length of 30 days into 15 clusters using the Fuzzy C-Means algorithm, where the climatic conditions are described with six variables. They then calibrated and validated TOPMODEL for each cluster in the GLUE framework. Although satisfactory model performance could be achieved at the global level, there was no parameter set that performed well for all 15 clusters. de Vos et al. (2010) employed the k-means clustering algorithm to classify the historical data into 12 clusters with similar characteristics in terms of precipitation and soil moisture, and model parameters were allowed to vary over clusters. They improved the model structure by analyzing the variation pattern of parameters. A clustering method based on Self-Organising Maps (SOM) was also used to partition climatic conditions by Toth (2009). They found that an adequate distinction of the climatic conditions may considerably improve the rainfall-runoff modeling performance.
Although it is widely recognized that allowing parameters to vary according to the variations of climatic conditions could improve the hydrological modeling performance, some issues remain unexplored. For example, when identifying similar climatic conditions, it is unclear which climatic factor is more related to the variation of the hydrological response. Besides, although time-varying parameters could better reproduce the real hydrological response, they increase the model complexity and affect the predictive capabilities of the model. This study aims to investigate these problems. In order to identify periods with climatic similarities, the Fuzzy C-means (FCM) algorithm is used to partition 1-month periods into different clusters. Here the FCM algorithm is executed based on the temperature information and rainfall information of 1-month periods separately. 1-month periods are also classified on the basis of seasons and random numbers for comparison. Parameters are allowed to vary over clusters during the calibration procedure. Model performances are then evaluated using the criteria $R^2$, $R^2_{\ln}$ and relative bias to represent high flows, low flows and water balance. The trade-off between model complexity and model performance is studied by evaluating the model performance under different numbers of clusters. Three catchments in the southwest of the UK are selected to carry out this study, with the use of the lumped conceptual rainfall-runoff model IHACRES.

STUDY SITES
Three catchments located in the southwest of the UK are explored in this study: Exe River at Thorverton (45001), Brue River at Lovington (52010) and Avon River at Great Somerford (53008). The main land use of these three catchments is grassland and horticulture, which has changed little in recent decades. Figure 1 shows the location of the selected catchments and the corresponding stream gauging stations.
Information on these three catchments and available data are listed in Table 1. The average daily rainfall data are obtained from the NERC Environmental Information Data Centre (Tanguy et al. 2016). The catchment average temperatures are calculated with the use of the UKCP09 gridded observation data sets, and the National River Flow Archive (NRFA) provides the daily time series of observed streamflow data. All data have been checked for possible outliers and missing data, etc. The data during the period from 2003 to 2015 are selected for analysis because this period has minimal missing records for all types of data for all catchments.
From Figure 2a, it is seen that the temperature difference between summer and winter months is not significant, ranging from 4.4 °C in February to 16.3 °C in July. The studied catchments are hardly affected by snow. Figure 2b shows the temporal variations of monthly temperature, in which a clear seasonal pattern is seen. The average annual temperature indicates that the temperature rise during the study period is not distinct compared with the intra-annual temperature variations; therefore climate change can be considered negligible in this study.
The distribution of the average monthly rainfall and measured streamflow for the three catchments is shown in Figure 3. The three catchments show a similar pattern for both rainfall and streamflow. They have heavy rainfall all year round, and there is a seasonal pattern, with wet autumns and winters and relatively dry springs and summers. It is clear that the monthly rainfall varies a lot over years, especially for the summer season. The average monthly streamflow shows a similar pattern to rainfall, where the autumn and winter have high streamflow, while the spring and summer have low streamflow. Despite this similar pattern, it is interesting to find that the decrease of streamflow in summer is more distinct compared with that of rainfall, and the summer streamflow has less interannual variation.
This could be explained by the high temperature of summer, which plays a key role in controlling the runoff through affecting the evapotranspiration process. Given the location of these three catchments, it is found that the monthly rainfall and monthly streamflow decrease from the west (Thorverton catchment) to the east (Great Somerford catchment).

Rainfall-runoff model
The IHACRES model (Jakeman & Hornberger 1993) is a lumped conceptual rainfall-runoff model, which has been widely applied to a range of catchments for hydrological analysis and climate impact studies due to its simple structure and modest input data requirements (Letcher et al. 2001; Kim & Lee 2014; Kim et al. 2016). The IHACRES model consists of two modules in series: the non-linear loss module and the linear routing module, as shown in Figure 4. The non-linear loss module calculates effective rainfall by calculating the catchment wetness index on the basis of rainfall and temperature. The percentage of rainfall that becomes effective rainfall varies linearly from 0% to 100% as the catchment wetness index varies from zero to unity. The linear routing module then converts the effective rainfall to streamflow based on the unit hydrograph theory, where the catchment is conceptualized as a configuration of linear storages acting in series and/or parallel. Model parameters are listed in Table 2.
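As an illustration of the two-module structure, the following is a minimal sketch only. It assumes the commonly cited form of the IHACRES equations (an exponential temperature adjustment of the drying rate around a 20 °C reference, and two parallel linear stores for quick and slow flow), since the equations of Figure 4 are not reproduced in the text. The function names, the symbol names `c`, `tau_w`, `f`, `tau_q`, `tau_s`, `v_s`, and the clamping of the wetness index to [0, 1] are assumptions made for this example.

```python
import math

def effective_rainfall(rain, temp, c, tau_w, f, s0=0.0, t_ref=20.0):
    """Non-linear loss module (assumed form): rainfall, temperature -> effective rainfall."""
    s = s0
    out = []
    for r, t in zip(rain, temp):
        # Temperature-adjusted drying rate; warmer than t_ref means faster drying.
        tau = max(tau_w * math.exp(f * (t_ref - t)), 1.0)  # floor keeps decay in [0, 1)
        # Catchment wetness index: wetted by rainfall, dried according to tau.
        s = c * r + (1.0 - 1.0 / tau) * s
        s = min(max(s, 0.0), 1.0)  # assumption: index bounded between zero and unity
        out.append(r * s)          # fraction of rainfall that becomes effective
    return out

def route(u, tau_q, tau_s, v_s):
    """Linear routing module (assumed form): quick and slow stores in parallel."""
    a_q, a_s = math.exp(-1.0 / tau_q), math.exp(-1.0 / tau_s)
    v_q = 1.0 - v_s                # volume fractions of the two stores sum to one
    x_q = x_s = 0.0
    flow = []
    for u_k in u:
        x_q = a_q * x_q + v_q * (1.0 - a_q) * u_k
        x_s = a_s * x_s + v_s * (1.0 - a_s) * u_k
        flow.append(x_q + x_s)
    return flow
```

In this form the routing conserves volume, so the mass balance is controlled entirely by the loss module.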

Calibration schemes
The parameters of the hydrological model are allowed to vary over clusters during the calibration procedure. These clusters consist of 1-month periods in the calibration period, and the 1-month periods are defined by calendar month. There are four types of cluster, which are identified based on temperature similarity, rainfall similarity, calendar seasons and random numbers, respectively.
Temperature and rainfall are crucial variables affecting the hydrological response. Clusters considering their similarities are classified using the Fuzzy C-means (FCM) algorithm. The reason for exploring seasons is that the study areas show seasonal variations of hydro-meteorological conditions; therefore clusters based on seasons could take into account the similarity of both temperature and rainfall, although these similarities are not as distinct as those identified using the Fuzzy C-means (FCM) algorithm.
Clusters based on random numbers are used only for comparison. In addition, for the purpose of better evaluating the parameter-varying calibration scheme, we also studied the traditional calibration approach, which uses time-invariant parameters. As a result, there are five calibration schemes, referred to as the Tradition scheme, FCM_T scheme, FCM_R scheme, Season scheme, and Random scheme.

Fuzzy C-Means (FCM) algorithm
The Fuzzy C-Means (FCM) algorithm is an unsupervised clustering algorithm, initially proposed by Bezdek (1981). During clustering, objects with similar characteristics are classified into one cluster, and objects in different clusters are dissimilar in terms of the same characteristics (Sbai 2001;Pakhira et al. 2004). In this study, the FCM algorithm was used to partition multiple 1-month periods based on hydro-meteorological conditions in terms of temperature and rainfall. The temperature information of the 1-month period was described with 4 variables: average monthly temperature, maximum monthly temperature, minimum monthly temperature, and monthly temperature variance. The rainfall information of the 1-month period was described with the following variables: monthly rainfall, maximum monthly rainfall, the rate of rainy days and monthly rainfall variance.
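The four descriptors per 1-month period could be computed from daily series as in the sketch below. It assumes that the monthly maximum and minimum are taken over the daily values within the month, that the variance is the population variance, and that a rainy day is any day with rainfall above a threshold; the function names and the `wet_threshold` argument are hypothetical.

```python
import numpy as np

def temperature_features(daily_temps):
    """Mean, maximum, minimum and variance of daily temperature in one month."""
    t = np.asarray(daily_temps, dtype=float)
    return np.array([t.mean(), t.max(), t.min(), t.var()])

def rainfall_features(daily_rain, wet_threshold=0.0):
    """Total, maximum, rate of rainy days and variance of daily rainfall in one month."""
    r = np.asarray(daily_rain, dtype=float)
    rainy_rate = (r > wet_threshold).mean()  # fraction of days counted as rainy
    return np.array([r.sum(), r.max(), rainy_rate, r.var()])
```

Stacking one such vector per month gives the object matrix that the clustering operates on.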
When classifying the multiple 1-month periods $X = \{x_1, x_2, \ldots, x_n\}$ into $k$ clusters represented as fuzzy sets $F_i$ ($i = 1, \ldots, k$), the algorithm is carried out by minimizing the following objective function:
$$J_m = \sum_{j=1}^{n} \sum_{i=1}^{k} \mu_{ij}^{m} \, \lVert x_j - c_i \rVert^2 \qquad (1)$$
where $\mu_{ij}$ is the membership degree of $x_j$ to the fuzzy cluster set $F_i$, with $\sum_{i=1}^{k} \mu_{ij} = 1$; $m \in [1, \infty)$ is a weight exponent controlling the degree of fuzzification; $c_i$ is the cluster centroid of the fuzzy cluster set $F_i$; and $\lVert x_j - c_i \rVert$ is the Euclidean norm between $x_j$ and $c_i$. In this study, $x_{jh}$ is the $h$th variable of each hydro-meteorological factor.
Fuzzy partitioning is performed through an iterative optimization of the above objective function, with the membership degrees $\mu_{ij}$ and the cluster centroids $c_i$ updated until $J_m$ cannot be further improved.
The number of clusters needs to be defined before the FCM algorithm is conducted. When exploring the performance of different calibration schemes, to avoid the effect of differences in model complexity associated with the number of clusters, we set the number of clusters to 4, which equals the number of seasons. When investigating the effect of model complexity, model performance under different numbers of clusters is tested.
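The iterative update of membership degrees and centroids can be sketched as follows. This is a generic textbook implementation of Fuzzy C-means rather than the code used in the study; the default fuzzifier `m = 2`, the random initialization, the convergence test on the change in memberships, and the function name are assumptions. The update formulas require `m > 1`.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centroids, memberships) for data X of shape (n_objects, n_variables)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1 across the k clusters.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centroids are membership-weighted means of the objects.
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance of every object to every centroid.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: inversely related to distance, normalized per object.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return C, U
```

A hard cluster label for each 1-month period can be taken as `U.argmax(axis=1)`. Because the variables have different units, standardizing the columns of `X` beforehand may be advisable; whether the study did so is not stated.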

Model calibration
When calibrating the parameter-varying hydrological model, there are two calibration methods, the parallel calibration scheme (PCS) and the serial calibration scheme (SCS) (Kim & Han 2017). For the PCS approach, parameter sets for different clusters are calibrated in parallel. Each time the model is run, only the data belonging to one cluster are used in the objective function, although the model is run for the whole calibration period. In this way, the parameter set for this cluster can be calibrated. When there are n clusters, the model needs to be run n times to calibrate n sets of parameters. With all parameter sets, the simulated streamflow is obtained by extracting and combining the streamflow of each cluster that is simulated with its corresponding parameter set. As for the SCS approach, all parameter sets are calibrated simultaneously. Parameters vary according to the cluster the data belong to. When the simulation switches from one 1-month period to the following one, the state variables and streamflow at the end of the prior period are used to initialize the subsequent period. The PCS approach is easy to implement and widely used; however, the state variables and simulated streamflow are discontinuous at cluster boundaries, which is physically unrealistic. The SCS approach overcomes this discontinuity problem at the cost of increased model complexity. In this study, the SCS approach is employed to calibrate the varying parameters.
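The key feature of the serial scheme, switching parameters by cluster while carrying the model state across period boundaries, can be illustrated with a toy example. The single linear store below is a hypothetical stand-in and not the IHACRES model; `params` maps each cluster label to an assumed outflow coefficient.

```python
def simulate_serial(rain, labels, params, s0=0.0):
    """Toy one-store model whose outflow coefficient switches with the cluster label."""
    s = s0
    flow = []
    for r, label in zip(rain, labels):
        k = params[label]  # parameter of the cluster this time step belongs to
        s += r             # storage is carried over unchanged across cluster switches
        q = k * s          # linear-store outflow
        s -= q
        flow.append(q)
    return flow
```

Because `s` is never reset when `label` changes, the simulated storage is continuous in time, which is what distinguishes the serial scheme from stitching together separately simulated series.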
The IHACRES model has six parameters, as listed in Table 2: three parameters in the non-linear loss module and three parameters in the linear routing module ($v_q = 1 - v_s$). During calibration, all parameters except for $c$ vary over clusters, and a value of $c$ is selected such that the volumes of effective rainfall and observed streamflow are equal over the calibration period. Therefore, when there are n clusters, the number of parameters is 5n + 1. The calibration procedure is illustrated in Figure 5, which takes the season-based clusters as an example. As is seen, each cluster (here, each season) is assigned one set of parameters; when the data switch from one season to another, the parameters vary accordingly. During the calibration procedure, Nash-Sutcliffe Efficiency (NSE) (Nash & Sutcliffe 1970) is used to minimize the difference between the observed and simulated streamflow, defined as:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (Q_{o,i} - Q_{s,i})^2}{\sum_{i=1}^{N} (Q_{o,i} - \overline{Q_o})^2} \qquad (2)$$
where $Q_{o,i}$ and $Q_{s,i}$ are the $i$th observed and simulated streamflow, respectively, $\overline{Q_o}$ is the arithmetic mean of the observed streamflow, and $N$ is the total number of days in the calibration period. The Nash-Sutcliffe Efficiency can vary from −∞ to 1. An efficiency of 1 corresponds to a perfect match of the simulated streamflow to the observed streamflow.
The shuffled complex evolution (SCE-UA) method (Duan et al. 1992) is used to maximize the above objective function.
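The Nash-Sutcliffe Efficiency used as the calibration objective can be computed as in the following minimal sketch; the function name is chosen for this example, and an optimizer would maximize its return value.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency of a simulated series against observations."""
    mean_obs = sum(observed) / len(observed)
    # Sum of squared simulation errors.
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den
```

A simulation that always returns the observed mean scores 0, which is why values below 0 indicate a model worse than the mean as a predictor.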

Model evaluation
When validating the hydrological model, for the FCM_T scheme, FCM_R scheme and Season scheme, each 1-month period in the validation period is assigned one parameter set according to its similarity to the existing clusters. As for the Random calibration scheme, the 1-month periods in the validation period are assigned parameter sets randomly. Once the parameter sets are assigned to the 1-month periods, the model is run with parameters varying accordingly.
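One plausible way to assign validation months to existing clusters is by nearest centroid, sketched below. The text does not specify the similarity measure, so the use of Euclidean distance to the calibration-period centroids is an assumption, as is the function name.

```python
import numpy as np

def assign_to_clusters(features, centroids):
    """Index of the nearest centroid (Euclidean distance) for each feature vector."""
    X = np.asarray(features, dtype=float)
    C = np.asarray(centroids, dtype=float)
    # Squared distance from every validation month to every cluster centroid.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Each returned index then selects the parameter set calibrated for that cluster.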
Model performance is assessed for all the calibration periods and validation periods based on three evaluation criteria: $R^2$, $R^2_{\ln}$ and relative bias.
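The three criteria can be written out as below. This is a reconstruction under stated assumptions rather than a reproduction of the original equations: $R^2$ is taken to have the Nash-Sutcliffe form, $R^2_{\ln}$ is assumed to apply the same form to $\ln(Q + Q_{90})$, and the sign of the bias is chosen so that positive values correspond to underestimation.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{N} \left(Q_{o,i} - Q_{s,i}\right)^2}
                {\sum_{i=1}^{N} \left(Q_{o,i} - \overline{Q_o}\right)^2} \\
R^2_{\ln} &= 1 - \frac{\sum_{i=1}^{N} \left[\ln(Q_{o,i} + Q_{90}) - \ln(Q_{s,i} + Q_{90})\right]^2}
                      {\sum_{i=1}^{N} \left[\ln(Q_{o,i} + Q_{90}) - \overline{\ln(Q_{o} + Q_{90})}\right]^2} \\
\text{bias} &= \frac{\sum_{i=1}^{N} \left(Q_{o,i} - Q_{s,i}\right)}{\sum_{i=1}^{N} Q_{o,i}}
\end{align}
```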
where $Q_{o,i}$ and $Q_{s,i}$ are the $i$th observed and simulated streamflow, respectively, $\overline{Q_o}$ is the arithmetic mean of the observed streamflow, and $N$ is the total number of days in the calibration period. $Q_{90}$ represents the 90th percentile of the observed non-zero streamflow. $R^2$ is commonly used to assess the overall fit of a hydrograph and is sensitive to high flow events (Croke 2009). $R^2_{\ln}$, which is the logarithmic form of $R^2$, is often used to reduce the sensitivity to extreme values, which increases the sensitivity to low flows (Krause et al. 2005; Kim & Lee 2014). The values of $R^2$ and $R^2_{\ln}$ can vary from −∞ to 1, with the value of 1 corresponding to an optimal model. According to Moriasi et al. (2007), the model performance is considered very good when both $R^2$ and $R^2_{\ln}$ are greater than 0.75, good when the values are in the range of 0.65-0.75, and satisfactory if they are in the range of 0.50-0.65. The relative bias is used to assess the water balance error for a certain period. The perfect result is achieved when the bias equals zero. The larger the absolute value, the worse the result, and positive and negative values correspond to the underestimation and overestimation of streamflow, respectively.
Figure 6 shows the distribution of clusters identified based on temperature, rainfall, season and random numbers, respectively. Clusters for temperature are numbered according to the value of average monthly temperature, and clusters for rainfall are numbered based on the value of monthly rainfall (from a low level to a high level). As for seasons, the numbers from 1 to 4 refer to winter, autumn, spring and summer, respectively. As is seen, for clusters identified using the FCM algorithm, the same month of less rainfall than 2012. In summary, the clustering algorithm shows superiority in considering the interannual and intra-annual variation of temperature and rainfall, compared with the season-based approach.
By comparing the cluster distribution of these three catchments, it is found that clusters based on temperature and rainfall have a similar distribution, indicating that the variations in terms of temperature and rainfall are similar for these three catchments.

Classification of 1-month periods
The distribution of average monthly temperature and monthly rainfall of the objects in different clusters is explored. It is found that the three catchments have a similar distribution; therefore, we take the Thorverton catchment as an example, whose corresponding distribution is shown in Figure 7. As is seen, the difference of average monthly temperature among clusters is most significant for temperature-based clusters, followed by season-based clusters. Clusters identified based on rainfall and random numbers have no distinct difference in terms of the average monthly temperature. As for the monthly rainfall, a distinct difference can be found among rainfall-based clusters, while the difference among other clusters is not significant. In addition to the difference among clusters, it is also found that the similarity of the objects in the same cluster varies a lot. For instance, the objects in each temperature-based cluster are highly similar in terms of the average monthly temperature, and the objects in each rainfall-based cluster also have a high level of similarity in terms of the monthly rainfall. As the temperature-based clusters and rainfall-based clusters are identified using the FCM algorithm, it is inferred that the FCM algorithm has a better performance in grouping objects with similar characteristics and separating objects that are dissimilar in terms of the same characteristics.
The parameter $c$ is determined to ensure that the volumes of the effective rainfall and observed streamflow are equal over the calibration period, so the relative bias of the calibration period is equal to zero. Therefore, only the criteria $R^2$ and $R^2_{\ln}$ are used to evaluate the model performance in the calibration period. The value of these criteria is the average value of the two calibration periods. The model performance in terms of $R^2$ is greater than 0.8 for almost all calibration schemes at the three catchments.
However, the value of $R^2_{\ln}$ is relatively low. This indicates that the model performs better in simulating high flows than low flows. It could be explained by the fact that the model is calibrated only on the Nash-Sutcliffe Efficiency (NSE), so the calibration procedure focuses on matching one aspect of the hydrological process reflected in the observations and ignores others. As NSE is sensitive to high flows, the calibrated model has a better performance in simulating high flows.
Comparing the model performance of different calibration schemes gives the same result for the three catchments. Calibration schemes that allow parameters to vary perform better than the traditional calibration approach in terms of $R^2$ and $R^2_{\ln}$, indicating that allowing parameters to vary could better reproduce the hydrological process by considering the change of the hydrological response. Among calibration schemes that allow parameters to vary, the FCM_T scheme has the best performance, followed by the Season scheme, though the extent of the improvement caused by these two calibration schemes varies among the three catchments. Given that the temperature-based clusters and season-based clusters have similar patterns in recognizing the temperature variation, it could be inferred that allowing parameters to vary according to temperature similarities could achieve better model performance in the calibration period for the studied catchments.

Model performance in validation
The calibrated parameter sets for the five calibration schemes over two calibration periods (2003-2010 and 2008-2015) were validated using the data in the periods 2011-2015 and 2003-2007, respectively. The model performance of the five calibration schemes for validation periods is compared in Figure 10 with the use of the average value of the two validation periods. The improvement produced by parameter-varying calibration schemes (except for the Random scheme) is more significant compared with that of the calibration period. The FCM_T scheme and the Season scheme have relatively higher values of $R^2$ and $R^2_{\ln}$ compared with other calibration schemes for all catchments, which is similar to the results of the calibration period. In the validation period, not all calibration schemes that allow parameters to vary lead to better performance than the traditional scheme. For example, the model performance of the Random scheme in $R^2$ is poorer than that of the traditional scheme for both the Lovington catchment and the Great Somerford catchment. This indicates that although allowing parameters to vary could improve the hydrological model performance, it is of great importance to define appropriate clusters that represent the variation of the hydrological response; otherwise, the increased model complexity may have adverse impacts on the model's predictive capabilities. The value of $R^2_{\ln}$ is still lower than that of $R^2$ for all calibration schemes, indicating that the calibrated model has better capabilities in simulating high flows than low flows; this is caused by the choice of the objective function for the calibration procedure.
For the relative bias, it is seen that the FCM_T scheme has the smallest bias for three catchments.
Although the bias of the Season scheme is not as good as that of the FCM_T scheme, it is still better than that of the traditional approach. Although the three catchments present similar results in terms of the improvement caused by the parameter-varying calibration schemes, it is clear that their model performance differs.
In general, the Thorverton catchment has a better performance than the other two catchments, with relatively higher $R^2$ and $R^2_{\ln}$ and lower bias, which also applies to the calibration period. The difference in model performance could be caused by differences in catchment properties; for example, the Lovington catchment and the Great Somerford catchment have less rainfall than the Thorverton catchment, and the vegetation conditions of the three catchments may differ. Although it is important to find the reasons that lead to the model performance difference, it is beyond the scope of this study and will be investigated in future work.
The FCM_T scheme has a better performance in terms of $R^2$, $R^2_{\ln}$ and relative bias in validation, and the Season scheme has higher values of $R^2$ and $R^2_{\ln}$. The seasonal model performance ($R^2$) of these two schemes for validation periods is also compared with that of the Tradition scheme, as shown in Figure 11. The pattern of the improvement caused by the FCM_T scheme and the Season scheme in the validation period is similar to that in the calibration period. The values of $R^2$ are improved in almost all seasons to different extents with the use of the FCM_T scheme and Season scheme. The improvement in summer is the most significant, though the value of $R^2$ for summer is still low.

The variation of model parameters
As the FCM_T calibration scheme has the superior model performance for both calibration and validation periods, the model parameters of this calibration scheme are used to analyze the variation of parameters against clusters. It is found that the variation pattern of parameters is the same for two calibration periods. Figure 12 shows the distribution of model parameters calibrated during the period 2003-2010 for three catchments. The average monthly temperature is lowest for 1-month periods in Cluster 1 and highest for 1-month periods in Cluster 4. It is seen that the parameter and f in the nonlinear loss module show distinct variation patterns against clusters for all catchments, while there is no obvious variation pattern of the parameter , and in the linear routing module. The reference drying rate shows a decrease trend with the increase of temperature. This is plausible since when the reference drying rate is small, according to the equations of the non-linear module in Figure 4, the soil tends to be drier (smaller catchment wetness index), which is achieved when the temperature is high. The temperature modulation of drying rate f controls the sensitivity of drying rate to changes in temperature, showing an increase trend with the increase of temperature. For the cluster with lower temperatures, the difference between the real temperature and the reference temperature 20C is 19 relatively larger and the reference drying rate is higher. Due to these two factors, the drying rate may have a larger variation range even the temperature varies within a small range. In this case, a smaller value of the parameter f could decrease the variation range of the drying rate in this cluster.
In contrast, for the cluster with higher temperatures, the difference between the actual temperature and the reference temperature of 20 °C is relatively small and the reference drying rate is low, which makes the variation of the drying rate insignificant even when the temperature varies considerably. Here a larger value of f compensates for this. Therefore, the variation pattern of the parameter f is also plausible.
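The reasoning about f can be checked numerically. The sketch below assumes the commonly cited IHACRES form of the temperature-dependent drying rate, tau_w(T) = tau_w_ref * exp(0.062 * f * (T_ref - T)), with T_ref = 20 °C; the equations actually used in this study are those of Figure 4 and may differ. The function names and all parameter values are hypothetical and are not calibrated values from the study sites.

```python
import math

T_REF = 20.0  # reference temperature in degrees C, as stated in the text


def drying_rate(tau_w_ref, f, temp_c):
    """Assumed IHACRES-style drying rate at air temperature temp_c."""
    return tau_w_ref * math.exp(0.062 * f * (T_REF - temp_c))


def drying_rate_range(tau_w_ref, f, t_low, t_high):
    """Spread of the drying rate when temperature moves from t_low to t_high."""
    return abs(drying_rate(tau_w_ref, f, t_low) - drying_rate(tau_w_ref, f, t_high))


# Hypothetical clusters: a cold one with a high reference drying rate and a
# warm one with a low reference drying rate, each with a 3 degree spread.
cold_small_f = drying_rate_range(tau_w_ref=30.0, f=0.5, t_low=3.0, t_high=6.0)
cold_large_f = drying_rate_range(tau_w_ref=30.0, f=2.0, t_low=3.0, t_high=6.0)
warm_small_f = drying_rate_range(tau_w_ref=5.0, f=0.5, t_low=15.0, t_high=18.0)
warm_large_f = drying_rate_range(tau_w_ref=5.0, f=2.0, t_low=15.0, t_high=18.0)

print(f"cold cluster: f=0.5 -> {cold_small_f:.2f}, f=2.0 -> {cold_large_f:.2f}")
print(f"warm cluster: f=0.5 -> {warm_small_f:.2f}, f=2.0 -> {warm_large_f:.2f}")
```

Under these assumptions, the same temperature spread produces a wider drying-rate range in the cold cluster than in the warm one for a given f, so a smaller f damps the range in the cold cluster and a larger f widens it in the warm cluster.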
It is interesting to find that the water loss process controlled by the reference drying rate and the parameter f is highly correlated with temperature, which is why these two parameters show distinct variation patterns as temperature changes. The linear routing module converts the effective rainfall to streamflow, so the parameters in this module are more related to the catchment characteristics. They are also associated with the rainfall characteristics; for instance, higher-intensity rainfall produces quicker surface water flow. However, as Figure 7 shows, the difference in monthly rainfall among temperature-based clusters is not significant, so even though these parameters are associated with the rainfall characteristics, they do not vary significantly among temperature-based clusters. Based on the above results, it is inferred that after identifying the key factors that cause the change of the hydrological response in the catchment, the selection of the time-varying parameters should correspond to the identified key factors. In this study, the key factor controlling the change of the hydrological response is temperature, because the FCM_T calibration scheme performs better than the other calibration schemes for both calibration and validation periods. The parameters related to the effect of temperature in the IHACRES model are the reference drying rate and f, so these two parameters vary significantly among clusters.

The effect of the cluster number
The FCM_T calibration scheme is used to explore the effect of the cluster number on model performance, because it performs better than the other schemes in both calibration and validation periods. The effect of the cluster number can be understood through the trade-off between bias and variance, as shown in Figure 13 (Han, 2011). If the number of clusters is too small, the classification may not be flexible enough to recognize specific similarities, and the corresponding calibration scheme may be unable to capture the variation of the hydrological response. This causes underfitting, with high bias and low variance. On the other hand, if the number of clusters is too large, even the noise will be fitted, which leads to overfitting, with low bias and high variance.
In order to avoid too many parameters, the FCM_T calibration scheme is explored with the cluster number ranging from 1 to 6. The model performance in terms of R² is compared in Figure 14. The typical bias-variance trade-off (Figure 13) can be seen in Figure 14 for all catchments. The model performance for the calibration period improves as the cluster number increases. However, the increase in model complexity leads to overfitting and poor predictive capability. Because the cluster number affects the model performance, it is important to choose an appropriate cluster number to avoid both underfitting and overfitting. From Figure 14, the appropriate number of clusters for the study sites is 4, at which the model performance for both the calibration period and the validation period is optimal.
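The selection step described above can be sketched as a simple rule: among the candidate cluster numbers, keep the one with the best validation score and prefer the simpler model in case of a tie. This is only an illustrative sketch; the function name is hypothetical and the scores below are made-up placeholders, not the values plotted in Figure 14.

```python
def select_cluster_number(cal_scores, val_scores):
    """Return the cluster number with the best validation score.

    Both arguments map cluster number -> efficiency score (higher is better).
    Calibration scores are only used to check that the two sets match.
    """
    if set(cal_scores) != set(val_scores):
        raise ValueError("calibration and validation must cover the same cluster numbers")
    # Sort key: highest validation score first, then the smallest cluster number.
    return min(val_scores, key=lambda k: (-val_scores[k], k))


# Placeholder scores for illustration only (not results from this study).
# Calibration keeps improving, while validation peaks and then degrades.
cal = {1: 0.60, 2: 0.65, 3: 0.68, 4: 0.70, 5: 0.71, 6: 0.72}
val = {1: 0.55, 2: 0.58, 3: 0.61, 4: 0.59, 5: 0.57, 6: 0.54}

best = select_cluster_number(cal, val)
print("selected cluster number:", best)
```

Selecting on validation rather than calibration performance is what guards against overfitting here, since the calibration score alone would always favour the largest cluster number.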

DISCUSSION AND CONCLUSIONS
This study attempts to improve hydrological model performance by using time-varying parameters to represent the variation of the hydrological response. However, allowing parameters to vary according to similarities of catchment conditions increases the model complexity, which may lead to overfitting and affect the predictive capability of the model. Two issues of concern are the identification of clusters with similarities and the effect of the increased model complexity. In this study, four types of clusters are explored. Clusters based on the similarity of temperature and rainfall are identified using the Fuzzy C-means (FCM) algorithm. Clusters are also classified based on seasons and random numbers. The components of these clusters are the data of 1-month periods, which are divided according to the calendar month. The distribution of the different clusters shows that the FCM algorithm performs better in grouping objects with similar characteristics and separating objects that are dissimilar in terms of the same characteristics. Compared with the season-based approach, the clusters identified using the clustering algorithm better account for the inter-annual and intra-annual variation of temperature and rainfall. Note that when identifying the clusters, we divided the data into multiple 1-month periods according to the calendar month. The choice of period length involves a balance between computational efficiency and model performance. For example, if daily data are used for clustering, it is possible that four consecutive days belong to four different clusters. In this case, parameters change more frequently, which decreases the computational efficiency. In contrast, if the period is too long, it may contain different hydro-meteorological conditions, and these differences cannot be identified because they belong to the same period. One month is selected as the period length because the 1-month period is often regarded as the minimum unit for describing hydro-meteorological conditions in previous studies (Lévesque et al. 2008; Luo et al. 2012). In order to compare directly with the season-based clusters, the 1-month periods are divided according to the calendar month.
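For readers unfamiliar with the clustering step, the following is a minimal sketch of the standard Fuzzy C-means iteration applied to 1-month periods. It is not the implementation used in this study: the fuzzifier m = 2, the stopping rule, the function name and the synthetic monthly temperatures are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal Fuzzy C-means for data x of shape (n_samples, n_features).

    Returns (centers, memberships); memberships has shape
    (n_samples, n_clusters) and each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid 0/0).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u


# Synthetic monthly mean temperatures (degrees C): 24 cold and 24 warm months.
rng = np.random.default_rng(42)
monthly_temp = np.concatenate(
    [rng.normal(5.0, 0.5, 24), rng.normal(15.0, 0.5, 24)]
)[:, None]

centers, memberships = fuzzy_c_means(monthly_temp, n_clusters=2)
labels = memberships.argmax(axis=1)  # assign each month to its strongest cluster
```

Each 1-month period receives a membership degree in every cluster; taking the cluster with the largest membership, as in the last line, gives the hard assignment needed to choose a parameter set for that month.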
With the different types of clusters, parameters are calibrated by allowing them to vary over clusters. For comparison, the traditional calibration approach, which assumes time-invariant parameters, is also explored. During the calibration procedure, the difference between the observed and simulated flow is minimized by maximizing the Nash-Sutcliffe Efficiency (NSE). The performance of the calibrated model for the calibration period and validation period is evaluated using two R² criteria and the relative bias, which represent high flows, low flows and water balance respectively. The three studied catchments in the southwest of the UK show similar results. The FCM_T calibration scheme provides the most accurate simulation of high flows, low flows and water balance for both the calibration period and the validation period, followed by the Season calibration scheme. Given that both temperature-based clusters and season-based clusters are capable of recognizing the temperature variation, it can be inferred that temperature plays a crucial role in affecting the hydrological response in our study sites, and that model performance can be improved by allowing parameter sets to vary according to temperature similarities. Analysis of the variation pattern of parameters in the FCM_T calibration scheme shows that the reference drying rate and the parameter f in the non-linear loss module have distinct variation patterns across temperature-based clusters for all catchments, while the parameters of the linear routing module show no obvious variation pattern. Given that the reference drying rate and f are more related to the water loss process, which is highly associated with temperature, it can be concluded that after identifying the key factors that cause the change of the hydrological response in the catchment, the selection of the time-varying parameters should correspond to the identified key factors.
This conclusion provides guidance for applying time-varying parameters to more complicated models. Complicated hydrological models, such as physically based models, usually involve many parameters. For these models, it is not feasible to allow every parameter to vary among clusters, because too many parameters may lead to overfitting and poor computational efficiency. In this case, the parameters that are more related to the changes of the catchment are selected to vary while the other parameters remain unchanged, which keeps the number of parameters manageable while still taking into account the changes of the hydrological response.
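The idea of letting only selected parameters vary can be sketched as follows. The example is a hypothetical illustration: the function names, the parameter names (for example `routing_a`) and all values are placeholders, and the parameter counts are chosen only to show how the number of calibrated values grows.

```python
def build_monthly_parameters(cluster_of_month, varying_by_cluster, fixed):
    """Return one complete parameter dict per 1-month period.

    cluster_of_month: cluster label of each 1-month period, in time order.
    varying_by_cluster: cluster label -> dict of cluster-specific parameters.
    fixed: dict of parameters shared by all clusters.
    """
    return [{**fixed, **varying_by_cluster[c]} for c in cluster_of_month]


def count_calibrated_parameters(n_clusters, n_varying, n_fixed):
    """Number of values to calibrate when only n_varying parameters vary."""
    return n_clusters * n_varying + n_fixed


# Hypothetical names and values, for illustration only.
fixed = {"routing_a": 0.5, "routing_b": 0.2, "routing_c": 0.3}
varying_by_cluster = {
    1: {"ref_drying_rate": 30.0, "f": 0.5},
    2: {"ref_drying_rate": 10.0, "f": 1.5},
}
monthly = build_monthly_parameters([1, 1, 2, 1], varying_by_cluster, fixed)

# With 4 clusters: 2 varying + 3 fixed parameters versus all 5 varying.
partial = count_calibrated_parameters(n_clusters=4, n_varying=2, n_fixed=3)
full = count_calibrated_parameters(n_clusters=4, n_varying=5, n_fixed=0)
print("calibrated values (partial vs full):", partial, full)
```

In this hypothetical setting, restricting the variation to two parameters roughly halves the number of values to calibrate compared with letting all five vary.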
We also explored the seasonal model performance of the FCM_T scheme and the Season scheme. Compared with the Tradition scheme, the improvement differs in extent from season to season. Although the model performance for the summer season is improved most significantly, the value of R² for the summer season is still low. This is because the calibrated model performs poorly in simulating low flows: the objective function, the Nash-Sutcliffe Efficiency (NSE), is more sensitive to high flows, and for the study sites the streamflow in summer is much lower than in other seasons. The use of NSE as the single objective also explains why, for all calibration schemes, the R² criterion for high flows is better than the R² criterion for low flows, indicating that the model simulates high flows better than low flows. Using only single-objective optimization to calibrate the model is one limitation of our study; multi-objective optimization will be investigated in future work.
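The sensitivity of NSE to high flows follows from its definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), in which errors are squared before summing. The sketch below illustrates this with synthetic flows; the flow values and the 20% error are arbitrary choices for illustration and are not data from the study catchments.

```python
import numpy as np


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


# Synthetic flows (arbitrary units): six low-flow days and two high-flow days.
obs = np.array([1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 1.0, 1.0])
is_high = obs > 5.0

# Apply the same 20% overestimate, first to low flows only, then to high flows only.
sim_low_error = np.where(is_high, obs, 1.2 * obs)
sim_high_error = np.where(is_high, 1.2 * obs, obs)

nse_low_error = nash_sutcliffe(obs, sim_low_error)
nse_high_error = nash_sutcliffe(obs, sim_high_error)
print(f"NSE with low-flow error:  {nse_low_error:.3f}")
print(f"NSE with high-flow error: {nse_high_error:.3f}")
```

In this synthetic case the same relative error reduces NSE far more when it falls on the high-flow days, so a calibration that maximizes NSE has little incentive to fit low flows well.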
When parameters are allowed to vary among clusters with similarities, the model complexity is highly correlated with the number of clusters, which raises the question of the trade-off between model complexity and model performance. The effect of the cluster number on model performance is investigated by changing the number of clusters in the FCM_T calibration scheme. The model performance for the calibration period improves as the cluster number increases; however, the increase in model complexity leads to poor predictive capability due to overfitting.
Overall, the main findings of this paper are as follows. Of the two hydro-meteorological factors considered, rainfall and temperature, temperature plays the more crucial role in controlling the change of the hydrological response in the study sites, so allowing parameters to vary among temperature-based clusters can improve the model performance. When using time-varying parameters to account for the variation of the hydrological response, it is important to identify the key factors that cause the change of the hydrological response, and the selection of the time-varying parameters should correspond to the identified key factors. A clustering algorithm is an effective method for identifying data that are similar in the characteristics of interest. The number of clusters affects model performance; therefore, it is important to select an appropriate cluster number to achieve a balance between model complexity and model performance. In this study, the optimal performance for both the calibration period and the validation period is achieved when the cluster number is equal to four.
This study used only one hydrological model at three catchments, which limits the generalization of the conclusions. However, the methodology proposed in this study is generic and applicable to other catchments and hydrological models. We hope this paper will stimulate more studies that apply the proposed methodology to a variety of sites with different hydrological models, in order to gain more knowledge about the variation of the hydrological response.