Abstract
Hydrological indicators support analyses about the impact of climate and anthropogenic changes on riverine ecosystems. As these studies often rely on hydrological models for estimating the future value of the indicators, it is important to investigate how well, and under which conditions, we can replicate changes in the indicators. This study looks at these questions by investigating the performance that can be achieved depending on the objective function for calibrating the model, the direction of the change in the indicator, the magnitude of this change and the properties of the catchments. The results indicate that, in general, indicators describing the magnitude of discharge (monthly and annual) can be adequately estimated with hydrological models, but that there are difficulties when estimating the characteristics of flow pulses, flow reversals and timing variables. For some of these indicators, it is not even possible to correctly estimate the direction of large changes. The analysis showed further that these problems cannot be resolved by adjusting the calibrated parameters, but that the model structure is unsuitable for modelling these indicators.
INTRODUCTION
The expected economic and population growth rates, coupled with the effect of climate change, raise fears that the number and severity of water-related conflicts might increase in the future. When dealing with competing uses of water, it is necessary that societies agree on the water quality and quantity allocated to fulfilling different needs and that they define, based on these agreements, clear targets for managing this resource (Chopra et al. 2005). The defined targets will depend, among other factors, on the current state and level of use of the aquatic resources as the closer the systems are to their natural state, the more likely that they will be managed for preserving ecosystems and their biodiversity (Acreman et al. 2014a). Accurate assessments of the current state of river ecosystems are thus important for supporting the definition of management objectives. Furthermore, such assessments constitute the basis for estimating how ecosystems could be affected by future interventions or changes (Acreman et al. 2014a), playing a prominent role in preparing for the future.
One important aspect these assessments need to consider is the health of river communities (i.e., plants, invertebrates and fish). However, due to the lack of available ecological data, such analyses are difficult to carry out (Acreman et al. 2014b). This is remedied by resorting to indicators describing the flow regime and using them as proxies for the ecological state by assuming that these indicators are the main determinants of the ecosystem properties (Poff et al. 1997). This allows for comparing the values of the indicators, for instance, at two points in time, and concluding that there is no impact on the river ecosystem if these values do not vary significantly (Acreman et al. 2014b). While it is recognized that this approach leaves out important factors influencing riverine communities (e.g., water temperature, sediment load), it is not unreasonable to do so as these factors tend to be correlated with river discharge (Colby 1956; Sinokrot & Gulliver 2000; Acreman et al. 2014b), which is regarded as the most important factor.
While the hydrological indicators can be directly calculated from discharge data, it is necessary to rely on data-based statistical approaches or hydrological models when discharge data are unavailable. Data-based approaches have been widely used for estimating long-term hydrological indicators by establishing links between catchment properties (e.g., soil properties or average precipitation) and the hydrological indicators using, for instance, regional regressions (Sanborn & Bledsoe 2006; Carlisle et al. 2010; Knight et al. 2012). Hydrological models, on the other hand, use climate time series as model inputs and provide discharge time series as output. The hydrologic indicators can then be calculated from the simulated hydrographs (Shrestha et al. 2014; Caldwell et al. 2015). While both approaches have their merits and disadvantages (see detailed account in Murphy et al. 2013), hydrological models tend to be better suited for carrying out more detailed analyses (e.g., annual variability) at more local scales (see Olsen et al. 2013), while statistical models might be preferred for obtaining a general picture or a first approximation of the long-term averages at more regional scales (see Carlisle et al. 2010). The present study ultimately aims at improving our abilities for estimating changes in the hydrological properties of streams and using this information for adapting to climate change. As this requires detailed local analyses, the study focuses on model-based rather than on data-based indicator estimates.
Previous studies investigating the factors influencing the quality of estimated hydrological indicators obtained with modelled discharge time series have focused primarily on the impact of different model calibration strategies and objective functions. This is not surprising since this is a practical question practitioners and researchers are often confronted with when working with more than one indicator as it has been observed that models tend to replicate the characteristics to which they are calibrated well, while they might have a quite bad performance for estimating other metrics (Murphy et al. 2013). Many efforts have, therefore, gone into finding a small set of objective functions that allow a close reproduction of several indicators. While this question has not yet been settled, the results indicate that considering the indicator of interest in the objective function has a positive impact on the quality of the estimated indicators. It is possible to consider single indicators by calibrating the model directly to the indicator of interest (Kiesel et al. 2017; Pool et al. 2017) or to combine indicators in the context of a multi-objective Pareto calibration (Hernandez-Suarez et al. 2018) or with objective functions that combine various indicators (Vis et al. 2015; Kiesel et al. 2017; Pool et al. 2017).
An aspect that has not been thoroughly considered in this context is our ability for replicating changes in hydrologic indicators. This is somewhat surprising, given that the assessment of changes in river ecosystems is one of the main applications of hydrologic indicators and there is a vast body of literature looking at the performance of rainfall-runoff models in periods with different climate properties to those observed during the calibration period (Coron et al. 2012; Seiller et al. 2012; Brigode et al. 2013; Patil & Stieglitz 2015; Dakhlaoui et al. 2017; Vormoor et al. 2018). Many of these studies were inspired by the work of Klemeš (1986), who advocated for assessing the usefulness of models for estimating hydrographs under changed climate conditions with differential split-sample tests. In such tests, the model is calibrated for a subset of climate conditions and then tested on periods with contrasting climate conditions. Most of these studies compare ‘wetter’ periods with contrasting ‘dryer’ periods as precipitation is the ultimate driver of discharge. If the analysis, however, does not focus on the annual or seasonal discharge, it might not be straightforward to identify contrasting periods a priori (Vormoor et al. 2018). This has implications when considering several indicators as it is expected that they will be sensitive to different climate characteristics, resulting in different sets of years characterizing contrasting climate properties depending on the indicator. In the only study addressing the ability of hydrological models for reproducing changes for a set of hydrological indicators that was identified, Shrestha et al. (2016) considered different ‘cold’ and ‘warm’ ENSO (El Niño Southern Oscillation) and PDO (Pacific Decadal Oscillation) phases for four Canadian subbasins. While this might not necessarily result in highly contrasting periods for each indicator, it is an approach that can be easily justified and which avoids the need for a sensitivity analysis for each indicator. This study follows another route as it investigates our ability for modelling changes in the IHAs irrespective of the changes in climate. This is achieved by analysing changes in performance between different periods with contrasting IHA values and not periods with contrasting climate conditions.
Considering (i) the importance of knowing how well we can reproduce changes in the IHAs and (ii) the lack of studies investigating this question systematically for a large number of catchments, the objectives of this study are:
Investigating if models calibrated to achieve a good overall performance outperform models calibrated towards specific hydrological indicators when modelling different types of changes in the indicators.
Investigating if the performance for estimating changes in the hydrologic indicators varies with the direction and magnitude of the considered changes.
Identifying possible links between catchment properties and our ability for modelling changes in the indicators.
A strength of the analysis presented here is that it considers a large number of catchments exhibiting considerable variations in climate and catchment properties for obtaining more robust and representative results.
METHODOLOGY
Datasets and catchments
This study analyses 560 catchments located across the United States with no discharge gaps between 1981 and 2008. They are representative of a large spectrum in climate and catchment properties as shown by the ranges covered by the mean annual precipitation (between 304 mm and 3,250 mm), mean annual discharge (between 2 mm and 3,508 mm) and aridity index (between 0.21 and 4.65). These catchments belong to the GAGES II (Geospatial Attributes of Gages for Evaluating Streamflow, version II) dataset (Falcone 2011), which consists of gauges maintained by the USGS (United States Geological Survey) with at least 20 years of data or active in 2009. The gauges used in the present study are reference basins, which means they are among the least disturbed basins.
The climate data for the daily precipitation and the minimum and maximum temperatures were obtained from the Daymet 3 dataset (Thornton et al. 1997; Thornton et al. 2014) produced by the NASA Oak Ridge National Laboratory (ORNL). This dataset covers North America from 1980 to the present at a 1 × 1 km2 resolution. The daily discharge time series were downloaded from the USGS webpage.
The catchments were characterized with respect to seven variables describing the climate (Walsh seasonality, aridity index, maximum 1-day annual precipitation, percentage of precipitation falling as snow), subsurface properties (baseflow index (BFI)), topography (average slope) and the water balance (runoff coefficient). The climate properties were estimated with the Daymet 3 data. The approach described by Walsh & Lawler (1981) was used for calculating the seasonality index and the aridity index is defined as the quotient between precipitation and potential evapotranspiration. The BFI was taken from a gridded dataset provided by the USGS (Wolock 2003) and the average slope was estimated from a 10 m resolution digital elevation model obtained from the National Elevation Dataset (NED).
Hydrological model
The conceptual rainfall-runoff model used in this study is the most complex member of a set of eight models exhibiting a stepwise increase in complexity. The models were initially developed by Jothityangkoon et al. (2001), Atkinson et al. (2002) and Farmer et al. (2003) and later modified by Bai et al. (2009). The required model inputs consist of the daily precipitation and the maximum and minimum daily temperatures. Interception is modelled as a fraction of the precipitation input and described by the parameter cei. A day-degree-based snow accumulation and melt routine (described by the parameters tt and dd) implemented for 100 m elevation bands, computes then the liquid input to the soil bucket of size sb. The mechanisms available for water removal from this bucket are saturation overflow, subsurface flow (described by parameter ass), groundwater recharge (described by the coefficient kd) and evapotranspiration. The water reaching the deep flow bucket is routed to the stream as defined by the baseflow recession coefficient abf. The evapotranspiration is modelled as a function of soil moisture at the current timestep and the potential evapotranspiration, estimated according to the Hargraeves–Samani approach (Hargraeves & Samani 1982). Unlike most conceptual models, this model uses a parameter representing the field capacity (fc) for distinguishing a saturated and an unsaturated zone in the soil bucket. These zones interact differently with the potential evapotranspiration and differ in their way of estimating the fraction of the catchment covered by vegetation (m) and the fraction of bare soil (1 − m). A sketch of the model structure can be found in Massmann (2019).
The hydrological model was run at a daily timestep between 1 October 1981 and 30 September 2008. A three-year warm-up period was considered for stabilizing the state variables, leaving the period between 1 October 1984 and 30 September 2008 (24 years) for the analysis. The model was calibrated under a Monte Carlo (MC) framework that relied on 40,000 model runs for each catchment. For each of these model runs, the parameter values were randomly sampled assuming a uniform distribution between the ranges shown in the Supplementary Material (Table S1).
Indices of hydrological alteration
For keeping the study manageable, it was necessary to select a subset of hydrological indicators. One option would have been to build on a recent study identifying the indicators correlating with the integrity of invertebrates and fish populations in different regions in the United States (Carlisle et al. 2017). However, since the correlated indicators vary depending on the region and between the invertebrate and fish communities, it would have been possible to focus only on one region. Instead, it was decided to select a subset of Richter's indices of hydrological alteration (IHAs) which characterize different flow conditions and components of the flow regime (Richter et al. 1996). Flow condition refers here to the magnitude of discharge which could be classified, for instance, into average, low and high discharges. The five components of the flow regime affecting the ecological processes in rivers are the magnitude of the discharge, the frequency, duration and timing of certain flow magnitudes and the rate of change of the discharge (Richter et al. 1996). This study considers 12 indicators which attempt to cover a wide range of flow conditions (average, high, very high, low, very low) and all five regime descriptors (Table 1).
Name . | Description . | Unit . | Regime property . | Flow condition . |
---|---|---|---|---|
Q Jan | Mean annual flow January (winter) | mm | Magnitude, timing | Average |
Q Jul | Mean annual flow July (summer) | mm | Magnitude, timing | Average |
Q95 | 95 exceedance percentile | mm | Magnitude, duration | Low extreme |
Q5 | 5 exceedance percentile | mm | Magnitude, duration | High extreme |
Q year | Mean annual discharge | mm | Magnitude | Average |
dHFP | Duration of high flow pulses | days | Magnitude, duration | High |
dLFP | Duration of low flow pulses | days | Magnitude, duration | Low |
nHFP | Number of high flow pulses | – | Magnitude, frequency | High |
nLFP | Number of low flow pulses | – | Magnitude, frequency | Low |
Reversals | Number of reversals | – | Rate of change | Average |
DOYx | Day of year maximum discharge | – | Timing | High extreme |
DOYn | Day of year minimum discharge | – | Timing | Low extreme |
Name . | Description . | Unit . | Regime property . | Flow condition . |
---|---|---|---|---|
Q Jan | Mean annual flow January (winter) | mm | Magnitude, timing | Average |
Q Jul | Mean annual flow July (summer) | mm | Magnitude, timing | Average |
Q95 | 95 exceedance percentile | mm | Magnitude, duration | Low extreme |
Q5 | 5 exceedance percentile | mm | Magnitude, duration | High extreme |
Q year | Mean annual discharge | mm | Magnitude | Average |
dHFP | Duration of high flow pulses | days | Magnitude, duration | High |
dLFP | Duration of low flow pulses | days | Magnitude, duration | Low |
nHFP | Number of high flow pulses | – | Magnitude, frequency | High |
nLFP | Number of low flow pulses | – | Magnitude, frequency | Low |
Reversals | Number of reversals | – | Rate of change | Average |
DOYx | Day of year maximum discharge | – | Timing | High extreme |
DOYn | Day of year minimum discharge | – | Timing | Low extreme |
While the information provided in Table 1 suffices for calculating most indicators, additional information might be necessary for understanding how the number of high flow pulses (HFPs) and low flow pulses (LFPs) was estimated. The approach used in this study consisted of taking the discharge time series for the 24 years considered in the analysis and estimating the 25th and the 75th flow percentiles. A pulse is then defined as an uninterrupted period during which the discharge is above the 75th percentile (HFP) or below the 25th percentile (LFP).
Study set-up
Estimation of changes in the hydrological indices
The steps that need to be followed for calculating the changes in the IHAs are (Figure 1):
Calculate the annual value of the indicator using the observed discharge (IHAo).
Calculate the annual value of the indicator (IHAsim) and the annual value of the Nash–Sutcliffe Efficiency (NSE) for each of the 40,000 simulated discharge timeseries.
Sort the annual IHAo values and separate them into four groups with increasing IHAo value. This is done for analysing different types of changes (e.g., from very low to high indicator values and vice versa). Randomly select five years in each group and calculate the average IHAo, IHAsim and NSE value for these 5 years. For IHAo, these averages are identified as To,g with g standing for the group. For the IHAsim and NSE, the 5-year averages are identified as Th,g,c and Tn,g,c, respectively with c denoting the number of the MC run (i.e., ranging from 1 to 40,000). The usage of the average of five annual values, instead of one value accounting for the whole period, is inline of the use of the split KGE by Fowler et al. (2018), who found this results in consistently better performances during validation.
Identify, for each group, the MC run within the highest NSE value and the lowest absolute IHA error (abs(IHAsim- IHAo)). Store these values in the vectors MCn and MCh, respectively, with MCn[g] and MCh[g] specifying the MC run with the highest NSE and lowest IHA error in group g, respectively.
For all combination of the four groups (g1, g2, g3, g4) calculate the observed and simulated changes in the indicators. This results in 12 cases (i.e., g1 to g2, g1 to g3, g1 to g4, g2 to g1 and so on). The observed changes in IHAs are calculated as the difference in the observed IHA between both considered periods. For the modelled timeseries, two sets of changes are calculated. The first is based on the optimal Monte Carlo run with respect to the NSE, whereas the other set considers the optimal parameter set with respect to the IHA of interest.
Evaluation metrics
Data analysis
It is expected that both the ability for estimating the direction and the magnitude of changes might vary depending on the magnitude and direction of the considered change. For instance, it might be more difficult to get the direction of change right, the smaller the change is. The ability for correctly reproducing the magnitude of change might also vary depending on the type of change (i.e., if it corresponds to a wet-to-dry or a dry-to-wet situation and to a large or small change). The analysis was therefore carried out for different percentiles of observed IHA changes.
For shedding some light on the factors affecting the performance of hydrological models for estimating IHA changes, the relationship between some catchment descriptors (see section ‘Datasets and catchments’) and the performance metrics (AiD, NE and RE) was analysed using Pearson's correlation coefficient. This analysis was not carried out for the two variables describing the timing of events (DOYx and DOYn) as they are circular variables and thus require a different approach.
RESULTS AND DISCUSSION
How do models calibrated with different objective functions differ in their ability to reproduce IHAs?
The impact of the objective function on the calculated IHAs is shown in Figure 2, which depicts scatterplots relating the observed (x-axis) with the simulated IHAs (y-axis). The plots on the left were obtained with the Monte Carlo run achieving the highest NSE in each group (i.e., each plot has 560 (catchments) × 4 (groups) points), whereas the plots on the right side show the results when using the model run with the best value for the corresponding IHA.
The plots for the Q5 exceedance are similar to the plots for Q Jan, Q Jul, Q95 and Q year (Figure S1 in the Supplementary material), all of which describe the magnitude of the discharge (i.e., their units are mm). It can be seen that the IHAs calculated with the NSE calibrated model agree relatively well with the observed IHAs (left side plots). As the plots on the right side show the best IHA value that was achieved with the 40,000 Monte Carlo runs, it is not surprising that the observed and simulated values lie almost on the 1:1 line.
The patterns observed for the number of low pulses and reversals are similar to the patterns observed for the number of high pulses (Figure S1 in the Supplementary Material). While the plots on the right side agree better with the observed IHAs than the plots calibrated with the NSE, there is a considerable underestimation of the observed values even when the IHAs are used for calibrating the model. This underestimation is more pronounced for higher IHA values as the model is able to reproduce the values of the indicators at the lower end of the distribution to a certain extent, but there are large deficiencies when the IHAs exhibit higher values. For example, the model was unable to reach a value above 30 for the LFP indicator, while the observed values rise above 50. The fact that the hydrological model is unable to reproduce these features with any of the 400,000 MC runs indicates that the performance cannot be improved with more sophisticated calibration approaches, but that the problem lies in the model structure. The recent surge of interest in model structure-related issues has led to the development of modular hydrological models which enable researchers to compare and test different model structures (Clark et al. 2008; Fenicia et al. 2011) and to investigate the impact of different structural components on the hydrograph (Lane et al. 2019; Massmann 2020). The identification of an adequate model structure for a given catchment is, however, still an unresolved problem. The reason for this is that besides model structural uncertainty there are also measurement errors (e.g., in the discharge timeseries) and uncertainties in the climate data which is used for forcing the models. Since model parameters can compensate for errors in the forcing, discharge measurements and model structure, it is difficult to isolate the effect of model structures on model performance (Gupta & Govindaraju 2019).
Due to the similarity of the plots for the indicators describing the duration of the HFPs and LFPs, only the results for the HFPs are presented in the manuscript. The model is able to correctly simulate this indicator as there is good agreement between the observed and simulated values on the right-hand side plot. However, it is observed that the optimal parameter set with respect to the NSE is not a good choice for modelling these indicators as it results in large overestimations of the duration of the pulses.
As the plots for the DOY of maximum and minimum discharge have similar patterns, the results are only shown for the day of maximum discharge. It is seen that the NSE is not able to reproduce this feature well, but there is no consistent over- or underestimation. The results reached with the IHA calibrated models agree in general with the observed IHA, but there is a large scatter. It is further interesting to note that the model calibrated with the IHA can model the day with minimum discharge better than the day with maximum discharge, but that this is the other way around when the NSE is used as objective function during calibration.
In summary, the hydrological model is able to simulate the indicators describing the magnitude of the flow and the timing of extreme events, but it is inadequate for modelling the number of pulses and reversals, which tend to be underestimated. These results agree well with previous studies. Shrestha et al. (2014) found, for instance, that the correlation between observed and modelled IHAs describing the magnitude tended to be above 0.5, while most IHAs related to flow pulses and reversals had values below 0.5.
The next sections investigate if our ability for correctly modelling the IHAs describing the magnitude and timing properties of hydrographs is reflected in good estimates of the changes in these IHAs. Analogously, we want to find out if the difficulties of hydrological models for reproducing the number of pulses and reversals result in low performances for modelling changes in these indicators as it might be possible that estimates of the IHAs obtained with the models are biased, but that their changes can be nevertheless modelled with some confidence.
Which objective function is most suitable for modelling IHA changes?
Table 2 allows for assessing the performance of the NSE and the IHAs as objective functions for calibrating the hydrological model used for estimating the IHAs. The table shows, for each IHA, the percentage of cases in which the model calibrated with the corresponding IHA has a better performance than the model calibrated with the NSE. The performance is evaluated with respect to two criteria: the agreement in the direction and magnitude of the modelled and observed IHA changes. The results are shown for five different types of changes, each comprising 20% of the total cases. The first (second) column shows the results for very large (large) negative changes, the central column shows the result for small changes (both negative and positive) and the last two columns show the results for large and very large positive changes. It is seen, for example, that all values for the January discharge (Q Jan) are below 50. This means that the model calibrated with the IHA tends to have a better performance than the model calibrated with the NSE in less than 50% of the cases.
Cells with white and grey background indicate where the NSE and the IHA achieved the best performance, respectively. Cells in light grey colour show no significant difference (p-value <0.05) according to the z-score of proportions test. The results are shown for five change percentiles. The first column shows the results for the largest negative changes, the central column for the smallest (negative and positive) changes and the last column for large positive changes.
There is a tendency for the objective function that is best for modelling the direction of change to have also the best performance for modelling the magnitude of change. It is further seen that for the direction of change there is a tendency of having values closer to 50 in the central columns and more extreme values (either closer to zero or closer to 100) towards the edges. This indicates that the differences between the two objective functions are less pronounced for smaller than for larger changes. This relationship is, however, less evident for the magnitude of change.
Changes in the IHAs describing flow magnitudes (i.e., the first five indicators in Table 1) are generally better modelled with parameter sets obtained with the NSE (rather than with the IHA) as objective function. This indicates that the focus of the NSE on achieving a good overall performance in reproducing the hydrograph makes this a good objective function for modelling changes in the annual flows, monthly flows and flow percentiles. On the other hand, it suggests that single hydrological indicators might not constrain well enough all parameters to which the indicators are sensitive.
For the indicators describing the pulses and reversals, no clear trend could be observed. Finally, the indicators describing the day of the year in which the maximum and minimum values are observed, tend to be better modelled when the calibration is done with the IHA. As the DOY is calculated using the discharge of only one day in each year, it is important that the model reproduces adequately the relative magnitudes of the peaks and low period flows. This seems to be best achieved by focusing just on the days with extreme values rather than with a model calibrated to reproduce adequately overall model performance.
How well can we estimate the direction of IHA changes?
Figure 3 informs about the agreement in direction between the observed and simulated changes for the IHAs calculated using the NSE as objective function. It can be seen that the direction of change can be best reproduced for the Q year indicator. The ability for reproducing the direction of change increases with the magnitude of the change until changes of about 1,500 mm. From there on, the ability of the model for identifying the direction of changes starts to decline and it is observed, for the largest changes, that the model is only capable of getting the direction of changes right about 50% of the time.
The patterns for the HFPs and LFPs are similar, although the direction of change of the LFPs is less likely to be correctly reproduced than the direction of change of the HFPs. It is further interesting to note that for the number of pulses there is a clear increase in the ability for modelling the direction of changes as the magnitude of the changes increases, while this is not the case for the duration of the pulses, where the ability for reproducing the direction seems to be independent of the magnitude of the change.
The size of the considered change has only a small impact on the ability for reproducing the indicators describing the timing of extreme flows (DOYx and DOYn). In agreement with the results shown for the pulses, which showed that difficulties for reproducing the direction of change were larger for the indicators describing low flows, it is seen here that the day of minimum discharge has a lower agreement in direction than the day of maximum discharge. Moreover, as the agreement in direction of the DOYn varies around 0.5, it can be concluded that the model has no skill for modelling changes of this indicator.
The impact of the objective function used for defining the best parameter set is observed in Figure 4, which shows the results when using the IHA as objective function. The most striking difference between these plots at the ones presented in Figure 3 is their asymmetry, which indicates that the results vary depending on the type of change (i.e., from high to low indicator values or vice versa), something which is not observed when the models are calibrated with the NSE.
How well can we estimate the magnitude of IHA changes?
The first four rows of plots in Figure 5 show the absolute nominal error (NE) incurred when modelling changes in the IHAs with the model run optimizing the NSE. The plot for Q year is representative for all indicators describing the discharge magnitude. It shows an increase in the error as the modelled changes become larger. This increase is small for smaller changes but rises for more extreme changes. The error of the number of pulses and reversals shows an almost linear increase with little scatter as the magnitude of the change increases. The reason for this can be understood when looking at Figure 1, which shows only small differences in the simulated values of these IHAs regardless of the observed IHA values.
Besides an analysis of the NE, it is important to look at the relative error as it might be easier to grasp the relevance of an error when comparing its magnitude, for example, to the mean value of the indicator. Another advantage of using the relative error is that it facilitates comparisons of model performance between indicators. The last three rows of plots in Figure 5 show the relative error for some indicators. The relative error for the indicators describing the discharge magnitude is high for small changes. For large changes, the errors are low and almost independent of the size of the modelled change. The REs for the number of pulses tend to increase as the modelled changes increase, which can again be explained by the little variability of the modelled IHA, even as the observed values increase (see Figure 1). The relative error of the timing IHAs is not considered as all catchments have the same 365 days and there is thus no need for normalizing the DOYx and DOYn.
Analysing the impact of the direction of change on the reproducibility of IHA changes
Sections ‘How well can we estimate the direction of IHA changes?’ and ‘How well can we estimate the magnitude of IHA changes?’ show how our ability for modelling IHA changes varies with the magnitude and direction of the considered IHA change. In most cases, it was possible to observe differences as a function of the magnitude of the change; the impact of the direction of change is, however, more subtle and requires additional analyses.
Table 3 shows the quotient between the mean absolute negative error and the mean absolute positive error for five change percentiles. For example, the percentiles considered for the largest changes were 0.1–1 (negative changes) and 99–99.9 (positive changes). The Q Jan indicator has an average error of 125.2 mm in the 0.1–1 percentile and an error of 63.5 mm in the 99–99.9 percentile (Table S4 in the Supplementary Material). The quotient of the negative and positive errors is thus 1.97 (Table 3), indicating that the error for negative changes is almost twice as large as the error for positive changes. Analogously, all other values larger than one in Table 3 indicate that the errors for negative changes are larger than the errors for positive changes, whereas this is the other way around when the values in Table 3 are smaller than one. Since negative changes are encountered when the indicator goes from high to lower values, values above one in Table 3 indicate that changes can be better modelled when they are from low to high indicator values.
. | NSE calibration decreasing absolute size of the change . | IHA calibration decreasing absolute size of the change . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
. | . | . | ||||||||
Percentile quotient | 0.1–1 99–99.9 | 1–15 85–99 | 15–25 75–85 | 25–35 65–75 | 35–45 55–65 | 0.1–1 99–99.9 | 1–15 85–99 | 15–25 75–85 | 25–35 65–75 | 35–45 55–65 |
Q Jan | 1.97 | 1.51 | 1.23 | 1.15 | 1.01 | 2.57 | 1.63 | 1.23 | 1.27 | 1.94 |
Q Jul | 0.85 | 1.32 | 1.27 | 1.14 | 1.10 | 2.65 | 1.88 | 1.94 | 1.74 | 1.75 |
Q95 | 0.92 | 1.12 | 1.11 | 1.02 | 1.09 | 1.26 | 1.50 | 1.35 | 1.43 | 1.71 |
Q5 | 1.34 | 1.74 | 1.41 | 1.29 | 1.19 | 1.77 | 1.23 | 1.15 | 1.23 | 1.25 |
Q year | 1.12 | 1.78 | 1.34 | 1.25 | 1.19 | 1.49 | 1.12 | 1.16 | 1.20 | 1.28 |
nHFP | 0.70 | 1.02 | 0.91 | 1.08 | 1.12 | 2.33 | 2.30 | 2.59 | 2.66 | 2.12 |
dHFP | 1.05 | 1.03 | 1.03 | 1.15 | 0.92 | 1.22 | 1.34 | 1.44 | 1.48 | 1.50 |
nLFP | 1.01 | 1.00 | 1.03 | 1.03 | 1.06 | 1.14 | 1.19 | 1.24 | 1.27 | 1.32 |
dLFP | 1.01 | 1.01 | 1.01 | 1.04 | 1.00 | 1.04 | 1.10 | 1.14 | 1.20 | 1.19 |
Reversals | 0.99 | 1.00 | 1.01 | 1.03 | 1.01 | 1.04 | 1.18 | 1.18 | 1.24 | 1.24 |
. | NSE calibration decreasing absolute size of the change . | IHA calibration decreasing absolute size of the change . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
. | . | . | ||||||||
Percentile quotient | 0.1–1 99–99.9 | 1–15 85–99 | 15–25 75–85 | 25–35 65–75 | 35–45 55–65 | 0.1–1 99–99.9 | 1–15 85–99 | 15–25 75–85 | 25–35 65–75 | 35–45 55–65 |
Q Jan | 1.97 | 1.51 | 1.23 | 1.15 | 1.01 | 2.57 | 1.63 | 1.23 | 1.27 | 1.94 |
Q Jul | 0.85 | 1.32 | 1.27 | 1.14 | 1.10 | 2.65 | 1.88 | 1.94 | 1.74 | 1.75 |
Q95 | 0.92 | 1.12 | 1.11 | 1.02 | 1.09 | 1.26 | 1.50 | 1.35 | 1.43 | 1.71 |
Q5 | 1.34 | 1.74 | 1.41 | 1.29 | 1.19 | 1.77 | 1.23 | 1.15 | 1.23 | 1.25 |
Q year | 1.12 | 1.78 | 1.34 | 1.25 | 1.19 | 1.49 | 1.12 | 1.16 | 1.20 | 1.28 |
nHFP | 0.70 | 1.02 | 0.91 | 1.08 | 1.12 | 2.33 | 2.30 | 2.59 | 2.66 | 2.12 |
dHFP | 1.05 | 1.03 | 1.03 | 1.15 | 0.92 | 1.22 | 1.34 | 1.44 | 1.48 | 1.50 |
nLFP | 1.01 | 1.00 | 1.03 | 1.03 | 1.06 | 1.14 | 1.19 | 1.24 | 1.27 | 1.32 |
dLFP | 1.01 | 1.01 | 1.01 | 1.04 | 1.00 | 1.04 | 1.10 | 1.14 | 1.20 | 1.19 |
Reversals | 0.99 | 1.00 | 1.01 | 1.03 | 1.01 | 1.04 | 1.18 | 1.18 | 1.24 | 1.24 |
The results were estimated for five different change percentiles and for the NSE and IHA calibrated models.
The cells without values had non-significant correlation coefficients (p-value = 0.05).
For almost all considered indicators and change percentiles, the model is able to better model changes from low to high indicator values. These differences on performance as a function of the direction of the modelled change are more pronounced when the model is calibrated with respect to the IHA than to the IHAs. For the indicators describing flow magnitudes (i.e., Q year), this means that it is easier to model changes when models are calibrated in dryer periods and then run in wetter periods than the other way around. This is in agreement with previous studies (Vaze et al. 2010; Coron et al. 2012).
Besides analysing the impact of the direction of change on the mean error (Table 3), it was investigated how the direction of the change impacts our ability for reproducing the direction of change (Tables S2 and S3 in the Supplementary Material). There is almost no influence of the direction of change on our ability for identifying the direction of change if the NSE is used as objective function. The differences are more pronounced when the model is calibrated with the IHAs and when modelling the indicators describing the flow dynamics (i.e., pulses and reversals). It is interesting to see that changes in the HFPs are better modelled when they are from high to low flows, while changes in LFPs can be better modelled when they are from low to high values.
Identification of catchment descriptors influencing the quality of the IHA estimates
Table 4 shows the results of the correlation analysis between the catchment descriptors and two evaluation metrics (the agreement in direction and the NE). In general, there is a high agreement in the sign of the correlation coefficient observed for different IHAs. A higher agreement in the modelled direction of change is observed for wetter catchments (i.e., higher maximum 1-day precipitation and lower aridity), as well as for catchments with lower BFI. As previous studies have found that it is more difficult getting a good performance in more arid (van Esse et al. 2013; Weingartner et al. 2013) catchments and in catchments with higher BFI (Massmann 2020), these results indicate that the ability for getting the direction of the changes right is related to our ability of getting a good overall model performance.
The NE of the IHAs describing the magnitude of discharge has a negative correlation with aridity and a positive correlation with the maximum 1-day precipitation, indicating that dryer catchments, with lower discharges, tend to also have lower nominal errors. There is further a positive correlation with the BFI. The correlation for the other catchment descriptors is difficult to interpret and it might be more helpful to analyse the patterns visually. Figure 6 shows the spatial distribution of the average NE for each catchment and some representative IHAs. The IHAs describing the flow magnitude (Q year) tend to have low errors in the dryer central Great Plains and higher errors in the wet western coast. The number of reversals, on the other hand, has higher errors in dryer catchments and lower errors in the wetter ones. The error in the number of HFPs is low in the snowy Rocky Mountains area, while reaching high values in the eastern part characterized by a more homogeneous precipitation pattern across the year. The error in the number of LFPs is lowest in the western coast, which has a high seasonality in the precipitation and higher in the remaining United States. The errors in the duration of the pulses are highest in the central part and decrease towards the coasts. Finally, it is seen that changes in the day of the maximum discharge (DOYx) can be best modelled in areas with a large influence of snow and in the western coast, with clear seasonal precipitation patterns. The errors when modelling changes in the day on minimum discharge (DOYn) do not have a distinct spatial pattern.
Limitations of the study
Investigating the ability of hydrological models to model changes in IHAs is important in the context of climate change studies. This study sheds some light on the ability of hydrological models for modelling changes in IHAs, but the results cannot be used directly for inferring about the ability for modelling the impact of climate change as the link between IHAs and the climate dynamics (i.e., information about climate sensitivity) was not considered.
SUMMARY AND CONCLUSIONS
This study investigated the performance of a conceptual rainfall-runoff model for estimating changes in indicators of hydrological alteration.
The results agreed with previous studies showing that indicators describing the magnitude of discharge are better modelled than indicators describing the number of pulses and reversals. An interesting finding in this context was that the difficulties for modelling the number of flow pulses and reversals cannot be attributed to calibration deficiencies, but that the model is unable to reach the observed values. This suggests that, instead of focusing on different calibration alternatives, we should investigate and address the causes of model failure. It would be further interesting to find out if this is a limitation of the specific rainfall-runoff model used in this study or a more general deficiency of the type of model (e.g., are distributed models able to achieve better results than lumped models?). Another question that could be investigated is if the quality of the IHAs estimates can be improved by redefining the indicators, for example, by constraining the number of considered pulses and reversals.
It was found that our ability for modelling changes in hydrological indicators is correlated with our ability for modelling the corresponding indicators.
The quality of the estimates of IHA changes depends strongly on the considered indicator. The hydrological model was able to provide reliable estimates of the direction of the changes for the IHAs describing the discharge. While there are some difficulties in estimating the direction of the change when changes are small, the model achieves good performances as the magnitude of the changes increases. On the contrary, the model was not able to correctly estimate the direction of the changes for the duration of the pulses, the number of reversals and the day on minimum discharge, independently of the size of the considered change. For the number of pulses, the model had some skill for estimating the direction of large changes, but the estimates for smaller changes were unreliable.
The magnitude of changes might be adequately estimated across the entire range of variation of the changes for indicators describing the discharge magnitude. For pulses and reversals, on the other hand, there is a clear reduction in performance with increasing magnitude of the considered changes, decreasing the potential applicability of this approach for estimating the impacts of climate change on the discharge dynamics. These results do not point to a shortcoming of the models for general hydrology, but to limitations when used for modelling hydrologic indicators under change. The results highlight, furthermore, the need for developing alternative methodologies for understanding how climate change could affect riverine ecosystems.
Regarding the influence of the objective function used for calibrating the hydrological model on the ability of reproducing changes in the IHAs, it was observed that the results vary depending on the considered IHA. For indicators describing discharge magnitude, the results tended to be better when the NSE was used as objective function. For flow pulses, reversals and timing variables, on the other hand, the magnitude of the error tended to be lower when the IHA was used for calibrating the model, while there seemed to be no clear tendency when estimating the direction of the changes.
An analysis of differences in the quality of the IHA estimates depending on the direction of the changes showed that changes from low to high indicator values could be better modelled than changes from high to low values. This effect was more pronounced when the models were calibrated with respect to the IHAs. This has some implications when modelling the impacts of climate change as we will calibrate our models for the current climate and the use for estimating the discharge for dryer conditions. Since it was found that these changes (wet to dry) are more difficult to get right, it is important to avoid overestimating our ability for modelling dryer periods.
An exception is found for the agreement in direction for the indicators describing HFPs and calibrated with the IHA, where the direction of changes could be best predicted for changes from high to low indicator values.
An important strength of this study is that it relies on many catchments and that it considers different types of changes (with respect to the magnitude and direction of the changes). Such a stratification of the changes allows for additional insight. For example, we found that models are able to model adequately low IHA values of the indicators describing the discharge dynamics, but have problems in reproducing high values.
DATA AVAILABILITY STATEMENT
The primary data used in the study are publicly available from U.S. organizations. More detailed information about how it can be accessed is found in the text. Processed data are available from the author upon request.
ACKNOWLEDGEMENTS
Three anonymous reviewers provided numerous comments and suggestions that helped to greatly improve the manuscript. This research was funded by the Austrian Science Fund (FWF): project J-3689-N29.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/nh.2020.073.