## Abstract

We explore post-processing methods that can reduce biases in simulated flow in a hydrological model (HYMOD). Here, three bias-correction methods are compared using a set of calibrated parameters as a baseline (Cases 1 and 5). The proposed bias-correction methods are based on a flow duration curve (Case 2), an autoregressive model based on residuals obtained from simulated flows (Case 3), and a rating curve (Case 4). A clear seasonality representing a more substantial variability in winter than summer was evident in all cases. The extended range of residuals was usually observed in winter, indicating that the HYMOD may not reproduce high flows appropriately. This study confirmed that bias-corrected flows are more effective than the baseline model in terms of correcting a systematic error in the simulated flow. Moreover, a comparison of root mean square error over different flow regimes demonstrates that Case 3 is the most effective at correcting systematic biases over the entire flow regime. Finally, monthly water balances for all cases are evaluated and compared during both calibration and validation periods. The water balance in Case 3 is also closer to the observed values. The effects of different post-processing approaches on the performance of bias-correction are examined and discussed.

## HIGHLIGHTS

We explore three bias-correction methods that can reduce biases in simulated flow.

This is the first attempt to combine calibration and bias correction in the runoff simulation.

This study confirmed that bias correction approaches are more effective in terms of correcting a systematic error.

The monthly water balance estimated from the proposed approach was close to that of the observed value.

## INTRODUCTION

Hydrological models are essential tools (Madsen 2000; Wagener *et al.* 2003) for understanding water cycles that provide a basis for decision-making related to water resource management, land planning, infrastructure design, and water supply. They are also imperfect representations of the real world due to model uncertainties, measurement errors, sampling errors stemming from insufficient data, climate variability, and inadequate descriptions of initial and boundary conditions. Understanding the hydrological behavior of a catchment is a complicated task (De Vos *et al.* 2010b). Typical hydrological modeling processes include: (1) parameterization based on observed data; (2) sensitivity analysis to diagnose which parameter is the most essential during calibration; and (3) calibration, either manually or automatically (Eckhardt & Arnold 2001; Eckhardt *et al.* 2005; Muleta & Nicklow 2005; Abbaspour *et al.* 2007; Bekele & Nicklow 2007; Van Griensven & Meixner 2007; Green & Van Griensven 2008; Zhang *et al.* 2009).

Commonly used performance metrics in the form of objective functions in rainfall–runoff model calibration are Nash–Sutcliffe efficiency and root mean square error (RMSE), which can be obtained by comparing observed and simulated streamflows. However, the limited ability of these metrics to cover different aspects of hydrologic behavior can lead to systematic bias after calibration. Outputs from a calibrated hydrological model also have systematic biases, although model parameters can be obtained by minimizing the difference between the observed and simulated flows within an optimization framework. Model efficiency is influenced more by high flows than low flows (Pushpalatha *et al.* 2012) without explicitly treating them, and information on temporal variability of flow is poorly preserved. Attempts to explore these issues and improve model uncertainties have been made. For example, a combination of several performance measures was applied to explore different aspects of the fit between observed and simulated flow (Gupta *et al.* 1998; Wagener *et al.* 2001a; Freer *et al.* 2004).

Apart from model calibration, statistical post-processing approaches for ensemble streamflow predictions have been introduced to improve model prediction skills by reducing systematic biases (Bogner & Kalas 2008; Weerts *et al.* 2011; Zhao *et al.* 2011). The main objective of post-processing is to reduce biases in the mean and variance of raw ensemble streamflow predictions. A number of post-processing approaches are available to correct simulated streamflow in a hydrologic model. One such approach is a quantile mapping of the cumulative distributions of a prediction onto those of the observed distributions (Shi *et al.* 2008; Madadgar *et al.* 2014). Another correction approach is based on a Bayesian model (Krzysztofowicz & Maranzano 2004; Brown & Seo 2010, 2013). A simple bias-correction method and generalized linear regression (Zhao *et al.* 2011) have been applied to reduce uncertainties in model predictions. Ensemble streamflow forecasts along with statistical post-processing produce superior performance compared with raw ensemble hydrological forecasts (Krzysztofowicz & Kelly 2000; Krzysztofowicz & Maranzano 2004; Montanari & Brath 2004; Seo *et al.* 2006; Wood & Schaake 2008; Weerts *et al.* 2011). These studies suggest that post-processing can improve a conventional calibration approach in an efficient way (Shi *et al.* 2008; Yuan & Wood 2012; Ye *et al.* 2015).

Previous studies have focused separately on the hydrological model calibration process and the statistical post-processing approach. This study took an integrated approach that combined post-processing techniques (i.e., bias-correction methods) with the calibration of a hydrological model. Specifically, three bias-correction models (Cases 2, 3, and 4) were proposed to reduce systematic biases in streamflow simulation. The proposed bias-correction approaches were evaluated along with a conceptual hydrological model. To the best of our knowledge, few studies have combined hydrological model calibration and bias-correction approaches in a single model. Model performance was also compared with simulated flow obtained from an existing calibration approach (Case 1) and a sub-annual calibration approach (Case 5). A brief overview of the post-processing procedures considered in this study follows:

Case 1 was based on the use of an existing calibration approach without bias correction.

Case 2 was based on flow duration curves (FDCs) of observed and calibrated flows.

Case 3 was based on an autoregressive model constructed from time series of the residual, which were calculated from observed and simulated flows.

Case 4 was also based on an autoregressive model constructed from time series of the residual. However, the residual was calculated from observed and simulated river levels, which were converted using a rating curve.

Case 5 was based on a sub-annual calibration approach without bias correction. Two different parameter sets are obtained by calibrating the dry season (spring and summer) and the wet season (autumn and winter) separately.

We compared the performance of the simulated flow from a well-calibrated model as a baseline (Cases 1 and 5) with the three bias-correction models (Cases 2, 3, and 4) to address the following questions:

- (1)
Can a seasonal pattern and systematic trend in simulated flow from a rainfall–runoff model be identified?

- (2)
Can a bias-correction approach reduce systematic biases in the residual for both calibration and validation periods? What is the most effective method of correcting biases of an entire flow regime?

- (3)
Can simulated flow with bias correction effectively reproduce observed water balance?

The balance of this study is organized as follows. In the section ‘Case Study Area and the Hydrological Model’, we describe the study area, data, and hydrological model. The three different post-processing methods introduced in this study are summarized in the section ‘Bias-Correction Scheme’. The results of different bias-correction methods are presented in the ‘Results and Discussion’. The main conclusions and discussions of this study are provided in the section ‘Discussion and Conclusion’.

## CASE STUDY AREA AND THE HYDROLOGICAL MODEL

### Study area and data

The Thorverton catchment has an area of 606 km^{2} and is a sub-catchment of the Exe catchment. The Exe catchment is in the southwest of England and has an area of 1,530 km^{2} and an average annual rainfall of 1,088 mm. Figure 1 provides an overview of the Thorverton catchment. A daily time series of the observed precipitation, potential evapotranspiration, and flow data (1961–1990) from the Thorverton catchment was obtained from the UK Met Office. Rating curves were acquired from the National River Flow Archive. Thirty-year (1961–1990) mean rainfall and temperature data for this catchment are presented in Figure 2. Winter is the wettest season and summer is the warmest.

### Hydrological model and calibration

*et al.*2003; Wagener

*et al.*2001b; Vrugt

*et al.*2003; De Vos

*et al.*2010a). Model parameters are described in Table 1, and the model structure is illustrated in Figure 3. The cumulative distribution function of the water storage capacity,

*C*, has the following form:where is the maximum soil moisture storage capacity in the catchment, and controls the degree of spatial variability of the soil moisture capacity. The excess rainfall can be transformed into the runoff, which is divided into quick and slow flows based on partitioning factor . The runoffs are routed through three identical quick-flow tanks and a parallel slow-flow tank. The flow rates are determined by the recession coefficient for the quick-flow tank () and slow-flow tank .

Parameter . | Unit . | Range . | Description . |
---|---|---|---|

C_{max} | mm | 1–500 | Maximum soil moisture storage capacity |

b_{exp} | – | 0.01–1.99 | Spatial variability of soil moisture capacity |

α | – | 0.01–0.99 | Quick-/slow-flow distribution factor |

R_{s} | Day | 0.01–0.99 | Recession coefficient for slow-flow tank |

R_{q} | Day | 0.01–0.99 | Recession coefficient for quick-flow tank |

Parameter . | Unit . | Range . | Description . |
---|---|---|---|

C_{max} | mm | 1–500 | Maximum soil moisture storage capacity |

b_{exp} | – | 0.01–1.99 | Spatial variability of soil moisture capacity |

α | – | 0.01–0.99 | Quick-/slow-flow distribution factor |

R_{s} | Day | 0.01–0.99 | Recession coefficient for slow-flow tank |

R_{q} | Day | 0.01–0.99 | Recession coefficient for quick-flow tank |

For the optimization algorithm, we used dynamically dimensioned search (DDS; Tolson & Shoemaker 2007), which is a simple, single-objective, and heuristic global search algorithm. DDS searches a set of parameters globally and incrementally by localizing the parameter space as the number of iterations reaches a predefined maximum number of simulations. The procedure from the global to the local scale is carried out by probabilistically decreasing the number of model parameters in the neighborhood. The DDS algorithm easily converges to global solutions by perturbing the current solution values in randomly selected dimensions with magnitudes of the perturbations randomly sampled from a normal distribution with a mean of zero. More details can be found in Tolson & Shoemaker (2007).

*i*is the

*i*th day, and

*N*is the number of days in the calibration period. The effect of different objective functions and their combinations in the calibration of rainfall–runoff models was not explored.

## BIAS-CORRECTION SCHEME

First, hydrological data, including precipitation and temperature from 1961 to 1990, were compiled and used to calibrate the HYMOD. Second, different transfer functions (i.e., FDC, autoregressive relations, and rating curve) were constructed to map the simulated flow onto the observed flow in the process of bias correction. Third, remaining hydrological data from 1991 to 2008 were used for validation purposes. Figure 4 illustrates a schematic diagram of the bias-correction approaches and the improved sub-annual calibration scheme proposed in this study. This study compared five different schemes, including a conventional calibration method (Case 1), three post-processing methods that were applied to the simulated flow in the context of bias correction (Cases 2, 3, and 4), and sub-annual calibration based on wet and dry seasons separately (Case 5). The parameters obtained from the calibration process (Case 1) were subsequently used to simulate streamflow sequences as a baseline for the other three cases. In other words, three post-processing methods were aimed at correcting the simulated flow from a conventional calibration process (Case 1). In Case 2 (the FDC model), the cumulative probability of simulated flow at a given time was obtained from the FDC of the simulated flow, and the corresponding observed probability was then taken from that of the observed value (FDC from observation) to adjust the calibrated flow further. The processes were as follows:

First, the FDC was built with data from the entire calibration period for both observed and simulated flows.

Second, the probability of the simulated flow was computed from the simulated FDC.

- Third, the bias-corrected flow () was computed by mapping the probability of simulated flow obtained from the simulated FDC onto the observed FDC . The equation can be expressed as follows:where is constructed from the simulated flow, is an inverse function built with the observed flow, and is the bias-corrected flow. The FDC played the role of a transfer function for bias correction of the simulated flow by matching the FDC of observed flow for the same period. The transfer functions were estimated for 30 years (1961–1990), and these transfer functions were then applied to the remaining period (1991–2008) for validation.

The following bias-correction process is based on a first-order autoregressive model of the residual AR_{Q}(1), named Case 3:

First, the residual series was obtained by subtracting the simulated flow from that of the observed (Equation (4)).

Second, the residual was assumed to follow the first-order autoregressive model, AR

_{Q}(1), as written in Equation (5).- Third, the bias-corrected flow was obtained by adding the modeled residuals from AR
_{Q}(1) to the simulated flow (Equation (6)) as follows:where*t*is time index (days) and*T*is the number of records. is the modeled residual at time*t**+*1, while*a*and*b*are coefficients of the AR_{Q}(1) model.

In the proposed bias-correction method (Case 4), AR_{H}(1) was applied in the rating curve domain. In other words, the simulated flow was mapped to the rating curve to obtain the water stage instead of flow. Moreover, the residual was assumed to follow the first-order autoregressive model as adopted by Case 3. The key aspects of the bias-correction methods in Case 4 can be summarized as follows:

First, the simulated flow was converted to the water stage through a rating curve.

Second, the residual sequences were obtained by subtracting the simulated water stage from that of the observed value (Equation (7)).

Third, the AR

_{H}(1) autoregressive model was applied to the residual series (Equation (8)).Fourth, the bias-corrected water stage was synthesized by adding the modeled residuals from AR

_{H}(1) to the simulated water stage (Equation (9)).- Fifth, the bias-corrected flow was obtained by mapping the water stage onto the rating curve (Equation (10)).

Kim & Han (2017) explored sub-annual calibration schemes. They found that the model calibrated on the sub-annual period schemes generally performs better than the model calibrated on the entire datasets without its separation. Therefore, the sub-annual calibration method is additionally considered as a reference (Case 5) for comparison with those of the bias-correction schemes (Cases 2, 3, and 4). The purpose of this experiment was to explore whether an improved calibration framework can be more effective than a post-processing approach to the simulated flow. More specifically, the model (HYMOD) was calibrated for two distinct seasons (i.e., dry and wet seasons), and the obtained parameter sets were subsequently used in the validation phase. During calibration, the model was kept running continuously until the optimization was done only for the chosen seasons. The calibrated flow data from sub-annual calibration schemes are then combined together to examine the model performance over the entire time period.

## RESULTS AND DISCUSSION

### Comparison of results during the calibration period

Figure 5 compares residual distributions over the year for five different cases. Case 1, which is based on a conventional calibration process without bias correction, is used as a baseline for the other three, while Case 2 corrects for bias in the FDC domain. A temporal pattern in the residuals that may be influenced by seasonality in hydrologic variables (i.e., precipitation and temperature) can be seen in Figure 5. No clear trend emerged with respect to the residual because the distribution was nearly symmetric. The relative range (i.e., the range/the average flow) in summer was greater than that in winter for five different cases. The ratio of relative range in summer to that in winter ranges from 1.49 to 1.83, indicating that the HYMOD may not adequately capture low flows during summer. The degree of reduction in the range of the residual in Case 1 was primarily the extent to which bias correction based on FDC (Case 2) was effective. The median of the residual continues to converge toward zero in all months of the year, and the reduction ratio in the range of residual in the Case 2 with respect to Case 1 was significantly larger in summer (21.3–21.8%) than in winter (6.3–8.4%) as summarized in Table 2. This suggests that the FDC model (Case 2) reduced bias in the low flow more effectively than in the high flow. For Case 3, a clear seasonality with a large variability in winter was evident in line with Case 2. The degree of reduction in the range of the residual was more pronounced in Case 3, as shown in Table 2. The reduction ratio in the winter (26.9%–30.7%) was substantially larger than that of the Case 2 (6.3–8.4%), confirming the efficacy of the model compared with the FDC model in reducing bias in the simulated flow. More importantly, the median of the residual remained close to zero, particularly for those treated with the first-order autoregressive model on residuals, AR_{Q}(1), compared with Case 3. The monthly distribution of the residual in Case 4 was largely comparable to that of Case 3 in terms of seasonality and variability. Furthermore, the reduction ratio expanded slightly over the year, relative to Case 3. However, the median of the residual tended toward positive values that led in turn to an increase in bias associated with underestimation of the flow, indicating that the efficacy of treating biases in the simulated flow may be limited in its application to bias correction. For Case 5, a clear seasonality with large variability in winter was evident as in other cases. The reduction ratio in the winter (−5.2 to −0.5%) showed negative values, which implies that the performance of the sub-annual calibration scheme model has not been improved compared with that of the conventional calibration method (Case 1). However, the reduction ratio in summer (41.4–48.9%) was largely comparable to that of Cases 3 and 4. This indicates that the sub-annual calibration model performs better in the low flows than in the high flows.

Month . | Range . | Range of residuals . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | ||||||

Value . | Value . | RR (%) . | Value . | RR (%) . | Value . | RR (%) . | Value . | RR (%) . | ||

1 | Max–Min | 29.0 | 27.1 | 6.3 | 20.2 | 30.3 | 20.2 | 30.2 | 30.5 | − 5.2 |

Interquartile | 7.2 | 6.8 | 5.0 | 5.1 | 7.6 | |||||

2 | Max–Min | 28.4 | 26.0 | 8.4 | 19.7 | 30.7 | 18.2 | 35.9 | 28.5 | − 0.5 |

Interquartile | 7.1 | 6.5 | 4.9 | 4.6 | 7.1 | |||||

3 | Max–Min | 20.2 | 17.1 | 15.2 | 13.0 | 35.6 | 12.6 | 37.5 | 18.7 | 7.3 |

Interquartile | 5.0 | 4.3 | 3.2 | 3.2 | 4.7 | |||||

4 | Max–Min | 13.3 | 11.3 | 14.9 | 8.6 | 35.0 | 7.9 | 40.4 | 11.8 | 11.6 |

Interquartile | 3.3 | 2.8 | 2.2 | 2.0 | 2.9 | |||||

5 | Max–Min | 12.5 | 10.7 | 14.4 | 7.8 | 37.6 | 8.1 | 35.6 | 10.8 | 13.6 |

Interquartile | 3.1 | 2.7 | 2.0 | 2.0 | 2.7 | |||||

6 | Max–Min | 10.5 | 8.2 | 21.8 | 6.0 | 42.6 | 5.9 | 43.7 | 6.2 | 41.4 |

Interquartile | 2.6 | 2.1 | 1.5 | 1.5 | 1.5 | |||||

7 | Max–Min | 10.1 | 7.9 | 21.8 | 5.7 | 43.4 | 4.7 | 53.5 | 5.2 | 48.9 |

Interquartile | 2.5 | 2.0 | 1.4 | 1.2 | 1.3 | |||||

8 | Max–Min | 11.5 | 9.0 | 21.3 | 6.4 | 43.7 | 5.6 | 50.8 | 6.2 | 45.8 |

Interquartile | 2.9 | 2.3 | 1.6 | 1.4 | 1.6 | |||||

9 | Max–Min | 15.1 | 12.3 | 19.1 | 8.4 | 44.4 | 6.9 | 54.4 | 12.0 | 20.8 |

Interquartile | 3.8 | 3.1 | 2.1 | 1.7 | 3.0 | |||||

10 | Max–Min | 21.3 | 17.2 | 19.0 | 13.7 | 35.4 | 11.0 | 48.4 | 22.5 | − 5.5 |

Interquartile | 5.3 | 4.3 | 3.4 | 2.7 | 5.6 | |||||

11 | Max–Min | 25.3 | 22.3 | 11.9 | 18.3 | 27.8 | 16.1 | 36.2 | 27.9 | − 10.2 |

Interquartile | 6.3 | 5.6 | 4.6 | 4.0 | 7.0 | |||||

12 | Max–Min | 27.5 | 25.3 | 7.9 | 20.1 | 26.9 | 18.4 | 33.1 | 28.8 | − 4.6 |

Interquartile | 6.9 | 6.3 | 5.0 | 4.6 | 7.2 |

Month . | Range . | Range of residuals . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | ||||||

Value . | Value . | RR (%) . | Value . | RR (%) . | Value . | RR (%) . | Value . | RR (%) . | ||

1 | Max–Min | 29.0 | 27.1 | 6.3 | 20.2 | 30.3 | 20.2 | 30.2 | 30.5 | − 5.2 |

Interquartile | 7.2 | 6.8 | 5.0 | 5.1 | 7.6 | |||||

2 | Max–Min | 28.4 | 26.0 | 8.4 | 19.7 | 30.7 | 18.2 | 35.9 | 28.5 | − 0.5 |

Interquartile | 7.1 | 6.5 | 4.9 | 4.6 | 7.1 | |||||

3 | Max–Min | 20.2 | 17.1 | 15.2 | 13.0 | 35.6 | 12.6 | 37.5 | 18.7 | 7.3 |

Interquartile | 5.0 | 4.3 | 3.2 | 3.2 | 4.7 | |||||

4 | Max–Min | 13.3 | 11.3 | 14.9 | 8.6 | 35.0 | 7.9 | 40.4 | 11.8 | 11.6 |

Interquartile | 3.3 | 2.8 | 2.2 | 2.0 | 2.9 | |||||

5 | Max–Min | 12.5 | 10.7 | 14.4 | 7.8 | 37.6 | 8.1 | 35.6 | 10.8 | 13.6 |

Interquartile | 3.1 | 2.7 | 2.0 | 2.0 | 2.7 | |||||

6 | Max–Min | 10.5 | 8.2 | 21.8 | 6.0 | 42.6 | 5.9 | 43.7 | 6.2 | 41.4 |

Interquartile | 2.6 | 2.1 | 1.5 | 1.5 | 1.5 | |||||

7 | Max–Min | 10.1 | 7.9 | 21.8 | 5.7 | 43.4 | 4.7 | 53.5 | 5.2 | 48.9 |

Interquartile | 2.5 | 2.0 | 1.4 | 1.2 | 1.3 | |||||

8 | Max–Min | 11.5 | 9.0 | 21.3 | 6.4 | 43.7 | 5.6 | 50.8 | 6.2 | 45.8 |

Interquartile | 2.9 | 2.3 | 1.6 | 1.4 | 1.6 | |||||

9 | Max–Min | 15.1 | 12.3 | 19.1 | 8.4 | 44.4 | 6.9 | 54.4 | 12.0 | 20.8 |

Interquartile | 3.8 | 3.1 | 2.1 | 1.7 | 3.0 | |||||

10 | Max–Min | 21.3 | 17.2 | 19.0 | 13.7 | 35.4 | 11.0 | 48.4 | 22.5 | − 5.5 |

Interquartile | 5.3 | 4.3 | 3.4 | 2.7 | 5.6 | |||||

11 | Max–Min | 25.3 | 22.3 | 11.9 | 18.3 | 27.8 | 16.1 | 36.2 | 27.9 | − 10.2 |

Interquartile | 6.3 | 5.6 | 4.6 | 4.0 | 7.0 | |||||

12 | Max–Min | 27.5 | 25.3 | 7.9 | 20.1 | 26.9 | 18.4 | 33.1 | 28.8 | − 4.6 |

Interquartile | 6.9 | 6.3 | 5.0 | 4.6 | 7.2 |

‘Max–Min’ represents a range between the maximum value and the minimum value. ‘interquartile’ represents lower and upper quartiles (25th and 75th percentiles). ‘RR’ represents a reduction ratio of residuals for Cases 2, 3, 4 and 5 against Case 1.

The relationship between the observed flow and the residual was further explored in the context of residual distribution by categorizing the residual into six classes from −60 to 60 m^{3}s^{−1}, a range that includes most (99.8%) of the residuals. The labels of categorized residuals in Figure 6 indicate the upper boundary of each class. Here, our focus was to explore the effects of bias correction in adjusting residual distribution with respect to the magnitude of the flow. Clear tendencies for both negative and positive residuals were evident. For Case 1, residuals showed an increasing trend as the flow magnitude increased. The results indicate that the model had a systematic error in simulating high flows, and this may have been due in part to a limited ability to represent the fast-runoff component. In Case 1, negative residuals were associated with relatively low flows of between approximately 5 and 80 m^{3}s^{−1}, while positive residuals were associated with relatively high flows of 20–170 m^{3}s^{−1}. In particular, after bias correction, the positive residual associated with the high flow showed a decreasing tendency, while the negative residual associated with the high flow appeared to increase. However, this does not necessarily mean that it had no important effects on the correction of negative bias (i.e., overestimation of simulated flow).

The increase in the negative bias can be partially attributed to a bias that cannot be fully explained by the proposed bias-correction schemes. More specifically, the positive bias relating to the high flow may have migrated to the negative bias with the effect of bias correction being most pronounced at high flows, resulting in a decrease in the positive residual. A more detailed exploration of this migration pattern is provided in the following section. For Case 5, as is the case with Case 1, negative residuals were associated with relatively low flows and positive residuals were associated with relatively high flows except for the residual range from 0 to 20 m^{3}s^{−1}.

Figure 7 shows the distribution of residuals along with the flow to compare Case 1 and the other three cases; conditional heteroscedastic variances on the magnitude of flow are evident in all cases. As expected, the bias-corrected flows for the three cases appear to be more effective than the baseline (Case 1) in correcting systematic bias. Among the three bias-correction schemes, the performance of Cases 3 and 4 was superior in terms of overall efficiency compared with Case 2. The shape of the marginal distribution of residual converged to zero with reduced variance through the use of bias-correction approaches. The reduced variability of the residual is represented graphically in Figure 7. Overall, the marginal distribution of the residual in Case 2 was largely similar to that of Case 1 and a significant decrease in negative bias was evident in Cases 2, 3, and 4. However, negative biases ranging from −40 to −60 were slightly increased by the use of the bias-correction approach, which is depicted in Figure 6. The slight increase in negative residual in a certain range can be attributed to the shift from a positive bias after bias correction, but the shift toward the negative reported here was not significant.

Table 3 provides a comparison of the RMSE in different cases during the calibration period. The RMSE for Cases 1 and 2 is comparable to each other, while an apparent reduction in the RMSE is seen in Cases 3 and 4, which can be attributed to the efficacy of bias-correction approaches proposed in this study. The residuals in the RMSE were squared before averaging them to ensure a relatively high penalty on a large residual was assigned (Kim *et al.* 2018); an increase in RMSE could therefore be expected, particularly for the significant difference in the simulation. In general, the results imply that the FDC model (Case 2) played a limited role in correcting large residuals, whereas the AR_{Q}(1) (Case 3) and AR_{H}(1) (Case 4) bias-correction approaches appeared to be much more effective in terms of removing a large residual. To further investigate the efficacy of bias-correction methods, a comparison of RMSE over different flow regimes (i.e., low, intermediate, and high flow) is provided in Table 3. Low, intermediate, and high flows are classified by quartiles based on the 25th and 75th percentiles. Cases 3 and 4 appeared to be most effective in correcting systematic biases over the entire flow regime. Compared with Case 1, Case 2 performed slightly better for intermediate- and low-flow regimes, while a slight increase in RMSE was observed for high flow. The sub-annual calibration model (Case 5) were mainly effective in terms of reproducing low-flow regimes compared with Case 1.

Flow range . | Bias-correction methods . | ||||
---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Entire flow | 7.219 | 7.277 | 6.429 | 6.638 | 7.483 |

High flow (75th–100th) | 13.05 | 13.355 | 12.069 | 12.511 | 13.313 |

Intermediate flow (25th–75th) | 4.189 | 3.943 | 3.123 | 3.101 | 4.744 |

Low flow (0–25th) | 1.727 | 1.521 | 1.115 | 1.286 | 1.295 |

Flow range . | Bias-correction methods . | ||||
---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Entire flow | 7.219 | 7.277 | 6.429 | 6.638 | 7.483 |

High flow (75th–100th) | 13.05 | 13.355 | 12.069 | 12.511 | 13.313 |

Intermediate flow (25th–75th) | 4.189 | 3.943 | 3.123 | 3.101 | 4.744 |

Low flow (0–25th) | 1.727 | 1.521 | 1.115 | 1.286 | 1.295 |

Table 4 presents the monthly water balances of the simulated flows for all four cases during the calibration period. The water balance of Case 3 was closer to that of the observed value, in line with the results presented in Figure 7 and Table 3. The water balance for Case 2 on an annual basis was equal to the observed value because bias correction was based on an FDC. However, monthly differences in water balance were substantially increased. Apart from Case 2, Case 3 was the most similar to the observed values.

Month . | Water balance . | |||||
---|---|---|---|---|---|---|

Obs. . | Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Jan | 25,939 | 25,358 | 25,149 | 25,438 | 23,464 | 25,347 |

Feb | 23,612 | 22,955 | 22,857 | 23,202 | 21,446 | 22,945 |

Mar | 17,259 | 16,130 | 15,709 | 16,521 | 15,025 | 16,550 |

Apr | 11,691 | 10,447 | 10,172 | 10,947 | 9,680 | 10,360 |

May | 8,071 | 7,225 | 7,185 | 7,550 | 6,512 | 6,613 |

Jun | 4,977 | 5,027 | 5,229 | 4,833 | 4,015 | 3,976 |

Jul | 4,270 | 4,571 | 4,875 | 4,389 | 3,653 | 3,358 |

Aug | 5,094 | 5,845 | 5,956 | 5,469 | 4,621 | 4,471 |

Sep | 6,834 | 7,508 | 7,558 | 7,269 | 6,264 | 7,283 |

Oct | 13,664 | 14,879 | 14,702 | 14,327 | 12,971 | 15,478 |

Nov | 20,028 | 21,332 | 21,108 | 20,826 | 19,188 | 21,742 |

Dec | 25,575 | 26,343 | 26,513 | 26,027 | 24,162 | 26,522 |

Total | 167,014 | 167,620 | 167,014 | 166,799 | 151,003 | 164,643 |

Month . | Water balance . | |||||
---|---|---|---|---|---|---|

Obs. . | Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Jan | 25,939 | 25,358 | 25,149 | 25,438 | 23,464 | 25,347 |

Feb | 23,612 | 22,955 | 22,857 | 23,202 | 21,446 | 22,945 |

Mar | 17,259 | 16,130 | 15,709 | 16,521 | 15,025 | 16,550 |

Apr | 11,691 | 10,447 | 10,172 | 10,947 | 9,680 | 10,360 |

May | 8,071 | 7,225 | 7,185 | 7,550 | 6,512 | 6,613 |

Jun | 4,977 | 5,027 | 5,229 | 4,833 | 4,015 | 3,976 |

Jul | 4,270 | 4,571 | 4,875 | 4,389 | 3,653 | 3,358 |

Aug | 5,094 | 5,845 | 5,956 | 5,469 | 4,621 | 4,471 |

Sep | 6,834 | 7,508 | 7,558 | 7,269 | 6,264 | 7,283 |

Oct | 13,664 | 14,879 | 14,702 | 14,327 | 12,971 | 15,478 |

Nov | 20,028 | 21,332 | 21,108 | 20,826 | 19,188 | 21,742 |

Dec | 25,575 | 26,343 | 26,513 | 26,027 | 24,162 | 26,522 |

Total | 167,014 | 167,620 | 167,014 | 166,799 | 151,003 | 164,643 |

### Results during the validation period

As illustrated in Figure 5 during the calibration period, Figure 8 compares residual distributions over the year for the five different cases during the validation period. Besides, Figure 9 illustrates the relationship between the observed flow and the residual in the context of residual distribution. Both the results are comparable to that of the calibration period. The median of residual distributions for Case 3 becomes closer to zero, while other cases are relatively distant from zero. Thus, a comparison of validation experiment and simulation again confirms that the systematic bias of the simulated flow can be more effectively reduced by Case 3, as already seen in the calibration period.

A comparison of RMSE during the validation period for each case is summarized in Table 5. The efficiency of Case 2 was not as good as the conventional calibration scheme (Case 1), but the other three cases (3, 4, and 5) were superior to Case 1. This indicates that the FDC model (Case 2) played a limited role in reducing bias, especially for correcting unseen data during the validation period. This may have been due to non-stationary flow, which in turn led to a change in the FDC, as illustrated in Supplementary Figure S1. However, the autoregressive model-based bias-correction approach (Case 3) was most effective for both calibration and validation periods. Table 6 presents the monthly water balance during the validation period for each case. Unlike the results reported in Table 4, the water balance of Case 2 represented a large difference in the residual compared with the other four cases on an annual basis. This may be because the temporal pattern of the bias-corrected flow from the FDC was relaxed in the sense that the FDC was constructed in a probability space rather than a temporal domain. Moreover, monthly differences in water balance were substantially improved by Case 3, among the other cases.

Period . | RMSE . | ||||
---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Validation (1991–2008) | 8.256 | 8.339 | 7.684 | 8.108 | 8.083 |

Period . | RMSE . | ||||
---|---|---|---|---|---|

Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Validation (1991–2008) | 8.256 | 8.339 | 7.684 | 8.108 | 8.083 |

Month . | Water balance . | |||||
---|---|---|---|---|---|---|

Obs. . | Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Jan | 17,150 | 16,156 | 15,043 | 16,455 | 15,422 | 16,214 |

Feb | 12,317 | 12,008 | 11,017 | 12,045 | 15,534 | 12,040 |

Mar | 10,396 | 9,697 | 8,884 | 9,873 | 13,957 | 10,031 |

Apr | 7,402 | 6,682 | 6,209 | 6,864 | 12,883 | 7,011 |

May | 4,801 | 4,256 | 4,065 | 4,375 | 7,977 | 4,371 |

Jun | 3,847 | 3,692 | 3,597 | 3,603 | 6,575 | 3,602 |

Jul | 3,462 | 3,761 | 3,665 | 3,571 | 4,263 | 3,460 |

Aug | 3,312 | 3,632 | 3,553 | 3,475 | 3,675 | 3,384 |

Sep | 4,168 | 4,471 | 4,332 | 4,278 | 3,517 | 4,726 |

Oct | 9,030 | 9,275 | 8,670 | 9,063 | 4,853 | 9,360 |

Nov | 14,374 | 15,632 | 14,488 | 15,070 | 7,358 | 15,481 |

Dec | 17,714 | 17,509 | 16,603 | 17,540 | 12,718 | 17,456 |

Sum | 107,974 | 106,770 | 100,126 | 106,212 | 108,732 | 107,134 |

Month . | Water balance . | |||||
---|---|---|---|---|---|---|

Obs. . | Case 1 . | Case 2 . | Case 3 . | Case 4 . | Case 5 . | |

Jan | 17,150 | 16,156 | 15,043 | 16,455 | 15,422 | 16,214 |

Feb | 12,317 | 12,008 | 11,017 | 12,045 | 15,534 | 12,040 |

Mar | 10,396 | 9,697 | 8,884 | 9,873 | 13,957 | 10,031 |

Apr | 7,402 | 6,682 | 6,209 | 6,864 | 12,883 | 7,011 |

May | 4,801 | 4,256 | 4,065 | 4,375 | 7,977 | 4,371 |

Jun | 3,847 | 3,692 | 3,597 | 3,603 | 6,575 | 3,602 |

Jul | 3,462 | 3,761 | 3,665 | 3,571 | 4,263 | 3,460 |

Aug | 3,312 | 3,632 | 3,553 | 3,475 | 3,675 | 3,384 |

Sep | 4,168 | 4,471 | 4,332 | 4,278 | 3,517 | 4,726 |

Oct | 9,030 | 9,275 | 8,670 | 9,063 | 4,853 | 9,360 |

Nov | 14,374 | 15,632 | 14,488 | 15,070 | 7,358 | 15,481 |

Dec | 17,714 | 17,509 | 16,603 | 17,540 | 12,718 | 17,456 |

Sum | 107,974 | 106,770 | 100,126 | 106,212 | 108,732 | 107,134 |

## DISCUSSION AND CONCLUSIONS

The FDC model effectively reduced biases during the calibration period but was ineffective with respect to both RMSE and water balance during the validation period. As illustrated in Figure 10, the empirical FDCs were applied for adjusting biases in the simulated flow against the observed flow, and in such cases, it may not be appropriate to correct point-wise biases over time, leading to degradation of model efficiency. For example, at time *t*, the flow after bias correction moved further away from the observation, which led to a larger error. However, at the time , the bias-corrected flow approached the observed value. In this study, FDCs were constructed for the entire period of observed and simulated flow. Building FDCs in several groups associated with temporal scales can minimize the impact of this issue. More specifically, flow data for the construction of the FDC curve can be divided into different temporal scales, such as biannual, seasonal, or monthly periods, to build FDCs. By comparing (or optimizing), the results of different periods of FDC models, the optimal period that could reduce biases for both the calibration and validation periods can be identified, which is another research topic.

In this study, an AR(1) first-order autoregressive process was applied and the obtained results confirmed the usefulness of reducing the bias of simulated flow by examining the hypothesis that the residuals were autocorrelated, but further investigation is needed. For example, autoregressive processes can be expanded to a general form and an autoregressive integrated moving-average model, and the most appropriate form of the AR model can be explored by comparing Akaike's information criterion or Bayesian information criterion.

Hydrological models are imperfect representations of real hydrological circulation due to incomplete model structure, insufficient data, measurement errors, and inadequate initial and boundary conditions. The main objective of this study was to assess whether post-processing methods can reduce the biases in hydrological model outputs. Results of three bias-correction methods were compared with the conventionally calibrated flow and sub-annually calibrated flow. The main conclusions can be summarized as follows:

Monthly residuals were compared across four different cases. Generally, there was a clear seasonality representing a more substantial variability in winter than summer. The range of the residual usually expanded in winter, demonstrating that the HYMOD may not reproduce high flows adequately. For a systematic comparison, the reduction ratio in the range of the residual with respect to Case 1 was evaluated. An overall reduction in the residuals of bias-corrected flows (Cases 2, 3, and 4) was clear. A substantial decrease in the range of residual was identified in Cases 3 and 4. However, the residual from Case 4 showed an increasing tendency toward positive values, and the efficacy of treating biases may be limited.

This study examined the effects of bias correction on adjusting residual distribution with respect to the magnitude of the flow. There was a clear increasing tendency for the residual as the magnitude of the flow increased, as seen in the monthly distribution of residuals. The positive residual related to high flow showed a decreasing trend, whereas the negative residual associated with high flow appeared to increase after the bias correction. There is evidence of the shift from a positive bias toward negative bias and

*vice versa*, with the effect of bias correction being most pronounced at high flows, resulting in a decrease in the positive residual (or an increase in the negative residual).This study confirmed that the bias-corrected flows with three cases were more effective than the baseline model (Case 1), in terms of correcting a systematic error in simulated flow. The residual distribution converged to zero with reduced variability, indicating the efficacy of bias correction. A significant decrease in negative residual with high flow appeared in Cases 2, 3, and 4. However, negative residuals in a certain range were slightly higher due to the shift from a positive residual after bias correction.

A more detailed comparison with the RMSE during the calibration and validation periods revealed that the RMSE in Cases 1 and 2 were similar to each other, while a clear reduction in the RMSE was seen in Cases 3 and 4 during the calibration phase. On the other hand, Case 3 was more effective than the other three cases, which were comparable with each other in terms of the RMSE during the validation phase. A comparison of the RMSE over different flow regimes demonstrated that Case 3 appeared the most effective in correcting systematic biases in the entire flow regime. Finally, monthly water balances for all the cases were compared during both calibration and validation periods. The monthly water balance of Case 3 was close to that of the observed value, whereas Case 2 showed considerable differences in the residual compared with the other three cases.

An AR model (Cases 3 and 4) showed relatively good efficiency than the conventional calibration method (Case 1) and the FDC model (Case 2) at correcting systematic biases both in calibration and validation periods. Here, the AR model is only applicable when the lagged residual is obtained from observations at the previous time step. For example, the AR model cannot be applied to correct the simulated flow in the future period (e.g., 2050–2100). To alleviate this issue, the non-stationary nature of the FDC needs to be explored for better representation of time-varying behavior in the residual, including the identification of the optimal temporal scales for the construction of an FDC curve within an optimization framework. Moreover, future research should include a rigorous comparison of the bias-correction approach under different conditions (e.g., different hydrological models, catchment size, climatic conditions, land use, and terrain) to provide a general guideline.

A sub-annual calibration scheme (Case 5) was introduced to explore whether an improved calibration framework can be more effective than the post-processing approach to the simulated flow. The results showed that this model performance did not show much improvement compared with three post-processing approaches.

## ACKNOWLEDGEMENTS

This work was funded by the Korea Meteorological Administration Research and Development Program under Grant KMI (KMI 2018-07010).

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories. The data used in this study are available from the UK Met Office (https://www.metoffice.gov.uk/research/climate/maps-and-data/about/archives) and the National River Flow Archive (https://nrfa.ceh.ac.uk/data/station/info/45001).