Abstract
We present a novel hybrid framework that incorporates information from the process-based global hydrological model PCR-GLOBWB, to reduce prediction errors in streamflow simulations. In addition to catchment attributes and meteorological data, our methodology employs simulated streamflow and state variables from PCR-GLOBWB as predictors of observed river discharge. These outputs are used in a random forest, trained on a global database of streamflow measurements, to improve estimates of simulated river discharge across the globe. PCR-GLOBWB was run for the years 1979–2019 at 30 arcmin and its inputs and outputs were upscaled from daily to monthly time steps. A single random forest model was trained with these state variables, meteorological data and catchment attributes, as predictors of observed streamflow at 2,286 stations worldwide. Model performance was evaluated using Kling–Gupta efficiency (KGE). Results based on cross-validation show that the model is capable of discerning between a variety of hydroclimatic conditions and river flow dynamics, improving KGE of PCR-GLOBWB simulations at more than 80% of testing locations and increasing median KGE from −0.03 in uncalibrated runs to 0.51 after post-processing. Performance boosts are usually independent of the availability of streamflow data, making our method a potential candidate in addressing prediction in poorly gauged and ungauged basins.
HIGHLIGHTS
A hybrid framework for global streamflow modelling is developed, connecting PCR-GLOBWB with random forest.
The framework enables the correction of global-scale streamflow predictions with parsimonious parametrization.
Random forests improve streamflow predictions better when additionally fed with outputs from the hydrological model, as opposed to only using meteorological forcing and catchment attributes.
INTRODUCTION
River discharge is a highly human-relevant component of the hydrological cycle, as it influences and is influenced by human practices and anthropogenic climate change (Haddeland et al. 2013). Accurate streamflow predictions from rainfall-runoff models are needed for efficient water management, flood and drought risk assessment and for the direction of mitigation efforts (Sharma & Machiwal 2021), as well as improving water access and the understanding of human–water interactions (Montanari et al. 2013).
The growing complexity of process-based hydrological models (PBHM) over the past few decades, however, has not considerably increased accuracy in streamflow prediction (Liu & Gupta 2007). As more processes are included in the representation of streamflow generation, the uncertainties that characterize each model component have propagated errors to the simulated runoff (Liu & Gupta 2007; Montanari et al. 2009; Beven 2012; Li et al. 2016; Moges et al. 2020). Observational data used to evaluate the model are also subject to uncertainty (McMillan et al. 2018).
In contrast to classic calibration methods (Liu & Gupta 2007; Todini 2007), alternative approaches to address hydrological uncertainty focus on residuals, by estimating the overall error of a simulation as the difference between modelled and observed streamflow (Liu & Gupta 2007; Evin et al. 2014; Li et al. 2016). Such methods are also referred to as ‘aggregated error models’, as they characterize total uncertainty without attempting to disentangle individual sources of error (Evin et al. 2014; Matthews et al. 2022). Error models may be characterized as ‘joint’ or ‘post-processor’, depending on the timing of streamflow error correction (Evin et al. 2014). Here we focus on post-processing, where the hydrological model is first run, its output compared to observational data and then corrected with a selected error model to improve streamflow forecasts (Bogner et al. 2016; Hopson et al. 2018; Matthews et al. 2022), thereby ignoring interactions between hydrological and error model parameters (Evin et al. 2014; Valdez et al. 2022).
In the past two decades, statistical learning (SL) has gained increasing attention in the hydrological community (Nearing et al. 2021; Xu & Liang 2021; Mosaffa et al. 2022), sparked by its many different implementations in the geosciences at large (Dramsch 2020). SL applications in streamflow prediction mostly relate to the use of various forms of artificial neural networks (ANNs) (Abrahart et al. 2012). An extensive list of SL algorithms for hybrid runoff modelling can be found in Mosavi et al. (2018). PBHM and data-driven models have complementary strengths and weaknesses, and combining them in various ways provides opportunities to enhance the knowledge of both modelling approaches (Herath et al. 2021; Xu & Liang 2021). This integration of theoretical knowledge and observational data enables the extraction of greater volumes of information (Ghaith et al. 2020; Nearing et al. 2021).
In more recent years, the research has been directed towards hybrid streamflow models employing SL algorithms that are more computationally efficient than ANNs (Tyralis et al. 2019). Specifically, random forest (RF) (Breiman 2001), an ensemble machine learning algorithm (Zounemat-Kermani et al. 2021) that uses decision trees as base learners, has become increasingly popular in the hydrological sciences (Tyralis et al. 2019). RF has been successfully applied in streamflow modelling (Li et al. 2019; Tyralis et al. 2019; Pham et al. 2021; Roy et al. 2023), with performances comparable to those of more complex ANNs (Li et al. 2019; Desai & Ouarda 2021; Hauswirth et al. 2021; Ghazikhani et al. 2022), albeit being extremely parsimonious in terms of parameters, computationally efficient and of easier interpretability (Tyralis et al. 2019). All previous studies, however, are limited to the use of observed streamflow, meteorological inputs and a few other variables as predictors of river discharge.
Shen et al. (2022a) developed an innovative hybrid modelling framework, consisting of an RF-based post-processing approach to correct predictions from the global hydrological model (GHM) PCRaster Global Water Balance (PCR-GLOBWB) (Sutanudjaja et al. 2018). Meteorological input as well as intermediate hydrological state variables from PCR-GLOBWB were used as predictors for the RF to estimate residuals in streamflow predictions, which were then applied to correct simulated discharge at the daily scale. The RF was trained and tested at three individual stations in the Rhine basin characterized by a variety of physiographic features and streamflow generation processes, showing excellent improvements in model performance. The RF post-processor achieved comparable results when PCR-GLOBWB was not manually calibrated prior to error correction, implying that the SL algorithm was able to properly account for the various sources of uncertainty in the uncalibrated hydrological model, without the need for previous human intervention.
This approach reconciles the use of PBHM and data-driven models (Todini 2007), by exploiting the theoretical understanding that is engrained in the intermediate outputs of the hydrological model. This makes it potentially capable to characterize universal catchment behaviours (Sivapalan 2005), by extrapolating transferable knowledge on the main sources of error of a GHM, across a variety of climates and catchment characteristics. Unlike other methodologies, proper availability of observational data and adjustments to the single-station framework can thus enable larger-scale correction of discharge for predictions in ungauged basins (PUB) (Sivalapan et al. 2003; Hrachowitz et al. 2013).
A potential application of such a framework is thus to correct simulations of the global hydrological cycle, of which there are very rare and only recent examples. Kraft et al. (2022) developed a joint hybrid framework, but they exclusively rely on a combination of long short-term memory networks and simple ANNs to enhance model results, with little insight into the processes involved in improving streamflow predictions. The method by Shen et al. (2022a), on the other hand, could only be applied to singular stations, undermining its potential to correct simulations at any catchment over the globe.
The main idea underlying this study was to test if information on errors acquired by training the model in gauged catchments could be used as proxy to correct streamflow simulations at previously unseen locations. In this context, the current study aims at expanding the work of Shen et al. (2022a) to the global scale. The work presented here is thus intended to answer the following research questions: can a hybrid framework extrapolate beyond individual catchment characteristics to properly address streamflow prediction in ungauged basins? Does a hybrid framework, which includes hydrological model outputs as predictors of observed streamflow, outperform an SL methodology that only relies on catchment attributes and meteorological inputs?
Reaching this goal requires including predictors of river discharge that involve basin characteristics of a pixel (topography, soil and groundwater characteristics, land use and climatic indices) to be able to generalize global watershed behaviours. It also requires that a global dataset of streamflow observations is built, such that a variety of hydroclimatic regions is captured during model training. This will also enable the understanding of where future hydrological modelling efforts should be directed to address the main sources of uncertainty in global-scale streamflow models.
In the following, we present a global dataset of monthly streamflow observations (section 2) and the novel hybrid methodology employed to generate post-processed values of river discharge across the globe (section 3). Section 4 describes the results as they happen during workflow (hyperparameter tuning, RF training and cross-validation), while section 5 discusses a few key points concerning the potential use of our framework for prediction in ungauged basins, its current limitations and future research directions. Conclusions are given in section 6.
DATA
River discharge data were downloaded from the Global Runoff Data Centre (GRDC 2022). Stations were selected with the following criteria: spatially, a minimum upstream area of 10,000 km2, such that catchments occupied at least ∼4 pixel windows (PCR-GLOBWB was run at a resolution of 30 arcmin, or ∼50 km at the equator) and enough physical processes were assumed to be represented by the hydrological model; temporally, GRDC stations had to have at least 1 year in the period 1979–2019, such that a few timesteps could be compared between simulated and observed discharge. We considered both managed and unmanaged catchments.
METHODS
PCR-GLOBWB
PCR-GLOBWB (Sutanudjaja et al. 2018) is a grid-based GHM. GHMs attempt to simulate the global hydrological cycle and associated processes, providing valuable spatiotemporal estimates of global water resources (Sood & Smakhtin 2015). PCR-GLOBWB represents anthropogenic influences on the water cycle (de Graaf et al. 2014; Wada et al. 2014) alongside natural hydrological processes (van Beek et al. 2011) and is characterized by high flexibility due to its modular structure (Supplementary Material S1).
Meteorological forcing for the hydrological model consists of three variables, namely precipitation (P), temperature (T) and reference potential evapotranspiration (PET), based on the W5E5 v2 (Lange et al. 2021). Here, PCR-GLOBWB was run without calibration at daily timesteps for the years 1979–2019, in its version with a spatial resolution of 30arcmin (∼50 km at the equator), using its standard input parameters (Sutanudjaja et al. 2019) and with the kinematic wave solution for water routing. Model inputs and outputs (I/O) were then upscaled to monthly averages.
Random forests
RF (Breiman 2001) is an ensemble tree-based algorithm that samples predictor variables in each split node of an independent tree and aggregates all the trees for prediction. Supplementary Material S2 shows the in-depth structure of a random forest. The number of trees in the forest is controlled by the parameters ntree. The depth of each tree, i.e. the number of iterations that are executed before tree growth is stopped, is regulated by the nodesize parameter. The ensemble algorithm in the RF can deal with the high correlations occurring between the predictors by randomly selecting a subset of candidate variables for each split. This selection is effectively carried out through the mtry parameter, which can thus take on values between 1 and the total number of predictors. This bootstrapping technique avoids highly influential variables dominating all trees. Each tree is then developed by minimizing a loss function (here, Mean-Squared Error (MSE), ti). After a certain value is determined during hyperparameter calibration, a larger ntree will lead to a decrease in computational efficiency without a justifiable increase in predictive performance. On the other hand, larger mtry and nodesize values will usually cause overfitting on top of larger computational times.
In RF, only a certain subset of the training data is used to grow the forest (Bootstrap n). The unused data, referred to as Out-Of-Bag (OOB) observations, are exploited to estimate the generalization error. Each OOB observation has an OOB prediction, from which an overall OOB root-mean-squared error (RMSE) of the n observations can be obtained instead of using cross-validation, which requires larger computation efforts and may lead to biased error estimates (Breiman 2001). Moreover, internal estimates of variable importance in reducing predictive error are calculated as mean decrease in node impurity (Grömping 2009; Gregorutti et al. 2016; Wright & Ziegler 2017). These are useful in understanding which predictors contribute the most to overall model performance and to increase the interpretability of RF.
Hybrid modelling setup
Predictors selection and pre-processing
Predictor variables were exclusively extracted from PCR-GLOBWB inputs and outputs (I/O), allowing for coherency in data-feeding to the RF to correct uncertainty in the hydrological model, i.e. no external dataset was used. To predict the observed discharge at time t (response variable), we used both time-variant and static predictors. No lagged variable was employed. Time-variant predictors consist of all PCR-GLOBWB I/O, while static predictors are extracted from PCR-GLOBWB parameter maps (see Supplementary Material S3 for a few examples) and are used to inform the algorithm with catchment characteristics.
All predictors were first averaged to the upstream area using the PCRaster Python framework (Karssenberg et al. 2010). They were then extracted from the netCDF (network Common Data Form) files where PCR-GLOBWB I/O is stored using the pixels corresponding to each station of the GRDC. These then had its own predictor table to be used for training or validation. Static predictor maps' values were scaled and centred globally prior to extraction, such that they had a mean of 0 and a standard deviation of 1. For meteorological inputs and state variables, this was done after feature extraction. Simulated and observed runoff (m3 s−1) were converted to flow depth (m day−1), using data on the upstream area.
The final selection of predictors and a brief description of their functioning in PCR-GLOBWB are shown in Supplementary Material S4. For further details on their inner workings, the reader is referred to Sutanudjaja et al. (2018). Supplementary Material S5 shows the correlation analysis that was conducted prior to training to verify which predictors were highly correlated (Gregorutti et al. 2016).
Hybrid modelling configurations
Shen et al. (2022a) showed that the RF improves the performance of the uncalibrated hydrological model, with comparable results when PCR-GLOBWB is manually calibrated prior to feeding data to the RF. They also proved that the hybrid runoff framework always achieves better performance both than the uncalibrated and calibrated PCR-GLOBWB. On the other hand, the calibration of a GHM requires massive efforts, which were beyond the scope of this study. For these reasons, we decided not to calibrate PCR-GLOBWB. Overall, our approach saves time in setting up the model framework, in addition to enabling the algorithm to compensate for errors in hydrological model parameters and input without prior human intervention.
In Shen et al. (2022a), the response variable in the RF post-processor was the residual between the simulated and the observed discharge. However, early trials on previously untested GRDC stations showed that this approach would cause the creation of negative discharge values and a generalized incapability of the RF to properly correct runoff simulations. A preliminary test proved that directly using streamflow observations as a response variable in the RF, instead of the residual, enhanced the overall performance of the post-processor, therefore also reducing data pre-processing and model complexity. This stems from the fact that discharge observations can only take positive values, whereas residual dynamics are characterized with both positive and negative values. Additional model developments are described in Supplementary Material S6.
With these preconditions, three different model setups were tested (Table 1), including the following predictor combinations: (1) qMeteoStatevars is the setup developed by Shen et al. (2022a), which uses uncalibrated discharge (q), meteorological input (meteo) and uncalibrated hydrological state variables (Statevars). Here we developed two additional configurations: (2) meteoCatchAttr only employs meteo and static catchment attributes (CatchAttr). This configuration is used as a baseline to understand if a fully SL-based framework generalizes catchment behaviours, while excluding all PCR-GLOBWB output from the predictors. (3) Finally, in this study, we combine the previous two configurations, by adding predictor values of static catchment attributes, extracted from global maps of hydrological parameters, to qMeteoStatevars. This hybrid configuration (allPredictors) ultimately shows if the use of outputs from a GHM can complement predictors of input catchment parameters and meteorological forcing to improve streamflow predictions.
. | Catchment attributes . | Meteorological input . | PCR-GLOBWB state variables . | PCR-GLOBWB discharge . | Total predictors . |
---|---|---|---|---|---|
qMeteoStatevars | v | v | v | 25 | |
meteoCatchAttr | V | v | 30 | ||
allPredictors | v | v | v | v | 52 |
. | Catchment attributes . | Meteorological input . | PCR-GLOBWB state variables . | PCR-GLOBWB discharge . | Total predictors . |
---|---|---|---|---|---|
qMeteoStatevars | v | v | v | 25 | |
meteoCatchAttr | V | v | 30 | ||
allPredictors | v | v | v | v | 52 |
Model training and evaluation
RF was implemented in R with the ranger package (Wright & Ziegler 2017), a fast implementation of RF for high-dimensional data. The model requires tuning of the three RF hyperparameters (section 3.2): ntree, mtry and nodesize. To do this, the OOB RMSE was used for hyperparameter tuning in the RF, to minimize overfitting and maximize its generalization capabilities, while also optimizing performance.
The hyperparameters were optimized for each post-processor configuration (section 3.3.2) with the following approach: first, forests were grown with a fixed ntree of 200, to quickly understand the response of mtry. mtry was initially searched with a spacing of 5, ranging between 1 and the possible number of predictors; after this, the search was reduced to ∼10 units within the mtry with the two lowest RMSEs, and mtry was calculated with a spacing of 1. Once the optimal mtry was determined for each configuration, this was kept constant to understand the response of ntree. nodesize was kept constant with a value of 5 as in Shen et al. (2022a). The hyperparameter set that gave the smallest OOB RMSE with the highest computational efficiency was used to train each RF post-processor.
To train and validate the RF, location-based split sampling was executed, generating five random subsamples of the whole dataset. Each subsample was divided into training stations and testing stations. All data from the training stations were accumulated into a single table that was used to tune the hyperparameters and train the RF. The training table contained ∼2/3 of all available observations (∼ 427,823 timesteps). During the training phase, the algorithm fits the predictors to the response variable (discharge observations), to generate a predictive model, also referred to as a ‘trained RF’.
Each of the five trained RF was then applied on each station from their respective testing dataset and values of streamflow were predicted at each timestep. Improvements in Kling–Gupta Efficiency (KGE) (Gupta et al. 2009) were calculated to validate the error-correcting capabilities of the trained RF, comparing the performance of uncalibrated PCR-GLOBWB to the fully SL-based and hybrid setups, for each subsample.
KGE was created to remedy the limitations of the Nash–Sutcliffe Efficiency (NSE) (Nash & Sutcliffe 1970). In fact, exclusively relying on NSE to evaluate a hydrological model can lead to untruthful estimates of its general performance, due to normalization of the MSE in NSE (Gupta et al. 2009). This causes the relative importance of bias evaluation to vary across basins and in highly variable observed flows, in addition to systematically underestimating flow variability (Gupta et al. 2009). The use of KGE, on the other hand, allows to disentangle the components that constitute the overall performance of a hydrological model, namely linear correlation, bias and variability.
A hydrological model with perfect performance has a KGE of 1, meaning that ideal values of r, and correspond to 1, such that the equation term under the square root is minimized to 0. r can take values between −1 and 1, and between −∞ and +∞. Improvements upon mean flow benchmark for prediction have been shown to undertake a value of 1 − √2 (∼ −0.41) (Knoben et al. 2019), which is used in the following as a baseline to evaluate good model performance.
RESULTS
Hyperparameter tuning
Since stations with different availability of observational data were used, the training data are not uniformly distributed across the globe. The table in Supplementary Material S7 enumerates, for each subsample, how many stations were used for training and testing, respectively, and contributions to the total training data for each region of the World Meteorological Association (WMO 2006). The percentage was calculated by dividing the available timesteps in each region by the total timesteps used in a certain training subset. These ratios clearly show that some WMO regions are characterized by data scarcity more than others. This issue can potentially be tackled using the transferability of catchment behaviours within our framework.
mtry is characterized by an initially decreasing OOB RMSE to a global optimum at a value of around half the possible predictors; higher values cause the RF to slightly overfit the training data and error starts increasing again. The mtry that gave the lowest OOB RMSE (Supplementary Material S8) was then used to study the response of ntrees. For what concerns ntrees, the OOB RMSE converges early for all predictor configurations at ∼500 trees, a value that was ultimately used to train the various models. After this point, further decreases in OOB RMSE outweigh the computational necessities of the RF.
Training
The results clearly demonstrate that including static predictors influences the relationship between different variables at the node splits. For instance, in the allPredictors configuration (Figure 4(a)), half of the major 20 predictors are static ones, with the most significant ones being related to climate, river channel and land use characteristics. Moreover, having different configurations that include various predictor combinations allows to better understand the relative importance within time-series (qMeteoStatevars, Figure 4(b)) and within meteorological and catchment parameter inputs (meteoCatchAttr, Figure 4(c)).
The most important time-variant variable is the flow depth from PCR-GLOBWB, showing that the process-based streamflow prediction is highly informative to the SL algorithm, even in its uncalibrated form. Other very significant transient predictors are the groundwater recharge rate, the water storage of the lower soil layer and the total evaporation.
These conclusions are, however, subjective to the training dataset, where contributions of discharge observations from GRDC stations are unbalanced towards some regions more than others. The bars in Figure 4 clearly show the relative uncertainty of each predictor, with some being more sensitive to the specific training sub-dataset, especially the flow depth, the aridity index, bank width and bank depth. Another variable with large uncertainty in importance is the irrigation water withdrawal (Figure 4(b)), most certainly dependent on the large differences in irrigated area upstream of a certain station around the globe.
Cross-validation
The validation results show that the various configurations work consistently, independently of the training set, with small divergences related to the random subsampling and the base performance of the PCR-GLOBWB. Compared to the uncalibrated runs (median KGE = −0.03), the qMeteoStatevars (median KGE = 0.37) setup (from Shen et al. 2022a) improved correlation better than meteoCatchAttr (median KGE = 0.34), while this last one achieved greater performances in improving bias and variability. This can also be seen in the greater interquartile ranges in KGE for the qMeteoStatevars setup. Intuitively, the hydrological state variables act as a ‘state update’ of the hydrological system, while the predictors relating to catchment attributes and meteorological inputs enable a generalization of catchment behaviours.
However, blending all these predictors together in a hybrid setup ultimately achieved the overall best performance, as it is quite evident from the results of the allPredictors configuration. Including meteorological variables, hydrological state variables and static catchment attributes brought a clear improvement in KGE (median KGE = 0.51) as well as in correlation, bias and variability across all subsamples. The biggest performance boost in correlation comes from the hydrological state variables, while reduced variability and bias ratio are likely caused by the catchment attributes. This proves that including descriptors of catchment behaviours improves the reduction in predictive error of the RF-based post-processor. Supplementary Material S9 summarizes the values of KGE and its components cumulated over all subsamples (rightmost column in Figure 5).
Improvements were usually independent of the availability of training data (e.g. Central and Western Africa, South-East Asia, the Iberic peninsula and Russia) (Figure 1), demonstrating that a good donor catchment from a data-rich region was often available in the training dataset. On the other hand, it is also the case that not all data-rich regions necessarily received performance boosts, which may indicate the influence of other factors in poor behaviour generalization, e.g. arid environments and/or highly engineered catchments.
In Figure 7(a) and 7(b), the hydrographs show that streamflow dynamics remain realistic after post-processing and are not negatively impacted by the RF. Moreover, the model is capable of correcting significant differences between real-world and simulated values, which can be better seen in the flow duration curves, highlighting how our methodology is effective at correcting flow quantiles. The residual plots show that the framework properly accounts for autocorrelation without the need for lagged state variables and correctly addresses heteroscedasticity, as increasing values of discharge magnitude are not anymore associated with a higher difference in error between observed and simulated values.
On average, streamflow simulations were improved after post-processing at ∼80% of all stations in the validation subsets, for the allPredictors configuration, thereby increasing median KGE from −0.19 to 0.57 (as exemplified in Figure 7(a) and 7(b)). However, this implies that KGE degraded after passing through the RF in ∼20% of testing stations for the best setup. At such locations, median KGE decreased from 0.50 (uncalibrated) to 0.23 (allPredictors). An example of this case is shown in Figure 7(c).
DISCUSSION
Prediction in ungauged basins
Our results show that including hydrological state variables and simulated river discharge, in addition to meteorological inputs and catchment attributes, as predictors to the RF improves streamflow simulations better than a setup that excludes outputs from the hydrological model. This can be seen in the evident improvements to KGE values between the meteoCatchAttr and the allPredictors configurations, which, respectively, showed a median KGE of 0.34 and 0.51. These findings reconcile the long-standing divisions between fully data-driven and process-based hydrological models (Todini 2007), demonstrating that the two approaches can enhance each other's strengths and aid their respective weaknesses.
The current study aimed at properly addressing prediction in ungauged basins (PUB), which are characterized by scarce, inadequate, or absent data records of hydrological observations (Sivalapan et al. 2003). The issue of PUB, which is essentially a problem of extrapolation of universal hydrological laws (Sivapalan 2005), has been addressed with a variety of approaches (Wagener & Montanari 2011), though it is unclear which method works best, due to varying results across different areas (Singh et al. 2014).
Here, the use of SL algorithms enabled efficient extraction of information from large amounts of data, showing promising results in addressing PUB (Besaw et al. 2010; Kratzert et al. 2019a). The non-linear nature of SL methods is highly effective at regionalizing hydrological parameters and characterizing universal catchment behaviours, given sufficient training data (Kratzert et al. 2019b; Prieto et al. 2019; Potdar et al. 2021). This results in widespread generalization capabilities of the RF in ungauged basins, causing the spatial variability of physical catchment characteristics not to be an issue as with a classical calibration approach (Kling & Gupta 2009; Hrachowitz et al. 2013). This additionally enables post-processing across a variety of catchments and regions, a procedure that has been relatively little explored until now (Matthews et al. 2022).
In our case, the model was previously applied on a station-by-station approach, both for training and validation purposes (Shen et al. 2022a). This was done with a time-based split sampling, where the two halves of the available observations were, respectively, used for training and testing. However, this approach has been shown to produce inferior performance in the validation phase, compared to calibrating to the full available observations, which also eliminates subjective decisions on the calibration period (Shen et al. 2022b). Extending such a methodology to the global scale was not feasible, as many stations may only have a few timesteps of observations available compared to the runs of the hydrological model.
Thus, to be able to transfer hydrological knowledge between catchments, we extended the training dataset to include all streamflow observations in each training subset, i.e. all training stations contributed to the RF at all times when river discharge observations were available, to maximize extraction of global-scale knowledge on error in PCR-GLOBWB simulations. Even though this may generate a bias in the training sets, due to some regions contributing more than others, we assumed that an acceptable variety of catchment behaviours was represented.
In our modelling framework, the testing sets in each subsample are treated as ungauged basins, where the RF post-processor is applied to verify its capabilities in accurately predicting river discharge. The results presented here demonstrate a hybrid setup enables the RF post-processor to discern between a multitude of climates and catchment behaviours, making it a viable option to address PUB. This is encouraged by the fact that stations where runoff simulations improved were missing on average 41% of data over the whole modelling period (1979–2019), flagging that performance boosts were independent of the availability of the streamflow data at a particular location. Finally, the SL algorithm was also often capable of correcting for autocorrelation and heteroscedasticity, in addition to properly reproducing streamflow quantiles, whenever there was training data of sufficient quality.
Current limitations and future research
Notwithstanding the good results achieved by the RF-based post-processor, the modelling framework is still subject to a variety of limitations that require further examination and testing. For instance, 15–20% of stations in the various testing subsets are still characterized by a KGE < −0.41, often caused by an already poor performance in PCR-GLOBWB (e.g. central United States, south-east Australia). Since no single factor was found to strongly correlate to these negative performances, future analysis should focus on clustering these stations to understand if there are any shared causes of poor results.
We hypothesize these may stem from the inherent gaps of PCR-GLOBWB in capturing anthropogenic modifications of hydrological processes in arid environments (Smakhtin 2001; Shen & Chen 2010) or significant changes in the water balance of rivers due to inter-basin water transfers (Shumilova et al. 2018; Siddik et al. 2023). Although PCR-GLOBWB does include a module for calculating the storage of lakes and reservoirs, the corresponding predictor variable (surfaceWaterStorage) was not found amongst the most important ones, which may be due to its highly simplified nature.
Another potentially related limitation in our study is the lack of a calibrated version of PCR-GLOBWB at the global scale as further benchmark against our hybrid model. This may constitute additional material for future research, which should primarily focus on the calibration of stations that have lower KGE than −0.41, both in the uncalibrated PCR-GLOBWB and after post-processing. Such a calibration could enhance the information contained in PCR-GLOBWB outputs, thereby potentially improving the generalization capabilities of the hybrid framework and runoff prediction in (semi-)ungauged catchments.
Moreover, the RF post-processor degraded PCR-GLOBWB performance at ∼20% of stations, a sign that the SL algorithm does not always recognize when the hydrological model is already in a good state. At these stations, median KGE was downgraded from 0.50 to 0.23, which is, however, negligible in comparison to the increase from −0.19 to 0.57 at locations where performance was boosted. Results may also be biased on the training datasets, which would surely benefit from a higher length of the available observations. This can be achieved by extending PCR-GLOBWB runs to years prior to 1979, since many stations are characterized by a decline in observations in more recent years, as also noted by Ruhi et al. (2018).
Results would also benefit from a larger number of catchments, which can be included by running PCR-GLOBWB in its version at a resolution of 5 arcmin (∼10 km at the equator), enabling the use of additional catchments for training (5272 GRDC stations have an upstream area between 400 and 10,000 km2). This would also potentially increase the scaling capabilities of the RF post-processor by using a dataset with a higher variety of catchment sizes (Peters-Lidard et al. 2017; Sidle 2021; Tsai et al. 2021).
Further developments of the modelling framework may also focus on including additional predictors of catchment characteristics, especially extending feature extraction from PCR-GLOBWB parameter maps that were not included in this study, but also potentially through the use of external datasets at matching resolutions, e.g. HydroATLAS (Linke et al. 2019). Even though the exclusion of lagged time-variant predictors did not hinder good model results, it may still be significant to execute a feature extraction from PCR-GLOBWB time-series, as delineated in Papacharalampous et al. (2021) and Papacharalampous & Tyralis (2022). In addition to these potential experiments, the current framework could be adjusted and transferred to post-process streamflow simulations from another regional or global-scale process-based hydrological model, if tabular time-series of (at least) input meteorological variables, comparable catchment parameters and simulated discharge are available.
Finally, it is important that other tree-based algorithms are explored for hybrid streamflow modelling, especially more efficient ones and capable of handling the presence of missing values in the input predictors, e.g. XGBoost (Chen & Guestrin 2016), but also entirely new hydrological theories based on non-linearity and systems complexity, to develop more parsimonious process descriptions for predictive hydrological modelling, at multiple scales and in different biomes (Sivalapan et al. 2003).
CONCLUSION
In this study, we have presented a novel hybrid post-processing framework to address uncertainty in streamflow modelling. Static catchment attributes, meteorological input, hydrological state variables and simulated runoff from the global hydrological model PCR-GLOBWB were used as predictors of observed river discharge at the global scale for a random forest regressor.
Results show that the framework is capable of correcting bias and heteroscedasticity, and improving linear correlation between observed and simulated streamflow, by distinguishing different hydroclimatic conditions and a variety of river flow responses. While the median KGE for the uncalibrated PCR-GLOBWB was −0.03, meteoCatchAttr and qMeteoStatevars improved the median KGE to 0.34 and 0.37, respectively. However, combining all selected features as input to the RF (allPredictors setup) ultimately improved KGE at ∼80% of the locations in the validation datasets. The cumulative median KGE for the hybrid setup (allPredictors) was 0.51, showing that the use of predictor values from the hydrological model outputs improved streamflow predictions at unseen locations.
These findings prove that the SL post-processor captures hidden non-linear dynamics between model components, to properly correct the uncertainties in a process-based hydrological model. Moreover, it reduces the need for calibration and the poor performances associated with the indiscriminate transfer of parameters sets between catchments, thus making it a potential candidate for improving streamflow prediction in ungauged basins.
Future research efforts to improve the current framework should focus on: (1) testing additional predictors of catchment and/or time-series behaviour; (2) increasing the spatial resolution of PCR-GLOBWB and the length of available simulations, to expand the current dataset of streamflow observations; (3) calibrating PCR-GLOBWB and (4) cluster analysing stations where the hybrid framework is not capable of improving performance above the mean benchmark, to understand any potentially shared reason of poor generalization.
Overall, the hybrid framework presented here works consistently across scales and climates, holding promises to bridge the gap between process-based and data-driven methods in streamflow prediction, by enhancing the information contained in both approaches. However, it also highlights the deficiencies of the two, such as ignorance about physical processes, poor representation of human impacts on catchments and bias on available data, further implying the urgency of exploring bolder hydrological theories to properly capture runoff formation at multiple scales.
ACKNOWLEDGEMENTS
The results presented in this work were made possible by the CPU cluster velocity at Utrecht University. The authors would like to sincerely thank ICT developer Dr. Oliver Schmitz for the patience and understanding while we were running intensive computations. We also thank the anonymous reviewers that took the time to read our work and helped improve the overall quality of the manuscript.
DATA AVAILABILITY STATEMENT
Please refer to the online version of this article for figures in colour. Uncompressed figures can be found at https://doi.org/10.5281/zenodo.8220029. Additional relevant data to the manuscript are included in the Supplementary Materials. Data and source code of the post-processing framework developed here are available at https://github.com/m7351gn/PCR-GLOBWB-RF. Model code of PCR-GLOBWB can be found at https://github.com/UU-Hydro/PCR-GLOBWB_model. Model inputs and outputs of the hybrid framework can be found at https://doi.org/10.5281/zenodo.7890583 and https://doi.org/10.5281/zenodo.7891352, respectively. All hydrographs produced during validation can be found as a supplement at https://doi.org/10.5281/zenodo.7893903.
AUTHORSHIP STATEMENT
D.K., E.H.S. and M.M. conceptualized and designed the study. E.H.S. ran PCR-GLOBWB simulations and provided process-based outputs. M.M. and E.H.S. contributed to data acquisition. M.M. and Y.S. developed the modelling framework. M.M. drafted the manuscript, which was edited and approved with input by all authors.
CONFLICT OF INTEREST
The authors declare there is no conflict.