Abstract
The recently presented Global BROOK90 automatic modeling framework combines a non-calibrated lumped hydrological model with ERA5 reanalysis data as the main driver, as well as with global elevation, land cover and soil datasets. Its focus is to simulate the water fluxes within the soil–water–plant system of a single plot or of a small catchment, especially in data-scarce regions. A comparison with observed runoff is an obvious choice for validating this approach. Thus, we chose 190 small catchments (with a median size of 64 km2) located all over the globe, with discharge observations available within the period 1979–2020, for validation. They represent a wide range of relief, land cover and soil types within all climate zones. The simulation performance was analyzed with standard skill-score criteria: Nash–Sutcliffe Efficiency, Kling–Gupta Efficiency, Kling–Gupta Efficiency Skill Score and Mean Absolute Error. Overall, the framework performed well (better than mean flow prediction) in more than 75% of the cases (KGESS > 0) and significantly better on a monthly than on a daily scale. Furthermore, Global BROOK90 was found to outperform the GloFAS-ERA5 discharge reanalysis. Additionally, cluster analysis revealed that some of the catchment characteristics have a significant influence on the framework performance.
HIGHLIGHTS
The study evaluates the runoff component performance of the Global BROOK90 automatic framework for hydrological modeling.
Discharge observations from 190 small catchments located all over the globe were used.
Satisfactory results for more than 75% of the catchments were achieved.
KGE decomposition and influence of catchment characteristics on the framework performance were discussed.
INTRODUCTION
Despite constant advances in the measurement of water balance components, a comparison of observed and modeled discharge remains the most widespread method to evaluate the performance of a hydrological model (Beck et al. 2017). The main reason is that discharge can be treated as an integrated indicator for the whole catchment and its records normally have a much higher global spatio-temporal coverage (Beven 2012), while estimation techniques for the other water balance components have apparent limitations for catchment-scale application. For example, the commonly used eddy-covariance measurements of water fluxes (FLUXNET) (Pastorello 2020) for evapotranspiration are normally considered representative only for a relatively small radius in the vicinity of the tower (Chen et al. 2009; U.S. Department of Energy, Office of Science 2017). The Soil Moisture Active Passive satellite product (SMAP) (Das et al. 2018) is limited to near-surface estimates due to the very shallow penetration of the satellite sensor's signal (Koster et al. 2018; Das et al. 2019). The above-mentioned problems and constraints also hold true for global modeling applications (e.g. Sood & Smakhtin 2015; Siderius et al. 2018a; Harrigan et al. 2020a).
Generally, there are at least two common approaches to prepare and test the applicability of a hydrological model or framework on a global scale. The first one is the relatively simple way of multiple case studies with catchment-wise calibration and validation (Zhang et al. 2016; Siderius et al. 2018b). This, however, only proves that the process representation in the model is suitable for various geographic locations and hydrological regimes. The second strategy is much more sophisticated and refers to the evaluation of models which have a global-scale parameterization from the outset (Samaniego et al. 2010; Beck et al. 2016; Mizukami et al. 2017; Arheimer et al. 2020). This means that these models first undergo a global multi-catchment calibration. Then, various assimilation techniques are applied to match the catchment characteristics used in calibration with global datasets (e.g. elevation, land cover and soil type). The outcome of this approach is a global-coverage parameter set (lumped or distributed) for the respective model.
Most recent studies in the field of global modeling tackle the problems of obtaining reliable spatially distributed global parameter estimates (Qi et al. 2020; Samaniego et al. 2020) and of moving to higher modeling resolutions on the global scale (<1 km) (Wood et al. 2011; Bierkens et al. 2015). In contrast, there is a simpler alternative, which does not require large computational power to implement a global model setup. The model parameter sets can be derived by linking global datasets (i.e. land cover and soil) with expert knowledge or available studies (often location specific) on the physical properties of the vegetation, surface and soils (Gosling & Arnell 2011; Zhang et al. 2016). However, this approach is associated with more uncertainties due to subjectivity in parameter assignment and the inevitable generalization of catchment characteristics by global datasets (e.g. the unification of tree species with different parameters under a common ‘deciduous broadleaf forest’ class).
This simplified approach also applies to the subject of this study, the Global BROOK90 (GB90) framework, which uses a non-calibrated physically based lumped model with a global parameter set. It has been pointed out that stand-alone (not global) hydrological models (e.g. SWAT and HBV) normally possess a large number of parameters which need to be calibrated or estimated (Sood & Smakhtin 2015). Thus, due to information constraints (difficulties in finding reliable sources with high spatial resolution for model parameterization) and technical issues (considerable computational capabilities are required), the application of such models is normally limited to a basin or even finer resolutions and is not feasible on a global scale. This is exactly where GB90 finds its niche. By trading complexity of the model setup for a high target resolution, and by deploying non-calibrated parameter sets derived from highly resolved global datasets, GB90 makes it possible to work at the scale of a point, a plot or even a small catchment, that is, at the level of the hydro response unit (HRU). Thus, the framework serves as a global framework for local-scale application.
The main objective of the study is to validate the ‘Global BROOK90’ framework by a comparison of modeled runoff with observed streamflow using data from small catchments all over the globe within various geographical conditions (i.e. climate, topography, land cover and soils).
The research questions of the paper are as follows:
Can the non-calibrated globally parameterized lumped hydrological model (BROOK90) simulate reasonable runoff on a small (up to a few hundred square kilometers) catchment scale?
Is the performance of the discharge estimated by GB90 related to certain catchment characteristics or is it rather independent of them?
DATA AND METHODS
Short description of the ‘Global BROOK90’ framework
The GB90 framework (Vorobevskii et al. 2020) incorporates global open-source datasets for parameterization and forcing of the BROOK90 lumped physical model (Federer 2002a) via an R-package. Its scope is to provide a detailed representation of vertical water fluxes on a daily scale, in an automatic mode, for nearly any catchment or site on the globe, given only the desired time period of the simulation and the location of the respective catchment or site. GB90 requires meteorological input data (precipitation on an hourly basis; minimum and maximum air temperature, wind speed, surface solar net radiation and vapor pressure deficit on a daily scale) as well as local (vegetation and soil characteristics) and default model parameters. The framework automatically downloads the necessary datasets for elevation (Amazon Web Service Terrain Tiles (Mapzen Data Products 2020)), land cover (Land Cover 100 m (Buchhorn et al. 2019)), soil characteristics (SoilGrids250 (Hengl et al. 2017)) and meteorological forcing (ERA5 reanalysis (Copernicus Climate Change Service Information 2018)) for the considered site. Afterwards, these datasets are processed as follows: the catchment is divided on a 50 m × 50 m regular grid into hydro response units (HRUs or hydrotops), which are unique combinations of land cover (23 classes in total, including 18 vegetation classes) and soil characteristics (11 classes of soil texture, soil depth to the bedrock and fraction of coarse fragments). Then, the BROOK90 model is applied to each HRU, and a catchment-weighted mean is calculated. Finally, the output data of all desired variables on a daily scale as well as time-series plots are stored. A more detailed description of the framework is presented in Vorobevskii et al. (2020).
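The aggregation step described above can be illustrated with a minimal Python sketch. It is not the GB90 R implementation: it assumes that every grid cell has the same area, so that HRU weights are simply cell counts, and `run_model` and `toy_model` are hypothetical stand-ins for a BROOK90 run.

```python
from collections import Counter


def catchment_mean_runoff(cells, run_model):
    """Area-weighted catchment mean from per-HRU simulations.

    cells: iterable of (land_cover, soil_class) tuples, one per equal-area grid cell.
    run_model: hypothetical callable mapping an HRU key to a list of daily runoff (mm).
    """
    counts = Counter(cells)          # number of cells per unique land cover/soil combination
    total = sum(counts.values())
    series = None
    for hru, n in counts.items():
        weight = n / total           # share of the catchment area covered by this HRU
        sim = run_model(hru)         # one model run per unique HRU
        if series is None:
            series = [0.0] * len(sim)
        for t, q in enumerate(sim):
            series[t] += weight * q
    return series


def toy_model(hru):
    # Hypothetical stand-in for BROOK90: two days of runoff depending on land cover only.
    land_cover, _soil = hru
    base = 2.0 if land_cover == "forest" else 4.0
    return [base, base * 0.5]


cells = [("forest", "loam")] * 3 + [("grass", "sand")]
print(catchment_mean_runoff(cells, toy_model))
```

In this toy setup, three quarters of the cells are forest and one quarter is grass, so the catchment series is the 0.75/0.25 weighted mean of the two HRU series.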
Selection of catchments for validation
A comparison of simulated and observed discharge was performed for the validation of GB90. Thus, the BROOK90 output variable of interest was the total runoff (direct surface flow and subsurface flow from the soil column). The primary source of validation data was the discharge database provided by the Global Runoff Data Centre (Federal Institute of Hydrology Koblenz Germany 2020). Since it does not include data for many small catchments (<200 km2), several national databases from Germany (Bayerisches Landesamt für Umwelt Deutschland (Bavarian State Office of the Environment Germany) 2020; Ministerium für Energiewende Landwirtschaft Umwelt Natur und Digitalisierung Schleswig-Holstein Deutschland (Schleswig-Holstein Ministry for Energy Transition Agriculture Environment Nature and Digitalization Germany) 2020; Ministerium für Umwelt Landwirtschaft Natur- und Verbraucherschutz des Landes Nordrhein-Westfalen Deutschland (Ministry for the Environment Agriculture Nature and Consumer Protection of the State of North Rhine-Westphalia Germany) 2020; Sächsisches Landesamt für Umwelt Landwirtschaft und Geologie Deutschland (Saxon State Office for Environment Agriculture and Geology Germany) 2020), France (Ministère de l'Ecologie du Développement Durable et de l'Energie France (Ministry of Ecology Sustainable Development and Energy France) 2020), UK (UK Centre for Ecology & Hydrology 2020), Portugal (Departamento de Monitorização de Recursos Hídricos Portugal (Water Resources Monitoring Department Portugal) 2020), Russia (Федеральное агентство водных ресурсов России (Federal Agency of Water Resources of Russia) 2020), Chile (Ministerio de Obras Publicas Gobierno de Chile (Ministry of Public Works Government of Chile) 2020), USA (US Geological Survey 2020), Canada (Government of Canada 2020) and Australia (Australian Government Bureau of Meteorology 2020) were additionally considered.
A short overview of the catchments is provided in Table 1, and a more extensive one is presented in Supplementary Material 1.
Table 1. Continental overview of the catchments
Continent^a | Country^a | Database^b | Number of catchments | Median area (km2) |
---|---|---|---|---|
Europe | UK, Portugal, Germany, France, Finland, Italy, Norway, Russia, Iceland, Slovenia, Czech Republic, Poland, Ukraine | NRFA, iDA LfULG, AIS GMVO, SNIRH, HYDRO Fr, LuUSH, ELWAS, GDB, GRDC | 88 | 55 |
Asia | Russia, Oman | AIS GMVO, GRDC | 13 | 37 |
North America | US, Canada, Mexico, Puerto Rico, Costa Rica | USGS, HHDC, GRDC | 51 | 67 |
South America | Brazil, Chile, Trinidad and Tobago | DGA Ch, GRDC | 7 | 154 |
Africa | Zambia, Senegal, South Africa | GRDC | 5 | 43 |
Australia and Oceania | Australia, New Zealand, Indonesia | AHRS, GRDC | 26 | 66 |
^a Islands (e.g. Hawaii) and associated countries (e.g. Iceland) are assigned to the nearest continent.
^b Abbreviations according to the Supplementary Material 1 footnote.
The selection of appropriate catchments used for validation was based on the following criteria:
Due to the general scope of the model itself, the catchment should have a relatively small drainage area. The common threshold of 200 km2 for the definition of a small catchment was chosen.
The flow regime has to be close to natural. This was implicitly verified by a manual check for significant human influence based on satellite images. Only general features such as dams, reservoirs, felling and predominant urbanization were checked, and catchments in which they were present were eliminated.
At least 5 years of daily discharge records should be available within the time period 1979–2020 (current extent of ERA5 reanalysis) to account for the statistical significance of the results.
The selected catchments should be spatially distributed on the globe in a way to represent a wide range of land covers and soil texture classes within all climate zones to draw conclusions on a global scale.
Catchment borders were delineated automatically in QGIS (QGIS Development Team 2020) using SRTM30 (∼30 m resolution) (NASA JPL 2019) or ArcticDEM (2 or 10 m resolution) (Porter et al. 2018) as the elevation source, depending on the latitude and the data availability. Afterwards, the results were manually quality-checked and, if necessary, corrected using available satellite images and OpenStreetMap (OpenStreetMap Contributors 2019). The maximum allowable difference between the delineated and the recorded catchment areas (the latter being the values provided along with the runoff databases) was set to 10%.
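The numeric part of these criteria can be sketched as a simple filter. The following Python snippet is only an illustration: the `Candidate` record and its field names are hypothetical, the naturalness flag stands for the outcome of the manual satellite-image check, and the 10% area tolerance is assumed to be relative to the recorded area.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    recorded_area_km2: float     # area reported by the discharge database
    delineated_area_km2: float   # area from the DEM-based delineation
    record_years: float          # daily discharge record within 1979-2020
    near_natural: bool           # outcome of the manual satellite-image check


def passes_selection(c, max_area=200.0, min_years=5.0, max_area_diff=0.10):
    """Apply the numeric selection criteria to one candidate catchment."""
    # Relative mismatch between delineated and recorded area (assumed reference: recorded).
    area_diff = abs(c.delineated_area_km2 - c.recorded_area_km2) / c.recorded_area_km2
    return (
        c.recorded_area_km2 <= max_area
        and c.near_natural
        and c.record_years >= min_years
        and area_diff <= max_area_diff
    )


candidates = [
    Candidate("A", 64.0, 66.0, 12.0, True),    # passes all checks
    Candidate("B", 64.0, 80.0, 12.0, True),    # area mismatch of 25%
    Candidate("C", 250.0, 251.0, 30.0, True),  # too large
    Candidate("D", 40.0, 40.5, 3.0, True),     # record too short
]
selected = [c.name for c in candidates if passes_selection(c)]
print(selected)
```

The criterion on global representativeness (land cover, soil and climate coverage) is not captured here, since it concerns the sample as a whole rather than a single catchment.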
Unfortunately, not many of the catchments from the open-source databases met the criteria. The most problematic steps were matching the drainage area (unclear gauge position due to coordinate rounding) and excluding evident human activity. The geographical distribution of the finally chosen 190 catchments is shown in Figure 1. Unsurprisingly, European and North American watersheds predominate due to larger monitoring networks and better overall data availability, accessibility and quality. The poor coverage in Asia and Africa (Table 1; Figure 1) can be explained by three reasons. First, the GRDC database does not contain much data matching the main criterion of catchment size in these regions. Second, attempts to find data from scientific communities or local databases failed because the databases either have limited access or are subject to data policy restrictions. Finally, some data had to be discarded, in most cases due to significant human activity in the catchment or due to reasonable concerns regarding data quality. Thus, one has to keep in mind that our ‘global’ study does have some ‘white’ spots. Most of the finally selected catchments are located in a temperate/mesothermal climate zone (Kottek et al. 2006; Figure 2). The most common classes are a drainage area of less than 50 km2 (the median is 64 km2), low elevation (<500 m) and gentle terrain slopes (<10°). The dominant land cover types are mainly various types of forest. Soils in these catchments predominantly have a loamy texture with a low fraction of coarse fragments (<20%) and a depth to bedrock of 1.5–2.0 m. Most of the catchments have a discharge record of almost 40 years. More detailed metadata on the chosen watersheds is provided in Supplementary Material 1.
Characteristics were acquired from the original discharge databases (river name, data length and location), digital elevation models (area and slope), satellite images and Land Cover 100 m (Buchhorn et al. 2019) (land cover type) and SoilGrids250 (Hengl et al. 2017) (soil characteristics).
Chosen catchments (n = 190; WGS-84 Pseudo-Mercator projection) with Köppen–Geiger world climate zones (Kottek et al. 2006) as background.
Histograms of characteristics of chosen catchments (n = 190). Data used to assign characteristics: catchment size, elevation and slope calculated – Amazon Web Service Terrain Tiles (Mapzen Data Products 2020), climate zones – after Köppen–Geiger (Kottek et al. 2006), land cover – Land Cover 100 m (Buchhorn et al. 2019) and soil characteristics – SoilGrids250 (Hengl et al. 2017). For land cover and soil characteristics, catchment-dominant values are shown. Adjusted total length of observed discharge time series is derived within 1979–2020 time period as the total length minus all missing values.
Model performance metrics
The following criteria were used to evaluate the model performance on daily and monthly scales:
- Mean absolute error (MAE, in mm) describes the average absolute difference between observed and simulated runoff:

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|Q_m^t - Q_o^t\right|$$

where $Q_m^t$ and $Q_o^t$ are the modeled and observed discharge values (in mm) at time t and T is the overall length of the time series. MAE has a range of [0, +∞), where MAE = 0 mm corresponds to a perfect match of modeled discharge to the observed data.
- Nash–Sutcliffe Efficiency (NSE) (Nash & Sutcliffe 1970):

$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(Q_m^t - Q_o^t\right)^2}{\sum_{t=1}^{T}\left(Q_o^t - \overline{Q_o}\right)^2}$$

where $Q_m^t$ and $Q_o^t$ are the modeled and observed discharge values (in mm) at time t and T is the overall length of the time series. NSE has a range of (−∞, 1] with NSE = 0 meaning that the model has the same explanatory power as the observed mean flow $\overline{Q_o}$ and NSE = 1 corresponding to a perfect match of modeled discharge to the observed data.
- Kling–Gupta Efficiency (KGE) (Gupta et al. 2009) can be decomposed into three main components important to assess flow dynamics: correlation, bias and variability errors:

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$$

where r is the Pearson correlation coefficient between the modeled and observed discharge, $\alpha$ is the ratio between the simulated and observed flow variability, and $\beta$ is the ratio between the mean simulated and mean observed flow:

$$\alpha = \frac{\sigma_m}{\sigma_o} = \sqrt{\frac{\sum_{t=1}^{T}\left(Q_m^t - \overline{Q_m}\right)^2}{\sum_{t=1}^{T}\left(Q_o^t - \overline{Q_o}\right)^2}}, \qquad \beta = \frac{\overline{Q_m}}{\overline{Q_o}}$$

KGE has a range of (−∞, 1] with KGE = −0.41 meaning that the model performance is as good as the observed mean flow (Knoben et al. 2019). KGE = 1 corresponds to a perfect match of modeled discharge to the observed data; thus, 1 is also the optimum value for all three KGE components (correlation, bias and variability ratios). Note on the variability ratio: as the observed and modeled time series have the same length, the division by the sample size (from the definition of the standard deviation) cancels out. In some cases, it is more illustrative to express the bias ratio as a percentage (with an optimum value of 0%):

$$\mathrm{PBIAS} = (\beta - 1) \times 100\%$$

- Modified KGE skill score (KGESS), for which the mean observed flow is taken as the benchmark (Knoben et al. 2019):

$$\mathrm{KGESS} = \frac{\mathrm{KGE}_{model} - \mathrm{KGE}_{bench}}{1 - \mathrm{KGE}_{bench}}$$

where $\mathrm{KGE}_{model}$ and $\mathrm{KGE}_{bench}$ are the KGE values for the model and for the chosen benchmark (the observed mean flow $\overline{Q_o}$).
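For readers who want to reproduce the skill scores, the metrics defined above can be written down in a few lines. The following is a minimal pure-Python sketch rather than the code used in the study; the function names are our own, and the benchmark value 1 − √2 (about −0.41) is the mean-flow KGE given by Knoben et al. (2019).

```python
import math


def _mean(x):
    return sum(x) / len(x)


def mae(sim, obs):
    """Mean absolute error, in the units of the input series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency."""
    mo = _mean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge_components(sim, obs):
    """Return (r, variability ratio, bias ratio).

    Raises ZeroDivisionError for a constant series (zero variability).
    """
    ms, mo = _mean(sim), _mean(obs)
    # Root sums of squares: the division by sample size cancels in both ratios.
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    return cov / (ss * so), ss / so, ms / mo


def kge(sim, obs):
    """Kling-Gupta Efficiency (Gupta et al. 2009)."""
    r, alpha, beta = kge_components(sim, obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


KGE_MEAN_FLOW = 1.0 - math.sqrt(2.0)  # KGE of the mean-flow benchmark, about -0.41


def kgess(sim, obs, kge_bench=KGE_MEAN_FLOW):
    """KGE skill score relative to a benchmark KGE."""
    return (kge(sim, obs) - kge_bench) / (1.0 - kge_bench)


def pbias(sim, obs):
    """Bias ratio expressed as a percentage (optimum 0%)."""
    return (_mean(sim) / _mean(obs) - 1.0) * 100.0
```

As an illustration, a simulation that exactly doubles every observed value has r = 1, a variability ratio of 2 and a bias ratio of 2, so that KGE = 1 − √2 and KGESS = 0: such a simulation scores no better than the mean-flow benchmark.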
Finally, the influence of the time scale (daily and monthly) and of individual factors (catchment characteristics) on the model performance was tested with the Kruskal–Wallis test (Kruskal & Wallis 1952). It is a non-parametric rank-based method, which originally tests the null hypothesis that the given samples originate from the same distribution. However, the test is often used as an analogue of the analysis of variance (ANOVA) for median values, provided that an identically shaped distribution can be assumed for all sample groups (McDonald 2014).
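To make the test concrete, the sketch below implements the Kruskal–Wallis H statistic for the two-group case (e.g. daily versus monthly scores) in pure Python, with the standard tie correction and the chi-square approximation for one degree of freedom. It is an illustration under these assumptions, not the routine used in the study, and the function name is our own.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H test for two samples (chi-square approximation, df = 1).

    Returns (H, p). Fails with ZeroDivisionError if all pooled values are identical.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0       # mean of the ranks i+1 .. j shared by tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * (
        rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b)
    ) - 3.0 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)     # tie correction
    h = max(h, 0.0)                        # guard against tiny negative round-off
    p = math.erfc(math.sqrt(h / 2.0))      # chi-square survival function for df = 1
    return h, p


h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
print(round(h, 3), round(p, 4))
```

For more than two groups (for example, classes of a catchment characteristic), the same H statistic applies with one rank sum per group, but the p-value then requires the chi-square distribution with k − 1 degrees of freedom.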
RESULTS AND DISCUSSION
Global discharge performance of the framework on daily and monthly scales
The framework performance based on the NSE, KGE, KGESS and MAE skill scores for daily and monthly scales is presented in Figure 3. Since in all cases the kernel densities of the skill-score values are similar and negatively (left side) skewed, it is more reasonable to compare median values rather than means. All skill scores display similar histogram shapes with a maximum density shifted toward better performance values. The estimated median values of NSE, KGE and KGESS for the daily time scale are −0.06, 0.11 and 0.37, respectively. For the monthly averaged discharge, all skill scores show higher results: 0.06 for NSE, 0.22 for KGE and 0.45 for KGESS. Thus, it can be stated that more than half of the catchments outperform the mean observed flow benchmark. Specifically, on a daily time scale, the exact numbers amount to 48% (NSE > 0) and 75% (KGESS > 0) of the total number of catchments. Slightly higher values were achieved for monthly discharge aggregates (52 and 78% for NSE and KGESS, respectively). Furthermore, NSE values show a much wider Inter-Quartile Range (IQR) than the other skill scores; on the monthly scale in particular, the IQR extends from −1.5 to 0.4. Overall, with the global parameterization scheme and global meteorological forcing, the GB90 framework did not show a very high performance for the considered catchments. However, this does not diminish the benefits of the framework as an easy-to-use and universal modeling tool. For instance, the most recent innovation in global discharge reanalysis datasets, GloFAS-ERA5, showed a median KGESS of 0.21 in the class of the smallest catchment areas (500–2,500 km2) and an MAE of 0.41 mm day−1 (Harrigan et al. 2020b), whereas GB90 achieved values of 0.37 and 1.25 mm day−1, respectively. Moreover, the IQR of KGESS for GloFAS-ERA5 (−0.28 to 0.50) is wider than for GB90 (0.04 to 0.55), especially for values below the median. It should be noted, however, that although the evaluation study of the GloFAS-ERA5 dataset is probably the most relevant one to compare with (regarding coverage, time-series length, resolution and catchment size), its spatial scale (0.1° or ∼11 km) is still far from the one GB90 uses (50 m).
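As a quick consistency check on the medians quoted above: KGESS is a linear, increasing rescaling of KGE, so the median KGE maps directly onto the median KGESS. Taking the mean-flow benchmark $\mathrm{KGE}_{bench} = 1 - \sqrt{2} \approx -0.41$ (Knoben et al. 2019) and assuming that the reported medians are rounded to two decimals:

$$\mathrm{KGESS}_{daily} \approx \frac{0.11 - (-0.41)}{1 - (-0.41)} = \frac{0.52}{1.41} \approx 0.37, \qquad \mathrm{KGESS}_{monthly} \approx \frac{0.22 - (-0.41)}{1 - (-0.41)} = \frac{0.63}{1.41} \approx 0.45$$

Both values agree with the reported median KGESS of 0.37 (daily) and 0.45 (monthly).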
Summary of the GB90 framework performance for all 190 catchments: NSE, KGE, modified KGESS and MAE for daily and monthly scales. Filled contours show rotated kernel density, horizontal lines and numbers – median values, vertical red/blue boxes – IQR, vertical red/blue lines – 1.5 IQR. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/nh.2021.150.
Finally, Figure 3 shows that the GB90 framework performs better on a monthly than on a daily scale, which is additionally confirmed by the results of the Kruskal–Wallis tests. These tests indicated that the median values of KGE (and of Pearson's correlation as one of its components), KGESS and MAE differ significantly (p-value < 0.05) between scales, unlike the NSE values and the variability and bias ratios.
The spatial distribution of the KGESS for the tested catchments is presented in Figure 4. The global median and mean KGESS are 0.37 and 0.12, respectively. The performance is better in Europe (0.82 for Rio Beca in Portugal), eastern parts of Australia (0.73 for Alligator Creek), Russia (0.79 for Mikhansky Creek) and western regions of the U.S. (0.84 for Little Creek). The lowest KGESS values occurred in many catchments in the Middle East (lowest value of −4.99 for Wadi Khabb Shamsi), central parts of the U.S. and Canada, Eastern Europe and Africa. In general, no clear geographical patterns were detected, except for a better performance in coastal regions (U.S., Canada and Australia).
Modified KGESS on daily scale. KGESS = 0 refers to performance on a benchmark level of a mean flow.
One evident advantage of the KGE skill score is that it can be split into three components. This decomposition gives more insight into the performance of GB90 from a statistical perspective. All catchments showed a positive correlation (Figure 5(a)) with a global median and mean Pearson correlation coefficient of 0.51 (IQR = [0.38, 0.62]). The best results were observed in Europe, where, however, only one of the catchments showed a correlation >0.9. Figure 5(c) shows that the runoff simulations generally have a positive bias: 66% of the catchments have a bias ratio >1, with a global median bias ratio of 1.36. Expressed as a percent bias (PBIAS), this error amounts to +36% with an IQR of −10% to +100%. The highest positive bias ratios were found for the catchments in central parts of North America, Africa, Eastern Europe and Eastern Asia. In a recent study of global river discharge reconstruction using a high-resolution model for over 14,000 gauges (Lin et al. 2019), the authors consider a PBIAS within ±20% to describe a very good model performance, which was observed for 35% of the tested sites. The results of the presented analysis are consistent with the aforementioned study, as only 22% of the catchments fulfill this ‘good performance’ benchmark. Figure 5(b) depicts a lower variability in the GB90 results than in the observations for 57% of the catchments (variability ratio <1). However, the deviations of the variability ratio from the optimum value are less prominent than for the bias, with a global median value of 0.91 and an IQR of −40% to +30% (Figure 5(b)). The highest overestimations of variability were found in the U.S. (arid regions), Canada, Africa and Australia. The most noticeable underrepresentation of variability was found in northern parts of Europe, Canada and catchments on volcanic islands (Hawaii and New Zealand). Overall, the spatial patterns of large positive bias and variability ratios are consistent with the catchments that showed low KGESS values (Figures 4 and 5).
Decomposition of KGE for daily (left) and monthly (right) scales: (a) Pearson correlation r, (b) variability ratio and (c) bias ratio. A value of 1 is the optimum for all components. The bias ratio remains the same for daily and monthly scales.
The comparison of the daily and monthly KGESS decomposition reveals that the significantly better performance of GB90 on a monthly scale (by 22%, based on a comparison of the median values) is driven by a noticeable rise in the correlation coefficient, whereas the variability ratio remains at almost the same level.
Furthermore, the average magnitude of absolute errors in the discharge estimates was analyzed by calculating the MAE metric (Figure 6). The GB90 median MAE of 1.25 mm day−1 (IQR = [0.91, 1.91] mm day−1) is twice as large as, for example, that of the GloFAS-ERA5 discharge reanalysis (Harrigan et al. 2020b), which is driven by the same ERA5 reanalysis data. However, in that study, the authors did not consider catchments smaller than 500 km2. It is particularly important to look at the MAE values for rivers in dry regions, as a small over- or underestimation of discharge can potentially produce large biases (Harrigan et al. 2020a). By dry regions, we mean here regions with a long dry season with no or very low channel flow. Indeed, the results indicate that the catchments with the largest PBIAS (>100%, Figure 5(c)) have in fact the lowest (<0.5 mm day−1) absolute magnitude of errors, given their dry locations. This is the case, on the one hand, in the central U.S., Oman and southern Australia with an arid or semi-arid climate and, on the other hand, in northern Canada and central Russia with a strongly pronounced continental climate; these regions are characterized by a dry or frozen channel, respectively, for most of the year. Further, the highest MAE was found in catchments in southern Alaska, U.S. (Wolverine Creek, 17.3 mm day−1), on the North Island of New Zealand (Manganui River, 8.5 mm day−1) and on the Island of Hawaii, U.S. (Honolii Creek, 8.3 mm day−1). The poor results for these catchments might be explained by the specifics of their geographical conditions. For example, glaciers are the dominant land cover type in the first catchment. Permanent ice in GB90 is currently simplified to impervious areas and thus does not account for glacier mass balance processes. Furthermore, the volcanic landscapes of the other two catchments have their own soil specifics with regard to hydrologic response and are not covered by the standard soil types available in GB90.
In general, most of the catchments with a high MAE can be aggregated into the following clusters: southern Alaska, Hawaii, Indonesia, Iceland and New Zealand. One of the reasons may be that they coincide with areas of high annual precipitation (over 2,000–3,000 mm year−1). As precipitation usually has a direct positive correlation with discharge, a relatively large total annual runoff sum forms, and thus the estimated MAE values (even a few dozen mm per year) in fact represent low relative annual errors.
Nevertheless, high or low values of the performance characteristics do not necessarily imply absolute truth in terms of how realistically the model represents the hydrological response. Metrics like NSE or KGE usually help to answer questions like ‘how much’ or ‘where’ but cannot provide all the information needed to understand the model performance. In particular, the question ‘why’ is not sufficiently answered. Therefore, a bad metric value does not always imply a bad model performance and vice versa. Thus, one should not forget to look at the raw data ‘by hand’ and try to find a physical, rather than a purely statistical, explanation of the differences between the modeled and observed variable. A manual visual analysis of observed and modeled hydrographs (Supplementary Material 2) may reveal some controversial cases. For example, Figure 7 shows 10-year modeled and observed discharge time series for two rivers with completely different hydrological regimes. On the one hand, for Razat Creek in Oman the GB90 modeled discharge matches the only big flood relatively well (in both magnitude and timing), while for most of the remaining time the river is in near-dry conditions (arid climate zone), which artificially increases the performance (many zero values). On the other hand, the low values of NSE and KGE for the Edoma River in Russia can be explained by a 2–3 week shift of the spring floods and the absence of flow in the winter months because of channel and/or soil freezing. Probable reasons for that include issues of ERA5 itself (resolution, bias) as a meteorological input for the catchment area and the misrepresentation of snowmelt and soil frost processes (which are well-known issues of BROOK90 (Federer 2002b)).
Ten years of daily GB90 performance for (a) Razat (Oman) and (b) Edoma (Russia).
Influence of the catchment's characteristics on the framework performance
First, the individual influence of all numeric catchment characteristics (namely area; mean elevation and slope; number of HRUs; observation data length; percentage of land cover, soil texture, stone fracture and depth to bedrock classes dominance) on the GB90 framework performance was tested with Pearson's correlation. Only the mean slope showed a significant (p-value <0.05) correlation with both NSE and KGE; for KGE, the coefficients were 0.16 and 0.26 for the daily and monthly scales, respectively.
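To illustrate why coefficients as small as 0.16 can still be significant, the following minimal sketch converts a Pearson coefficient into a t-statistic. It assumes that all 190 catchments entered the test (the text does not state the sample size explicitly) and uses an approximate two-sided 5% critical value; the function names are illustrative and not part of the GB90 code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic of a correlation coefficient r from n samples (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumption: n = 190 catchments; 1.973 approximates the two-sided 5%
# critical value of Student's t for 188 degrees of freedom.
N_CATCHMENTS = 190
T_CRIT = 1.973

t_daily = t_statistic(0.16, N_CATCHMENTS)    # reported daily KGE vs. slope
t_monthly = t_statistic(0.26, N_CATCHMENTS)  # reported monthly KGE vs. slope

print(f"daily:   t = {t_daily:.2f}, significant = {t_daily > T_CRIT}")
print(f"monthly: t = {t_monthly:.2f}, significant = {t_monthly > T_CRIT}")
```

Under these assumptions both coefficients exceed the critical value, although the explained variance (r squared) stays below 7% even on the monthly scale, so the relationship is statistically detectable but weak.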
Figure 8 shows the KGESS performance of GB90 for various classes of the catchment characteristics. Clusters for each specific characteristic were derived based on the original formal classification (for qualitative variables) or quantile breaks (for numerical variables). By means of violin (rotated density) plots and median values for the daily and monthly scales, one can derive a ‘virtual’ catchment for which GB90 shows the highest performance. This catchment is between 50 and 100 km2 in size, with a mean elevation of 500–1,000 m and a slope of 10–20°. It can be quite heterogeneous according to the number of HRUs (>400) and has a discharge record of more than 35 years available for evaluation. The catchment is located in the A (tropical/megathermal) or C (temperate/mesothermal) climate zones, and its land cover is dominated mostly by shrubs, evergreen (broad- and needle-leaf) or deciduous (broadleaf) forests. The dominant land cover type occupies more than 75% of the catchment area. The soil texture is mostly sandy loam. Finally, the soil column depth usually amounts to 100–150 cm with a stone fracture of 20–40%.
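The clustering of a numeric characteristic by quantile breaks can be sketched as follows. This is an illustration only: the slope and KGESS values are invented, the number of classes and the helper names are assumptions, and the exact breaks used for Figure 8 are not reproduced here. Only the minimum cluster size of 5 is taken from the figure caption.

```python
import statistics
from bisect import bisect_right


def quantile_classes(values, n_classes=4):
    """Assign each value to a class index using quantile breaks."""
    breaks = statistics.quantiles(values, n=n_classes)  # n_classes - 1 cut points
    classes = [bisect_right(breaks, v) for v in values]
    return classes, breaks


def median_by_class(classes, scores, min_size=5):
    """Median score per class; classes with fewer than min_size members are omitted."""
    groups = {}
    for cls, score in zip(classes, scores):
        groups.setdefault(cls, []).append(score)
    return {
        cls: statistics.median(members)
        for cls, members in sorted(groups.items())
        if len(members) >= min_size
    }


# Hypothetical data: mean slope (degrees) and KGESS for 20 catchments.
slopes = list(range(1, 21))
kgess_values = [0.05 * i - 0.2 for i in range(20)]

slope_classes, slope_breaks = quantile_classes(slopes)
kgess_medians = median_by_class(slope_classes, kgess_values)

print("breaks:", slope_breaks)
for cls, med in kgess_medians.items():
    print(f"class {cls}: median KGESS = {med:.2f}")
```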
Clustering of the model performance (KGESS) for daily and monthly scales according to the catchment characteristics: drainage area, mean elevation and slope, number of hydrological response units (HRUs or hydrotopes), length of the discharge record, climate zone, dominant land cover type and percentage of its dominance, soil texture type, soil depth to the bedrock and soil stone fracture. Lines inside the plots refer to median values. The number above each violin plot refers to the number of catchments inside the cluster. Clusters for numeric variables were assigned approximately via ‘quantile’ breaks. Clusters with a sample size of less than 5 were omitted. For illustration purposes, the Y-axis is cut off at −1.
A considerably lower performance was found for the watersheds with a small mean slope (<5°) and a cultivated/managed land cover type, which implicitly shows the limitations of BROOK90 and of the framework itself. The absence of flow routing in the model could play a bigger role for the flatter catchments, because these typically possess a dense channel network with a highly variable and often delayed response time. Although one case study (Yu et al. 2013) showed that slope does not have a significant influence on the accuracy of the flow generation in BROOK90, this aspect has not yet been studied on a global or even multi-site scale. Further, the global generalization of vegetation parameters undoubtedly leads to the same parameter set being applied to completely different plant species grouped under the class of ‘cultivated’ land cover, for example, vineyards and rice. Moreover, general watershed management (irrigation and dam regulation) can add more uncertainties since it is not considered by the framework setup. Therefore, in its current state, GB90 is limited to natural catchments. The lack of sensitivity of the model to soil texture classes (which is consistent with other studies (e.g. Tafasca et al. 2020)) can be explained by the coarser resolution and general quality of the map (lower class predictability accuracy according to validation (Hengl et al. 2017) in comparison to the land cover map), the generalized parameterization of hydraulic characteristics and an overall small sample size.
According to the results of the Kruskal–Wallis test, many catchment characteristics (Figure 8) have at least one class that shows a significant (p-value <0.05) difference in the median KGESS values for both daily and monthly scales. These are: mean elevation and slope, number of HRUs, land cover type, soil depth to the bedrock and stone fracture. This means that, for these characteristics, the classification actually plays a role in the clustering of the framework performance.
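As a rough illustration of the test logic, the sketch below computes the Kruskal–Wallis H statistic for three hypothetical slope classes and compares it with the 5% chi-square critical value. The KGESS values are invented for illustration, no tie correction is applied, and the chi-square approximation presumes at least about 5 members per class; in practice a library routine (e.g. `scipy.stats.kruskal`) would normally be used instead.

```python
from itertools import chain

# Upper 5% critical values of the chi-square distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (simplification: no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical KGESS values per slope class (not data from the study).
kgess_by_slope = {
    "<5 deg": [-0.35, -0.10, 0.02, 0.08, 0.15],
    "5-10 deg": [0.10, 0.22, 0.30, 0.35, 0.41],
    "10-20 deg": [0.38, 0.45, 0.52, 0.60, 0.66],
}
groups = list(kgess_by_slope.values())
h_stat = kruskal_wallis_h(groups)
dof = len(groups) - 1
significant = h_stat > CHI2_CRIT_05[dof]

print(f"H = {h_stat:.2f}, d.o.f. = {dof}, significant at 5% = {significant}")
```

A significant H only indicates that at least one class differs from the others; it does not identify which one, so a pairwise post hoc comparison would be needed for that.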
CONCLUSIONS AND OUTLOOK
GB90 is an automated, non-calibrated modeling framework, which couples the physically based lumped BROOK90 hydrological model with a global land-soil parameterization scheme and is driven by ERA5 meteorological input data for different time periods and catchments. In total, 190 small catchments located all over the world within various geographical conditions, climate zones and catchment characteristics were selected within the 1979–2020 time period to validate the approach. The modeled total runoff was compared with observed discharge via the metrics MAE, NSE, KGE and KGESS for daily and monthly results.
The non-calibrated simulations performed well in more than 75% of the cases (i.e. KGESS > 0). Catchments in the mild climate zones of central Europe, the U.S. and Canada showed the best results, while the worst performance was found in dry regions of Africa, the central U.S. and Canada, Australia and eastern Russia. The decomposition of the KGE revealed that in most cases the performance is dominated by the correlation component and a large bias (which is positive for the majority of the catchments). Additionally, a significantly (according to the Kruskal–Wallis test) higher agreement with the observations was found on a monthly scale. Moreover, a comparison (based on KGESS) of GB90 and the GloFAS-ERA5 discharge reanalysis (which are both driven by the same meteorological ERA5 input) revealed that GB90 showed better median and IQR results for the small catchments.
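The KGE decomposition and the KGESS threshold mentioned above can be sketched as follows. This sketch assumes the original KGE formulation (correlation r, variability ratio alpha, bias ratio beta) and the mean-flow benchmark KGE of 1 − √2; the exact variant used in this study may differ, and the discharge values are invented for illustration.

```python
import math


def kge_components(sim, obs):
    """Return (r, alpha, beta): correlation, variability ratio, bias ratio.

    Not defined for constant series or a zero observed mean (division by zero).
    """
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    return cov / (sd_s * sd_o), sd_s / sd_o, mean_s / mean_o


def kge(sim, obs):
    """Kling-Gupta Efficiency (assumed original formulation)."""
    r, alpha, beta = kge_components(sim, obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# KGE of using the observed mean flow as the prediction (r = 0, alpha = 0, beta = 1).
KGE_MEAN_FLOW = 1.0 - math.sqrt(2.0)


def kgess(kge_value, kge_benchmark=KGE_MEAN_FLOW):
    """Skill score relative to a benchmark; positive values beat the benchmark."""
    return (kge_value - kge_benchmark) / (1.0 - kge_benchmark)


# Hypothetical observed and simulated discharge (illustration only).
obs = [1.0, 3.0, 2.0, 5.0, 4.0]
sim = [1.5, 3.5, 2.0, 6.5, 4.5]

r, alpha, beta = kge_components(sim, obs)
score = kge(sim, obs)
print(f"r = {r:.2f}, alpha = {alpha:.2f}, beta = {beta:.2f}")
print(f"KGE = {score:.2f}, KGESS = {kgess(score):.2f}")
```

In this toy example beta is larger than 1, i.e. a positive bias of the kind reported for the majority of the catchments. A simulation that simply doubles the observations has r = 1 but alpha = beta = 2, which yields exactly the mean-flow benchmark value and hence a KGESS of 0.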
An attempt to cluster the framework performance by the classification of the catchment characteristics revealed a significant influence of the mean elevation and slope, the number of HRUs, the land cover type, the soil depth to the bedrock and stone fracture on the median KGESS values. Namely, it was found that heterogeneous flat catchments with dominantly managed/cultivated land cover are the most difficult for the GB90 framework to model properly, while the framework demonstrated much better performance for elevated and steep forest sites. The application limitations for agricultural management could, however, be resolved by the incorporation of much more detailed information about the crops. Unfortunately, the current experimental setup does not allow a more detailed uncertainty analysis that splits the framework's uncertainty into a meteorological and a parameterization part. To gain that information, one needs to restructure the framework flow and force it with multiple parameter sets (accounting both for possible misclassification in the initial land cover and soil type datasets and for wrong parameterization of each class) and various meteorological datasets (including real observations and other reanalyses). Afterwards, by analyzing the model runs using different cross-combinations, one might estimate and distinguish between the two uncertainty types by, for example, cluster analysis.
Overall, the GB90 framework showed good performance considering its scope as an automatic, non-calibrated water balance modeling framework on a small catchment scale. Thus, the framework proved its ability to produce quick and rough, yet reliable reanalysis water balance estimations for ungauged catchments all over the globe. The results support its application where detailed information is absent; where such information is available, more sophisticated methods are beneficial (e.g. global parameter estimations based on the calibration outputs). In addition, by the incorporation of open access short- and long-term weather forecasts (e.g. 16-day Global Forecast System (GFS) by NOAA with 0.5° resolution (National Centers for Environmental Information, US 2020) or 7-day Icosahedral Nonhydrostatic (ICON) model by DWD with 13 km resolution (Deutscher Wetterdienst (German Weather Service) 2020)), GB90 might be extended to an operational and forecasting framework. As a further outlook, the authors see, based on these promising results, a big potential for further improvements and research in this field, namely by:
validation of the evapotranspiration fluxes,
comparison of the soil moisture component with other global datasets (modeled, estimated with remote sensors, data assimilation),
global sensitivity analysis and optimization of the global parameter set,
in-depth analysis of framework uncertainty and
incorporation of various meteorological input datasets (reanalysis/forecasts).
AUTHOR CONTRIBUTIONS
Conceptualization I. V. and R. K.; methodology I. V. and R. K.; software I. V.; results I. V.; writing: original draft preparation I. V., review C. B. and R. K., editing I. V.; visualization I. V.; supervision R. K.
FUNDING
Open Access Funding by the Publication Fund of the TU Dresden. This research was also funded by the German Federal Ministry of Education and Research (FKZ 01LR 2005A – funding measure ‘Regional Information on Climate Action’ (RegIKlim), section (a) Model Regions).
ACKNOWLEDGEMENTS
The authors thank their colleagues Judith Pöschmann and Philipp Körner for their valuable comments on the paper draft. Furthermore, many thanks go to the data providers: global (GRDC) and national (Germany, France, UK, Portugal, Russia, Chile, US, Canada and Australia) open discharge data banks. Additionally, the authors thank BMBF for funding the study within the scope of the ‘KlimaKonform’ project.
CONFLICTS OF INTEREST
The authors declare that they have no conflict of interest.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information. The Global BROOK90 package is available on GitHub (Global_BROOK90), discharge data were collected from global and local open-data platforms, and raw model runs can be provided upon request (∼70 GB).