ABSTRACT
Hydrological research in tropical regions has been limited by a lack of high-quality precipitation and evapotranspiration input data, and high-quality observed runoff calibration data. Hence, most hydrological models have been developed for non-tropical regions and calibrated on non-tropical climatic data. To address these challenges, we assessed the performance of 24 rainfall–runoff models in combination with seven precipitation and three evapotranspiration datasets (504 combinations), in three catchments in the Citarum basin, Indonesia, to identify which models and input datasets were best suited to modelling daily mean flow in a volcanic region and tropical climate, and to assess the parameter sensitivity of the most suitable models. We found that model performance was most strongly affected by choice of input precipitation data and that the most suitable option was different for each of the three study catchments, varying over tens of kilometres. Hydrological model structure played a less significant role, and models developed explicitly for volcanic or tropical regions did not show a clear advantage over those designed for other regions or climates. However, models with two separate, parallel, fast and slow routing pathways had higher predictive skill than those with just one pathway in representing the year-round range of flow behaviours.
HIGHLIGHTS
Large model and gridded climate data intercomparison in a tropical volcanic region.
Catchment runoff simulation performance depends most on precipitation suitability.
The suitability of precipitation datasets can change over tens of kilometres.
Models developed for tropical or volcanic regions do not outperform other models.
Minimum model complexity requires two separate, parallel internal flow paths.
INTRODUCTION
Conceptual or lumped rainfall–runoff modelling is often used in situations where limited computing power and/or knowledge prohibit more detailed modelling of the catchment(s) of interest. These models simplify and conceptualize the main runoff-generating processes and can be used to provide input boundaries to more detailed hydrodynamic models, assess water resources and storage, and estimate the relationships between flood peaks and rainfall events under different land-use or climatic conditions (e.g. Mishra et al. 2017; Rickards et al. 2020; Zhang et al. 2023; Li et al. 2024). However, most hydrological models have been developed to model runoff generation in non-tropical regions and calibrated on non-tropical climatic data. For example, of the 46 rainfall–runoff models (RRMs) described and implemented by Knoben et al. (2019), only three used any tropical data during development: MODHYDROLOG (Chiew 1990; Chiew & McMahon 1994) and SIMHYD (Chiew et al. 2002) used data recorded at a handful of gauging stations in northern Australia, and the SMAR (Soil Moisture Analytical Relationship) model (Kachroo 1992) used data from one catchment in Malaysia. In all three cases, non-tropical catchments were also used, and tropical catchments were outnumbered in the development dataset. Hence, the performance or suitability of RRMs in tropical regions is only very rarely considered during the development phase and, when it is, often only in conjunction with performance and suitability in other climate groups. This is despite tropical regions experiencing higher annual precipitation and evapotranspiration totals, more intense precipitation events, and often strongly seasonal precipitation, in comparison with other climate types. This bias against tropical regions exists despite approximately 40% of the world's population living in tropical countries (though not always under a tropical climate), a percentage expected to reach 50% by 2050 (State of the Tropics 2014).
The suitability of well-known hydrological models for tropical catchments has only received limited attention. Petheram et al. (2012) compared five models in 105 catchments in northern Australia for modelling daily mean flows, finding that RRMs designed for temperate climates may not be suited to the dramatic fluctuations in streamflow within and between wet seasons in tropical climates but that the same general modelling strategies applied to temperate climates are applicable to tropical (savanna) regions. Chiew et al. (2018) modelled daily mean flows in 780 Australian catchments, including a handful under a tropical monsoon climate in the northeast. Model performance was poorest in northern Australia, though this was attributed to the larger typical catchment size compared to other regions. All models were able to reproduce high flows but struggled with low flows, both of which occur annually in monsoon-driven climates. Desclaux et al. (2018) modelled four small (32–51 km²) and mountainous catchments in New Caledonia and were able to reproduce both flash floods at hourly timesteps and large annual variations. However, the models were less reliable in catchments where land cover had changed over the study period.
Despite the challenges posed by tropical catchments, numerous studies in Indonesia have been undertaken to explore the complexities of rainfall–runoff modelling in these environments. Gunawan (2021) compared flood peak magnitudes and timings for the Air Bengkulu watershed estimated using different synthetic unit hydrographs and HEC–HMS (Hydrologic Engineering Center-Hydrologic Modeling System) but did not recommend any one over any other. Yanto et al. (2017a) used the VIC model with gridded precipitation, temperature, and wind speed products (Yanto et al. 2017b) to estimate daily mean flows in five catchments in Java but had to reject 14 potential catchments due to poor observed flow data quality. Of the accepted catchments, calibration and validation Nash–Sutcliffe efficiency (Nash & Sutcliffe 1970) ranges were 0.31–0.89 and 0.07–0.79, respectively. Harlan et al. (2010) compared the GR4J and NRECA RRMs for calibration and validation periods in the Citarum Hulu basin. Nash–Sutcliffe efficiency was in the range of 0.73–0.78 for both models during calibration and validation periods. Van den Brink (2009) used HBV (Hydrologiska Byråns Vattenbalansavdelning) to model the daily discharge of the Cidanau River in Java. The model underestimated large peak flows and could not reproduce an observed constant baseflow. Jones et al. (2002) used PDM (Probability Distributed Model) to model 15-min river flow at several gauging stations in the Citanduy basin, but the ability to calibrate the model was strongly limited by the low density and unrepresentativeness of available rain gauge data and discontinuities in the 15-min gauged river flow records.
The low availability of high-quality observed runoff data for tropical (and arid and polar) catchments has limited the improvements that can be made in rainfall–runoff modelling outside temperate and continental climates (Beck et al. 2016). The reduced abundance of historical river gauging data can largely be attributed to the fact that tropical countries have tended to experience lower levels of historical economic development than temperate countries. Petheram et al. (2012) identified that, even within Australia, a more economically developed country, tropical regions suffer from more erroneous rainfall and evaporation data. Consequently, comparatively less research into rainfall–runoff modelling has focused on tropical climates. The availability of various gridded precipitation and evapotranspiration products can mitigate a lack of equivalent gauged data in sparsely gauged regions, but work to assess the relative suitability of different gridded products is ongoing. Suroso et al. (2023) assessed the suitability of TRMM rainfall data for hydrological modelling using three contrasting catchments in Java, Indonesia, and found rain gauge data to be systematically superior. Sekaranom et al. (2018) found TRMM data to underestimate extreme daily rainfall totals in Central Java, while Senjaya et al. (2020) found TRMM data to underestimate wet-season monthly totals in East Java.
There has been limited research on the integrated problem of weather data and hydrological model structure. Therefore, this study evaluates the suitability of common hydrological models and a range of different gridded precipitation and evapotranspiration datasets in tropical climates, using high-quality river flow data obtained from three gauging stations in the Citarum basin, West Java, Indonesia, for calibration. Seven rainfall and three evapotranspiration products are combined with 24 RRMs. Our aim is to identify suitable RRMs and provide an empirical basis for their use in this tropical volcanic region. The Citarum basin has been selected because of its relatively high density of hydrological monitoring, with one river flow gauge per approximately 290 km². Furthermore, land uses in the gauged catchments of the Citarum basin are common in many other tropical monsoon regions around the world, consisting mainly of forest and cropland (Tanaka et al. 2021; Rahayu et al. 2023). Additionally, relatively low levels of urban development make these catchments more suitable for conceptual hydrological modelling.
The aims of this study were to identify the most appropriate hydrological models and input datasets for modelling daily mean flow in a volcanic region under a tropical climate and to assess the importance of individual model parameters in the most suitable models.
STUDY AREA
The Citarum basin is located in West Java province on Java in Indonesia, approximately between 107° and 108° east and 6° and 7.25° south (Figure 1). The total catchment area is 6,600 km² (Fulazzaky 2010; Yoshida et al. 2017). The southern part of the basin is ringed by several volcanoes, the highest of which, Mandalawangi, peaks at 3,019 m above sea level (Lehner et al. 2008). These volcanoes surround a plateau at approximately 650–700 m, on which sits the largest metropolitan area in the catchment, Bandung, with an estimated population of 8.9 million as of mid-2022. The Citarum descends through three reservoirs north and east of Bandung, then flows across a large alluvial plain at near sea level before reaching the eastern edge of Jakarta Bay. Land cover in the Citarum basin is typical of West Java, with urban and wetland farming (mostly rice paddies) in the lowlands, dryland farming (non-irrigated seasonal crops) in the adjacent lower slopes, tea plantations in the higher slopes and primary forest on the mountain peaks (Rahayu et al. 2023).
The basin's climate is predominantly tropical rainforest and tropical monsoon (Köppen groups Af and Am), with small areas of tropical savannah (Aw) at the coast and temperate climate (Cfb) in the highest mountains (Beck et al. 2018). The majority of annual rainfall is driven by the Indo-Australian monsoon, which causes streamflow to fluctuate dramatically during the year. Haylock & McBride (2001) suggest that wet-season rainfall is especially, and perhaps inherently, unpredictable in Indonesia, and global climate models disagree on how climate change is expected to affect Indo-Australian Monsoonal rainfall over Indonesia (Jourdain et al. 2013; Narsey et al. 2020). However, a significant international focus on improving the understanding and prediction of atmospheric and oceanic multiscale variability in the Maritime Continent (Indonesia, Borneo, New Guinea, the Philippines, Malay Peninsula and surrounding seas) has developed in recent years (Yoneyama & Zhang 2020), and a more recent regional climate model has predicted increases in mean and extreme precipitation over land in the Maritime Continent (Argüeso et al. 2022), with decreases in both over the sea. Regardless, Indonesia already experiences regular flooding, with 6,098 damaging flood events reported over 2020–2023 (BNPB 2024), a mean of over 1,500 per year.
DATA
Gauged streamflow
Gauged daily mean flows were acquired from the Center for Hydrology and Water Management (Indonesia) for three gauging stations in the Citarum basin: Citarum at Nanjung (58.8 years during 1918–2016), Cirasea at Cengkrong (24.3 years during 1984–2012) and Cikundul at Cikerta (10.1 years during 1998–2011). There are no apparent changes to the flow regime in any specific period of record, despite the recent and rapid urbanization of the Citarum basin. This stationarity could indicate that the study catchments remained rural, that they remained at the same urbanization level, or that their flow regimes are heavily managed. The third possibility presents a challenge to rainfall–runoff modelling, as heavy management of flows implies that discharge varies in ways that do not relate to measurable natural processes (Vu et al. 2023). However, non-stationary flow regimes also present a challenge, as they imply that the model parameters must vary over time to account for changes in local climate or land use/land cover. Time-varying parameter values were not considered in this study due to additional challenges in deciding which parameters should vary with time, the nature of each variation, and the additional variables needed to specify time-varying (versus constant) parameters.
Digital elevation model
The HydroSHEDS 15-arcsecond digital elevation model (DEM) (Lehner et al. 2008) was used to generate boundaries for the catchment area upstream of each gauging station, after first snapping the location of each station to the nearest location on the mainstream network defined by the HydroSHEDS grids. Nanjung is the largest catchment, with an area of 1,817 km². It has been the subject of several previous hydrological and flood studies of the Citarum basin, often alone (e.g. Harlan et al. 2010; Nastiti et al. 2015; Julian et al. 2019; Hatmoko et al. 2020; Rusli et al. 2021). Cikerta and Cengkrong are considerably smaller, with areas of 160.6 and 67.7 km², respectively.
Precipitation and evapotranspiration data
All RRMs require input rainfall data from which to generate modelled runoff hydrographs. Continuous-simulation models also require evapotranspiration data to model the drying of water storage elements between precipitation events. Many also require temperature data to model snowfall, thawing, and freezing, and temperature data may also be used to estimate evapotranspiration if it is not supplied.
Precipitation data were obtained from seven gridded precipitation products with daily temporal resolution and various spatial resolutions and start/end dates (Table 1).
Table 1. Precipitation datasets used in this study
Dataset | Spatial resolution (°) | Temporal range | Reference |
---|---|---|---|
AgMERRA | 0.25 | 1980–2010 | Ruane et al. (2015) |
CHIRPS | 0.05 | 1981–near present | Funk et al. (2015) |
CPC | 0.5 | 1979–near present | Xie et al. (2010) |
MSWEP v2 | 0.1 | 1979–near present | Beck et al. (2019) |
PERSIANN-CDR | 0.25 | 1983–near present | Ashouri et al. (2015) |
SA-OBS | 0.25 | 1981–2014 | van den Besselaar et al. (2017) |
Yanto | 0.125 | 1985–2014 | Yanto et al. (2017b) |
Figure 1. Map of Citarum basin and gauging stations: Citarum at Nanjung (purple outline), Cikundul at Cikerta (pink outline), and Cirasea at Cengkrong (yellow outline).
Figure 2. Mean monthly precipitation and evapotranspiration in the Citarum basin during 1985–2010. Bracketed values in legends indicate the mean annual total during 1985–2010.
While precipitation throughout the representative year follows the same overall ‘monsoonal’ pattern in all seven precipitation datasets, there are differences between them. Mean annual rainfall varies from 1,962 mm (CPC) to 2,811 mm (CHIRPS). The mean of all seven yearly totals is 2,359 mm, which is most closely matched by Yanto (2,374 mm), a dataset focused specifically on Java Island, based entirely on rain gauge data without satellite or reanalysis products, and incorporating over 750 gauges, although with relatively sparse coverage in the middle of the Citarum basin. The only other dataset to incorporate many Javanese rain gauges is SA-OBS, which matches Yanto more closely than any other dataset but still shows differences, including a wetter January–March and a drier and longer dry season. Correspondence between all seven rainfall datasets is best in the drier months – four agree on August totals of 55–62 mm, while two agree on 42–43 mm – although CHIRPS estimates double this, at 87 mm. Correspondence is worse during the wettest months – AgMERRA and PERSIANN-CDR both estimate a lower total in February than either January or March; this unusual agreement is possibly a consequence of AgMERRA using PERSIANN as one-third of its ensemble ‘target’ for average monthly precipitation patterns (Ruane et al. 2015). CHIRPS estimates the shortest wet season, while CPC estimates the longest. AgMERRA estimates a wet season onset near the average of all seven datasets but still has the second lowest annual total rainfall as its estimated totals for December, January, and February are the lowest of any dataset. Three datasets, CPC, MSWEP, and SA-OBS, estimate a > 20% increase in monthly totals from December to January, while CHIRPS estimates an 8% decrease.
Unlike rainfall, all three evapotranspiration datasets estimate similar totals for each month and the year as a whole, although Yanto estimates over 20 mm more potential evapotranspiration per month than AgMERRA during July–October. Evapotranspiration is relatively stable throughout the representative year, ranging from 107 to 163 mm per month in all datasets; it exceeds representative rainfall in the dry season and falls below it in the wet season.
Figure 3. The 100 largest daily accumulations in each precipitation dataset from 1985 to 2010, including the mean and SD of these accumulations (a). The longest continuous run of wet days per hydrological year in each precipitation dataset for wet day thresholds of 1, 5, and 10 mm (b).
Figure 3(b) estimates the continuity of the wet season according to each precipitation dataset by plotting the longest continuous run of days per hydrological year (01 September–31 August) where rainfall exceeds a threshold of either 1, 5, or 10 mm on each day. This shows some similarity with Figure 3(a). For example, Yanto has much longer runs of days with >1 mm or >5 mm rainfall than any other dataset, but this matches its middling annual rainfall total because it also has the smallest top 100 events. This result implies that wet-season rainfall in Yanto is distributed more evenly across days, while in other datasets, particularly MSWEP, wet-season rainfall is concentrated in fewer days. In general, there is an inverse relationship between a dataset's top 100 event sizes and its longest run of days with >1 mm or >5 mm rainfall. This relationship is subdued when a dry/wet threshold of 10 mm is considered because 10 mm/day is approximately the mean daily rainfall for the wet season, and many daily rainfalls must be below the mean by definition. However, the pattern seen using a 10 mm threshold is still similar to those observed using a 1 or 5 mm threshold.
It is important to note that a continuous run of 242 wet days, the maximum seen in Yanto, does not imply 242 days of continuous rain and does not even imply as many rainfall minutes over a whole wet season as a shorter run of consecutive wet days. Due to the monsoonal nature of the rain, it is likely that no rain actually occurred during most hours of each day in any wet season (the storms are likely concentrated within short sub-daily durations). Also, due to the definition of ‘consecutive’, a single dry day in the middle of any period would be sufficient to halve the number of consecutive wet days. Conversely, a longer-than-24-h dry period would not result in a dry day if it occurred across 2 days that both had rain outside of the dry period.
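The run-length statistic described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the models were coded in R, and the run-length code is not given); the function name `longest_wet_run`, the example series, and the treatment of missing values as dry days are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def longest_wet_run(daily_mm: Sequence[Optional[float]], threshold_mm: float) -> int:
    """Length of the longest run of consecutive days with rainfall > threshold_mm.

    Assumption: a missing value (None) ends a run, i.e. it is treated as a dry day.
    """
    longest = current = 0
    for value in daily_mm:
        if value is not None and value > threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a single dry day resets the count
    return longest


# Hypothetical 9-day series (mm/day), not taken from any of the seven datasets.
rain = [12.0, 6.0, 2.0, 0.0, 3.0, 7.0, 11.0, 15.0, 0.5]
runs = {t: longest_wet_run(rain, t) for t in (1, 5, 10)}

# One dry day in the middle of nine otherwise wet days roughly halves the run.
unbroken = longest_wet_run([5.0] * 9, 1)
broken = longest_wet_run([5.0] * 4 + [0.0] + [5.0] * 4, 1)
```

In practice the function would be applied separately to each hydrological year (01 September–31 August) of each dataset, once per threshold.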
Table 2 presents the correlation between daily catchment-average precipitation and gauged daily mean flow for all precipitation datasets and all study catchments. In all catchments, precipitation-flow correlation is highest with Yanto, followed by PERSIANN and SA-OBS in Cikerta and Cengkrong; the order of second and third is reversed in Nanjung. In all catchments, precipitation-flow correlation is lowest with AgMERRA and CPC. Overall, correlation ranges from 0.255 to 0.526 and is below 0.46 in all but one case.
Table 2. Correlation between daily catchment-average precipitation and gauged daily mean flow for three catchments and seven gridded precipitation datasets
Catchment | AgMERRA | CHIRPS | CPC | MSWEP | PERSIANN | SA-OBS | Yanto |
---|---|---|---|---|---|---|---|
Nanjung | 0.315 | 0.393 | 0.308 | 0.336 | 0.416 | 0.459 | 0.526 |
Cikerta | 0.308 | 0.364 | 0.318 | 0.353 | 0.389 | 0.380 | 0.409 |
Cengkrong | 0.255 | 0.316 | 0.255 | 0.273 | 0.385 | 0.338 | 0.442 |
MODELS AND METHODS
This study compared 24 RRMs, all of which were coded in R (R Core Team 2019). Eighteen of the models followed the diagrams and equations in the supplement of Knoben et al. (2019) (Table 3).
Table 3. RRMs following Knoben et al. (2019)
Model | No. parameters | Reference |
---|---|---|
FLEX-I | 10 | Fenicia et al. (2008) |
GR4J | 4 | Perrin et al. (2003); Santos et al. (2018) |
HBV-96 | 9 | Lindström et al. (1997) |
Hillslope | 7 | Savenije (2010) |
HYCYMODEL | 12 | Fukushima (1988) |
HyMOD | 6 | Wagener et al. (2001); Boyle (2001) |
IHACRES | 7 | Jakeman et al. (1990) |
MODHYDROLOG | 15 | Chiew (1990); Chiew & McMahon (1994) |
MOPEX-1 | 5 | Ye et al. (2012) |
MOPEX-3 | 6 | Ye et al. (2012) |
MOPEX-4 | 7 | Ye et al. (2012) |
NAM | 10 | Nielsen & Hansen (1973) |
New Zealand v2 | 8 | Atkinson et al. (2003) |
SIMHYD | 7 | Chiew et al. (2002) |
SMAR | 8 | Tan & O'Connor (1996) |
Susannah Brook v1-5 | 6 | Son & Sivapalan (2007) |
Thames Catchment Model | 6 | Greenfield (1984); Moore & Bell (2001) |
Xinanjiang | 12 | Zhao (1992) |
As well as the 18 models in Table 3, two additional variants of HyMOD and four variants of the PDM were implemented (Table 4).
Table 4. Additional RRMs
Model | No. parameters | Notes |
---|---|---|
HyMOD (exp) | 4 | As HyMOD, replacing Pareto-distributed soil store with exponentially distributed soil store |
HyMOD (1 res) | 4 | As HyMOD, replacing Pareto-distributed soil store with exponentially distributed soil store, and three serial ‘fast’ linear reservoirs with one linear reservoir |
PDM | 9 | As Moore (2007) |
PDM (exp) | 7 | As Moore (2007), replacing Pareto-distributed soil store with exponentially distributed soil store |
PDM (3 par) | 3 | As Mathias et al. (2016), Mathias (2023, p. 447) |
PDM (2 par) | 2 | As Mathias et al. (2016), Mathias (2023, p. 447), replacing a nonlinear routing reservoir with a linear routing reservoir |
Snow and freeze/thaw routines were not implemented for any models as snow is not a realistic occurrence in the Citarum basin (note that MOPEX-2 was not considered as it is equivalent to MOPEX-1 with a snow store). Due to near-constant (i.e. non-seasonal) evapotranspiration rates, MOPEX-4 was slightly modified, with the equation for interception (I) replaced by a constant to be optimized. According to the classifications of Paul et al. (2021), all models are conceptual, continuous simulation, catchment-scale, lumped, and deterministic. They are all solved sequentially.
All the hydrological models tested here follow similar overall concepts, consisting of stores that are charged either by rainfall or previous stores and depleted by evapotranspiration, inter-store flow, and/or model outflow, which are all either related to the current volume or depth of water in the store, or are constant. The main differences between the models are the number of stores, their arrangement, and each store's relationship between storage and inter-store flow, model outflow, and/or evapotranspiration.
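The store-based concept described above can be made concrete with a minimal Python sketch. It is a hypothetical toy model, not one of the 24 RRMs compared in the study (which were coded in R): the function `run_two_path_model` and its parameters (`capacity_mm`, `split`, `k_fast`, `k_slow`) are assumptions chosen only to show stores that are charged by rainfall or by a previous store and depleted by evapotranspiration and storage-dependent outflow, solved sequentially within each daily step.

```python
from typing import Dict, List, Sequence


def run_two_path_model(
    rain: Sequence[float],
    pet: Sequence[float],
    capacity_mm: float,
    split: float,
    k_fast: float,
    k_slow: float,
) -> Dict[str, object]:
    """Toy lumped model: one soil store feeding parallel fast and slow linear reservoirs."""
    soil = fast = slow = 0.0
    total_et = 0.0
    flows: List[float] = []
    for p, e in zip(rain, pet):
        soil += p  # the soil store is charged by rainfall
        # Evapotranspiration scales with relative wetness and cannot exceed storage.
        et = min(soil, e * min(soil / capacity_mm, 1.0))
        soil -= et
        total_et += et
        # Water above capacity spills and is divided between the two pathways.
        excess = max(soil - capacity_mm, 0.0)
        soil -= excess
        fast += split * excess
        slow += (1.0 - split) * excess
        # Each routing reservoir releases a fixed fraction of its storage per day.
        q_fast = k_fast * fast
        q_slow = k_slow * slow
        fast -= q_fast
        slow -= q_slow
        flows.append(q_fast + q_slow)
    return {"flow": flows, "et": total_et, "storage": soil + fast + slow}


# Hypothetical forcing (mm/day): one wet day followed by two dry days, no evaporation.
result = run_two_path_model(
    rain=[30.0, 0.0, 0.0], pet=[0.0, 0.0, 0.0],
    capacity_mm=10.0, split=0.5, k_fast=0.5, k_slow=0.1,
)
```

The models in Tables 3 and 4 differ from this sketch in the number and arrangement of stores and in the storage–flux relationships, but they share the same bookkeeping: rainfall must equal evapotranspiration plus outflow plus the change in storage.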
Jansen et al. (2021) noted that several models with identical names may differ internally in their numerical and mathematical formulations (e.g. implicit or explicit solution, sequential or simultaneous fluxes) and that parameter values are not transferrable between different models with the same name. However, it is possible for these different models to achieve very similar performance scores using different parameter sets.
Some of the compared models have been used previously in tropical climates, including PDM by Jones et al. (2002), HBV by van den Brink (2009), SMAR, IHACRES and SIMHYD by Petheram et al. (2012), GR4J, SIMHYD, and Xinanjiang by Chiew et al. (2018), GR4J by Harlan et al. (2010), and GR4H, an hourly version of GR4J, by Desclaux et al. (2018). Additionally, three of the models (MODHYDROLOG, SIMHYD, and SMAR) included some tropical catchments in their development datasets. However, no model was designed specifically for tropical catchments, and most were developed exclusively or almost exclusively for temperate and/or continental climates.
Model performance was quantified in this study by modified Kling–Gupta efficiency (KGE': Kling et al. 2012) and its components, corresponding to correlation, bias, and variation between modelled and observed daily flows. Despite its clear potential, few studies consider how the individual components of KGE' contribute towards the final score. Missing data periods and the period corresponding to the first three years covered by each input dataset, which were used as the spin-up period for the models' internal reservoirs, were excluded from the calculation.
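For reference, KGE' as defined by Kling et al. (2012) combines the three components as KGE' = 1 − √[(r − 1)² + (β − 1)² + (γ − 1)²], where r is the linear correlation between modelled and observed flows, β is the ratio of modelled to observed mean flow (bias), and γ is the ratio of modelled to observed coefficient of variation (variability). The Python sketch below shows one way to compute the score and its components while excluding spin-up and missing days. It is illustrative only and is not the study's R implementation; the function name `kge_prime`, the `spinup_days` argument, and the use of `None` for missing observations are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def kge_prime(
    sim: Sequence[float],
    obs: Sequence[Optional[float]],
    spinup_days: int = 0,
) -> Tuple[float, float, float, float]:
    """Return (KGE', r, beta, gamma) for paired daily flows.

    Days inside the spin-up window and days with a missing observation (None)
    are dropped before any statistic is computed.
    """
    pairs = [
        (s, o)
        for s, o in list(zip(sim, obs))[spinup_days:]
        if o is not None
    ]
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s, _ in pairs) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for _, o in pairs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs) / n

    r = cov / (sd_s * sd_o)                    # correlation component
    beta = mean_s / mean_o                     # bias ratio
    gamma = (sd_s / mean_s) / (sd_o / mean_o)  # variability ratio (CV_sim / CV_obs)
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma


# Hypothetical flows; the first two days are treated as spin-up.
observed = [9.0, None, 1.0, 2.0, None, 4.0, 3.0]
simulated = [0.0, 0.0, 2.0, 4.0, 5.0, 8.0, 6.0]
kge, r, beta, gamma = kge_prime(simulated, observed, spinup_days=2)
```

In this example the simulation is exactly twice the observation on every retained day, so the correlation and variability components are perfect and the whole penalty comes from the bias ratio. Inspecting the components separately in this way shows which aspect of the hydrograph limits the overall score.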
Models were assessed through a two-stage procedure to allow the identification of the ensemble of behavioural model parameter sets. First, each model's parameterization was optimized via the shuffled complex evolution algorithm (Duan et al. 1992, 1994) to find the models with the potential to perform best in the tropical volcanic study region. This was implemented through the R function SCEoptim in the package ‘hydromad’ (Guillaume 2013). Four optimizations, each using 12 complexes but with elitism set to 2, 1, 0.5, and 0.1, respectively, were performed for each combination of catchment, model, rainfall, and evapotranspiration data. Higher elitism gives more weight to more optimal parent parameter sets when evolving toward optimal parameter values. In every case, we assumed no prior knowledge of the hydrological processes active in the catchments, so no model parameters were fixed and all were optimized. In each case, the parameter values corresponding to the highest KGE' were kept.
The second stage of the assessment followed the GLUE methodology (Beven & Binley 1992) to account for the equifinality in the different parameter sets. A subset of models that performed well during the first stage was selected, and each was then run with 100,000 random parameter sets, in which each parameter value was sampled randomly from a uniform distribution with limits suggested by Knoben et al. (2019). For each case, the subset of 1,000 parameter sets giving the top 1% of KGE' values was considered ‘behavioural’. For selected models, parameter identifiability was measured using Kolmogorov–Smirnov (KS) tests (Massey 1951) to compare the complete set of 100,000 values against the subset of 1,000 behavioural values.
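The sampling, behavioural selection, and identifiability steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's R workflow: the parameter names, the `toy_score` function standing in for a model run scored by KGE', the reduced sample size of 10,000, and the hand-written two-sample KS statistic are all hypothetical choices made to keep the example self-contained.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Bounds = Dict[str, Tuple[float, float]]


def sample_parameter_sets(bounds: Bounds, n: int, rng: random.Random) -> List[Dict[str, float]]:
    """Draw n parameter sets, each value sampled uniformly within its limits."""
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()} for _ in range(n)]


def behavioural_subset(param_sets, scores: Sequence[float], fraction: float = 0.01):
    """Keep the parameter sets whose scores fall in the top `fraction`."""
    n_keep = max(1, round(len(param_sets) * fraction))
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return [param_sets[i] for i in order[:n_keep]]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def toy_score(params: Dict[str, float]) -> float:
    """Stand-in for a model run scored by KGE'; only 'sensitive' affects the score."""
    return 1.0 - abs(params["sensitive"] - 0.3)


rng = random.Random(42)
bounds = {"sensitive": (0.0, 1.0), "insensitive": (0.0, 1.0)}  # hypothetical limits
all_sets = sample_parameter_sets(bounds, 10_000, rng)          # 100,000 in the study
scores = [toy_score(p) for p in all_sets]
behavioural = behavioural_subset(all_sets, scores, fraction=0.01)
ks = {
    name: ks_statistic([p[name] for p in all_sets], [p[name] for p in behavioural])
    for name in bounds
}
```

A large KS statistic indicates that the behavioural values of a parameter are distributed very differently from the uniform prior sample, suggesting the parameter is well identified; a small statistic indicates that good scores occur across the whole sampled range.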
Since all input data were trimmed to the same 1985–2010 period, the first three years (1985–1987) were always used for model spin-up. Calibration and validation periods for each river gauge were defined according to the gauging period, with the first approximately two-thirds of data recorded between 1988 and 2010 used for calibration and the last third used for validation (Table 5). Note that missing data within a year may or may not form a continuous period. The results and discussion presented in this article consider the validation period only.
Table 5. Period-of-record, calibration period, validation period, and missing data for the Citarum at Nanjung, Cikundul at Cikerta, and Cirasea at Cengkrong gauging stations
Gauge | Period-of-record | Calibration period (non-missing days) | Validation period (non-missing days) | Missing data from 1988 to 2010 |
---|---|---|---|---|
Citarum at Nanjung | 1918–2016 (58.8 years, 58 complete) | 1988–2002 (5,114 days) | 2003–2010 (2,556 days) | 1989 (full year); 2004 (full year) |
Cikundul at Cikerta | 1997–2011 (10.1 years, 6 complete) | 1998–2006 (2,294 days) | 2008–2010 (1,048 days) | 1988–1996 (full years); 1997 (215 days); 1998 (29 days); 2001 (36 days); 2002 (full year); 2003 (177 days); 2004 (171 days); 2005 (full year); 2007 (full year); 2009 (48 days) |
Cirasea at Cengkrong | 1984–2012 (24.3 years, 20 complete) | 1988–2001 (4,263 days) | 2002–2010 (2,701 days) | 1995 (full year); 1996 (full year); 1997 (120 days); 2003 (103 days); 2004 (full year); 2008 (30 days); 2009 (48 days); 2010 (39 days) |
RESULTS
Stage 1: Model optimization
Maximum KGE' between observed and modelled daily flow achieved at Nanjung gauging station using all combinations of seven precipitation and three evapotranspiration input datasets through 24 models (top left). Components of the maximum KGE' score (correlation coefficient, top right; coefficient of variation, bottom left; bias ratio, bottom right). Solid horizontal lines show the median value per model, boxes show the interquartile range. Dotted horizontal lines show median and interquartile range across all models. The numbers at the top of the bars show the number of model parameters.
Seven models achieved KGE' scores above 0.7, exclusively with Yanto precipitation. The Citarum basin is a tropical and volcanic region, so it is notable that MODHYDROLOG, HYCYMODEL, and New Zealand v2 were within the top four performing models. MODHYDROLOG was developed purely on Australian catchments, some of which were in a tropical climate, while HYCYMODEL and New Zealand v2 were developed in Japan and New Zealand, both of which contain volcanic regions, though under non-tropical climates.
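For reference, the KGE' score and its three components can be computed as in the sketch below. It assumes that KGE' denotes the modified Kling–Gupta efficiency of Kling et al. (2012), which is consistent with the three components named in the figure captions (correlation coefficient, ratio of coefficients of variation, and bias ratio). The function name `modified_kge` is hypothetical, and the sketch does not handle missing values or constant series.

```python
import math
import statistics


def modified_kge(observed, simulated):
    """Return (KGE', r, gamma, beta) for paired observed and simulated flows."""
    mean_obs = statistics.fmean(observed)
    mean_sim = statistics.fmean(simulated)
    sd_obs = statistics.pstdev(observed)
    sd_sim = statistics.pstdev(simulated)
    covariance = sum(
        (o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated)
    ) / len(observed)
    r = covariance / (sd_obs * sd_sim)                 # linear correlation
    beta = mean_sim / mean_obs                         # bias ratio
    gamma = (sd_sim / mean_sim) / (sd_obs / mean_obs)  # ratio of coefficients of variation
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (gamma - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, gamma, beta


# A simulation that doubles every observed flow has perfect correlation and
# variability ratio, but a bias ratio of 2.
observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0 * q for q in observed]
print(modified_kge(observed, observed))
print(modified_kge(observed, doubled))
```

A perfect simulation scores 1, and each component contributes to the score through its distance from its ideal value of 1, which is why the figures report the components separately.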
Maximum KGE' between observed and modelled daily flow achieved at Cikerta gauging station using all combinations of seven precipitation and three evapotranspiration input datasets through 24 models (top left). Components of the maximum KGE' score (correlation coefficient, top right; coefficient of variation, bottom left; bias ratio, bottom right). Solid horizontal lines show the median value per model, boxes show the interquartile range. Dotted horizontal lines show median and interquartile range across all models. The numbers at the top of the bars show the number of model parameters.
Maximum KGE' between observed and modelled daily flow achieved at Cengkrong gauging station using all combinations of seven precipitation and three evapotranspiration input datasets through 24 models (top left). Components of the maximum KGE' score (correlation coefficient, top right; coefficient of variation, bottom left; bias ratio, bottom right). Solid horizontal lines show the median value per model, boxes show the interquartile range. Dotted horizontal lines show median and interquartile range across all models. The numbers at the top of the bars show the number of model parameters.
The highest KGE' scores, both above 0.68, were achieved by two variants of the PDM, with MOPEX variants taking three of the top seven places. Hence, there is some agreement between Cikerta and Cengkrong regarding optimal models that is not shared strongly with Nanjung, even though Cengkrong is nested inside Nanjung. However, the only commonality between the three catchments in terms of optimal input datasets is that the choice of evapotranspiration dataset is unimportant. The varying suitability of precipitation datasets for the three nearby catchments indicates that the suitability of any particular precipitation dataset can change over short distances.
Stage 2: Parameter identification
The parameter identification study focused on three models that were top or near-top performers in at least two catchments: MOPEX-1, NAM, and two-parameter PDM. PDM parameter names are taken from Moore (2007), and MOPEX-1 and NAM parameter names are taken from Knoben et al. (2019).
Cumulative distributions for MOPEX-1 parameters, corresponding to 100,000 nominally uniform samples for each parameter (dotted grey lines), the 1,000 samples giving the highest KGE' scores compared to gauged flow data for each catchment (coloured lines), and the Stage 1 optimized values for each parameter (coloured symbols).
The performance of MOPEX-1 is most sensitive to the value of Se, the root zone storage capacity. While all three catchments require low Se for high performance, the cumulative distribution function (CDF) and optimized values for Cikerta and Nanjung are more similar to each other than they are to those for Cengkrong. Performance in Cikerta and Nanjung is very sensitive to the value of Sb1, maximum soil moisture (although with different optimal values), while performance in Cengkrong is more sensitive to tu, the time parameter for leakage from the root zone to the slow runoff store. Considering the single optimized values of all time constants together, Nanjung has the fastest flow through both the ‘fast’ and ‘slow’ paths, while Cikerta has a relatively slow ‘fast’ path, and Cengkrong has a fast ‘fast’ path and slow ‘slow’ path. This suggests that MOPEX-1 behaves as intended in Cengkrong and Nanjung, with reasonably high performance (KGE' = 0.654 and 0.679, respectively). All three catchment models are less sensitive to other parameters, but KS values are significant at p = 0.01 in all cases.
Cumulative distributions for NAM parameters, corresponding to 100,000 nominally uniform samples for each parameter (dotted grey lines), the 1,000 samples giving the highest KGE' scores compared to gauged flow data for each catchment (coloured lines), and the Stage 1 optimized values for each parameter (coloured symbols).



Cumulative distributions for two-parameter PDM parameters, corresponding to 100,000 nominally uniform samples for each parameter (dotted grey lines), the 1,000 samples giving the highest KGE' scores compared to gauged flow data for each catchment (coloured lines), and the Stage 1 optimized values for each parameter (coloured symbols).
DISCUSSION
This study confirms the Citarum at Nanjung as having sufficient flow data quality for hydrological modelling, giving confidence to previous studies in this area (e.g. Harlan et al. 2010; Nastiti et al. 2015; Julian et al. 2019; Hatmoko et al. 2020; Rusli et al. 2021). However, we also highlight the Cirasea at Cengkrong and the Cikundul at Cikerta as other stations in the same basin with potentially high-quality, long-duration flow records that could be used to broaden the scope of other hydrological studies focused on the Citarum basin, Indonesia, or volcanic or tropical regions generally.
Considering both the broad model optimization study and the more focused study of individual catchment model parameterizations, it is apparent that performance, measured by modified KGE, is much more strongly linked to the availability of appropriate precipitation data than to either hydrological model structure or evapotranspiration data. The greater influence of precipitation data than model structure on high-flow estimates agrees with several previous studies (e.g. Bárdossy et al. 2022; Zhang et al. 2022; Aitken et al. 2023), particularly in wetter regions (Thober et al. 2018) and tropical climates (van Kempen et al. 2021). Furthermore, the strong relationship between precipitation characteristics and major runoff event characteristics in tropical climates agrees with the findings of, for example, Birch et al. (2021). Perhaps surprisingly, given that it is so strongly based on gauged rainfall data from the study area, the use of Yanto precipitation data did not result in consistently high KGE' scores and even resulted in model outputs with greatly underestimated total flow (bias) and poor correlation with observed flows in Cikerta. Hence, it cannot be assumed that gauged rainfall within the study area is more ‘true’ than remotely sensed estimates.
This study demonstrated that the suitability of many common, widely used, and easily available gridded precipitation datasets can change noticeably over short distances of tens of kilometres, extending the findings of Baez-Villanueva et al. (2018) and dos Santos Silva et al. (2023) to considerably smaller spatial scales. Our study also highlights that individual grid cells in gridded precipitation datasets may be larger than a study catchment, leaving calculated catchment-average rainfalls vulnerable to the interpolation procedures used to create the gridded datasets. Luo et al. (2019) found that a model of the lower Lancang-Mekong Basin driven by gridded (CHIRPS or TRMM) rainfall data outperformed the same model driven by gauged rainfall data, supporting this finding. For all these reasons, it is important that modelling of tropical catchments considers multiple precipitation datasets, and that assumptions about suitability are not made beforehand.
Within each catchment, the top 22–24 (of 24) models achieved KGE' values within a 0.1 range, meaning that almost all models performed similarly well. No model structure or structures ranked consistently highly in all three study catchments, nor was there a tendency for models developed in volcanic or tropical regions to perform particularly highly. However, a more detailed comparison of MOPEX-1, NAM, and the two-parameter PDM suggested that model structure did have some importance, as an optimal model structure seemed to need two separate, parallel internal flow paths (having more than two did not provide additional benefits). While these are nominally for ‘fast’ and ‘slow’ routing, the similar KGE' values achieved by almost all models (per catchment) suggested that a range of internal structures, developed mainly for temperate climates, can be ‘repurposed’ to better suit other types of catchments. One example is the surface and baseflow runoff paths of the standard PDM appearing to trade roles (see e.g. Vesuviano et al. 2022).
It should be noted that, while KGE' is a suitable metric for high flow-focused studies, such as this one (Mizukami et al. 2019), alternative metrics should be considered for water resources studies. van Kempen et al. (2021) suggest that model structure may be relatively more important than input data for low-flow performance.
Practical implications for flood prediction and water management
Given our finding that modelled flow depends more strongly on input precipitation than model structure, future work to improve flood estimation, prediction, and risk management in tropical climates should focus more on improving precipitation estimates and forecasts than on changes to hydrological model structures. In this context, future work on flood modelling in tropical regions should prioritise the acquisition of more accurate precipitation data, while current studies should consider using an ensemble of precipitation inputs from varied sources to produce a range of flow estimates. It is important to note that locally gauged precipitation data cannot automatically be assumed to be more accurate than remotely sensed products. This is because a rain gauge represents a single point, compared to the spatially integrated remotely sensed products. This source of uncertainty should be carefully considered in flood prediction models.
To effectively improve flood prediction and water management, policymakers should focus on advancing the accuracy of precipitation data, emphasising the integration of ground-based, remotely sensed, and appropriate modelled estimates. This multi-source approach will enhance the reliability of flow predictions, providing a stronger foundation for flood risk management and water resource planning in tropical regions. The modelling approach generates an ensemble of river flow predictions that can be integrated into frameworks such as robust decision-making (Hall et al. 2012), enabling effective decision-making under uncertainty. These approaches allow the information contained within the different models and precipitation datasets to be effectively used to support water and flood risk management.
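One simple way to turn an ensemble of precipitation inputs into a range of flow estimates is to summarise, for each day, the spread of simulated flows across ensemble members. The sketch below is illustrative only: the function name `ensemble_envelope`, the input labels, and the flow values are hypothetical, and the minimum/median/maximum summary is one choice among many rather than a method taken from the study.

```python
import statistics


def ensemble_envelope(flows_by_input):
    """Return per-day (min, median, max) flow across ensemble members."""
    members = list(flows_by_input.values())
    n_days = len(members[0])
    if any(len(member) != n_days for member in members):
        raise ValueError("all ensemble members must cover the same days")
    return [
        (min(day_values), statistics.median(day_values), max(day_values))
        for day_values in zip(*members)
    ]


# Hypothetical simulated daily flows from one model driven by three different
# precipitation inputs (labels and values are invented for illustration).
flows = {
    "input_a": [1.0, 5.0],
    "input_b": [2.0, 4.0],
    "input_c": [3.0, 9.0],
}
envelope = ensemble_envelope(flows)
for day, (low, mid, high) in enumerate(envelope):
    print(day, low, mid, high)
```

The resulting envelope could then feed a decision framework that works with ranges rather than single best estimates.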
CONCLUSIONS
Flooding remains a significant problem for tropical regions due to their high annual and daily wet-season rainfalls and, in many cases, high population densities. However, most rainfall–runoff modelling tools were not developed for either tropical climates or, as is particularly relevant in Indonesia, volcanic regions. Furthermore, while many different rainfall data sources are available for tropical regions, their accuracy is hard to assess as there are few rainfall gauging stations against which to compare them. In this study, we compared the performance of 24 existing hydrological model variants, with 21 combinations of seven precipitation and three evapotranspiration data sources, in modelling gauged daily flows at three river gauging stations in the Citarum basin, Java, Indonesia. We first compared overall performance per catchment, per climatic input, and per model via KGE', then identified which specific parameters in which models had the greatest impact on performance.
We found that appropriate precipitation data had the greatest effect on the maximum KGE' achieved, but that the highest KGE' scores were achieved with different precipitation datasets in different study catchments. This highlights the need to consider multiple precipitation input datasets when modelling catchments with strong rainfall–runoff relationships in wet climates. The choice of evapotranspiration dataset had little effect on KGE' as the differences between datasets were relatively small. In comparison to precipitation, the hydrological model structure also had little effect on KGE', with the top 22–24 (of 24) models achieving KGE' values within a 0.1 range in each study catchment. Model structures developed for volcanic or tropical regions did not outperform other model structures in this tropical volcanic region. However, this study did imply a minimum model complexity required to successfully represent the range of flow behaviours seen in this volcanic tropical monsoon region, consisting of two separate, parallel, fast and slow routing pathways, with flexibility allowed in the details of how they are implemented. These findings demonstrate the need to focus on improving precipitation data and estimates to improve flood prediction and water management in tropical climates.
ACKNOWLEDGEMENTS
This work was supported by the Natural Environment Research Council (NE/S002790/1, NE/S002790/2 and NE/S00310X/1). Rahmawati Rahayu's PhD scholarship was funded by the Durham University Global Challenges Centre for Doctoral Training, which in turn was funded by the UK Global Challenges Research Fund (GCRF). The authors are also grateful for the river flow rate data provided by the Center for Hydrology and Water Management, which is part of the Indonesia Ministry of Public Works (Balai Besar Wilayah Sungai). We would like to thank Hydrology Research's Subject Editor, Associate Editor and two anonymous reviewers for their helpful comments.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.