ABSTRACT
Accurate precipitation data are crucial for hydrological modelling in sparsely gauged regions such as the Lam River Basin (LRB) in Vietnam. Gridded precipitation data from satellites and numerical models offer significant advantages in such areas. However, satellite precipitation estimates (SPEs) are subject to uncertainties, especially where topography and precipitation are highly variable. This study focuses on enhancing the accuracy of two SPE products, Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (IMERG) and the Climate Prediction Center morphing technique (CMORPH), using the Quantile Mapping (QM) technique, which aligns the cumulative distribution functions of the SPEs with those of the observed precipitation data, and on assessing the impact on hydrological predictions. The study highlights that IMERG precipitation corrected with QM performs better than the other data sets, enhancing the hydrological model's performance for the LRB at different temporal scales. For hydrological modelling, Nash–Sutcliffe efficiency values rose to 0.60–0.77 (from 0.52–0.74 for the original IMERG) and correlation coefficients to 0.79–0.89 (from 0.75–0.86). Additionally, Percent Bias (PBIAS) narrowed to between −2.21 and −1.66% with corrected SPEs (compared with −20.22 and 4.6% initially). These findings have implications for water resource management and disaster risk reduction initiatives in Vietnam and other countries.
HIGHLIGHTS
IMERG and CMORPH satellite precipitation data were corrected for the Lam River basin.
Uncertainties in satellite-derived precipitation data were reduced by aligning distribution functions using quantile mapping.
Hydrological modelling assesses post-corrected data effectiveness.
Post-corrected IMERG data using QM outperform other data sets.
Precise precipitation data led to more reliable streamflow and water balance simulation.
INTRODUCTION
Accurate precipitation data are vital for the integrity of hydrological modelling, serving as a critical input for simulations that predict water cycle dynamics, which subsequently impact water resource management and disaster mitigation strategies. As hydrological science progresses, the refinement of data collection and processing methods remains a critical area of research. With advancements in meteorological technology, the utilisation of satellite precipitation estimates (SPEs) has become integral in hydrological and climatological modelling endeavours due to their wide coverage and accessibility, particularly in regions with limited ground-based observations (Sun et al. 2018). However, the reliability of SPEs is subject to various uncertainties stemming from retrieval algorithms, data sources (Xue et al. 2013), and gauge adjustment procedures (Ebert et al. 2007; Gebremichael et al. 2014; Guo et al. 2015). These uncertainties typically manifest as systematic biases related to algorithms and post-processing, and random errors influenced by measurement instruments. Moreover, studies indicate that these errors vary seasonally, regionally, and topographically, presenting significant challenges (Serrat-Capdevila et al. 2016; Sun et al. 2018), especially in tropical maritime regions like south-east Asia (Iqbal et al. 2022). For instance, previous evaluations of SPEs in Malaysia have revealed a modest performance in daily rainfall estimation, indicative of the complexities inherent in estimating precipitation over such diverse terrains (Tan et al. 2015). Thus, the selection and calibration of appropriate SPEs is crucial for hydrological modelling applications.
SPEs can be categorised into two groups based on their lag time: real-time (RT) products with short lag times, usually used for calibration, such as Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (IMERG)-E Real-Time and Global Satellite Mapping of Precipitation (GSMaP) Near Real-Time (NRT) data, and gauge data-merged products with longer lag times (up to a few months). Gauge data-merged products, such as GSMaP_Gauge, IMERG-F, and Climate Prediction Center Morphing Technique (CMORPH), often exhibit better suitability due to their calibration against ground-based observations (Joyce et al. 2004; Huffman et al. 2017). This study focuses on two prominent data sets: CMORPH and IMERG. IMERG combines data from multiple satellites, including Global Precipitation Measurement (GPM), to estimate precipitation globally, which makes it invaluable in areas lacking ground-based instruments. Its reported biases differ between products and regions. It tends to overestimate small rainfall amounts and underestimate large ones. In arid and mountainous regions, IMERG exhibits substantial uncertainty at daily and hourly scales, with variations across different areas (Pradhan et al. 2022; Huffman et al. 2024). Some evaluations find that IMERG underestimates rainfall, particularly in the GPM_3IMERGHH (30-min and 0.1°) product; the daily product, GPM_3IMERGDF, performs better but still underestimates precipitation (Das et al. 2022; Wang et al. 2024). Others find that IMERG overestimates precipitation, particularly over mountainous regions such as the Tibetan Plateau and China (Ma et al. 2018). Researchers studying extreme events should therefore consider the limitations of satellite algorithms in providing realistic information (Huffman et al. 2024).
CMORPH blends satellite-based precipitation estimates with gauge data to create high-resolution precipitation maps. Uncertainty arises due to the morphing process, which interpolates between gauge observations and satellite data. Different precipitation modes (stratiform, shallow convection, and deep convection) affect the accuracy of CMORPH because variations in the cloud ice concentration influence precipitation rates (Wang et al. 2019). CMORPH underestimates precipitation, especially over snow-covered ground. The claimed temporal resolution of 0.5 h and spatial resolution of 8 km may not hold after interpolation (Wang et al. 2019). Errors also propagate through time, affecting subsequent estimates. Even when CMORPH products are averaged over specific accumulation time intervals (e.g. 1–6 h), errors remain correlated (Zeweldi & Gebremichael 2009).
Studies comprehensively compare various artificial neural network (ANN) algorithms and machine learning (ML) approaches for predicting daily streamflow. This highlights the progress made in understanding hydrological complexities and the advantages and disadvantages of these widely used ML models (Mohammadi 2021; Burgan 2022). However, these studies underscore the importance of calibrating satellite precipitation data, which significantly reduces the errors of SPEs compared with pre-calibration states, to improve the simulation of streamflow and the overall performance of hydrological models (Jahanshahi et al. 2024). These methodologies serve as the bedrock for enhancing the calibration process, each contributing uniquely to achieving a robust hydrological model. The recent literature on runoff forecasting has increasingly focused on calibrating satellite precipitation data to enhance the accuracy and reliability of hydrological models. Various methods reflect a concerted effort to refine the predictive capabilities of these models through sophisticated calibration techniques and advanced algorithms, such as Quantile Mapping (QM), linear scaling (LS) (Yang et al. 2016), intensity thresholds (Saber & Yilmaz 2018), non-linear power transformation (Pratama et al. 2018), ratio bias-correction (Gumindoga et al. 2019), regression analysis (Chen et al. 2020), deep neural networks (Yang et al. 2022), geo-weighted regression, optimal interpolation, kriging-based algorithms (Li et al. 2023), and dynamic Bayesian models (Ma et al. 2018).
Kofidou et al. (2023) provide valuable insights into downscaling algorithms for satellite-derived precipitation data. The review critically compares statistical and dynamical methods for the downscaling of spatial or spatiotemporal GPM and Tropical Rainfall Measuring Mission (TRMM) precipitation estimates. Sharifi et al. (2019) evaluate three downscaling techniques (multiple linear regression, ANNs, and spline interpolation) to improve the spatial resolution of IMERG precipitation data. These downscaling methods may assume a linear relationship between SPEs and cloud variables or require large amounts of training data, which can lead to overfitting or poor generalisation. They may additionally assume smoothness between data points, and so struggle to represent sharp transitions or localised topographic variations in the study area.
Nan et al. (2024) explore three bias-correction approaches: LS, local intensity scaling (LOCI), and power transformation (PT). The study introduces a novel approach based on sliding-window data correction and Bayesian data fusion to correct GPM IMERG and FY-2G so that they outperform TRMM 3B42RT. While the proposed correction–fusion method enhances SPEs, its effectiveness depends on parameter choices, and careful consideration is needed when correcting other SPE products. Li et al. (2023) introduce a new method called matching precipitation threshold by time series quantile mapping (MPTT-QM), which improves the spatial distribution of precipitation forecasts, preserves temporal changes more effectively, and enhances computing efficiency, which is crucial for large-scale applications. The application of MPTT-QM to the Pearl River basin, using the Flexible Global Ocean-Atmosphere-Land System (FGOALS) precipitation forecasts and the IMERG-final product as the observations, has shown promising results. However, MPTT-QM is a complex method that requires additional computing resources for processing, and it relies on a robust set of observational data to establish the thresholds and mappings. Moreover, MPTT-QM was developed and tested primarily in the Pearl River basin, which has unique climatic and hydrological conditions. Transferring the method to other basins without careful calibration could result in accuracy differences due to different environmental factors. In addition, MPTT-QM is designed for sub-seasonal- to seasonal-scale forecasts.
Laverde-Barajas et al. (2020) introduce a novel bias-correction method, ST-CORAbico, which significantly improved the storm prediction accuracy by leveraging satellites' spatiotemporal data. It addresses two significant sources of systematic error in satellite data: displacement and volume. The results from the study indicate that ST-CORAbico considerably reduces the root mean square error (RMSE) and bias of GPM IMERG data. This method also shows a lower impact on the spatial correlation of the storm event, which is crucial for maintaining the integrity of the storm's spatial structure. Iqbal et al. (2022) combine ML classifiers and regression models to correct biases in estimating rainfall occurrence and the amount of rainfall. The study demonstrated that the bias-corrected IMERG data significantly reduced the RMSE compared with the original data, indicating the potential of such advanced bias-correction methods for hydro-climatic studies.
The choice between these methods should be guided by the specific requirements of the prediction task at hand, considering the strengths of each approach in the context of the desired application. QM remains a fundamental technique due to its comprehensive correction capabilities and effectiveness in dealing with extremes, which is crucial for accurate hydrological predictions (Gudmundsson et al. 2012; Yang et al. 2016). The adaptability of QM to different climatic conditions, its simplicity and lower data requirements, and its ability to correct a wide range of biases offer a more practical and equally effective solution for bias-correction in hydrological applications for a particular case. While ANN algorithms and Multiple Linear Regression (MLR) provide a direct approach to streamflow prediction, QM is an essential post-processing step to ensure that the predictions are free from biases and accurately reflect observed conditions, thus providing a comprehensive approach to runoff modelling. Integrating QM with predictive models such as ANNs and ML could lead to more reliable streamflow forecasts and represent a significant step forward in the future to predict runoff and manage water resources effectively.
QM is a distribution-based method that focuses on the statistical distribution of precipitation values and aligns the cumulative distribution functions (CDFs) of SPEs with the CDFs of observed gauge data (Themeßl et al. 2012). By matching the SPE and observed CDFs, QM ensures that the statistical properties of SPEs closely resemble those of the observed data. This method is particularly adaptable in scenarios where RT reference data may be unavailable or inaccurate, as it utilises historical data independently and does not rely on concurrent observations (Enayati et al. 2021; Tumsa 2022; Beyene et al. 2023). The effectiveness of error-correction approaches using the QM method has been demonstrated not only for precipitation products from climate models but also for reducing errors in satellite precipitation products. For instance, Yang et al. (2016) employ the QM method along with a Gaussian weighting interpolation scheme to correct Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) precipitation data (Hong et al. 2004) for Chile, providing evidence of this method's success. In a separate study (Katiraie-Boroujerdy et al. 2020), empirical CDFs were utilised for each climate region in Iran, yielding promising results in reducing the errors of PERSIANN-CCS precipitation data. Numerous studies have utilised the QM method to calibrate satellite precipitation data as input for hydrological models (Valdés-Pineda et al. 2016; Tumsa 2022).
While it has been successful in various scientific contexts, QM has yet to be specifically applied to satellite-derived precipitation data in Vietnam. By integrating QM, the study bridges the gap between global-scale satellite products, specifically focusing on SPE products (CMORPH and IMERG-F), and basin-specific hydrological processes in a region with complex topography and seasonal monsoons, emphasising a finer temporal resolution and integrating it into hydrological modelling. Therefore, this study aims to enhance the accuracy of satellite precipitation data using the QM technique for the LRB and assesses the impact of enhanced precipitation data on hydrological predictions. The findings have broader implications for enhancing the accuracy of SPEs for hydrological models in other regions, particularly those with similar climatic and geographical characteristics.
Through the integration of calibrated SPEs into hydrological models, researchers are equipped to enhance the simulation and prediction of water resource availability, flood events, and ecosystem dynamics. Consequently, this contributes significantly to the improvement of water resource management reliability and decision-making processes, and climate studies within the LRB. Section 2 describes the LRB's unique hydrological context and data sources. Section 3 presents the methodology, including QM for spatially correcting SPE data across the LRB and the Soil and Water Assessment Tool (SWAT) model for the integration of the corrected SPEs into hydrological predictions, followed by the results and their interpretation in Section 4. The discussion in Section 5 contextualises our findings within hydrological processes and identifies avenues for future research. Finally, Section 6 concludes the paper by summarising key contributions and emphasising the significance of integrating QM for improved satellite precipitation correction into hydrological modelling.
STUDY AREA AND DATA
Study area
The LRB, also known as the Ca River Basin, is located within the tropical region and is one of the largest transboundary river basins in North Central Vietnam (Figure 1). Covering an area of 27,200 km2, the upper basin extends into the Lao People's Democratic Republic (comprising 35%), while the lower basin lies within Vietnam (comprising 65%) (Pham et al. 2023). Annual precipitation within the basin ranges from 1,100 to 2,500 mm, delineated into two distinct seasons: a rainy season spanning from May to October, and a dry season from November to April.
During the rainy season, characterised by hot and humid weather influenced by the south-west monsoon, average temperatures hover around 27 °C, with peak temperatures reaching up to 44.2 °C in May. Conversely, the dry season experiences cold and arid conditions influenced by the north-east monsoon, with average temperatures around 19 °C and minimum temperatures dropping to −0.5 °C.
The hydrological patterns within the LRB are representative of a tropical climate, characterised by an uneven distribution of streamflow across the year. The peak flow occurs predominantly during the rainy season, while flow rates diminish significantly during the dry season. The total observed streamflow at the Dua station amounts to approximately 12 billion m3 annually, with around 8.6 billion m3 (72%) attributed to the rainy season. The peak flow typically transpires between September and November, whereas the lowest flows are recorded between February and April, with a minimum observed flow of around 0.11 billion m3 (equivalent to 41 m3/s).
The topography of the LRB is characterised by its complexity, diversity, and fragmentation, with elevations ranging from 0 to 2,750 m and an average slope of 18.3%. The density of rivers and streams is approximately 0.6 km/km2 (Phuong et al. 2023). However, the density of rain gauge stations is notably low, with only about 0.63 stations per 1,000 km2, falling below the WMO's standards for mountainous regions (WMO 2008), which typically recommend around 2.5 stations per 1,000 km2. This scarcity of gauge stations is particularly pronounced in the upper reaches of the basin compared with other areas. Consequently, improving the accuracy of precipitation data, which serve as the input for hydrological models, holds paramount importance for assessing water resources and conducting water resource inventories to support socio-economic development within the basin.
Data
Two primary sources of data are utilised to generate corrected SPEs. Satellite-based rainfall data are sourced from the GPM mission and the Climate Prediction Center (CPC) of the United States National Weather Service, while daily gauge rainfall and discharge data are obtained from the Vietnam Meteorological and Hydrological Administration. The data used to run the model include SPEs, measurements from hydrological stations, and geographical data.
IMERG-F (The Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final Precipitation V07), with a spatial resolution of 0.1°, has been used for correction. These data are collected from NASA's website (https://disc.gsfc.nasa.gov/). IMERG-F represents the final product of the GPM mission, released after careful adjustment against monthly ground rainfall records and other precipitation estimates. It is generated by integrating observations from multiple satellites and incorporating ground-based observations where they are available (Hou et al. 2014).
CMORPH (Joyce et al. 2004) rainfall data: This product was developed by the CPC of the United States National Weather Service and provides global rainfall estimates based on satellite data. The data have been corrected and reprocessed using the CPC's morphing technique. CMORPH offers rainfall estimates at three different spatial and temporal resolutions, 30 min/8 km, 3 h/0.25°, and 1 day/0.25°, covering the period from 1998 to the present (https://www.ncei.noaa.gov/products/climate-data-records/precipitation-cmorph). Based on the density of ground observation stations and the temporal resolution of the available data within the basin, this study selected the 1 day/0.25° resolution CMORPH data for evaluation and correction.
Meteorological and hydrological data: Rainfall data from 17 meteorological stations and daily streamflow data from the Dua station, which covers an area of 20,800 km2, were collected from 2000 to 2022 and are used for correction and hydrological modelling. The data are obtained from the Vietnam Meteorological and Hydrological Administration. The statistical properties of the data used, including the skewness, coefficient of variation, confidence intervals, distribution characteristics, minimum, maximum, and median, are given in Table S1, Supplementary material.
Geographical and land-use data used for hydrological modelling include topographical maps from USGS-HydroSHEDS (https://www.hydrosheds.org/products) and land-use and soil moisture maps for the basin area in Vietnam and Laos, collected from the Ministry of Natural Resources and Environment and the Food and Agriculture Organization (https://www.fao.org/).
These data are used to run and calibrate hydrological models. Correction methods such as QM are used to reduce errors in satellite rainfall estimates. The IMERG-final products are rigorously adjusted against monthly ground rainfall records and other available precipitation estimates, enhancing their suitability for hydrological studies. Ground-based precipitation data from monitoring stations were directly utilised as input in the model. Satellite-derived precipitation data (both pre- and post-correction) were converted into average precipitation values for each sub-basin; these are called virtual precipitation stations. Other meteorological data (temperature, solar radiation, relative humidity) were retained unchanged during model runs with different precipitation data sets.
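The conversion of gridded satellite precipitation into one "virtual station" series per sub-basin can be sketched as an area-weighted average. This is a minimal illustration only: the function name, the array layout, and the optional per-cell weights (fraction of each cell inside the sub-basin) are assumptions and are not taken from the study.

```python
import numpy as np

def subbasin_mean_precip(precip, cell_subbasin, cell_weight=None):
    """Average gridded precipitation over each sub-basin (hypothetical helper).

    precip        : array (n_days, n_cells), daily precipitation per grid cell (mm)
    cell_subbasin : array (n_cells,), id of the sub-basin each cell belongs to
    cell_weight   : array (n_cells,), optional area weight of each cell
    returns       : dict {subbasin_id: array (n_days,)} of mean precipitation (mm)
    """
    precip = np.asarray(precip, dtype=float)
    cell_subbasin = np.asarray(cell_subbasin)
    if cell_weight is None:
        # Assumption: without weights, every cell counts equally
        cell_weight = np.ones(precip.shape[1])
    cell_weight = np.asarray(cell_weight, dtype=float)

    series = {}
    for sb in np.unique(cell_subbasin):
        mask = cell_subbasin == sb
        w = cell_weight[mask]
        # Weighted mean over the cells of this sub-basin, one value per day
        series[int(sb)] = precip[:, mask] @ w / w.sum()
    return series
```

Each returned series plays the role of a virtual precipitation station for its sub-basin, so the same routine can be applied to the SPEs before and after correction.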
To create precise precipitation data for hydrological simulations for the LRB in Vietnam, a complex region with diverse topography and seasonal monsoons, data from 17 local rain gauges should be combined with satellite-derived data.
METHODS
Correction framework
The implementation process of the QM method, as proposed by Aedo et al. (2021), consists of four steps, which are followed without any modifications: (1) the calculation of statistical indices (mean, standard deviation (SD), skewness, log-skewness) for the monthly precipitation series from each gauge station and each cell in the satellite precipitation data set; (2) the computation of the CDFs for the assigned values of each month using the Kolmogorov–Smirnov test for both observations and satellite precipitation; (3) the application of the CDF of satellite precipitation, determined based on historical satellite precipitation statistics; and (4) the application of the inverse CDF of observed data, determined based on historical observed data statistics, following Equation (1).
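In its standard form, the mapping applied in steps (3) and (4) can be written compactly as below. The notation is assumed here for illustration: $P_{\mathrm{sat}}$ is the original satellite precipitation, $F_{\mathrm{sat}}$ is the CDF of the historical satellite precipitation, $F_{\mathrm{obs}}^{-1}$ is the inverse CDF of the historical observed precipitation, and $P_{\mathrm{cor}}$ is the corrected value.

$$P_{\mathrm{cor}} = F_{\mathrm{obs}}^{-1}\left(F_{\mathrm{sat}}\left(P_{\mathrm{sat}}\right)\right)$$

In words, each satellite value is first converted to its non-exceedance probability under the satellite distribution, and the observed value with the same probability is then taken as the corrected estimate.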
In this study, CDFs are constructed for each month for both gauge stations and IMERG-F and CMORPH precipitation cells. Subsequently, the Thiessen polygon method is employed to determine the control area of each gauge station. The CDFs of precipitation cells falling within the Thiessen polygon area of each gauge station are matched with the CDFs of that station. The post-corrected IMERG-F and CMORPH precipitation products are called IMERG-F-cor and CMORPH-cor, respectively. This procedure not only adjusts the mean, SD, and quantiles of the satellite data but also preserves extreme rainfall amounts (Themeßl et al. 2012).
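The month-by-month matching of one satellite cell against its controlling gauge can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses empirical CDFs in place of the fitted distributions described in the four steps, all function names are hypothetical, and values outside the historical satellite range are simply clamped to the observed extremes.

```python
import numpy as np

def empirical_cdf(sample_sorted, values):
    """Mid-rank empirical CDF of `values` within an ascending sample."""
    left = np.searchsorted(sample_sorted, values, side="left")
    right = np.searchsorted(sample_sorted, values, side="right")
    return 0.5 * (left + right) / sample_sorted.size

def quantile_map(sat_hist, obs_hist, sat_values):
    """Map satellite values onto the observed distribution: F_obs^-1(F_sat(x))."""
    sat_sorted = np.sort(np.asarray(sat_hist, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_hist, dtype=float))
    # Forward step: non-exceedance probability under the satellite sample
    prob = empirical_cdf(sat_sorted, np.asarray(sat_values, dtype=float))
    # Inverse step: observed quantile at the same probability
    # (plotting positions (i + 0.5) / n match the mid-rank CDF above)
    p_obs = (np.arange(obs_sorted.size) + 0.5) / obs_sorted.size
    return np.interp(prob, p_obs, obs_sorted)

def quantile_map_by_month(sat, sat_month, obs, obs_month):
    """Apply quantile mapping separately for each calendar month (1-12)."""
    sat = np.asarray(sat, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sat_month = np.asarray(sat_month)
    obs_month = np.asarray(obs_month)
    corrected = sat.copy()
    for month in range(1, 13):
        s = sat_month == month
        o = obs_month == month
        # Months missing from either series are left uncorrected
        if s.any() and o.any():
            corrected[s] = quantile_map(sat[s], obs[o], sat[s])
    return corrected
```

In this sketch, `sat` would be the series of a cell lying inside a gauge's Thiessen polygon and `obs` the series of that gauge; the two series do not need to have the same length, because only their distributions are compared.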
Rainfall–runoff modelling
To assess the effectiveness of these precipitation data sets as inputs for hydrological models, the study utilises the SWAT model to evaluate each type of precipitation data.
SWAT model
The SWAT represents a spatially distributed, continuous-time hydrological model operating at the basin scale. It simulates the movement of water, sediment, nutrients, chemicals, and bacteria within a basin resulting from complex interactions among various factors such as weather conditions, soil properties, stream channel characteristics, vegetation, crop growth, and land-management practices.
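The hydrological cycle in SWAT is simulated using the water balance equation:

$$SW_t = SW_0 + \sum_{i=1}^{t}\left(R_{\mathrm{day},i} - Q_{\mathrm{surf},i} - E_{a,i} - w_{\mathrm{seep},i} - Q_{\mathrm{gw},i}\right)$$

where $SW_t$ is the final soil water content (mm);
$SW_0$ is the initial soil water content (mm);
$t$ represents time (days);
$R_{\mathrm{day},i}$ is the precipitation on day i (mm);
$Q_{\mathrm{surf},i}$ is the surface runoff on day i (mm);
$E_{a,i}$ is the evapotranspiration on day i (mm);
$w_{\mathrm{seep},i}$ is the amount of water from the soil profile inflowing to the vadose zone on day i (mm);
$Q_{\mathrm{gw},i}$ is the base flow on day i (mm).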
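The daily water balance described by the terms above (final soil water content equals the initial content plus the accumulated precipitation minus surface runoff, evapotranspiration, seepage to the vadose zone, and base flow) can be illustrated with a short sketch. The function and argument names are hypothetical, and the code only accumulates terms that are already known; it does not reproduce how SWAT computes each term.

```python
import numpy as np

def soil_water_content(sw0, precip, surf_runoff, evapotransp, seepage, baseflow):
    """Soil water content (mm) at the end of each day from the daily balance.

    sw0 is the initial soil water content (mm); the other arguments are
    daily series (mm) of equal length.
    """
    daily_change = (
        np.asarray(precip, dtype=float)
        - np.asarray(surf_runoff, dtype=float)
        - np.asarray(evapotransp, dtype=float)
        - np.asarray(seepage, dtype=float)
        - np.asarray(baseflow, dtype=float)
    )
    # Running sum of the daily net change, added to the initial storage
    return sw0 + np.cumsum(daily_change)
```

The last element of the returned array corresponds to the final soil water content after all simulated days.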
Model set-up
The SWAT2012 version, integrated with the ArcGIS 10.2 interface, was utilised to establish the hydrological model for the LRB. The basin was partitioned into 76 sub-basins, each averaging 30 km2, based on a 30 m × 30 m digital elevation model and a river network map, along with the locations of hydrological monitoring stations within the basin. These sub-basins were further sub-divided into hydrologic response units (HRUs) based on homogeneity in land use, soil type, and terrain. Employing a five-slope classification scheme (0–2, 2–6, 6–15, 15–25, and >25%), the entire basin was divided into 130 HRUs.
The simulation period spanned from 2003 to 2022, encompassing a calibration phase from 2003 to 2010, a validation phase from 2011 to 2022, and a 2-year warm-up period from 2000 to 2002.
Evapotranspiration is determined utilising the Penman–Monteith method. Surface runoff is computed employing the Soil Conservation Service's curve number method, while channel routing employs the variable storage method (Arnold et al. 2012; Kha et al. 2020; Le et al. 2024).
Model calibration and validation
The model calibration process involved optimising parameter values so that the simulated streamflow (Qsim) matched the observed streamflow (Qobs) at a monthly time step. Observed data from the Dua gauging station, situated on major tributaries of the LRB, were utilised for both calibration and validation purposes. The fitting of the simulated streamflow to the observed streamflow was facilitated by the SWAT-CUP software package (Rouholahnejad et al. 2012; Li et al. 2016; Tuo et al. 2016). The parameterisation of the model entailed adjusting 15 influential parameters known for their general sensitivity (Dinh et al. 2020; Kha et al. 2020; Le et al. 2024). These parameters govern processes related to groundwater, surface runoff, evapotranspiration, and infiltration (Table 1). The regionalisation of parameters was carried out based on the specific characteristics of the sub-basins.
No. | Parameter | Description | Range | CMORPH | IMERG | CMORPH-cor | IMERG-cor | GAUGE |
---|---|---|---|---|---|---|---|---|
1 | v__ALPHA_BF | Baseflow alpha factor (days) | [0, 1] | 0.25 | 0.10 | 0.10 | 0.10 | 0.17 |
2 | v__GW_DELAY | Groundwater delay (days) | [0, 500] | 172.13 | 275.13 | 275.13 | 275.13 | 157.83 |
3 | v__GW_REVAP | Groundwater ‘revap’ coefficient | [0.02, 0.2] | 0.06 | 0.09 | 0.09 | 0.09 | 0.14 |
4 | v__GWQMN | Threshold depth of water in the shallow aquifer required for return flow to occur (mm) | [0, 5,000] | 1,493.75 | 1,796.25 | 1,796.25 | 1,796.25 | 4,868.33 |
5 | v__REVAPMN | Threshold depth of water in the shallow aquifer for ‘revap’ to occur (mm) | [0, 500] | 474.13 | 325.88 | 325.88 | 325.88 | 334.50 |
6 | v__CANMX | Maximum canopy storage | [0, 100] | 8.88 | 2.73 | 2.73 | 2.73 | 8.37 |
7 | v__ESCO | Soil evaporation compensation factor | [0, 1] | 0.65 | 0.19 | 0.19 | 0.19 | 0.66 |
8 | v__EPCO | Plant uptake compensation factor | [0, 1] | 0.96 | 0.98 | 0.98 | 0.98 | 0.67 |
9 | v__CH_K2 | Effective hydraulic conductivity in main channel alluvium | [−0.01, 500] | 165.37 | 28.12 | 28.12 | 28.12 | 84.49 |
10 | v__CH_N2 | Manning's ‘n’ value for the main channel | [−0.01, 3] | 0.06 | 0.23 | 0.23 | 0.23 | 0.25 |
11 | v__CH_K1 | Effective hydraulic conductivity in tributary channel alluvium | [0, 300] | 102.53 | 11.93 | 11.93 | 11.93 | 139.10 |
12 | v__CH_N1 | Manning's ‘n’ value for the tributary channels | [0.01, 30] | 7.69 | 27.41 | 27.41 | 27.41 | 10.28 |
13 | r__SOL_K | Saturated hydraulic conductivity | [−1, 470] | 1,601.30 | 1,751.38 | 1,751.38 | 1,751.38 | 251.30 |
14 | r__SOL_AWC | Available water capacity of the soil layer | [−1, 3.3] | −0.94 | −0.71 | −0.71 | −0.71 | −0.86 |
15 | r__CN2 | Soil Conservation Service (SCS) runoff curve number f | [−0.42, 0.065] | 0.00 | 0.01 | 0.01 | 0.01 | −0.22 |
‘v_’ indicates that the parameter value is replaced by the given value, and ‘r_’ represents a relative adjustment from the initial parameter value.
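The difference between the two prefixes can be made concrete with a small sketch. It follows the usual SWAT-CUP convention, in which a replacement sets the parameter to the fitted value and a relative change multiplies the initial value by (1 + fitted value). The function name and the initial values used in the example are hypothetical.

```python
def apply_parameter_change(name, fitted_value, initial_value):
    """Return the parameter value after applying a SWAT-CUP style change."""
    if name.startswith("v__"):
        # Replacement: the initial value is discarded
        return fitted_value
    if name.startswith("r__"):
        # Relative change: the initial value is scaled by (1 + fitted value)
        return initial_value * (1.0 + fitted_value)
    raise ValueError(f"Unsupported change type in parameter name: {name}")


# Example with fitted values from the GAUGE column and assumed initial values
alpha_bf = apply_parameter_change("v__ALPHA_BF", 0.17, initial_value=0.048)
cn2 = apply_parameter_change("r__CN2", -0.22, initial_value=80.0)
```

Under these assumptions, a fitted value of −0.22 for r__CN2 lowers an initial curve number of 80 by 22%, whereas v__ALPHA_BF takes the fitted value 0.17 regardless of its starting value.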
Evaluation metrics
The study used several commonly used metrics to assess the performance of post-correction SPEs, the SWAT model, and the impact of post-correction SPEs in streamflow simulation. These metrics included Pearson's correlation coefficient (R), to assess the linear relationship between the observed and simulated streamflow (Pearson 1897); the SD, which provides information about the variability of the simulated streamflow compared with observations; the RMSE and mean absolute error (MAE), to quantify the average magnitude of the errors between the observed and simulated values; the per cent bias (PBIAS), to evaluate the average deviation of simulated values from the observed values; and the Nash–Sutcliffe efficiency (NSE), to measure the model's ability to replicate the observed variance relative to the residual variance (Nash & Sutcliffe 1970). By utilising these well-established metrics, the study was able to gain a thorough understanding of the model's performance (Li et al. 2019; Dinh et al. 2020; Le et al. 2024). Table 2 provides the formulas, range values, and optimal values for these evaluation metrics.
Evaluation metric | Range values | Optimal value | Equation |
---|---|---|---|
Pearson's correlation coefficient (R) | [−1, 1] | 1 | |
MAE | | 0 | |
PBIAS | | 0 | |
NSE | | 1 | |
RMSE | | 0 | |
SD | | 0 | |
Note: n denotes the number of samples; the observed series is the gauge precipitation (or observed streamflow), and the estimated series is the precipitation from the evaluated products (or simulated streamflow).
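For reference, the commonly used definitions of R, MAE, RMSE, PBIAS, and NSE can be sketched as below. These are the standard textbook forms and are given under stated assumptions: the function name is hypothetical, and PBIAS is computed here as 100 × Σ(estimated − observed) / Σ(observed), so that negative values indicate underestimation. Some references use the opposite sign.

```python
import numpy as np

def evaluation_metrics(obs, sim):
    """Standard goodness-of-fit metrics between observed and estimated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    return {
        # Pearson's correlation coefficient
        "R": np.corrcoef(obs, sim)[0, 1],
        # Mean absolute error
        "MAE": np.mean(np.abs(err)),
        # Root mean square error
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # Per cent bias (assumed sign: negative means underestimation)
        "PBIAS": 100.0 * err.sum() / obs.sum(),
        # Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
    }
```

The same function can be applied to precipitation (gauge versus satellite product) and to streamflow (observed versus simulated), since both comparisons use paired series of equal length.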
RESULTS
SPE assessment
Seasonal (rainy/dry) and annual assessment
After applying the QM algorithm for correction, improvements were seen in the spatial distribution and the rainfall ratio between the wet and dry seasons, and in the annual precipitation for both the IMERG-cor and CMORPH-cor data sets. The IMERG precipitation data outperform the CMORPH data in terms of both the rainfall distribution (Figure 3) and value (Figure 4).
Monthly and daily assessment
Time scale | Precipitation product | R | MAE (mm) | PBIAS (%) |
---|---|---|---|---|
Monthly | IMERG | 0.89 | 55.87 | 4.61 |
Monthly | CMORPH | 0.81 | 71.02 | −20.22 |
Monthly | IMERG-cor | 0.90 | 48.71 | −2.21 |
Monthly | CMORPH-cor | 0.83 | 65.74 | −1.66 |
Daily | IMERG | 0.47 | 6.08 | 4.61 |
Daily | CMORPH | 0.42 | 5.72 | −20.22 |
Daily | IMERG-cor | 0.82 | 2.79 | −2.21 |
Daily | CMORPH-cor | 0.65 | 4.48 | −1.66 |
The QM algorithm, correcting along monthly CDFs, significantly improves the spatial distribution and rainfall amounts for the wet and dry seasons. It also enhances the accuracy of precipitation data at monthly and daily scales compared with ground-based data, as shown by the R, MAE, and PBIAS criteria (Table 3). Additionally, IMERG-cor performs better than other precipitation data sets at monthly and daily scales.
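The idea of correcting along monthly CDFs can be sketched as empirical quantile mapping. A satellite value is located on the satellite CDF for its calendar month, and the observed value at the same non-exceedance probability is returned. This is a minimal illustration under stated assumptions, not the study's implementation. The function names, the linear interpolation between order statistics, and the toy reference samples are all hypothetical, and ties such as runs of dry days receive no special treatment.

```python
from bisect import bisect_right


def cdf_position(sorted_ref, x):
    """Non-exceedance probability of x in a sorted reference sample,
    interpolated linearly between neighbouring order statistics."""
    n = len(sorted_ref)
    i = bisect_right(sorted_ref, x) - 1   # index of the last value <= x
    if i < 0:
        return 0.0                        # below the reference range
    if i >= n - 1:
        return 1.0                        # at or above the reference maximum
    frac = (x - sorted_ref[i]) / (sorted_ref[i + 1] - sorted_ref[i])
    return (i + frac) / (n - 1)


def quantile(sorted_ref, p):
    """Inverse empirical CDF with linear interpolation, for p in [0, 1]."""
    pos = p * (len(sorted_ref) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_ref) - 1)
    return sorted_ref[lo] + (pos - lo) * (sorted_ref[hi] - sorted_ref[lo])


def quantile_map(x, sat_ref, obs_ref):
    """Map a satellite value onto the observed distribution."""
    sat_sorted, obs_sorted = sorted(sat_ref), sorted(obs_ref)
    return quantile(obs_sorted, cdf_position(sat_sorted, x))


def correct_by_month(series, sat_ref_by_month, obs_ref_by_month):
    """Correct (month, value) pairs using the reference samples of each month."""
    return [
        quantile_map(value, sat_ref_by_month[month], obs_ref_by_month[month])
        for month, value in series
    ]


# Toy reference samples (not study data): gauges read 1.5 times the satellite.
sat_sample = [0.0, 2.0, 4.0, 6.0, 8.0]
obs_sample = [0.0, 3.0, 6.0, 9.0, 12.0]
corrected = quantile_map(5.0, sat_sample, obs_sample)  # about 7.5
```

In this sketch, values outside the satellite reference range are mapped to the observed minimum or maximum. A practical implementation would need an explicit rule for extrapolating extremes.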
Hydrological modelling assessment
Precipitation product | R (calibration) | R (validation) | R (whole) | NSE (calibration) | NSE (validation) | NSE (whole) | MAE, m³/month (calibration) | MAE, m³/month (validation) | MAE, m³/month (whole) |
---|---|---|---|---|---|---|---|---|---|
Rain gauge | 0.83 | 0.75 | 0.80 | 0.68 | 0.47 | 0.61 | 144.5 | 201.6 | 162.2 |
IMERG | 0.87 | 0.85 | 0.86 | 0.75 | 0.72 | 0.74 | 145.7 | 155.8 | 149.3 |
CMORPH | 0.77 | 0.83 | 0.75 | 0.48 | 0.61 | 0.52 | 178.0 | 178.0 | 178.7 |
IMERG-cor | 0.90 | 0.88 | 0.89 | 0.77 | 0.77 | 0.77 | 133.3 | 129.0 | 132.6 |
CMORPH-cor | 0.83 | 0.82 | 0.79 | 0.61 | 0.57 | 0.60 | 156.3 | 194.2 | 168.3 |
Furthermore, the MAE values for the whole period show smaller absolute discrepancies for the corrected SPEs (132.6 m³/month for IMERG-cor and 168.3 m³/month for CMORPH-cor) than for the uncorrected SPEs (149.3 m³/month for IMERG and 178.7 m³/month for CMORPH), highlighting the effectiveness of the correction method in improving the accuracy of precipitation estimates.
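The size of these reductions can be expressed in relative terms with a short calculation on the whole-period MAE values quoted above. The helper name `relative_reduction` is illustrative, and the percentages are derived here by simple arithmetic rather than reported in the study.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# Whole-period MAE (m3/month): (uncorrected, corrected)
mae_whole = {"IMERG": (149.3, 132.6), "CMORPH": (178.7, 168.3)}

reductions = {
    product: round(relative_reduction(before, after), 1)
    for product, (before, after) in mae_whole.items()
}
# reductions: about 11.2% for IMERG and about 5.8% for CMORPH
```

On this basis, the correction lowers the whole-period MAE roughly twice as much for IMERG as for CMORPH.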
In general, our study recommends the QM method for enhancing the accuracy of satellite-derived precipitation data, aiming to improve the precision of hydrological modelling and water resources research in the LRB.
Discussion
Both post-correction SPEs showed significant improvements in various precipitation metrics. The IMERG-cor data notably enhanced the performance of the SWAT hydrological model compared with other precipitation data, including rain gauges.
The NSE index was used to optimise the model parameters for each precipitation data set. The whole-period NSE values were all 0.52 or higher, meeting the recommendations of Moriasi et al. (2015) for using hydrological models, although the calibration NSE for CMORPH (0.48) and the validation NSE for the rain gauge data (0.47) fell below this level. However, during flood seasons, computed streamflow values were lower than the values in the observed data, raising concerns about capturing extreme rainfall values (Giang et al. 2014) and reflecting the spatial distribution of precipitation data in the SWAT model (Le et al. 2024). Using fully distributed hydrological models could enhance the hydrological simulation. Producing more integrated bias-correction frameworks and reliable hydrological projections is crucial, especially in the context of climate change, where extreme hydrological events are becoming more frequent and intense (Meresa et al. 2023).
Despite improvements through QM, challenges with satellite observations and hydrological models persist, potentially impacting model accuracy and reliability. Additionally, it is essential to acknowledge that the findings and methodologies of this study are specific to the LRB in Vietnam. Extrapolating these results to other regions requires careful consideration of regional hydrological characteristics and environmental conditions. Zhang et al. (2021) also underscore the scale-dependent nature of SPEs and the necessity for integrated bias-correction frameworks. Thus, the effectiveness and accuracy of SPEs can vary significantly depending on the applied spatial scale; for smaller catchment areas (less than 20,000 km², similar to the LRB), the performance is more variable and heavily reliant on the local precipitation accuracy.
Multiple studies have evaluated SPE suitability and hydrological modelling. However, these studies primarily focused on evaluating and comparing the hydrological performance of SPEs without addressing bias-correction. For example, Dinh et al. (2020) modelled the hydrological process using the SWAT model for multiple SPEs for the Mekong River Basin (MRB); the challenge in obtaining observational rainfall data across the entire MRB, due to data-sharing policies among the countries in this region, may have contributed to this oversight. In SPE bias-correction for SWAT hydrological simulation in the Jing River Basin, multi-source weighted-ensemble precipitation data were applied to the IMERG and TRMM products. Similarly, another study concluded that IMERG provided a superior performance, and the NSE of the SWAT model was 0.74. Previous studies have used the SD and deep learning models for adjusting the bias of the SPE product (Le et al. 2020). Nonetheless, these studies did not focus on the hydrological performance of the corrected products. The current research implements QM-based SPE bias-correction and evaluates the influence of these corrections on the SWAT model.
The results from this study represent a significant advancement in hydrological modelling using QM techniques. The enhanced accuracy of SPEs enables more dependable hydrological simulations, improving our understanding of water resource dynamics. The study findings also enhance our capacity to model and adapt to changing hydrological regimes, ultimately bolstering resilience to climate-induced challenges.
CONCLUSIONS
The study demonstrates that the QM method effectively corrects systematic precipitation errors by adjusting CDFs and utilising Thiessen polygons. The IMERG-cor precipitation data outperformed other data sets, highlighting the robustness of the QM approach. Metrics such as the MAE, NSE, and R, together with the Taylor diagram, demonstrated that integrating corrected SPEs with gauge measurements in hydrological simulations using the SWAT model improved the model performance compared with uncorrected data. This finding underscores the importance of combining satellite and ground-based data, especially in basins with limited gauge networks. Additionally, the use of Thiessen polygons for delineating control areas around gauge stations facilitated accurate CDFs for corrected SPEs. Future research should explore advanced bias-correction techniques and consider shorter periods to enhance flood simulation capabilities. The study recommends exploring event-based models for short-term flood events and addressing other satellite precipitation data sets. Additionally, integrating machine learning (ML) with traditional statistical methods holds promise for enhancing the precision and reliability of SPEs at various temporal scales for a better hydrological modelling performance.
FUNDING
This research has been conducted under the research project QG.23.19 of Vietnam National University, Hanoi.
AUTHOR CONTRIBUTIONS
N.Y.N., T.N.A., and D.K.D. contributed to conceptualisation and methodology. N.Y.N., T.N.A., D.K.D., and H.D.N. contributed to methodology, material preparation, validation, analysis, writing – original draft, writing – review and editing. N.Y.N., D.K.D., and H.D.N. contributed to data collection. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.