Abstract
Accurate precipitation estimates over space and time are critically important for effective hydrological modeling and efficient regional water resources management, particularly in data-scarce areas, where gridded precipitation datasets are the preeminent alternative. However, gridded precipitation datasets contain various uncertainties arising from the retrieval algorithms used in their development. In this study, five precipitation datasets (Tropical Rainfall Measuring Mission (TRMM), Climate Prediction Centre (CPC), APHRODITE, Climate Hazards Group Infra-Red Precipitation with Station data (CHIRPS), and PERSIANN) were evaluated over the Upper Indus basin. Ensembles of daily precipitation from 2001 to 2017 at 0.05-degree resolution were then created using three ensemble approaches (Bayesian model ensemble, relative bias-based ensemble, and correlation-based ensemble). To improve the accuracy of the ensemble datasets, a linear bias correction technique was applied with respect to gauge precipitation. The accuracy of the bias-corrected ensemble datasets was evaluated using statistical and categorical measures. Reasonable agreement was found between the ensemble and gauge precipitation (Pearson correlation 0.83–0.89 and absolute bias 1–8.7 mm/month). By contrast, the absolute biases of the five individual precipitation datasets ranged from 1.7 to 53.9 mm/month. The study suggests that applying ensemble approaches to gridded precipitation can significantly enhance the accuracy of the estimates compared to relying on a single precipitation dataset.
HIGHLIGHTS
The study developed bias-corrected precipitation estimates using three ensemble approaches.
Estimates from the new relative bias-based ensemble approach are slightly better than those from the existing ensemble approaches used in this study.
Precipitation shows a nonlinear increasing/decreasing trend with altitude.
The direct use of gridded precipitation is not recommended due to the large biases present in each precipitation dataset.
INTRODUCTION
The Hindukush-Karakoram-Himalayan (HKH) mountainous region and the Tibetan Plateau (TP) hold the largest snow and ice cover on Earth outside the polar regions, which is why they are often referred to as the ‘Third Pole’ (Pomee et al. 2020; Pomee & Hertig 2022). The Indus River originates in this HKH-TP region, and its basin spans China, India, Pakistan, and Afghanistan, sustaining the subsistence of more than 215 million people (Jane 2008; Dahri et al. 2016). Defining the relationship between precipitation and the mountain ranges is challenging because of their intricate topography, so precipitation patterns there remain unclear. The extent and pattern of precipitation at high altitudes are currently the most uncertain factors (Immerzeel et al. 2012, 2013, 2015; Ragettli & Pellicciotti 2012; Palazzi et al. 2013; Mishra 2015). Precipitation in the HKH region is governed by a variety of meteoclimatic regimes and by the interaction of local and large-scale circulation systems, which cause large uncertainty in the spatial distribution of gauge precipitation (IPCC 2007). Based on these circulation systems, the HKH region can be divided into two subregions. The western Hindukush-Karakoram is largely influenced by westerly mid-latitude perturbations that bring precipitation during the winter season (December–January). The eastern Himalaya receives summer monsoon (June–September) precipitation (Archer & Fowler 2004; Syed et al. 2006; Yadav et al. 2012; Palazzi et al. 2014).
Gauge observations are the most reliable source of precipitation data (Verdin et al. 2016). However, accurate gauge measurements are difficult to obtain in the high-altitude Indus River basin because of its complex terrain and sparse gauging network. Most gauges are located in valleys, so the observations are strongly biased toward valley-floor conditions (Fowler & Archer 2006; Immerzeel et al. 2012; Khan & Koch 2021). In addition, the strength of the moisture-bearing winds and the relief of the mountain range strongly affect the amount and intensity of precipitation, causing higher precipitation on the windward side than on the leeward side (Singh & Kumar 1997; Anders et al. 2006; Ghimire et al. 2015). Precipitation is a key component of the hydrological cycle and the main source of water for human consumption, wildlife, and agriculture. Estimating the amount of precipitation is vital for quantifying available water. This is essential for population health, flood and drought monitoring, water resource allocation and management, disaster risk management, ecosystem protection, and economic decisions (Chua et al. 2022a; Zhang et al. 2022). It is, therefore, of utmost importance to obtain accurate precipitation estimates for the sustainable management of water resources (Ali et al. 2017).
Precipitation at high spatial resolution plays a crucial role in the exchange of moisture and heat between the land surface and the atmosphere (Fekete et al. 2004; Gottschalck et al. 2005; Tian et al. 2007). Therefore, developing an accurate and reliable gridded precipitation dataset is important for understanding regional hydrological processes and for water resource management (Wilk et al. 2006; Immerzeel et al. 2009; Sorooshian et al. 2011; Wang et al. 2018; Gao et al. 2021; Jiang et al. 2023). The main obstacle to generating gridded data in the mountainous region is the scarcity of gauge precipitation data, owing to the lack of an adequate gauge network (Kulkarni et al. 2013; Ghimire et al. 2015; Kumar et al. 2015).
Furthermore, the Indus River is transboundary, and collecting precipitation data from the riparian countries is challenging. Many global and regional precipitation datasets have therefore been developed and used in studies of the Indus basin (Lutz et al. 2014; Pan et al. 2014; Mishra 2015; Ali et al. 2017; Anjum et al. 2018; Rizwan et al. 2019, 2022; Khan & Stamm 2023). In this context, satellite-based and numerical modeling-based precipitation estimates are attractive alternatives. These estimates are widely used because they provide reasonably fine spatial coverage, but they may underestimate the intensity of extreme precipitation events (AghaKouchak et al. 2011; Andermann et al. 2011). Meanwhile, numerical products covering high-altitude river basins rely on gauges located mostly at mid-altitudes or in valleys. As a result, these products fail to capture the full spatial and topographical complexity of precipitation (Reggiani & Rientjes 2015; Khan & Koch 2021). In addition, retrieval procedures can make satellite-derived precipitation estimates numerically inaccurate. So can the indirect relationship between satellite remote sensing measurements and precipitation intensity (Xie & Arkin 1997; Ebert et al. 2007; AghaKouchak et al. 2011).
Merging multiple precipitation datasets into an ensemble is an effective approach to minimizing uncertainties. Bias correction of satellite precipitation against gauge-based data is commonly used to improve the accuracy of precipitation datasets (Jiang et al. 2023). Several researchers (Shen et al. 2014; Ma et al. 2020, 2022; Hong et al. 2021; Zhu et al. 2021) introduced merging techniques to produce highly accurate precipitation datasets. For example, Li et al. (2021) and Ma et al. (2018) merged multiple precipitation datasets using weights determined by Bayesian-based methods. These techniques are easy to apply and can assimilate information from multiple sources. Li et al. (2021) developed an efficient precipitation dataset for the southern TP by assimilating three satellite precipitation datasets (SPDs) with dense rain gauge data.
Given the foregoing, accurate precipitation estimation clearly holds the key to reliable weather forecasting and to the prediction of associated natural hazards such as droughts, floods, and landslides (Chen et al. 2015; Rizwan et al. 2022). Precipitation is a key component of many studies, yet acquiring reliable precipitation estimates at reasonably fine spatiotemporal scales remains a challenge for the scientific community (Wang et al. 2017). Precipitation datasets can be improved using different ensemble approaches. Hence, the major objective of this study is to analyze five precipitation datasets and to develop bias-corrected ensemble precipitation estimates for use in hydrological modeling studies.
The novelty of this study is a new relative bias-based ensemble (RBE) approach, implemented alongside two previously used approaches. Instead of relying on a single precipitation dataset, these ensemble approaches are used to determine the precipitation distribution over the Upper Indus basin (UIB) more precisely and to support further hydrological modeling studies.
STUDY AREA
Location map of the Upper Indus basin showing its mountain ranges (numbers refer to the corresponding meteorological stations listed in Appendix 1).
In this study, we defined a domain covering the high-altitude region of the Indus basin (69°E–84°E, 29°N–38°N) to evaluate the gridded precipitation datasets and to construct their ensemble.
DATA DESCRIPTION
Observed data
The gauge precipitation data were collected from the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA). The PMD meteorological stations are mostly located at lower elevations, whereas the WAPDA stations are at relatively higher elevations. Data from a total of 32 meteorological stations were collected from both agencies; the stations are shown in Figure 1 and lie at elevations from 305 to 4,730 m asl (see Appendix 1).
Gridded datasets
Considerable scientific progress during the last three to four decades has led to the development of several global and regional gridded precipitation datasets. These datasets are created by various means and are available for water-related studies. In this study, five precipitation datasets were evaluated to highlight their associated inaccuracies, and an ensemble of these datasets was prepared. Bias correction was then applied to further refine the ensemble dataset before its use in further studies.
APHRODITE
The Asian Precipitation Highly Resolved Observational Data Integration Toward Evaluation of Water Resources (APHRODITE) dataset provides accurate daily gridded precipitation over the Asian land area at high spatial resolution. It is based on gauge observations collected from the member countries. The APHRODITE-2 project, supported by the Japanese Ministry of the Environment, began in June 2016 and concluded in March 2019. The dataset covers the 18 years from 1998 to 2015 at 0.25- and 0.5-degree grid resolutions. We used the latest improved daily precipitation version (APHRO_MA_V1901) at 0.25-degree resolution. The dataset is available at http://aphrodite.st.hirosaki-u.ac.jp/download/.
CHIRPS
Climate Hazards Group Infra-Red Precipitation with Station data (CHIRPS) is a quasi-global precipitation dataset spanning more than 30 years. It covers 50°S–50°N at all longitudes and is available from 1981 to the present. CHIRPS integrates satellite imagery at 0.05-degree resolution with in situ station data to generate gridded precipitation time series. CHIRPS version 2.0 was completed and made available to the public on 12 February 2015 (Funk et al. 2015). The dataset can be downloaded from http://chrsdata.eng.uci.edu/.
CPC
The Climate Prediction Centre (CPC) unified gauge-based analysis is a global analysis of daily precipitation. It assembles data from approximately 30,000 gauge stations collected from multiple sources, including national and international organizations. Quality control is performed through comparisons with historical records and with independent information from nearby stations, satellite observations, and forecast models. The daily analysis also accounts for orographic effects (Xie et al. 2007). The daily precipitation data are released on a 0.5-degree grid over the global domain from 1979 to the present. The dataset has a ‘retrospective version’ that uses 30,000 stations and spans 1979–2005. It also has a ‘real-time version’ that uses 17,000 stations and spans 2006 to the present (Chen et al. 2008; Xie et al. 2010). The dataset can be downloaded from https://climatedataguide.ucar.edu/climate-data/cpc-unified-gauge-based-analysis-global-daily-precipitation.
PERSIANN
The Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) was developed by the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine (UCI). A neural network function is used to estimate the precipitation rate at a 0.25-degree grid resolution. The PERSIANN system was initially based on geostationary infrared imagery and was later extended to use both infrared and daytime visible imagery. The PERSIANN algorithm used here is based on geostationary longwave infrared imagery and generates a global precipitation dataset covering 60°S–60°N. The dataset is available at 1-, 3-, and 6-hourly and daily time steps from March 2000 to the present. The dataset can be downloaded from http://chrsdata.eng.uci.edu/.
TRMM
The Tropical Rainfall Measuring Mission (TRMM) is a joint space mission between NASA and Japan's National Space Development Agency. The mission was launched in November 1997 to study and monitor tropical and subtropical precipitation and the associated release of energy. TRMM carried five instruments: the TRMM Microwave Imager (TMI), Precipitation Radar (PR), Visible and Infrared Scanner (VIRS), Lightning Imaging Sensor (LIS), and Clouds and the Earth's Radiant Energy System (CERES). The TMI and PR are the main instruments used for precipitation. An algorithm assembles the calibration dataset (TRMM 2B31) for the TRMM Multi-satellite Precipitation Analysis (TMPA). TMPA 3B43 provides monthly precipitation averages. TMPA 3B42 provides daily and sub-daily (3 h) averages at 0.25-degree resolution, covering 50°N–50°S from 1 January 1998 to 1 January 2020. We used TRMM_3B42_Daily version 7 in this study. The dataset can be downloaded from https://disc.gsfc.nasa.gov/datasets/TRMM_3B42_Daily_V7.
METHODOLOGY
Methodology flowchart for preparation of bias-corrected ensemble precipitation estimates.




Categorical indices were also used to evaluate the predictive capability of the precipitation products (Pan et al. 2014; Sunilkumar et al. 2019). These were the probability of detection (POD), frequency bias (FB), false alarm ratio (FAR), threat score (TS), equitable threat score (ETS), extreme dependency index (EDI), and symmetrical extreme dependency index (SEDI). The indices were calculated from a contingency table (Wilks 2011) for twenty precipitation threshold values (precipitation bins) between 0.1 and 20 mm/day. Table 1 shows the contingency table of daily precipitation events, obtained by comparing the precipitation values of each dataset with the gauge precipitation.
Contingency table for comparison of gauge precipitation and five precipitation datasets
| Gauge precipitation | Estimate: Yes | Estimate: No | Total |
|---|---|---|---|
| Yes | P11 (hits) | P10 (misses) | P11 + P10 |
| No | P01 (false alarms) | P00 (correct rejections) | P01 + P00 |
| Total | P11 + P01 | P10 + P00 | P11 + P10 + P01 + P00 |
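The categorical indices listed above follow from the four counts in Table 1. The Python sketch below is a minimal illustration. It assumes that an event occurs on any day when precipitation is at or above the threshold. It uses the standard contingency-table definitions of POD, FAR, FB, TS, and ETS, together with the usual log-based formulas for EDI and SEDI, which are undefined when the hit rate or the false alarm rate is 0 or 1. The function names are illustrative and are not taken from the study's code.

```python
import math

import numpy as np


def _ratio(num, den):
    """Division that returns NaN instead of failing when the denominator is zero."""
    return num / den if den != 0 else math.nan


def contingency_counts(estimate, gauge, threshold):
    """Daily contingency counts for events defined as precipitation >= threshold.

    Returns (P11 hits, P10 misses, P01 false alarms, P00 correct rejections),
    with the gauge series treated as the reference.
    """
    est_event = np.asarray(estimate, dtype=float) >= threshold
    obs_event = np.asarray(gauge, dtype=float) >= threshold
    p11 = int(np.sum(obs_event & est_event))     # hits
    p10 = int(np.sum(obs_event & ~est_event))    # misses
    p01 = int(np.sum(~obs_event & est_event))    # false alarms
    p00 = int(np.sum(~obs_event & ~est_event))   # correct rejections
    return p11, p10, p01, p00


def categorical_scores(p11, p10, p01, p00):
    """POD, FAR, FB, TS, ETS, EDI and SEDI from one contingency table."""
    n = p11 + p10 + p01 + p00
    h = _ratio(p11, p11 + p10)                   # POD (hit rate)
    f = _ratio(p01, p01 + p00)                   # false alarm rate, used by EDI/SEDI
    hits_random = _ratio((p11 + p10) * (p11 + p01), n)
    scores = {
        "POD": h,
        "FAR": _ratio(p01, p11 + p01),           # false alarm ratio
        "FB": _ratio(p11 + p01, p11 + p10),      # frequency bias
        "TS": _ratio(p11, p11 + p10 + p01),      # threat score
        "ETS": _ratio(p11 - hits_random, p11 + p10 + p01 - hits_random),
    }
    if 0 < h < 1 and 0 < f < 1:                  # logs are undefined at 0 and 1
        lf, lh = math.log(f), math.log(h)
        l1f, l1h = math.log(1 - f), math.log(1 - h)
        scores["EDI"] = (lf - lh) / (lf + lh)
        scores["SEDI"] = (lf - lh - l1f + l1h) / (lf + lh + l1f + l1h)
    else:
        scores["EDI"] = scores["SEDI"] = math.nan
    return scores
```

To produce index curves across thresholds, call `contingency_counts` once per threshold, for example for each of the twenty values between 0.1 and 20 mm/day. Then pass the resulting counts to `categorical_scores`.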




Relative bias-based ensemble of precipitation datasets



Bayesian model averaging-based ensemble of precipitation datasets
Bayesian model averaging (BMA) is a statistical postprocessing method based on Bayes' theorem. It derives the relative weights and variances of individual models in a multimodel ensemble (Raftery et al. 2005). In this study, the BMA weights of the individual datasets in the ensemble were computed using Markov chain Monte Carlo (MCMC) simulation, as explained by Xie et al. (2009) and Liu et al. (2000). The MCMC simulation runs multiple Markov chains simultaneously to sample the BMA weights and variances, favoring parameter values with higher likelihood. MCMC handles high-dimensional sampling problems, such as those arising in precipitation ensembles, efficiently. For further details, interested readers are referred to Raftery et al. (2005), Xie et al. (2009), Fang & Li (2016), and Zhu et al. (2016).
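Raftery et al. (2005) estimated BMA weights and variance with the expectation-maximization (EM) algorithm, whereas this study sampled them with MCMC. For brevity, the sketch below swaps in EM for a simplified Gaussian BMA. In this version, all member datasets share one variance and are assumed to be already bias-corrected. The aim is to illustrate how BMA weights arise from the data; the MCMC procedure used in the study is not reproduced. The function names and the Gaussian assumption are illustrative only. Daily precipitation, which is skewed and has many zeros, does not strictly satisfy the Gaussian assumption.

```python
import numpy as np


def bma_weights_em(members, gauge, n_iter=500, tol=1e-8):
    """Estimate Gaussian BMA weights and a shared variance with EM.

    members: array of shape (K, T), one row per precipitation dataset.
    gauge:   array of shape (T,), reference (gauge) precipitation.
    """
    f = np.asarray(members, dtype=float)
    y = np.asarray(gauge, dtype=float)
    k, t = f.shape
    w = np.full(k, 1.0 / k)                        # start from equal weights
    sq_err = (y - f) ** 2                          # squared error of each member
    sigma2 = max(float(sq_err.mean()), 1e-12)      # initial shared variance
    for _ in range(n_iter):
        # E-step: probability z[k, t] that member k best explains y[t],
        # computed in log space to avoid underflow.
        log_dens = -0.5 * sq_err / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
        log_num = np.log(np.maximum(w, 1e-300))[:, None] + log_dens
        log_num -= log_num.max(axis=0, keepdims=True)
        z = np.exp(log_num)
        z /= z.sum(axis=0, keepdims=True)
        # M-step: update the weights and the shared variance.
        w_new = z.mean(axis=1)
        sigma2 = max(float(np.sum(z * sq_err) / t), 1e-12)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, sigma2


def bma_mean(members, weights):
    """Deterministic BMA estimate: weighted mean of the member datasets."""
    return np.asarray(weights, dtype=float) @ np.asarray(members, dtype=float)
```

MCMC, as used in this study, explores the same BMA likelihood by sampling rather than by iterative maximization. It therefore also provides uncertainty information on the weights, which this EM sketch does not.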
Correlation-based ensemble of precipitation datasets
The correlation coefficient-based weighted ensemble average (CBE) assumes that some precipitation estimates in the ensemble are more accurate than others in terms of the correlation coefficient (Fang & Li 2016). The weight assigned to each dataset in the multi-dataset ensemble depends on its accuracy. This is measured by the correlation coefficient between the gauge and the gridded precipitation. The weights of the individual datasets are then derived from these correlation coefficients and used to combine the datasets in the ensemble. Details of this method are described by Fang & Li (2016).
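A minimal sketch of a correlation-weighted ensemble is given below. It assumes, for illustration, that each dataset's weight is its Pearson correlation with gauge precipitation divided by the sum of all correlations. Negative or undefined correlations are set to zero. The exact weighting used in this study follows Fang & Li (2016), and the function names here are hypothetical.

```python
import numpy as np


def correlation_weights(members, gauge):
    """Hypothetical CBE weights: each dataset's Pearson r with the gauge data,
    with negative (or undefined) r set to zero, normalized to sum to one."""
    f = np.asarray(members, dtype=float)
    y = np.asarray(gauge, dtype=float)
    r = np.array([np.corrcoef(row, y)[0, 1] for row in f])
    r = np.clip(np.nan_to_num(r, nan=0.0), 0.0, None)
    if r.sum() == 0:
        raise ValueError("no dataset is positively correlated with the gauge data")
    return r / r.sum()


def weighted_ensemble(members, weights):
    """Weighted average of the datasets; rows of `members` are datasets."""
    return np.asarray(weights, dtype=float) @ np.asarray(members, dtype=float)
```

Clipping negative correlations to zero is a design choice of this sketch. An anti-correlated dataset contributes nothing, and a nearly uncorrelated one contributes very little.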
Bias correction of the ensemble datasets
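The abstract states that a linear bias correction was applied to the ensemble with respect to gauge precipitation. One common linear variant for precipitation is multiplicative linear scaling. In this method, each calendar month's values are multiplied by the ratio of the mean gauge precipitation to the mean ensemble precipitation for that month. The sketch below assumes this monthly multiplicative form and a single paired gauge/grid-cell series. It is an illustration, not necessarily the exact correction used in the study, and the function names are hypothetical.

```python
import numpy as np


def monthly_scaling_factors(ens, gauge, months):
    """Multiplicative linear-scaling factor for each calendar month (1-12).

    ens, gauge: daily precipitation (mm/day) over a calibration period.
    months:     calendar month (1-12) of each day.
    Months without data, or with zero mean ensemble precipitation, keep factor 1.
    """
    ens = np.asarray(ens, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    months = np.asarray(months)
    factors = np.ones(12)
    for m in range(1, 13):
        sel = months == m
        ens_mean = ens[sel].mean() if sel.any() else 0.0
        if ens_mean > 0:
            factors[m - 1] = gauge[sel].mean() / ens_mean
    return factors


def apply_linear_scaling(ens, months, factors):
    """Scale each day's ensemble precipitation by its month's factor."""
    ens = np.asarray(ens, dtype=float)
    idx = np.asarray(months) - 1
    return ens * np.asarray(factors)[idx]
```

After this correction, the mean corrected ensemble precipitation for each calendar month equals the mean gauge precipitation for that month over the calibration period. The relative day-to-day variability within the month is preserved.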





RESULTS AND DISCUSSION
Comparison of precipitation datasets and gauge precipitation
Mean annual precipitation estimates over the Upper Indus basin from the five precipitation datasets (TRMM, APHRODITE, CHIRPS, PERSIANN, and CPC).
Scatter plots of monthly precipitation estimates from the TRMM, APHRODITE, CHIRPS, PERSIANN, and CPC datasets against gauge precipitation.
Ensemble of precipitation datasets
Weights of the precipitation datasets for the Bayesian model ensemble average (BME), correlation-based ensemble average (CBE), and relative bias-based ensemble average (RBE) for (a) 2001–2015 and (b) 2016–2017.
Bias correction
(a) Mean annual bias-corrected ensemble precipitation and (b) annual bias adjusted by the bias correction. Note: All values are given in mm.
Mean annual gauge precipitation and corresponding grid precipitation of the five precipitation datasets and the three bias-corrected ensemble datasets (relative bias-based ensemble, Bayesian model average ensemble, and correlation-based ensemble).
Evaluation of precipitation datasets
Continuous metrics (bias, rBias, RMSE, and r) computed for the five precipitation datasets and the three ensemble datasets with reference to gauge precipitation are presented in Table 2. All precipitation datasets showed a negative bias (underestimation) relative to gauge precipitation. The largest underestimation occurs in the CPC precipitation (−53.9 mm/month), while APHRODITE has the smallest (−1.7 mm/month). TRMM and CHIRPS, however, showed higher correlation coefficients and lower RMSE values, indicating better estimates. In contrast, the CPC and PERSIANN datasets have the lowest correlation coefficients and the highest RMSE values, indicating poor estimates. Some researchers have proposed, as a general acceptability criterion, that a precipitation estimate should have r > 0.7 and an rBias within ±10% (Brown 2006; Condom et al. 2011).
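The continuous metrics and the acceptability criterion quoted above can be sketched as follows. The definitions assumed here are the usual ones. Bias is the mean difference, and rBias is the total difference relative to total gauge precipitation. RMSE is the root mean square error, and r is the Pearson correlation. Each is applied to paired monthly totals; the function names are illustrative.

```python
import numpy as np


def continuous_metrics(est, gauge):
    """Bias, relative bias (%), RMSE and Pearson r of estimates against gauges."""
    est = np.asarray(est, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    diff = est - gauge
    return {
        "bias": diff.mean(),                          # same units as the data, e.g. mm/month
        "rBias": 100.0 * diff.sum() / gauge.sum(),    # percent
        "RMSE": np.sqrt(np.mean(diff ** 2)),
        "r": np.corrcoef(est, gauge)[0, 1],
    }


def is_acceptable(metrics, r_min=0.7, rbias_max=10.0):
    """Criterion cited in the text: r > 0.7 and |rBias| within 10%."""
    return metrics["r"] > r_min and abs(metrics["rBias"]) <= rbias_max
```

Under this criterion, TRMM's r of 0.93 passes, but its rBias of −11.3% falls just outside the ±10% band (Table 2).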
Statistical parameters of monthly precipitation for the five precipitation datasets and the three bias-corrected ensemble datasets with reference to gauge precipitation
| Type | Dataset | Bias (mm/month) | rBias (%) | RMSE (mm/month) | r |
|---|---|---|---|---|---|
| Precipitation datasets | TRMM | −7.4 | −11.3 | 18.6 | 0.93 |
|  | APHRODITE | −1.7 | −2.1 | 29.9 | 0.81 |
|  | CPC | −53.9 | −82.1 | 68.5 | 0.08 |
|  | CHIRPS | −4.9 | −7.4 | 22.1 | 0.86 |
|  | PERSIANN | −31.1 | −47.3 | 60.6 | 0.08 |
| Ensemble approaches | RBE | −1 | −1.5 | 23.6 | 0.86 |
|  | BME | 3.5 (3.9) | 5.9 (5.9) | 27.3 (27.1) | 0.83 (0.83) |
|  | CBE | 8.7 (9.6) | 14.8 (15) | 30.2 (30.5) | 0.89 (0.88) |
Notes: Values in parentheses are for BME and CBE recalculated with CPC precipitation excluded. RMSE is root mean square error, r is Pearson correlation, and rBias is relative bias.
An intercomparison of the ensemble datasets showed reasonably good performance by the RBE ensemble, with very low bias (−1 mm/month), relative bias (−1.5%), and RMSE (23.6 mm/month). However, the RBE has a slightly smaller correlation coefficient (0.86) than the CBE ensemble (0.89). The lower bias of RBE and the higher correlation coefficient of CBE were expected, given that their weights are based on relative bias and correlation, respectively. The bias and RMSE of the BME ensemble lie between those of RBE and CBE, although its correlation coefficient (0.83) is the lowest of the three. The BME and CBE ensembles include CPC precipitation, which has the largest biases, whereas CPC was excluded from RBE by its selection criterion. For comparison, the statistical parameters of BME and CBE were recalculated with CPC precipitation excluded, and the results are presented in Table 2. Overall, the RBE approach yielded better results than BME and CBE.
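To illustrate how relative bias-based weights could behave, the following sketch makes a hypothetical assumption. Each dataset's weight is taken to be inversely proportional to its absolute relative bias. Datasets whose |rBias| exceeds a chosen cutoff are excluded, analogous to the exclusion of CPC. The inverse-bias form, the cutoff parameter `max_abs_rbias`, and the small constant `eps` are illustrative assumptions, not the study's formulation.

```python
import numpy as np


def relative_bias_weights(rbias_percent, max_abs_rbias, eps=1e-6):
    """Hypothetical RBE weights: proportional to 1/|rBias|, zero above a cutoff.

    rbias_percent: relative bias (%) of each dataset against gauge precipitation.
    max_abs_rbias: datasets with |rBias| above this value are excluded.
    eps:           guards against division by zero for an unbiased dataset.
    """
    abs_rb = np.abs(np.asarray(rbias_percent, dtype=float))
    inv = 1.0 / (abs_rb + eps)
    inv[abs_rb > max_abs_rbias] = 0.0      # excluded datasets get zero weight
    if inv.sum() == 0:
        raise ValueError("all datasets exceed the relative-bias cutoff")
    return inv / inv.sum()
```

For example, suppose the rBias values of TRMM, APHRODITE, CPC, CHIRPS, and PERSIANN from Table 2 are passed with a hypothetical cutoff of 50%. The sketch then gives CPC zero weight and gives APHRODITE, which has the smallest |rBias|, the largest weight.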
The predictive capacity of the precipitation datasets
Categorical indices as a function of precipitation threshold (≥0.1 to ≥20 mm/day) for the five precipitation datasets and the three ensemble datasets.
In summary, the predictive capacity of the precipitation datasets is affected by various errors and data uncertainties. Relative to the gauge observations, APHRODITE underestimated precipitation in the Hindukush, Karakoram, and Himalaya, while CHIRPS underestimated precipitation over the whole basin. Although TRMM performed better than the other datasets, it tends to underestimate precipitation in the lower parts of the Hindukush and Himalaya and slightly overestimates it in the Karakoram Range. Of the five datasets, TRMM and CHIRPS performed best, with correlation coefficients of 0.93 and 0.86, respectively, while APHRODITE performed best in terms of relative bias (−2.1%). Furthermore, an analysis using categorical measures (POD, TS, FAR, FB, ETS, EDI, and SEDI) showed that the APHRODITE estimates performed better than the other datasets. The PERSIANN and CPC datasets, however, performed worst in terms of both statistical and categorical measures.
Spatial pattern of seasonal ensemble precipitation estimates
Mean ensemble precipitation for winter (December–February), spring (March–May), summer (June–August), and autumn (September–November) season estimates over the Upper Indus basin (note that all units are in mm).
Seasonal variation in precipitation with altitude
Seasonal precipitation dependence on altitude over the Hindukush (HK), Karakoram (KK), Himalaya (HM), and South-Western Tibetan Plateau (SWTP) for four seasons: winter (December–February; DJF), spring (March–May; MAM), summer (June–August; JJA), and autumn (September–November; SON)
| Method | Mountain range | DJF | MAM | JJA | SON |
|---|---|---|---|---|---|
| RBE | HK | 0.73 | 0.89 | 0.77 | 0.82 |
|  | KK | 0.87 | 0.28 | 0.82 | 0.11 |
|  | HM | 0.19 | 0.65 | 0.01 | 0.43 |
|  | SWTP | 0.94 | 0.45 | 0.56 | 0.12 |
| BME | HK | 0.82 | 0.86 | 0.71 | 0.78 |
|  | KK | 0.58 | 0.31 | 0.72 | 0.43 |
|  | HM | 0.32 | 0.64 | 0.04 | 0.42 |
|  | SWTP | 0.09 | 0.62 | 0.17 | 0.17 |
| CBE | HK | 0.85 | 0.89 | 0.79 | 0.81 |
|  | KK | 0.25 | 0.48 | 0.36 | 0.09 |
|  | HM | 0.41 | 0.67 | 0.15 | 0.53 |
|  | SWTP | 0.16 | 0.73 | 0.19 | 0.19 |

Note: The values represent R² (the square of the correlation coefficient).
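The R² values in the preceding table are described as the square of the correlation coefficient between seasonal precipitation and altitude. A minimal sketch of that calculation is given below, including the month-to-season grouping used in the table (DJF, MAM, JJA, SON). The input layout is an assumption: one seasonal precipitation value per station or grid cell, paired with its elevation. The function names are hypothetical.

```python
import numpy as np

# Month-to-season mapping used in the table (DJF, MAM, JJA, SON).
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF",
                   3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA",
                   9: "SON", 10: "SON", 11: "SON"}


def seasonal_totals(monthly_precip, months):
    """Sum monthly precipitation (mm) into the four seasons.

    Divide by the number of years to obtain a mean seasonal total.
    """
    totals = {"DJF": 0.0, "MAM": 0.0, "JJA": 0.0, "SON": 0.0}
    for value, month in zip(monthly_precip, months):
        totals[SEASON_OF_MONTH[int(month)]] += float(value)
    return totals


def r_squared_with_altitude(altitude_m, seasonal_precip_mm):
    """Square of the Pearson correlation between altitude and seasonal precipitation."""
    r = np.corrcoef(np.asarray(altitude_m, dtype=float),
                    np.asarray(seasonal_precip_mm, dtype=float))[0, 1]
    return r ** 2
```

Because this R² is derived from a Pearson correlation, it measures only linear association. It may therefore understate a nonlinear dependence of precipitation on altitude.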
Altitudinal variation of seasonal precipitation over the Hindukush, Karakoram, Himalaya, and South-Western Tibetan Plateau for four seasons: winter (December–February), spring (March–May), summer (June–August), and autumn (September–November).
The winter season is marked by high precipitation in the southern Hindukush and the western Himalaya (241 and 266 mm, respectively). This is caused by western disturbances, as reported in the literature (Dimri & Mohanty 2009; Ghimire et al. 2015; Rajbhandari et al. 2015; Dimitrov 2016). The same pattern recurs over the western Himalaya in summer because of monsoon precipitation (Kulkarni et al. 2013). The results show that the lower parts of the western Himalaya received more precipitation (558 mm) than the eastern Himalaya. A likely cause is a barrier in the southeast of the UIB. This barrier affects the amount and intensity of precipitation and produces higher precipitation on the windward side than on the leeward side (Singh & Kumar 1997; Anders et al. 2006; Ghimire et al. 2015).
Comparison of previous ensemble and bias correction studies
Several previous studies have used various SPDs, along with other datasets, for a wide range of applications. These datasets include TRMM (Tropical Rainfall Measuring Mission), IMERG (Integrated Multisatellite Retrievals for the Global Precipitation Measurement), CMORPH (Climate Prediction Center MORPHing technique), PERSIANN, GSMaP (Global Satellite Mapping of Precipitation), TMPA (Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis), ERA5 (European Centre for Medium-Range Weather Forecasts Reanalysis 5), SM2RAIN-ASCAT (Soil Moisture to Rain-Advanced Scatterometer), and CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data).
Soo et al. (2020) focused on modeling extreme flood events using four SPDs (TRMM, IMERG, CMORPH, and PERSIANN). The study used inverse distance weighting to interpolate data from 28 precipitation gauges. It applied three bias correction methods: linear scaling (LS), local intensity scaling (LOCI), and power transformation. The original TRMM dataset predicted peak streamflow accurately, while the bias-corrected LS-IMERG and LOCI-TRMM datasets performed best in rainfall and streamflow simulation. Yin et al. (2021) developed a three-stage blending approach to merge three SPDs (IMERG, TMPA, and PERSIANN), the ERA5 dataset, and a gauge dataset. Biases were corrected using the LOCI and LS methods. The study optimized ‘state weights’ for determining dry/wet events based on the evaluation of the individual products. It also optimized ‘intensity weights’ using the cuckoo search (CS) algorithm and the BMA method. The blended dataset showed superior performance, especially with the CS algorithm. Soo et al. (2022) investigated how well quantile mapping bias correction and kriging merging techniques improve the accuracy of the TRMM and IMERG datasets. The study reported a remarkable improvement (50%) and a high correlation coefficient (0.80) for both datasets (QK-TRMM and QK-IMERG). Chua et al. (2022a) developed an enhanced satellite-gauge rainfall analysis over Australia. It incorporated Australian station data, GSMaP, and the Australian Gridded Climate Dataset (AGCD) rainfall analysis. Two bias correction methods, linear correction and quantile-to-quantile matching, were applied along with blending methods. The best-performing dataset was corrected linearly and then blended with AGCD using an inverse error variance technique, which also corrected patches of excessive rainfall. In another study, Chua et al. (2022b) blended the GSMaP satellite rainfall dataset with station-based rain gauge data over Australia. The GSMaP estimates were adjusted through LS using rain gauge data gridded by ordinary kriging. The adjusted GSMaP data were then blended with the AGCD rainfall analysis using inverse error variance weighting. The blended dataset performed best among the non-gauge-based datasets and reproduced rainfall patterns more accurately in areas without rain gauges. It reduced the average mean absolute error against the station data from 0.89 to 0.31. Khan & Stamm (2023) assessed the performance of four SPDs (IMERG, PERSIANN, CHIRPS, and CMORPH) in predicting daily streamflow. Using the LOCI bias correction method, they found that the bias-corrected IMERG dataset performed best, with an R² of 0.96 and a PBIAS of 0.01%. Zhang et al. (2022) evaluated the adjusted and unadjusted versions of the TMPA, IMERG, and PERSIANN series in a sparsely gauged semi-arid watershed, using LS to enhance the accuracy of the SPDs. In streamflow simulation, the bias-adjusted SPDs showed a higher Nash–Sutcliffe efficiency (>0.34) than the gauge data. The TMPA series showed the largest performance improvement, making it the preferred choice for daily and monthly simulations. Iqbal et al. (2022) evaluated several SPDs (SM2RAIN-ASCAT, IMERG, GSMaP, CHIRPS, and PERSIANN) using a novel machine learning (ML)-based bias correction method. The method combines an ML classifier to correct rainfall occurrence bias with an ML regression model to correct rainfall amount bias. Bias-corrected IMERG performed better, with a higher correlation coefficient (0.57) and Kling–Gupta efficiency (0.5). IMERG and bias-corrected IMERG showed an average reduction of 55% in root-mean-square error (RMSE) in simulating observed rainfall.
Previous studies (Soo et al. 2020, 2022; Iqbal et al. 2022; Zhang et al. 2022; Khan & Stamm 2023) evaluated the performance of several SPDs and applied different bias correction techniques to minimize the biases of individual datasets using gauge data. Other studies (Yin et al. 2021; Chua et al. 2022a, 2022b) blended gridded gauge data with SPDs and found that blending reduced the biases. In contrast to these studies, this study built an ensemble of five gridded precipitation datasets, weighting each dataset according to its performance against the gauge data, and then bias-corrected the ensemble dataset using gauge precipitation data. The bias-corrected ensemble dataset outperformed the individual datasets in terms of the statistical measures (Table 2).
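The weighting-then-correction workflow described above can be sketched as follows. The exact weighting formulas of this study are not restated in this section, so the rules below are assumptions for illustration: a correlation-based (CBE-style) weight proportional to each dataset's non-negative Pearson correlation with the gauge series, a relative-bias-based (RBE-style) weight inversely proportional to its absolute relative bias, and a single multiplicative linear correction of the ensemble to the gauge total. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relative_bias(sim, obs):
    """Relative bias in percent: 100 * (sum(sim) - sum(obs)) / sum(obs)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def normalize(raw):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(raw)
    if total == 0:
        raise ValueError("all scores are zero; weights are undefined")
    return [r / total for r in raw]


def correlation_weights(datasets, gauge):
    # Assumed CBE-style rule: weight proportional to non-negative correlation.
    return normalize([max(pearson(d, gauge), 0.0) for d in datasets])


def relative_bias_weights(datasets, gauge, eps=1e-6):
    # Assumed RBE-style rule: weight inversely proportional to |relative bias|;
    # eps avoids division by zero for a dataset with zero bias.
    return normalize([1.0 / (abs(relative_bias(d, gauge)) + eps) for d in datasets])


def weighted_ensemble(datasets, weights):
    """Weighted sum of the datasets at each time step."""
    n_steps = len(datasets[0])
    return [sum(w * d[t] for w, d in zip(weights, datasets)) for t in range(n_steps)]


def linear_bias_correct(ensemble, gauge):
    """Multiplicative correction so the ensemble total matches the gauge total."""
    factor = sum(gauge) / sum(ensemble)
    return [factor * p for p in ensemble]


# Toy example: one gauge series and two hypothetical datasets (not study data).
gauge = [1.0, 2.0, 3.0, 4.0]
datasets = [[1.1, 2.2, 3.3, 4.4], [2.0, 4.0, 6.0, 8.0]]
w_cbe = correlation_weights(datasets, gauge)
w_rbe = relative_bias_weights(datasets, gauge)
corrected = linear_bias_correct(weighted_ensemble(datasets, w_rbe), gauge)
```

Both toy datasets are perfectly correlated with the gauge, so the CBE-style weights are equal; the first dataset has a +10% bias and the second +100%, so the RBE-style rule favors the first (about 0.91 versus 0.09). This illustrates why the two ensembles can rank differently on correlation and relative bias.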
CONCLUSIONS
The assessment of the water resources of the UIB has long been a key requirement for supporting the food and energy needs of one of the most populous regions in the world. However, a detailed assessment of the precipitation distribution has been lacking in this high-altitude basin, because the meteorological stations needed to understand seasonal and spatial precipitation patterns are sparse.
This study evaluated the performance of five gridded precipitation datasets (APHRODITE, TRMM, CHIRPS, CPC, and PERSIANN) that are widely available sources of continuous and long-term precipitation records. The analysis showed that all the precipitation datasets carry significant errors that vary spatially as well as temporally. The direct use of these datasets, especially over the high-altitude UIB, is unsuitable for hydrological modeling studies.
Further, this study introduced a new ensemble approach, the relative bias-based ensemble (RBE), alongside two existing approaches, the Bayesian model ensemble (BME) and the correlation-based ensemble (CBE), to improve precipitation estimates over the basin while accounting for the precipitation distribution across seasons and mountain ranges. The three ensemble approaches, combined with the bias adjustment procedure, reduced the errors present in the individual precipitation datasets. A detailed comparison of the ensemble approaches revealed that the RBE had the lowest relative bias (−1.5%), while the CBE attained the highest correlation coefficient (0.89); this was expected, because the weights of the individual datasets in these two ensembles were based on these two indices, respectively. The performance of the BME ensemble dataset was intermediate between those of the RBE and CBE ensembles. The categorical measures also highlighted the superior detection capability of the RBE precipitation. Overall, the RBE ensemble dataset performed better than the other two ensemble methods. Furthermore, the altitudinal analysis of the improved precipitation estimates revealed a nonlinear trend between precipitation change and altitude. The study also finds that two weather systems, the westerly system in the winter season and the Indian summer monsoon, exert a dominant influence on the water resources of the basin.
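Detection capability is typically assessed with contingency-table scores: probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI). The sketch below computes these standard scores for a daily series against gauge data. The 1 mm/day wet-day threshold and the function name are assumptions, and the exact categorical measures and threshold used in this study may differ.

```python
def contingency_scores(estimate, gauge, threshold=1.0):
    """POD, FAR, and CSI of an estimated daily series against gauge data.

    A day counts as wet when precipitation >= threshold (assumed 1 mm/day).
    """
    hits = misses = false_alarms = 0
    for e, g in zip(estimate, gauge):
        est_wet, obs_wet = e >= threshold, g >= threshold
        if est_wet and obs_wet:
            hits += 1
        elif obs_wet:  # observed wet day missed by the estimate
            misses += 1
        elif est_wet:  # estimated wet day that was dry at the gauge
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    denom = hits + misses + false_alarms
    csi = hits / denom if denom else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Toy daily values in mm/day (not study data).
scores = contingency_scores([0.0, 2.0, 5.0, 0.5, 3.0], [0.0, 1.5, 0.2, 2.0, 4.0])
```

With two hits, one miss, and one false alarm, this yields POD = 2/3, FAR = 1/3, and CSI = 0.5. Higher POD and CSI and lower FAR indicate better detection of precipitation events.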
This study used a small number of gauges (32) to prepare the ensemble precipitation and to perform bias correction for the period 2001 to 2017. In addition, the gauges are generally located in valleys below 5,000 m asl and therefore do not cover the whole UIB spatially. Consequently, the accuracy of the ensemble datasets and of the bias correction is constrained by the limited availability of gauges.
Further, the new ensemble approach based on relative bias proved useful and can be applied in other study areas. This study recommends using ensemble precipitation datasets, rather than relying on any single dataset, together with a sophisticated bias adjustment method before applying them to downscaling/bias correction of climate model data or to hydrological modeling studies.
ACKNOWLEDGEMENTS
This work is financially supported by The Strategic Priority Research Program of the Chinese Academy of Sciences (XDA20100104) and The National Natural Science Foundation of China (41871280). The authors would like to express their gratitude to the Water and Power Development Authority (WAPDA) and Pakistan Meteorological Department (PMD), Pakistan, for sharing meteorological data. The authors also acknowledge the organizations that provided the TRMM, APHRODITE, CHIRPS, PERSIANN, and CPC precipitation datasets.
AUTHOR CONTRIBUTIONS
All authors contributed intellectually to this paper. X.L., K.J., and Y.C. designed the research work. K.J. conducted the research and wrote the draft manuscript. M.R. helped in data analysis. X.L., Y.C., S.H., and S.A. revised the article and provided many suggestions. All authors have read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare no conflict of interest.