Multi-source data-fusion approaches have been developed for estimating regional precipitation. However, studies considering the specific upper limits of the improved gridded rainfall data for different fusion approaches are limited. Here, the potential ranges of accuracy improvement for satellite and reanalysis rainfall products were addressed using various machine learning fusion approaches, including multivariate linear regression (MLR), feedforward neural network (FNN), random forest (RF), and long short-term memory (LSTM), over the Chinese mainland. All four fusion methods reduce errors in the original precipitation products. The upper limits of accuracy improvement in terms of correlation coefficient (CC) and root mean square error (RMSE) were 30.65 and 15.27%, respectively. M-RF showed the best average CC (0.828) and RMSE (4.62 mm/day) in the four seasons. LSTM performed the best under light rainfall events, whereas MLR and RF exhibited better performance under moderate and heavy rainfall events, respectively. Overall, these results serve as a basis for the fusion approach and technique selection, based on the comprehensive validation in different climate zones, altitudes, and seasons over the Chinese mainland.

  • Machine learning fusion approaches were used for precipitation product estimation.

  • Four different models were used: MLR, feedforward neural network, RF, and LSTM.

  • All four fusion methods reduced errors in the original precipitation products.

  • LSTM showed the best performance for light rainfall events.

  • MLR and RF performed better for moderate and heavy rainfall events.

CC: correlation coefficient
CNN: convolutional neural network
FAR: false alarm ratio
FNN: feedforward neural network
GPM: global precipitation measurement mission
LSTM: long short-term memory
M-FNN: the fusion precipitation data based on FNN
MLR: multivariate linear regression
M-LSTM: the fusion precipitation data based on LSTM
M-MLR: the fusion precipitation data based on MLR
M-RF: the fusion precipitation data based on RF
POD: probability of detection
RB: relative bias
RF: random forest
RGD: rain gauge data
RMSE: root mean square error

The acquisition of rainfall information is critical for understanding global precipitation distribution in the context of climate change (Demaria et al. 2019). Historically, surface instrumentation has been the primary method for collecting rainfall data (Steiner et al. 1995; Lewis et al. 2019; Marzuki et al. 2021). However, uneven distribution of rain gauges can lead to insufficient data for capturing spatial descriptions and local peaks, particularly over areas where gauges are scarce, such as over the oceans (Wimhurst & Greene 2021). Satellite and reanalysis rainfall data have emerged as viable alternatives to address these issues, employing sensors, retrieval algorithms, and assimilation technologies with improved spatiotemporal resolutions (Nguyen et al. 2019; Hersbach et al. 2020). These methods have led to the development of spatially continuous gridded rainfall products, which have become primary inputs for early warning systems and hydrological modelling (Ma et al. 2020).

The systematic validation of satellite and reanalysis rainfall products using in situ records or operating hydrological models has revealed that their accuracy varies with the area of intersection (Jiang et al. 2021; Moazami & Najafi 2021). In a study by Tang et al. (2020), 10 mainstream rainfall products with diverse climatic conditions in China were evaluated, demonstrating that microwave–infrared combined satellite products perform better than infrared-based products and that gauge adjustment plays an important role in accuracy improvement. While topography strongly affects the performance of reanalysis datasets, they are more reliable than satellite-based rainfall products for winter rainfall. However, reanalysis products overestimate topographic rainfall, whereas satellite-based estimates underestimate it (Sun et al. 2018). Quagraine et al. (2020) reported inconsistent accuracy in the products when estimating monsoon rainfall in a comparison of five reanalysis datasets over West Africa.

Pradhan et al. (2022) systematically reviewed the accuracy of global precipitation measurement and suggested that the performance of gridded rainfall products varies with region, topography, climatic conditions, and rainfall intensity. Moreover, most gridded rainfall products underestimate the intensity of extreme rainfall (Wang et al. 2021b). Errors in satellite-based estimation can be attributed to limited retrieval algorithms, poor sampling frequency, and insufficient bias correction (Dosio et al. 2021), whereas numerical model issues and unremoved input source errors largely constrain the reliability of reanalysis products (Janjić et al. 2018).

Multilevel efforts, including advanced retrieval algorithms (Skofronick-Jackson et al. 2018), bias correction (Hashemi et al. 2017), and data fusion (Wang et al. 2021a), have been made to address these issues. One of these efforts includes the use of multi-source data-fusion products that improve global or regional rainfall estimation by complementing accuracy characteristics from various publicly available datasets, which may be categorized into two main types.

The first type is weighted fusion, which derives transform coefficients (weights) for the input datasets. In early research, a global monthly fusion product of 2.5° was created using the maximum likelihood estimation method (Xie & Arkin 1997). However, its spatiotemporal resolution was too coarse for regional applications. To utilize the complementarity of different sources, Beck et al. (2017) developed the multi-source weighted-ensemble precipitation product (MSWEP), in which weight maps of ground, satellite, and reanalysis rainfall estimates were generated using the correlation magnitude with rain gauges. Version 2 of MSWEP improved the spatial resolution from 0.25° to 0.1° with improvements in the input data and fusion algorithm (Beck et al. 2019). Nonetheless, there may still be uncertainty in these products over gauge-scarce areas (Liu et al. 2019). To account for weights that vary with rainfall conditions, Yin et al. (2021) used the Bayesian model averaging method to optimize the dynamic weights of satellite and reanalysis datasets. Validation over China showed that their results were better than those of MSWEP v2.

The other type of data fusion relies on machine learning methods, which are popular in Earth systems research due to their competitive advantages in automatically extracting space-time information from data (Reichstein et al. 2019). Unlike the MSWEP method, in which the weighting procedure is largely influenced by prior knowledge, machine learning strictly adheres to statistical transformation (Adnan et al. 2023; Mostafa et al. 2023). Moreover, machine learning-based approaches such as random forest (RF), long short-term memory (LSTM), and artificial neural network (ANN) have demonstrated their effectiveness in multi-source data fusion by improving the overall accuracy of gridded precipitation products. Bhuiyan et al. (2018) used a non-parametric technique (quantile regression forests) to optimally fuse multi-source rainfall datasets and generate ensemble rainfall fields by combining land surface conditions and atmospheric variables over the Iberian Peninsula. However, due to the sensitivity of the approach to topographical conditions, challenges associated with complex terrain resulted in the underestimation of heavy rainfall (Ehsan Bhuiyan et al. 2019). To investigate whether an RF can improve the spatiotemporal prediction of precipitation over data-scarce regions, Baez-Villanueva et al. (2020) proposed a fusion framework based on RF. They found that the RF performs well and without overfitting, although it reduces the training size of rain gauges by 10%. Shen et al. (2022) developed a hierarchical fusion model that integrated the LSTM to improve the accuracy of multi-satellite rainfall over Hanjiang River, China. Chen et al. (2023) compared the performance of an ANN with that of a convolutional neural network (CNN) and the extended triple collocation approach for fusing rainfall over the Tibetan Plateau. Yuan et al. (2018) calibrated the parameters of the LSTM using Ant Lion Optimizer, which increased the accuracy of the LSTM model in forecasting monthly runoff. In addition, Li & Yuan (2023) observed that the default LSTM model provides skilful streamflow forecasts for most watersheds, while a cascade LSTM achieves further improvements. Furthermore, Ikram et al. (2023) used the weighted mean of vectors optimizer (INFO) to optimize the learning rate and the number of hidden neurons, which outperformed the default LSTM model.

In general, using different machine learning approaches in the design of a data fusion framework could improve the capabilities of the methods in hydrological applications. However, most studies have focused primarily on the overall accuracy of a single fusion algorithm, without sufficiently considering the specific upper limits of the improved gridded rainfall data for different fusion approaches. In addition, the reliability of machine learning models is sensitive to the size of training and auxiliary data, and the differences in fusion capabilities under various climatic and topographic conditions require further discussion. There are still gaps in our understanding, including (1) the potential ranges of accuracy improvement for satellite and reanalysis rainfall products using different machine learning fusion approaches and (2) whether the grid-to-grid fusion approach promotes the reliability of rainfall data based on non-homogeneous regions (climatic or altitudes) and conditions (seasonal or levels).

To address these challenges, we selected the Chinese mainland as the study area. The climate systems (i.e., five major climate zones; Shi & Yang (2020)) and terrains of the Chinese mainland are highly complex and distinct, allowing for a better assessment of the applicability of different approaches. Four conventional machine learning models were selected, namely MLR (linear) and FNN, RF, and LSTM (the latter three are non-linear). The data to be fused comprised four satellite-based precipitation products and one reanalysis product from 2010 to 2015. The obtained fused datasets, M-MLR, M-FNN, M-RF, and M-LSTM, were compared to the original datasets. The results of the present study could provide valuable references for the fusion approach and technique selection, based on the comprehensive validation in different climate zones, altitudes, and seasons over the Chinese mainland.

Study area

The study area is the Chinese mainland, located in East Asia, with complex terrain. To compare the effects of fusion methods in different climate regions, the study area was divided into four climate zones (plateau, temperate monsoon, temperate continental, and subtropical monsoon climate zones) according to Jiang et al. (2021).

Datasets and preprocessing

The datasets used in our research were as follows. (1) Daily precipitation records (2010–2015) across the Chinese mainland, produced by the China Meteorological Data Service Centre. The distribution of the rain gauges is shown in Figure 1(a). After quality control, 84, 215, 140, and 355 rain gauges were available for the plateau, temperate monsoon, temperate continental, and subtropical monsoon climate zones, respectively. (2) Four satellite-based precipitation products and one reanalysis precipitation product (Climate Prediction Center Morphing technique satellite-gauge blended product, CMORPH-BLD; Tropical Rainfall Measuring Mission 3B42 version 7, 3B42V7; Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record, PERSIANN-CDR; Climate Hazards Group InfraRed Precipitation with Station data version 2.0, CHIRPS; and European Centre for Medium-Range Weather Forecasts Reanalysis 5, ERA5). Satellite-based precipitation products were developed by NASA, JAEA, NCWEP, and the University of California. ECMWF developed ERA5. All selected gridded and fusion precipitation products had a spatial resolution of 0.25° and were accumulated to a time resolution of 1 day during preparation. (3) Shuttle Radar Topography Mission (SRTM) version 3 data with a spatial resolution of 90 m. SRTM radar topographic mapping data are jointly provided by NASA and the National Imagery and Mapping Agency.
Figure 1

Study area showing the (a) distributions of four climate zones with rain gauge locations and (b) elevation.


Gridded precipitation products were accumulated at the daily scale and normalized into [0,1] using the MinMaxScaler function in Python 3.6. The ‘grid-to-point’ method was used to extract the normalized gridded precipitation data corresponding to each rain gauge and obtain the time sequence $x_t^y$, where $y = 1, 2, \ldots, 5$. The variable $t$ denotes the day index and $y$ denotes the index of the five gridded precipitation products.
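The two preprocessing steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the processing chain used in the study: the function names are hypothetical, the grid is assumed to be a regular 0.25° lattice whose south-west corner is at (`lat0`, `lon0`) with rows increasing northward, and a constant series is mapped to zeros.

```python
import math


def min_max_normalize(values):
    """Scale a sequence linearly to [0, 1].

    A constant sequence is mapped to all zeros (assumed convention).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grid_to_point(lat, lon, lat0, lon0, res=0.25):
    """Return the (row, col) index of the grid cell that contains a gauge.

    lat0/lon0: south-west corner of the grid (hypothetical layout).
    res: grid spacing in degrees (0.25 for the products used here).
    """
    row = int(math.floor((lat - lat0) / res))
    col = int(math.floor((lon - lon0) / res))
    return row, col


# Illustrative values only (not data from the study).
daily_mm = [0.0, 5.0, 20.0, 10.0]
normalized = min_max_normalize(daily_mm)
cell = grid_to_point(30.1, 114.3, lat0=15.0, lon0=70.0)
```

Extracting the cell index once per gauge and then reading that cell for every day and every product yields one five-column time sequence per gauge.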

The data from 792 rain gauges were randomly divided into 10 groups using 10-fold cross-validation; 9 sets were used for training and 1 was used for testing.
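A gauge-wise 10-fold split of this kind can be sketched as follows. The function name, the fixed seed, and the use of integer gauge identifiers are assumptions made for the illustration; the study does not state how the random grouping was implemented.

```python
import random


def k_fold_split(gauge_ids, k=10, seed=0):
    """Yield (train_ids, test_ids) pairs for a random k-fold split of gauges."""
    ids = list(gauge_ids)
    random.Random(seed).shuffle(ids)          # reproducible random grouping
    folds = [ids[i::k] for i in range(k)]     # k groups of near-equal size
    for i in range(k):
        test_ids = folds[i]
        train_ids = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield train_ids, test_ids


# 792 gauges, 9 groups for training and 1 for testing in each round.
splits = list(k_fold_split(range(792), k=10, seed=0))
```

Splitting by gauge rather than by day keeps every test gauge entirely unseen during training, which is what a spatial validation of a fusion model needs.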

Methods

Four fusion approaches were used: MLR, FNN, RF, and LSTM. Training and verification sets were constructed using the leave-one-out method. Additional details on the model are provided in our previous research (Fan et al. 2021). The algorithms are outlined in detail in Table 1.

Table 1

Specifications of the four merging methods

| Approach | Specifications |
| --- | --- |
| MLR | Least squares method |
| FNN | Optimizer: L-BFGS; Number of neurons in the hidden layer: 15; Iterations: 2,000; Learning rate: 0.05 |
| RF | Number of trees: 200; Minimum number of samples required for a leaf node: 6; Number of features to consider when searching the optimal split: 3 |
| LSTM | Number of neurons in the hidden layer: 4; Optimizer: Adam; Epochs: 1,000; Learning rate: 0.001; Batch size: 30 |

Multivariate linear regression

MLR is a widely used regression analysis method (Korkmaz 2021). We used the ordinary least squares method to estimate the parameters. The MLR model was as follows:
$$P_{\mathrm{MLR}} = \sum_{y=1}^{5} w_y x^{y} + b \qquad (1)$$
where $P_{\mathrm{MLR}}$ represents the fusion result for gauge-generated data based on MLR, $x^{y}$ is the normalized value of the $y$-th gridded precipitation product, $w_y$ represent the weights corresponding to gridded precipitation products, and $b$ is the intercept.
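As a rough illustration of ordinary least squares for a weighted-sum-plus-intercept model of this form, the sketch below solves the normal equations in plain Python. It assumes the input columns are not collinear, the function names are hypothetical, and the numbers are synthetic; it is not the implementation used in the study.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares fit; returns (weights, intercept).

    X: list of samples, each a list of gridded-product values.
    y: list of gauge observations.
    Assumes the columns of X are not collinear.
    """
    Z = [list(row) + [1.0] for row in X]                  # append intercept column
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    beta = solve_linear_system(ZtZ, Zty)
    return beta[:-1], beta[-1]


# Synthetic check: data generated from known weights are recovered.
X = [[0.0, 0.2], [0.1, 0.0], [0.4, 0.3], [0.8, 0.5], [1.0, 0.9]]
y = [0.5 * a + 0.3 * b + 0.1 for a, b in X]
weights, intercept = fit_mlr(X, y)
```

With five products per sample the same code applies unchanged; only the length of each row in `X` differs.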

Feedforward neural network

FNN is a popular method owing to its advanced self-organization abilities and capability to approach any non-linear continuous mapping problem (Xue & Cui 2019). FNN is composed of input, hidden, and output layers. The information in the network is only transmitted forward, and the neurons between layers are completely connected without forming a cycle.

In this study, the numbers of neurons in the input and output layers of the FNN model were set to five and one, respectively. The number of hidden layers was set to one. After comparison, the stochastic gradient descent (SGD) method was selected as the optimizer because its performance was better than that of the adaptive moment estimation (Adam) method and the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method. The learning rate was set to 0.05, and the number of iterations was set to 2,000.

Random forest

RF is a machine learning algorithm proposed by Breiman (2001) composed of multiple decision trees. Although decision trees, as a non-parametric method, perform well in regression and classification problems, the training of decision trees is prone to overfitting. RF addresses this problem. The RF model was as follows:
$$P_{\mathrm{RF}}(X) = \frac{1}{K} \sum_{k=1}^{K} h_k(X, e_k) \qquad (2)$$
where $e_k$ is a bootstrap sample, $X$ is input data for the test set, $h_k$ is an independent decision tree, $K$ is the number of decision trees, and $P_{\mathrm{RF}}$ is the fusion result.

The number of decision trees and the minimum number of samples in each leaf node were set to 200 and 6, respectively, as they performed best among the candidate sets {50, 100, 200, 400, 800} and {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, respectively. The number of features considered when searching for the best split was set to three.

Long short-term memory network

LSTM was proposed by Hochreiter & Schmidhuber (1997) as a variant of a recurrent neural network (RNN). The structure of the hidden layer unit is more complex in LSTM than in a traditional RNN. A basic LSTM unit consists of one memory neuron and three gates. LSTM implements temporary storage via the switches of these gates and memory neurons to prevent the problem of vanishing gradients (Yuan et al. 2020).

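LSTM processes the input vector $S_t$ at time $t$ as follows:
$$I_t = \sigma(W_I S_t + U_I H_{t-1} + b_I) \qquad (3)$$
$$F_t = \sigma(W_F S_t + U_F H_{t-1} + b_F) \qquad (4)$$
$$O_t = \sigma(W_O S_t + U_O H_{t-1} + b_O) \qquad (5)$$
$$\tilde{C}_t = \tanh(W_C S_t + U_C H_{t-1} + b_C) \qquad (6)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \qquad (7)$$
$$H_t = O_t \odot \tanh(C_t) \qquad (8)$$
where $I$ is the input gate, $F$ is the forget gate, $O$ is the output gate, $C$ is the memory neuron ($\tilde{C}$ is its candidate state), $S$ is the input vector, $H$ is the vector of the hidden layer, $W$ is the weight between the three types of gates or memory neurons and the input, $U$ is the weight between the three types of gates or memory neurons and the hidden layer states, $b$ is the bias of the three types of gates and memory neurons, $\sigma$ is the sigmoid activation function, $\tanh$ is the hyperbolic tangent, and $\odot$ denotes element-wise multiplication. If the subscript of an element is $t$, it is a variable at time $t$; if the subscript of an element is $t-1$, it is a variable at time $t-1$.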
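To make the role of the three gates and the memory neuron concrete, the sketch below performs one time step of a single-unit LSTM in plain Python, using the widely used formulation with a tanh candidate state (an assumption here). The function name, the dictionary layout of the parameters, and the numbers are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(s_t, h_prev, c_prev, W, U, b):
    """One time step of a single-unit LSTM.

    s_t: input at time t; h_prev, c_prev: hidden and memory state at t-1.
    W, U, b: dicts keyed by 'I', 'F', 'O', 'C' holding scalar parameters.
    Returns (h_t, c_t).
    """
    i_t = sigmoid(W["I"] * s_t + U["I"] * h_prev + b["I"])       # input gate
    f_t = sigmoid(W["F"] * s_t + U["F"] * h_prev + b["F"])       # forget gate
    o_t = sigmoid(W["O"] * s_t + U["O"] * h_prev + b["O"])       # output gate
    c_cand = math.tanh(W["C"] * s_t + U["C"] * h_prev + b["C"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_cand                            # updated memory
    h_t = o_t * math.tanh(c_t)                                   # updated hidden state
    return h_t, c_t


# With all parameters at zero every gate equals 0.5 and the candidate is 0,
# so the memory is simply halved at each step.
zeros = {"I": 0.0, "F": 0.0, "O": 0.0, "C": 0.0}
h_t, c_t = lstm_step(s_t=0.3, h_prev=0.0, c_prev=1.0, W=zeros, U=zeros, b=zeros)
```

The forget gate multiplying the previous memory state is what lets the unit carry information across many days without the gradient shrinking at every step.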

The fusion result was generated by adding a fully connected layer after the LSTM layers. Adam was selected as the optimizer of the entire model because it performed better than SGD and root mean square propagation (RMSprop). The number of hidden layer neurons was set to four because it performed best among the candidate set {2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40}. The learning rate was 0.001 and the batch size was 30. In addition, to prevent overfitting during model training, this study used the early stopping method (the model completes training within 1,000 iterations each year).

Accuracy evaluation methods

The CC, RMSE, and relative bias (RB) were used to evaluate the statistical error of the fusion results. CC reflects how closely the fusion precipitation data (or gridded precipitation product) track the precipitation changes in rain gauge data (RGD); RMSE reflects the average error between the fusion precipitation data (or gridded precipitation product) and RGD; and RB shows the degree to which the fusion precipitation data (or gridded precipitation product) overestimate (RB > 0) or underestimate (RB < 0) the surface rainfall.

The probability of detection (POD) and false alarm ratio (FAR) were used to reflect the precipitation detection capability of the fusion results. POD reflects the probability that the fusion precipitation data (or gridded precipitation product) correctly detect the precipitation events recorded by RGD, and FAR reflects the proportion of precipitation events reported by the fusion precipitation data (or gridded precipitation product) that were not recorded by RGD.

For precipitation data, the higher the CC and POD, the smaller the RMSE and FAR, and the closer the RB is to 0, the better the accuracy of the data. The formulas for these indicators are shown in Table 2.

Table 2

Accuracy evaluation formulas

| Index | Calculation formula | Unit |
| --- | --- | --- |
| CC | $\dfrac{\sum_{i=1}^{n}(G_i-\bar{G})(S_i-\bar{S})}{\sqrt{\sum_{i=1}^{n}(G_i-\bar{G})^2}\,\sqrt{\sum_{i=1}^{n}(S_i-\bar{S})^2}}$ | - |
| RMSE | $\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(S_i-G_i)^2}$ | mm |
| RB | $\dfrac{\sum_{i=1}^{n}(S_i-G_i)}{\sum_{i=1}^{n}G_i}\times 100\%$ | - |
| POD | $\dfrac{H}{H+M}$ | - |
| FAR | $\dfrac{F}{H+F}$ | - |

$n$ is the number of data being evaluated; $G_i$ is the value of the $i$-th sample in RGD; $S_i$ is the value of the $i$-th sample in the fusion precipitation data (or gridded precipitation product); $\bar{G}$ and $\bar{S}$ are the average values of RGD and fusion precipitation data (or gridded precipitation product), respectively; $H$ is the number of days on which both the RGD and fusion precipitation data (or gridded precipitation product) record precipitation events; $M$ is the number of days on which only RGD record precipitation events; and $F$ is the number of days on which only the fusion precipitation data (or gridded precipitation product) record precipitation events.
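The five indicators can be computed in a few lines of plain Python, as sketched below using their usual textbook definitions. The function name is hypothetical, the rain/no-rain threshold of 0.1 mm/day is an assumption (it matches the lower bound of the light rainfall class used later, but the detection threshold itself is not stated), and the input values are synthetic.

```python
import math


def evaluate(gauge, estimate, rain_threshold=0.1):
    """Return CC, RMSE, RB (%), POD and FAR of an estimate against gauge data.

    rain_threshold: daily amount (mm) at or above which a day counts as a
    precipitation event (assumed value). No guard is included for zero
    denominators (constant series, or no rain days at all).
    """
    n = len(gauge)
    g_mean = sum(gauge) / n
    s_mean = sum(estimate) / n
    cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(gauge, estimate))
    var_g = sum((g - g_mean) ** 2 for g in gauge)
    var_s = sum((s - s_mean) ** 2 for s in estimate)
    cc = cov / math.sqrt(var_g * var_s)
    rmse = math.sqrt(sum((s - g) ** 2 for g, s in zip(gauge, estimate)) / n)
    rb = sum(s - g for g, s in zip(gauge, estimate)) / sum(gauge) * 100.0

    rain_g = [g >= rain_threshold for g in gauge]
    rain_s = [s >= rain_threshold for s in estimate]
    hits = sum(1 for a, b in zip(rain_g, rain_s) if a and b)             # H
    misses = sum(1 for a, b in zip(rain_g, rain_s) if a and not b)       # M
    false_alarms = sum(1 for a, b in zip(rain_g, rain_s) if b and not a) # F
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return {"CC": cc, "RMSE": rmse, "RB": rb, "POD": pod, "FAR": far}


# Synthetic four-day example (mm/day), not data from the study.
scores = evaluate(gauge=[0.0, 2.0, 0.0, 10.0], estimate=[1.0, 2.0, 0.0, 8.0])
```

In this example the estimate reports rain on one dry day, so FAR is one third, while both gauge rain days are detected, so POD is 1.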

Overall performance

Figure 2 shows scatter plots of the original and fusion precipitation results against RGD. The original gridded precipitation products generally exhibited larger scatter against the RGD, whereas the four fused precipitation datasets were primarily concentrated near the diagonal. Box plots of the three evaluation indicators (Figure 3) show that CMORPH-BLD had the best accuracy among the original precipitation products over the entire study area, with an average CC of 0.72, an RMSE of 6.09 mm/day, and an RB of 1.00%. According to the evaluation results, the four fusion methods generally improved the accuracy of the gridded precipitation products. The RMSE and CC of the four fused precipitation datasets were substantially improved, and M-RF showed the best results among the four datasets. Compared with those of CMORPH-BLD, the RMSE of M-RF was reduced by 15.27% and the CC was increased by 12.50% (Table 3). In terms of RB, the improvements achieved by the four fusion methods were insufficient, particularly for LSTM, which had a negative bias.
Table 3

Degree of error improvement for fusion precipitation data

| Fusion data | CC of IMA (%) | RMSE of IMA (%) | CC of IMB (%) | RMSE of IMB (%) |
| --- | --- | --- | --- | --- |
| M-MLR | 29.03 | 25.03 | 11.11 | 12.64 |
| M-FNN | 29.03 | 24.89 | 11.11 | 12.48 |
| M-RF | 30.65 | 27.28 | 12.50 | 15.27 |
| M-LSTM | 30.65 | 26.16 | 12.50 | 13.96 |

IMA and IMB are the improvements in the fused precipitation data compared with the average and optimal values of the gridded precipitation products, respectively.
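A relative-improvement figure of this kind can be computed as a signed percentage change against a reference value, as sketched below. The percentage formula and the function name are assumptions, since the exact expression is not given. In the example, 0.72 and 6.09 mm/day are the CMORPH-BLD values reported above, whereas 0.81 and 5.16 mm/day are hypothetical fused values chosen only to illustrate the arithmetic.

```python
def improvement(fused, reference, higher_is_better):
    """Percentage improvement of a fused score over a reference score.

    Positive values mean the fused data are better. For IMA the reference
    would be the average of the gridded products; for IMB, the best of them.
    """
    change = (fused - reference) / abs(reference) * 100.0
    return change if higher_is_better else -change


# CC: higher is better. RMSE: lower is better.
cc_gain = improvement(0.81, 0.72, higher_is_better=True)
rmse_gain = improvement(5.16, 6.09, higher_is_better=False)
```

Flipping the sign for error-type indicators keeps "positive means better" for every column, which is also why a method that raises FAR shows up with a negative improvement.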

Figure 2

Scatter plots of the selected gridded and fusion precipitation datasets against the rain gauge data.

Figure 3

Evaluation results of the selected gridded and fusion precipitation datasets against rain gauge data.

Box plots of the evaluation of rainfall detection capability (Figure 4) show that CMORPH-BLD had the best capability of detecting rainfall events among the original precipitation products over the entire study area, with an average POD of 0.94 and an FAR of 0.43. The POD of the four fusion precipitation datasets was higher than the average of the original precipitation products; M-RF and M-LSTM showed the best results among the four datasets. Compared with that of CMORPH-BLD, which performed best among the original precipitation products, the FAR of M-RF and M-LSTM decreased by 18.60 and 20.93%, respectively. However, the FAR (0.68) of M-FNN was the worst, and the improvement degree was −53.15% when compared with the average of the gridded precipitation products (Table 4).
Table 4

Degree of improvement in the capability of detecting precipitation for the fusion precipitation data

| Fusion method | POD of IMA (%) | FAR of IMA (%) |
| --- | --- | --- |
| M-MLR | 13.52 | 23.42 |
| M-FNN | 27.55 | −53.15 |
| M-RF | 17.35 | 21.17 |
| M-LSTM | 13.52 | 23.42 |

IMA is the improvement in the fused precipitation data compared with the average of the gridded precipitation products.

Figure 4

Capability of detecting precipitation events against rain gauge data.


Accuracy evaluation for various climatic regions

The study area was divided into four climatic regions: plateau mountainous, temperate monsoon, temperate continental, and subtropical monsoon. Figure 5 shows the spatial distributions of CC, RMSE, and RB of the selected gridded and fusion precipitation datasets, and Figure 6 shows those of POD and FAR. The fusion precipitation datasets performed better than the original gridded products in all four climatic regions.
Figure 5

Spatial distribution of CC, RMSE, and RB. (a) CMORPH-BLD, (b) 3B42V7, (c) PERSIANN-CDR, (d) CHIRPS, (e) ERA5, (f) M-MLR, (g) M-FNN, (h) M-RF, and (i) M-LSTM.

Figure 6

Spatial distribution of probability of detection (POD) and false alarm ratio (FAR), corresponding to Figure 5.


Except for that of ERA5, the CC of precipitation data decreased from southeast to northwest. M-RF had the highest CC in all climatic regions. Moreover, the four fusion methods had the best effect on improving CC in the temperate monsoon climate region and reduced the spatial difference in CC in the other regions.

The RMSE of all precipitation data decreased from southeast to northwest. Additionally, the RMSE of the four fusion precipitation datasets based on MLR, FNN, RF, and LSTM was lower than that of the five gridded precipitation datasets; in all regions, M-RF had the lowest RMSE. Data fusion reduced the spatial difference in RMSE in all climatic regions.

The RB of all precipitation data decreased from southeast to northwest. Additionally, the RB of the fusion precipitation data was improved, although M-LSTM underestimated the results in the temperate monsoon and temperate continental climate regions.

The POD of the fusion precipitation datasets was significantly better than that of 3B42V7, PERSIANN-CDR, and CHIRPS; M-RF performed the best among the fusion precipitation datasets.

The FAR of the fusion precipitation datasets was lower than that of gridded precipitation datasets, except 3B42V7, and increased from southeast to northwest.

Figure 7 shows the Taylor diagrams of the selected gridded and fusion precipitation results in distinct climate zones. The four fusion results exhibited high correlation and low error with observations among all climate zones. However, they underestimated the spatial variance based on the standard deviation.
Figure 7

Taylor diagrams of the selected gridded and fusion precipitation datasets in different climate zones. (a) Subtropical monsoon climate, (b) temperate monsoon climate, (c) temperate continental climate, and (d) plateau mountainous climate.


Accuracy evaluation in different seasons

The precipitation datasets were categorized by season, including spring (March–May), summer (June–August), autumn (September–November), and winter (December–February of the following year).
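The seasonal grouping above amounts to a simple month-to-season lookup, sketched here with a hypothetical helper name. Note that January and February belong to the winter that began in the preceding December.

```python
def season_of(month):
    """Map a calendar month (1-12) to the season used for stratification."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    if month in (12, 1, 2):
        return "winter"
    raise ValueError("month must be in 1..12")


seasons = [season_of(m) for m in range(1, 13)]
```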

Figure 8 presents the CC, RMSE, RB, POD, and FAR of the selected gridded and fusion precipitation datasets for each season. The CC of all four fusion precipitation datasets for each season exceeded that of all gridded precipitation products. M-RF demonstrated the highest CC among the fusion precipitation datasets (0.82, 0.78, 0.83, and 0.88 in four seasons, respectively). However, the differences in CC between M-RF and other fusion precipitation datasets were negligible, with a maximum difference of 0.02 observed in the same season.
Figure 8

Evaluation results for the selected gridded and fusion precipitation datasets in different seasons.


The RMSE of the gridded precipitation products was the highest in summer and lowest in winter, and this trend was maintained in the fusion precipitation datasets. M-FNN performed the worst among the fusion products; nevertheless, its RMSEs (4.81, 8.25, 4.29, and 1.85 mm/day in the four seasons) were lower than those of all gridded precipitation products, whereas M-RF had the lowest RMSEs (4.61, 8.02, 4.18, and 1.68 mm/day) among the fusion precipitation products.

The RB of the gridded and fusion precipitation datasets did not show seasonal characteristics. Among the fusion methods, only RF improved RB significantly. The RB of M-RF was closer to 0 than that of all gridded precipitation products in the four seasons.

For gridded precipitation products, PODs were the highest in summer. The fusion precipitation datasets exhibited similar trends and performed better than 3B42V7 and CHIRPS. However, only the POD of M-RF was higher than those of all gridded precipitation products in spring (0.96), summer (0.97), and autumn (0.95). In winter, the POD of M-RF (0.88) was lower than that of ERA5.

The FAR of all precipitation datasets, excluding M-RF, was the highest in winter. M-MLR performed the best among the fusion precipitation datasets: its FAR was the lowest in spring (0.36), summer (0.35), and autumn (0.38) and second only to that of M-RF in winter (0.38), which was significantly lower than those of the other two fusion precipitation datasets.

M-RF showed the highest CCs and smallest RMSEs in the four seasons and the best capability of detecting daily rainfall events in spring, autumn, and winter. Furthermore, M-MLR performed the best in detecting daily rainfall events in summer when compared with other fusion precipitation datasets.

Accuracy evaluation at various altitudes

Three topographic classes were established (low-altitude, <1,500 m; medium-altitude, 1,500–3,500 m; and high-altitude, ≥3,500 m) to evaluate the performance of the four fusion methods. For each topographic condition, the fusion precipitation datasets showed better CC and RMSE than the original data (Figure 9(a)). The improvements in CC and RMSE over the original data were more significant at higher altitudes.
Figure 9

Evaluation results for the selected gridded and fusion precipitation datasets at (a) different altitudes and (b) different rainfall levels.


In the low-altitude area, M-MLR, M-FNN, M-RF, and M-LSTM showed an increase in CC by 28.05, 27.89, 30.18, and 29.81%, respectively, when compared with the average results of the gridded precipitation products. The corresponding RMSEs decreased by 24.79, 24.64, 27.07, and 25.93%. However, all precipitation datasets overestimated actual precipitation, except M-LSTM, which underestimated it.

In the medium-altitude zone, ERA5 overestimated rainfall with an RB of 43.02%. The corresponding CCs of the four fusion precipitation datasets increased by 34.52, 34.31, 36.79, and 36.81%, when compared with the average results of the gridded precipitation products. The corresponding RMSEs decreased by 28.19, 28.03, 30.39, and 29.13%.

In the high-altitude areas, the fused precipitation data showed improvements in RMSE and CC when compared with the original data. The corresponding CCs of the four fusion precipitation datasets increased by 36.54, 36.17, 39.77, and 37.98%, when compared with the average results of the gridded precipitation products. The corresponding RMSEs decreased by 27.79, 27.93, 30.34, and 29.52%.

Overall, M-RF was the best-performing method among the four fusion methods. Although the CC and RMSE improved after fusion at every altitude, the degree of improvement varied, particularly for M-LSTM.

Accuracy evaluation for different rainfall levels

Daily rainfall events were divided into three categories based on the RGD rainfall intensity: light rainfall (0.1–1 mm/day), moderate rainfall (1–50 mm/day), and heavy rainfall (≥50 mm/day). Figure 9(b) shows the CC, RMSE, and RB of the selected gridded and fusion precipitation datasets for light, moderate, and heavy precipitation events. The four fusion methods significantly improved the accuracy at each precipitation level when compared with the average results of the gridded precipitation products.
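The grouping by rainfall level can be illustrated with a short sketch. It assumes half-open intervals at the class boundaries (a gauge value of exactly 1 mm/day is treated as moderate), which the text does not specify, and it uses the commonly applied definition of relative bias, RB = 100 × Σ(estimate − observation) / Σ(observation). The function names are hypothetical.

```python
def rainfall_level(gauge_mm_per_day):
    """Classify a daily event by gauge (RGD) intensity; boundaries assumed half-open."""
    if gauge_mm_per_day < 0.1:
        return None  # below the light-rainfall threshold: not counted as an event
    if gauge_mm_per_day < 1.0:
        return "light"
    if gauge_mm_per_day < 50.0:
        return "moderate"
    return "heavy"

def relative_bias(est, obs):
    """Relative bias in percent; positive means overestimation (assumed definition)."""
    return 100.0 * sum(e - o for e, o in zip(est, obs)) / sum(obs)

def bias_by_level(pairs):
    """pairs: iterable of (estimate, gauge_observation) in mm/day."""
    groups = {}
    for est, obs in pairs:
        level = rainfall_level(obs)
        if level is None:
            continue
        est_list, obs_list = groups.setdefault(level, ([], []))
        est_list.append(est)
        obs_list.append(obs)
    return {level: relative_bias(e, o) for level, (e, o) in groups.items()}
```

Classifying on the gauge value rather than on the product value keeps the event sets identical across all products, so their metrics are computed over the same days.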

For light rainfall, the CC of M-MLR, M-FNN, M-RF, and M-LSTM increased by 85.19, 85.19, 103.70, and 66.67%, respectively, when compared with the average results of the gridded precipitation products; RMSE decreased by 30.28, 31.95, 42.77, and 56.94%, respectively. All the gridded precipitation products overestimated the actual rainfall. The M-LSTM had the best RMSE and RB performance.

For moderate rainfall, the corresponding CCs of the four fusion precipitation datasets increased by 40.76, 38.66, 42.86, and 42.86%, respectively, when compared with the average results of the gridded precipitation products; RMSEs decreased by 25.90, 25.24, 25.05, and 24.48%, respectively.

For heavy rainfall, the RMSE of the gridded precipitation products was greater than that for light and moderate rainfall. The gridded precipitation products underestimated heavy rainfall. The corresponding CCs of four fusion precipitation datasets increased by 42.86, 40.00, 37.14, and 40.00%, respectively, when compared with the average results of the gridded precipitation products; RMSEs decreased by 19.54, 19.48, 21.82, and 17.31%, respectively.

Overall, the products captured rainfall less accurately as rainfall intensity increased. LSTM performed the best under light rainfall, whereas MLR and RF exhibited better performance under moderate and heavy rainfall levels, respectively.

Robustness test

A robustness check was conducted to evaluate the four fusion methods. Goodfellow et al. (2014) showed that small perturbations applied to the inputs of neural networks can cause the networks to output incorrect answers. Therefore, for each grid point in the Yangtze River Delta, perturbations were applied to the validation data of one randomly selected day in 2015. The perturbations introduced errors, drawn from the standard normal distribution, into one randomly chosen gridded precipitation product. Subsequently, MLR, FNN, RF, and LSTM were used to fuse the perturbed data and generate four datasets: error data based on MLR (E-MLR), FNN (E-FNN), RF (E-RF), and LSTM (E-LSTM). The test was repeated 10 times. The RGD were interpolated by inverse distance weighting to serve as the truth value, and the RMSEs between each dataset and the truth value were calculated (Figure 10). In the robustness test, the mean RMSE of E-LSTM (8.71 mm/day) was the lowest, followed by those of E-RF (9.00 mm/day) and E-MLR (9.09 mm/day); the mean RMSE of E-FNN (9.34 mm/day) was the highest. The RMSEs in the robustness test were all larger than those of the corresponding M-MLR, M-FNN, M-RF, and M-LSTM. Moreover, the RMSE of E-LSTM increased by only 86.11%, which was less than the increases for E-MLR (97.18%), E-FNN (100.86%), and E-RF (95.23%). These results indicate that LSTM is more robust than MLR, FNN, and RF when fusing multiple precipitation datasets.
Figure 10

RMSEs of the data fused by error data and truth value.
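The perturbation procedure can be sketched as below. This is only an illustration under stated assumptions: `mean_fusion` is a simple averaging stand-in, not one of the four trained models used in the study; the dictionary layout, function names, and seed handling are hypothetical; and the perturbed values are not clipped at zero because the text does not say whether that was done.

```python
import math
import random

def rmse(est, truth):
    """Root mean square error between a fused series and the truth value."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth))

def perturb_one_product(products, rng):
    """Add standard normal errors to one randomly chosen product; inputs are left unchanged."""
    target = rng.choice(sorted(products))
    noisy = {name: list(values) for name, values in products.items()}
    noisy[target] = [v + rng.gauss(0.0, 1.0) for v in noisy[target]]
    return target, noisy

def mean_fusion(products):
    """Placeholder fusion (plain average); a trained MLR, FNN, RF, or LSTM would go here."""
    names = sorted(products)
    length = len(products[names[0]])
    return [sum(products[name][i] for name in names) / len(names) for i in range(length)]

def percent_increase(rmse_error, rmse_clean):
    """Relative RMSE increase (%) caused by the perturbation."""
    return 100.0 * (rmse_error - rmse_clean) / rmse_clean

def robustness_trial(products, truth, fuse, repeats=10, seed=0):
    """Return (clean RMSE, mean perturbed RMSE, percentage increase) over `repeats` runs."""
    rng = random.Random(seed)
    clean = rmse(fuse(products), truth)
    perturbed = []
    for _ in range(repeats):
        _, noisy = perturb_one_product(products, rng)
        perturbed.append(rmse(fuse(noisy), truth))
    mean_error = sum(perturbed) / repeats
    return clean, mean_error, percent_increase(mean_error, clean)
```

As a consistency check on the reported figures, an 86.11% increase to 8.71 mm/day implies an unperturbed RMSE of about 8.71 / 1.8611 ≈ 4.68 mm/day for M-LSTM, and `percent_increase(8.71, 4.68)` returns approximately 86.11.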

In this paper, we validate four conventional machine learning models (MLR, FNN, RF, and LSTM) for improving rainfall estimation by fusing five mainstream rainfall products, and we highlight their performance under different conditions over the Chinese mainland. Overall, based on the grid-to-grid fusion of multi-source data, the four models showed higher rainfall detection accuracies and capabilities in the entire region, as well as superior performances across the different climatic regions, seasons, elevations, and rainfall levels. This comprehensive evaluation, which is not limited to specific climatic regions or thresholds, can provide useful information for the selection of fusion approaches, in particular machine learning algorithms.

In general, according to the results for each indicator, conventional machine learning-based fusion approaches improve the satellite and reanalysis rainfall products by up to approximately 30%. However, the four models differed in accuracy. First, RF outperformed the other machine learning models, especially in reducing data bias (Section 3.1, RB of 0.20%). Its efficiency and robustness to overfitting were also confirmed by Nguyen et al. (2021) and Zhang et al. (2021). In a comparable evaluation conducted by Lei et al. (2022), the RF model was also superior to the gradient boosting decision tree. Its bagging-based ensemble learning yields more robust predictions by aggregating multiple decision trees. Furthermore, FNN is prone to false alarms on rainy days, with the worst FAR (0.44), followed by LSTM (0.42; see Figures 2 and 4). Previous studies have pointed out that correct classification of rainy and no-rain days is key to estimating rainfall accurately (Chen et al. 2023). The fusion data based on the LSTM model combined weak event detection with a serious underestimation of precipitation (RB of −21.38%). Because no-rain days dominate the sample, the strong time-series dependence of LSTM limits its capacity (Sheng et al. 2023); the model tends to raise false alarms and, in particular, to underestimate heavy precipitation.
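For reference, the false alarm ratio discussed above can be computed from a rain/no-rain contingency count. The sketch below uses the standard definition, FAR = false alarms / (hits + false alarms); the 0.1 mm/day rain threshold and the function names are assumptions for this example rather than details taken from the study.

```python
def contingency(est, obs, threshold=0.1):
    """Count hits, misses, and false alarms for daily rain/no-rain detection."""
    hits = misses = false_alarms = 0
    for e, o in zip(est, obs):
        est_rain, obs_rain = e >= threshold, o >= threshold
        if est_rain and obs_rain:
            hits += 1
        elif est_rain:
            false_alarms += 1  # product reports rain, gauge does not
        elif obs_rain:
            misses += 1        # gauge reports rain, product does not
    return hits, misses, false_alarms

def false_alarm_ratio(est, obs, threshold=0.1):
    """FAR in [0, 1]; 0 is perfect. Returns NaN when the product never reports rain."""
    hits, _, false_alarms = contingency(est, obs, threshold)
    detected = hits + false_alarms
    return false_alarms / detected if detected else float("nan")
```

Under this definition, a FAR of 0.44 means that 44% of the days on which a product reported rain were dry according to the gauges.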

Notably, despite the promising results derived from these fusion models, all fusion results faced challenges in capturing the extremes and overestimated light rainfall events. Because extremes are sparsely sampled, differences in performance between the original and fusion datasets can be masked in overall statistics; for example, the statistical metrics revealed serious discrepancies in the accuracy of the fusion data only when the data were separated by rainfall level. The principal factor contributing to these biases is the systematic error in the original rainfall products (AghaKouchak et al. 2012).

With the development of machine and deep learning techniques, hybrid multi-models have shown tremendous potential in emulating non-homogeneous precipitation data and can be adopted to address these issues efficiently (Reichstein et al. 2019). Generally, the hybrid modelling approach concatenates two or more machine or deep learning models to obtain more accurate results than a single model (Ahmed et al. 2023; Jena et al. 2021). According to the review by Reichstein et al. (2019), hybrid models offer an opportunity for the development of deep learning in earth system science.

Zhang et al. (2021) developed a double machine learning method by combining the RF model with three other machine learning methods to fuse rainfall data. Their comparative results confirmed that the reliability of the model could be enhanced by increasing rain gauge densities and training sizes. Moreover, networks that combine multiple machine learning methods have been used to generate satellite-gauge fusion rainfall datasets. Wu et al. (2020) combined the CNN and LSTM models to fuse a satellite-based product and RGD over China. According to their results, the hybrid fusion model outperformed the individual models and achieved better accuracy under different precipitation intensities. Consequently, based on our analysis in the present study, subsequent work will emphasize the application of hybrid multi-machine learning models to improve the fused data in extreme precipitation estimates.

In the present study, we investigated four machine learning fusion approaches to enhance the estimation of gridded precipitation products over the Chinese mainland. The conclusions are summarized as follows:

  • (1) In general, all four fusion methods reduced the error. CMORPH-BLD and M-RF had the highest accuracies among the gridded and fusion precipitation products, respectively.

  • (2) Excluding RB, which did not show seasonal characteristics, the fusion precipitation datasets exhibited the same seasonal variations as the gridded precipitation products. Moreover, M-RF had the lowest error in the four seasons and the best capability for detecting daily precipitation events in spring, autumn, and winter. M-MLR performed better than the other fusion precipitation datasets in detecting daily precipitation events in summer.

  • (3) Under all topographic conditions, the four fusion methods improved the RMSE and CC of the precipitation data. The improvements in CC and RMSE became more pronounced with increasing altitude.

  • (4) All fusion methods performed well under light and heavy precipitation events, but their accuracy in capturing precipitation remained insufficient. LSTM performed best under light precipitation events, whereas MLR and RF exhibited superior performance under moderate and heavy precipitation events, respectively.

Despite the promising results derived from the four machine learning models, their estimates of extreme and light rainfall events remain inadequate. Therefore, bias correction should be considered to maximize the fusion accuracies. In particular, hybrid multi-models can be applied to address such issues in future research.

This research was funded by the National Natural Science Foundation of China (42171389, 41730642) and the Natural Science Foundation of Shanghai (19ZR1437500). In addition, we thank the student Maopu Xu from the University of California, Los Angeles for his hard work in data collection and processing.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Adnan R. M., Mostafa R. R., Dai H. L., Heddam S., Kuriqi A. & Kisi O. 2023 Pan evaporation estimation by relevance vector machine tuned with new metaheuristic algorithms using limited climatic data. Engineering Applications of Computational Fluid Mechanics 17 (1), 2192258.
AghaKouchak A., Mehran A., Norouzi H. & Behrangi A. 2012 Systematic and random error components in satellite precipitation data sets. Geophysical Research Letters 39 (9), L09406.
Ahmed A., Deo R. C., Feng Q., Ghahramani A., Raj N., Yin Z. L. & Yang L. S. 2023 Hybrid deep learning method for a week-ahead evapotranspiration forecasting. Stochastic Environmental Research and Risk Assessment 36 (3), 831–849.
Baez-Villanueva O. M., Zambrano-Bigiarini M., Beck H. E., McNamara I., Ribbe L., Nauditt A., Birkel C., Verbist K., Giraldo-Osorio J. D. & Thinh N. X. X. 2020 RF-MEP: a novel random forest method for merging gridded precipitation products and ground-based measurements. Remote Sensing of Environment 239, 111606.
Beck H. E., van Dijk A. I. J. M., Levizzani V., Schellekens J., Miralles D. G., Martens B. & de Roo A. 2017 MSWEP: 3-hourly 0.25° global gridded precipitation (1979–2015) by merging gauge, satellite, and reanalysis data. Hydrology and Earth System Sciences 21 (1), 589–615.
Beck H. E., Wood E. F., Pan M., Fisher C. K., Miralles D. G., van Dijk A. I. J. M., McVicar T. R. & Adler R. F. 2019 MSWEP v2 global 3-hourly 0.1° precipitation: methodology and quantitative assessment. Bulletin of the American Meteorological Society 100 (3), 473–500.
Bhuiyan M. A. E., Nikolopoulos E. I., Anagnostou E. N., Quintana-Seguí P. & Barella-Ortiz A. 2018 A nonparametric statistical technique for combining global precipitation datasets: development and hydrological evaluation over the Iberian Peninsula. Hydrology and Earth System Sciences 22 (2), 1371–1389.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Chen H., Wen D., Du Y., Xiong L. & Wang L. 2023 Errors of five satellite precipitation products for different rainfall intensities. Atmospheric Research 285, 106622.
Demaria E. M. C., Hazenberg P., Scott R. L., Meles M. B., Nichols M. & Goodrich D. 2019 Intensification of the North American monsoon rainfall as observed from a long-term high-density gauge network. Geophysical Research Letters 46 (12), 6839–6847.
Ehsan Bhuiyan M. A., Nikolopoulos E. I. & Anagnostou E. N. 2019 Machine learning-based blending of satellite and reanalysis precipitation datasets: a multiregional tropical complex terrain evaluation. Journal of Hydrometeorology 20 (11), 2147–2161.
Fan Z. D., Li W., Jiang Q., Sun W., Wen J. & Gao J. 2021 A comparative study of four merging approaches for regional precipitation estimation. IEEE Access 9, 33625–33637.
Goodfellow I. J., Shlens J. & Szegedy C. 2014 Explaining and harnessing adversarial examples. arXiv:1412.6572.
Hashemi H., Nordin M., Lakshmi V., Huffman G. J. & Knight R. 2017 Bias correction of long-term satellite monthly precipitation product (TRMM 3b43) over the conterminous United States. Journal of Hydrometeorology 18 (9), 2491–2509.
Hersbach H., Bell B., Berrisford P., Hirahara S., Horányi A., Muñoz-Sabater J., Nicolas J., Peubey C., Radu R., Schepers D., Simmons A., Soci C., Abdalla S., Abellan X., Balsamo G., Bechtold P., Biavati G., Bidlot J., Bonavita M., Chiara G., Dahlgren P., Dee D., Diamantakis M., Dragani R., Flemming J., Forbes R., Fuentes M. & Geer A. 2020 The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 146 (730), 1999–2049.
Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.
Ikram R. M. A., Mostafa R. R., Chen Z., Parmar K. S., Kisi O. & Zounemat-Kermani M. 2023 Water temperature prediction using improved deep learning methods through reptile search algorithm and weighted mean of vectors optimizer. Journal of Marine Science and Engineering 11 (2), 259.
Janjić T., Bormann N., Bocquet M., Carton J. A., Cohn S. E., Dance S. L., Losa S. N., Nichols N. K., Potthast R., Waller J. A. & Weston P. 2018 On the representation error in data assimilation. Quarterly Journal of the Royal Meteorological Society 144 (713), 1257–1278.
Jena B., Saxena S., Nayak G. K., Saba L., Sharma N. & Suri J. S. 2021 Artificial intelligence-based hybrid deep learning models for image classification: The first narrative review. Computers in Biology and Medicine 137, 104803.
Jiang Q., Li W., Fan Z., He X., Sun W., Chen S., Wen J., Gao J. & Wang J. 2021 Evaluation of the ERA5 reanalysis precipitation dataset over Chinese mainland. Journal of Hydrology 595, 125660.
Korkmaz M. 2021 A study over the general formula of regression sum of squares in multiple linear regression. Numerical Methods for Partial Differential Equations 37 (1), 406–421.
Lewis E., Fowler H., Alexander L., Dunn R., McClean F., Barbero R., Guerreiro S., Li X. F. & Blenkinsop S. 2019 GSDR: a global sub-daily rainfall dataset. Journal of Climate 32 (15), 4715–4729.
Ma M., Wang H., Jia P., Tang G., Wang D., Ma Z. & Yan H. 2020 Application of the GPM-IMERG products in flash flood warning: a case study in Yunnan, China. Remote Sensing 12 (12), 1954.
Marzuki M., Suryanti K., Yusnaini H., Tangang F., Muharsyah R., Vonnisa M. & Devianto D. 2021 Diurnal variation of precipitation from the perspectives of precipitation amount, intensity and duration over Sumatra from rain gauge observations. International Journal of Climatology 41 (8), 4386–4397.
Mostafa R. R., Kisi O., Adnan R. M., Sadeghifar T. & Kuriqi A. 2023 Modeling potential evapotranspiration by improved machine learning methods using limited climatic data. Water 15 (3), 48.
Nguyen P., Shearer E. J., Tran H., Ombadi M., Hayatbini N., Palacios T., Huynh P., Braithwaite D., Updegraff G., Hsu K., Kuligowski B., Logan W. S. & Sorooshian S. 2019 The CHRS data portal, an easily accessible public repository for PERSIANN global satellite precipitation data. Scientific Data 6 (1), 180296.
Nguyen G. V., Le X. H., Van L. N., Jung S., Yeon M. & Lee G. 2021 Application of random forest algorithm for merging multiple satellite precipitation products across South Korea. Remote Sensing 13 (20), 4033.
Pradhan R. K., Markonis Y., Vargas Godoy M. R. V., Villalba-Pradas A., Andreadis K. M., Nikolopoulos E. I., Papalexiou S. M., Rahim A., Tapiador F. J. & Hanel M. 2022 Review of GPM IMERG performance: a global perspective. Remote Sensing of Environment 268, 112754.
Quagraine K. A., Nkrumah F., Klein C., Klutse N. A. B. & Quagraine K. T. 2020 West African summer monsoon precipitation variability as represented by reanalysis datasets. Climate 8 (10), 111.
Reichstein M., Camps-Valls G., Stevens B., Jung M., Denzler J., Carvalhais N. & Prabhat 2019 Deep learning and process understanding for data-driven earth system science. Nature 566 (7743), 195–204.
Sheng S., Chen H., Lin K., Zhou N., Tian B. & Xu C. Y. 2023 An integrated framework for spatiotemporally merging multi-sources precipitation based on F-SVD and ConvLSTM. Remote Sensing 15 (12), 3135.
Skofronick-Jackson G., Kirschbaum D., Petersen W., Huffman G., Kidd C., Stocker E. & Kakar R. 2018 The global precipitation measurement (GPM) mission's scientific achievements and societal contributions: reviewing four years of advanced rain and snow observations. Quarterly Journal of the Royal Meteorological Society 144 (51), 27–48.
Steiner M., Houze R. A. Jr. & Yuter S. E. 1995 Climatological characterization of three-dimensional storm structure from operational radar and rain gauge data. Journal of Applied Meteorology and Climatology 34 (9), 1978–2007.
Sun Q. H., Miao C., Duan Q., Ashouri H., Sorooshian S. & Hsu K.-L. 2018 A review of global precipitation data sets: data sources, estimation, and intercomparisons. Reviews of Geophysics 56 (1), 79–107.
Wang C. G., Tang G. Q. & Gentine P. 2021a PrecipGAN: merging microwave and infrared data for satellite precipitation estimation using generative adversarial network. Geophysical Research Letters 48 (5), GL092032.
Wu H. C., Yang Q. L., Liu J. M. & Wang G. Q. 2020 A spatiotemporal deep fusion model for merging satellite and gauge precipitation in China. Journal of Hydrology 584, 124664.
Xue H. Z. & Cui H. W. 2019 Research on image restoration algorithms based on BP neural network. Journal of Visual Communication and Image Representation 59, 204–209.
Yin J. B., Guo S., Gu L., Zeng Z., Liu D., Chen J., Shen Y. & Xu C.-Y. 2021 Blending multi-satellite, atmospheric reanalysis and gauge precipitation products to facilitate hydrological modelling. Journal of Hydrology 593, 125878.
Yuan X., Chen C., Lei X., Yuan Y. & Muhammad Adnan R. 2018 Monthly runoff forecasting based on LSTM–ALO model. Stochastic Environmental Research and Risk Assessment 32, 2199–2212.
Yuan X., Li L., Shardt Y. A. W., Wang Y. & Yang C. 2020 Deep learning with spatiotemporal attention-based LSTM for industrial soft sensor model development. IEEE Transactions on Industrial Electronics 68 (5), 4404–4414.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).