Abstract
Satellite-based and reanalysis precipitation products have the potential to overcome the lack of information in regions where rain gauges are absent or too sparse to support a hydrological study. The Google Earth Engine (GEE) data analysis platform hosts products with global coverage that offer different kinds of geospatial information for estimating the amount of precipitation. However, it is necessary to evaluate the reliability of these products. Precipitation information in Mexico is biased because of the scarcity of gauging stations, operational failures, access difficulties, and data capture errors. This study evaluates the reliability of satellite and reanalysis precipitation products hosted in the GEE repository against rain gauge observations from 2001 to 2017 using data from 4,658 stations over Mexico. The evaluation was carried out using statistical indicators, comparing the behavior across different topographic, climatic, and temporal conditions. The results indicate that the performance of the products hosted in GEE seems to depend on elevation in the different climatic regions of Mexico. All products capture the general precipitation patterns at annual, seasonal, and monthly scales; however, their accuracy is clearly lower at a daily scale. All products are highly biased for low precipitation events.
HIGHLIGHTS
The intensity and distribution of precipitation in Mexico depend on topography and climatic region.
The ERA5 product showed the lowest correlations over the entire region, while CHIRPS is the product with the best score.
The products correlate more strongly with gauge data in the regions and seasons with the most rainfall.
INTRODUCTION
Over the last decades, urban areas' growth and the development of new cities have led to a constant search for new and different forms of water supply, which generates a need to study and know the behavior of rainfall to plan efficient and effective use of water resources (Manz 2016). Water resources management seriously lacks in situ meteorological monitoring stations in remote regions. It represents one of the most critical challenges decision-makers and hydrologists face (Behrangi et al. 2011).
Rainfall is a fundamental process in the hydrological cycle and has a vital role in climatology and regional meteorology. Measuring precipitation at a given location using surface instruments is relatively straightforward. However, although rain gauges give a reliable reading at the point of measurement, they are limited by their spatial distribution. In some cases, the area covered by a single station becomes too large for that reading to represent regional precipitation, reducing the level of confidence in the reading. The large spatial and temporal variability, type, and occurrence of precipitation make measurements over large regions complex (Ogbu et al. 2020). Rain gauges experience severe problems in estimating precipitation over short periods (Hou et al. 2014) or in regions with poor measurement networks, a common problem in developing countries (Ulloa et al. 2017).
The correct estimation of the precipitation that falls in a particular area becomes very important for various research purposes, such as hydrological modeling, management of water resources, among others (Xu et al. 2017). Many authors have highlighted the interaction of climatic systems and the heterogeneous topography of a region as factors that affect the capture of precipitation over short distances (Nesbitt & Anders 2009; Manz 2016). Increasingly precise quantitative estimates of rainfall are essential, so there is a need for increased accuracy and appropriate spatial distribution. Accurate information on precipitation distribution is the basis for understanding regional changes, optimizing water resource management, and improving the ability to cope with disasters such as droughts and flooding (Hu et al. 2019).
The density of rain gauges is not homogeneous in Mexico. This is a significant disadvantage when quantifying the amount of precipitation over a region (Rodriguez et al. 2018). To solve this problem, interpolation methods are applied to improve the precision of the measurement, giving a more reliable source of information. The basic principle of most interpolation estimation methods is to transform the point values of precipitation through spatial weights to represent the distribution of rainfall over an area (Borges et al. 2016). However, distribution in extensive regions is still an issue to solve.
In this regard, remote sensing has become a helpful tool that provides information in areas that are not accessible and where it would be almost impossible to measure rainfall with rain gauges (Hong et al. 2003). Precipitation estimates from meteorological satellites help to overcome the limitations of station-based observation networks. In recent decades, advances in technology have allowed the development of several global precipitation technologies, facilitating many satellite-derived global precipitation products (Chen & Li 2016). Several free-access satellite precipitation products have been extensively studied, verified both globally and regionally, and subsequently released for public use (Tang et al. 2016). Only a few studies have analyzed and evaluated these products in the Mexican territory (Miranda 2002; Kucieńska et al. 2012; Mayor et al. 2017). The quality of precipitation input data largely determines the successful performance of any hydrological application in Mexico. Thus, it is essential to evaluate their quality (Velázquez et al. 2002). Some satellites are dedicated to capturing precipitation data; their importance lies in monitoring large areas where there are usually not enough weather stations, since stations tend to be located close to cities for ease of management. As mentioned before, this siting limits the ability of ground stations to represent a large region adequately, because conditions across the region are generalized from the immediate surroundings of the stations. Google Earth Engine (GEE) is a free cloud-based service for processing and storing satellite images, capable of managing a petabyte-scale archive through the JavaScript and Python programming languages. The information stored covers more than 40 years of satellite images for the entire world at different temporal and spatial scales (Gorelick et al. 2017; Kumar & Mutanga 2018; Mutanga & Kumar 2019).
Cloud computing systems such as GEE make it simpler and faster to process large quantities of satellite images covering extended periods, without the need for high-performance computing equipment (Kumar & Mutanga 2018). Various factors can affect the precision of satellite estimates and hydrological models. In areas with complex topography, which has been shown to significantly affect precipitation behavior, heavy rains concentrated in small areas and short periods are difficult to capture by satellite means, so increased accuracy is needed (Amjad et al. 2020).
GEE precipitation products offer significant capabilities: they make it possible to know the state and conditions of regions for which ground stations cannot provide information. Information gaps can be filled where there are no meteorological stations nearby. Although this addresses the gap, the information may be erroneous or inaccurate. In practice, such gaps are usually filled by interpolation.
This research compares and evaluates the GEE rainfall products that cover Mexico: ECMWF Reanalysis 5th Generation (ERA5), Integrated Multi-satellite Retrievals for GPM (IMERG), Daily Surface Weather Data (DAYMET), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), Tropical Rainfall Measuring Mission (TRMM), and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). The spatial resolution of these products varies according to the source (between 1 and 31 km per pixel), and the evaluation spans 2001 to 2017. Given the large amount of information that GEE rainfall products offer, it is possible to study the impact of rainfall in multiple areas and periods without the need for high-performance computing equipment. This article verifies the usability of these products in Mexico. The factors that affect rainfall capture in the gridded products are examined to provide a new approach to analyzing hydrological, meteorological, and climatic processes.
The main advantage of GEE rainfall products lies in the platform's remarkable processing capabilities, which remove the need to download the data because the processing is outsourced to Google servers. In this context, it becomes imperative to evaluate the capacity of the precipitation datasets available on the platform to represent spatial and temporal precipitation patterns against locally available ground-based observations in different countries. Although there have been studies on the performance of satellite-based precipitation products in Mexico (Montero-Martínez et al. 2012; Real-Rangel et al. 2017; Mendoza 2019; Morales-Velázquez et al. 2021; Yuan et al. 2021), an evaluation of several products that integrates the GEE platform is lacking. The TRMM precipitation product has been the most extensively validated in Mexico. On a worldwide scale, there are studies in which precipitation products hosted in GEE have been used. For example, Zeng et al. (2019) analyzed the spatio-temporal distribution of precipitation and the presence of trends in the Zambezi River basin in Zimbabwe. Sazib et al. (2018) evaluated the usefulness of precipitation and soil moisture products in characterizing drought in South Africa and Ethiopia. Elnashar et al. (2020) also used GEE products in their basin-scale study of the Mekong River.
This study aimed to characterize and analyze which GEE rainfall products have the best performance. Previous studies show that satellite precipitation estimates carry considerable uncertainty but are nevertheless useful. Rain gauges, in turn, are limited by their spatial distribution. Many authors have also highlighted the interaction of climatic systems and the heterogeneous topography of a region as factors that affect the capture of precipitation over short distances, particularly where the density of rain gauges is not homogeneous, as in Mexico and other parts of the globe. To the best of our knowledge, there are no published studies with similar season-related findings that involve a comparable number of meteorological stations in the study area.
DATA AND METHODS
Study area
The analysis covers the whole of Mexico, which is located between 14° and 32° North and 86° and 118° West, as shown in Figure 1. Mexico shares 3,152 km of border to the north with the United States of America (USA) and has a continental territorial area of 1,959,248 km² (INEGI 2019). It is important to note that Mexico has a very rugged topography, with several mountain systems such as the Sierra Madre Occidental, the Sierra Madre Oriental, and the Mexican Altiplano (Rzedowski 2006; Mayor et al. 2017).
Distribution of the average annual precipitation in the country, as well as its relationship with the natural region: (a) the terrain elevation with the distribution of the annual precipitation from rain gauge data and the bottom (b) represents the climate zones.
Mexico has a wide variety of climates. The northwest and middle of the country, which cover two-thirds of the territory, are arid or semi-arid regions (Table 1) with an annual rainfall of less than 600 mm. By contrast, the southeast is humid, with average rainfall that sometimes exceeds 2,000 mm per year (CONAGUA 2018). Most of the annual precipitation in Mexican territory falls from June to September during the rainy season (Figure 2). The spatial distribution of annual mean precipitation is very irregular, as shown in Figure 1 for 2001–2017. The most significant variation in the precipitation regime occurs between the south and the north of Mexico.
Main climate regions of Mexico and their percentage distribution (INEGI 2019)
Climate regions | Temperate | Humid tropic | Sub-humid tropic | Arid |
---|---|---|---|---|
Mexico (%) | 23.4 | 12.2 | 16.1 | 48.3 |
Datasets
This section describes the precipitation data sources used and their general characteristics. Free global sources of information, such as satellite data, are used to obtain the hydroclimatic variables needed to simulate hydrological processes and carry out water balance studies in Mexico (Real-Rangel et al. 2017). These products are essential in Mexico and other developing countries, where in situ observations are scarce. Satellite products were selected if they had the longest historical record and had already been evaluated for accuracy in some regions of Mexico (De Jesús et al. 2016; Beck et al. 2017; Perdigón-Morales et al. 2018).
We selected the products hosted in GEE that had relevant spatial and temporal coverage for this study (over Mexico and spanning from January 2001 to December 2017, respectively), manually reviewing all the products in GEE that offered precipitation information, supported by the platform's own filters.
This study used precipitation data over Mexico from satellite products and meteorological stations covering 17 years, from 1 January 2001 to 31 December 2017. The study period was determined by the joint availability of the two data sources, satellite products and ground gauges.
Rain gauge datasets
The National Weather Service of Mexico (Servicio Meteorológico Nacional, SMN) is the primary source of climatological data in Mexico and is responsible for gathering and disseminating the information (CONAGUA 2018). The National Water Commission in Mexico (Comisión Nacional del Agua, CONAGUA) deploys a network of weather stations distributed across the Mexican territory. The database offers daily precipitation data reported every 24 h. In this study, the daily precipitation data from 4,568 climatological stations (with different coverage areas for each station, depending on their density) form the database used to assess the accuracy of satellite precipitation products. We obtained the information from the CONAGUA website (//smn.conagua.gob.mx/es/climatologia/informacion-climatologica/informacion-estadistica-climatologica). Two filters were applied to the gauging stations to improve the quality of our database (Figure 3). The first quality-control filter counts the number of complete years that each climatological station contains, where a ‘complete’ year is one containing at least 90% of the measurements in its time series. The second filter keeps for the analysis only those stations that had at least 10 years of information since 2001.
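The two quality-control filters can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the table layout (one column per station, a daily `DatetimeIndex`, `NaN` for missing days) and the function name `select_stations` are hypothetical, and the second filter is read here as "at least 10 complete years".

```python
import pandas as pd


def select_stations(daily, min_coverage=0.90, min_years=10):
    """Return the station columns that pass both quality-control filters.

    daily: DataFrame indexed by date, one column per station (hypothetical
    layout), with NaN wherever a daily measurement is missing.
    """
    years = daily.index.year
    # Number of valid daily records per calendar year and station.
    valid = daily.notna().groupby(years).sum()
    # Calendar length of each year (365 or 366 days).
    days = pd.Series(
        [pd.Timestamp(year=int(y), month=12, day=31).dayofyear for y in valid.index],
        index=valid.index,
    )
    # Filter 1: a year is "complete" with at least 90% of its daily records.
    complete = valid.div(days, axis=0) >= min_coverage
    # Filter 2: keep stations with at least 10 complete years (assumed reading).
    n_complete = complete.sum()
    return n_complete[n_complete >= min_years].index.tolist()
```

Calling `select_stations(daily)` on such a table returns the list of station identifiers to keep; the thresholds are exposed as parameters so that other completeness criteria can be tried.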
Distribution of average precipitation for each season and its relation to climate zones.
Satellite precipitation datasets
There are various products with precipitation data; however, only six of them have the spatial (Mexico) and temporal (January 2001 to December 2017) coverage required for our study. The products used in this paper are described with their main characteristics in Table 2. The products have different spatial and temporal resolutions; the temporal resolution ranges from half an hour (IMERG) to daily. For spatial resolution, the product with the finest resolution is DAYMET with 1 km, while ERA5, PERSIANN-CDR, and TRMM have the coarsest resolution of 0.25° (31 km).
Precipitation products used
Product | Spatial resolution | Temporal resolution | Unit | Description |
---|---|---|---|---|
ERA5 | 0.25° (31 km) | Daily | m/day | Built between the ECMWF and the Copernicus Climate Change Service as a source of climatological information with global coverage, offering data from 1979 to the present (Hersbach et al. 2018). |
DAYMET | 1 km | Daily | mm/day | It is a project launched by the NASA Terrestrial Ecology Program, collecting different climatological variables for the territories of North America and Puerto Rico from the year 1980 (Thornton et al. 2016). |
PERSIANN-CDR | 0.25° (31 km) | Daily | mm/day | Offers precipitation information from 1982 to the present, distributed by the National Oceanic and Atmospheric Administration (NOAA) (Sorooshian et al. 2014). |
TRMM 3B42 | 0.25° (31 km) | 3 h | mm/h | It is a project developed in late 1997 between the National Aeronautics and Space Administration (NASA) and the Japanese Aerospace Exploration Agency (JAXA), its coverage range comprises 60° S to 60 °N and 0° to 360° longitude (Lujano Laura et al. 2015). |
IMERG | 0.10° (11 km) | Half an hour | mm/h | It consists of a constellation of satellites launched in 2014 as a successor project to the TRMM 3B42 by NASA and JAXA, improving temporal and spatial resolution (Maghsood et al. 2019). |
CHIRPS | 0.05° (5.6 km) | Daily | mm/day | Developed by the Climate Hazards Center (CHC) with four decades of operation, it offers high-resolution satellite precipitation data with a coverage range between 50° S to 50° N and 0° to 360° longitude (Yu et al. 2020). |
Compiled from the GEE platform.
Note: All GEE precipitation products used in this study are in CSV data format.
The rainfall products hosted on the GEE platform have a gridded format that provides continuous spatial coverage over the area, with two principal temporal resolutions, daily and sub-daily. The gauge data, in contrast, are point measurements of daily rainfall recorded by the CONAGUA meteorological stations (//smn.conagua.gob.mx/es/) located throughout Mexico.
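Comparing a gridded product with a point gauge requires choosing which grid cell represents each station. The text does not state how this pairing was done, so the following is only a minimal sketch assuming a regular latitude/longitude grid and a nearest-cell-centre rule; the function name and array layout are hypothetical.

```python
import numpy as np


def nearest_cell_value(grid, lats, lons, st_lat, st_lon):
    """Value of the grid cell whose centre is closest to a station.

    grid[i, j] is assumed to hold the value of the cell centred at
    (lats[i], lons[j]) on a regular latitude/longitude grid.
    """
    i = int(np.abs(np.asarray(lats, dtype=float) - st_lat).argmin())
    j = int(np.abs(np.asarray(lons, dtype=float) - st_lon).argmin())
    return grid[i, j]
```

With a 0.25° grid, a station is paired with a cell that may represent several hundred square kilometres, which is one reason point-to-grid differences are expected even for a perfect product.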
It is important to note that each satellite product has its own format in its original repository. These formats are unified when the products are hosted in the GEE repository. For this analysis, we chose to download the information of each satellite product hosted in GEE in CSV format and then process it in the Python programming language.
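Because the products are delivered in different units and time steps (m/day, mm/day, and mm/h at 3 h or half-hour steps, according to Table 2), they have to be brought to a common daily scale before comparison with the gauges. The sketch below shows one way to do this with pandas; the function name, the `unit` labels, and the choice of summing sub-daily steps over calendar days are assumptions, not the procedure documented by the authors.

```python
import pandas as pd


def to_daily_mm(series, unit, step_hours=None):
    """Convert a precipitation series to daily totals in mm/day.

    series: pandas Series with a DatetimeIndex.
    unit: one of "mm/day", "m/day" or "mm/h" (labels assumed here).
    step_hours: length of each time step in hours, required for "mm/h".
    """
    if unit == "mm/day":
        return series
    if unit == "m/day":
        # Metres of water per day to millimetres per day.
        return series * 1000.0
    if unit == "mm/h":
        if step_hours is None:
            raise ValueError("step_hours is required for rates in mm/h")
        # Rate times step length gives the depth per step; sum the steps
        # of each day. min_count=1 keeps fully missing days as NaN.
        return (series * step_hours).resample("D").sum(min_count=1)
    raise ValueError(f"unsupported unit: {unit}")
```

For example, a 3-hourly series would be converted with `to_daily_mm(s, "mm/h", step_hours=3)` and a half-hourly one with `step_hours=0.5`.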




During this scale unification process, the gauge stations still present some gaps in their time series, which cause errors in the metrics. These gaps represent less than 0.5% of the records paired with each GEE rainfall product, so we decided to drop them, since doing so does not adversely affect the scope of this research.
We used the Python programming language to manage more than 9 million daily rainfall records divided among seven sources of information, relying mainly on the Pandas and NumPy libraries for data management and the Pyplot library for visualization, and on the Google Colaboratory cloud computing platform.
Continuous statistical indices
To evaluate how well the satellite precipitation products capture the precipitation observed at each weather station, three statistical indices were selected: Pearson's correlation coefficient ($r$), relative bias (rBIAS), and Root Mean Square Error (RMSE) (Table 3), where $n$ is the number of rainfall records of the GEE rainfall products, $y_i$ and $\bar{y}$ are the rainfall observation and the mean rainfall of the product, respectively, and, analogously, $x_i$ and $\bar{x}$ represent the values of the gauge station.
Statistical metrics
Name | Formula | Optimal value |
---|---|---|
Pearson Correlation Coefficient | $r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$ | 1.0 |
Relative Bias | $\mathrm{rBIAS} = \dfrac{\sum_{i=1}^{n}(y_i-x_i)}{\sum_{i=1}^{n}x_i}\times 100$ | 0.0 |
Root Mean Square Error | $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i-x_i)^{2}}$ | 0.0 |
The Pearson correlation coefficient ranges from −1 to +1 and evaluates the degree of relationship between the gauging stations and the values estimated from the satellite; a value close to zero indicates no correlation, whereas a value close to −1 or +1 implies a strong negative or strong positive relationship, respectively. The rBIAS measures the deviation of the GEE rainfall product from the gauge station measurement. The RMSE gives an average index of the errors in the GEE rainfall product estimates. These metrics have been used in previous research on rainfall evaluation (Ebert et al. 2007; Hobouchian et al. 2017; Xu et al. 2017; Zhang et al. 2019).
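As an illustration, the three indices can be computed with NumPy as follows. The sketch assumes the standard textbook definitions of these metrics, with rBIAS expressed in percent and the gauge taken as the reference; the function names are hypothetical, and pairs with a missing value on either side are dropped before computing anything.

```python
import numpy as np


def _paired(obs, est):
    """Keep only the records where both gauge and product have a value."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    keep = ~(np.isnan(obs) | np.isnan(est))
    return obs[keep], est[keep]


def pearson_r(obs, est):
    """Pearson correlation between gauge (obs) and product (est)."""
    obs, est = _paired(obs, est)
    return float(np.corrcoef(obs, est)[0, 1])


def rbias(obs, est):
    """Relative bias in percent; positive means the product overestimates."""
    obs, est = _paired(obs, est)
    return float(100.0 * np.sum(est - obs) / np.sum(obs))


def rmse(obs, est):
    """Root mean square error, in the units of the input series."""
    obs, est = _paired(obs, est)
    return float(np.sqrt(np.mean((est - obs) ** 2)))
```

A product that doubles every gauge value illustrates why the indices must be read together: the correlation is perfect (1.0) while the relative bias is +100%.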
Assessment procedures
Two verification procedures were performed to analyze the estimation accuracy of the six precipitation products. The first evaluated the effect of elevation on the accuracy of the products. The elevations within the study area were classified into 1,000 m categories, starting at 0 m Above Mean Sea Level (AMSL) and reaching above 3,000 m AMSL. The mean annual precipitation for each GEE product and measurement station was then calculated, and the statistical metrics (rBIAS, RMSE, and $r$; Table 3) were computed between them for each of the elevation ranges shown in Figure 1.
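The elevation classification can be sketched with `pandas.cut`. The column names (`elev_m`, `obs`, `est`), the band labels, and the convention that each band includes its lower bound are assumptions made for this illustration; rBIAS and RMSE follow their standard definitions.

```python
import numpy as np
import pandas as pd


def metrics_by_elevation(df):
    """rBIAS (%) and RMSE per 1,000 m elevation band.

    df: one row per station with hypothetical columns 'elev_m' (m AMSL),
    'obs' (gauge mean annual precipitation) and 'est' (product value).
    """
    bins = [0, 1000, 2000, 3000, np.inf]
    labels = ["<1000", "1000-2000", "2000-3000", ">=3000"]
    # right=False makes each band include its lower bound: [0, 1000), ...
    band = pd.cut(df["elev_m"], bins=bins, labels=labels, right=False)
    rows = {}
    for label, group in df.groupby(band, observed=True):
        err = group["est"] - group["obs"]
        rows[label] = {
            "n": len(group),
            "rbias_pct": 100.0 * err.sum() / group["obs"].sum(),
            "rmse": float(np.sqrt((err ** 2).mean())),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Reporting the station count `n` alongside the metrics helps when interpreting the higher bands, where few gauges are available.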
The second procedure evaluated the accuracy at different time scales (daily, monthly, seasonal, and annual). Statistical indices were evaluated for each rain gauge station. We call a study point each location where the precipitation captured by the products was compared against the measurements of CONAGUA's meteorological stations. The inverse distance method was used to interpolate the statistical index values from the study points, and the resulting surfaces were used to observe the distribution of the statistical metrics across the Mexican territory.
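A minimal sketch of inverse distance weighting is given below. The power parameter (2 here) and the use of planar Euclidean distances are assumptions, since the text does not specify them; for real station coordinates a projected coordinate system or great-circle distances would be more appropriate.

```python
import numpy as np


def idw(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query point.

    xy_known: (n, 2) coordinates of the study points.
    values:   (n,) metric values at those points (e.g. RMSE per station).
    xy_query: (m, 2) coordinates where the surface is evaluated.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances between query points and known points: (m, n).
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for k, row in enumerate(dist):
        exact = row == 0
        if exact.any():
            # A query point that coincides with a station takes its value.
            out[k] = values[exact].mean()
        else:
            weights = 1.0 / row ** power
            out[k] = np.sum(weights * values) / np.sum(weights)
    return out
```

Evaluating `idw` on a regular grid of query points yields the kind of surface used to map the metrics; larger values of `power` make the surface follow the nearest stations more closely.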
RESULTS AND DISCUSSION
Evaluation point to grid
Figure 4 shows scatter plots of the rainfall estimated with ERA5, DAYMET, CHIRPS, IMERG, and PERSIANN versus the precipitation observed at surface rain gauge stations at different time scales. These results show that the statistical indices used ($r$ and rBIAS) to evaluate the accuracy of the precipitation products improved significantly when moving from the daily to the annual scale. On a daily scale, the best products were DAYMET and CHIRPS, with DAYMET performing better for daily maximum precipitation events. On monthly, seasonal, and annual scales, the best performing product was CHIRPS, which reached a correlation of 0.87 on the monthly scale and 0.88 on the annual scale. The RMSE also showed CHIRPS as the best product at most time scales, except for the daily scale, where the best product was DAYMET. The observed results are consistent with those presented by Morales-Velázquez et al. (2021), who evaluated the efficiency of two satellite precipitation products and two reanalysis precipitation datasets at the watershed scale in Mexico.
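Moving from the daily to the monthly, seasonal, and annual scales amounts to summing the daily series over longer windows before computing the metrics. The sketch below shows one way to do this with pandas; the meteorological seasons (December to February, March to May, and so on) are an assumption, since the season definition is not given in the text, and the function name is hypothetical.

```python
import pandas as pd


def aggregate_scales(daily):
    """Aggregate a daily precipitation series (mm/day) to coarser totals.

    Returns a dict of Series keyed by scale name. min_count=1 keeps a
    period as NaN when it contains no valid daily value at all.
    """
    return {
        "daily": daily,
        "monthly": daily.resample("MS").sum(min_count=1),
        # Quarters starting in December: DJF, MAM, JJA, SON (assumed seasons).
        "seasonal": daily.resample("QS-DEC").sum(min_count=1),
        "annual": daily.resample("YS").sum(min_count=1),
    }
```

Applying the same aggregation to the gauge and product series keeps the pairs aligned at every scale; random daily errors tend to cancel in the longer sums, which is consistent with the improvement in the indices at coarser scales.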
Scatter plots of precipitation over Mexico at different time scales and GEE products, each row represents a time scale beginning from daily to monthly, seasonal, and annually.
The highest rBIAS values occurred with the products ERA5, IMERG, and PERSIANN. On a daily scale, all the products show a negative rBIAS, that is, they tend to underestimate the precipitation values. Negative rBIAS values are also present at the other time scales, although the magnitude of the bias decreases. The largest errors (RMSE) were exhibited by ERA5 on monthly, seasonal, and annual scales. On the daily scale, the highest RMSE value occurred with CHIRPS. Colorado-Ruiz & Cavazos (2021) found similar results in the behavior of rBIAS with the ERA5 and CHIRPS products. Sharifi et al. (2016) and Mayor et al. (2017) found that IMERG generally underestimates rain gauge values in study cases in Iran, Mexico, and the southern United States. Similar results were reported by Yuan et al. (2021), who indicate that IMERG underestimates the precipitation of tropical cyclones in their study area, Mexico and the United States.
We also evaluated how monthly precipitation behaved throughout the entire time series for each product in the study. Figure 5 shows heat maps of the behavior of the statistics at different temporalities. A marked signal of the rainy season in the country (May–November) can be observed in all the products.
Heat maps of the behavior of the statistics at different temporalities: the metrics are first calculated from the precipitation values of every January, and the process is repeated analogously for the following months and seasons of each year (2001–2017).
Figure 5 shows heat maps of precipitation behavior, where a dominant unimodal regime is observed in the Mexican territory, with the highest precipitation concentration from June to October. The results for the statistical indices are also shown: the high RMSE values occur in the months with the highest precipitation concentration, whereas the highest rBIAS values were found in the months with the lowest precipitation (April–May). Finally, the behavior of $r$ does not show a particular pattern.
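A year-by-month table of a metric, of the kind that can be rendered as a heat map, can be built with a two-level `groupby`. The sketch below does this for rBIAS; the column names `obs` and `est` and the function name are hypothetical, and the table layout (years as rows, months as columns) is only one possible arrangement.

```python
import pandas as pd


def rbias_year_month(df):
    """Relative bias (%) for each year and calendar month.

    df: DataFrame with a daily DatetimeIndex and hypothetical columns
    'obs' (gauge) and 'est' (product). Returns a table with one row per
    year and one column per month. Months in which the gauge total is
    zero produce infinite or undefined values.
    """
    sums = df.groupby([df.index.year, df.index.month])[["obs", "est"]].sum()
    rb = 100.0 * (sums["est"] - sums["obs"]) / sums["obs"]
    rb.index.names = ["year", "month"]
    return rb.unstack("month")
```

Dry months have small gauge totals in the denominator, so even small absolute differences give large relative biases, which matches the high rBIAS reported for the months with the lowest precipitation.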
Evaluation of satellite-gauged precipitation at different elevation ranges
This section presents the performance of the GEE precipitation products at four elevation ranges (<1,000, 1,000–2,000, 2,000–3,000, and >3,000 m). The assessment highlighted that the product with the best performance on a daily scale in all elevation ranges was DAYMET (Figure 6). The best results were observed for DAYMET and CHIRPS, for which the medians of rBIAS and RMSE were close to zero in all elevation ranges and the median correlation coefficient was close to one. The behavior of the rBIAS statistic in the point-to-grid comparison at the different time scales showed that all products are highly positively biased, which means that they all tend to overestimate (Figure 6). The biases are worst at the daily scale, which may be evidence that the products overestimate low precipitation events.
Statistical indices over all the analysis period for point to grid comparison at different time scales (daily, monthly, season, and annually).
The correlation values showed that the median remained above 0.95 on the monthly and seasonal scales. A decrease in correlation was observed on the annual scale, where the median oscillated between 0.6 and 0.7. The analysis of the correlation index shows that the product with the best performance is CHIRPS, while ERA5 and TRMM showed the lowest results. The poor performance of TRMM stands out compared to the rest of the products; severe overestimates are identified at elevations below 3,000 m AMSL.
Finally, according to the RMSE, the median value is almost the same for all the datasets; however, ERA5, PERSIANN, and TRMM have a more dispersed distribution. These products are followed by DAYMET, CHIRPS, and IMERG, which have a tighter distribution. The lowest RMSE values were found in the elevation ranges <1,000, 1,000–2,000, and 2,000–3,000 m. Figure 6 shows the increase in RMSE at sites located at elevations above 3,000 m asl.
In the point-to-grid comparison between satellite products and rain gauge stations located in the first 1,000 m of elevation above sea level, the CHIRPS product improved its correlation to 0.93. This product also slightly reduced the relative bias reading, changing to −2.92%, and increased the RMSE measurement by 295.57 mm. In this elevation range, this product improved its correlation and slightly worsened its other two metrics (Figure 6). The results exhibit the strong effect of topographic gradient and rain gauge density, as observed by Navarro et al. (2020) in the Ebro basin (Spain). The best statistical performance is at low elevations, where there is a higher concentration of rain gauges (47% at elevations >1,000 m asl). Figure 6 shows that DAYMET had the best performance of the six products in all the time periods analyzed; however, its performance also decreases when moving from the annual scale to the daily scale.
Massive spatial data are easy to handle with GEE because they are already processed for immediate use. This analysis found that DAYMET has the best spatial resolution but unusual temporal behavior: it presents abrupt changes in its measurements, which rules it out as a product to use. Given their high spatial resolution, the next two best options are IMERG and CHIRPS. We opt for the latter, which combines a fine spatial resolution with a temporal resolution compatible with the CONAGUA stations.
Dubey et al. (2021), in the earlier study ‘Evaluation of precipitation datasets available on Google Earth Engine over India’, assessed the fidelity of seven precipitation products available on GEE against local ground-based observations. Most of the products represented the characteristics of Indian precipitation reasonably well, but systematic biases were also observed. The authors found that ERA5 and IMERG perform better in terms of RMSE as well as correlation and obtain the best RMSE score for the highest elevation ranges (2,000–4,000 m AMSL). The results in Mexico are inconsistent in terms of systematic bias, and the lower RMSE was observed for the elevation ranges 2,000–3,000 and >3,000 m.
Analyzing the effect of altitude on the accuracy of satellite precipitation products allows us to identify an essential factor in choosing the appropriate product at the basin scale. The RMSE decreases as the topographic gradient increases, and the highest RMSE values are found in areas with elevations <1,000 m asl. Regarding the rBIAS metric, the mean behavior showed a bias very close to zero at all scales except the daily scale, which presented negative values. The highest biases were identified at elevations above 3,000 m.
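The elevation-band comparison described above can be illustrated with a minimal sketch. It assumes the commonly used definitions of the three metrics (rBIAS as 100 × the summed difference divided by the summed observation) and the four elevation bands discussed in this section; the function names and the record layout are hypothetical and are not taken from the original processing chain.

```python
import math

def rmse(obs, est):
    """Root mean square error, in the units of the inputs (e.g. mm)."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def rbias(obs, est):
    """Relative bias in percent (assumed form): 100 * sum(est - obs) / sum(obs)."""
    return 100.0 * sum(e - o for o, e in zip(obs, est)) / sum(obs)

def pearson_r(obs, est):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    if var_o == 0.0 or var_e == 0.0:
        return float("nan")
    return cov / math.sqrt(var_o * var_e)

def band_label(elev_m):
    """Assign a station elevation (m asl) to one of the four bands."""
    if elev_m < 1000:
        return "<1000"
    if elev_m < 2000:
        return "1000-2000"
    if elev_m < 3000:
        return "2000-3000"
    return ">3000"

def metrics_by_band(records):
    """records: iterable of (elevation_m, observed_mm, estimated_mm) pairs,
    one per station and time step. Returns {band: {metric: value}}."""
    groups = {}
    for elev, obs, est in records:
        obs_list, est_list = groups.setdefault(band_label(elev), ([], []))
        obs_list.append(obs)
        est_list.append(est)
    return {
        band: {"rmse": rmse(o, e), "rbias": rbias(o, e), "r": pearson_r(o, e)}
        for band, (o, e) in groups.items()
    }
```

Grouping before computing the metrics, rather than averaging per-station scores, is a design choice of this sketch; either approach may be appropriate depending on how the station density differs between bands.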
Temporal effect in information analysis
We graphically represent the annual behavior of the metrics for each GEE product in Figure 7. Five of the six products stayed within an RMSE range of 250 to 500 mm; the exception is IMERG, which maintains an RMSE above 1,000 and below 1,250 mm. When reviewing the time series, we found that DAYMET presented the lowest RMSE values from 2001 to 2011. In 2012, it began to register the highest RMSE after IMERG, showing erratic and unstable behavior. CHIRPS, on the other hand, presented a low RMSE during the entire time series, between 250 and 300 mm/day.
Timeline of the metrics over the period of study, using the information of each year.
As with RMSE, IMERG presented the highest rBIAS values throughout the 17 years of study. DAYMET, however, stays close to the other products, below +20% rBIAS, during the first 13 years of the study and then shifts to values between −30 and −10% rBIAS from 2013. This graph shows that CHIRPS remains the best-performing product in terms of rBIAS.
Finally, in the Pearson correlation coefficient graph, IMERG behaves as it does in the other metrics. DAYMET maintained the highest correlation until 2011, when it began to decline until reaching a peak in 2013. CHIRPS continues to be the best-performing product across the time series.
Setting aside the drastic departure of IMERG in RMSE and rBIAS and the abrupt changes in DAYMET, ERA5 is the poorest-performing product overall. CHIRPS, on the other hand, showed the best performance in terms of rainfall estimation for the entire study period.
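The year-by-year evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: daily records are given as hypothetical `(station_id, 'YYYY-MM-DD', mm)` tuples, annual totals are simple sums, and the RMSE of each year is computed across the stations that have both an observed and an estimated total.

```python
import math
from collections import defaultdict

def annual_totals(daily):
    """daily: iterable of (station_id, 'YYYY-MM-DD', mm).
    Returns {(station_id, year): accumulated precipitation in mm}."""
    totals = defaultdict(float)
    for station, date, mm in daily:
        totals[(station, int(date[:4]))] += mm
    return dict(totals)

def rmse_by_year(obs_daily, est_daily):
    """RMSE of annual totals across stations, one value per year."""
    obs = annual_totals(obs_daily)
    est = annual_totals(est_daily)
    squared_errors = defaultdict(list)
    # Only station-years present in both datasets are compared.
    for station, year in obs.keys() & est.keys():
        diff = est[(station, year)] - obs[(station, year)]
        squared_errors[year].append(diff ** 2)
    return {
        year: math.sqrt(sum(errs) / len(errs))
        for year, errs in sorted(squared_errors.items())
    }
```

Note that a plain sum treats missing days as zero rainfall, so station-years with gaps in the daily record would need to be filtered or flagged before a comparison of this kind.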
Dubey et al. (2021) found that the product with the best performance in terms of correlation is DAYMET for the first 10 years of study (2001–2011), with CHIRPS in second place. From 2012, DAYMET began showing a poor correlation performance, while CHIRPS maintained consistent performance throughout the 17 years of study, with a correlation oscillating between 0.8 and 0.95. The central area of the country showed the best correlation score.
Some weather stations have inconsistencies in their daily records, resulting from technical failures, equipment maintenance, or loss of information before their compilation in the central database.
Metric distribution over Mexico in climatic regions
A large portion of the study points have an RMSE greater than 60 mm. The largest share of study points in arid regions is concentrated at elevations below 3,000 m, and these points tend to have RMSE scores below 40 mm, which makes the arid regions the best-performing in this metric. This is evidence in favor of using GEE products in the arid regions of the study area. With the IMERG product, in contrast with the rest of the products, high scores are presented at most of the study points in all regions and elevations where they are present. Note that Figures 8 and 9 were adjusted to a different scale, in accordance with the number of stations in that sector, to visualize the distribution of study points in this area. Likewise, we found only six study points above 3,000 m, all of them in a temperate region.
Spatial distribution of mean annual rainfall over the Mexican territory, obtained by interpolating the information of each gauging station with the Inverse Distance Weighting method.
Spatial distribution of the three metrics (RMSE, rBIAS, and R2) and their relation to the four principal climate zones.
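The Inverse Distance Weighting interpolation used for the mean annual rainfall map can be illustrated with a minimal sketch. The power parameter of 2 and the use of planar distances are assumptions of this example, since the settings actually used are not stated here; the function name and the input layout are hypothetical.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse Distance Weighting estimate at point (x, y).

    stations: list of (x_station, y_station, value) in projected
    coordinates (assumed planar; lon/lat would need a projection or a
    great-circle distance). power: distance exponent (assumed 2).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xs, ys, value in stations:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            # The point coincides with a station: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

With two stations of 100 mm and 300 mm, the midpoint receives equal weights and therefore their average, while points closer to one station are pulled toward that station's value.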
Based on the results shown in Figure 10, for daily scale readings, DAYMET presents the highest degree of correlation compared to the other five satellite products analyzed in this study. It therefore offers an alternative for estimating daily precipitation levels, especially in areas with a low density of climatological stations, such as the arid regions of the country, where the correlation is approximately 0.7.
Statistical indices over the entire analysis period for the point-to-grid comparison at different time scales (daily, monthly, seasonal, and annual).
The behavior of rBIAS is shown in Figure 10. IMERG continues to present the worst results in all climatic and elevation conditions. Across the four elevation ranges and climatic regions of the study area, these two conditions do not seem to affect the metric significantly, since a similar distribution is shown for the four ranges of rBIAS.
Similar to the previous point, Figure 7 shows that there does not seem to be an effect of the climatic regions on the correlation, as seen for each gauge station point in the first 3,000 m of elevation.
Geographically analyzing the results, Dubey et al. (2021) and our study present different behaviors in Mexico's overestimation and underestimation (rBIAS) of precipitation. As shown in Figure 8, an overestimation of precipitation was found in the eastern part of the country (Gulf of Mexico). In contrast, an underestimation was found in the western part of the country (Pacific Ocean) and the best rBIAS scores in the central part of the country.
The methodological difference between Bruster-Flores et al. (2019), in ‘Evaluation of precipitation estimates CMORPH-CRT on regions of Mexico with different climates’, and our study lies in the reference stations used. Those authors used 14 Automatic Meteorological Stations (EMAs) of CONAGUA and the National Center for Disaster Prevention in Mexico (Centro Nacional de Prevención de Desastres, CENAPRED), with data every 10 min, for the entire Mexican territory. In comparison, we used the 4,568 weather stations in the country, with data every 24 h. Additionally, the CMORPH-CRT product was not considered in our research because it is absent from the GEE repository. Bruster-Flores et al. (2019) also found that the Satellite-Based Precipitation (SBP) product CMORPH-CRT, generated by the Climate Prediction Center (CPC) of the National Oceanic and Atmospheric Administration (NOAA), has correlation indices with respect to the EMAs that range between 0.111 and 0.652. In contrast, in our study, the CHIRPS product obtained a correlation score that ranges between 0.7 and 0.9. We can conclude that the article mentioned is the only one in the literature that studies GEE products for precipitation in Mexico. However, it approaches them from another perspective: it evaluates whether the products work, whereas we consider which one performs best.
One limitation of our study is that some satellite products have a coarser spatial resolution than others, which matters when they are compared with rain gauge (point) observations (Breña-Naranjo et al. 2015).
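The point-to-grid comparison behind this limitation can be sketched as follows. The example assumes a regular, north-up latitude/longitude grid whose top-left corner and cell size are known; the function names and the grid layout are hypothetical and do not reproduce how GEE samples its images.

```python
import math

def cell_index(lon, lat, west, north, res_deg):
    """Return (row, col) of the grid cell that contains a point.

    The grid is assumed north-up, with its top-left corner at
    (west, north) and square cells of res_deg degrees.
    """
    col = math.floor((lon - west) / res_deg)
    row = math.floor((north - lat) / res_deg)
    return row, col

def extract_at_station(grid, lon, lat, west, north, res_deg):
    """Value of the cell containing the station, or None if outside the grid.

    grid: list of rows (north to south), each a list of cell values.
    """
    row, col = cell_index(lon, lat, west, north, res_deg)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None
```

Because every station inside a cell receives the same gridded value, a coarser `res_deg` means that a single estimate is compared against gauges spread over a larger area, which is one way the resolution difference can affect the metrics.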
The observed results can be explained by the influence of the intertropical convergence zone in the central and southern areas, whereas in the northern area the effect of the high-pressure belt is more significant. These aspects define the climatic regions in Mexico, the seasonality, and the spatial variability of precipitation. In general, the highest accuracy of satellite products occurs in the months with the highest precipitation (summer and spring), evidencing the effect of seasonality.
To the best of our knowledge, there are no published studies with similar season-related findings that involve a comparable number of meteorological stations. We hypothesize that the use of 4,568 weather stations may explain why CHIRPS shows the closest performance to the observations in Mexico.
CONCLUSION
This study evaluates the performance of six precipitation products hosted in GEE (ERA5, DAYMET, PERSIANN-CDR, TRMM 3B42, IMERG, and CHIRPS) over Mexican territory. It also informs users about the various uncertainties in the foundations and specifications of these products. The relevant findings for the study area were:
The evaluation of the accuracy of the products at the different time scales identified that the products show the highest accuracy at the annual scale. The accuracy is maintained at the monthly and seasonal scales and decreases significantly at the daily scale. CHIRPS and DAYMET were the products that showed the highest accuracy according to the metrics used in the study, while ERA5 and PERSIANN exhibited the poorest accuracy. A point to highlight is that the rBIAS was lower in the wet period for the national territory (summer and autumn). In contrast, in the dry period (winter and spring), the rBIAS evidenced a more systematic error between the precipitation observed on the surface and that obtained with the satellite products. The highest BIAS occurred in the spring period, being more important in the month of April, which allows inferring that in many cases, the satellite products record.
All products are highly biased on low precipitation events: all datasets tend to overestimate low precipitation events and to underestimate high precipitation events. Because the rBIAS is systematic, future studies, such as those for forecasting purposes or the implementation of hydrological models, should consider the observed biases to improve their results.
The study results allow us to identify that the complex orography of the Mexican territory is an essential aspect when dealing with the accuracy of satellite products. The statistical metrics used showed that, on average, the most significant errors at the daily scale occurred in stations below 1,000 m above sea level, while at the seasonal, monthly, and annual scales they occurred in stations located above 300 m. Results similar to those observed with stations below 1,000 m were observed at stations between 1,000–2,000 m and 2,000–3,000 m. A higher dispersion in the error values can be identified at stations located in low altitude zones.
The correlation coefficient showed that the highest values were presented in the monthly and seasonal scales, with similar behavior for the different altitudes. A more significant variation in correlation was observed in the annual and daily scales, where the lowest correlations were found with stations at altitudes >3,000 m. Only a greater correlation dispersion is identified in elevations <1,000 m and between 1,000–2,000 m.
In the study regions, we can conclude that the products presented a better performance in the humid tropical and sub-humid tropical areas. This can be explained by the fact that these are the regions with the highest accumulated rainfall heights and the highest number of rainy days. These results open the door to analyzing tropical cyclones' contribution to precipitation in the national territory since these are the regions where their most significant impact is identified.
An essential aspect in interpreting the results is the spatial distribution of the surface stations: the climatological monitoring network is very dense in the central and southern parts of the Mexican territory, while in the north it is very scarce. It is also essential to add that a high percentage of the stations are installed at altitudes lower than 3,000 m, since accessibility facilitates maintenance. These are stations where the information is collected manually.
The intensity and distribution of precipitation in Mexico depend on topography and climatic regions. ERA5 showed the lowest correlations relative to the mean correlation of the entire region, while DAYMET and CHIRPS are the products with the best scores over the whole study period. Likewise, it was found that there is a more significant correlation of products in the regions and seasons with the highest presence of rainfall. In brief, the comprehensive evaluation of precipitation products reported herein will act as a valuable reference for researchers and decision-makers selecting the optimal product for their intended application.
ACKNOWLEDGEMENTS
This study could not have been carried out without climate data from CONAGUA Mexico. The authors thank anonymous reviewers for their in-depth reading of the manuscript and the valuable comments and suggestions they have made.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information (https://drive.google.com/drive/folders/15jFeBR2xfLGRPcGTBssCZAPI4Z636azx?usp=sharing).