The analysis of the spatial and temporal distribution of storm events contributes to a better use of water resources, for example, the supply of drinking water, irrigation practices, electricity generation and management of extreme events to control floods and mitigate droughts, among others. The traditional observation of rainfall fields in Mexico has been carried out using rain gauge network data, but their spatial representativeness is unsatisfactory. Therefore, this study reviewed the possibility of obtaining better estimates of the spatial distribution of daily rainfall considering information from three different databases, which include rain gauge measurements and remotely sensed precipitation products of satellite systems and weather radars. In order to determine a two-dimensional rainfall distribution, the information has been merged with a sequential data assimilation scheme up to the diagnostic stage, paying attention to the benefit that the rain gauge network density has on the estimation. With the application of the Barnes method, historical events in the Mexican territory were analyzed using statistical parameters for the validation of the estimates, with satisfactory results because the assimilated rainfalls turned out to be better approximations than the values calculated with the individual databases, even for a not very low density of surface observations.

  • The merging of the three databases considered allows for determining satisfactory approximations of the rainfall fields analyzed, regardless of the rain gauge network configuration.

  • The increase in the error rates of the assimilated rainfall estimates is mainly due to the limited accuracy of the remote sensing products.

  • The recovery of the spatial behavior of historical storms is possible.

The aim of analyzing meteorological data is to characterize the state of the atmosphere consistently, in both space and time. The method used to estimate that state is therefore a primary factor in the accuracy that can be achieved from a limited set of observations of physical reality.

The spatial distribution of precipitation is irregular as a result of the geographical characteristics of the land surface. In Mexico, for example, latitude influences the increase in rainfall from north to south, and approximately 68% of normal precipitation occurs between June and September. These circumstances invariably raise problems of water availability and flood control, depending on the region and the time of year (CONAGUA 2018a, 2018b).

Rainfall over the national surface of 1,964,375 km² has generally been registered at pre-established intervals of 24 h with around 5,500 weather stations. However, on 31 December 2017, only 3,079 stations were operating (CONAGUA 2018a, 2018b) under an irregular and dispersed spatial distribution, with a lower density of observations in the north, northwest and southeast of Mexico.

For a simplified consideration of the spatial structure of storms, point measurements located in or near each region of interest are often used (Lahoz et al. 2010), even though such values are only representative in the vicinity of the measurement points. A common approach since the second half of the last century is the use of empirical relationships known as areal reduction factors (ARF), which are intended not only to reduce observed point rainfall in order to estimate average rainfall over an area but are also considered useful for defining approximate rainfall in basins of similar area and climatology (Lozoya et al. 2017). However, the ARF concept treats precipitation as a static phenomenon, so its usefulness is reduced once the kinematics of an event over its catchment area are recognized (Berndtsson & Niemczynowicz 1988).

It is important to acknowledge that the quality and quantity of data required for the analysis of a system are changing: demands grow as the increasing capacity of computational tools allows smaller-scale phenomena to be represented. Therefore, in order to fill the gaps in a system objectively, it is appropriate to propagate as many discrete data as possible.

Because purely deterministic interpolation techniques may prove insufficient for rainfall data processing, methods based on the statistical structure of the observed fields have been used in spatial modeling of the phenomenon. One example is the Kriging method used to construct storm fields in Mexico City (Cisneros et al. 2001), which was compared with results obtained with the spline function. That study considered the records of 49 pluviographs for daily rainfall episodes greater than 30 mm, and used as quality criteria the mean and the variance of the errors at five validation stations (located to the north, south, east, west and center of the study region). The results were satisfactory except in areas of significant local variation or with a low density of observations, so the authors recommended that the available radar images be taken into account.

An alternative to the interpolation problem in regions where the observation network is irregular and dispersed is the application of models that merge different databases, so that information obtained by remote sensing is taken into account. For example, Rozante et al. (2010) estimated rainfall fields over the entire territory of South America from approximately 1,500 daily rainfall records for the summer and winter quarters of 2007, together with values estimated by the TRMM satellite mission at a spatial resolution of 0.25° × 0.25°. Cumulative rainfall estimates for 5 consecutive days were obtained with the Barnes scheme and evaluated with indices such as bias, probability of detection and root mean square error. They concluded that in areas with a high density of records the values added by the TRMM lose importance, whereas in regions with scattered observations merging strengthens the quality of the estimates.

Given the uncertainty in the estimation of rainfall with scattered records, especially convective events due to their greater spatial variability, Li & Shao (2010) combined rain gauge data and TRMM estimates of 120 daily rainfall from 2001 over the surface of Australia, without considering assumptions regarding the distribution of errors. They interpolated with a Kernel smoothing function so as not to rely on the stationary spatial behavior of the data, compared to ordinary Kriging and co-Kriging procedures. With a smoothing of satellite rainfall around the boundaries between consecutive cells, they calculated estimates with lower bias, in addition to satisfactory results from the root mean square error, and a better visual performance with respect to Kriging methodologies that showed a tendency to underestimate. Note that the type of Kernel function used to weight observations is not critical, because by using the optimal model with respect to a Gaussian function, applied in the Barnes scheme for example, differences of up to 5% are obtained in the calculation of the mean squared error.

Because the quality of estimates with IDW interpolation or geostatistical methods such as Kriging may be insufficient even for relatively dense rain gauge networks, especially on reduced time scales, Nanding et al. (2015) combined records of 161 rain gauges and radar estimates of 1 km resolution for 20 days in 2007 over a portion of northern England. They worked at 1-h resolution to classify storms as convective or stratiform, although they recognize that the results improve for longer accumulation periods. Considering that radar estimates were affected by a uniform multiplicative error, a correction of the mean bias was applied. The Ordinary Kriging method was used to interpolate rainfall recorded at specific points, but with a modification of the method, results were obtained that preserve the spatial structure of the radar data while conserving the mean value of the rain gauges considered. When analyzing different densities and configurations of the rain gauge network, they observed that configuration dependence is only important for a reduced density of records.

Moreover, Calvetti et al. (2017) recently insisted on the importance of considering different databases for the quantitative estimation of precipitation, emphasizing how rainfall integration techniques seek to remove systematic biases from radar and satellite estimates, either with the combination of addition and multiplication bias corrections (Vila et al. 2009), using networks of rain gauges as reference and corrections by weight functions (Rozante et al. 2010), or the use of geostatistical interpolations to combine information (Nanding et al. 2015). To integrate rain gauge data with radar and satellite estimates they used a multi-cell numerical method to solve a Poisson equation, an algorithm similar to data assimilation techniques by employing a boundary condition as a first assumption. When analyzing events of 2013 and 2014 in Brazil, radar underestimates and satellite peaks were observed, while the combination improved root mean square errors, correlation coefficients and standard deviation results, with errors mainly associated with events of intensity greater than 10 mm/h. Therefore, when considering rain gauge data as boundary conditions, the analysis of systems that consider low-density networks is very sensitive to the quality of remote sensing data.

The objective of this article is to propose a methodology to improve the estimation of the historical behavior of a spatially distributed rainfall field from a scarce record of rain gauge data, as occurs in Mexico. To that end, it was considered valuable to take into account the experience shared in research aimed at defining the diagnostic stage of a system as a discrete representation of values located on the cells of a regular mesh. The successive correction procedure with the Gaussian Kernel function proposed by Barnes was chosen because of its satisfactory record in storm analysis, in order to take advantage of both the reliability of the daily records of the national rain gauge network and the knowledge of the spatial structure of storms derived from satellite systems and meteorological radars.

Model description

The main objective of the sequential or variational assimilation schemes of meteorological data is to get a regular representation of the atmosphere state at an adequate resolution, up to four dimensions, from an irregular and imperfect sample of observations in space and time, based on a heterogeneous arrangement of in situ and remote sensing instruments, so that for any initial state and input data, the models can be used to estimate the future state of the system.

In the sequential assimilation scheme, under the perfect model hypothesis, the analysis of a system up to the diagnostic stage begins by assuming that, at some time, its antecedent (background) state vector is known from a previous analysis or climatology. To obtain better estimates (the analysis), the antecedent vector is corrected by assigning weights to the innovations, which are the differences between the observations of actual states and the values obtained by mapping the state vector onto the observational space. The model then evolves to the next time at which observations are available, and the evolved states become the antecedent (forecast) states, which in turn have to be corrected to improve the analysis at that time; the process can be repeated (Lahoz et al. 2010).
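The correction and forecast steps described above are commonly written in the following generic form. The symbols are notation assumed here for illustration only: $\mathbf{x}^{b}_{k}$ is the antecedent (background) state at time $t_k$, $\mathbf{x}^{a}_{k}$ the corrected state (analysis), $\mathbf{y}_{k}$ the observations, $H$ the operator that maps the state onto the observational space, $\mathbf{W}_{k}$ the weights assigned to the innovations, and $M$ the model that evolves the state to the next time.

```latex
\mathbf{x}^{a}_{k} = \mathbf{x}^{b}_{k}
  + \mathbf{W}_{k}\left[\mathbf{y}_{k} - H\!\left(\mathbf{x}^{b}_{k}\right)\right],
\qquad
\mathbf{x}^{b}_{k+1} = M\!\left(\mathbf{x}^{a}_{k}\right)
```

For a diagnostic problem only the first expression is needed, since the state is not propagated forward in time.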

Various sequential schemes that are advantageous at the operational level differ in the detail required to achieve the desired objectives. For a diagnostic problem, successive correction schemes are practical because they are iterative procedures in which the weights assigned to the innovations do not need to be determined optimally, since the data smoothing is only a function of the distance between data points. To perform a numerical weather map analysis, Barnes (1964) presented a convergent weighted-averaging interpolation scheme that uses a Gaussian weight function in the spatial domain, on the assumption that the two-dimensional distribution of an atmospheric variable can be represented by the summation of an infinite number of independent waves (Fourier integral representation). Some advantages of its use are (Daley 1991; Tintoré et al. 1991):

  • A simple algorithm to weight a large number of observations with a non-uniform spatial distribution,

  • Interpolation procedure in two steps, by modification of Barnes (1973),

  • Suitable for the analysis of sub-synoptic or mesoscale phenomena, i.e. storms, widely used with radar and satellite data,

  • It does not require a radius of influence for the consideration of observations,

  • Possible modification of scale parameters for the weighting procedure,

  • It exhibits reduced sensitivity to observational errors,

  • It is economical in processing time, with results comparable to those obtained with more sophisticated methods if the rain gauge network density is relatively low.

Barnes's scheme requires two steps. The first one consists of an initial estimate of a regularly distributed precipitation field, the background state, from a set of observations of the same variable distributed irregularly at known locations. The observations are weighted through a Gaussian weight function in order to determine the two-dimensional distribution of the rainfall estimate over the cells of a regular mesh that represents the region of interest.

The background state $P_g^{b}$ in cell ‘g’ of an assimilation mesh with N observations $P_S$ is calculated as:
$$P_g^{b} = \frac{\sum_{S=1}^{N} w_S\,P_S}{\sum_{S=1}^{N} w_S}, \qquad w_S = \exp\left(-\frac{r_S^{2}}{\kappa}\right)$$
(1)
where $r_S$ is the distance between the centroid of a mesh cell and observation S, $\kappa$ is the weight parameter that represents the percentage of the original wave amplitude that will be considered in the filtering function, $\Delta n$ is the average spacing between the observations in the study area, from which $\kappa$ is set (Sinha et al. 2006), and $w_S$ is the weight function, which tends to zero asymptotically with increasing distance $r_S$. It is therefore unnecessary to limit the radius of influence in the observation search process, and the amount of data considered can be increased to ensure that a sufficient number of observations influence the accuracy of the estimated value of each cell.

The second step of the procedure consists of correcting the background state by assigning weights to the innovations using the modified weight function. The innovation is calculated between each observation S and its estimate, which is obtained by mapping the state vector onto the observational space, using as an operator the bilinear interpolation of the four mesh values adjacent to the observation S, or a simpler interpolation method when the information is insufficient, as occurs in the analysis of the peripheral mesh cells (Koch et al. 1981).
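As an illustration of the observation operator, the following is a minimal sketch of bilinear interpolation on a regular mesh and of the resulting innovation. The function names, the argument layout (`field[j, i]` with rows along y) and the clamping of points that fall outside the mesh are assumptions made for this example; the text only states that a simpler interpolation is used for peripheral cells.

```python
import numpy as np

def bilinear(field, x0, y0, dx, dy, x, y):
    """Bilinear interpolation of a regular mesh `field[j, i]` at point (x, y).

    field  : 2-D array of cell-centroid values (rows along y, columns along x),
             with at least two rows and two columns
    x0, y0 : coordinates of the centroid field[0, 0]
    dx, dy : centroid spacing along x and y
    """
    ny, nx = field.shape
    # Fractional mesh index of the point, clamped to the mesh (assumption).
    fx = np.clip((x - x0) / dx, 0.0, nx - 1.0)
    fy = np.clip((y - y0) / dy, 0.0, ny - 1.0)
    # Lower-left corner of the four surrounding centroids.
    i = min(int(np.floor(fx)), nx - 2)
    j = min(int(np.floor(fy)), ny - 2)
    tx, ty = fx - i, fy - j
    return ((1 - tx) * (1 - ty) * field[j, i]
            + tx * (1 - ty) * field[j, i + 1]
            + (1 - tx) * ty * field[j + 1, i]
            + tx * ty * field[j + 1, i + 1])

def innovation(obs_value, field, x0, y0, dx, dy, x, y):
    """Observation minus the background mapped to the observation site."""
    return obs_value - bilinear(field, x0, y0, dx, dy, x, y)
```

A positive innovation means that the background underestimates the gauge value at that site.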

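The corrected state $P_g^{a}$ in cell ‘g’ is evaluated by:
$$P_g^{a} = P_g^{b} + \frac{\sum_{S=1}^{N} w'_S\,\left(P_S - \hat{P}_S\right)}{\sum_{S=1}^{N} w'_S}, \qquad w'_S = \exp\left(-\frac{r_S^{2}}{\gamma\,\kappa}\right)$$
(2)
where $P_g^{b}$ is the background state of the cell, $P_S - \hat{P}_S$ is the innovation of observation S with respect to its background estimate $\hat{P}_S$, $r_S$ is the distance between the cell centroid and the observation, $\kappa$ is the weight parameter, and the weight function $w'_S$ is modified by the numerical convergence parameter $\gamma$, for which Barnes suggested a value in the range of 0.2–0.4, since he points out that values less than 0.2 can cause overflow when evaluating the exponential function, and values greater than 0.5 do not favor an accelerated convergence process.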
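The two steps can be sketched in a few lines of NumPy. This is an illustrative sketch and not the tool used in the study: the function name is hypothetical, the default weight parameter uses the relation 5.052(2Δn/π)² that is commonly quoted for the Barnes scheme (an assumption here), Δn is approximated by the mean nearest-neighbour distance between gauges (also an assumption), and the background at the gauge sites is evaluated directly with the first-pass weights instead of by bilinear interpolation from the mesh.

```python
import numpy as np

def barnes_two_pass(xo, yo, po, xg, yg, gamma=0.3, kappa=None):
    """Two-pass Barnes analysis of scattered observations on mesh centroids.

    xo, yo, po : observation coordinates and values (1-D, length N)
    xg, yg     : centroid coordinates of the mesh cells (1-D)
    gamma      : numerical convergence parameter of the second pass
    kappa      : weight parameter; derived from the gauge spacing if None
    Returns (background, analysis), one value per mesh cell.
    """
    xo, yo, po = (np.asarray(a, dtype=float) for a in (xo, yo, po))
    xg, yg = np.asarray(xg, dtype=float), np.asarray(yg, dtype=float)

    if kappa is None:
        # Assumption: average spacing = mean nearest-neighbour distance.
        d = np.hypot(xo[:, None] - xo[None, :], yo[:, None] - yo[None, :])
        np.fill_diagonal(d, np.inf)
        dn = d.min(axis=1).mean()
        kappa = 5.052 * (2.0 * dn / np.pi) ** 2

    def weighted_mean(xt, yt, values, k):
        # Gaussian weights between every target point and every observation.
        # Note: very distant targets can underflow to zero total weight.
        r2 = (xt[:, None] - xo[None, :]) ** 2 + (yt[:, None] - yo[None, :]) ** 2
        w = np.exp(-r2 / k)
        return (w @ values) / w.sum(axis=1)

    # First pass: background state on the mesh.
    background = weighted_mean(xg, yg, po, kappa)
    # Background at the gauge sites (simplified observation operator).
    innovations = po - weighted_mean(xo, yo, po, kappa)
    # Second pass: spread the innovations with the sharper weight function.
    analysis = background + weighted_mean(xg, yg, innovations, gamma * kappa)
    return background, analysis
```

With a smaller `gamma` the second-pass weights decay faster, so each cell is corrected mainly by the gauges closest to it.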

Model inputs

To discuss the advantages of applying the assimilation model to three different databases for estimating the diagnostic stage of historical daily rainfall fields registered on Mexican territory, events of interest were identified according to their magnitude and spatial distribution, bearing in mind that the modeling requires rainfall data corresponding to simultaneous events. To identify representative events, the rainfall registered with conventional and automatic stations was reviewed. The following was considered:

  • Perform the analysis of a highly monitored region, so that the availability of rain gauge records on its surface is wide and of suitable length, to practice examples that consider different density and configuration schemes of its rain gauge observations.

  • Perform the analysis on different regions of the country where the availability of data per unit area is significantly lower but is still relevant due to the size of its surface, geographic location that influences the magnitude of the prevailing rainfall, its proximity to urban centers and the availability of hydrometric information.

In addition, meteorological observations were also collected from satellite sensors, considering the following:

  • Although the orbit of geostationary satellites is appropriate for monitoring cloud systems, much of the radiation does not penetrate deep into the clouds. Satellite rainfall estimates are therefore based on representative conditions at the top of the clouds, and the rainfall developed near the earth's surface is only calculated through indirect relationships (WMO 2008).

  • Since rainfall can be estimated with different algorithms, and products derived from a single record can be characterized by insufficient resolution, there are specific algorithms that combine information recorded by different types of sensors.

  • The Tropical Rainfall Measuring Mission (TRMM), made up of a network of satellites from the NASA and JAXA agencies, began to operate at the end of 1997 and covered latitudes from 50°N to 50°S, with resolutions of 0.25° × 0.25° every 3 h. The products obtained from its algorithm (TMPA) were used extensively until 2014, when the Global Precipitation Measurement (GPM) mission began its work, with resolutions of 0.1° × 0.1° every 30 min, for latitudes 90°N to 90°S. Estimates of different quality are produced with its IMERG algorithm, and the final product is obtained 3.5 months after each event. The analysis of daily rainfall recorded since 2000 is available in the most recent version of IMERG, V06 (Huffman 2020).

  • Due to the satisfactory results reported when using the TRMM and GPM missions products, comparable with the CMORPH and PERSIANN products as described by Gebregiorgis & Hossain (2011) and Yu et al. (2021), respectively, for this paper the rainfall estimated with the IMERG algorithm was considered. Raster files for the dates and geographic location of interest are products of the NASA Giovanni tool, a web application developed by the Goddard Earth Sciences Data and Information Services Center (GES DISC).

Additionally, meteorological radar reflectivity images were collected and a procedure was applied to estimate the corresponding precipitation, taking into account the following:

  • A significant limitation of techniques for merging recorded and estimated rainfall data is the availability of reflectivity images from meteorological radars in Mexico. The National Meteorological Service (SMN, by its Spanish initialism), which depends on the National Water Commission (CONAGUA, by its Spanish acronym), is the institution in charge of operating the instruments and safeguarding the data of the 13 radars that make up the national network. According to the SMN, the availability of this information is limited, and its historical record generally does not exceed 15 years.

  • For each date of interest, up to 96 Plan Position Indicator (PPI) images can be obtained by requesting data from the SMN single window, with a temporal resolution of 15 min and a spatial resolution of 833 m.

  • Once the required images are available, it is recommended to estimate rainfall fields following the steps proposed by Vilchis et al. (2011). With the SIG-Idrisi ® platform (2016) several procedures can be performed: importing and georeferencing the images, converting their numerical information to reflectivity, reducing the attenuation of the radar beam caused by rain falling on the radome, characterizing rainfall fields and calculating the intensity during convective or stratiform events (using the NEXRAD and Marshall–Palmer models, respectively), accumulating the rainfall of 96 consecutive images over 1 day, and correcting the final product for errors caused by orographic blocking.

  • For the purpose of assimilating daily rainfall in a study area, the three databases can be combined at the spatial resolution of the satellite estimates; that is, the uniform distribution of the variable is calculated over the centroids of a set of 30 cells of 0.1° × 0.1° resolution that make up the mesh that defines the study region. For this reason, from the radar rainfall with a resolution of 833 m, a representative rainfall is calculated for each satellite resolution cell as the average of the nine radar estimates closest to each centroid (Méndez et al. 2006). This takes into account that the prevailing wind on a particular day can affect where rain falls on the ground, so the reflectivity captured by the radar can correspond to rain precipitated over the region near each centroid, on a surface of approximately 6 km².
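The radar processing chain in the list above (reflectivity to intensity, daily accumulation, and averaging of the nine estimates closest to each centroid) can be sketched as follows. The Z–R coefficients are the widely quoted Marshall–Palmer (Z = 200 R^1.6) and NEXRAD default (Z = 300 R^1.4) values, which are assumed here because the text names the models without giving coefficients. The function names are hypothetical, a single storm type is applied to the whole stack for simplicity, and attenuation, orographic blocking and missing pixels are not handled.

```python
import numpy as np

# Assumed Z-R coefficients (Z = a * R**b, Z in mm^6/m^3, R in mm/h).
ZR = {"stratiform": (200.0, 1.6),   # Marshall-Palmer
      "convective": (300.0, 1.4)}   # NEXRAD default

def rain_rate(dbz, kind):
    """Rain rate (mm/h) from reflectivity in dBZ for the given storm type."""
    a, b = ZR[kind]
    z = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)
    return (z / a) ** (1.0 / b)

def daily_accumulation(dbz_stack, kind, minutes=15.0):
    """Rain depth (mm) summed over a stack shaped (n_images, n_pixels)."""
    return (rain_rate(dbz_stack, kind) * minutes / 60.0).sum(axis=0)

def cell_rainfall(xp, yp, rain, xc, yc, k=9):
    """Mean of the k radar pixels closest to each cell centroid."""
    xp, yp, rain = (np.asarray(a, dtype=float) for a in (xp, yp, rain))
    xc, yc = np.asarray(xc, dtype=float), np.asarray(yc, dtype=float)
    d2 = (xc[:, None] - xp[None, :]) ** 2 + (yc[:, None] - yp[None, :]) ** 2
    nearest = np.argsort(d2, axis=1)[:, :k]
    return rain[nearest].mean(axis=1)
```

With 833 m pixels, nine pixels cover about 9 × 0.833² ≈ 6.2 km², which is consistent with the surface of approximately 6 km² mentioned in the text.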

Model calibration and evaluation of results

With information on historical events distributed over each study region, the fine-tuning of the proposed data assimilation model was attempted, with the assignment of appropriate values to its conceptual parameters. Then, in the calibration, the model's response was evaluated by adjusting each parameter within a range of values.

For Equation (2), the numerical convergence parameter $\gamma$ was varied, since it helps smooth the modified weight function. In addition, in the model step that corrects the antecedent state, an IDW interpolation was used as the observation operator for the peripheral cells of the mesh, where the rate at which the weights decrease depends on the value of the power parameter $p$; larger values increase the influence of closer observations (Yang & Xing 2021). As the convergence parameter directly influences the estimate of each mesh cell, whereas the power parameter influences only the peripheral mesh cells, the calibration began by fixing $p$ at 2, a value commonly applied in the literature (Yang & Xing 2021), in order to observe the behavior of the statistical criteria as $\gamma$ varied between 0.1 and 0.9.
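The role of the power parameter can be seen in a minimal IDW sketch. The function name and the small distance floor `eps`, used to avoid division by zero when a target coincides with an observation, are assumptions for this example.

```python
import numpy as np

def idw(xo, yo, values, xt, yt, power=2.0, eps=1e-12):
    """Inverse distance weighting of observations at target points.

    xo, yo, values : observation coordinates and values (1-D)
    xt, yt         : target coordinates (scalars or 1-D)
    power          : exponent of the inverse distance weights
    """
    xo, yo, values = (np.asarray(a, dtype=float) for a in (xo, yo, values))
    xt = np.atleast_1d(np.asarray(xt, dtype=float))
    yt = np.atleast_1d(np.asarray(yt, dtype=float))
    d = np.hypot(xt[:, None] - xo[None, :], yt[:, None] - yo[None, :])
    # Weights decay as distance**(-power); eps guards coincident points.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ values) / w.sum(axis=1)
```

For two observations of 0 mm and 9 mm located 1 and 2 distance units from a target, the estimate moves toward the nearer value as the power grows, which is the behavior the calibration explores.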

To measure the error of the values estimated by assimilation against those observed in a rainfall field, several statistical criteria were used: the mean error (ME), a measure of bias that indicates whether the estimator consistently under- or overestimates; the mean absolute error (MAE), an indicator that is less sensitive to outliers or errors of greater magnitude; the root mean square error (RMSE), a measure of accuracy that especially penalizes errors of greater magnitude; the Nash–Sutcliffe efficiency coefficient (NSE), sensitive to extreme values, which ranges from $-\infty$ up to 1 when the estimate is perfect, with an efficiency equal to zero implying that the accuracy of the estimator is comparable to that of the mean value of the observations (Nanding et al. 2015); and the correlation coefficient ($r$), a measure of the level of linear association between estimates and observations.
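The five criteria can be computed with their standard definitions, as in the sketch below. The function name, the dictionary keys and the sign convention of the mean error (estimate minus observation, so that negative values indicate underestimation) are assumptions for this example.

```python
import numpy as np

def evaluation_criteria(estimated, observed):
    """Standard error statistics between estimates and validation data."""
    e = np.asarray(estimated, dtype=float)
    o = np.asarray(observed, dtype=float)
    err = e - o  # assumed sign convention: negative means underestimation
    return {
        "ME": err.mean(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        # 1 for a perfect fit, 0 when no better than the observed mean.
        "NSE": 1.0 - (err ** 2).sum() / ((o - o.mean()) ** 2).sum(),
        "r": np.corrcoef(e, o)[0, 1],
    }
```

An estimate that is uniformly 1 mm too high, for example, keeps a correlation of 1 but can produce a negative efficiency, which shows why the criteria are read together.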

A cross-validation procedure for evaluating the data assimilation scheme was applied, consisting of removing up to 10% of the available rain gauge data, avoiding a drastic modification of the original density and the spatial distribution of observations. Furthermore, by not including them in the assimilation process, the removed observations can be considered references in the error evaluations (Rozante et al. 2010).
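A minimal sketch of the hold-out step follows. Random selection with a fixed seed is an assumption, since the text does not state how the removed gauges were chosen, and the function name is hypothetical.

```python
import numpy as np

def split_gauges(n_gauges, fraction=0.10, seed=0):
    """Split gauge indices into assimilation and validation subsets.

    At most `fraction` of the gauges are withheld, but never fewer than one,
    so very small networks exceed the nominal fraction.
    """
    rng = np.random.default_rng(seed)
    n_val = max(1, int(np.floor(fraction * n_gauges)))
    idx = rng.permutation(n_gauges)
    return np.sort(idx[n_val:]), np.sort(idx[:n_val])
```

The withheld gauges are excluded from the assimilation and used only as references when the error statistics are computed.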

Analysis of a region with high rain gauge records availability

Because present-day Mexico City (CDMX) has been extensively monitored owing to its high population density, pluviometric and pluviographic records are widely available for hydrological purposes, and it was therefore deemed appropriate to consider events that occurred there. Regarding the rainfall estimated with information from the meteorological radar closest to the region of interest, at the time of writing the SMN had reflectivity images captured by the C-band Cerro Catedral radar, located in the State of Mexico (Estado de México) at a distance less than its 300 km radius of coverage, with data corresponding to events after December 2007.

To identify events that could be considered representative of the site, CONAGUA Open Data on monthly averages of rainfall since 1985 were taken into account. It was observed that in the period from January 2007 to August 2020 the highest average rainfall occurred during July and August (137.8 and 136.2 mm), and that the highest daily accumulation events occurred on dates when these average values were exceeded, as was the case in 2008, when the average rainfall in July and August was 164.0 and 184.7 mm, respectively.

Daily rainfall data registered by the pluviometers and pluviographs operated, respectively, by CONAGUA and the Water System of Mexico City (SACMEX, by its Spanish acronym) were reviewed together, because the type of instrument was considered irrelevant for measuring daily rainfall. Some important events of 2008 were identified, and their behavior was reviewed to recognize possible atypical values. For example, on 17 July, the spatial distribution of precipitation was relatively uniform (close to 7 mm), with greater accumulations observed in the east of the city (maximum of 45.2 mm) and some observations above average in the western zone. However, as isolated values, in the absence of additional data for comparison, they were considered valid.

For the five dates identified (17 July, and 2, 5, 7 and 25 August), the remotely sensed data were reviewed and maps of the historical spatial distribution of the variable were constructed. The IDW interpolation procedure was used, because the data availability is wide (network of 82 rain gauges) and allows the estimation of rainfall at the centroids of the 30 cells of the defined mesh to be relatively reliable. It was observed that although the interpolated rainfall differs from that estimated with satellite and radar, the distribution of maximum accumulations is well represented in the remote sensing products.

Knowing the spatial distribution of historical rainfall events over CDMX (Figure 1), the assimilation model was calibrated. For example, for 17 July, in the first stage of calibration the best estimates, with the lowest errors, were obtained for the convergence parameter $\gamma$ between 0.3 and 0.5. In the second stage, the power parameter $p$ was varied between 1 and 10, keeping $\gamma$ equal to 0.3, 0.4 and 0.5; lower quality estimates were obtained for $p$ greater than 3. Finally, the lowest errors were obtained with $\gamma$ equal to 0.3 and $p$ equal to 2 or 3, since the differences between the other statistics were less representative.
Figure 1

Spatial distribution of precipitation, CDMX, 17 July 2008.


With the same procedure, similar results were obtained for the other events of interest, so for the analysis of historical daily rainfall fields over CDMX the convergence parameter $\gamma$ and the IDW power parameter $p$ were set to 0.3 and 2, respectively.

The calculations corresponding to the assimilation procedure were made using a tool programmed in MATLAB ® (2017).

Because the rain gauge network at CDMX is relatively dense for the five identified dates, different schemes were used in the evaluation of the proposed procedure:

  • Assimilation of three databases, considering the totality of rain gauges (complete 100%).

  • Assimilation of three databases with rain gauge networks of reduced density (high 75%, medium 50% and low 25%), trying to preserve the original configuration of the network. In addition, the influence of the spatial distribution of the data was analyzed, considering three different configurations for each density.

  • Assimilation which considers only one remote sensing database at a time, for the three densities and three pluviometer configurations.

  • Assimilation of remote sensing data, satellite and radar, without considering rain gauge observations.
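The reduced-density schemes in the list above can be generated as in the following sketch. Uniform random sampling is an assumption made for illustration: the text says only that the reduced networks try to preserve the original configuration, without describing how the gauges were selected, and the function name is hypothetical.

```python
import numpy as np

def thinned_networks(n_gauges, densities=(0.75, 0.50, 0.25), n_configs=3, seed=0):
    """Index sets for reduced-density networks, several layouts per density."""
    rng = np.random.default_rng(seed)
    schemes = {}
    for frac in densities:
        keep = int(frac * n_gauges)  # truncated number of gauges retained
        schemes[frac] = [
            np.sort(rng.choice(n_gauges, size=keep, replace=False))
            for _ in range(n_configs)
        ]
    return schemes
```

For a network of 82 gauges this gives 61, 41 and 20 gauges for the high, medium and low densities, with three different configurations each.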

Analysis of regions with scarce information

To judge the relevance of the methodology on regions where the availability of rain gauges in quantity and distribution was very limited, the procedure was applied to three different sub-basins. In addition, given the difficulties involved in compiling and managing radar information, assimilation was applied by considering only the rain gauge data available in the CLICOM database of CONAGUA, as well as satellite estimations from the Giovanni online system, so that spatially distributed precipitation is available across a range of cells of equal spatial resolution (0.1° × 0.1°).

For the sub-basin that directly contributes to surface runoff at the Teapa hydrometric station in the state of Tabasco (Teapa river in the Grijalva river basin; catchment area of 418.6 km²; warm, humid climate with abundant rains in summer; average total annual precipitation of 3,130 mm; 24-h maximum precipitation of 301.2 mm, with an average of 76.5 mm), an assimilation process was performed for the rainfall registered from 19 to 26 November 2015, because the most significant daily accumulation within this period is close to the average of the maximum records. The assimilation was carried out on an array of 20 cells with 8 rain gauge stations, so the validation was performed considering only three observation sites. In the model calibration, the convergence parameter $\gamma$ and the IDW power parameter $p$ were equal to 0.4 and 2, respectively.

For similar reasons, two rainfall events that occurred in the sub-basin of the Cadereyta station in the state of Nuevo León were analyzed (Santa Catarina river in the San Juan river basin; catchment area of 1,812.3 km²; dry and semi-dry climate with rains between August and September; average total annual precipitation of 631 mm; 24-h maximum precipitation of 209.4 mm, with an average of 26.2 mm). The first comprises the precipitation from 8 to 22 September 2002, since the data show no relationship with any cyclonic event, and the second the rainfall registered from 16 to 30 July 2005, the period in which the influence of the passage of hurricane Emily was observed. A mesh of 60 cells was analyzed, with rainfall registered at 16 and 14 stations for the events of 2002 and 2005, respectively, so only two sites were considered for validation. In the calibration, the convergence parameter $\gamma$ and the IDW power parameter $p$ were equal to 0.4 and 1, respectively.

Additionally, a rainfall event in the sub-basin of the Las Perlas station in the state of Veracruz was considered (stream in the Coatzacoalcos river basin; catchment area of 8,993.1 km²; warm, humid climate with abundant rains in summer; average total annual precipitation of 2,441 mm; 24-h maximum rainfall of 213.6 mm, with an average of 52.6 mm), namely the rainfall registered from 21 September to 5 October 2010, related to the passage of tropical storm Matthew. The study was conducted on a mesh of 170 cells, with rainfall registered at 10 stations, so validation involved only two sites, and in the calibration the convergence parameter $\gamma$ and the IDW power parameter $p$ were equal to 0.4 and 3, respectively.

The spatial precipitation estimate was obtained for each consecutive rainy day analyzed in the three regions defined by uniform meshes. As an example, Figure 2 illustrates the spatial distribution of rain gauge records, as well as the satellite-derived rainfall product, the IDW interpolations of the rain gauge records considered in the assimilation, and the rainfall field estimated by assimilation in the Teapa sub-basin (23 November 2015). Figures 3 and 4 illustrate the rain gauge network and the precipitation estimated by assimilation in the Cadereyta and Las Perlas sub-basins (10 September 2002, and 27 September 2010, respectively).
Figure 2. Spatial distribution of precipitation, Teapa, 23 November 2015.

Figure 3. Spatial distribution of precipitation, Cadereyta, 10 September 2002.

Figure 4. Spatial distribution of precipitation, Las Perlas, 27 September 2010.

The quality of the proposed assimilation of three databases for estimating the spatial distribution of daily rainfall was evaluated for five events that occurred in Mexico City. These events were considered representative because the rainfall was slightly above the average of the annual maximum values for the region (47 mm). To summarize the results of the assimilation against the rainfall obtained independently from satellite data, radar data or the IDW interpolation of rain gauge observations, Figure 5 shows the statistical criteria computed as averages over the five rainy days analyzed.
Figure 5. Average statistical criteria in the evaluation of five rain events, CDMX.

Considering a cell size that fits the satellite resolution, the assimilated rainfall fields are considered satisfactory: the errors obtained showed a slight underestimation, but with better performance and accuracy according to the two error parameters evaluated. Likewise, coefficients close to 1 were interpreted as fits of good quality. Furthermore, the correlation coefficients obtained imply a significant linear association between the integrated rainfall and the validation data.
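A minimal sketch of how validation criteria of this kind could be computed at the validation sites. The choice of statistics (mean error, mean absolute error, root mean square error, Nash-Sutcliffe efficiency and Pearson correlation) is an assumption made for illustration and may differ from the criteria used in this study. The function name and the rainfall values are hypothetical.

```python
import numpy as np

def validation_stats(estimated, observed):
    """Common validation statistics (assumed stand-ins for the study's criteria)."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    err = est - obs
    return {
        "bias": err.mean(),                  # negative value: underestimation
        "mae": np.abs(err).mean(),           # mean absolute error
        "rmse": np.sqrt((err ** 2).mean()),  # root mean square error
        # Nash-Sutcliffe efficiency: 1 corresponds to a perfect fit
        "nse": 1.0 - (err ** 2).sum() / ((obs - obs.mean()) ** 2).sum(),
        # Pearson correlation: linear association between the two series
        "r": np.corrcoef(est, obs)[0, 1],
    }

# Hypothetical validation values in mm (not data from the study)
observed = [40.0, 52.0, 47.0, 35.0]
estimated = [38.0, 50.0, 46.0, 34.0]
stats = validation_stats(estimated, observed)
```

In this made-up example the bias is negative (a slight underestimation) while the efficiency and correlation stay close to 1, which is the pattern described for the assimilated fields.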

Given the satisfactory results obtained with the assimilation of rainfall events when using all the available data in CDMX, the performance of the merging method was reviewed for cases in which rain gauge observations were scarce. Figure 6 shows this behavior as a function of the rain gauge network density (complete 100%, high 75%, moderate 50%, and low 25%): it presents the variation of the three error parameters, normalized by the maximum errors of each event, together with the two coefficients. The results condense the average statistics calculated from three different rain gauge network configurations, considering the five rainy days of interest.
Figure 6. Variation of the normalized parameters and of the coefficients in the evaluation of five rain events, CDMX.

From the analysis of rain gauge networks of different densities and a relatively uniform spatial distribution of observations, it was observed that combining the three databases yields adequate approximations of the rainfall fields regardless of the network configuration, especially if the rain gauge network density is high or medium. However, the quality of the estimates is reduced if the density is low, with errors comparable to those obtained with the IDW interpolation of the rain gauge records.
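For reference, the IDW interpolation used as the baseline can be sketched as follows. This is a generic inverse distance weighting with a hypothetical function name and an assumed exponent of 2. The exponent and any search radius used in the study are not stated here.

```python
import numpy as np

def idw(xy_obs, values, xy_grid, power=2.0):
    """Inverse distance weighting of point values onto target points."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_grid = np.asarray(xy_grid, dtype=float)
    # Pairwise distances between targets and gauges, shape (n_grid, n_obs)
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a gauge location
    w = 1.0 / d ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Hypothetical gauges (coordinates in km) and daily rainfall (mm)
gauges = [(0.0, 0.0), (10.0, 0.0)]
rain = [10.0, 30.0]
cells = [(5.0, 0.0), (0.0, 0.0)]
estimate = idw(gauges, rain, cells)
```

A cell midway between the two gauges receives the mean of both records, and a cell located on a gauge reproduces that gauge's value, which is why IDW alone cannot add spatial detail between sparse stations.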

Additionally, the accuracy of the assimilation method was reviewed considering only one remote sensing database at a time. Figure 7 shows the statistics corresponding to the average values obtained from three rain gauge configurations, considering the five rainy days of interest. As a complement, the same figure also shows the statistics obtained by merging only the satellite and radar databases.
Figure 7. Variation of the normalized parameters and of the coefficients in the evaluation of five rain events, by merging only two databases, CDMX.

In general, it was observed that combining rain gauge records with either satellite or radar data gives errors of a similar order of magnitude, with results comparable to the IDW interpolations from rain gauge networks of medium density. Moreover, it is essential to note that merging the two remote sensing products without rain gauge information gives results of lower quality.

The most satisfactory estimates of the spatial distribution of daily rainfall over Mexico City were obtained by assimilating the three recommended databases, and merging rain gauge records with just one of the remote sensing products approximates that best estimate. It was therefore considered appropriate to apply the assimilation of rain gauge and satellite data to events that occurred in other regions of the country where the availability of rain gauge data is significantly lower, in particular three sub-basins whose size and geographical location influenced the non-uniform distribution of records and the magnitude of precipitation on several consecutive days.

To assess the quality of the two-database assimilation results against the rainfall obtained from each database independently, the statistical criteria previously used were applied to the rainy days analyzed in the study regions. In addition, the same criteria were applied to the errors obtained for the rainfall accumulated over consecutive rainy days.

According to the calculations, the errors of the daily and accumulated estimates obtained by assimilation were lower on the dates when the accumulated precipitation was more significant. In contrast, on the dates with less precipitation, when the inaccuracies of the satellite estimates are regularly larger, the errors associated with the IDW interpolation of the rain gauge records may be lower.

Figure 8 summarizes these results. It shows the average of the aforementioned statistical criteria for the days with the most significant daily rainfall accumulation (21 to 23 November over the Teapa sub-basin). Figure 9 shows the statistics corresponding to the rainfall accumulated during those 3 days.
Figure 8. Average statistical criteria, three rainy days, November 2015, Teapa.

Figure 9. Statistical criteria for rainfall accumulated in 3 days, November 2015, Teapa.

In the same way, Figures 10–13 show the same statistical criteria obtained from the daily and accumulated rainfall estimates for the Cadereyta sub-basin, considering the dates with the most significant accumulation: 10, 11, 14 and 15 September 2002, and 16, 20 and 25 July 2005. In general, the assimilation results of the two databases are better approximations than the rainfall estimated from each database.
Figure 10. Average statistical criteria, four rainy days, September 2002, Cadereyta.

Figure 11. Statistical criteria for rainfall accumulated in 4 days, September 2002, Cadereyta.

Figure 12. Average statistical criteria, three rainy days, July 2005, Cadereyta.

Figure 13. Statistical criteria for rainfall accumulated in 3 days, July 2005, Cadereyta.

Finally, for the event over the Las Perlas sub-basin, the rainfall estimated from satellite data showed an order of magnitude similar to the rain gauge records, and the lack of accuracy of the satellite estimates on the dates with the least accumulation was less significant. The most critical errors occurred on a single date (26 September), when the rain gauge records considered for validation were much lower (by a factor of four to six) than the satellite estimates. Excluding the 26 September results, Figures 14 and 15 show the statistical criteria from the daily and accumulated rainfall estimates for 14 rainy days. Again, the assimilation results are better approximations than the values calculated from each database.
Figure 14. Average statistical criteria, 14 rainy days, 2010 event, Las Perlas.

Figure 15. Statistical criteria for rainfall accumulated in 14 days, 2010 event, Las Perlas.


The availability of rain gauge records over Mexican territory has traditionally made a statistical analysis of local precipitation possible. However, the density and distribution of the national rain gauge network may be considered insufficient to capture the spatial structure of the storms, as there is an average spacing between stations exceeding 25 km.

Considering that daily rainfall measurements at ground level are reliable, and that remote sensing products have long been used to describe the spatial and temporal distribution of rainfall events, it was deemed appropriate to apply a sequential data assimilation scheme up to the diagnostic stage in order to obtain rainfall fields by merging records from different databases. The scheme proposed by Barnes was used. It is recommended as a simple model, applicable to large-scale regions, with reduced sensitivity to errors, that provides results of appropriate quality if the density of observations is not too low.
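The general idea of a Barnes-type successive correction can be sketched as below: a first pass of Gaussian-weighted averaging, followed by correction passes that spread the residuals at the observation points with sharper weights. This is a generic illustration for point observations only. How the gauge, satellite and radar fields enter the scheme in this study is not reproduced, and the names `kappa`, `gamma` and `passes`, as well as all numerical values, are assumptions.

```python
import numpy as np

def barnes(xy_obs, values, xy_targets, kappa, gamma=0.3, passes=1):
    """Barnes-type successive-correction analysis (illustrative sketch).

    kappa  : smoothing parameter, in squared coordinate units
    gamma  : factor (< 1) that sharpens the weights in the correction passes
    passes : number of correction passes after the first pass
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_targets = np.asarray(xy_targets, dtype=float)

    def weighted_mean(points, field, k):
        # Gaussian weights exp(-d^2 / k) between every point and every observation.
        # If a point is very far from all observations the weights can underflow.
        d2 = ((points[:, None, :] - xy_obs[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / k)
        return (w * field).sum(axis=1) / w.sum(axis=1)

    analysis = weighted_mean(xy_targets, values, kappa)  # first pass on the targets
    at_obs = weighted_mean(xy_obs, values, kappa)        # first pass at the gauges
    for _ in range(passes):
        residual = values - at_obs                       # part not yet captured
        analysis = analysis + weighted_mean(xy_targets, residual, gamma * kappa)
        at_obs = at_obs + weighted_mean(xy_obs, residual, gamma * kappa)
    return analysis

# Hypothetical gauges (coordinates in km) and daily rainfall (mm)
gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rain = [10.0, 30.0, 20.0]
cells = [(5.0, 5.0), (2.0, 1.0)]
field = barnes(gauges, rain, cells, kappa=50.0)
```

The correction pass pulls the analysis back toward the observed values near each gauge while keeping the smooth first-pass field elsewhere, which is consistent with the scheme needing a reasonable density of observations to perform well.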

In order to evaluate the quality of the daily rainfall estimates obtained with the proposed model, general aspects of previously reported database combination procedures were taken into account. In those procedures it is common to validate accumulations over several consecutive days or months, over regions even larger than an entire nation, using satellite data with a spatial resolution of 0.25° × 0.25°. On the understanding that rainfall estimation by remote sensing performs better at larger time scales, this work instead analyzed rainy periods of shorter duration, evaluating daily events or consecutive days of rainfall directly associated with the development of surface runoff registered by hydrometric stations located at the outflow points of sub-basins defined for this purpose. Satellite information of finer resolution, 0.1° × 0.1°, freely available from NASA's Giovanni platform (GPM mission), was also used.

As a complement, this proposal recommends the incorporation of the methodology presented by Vilchis et al. (2011) to perform a robust analysis of reflectivity images obtained with meteorological radars, to have a better-quality distributed rainfall, which is necessary for a more objective evaluation of the use of radar data in comparison to that obtained from satellite missions.

This study employed a larger number of rain gauge observations, with an average spacing between records of 7.0 km, given the background results on rainfall behavior over CDMX. The model was also evaluated with limited information, for high, medium and low rain gauge network densities, which correspond to spacings between observations of 8.1, 10.0 and 13.9 km, respectively. In addition, the maps of the spatial distribution of rainfall showed in advance that the maximum accumulations are represented in the remote sensing products, which justifies their use in the assimilation process.
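These spacings are roughly what one would expect if the mean spacing over a fixed area scales with the inverse square root of the number of gauges. That scaling is an assumption of this sketch, not a statement of how the spacings were computed in the study, and the function name is hypothetical.

```python
import math

def spacing_after_thinning(full_spacing_km, fraction_kept):
    """Mean spacing when only `fraction_kept` of the gauges remain over the
    same area, assuming spacing ~ sqrt(area / number_of_gauges)."""
    return full_spacing_km / math.sqrt(fraction_kept)

# Start from the 7.0 km spacing of the complete network
for fraction in (0.75, 0.50, 0.25):
    print(f"{fraction:.0%}: {spacing_after_thinning(7.0, fraction):.1f} km")
```

Under that assumption the sketch gives about 8.1, 9.9 and 14.0 km for the 75%, 50% and 25% densities, close to the 8.1, 10.0 and 13.9 km reported.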

From the analysis of mainly convective events in CDMX, considering a greater number of statistical parameters for validation, it is concluded that the recovery of the spatial behavior of historical storms is possible if the density and spatial distribution of the available rain gauges are sufficient to complement the rainfall estimates obtained as satellite and meteorological radar products. It was also observed that assimilating the two remote sensing products without rain gauge information generates low-quality estimates.

Assimilations using only one remote sensing database are useful approximations to the best result obtained by merging the three recommended databases, because the errors by assimilating rain gauge data with satellite or radar estimates have a similar order of magnitude, identifying an overestimation when considering satellite data, and an underestimation with radar data (Figures 5 and 6).

Additionally, the methodology was applied using rain gauge and satellite data on regions where the amount of information and its distribution were very limited. The results for both daily and accumulated rainfall over several days showed that the increase in the error of estimation of assimilated rainfall is mainly related to the lack of accuracy of rainfall calculated as satellite products, which is most recurrent in low-magnitude rainfall events. Meanwhile, in the Teapa and Cadereyta sub-basins, with distance between observations of 17.5 and 22 km, respectively, it was observed that estimates by IDW rain gauge data interpolation may be better for analyzing dates with less precipitation, especially if this circumstance contributes to increasing the inaccuracy of satellite estimates.

As a complement, in the analysis of the Las Perlas sub-basin, with an average distance of 45.8 km between observations, the rain gauge records and the satellite estimates showed rainfall amounts of a similar order of magnitude; that is, the lack of accuracy of the satellite products was less significant, even on days with minor precipitation. It is therefore concluded that when satellite data keep an order of magnitude similar to the available rain gauge records, the rainfall field obtained by assimilation will invariably show better performance in terms of the dispersion of errors than the data considered independently.

Based on the above, the results show the importance of a uniform distribution of the rain gauges in a rainfall monitoring network, especially if the density of observations is low, since the satisfactory estimation of events with these characteristics corroborated the suitability of the proposed model. Because the precipitation analysis with scarce information was performed in sub-basins with hydrometric records available at their outlets, the review of the quality of the estimated rainfall fields can be supplemented by using them as the main input of an analysis of the surface runoff produced by the spatially distributed rainfall collected in the corresponding sub-basins.

The authors would like to thank the CONAGUA and the SACMEX institutions for providing the pluviometer and pluviograph data, respectively. Likewise, the authors gratefully acknowledge the SMN and the NASA administration for providing the radar and satellite data, respectively. The first author would like to give special thanks to Dr Iván Vilchis Mata for his entire support in obtaining precipitation estimates from the reflectivity images of weather radars.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Barnes, S. L. 1964 A technique for maximizing details in numerical weather map analysis. Journal of Applied Meteorology 3 (4), 396–409.
Barnes, S. L. 1973 Mesoscale objective map analysis using weighted time-series observations. NOAA Tech. Memo. ERL NSSL–62, National Severe Storms Laboratory, Norman, OK 73069, 60 pp. [NTIS COM-73-10781].
Calvetti, L., Beneti, C., Neundorf, R. L. A., Inouye, R. T., dos Santos, T. N., Gomes, A. M., Herdies, D. L. & de Gonçalves, L. G. G. 2017 Quantitative precipitation estimation integrated by Poisson's equation using radar mosaic, satellite, and rain gauge network. Journal of Hydrologic Engineering 22 (5), E5016003.
Cisneros, H. L., Bouvier, C. & Domínguez, R. 2001 Aplicación del método kriging en la construcción de campos de tormenta en la Ciudad de México (Application of the kriging method in the construction of storm fields in Mexico City). Ingeniería hidráulica en México XVI (3), 5–14.
CONAGUA 2018a Atlas del Agua en México (Atlas of Water in Mexico). Available from: https://agua.org.mx/biblioteca/atlas-de-agua-en-mexico/ (accessed 14 April 2023).
CONAGUA 2018b Estadísticas del Agua en México (Statistics of Water in Mexico).
Daley, R. 1991 Atmospheric Data Analysis. Cambridge University Press, Cambridge.
Huffman, G. 2020 The transition in Multi-Satellite Products from TRMM to GPM (TMPA to IMERG). In: Global Precipitation Measurement. NASA, pp. 1–5. Available from: https://gpm.nasa.gov/resources/documents/transition-multi-satellite-products-trmm-gpm-tmpa-imerg (accessed 2 October 2023).
Koch, S. E., desJardins, M. & Kocin, P. J. 1981 The GEMPAK Barnes objective analysis scheme. NASA Tech. Memo. 83851, NASA/GLAS, Greenbelt, MD 20771, 56 pp. [NTIS-N8221921].
Lahoz, W., Khattatov, B. & Ménard, R. 2010 Data Assimilation: Making Sense of Observations. Springer, Berlin.
Lozoya, J. O., Domínguez, R. & Arganis, M. L. 2017 Manual de Diseño de Obras Civiles: Cap. A.1.7 Tormentas de diseño (Civil Works Design Manual: Chapter A.1.7 Design Storms). Instituto de Ingeniería, UNAM, México.
Méndez, B., Domínguez, R., Magaña, V., Caetano, E. & Carrizosa, E. 2006 Calibración hidrológica de radares meteorológicos (Hydrological calibration of meteorological radars). Ingeniería hidráulica en México XXI (4), 43–64.
Nanding, N., Rico, M. A. & Han, D. 2015 Comparison of different radar-raingauge rainfall merging techniques. Journal of Hydroinformatics 17, 422–445.
Rozante, J. R., Moreira, D. S., de Gonçalves, L. G. G. & Vila, D. A. 2010 Combining TRMM and surface observations of precipitation: technique and validation over South America. American Meteorological Society 25, 885–894.
Sinha, S. K., Narkhedkar, S. G. & Mitra, A. K. 2006 Barnes objective analysis scheme of daily rainfall over Maharashtra (India) on a mesoscale grid. Atmósfera 19 (2), 109–126.
Tintoré, J., Alonso, S. & Gomis, D. 1991 Análisis objetivo y diagnóstico en fluidos geofísicos (Objective analysis and diagnosis in geophysical fluids). Física de la Tierra 3, 179–218.
Vila, D., Goncalves, L., Toll, D. & Rozante, J. 2009 Statistical evaluation of combined daily gauge observations and rainfall satellite estimates over continental South America. American Meteorological Society 10, 533–543.
Vilchis, I., Quentin, E., K. & Díaz, C. 2011 Estimación de precipitación diaria a través de un SIG con imágenes de radar meteorológico (Estimation of daily precipitation through a GIS with weather radar images). Tecnología y Ciencias del Agua II (4), 167–174.
WMO 2008 Guide to Hydrological Practice. Volume I: Hydrology – From Measurement to Hydrological Information, WMO-No. 168.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).