Abstract
Accurate and reliable measurement and prediction of the spatial and temporal distribution of rain field over a wide range of scales are important topics in hydrologic investigations. In this study, a geostatistical approach was adopted. To estimate the rainfall intensity over a study domain with the sample values and the spatial structure from the radar data, the cumulative distribution functions (CDFs) at all unsampled locations were estimated. Indicator kriging (IK) was used to estimate the exceedance probabilities for different preselected threshold levels, and a procedure was implemented for interpolating CDF values between the thresholds that were derived from the IK. Different probability distribution functions of the CDF were tested and their influences on the performance were also investigated. The performance measures and visual comparison between the observed rain field and the IK-based estimation suggested that the proposed method can provide good results of the estimation of indicator variables and is capable of producing a realistic image.
INTRODUCTION
As rainfall constitutes the main source of freshwater for our planet and all living things on earth, accurate and reliable measurement and prediction of its spatial and temporal distribution over a wide range of scales is an important subject for hydrologic investigations. The estimation of the rainfall amount over the spatial domain from the data taken at measurement stations is a crucial stage in many hydrological applications. The problem of estimating spatial distribution of rainfall may be described as follows. Obtain an estimate of the unknown rainfall amount, , at an arbitrary location,
, within the estimation domain,
, using nearby measurements in and around
,
, at locations,
, respectively. The value of
can be estimated as a weighted average of
(Seo 1998).
To construct new data points within the range of a discrete set of known sample data points, interpolation is needed. Many schemes for spatial interpolation of rainfall have been developed, ranging from simple techniques, such as Thiessen polygons (Thiessen 1911), which simply assign the record of the nearest raingauge to the unsampled location, or inverse-distance weighting (IDW) schemes (Cressman 1959; Shepard 1968; Barnes 1973), to more complex approaches, such as kriging (Matheron 1965). In the Thiessen polygons method, only one raingauge is contributed to the estimated value of each unsampled location and the information on rainfall gradients is lost (Dirks et al. 1998). To overcome this shortcoming in estimating the spatial variation of rainfall within the considered domain, numerous studies have been proposed and tested. A simple scheme is based on IDW function with different forms of the weighting functions. The IDW scheme is a relatively simple, deterministic interpolation method, and it requires the determination of the weighting functions prior to the interpolation. Furthermore, the arbitrary nature of choosing the weighting functions may significantly affect the accuracy of the interpolated field. More studies have been made on geostatistical tools, like kriging, to interpolate rainfall field. Kriging avoids the need for preselecting parameters by calculating the semi-variogram for each pair of observations in the network as a measure of the degree of spatial dependence between the points. Kriging is increasingly preferred in the study of spatial interpolation of precipitation because it allows one to capture the spatial correlation between neighboring observations to estimate and predict attribute values at unsampled locations (Goovaerts 2000). Especially in a region with intense and strongly varying rainfall events, a kriging procedure generally provides a much better estimation than any of the more commonly used methods (Creutin & Obled 1982). Similar conclusions were drawn in several other studies (Shaw & Lynn 1972; Tabios & Salas 1985; Phillips et al. 1992; Abtew et al. 1993). In kriging, the semi-variogram is used to determine the weighting function.
Kriging is a local estimation technique of the best linear unbiased estimator (BLUE) for the unknown values of spatial and temporal variables. It is based on the use of a random field, and a number of assumptions such as stationarity and spatial ergodicity, so as to reduce the needed information to a so-called variogram that can be estimated from the available measurements (Loquin & Dubois 2010). Based on the variogram, optimal weights are assigned to known values in order to calculate unknown ones. The variogram changes with the distance and the weights depend on the known sample distribution. As an interpolation tool, it has some very useful properties (Grimes 2008). It takes account of spatial structure inherent in the data and provides unbiased, optimal (in a least-squares sense) measured values. In addition, it can be adapted to account for secondary information (e.g., topography) and provides the degree of uncertainty for each interpolated value. Ordinary kriging (OK) is the most commonly used algorithm in the kriging family. Another best-known technique is simple kriging (SK), which is based on the assumption that the expected value of the regionalized variable is a known constant. There are many variants of the kriging method, such as co-kriging which takes into account the correlation of the main variable (e.g., rainfall) and secondary variables (e.g., topography), kriging with external drift (KED) which assumes that the mean of a considered variable is linearly related to a secondary variable and indicator kriging (IK) which carries out the kriging procedure on a binary pattern and provides the probability of occurrence of a considered variable being greater than a given threshold at the estimated locations. The new method, besides allowing spatial information to be incorporated into the modeling and estimation process, also allows temporal information to be incorporated. This technique employs temporal data as secondary co-kriged variables.
Quite a number of works have been carried on to interpolate rainfall data using geostatistical kriging methods (Lebel & Laborde 1988; Barancourt et al. 1992). Goovaerts (2000) compared three multivariate geostatistical algorithms to incorporate a digital elevation model into the spatial prediction of rainfall with the straightforward linear regression techniques, such as the inverse distance or Thiessen polygon. The results confirmed that in a region with low-density raingauge network, geostatistical interpolation outperformed the other univariate methods. Kyriakidis et al. (2001) presented a geostatistical framework that integrated atmospheric and terrain characteristics into the spatial interpolation of seasonal average daily precipitation. The results indicated that geostatistical framework could provide more accurate representations of the spatial distribution of rainfall than those found in the traditional analyses. However, the degree of the improvement depended on the density of the rain gauge stations and the spatial variability of rain field. Pardo-Igúzquiza & Dowd (2005) described a two-step procedure of IK and OK, made a comparison with the empirical kriging and discussed the pros and cons of each method. Kriging with external drift (KED) and indicator kriging with external drift (IKED) were used for the spatial interpolation of hourly rainfall from raingauges using additional information from the radar by Haberlandt (2007). The impact of the semi-variogram estimation (isotropic or anisotropic behavior) on the interpolation performance was analyzed, and it was concluded that the automatic fitting procedure with isotropic variograms offered the best results in his study.
A merging method combining a mean precipitation field interpolated from raingauge observations, along with information about the spatial variability from radar data, was developed by Ehret et al. (2008). This method was proved to offer a reliable basis for a flood forecasting system. Goudenhoofdt & Delobbe (2009) tested several radar-gauge merging methods with various degrees of complexity and found that geostatistical merging methods outperformed the others. Verworn & Haberlandt (2011) applied the multivariate kriging method with external drift (MKED) using additional information from topography, rainfall data from the denser daily networks and weather radar data and concluded that the weather radar was proved to be the most valuable additional information for convective summer events, while daily rainfall was sufficient for winter events. Tobin et al. (2011) compared different interpolation methods, including IDW, OK and KED, and showed that KED outperformed other methods, which was consistent with the observation by Velasco-Forero et al. (2009). Wagner et al. (2012) analyzed seven different rainfall interpolation schemes including Thiessen polygons and geostatistical approaches with regard to their suitability to produce spatial rainfall estimates in a monsoon-dominated region with scarce rainfall measurements. They concluded that the precipitation interpolation approaches using appropriate covariates performed the best. Sideris et al. (2014b) expanded the well-tested technique (KED) by introducing and coupling temporal data with spatial data through co-kriging and this proposed technique achieved to serve as a real-time operational tool in Switzerland. Lebrenz & Bárdossy (2019) applied quantile kriging (QK) to the variable of observed monthly precipitation from raingauges in South Africa. The cross-validations proved that QK performed the best among OK and external drift kriging (EDK). Huang et al. (2019) proposed a new spatial precipitation interpolation method based on the information diffusion principle that has been proven to effectively reduce data or information uncertainties from point ground observations.
However, kriging is limited in that it depends on the existence of a reliable representation of the spatial correlation structure and is mathematically more complicated and computationally more intensive. The usefulness of kriging depends on whether the distribution of data agrees with certain statistical assumptions (e.g., Gaussian distribution) and a representative semi-variogram can be defined. In modeling spatial variability of rainfall field, these assumptions are not always satisfied. A number of technical studies are carried out to fix this problem and are well documented in Goovaerts (1997). IK is often qualified as a ‘non-parametric’ approach because it does not rely on a prespecified spatial distribution model for the attribute under study. It is, therefore, applicable for any data sets, which explains the popularity of the method. Liu & Tung (2015) established a geostatistical model based on indicator variograms fed with radar-derived rainfall data that could effectively evaluate the dynamic evolution of the spatial-temporal variation of rain field in Hong Kong. Apart from this non-Gaussian distribution issue, still several questions need to be further addressed for an improvement of interpolation, like the influence of the semi-variogram estimation on the interpolation performance of short time step rainfall in Hong Kong is relatively unknown.
The main objective of this paper is to contribute to evaluate the performance of IK schemes using different sample variograms and different probability distribution functions for fitting rainfall intensity. The paper is organized as follows. First, a brief description of the study area and the data preparation is presented. Then, the methodology to estimate the rainfall intensity by different sample variograms and different probability distribution functions is introduced. This is followed by discussion of the results and finally the conclusions that can be drawn from this study are presented.
STUDY AREA AND DATA PREPARATION
Study area
Hong Kong, with a population of about 7 million people, is situated on China's south coast and is enclosed by the Pearl River Delta and the South China Sea. Heavy rain in summer in Hong Kong is typically associated with the southwest monsoon and tropical cyclones. The threat of flooding is greatest during storm surges generated by the passage of typhoons due to high wind. Many parts of the territory are densely populated urban areas surrounded by or situated within catchments that are typically small with relatively steep slopes. These urban areas and catchments have a relatively short time-of-concentration and are particularly susceptible to the threat of flash floods and extensive flooding rendering significant economic losses and possible fatalities. Apart from urban flash flooding, another disaster caused by persistent heavy rain in Hong Kong is landslides because of the hilly terrain.
Data preparation
There are numbers of factors influencing the evolution of the spatial structure of rainfall during a rainstorm event, including the weather conditions, the topography of the study area, type of precipitation, and the spatial and temporal scale under consideration. The spatial structure of rainfall could be quite dynamic in space and time. To capture and describe the dynamic time-space evolution of the rainfall, Liu & Tung (2015) proposed a geostatistical approach based on indicator variograms of rain fields and found that the variability of the spatial structure of rain field was rationally retrieved by the main features of the variograms. Therefore, effective and reliable estimation of the variograms at each time step during the rainstorm event is of great importance for the geostatistical interpolation. The variogram estimation has to be made for each time step.
Traditionally, rainfall information has been collected in the form of raingauge records and rainfall is observed at a particular point on the Earth's surface. While the temporal resolution of the rainfall records from the raingauge stations is acceptable, the density of raingauge network is sometimes too sparse to reliably estimate the spatial structure. The idea of using radar reflectivity measurements is appealing by providing the high space-time resolution of observations. In recent years, increasing use has been made of radar-based rainfall estimates and radar images. Due to the superiority of the high spatial and temporal resolution, radar rainfall data have been used more frequently as an input for hydrological modeling.
However, careful system design and sophisticated data processing are required for radar rainfall estimation because of sampling error and measurement-algorithm induced error. The relative influence for different sources of error as well as relations between point raingauge rainfall records and radar measurements are described at length elsewhere (Huffman 1997; Ciach & Krajewski 1999; Harrison et al. 2000; Germann & Joss 2001; Sokol 2003). Because of the existence of frequently occurring large space-time variable bias, radar data that have not been corrected are insufficient for the rainfall estimation or prediction (Cole & Moore 2008; Verworn & Haberlandt 2011). Quite a number of studies which compare different algorithms for deriving estimates of precipitation from radar rainfall data have been carried out in conjunction with measurements at raingauges to map precipitation (Wagner et al. 2012).
The radar data used in this study is collected by the Hong Kong Observatory (HKO) from a rainstorm nowcasting system SWIRLS (Short-range Warning of Intense Rainstorms in Localized Systems). The radar reflectivity with a high spatial (1 km2) and temporal (6 min) resolution which is detected at 1 km height is correlated every 5 min with the rainfall recorded by the raingauges underneath to gain the optimal parameters a and b in the relationship
. A total of 132 raingauge stations, 46 from the HKO and 86 from the Geotechnical Engineering Office (GEO), over the Territory of Hong Kong were used (Figure 1).
To choose the thresholds, the quantiles of the rainfall intensity values are obtained for each rainstorm event (see Figure 2). For the purpose of illustration, rainstorms occurred on 18 May 2007 and 19 April 2008 are used herein.
Quantiles of the distribution of the rain rate values for the four studied rainstorm events.
Quantiles of the distribution of the rain rate values for the four studied rainstorm events.
On 18 May 2007, a trough of low pressure developed over inland Guangdong and moved across the south China coast. A squall line swept across Hong Kong from northwest to southeast that evening, bringing heavy showers and severe gusts to the territory. A peak gust over 100 km/h was recorded at Waglan Island. With the trough of low pressure lingering along the south China coast, there were heavy showers and thunderstorms between 19 and 22 May.
Under the combined effect of Typhoon Neoguri and the northeast monsoon, local winds started to pick up on 18 April 2008. Local winds became generally strong on the afternoon of 19 April 2008 when Neoguri was about 200 km to the west-southwest of Hong Kong. When Neoguri approached the coast of western Guangdong, the warm southerly winds associated with Neoguri met the relatively cooler northeast monsoon and formed a warm front with severe convective activities over the coastal waters of Guangdong. The warm front moved from south to north across the coast of Guangdong and brought heavy rain to Hong Kong on that day. The total daily rainfall recorded at the HKO on that day was 233.4 mm, the highest daily rainfall amount recorded in April since records began. After making landfall and moving further inland, Neoguri weakened rapidly. Locally, rain also eased off rapidly with just a few showers on 20 April 2008.
METHODOLOGY
IK scheme




Example of estimated ranges of indicator variogram at the threshold of 5 mm/h for different directions at 16:30 pm during the 2007-05-18 rainstorm event.
Example of estimated ranges of indicator variogram at the threshold of 5 mm/h for different directions at 16:30 pm during the 2007-05-18 rainstorm event.
Example of estimated ranges of indicator variogram at the threshold of 10 mm/h for different directions at 20:00 pm during the 2008-04-18 rainstorm event.
Example of estimated ranges of indicator variogram at the threshold of 10 mm/h for different directions at 20:00 pm during the 2008-04-18 rainstorm event.
Illustration of kriging with moving neighborhood with the solid circle indicating the conditioning sample data used to estimate R(x0) and the empty circle indicating the discarded points.
Illustration of kriging with moving neighborhood with the solid circle indicating the conditioning sample data used to estimate R(x0) and the empty circle indicating the discarded points.





















Since 1983 when IK was introduced by Journel (1983), it has grown to become a widely applied estimation technique. The method offers a non-parametric, distribution-free solution to the spatial interpolation problem of a variable at an unsampled location. IK has a number of advantages: (1) it does not require the prior knowledge about the spatial distribution of the original sample data; (2) it can handle moderate mixing of diverse sample populations and is resistant to outliers; (3) it takes into account the structure of indicators at each considered threshold; (4) it provides full information of the CDF at the selected thresholds and (5) it provides an estimation variance.
Despite its wide use, IK suffers from several limitations and impediments (Emery & Ortiz 2013), some practical issues need to be addressed which include issues associated with kriging neighborhood and the order relations problem and their correction.
- 1.
Selection of kriging neighborhood
The kriging theory is always derived as if all sample data points were used in the estimation; it is the so-called global neighborhood case. The minimum mean squared error is achieved when all the sample data are included. If a smaller neighborhood is chosen, the estimation can be considered as a constrained optimization with zero weights assigned on the discarded points. However, a global neighborhood may result in a kriging matrix that is too large to be numerically inverted. Another concern is that the geostatistical model may only be valid over short distances. The experimental variogram is generally accurate for small distance lag, and the uncertainties grow rapidly as the separation distance gets large.
The other alternative is called ‘moving neighborhood’, and the point selection is restricted to a subset of the sample data which is changing with the estimated point. In this study, only the sample data that lie within the preselected distance from the point of estimation are counted as conditioning in kriging, and this distance
is a function of direction
because of the anisotropy (Figure 5). For the exponential variogram model, Gilgen (2006) suggested not to interpolate for distances larger than the range.
- 2.
Problem of the order relation



However, the violation of such order relation could happen during the estimation.
There are two sources that could cause the violation of order relation: negative IK weights and insufficient number of data in some classes. Practice has shown that the majority of order relation problems are due to the lack of sufficient data, more precisely, to cases when IK is attempted at a cutoff which is the lower bound of a class
that contains no data. In such a case, the indicator data set is the same for both cut-off levels
and
, and yet, the corresponding indicator variogram models are likely to be different. Therefore, the resulting exceedance probability values will likely be different with a good chance of violating the non-increasing relationship.
A correction of the IK-returned exceedance probability value is necessary. Correcting for order relation problems is quite delicate because of the ordering of the cumulative indicators. The following correction algorithm considers the average of an upward and downward correction (Deutsch & Journel 1998): The upward correction starts with the lowest cut-off level . If the IK-returned exceedance probability value
is not within [0, 1], reset it to the closest bound. Proceed to the next cut-off level
. If the IK-returned exceedance probability value
is not within
, reset it to the closest bound. A loop through all remaining cut-off levels
,
with S being the total number of cut-off levels. The downward correction starts with the largest cut-off level
. If the IK-returned exceedance probability value
is not within [0, 1], reset it to the closest bound. Proceed to the next cut-off level
. If the IK-returned exceedance probability value
is not within
, reset it to the closest bound. A loop downward through all remaining cut-off levels
,
. Average the two sets of corrected exceedance probability values.




To evaluate this integral, the procedures for interpolation between the IK-derived conditional CDF values are critical. Knowing a conditional CDF at several cut-off values, the integral can be estimated by assuming the distribution of the rainfall intensity at the considered location. In this study, several commonly used parametric probability distribution functions are used to fit the IK estimated CDF at different cut-off rainfall intensities, as well as a discrete approximation.
- 1.
Discrete approximation
where is the mean value of the class
as obtained from the conditional CDF model.
- 2.
Log-normal distribution










- 3.
Generalized extreme value distribution
The shape parameter governs the tail behavior of the distribution and the sub-family defined by
corresponds to the Gumbel (or extreme value type I, EV1) distribution.
- 4.
Gumbel
- 5.
Generalized Pareto distribution









- 6.
Generalized logistic distribution











Statistical indices




To evaluate this integral, the procedures for interpolation between the IK-derived conditional CDF values are critical. Knowing a conditional CDF at several threshold values, the integral can be estimated by assuming the distribution of the rainfall intensity at the considered location. In this study, several commonly used parametric probability distribution functions, including LN distribution, GEV distribution, GPA distribution, GLO distribution, are used to fit the IK estimated CDF at different threshold rainfall intensities, as well as DA.
There are many factors affecting the performance of spatial interpolation methods, including sampling density, sample spatial distribution, sample clustering, surface type, data variance, data normality, quality of secondary information, stratification and temporal scale. Among them, data variation is a dominant factor (Li & Heap 2011). Large data variation indicates a high heterogeneity in the data. Since the original rainfall data have transformed into binary variables to distinguish whether the rainfall intensity is higher than the considered threshold in this study, variation in data itself is not a concern that affects the performance of spatial interpolation methods. Herein, the interpolation performance, as well as the impact of the semi-variogram estimation on the interpolation performance, is evaluated. IK results, along with different distributional assumptions of the random rainfall intensity at unsampled locations, were used to estimate the mean rainfall intensity value. The estimated values of rainfall intensity within the domain of interest were compared with the observed values at every sample point using the following performance criteria.




RESULTS AND DISCUSSION
To estimate the probability distribution as well as the mean rainfall intensity at an unsampled location within the studied domain, several thresholds need to be selected. At each threshold, the data are indicator-transformed and their resulting indicator variograms are used to describe the changes in spatial variability along with the CDF. Hence, threshold values need to be selected to provide the maximum possible resolution defining CDF. More thresholds, in theory, provide a better definition of the CDF, but the more intensive calculation is required.
In this study, six threshold values of rainfall intensity were chosen, based on a numerical experiment indicating that increasing the number of thresholds brings only a minor additional reduction in estimation accuracy. The threshold value k was set to be 1, 2, 3, 5, 10 and 20 mm/h, respectively, for modeling the prior distribution.
Isotropic variogram
One important problem for geostatistical interpolation of rain field is the effective and reliable estimation of the variogram for each time step. Variograms were inferred from radar rainfall data because of the higher resolution in space compared to the recording raingauge stations. To investigate the influence of variogram estimation on interpolation performance, isotropic experimental semi-variograms are estimated for every 6-min time step under the considered six threshold levels during each selected rainstorm event. The automatically estimated parameters (e.g., range and sill) of the isotropic variogram for an exponential model for the two selected rainstorm events are shown in Figures 6 and 7.
Time variation of estimated variogram parameters ((a) range and (b) sill) of the exponential model under the isotropic condition for the 2007-05-18 rainstorm event.
Time variation of estimated variogram parameters ((a) range and (b) sill) of the exponential model under the isotropic condition for the 2007-05-18 rainstorm event.
Time variation of estimated variogram parameters ((a) range and (b) sill) of the exponential model under the isotropic condition for the 2008-04-19 rainstorm event.
Time variation of estimated variogram parameters ((a) range and (b) sill) of the exponential model under the isotropic condition for the 2008-04-19 rainstorm event.
There are some features that can be noticed from the estimated parameters of the isotropic variograms for the selected rainstorm events.
Most of the sills of indicator variograms at all considered cut-off levels and time steps are ranging from 0 to 0.4.
For the 2007-05-18 rainstorm event, it is clear that the values of the range and sill have a negative correlation with the rainfall intensity thresholds. The larger the threshold, the smaller the range and sill are. Situations are different for the 2008-04-19 rainstorm event from the 80th 6-min to the 200th 6-min. This correlation offers information about the number of the rainstorm centers. Confirmed from the rainfall intensity images of the 2008-04-19 rainstorm event, there is indeed more than one rainstorm center during those time periods, which makes the values of the variogram parameters for lower cut-off levels (e.g., 1 and 2 mm/h) go down, even to the smallest. ‘Rainstorm centers’ here are referring to the areas where the rainfall intensity is greater than 10 mm/h. The instantaneous rain field of the 2007-05-18 rainstorm event at the 33rd 6-min shown in Figure 8 clearly indicates that the only one rainstorm center is located in the northwest. For the 160th 6-min of the 2008-04-19 rainstorm event (see Figure 9), the rainstorm centers scatter at several different locations. This result consists of the information obtained from the correlation between the mean length and range of the indicator variogram (Liu & Tung 2015). Because of the existence of multiple rainstorm centers, the correlation coefficients of the mean length and range for the 2008-04-19 rainstorm event are slightly lower than those for the other two considered rainstorm events.
Instantaneous rainfall intensity contour (in mm/h) over the study area during the 2007-05-18 rainstorm event at 33rd 6-min.
Instantaneous rainfall intensity contour (in mm/h) over the study area during the 2007-05-18 rainstorm event at 33rd 6-min.
Instantaneous rainfall intensity contour (in mm/h) over the study area during the 2008-04-19 rainstorm event at 160th 6-min.
Instantaneous rainfall intensity contour (in mm/h) over the study area during the 2008-04-19 rainstorm event at 160th 6-min.
Anisotropic variogram with elliptic-fitted parameters
The spatial structure of rain field at each threshold level reveals strong anisotropy. Hence, the indicator variograms are fitted with the exponential variogram model for 24 directions at each time step. The variogram parameters (e.g., range and sill) are expected to show its directional association and tend to be similar if the directions are close. However, the presence of sampling variation could result in significant variability of variogram parameters with the direction even the two considered directions only differ by about 10°. Herein, the parameters of anisotropic variograms are fitted with an elliptic model. Figures 10 and 11 show the examples of sample estimated and elliptic-fitted ranges of anisotropy variogram for the selected rainstorm events. The procedure of fitting the variogram parameters with the elliptic model removes the fluctuation of the original parameters that are directly estimated from the theoretical variogram model for 24 directions. In addition, the parametric representation of variogram parameters by the elliptic model reduces the number of the parameters of the model from 48 (both ranges and sills for all 24 considered directions at one time) to 6 (the biggest range and sill, the smallest range and sill and the angle of the biggest range and sill).
Sample and elliptic-fitted ranges of anisotropy variogram for the threshold of 5 mm/h at 18:30 pm during the 2007-05-18 rainstorm event.
Sample and elliptic-fitted ranges of anisotropy variogram for the threshold of 5 mm/h at 18:30 pm during the 2007-05-18 rainstorm event.
Sample and elliptic-fitted ranges of anisotropy variogram for the threshold of 10 mm/h at 20:00 pm during the 2008-04-19 rainstorm event.
Sample and elliptic-fitted ranges of anisotropy variogram for the threshold of 10 mm/h at 20:00 pm during the 2008-04-19 rainstorm event.
IK is used to analyze the probability that the rainfall intensity is greater than the considered threshold for the unsampled location and the non-decreasing CDF values at the predetermined threshold levels are available. Eight different interpolation schemes are tested to fit the estimated CDF from IK by the least-square method and to estimate the mean rainfall intensity within the study area (see examples in Figure 12).
Examples of curve fitting for CDF at four different locations for the 2007-05-18 rainstorm event (18:30 pm).
Examples of curve fitting for CDF at four different locations for the 2007-05-18 rainstorm event (18:30 pm).
Discussion
The results of the performance of IK and mean rainfall intensity estimation (compared with the 132 observed raingauge estimations) are shown in Tables 1 and 2 under different assumptions about the semi-variogram model and different probability distributions of the CDF.
Example of performance assessment of IK based on sample isotropic and anisotropic variograms and elliptic-fitted anisotropic variogram at 18:30 pm during the 2007-05-18 rainstorm event
Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . | Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . |
---|---|---|---|---|---|---|---|
BIAS | RMSE | ||||||
DA | 1.992 | −0.529 | 0.526 | DA | 15.682 | 6.439 | 8.951 |
LN | −0.462 | −0.932 | −0.526 | LN | 6.910 | 6.291 | 6.556 |
GEV | −1.282 | −1.907 | −1.567 | GEV | 9.017 | 6.319 | 6.371 |
Gumbel | −0.293 | −1.387 | −1.132 | Gumbel | 9.561 | 6.707 | 6.958 |
GPA | −0.336 | −1.333 | −0.928 | GPA | 8.847 | 6.296 | 6.403 |
PA | −0.198 | −1.356 | −1.059 | PA | 8.791 | 6.336 | 6.505 |
GLO | −1.053 | −1.528 | −1.199 | GLO | 9.037 | 6.325 | 6.345 |
LO | −0.343 | −1.392 | −1.174 | LO | 9.934 | 6.44 | 6.655 |
MAE | AEIV | ||||||
DA | 4.926 | 2.577 | 3.468 | DA | 0.847 | 0.828 | 0.830 |
LN | 2.707 | 2.443 | 2.632 | LN | 0.847 | 0.844 | 0.840 |
GEV | 3.141 | 2.297 | 2.374 | GEV | 0.847 | 0.852 | 0.850 |
Gumbel | 3.334 | 2.431 | 2.64 | Gumbel | 0.818 | 0.837 | 0.825 |
GPA | 3.250 | 2.298 | 2.413 | GPA | 0.848 | 0.852 | 0.855 |
PA | 3.089 | 2.313 | 2.478 | PA | 0.843 | 0.851 | 0.849 |
GLO | 3.063 | 2.297 | 2.348 | GLO | 0.848 | 0.851 | 0.850 |
LO | 3.307 | 2.363 | 2.542 | LO | 0.821 | 0.845 | 0.834 |
Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . | Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . |
---|---|---|---|---|---|---|---|
BIAS | RMSE | ||||||
DA | 1.992 | −0.529 | 0.526 | DA | 15.682 | 6.439 | 8.951 |
LN | −0.462 | −0.932 | −0.526 | LN | 6.910 | 6.291 | 6.556 |
GEV | −1.282 | −1.907 | −1.567 | GEV | 9.017 | 6.319 | 6.371 |
Gumbel | −0.293 | −1.387 | −1.132 | Gumbel | 9.561 | 6.707 | 6.958 |
GPA | −0.336 | −1.333 | −0.928 | GPA | 8.847 | 6.296 | 6.403 |
PA | −0.198 | −1.356 | −1.059 | PA | 8.791 | 6.336 | 6.505 |
GLO | −1.053 | −1.528 | −1.199 | GLO | 9.037 | 6.325 | 6.345 |
LO | −0.343 | −1.392 | −1.174 | LO | 9.934 | 6.44 | 6.655 |
MAE | AEIV | ||||||
DA | 4.926 | 2.577 | 3.468 | DA | 0.847 | 0.828 | 0.830 |
LN | 2.707 | 2.443 | 2.632 | LN | 0.847 | 0.844 | 0.840 |
GEV | 3.141 | 2.297 | 2.374 | GEV | 0.847 | 0.852 | 0.850 |
Gumbel | 3.334 | 2.431 | 2.64 | Gumbel | 0.818 | 0.837 | 0.825 |
GPA | 3.250 | 2.298 | 2.413 | GPA | 0.848 | 0.852 | 0.855 |
PA | 3.089 | 2.313 | 2.478 | PA | 0.843 | 0.851 | 0.849 |
GLO | 3.063 | 2.297 | 2.348 | GLO | 0.848 | 0.851 | 0.850 |
LO | 3.307 | 2.363 | 2.542 | LO | 0.821 | 0.845 | 0.834 |
Example of performance assessment of IK based on sample isotropic and anisotropic variograms and elliptic-fitted anisotropic variogram at 20:00 pm during the 2008-04-19 rainstorm event
Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . | Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . |
---|---|---|---|---|---|---|---|
BIAS | RMSE | ||||||
DA | 28.089 | 16.345 | 18.327 | DA | 48.423 | 25.197 | 26.179 |
LN | 1.247 | −2.311 | −0.373 | LN | 22.139 | 20.396 | 19.622 |
GEV | − 0.878 | −3.358 | −1.604 | GEV | 20.916 | 20.783 | 19.546 |
Gumbel | 0.657 | −5.690 | −4.380 | Gumbel | 20.973 | 22.321 | 21.116 |
GPA | 1.359 | −2.810 | −1.077 | GPA | 21.359 | 20.440 | 19.173 |
PA | 2.293 | −4.925 | −3.366 | PA | 21.626 | 21.639 | 20.226 |
GLO | − 0.909 | −3.593 | −1.922 | GLO | 20.048 | 21.035 | 19.740 |
LO | − 1.822 | −6.177 | −5.050 | LO | 21.664 | 22.787 | 21.753 |
MAE | AEIV | ||||||
DA | 29.971 | 19.937 | 21.594 | DA | 0.584 | 0.856 | 0.595 |
LN | 10.282 | 9.712 | 10.140 | LN | 0.699 | 0.878 | 0.722 |
GEV | 9.910 | 9.338 | 9.644 | GEV | 0.708 | 0.841 | 0.721 |
Gumbel | 10.563 | 7.777 | 10.031 | Gumbel | 0.703 | 0.916 | 0.709 |
GPA | 9.773 | 9.773 | 9.393 | GPA | 0.717 | 0.925 | 0.721 |
PA | 10.057 | 9.768 | 9.421 | PA | 0.713 | 0.902 | 0.716 |
GLO | 10.041 | 8.486 | 9.703 | GLO | 0.710 | 0.855 | 0.718 |
LO | 10.931 | 7.728 | 10.452 | LO | 0.698 | 0.916 | 0.702 |
Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . | Distribution . | IK based on sample isotropic variogram . | IK based on sample anisotropic variogram . | IK based on elliptic-fitted anisotropic variogram . |
---|---|---|---|---|---|---|---|
BIAS | RMSE | ||||||
DA | 28.089 | 16.345 | 18.327 | DA | 48.423 | 25.197 | 26.179 |
LN | 1.247 | −2.311 | −0.373 | LN | 22.139 | 20.396 | 19.622 |
GEV | − 0.878 | −3.358 | −1.604 | GEV | 20.916 | 20.783 | 19.546 |
Gumbel | 0.657 | −5.690 | −4.380 | Gumbel | 20.973 | 22.321 | 21.116 |
GPA | 1.359 | −2.810 | −1.077 | GPA | 21.359 | 20.440 | 19.173 |
PA | 2.293 | −4.925 | −3.366 | PA | 21.626 | 21.639 | 20.226 |
GLO | − 0.909 | −3.593 | −1.922 | GLO | 20.048 | 21.035 | 19.740 |
LO | − 1.822 | −6.177 | −5.050 | LO | 21.664 | 22.787 | 21.753 |
MAE | AEIV | ||||||
DA | 29.971 | 19.937 | 21.594 | DA | 0.584 | 0.856 | 0.595 |
LN | 10.282 | 9.712 | 10.140 | LN | 0.699 | 0.878 | 0.722 |
GEV | 9.910 | 9.338 | 9.644 | GEV | 0.708 | 0.841 | 0.721 |
Gumbel | 10.563 | 7.777 | 10.031 | Gumbel | 0.703 | 0.916 | 0.709 |
GPA | 9.773 | 9.773 | 9.393 | GPA | 0.717 | 0.925 | 0.721 |
PA | 10.057 | 9.768 | 9.421 | PA | 0.713 | 0.902 | 0.716 |
GLO | 10.041 | 8.486 | 9.703 | GLO | 0.710 | 0.855 | 0.718 |
LO | 10.931 | 7.728 | 10.452 | LO | 0.698 | 0.916 | 0.702 |
Comparing the results based on different variogram models, it is clear that IK based on the variogram under isotropic assumption shows the worst performance for most times. This result signifies the necessity of capturing the anisotropy feature of rain field although it is more computationally intensive and complex to fit the variogram model for 24 different directions. The other finding is that the IK based on the anisotropic variogram with strictly sampled estimated parameters and that of elliptic-fitted parameters show similar performances and the former one leads to slightly better results. Without sacrificing too much in the accuracy, the sampling fluctuation of the variogram parameters for 24 directions can be removed by fitting with an elliptic model and the number of the parameters is reduced from 48 to 6.
It can be seen that the estimation on the value of the rainfall intensity is not always satisfactory. However, performance varies from one rainstorm event to another. The worst performance was found during the 2008-04-19 rainstorm event, the values of ,
and
are around −4.1, 9.0 and 21.8 mm/h, respectively. IK shows reasonable performance for the 2007-05-18 rainstorm event (
: −1.3 mm/h,
: 2.3 mm/h and
: 6.3 mm/h). This accuracy of estimation can be improved by increasing the number of the thresholds, but at the same time, the indicator variograms associated with the new thresholds need to be calculated which is very time-consuming. The other performance measure,
, shows that, although the estimated mean rainfall intensity value sometimes may not be that precise, most of the times the estimation is located in the right range of the threshold levels. It can be concluded that IK can usually adequately provide the probability that an unknown rainfall intensity at an unsampled location is greater than the given threshold value (or the value of the indicator variable), and this information is the basis for sequential indicator simulation and critical decisions in agriculture, hydrology, hydraulic structural design and other areas.
For further analysis, the ,
and
are calculated for each threshold range of the rainfall intensity. Most contribution of the error comes from the large range with rainfall intensity greater than 10 mm/h.
In general, the three-parameter distribution models usually show slightly better performance than the two-parameter distributions. Fitting the CDF with a GPA distribution outperformed the other six distribution models considered according to the values of ,
,
and
.
Figures 13 and 14 compare the observed rainfall intensity with that of the estimated value under the assumption of GPA distribution for the CDF. From the standard deviation map, it is clear to see that large uncertainty in estimating unknown rainfall intensity at unsampled location is associated with the large rainfall intensity. The AIEV is high when the rainfall intensity exceeds 5 mm/h. From the results of , it can be seen that overestimation often occurs when estimating the lower range rainfall intensity, while it tends to underestimate in the range of higher rainfall intensity.
Example of observed rainfall intensity (left), estimated mean rainfall intensity (middle) and standard deviation of the estimation (right) in mm/h under the assumption of GPA distribution at 18:30 pm during the 2007-05-18 rainstorm event.
Example of observed rainfall intensity (left), estimated mean rainfall intensity (middle) and standard deviation of the estimation (right) in mm/h under the assumption of GPA distribution at 18:30 pm during the 2007-05-18 rainstorm event.
Example of observed rainfall intensity (left), estimated mean rainfall intensity (middle) and standard deviation of the estimation (right) in mm/h under the assumption of GPA distribution at 20:00 pm during the 2008-04-19 rainstorm event.
Example of observed rainfall intensity (left), estimated mean rainfall intensity (middle) and standard deviation of the estimation (right) in mm/h under the assumption of GPA distribution at 20:00 pm during the 2008-04-19 rainstorm event.
CONCLUSIONS
IK has the capability of estimating the probability that an unknown quantity at an unsampled location is greater than a given threshold value by capturing the proportion of neighboring data valued above the considered threshold. In this study, IK is used to estimate the exceedance probabilities for different rainfall intensity threshold levels at all unsampled locations within the studied domain. Six rainfall intensity thresholds are chosen based on a numerical experiment, indicating that increasing the number of thresholds brings only a minor additional improvement in estimation accuracy. A procedure for the interpolation between the IK-derived conditional CDF values is implemented.
The performance accuracy of spatial interpolation results as well as influences of different variogram models (isotropic and anisotropic) and different fitted schemes for the estimated conditional CDF are studied. Under the isotropy assumption, values of the exponential variogram model parameters (e.g., range and sill) have a negative correlation with the rainfall intensity threshold except for the time periods when multiple rainstorm centers exist. When different time resolutions (6-min, 12-min, 30-min and 60-min) are considered, the time series of isotropic variogram parameters for the four-time resolutions share a similar trend. IK based on these isotropic variograms shows the worst performance for most times. While IK based on the anisotropic variogram with strictly sampled estimated parameters and that of elliptic-fitted parameters show similar performances. This confirms the necessity of capturing the anisotropy feature of rain field and the procedure of fitting anisotropic variogram parameters with an elliptic model.
Fitting the conditional CDF values at the selected threshold levels with a GPA model shows slightly better performance when estimating the mean rainfall intensity. The corresponding is the highest among the eight considered distributions for most considered rainstorm events. The proposed IK method tends to overestimate the rainfall intensity values in a lower range and underestimate them in a higher range. And the performance varies from one rainstorm event to another. The standard deviation maps indicate that there is high uncertainty associated with the large rainfall intensity values. However, it can provide a good estimation of an indicator variable for rainfall intensity based on the predetermined thresholds, which can serve a good basis for sequential indicator simulation and critical decisions in agriculture, hydrology, hydraulic structural design and other areas.
ACKNOWLEDGEMENTS
The authors would like to thank the five anonymous reviewers and the editor who helped us improve the quality of the original manuscript. The authors wish to acknowledge the support of National Key R&D Program of China (Grant No. 2017YFC1502702) and the Hong Kong Research Grant Council (Project No. 620110: Development of time-space rainstorm model for Hong Kong) for the study. Gratitude is extended to the Hong Kong Observatory for providing the rainfall radar data used in the study.