Spatial interpolation of rain-field dynamic time-space evolution based on radar rainfall data

Accurate and reliable measurement and prediction of the spatial and temporal distribution of rain field over a wide range of scales are important topics in hydrologic investigations. In this study, a geostatistical approach was adopted. To estimate the rainfall intensity over a study domain with the sample values and the spatial structure from the radar data, the cumulative distribution functions (CDFs) at all unsampled locations were estimated. Indicator kriging (IK) was used to estimate the exceedance probabilities for different preselected threshold levels, and a procedure was implemented for interpolating CDF values between the thresholds that were derived from the IK. Different probability distribution functions of the CDF were tested and their influences on the performance were also investigated. The performance measures and visual comparison between the observed rain field and the IK-based estimation suggested that the proposed method can provide good results of the estimation of indicator variables and is capable of producing a realistic image. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/). doi: 10.2166/nh.2020.115 ://iwaponline.com/hr/article-pdf/51/3/521/698364/nh0510521.pdf Peng Liu (corresponding author) State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Nanjing Hydraulic Research Institute, 223 Guangzhou Road, Nanjing 210029, China E-mail: pliu@nhri.cn Yeou-Koung Tung Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong


INTRODUCTION
As rainfall constitutes the main source of freshwater for our planet and all living things on earth, accurate and reliable measurement and prediction of its spatial and temporal distribution over a wide range of scales is an important subject for hydrologic investigations. The estimation of the rainfall amount over the spatial domain from the data taken at measurement stations is a crucial stage in many hydrological applications. The problem of estimating spatial distribution of rainfall may be described as follows. Obtain an estimate of the unknown rainfall amount, r(x 0 ), at an arbitrary location, x 0 , within the estimation domain, A, using nearby measurements in and around A, r(x 1 ), r(x 2 ), . . . , r(x n ), at locations, x 1 , x 2 , . . . , x n , respectively. The value of r(x 0 ) can be estimated as a weighted average of r(x 1 ), r(x 2 ), . . . , r(x n ) (Seo ).
To construct new data points within the range of a discrete set of known sample data points, interpolation is needed. Many schemes for spatial interpolation of rainfall have been developed, ranging from simple techniques, such as Thiessen polygons (Thiessen ), which simply assign the record of the nearest raingauge to the unsampled location, or inverse-distance weighting (IDW) schemes (Cressman ; Shepard ; Barnes ), to more complex approaches, such as kriging (Matheron ). In the Thiessen polygons method, only one raingauge is contributed to the estimated value of each unsampled location and the information on rainfall gradients is lost (Dirks et al. ). To overcome this shortcoming in estimating the spatial variation of rainfall within the considered domain, numerous studies have been proposed and tested. A simple scheme is based on IDW function with different forms of the weighting functions. The IDW scheme is a relatively simple, deterministic interpolation method, and it requires the determination of the weighting functions prior to the interpolation. Furthermore, the arbitrary nature of choosing the weighting functions may significantly affect the accuracy of the interpolated field. More studies have been made on geostatistical tools, like kriging, to interpolate rainfall field. Kriging avoids the need for preselecting parameters by calculating the semi-variogram for each pair of observations in the network as a measure of the degree of spatial dependence between the points. Kriging is increasingly preferred in the study of spatial interpolation of precipitation because it allows one to capture the spatial correlation between neighboring observations to estimate and predict attribute values at unsampled locations (Goovaerts ). Especially in a region with intense and strongly varying rainfall events, a kriging procedure generally provides a much better estimation than any of the more commonly used methods (Creutin & Obled ). In kriging, the semi-variogram is used to determine the weighting function.
Kriging is a local estimation technique of the best linear unbiased estimator (BLUE) for the unknown values of spatial and temporal variables. It is based on the use of a random field, and a number of assumptions such as stationarity and spatial ergodicity, so as to reduce the needed information to a so-called variogram that can be estimated from the available measurements (Loquin & Dubois ).
Based on the variogram, optimal weights are assigned to known values in order to calculate unknown ones. The variogram changes with the distance and the weights depend on the known sample distribution. As an interpolation tool, it has some very useful properties (Grimes ). It takes account of spatial structure inherent in the data and provides unbiased, optimal (in a least-squares sense) measured values. In addition, it can be adapted to account for secondary information (e.g., topography) and provides the degree of uncertainty for each interpolated value. Ordinary kriging (OK) is the most commonly used algorithm in the kriging family. Another best-known technique is simple kriging (SK), which is based on the assumption that the expected value of the regionalized variable is a known constant. There are many variants of the kriging method, such as co-kriging which takes into account the correlation of the main variable (e.g., rainfall) and secondary variables (e.g., topography), kriging with external drift (KED) which assumes that the mean of a considered variable is linearly related to a secondary variable and indicator kriging (IK) which carries out the kriging procedure on a binary pattern and provides the probability of occurrence of a considered variable being greater than a given threshold at the estimated locations. The new method, besides allowing spatial information to be incorporated into the modeling and estimation process, also allows temporal information to be incorporated. This technique employs temporal data as secondary co-kriged variables.
Quite a number of works have been carried on to interpolate rainfall data using geostatistical kriging methods (Lebel & Laborde ; Barancourt et al. ). Goovaerts () compared three multivariate geostatistical algorithms to incorporate a digital elevation model into the spatial prediction of rainfall with the straightforward linear regression techniques, such as the inverse distance or Thiessen polygon. The results confirmed that in a region with lowdensity raingauge network, geostatistical interpolation outperformed the other univariate methods. Kyriakidis et al.
() presented a geostatistical framework that integrated atmospheric and terrain characteristics into the spatial interpolation of seasonal average daily precipitation. The results indicated that geostatistical framework could provide more accurate representations of the spatial distribution of rainfall than those found in the traditional analyses. However, the degree of the improvement depended on the density of the rain gauge stations and the spatial variability of rain field. Pardo-Igúzquiza & Dowd () described a two-step procedure of IK and OK, made a comparison with the empirical kriging and discussed the pros and cons of each method. Kriging with external drift (KED) and indicator kriging with external drift (IKED) were used for the spatial interpolation of hourly rainfall from raingauges using additional information from the radar by Haberlandt (). The impact of the semi-variogram estimation (isotropic or anisotropic behavior) on the interpolation performance was analyzed, and it was concluded that the automatic fitting procedure with isotropic variograms offered the best results in his study.
A merging method combining a mean precipitation field interpolated from raingauge observations, along with information about the spatial variability from radar data, was developed by Ehret et al. (). This method was proved to offer a reliable basis for a flood forecasting system. Goudenhoofdt & Delobbe () tested several radar-gauge merging methods with various degrees of complexity and found that geostatistical merging methods outperformed the others.
Verworn & Haberlandt () applied the multivariate kriging method with external drift (MKED) using additional information from topography, rainfall data from the denser daily networks and weather radar data and concluded that the weather radar was proved to be the most valuable additional information for convective summer events, while daily rainfall was sufficient for winter events. Tobin approaches with regard to their suitability to produce spatial rainfall estimates in a monsoon-dominated region with scarce rainfall measurements. They concluded that the precipitation interpolation approaches using appropriate covariates performed the best. Sideris et al. (b) expanded the well-tested technique (KED) by introducing and coupling temporal data with spatial data through co-kriging and this proposed technique achieved to serve as a realtime operational tool in Switzerland. Lebrenz & Bárdossy () applied quantile kriging (QK) to the variable of observed monthly precipitation from raingauges in South Africa. The cross-validations proved that QK performed the best among OK and external drift kriging (EDK). Huang et al. () proposed a new spatial precipitation interpolation method based on the information diffusion principle that has been proven to effectively reduce data or information uncertainties from point ground observations. However, kriging is limited in that it depends on the existence of a reliable representation of the spatial correlation structure and is mathematically more complicated and computationally more intensive. The usefulness of kriging depends on whether the distribution of data agrees with certain statistical assumptions (e.g., Gaussian distribution) and a representative semi-variogram can be defined. In modeling spatial variability of rainfall field, these assumptions are not always satisfied. A number of technical studies are carried out to fix this problem and are well documented in Goovaerts (). IK is often qualified as a 'non-parametric' approach because it does not rely on a prespecified spatial distribution model for the attribute under study. It is, therefore, applicable for any data sets, which explains the popularity of the method. The main objective of this paper is to contribute to evaluate the performance of IK schemes using different sample variograms and different probability distribution functions for fitting rainfall intensity. The paper is organized as follows.
First, a brief description of the study area and the data preparation is presented. Then, the methodology to estimate the rainfall intensity by different sample variograms and different probability distribution functions is introduced. This is followed by discussion of the results and finally the conclusions that can be drawn from this study are presented. Traditionally, rainfall information has been collected in the form of raingauge records and rainfall is observed at a particular point on the Earth's surface. While the temporal resolution of the rainfall records from the raingauge stations is acceptable, the density of raingauge network is sometimes too sparse to reliably estimate the spatial structure. The idea of using radar reflectivity measurements is appealing by providing the high spacetime resolution of observations. In recent years, increasing use has been made of radar-based rainfall estimates and radar images. Due to the superiority of the high spatial and temporal resolution, radar rainfall data have been used more frequently as an input for hydrological modeling. However, careful system design and sophisticated data processing are required for radar rainfall estimation because of sampling error and measurement-algorithm The radar data used in this study is collected by the Hong Kong Observatory (HKO) from a rainstorm nowcasting system SWIRLS (Short-range Warning of Intense Rainstorms in Localized Systems). The radar reflectivity with a high spatial (1 km 2 ) and temporal (6 min) resolution which is detected at 1 km height is correlated every 5 min with the rainfall recorded by the raingauges underneath to gain the optimal parameters a and b in the Z À R relation-    fluctuations, which can be expressed as the path of a point:

STUDY AREA AND DATA PREPARATION
where the angle parameter ω varies from 0 to 2π. In Equation (1), (X center , Y center ) is the center of the ellipse which is set to be (0, 0) during the fitting procedure; A and B are, respectively, the semi-major and semi-minor axes of the ellipse and φ is the angle between the x-axis and the major axis of the ellipse.
Then, the value of the semi-variogram parameter (e.g., range a for direction ω) can be derived from the corresponding X Ã (ω) and Y Ã (ω) as follows: To determine the optimal parameters of the elliptic between the original estimated range for each direction and the corresponding elliptic-fitted value is minimized: where a(θ i ), i ¼ 1, . . . , 24 is the original estimated range for the considered 24 directions and a Ã (θ i , A, B, φ), i ¼ 1, . . . , 24 is the elliptic-fitted value of the range.  By IK, the actual rain process is transformed into a binary process to distinguish whether the rainfall intensity R(x) at location x at time t is above a special threshold k or not. IK does not aim at estimating the indicator transform i(x, k) set to 1 or 0 but provides an estimate of the cumulative distribution function (CDF) value conditional on threshold level k. A linear estimate of the conditional probability that R(x 0 ) greater than a given threshold k from the sample data R(x α ), α ¼ 1, . . . , n, which is also the mean of i(x 0 , k), is given by the following equation: Denoting the covariance of the indicator variables for the threshold level k at location x τ and x ν by . . , n are obtained by solving the following OK system as follows: where . . , n is the n × 1 vector of covariance between the data and the target and λ IK β (x 0 , k), β ¼ 1, . . . , n is the n × 1 vector of unknown weights to be solved.
Since 1983 when IK was introduced by Journel (), it has grown to become a widely applied estimation technique.
The method offers a non-parametric, distribution-free solution to the spatial interpolation problem of a variable at an unsampled location. IK has a number of advantages: (1) it does not require the prior knowledge about the spatial distribution of the original sample data; (2) it can handle moderate mixing of diverse sample populations and is resistant to outliers; (3) it takes into account the structure of indicators at each considered threshold; (4) it provides full information of the CDF at the selected thresholds and (5) it provides an estimation variance.
Despite its wide use, IK suffers from several limitations and impediments (Emery & Ortiz ), some practical issues need to be addressed which include issues associated with kriging neighborhood and the order relations problem and their correction.

Selection of kriging neighborhood
The kriging theory is always derived as if all sample data points were used in the estimation; it is the so-called global neighborhood case. The minimum mean squared error is achieved when all the sample data are included. If a smaller neighborhood is chosen, the estimation can be considered as a constrained optimization with zero weights assigned on the discarded points. However, a global neighborhood may result in a kriging matrix that is too large to be numerically inverted. Another concern is that the geostatistical model may only be valid over short distances. The experimental variogram is generally accurate for small distance lag, and the uncertainties grow rapidly as the separation distance gets large.
The other alternative is called 'moving neighborhood', and the point selection is restricted to a subset of the sample data which is changing with the estimated point.
In this study, only the sample data that lie within the preselected distance δ θ from the point of estimation are counted as conditioning in kriging, and this distance δ θ is a function of direction θ because of the anisotropy ( Figure 5). For the exponential variogram model, Gilgen () suggested not to interpolate for distances larger than the range.

Problem of the order relation
The IK algorithm itself does not necessarily ensure that the resulting probability estimates satisfy non-decreasing (or non-increasing) relationships between the threshold level k and non-exceedance probability (or exceedance probability).

Let
Pr{R be the exceedance probability of a continuous variable R(x) > k and the corresponding indicator variable i(x, k) ¼ 1.
Then, for two threshold levels k 0 > k, the legitimate exceedance probability relationship should satisfy F{R(x), k 0 j(n)} F{x, kj(n)} However, the violation of such order relation could happen during the estimation.
There are two sources that could cause the violation of order relation: negative IK weights and insufficient number of data in some classes. Practice has shown that the majority of order relation problems are due to the lack of sufficient data, more precisely, to cases when IK is attempted at a cutoff k s which is the lower bound of a class [k s , k sþ1 ) that contains no data. In such a case, the indicator data set is the same for both cut-off levels k l and k sþ1 , and yet, the corresponding indicator variogram models are likely to be different. Therefore, the resulting exceedance probability values will likely be different with a good chance of violating the non-increasing relationship.
where F(x, k s ), s ¼ 1, 2, . . . , S is the exceedance probability at cutoff k s for location x gained by IK.
Various estimates for the unknown value R(x) can be derived from the conditional CDF, e.g., expectation (E-type), median (M-type) or a maximum probability type (P-type) (Bierkens & Burrough ). In this study, the E-type estimate R Ã E (x) is used to estimate the performance IK and is defined as follows: where F Ã (x) is the estimated CDF for the rainfall intensity at location x. The standard deviation sd Ã (x) of rainfall intensity can also be calculated to investigate the corresponding uncertainties: To evaluate this integral, the procedures for interpolation between the IK-derived conditional CDF values are critical. Knowing a conditional CDF at several cut-off values, the integral can be estimated by assuming the distribution of the rainfall intensity at the considered location. In this study, several commonly used parametric probability distribution functions are used to fit the IK estimated CDF at different cut-off rainfall intensities, as well as a discrete approximation.

Discrete approximation
Discrete approximation (DA) is actually assuming linearity between two adjacent IK estimated CDF values. The mean value of DA-based rainfall intensity at location x can be estimated by the following equation: where R 0 is the mean value of the class (k sÀ1 , k s ] as obtained from the conditional CDF model.
The standard deviation can be estimated by the following equation: where k s , s ¼ 1, . . . , S is the S cutoffs and k 0 ¼ 0, k Sþ1 ¼ R max are the minimum and maximum R-values.

Log-normal distribution
When a log-normal distribution (LN) is assumed for the random rainfall intensity at an unsampled location, its CDF corresponding to each cutoff k s , s ¼ 1, 2, . . . , S, can be expressed as follows: where R(x) is the value of rainfall intensity at location x; μ ln R(x) is the mean of ln R(x); σ ln R(x) is the standard deviation of ln R(x) and Φ() is the standard normal CDF.
μ ln R(x) and σ ln R(x) can be estimated by linear regression of where The mean value of LN-based rainfall intensity at location x can be estimated by the following equation: whereas the standard deviation of R(x) can be estimated by the following equation:

Generalized extreme value distribution
When generalized extreme value (GEV) distribution is assumed for random rainfall intensity at each unsampled location, the CDF corresponding to each cutoff k s , s ¼ 1, 2, . . . , S, can be expressed as follows: where ξ x , α x , κ x are, respectively, location, scale and shape The shape parameter κ x governs the tail behavior of the distribution and the sub-family defined by κ x ¼ 0 corresponds to the Gumbel (or extreme value type I, EV1) distribution.
The inverse CDF is: The mean value of GEV-based rainfall intensity at location x can be estimated as follows: and the standard deviation of R(x) can be estimated by the following equation:

Gumbel
When Gumbel distribution is assumed for random rainfall intensity at each unsampled location, the CDF corresponding to each cutoff k s , s ¼ 1, 2, . . . , S, can be expressed as follows: where μ x and β x are, respectively, location and scale parameters.
The inverse CDF of Gumbel is: The mean value of Gumbel-based rainfall intensity at location x can be estimated as follows: where γ x is Euler's constant, and the standard deviation of R(x) can be estimated by the following equation: where ζ( Á ) is the Riemann zeta function.

Generalized Pareto distribution
When generalized Pareto (GPA) distribution is assumed for the random rainfall intensity at each unsampled location, the CDF corresponding to each cutoff k s , s ¼ 1, 2, . . . , S, can be expressed as follows: where ξ x , α x , κ x are, respectively, location, scale and shape When the shape parameter κ x ¼ 0, the GPA distribution is reduced to the Pareto (PA) distribution.
The inverse CDF of GPA is given as follows: The mean value of GPA-based rainfall intensity at location x can be estimated as follows: and the standard deviation of R(x) can be estimated by the following equation:

Generalized logistic distribution
When the generalized logistic (GLO) distribution is assumed for the random rainfall intensity at each unsampled location, for each cutoff k s , s ¼ 1, 2, . . . , S, can be expressed as follows: where ξ x , α x , κ x are, respectively, location, scale and shape par- The GLO distribution is reduced to logistic (LO) distribution when the shape parameter κ x is zero.
The inverse CDF of GLO is given as follows: The mean value of GLO-based rainfall intensity at location x can be estimated as follows: And the standard deviation can be estimated by the following equation: where

Statistical indices
Various estimates for the unknown value R(x) can be derived from the conditional CDF, e.g., expectation (E-type), median (M-type) or a maximum probability type (P-type) (Bierkens & Burrough ). In this study, the E-type estimate R Ã E (x) is used to estimate the performance IK and is defined as follows: where F Ã (x) is the estimated CDF for the rainfall intensity at location x. The standard deviation sd Ã (x) of rainfall intensity can also be calculated to investigate the corresponding uncertainties: There are many factors affecting the performance of spatial interpolation methods, including sampling density, sample spatial distribution, sample clustering, surface type, data variance, data normality, quality of secondary information, stratification and temporal scale. Among them, data variation is a dominant factor (Li & Heap ). Large data variation indicates a high heterogeneity in the data.
Since the original rainfall data have transformed into binary variables to distinguish whether the rainfall intensity is higher than the considered threshold in this study, variation in data itself is not a concern that affects the performance of spatial interpolation methods. Herein, BIAS measures the averaged differences between the estimated rainfall values with the reference to the observed values as follows: where R Ã E (x i ) is the E-type estimate at location x i ; R(x i ) the observed value at location x i . Since the negative and positive estimates counteract each other, the resultant BIAS tends to be lower than the other performance criteria considered below. The positive-valued BIAS indicates that the estimated rainfall value on average is higher than that of the observed.
It should not be used as an indicator of prediction accuracy.
The mean absolute error (MAE) summarizes the averaged absolute differences between the observed and estimated rainfall values: The root mean squared error (RMSE) is a commonly used performance indicator of overall model prediction accuracy as compared to the observed values. It is more sensitive to outliers than the MAE as large errors exert heavier influence on the value of RMSE: The accuracy of estimated indicator variables (AIEV) is given as follows: where I Ã (x, k) is the indicator transformation of the estimated rainfall intensity for threshold k at location x; I(x, k) is the indicator transformation of the observed rainfall intensity for threshold k at location x.

RESULTS AND DISCUSSION
To estimate the probability distribution as well as the mean rainfall intensity at an unsampled location within the studied domain, several thresholds need to be selected. At each threshold, the data are indicator-transformed and their resulting indicator variograms are used to describe the changes in spatial variability along with the CDF.
Hence, threshold values need to be selected to provide the maximum possible resolution defining CDF. More thresholds, in theory, provide a better definition of the CDF, but the more intensive calculation is required.
In this study, six threshold values of rainfall intensity were chosen, based on a numerical experiment indicating that increasing the number of thresholds brings only a minor additional reduction in estimation accuracy. The threshold value k was set to be 1, 2, 3, 5, 10 and 20 mm/h, respectively, for modeling the prior distribution.   IK is used to analyze the probability that the rainfall intensity is greater than the considered threshold for the  Figure 12).

Discussion
The results of the performance of IK and mean rainfall intensity estimation (compared with the 132 observed raingauge estimations) are shown in Tables 1 and 2

CONCLUSIONS
IK has the capability of estimating the probability that an unknown quantity at an unsampled location is greater than a given threshold value by capturing the proportion of neighboring data valued above the considered threshold.
In this study, IK is used to estimate the exceedance probabil-