Rainfall maps with gridded data are frequently used as an important input for many hydrological models. In this study, two kriging-based interpolation methods (i.e., ordinary kriging (OK) and kriging with genetic programming (KGP)) and a deterministic interpolation method (inverse distance weighting (IDW)) are implemented to generate gridded rainfall maps from point rainfalls. The KGP is implemented as a new kriging method in which the genetic programming-based non-parametric variogram model is used with kriging. Rainfall records from existing 19 raingauges in the Middle Yarra River catchment, Australia are used for the analysis. The performance of each method is assessed through the cross-validation test. Results indicate that the kriging-based methods clearly outperform the IDW method. Among all the kriging-based methods, OK with the spherical variogram model yields the lowest prediction error and best estimates for all months. The KGP method gives an almost identical error to that given by the OK with the spherical variogram model for most of the months and a lower prediction error than that given by OK with the exponential or Gaussian variogram model. Thus, the KGP can be used in line with traditional kriging as a viable alternative technique for spatial estimation and mapping of rainfall.
INTRODUCTION
Rainfall maps with continuous gridded data provide an important input for many hydrological modelling and water resources management tasks including water budget analysis, flood forecasting, climate change studies, reservoir operation, irrigation scheduling and management, etc. (Moral 2010). The accuracy of most hydrological analysis highly depends on the correct estimation of the spatial distribution of rainfall. Improved estimates of rainfall distribution can be achieved by a dense network of raingauges. Unfortunately, the raingauge network is often sparse and the number of raingauges in the networks is limited due to financial, logistics and geological factors (Abo-Monasar & Al-Zahrani 2014). Consequently, point rainfalls are available from a finite number of raingauges in space. Moreover, the spatial extent of rainfall may not be arranged on a regular grid. Therefore, spatial interpolation methods are important in such cases to estimate rainfall at unsampled locations using the available sampled rainfall values at surrounding locations.
Different deterministic, stochastic and data-driven interpolation methods are generally used to estimate the spatial distribution of rainfall from the ground-based point data. Deterministic methods use the existing configuration of the observed data points to generate a surface, which usually include traditional weighting techniques. Inverse distance weighting (IDW) is a typical representative of deterministic methods, which have been extensively used for spatial interpolation of rainfall (ASCE 1996). Although attempts have been made to upgrade the deterministic techniques over time, limitations of those techniques continue to exist and are discussed in Teegavarapu & Chandramouli (2005).
Kriging-based stochastic interpolation techniques offer important advantages over the deterministic methods, which have been extensively applied for rainfall interpolation (e.g., Moral 2010; Delbari et al. 2013; Abo-Monasar & Al-Zahrani 2014). Kriging has been increasingly preferred because it takes into account the spatial correlation between neighbouring observations and desired location where the estimation is to be made. A noted advantage of kriging over simpler methods such as IDW is that it is capable of utilizing correlated dense secondary variables such as topographic information (e.g., Goovaerts 2000; Feki et al. 2012) to improve the estimation accuracy of sparsely sampled primary variable (i.e., rainfall).
Ordinary kriging (OK) is one of the most preferred stochastic interpolation methods for spatial rainfall estimation from the point rainfall data. However, the accuracy of traditional OK highly depends on the variogram model that generally demonstrates the spatial autocorrelation structure of the underlying stochastic process. Therefore, extra attention should be paid to selecting appropriate variogram models and finding optimal variogram parameters (i.e., nugget, sill, range). The time-consuming and tedious task for calibrating the variogram parameters of the standard variogram models (i.e., spherical, exponential, Gaussian) is a major difficulty associated with the traditional OK.
In the recent past, researchers have demonstrated the application of kriging with different data-driven methods namely, artificial neural networks (ANN) (e.g., Teegavarapu 2007), neuro-fuzzy system and support vector machine (e.g., Kisi & Sanikhani 2015), fuzzy theory and genetic algorithm (GA) (e.g., Chang et al. 2005), combined ANN and GA (Asghari & Nasseri 2015), genetic programming (GP) (e.g., Adhikary et al. 2015b) for spatial interpolation. The major advantage of kriging with data-driven methods is that they usually yield better results than those obtained by traditional deterministic (e.g., IDW) and stochastic interpolation methods (e.g., traditional kriging). However, a major drawback of the ANN- and GA-based methods is that no analytical equations of functional relationships between variables are obtained in these methods, and it is usually not easy to interpret the network weights given by the ANN models (Muttil & Lee 2005).
GP (Koza 1992) is an evolutionary data-driven modelling technique, which can be applied to obtain the variogram model to be used with kriging. The output from GP is an empirical model to approximate the function of the underlying process. GP inferred models exhibit the advantages of creating simple expressions and thereby offering some likely interpretations of the underlying process (Muttil & Lee 2005). The main advantage of GP as compared to traditional models is that it does not assume any a priori functional form of the solution. Globally, a number of applications of GP have been reported in the hydrology and water resources field (e.g., Khu et al. 2001; Sivapragasam et al. 2010, 2012; Yilmaz & Muttil 2014). To the best knowledge of the authors, kriging with a GP-based data-driven method has not been explicitly used as an interpolator in the past to generate gridded rainfall maps. Therefore, the objective of this study is to apply kriging with genetic programming (KGP) for spatial interpolation of monthly and annual mean rainfall to generate the gridded rainfall data set for the Middle Yarra River catchment in Australia. GP is used to get the GP-based non-parametric variogram model to be used in the KGP method, which replaces the standard parametric variogram models in the traditional OK.
The rest of the paper has been organized as follows. First, details of the methodology are presented, which is followed by a brief description of the study area and data sets used. The results are summarized next and, finally, conclusions are drawn.
METHODOLOGY
The methodology of this study consists of three main parts:
data preparation for normal distribution;
spatial interpolation of rainfall using deterministic and stochastic interpolation methods;
assessment of interpolation methods for spatial estimation of rainfall.
Spatial interpolation methods employed in this study: (a) traditional OK; (b) KGP; and (c) IDW methods.
Spatial interpolation methods employed in this study: (a) traditional OK; (b) KGP; and (c) IDW methods.
Data preparation
The normal distribution of data is a basic requirement of the kriging-based geostatistical approach. Kriging leads to an optimum estimator and yields best results for the normally distributed data (Adhikary et al. 2015a, 2015b). Abo-Monasar & Al-Zahrani (2014) have indicated that variograms developed with the non-normal data are unreliable for kriging interpolation. If data are not normally distributed, appropriate transformations should be implemented to normalize the data. In this study, log-transformation is used for the transformation of non-normal or skewed data. Finally, the widely used Kolmogorov–Smirnov (K-S) test is used to confirm the normality condition of the transformed rainfall data for a 5% significance level. The K-S test is a simple and straightforward statistical test to examine the normality condition of a data set, which has been explained in detail in McCuen (2003). The resulting normalized rainfall data sets are used for interpolation and map production.
Spatial interpolation methods
In this study, two stochastic interpolation methods (i.e., OK and KGP), and a deterministic interpolation method (i.e., IDW), are used for interpolation of rainfall and preparation of rainfall maps. The basic conceptual difference between traditional OK and KGP methods is presented in Figure 1(a) and 1(b). As can be seen from the figures, the standard parametric variogram models (e.g., spherical, exponential, Gaussian) are used in the traditional OK method, whereas the GP-based non-parametric variogram model is included in the KGP method. A brief description of these methods is presented in the following sub-sections.
OK

























KGP

The GP-based variogram model is obtained using GPKernel, a software package developed by DHI Water and Environment (Babovic & Keijzer 2000). The GPKernel is a command line-based tool to obtain the underlying functions (in mathematical form) on data. The model with the lowest RMSE and MAE values and the highest CC value is chosen as the best GP model. The KGP method is implemented through a customized spreadsheet application developed in the Microsoft Excel platform for rainfall interpolation to produce gridded rainfall maps.
IDW








As p approaches zero and the weights become more similar, IDW estimates approach the simple average of the surrounding observations. However, the effect of the farthest observations on the estimated value is decreased with the increase of ‘p’. The most commonly used value of p is 2 in the IDW method (Isaaks & Srivastava 1989), which is used in this study. IDW method is implemented using GS+ software (Robertson 2008) for rainfall interpolation to generate gridded rainfall maps.
Performance assessment of interpolation methods
The performance of three different spatial interpolation methods (i.e., OK, KGP and IDW) are evaluated and compared based on the cross-validation test. The cross-validation is a leave-one-out procedure that involves eliminating the rainfall data set individually one by one from the observed data sets and then re-estimating each rainfall data set by using the remaining data sets. Cross-validation test statistics provides important evidence of the performance measures of the interpolation methods. In this study, RMSE, average standardized error (ASE) and root mean square standardized error (RMSS) are used (Adhikary et al. 2015a, 2015b) for cross-validation. They are briefly described below:
RMSE is used for checking the estimation accuracy between the observed and estimated data. RMSE value close to zero indicates the higher accuracy in estimation.
ASE is used for checking the unbiasedness condition of the interpolator. ASE value close to zero indicates unbiased estimation of the variable.
RMSS is used to test the correct assessment of the variability of prediction, which should be close to 1 for correct assessment of prediction variability of the interpolator. RMSS value close to 1 specifies that the variance is consistent and the variability of the estimated data by interpolation is correctly assessed.
STUDY AREA AND DATA USED
Study area description
The Middle Yarra River catchment (case study area) with the location of existing raingauge stations.
The Middle Yarra River catchment (case study area) with the location of existing raingauge stations.
Data used
There is remarkable variation in the rainfall patterns through different segments of the Yarra River catchment. The mean annual rainfall varies across the catchment from about 1,100 mm in the Upper Yarra segment to 600 mm in the Lower Yarra segment. The Middle Yarra segment (the case study area) covers an area of 1,511 km2. There are 19 (numbers 1 to 19 as shown in Figure 2) raingauges in the study area presently operated by the Australian Bureau of Meteorology (BoM). Historical daily rainfall records of all 19 raingauges for 1980–2012 have been collected from the SILO (http://www.longpaddock.qld.gov.au/silo/) climate database and compiled to generate monthly and annual data, which are then used for the analysis. The annual mean rainfall during the aforementioned periods varies between 508.1 mm and 1,913.9 mm, with the highest rainfall in the southern and south-eastern part and lowest rainfall in the north-western part of the study area.
Summary statistics of monthly and annual rainfall data are given in Table 1. September is the wettest month (rainfall amount equals 112.5 mm) with the highest variation in rainfall. The driest month is February (rainfall amount equals 56.4 mm) with the second highest variation. The skewness coefficient computed for monthly and annual rainfall data is presented in Table 1. The data usually fit normal distribution, if the skewness is very small or close to zero (Robertson 2008). Small skewness values in Table 1 indicate that all data sets are almost normally distributed. Furthermore, it is evident from the K-S test results (Table 1) that the null hypothesis cannot be rejected at 5% significance level because the computed K-S test statistic values for all data sets are smaller than the corresponding critical value (i.e., 0.30143). In this study, the null hypothesis is defined as the condition that indicates the data follow normal distribution. Thus, the rainfall data sets are found normally distributed at 5% significance level and can be directly used for variogram modelling and spatial interpolation.
Summary statistics for the monthly and annual mean rainfall data (n = 19)
. | Rainfall (mm) . | . | . | |||
---|---|---|---|---|---|---|
Period . | Mean . | Std dev . | Minimum . | Maximum . | Skewness . | K-S value . |
January | 67.3 | 31.2 | 1.8 | 201.4 | 0.2796 | 0.14971 |
February | 56.4 | 53.4 | 0.0 | 269.5 | 0.3156 | 0.12822 |
March | 66.3 | 36.9 | 9.2 | 217.8 | 0.2800 | 0.16618 |
April | 84.7 | 45.8 | 15.2 | 246.0 | 0.0176 | 0.15649 |
May | 88.3 | 42.5 | 10.2 | 239.7 | −0.1027 | 0.19500 |
June | 106.4 | 46.2 | 13.8 | 300.2 | −0.0522 | 0.16641 |
July | 102.1 | 49.0 | 17.9 | 303.9 | −0.0745 | 0.14543 |
August | 108.6 | 50.7 | 21.1 | 289.7 | −0.0029 | 0.12252 |
September | 112.5 | 57.0 | 25.3 | 350.3 | 0.0330 | 0.15424 |
October | 101.6 | 50.9 | 4.0 | 333.9 | −0.0988 | 0.17993 |
November | 96.1 | 47.6 | 15.3 | 258.7 | 0.0659 | 0.14956 |
December | 91.3 | 52.9 | 8.2 | 301.2 | 0.0469 | 0.16043 |
Annual | 1081.6 | 293.8 | 508.1 | 1913.9 | −0.0756 | 0.14468 |
. | Rainfall (mm) . | . | . | |||
---|---|---|---|---|---|---|
Period . | Mean . | Std dev . | Minimum . | Maximum . | Skewness . | K-S value . |
January | 67.3 | 31.2 | 1.8 | 201.4 | 0.2796 | 0.14971 |
February | 56.4 | 53.4 | 0.0 | 269.5 | 0.3156 | 0.12822 |
March | 66.3 | 36.9 | 9.2 | 217.8 | 0.2800 | 0.16618 |
April | 84.7 | 45.8 | 15.2 | 246.0 | 0.0176 | 0.15649 |
May | 88.3 | 42.5 | 10.2 | 239.7 | −0.1027 | 0.19500 |
June | 106.4 | 46.2 | 13.8 | 300.2 | −0.0522 | 0.16641 |
July | 102.1 | 49.0 | 17.9 | 303.9 | −0.0745 | 0.14543 |
August | 108.6 | 50.7 | 21.1 | 289.7 | −0.0029 | 0.12252 |
September | 112.5 | 57.0 | 25.3 | 350.3 | 0.0330 | 0.15424 |
October | 101.6 | 50.9 | 4.0 | 333.9 | −0.0988 | 0.17993 |
November | 96.1 | 47.6 | 15.3 | 258.7 | 0.0659 | 0.14956 |
December | 91.3 | 52.9 | 8.2 | 301.2 | 0.0469 | 0.16043 |
Annual | 1081.6 | 293.8 | 508.1 | 1913.9 | −0.0756 | 0.14468 |
Std dev, standard deviation of rainfall; K-S value, Kolmogorov–Smirnov test statistic value, K-S19, 0.05 = 0.30143.
RESULTS AND DISCUSSION
Variogram modelling for spatial correlation analysis
Experimental variogram for monthly and annual rainfall
For variogram modelling, initially an experimental variogram is computed for each monthly and annual rainfall data set based on the variogram cloud (Wackernagel 2003). The variogram cloud is a plot in which the semivariance calculated using Equation (3) for any two data points is plotted against the distance between them (Robertson 2008). For 19 raingauges, the variogram cloud for each data set consists of 171 (19C2 = 171) data pairs in all directions. However, the variogram clouds are difficult to interpret due to the data points being congested. Hence, the variogram cloud is clustered into different bins to achieve the experimental variogram. A bin is a classification of lags where all lags with similar separation distance and directions are considered in the same bin (Adhikary et al. 2015a). The average lag distance and semivariance for all data pairs in each individual bin gives a discrete point on the experimental variogram plot. The experimental variogram thus consists of a finite number of discrete points with common distance and direction.
Experimental isotropic variogram (dotted points) fitted with the variogram model (solid line) for (a) September rainfall and (b) annual rainfall.
Experimental isotropic variogram (dotted points) fitted with the variogram model (solid line) for (a) September rainfall and (b) annual rainfall.
Standard variogram model for monthly and annual rainfall
In traditional OK, three standard parametric variogram models (i.e., exponential, Gaussian, spherical) are used to fit the experimental variogram for all data sets. The variogram parameters that result in the best fitting performance based on RMSE, MAE and CC measures are chosen as the best fitted model in each case. Figure 3 shows the plots of fitted variogram models for September (taken as an example from all months, which is also the wettest month with high rainfall variation) and annual rainfall. The optimal variogram parameters and fitting performance of the standard variogram models for all monthly and annual rainfall are presented in Table 2. As can be seen from Table 2, the ratio of nugget coefficient to sill is small for all data sets indicating that monthly and annual rainfall is spatially correlated over the study area. This supports the use of kriging methods (i.e., OK, KGP), which inherently consider the spatial correlation. It is also evident from Table 2 that the CC values for the fitted models range from 0.886 to 0.989. Eleven months have CC values higher than 0.9 that indicates strong correlation between monthly rainfall and spatial distribution of raingauges. It can be seen that the Gaussian variogram model is the best fitted model for 6 months (i.e., February, March, April, July, November and December). For the remaining 6 months, the exponential variogram model is the best model for January, June and October while the spherical variogram model gives the best results for May, August and September.
Variogram parameters and fitting performance of fitted variogram models used with the kriging-based methods
. | . | Variogram parameters . | Goodness-of-fit statistics . | ||||
---|---|---|---|---|---|---|---|
Month . | Variogram model . | Nugget (mm2) . | Sill (mm2) . | Range (km) . | RMSE (mm2) . | MAE (mm2) . | CC . |
January | Exponential | 0.10 | 173.80 | 30.39 | 20.66 | 19.01 | 0.892 |
Gaussian | 19.40 | 153.60 | 15.54 | 20.68 | 16.92 | 0.888 | |
Spherical | 1.90 | 155.50 | 19.80 | 20.89 | 17.94 | 0.886 | |
GP-based | 0.00 | – | – | 19.73 | 17.26 | 0.899 | |
February | Exponential | 0.10 | 64.85 | 32.52 | 4.49 | 3.94 | 0.967 |
Gaussian | 3.80 | 56.22 | 15.61 | 2.95 | 2.23 | 0.983 | |
Spherical | 0.10 | 56.42 | 19.87 | 3.46 | 2.88 | 0.979 | |
GP-based | 0.00 | – | – | 3.83 | 3.23 | 0.973 | |
March | Exponential | 0.10 | 206.60 | 28.38 | 24.57 | 21.19 | 0.901 |
Gaussian | 0.10 | 185.00 | 13.91 | 16.22 | 15.29 | 0.954 | |
Spherical | 0.10 | 184.10 | 17.78 | 20.06 | 18.06 | 0.939 | |
GP-based | 0.00 | – | – | 17.05 | 15.15 | 0.949 | |
April | Exponential | 0.10 | 278.90 | 33.75 | 23.13 | 20.08 | 0.953 |
Gaussian | 0.10 | 247.40 | 15.55 | 11.94 | 9.86 | 0.986 | |
Spherical | 0.10 | 248.70 | 20.61 | 17.18 | 15.02 | 0.975 | |
GP-based | 0.00 | – | – | 14.89 | 12.80 | 0.978 | |
May | Exponential | 1.00 | 426.90 | 49.47 | 28.99 | 22.79 | 0.960 |
Gaussian | 33.50 | 327.60 | 19.61 | 28.20 | 18.83 | 0.959 | |
Spherical | 1.00 | 339.90 | 26.35 | 28.07 | 20.14 | 0.960 | |
GP-based | 0.00 | – | – | 27.14 | 18.55 | 0.963 | |
June | Exponential | 1.00 | 775.00 | 36.33 | 75.88 | 67.36 | 0.928 |
Gaussian | 35.00 | 648.40 | 15.26 | 76.61 | 59.83 | 0.924 | |
Spherical | 1.00 | 657.90 | 20.23 | 79.32 | 63.54 | 0.919 | |
GP-based | 0.00 | – | – | 70.96 | 61.07 | 0.936 | |
July | Exponential | 1.00 | 855.10 | 30.96 | 82.10 | 66.73 | 0.933 |
Gaussian | 71.00 | 752.60 | 14.96 | 79.07 | 64.35 | 0.935 | |
Spherical | 1.00 | 753.60 | 18.66 | 80.63 | 67.34 | 0.933 | |
GP-based | 64.00 | – | – | 74.27 | 53.91 | 0.943 | |
August | Exponential | 13.00 | 936.90 | 35.67 | 34.78 | 27.46 | 0.989 |
Gaussian | 208.00 | 852.00 | 21.91 | 39.23 | 31.19 | 0.984 | |
Spherical | 94.00 | 851.80 | 26.07 | 32.07 | 25.82 | 0.989 | |
GP-based | 0.00 | – | – | 34.92 | 28.66 | 0.988 | |
September | Exponential | 1.00 | 912.90 | 39.87 | 65.12 | 52.39 | 0.965 |
Gaussian | 72.00 | 771.50 | 18.53 | 59.75 | 49.53 | 0.966 | |
Spherical | 1.00 | 780.80 | 23.56 | 59.82 | 48.83 | 0.967 | |
GP-based | 0.00 | – | – | 55.12 | 42.45 | 0.972 | |
October | Exponential | 1.00 | 412.90 | 31.68 | 26.87 | 19.26 | 0.986 |
Gaussian | 25.00 | 369.20 | 16.14 | 22.26 | 18.86 | 0.981 | |
Spherical | 1.00 | 377.50 | 21.81 | 22.30 | 19.37 | 0.982 | |
GP-based | 0.00 | – | – | 15.41 | 13.78 | 0.991 | |
November | Exponential | 1.00 | 345.50 | 34.47 | 23.79 | 20.45 | 0.965 |
Gaussian | 1.00 | 284.60 | 14.25 | 15.85 | 14.22 | 0.981 | |
Spherical | 0.10 | 286.60 | 19.08 | 19.84 | 17.12 | 0.975 | |
GP-based | 0.00 | – | – | 17.51 | 15.56 | 0.978 | |
December | Exponential | 1.00 | 361.80 | 32.88 | 22.04 | 16.18 | 0.975 |
Gaussian | 1.00 | 307.20 | 14.05 | 15.71 | 12.26 | 0.984 | |
Spherical | 0.10 | 310.60 | 19.04 | 19.30 | 14.94 | 0.979 | |
GP-based | 0.00 | – | – | 9.54 | 7.73 | 0.994 | |
Annual | Exponential | 100.00 | 61,300.00 | 36.60 | 4,479.63 | 3,524.03 | 0.965 |
Gaussian | 1,700.00 | 50,970.00 | 15.69 | 3,706.71 | 2,960.04 | 0.971 | |
Spherical | 100.00 | 51,830.00 | 21.19 | 4,118.88 | 3,539.73 | 0.966 | |
GP-based | 0.00 | – | – | 3,535.70 | 2,888.40 | 0.975 |
. | . | Variogram parameters . | Goodness-of-fit statistics . | ||||
---|---|---|---|---|---|---|---|
Month . | Variogram model . | Nugget (mm2) . | Sill (mm2) . | Range (km) . | RMSE (mm2) . | MAE (mm2) . | CC . |
January | Exponential | 0.10 | 173.80 | 30.39 | 20.66 | 19.01 | 0.892 |
Gaussian | 19.40 | 153.60 | 15.54 | 20.68 | 16.92 | 0.888 | |
Spherical | 1.90 | 155.50 | 19.80 | 20.89 | 17.94 | 0.886 | |
GP-based | 0.00 | – | – | 19.73 | 17.26 | 0.899 | |
February | Exponential | 0.10 | 64.85 | 32.52 | 4.49 | 3.94 | 0.967 |
Gaussian | 3.80 | 56.22 | 15.61 | 2.95 | 2.23 | 0.983 | |
Spherical | 0.10 | 56.42 | 19.87 | 3.46 | 2.88 | 0.979 | |
GP-based | 0.00 | – | – | 3.83 | 3.23 | 0.973 | |
March | Exponential | 0.10 | 206.60 | 28.38 | 24.57 | 21.19 | 0.901 |
Gaussian | 0.10 | 185.00 | 13.91 | 16.22 | 15.29 | 0.954 | |
Spherical | 0.10 | 184.10 | 17.78 | 20.06 | 18.06 | 0.939 | |
GP-based | 0.00 | – | – | 17.05 | 15.15 | 0.949 | |
April | Exponential | 0.10 | 278.90 | 33.75 | 23.13 | 20.08 | 0.953 |
Gaussian | 0.10 | 247.40 | 15.55 | 11.94 | 9.86 | 0.986 | |
Spherical | 0.10 | 248.70 | 20.61 | 17.18 | 15.02 | 0.975 | |
GP-based | 0.00 | – | – | 14.89 | 12.80 | 0.978 | |
May | Exponential | 1.00 | 426.90 | 49.47 | 28.99 | 22.79 | 0.960 |
Gaussian | 33.50 | 327.60 | 19.61 | 28.20 | 18.83 | 0.959 | |
Spherical | 1.00 | 339.90 | 26.35 | 28.07 | 20.14 | 0.960 | |
GP-based | 0.00 | – | – | 27.14 | 18.55 | 0.963 | |
June | Exponential | 1.00 | 775.00 | 36.33 | 75.88 | 67.36 | 0.928 |
Gaussian | 35.00 | 648.40 | 15.26 | 76.61 | 59.83 | 0.924 | |
Spherical | 1.00 | 657.90 | 20.23 | 79.32 | 63.54 | 0.919 | |
GP-based | 0.00 | – | – | 70.96 | 61.07 | 0.936 | |
July | Exponential | 1.00 | 855.10 | 30.96 | 82.10 | 66.73 | 0.933 |
Gaussian | 71.00 | 752.60 | 14.96 | 79.07 | 64.35 | 0.935 | |
Spherical | 1.00 | 753.60 | 18.66 | 80.63 | 67.34 | 0.933 | |
GP-based | 64.00 | – | – | 74.27 | 53.91 | 0.943 | |
August | Exponential | 13.00 | 936.90 | 35.67 | 34.78 | 27.46 | 0.989 |
Gaussian | 208.00 | 852.00 | 21.91 | 39.23 | 31.19 | 0.984 | |
Spherical | 94.00 | 851.80 | 26.07 | 32.07 | 25.82 | 0.989 | |
GP-based | 0.00 | – | – | 34.92 | 28.66 | 0.988 | |
September | Exponential | 1.00 | 912.90 | 39.87 | 65.12 | 52.39 | 0.965 |
Gaussian | 72.00 | 771.50 | 18.53 | 59.75 | 49.53 | 0.966 | |
Spherical | 1.00 | 780.80 | 23.56 | 59.82 | 48.83 | 0.967 | |
GP-based | 0.00 | – | – | 55.12 | 42.45 | 0.972 | |
October | Exponential | 1.00 | 412.90 | 31.68 | 26.87 | 19.26 | 0.986 |
Gaussian | 25.00 | 369.20 | 16.14 | 22.26 | 18.86 | 0.981 | |
Spherical | 1.00 | 377.50 | 21.81 | 22.30 | 19.37 | 0.982 | |
GP-based | 0.00 | – | – | 15.41 | 13.78 | 0.991 | |
November | Exponential | 1.00 | 345.50 | 34.47 | 23.79 | 20.45 | 0.965 |
Gaussian | 1.00 | 284.60 | 14.25 | 15.85 | 14.22 | 0.981 | |
Spherical | 0.10 | 286.60 | 19.08 | 19.84 | 17.12 | 0.975 | |
GP-based | 0.00 | – | – | 17.51 | 15.56 | 0.978 | |
December | Exponential | 1.00 | 361.80 | 32.88 | 22.04 | 16.18 | 0.975 |
Gaussian | 1.00 | 307.20 | 14.05 | 15.71 | 12.26 | 0.984 | |
Spherical | 0.10 | 310.60 | 19.04 | 19.30 | 14.94 | 0.979 | |
GP-based | 0.00 | – | – | 9.54 | 7.73 | 0.994 | |
Annual | Exponential | 100.00 | 61,300.00 | 36.60 | 4,479.63 | 3,524.03 | 0.965 |
Gaussian | 1,700.00 | 50,970.00 | 15.69 | 3,706.71 | 2,960.04 | 0.971 | |
Spherical | 100.00 | 51,830.00 | 21.19 | 4,118.88 | 3,539.73 | 0.966 | |
GP-based | 0.00 | – | – | 3,535.70 | 2,888.40 | 0.975 |
It is important to note that the best fitted variogram model may not always give the best results in interpolation by kriging if it cannot satisfy the cross-validation criteria (Wackernagel 2003). Cross-validation test is often used in kriging to choose the ultimate form of the standard variogram models. Hence, the ultimate form of these models is chosen based on the cross-validation test and the adopted model is finally used to generate the rainfall map. In order to satisfy the cross-validation test criteria, the variogram parameters (i.e., nugget, sill, range) of the standard parametric variogram models are iteratively changed and interpolation results are re-evaluated. The cross-validation results are given in Table 3, which indicates that all adopted variogram models satisfy the cross-validation criteria and are thus appropriate for use in kriging interpolation and rainfall map production.
Cross-validation results of estimating monthly and annual mean rainfall using different interpolation methods
Cross-validation statistics . | Interpolation methods . | January . | February . | March . | April . | May . | June . | July . | August . | September . | October . | November . | December . | Annual . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RMSE | OKS | 9.33 | 5.23 | 9.71 | 10.04 | 11.13 | 16.17 | 17.50 | 16.95 | 16.83 | 12.24 | 11.34 | 11.69 | 141.13 |
OKE | 10.65 | 6.20 | 11.84 | 12.38 | 12.41 | 19.62 | 21.68 | 21.28 | 20.57 | 15.20 | 13.43 | 13.89 | 171.75 | |
OKG | 11.46 | 6.93 | 12.90 | 14.14 | 14.58 | 22.65 | 24.37 | 23.09 | 23.26 | 16.81 | 15.34 | 15.73 | 197.46 | |
KGP | 11.10 | 5.48 | 11.52 | 13.16 | 11.49 | 16.42 | 23.14 | 21.30 | 22.98 | 13.17 | 12.23 | 14.70 | 177.26 | |
IDW | 11.11 | 6.67 | 12.32 | 13.57 | 14.82 | 22.01 | 23.61 | 23.77 | 23.41 | 16.36 | 14.78 | 15.28 | 192.64 | |
ASE | OKS | −0.137 | −0.118 | −0.133 | −0.140 | −0.089 | −0.111 | −0.083 | −0.077 | −0.119 | −0.117 | −0.139 | −0.145 | −0.122 |
OKE | −0.102 | −0.091 | −0.096 | −0.106 | −0.087 | −0.101 | −0.087 | −0.089 | −0.103 | −0.100 | −0.097 | −0.111 | −0.101 | |
OKG | −0.108 | −0.102 | −0.105 | −0.113 | −0.111 | −0.114 | −0.098 | −0.100 | −0.121 | −0.105 | −0.101 | −0.111 | −0.110 | |
KGP | −0.369 | −0.288 | −0.136 | −0.254 | −0.201 | −0.088 | −0.027 | 0.004 | −0.612 | −0.247 | −0.253 | −0.121 | −0.285 | |
IDW | −0.155 | −0.132 | −0.143 | −0.169 | −0.172 | −0.175 | −0.165 | −0.162 | −0.174 | −0.171 | −0.142 | −0.166 | −0.169 | |
RMSS | OKS | 0.984 | 0.984 | 0.992 | 0.991 | 0.978 | 0.988 | 0.989 | 0.995 | 0.990 | 0.989 | 0.991 | 0.991 | 0.991 |
OKE | 1.000 | 1.012 | 1.001 | 1.029 | 1.019 | 1.036 | 1.056 | 1.039 | 1.033 | 1.054 | 1.030 | 1.045 | 1.042 | |
OKG | 0.980 | 0.985 | 1.000 | 0.981 | 0.987 | 0.987 | 0.991 | 0.995 | 0.988 | 0.987 | 0.981 | 0.981 | 0.983 | |
KGP | 1.144 | 1.127 | 1.108 | 1.080 | 1.047 | 0.896 | 1.107 | 1.131 | 1.427 | 1.023 | 1.069 | 1.053 | 1.127 | |
IDW | 1.021 | 0.996 | 1.006 | 1.058 | 1.123 | 1.102 | 1.182 | 1.139 | 1.096 | 1.157 | 1.026 | 1.056 | 1.104 |
Cross-validation statistics . | Interpolation methods . | January . | February . | March . | April . | May . | June . | July . | August . | September . | October . | November . | December . | Annual . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RMSE | OKS | 9.33 | 5.23 | 9.71 | 10.04 | 11.13 | 16.17 | 17.50 | 16.95 | 16.83 | 12.24 | 11.34 | 11.69 | 141.13 |
OKE | 10.65 | 6.20 | 11.84 | 12.38 | 12.41 | 19.62 | 21.68 | 21.28 | 20.57 | 15.20 | 13.43 | 13.89 | 171.75 | |
OKG | 11.46 | 6.93 | 12.90 | 14.14 | 14.58 | 22.65 | 24.37 | 23.09 | 23.26 | 16.81 | 15.34 | 15.73 | 197.46 | |
KGP | 11.10 | 5.48 | 11.52 | 13.16 | 11.49 | 16.42 | 23.14 | 21.30 | 22.98 | 13.17 | 12.23 | 14.70 | 177.26 | |
IDW | 11.11 | 6.67 | 12.32 | 13.57 | 14.82 | 22.01 | 23.61 | 23.77 | 23.41 | 16.36 | 14.78 | 15.28 | 192.64 | |
ASE | OKS | −0.137 | −0.118 | −0.133 | −0.140 | −0.089 | −0.111 | −0.083 | −0.077 | −0.119 | −0.117 | −0.139 | −0.145 | −0.122 |
OKE | −0.102 | −0.091 | −0.096 | −0.106 | −0.087 | −0.101 | −0.087 | −0.089 | −0.103 | −0.100 | −0.097 | −0.111 | −0.101 | |
OKG | −0.108 | −0.102 | −0.105 | −0.113 | −0.111 | −0.114 | −0.098 | −0.100 | −0.121 | −0.105 | −0.101 | −0.111 | −0.110 | |
KGP | −0.369 | −0.288 | −0.136 | −0.254 | −0.201 | −0.088 | −0.027 | 0.004 | −0.612 | −0.247 | −0.253 | −0.121 | −0.285 | |
IDW | −0.155 | −0.132 | −0.143 | −0.169 | −0.172 | −0.175 | −0.165 | −0.162 | −0.174 | −0.171 | −0.142 | −0.166 | −0.169 | |
RMSS | OKS | 0.984 | 0.984 | 0.992 | 0.991 | 0.978 | 0.988 | 0.989 | 0.995 | 0.990 | 0.989 | 0.991 | 0.991 | 0.991 |
OKE | 1.000 | 1.012 | 1.001 | 1.029 | 1.019 | 1.036 | 1.056 | 1.039 | 1.033 | 1.054 | 1.030 | 1.045 | 1.042 | |
OKG | 0.980 | 0.985 | 1.000 | 0.981 | 0.987 | 0.987 | 0.991 | 0.995 | 0.988 | 0.987 | 0.981 | 0.981 | 0.983 | |
KGP | 1.144 | 1.127 | 1.108 | 1.080 | 1.047 | 0.896 | 1.107 | 1.131 | 1.427 | 1.023 | 1.069 | 1.053 | 1.127 | |
IDW | 1.021 | 0.996 | 1.006 | 1.058 | 1.123 | 1.102 | 1.182 | 1.139 | 1.096 | 1.157 | 1.026 | 1.056 | 1.104 |
RMSE, root mean square error; ASE, average standardized error; RMSS, root mean square standardized error.
OKS, ordinary kriging with spherical variogram model; OKE, ordinary kriging with exponential variogram model; OKG, ordinary kriging with Gaussian variogram model; KGP, kriging with genetic programming-based variogram model; IDW, inverse distance weighting.
GP-based variogram model for monthly and annual rainfall
In the KGP method, the GP-based variogram model is used with traditional kriging. GP is used to approximate the experimental variogram to achieve the GP-based variogram model. Available simple function sets in the standard variogram models (+, −, *, /, x2, exp) are used for the GP runs to obtain simple GP models. The evolved GP-based models with the aforementioned function sets can exhibit a similar functional structure to that of the standard variogram models. The best fitted GP model is obtained by using the objective function of minimizing RMSE. In this study, a population size of 500, crossover rate of 0.95, mutation rate of 0.05 and the number of generations as 500 are adopted for the evolution of the best GP-based models. The performance measures (RMSE, MAE, CC) of the GP-based variogram models for all data sets are presented in Table 2. It can be seen from Table 2 that the GP-based variogram model outperforms the standard variogram models for 7 months (i.e., January, May, June, July, September, October, December) and also shows better performance than the other two standard variogram models for another 3 months (i.e., March, April, November) and annual rainfall when all the performance measures are considered. Figure 3 shows the plots of fitted GP-based variogram models for September and annual rainfall. Teegavarapu (2007) indicated that the nugget coefficient of the variogram model significantly impacts the kriging computation and thus should be correctly estimated for best results by kriging interpolation. The nugget coefficient of the GP-based variogram model is determined by extending the function to touch the ordinate for zero distance in the abscissa. The nugget coefficients can then be read from the ordinate, which are also presented for all data sets in Table 2. It can be seen from Figure 3 that the GP-based variogram model has a similar structure to that of the standard variogram models (i.e., spherical, exponential, Gaussian). It is also evident from Figure 3 that the structure of the GP-based variogram model is composed of two standard models. Thus, the GP-based variogram model can be referred to as a combined variogram model of standard structure. A major advantage of the GP-based variogram model is that it does not have any pre-defined structure and parameters unlike the standard parametric variogram models.
Isaaks & Srivastava (1989) mentioned that any form of new variogram function can be fitted to the experimental variogram. They also indicated that developing and selecting such new variogram models (for example, the GP-based variogram model in this study) other than the standard ones may not result in a unique and stable solution for the kriging weights if the ‘positive definiteness condition’, an essential requirement to be satisfied by any variogram model, is not satisfied. The ‘positive definiteness condition’ for a variogram model demonstrates that the variance obtained by kriging interpolation remains always positive (Wackernagel 2003). It is worth mentioning that the standard variogram models always satisfy the ‘positive definiteness condition’ (Teegavarapu 2007) and thus results in a unique solution for the kriging weights in traditional OK. Therefore, the cross-validation test alone is adequate to select the best standard variogram model. However, both cross-validation and positive definiteness criteria must be satisfied simultaneously for the GP-based variogram model so that it can generate a unique solution for the kriging weights in the KGP method. The cross-validation results based on the GP-based variogram model are given in Table 3. It is obvious from Table 3 that all the cross-validation criteria are satisfied by the GP-based variogram model for all data sets. At the same time, the estimated kriging variance (results not shown) for all data sets is found positive in all stations and hence satisfies the positive definiteness condition. Therefore, the obtained GP-based variogram model for all data sets is able to provide a unique and stable solution for the kriging weights and can be used in the KGP method for rainfall interpolation and subsequent map generation.
Spatial estimation of rainfall
Different stochastic (i.e., OK, KGP) and deterministic (e.g., IDW) methods are used in the current study to estimate the monthly and annual rainfall patterns in the Middle Yarra River catchment in Australia. In stochastic interpolation framework, kriging with three standard parametric models, namely, OK with spherical model (OKS), OK with exponential model (OKE), OK with Gaussian model (OKG) as well as kriging with GP-based non-parametric variogram model (KGP) are implemented.
The RMSE and cross-validation plot results have been widely used to indicate how accurately an interpolator estimates the observed values. A smaller error in RMSE indicates better prediction. In the case of a cross-validation plot, the perfect estimation is that wherein all scattered points lie on the 45 ° line in the plot with the maximum coefficient of determination (R2) between estimated and measured data.
Table 3 presents the cross-validation results of monthly and annual rainfall estimates over the study area. As can be seen from the table, stochastic interpolation techniques (i.e., OK and KGP) yield better estimation than the deterministic technique (i.e., IDW) for almost all cases considering RMSE values. Similar findings have been reported in many studies (e.g., Goovaerts 2000; Delbari et al. 2013), where IDW has been found to be less accurate than kriging methods for the monthly and annual rainfall estimation.
Rainfall maps for September and annual rainfall by (a) OK, (b) KGP and (c) IDW methods.
Rainfall maps for September and annual rainfall by (a) OK, (b) KGP and (c) IDW methods.
PBIAS obtained for (a) September and (b) annual rainfall by OK, KGP and IDW methods.
PBIAS obtained for (a) September and (b) annual rainfall by OK, KGP and IDW methods.
CONCLUSIONS
In this study, the aim is to apply two kriging-based stochastic interpolation methods (i.e., traditional kriging and KGP) and a deterministic interpolation method (i.e., IDW) for the estimation of monthly and annual rainfall distribution in the Middle Yarra River catchment, Australia. The objective is to compare the performance of the aforementioned methods for generating a high quality gridded rainfall data set for the study area in the form of a rainfall map. This study introduces KGP as a new kriging-based interpolation method for interpolation of point rainfalls to generate the gridded rainfall map. In the KGP method, the GP-based non-parametric variogram model is used with kriging instead of using the standard parametric variogram models (i.e., spherical, exponential, and Gaussian). A major advantage of the GP-based variogram model is that it does not require a pre-defined functional form and parameters to fit the experimental variogram unlike the standard parametric variogram models in traditional kriging.
The following points are the major conclusions of this study:
The cross-validation results indicate that the kriging-based stochastic interpolation methods (i.e., OK and KGP) clearly outperform the deterministic method (i.e., IDW) for spatial interpolation of rainfall.
Among all the kriging-based methods, OK with the spherical variogram model (OKS) produces the lowest prediction error and highest coefficient of correlation for rainfall estimation. Thus, OKS is found to be the most accurate interpolator for estimating the monthly and annual rainfalls over the study area.
The KGP method yields almost identical prediction error and coefficient of correlation for rainfall estimation to that given by the OKS for most of the months. Also, the KGP method gives the lower prediction error and is found to be a better interpolator than the OK with the exponential or Gaussian variogram models for most of the months.
Although for some months, the KGP method cannot produce the lowest prediction error, the method gives reasonable estimates, which are in line with the OKS (best selected interpolator in this study). The maps generated by the KGP method seem to be consistent with the observed rainfall and reasonable with that produced by the best interpolator. A major advantage of the KGP method is that it eliminates the time-consuming and monotonous task of obtaining the variogram parameters and kriging weights as required with traditional kriging. Therefore, the KGP offers a viable alternative to traditional kriging for spatial interpolation and mapping of rainfall in hydrology.
ACKNOWLEDGEMENTS
The authors wish to thank DHI Water and Environment for providing GPKernel and the BoM for providing the necessary data.