Rainfall maps with gridded data are frequently used as an important input for many hydrological models. In this study, two kriging-based interpolation methods (i.e., ordinary kriging (OK) and kriging with genetic programming (KGP)) and a deterministic interpolation method (inverse distance weighting (IDW)) are implemented to generate gridded rainfall maps from point rainfalls. The KGP is implemented as a new kriging method in which the genetic programming-based non-parametric variogram model is used with kriging. Rainfall records from existing 19 raingauges in the Middle Yarra River catchment, Australia are used for the analysis. The performance of each method is assessed through the cross-validation test. Results indicate that the kriging-based methods clearly outperform the IDW method. Among all the kriging-based methods, OK with the spherical variogram model yields the lowest prediction error and best estimates for all months. The KGP method gives an almost identical error to that given by the OK with the spherical variogram model for most of the months and a lower prediction error than that given by OK with the exponential or Gaussian variogram model. Thus, the KGP can be used in line with traditional kriging as a viable alternative technique for spatial estimation and mapping of rainfall.

INTRODUCTION

Rainfall maps with continuous gridded data provide an important input for many hydrological modelling and water resources management tasks including water budget analysis, flood forecasting, climate change studies, reservoir operation, irrigation scheduling and management, etc. (Moral 2010). The accuracy of most hydrological analysis highly depends on the correct estimation of the spatial distribution of rainfall. Improved estimates of rainfall distribution can be achieved by a dense network of raingauges. Unfortunately, the raingauge network is often sparse and the number of raingauges in the networks is limited due to financial, logistics and geological factors (Abo-Monasar & Al-Zahrani 2014). Consequently, point rainfalls are available from a finite number of raingauges in space. Moreover, the spatial extent of rainfall may not be arranged on a regular grid. Therefore, spatial interpolation methods are important in such cases to estimate rainfall at unsampled locations using the available sampled rainfall values at surrounding locations.

Different deterministic, stochastic and data-driven interpolation methods are generally used to estimate the spatial distribution of rainfall from the ground-based point data. Deterministic methods use the existing configuration of the observed data points to generate a surface, which usually include traditional weighting techniques. Inverse distance weighting (IDW) is a typical representative of deterministic methods, which have been extensively used for spatial interpolation of rainfall (ASCE 1996). Although attempts have been made to upgrade the deterministic techniques over time, limitations of those techniques continue to exist and are discussed in Teegavarapu & Chandramouli (2005).

Kriging-based stochastic interpolation techniques offer important advantages over the deterministic methods, which have been extensively applied for rainfall interpolation (e.g., Moral 2010; Delbari et al. 2013; Abo-Monasar & Al-Zahrani 2014). Kriging has been increasingly preferred because it takes into account the spatial correlation between neighbouring observations and desired location where the estimation is to be made. A noted advantage of kriging over simpler methods such as IDW is that it is capable of utilizing correlated dense secondary variables such as topographic information (e.g., Goovaerts 2000; Feki et al. 2012) to improve the estimation accuracy of sparsely sampled primary variable (i.e., rainfall).

Ordinary kriging (OK) is one of the most preferred stochastic interpolation methods for spatial rainfall estimation from the point rainfall data. However, the accuracy of traditional OK highly depends on the variogram model that generally demonstrates the spatial autocorrelation structure of the underlying stochastic process. Therefore, extra attention should be paid to selecting appropriate variogram models and finding optimal variogram parameters (i.e., nugget, sill, range). The time-consuming and tedious task for calibrating the variogram parameters of the standard variogram models (i.e., spherical, exponential, Gaussian) is a major difficulty associated with the traditional OK.

In the recent past, researchers have demonstrated the application of kriging with different data-driven methods namely, artificial neural networks (ANN) (e.g., Teegavarapu 2007), neuro-fuzzy system and support vector machine (e.g., Kisi & Sanikhani 2015), fuzzy theory and genetic algorithm (GA) (e.g., Chang et al. 2005), combined ANN and GA (Asghari & Nasseri 2015), genetic programming (GP) (e.g., Adhikary et al. 2015b) for spatial interpolation. The major advantage of kriging with data-driven methods is that they usually yield better results than those obtained by traditional deterministic (e.g., IDW) and stochastic interpolation methods (e.g., traditional kriging). However, a major drawback of the ANN- and GA-based methods is that no analytical equations of functional relationships between variables are obtained in these methods, and it is usually not easy to interpret the network weights given by the ANN models (Muttil & Lee 2005).

GP (Koza 1992) is an evolutionary data-driven modelling technique, which can be applied to obtain the variogram model to be used with kriging. The output from GP is an empirical model to approximate the function of the underlying process. GP inferred models exhibit the advantages of creating simple expressions and thereby offering some likely interpretations of the underlying process (Muttil & Lee 2005). The main advantage of GP as compared to traditional models is that it does not assume any a priori functional form of the solution. Globally, a number of applications of GP have been reported in the hydrology and water resources field (e.g., Khu et al. 2001; Sivapragasam et al. 2010, 2012; Yilmaz & Muttil 2014). To the best knowledge of the authors, kriging with a GP-based data-driven method has not been explicitly used as an interpolator in the past to generate gridded rainfall maps. Therefore, the objective of this study is to apply kriging with genetic programming (KGP) for spatial interpolation of monthly and annual mean rainfall to generate the gridded rainfall data set for the Middle Yarra River catchment in Australia. GP is used to get the GP-based non-parametric variogram model to be used in the KGP method, which replaces the standard parametric variogram models in the traditional OK.

The rest of the paper has been organized as follows. First, details of the methodology are presented, which is followed by a brief description of the study area and data sets used. The results are summarized next and, finally, conclusions are drawn.

METHODOLOGY

The methodology of this study consists of three main parts:

  • data preparation for normal distribution;

  • spatial interpolation of rainfall using deterministic and stochastic interpolation methods;

  • assessment of interpolation methods for spatial estimation of rainfall.

The schematic illustration of the methodology is shown in Figure 1. Details of the methodology are described in the following subsections.

Figure 1

Spatial interpolation methods employed in this study: (a) traditional OK; (b) KGP; and (c) IDW methods.

Figure 1

Spatial interpolation methods employed in this study: (a) traditional OK; (b) KGP; and (c) IDW methods.

Data preparation

The normal distribution of data is a basic requirement of the kriging-based geostatistical approach. Kriging leads to an optimum estimator and yields best results for the normally distributed data (Adhikary et al. 2015a, 2015b). Abo-Monasar & Al-Zahrani (2014) have indicated that variograms developed with the non-normal data are unreliable for kriging interpolation. If data are not normally distributed, appropriate transformations should be implemented to normalize the data. In this study, log-transformation is used for the transformation of non-normal or skewed data. Finally, the widely used Kolmogorov–Smirnov (K-S) test is used to confirm the normality condition of the transformed rainfall data for a 5% significance level. The K-S test is a simple and straightforward statistical test to examine the normality condition of a data set, which has been explained in detail in McCuen (2003). The resulting normalized rainfall data sets are used for interpolation and map production.

Spatial interpolation methods

In this study, two stochastic interpolation methods (i.e., OK and KGP), and a deterministic interpolation method (i.e., IDW), are used for interpolation of rainfall and preparation of rainfall maps. The basic conceptual difference between traditional OK and KGP methods is presented in Figure 1(a) and 1(b). As can be seen from the figures, the standard parametric variogram models (e.g., spherical, exponential, Gaussian) are used in the traditional OK method, whereas the GP-based non-parametric variogram model is included in the KGP method. A brief description of these methods is presented in the following sub-sections.

OK

Kriging refers to a surface interpolation method based on spatially dependent variance, which estimates variable values at target locations in space using the known sampling values (Isaaks & Srivastava 1989). In this study, OK is used for spatial interpolation of point rainfalls, which is given by: 
formula
1
where is the estimates of variable θ (rainfall in this study) at target position ; indicates the kriging weights linked with the sampled location with respect to ; and n is the number of sampling points.
The kriging weights are obtained by solving an optimization scheme containing (n + 1) simultaneous linear equations as given by: 
formula
2
where and indicate the variogram values that come from the standard variogram models for the distance and , respectively; is the separation distance between sampling points and ; is the separation distance between the sampling point and the target location, (where estimation is to be made); and is the Lagrange multiplier.
The robustness of kriging heavily depends on proper selection of a variogram model that demonstrates the degree of spatial autocorrelation in the data set. Standard variogram models are generally used in traditional OK. In this study, three most commonly used standard variogram models in hydrology, namely, spherical, exponential, and Gaussian models are considered. Initially, an experimental variogram is derived from the observed data by: 
formula
3
where and are the rainfall values at corresponding sampling locations and , respectively, for a separation distance d, and is the number of data pairs. A standard variogram model is then fitted to the experimental variogram . The variogram model fitted with the lowest root mean square error (RMSE) and mean absolute error (MAE), and the highest correlation coefficient (CC) is selected as the best model. Variogram model fitting and traditional OK are implemented using GS+ software (Robertson 2008) for rainfall interpolation to generate gridded rainfall maps.

KGP

In this study, KGP is given by: 
formula
4
where is the estimates of variable θ (rainfall in this study) at target location ; represents the kriging weights in the KGP method.
The weights in the KGP method are also obtained by solving an optimization scheme containing (n + 1) simultaneous linear equations as expressed by: 
formula
5
where and refer to the GP-based variogram values; and indicates the Lagrangian multiplier.
The GP-based variogram model is used in the KGP method for rainfall interpolation and subsequent rainfall map production. GP is used to get the GP-based non-parametric variogram model by approximating the experimental variogram in Equation (3). The key benefit of using GP for function approximation is that it expresses the underlying structure of a process in the form of a mathematical function (or equation), which can be scientifically interpreted to derive knowledge about the process to be modelled. Mathematically, the GP-based variogram model in this study is expressed by: 
formula
6
As mentioned earlier in the Introduction, many engineering applications have extensively applied GP for approximating the underlying structure of either a natural or an experimental process. The evolutionary search strategy behind GP (Koza 1992) is that of a GA (Goldberg 1989). GP differs from the traditional GA in that it operates on parse trees instead of bit strings to generate the equation that best describes the relationship between the output and input variables. As in a GA, GP starts by randomly generating a population of parse trees, calculates their fitness (which is a measure of how well they solve the given problem) and subsequently selects the better parse trees for reproduction and recombination (usually done through crossover and mutation) to form the next generation of parse trees. This process of selection and reproduction continues iteratively until a specified stopping criterion is met. A detailed description of GP with water resources applications can be found in Babovic & Keijzer (2000), Liong et al. (2002) and Muttil & Lee (2005).

The GP-based variogram model is obtained using GPKernel, a software package developed by DHI Water and Environment (Babovic & Keijzer 2000). The GPKernel is a command line-based tool to obtain the underlying functions (in mathematical form) on data. The model with the lowest RMSE and MAE values and the highest CC value is chosen as the best GP model. The KGP method is implemented through a customized spreadsheet application developed in the Microsoft Excel platform for rainfall interpolation to produce gridded rainfall maps.

IDW

The estimation by IDW is a linear weighted average of several neighbouring observations, which is given by: 
formula
7
where indicates the estimated rainfall at target location ; represents the weights assigned to sampling location with respect to n is the number of sampling points; is the distance from the sampling location to the target location at ; and p is an arbitrary power.

As p approaches zero and the weights become more similar, IDW estimates approach the simple average of the surrounding observations. However, the effect of the farthest observations on the estimated value is decreased with the increase of ‘p’. The most commonly used value of p is 2 in the IDW method (Isaaks & Srivastava 1989), which is used in this study. IDW method is implemented using GS+ software (Robertson 2008) for rainfall interpolation to generate gridded rainfall maps.

Performance assessment of interpolation methods

The performance of three different spatial interpolation methods (i.e., OK, KGP and IDW) are evaluated and compared based on the cross-validation test. The cross-validation is a leave-one-out procedure that involves eliminating the rainfall data set individually one by one from the observed data sets and then re-estimating each rainfall data set by using the remaining data sets. Cross-validation test statistics provides important evidence of the performance measures of the interpolation methods. In this study, RMSE, average standardized error (ASE) and root mean square standardized error (RMSS) are used (Adhikary et al. 2015a, 2015b) for cross-validation. They are briefly described below:

  • RMSE is used for checking the estimation accuracy between the observed and estimated data. RMSE value close to zero indicates the higher accuracy in estimation.

  • ASE is used for checking the unbiasedness condition of the interpolator. ASE value close to zero indicates unbiased estimation of the variable.

  • RMSS is used to test the correct assessment of the variability of prediction, which should be close to 1 for correct assessment of prediction variability of the interpolator. RMSS value close to 1 specifies that the variance is consistent and the variability of the estimated data by interpolation is correctly assessed.

STUDY AREA AND DATA USED

Study area description

The middle segment of Yarra River catchment in Victoria, Australia is considered as the case study area. Figure 2 shows the approximate location of the case study area. The Yarra River catchment is located north and east of Melbourne and covers an area of 4,044 km2, which is home to more than one-third of Victoria's population (approximately 1.8 million). Although the Yarra River catchment is not large with respect to other Australian catchments, it yields the fourth highest amount of water per hectare of the catchments in Victoria (Adhikary et al. 2015a). Therefore, this is a very productive catchment and thus plays a very important role for water supply in Melbourne. The Middle Yarra segment of the catchment consists of an extensive flood plain, which is mainly used for agricultural activities. The Lower Yarra segment is mainly characterized by the urbanized floodplain areas of Melbourne city. Most of the land along rivers and creeks in the middle and lower segments has been cleared for agriculture or urban development (Barua et al. 2012).

Figure 2

The Middle Yarra River catchment (case study area) with the location of existing raingauge stations.

Figure 2

The Middle Yarra River catchment (case study area) with the location of existing raingauge stations.

Data used

There is remarkable variation in the rainfall patterns through different segments of the Yarra River catchment. The mean annual rainfall varies across the catchment from about 1,100 mm in the Upper Yarra segment to 600 mm in the Lower Yarra segment. The Middle Yarra segment (the case study area) covers an area of 1,511 km2. There are 19 (numbers 1 to 19 as shown in Figure 2) raingauges in the study area presently operated by the Australian Bureau of Meteorology (BoM). Historical daily rainfall records of all 19 raingauges for 1980–2012 have been collected from the SILO (http://www.longpaddock.qld.gov.au/silo/) climate database and compiled to generate monthly and annual data, which are then used for the analysis. The annual mean rainfall during the aforementioned periods varies between 508.1 mm and 1,913.9 mm, with the highest rainfall in the southern and south-eastern part and lowest rainfall in the north-western part of the study area.

Summary statistics of monthly and annual rainfall data are given in Table 1. September is the wettest month (rainfall amount equals 112.5 mm) with the highest variation in rainfall. The driest month is February (rainfall amount equals 56.4 mm) with the second highest variation. The skewness coefficient computed for monthly and annual rainfall data is presented in Table 1. The data usually fit normal distribution, if the skewness is very small or close to zero (Robertson 2008). Small skewness values in Table 1 indicate that all data sets are almost normally distributed. Furthermore, it is evident from the K-S test results (Table 1) that the null hypothesis cannot be rejected at 5% significance level because the computed K-S test statistic values for all data sets are smaller than the corresponding critical value (i.e., 0.30143). In this study, the null hypothesis is defined as the condition that indicates the data follow normal distribution. Thus, the rainfall data sets are found normally distributed at 5% significance level and can be directly used for variogram modelling and spatial interpolation.

Table 1

Summary statistics for the monthly and annual mean rainfall data (n = 19)

  Rainfall (mm)
 
    
Period Mean Std dev Minimum Maximum Skewness K-S value 
January 67.3 31.2 1.8 201.4 0.2796 0.14971 
February 56.4 53.4 0.0 269.5 0.3156 0.12822 
March 66.3 36.9 9.2 217.8 0.2800 0.16618 
April 84.7 45.8 15.2 246.0 0.0176 0.15649 
May 88.3 42.5 10.2 239.7 −0.1027 0.19500 
June 106.4 46.2 13.8 300.2 −0.0522 0.16641 
July 102.1 49.0 17.9 303.9 −0.0745 0.14543 
August 108.6 50.7 21.1 289.7 −0.0029 0.12252 
September 112.5 57.0 25.3 350.3 0.0330 0.15424 
October 101.6 50.9 4.0 333.9 −0.0988 0.17993 
November 96.1 47.6 15.3 258.7 0.0659 0.14956 
December 91.3 52.9 8.2 301.2 0.0469 0.16043 
Annual 1081.6 293.8 508.1 1913.9 −0.0756 0.14468 
  Rainfall (mm)
 
    
Period Mean Std dev Minimum Maximum Skewness K-S value 
January 67.3 31.2 1.8 201.4 0.2796 0.14971 
February 56.4 53.4 0.0 269.5 0.3156 0.12822 
March 66.3 36.9 9.2 217.8 0.2800 0.16618 
April 84.7 45.8 15.2 246.0 0.0176 0.15649 
May 88.3 42.5 10.2 239.7 −0.1027 0.19500 
June 106.4 46.2 13.8 300.2 −0.0522 0.16641 
July 102.1 49.0 17.9 303.9 −0.0745 0.14543 
August 108.6 50.7 21.1 289.7 −0.0029 0.12252 
September 112.5 57.0 25.3 350.3 0.0330 0.15424 
October 101.6 50.9 4.0 333.9 −0.0988 0.17993 
November 96.1 47.6 15.3 258.7 0.0659 0.14956 
December 91.3 52.9 8.2 301.2 0.0469 0.16043 
Annual 1081.6 293.8 508.1 1913.9 −0.0756 0.14468 

Std dev, standard deviation of rainfall; K-S value, Kolmogorov–Smirnov test statistic value, K-S19, 0.05 = 0.30143.

RESULTS AND DISCUSSION

Variogram modelling for spatial correlation analysis

Experimental variogram for monthly and annual rainfall

For variogram modelling, initially an experimental variogram is computed for each monthly and annual rainfall data set based on the variogram cloud (Wackernagel 2003). The variogram cloud is a plot in which the semivariance calculated using Equation (3) for any two data points is plotted against the distance between them (Robertson 2008). For 19 raingauges, the variogram cloud for each data set consists of 171 (19C2 = 171) data pairs in all directions. However, the variogram clouds are difficult to interpret due to the data points being congested. Hence, the variogram cloud is clustered into different bins to achieve the experimental variogram. A bin is a classification of lags where all lags with similar separation distance and directions are considered in the same bin (Adhikary et al. 2015a). The average lag distance and semivariance for all data pairs in each individual bin gives a discrete point on the experimental variogram plot. The experimental variogram thus consists of a finite number of discrete points with common distance and direction.

First, the directional variogram is computed for each data set. However, the directional variograms are found noisy due to the small number of raingauges. Consequently, an isotropic variogram is computed by assuming an identical spatial correlation in all directions and neglecting the influence of anisotropy on the variogram parameters. Isotropy is a property that defines the negligible directional influence on the data (Robertson 2008). The experimental variogram for the month of September and annual rainfall is shown in Figure 3. As can be seen from Figure 3, the experimental variogram often reaches the sill for large inter-station distance, which is often close to the variance of the data. The experimental variograms are then modelled by three standard variogram models (i.e., spherical, exponential, Gaussian) as well as GP, which are discussed in the following sub-sections.

Figure 3

Experimental isotropic variogram (dotted points) fitted with the variogram model (solid line) for (a) September rainfall and (b) annual rainfall.

Figure 3

Experimental isotropic variogram (dotted points) fitted with the variogram model (solid line) for (a) September rainfall and (b) annual rainfall.

Standard variogram model for monthly and annual rainfall

In traditional OK, three standard parametric variogram models (i.e., exponential, Gaussian, spherical) are used to fit the experimental variogram for all data sets. The variogram parameters that result in the best fitting performance based on RMSE, MAE and CC measures are chosen as the best fitted model in each case. Figure 3 shows the plots of fitted variogram models for September (taken as an example from all months, which is also the wettest month with high rainfall variation) and annual rainfall. The optimal variogram parameters and fitting performance of the standard variogram models for all monthly and annual rainfall are presented in Table 2. As can be seen from Table 2, the ratio of nugget coefficient to sill is small for all data sets indicating that monthly and annual rainfall is spatially correlated over the study area. This supports the use of kriging methods (i.e., OK, KGP), which inherently consider the spatial correlation. It is also evident from Table 2 that the CC values for the fitted models range from 0.886 to 0.989. Eleven months have CC values higher than 0.9 that indicates strong correlation between monthly rainfall and spatial distribution of raingauges. It can be seen that the Gaussian variogram model is the best fitted model for 6 months (i.e., February, March, April, July, November and December). For the remaining 6 months, the exponential variogram model is the best model for January, June and October while the spherical variogram model gives the best results for May, August and September.

Table 2

Variogram parameters and fitting performance of fitted variogram models used with the kriging-based methods

  Variogram parameters
 
Goodness-of-fit statistics
 
Month Variogram model Nugget (mm2Sill (mm2Range (km) RMSE (mm2MAE (mm2CC 
January Exponential 0.10 173.80 30.39 20.66 19.01 0.892 
Gaussian 19.40 153.60 15.54 20.68 16.92 0.888 
Spherical 1.90 155.50 19.80 20.89 17.94 0.886 
GP-based 0.00 – – 19.73 17.26 0.899 
February Exponential 0.10 64.85 32.52 4.49 3.94 0.967 
Gaussian 3.80 56.22 15.61 2.95 2.23 0.983 
Spherical 0.10 56.42 19.87 3.46 2.88 0.979 
GP-based 0.00 – – 3.83 3.23 0.973 
March Exponential 0.10 206.60 28.38 24.57 21.19 0.901 
Gaussian 0.10 185.00 13.91 16.22 15.29 0.954 
Spherical 0.10 184.10 17.78 20.06 18.06 0.939 
GP-based 0.00 – – 17.05 15.15 0.949 
April Exponential 0.10 278.90 33.75 23.13 20.08 0.953 
Gaussian 0.10 247.40 15.55 11.94 9.86 0.986 
Spherical 0.10 248.70 20.61 17.18 15.02 0.975 
GP-based 0.00 – – 14.89 12.80 0.978 
May Exponential 1.00 426.90 49.47 28.99 22.79 0.960 
Gaussian 33.50 327.60 19.61 28.20 18.83 0.959 
Spherical 1.00 339.90 26.35 28.07 20.14 0.960 
GP-based 0.00 – – 27.14 18.55 0.963 
June Exponential 1.00 775.00 36.33 75.88 67.36 0.928 
Gaussian 35.00 648.40 15.26 76.61 59.83 0.924 
Spherical 1.00 657.90 20.23 79.32 63.54 0.919 
GP-based 0.00 – – 70.96 61.07 0.936 
July Exponential 1.00 855.10 30.96 82.10 66.73 0.933 
Gaussian 71.00 752.60 14.96 79.07 64.35 0.935 
Spherical 1.00 753.60 18.66 80.63 67.34 0.933 
GP-based 64.00 – – 74.27 53.91 0.943 
August Exponential 13.00 936.90 35.67 34.78 27.46 0.989 
Gaussian 208.00 852.00 21.91 39.23 31.19 0.984 
Spherical 94.00 851.80 26.07 32.07 25.82 0.989 
GP-based 0.00 – – 34.92 28.66 0.988 
September Exponential 1.00 912.90 39.87 65.12 52.39 0.965 
Gaussian 72.00 771.50 18.53 59.75 49.53 0.966 
Spherical 1.00 780.80 23.56 59.82 48.83 0.967 
GP-based 0.00 – – 55.12 42.45 0.972 
October Exponential 1.00 412.90 31.68 26.87 19.26 0.986 
Gaussian 25.00 369.20 16.14 22.26 18.86 0.981 
Spherical 1.00 377.50 21.81 22.30 19.37 0.982 
GP-based 0.00 – – 15.41 13.78 0.991 
November Exponential 1.00 345.50 34.47 23.79 20.45 0.965 
Gaussian 1.00 284.60 14.25 15.85 14.22 0.981 
Spherical 0.10 286.60 19.08 19.84 17.12 0.975 
GP-based 0.00 – – 17.51 15.56 0.978 
December Exponential 1.00 361.80 32.88 22.04 16.18 0.975 
Gaussian 1.00 307.20 14.05 15.71 12.26 0.984 
Spherical 0.10 310.60 19.04 19.30 14.94 0.979 
GP-based 0.00 – – 9.54 7.73 0.994 
Annual Exponential 100.00 61,300.00 36.60 4,479.63 3,524.03 0.965 
Gaussian 1,700.00 50,970.00 15.69 3,706.71 2,960.04 0.971 
Spherical 100.00 51,830.00 21.19 4,118.88 3,539.73 0.966 
GP-based 0.00 – – 3,535.70 2,888.40 0.975 
  Variogram parameters
 
Goodness-of-fit statistics
 
Month Variogram model Nugget (mm2Sill (mm2Range (km) RMSE (mm2MAE (mm2CC 
January Exponential 0.10 173.80 30.39 20.66 19.01 0.892 
Gaussian 19.40 153.60 15.54 20.68 16.92 0.888 
Spherical 1.90 155.50 19.80 20.89 17.94 0.886 
GP-based 0.00 – – 19.73 17.26 0.899 
February Exponential 0.10 64.85 32.52 4.49 3.94 0.967 
Gaussian 3.80 56.22 15.61 2.95 2.23 0.983 
Spherical 0.10 56.42 19.87 3.46 2.88 0.979 
GP-based 0.00 – – 3.83 3.23 0.973 
March Exponential 0.10 206.60 28.38 24.57 21.19 0.901 
Gaussian 0.10 185.00 13.91 16.22 15.29 0.954 
Spherical 0.10 184.10 17.78 20.06 18.06 0.939 
GP-based 0.00 – – 17.05 15.15 0.949 
April Exponential 0.10 278.90 33.75 23.13 20.08 0.953 
Gaussian 0.10 247.40 15.55 11.94 9.86 0.986 
Spherical 0.10 248.70 20.61 17.18 15.02 0.975 
GP-based 0.00 – – 14.89 12.80 0.978 
May Exponential 1.00 426.90 49.47 28.99 22.79 0.960 
Gaussian 33.50 327.60 19.61 28.20 18.83 0.959 
Spherical 1.00 339.90 26.35 28.07 20.14 0.960 
GP-based 0.00 – – 27.14 18.55 0.963 
June Exponential 1.00 775.00 36.33 75.88 67.36 0.928 
Gaussian 35.00 648.40 15.26 76.61 59.83 0.924 
Spherical 1.00 657.90 20.23 79.32 63.54 0.919 
GP-based 0.00 – – 70.96 61.07 0.936 
July Exponential 1.00 855.10 30.96 82.10 66.73 0.933 
Gaussian 71.00 752.60 14.96 79.07 64.35 0.935 
Spherical 1.00 753.60 18.66 80.63 67.34 0.933 
GP-based 64.00 – – 74.27 53.91 0.943 
August Exponential 13.00 936.90 35.67 34.78 27.46 0.989 
Gaussian 208.00 852.00 21.91 39.23 31.19 0.984 
Spherical 94.00 851.80 26.07 32.07 25.82 0.989 
GP-based 0.00 – – 34.92 28.66 0.988 
September Exponential 1.00 912.90 39.87 65.12 52.39 0.965 
Gaussian 72.00 771.50 18.53 59.75 49.53 0.966 
Spherical 1.00 780.80 23.56 59.82 48.83 0.967 
GP-based 0.00 – – 55.12 42.45 0.972 
October Exponential 1.00 412.90 31.68 26.87 19.26 0.986 
Gaussian 25.00 369.20 16.14 22.26 18.86 0.981 
Spherical 1.00 377.50 21.81 22.30 19.37 0.982 
GP-based 0.00 – – 15.41 13.78 0.991 
November Exponential 1.00 345.50 34.47 23.79 20.45 0.965 
Gaussian 1.00 284.60 14.25 15.85 14.22 0.981 
Spherical 0.10 286.60 19.08 19.84 17.12 0.975 
GP-based 0.00 – – 17.51 15.56 0.978 
December Exponential 1.00 361.80 32.88 22.04 16.18 0.975 
Gaussian 1.00 307.20 14.05 15.71 12.26 0.984 
Spherical 0.10 310.60 19.04 19.30 14.94 0.979 
GP-based 0.00 – – 9.54 7.73 0.994 
Annual Exponential 100.00 61,300.00 36.60 4,479.63 3,524.03 0.965 
Gaussian 1,700.00 50,970.00 15.69 3,706.71 2,960.04 0.971 
Spherical 100.00 51,830.00 21.19 4,118.88 3,539.73 0.966 
GP-based 0.00 – – 3,535.70 2,888.40 0.975 

It is important to note that the best fitted variogram model may not always give the best results in interpolation by kriging if it cannot satisfy the cross-validation criteria (Wackernagel 2003). Cross-validation test is often used in kriging to choose the ultimate form of the standard variogram models. Hence, the ultimate form of these models is chosen based on the cross-validation test and the adopted model is finally used to generate the rainfall map. In order to satisfy the cross-validation test criteria, the variogram parameters (i.e., nugget, sill, range) of the standard parametric variogram models are iteratively changed and interpolation results are re-evaluated. The cross-validation results are given in Table 3, which indicates that all adopted variogram models satisfy the cross-validation criteria and are thus appropriate for use in kriging interpolation and rainfall map production.

Table 3

Cross-validation results of estimating monthly and annual mean rainfall using different interpolation methods

Cross-validation statistics Interpolation methods January February March April May June July August September October November December Annual 
RMSE OKS 9.33 5.23 9.71 10.04 11.13 16.17 17.50 16.95 16.83 12.24 11.34 11.69 141.13 
OKE 10.65 6.20 11.84 12.38 12.41 19.62 21.68 21.28 20.57 15.20 13.43 13.89 171.75 
OKG 11.46 6.93 12.90 14.14 14.58 22.65 24.37 23.09 23.26 16.81 15.34 15.73 197.46 
KGP 11.10 5.48 11.52 13.16 11.49 16.42 23.14 21.30 22.98 13.17 12.23 14.70 177.26 
IDW 11.11 6.67 12.32 13.57 14.82 22.01 23.61 23.77 23.41 16.36 14.78 15.28 192.64 
ASE OKS −0.137 −0.118 −0.133 −0.140 −0.089 −0.111 −0.083 −0.077 −0.119 −0.117 −0.139 −0.145 −0.122 
OKE −0.102 −0.091 −0.096 −0.106 −0.087 −0.101 −0.087 −0.089 −0.103 −0.100 −0.097 −0.111 −0.101 
OKG −0.108 −0.102 −0.105 −0.113 −0.111 −0.114 −0.098 −0.100 −0.121 −0.105 −0.101 −0.111 −0.110 
KGP −0.369 −0.288 −0.136 −0.254 −0.201 −0.088 −0.027 0.004 −0.612 −0.247 −0.253 −0.121 −0.285 
IDW −0.155 −0.132 −0.143 −0.169 −0.172 −0.175 −0.165 −0.162 −0.174 −0.171 −0.142 −0.166 −0.169 
RMSS OKS 0.984 0.984 0.992 0.991 0.978 0.988 0.989 0.995 0.990 0.989 0.991 0.991 0.991 
OKE 1.000 1.012 1.001 1.029 1.019 1.036 1.056 1.039 1.033 1.054 1.030 1.045 1.042 
OKG 0.980 0.985 1.000 0.981 0.987 0.987 0.991 0.995 0.988 0.987 0.981 0.981 0.983 
KGP 1.144 1.127 1.108 1.080 1.047 0.896 1.107 1.131 1.427 1.023 1.069 1.053 1.127 
IDW 1.021 0.996 1.006 1.058 1.123 1.102 1.182 1.139 1.096 1.157 1.026 1.056 1.104 
Cross-validation statistics Interpolation methods January February March April May June July August September October November December Annual 
RMSE OKS 9.33 5.23 9.71 10.04 11.13 16.17 17.50 16.95 16.83 12.24 11.34 11.69 141.13 
OKE 10.65 6.20 11.84 12.38 12.41 19.62 21.68 21.28 20.57 15.20 13.43 13.89 171.75 
OKG 11.46 6.93 12.90 14.14 14.58 22.65 24.37 23.09 23.26 16.81 15.34 15.73 197.46 
KGP 11.10 5.48 11.52 13.16 11.49 16.42 23.14 21.30 22.98 13.17 12.23 14.70 177.26 
IDW 11.11 6.67 12.32 13.57 14.82 22.01 23.61 23.77 23.41 16.36 14.78 15.28 192.64 
ASE OKS −0.137 −0.118 −0.133 −0.140 −0.089 −0.111 −0.083 −0.077 −0.119 −0.117 −0.139 −0.145 −0.122 
OKE −0.102 −0.091 −0.096 −0.106 −0.087 −0.101 −0.087 −0.089 −0.103 −0.100 −0.097 −0.111 −0.101 
OKG −0.108 −0.102 −0.105 −0.113 −0.111 −0.114 −0.098 −0.100 −0.121 −0.105 −0.101 −0.111 −0.110 
KGP −0.369 −0.288 −0.136 −0.254 −0.201 −0.088 −0.027 0.004 −0.612 −0.247 −0.253 −0.121 −0.285 
IDW −0.155 −0.132 −0.143 −0.169 −0.172 −0.175 −0.165 −0.162 −0.174 −0.171 −0.142 −0.166 −0.169 
RMSS OKS 0.984 0.984 0.992 0.991 0.978 0.988 0.989 0.995 0.990 0.989 0.991 0.991 0.991 
OKE 1.000 1.012 1.001 1.029 1.019 1.036 1.056 1.039 1.033 1.054 1.030 1.045 1.042 
OKG 0.980 0.985 1.000 0.981 0.987 0.987 0.991 0.995 0.988 0.987 0.981 0.981 0.983 
KGP 1.144 1.127 1.108 1.080 1.047 0.896 1.107 1.131 1.427 1.023 1.069 1.053 1.127 
IDW 1.021 0.996 1.006 1.058 1.123 1.102 1.182 1.139 1.096 1.157 1.026 1.056 1.104 

RMSE, root mean square error; ASE, average standardized error; RMSS, root mean square standardized error.

OKS, ordinary kriging with spherical variogram model; OKE, ordinary kriging with exponential variogram model; OKG, ordinary kriging with Gaussian variogram model; KGP, kriging with genetic programming-based variogram model; IDW, inverse distance weighting.

GP-based variogram model for monthly and annual rainfall

In the KGP method, the GP-based variogram model is used with traditional kriging. GP is used to approximate the experimental variogram to achieve the GP-based variogram model. Available simple function sets in the standard variogram models (+, −, *, /, x2, exp) are used for the GP runs to obtain simple GP models. The evolved GP-based models with the aforementioned function sets can exhibit a similar functional structure to that of the standard variogram models. The best fitted GP model is obtained by using the objective function of minimizing RMSE. In this study, a population size of 500, crossover rate of 0.95, mutation rate of 0.05 and the number of generations as 500 are adopted for the evolution of the best GP-based models. The performance measures (RMSE, MAE, CC) of the GP-based variogram models for all data sets are presented in Table 2. It can be seen from Table 2 that the GP-based variogram model outperforms the standard variogram models for 7 months (i.e., January, May, June, July, September, October, December) and also shows better performance than the other two standard variogram models for another 3 months (i.e., March, April, November) and annual rainfall when all the performance measures are considered. Figure 3 shows the plots of fitted GP-based variogram models for September and annual rainfall. Teegavarapu (2007) indicated that the nugget coefficient of the variogram model significantly impacts the kriging computation and thus should be correctly estimated for best results by kriging interpolation. The nugget coefficient of the GP-based variogram model is determined by extending the function to touch the ordinate for zero distance in the abscissa. The nugget coefficients can then be read from the ordinate, which are also presented for all data sets in Table 2. It can be seen from Figure 3 that the GP-based variogram model has a similar structure to that of the standard variogram models (i.e., spherical, exponential, Gaussian). It is also evident from Figure 3 that the structure of the GP-based variogram model is composed of two standard models. Thus, the GP-based variogram model can be referred to as a combined variogram model of standard structure. A major advantage of the GP-based variogram model is that it does not have any pre-defined structure and parameters unlike the standard parametric variogram models.

Isaaks & Srivastava (1989) mentioned that any form of new variogram function can be fitted to the experimental variogram. They also indicated that developing and selecting such new variogram models (for example, the GP-based variogram model in this study) other than the standard ones may not result in a unique and stable solution for the kriging weights if the ‘positive definiteness condition’, an essential requirement to be satisfied by any variogram model, is not satisfied. The ‘positive definiteness condition’ for a variogram model demonstrates that the variance obtained by kriging interpolation remains always positive (Wackernagel 2003). It is worth mentioning that the standard variogram models always satisfy the ‘positive definiteness condition’ (Teegavarapu 2007) and thus results in a unique solution for the kriging weights in traditional OK. Therefore, the cross-validation test alone is adequate to select the best standard variogram model. However, both cross-validation and positive definiteness criteria must be satisfied simultaneously for the GP-based variogram model so that it can generate a unique solution for the kriging weights in the KGP method. The cross-validation results based on the GP-based variogram model are given in Table 3. It is obvious from Table 3 that all the cross-validation criteria are satisfied by the GP-based variogram model for all data sets. At the same time, the estimated kriging variance (results not shown) for all data sets is found positive in all stations and hence satisfies the positive definiteness condition. Therefore, the obtained GP-based variogram model for all data sets is able to provide a unique and stable solution for the kriging weights and can be used in the KGP method for rainfall interpolation and subsequent map generation.

Spatial estimation of rainfall

Different stochastic (i.e., OK, KGP) and deterministic (e.g., IDW) methods are used in the current study to estimate the monthly and annual rainfall patterns in the Middle Yarra River catchment in Australia. In stochastic interpolation framework, kriging with three standard parametric models, namely, OK with spherical model (OKS), OK with exponential model (OKE), OK with Gaussian model (OKG) as well as kriging with GP-based non-parametric variogram model (KGP) are implemented.

The RMSE and cross-validation plot results have been widely used to indicate how accurately an interpolator estimates the observed values. A smaller error in RMSE indicates better prediction. In the case of a cross-validation plot, the perfect estimation is that wherein all scattered points lie on the 45 ° line in the plot with the maximum coefficient of determination (R2) between estimated and measured data.

Table 3 presents the cross-validation results of monthly and annual rainfall estimates over the study area. As can be seen from the table, stochastic interpolation techniques (i.e., OK and KGP) yield better estimation than the deterministic technique (i.e., IDW) for almost all cases considering RMSE values. Similar findings have been reported in many studies (e.g., Goovaerts 2000; Delbari et al. 2013), where IDW has been found to be less accurate than kriging methods for the monthly and annual rainfall estimation.

The results indicate that for 12 months and annual rainfall, OKS has the lowest RMSE and highest R2 values compared to IDW and other kriging methods (i.e., OKE, OKG and KGP) as shown in Figure 4. Thus, OKS is found to be the most accurate interpolator for estimating monthly and annual rainfall patterns over the study area. OKS and KGP do not show a significant difference between each other in terms of RMSE (Figure 4(a)). The results also show that KGP and OKE methods provide similar estimates as given by the OKS methods in some months, and also reasonable estimates in other months in terms of R2 (Figure 4(b)). Moreover, KGP yields better estimates than IDW and two other traditional kriging methods (i.e., OKE and OKG), when both RMSE and R2 values are considered. The average RMSE values for monthly data by KGP, OKS, OKE, OKG and IDW methods are 14.73 mm, 12.35 mm, 14.93 mm, 16.77 mm and 16.47 mm, respectively. Whereas the average R2 values for monthly data by KGP, OKS, OKE, OKG and IDW methods are 0.35, 0.50, 0.34, 0.10 and 0.27, respectively.

Figure 4

(a) RMSE and (b) R2 obtained for OK, KGP and IDW methods.

Figure 4

(a) RMSE and (b) R2 obtained for OK, KGP and IDW methods.

In addition to quantitative estimation and comparison by the cross-validation results, the performance of different interpolators is evaluated through the visual examination of generated maps. The generated maps for September (again taking September as an example from all months) and annual rainfall are presented in Figure 5. It can be seen from the figures that the maps produced by kriging and IDW methods are noticeably different. The reason is that kriging considers the pattern of spatial dependence of the rainfall data set, whereas the IDW considers only the distance between observed and estimated locations. Visual comparison indicates that the map generated by kriging is smoother than that obtained by the IDW method. The locations of raingauges are more obviously seen in the IDW map than the other maps. The reason behind this is that IDW generally produces spikes around the sampled locations. In spite of having a similar spatial distribution of rainfall depth for all generated maps, the maximum estimated rainfall amounts by each interpolation method are different.

Figure 5

Rainfall maps for September and annual rainfall by (a) OK, (b) KGP and (c) IDW methods.

Figure 5

Rainfall maps for September and annual rainfall by (a) OK, (b) KGP and (c) IDW methods.

A more detailed comparison of all interpolators (i.e., OK, KGP, IDW) can be made using the PBIAS errors (Moriasi et al. 2007), which is given by: 
formula
8
The PBIAS errors are computed for the monthly (e.g., again taking September as an example from all months) and annual rainfall estimations at each raingauge, which are shown in Figure 6. As can be seen from the figure, the larger bias errors are found for few raingauges, which are usually isolated (e.g., stations 12, 13 and 14). This is not surprising because there are not enough neighbouring observations around those isolated raingauges. All methods similarly give the largest bias error at raingauge 14, which is the westernmost location in the study area. Interpolation by OKS for September and annual rainfall indicates that there are six raingauges with an absolute bias error greater than ±20%, while interpolation by KGP demonstrates that there are 11 raingauges with an absolute bias error greater than ±20%, respectively. The number of raingauges with the aforementioned condition is obtained as 12 for OKE, 15 for OKG and 17 for IDW. These results clearly indicate that although kriging produces better estimation and appears as the most accurate interpolator in terms of the lower RMSE values, this method introduces notable errors (greater than ±20%) for a large number of raingauges than KGP, if exponential or Gaussian (other than spherical) variogram models are used. Figure 6 also shows that although all methods overestimate the rainfall amount at raingauge 14, the KGP method gives the lowest RMSE in that location for September and annual rainfall. The aforementioned findings give an important insight into developing a newly developed variogram model (e.g., the GP-based variogram model in this study) and using it in line with the standard variogram models (i.e., spherical, exponential and Gaussian) in kriging interpolation for spatial interpolation of rainfall and generation of gridded rainfall maps.
Figure 6

PBIAS obtained for (a) September and (b) annual rainfall by OK, KGP and IDW methods.

Figure 6

PBIAS obtained for (a) September and (b) annual rainfall by OK, KGP and IDW methods.

CONCLUSIONS

In this study, the aim is to apply two kriging-based stochastic interpolation methods (i.e., traditional kriging and KGP) and a deterministic interpolation method (i.e., IDW) for the estimation of monthly and annual rainfall distribution in the Middle Yarra River catchment, Australia. The objective is to compare the performance of the aforementioned methods for generating a high quality gridded rainfall data set for the study area in the form of a rainfall map. This study introduces KGP as a new kriging-based interpolation method for interpolation of point rainfalls to generate the gridded rainfall map. In the KGP method, the GP-based non-parametric variogram model is used with kriging instead of using the standard parametric variogram models (i.e., spherical, exponential, and Gaussian). A major advantage of the GP-based variogram model is that it does not require a pre-defined functional form and parameters to fit the experimental variogram unlike the standard parametric variogram models in traditional kriging.

The following points are the major conclusions of this study:

  • The cross-validation results indicate that the kriging-based stochastic interpolation methods (i.e., OK and KGP) clearly outperform the deterministic method (i.e., IDW) for spatial interpolation of rainfall.

  • Among all the kriging-based methods, OK with the spherical variogram model (OKS) produces the lowest prediction error and highest coefficient of correlation for rainfall estimation. Thus, OKS is found to be the most accurate interpolator for estimating the monthly and annual rainfalls over the study area.

  • The KGP method yields almost identical prediction error and coefficient of correlation for rainfall estimation to that given by the OKS for most of the months. Also, the KGP method gives the lower prediction error and is found to be a better interpolator than the OK with the exponential or Gaussian variogram models for most of the months.

  • Although for some months, the KGP method cannot produce the lowest prediction error, the method gives reasonable estimates, which are in line with the OKS (best selected interpolator in this study). The maps generated by the KGP method seem to be consistent with the observed rainfall and reasonable with that produced by the best interpolator. A major advantage of the KGP method is that it eliminates the time-consuming and monotonous task of obtaining the variogram parameters and kriging weights as required with traditional kriging. Therefore, the KGP offers a viable alternative to traditional kriging for spatial interpolation and mapping of rainfall in hydrology.

ACKNOWLEDGEMENTS

The authors wish to thank DHI Water and Environment for providing GPKernel and the BoM for providing the necessary data.

REFERENCES

REFERENCES
Adhikary
S. K.
Yilmaz
A. G.
Muttil
N.
2015a
Optimal design of raingauge network in the Middle Yarra River catchment
.
Hydrol. Processes
29
(
11
),
2582
2599
.
Adhikary
S. K.
Muttil
N.
Yilmaz
A. G.
2015b
Genetic programming-based ordinary kriging for spatial interpolation of rainfall
.
J. Hydrol. Eng.
21
(
2
),
04015062
.
doi: 10.1061/(ASCE)HE.1943-5584.0001300.
ASCE
1996
Hydrology Handbook
,
2nd edn
.
Task Committee on Hydrology Handbook (Management Group D of ASCE), American Society of Civil Engineers (ASCE)
,
New York
.
Babovic
V.
Keijzer
M.
2000
Genetic programming as a model induction engine
.
J. Hydroinform.
2
(
1
),
35
60
.
Delbari
M.
Afrasiab
P.
Jahani
S.
2013
Spatial interpolation of monthly and annual rainfall in northeast of Iran
.
Meteorol. Atmos. Phys.
122
(
1–2
),
103
113
.
Goldberg
D. E.
1989
Genetic Algorithms in Search, Optimization, and Machine Learning
.
Addison-Wesley Publishing
,
Reading, MA
,
USA
.
Isaaks
H. E.
Srivastava
R. M.
1989
An Introduction to Applied Geostatistics
.
Oxford University Press
,
New York
,
USA
.
Khu
S. T.
Liong
S. Y.
Babovic
V.
Madsen
H.
Muttil
N.
2001
Genetic programming and its application in real-time runoff forecasting
.
J. Am. Water Resour. Assoc.
37
(
2
),
439
451
.
Koza
J. R.
1992
Genetic Programming – On the Programming of Computers by Means of Natural Selection
.
The MIT Press
,
Cambridge, MA
,
USA
.
Liong
S. Y.
Gautam
T. R.
Khu
S. T.
Babovic
V.
Keijzer
M.
Muttil
N.
2002
Genetic programming: a new paradigm in rainfall runoff modelling
.
J. Am. Water Resour. Assoc.
38
(
3
),
705
718
.
McCuen
R. H.
2003
Modelling Hydrologic Change: Statistical Methods
.
CRC Press
,
Boca Raton, FL
,
USA
.
Moral
F. J.
2010
Comparison of different geostatistical approaches to map climate variables: application to precipitation
.
Int. J. Climatol.
30
(
4
),
620
631
.
Moriasi
D. N.
Arnold
J. G.
Van Liew
M. W.
Bingner
R. L.
Harmel
R. D.
Veith
T. L.
2007
Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
.
Trans. ASABE
50
(
3
),
885
900
.
Robertson
G. P.
2008
GS + : Geostatistics for the Environmental Sciences
.
Gamma Design Software
,
Plainwell, MI
,
USA
.
Sivapragasam
C.
Muttil
N.
Muthukumar
S.
Arun
V. M.
2010
Prediction of algal bloom using genetic programming
.
Mar. Pollut. Bull.
60
(
10
),
1849
1855
.
Sivapragasam
C.
Muttil
N.
Arun
V. M.
Sudha
S. J.
2012
Suitability of universal function approximators in re-designing raingauge network for a river basin
.
Int. J. Earth Sci. Eng.
5
(
2
),
255
267
.
Wackernagel
H.
2003
Multivariate Geostatistics: An Introduction with Applications, 3rd edn.
Springer-Verlag
,
Berlin
,
Germany
.