In spatial interpolation, one of the most widely used deterministic methods is the inverse distance weighting (IDW) technique. The general idea of IDW is primarily based on the hypothesis that the attribute value of an ungauged site is the weighted average of the known attribute values within the neighborhood, and the ‘weights’ are merely associated with the horizontal distances between the gauged and ungauged sites. However, here we propose an extended version of IDW (hereafter, called the EIDW method) to provide ‘alternative weights’ based on the blended geographical and physiographical spaces for estimation of streamflow percentiles at ungauged sites. Based on the leave-one-out cross-validation procedure, the coefficient of determination had a value of 0.77 and 0.82 for the proposed EIDW models, M1 and M2, respectively, with low root mean square errors. Moreover, after investigating the relationship between the prediction efficiency and the distance decay parameter (C), the better performance of the M1 and M2 resulted at C = 2. Furthermore, the results of this study show that the EIDW could be considered as a constructive way forward to provide more accurate and consistent results in comparison to the traditional IDW or the dimension reduction technique-based IDW.
Streamflow data records are very important for efficient water resource management. However, in many instances, streamflow series are too short to allow for reliable estimation of extreme events at the site of interest. Numerous methodologies have been applied for streamflow estimation at ungauged sites or at sites with short streamflow series (Razavi & Coulibaly 2013). These methods, which are used to transfer hydrological information from gauge sites to the ungauged site, are generally called regionalization (Chokmani & Ouarda 2004; Farmer & Vogel 2013). Regionalization techniques frequently used in practice are generally based on deterministic rainfall–runoff models or hydrological model-independent (hydro-statistical) methods (Razavi & Coulibaly 2013; Farmer & Vogel 2013). Razavi & Coulibaly (2013) provided a brief review of the merits and demerits of these two different regionalization categories. Since there is no universally acceptable unique regionalization method (Arsenault & Brissette 2014), we focused on the relatively simple methods, i.e., hydrological model-independent regionalization techniques.
In the current era of hydrological model-independent regionalization study, spatial interpolation techniques are divided into deterministic and geostatistical approaches based on both geographical and physiographical space (Castiglioni et al. 2009). Numerous studies of regionalization using spatial interpolation techniques have been reported in the scientific literature (Razavi & Coulibaly 2013). The most widely used deterministic and geostatistical spatial interpolation approaches include the inverse distance method (IDW) and the kriging method, respectively. According to Chokmani & Ouarda (2004), Skøien et al. (2006), Castiglioni et al. (2011), and Archfield et al. (2013), a geostatistical approach such as kriging can be considered a reliable benchmark for regionalization. In particular, Castiglioni et al. (2009) reported that geostatistical interpolation methods outperform deterministic methods. Generally, the kriging method is reliant on the appropriateness of the theoretical semivariogram. However, identification of the appropriate theoretical variogram for the given data is critical. In general, the level of spatial autocorrelation decreases as spatial lag increases, and the changes in spatial autocorrelation level over various spatial lags are measured and represented by a variogram. Unfortunately, the spatial structure of data might not follow the general structure in practical studies due to many reasons, such as poor data quality or presence of spatial heterogeneity or neighbor structures (Chokmani & Ouarda 2004; Lu & Wong 2008). More specifically, if the variogram does not sufficiently provide the spatial structure of the data, then kriging might not reflect accurate results (Lu & Wong 2008). In addition, the original data points in kriging are seldom honored, and efficiency can be affected by the number of donor sites (gauged sites), location (headwater, presence of pits and spikes), and spatial distribution of input sample points (Skøien et al. 2006; Lu & Wong 2008).
In contrast, the assumption of IDW is simple, and there is no need to identify a theoretical distribution of the data. This method does not contain computationally intensive measures, such as inverting the covariance matrix, as needed in kriging. Through comparison of four different interpolation methods, Dirks et al. (1998) demonstrated that the kriging method does not provide any significant improvement over the simple IDW method. Zhuang & Wang (2003) and Zhang et al. (2014) provided similar arguments. The IDW method is univariate with a single influence factor of horizontal distance. The obvious drawbacks associated with poor performance of the IDW are a direct application of the interpolation method in geographical space and abrupt changes in the adjacent site configuration (e.g., elevation changes) (Chang et al. 2005). Some studies have also verified that direct application of the interpolation method in geographical space might cause unrealistic results (Chokmani & Ouarda 2004; Sauquet 2006; Lu & Wong 2008). Numerous forms of inverse distance weights have been suggested in order to alleviate the limitations of using the simplistic inverse distance weights (Lu & Wong 2008). In addition, several studies have provided extensions of the IDW method, using physiographical space, by introducing the distance-elevation ratio space, or by changing the distance decay parameter (power function) of the traditional IDW (Chokmani & Ouarda 2004; Chang et al. 2005; Lu & Wong 2008; Castiglioni et al. 2011). Chang et al. (2005) briefly discussed the need to consider the effects of multiple factors (physiographical space) in addition to horizontal distance, and Chokmani & Ouarda (2004) and Castiglioni et al. (2011) reported that physiographical space might have real potential for interpolation of streamflow.
The idea behind the physiographic space-based interpolation approach is innovative, as it allows one to interpolate the hydrometric descriptors without necessarily defining the homogeneous regions (Castiglioni et al. 2009). With physiographical space-based interpolation, any given site (gauged or ungauged) can be represented as a point in XY space. The two-dimensional XY space (climatic and physiographic descriptors) is generally derived using multivariate dimension reduction methods (principal component analysis (PCA) or canonical correlation analysis) (Chokmani & Ouarda 2004; Castiglioni et al. 2011). The empirical values of the quantity of interest (e.g., low flow or 100-year flood) can be characterized along the third dimension Z for each gauged site and can be spatially interpolated using any suitable standard interpolation algorithm (Castiglioni et al. 2009). However, the limitations of PCA are primarily associated with assumptions, i.e., linearity in data transformation, most of the information lies in the direction of the maximum variance in the data inputs, and the desired number of principal components (PCs). In addition, the scales of the original descriptors are not always comparable, and the variable with high absolute variance will dominate the first principal component, which might cause unrealistic results. A plethora of multivariate dimension reduction methods, e.g., kernel PCA, kernel entropy component analysis, and kernel partial least square, have been developed in order to overcome the limitations associated with PCA (Jiang & Shi 2014; Rajsekhar et al. 2015). In addition, the selection of an appropriate method from the plethora of dimension reduction methods requires a strong mathematical background and expertise in statistical analysis.
Estimation of streamflow at ungauged sites using physiographic space-based interpolation, without taking into consideration any dimension reduction method, could be a step to avoid limitations associated with dimension reduction methods. Therefore, the simplest IDW method was intended to modify using geographical and physiographic space without using any dimension reduction method. The core concept of the proposed method is to modify the ‘weights (w)’ associated with the traditional IDW method, by introducing the aggregated weights of geographical and physiographical space, instead of merely basing the weights on geographical space or synthetic variables (PCs). The basic idea of the method is to define alternative weighting strategies that represent the relative importance of the individual donor site based on the joint effects of physiographical space and horizontal distance rather than the strategies of traditional IDW prediction.
The study area is located in the Han River basin, South Korea, as shown in Figure 1. The study area selection was primarily based on the appropriate accuracy of observations (continuity and absence of missing values), significant record length (>15 years), and independency of selected sites. Two types of datasets were collected for analysis: (1) hydrological attributes and (2) geomorpho-climatic characteristics of the site. The data were collected from the Water Management Information System (http://www.wamis.go.kr/) of Korea.
The geomorpho-climatic attributes considered for the analyses included mean annual precipitation, basin average slope (BASL), basin average elevation (BAE), drainage area (DA), drainage perimeter (DP), maximum altitude of the basin (MAB), elongation ratio (ER) (the ratio of the diameter of a circle of the same area as the basin to the maximum basin length), form factor (FF) (the ratio of basin area to square of basin length), and relief ratio (ReR) (the ratio between basin relief and basin length) as explanatory predictors. Similarly, the hydrological attributes included in this study constituted mean annual runoff (MAR) and runoff ratio (RR). The detailed information regarding the geomorpho-climatic and hydrological attributes is shown in Table 1.
Note: For detailed description see http://www.wamis.go.kr/eng/main.aspx.
Traditional IDW method
The IDW is a straightforward, non-computationally intensive method. It is based on Tobler's first law or the law of geography (generally stated as ‘everything is related to everything else, but near things are more related than distant things’), and it applies geographical space for interpolation. It has been used as one of the standard spatial interpolation procedures (Longley 2005; Burrough et al. 2013) and has been effectively used in various geographic information system (GIS) software packages. Its general idea is based on the assumption that the attribute value of an unsampled point is the weighted average of the known values within the neighborhood (Lu & Wong 2008). This method involves the process of assigning values to unknown points using values from a scattered set of known points. The value of an unknown point is a weighted sum of the values of the known points (Chen & Liu 2012).
Extension of inverse distance weighting method
The new proposed approach (i.e., extension of inverse distance weighting (EIDW)) is based on the interpolation of the hydrological variables of interest (i.e., flow duration curve (FDC) percentiles) over normalized geographical and physiographical space rather than only geographical space (i.e., as in the traditional IDW). It was assumed that the difference in the hydrological variable was influenced by the difference in site configuration. To assess the ability of the proposed method to predict the desired hydrological variables (the selected 15 percentiles), we used a comprehensive leave-one-out cross-validation (LOOCV) approach for the entire study area (Samuel et al. 2011). A brief summary of the proposed and comparative studies is illustrated in Figure 2.
Selection of hydrological variables of interest
In addition, the high-flow segments (Q0.1, Q0.5, Q2, Q5, and Q10), medium-flow segments (Q40, Q45, Q50, Q55, and Q60), and low-flow segments (Q75, Q80, Q85, Q90, and Q99) of the FDCs at the gauged sites were selected as the hydrological variables (QP, where P is the selected percentile, e.g., 0.1, 0.5,…, 99) to be transferred. The selection of these groups of percentile flow was intended to provide ease of reconstruction of the complete FDC and streamflow time series (Yusuf 2008).
Normalization of the physiographical attributes
Extension of the IDW (M1)
In addition, the weights obtained from Equation (7) were promoted as a distance weight, i.e., in the traditional IDW method (Equation (1)) for ungauged prediction, where M1 stands for extension M1. The sequence of the equations used for model M1 was Equations (2), (5), (6), (7), (1b), and (1a).
Extension of IDW (M2)
Next, was promoted as an alternative distance weight, i.e., in the traditional IDW method (Equation (1)) for ungauged prediction, where M2 stands for extension M2. The sequence of the equations used for model M2 was Equations (11), (12), (1b), and (1a).
Power parameter (C)
During estimation of the streamflow percentiles of interest QP(x) using M1 and M2, the critical parameters of EIDW, such as power parameter (C), were taken into account. The power parameter (C) is sometimes called a control parameter, and it is generally assumed that C = 2. In the current study, we experimented with variation in C, observing the spatial distribution of the information. Several studies have experimented with variation in C, generally 0 ∼ 5 (Chen & Liu 2012). In the present study, seven different random values of C, i.e., lower (0.1, 0.5, 1), average (2), and higher (3, 4, 5), were used for analysis.
Cross-validation and performance assessment
The performance evaluation was carried out by comparing EIDW with the traditional IDW and the modified IDW proposed by Castiglioni et al. (2009) based on physiographical space using the dimension reduction method (hereafter IDW-PC).
As indicated earlier, kriging and IDW are quite different approaches, and the basic motivation of the EIDW method is to offer a better predictive capability than the traditional IDW method. Therefore, there were no means to compare it with kriging. However, when the IDW is a logical alternative to kriging, e.g., the spatial correlation structure of the data is not strong, or when the data are limited to apply kriging, the EIDW offers a better alternative.
In order to regionalize the selected 15 percentiles using the suggested IDW extensions (i.e., M1, and M2), the NDB was used. Using the LOOCV procedure with rotation, every site was assumed an ungauged site, and the performance of the methods was evaluated for each assumed an ungauged site with varying C based on the scatter plot, R2, and RMSE.
Preliminary selection of the physiographical attributes was based on correlation with mean percentile Q50 and runoff ratio (RR in Table 2). The correlation analysis indicated that DA, DP, BAE, and MAB showed significant correlation with mean percentile Q50 and runoff ratio, as shown in Table 2. Therefore, these physiographical attributes were used for further analysis.
In addition, for the IDW-PC proposed by Castiglioni et al. (2009), the PCA was performed over the observed physiographical space. It was observed that the first two PC cumulatively represented approximately 63% of the data input variance. Then, the two-dimensional space (i.e., x-y space) for application of the IDW-PC was defined using the first and second PCs, as mentioned by Castiglioni et al. (2009).
The scatter plots shown in Figure 3 indicate a higher degree of accuracy for the proposed extensions (M1 and M2) and relatively poor performances for the traditional IDW and IDW-PC interpolation techniques. Using R2 as the evaluation criterion, it was observed that the traditional methods performed poorly (R2 = 0.46 and 0.67 using the IDW and IDW-PC, respectively) compared to the proposed methods (M1 with R2 = 0.77 and M2 with R2 = 0.82). In addition, Figure 3 clearly shows that, although there is some underestimation and overestimation, the overall comparative performance evaluation of M1 and M2 exhibited better agreement between the observed and estimated FDC percentiles.
In terms of RMSE, the proposed M1 and M2 interpolation techniques demonstrated comparable performances. Figure 4 shows the average relative error of the proposed models (i.e., M1 and M2) and the comparative models (IDW and IDW-PC) with varying values of distance decay (C), i.e., 0.1, 0.5, 1, 2, 3, 4, and 5. Figure 4 shows that the IDW produced a less accurate prediction and reveals that the relative error resulting from the IDW prediction was quite high in comparison to that of the proposed models at any of the distance decay values. In addition, the IDW-PC performance was comparatively better than that of the IDW, while it was relatively worse than compared to those of M1 and M2. The RMSE values of all of the models were comparatively low at C = 2 than at the rest of the C values. This result was expected since a zero C value in the IDW method tends to produce behavior like that of the simple arithmetic mean, while higher C values demonstrate behaviors, according to the Thiessen polygons method, which are less efficient than the simple IDW (Castiglioni et al. 2009). It was also observed that the proposed model M2 offered more noteworthy results than those of other models, depicting the minimum average RMSE (i.e., 8.66 at C = 2).
Individual site-based assessment (assumed ungauged one by one) resulted in the comparatively poorest method being the IDW, followed by the IDW-PC method. In comparison to the IDW-PC, the M1 provided overall significant results for the study area, except for sites 1, 2, 6, and 13, while the IDW-PC resulted in slightly better performance, as shown in Figure 5. In comparison to the studied models, it was noted that the M2 model produced a significantly accurate prediction by exhibiting a comparatively lower RMSE for the entire study area. Similarly, employing the overall mean RMSE as the performance indicator, the worst method was the IDW, followed by the IDW-PC method (Table 3). The suggested models performed better than the contender methods. Among the suggested models, the performance of M2 based on overall mean RMSE for varying C was found to be the highest, followed by M1, having values of 10.20 and 11.28, respectively. These results verified that consideration of a large number of influential factors (physiographical space-based weights) reduced the estimation error.
Since the emphasis of current studies in spatial interpolation is placed on increasing the prediction efficiency, the primary objective of this study was to provide an enhanced version of the traditional IDW method. Our study focused on improvement of IDW by considering the joint effects of site configuration variability and horizontal distance. More specifically, the current study exclusively introduced alternative weights (Equations (7) and (12)) in order to increase the prediction capability of the IDW. The results provided some insights in terms of strength and applicability of the EIDW method.
The concept of the EIDW method was used to perform interpolation based on both geographical and physiographical space rather than solely on geographical space. The performances of the EIDW models were evaluated with various distance decay constants (C), demonstrating the better performance of the EIDW using C = 2. The comparative performances of the proposed EIDW models (M1 and M2) and the contenders, i.e., the IDW and IDW-PC, were evaluated using the LOOCV procedure and other evaluation criteria, such as RMSE and R2. The comparable performances resulted in values of 0.46, 0.67, 0.77, and 0.82 of R2 for the IDW, IDW-PC, EIDW (M1), and EIDW (M2), respectively. In addition, comparatively lower RMSE values in the cases of M1 and M2 also provided evidence of the efficacy of the EIDW. It was expected as direct transformation of information in a merely geographical space, significant variation in the site geometric configuration could result in less significant outputs using the IDW, and the limitations of the dimension reduction techniques (i.e., PCA) could hinder the high predictability using the IDW-PC. Since the deterministic and geostatistical techniques use quite different approaches, the basic idea of proposing the EIDW method was to offer a better predictive capability than the traditional IDW method. Therefore, there were no means to compare these methods. However, when the IDW is a logical alternative to kriging, e.g., the spatial correlation structure of the data is not strong, or when the data are limited for application of kriging, the EIDW might be a better alternative.
Based on the performance assessment, it was concluded that a blended geographical and physiographical space-based EIDW is a step forward for providing more accurate, consistent, and unbiased results in comparison to the traditional IDW and IDW-PC methods. Although the results are necessarily related to the study area considered in this study, they still offer useful and somewhat general indications that can help the experts in the selection of a significant methodology for the problem at hand.
This work was supported by the Advanced Water Management Research Program (14AWMP-B082564-01) of the Korean government.