In spatial interpolation, one of the most widely used deterministic methods is the inverse distance weighting (IDW) technique. The general idea of IDW is primarily based on the hypothesis that the attribute value of an ungauged site is the weighted average of the known attribute values within the neighborhood, and the ‘weights’ are merely associated with the horizontal distances between the gauged and ungauged sites. However, here we propose an extended version of IDW (hereafter, called the EIDW method) to provide ‘alternative weights’ based on the blended geographical and physiographical spaces for estimation of streamflow percentiles at ungauged sites. Based on the leave-one-out cross-validation procedure, the coefficient of determination had a value of 0.77 and 0.82 for the proposed EIDW models, M1 and M2, respectively, with low root mean square errors. Moreover, after investigating the relationship between the prediction efficiency and the distance decay parameter (C), the better performance of the M1 and M2 resulted at C = 2. Furthermore, the results of this study show that the EIDW could be considered as a constructive way forward to provide more accurate and consistent results in comparison to the traditional IDW or the dimension reduction technique-based IDW.

## INTRODUCTION

Streamflow data records are very important for efficient water resource management. However, in many instances, streamflow series are too short to allow for reliable estimation of extreme events at the site of interest. Numerous methodologies have been applied for streamflow estimation at ungauged sites or at sites with short streamflow series (Razavi & Coulibaly 2013). These methods, which are used to transfer hydrological information from gauge sites to the ungauged site, are generally called regionalization (Chokmani & Ouarda 2004; Farmer & Vogel 2013). Regionalization techniques frequently used in practice are generally based on deterministic rainfall–runoff models or hydrological model-independent (hydro-statistical) methods (Razavi & Coulibaly 2013; Farmer & Vogel 2013). Razavi & Coulibaly (2013) provided a brief review of the merits and demerits of these two different regionalization categories. Since there is no universally acceptable unique regionalization method (Arsenault & Brissette 2014), we focused on the relatively simple methods, i.e., hydrological model-independent regionalization techniques.

In the current era of hydrological model-independent regionalization study, spatial interpolation techniques are divided into deterministic and geostatistical approaches based on both geographical and physiographical space (Castiglioni *et al.* 2009). Numerous studies of regionalization using spatial interpolation techniques have been reported in the scientific literature (Razavi & Coulibaly 2013). The most widely used deterministic and geostatistical spatial interpolation approaches include the inverse distance method (IDW) and the kriging method, respectively. According to Chokmani & Ouarda (2004), Skøien *et al.* (2006), Castiglioni *et al.* (2011), and Archfield *et al.* (2013), a geostatistical approach such as kriging can be considered a reliable benchmark for regionalization. In particular, Castiglioni *et al.* (2009) reported that geostatistical interpolation methods outperform deterministic methods. Generally, the kriging method is reliant on the appropriateness of the theoretical semivariogram. However, identification of the appropriate theoretical variogram for the given data is critical. In general, the level of spatial autocorrelation decreases as spatial lag increases, and the changes in spatial autocorrelation level over various spatial lags are measured and represented by a variogram. Unfortunately, the spatial structure of data might not follow the general structure in practical studies due to many reasons, such as poor data quality or presence of spatial heterogeneity or neighbor structures (Chokmani & Ouarda 2004; Lu & Wong 2008). More specifically, if the variogram does not sufficiently provide the spatial structure of the data, then kriging might not reflect accurate results (Lu & Wong 2008). In addition, the original data points in kriging are seldom honored, and efficiency can be affected by the number of donor sites (gauged sites), location (headwater, presence of pits and spikes), and spatial distribution of input sample points (Skøien *et al.* 2006; Lu & Wong 2008).

In contrast, the assumption of IDW is simple, and there is no need to identify a theoretical distribution of the data. This method does not contain computationally intensive measures, such as inverting the covariance matrix, as needed in kriging. Through comparison of four different interpolation methods, Dirks *et al.* (1998) demonstrated that the kriging method does not provide any significant improvement over the simple IDW method. Zhuang & Wang (2003) and Zhang *et al.* (2014) provided similar arguments. The IDW method is univariate with a single influence factor of horizontal distance. The obvious drawbacks associated with poor performance of the IDW are a direct application of the interpolation method in geographical space and abrupt changes in the adjacent site configuration (e.g., elevation changes) (Chang *et al.* 2005). Some studies have also verified that direct application of the interpolation method in geographical space might cause unrealistic results (Chokmani & Ouarda 2004; Sauquet 2006; Lu & Wong 2008). Numerous forms of inverse distance weights have been suggested in order to alleviate the limitations of using the simplistic inverse distance weights (Lu & Wong 2008). In addition, several studies have provided extensions of the IDW method, using physiographical space, by introducing the distance-elevation ratio space, or by changing the distance decay parameter (power function) of the traditional IDW (Chokmani & Ouarda 2004; Chang *et al.* 2005; Lu & Wong 2008; Castiglioni *et al.* 2011). Chang *et al.* (2005) briefly discussed the need to consider the effects of multiple factors (physiographical space) in addition to horizontal distance, and Chokmani & Ouarda (2004) and Castiglioni *et al.* (2011) reported that physiographical space might have real potential for interpolation of streamflow.

The idea behind the physiographic space-based interpolation approach is innovative, as it allows one to interpolate the hydrometric descriptors without necessarily defining the homogeneous regions (Castiglioni *et al.* 2009). With physiographical space-based interpolation, any given site (gauged or ungauged) can be represented as a point in XY space. The two-dimensional XY space (climatic and physiographic descriptors) is generally derived using multivariate dimension reduction methods (principal component analysis (PCA) or canonical correlation analysis) (Chokmani & Ouarda 2004; Castiglioni *et al.* 2011). The empirical values of the quantity of interest (e.g., low flow or 100-year flood) can be characterized along the third dimension Z for each gauged site and can be spatially interpolated using any suitable standard interpolation algorithm (Castiglioni *et al.* 2009). However, the limitations of PCA are primarily associated with assumptions, i.e., linearity in data transformation, most of the information lies in the direction of the maximum variance in the data inputs, and the desired number of principal components (PCs). In addition, the scales of the original descriptors are not always comparable, and the variable with high absolute variance will dominate the first principal component, which might cause unrealistic results. A plethora of multivariate dimension reduction methods, e.g., kernel PCA, kernel entropy component analysis, and kernel partial least square, have been developed in order to overcome the limitations associated with PCA (Jiang & Shi 2014; Rajsekhar *et al.* 2015). In addition, the selection of an appropriate method from the plethora of dimension reduction methods requires a strong mathematical background and expertise in statistical analysis.

Estimation of streamflow at ungauged sites using physiographic space-based interpolation, without taking into consideration any dimension reduction method, could be a step to avoid limitations associated with dimension reduction methods. Therefore, the simplest IDW method was intended to modify using geographical and physiographic space without using any dimension reduction method. The core concept of the proposed method is to modify the ‘weights (*w*)’ associated with the traditional IDW method, by introducing the aggregated weights of geographical and physiographical space, instead of merely basing the weights on geographical space or synthetic variables (PCs). The basic idea of the method is to define alternative weighting strategies that represent the relative importance of the individual donor site based on the joint effects of physiographical space and horizontal distance rather than the strategies of traditional IDW prediction.

## STUDY AREA

The geomorpho-climatic attributes considered for the analyses included mean annual precipitation, basin average slope (BASL), basin average elevation (BAE), drainage area (DA), drainage perimeter (DP), maximum altitude of the basin (MAB), elongation ratio (ER) (the ratio of the diameter of a circle of the same area as the basin to the maximum basin length), form factor (FF) (the ratio of basin area to square of basin length), and relief ratio (ReR) (the ratio between basin relief and basin length) as explanatory predictors. Similarly, the hydrological attributes included in this study constituted mean annual runoff (MAR) and runoff ratio (RR). The detailed information regarding the geomorpho-climatic and hydrological attributes is shown in Table 1.

DA | DP | BAE | BASL | FF | ER | ReR | MAB | |
---|---|---|---|---|---|---|---|---|

(km^{2}) | (km) | (m) | (%) | (m) | ||||

Minimum | 351.34 | 166.76 | 55.31 | 10.86 | 0.07 | 0.31 | 0.01 | 810.00 |

1st quartile | 1,502.56 | 227.39 | 282.98 | 33.62 | 0.12 | 0.38 | 0.01 | 1,144.75 |

Median | 1,600.88 | 258.54 | 417.30 | 39.59 | 0.18 | 0.48 | 0.02 | 1,237.50 |

Mean | 1,611.38 | 252.33 | 395.66 | 36.66 | 0.31 | 0.58 | 0.02 | 1,282.95 |

3rd quartile | 2,017.55 | 286.67 | 563.69 | 45.02 | 0.54 | 0.83 | 0.02 | 1,567.75 |

Maximum | 2,483.82 | 348.71 | 749.32 | 57.88 | 0.78 | 1.00 | 0.03 | 1,585.00 |

DA | DP | BAE | BASL | FF | ER | ReR | MAB | |
---|---|---|---|---|---|---|---|---|

(km^{2}) | (km) | (m) | (%) | (m) | ||||

Minimum | 351.34 | 166.76 | 55.31 | 10.86 | 0.07 | 0.31 | 0.01 | 810.00 |

1st quartile | 1,502.56 | 227.39 | 282.98 | 33.62 | 0.12 | 0.38 | 0.01 | 1,144.75 |

Median | 1,600.88 | 258.54 | 417.30 | 39.59 | 0.18 | 0.48 | 0.02 | 1,237.50 |

Mean | 1,611.38 | 252.33 | 395.66 | 36.66 | 0.31 | 0.58 | 0.02 | 1,282.95 |

3rd quartile | 2,017.55 | 286.67 | 563.69 | 45.02 | 0.54 | 0.83 | 0.02 | 1,567.75 |

Maximum | 2,483.82 | 348.71 | 749.32 | 57.88 | 0.78 | 1.00 | 0.03 | 1,585.00 |

*Note:* For detailed description see http://www.wamis.go.kr/eng/main.aspx.

## METHODOLOGY

### Traditional IDW method

The IDW is a straightforward, non-computationally intensive method. It is based on Tobler's first law or the law of geography (generally stated as ‘everything is related to everything else, but near things are more related than distant things’), and it applies geographical space for interpolation. It has been used as one of the standard spatial interpolation procedures (Longley 2005; Burrough *et al.* 2013) and has been effectively used in various geographic information system (GIS) software packages. Its general idea is based on the assumption that the attribute value of an unsampled point is the weighted average of the known values within the neighborhood (Lu & Wong 2008). This method involves the process of assigning values to unknown points using values from a scattered set of known points. The value of an unknown point is a weighted sum of the values of the known points (Chen & Liu 2012).

*m*source sites (i.e., known points) transfer information to an unknown point (i.e., ungauged site), the required streamflow value at the ungauged site can be computed as the weighted average of the estimates of the

*m*source sites. The computation of the required streamflow value for an ungauged site can be obtained using the IDW equations as follows: where the distance between any two sites is , the IDW in subscript stands for the method used,

*d*is the distance based on the geographical distance weighted scheme (DWS), and

_{DWS}*Q*(

_{P}*x*) is the hydrological variable at the ungauged site located at point (

*x*, y).

*Q*(

_{P}*x*) is the hydrological variable at the neighboring donor site

_{k}*k*located at point (

*x*, y

_{k}_{k}) in the region,

*m*is the total number of donor sites,

*c*is the power parameter, and

*w*is the interpolation weight assigned to the

_{k}*k*th donor site.

### Extension of inverse distance weighting method

*et al.*2011). A brief summary of the proposed and comparative studies is illustrated in Figure 2.

### Analysis procedure

#### Selection of hydrological variables of interest

*q*) were ranked in descending order (

_{j}*q*and

_{1}*q*were the largest and smallest values, respectively). The ordered streamflow values were then plotted against the corresponding

_{N}*P*calculated from the Weibull plotting position formula in Equation (3) as follows: where

_{j}*P*is the probability that a given streamflow equaled or exceeded

_{j}*q*,

_{j}*j*is the rank assigned to each streamflow value in the period of the record, and

*N*is the total number of streamflow records.

In addition, the high-flow segments (*Q _{0.1}, Q_{0.5}, Q_{2}, Q_{5},* and

*Q*), medium-flow segments (

_{10}*Q*and

_{40}, Q_{45}, Q_{50}, Q_{55},*Q*), and low-flow segments (

_{60}*Q*and

_{75}, Q_{80}, Q_{85}, Q_{90},*Q*) of the FDCs at the gauged sites were selected as the hydrological variables (

_{99}*Q*where

_{P},*P*is the selected percentile, e.g., 0.1, 0.5,…, 99) to be transferred. The selection of these groups of percentile flow was intended to provide ease of reconstruction of the complete FDC and streamflow time series (Yusuf 2008).

#### Normalization of the physiographical attributes

#### Extension of the IDW (M1)

*DA*and

_{k}*DA*are the areas of the

*k*th donor site and the ungauged site, respectively, and

*v*and

_{ik}*v*are the corresponding

_{i}*i*th normalized physiographical attributes. The physiographical attributes having a significant correlation coefficient (Pearson correlation coefficient) with flow percentile were used in the PWS.

#### Extension of IDW (M2)

*i*(=1, 2, 3, ……,

*n*) normalized physiographical attributes corresponding to

*k*(=1, 2, 3, …,

*m*) donor sites.

*i*th variable were assigned (Equations (9)–(11)) based on the entropy of the individual variable. Information entropy was first introduced by Shannon (1948) in order to estimate uncertainty. The entropy weights were introduced in this study in order to provide a balanced relationship among the various selected attributes to account for data discrimination (abrupt changes) and to provide unbiased relative weights since entropy can provide a better characterization of the data compared to the variance. The equations for the entropy weights are as follows: where

*e*is the optimal entropy weight assigned to an

_{i}*i*th variable to account for its regional variability satisfying , and

*D*is the measure of divergence of each

_{i}*i*th variable.

*e*) corresponding to the

_{i}*i*th variable was further utilized in order to determine the ‘alternative weight’ (Equation (12)). The primary difference between the Euclidian distance and the result of Equation (12) is the introduction of optimal objective entropy weight and consideration of physiographical space instead of only geographical space:

#### Power parameter (C)

During estimation of the streamflow percentiles of interest *Q _{P(x)}* using M1 and M2, the critical parameters of EIDW, such as power parameter (C), were taken into account. The power parameter (C) is sometimes called a control parameter, and it is generally assumed that C = 2. In the current study, we experimented with variation in C, observing the spatial distribution of the information. Several studies have experimented with variation in C, generally 0 ∼ 5 (Chen & Liu 2012). In the present study, seven different random values of C, i.e., lower (0.1, 0.5, 1), average (2), and higher (3, 4, 5), were used for analysis.

#### Cross-validation and performance assessment

The performance evaluation was carried out by comparing EIDW with the traditional IDW and the modified IDW proposed by Castiglioni *et al.* (2009) based on physiographical space using the dimension reduction method (hereafter IDW-PC).

As indicated earlier, kriging and IDW are quite different approaches, and the basic motivation of the EIDW method is to offer a better predictive capability than the traditional IDW method. Therefore, there were no means to compare it with kriging. However, when the IDW is a logical alternative to kriging, e.g., the spatial correlation structure of the data is not strong, or when the data are limited to apply kriging, the EIDW offers a better alternative.

*et al.*2011). McCuen (2005) explained the advantage of the LOOCV procedure and described that the model accuracy obtained using the procedure was independent of the calibration data. In the LOOCV method, the streamflow record of one site in the study area was considered ‘ungauged’ and was omitted from the database. Then, the streamflow percentiles for that site were obtained using the calibration results from the remaining sites. To analyze the model performance, deviations from the 15 percentiles were measured using the coefficient of determination (R

^{2}) and root mean square error (RMSE) as follows: where

*N*is the number of streamflow percentiles,

*Q*and are the selected and the corresponding estimated percentiles of streamflow, respectively (observed 15 percentiles), and is the averaged observed percentile value.

_{P}## RESULTS

In order to regionalize the selected 15 percentiles using the suggested IDW extensions (i.e., M1, and M2), the NDB was used. Using the LOOCV procedure with rotation, every site was assumed an ungauged site, and the performance of the methods was evaluated for each assumed an ungauged site with varying C based on the scatter plot, *R*^{2}, and RMSE.

Preliminary selection of the physiographical attributes was based on correlation with mean percentile Q_{50} and runoff ratio (RR in Table 2). The correlation analysis indicated that DA, DP, BAE, and MAB showed significant correlation with mean percentile Q_{50} and runoff ratio, as shown in Table 2. Therefore, these physiographical attributes were used for further analysis.

DA | DP | BAE | BASL | FF | ER | ReR | MAB | ||
---|---|---|---|---|---|---|---|---|---|

Q_{50} | Pearson corr. | 0.761 | 0.745 | 0.723 | 0.491 | −0.428 | −0.430 | −0.378 | 0.717 |

Sig. | 0.002 | 0.002 | 0.003 | 0.075 | 0.127 | 0.125 | 0.183 | 0.004 | |

RR | Pearson corr. | 0.748 | 0.770 | 0.737 | 0.533 | −0.475 | −0.476 | −0.427 | 0.709 |

Sig. | 0.002 | 0.001 | 0.003 | 0.050 | 0.086 | 0.085 | 0.128 | 0.004 |

DA | DP | BAE | BASL | FF | ER | ReR | MAB | ||
---|---|---|---|---|---|---|---|---|---|

Q_{50} | Pearson corr. | 0.761 | 0.745 | 0.723 | 0.491 | −0.428 | −0.430 | −0.378 | 0.717 |

Sig. | 0.002 | 0.002 | 0.003 | 0.075 | 0.127 | 0.125 | 0.183 | 0.004 | |

RR | Pearson corr. | 0.748 | 0.770 | 0.737 | 0.533 | −0.475 | −0.476 | −0.427 | 0.709 |

Sig. | 0.002 | 0.001 | 0.003 | 0.050 | 0.086 | 0.085 | 0.128 | 0.004 |

In addition, for the IDW-PC proposed by Castiglioni *et al.* (2009), the PCA was performed over the observed physiographical space. It was observed that the first two PC cumulatively represented approximately 63% of the data input variance. Then, the two-dimensional space (i.e., x-y space) for application of the IDW-PC was defined using the first and second PCs, as mentioned by Castiglioni *et al.* (2009).

*R*

^{2}as the evaluation criterion, it was observed that the traditional methods performed poorly (

*R*

^{2}= 0.46 and 0.67 using the IDW and IDW-PC, respectively) compared to the proposed methods (M1 with

*R*

^{2}= 0.77 and M2 with

*R*

^{2}= 0.82). In addition, Figure 3 clearly shows that, although there is some underestimation and overestimation, the overall comparative performance evaluation of M1 and M2 exhibited better agreement between the observed and estimated FDC percentiles.

*et al.*2009). It was also observed that the proposed model M2 offered more noteworthy results than those of other models, depicting the minimum average RMSE (i.e., 8.66 at C = 2).

IDW | IDW-PC | M1 | M2 | |
---|---|---|---|---|

Average RMSE | 14.09 | 11.65 | 11.28 | 10.20 |

IDW | IDW-PC | M1 | M2 | |
---|---|---|---|---|

Average RMSE | 14.09 | 11.65 | 11.28 | 10.20 |

## CONCLUSIONS

Since the emphasis of current studies in spatial interpolation is placed on increasing the prediction efficiency, the primary objective of this study was to provide an enhanced version of the traditional IDW method. Our study focused on improvement of IDW by considering the joint effects of site configuration variability and horizontal distance. More specifically, the current study exclusively introduced alternative weights (Equations (7) and (12)) in order to increase the prediction capability of the IDW. The results provided some insights in terms of strength and applicability of the EIDW method.

The concept of the EIDW method was used to perform interpolation based on both geographical and physiographical space rather than solely on geographical space. The performances of the EIDW models were evaluated with various distance decay constants (C), demonstrating the better performance of the EIDW using C = 2. The comparative performances of the proposed EIDW models (M1 and M2) and the contenders, i.e., the IDW and IDW-PC, were evaluated using the LOOCV procedure and other evaluation criteria, such as RMSE and *R*^{2}. The comparable performances resulted in values of 0.46, 0.67, 0.77, and 0.82 of *R*^{2} for the IDW, IDW-PC, EIDW (M1), and EIDW (M2), respectively. In addition, comparatively lower RMSE values in the cases of M1 and M2 also provided evidence of the efficacy of the EIDW. It was expected as direct transformation of information in a merely geographical space, significant variation in the site geometric configuration could result in less significant outputs using the IDW, and the limitations of the dimension reduction techniques (i.e., PCA) could hinder the high predictability using the IDW-PC. Since the deterministic and geostatistical techniques use quite different approaches, the basic idea of proposing the EIDW method was to offer a better predictive capability than the traditional IDW method. Therefore, there were no means to compare these methods. However, when the IDW is a logical alternative to kriging, e.g., the spatial correlation structure of the data is not strong, or when the data are limited for application of kriging, the EIDW might be a better alternative.

Based on the performance assessment, it was concluded that a blended geographical and physiographical space-based EIDW is a step forward for providing more accurate, consistent, and unbiased results in comparison to the traditional IDW and IDW-PC methods. Although the results are necessarily related to the study area considered in this study, they still offer useful and somewhat general indications that can help the experts in the selection of a significant methodology for the problem at hand.

## ACKNOWLEDGEMENTS

This work was supported by the Advanced Water Management Research Program (14AWMP-B082564-01) of the Korean government.