The present study on hydrologic regionalization was taken up to evaluate the utility of hierarchical cluster analysis for the delineation of hydrologically homogeneous regions and of multiple linear regression (MLR) models for information transfer to derive flow duration curves (FDCs) in ungauged basins. For this purpose, 50 catchments with largely unregulated flows located in South India were identified and a dataset of historical streamflow records and 16 catchment attributes was created. Using selected catchment attributes, three hydrologically homogeneous regions were delineated using a hierarchical agglomerative cluster approach, and nine flow quantiles (10–90%) were derived for each of the catchments in the respective clusters. A regionalization approach was then adopted, whereby flow quantiles were related to readily derived physical basin characteristics through MLR models developed using step-wise regression. Cluster-wise performance analysis of the developed models indicated excellent performance, with average coefficient of determination (R2) values of 0.85, 0.97, and 0.8 for Cluster-1, -2, and -3, respectively, in comparison to poor performance when all 50 stations were considered to be in a single region. However, jackknife cross-validation showed mixed results with regard to the reliability of the developed models, with performance being good for high-flow quantiles and poor for low-flow quantiles.

  • Hierarchical cluster analysis was used to delineate 50 unregulated catchments into homogenous groups.

  • Nine flow quantiles of the flow duration curve were extracted for each catchment and related to significant catchment attributes through multiple linear regression models.

  • Accuracies of models developed for each cluster were good but jackknife cross-validation showed fairly high reliability for only high-flow quantiles.

Time series information related to river flow is essential for water resources management studies, such as assessment of water availability for domestic supply and irrigation, forecasting of floods and droughts, assessment of ecosystem health, and analysis and design of water resources projects (Vogel et al. 1999; Masih et al. 2010; Karki et al. 2023). However, streamflow information is often either unavailable or insufficient in quality or quantity, resulting in many catchments being classified as ungauged. Despite substantial progress in hydrological research, many developing countries continue to struggle with insufficient hydrometric data and with land-use and climate change impacts that affect water availability and degrade ecosystems. Such issues are difficult to overcome when estimating flows from an ungauged or inadequately gauged basin (Sivapalan et al. 2003), resulting in improper planning and management of water resources not only at the ungauged site but also at the river basin level (Masih et al. 2010). Thus, the prediction of flows in ungauged basins is of practical significance for water resources planning and management and has been recognized as a critical research topic by the international hydrologic community (Sivapalan et al. 2003; Qamar et al. 2016; Guo et al. 2020; Karki et al. 2023).

The flow duration curve (FDC) of a catchment provides a concise, yet complete description of the runoff regime and therefore its prediction in ungauged basins is considered important (Boscarello et al. 2016; Ma et al. 2023; Yang et al. 2023). The FDC represents the relationship between stream discharge and the percentage of time (duration) (D) that this discharge (QD) was equaled or exceeded in the period of record. It has wide applications in the field of water resources assessment and management which include the estimation of the abstractable volume of water from rivers for domestic, irrigation, and hydropower projects, evaluation of low-flow statistics to maintain the water-quality standards, flood frequency analysis, wetland inundation mapping, reservoir and lake sedimentation studies, and instream flow assessment studies (Fennessey & Vogel 1990; Vogel & Fennessey 1994; Yu et al. 2002; Qamar et al. 2016; Silva et al. 2019; Gaviria & Carvajal 2022; Yang et al. 2023). FDC is one of the most commonly adopted techniques for the prediction of flows through regionalization (Boscarello et al. 2016; Qamar et al. 2016). It is for this reason that researchers have devoted significant efforts to the prediction of FDC in ungauged basins using the hydrologic regionalization approach. In this approach, certain characteristics of the observed FDC derived from historical flow records in gauged basins are transferred to ungauged basins located within hydrologically homogeneous regions (Panthi et al. 2021). Information transfer is achieved by establishing relationships between the flow quantiles extracted from observed FDC and selected catchment characteristics for the gauged basins. The developed relationships are then used to derive flow quantiles/FDC for the ungauged basins using data pertaining to their catchment characteristics.

For example, Yu & Yang (1996) assessed regional FDC for Southern Taiwan using multivariate statistical analysis of flow data from 34 sites. Castellarin et al. (2004a, 2004b) developed regional FDC for 51 unregulated river basins in Italy using catchment and morphologic characteristics. Mohamoud (2008) developed a regression model for different exceedance probabilities of flows in more than 40 climatic and landscape regions of the Northeastern US. A regionalization study by Shu & Ouarda (2012) indicated that an FDC-based method outperformed the area-ratio method at 109 stations in the province of Quebec, Canada. Qamar et al. (2016) presented a nonparametric regionalization procedure for assessing FDC for 124 catchments of Northwestern Italy. Recognizing the wide hydrological application of FDC, Chouaib et al. (2019) predicted the daily FDC through regionalization in ungauged basins using the hydroclimatic data of 73 catchments in the eastern USA. Similarly, the regionalization of FDC was demonstrated by Panthi et al. (2021) for predicting streamflow values for the data-scarce region of the central Himalayas. Considering FDC as a crucial indicator of river basins, Ma et al. (2023) attempted to identify the best-fit function using regression analysis for developing FDC in a semi-arid region of North China. Karki et al. (2023) evaluated the uses and limitations of the regionalization method for developing FDC in 23 medium- to small-sized watersheds across Nepal. Among the various regionalization techniques, one of the most widely used is the regression technique, which establishes a multivariate regression between streamflow and catchment attributes and has the advantage of evaluating each model parameter independently (Parajka et al. 2005; Yang et al. 2017).

While not all studies consider the delineation of hydrologically homogeneous regions for regionalization, research (e.g., Yu & Yang 1996; Burgan & Aksoy 2020) has demonstrated that doing so can increase the accuracy of information transfer, especially if substantial spatial variability in the hydrologic or physiographic features of the catchments exists (Isik & Singh 2008). In the past few decades, different methods for the delineation of homogeneous regions using a variety of similarity measures have been proposed (Tasker 1982; Rao & Srinivas 2006; Nobert et al. 2011; Latt et al. 2014; Boscarello et al. 2016; Li et al. 2018; Javadinejad 2021; Song et al. 2022), among which multivariate cluster analysis has proved to be the most efficient. For example, Yu & Yang (1996) defined homogeneous regions using cluster analysis for developing FDC for 34 stream gauging stations in Southern Taiwan. Streamflow information from 655 gauging stations in Colombia was studied by Gaviria & Carvajal (2022), who delineated 15 homogeneous regions using geological, topographic, and climatic information as clustering variables and the K-means algorithm for grouping. An agglomerative hierarchical clustering algorithm was used by Burn et al. (1997) to define homogeneous regions for regional flood frequency analysis in the Saskatchewan–Nelson River basin in west-central Canada. A hierarchical clustering approach was adopted by Boscarello et al. (2016) for classifying 46 catchments in the Upper Po River basin in northwest Italy into three homogeneous groups that were further used to estimate FDCs using the regionalization technique. Petrakis et al. (2021) used a hierarchical clustering approach to classify sub-basins of Smith Canyon Watershed, USA based on 12 environmental variables related to structural, biophysical, and hydrologic traits. Owing to the simplicity and widespread use of the hierarchical clustering method, Mulaomerović-Šeta et al. (2023) adopted hierarchical clustering for grouping the basins and subsequently predicted the flood quantiles in ungauged basins of the West Balkans using the regionalization concept. Similarly, various other studies on identifying homogeneous regions using cluster analysis have been carried out by Mosley (1981), Shaban et al. (2010), Goyal & Gupta (2014), Abdolhay et al. (2012), Latt et al. (2014), Li et al. (2018), and Riswandi et al. (2022).

The review of the literature revealed that, with regard to the characteristics of the observed FDC to be reproduced in the ungauged basin, two broad approaches have been used. In the ‘parametric’ approach, a function (empirical or probabilistic) is fitted to the observed FDCs and the optimized parameters of the function are predicted for the ungauged basin using the transfer function. In the ‘point’ approach, on the other hand, flow quantiles (QD) corresponding to specific values of duration (D) (for example, Q10, Q20, …, Q90) are extracted from the observed FDCs and predicted for the ungauged basin using the transfer function. Also, few previous studies seem to have explicitly checked whether recorded streamflows in the selected gauged catchments are influenced by upstream diversions or regulations due to the presence of dams/reservoirs. This may prove to be a critical issue since the effect of such anthropogenic modifications on the natural runoff regime of the gauged catchments may be transferred to the ungauged basin in the process of regionalization. Therefore, it is imperative to ensure that the selected gauged catchments possess unregulated flows if the hydrological predictions in the ungauged basins are to represent natural conditions. Finally, despite the fact that a large number of new water resources projects are being planned in India in general and in South India in particular, few previous studies seem to have been taken up to develop tools to predict streamflows and FDCs in ungauged catchments in this region.

Therefore, the present study was taken up with the specific objective of developing appropriate models for the prediction of FDCs in ungauged basins located in the South Indian peninsular region. For this purpose, 50 gauged catchments in the region which did not have any major water resources project upstream of the gauging station, and therefore represented largely unregulated flows, were selected. For each of these catchments, available historical records of observed daily flows were compiled along with various catchment characteristics. Using frequency analysis, period-of-record observed FDCs were derived and a set of nine flow quantiles (Q10–Q90) was extracted from them for each catchment. Step-wise regression was used to develop multiple linear regression (MLR) equations relating each flow quantile to catchment characteristics, initially by considering the entire study area to represent a single homogeneous region and subsequently by delineating the catchments into separate homogeneous regions using hierarchical cluster analysis. Finally, the accuracy of the developed MLR equations was assessed using a leave-one-out jackknife cross-validation procedure. Complete details of the study area, data used, methodology adopted, and results obtained are presented in subsequent sections of this paper.

Study area

Indian rivers are broadly classified into those belonging to the Himalayan River System (HRS) in the north and those belonging to the Peninsular River System (PRS) in the south. While the Himalayan Mountains form the headwater catchments for rivers in the HRS, rivers in the PRS mostly originate in the Western Ghat Mountains, which run parallel to the West Coast of India, and travel eastward across the peninsula before emptying into the Bay of Bengal. The major rivers of the PRS are the Tapi, Narmada, Mahanadi, Godavari, Krishna, and Cauvery. The PRS also comprises several short west-flowing rivers which originate in the Western Ghats and flow into the Arabian Sea and several short east-flowing rivers which originate in the Eastern Ghats and flow into the Bay of Bengal. For the purpose of the present study, 50 small-, medium-, and large-sized gauged catchments with unregulated flows located in the PRS were considered. The main criterion for selecting the catchments was the absence of any major water resources project upstream of the gauging station so that the recorded flows could be considered as being virgin or unregulated flows. Table 1 shows details of the selected catchments identified by the river basin in which they are located and the names of the gauging stations. Figure 1 shows the location map of the selected gauging stations.
Table 1

List of selected stream gauging stations

River basin name | No. of stations | Station names
West flowing rivers | 18 | Santeguli, Avershe, Yennehole, Addoor, Bantwal, Erinjipuzha, Kidangoor, Kalloopara, Thumpaman, Ayilam, Kuniyil, Karathodu, Kalampur, Mahuwa, Haladi, Nanipalasan, Ozerkheda, Pulamanthole
Krishna | 8 | Kellodu, Talikot, Navalgund, Balehonnur, Khanapur, Marol, Halia, Naguleru @Dachepalli
Cauvery | 11 | Sakleshpura, K M Vadi, E_Mangalam, Bendrahalli, Hogenakkal, Kudlur, Thevur, Thoppur, Nellithurai, Thengumarahada, T. Bekuppe
East flowing rivers | 5 | Kashipatnam, Seedhi, Ambasamudram, Salur, Gunupur
Godavari | 8 | Pedagedadda, Ramakona, Wairagarh, Amabal, Tumnar, Cherribeda, Gandlapet, Sonarpal
Figure 1

Map showing the location of selected stream gauge stations (cluster-wise) in the Peninsular River System of India.

The areas of the delineated catchments upstream of the gauging stations varied from a minimum of 171 km2 to a maximum of 6,930 km2, with six catchments having areas in excess of 3,000 km2. The topography of the study area consists of hilly terrain in the west and a flat plateau toward the east, with the average elevation varying from 251 to 1,404 m. The mean annual rainfall of the selected catchments ranged from a minimum of 27 mm to a maximum of 12,068 mm. The major part of river flow occurs during the wet monsoon months from July to August, and the flow is negligible in most of the rivers during the other months. The average daily temperature in the identified catchments ranges between 21 and 31 °C.

Discharge data

Daily discharge data for each of the identified stream gauge stations were extracted from the India Water Resources Information System (India-WRIS) portal of the Central Water Commission (CWC), Government of India. Owing to inconsistencies in the available data, the record length varied between stations: the longest discharge record used in the study spanned 1991 to 2018, and the shortest spanned 2008 to 2018.

Catchment attributes

All the identified catchments were delineated using 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data obtained from the US Geological Survey's Earth Resources Observation and Science (EROS) Center. The DEM was used to compute the following catchment attributes for the 50 basins: maximum elevation ‘MAXe’ (km), minimum elevation ‘MINe’ (km), relief ‘ΔH’ (km), relative relief ‘ΔH/P’, slope ‘S’, catchment area ‘A’ (km2), basin perimeter ‘P’ (km), length of the basin ‘L’ (km), basin width ‘W’ (km), longest flow path ‘Lp’ (km), drainage density ‘Dd’ (km/km2), form factor ‘FF’, shape factor ‘SF’, circularity ratio ‘Rc’, and elongation ratio ‘RL’.

Daily rainfall data for 50 grid points were obtained from the India Meteorological Department (IMD) 0.25° × 0.25° gridded rainfall product for the period 1981–2018. Using grid points lying within and close to each catchment, areal rainfall values were derived using the Thiessen polygon method, and average annual rainfall values ‘Rain’ (mm) for this historical period were obtained for each catchment and included with the other attributes.

These catchment attributes were used in the cluster analysis and also in the development of regression models for the FDCs (described later).

Flow duration curve

The FDC is one of the common tools used in hydrological studies and provides concise information about river flow variability in the study basin (Quimpo et al. 1983; Yu et al. 2002; Castellarin et al. 2004a, 2004b; Boscarello et al. 2016; Burgan & Aksoy 2020). It is a graphical representation of the magnitude of streamflow versus the percentage of time that a particular streamflow is equaled or exceeded over a period of time. An FDC is the complement of the cumulative distribution function of daily streamflow. The FDC for each gauge station can be developed using a standard nonparametric approach that involves counting the number of occurrences of historical flows falling within class intervals of descending flow magnitudes qi, with i = 1, 2, …, n, and subsequently calculating the percent exceedance probability using an appropriate plotting position formula (Fennessey & Vogel 1990; Vogel & Fennessey 1994; Sugiyama et al. 2003; Castellarin et al. 2004a, 2004b; Isik & Singh 2008; Li et al. 2010; Shu & Ouarda 2012).

The Weibull plotting position formula is most commonly adopted and is given by
$$p_i = \frac{m}{n + 1} \tag{1}$$
where pi is the probability of the discharge being greater than or equal to a specified value qi of ordered streamflows, ‘m’ is the number of counts of the flow values falling within the specified interval, and ‘n’ is the number of events on records. From this analysis, nine flow quantiles related to their corresponding exceedance probability values viz 10, 20, 30, 40, 50, 60, 70, 80, and 90% were obtained by interpolation for each of the identified gauge stations (Shu & Ouarda 2012).
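
As an illustration only, the derivation of an FDC and its quantiles can be sketched in Python as follows. The sketch assumes that every daily flow is ranked individually in descending order (so that m is the rank of the flow) rather than being counted in class intervals, and that quantiles are obtained by linear interpolation; the function names and the sample flows are hypothetical and are not taken from the study.

```python
def flow_duration_curve(flows):
    """Return (exceedance_percent, sorted_flows) using the Weibull plotting position."""
    ordered = sorted(flows, reverse=True)          # largest flow gets rank m = 1
    n = len(ordered)
    exceedance = [100.0 * m / (n + 1) for m in range(1, n + 1)]
    return exceedance, ordered


def flow_quantile(exceedance, ordered, duration):
    """Linearly interpolate the flow equaled or exceeded `duration` percent of the time."""
    if duration <= exceedance[0]:
        return ordered[0]
    if duration >= exceedance[-1]:
        return ordered[-1]
    for k in range(1, len(exceedance)):
        if exceedance[k] >= duration:
            p0, p1 = exceedance[k - 1], exceedance[k]
            q0, q1 = ordered[k - 1], ordered[k]
            return q0 + (q1 - q0) * (duration - p0) / (p1 - p0)


# Hypothetical record of nine daily flows (m3/s), for illustration only.
daily_flows = [12.0, 3.5, 48.0, 7.2, 0.9, 21.0, 5.1, 1.8, 15.0]
exceedance, ordered = flow_duration_curve(daily_flows)
quantiles = {d: flow_quantile(exceedance, ordered, d) for d in range(10, 100, 10)}
```

With a real record, `quantiles` would hold the nine values Q10 to Q90 for one station; durations outside the range of computed plotting positions are simply clamped to the largest or smallest observed flow in this sketch.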

Cluster analysis and homogeneity test

Clustering is a multivariate technique to identify hydrologically homogeneous regions using hydrological or catchment characteristics. Prior to regionalization, a homogeneous group of catchments is created by grouping the catchments into clusters according to the characteristics of the variables within the clusters (Yu & Yang 1996; Rao & Srinivas 2006; Isik & Singh 2008). Clustering techniques are generally classified into hierarchical and flat clustering. The usefulness of hybrid cluster analysis (a combination of hierarchical and flat clustering) in regionalization was demonstrated by Rao & Srinivas (2006) for watersheds in Indiana, USA. Hierarchical clustering is preferred over flat clustering when the number of clusters is unknown. Hierarchical clustering works either by merging smaller clusters into bigger ones, known as the agglomerative technique, or by dividing bigger clusters into smaller ones, known as the divisive technique (Rao & Srinivas 2006; Javadinejad 2021). The result of hierarchical clustering is typically displayed as a tree-like figure known as a ‘dendrogram’, which shows how the clusters are organized. As per Demirel (2004), the user must decide the number of clusters to be formed, as the dendrogram will not provide the cluster assignment details. Since the length of the dendrogram's limb denotes the proximity of points, data can be clustered by cutting the dendrogram at a desired level (Isik & Singh 2008; Boscarello et al. 2016).

Previous research has used hierarchical clustering techniques, such as single linkage, complete linkage, centroid, average distance, and Ward's minimum variance technique for hydrologic regionalization (Li et al. 2018). For example, Tasker (1982) adopted a complete linkage algorithm to regionalize watersheds in Arizona. Comparison of different algorithms, including single, complete, and average linkage, centroid, median, and Ward's method, was demonstrated by Nathan & McMahon (1990) using the Statistical Package for the Social Sciences (SPSS) tool. Similarly, Burn et al. (1997) used an agglomerative hierarchical clustering algorithm to regionalize watersheds in Canada. Ward's method outperformed the other methods in terms of separation so that clusters are relatively dense with low variability within groups (Boscarello et al. 2016). Hence Ward's method was adopted in the present study to group the stations into clusters using hierarchical agglomerative algorithms.

Irrespective of the type of clustering scheme, a similarity measure is required to classify individual catchments into homogenous groups. Out of the different similarity measuring techniques, the most commonly adopted Euclidean distance method (Tasker 1982; Yu & Yang 1996; Isik & Singh 2008) is used as a similarity measure in this study. The Euclidean distance is defined as
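$$D_{P,Q} = \sqrt{\sum_{i=1}^{j} \left(X_{Pi} - X_{Qi}\right)^2} \tag{2}$$
where $D_{P,Q}$ is the distance between two stations P and Q, $X_{Pi}$ and $X_{Qi}$ are the values of the ith attribute at stations P and Q, respectively, and j is the total number of selected attributes.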
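
To make the grouping step concrete, a minimal Python sketch of agglomerative clustering with the Euclidean distance and Ward's minimum variance criterion is given below. It is not the SPSS procedure used in the study: it assumes already standardized attributes, recomputes centroids at every step for readability rather than speed, and uses hypothetical function names and station values.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two stations' attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Mean attribute vector of a group of stations."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_clusters(data, n_clusters):
    """Agglomerative clustering that merges the pair with the smallest Ward cost."""
    clusters = [[i] for i in range(len(data))]      # start with one station per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([data[i] for i in clusters[a]])
                cb = centroid([data[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b are merged
                cost = na * nb / (na + nb) * euclidean(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical standardized attributes for six stations (two attributes each).
stations = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
groups = ward_clusters(stations, n_clusters=2)
```

Stopping the merging at a chosen number of clusters plays the same role as cutting the dendrogram at a desired level.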

Identification of clustering variables

The choice of clustering attributes/variables has a strong influence on the results. Hence, appropriate cluster attributes/variables should be identified before grouping the homogeneous catchments (Yu & Yang 1996). Cluster analysis can categorize groups based on discharge data and on topographical and meteorological characteristics (Rao & Srinivas 2006; Boscarello et al. 2016). Further, it is sensible to include only those attributes that are not highly correlated with each other (Boscarello et al. 2016). In this context, cluster analysis of the 50 catchments in the present study was carried out using the catchment attributes described earlier under ‘Catchment attributes’. The analysis was performed for the identified clustering attributes in the SPSS statistical package using Euclidean distance as a similarity measure and Ward's technique for linkages.

Homogeneity test and discordance measure

As per the method demonstrated by Nobert et al. (2011), the homogeneity within the catchment groups derived from cluster analysis is assessed using the coefficient of variation (CV) test in this study. This test involves the calculation of the mean, standard deviation, and CV of daily rainfall (time series data used for cluster analysis) at each station of the study area. The regional average coefficient of variation (CVAvg) and standard deviation of CV (σCV) of the river flow information are given as
$$CV_{Avg} = \frac{1}{N}\sum_{i=1}^{N} CV_i \tag{3}$$
$$\sigma_{CV} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(CV_i - CV_{Avg}\right)^2} \tag{4}$$
where CVi is the coefficient of variation at the ith station, and N is the number of stations used in the study. A region is regarded to be homogenous if the homogeneity measure (CC) defined by Equation (5) is less than or equal to 0.3.
$$CC = \frac{\sigma_{CV}}{CV_{Avg}} \tag{5}$$
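
A short Python sketch of this homogeneity check is given below for illustration. It assumes that CC is the ratio of the standard deviation of the station CVs to their regional average and that the standard deviation of CV is computed in its population form (division by N); a sample form (division by N − 1) would give a slightly larger value. The function names and the station series are hypothetical.

```python
import statistics


def coefficient_of_variation(series):
    """CV of one station's series: standard deviation divided by the mean."""
    return statistics.stdev(series) / statistics.mean(series)


def homogeneity_measure(station_series):
    """Return (CV_avg, sigma_CV, CC) for a candidate region."""
    cvs = [coefficient_of_variation(s) for s in station_series]
    cv_avg = statistics.mean(cvs)
    # Assumption: population form (divide by N); a sample form would use N - 1.
    sigma_cv = statistics.pstdev(cvs)
    return cv_avg, sigma_cv, sigma_cv / cv_avg


# Hypothetical series for three stations in one candidate region.
region = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [5.0, 10.0, 15.0]]
cv_avg, sigma_cv, cc = homogeneity_measure(region)
is_homogeneous = cc <= 0.3      # threshold quoted in the text
```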
Further, the discordance measure aims to recognize discordant catchment stations within the group that need to be adjusted or excluded to improve their homogeneity. In order to do so, a discordance method proposed by Hosking & Wallis (1993) has been adopted in this study. The discordance measure utilizes the advantages offered by sampling properties of L-moment ratios. L-moments are expectations of certain linear combinations of order statistics that are more robust than conventional moments to outliers, suffer less from the effects of sample variability, and help to obtain valuable inferences from small samples about an underlying probability distribution (Hosking 1990). This study used the R-Studio software package to determine the initial four L-moments (L1, L2, L3, and L4) for each identified station. The L-moment ratios are defined as
$$\tau = \frac{L_2}{L_1} \tag{6}$$
$$\tau_3 = \frac{L_3}{L_2}, \qquad \tau_4 = \frac{L_4}{L_2} \tag{7}$$
where τ is the measure of scale and dispersion, τ3 and τ4 are measures of skewness and kurtosis, respectively.
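
For illustration, the first four sample L-moments and the three L-moment ratios can be computed as in the following Python sketch. It uses the standard unbiased probability-weighted-moment estimators rather than the R package employed in the study, requires at least four observations, and the function names and the sample series are hypothetical.

```python
def sample_l_moments(values):
    """First four sample L-moments from unbiased probability weighted moments."""
    x = sorted(values)                              # ascending order statistics
    n = len(x)
    b0 = sum(x) / n
    # i is the zero-based position, i.e. (rank - 1) of each order statistic
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


def l_moment_ratios(values):
    """Return (L-CV, L-skewness, L-kurtosis) for one station."""
    l1, l2, l3, l4 = sample_l_moments(values)
    return l2 / l1, l3 / l2, l4 / l2


# Hypothetical series for one station, for illustration only.
series = [12.0, 3.5, 48.0, 7.2, 0.9, 21.0, 5.1, 1.8, 15.0]
tau, tau3, tau4 = l_moment_ratios(series)
```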
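Let $v_i = [\tau^{(i)}, \tau_3^{(i)}, \tau_4^{(i)}]^T$ be the vector containing the L-CV, L-skewness, and L-kurtosis values for station i. The discordancy measure for station i is then given as
$$D_i = \frac{1}{3}\left(v_i - \bar{v}\right)^T S^{-1} \left(v_i - \bar{v}\right) \tag{8}$$
$$\bar{v} = \frac{1}{N}\sum_{i=1}^{N} v_i \tag{9}$$
where S is the sample covariance matrix given as
$$S = \frac{1}{N-1}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)\left(v_i - \bar{v}\right)^T \tag{10}$$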

Higher values of Di indicate more discordant stations within the group. As per Hosking & Wallis (1993), the stations identified as discordant should be examined thoroughly, as discordancy may result from sampling variability or from changes in the attribute values due to localized extreme events. Irrespective of the discordance value, the statistical parameters of a station need to be compared with those of the other stations within the group before declaring the station as discordant.
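
A minimal Python sketch of the discordancy calculation is given below. It assumes the sample-covariance form of the measure (covariance matrix divided by N − 1 and a factor of one-third), solves the linear system by Gaussian elimination instead of forming the inverse explicitly, and uses hypothetical L-moment ratio vectors; at least four stations with non-degenerate ratios are needed for the covariance matrix to be invertible.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def discordancy(vectors):
    """Discordancy measure D_i for each station's [L-CV, L-skewness, L-kurtosis]."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    devs = [[v[k] - mean[k] for k in range(dim)] for v in vectors]
    # Sample covariance matrix of the L-moment ratio vectors
    cov = [[sum(d[r] * d[c] for d in devs) / (n - 1) for c in range(dim)]
           for r in range(dim)]
    measures = []
    for d in devs:
        y = solve(cov, d)                       # y = S^-1 (v_i - mean)
        measures.append(sum(a * b for a, b in zip(d, y)) / 3.0)
    return measures


# Hypothetical L-moment ratios for six stations; the last one is made deliberately different.
ratios = [[0.30, 0.20, 0.15], [0.32, 0.25, 0.12], [0.28, 0.18, 0.20],
          [0.35, 0.22, 0.17], [0.31, 0.15, 0.14], [0.60, 0.55, 0.05]]
d_values = discordancy(ratios)
```

Under this form of the measure, the values of Di over the N stations of a group always sum to N − 1, which gives a quick check on an implementation.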

Cluster-wise regionalization through MLR technique

Once the homogenous regions are delineated using cluster analysis, the regionalization concept is applied in each cluster. Regionalization is one of the commonly used techniques for the analysis of flow characteristics in the ungauged catchments by utilizing the information from one or more gauged stations located within the same hydrological homogenous region (Blöschl & Sivapalan 1995; Sivapalan et al. 2003; Li et al. 2010; Bao et al. 2012; Yang et al. 2017; Guo et al. 2020). Regionalization can be carried out by different techniques, viz. regression analysis, area-index, nearest neighbor method, and hydrological similarity method. Among these, the MLR analysis is one of the earliest and most widely used techniques globally for regionalization (Li et al. 2010; Bao et al. 2012; Swain & Patra 2017). This technique aims to develop a relationship between identified catchment characteristics and stream flow information corresponding to gauged stations through an MLR equation. Various researchers have adopted such multivariate regression analysis for analyzing the ungauged catchments. For example, Bao et al. (2012) compared the regionalization approaches based on regression and similarity methods in 55 catchments of China. Vogel et al. (1999) developed the regional regression model relating the hydrologic, geomorphic, and climatic characteristics of a large number of catchments across the United States. Cluster-wise regionalization analysis was investigated by Li et al. (2018) for 15 catchments located in the Yangtze and Yellow River basins of China. An attempt was made by Huang et al. (2015) to integrate the regression concept of regionalization with clustering analysis, and it turned out to be effective in the Yalong River Basin, China. 
Compared to other methods, the regression approach has more advantages that include integration of catchment and stream flow characteristics in Geographical Information System (GIS) processing, analysis of climate change impacts on water yields, and most importantly, this concept can be used to quantify the mean and variance of the stream flow for any catchment in the region (Vogel et al. 1999).

Using the data available, both the parametric and point approaches for the regionalization of the FDCs were implemented. However, preliminary results for the parametric approach (not shown here for brevity) indicated poorer performance in comparison to the point approach, and therefore the latter approach was adopted.

Accordingly, separate MLR equations relating each of the nine flow quantiles (Q10, Q20, …, Q90) extracted from the observed FDCs to the identified catchment attributes of the gauged catchments were established. The general form of the MLR equation used was
$$Q(D) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{11}$$
where Q(D) is the flow quantile of specific percentage duration (D) (10, 20, …, 90%) for each of the gauged catchments; $X_1, X_2, \ldots, X_n$ are the selected catchment characteristics for the gauged catchments; and $\beta_0, \beta_1, \ldots, \beta_n$ are regression coefficients obtained through the least squares criterion. The optimal regression coefficients obtained from Equation (11) for identified gauged stations were then utilized to estimate Q(D) values for the ungauged stations by substituting their catchment attributes (Yu et al. 2002; Shu & Ouarda 2012; Nruthya & Srinivas 2015; Silva et al. 2019). As per He et al. (2011), the performance of the MLR approach mainly depends on the appropriate choice of attributes.
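
The least squares fit and its use at an ungauged site can be sketched in Python as follows. This is an illustration only: it solves the normal equations directly, which is adequate for a handful of well-scaled attributes but is not the SPSS step-wise procedure used in the study, and the function names, attribute values, and quantile values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def fit_mlr(attribute_rows, quantile_values):
    """Least squares coefficients [intercept, slope_1, ..., slope_n]."""
    design = [[1.0] + list(row) for row in attribute_rows]   # add intercept column
    k = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(k)] for a in range(k)]
    xty = [sum(d[a] * q for d, q in zip(design, quantile_values)) for a in range(k)]
    return solve(xtx, xty)


def predict(coefficients, attributes):
    """Evaluate the fitted equation for one (ungauged) catchment."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], attributes))


# Hypothetical gauged catchments: two attributes (X1, X2) and one flow quantile each.
gauged_attributes = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
gauged_quantile = [3.0, 7.0, 7.0, 11.0, 12.0]
coefficients = fit_mlr(gauged_attributes, gauged_quantile)
ungauged_estimate = predict(coefficients, (6.0, 2.0))
```

One such equation would be fitted for each of the nine flow quantiles, and the nine predicted values together describe the FDC at the ungauged site.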

For the regionalization approach, catchment attributes were selected and utilized for multiple regression analysis across the 50 unregulated stations. The correlation matrix of the catchment attributes (described earlier under ‘Catchment attributes’) was studied to reduce multi-collinearity problems (Mohamoud 2008). The correlation matrix analysis coupled with the step-wise regression procedure facilitated the identification of the most influential and non-redundant catchment attributes for explaining the observed variability in the flow quantiles (Nathan & McMahon 1990; Vogel et al. 1999; Yu et al. 2002). The backward step-wise regression analysis tool available in the SPSS statistical package was used in the present study.

Jackknife cross-validation

Jackknife cross-validation is a commonly used validation technique for evaluating the uncertainties between the developed model and input data (Efron 1981; Shao & Tu 1995) and is especially useful with small data sets. In a comparative study between jackknife and split-sampling methods, McCuen (2005) found that the jackknife test was less sensitive to the sample size variation and provided better model prediction accuracy than the split-sampling technique. Literature has also indicated that the model precision obtained using the jackknife technique is independent of calibration data (McCuen 2005; Shu & Ouarda 2012).

In the jackknife cross-validation technique as applied to the regionalization of FDCs, one gauge station is assumed to be ungauged and the information from the remaining (n − 1) gauge stations is utilized to develop the regression model associated with specified flow quantiles. The catchment attributes of the withheld station are then used in the developed regression to synthesize the flow quantile in the assumed ungauged catchment. Similarly, the station withheld in the first jackknife run is replaced, and the next gauged station is assumed to be ungauged for the second run. This procedure is continued until all the gauge stations have been utilized to make predictions (McCuen 2005; Shu & Ouarda 2012; Nruthya & Srinivas 2015). The model's reliability is then tested by comparing the predicted jackknife estimates (model predictions) with that of the observed values. Such cross-validations help to derive the reliability of the regional regression model (Castellarin et al. 2004a).
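
The leave-one-out procedure described above can be sketched in Python as follows. To keep the example short, it assumes a regression on a single catchment attribute, whereas the study used multiple attributes; the function names and the station data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def jackknife_predictions(xs, ys):
    """Predict each station's value from a model fitted to the other n - 1 stations."""
    predictions = []
    for k in range(len(xs)):
        train_x = xs[:k] + xs[k + 1:]           # station k is treated as ungauged
        train_y = ys[:k] + ys[k + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[k])
    return predictions


# Hypothetical catchment areas (km2) and one flow quantile (m3/s) for six stations.
areas = [200.0, 500.0, 900.0, 1500.0, 2400.0, 3000.0]
q50 = [1.1, 2.4, 4.8, 7.1, 12.5, 14.6]
jackknife_q50 = jackknife_predictions(areas, q50)
```

The list `jackknife_q50` can then be compared with the observed `q50` values using the performance measures to judge how the model would behave at a truly ungauged site.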

Performance evaluation

The model efficiency in predicting the flow quantile was determined by comparing the model results with the observed flow values corresponding to the station under investigation. The following three efficiency measures were used to assess the performance of the MLR models:

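  • Coefficient of determination (R2):
    $$R^2 = \left[\frac{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(P_i - \bar{P}\right)^2}}\right]^2 \tag{12}$$
  • Root mean square error (RMSE):
    $$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right)^2} \tag{13}$$
  • Percentage bias (PBIAS):
    $$PBIAS = \frac{\sum_{i=1}^{N}\left(O_i - P_i\right)}{\sum_{i=1}^{N} O_i} \times 100 \tag{14}$$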

In the above equations, ‘N’ denotes the number of selected catchments, ‘O’ and ‘P’ are the observed and predicted flow values, and $\bar{O}$ and $\bar{P}$ indicate the average values of the observed and predicted flows. RMSE indicates the typical magnitude of the prediction error; for an ideal model, the RMSE value should be zero. The coefficient of determination (R2) gives the proportion of the variation in the observations that is explained by the model, with a value of 1 signifying ideal model performance (Domínguez et al. 2010; Shu & Ouarda 2012). Further, the model's tendency to consistently underpredict or overpredict the observed values is measured using percent bias (PBIAS). Model performance is acceptable if the PBIAS is within 100% (Moriasi et al. 2015; Burgan & Aksoy 2020).
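
For illustration, the three measures can be computed as in the Python sketch below. It assumes that R2 is the squared correlation between observed and predicted values and that PBIAS is expressed as observed minus predicted, relative to the sum of observed values; the function names and the flow values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Squared correlation between observed and predicted values."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cross = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cross ** 2 / (ss_o * ss_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pbias(observed, predicted):
    """Percent bias; positive values mean the model underpredicts on average."""
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)


# Hypothetical observed and predicted flow quantiles (m3/s) at four stations.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = (r_squared(observed, predicted), rmse(observed, predicted),
          pbias(observed, predicted))
```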

Flow duration curve

The observed flows of individual gauge stations were plotted against the corresponding exceedance probabilities, computed using the Weibull plotting position formula (Equation (1)), to obtain the FDCs for the gauge stations. Illustrative FDCs for the smallest and largest catchments in each of the five South Indian river basins are shown in Figure 2.
Figure 2

Illustrative FDC for the smallest catchments (left) and largest catchments (right) in different river basins of South Peninsular India.


As mentioned previously, a total of nine flow quantile values, viz 10, 20, 30, 40, 50, 60, 70, 80, and 90% were obtained by interpolation for each of the identified gauge stations.

The extracted flow quantiles were utilized for regression analysis and subsequent generation of FDCs for ungauged stations.
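
The construction of an FDC and the extraction of flow quantiles can be sketched as follows. The sketch assumes the standard Weibull plotting position, p = m / (n + 1) with m the rank of the flow in descending order, and linear interpolation between neighbouring plotting positions; the interpolation scheme actually used in the study is not specified, and the function names are hypothetical.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position: p = m / (n + 1), m = 1 for the largest flow
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


def flow_quantile(fdc, p):
    """Linearly interpolate the flow that is equalled or exceeded p % of the time."""
    for (p0, q0), (p1, q1) in zip(fdc, fdc[1:]):
        if p0 <= p <= p1:
            return q0 + (q1 - q0) * (p - p0) / (p1 - p0)
    raise ValueError("p lies outside the range of the plotting positions")


def nine_quantiles(flows):
    """Q10, Q20, ..., Q90 for one gauge station."""
    fdc = flow_duration_curve(flows)
    return {f"Q{p}": flow_quantile(fdc, p) for p in range(10, 100, 10)}
```

With a long daily record the plotting positions are dense, so the choice of interpolation scheme has little effect on the nine quantiles.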

Cluster analysis

Cluster analysis was carried out to delineate hydrologically homogeneous regions within the study area. A hierarchical agglomerative cluster algorithm, which successively merges smaller cluster groups into bigger ones, was adopted for clustering the 50 stations using Ward's minimum variance technique with squared Euclidean distance as the (dis)similarity measure.

In order to reduce bias, it is crucial to choose a limited set of variables for cluster analysis. The cluster variables shown in Table 2 were identified based on a literature review and analysis of the correlation matrix of the 16 catchment attributes as shown in Table 3. As mentioned earlier, the clustering of 50 basins was carried out with the SPSS software tool. Each of the nine cluster variables was standardized to give equal importance and avoid the issues related to the usage of different measuring units.
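
For readers without access to SPSS, the two steps described above (standardization followed by Ward's agglomerative clustering on squared Euclidean distances) can be illustrated with the following pure-Python sketch. It is not the SPSS implementation used in the study: the function names are hypothetical, the merge distances are updated with the Lance-Williams recurrence for Ward's method, and the tree is simply cut when the requested number of clusters remains.

```python
def standardize(rows):
    """Z-score each column so every attribute carries equal weight.

    Assumes no column is constant (its standard deviation would be zero).
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def ward_clusters(rows, n_clusters):
    """Agglomerative clustering with Ward's criterion; returns lists of row indices."""
    clusters = {i: [i] for i in range(len(rows))}
    # Initial dissimilarities: squared Euclidean distance between every pair
    dist = {(i, j): sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            for i in clusters for j in clusters if i < j}
    next_id = len(rows)
    while len(clusters) > n_clusters:
        i, j = min(dist, key=dist.get)           # closest pair of clusters
        d_ij = dist.pop((i, j))
        ni, nj = len(clusters[i]), len(clusters[j])
        new = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Lance-Williams update for Ward's minimum variance method
            new[(k, next_id)] = ((ni + nk) * d_ik + (nj + nk) * d_jk
                                 - nk * d_ij) / (ni + nj + nk)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        dist.update(new)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())
```

Calling `ward_clusters(standardize(attributes), 3)` on a 50 × 9 table of cluster variables would return three lists of station indices; in practice the number of clusters is chosen by inspecting the dendrogram rather than fixed in advance.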

Table 2

Cluster variables selected for study

Attribute  Units  Abbreviation
Maximum elevation km MAXe 
Minimum elevation km MINe 
Slope – S 
Basin area km2 A 
Shape factor – SF 
Circularity ratio – Rc 
Elongation ratio – RL 
Drainage density km/km2 DD 
Rainfall Rain 
Table 3

Correlation matrix of the cluster variables

Variables      MAXe   MINe     ΔH   ΔH/P      S      A      P      L      W     Lp     FF     SF     Rc     RL     DD   Rain
MAXe (km)      1.00
MINe (km)     −0.07   1.00
ΔH (km)        0.90  −0.49   1.00
ΔH/P           0.52  −0.52   0.68   1.00
S              0.59  −0.52   0.74   0.96   1.00
A (km2)       −0.01   0.21  −0.10  −0.58  −0.54   1.00
P (km)        −0.02   0.26  −0.13  −0.67  −0.63   0.95   1.00
L (km)        −0.03   0.24  −0.13  −0.62  −0.65   0.82   0.91   1.00
W (km)         0.05   0.20  −0.04  −0.55  −0.41   0.89   0.81   0.56   1.00
Lp (km)       −0.03   0.19  −0.11  −0.61  −0.61   0.81   0.89   0.96   0.60   1.00
FF             0.00  −0.05   0.02  −0.04   0.18   0.09  −0.03  −0.33   0.48  −0.20   1.00
SF            −0.06  −0.04  −0.03   0.01  −0.19  −0.06   0.09   0.42  −0.45   0.32  −0.87   1.00
Rc             0.01  −0.17   0.08   0.37   0.44  −0.33  −0.53  −0.58  −0.08  −0.49   0.58  −0.59   1.00
RL             0.01   0.25  −0.10  −0.19  −0.06   0.31   0.19  −0.12   0.56  −0.20   0.62  −0.72   0.31   1.00
DD (km/km2)    0.01  −0.31   0.15   0.13   0.10  −0.14  −0.01   0.10  −0.27   0.05  −0.28   0.40  −0.29  −0.23   1.00
Rain (m)      −0.01  −0.48   0.20   0.37   0.41  −0.37  −0.42  −0.45  −0.26  −0.39   0.21  −0.18   0.28  −0.07   0.31   1.00

The dendrogram obtained from the cluster analysis displayed the distribution of stations into different groups arranged according to the hierarchical agglomerative concept. Three distinct clusters were identified for regionalization by utilizing the proximity points along the dendrogram limb. The details of the stations associated with the derived clusters are provided in Table 4, and their locations are depicted in Figure 1.

Table 4

Group memberships of stations derived from cluster analysis

Cluster number  No. of gauge stations  Associated river basin  Station names
Cluster 1       17                     West-Flowing            Thumpaman, Ayilam, Pulamanthole, Kuniyil, Nanipalasan, Ozerkheda
                                       Krishna                 Naguleru
                                       Cauvery                 Nellithurai, Thengumarahada, Thoppur, Thevur, Kudlur
                                       East-Flowing            Ambasamudram, Kashipatnam, Salur, Seedhi
                                       Godavari                Pedagedadda
Cluster 2       11                     West-Flowing            Karathodu, Kalampur, Kalloopara, Avershe, Erinjipuzha, Yennehole, Addoor, Santeguli, Kidangoor, Haladi, Bantwal
Cluster 3       22                     West-Flowing            Mahuwa
                                       Krishna                 Balehonnur, Halia, Khanapur, Kellodu, Navalgund, Talikot, Marol
                                       Cauvery                 KMVadi, Bendrahalli, Sakleshpura, Hogenakkal, E_Mangalam, T. Bekuppe
                                       East-Flowing            Gunupur
                                       Godavari                Ramakona, Amabal, Wairagarh, Tumnar, Sonarpal, Gandlapet, Cherribeda

The first cluster contains 17 stations, with the majority of them spread across the West-flowing, Cauvery, and East-flowing river basins. The catchment areas of the stations in this cluster vary from a maximum of 1,998 km2 to a minimum of 181 km2, with average elevation varying from 1,710 to 154 m. The mean annual rainfall varies from a maximum of 2,153 mm to a minimum of 805 mm. All the stations in the second cluster are located within the West-flowing river basin, with catchment areas varying between 3,204 and 276 km2. Most of the identified stations in West-flowing rivers are bounded by the Western Ghats mountains, with average elevation ranging from 1,302 m to 8 m at the coast. The mean annual rainfall varies from a maximum of 4,029 mm to a minimum of 2,280 mm. The last cluster, Cluster-3, is the largest, containing 22 stations, most of which are located in the Krishna, Cauvery, and Godavari basins. The majority of the stations in this cluster have larger catchment areas than those in the other clusters (from a maximum of 6,930 km2 to a minimum of 601 km2). The average elevations in this cluster vary from 1,220 to 450 m, and mean annual rainfall ranges from a maximum of 2,831 mm to a minimum of 564 mm.

Further, the study utilized the CV test and discordance measure using L-moments to check the credibility of the cluster formations. The homogeneity measure (CC) computed from CV (Equation (5)) was evaluated considering all 50 stations as a single homogeneous region and also separately for each of the 3 regions delineated using cluster analysis. Results of this analysis are shown in Table 5 from which it is revealed that considering a single region fails the homogeneity requirement since the CC value exceeds 0.3. On the other hand, all three delineated clusters yield CC values less than 0.3 and hence they may be considered to be hydrologically homogeneous.
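
A minimal Python sketch of the CV-based check is given below. Equation (5) is not reproduced in this section, so the sketch assumes the form commonly used for this test, in which CC is the standard deviation of the at-site CV values divided by their mean; the function names and the toy series in the usage note are hypothetical.

```python
def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return std / mean


def homogeneity_measure(site_series):
    """CC = std(CV_1..CV_k) / mean(CV_1..CV_k) over the k sites of a region (assumed form)."""
    cvs = [coefficient_of_variation(s) for s in site_series]
    return coefficient_of_variation(cvs)


def is_homogeneous(site_series, threshold=0.3):
    """Apply the CC < 0.3 criterion cited in the text."""
    return homogeneity_measure(site_series) < threshold
```

For instance, `is_homogeneous([[1, 2, 3], [2, 4, 6], [10, 20, 30]])` is true because all three toy sites share the same CV, whereas adding a site with a much smaller relative variability, such as `[9, 10, 11]`, pushes CC well above 0.3.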

Table 5

Results of CV homogeneity test

Regions       No. of stations  Homogeneity measure (CC)  Region type
All stations  50               0.622                     Non-homogenous
Cluster-1     17               0.193                     Homogenous
Cluster-2     11               0.067                     Homogenous
Cluster-3     22               0.134                     Homogenous
Test criteria: The region is declared homogenous if CC is less than 0.3 (Nobert et al. 2011)

As discussed previously, the discordance measure test identifies discordant stations within the homogenous groups. To check the discordancy, 38 years of daily rainfall data obtained from the India Meteorological Department, Government of India, were utilized. L-moments were determined using the RStudio software tool, and L-moment ratios were calculated as per Equations (6) and (7) for each of the identified gauge stations. The mean L-CV value for the identified stations was estimated to be 0.84, with lower and upper quartiles of 0.80 and 0.87, as represented in Figure 3. Similarly, the mean L-Skewness was found to be 0.705, with quartiles ranging from 0.65 to 0.77, and the mean L-Kurtosis was 0.44, with quartiles ranging from 0.36 to 0.53 (Figure 3). It is evident that L-CV values exhibited the smallest variability across the 50 catchments while L-Kurtosis exhibited the highest variability.
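
The L-moment ratios referred to above can be computed from a sample as in the following Python sketch. It uses the standard unbiased probability-weighted-moment estimators of Hosking and is offered only as an illustration of the calculation; it is not the R code used in the study, and the function name is hypothetical.

```python
def l_moment_ratios(sample):
    """Return (L-CV, L-Skewness, L-Kurtosis) for a sample of at least 4 values."""
    x = sorted(sample)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    # Unbiased probability weighted moments b0..b3
    b = [0.0, 0.0, 0.0, 0.0]
    for j, v in enumerate(x):                    # j = rank - 1 in ascending order
        w = 1.0
        b[0] += v / n
        for r in range(1, 4):
            w *= (j - r + 1) / (n - r)
            b[r] += w * v / n
    # Sample L-moments
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l2 / l1, l3 / l2, l4 / l2
```

The three ratios for each station form the vector that enters the discordance measure, so a station whose ratios lie far from the group average stands out with a large Di.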
Figure 3

Variation of L-CV, L-Skewness, and L-Kurtosis for the identified gauge stations.

Next, the discordance value (Di) was calculated cluster-wise using Equations (8)–(10). Figure 4 presents the results of the discordance measure test executed for all stations in a cluster-wise manner. The test reveals that no discordancy was observed among the Cluster-2 and Cluster-3 stations, as their Di values are less than 3 (Hosking & Wallis 1993). However, one station in Cluster-1, Nellithurai, has a Di value slightly greater than 3, making it discordant from the others within the region. As suggested by Hosking & Wallis (1993), the discordant station was re-examined by comparing its statistical parameters (L-CV, L-Skewness, and L-Kurtosis) and its catchment-climatic characteristics with those of the other stations within the cluster group. This cross-verification showed that the discordant station has statistical and catchment-climatic behavior similar to that of the remaining stations within the same group. Hence, it is not worthwhile to shift this station to another region, where its behavior would be dissimilar to that of the other sites and might affect the regionalization process. Based on these observations, Nellithurai station was retained in Cluster-1 for further analysis.
Figure 4

Results of discordance measure test for Cluster-1, -2, and -3.


Regionalization

The regionalization process in this study was carried out for individual cluster groups separately, resulting in cluster-wise regression relationships in the form of Equation (11). The step-wise regression method (in the SPSS statistical tool) was adopted to develop separate MLR models with each of the nine flow quantiles (Q10, Q20, …, Q90) as the response variable and catchment attributes as predictor variables (Equation (11)). From among the catchment attributes (Article 2.3), only the 15 physiographic attributes, that is, all attributes except the hydroclimatic variable (Rain), were considered as potential predictor variables, and backward step-wise regression analysis was carried out to identify the most significant catchment attributes. To compare the model results, the regression analysis was first executed by considering all 50 catchments to constitute a single region and subsequently by considering the catchments in each of the three delineated homogeneous clusters.

Single region analysis

In this analysis, all 50 catchments were considered as one group for step-wise regression, with the observed flow quantile values in each catchment as the response variable and the 15 catchment physiographic attributes as potential predictor variables in Equation (11). The resulting final MLR models (Equation (11)) for each of the nine flow quantiles, expressed in terms of the intercept term (ψ0) and regression coefficients (ψ1, ψ2, …, ψN) for the most significant predictor variables as determined through step-wise regression, are listed in Table 6. From the results shown therein, certain inferences, albeit in a statistical sense only, can be drawn regarding the influence of catchment attributes on the flow quantiles and thereby on the shape of the FDCs. For instance, the maximum elevation (MAXe) is a significant predictor variable for almost all the flow quantiles and seems to have a major influence on the shape of the FDC. On the other hand, while MINe and circularity ratio (Rc) have an effect only on the upper half of the FDC, several other attributes such as relative relief (ΔH/P), catchment area (A), basin width (W), and form factor (FF) appear to influence only the lower part of the FDC. The longest flow path (Lp) and elongation ratio (RL) influence some parts of the upper and lower portions of the FDC, whereas basin perimeter (P) affects only the median flow quantiles. Slope (S) is significant only for the two largest flow quantiles, and drainage density (Dd) only for the lowest flow quantile.

Table 6

Regression coefficients associated with different predictor variables for final MLR models (Equation 11) for nine flow quantiles considering all catchments in a single region

Flow quantile  Constant  Regression coefficients for the predictor variables:
MAXe (km)  MINe (km)  ΔH (km)  ΔH/P  S  A (km2)  P (km)  L (km)  W (km)  Lp (km)  Dd (km/km2)  FF  SF  Rc  RL
Q10 −33.3 +93.9 −293.4 −4,078.2 −8.5 +6.3 −909.7 +688.7 
Q20 −19.6 +61 −158.4 −2,235 −4.2 +3 −474.4 +349.4 
Q30 −27.4 +18.3 −68 −2.1 +1.7 −358.5 +209.7 
Q40 +23.8 +23.7 −49 −4,206.1 +0.1 −0.5 
Q50 +9.7 +9.6 −15.4 −1,530.2 +0.03 −0.2 
Q60 +29.5 +6.7 −1,295.5 +0.01 −0.7 −0.2 +32.9 −49.1 
Q70 +21.6 +4.8 −1,085.8 +0.01 −0.7 −0.1 +29.3 −35 
Q80 +12.5 +2.9 −658.1 +0.005 −0.5 −0.1 +21.7 −21.3 
Q90 +0.4 +0.002 −0.3  −0.7 +11.1 

The performances of the optimal MLR models were evaluated using the performance statistics described in Article 3.5, i.e., R2, RMSE, and PBIAS. However, all the MLR models yielded negligible values of PBIAS, and accordingly only the values of R2 and RMSE are shown in Table 7. In terms of R2, the MLR models developed by considering all 50 catchments to be in a single region performed quite poorly for all the flow quantiles (R2 between 0.16 and 0.31), with performance being slightly better for the high-flow quantiles than for the low-flow quantiles. RMSE values, which depend on the magnitude of flows, ranged between 1.90 and 133.80 m3/s, with the lower values being associated with low-flow quantiles and vice versa.

Table 7

Performance statistics of final MLR models for nine flow quantiles developed considering catchments in single region, Cluster-1, Cluster-2 and Cluster-3

MLR for flow quantile   Single region          Cluster-1              Cluster-2              Cluster-3
                        R2     RMSE (m3/s)     R2     RMSE (m3/s)     R2     RMSE (m3/s)     R2     RMSE (m3/s)
Q10                     0.31   133.80          0.86   32.73           0.98   9.09            0.83   26.64
Q20                     0.29   74.73           0.86   18.39           0.98   6.59            0.85   14.23
Q30                     0.26   43.50           0.98   4.74            0.98   4.73            0.83   9.64
Q40                     0.27   22.31           0.98   1.61            0.98   2.28            0.76   7.59
Q50                     0.20   9.88            0.98   0.43            0.98   0.47            0.77   5.15
Q60                     0.18   5.95            0.92   0.80            0.96   1.39            0.79   3.53
Q70                     0.19   4.23            0.79   0.67            0.96   1.20            0.79   2.44
Q80                     0.22   2.66            0.56   0.79            0.96   0.84            0.81   1.13
Q90                     0.16   1.90            0.72   0.48            0.96   0.71            0.77   0.55
Average                 0.23   33.22           0.85   6.74            0.97   3.03            0.80   7.88

Cluster-wise analysis

The catchment attributes and the flow quantiles for the 17 stations of Cluster-1 (Table 4) were used to carry out step-wise regression analysis. The forms of the resulting final MLR models for Cluster-1 are listed in Table 8. From these results, it is immediately apparent that, unlike in the previous case of a single region (Table 6), all the catchment attributes except MAXe have an influence on some or all flow quantiles. The attribute relief (ΔH) is a significant predictor for all flow quantiles, and catchment length (L) is significant for all except one flow quantile (Table 8). Attributes S, A, W, and FF appear to have a significant effect on only the low-flow quantiles. It is also interesting to note that a larger number of predictor variables is involved in the models for medium-flow quantiles than in those for high-flow quantiles, and that the fewest predictors are required for predicting the low-flow quantiles. From a hydrological perspective, the flow characteristics in the majority of catchments within Cluster-1 primarily rely on geometric aspects, such as S, L, A, and W. Additionally, the relief factors, particularly MAXe, S, and ΔH, as well as the areal aspect FF, play a significant role in shaping these flow patterns. Notably, within Cluster-1, a high relief value (ΔH), which indicates the overall steepness of the terrain, was observed in the majority of catchments. Furthermore, FF, which signifies the intensity of flow, exhibited higher values at six specific stations (Kuniyil, Ozerkheda, Kudlur, Nellithurai, Thengumarahada, and Seedhi). The shapes of these catchments were more rounded/circular, leading to concentrated flows. Conversely, FF values at the remaining stations indicated an elongated catchment behavior. The performance statistics for the final MLR models for Cluster-1 are listed in Table 7.
The results indicate that grouping catchments into homogeneous regions leads to a significant improvement in the performance of the developed MLR models in comparison to using a single region. High values of R2 were obtained for all flow quantiles, and more so for the Q30, Q40, and Q50 quantiles, indicating excellent predictive capabilities of the developed MLR models. Also, significantly lower values of RMSE were obtained across all flow quantiles (Table 7).

Table 8

Regression coefficients associated with different predictor variables for final MLR models (Equation 11) for nine flow quantiles considering all catchments in Cluster-1

Flow quantile  Constant  Regression coefficients for the predictor variables:
MAXe (km)  MINe (km)  ΔH (km)  ΔH/P  S  A (km2)  P (km)  L (km)  W (km)  Lp (km)  Dd (km/km2)  FF  SF  Rc  RL
Q10 +480 −252.4 +226.5 −9,177.3 +0.5 −8.8 −40.8 +1,006.9 −730.1 
Q20 +280.4 −138.3 +137.3 −5,439.4 +0.3 −5.2 −24.3 +592.7 −445.4 
Q30 +15.6 −44.5 +65.2 +28,452.1 −9,378.4 +0.1 +1.9 −7.6 −18.9 +466.8 
Q40 −16.5 −17.2 +33.4 +17,715.1 −5,535.7 +0.1 +1.2 −4.4 −11.7 +8.3 +280.3 +29.3 
Q50 −35.3 −11.7 +19.2 +6,133.9 −2,103.7 +0.03 +0.4 −2.2 −5.9 +0.6 +4.5 +105.9 +93.3 
Q60 −24.2 −9.9 +11.1 −302.9 +0.01 −0.8 −2.3 +0.6 +21.1 −20.5 +89.1 
Q70 −38.8 −5.6 +3 −0.7 −0.9 +0.6 −3.7 +1.8 −21.7 +87.1 
Q80 −0.28 +1.69 −1.95 
Q90 +1.4 +3.2 −87.1 −0.03 −0.1 +0.1 −1.2 −21.1 +11.6 

The final forms of the MLR models derived using step-wise regression for catchments in Cluster-2 are listed in Table 9. In this case, four (MINe, ΔH, S, and SF) of the 15 catchment attributes considered as potential predictors are significant in the models for all the flow quantiles. Attributes FF, Rc, and RL are also significant for all but a few flow quantiles. The catchment area (A) and length of the basin (L) are significant for the high-flow and low-flow quantiles, respectively. Overall, the number of significant predictors for this cluster is smaller than that for Cluster-1 but, unlike in the earlier case, the models do not seem to be more parsimonious for low flows. Hydrologically, the flows in the majority of catchments within Cluster-2 rely on the geometric aspect L, the relief aspects MINe, S, and ΔH, and the areal aspects FF, SF, Rc, and RL. Similar to the preceding cluster, most catchments in this cluster exhibit a moderate relief value (ΔH). Moreover, four stations (Santeguli, Yennehole, Bantwal, and Haladi) display moderate values of form factor and elongation ratio and lower values of shape factor and circularity ratio, indicating that these catchments have moderately elongated shapes and are associated with a low-flow response. Among all the cases considered, grouping catchments into Cluster-2 yielded the most accurate MLR models for all flow quantiles, as indicated by the results of the performance analysis shown in Table 7. Extremely high values of R2 (0.96–0.98) and extremely low values of RMSE (0.47–9.09 m3/s) were recorded for the derived MLR models (Table 7).

Table 9

Regression coefficients associated with different predictor variables for final MLR models (Equation 11) for nine flow quantiles considering all catchments in Cluster-2

Flow quantile  Constant  Regression coefficients for the predictor variables:
MAXe (km)  MINe (km)  ΔH (km)  ΔH/P  S  A (km2)  P (km)  L (km)  W (km)  Lp (km)  Dd (km/km2)  FF  SF  Rc  RL
Q10 −1,008.4 −4,213.5 −121.9 +9,197.1 +0.4 +596.6 +115.5 +543.6 
Q20 −418.6 −2,858 −87.6 +5,007.3 +0.2 +160.4 +50.3 +298.4 
Q30 −186.4 −2,693.4 −82.2 +3,836.7 +0.1 +49.8 +28.7 +270 
Q40 −93.3 −2,267.1 −134 +4,306.9 +0.04 +2.6 +7.4 +86.3 
Q50 −110.2 −2,471.5 −150.6 +5,353.6 +3.5 +50 +7.9 +108.4 −96.6 
Q60 −183.9 −2,502.8 −156.9 +6,238.9 −0.02 +4.4 +111.3 +10.5 −68.8 
Q70 −143.4 −2,253.5 −107.4 +4,768.8 +2.4 +93.1 +14.6 +187.5 −145 
Q80 −117.3 −1,806.9 −87.5 +3,963.4 +1.9 +76.63 +11.8 +146.8 −122.1 
Q90 −93.3 −1,444.9 −70 +3,164 +1.5 +59.1 +9.3 +115.5 −95.1 

The MLR models derived for the Cluster-3 catchments are listed in Table 10. In this case, all 15 potential catchment attributes are involved in one or more of the models for the nine flow quantiles. In particular, attributes MAXe, MINe, A, and W turn out to be significant predictors for all flow quantiles, and attributes ΔH/P, P, and Lp are important in all but one of the flow quantile models (Table 10). Among the remaining attributes, L and FF are significant for the high-flow quantiles and DD and Rc for the low-flow quantiles. In terms of hydrology, the flow patterns in most catchments within Cluster-3 are mainly influenced by geometric aspects, including A, P, W, and Lp. Furthermore, the relief factors, particularly MAXe, MINe, and ΔH/P, along with the areal aspect DD, play a crucial role in shaping these flow patterns. Notably, Cluster-3 exhibits relatively lower values of relative relief and drainage density, indicating elongated catchments characterized by highly permeable soils and a coarse drainage texture. The performance statistics of the MLR models for Cluster-3 shown in Table 7 indicate that, although the accuracies of the derived models are not as good as those for the other two clusters, the performances are still far superior to those of the single-region case. Values of R2 in the range of 0.76–0.85 and RMSE values between 0.55 and 26.64 m3/s for the derived MLR models are indicative of good performance.

Table 10

Regression coefficients associated with different predictor variables for final MLR models (Equation 11) for nine flow quantiles considering all catchments in Cluster-3

Flow quantile  Constant  Regression coefficients for the predictor variables:
MAXe (km)  MINe (km)  ΔH (km)  ΔH/P  S  A (km2)  P (km)  L (km)  W (km)  Lp (km)  Dd (km/km2)  FF  SF  Rc  RL
Q10 +298.8 −184 +103.5 +78,998.7 +0.2 −3.6 −23.4 +1.1 +765.8 −869.3 
Q20 +35.7 −151.2 +101 +70,198.6 +0.1 +0.4 −1.6 −15.6 −14.9 +435.2 
Q30 +63.4 −118.2 +99.2 +53,307.6 +0.1 +0.2 −5.4 −0.8 +208.6 −184 
Q40 +46.5 −64.8 +54.8 +31,833.4 +0.02 +0.1 −0.8 −0.6 −110.2 
Q50 −6.6 −49.7 +37.9 +25,301 +0.01 +0.1 −1.2 −0.3 −5.8 
Q60 −24.6 −37.8 +28.1 +19,403.3 +0.006 +0.1 −1.1 −0.2 −5.2 +88.3 
Q70 −17.3 −25.9 +19.2 +13,390.9 +0.004 +0.1 −0.8 −0.1 −3.9 +62.5 
Q80 −4.1 −12.2 +9.4 +6,284.3 +0.004 +0.1 −0.7 −0.1 −1.7 +19.3 +53 −22.4 
Q90 −4.9 −4.6 +2.8 +535.8 +0.002 +0.03 −0.4 −0.04 −0.8 +32.9 

Jackknife cross-validation

The jackknife cross-validation process described in Article 3.4 was implemented to check the reliability of the flow quantile MLR models developed for the three clusters. The flow quantile estimate from the validation test was compared with the observed values and performance was evaluated in terms of R2 (Equation (12)), RMSE (Equation (13)), and PBIAS (Equation (14)).

The range in the values of each of these performance statistics in each cluster for each flow quantile is shown separately in Figure 5. Examination of R2 values reveals that in Cluster-1, high values are recorded for the median flow quantiles, reasonably good values for the high-flow quantiles, and low values for the low-flow quantiles. This implies that the MLR models developed for this cluster (Table 8) may be considered reliable for the prediction of medium- to high-flow quantiles in ungauged basins. Examination of results for individual stations revealed that the unsatisfactory performance for low-flow quantiles was due to overprediction of the Q70 and Q90 quantiles at six stations, namely Nanipalasan, Kudlur, Thoppur, Seedhi, Ambasamudram, and Salur. Conversely, underprediction of these quantiles was seen at four stations (Naguleru, Thengumarahada, Kashipatnam, and Pedagedadda). For Cluster-2, the reliability of the models in terms of R2 values is reasonably high for high-flow quantiles (except Q20) but poor for the low-flow quantiles (Figure 5). Performance was unsatisfactory due to the underprediction of low-flow quantiles at four stations, namely Yennehole (Q30, Q50, and Q70), Bantwal (Q10, Q30, and Q60), Addoor (Q60), and Haladi (Q60–Q90). Overprediction of low flows was observed at Yennehole (Q60), Addoor (Q30 and Q70), Kidangoor (Q10 and Q20), and Karathodu (Q10 and Q30). In Cluster-3, R2 values in the range 0.4–0.5 were recorded for the high-flow quantiles but were extremely low for all other flow quantiles. The poor performance of the models is due to overprediction at Talikot (Q60–Q90), Navalgund (all quantiles), Halia (all quantiles), KMVadi (Q10), Bendrahalli (all quantiles), Hogenakkal (all quantiles), T. Bekuppe (Q10–Q20), and Gandlapet (Q10–Q20). Conversely, underprediction was observed at Mahuwa (Q10–Q20), Kellodu (Q10–Q30), Balehonnur (all quantiles), E_Mangalam (Q10–Q50), Amabal (all quantiles), Cherribeda (Q10), and Sonarpal (all quantiles).
Figure 5

Jackknife performance statistics for flow duration MLR models in Cluster-1, -2, and -3.


Values of RMSE on the other hand indicate higher reliabilities in the jackknife validation. For instance, Figure 5 shows that in Cluster-1, low values of RMSE are evident for all flow quantiles except Q10 and Q20. For the MLR models in Cluster-2, RMSE values are high for high-flow quantiles, moderate for median flow quantiles, and low for low-flow quantiles. Low values of RMSE were recorded for all flow quantiles (except Q10) for Cluster-3. The PBIAS values shown in Figure 5 indicate that they were less than 50% for all flow quantiles in all three clusters except for Q60 in Cluster-2.

Overall, the jackknife cross-validation procedure provided a mixed response regarding the reliability of the MLR models for the regionalization process. The regression models developed for all three clusters performed very well for high-flow quantiles but were unsatisfactory in predicting the low-flow quantiles. Such weakening of the models for low-flow quantiles points to possible uncertainties introduced into the low-flow values on account of zero flows. Further, leaving out one station at a time during jackknife cross-validation might affect the model statistics if a station with highly influential attribute values (comparatively higher area, elevation, width, length, etc.) is left out during analysis. A study by Arsenault & Brissette (2016) indicated that the poor performance of regionalization is due to uncertainties in data measurement or incorrect selection of catchment attributes. Hence, it is essential to review the information for these stations for errors, uncertainties, and other discrepancies so that appropriate corrections can be applied.

The present study was taken up to implement hydrologic regionalization and develop simple regression models for flow quantiles in ungauged basins located in Southern Peninsular India. For this purpose, historical flow records of 50 largely unregulated catchments located in the region were used. Period-of-record FDCs were developed for each catchment, and nine flow quantiles (Q10–Q90) were extracted from each by interpolation. In addition, a database of 15 catchment attributes and average annual rainfall values was compiled from the DEMs of the catchments and gridded rainfall data. To enable effective regionalization, a hierarchical agglomerative cluster analysis was implemented using Ward's linkage method, and the study area was delineated into three homogeneous clusters. Cluster-1 had 17 catchments, Cluster-2 had 11, and Cluster-3 had 22. All three clusters were found to be homogeneous, without any discordant stations, as per the CV test and the discordancy measure based on L-moment ratios.
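The extraction of flow quantiles from a period-of-record FDC by interpolation can be sketched as below. This is an illustrative sketch only: the Weibull plotting position is an assumption (the plotting position used in the study is not restated in this section), and the function name and flow series are hypothetical.

```python
import numpy as np

def fdc_quantiles(flows, exceedance=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Return {exceedance %: flow} from a period-of-record flow series."""
    q = np.sort(np.asarray(flows, dtype=float))[::-1]   # largest flow first
    n = len(q)
    # Weibull plotting position (an assumption): p = rank / (n + 1), in percent
    p = 100.0 * np.arange(1, n + 1) / (n + 1)
    # np.interp needs increasing x values; p increases while q decreases
    values = np.interp(exceedance, p, q)
    return dict(zip(exceedance, values.tolist()))

# Synthetic record of 99 flows (1, 2, ..., 99), for illustration only
flows = np.arange(1, 100)
quantiles = fdc_quantiles(flows)
# For this series Q10 = 90, Q50 = 50, and Q90 = 10
print(quantiles[10], quantiles[50], quantiles[90])
```

Here Q10 is the flow exceeded 10% of the time (a high flow) and Q90 is the flow exceeded 90% of the time (a low flow), matching the usage in the text.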

As the next step in regionalization, MLR models relating each flow quantile (response variable) to the catchment attributes (predictor variables) were developed. A step-wise regression technique was used to arrive at the final forms of the MLR models containing only the most significant predictor variables. Initially, MLR models were developed considering all 50 catchments to be within a single region and subsequently for each of the three clusters by considering catchments located within them. Performances of the developed MLR models were evaluated using the coefficient of determination (R2), RMSE, and percentage bias (PBIAS) statistics. Models developed for the clusters performed quite well, with average R2 values for the nine flow quantiles of 0.85 for Cluster-1, 0.97 for Cluster-2, and 0.80 for Cluster-3. In contrast, treating all 50 catchments as a single group gave unsatisfactory predictions of the flow quantiles, with an average R2 of 0.23. These results demonstrate the critical need to delineate catchments into homogeneous groups, and hierarchical cluster analysis proved to be an efficient technique for doing so.
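The idea of retaining only the most useful predictors can be illustrated with a simple forward-selection sketch. Note that this is a stand-in, not the step-wise procedure used in the study: it adds predictors greedily based on the gain in adjusted R2, whereas the entry and removal criteria of the original analysis are not given in this section. The function names, the `min_gain` threshold, and the synthetic data are assumptions.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R2 of an OLS fit of y on the columns of X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(np.sum((y - A @ coef) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_select(X, y, min_gain=0.01):
    """Greedily add the predictor that most improves adjusted R2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    # Cap the model size so that n - k - 1 stays positive
    while remaining and len(selected) < len(y) - 2:
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:           # stop when the gain is negligible
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic cluster: 17 stations, 4 hypothetical attributes, 2 of them relevant
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(17, 4))
y = 3.0 + 2.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0.0, 0.05, size=17)
selected, best = forward_select(X, y)
print(selected, best)
```

With small clusters (11 to 22 catchments here), a selection step of this kind keeps the number of fitted coefficients well below the number of stations, which matters for the stability of the cross-validated predictions.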

A jackknife cross-validation technique, adopted to check the reliability of the MLR models, revealed a mixed response across flow quantiles. In all three clusters, performance was very good to satisfactory for high-flow quantiles but unsatisfactory for low-flow quantiles.

Overall, the results of this study demonstrate that hierarchical cluster analysis, combined with largely unregulated historical flow records for a large number of catchments and a variety of catchment attributes, can yield models for predicting FDCs in ungauged catchments that are very accurate and reasonably reliable.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Abdolhay A., Saghafian B., Soom M. A. & Ghazali A. H. 2012 Identification of homogenous regions in Gorganrood Basin (Iran) for the purpose of regionalization. Natural Hazards 61 (3), 1427–1442.
Arsenault R. & Brissette F. 2016 Analysis of continuous streamflow regionalization methods within a virtual setting. Hydrological Sciences Journal 61 (15), 2680–2693.
Bao Z., Zhang J., Liu J., Fu G., Wang G., He R., Yan X., Jin J. & Liu H. 2012 Comparison of regionalization approaches based on regression and similarity for predictions in ungauged catchments under multiple hydro-climatic conditions. Journal of Hydrology 466–467, 37–46.
Blöschl G. & Sivapalan M. 1995 Scale issues in hydrological modelling: A review. Hydrological Processes 9 (3–4), 251–290.
Boscarello L., Ravazzani G., Cislaghi A. & Mancini M. 2016 Regionalization of flow-duration curves through catchment classification with streamflow signatures and physiographic–climate indices. Journal of Hydrologic Engineering 21 (3), 05015027.
Burn D. H., Zrinji Z. & Kowalchuk M. 1997 Regionalization of catchments for regional flood frequency analysis. Journal of Hydrologic Engineering 2 (2), 76–82.
Castellarin A., Galeati G., Brandimarte L., Montanari A. & Brath A. 2004a Regional flow-duration curves: Reliability for ungauged basins. Advances in Water Resources 27 (10), 953–965.
Castellarin A., Vogel R. M. & Brath A. 2004b A stochastic index flow model of flow duration curves. Water Resources Research 40 (3), W03104.
Chouaib W., Alila Y. & Caldwell P. V. 2019 On the use of mean monthly runoff to predict the flow–duration curve in ungauged catchments. Hydrological Sciences Journal 64 (13), 1573–1587.
Demirel M. C. 2004 Cluster analysis of streamflow data over Turkey. MSc thesis, ITU, Istanbul, Turkey.
Domínguez E., Dawson C. W., Ramírez A. & Abrahart R. J. 2010 The search for orthogonal hydrological modelling metrics: A case study of 20 monitoring stations in Colombia. Journal of Hydroinformatics 13 (3), 429–442.
Fennessey N. & Vogel R. M. 1990 Regional flow-duration curves for ungauged sites in Massachusetts. Journal of Water Resources Planning and Management 116 (4), 530–549.
Gaviria C. & Carvajal-Serna F. 2022 Regionalization of flow duration curves in Colombia. Hydrology Research 53 (8), 1075–1089.
He Y., Bárdossy A. & Zehe E. 2011 A review of regionalisation for continuous streamflow simulation. Hydrology and Earth System Sciences 15 (11), 3539–3553.
Hosking J. R. 1990 L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B (Methodological) 52 (1), 105–124.
Hosking J. R. & Wallis J. R. 1993 Some statistics useful in regional frequency analysis. Water Resources Research 29 (2), 271–281.
Huang C., Wang G., Zheng X., Yu J. & Xu X. 2015 Simple linear modeling approach for linking hydrological model parameters to the physical features of a river basin. Water Resources Management 29 (9), 3265–3289.
Isik S. & Singh V. P. 2008 Hydrologic regionalization of watersheds in Turkey. Journal of Hydrologic Engineering 13 (9), 824–834.
Javadinejad S. 2021 A review on homogeneity across hydrological regions. Resources Environment and Information Engineering 3 (1), 124–137.
Karki N., Shakya N. M., Pandey V. P., Devkota L. P., Pradhan A. M. & Lamichhane S. 2023 Comparative performance of regionalization methods for model parameterization in ungauged Himalayan watersheds. Journal of Hydrology: Regional Studies 47, 101359.
Li M., Shao Q., Zhang L. & Chiew F. H. S. 2010 A new regionalization approach and its application to predict flow duration curve in ungauged basins. Journal of Hydrology 389 (1–2), 137–145.
Li Q., Li Z., Zhu Y., Deng Y., Zhang K. & Yao C. 2018 Hydrological regionalisation based on available hydrological information for runoff prediction at catchment scale. Proceedings of the International Association of Hydrological Sciences 379, 13–19.
Ma L., Liu D., Huang Q., Guo F., Zheng X., Zhao J., Luan J., Fan J. & Ming G. 2023 Identification of a function to fit the flow duration curve and parameterization of a semi-arid region in North China. Atmosphere 14 (1), 116.
McCuen R. H. 2005 Accuracy assessment of peak discharge models. Journal of Hydrologic Engineering 10 (1), 16–22.
Moriasi D. N., Gitau M. W., Pai N. & Daggupati P. 2015 Hydrologic and water quality models: Performance measures and evaluation criteria. Transactions of the ASABE 58 (6), 1763–1785.
Mosley M. P. 1981 Delimitation of New Zealand hydrologic regions. Journal of Hydrology 49 (1–2), 173–192.
Mulaomerović-Šeta A., Blagojević B., Mihailović V. & Petroselli A. 2023 A silhouette-width-induced hierarchical clustering for defining flood estimation regions. Hydrology 10 (6), 126.
Nathan R. J. & McMahon T. A. 1990 Identification of homogeneous regions for the purposes of regionalisation. Journal of Hydrology 121 (1–4), 217–238.
Nobert J., Ndayizeye J. & Mkhandi S. 2011 Regional flow duration curve estimation and its application in assessing low flow characteristics for ungauged catchment. A case study of Rwegura Catchment-Burundi. Nile Basin Water Science & Engineering Journal 4 (1), 13–23.
Panthi J., Talchabhadel R., Ghimire G. R., Sharma S., Dahal P., Baniya R., Boving T., Pradhanang S. M. & Parajuli B. 2021 Hydrologic regionalization under data scarcity: Implications for streamflow prediction. Journal of Hydrologic Engineering 26 (9), 05021022.
Parajka J., Merz R. & Blöschl G. 2005 A comparison of regionalisation methods for catchment model parameters. Hydrology and Earth System Sciences 9 (3), 157–171.
Petrakis R. E., Norman L. M., Vaughn K., Pritzlaff R., Weaver C., Rader A. & Pulliam H. R. 2021 Hierarchical clustering for paired watershed experiments: Case study in southeastern Arizona, U.S.A. Water 13 (21), 2955.
Qamar M. U., Ganora D., Claps P., Azmat M., Shahid M. A. & Khushnood R. A. 2016 Flow duration curve regionalization with enhanced selection of donor basins. Journal of Applied Water Engineering and Research 6 (1), 70–84.
Quimpo R. G., Alejandrino A. A. & McNally T. A. 1983 Regionalized flow duration for Philippines. Journal of Water Resources Planning and Management 109 (4), 320–330.
Rao A. R. & Srinivas V. V. 2006 Regionalization by hybrid cluster analysis. Journal of Hydrology 318 (1), 37–56.
Riswandi H., Sukiyah E. & Tania D. 2022 Hydrogeological cluster analysis with average linkage and ward method in the southern slope of Merapi Mountain. In: Proceedings of the 5th International Conference of Geological Engineering Faculty, Universitas Padjadjaran, Indonesia. TIIKM Publishing.
Shaban M., Urban B., El Saadi A. & Faisal M. 2010 Detection and mapping of water pollution variation in the Nile delta using multivariate clustering and GIS techniques. Journal of Environmental Management 91 (8), 1785–1793.
Shao J. & Tu D. 1995 The Jackknife and Bootstrap. Springer Series in Statistics, Springer Inc., New York.
Shu C. & Ouarda T. B. 2012 Improved methods for daily streamflow estimates at ungauged sites. Water Resources Research 48 (2), W02523.
Silva R., Blanco C. J. & Pessoa F. C. 2019 Alternative for the regionalization of flow duration curves. Journal of Applied Water Engineering and Research 7 (3), 198–206.
Sivapalan M., Takeuchi K., Franks S. W., Gupta V. K., Karambiri H., Lakshmi V., Liang X., McDonnell J. J., Mendiondo E. M., O'Connell P. E., Oki T., Pomeroy J. W., Schertzer D., Uhlenbrook S. & Zehe E. 2003 IAHS decade on predictions in ungauged basins (PUB), 2003–2012: Shaping an exciting future for the Hydrological Sciences. Hydrological Sciences Journal 48 (6), 857–880.
Song Z., Xia J., Wang G., She D., Hu C. & Hong S. 2022 Regionalization of hydrological model parameters using gradient boosting machine. Hydrology and Earth System Sciences 26 (2), 505–524.
Sugiyama H., Vudhivanich V., Whitaker A. C. & Lorsirirat K. 2003 Stochastic flow duration curves for evaluation of flow regimes in rivers. Journal of the American Water Resources Association 39 (1), 47–58.
Swain J. B. & Patra K. C. 2017 Streamflow estimation in ungauged catchments using regional flow duration curve: Comparative study. Journal of Hydrologic Engineering 22 (7), 04017010.
Tasker G. D. 1982 Comparing methods of hydrologic regionalization. Journal of the American Water Resources Association 18 (6), 965–970.
Vogel R. M. & Fennessey N. M. 1994 Flow-duration curves. I: New interpretation and confidence intervals. Journal of Water Resources Planning and Management 120 (4), 485–504.
Vogel R. M., Wilson I. & Daly C. 1999 Regional regression models of annual streamflow for the United States. Journal of Irrigation and Drainage Engineering 125 (3), 148–157.
Yang X., Magnusson J., Rizzi J. & Xu C. Y. 2017 Runoff prediction in ungauged catchments in Norway: Comparison of regionalization approaches. Hydrology Research 49 (2), 487–505.
Yu P. S. & Yang T. C. 1996 Synthetic regional flow duration curve for southern Taiwan. Hydrological Processes 10 (3), 373–391.
Yu P. S., Yang T. C. & Wang Y. C. 2002 Uncertainty analysis of regional flow duration curves. Journal of Water Resources Planning and Management 128 (6), 424–430.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).