## Abstract

The present study on hydrologic regionalization was taken up to evaluate the utility of hierarchical cluster analysis for the delineation of hydrologically homogeneous regions and of multiple linear regression (MLR) models for information transfer to derive flow duration curves (FDCs) in ungauged basins. For this purpose, 50 catchments with largely unregulated flows located in South India were identified and a dataset of historical streamflow records and 16 catchment attributes was created. Using selected catchment attributes, three hydrologically homogeneous regions were delineated using a hierarchical agglomerative cluster approach, and nine flow quantiles (10–90%) were derived for each of the catchments in the respective clusters. A regionalization approach was then adopted, whereby, using step-wise regression, flow quantiles were related to readily derived physical basin characteristics through MLR models. Cluster-wise performance analysis of the developed models indicated excellent performance, with average coefficient of determination (*R*^{2}) values of 0.85, 0.97, and 0.8 for Cluster-1, -2, and -3, respectively, in comparison to poor performance when all 50 stations were considered to be in a single region. However, jackknife cross-validation showed mixed performance with regard to the reliability of the developed models, with performance being good for high-flow quantiles and poor for low-flow quantiles.

## HIGHLIGHTS

Hierarchical cluster analysis was used to delineate 50 unregulated catchments into homogenous groups.

Nine flow quantiles of the flow duration curve were extracted for each catchment and related to significant catchment attributes through multiple linear regression models.

Accuracies of models developed for each cluster were good but jackknife cross-validation showed fairly high reliability for only high-flow quantiles.

## INTRODUCTION

Time series information related to river flow is essential for water resources management studies, such as assessment of water availability for domestic supply and irrigation, forecasting of floods and droughts, assessment of ecosystem health, and analysis and design of water resources projects (Vogel *et al.* 1999; Masih *et al.* 2010; Karki *et al.* 2023). However, streamflow information is often either unavailable or insufficient in terms of quality or quantity, resulting in many catchments being classified as ungauged. Despite substantial progress in hydrological research, many developing countries continue to struggle with insufficient hydrometric data as well as with land-use change and climate change impacts, which affect water availability and degrade ecosystems. Such issues are difficult to overcome when estimating flows from an ungauged or inadequately gauged basin (Sivapalan *et al.* 2003), resulting in improper planning and management of water resources not only at the ungauged site but also at the river basin level (Masih *et al.* 2010). Thus, the prediction of flows in ungauged basins is of practical significance for water resources planning and management and has been recognized as a critical research topic by the international hydrologic community (Sivapalan *et al.* 2003; Qamar *et al.* 2016; Guo *et al.* 2020; Karki *et al.* 2023).

The flow duration curve (FDC) of a catchment provides a concise, yet complete description of the runoff regime and therefore its prediction in ungauged basins is considered important (Boscarello *et al.* 2016; Ma *et al.* 2023; Yang *et al.* 2023). The FDC represents the relationship between stream discharge and the percentage of time (duration) (*D*) that this discharge (*Q*_{D}) was equaled or exceeded in the period of record. It has wide applications in the field of water resources assessment and management which include the estimation of the abstractable volume of water from rivers for domestic, irrigation, and hydropower projects, evaluation of low-flow statistics to maintain the water-quality standards, flood frequency analysis, wetland inundation mapping, reservoir and lake sedimentation studies, and instream flow assessment studies (Fennessey & Vogel 1990; Vogel & Fennessey 1994; Yu *et al.* 2002; Qamar *et al.* 2016; Silva *et al.* 2019; Gaviria & Carvajal 2022; Yang *et al.* 2023). FDC is one of the most commonly adopted techniques for the prediction of flows through regionalization (Boscarello *et al.* 2016; Qamar *et al.* 2016). It is for this reason that researchers have devoted significant efforts to the prediction of FDC in ungauged basins using the hydrologic regionalization approach. In this approach, certain characteristics of the observed FDC derived from historical flow records in gauged basins are transferred to ungauged basins located within hydrologically homogeneous regions (Panthi *et al.* 2021). Information transfer is achieved by establishing relationships between the flow quantiles extracted from observed FDC and selected catchment characteristics for the gauged basins. The developed relationships are then used to derive flow quantiles/FDC for the ungauged basins using data pertaining to their catchment characteristics.

For example, Yu & Yang (1996) assessed regional FDC for Southern Taiwan using multivariate statistical analysis of flow data from 34 sites. Castellarin *et al.* (2004a, 2004b) developed regional FDC for 51 unregulated river basins in Italy using catchment and morphologic characteristics. Mohamoud (2008) developed a regression model for different exceedance probabilities of flows in more than 40 climatic and landscape regions of the Northeastern US. A regionalization study by Shu & Ouarda (2012) indicated that an FDC-based method outperformed the area-ratio method at 109 stations in Quebec Province, Canada. Qamar *et al.* (2016) presented a nonparametric regionalization procedure for assessing FDC for 124 catchments of Northwestern Italy. Realizing the wide hydrological application of FDC, Chouaib *et al.* (2019) predicted the daily FDC through regionalization in ungauged basins using the hydroclimatic data of 73 catchments in the eastern USA. Similarly, the regionalization of FDC was demonstrated by Panthi *et al.* (2021) for predicting streamflow values for the data-scarce region of the central Himalayas. Considering FDC as a crucial indicator of river basins, Ma *et al.* (2023) attempted to identify the best-fit function using regression analysis for developing FDC in a semi-arid region of North China. Karki *et al.* (2023) evaluated the uses and limitations of the regionalization method for developing FDC in 23 medium- to small-sized watersheds across Nepal. Among the various regionalization techniques, one of the most widely used is the regression technique, which builds a multivariate regression between streamflow and catchment attributes and has the advantage of evaluating each model parameter independently (Parajka *et al.* 2005; Yang *et al.* 2017).

While not all studies consider the delineation of hydrologically homogeneous regions for regionalization, research (e.g., Yu & Yang 1996; Burgan & Aksoy 2020) has demonstrated that doing so can increase the accuracy of information transfer, especially if substantial spatial variability exists in the hydrologic or physiographic features of the catchments (Isik & Singh 2008). In the past few decades, different methods for the delineation of homogeneous regions using a variety of similarity measures have been proposed (Tasker 1982; Rao & Srinivas 2006; Nobert *et al.* 2011; Latt *et al.* 2014; Boscarello *et al.* 2016; Li *et al.* 2018; Javadinejad 2021; Song *et al.* 2022), among which multivariate cluster analysis has proved to be the most efficient. For example, Yu & Yang (1996) defined homogeneous regions using cluster analysis for developing FDC for 34 stream gauging stations in Southern Taiwan. Gaviria & Carvajal (2022) studied streamflow information from 655 gauging stations in Colombia and delineated 15 homogeneous regions using geological, topographic, and climatic information as clustering variables and the K-means algorithm for grouping. An agglomerative hierarchical clustering algorithm was used by Burn *et al.* (1997) to define homogeneous regions for regional flood frequency analysis in the Saskatchewan–Nelson River basin in west-central Canada. A hierarchical clustering approach was adopted by Boscarello *et al.* (2016) for classifying 46 catchments in the Upper Po River basin in northwest Italy into three homogeneous groups that were further used to estimate FDCs using the regionalization technique. Petrakis *et al.* (2021) used a hierarchical clustering approach to classify sub-basins of Smith Canyon Watershed, USA based on 12 environmental variables related to structural, biophysical, and hydrologic traits. 
Owing to the simplicity and widespread use of the hierarchical clustering method, Mulaomerović-Šeta *et al.* (2023) adopted hierarchical clustering for grouping the basins and subsequently predicted the flood quantiles in ungauged basins of the West Balkans using the regionalization concept. Similarly, various other studies on identifying homogeneous regions using cluster analysis have been carried out by Mosley (1981), Shaban *et al.* (2010), Goyal & Gupta (2014), Abdolhay *et al.* (2012), Latt *et al.* (2014), Li *et al.* (2018), and Riswandi *et al.* (2022).

The review of the literature revealed that, with regard to the characteristics of the observed FDC to be reproduced in the ungauged basin, two broad approaches have been used. In the ‘parametric’ approach, a function (empirical or probabilistic) is fitted to the observed FDCs and the optimized parameters of the function are predicted for the ungauged basin using the transfer function. In the ‘point’ approach, on the other hand, flow quantiles (*Q*_{D}) corresponding to specific values of duration (*D*) (for example, *Q*_{10}, *Q*_{20}, …, *Q*_{90}) are extracted from the observed FDCs and predicted for the ungauged basin using the transfer function. Also, few previous studies seem to have explicitly checked whether or not recorded streamflows in the selected gauged catchments are influenced by upstream diversions or regulation due to the presence of dams/reservoirs. This may prove to be a critical issue since the effect of such anthropogenic modifications on the natural runoff regime of the gauged catchments may be transferred to the ungauged basin in the process of regionalization. Therefore, it is imperative to ensure that the selected gauged catchments possess unregulated flows if the hydrological predictions in the ungauged basins are to represent natural conditions. Moreover, despite the fact that a large number of new water resources projects are being planned in India in general and in South India in particular, few previous studies seem to have been taken up to develop tools to predict streamflows and FDCs in ungauged catchments in this region.

Therefore, the present study was taken up with the specific objective of developing appropriate models for the prediction of FDCs in ungauged basins located in the South Indian peninsular region. For this purpose, 50 gauged catchments in the region that did not have any major water resources project upstream of the gauging station, and therefore represented largely unregulated flows, were selected. For each of these catchments, available historical records of observed daily flows were compiled along with various catchment characteristics. Using frequency analysis, period-of-record observed FDCs were derived and a set of nine flow quantiles (*Q*_{10}–*Q*_{90}) was extracted from them for each catchment. Step-wise regression was used to develop multiple linear regression (MLR) equations relating each flow quantile to catchment characteristics, initially by considering the entire study area to represent a single homogeneous region and subsequently by delineating the catchments into separate homogeneous regions using hierarchical cluster analysis. Finally, the accuracy of the developed MLR equations was assessed using a leave-one-out jackknife cross-validation procedure. Complete details of the study area, data used, methodology adopted, and results obtained are presented in subsequent sections of this paper.

## STUDY AREA AND DATA

### Study area

River basin name | No. of stations | Station name |
---|---|---|
West flowing rivers | 18 | Santeguli, Avershe, Yennehole, Addoor, Bantwal, Erinjipuzha, Kidangoor, Kalloopara, Thumpaman, Ayilam, Kuniyil, Karathodu, Kalampur, Mahuwa, Haladi, Nanipalasan, Ozerkheda, Pulamanthole |
Krishna | 08 | Kellodu, Talikot, Navalgund, Balehonnur, Khanapur, Marol, Halia, Naguleru @Dachepalli |
Cauvery | 11 | Sakleshpura, K M Vadi, E_Mangalam, Bendrahalli, Hogenakkal, Kudlur, Thevur, Thoppur, Nellithurai, Thengumarahada, T. Bekuppe |
East flowing rivers | 05 | Kashipatnam, Seedhi, Ambasamudram, Salur, Gunupur |
Godavari | 08 | Pedagedadda, Ramakona, Wairagarh, Amabal, Tumnar, Cherribeda, Gandlapet, Sonarpal |

The areas of the delineated catchments upstream of the gauging stations varied from a minimum of 171 km^{2} to a maximum of 6,930 km^{2}, with six catchments having areas in excess of 3,000 km^{2}. The topography of the study area consists of hilly terrain in the West and a flat plateau toward the East, with the average elevation varying from 251 to 1,404 m. The mean annual rainfall of the selected catchments ranged from a minimum of 27 mm to a maximum of 12,068 mm. The major part of river flow occurs during the wet monsoon months from July to August, and the flow is negligible in most of the rivers during the other months. The average daily temperature in the identified catchments ranges between 21 and 31 °C.

### Discharge data

Daily discharge data for each of the identified stream gauge stations were extracted from the India Water Resources Information System (WRIS) portal maintained by the Central Water Commission (CWC), Government of India. Owing to inconsistencies in the available data, record lengths differed between stations: the longest discharge record selected for the study spanned 1991 to 2018, and the shortest spanned 2008 to 2018.

### Catchment attributes

All the identified catchments were delineated using 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data obtained from the US Geological Survey's Earth Resources Observation and Science (EROS) Center. The DEM was used to compute the following catchment attributes for the 50 basins: maximum elevation ‘MAX_{e}’ (km), minimum elevation ‘MIN_{e}’ (km), relief ‘Δ*H*’ (km), relative relief ‘Δ*H*/*P*’, slope ‘*S*’, catchment area ‘*A*’ (km^{2}), basin perimeter ‘*P*’ (km), length of the basin ‘*L*’ (km), basin width ‘*W*’ (km), longest flow path ‘*L*_{p}’ (km), drainage density ‘*D*_{d}’ (km/km^{2}), form factor ‘FF’, shape factor ‘SF’, circularity ratio ‘*R*_{c}’, and elongation ratio ‘*R*_{L}’.

Daily rainfall data for 50 grid points were obtained from the India Meteorological Department (IMD) 0.25° × 0.25° gridded rainfall product for the period 1981–2018. Using grid points lying within and close to each catchment, areal rainfall values were derived using the Thiessen polygon method, and average annual rainfall values ‘Rain’ (mm) for this historical period were obtained for each catchment and included with the other attributes.
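As a minimal sketch of the Thiessen polygon step, the areal rainfall of a catchment can be computed as the area-weighted mean of the rainfall at the contributing grid points. The function name, the three grid-point values, and the polygon areas below are hypothetical and serve only to illustrate the weighting.

```python
def thiessen_areal_rainfall(grid_rain, polygon_area):
    """Area-weighted mean rainfall: sum(w_k * P_k), with w_k = A_k / sum(A)."""
    if len(grid_rain) != len(polygon_area):
        raise ValueError("one polygon area is needed per grid point")
    total_area = float(sum(polygon_area))
    return sum(p * a / total_area for p, a in zip(grid_rain, polygon_area))


# Hypothetical example: three grid points influencing one catchment.
annual_rain_mm = [1200.0, 1500.0, 900.0]   # mean annual rainfall at each grid point
area_km2 = [50.0, 30.0, 20.0]              # catchment area inside each Thiessen polygon
areal_mm = thiessen_areal_rainfall(annual_rain_mm, area_km2)
print(f"Areal mean annual rainfall: {areal_mm:.1f} mm")
```

Grid points lying outside the catchment still contribute when part of their polygon overlaps the catchment, which is why the weights use the overlapping area rather than the full polygon area.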

These catchment attributes were used in the cluster analysis and also in the development of regression models for the FDCs (described later).

## METHODOLOGY

### Flow duration curve

FDC is one of the common tools used in hydrological studies that provides concise information about the river flow variability in the study basin (Quimpo *et al.* 1983; Yu *et al.* 2002; Castellarin *et al.* 2004a, 2004b; Boscarello *et al.* 2016; Burgan & Aksoy 2020). An FDC is a graphical representation of the magnitude of streamflow versus the percentage of time that a particular streamflow is equaled or exceeded over a period of time. An FDC is said to be the complement of the cumulative distribution function of daily streamflow. The FDC for each of the gauge stations can be developed using a standard nonparametric approach that involves counting the number of occurrences of historical flows falling within class intervals of descending flow magnitudes *q*_{i}, with *i* = 1, 2, …, *n*, and subsequent calculation of the percent exceedance probability using an appropriate plotting position formula (Fennessey & Vogel 1990; Vogel & Fennessey 1994; Sugiyama *et al.* 2003; Castellarin *et al.* 2004a, 2004b; Isik & Singh 2008; Li *et al.* 2010; Shu & Ouarda 2012).

In the plotting position formula, *p*_{i} is the probability of the discharge being greater than or equal to a specified value *q*_{i} of ordered streamflows, ‘*m*’ is the number of counts of the flow values falling within the specified interval, and ‘*n*’ is the number of events on record. From this analysis, nine flow quantiles related to their corresponding exceedance probability values, viz. 10, 20, 30, 40, 50, 60, 70, 80, and 90%, were obtained by interpolation for each of the identified gauge stations (Shu & Ouarda 2012).
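The procedure can be illustrated with a short sketch. It assumes the Weibull plotting position, *p* = 100·*m*/(*n* + 1), which is a common choice but is not confirmed by the text, and it ranks individual daily flows rather than counting flows in class intervals. The function names and the synthetic record are hypothetical.

```python
def flow_duration_curve(daily_flows):
    """Return (exceedance_percent, flows) with flows sorted in descending order.

    Assumption: Weibull plotting position, p = 100 * m / (n + 1), m = rank.
    """
    flows = sorted(daily_flows, reverse=True)
    n = len(flows)
    exceedance = [100.0 * m / (n + 1) for m in range(1, n + 1)]
    return exceedance, flows


def flow_quantile(exceedance, flows, duration):
    """Linearly interpolate the flow at a given exceedance percentage."""
    if duration <= exceedance[0]:
        return flows[0]
    if duration >= exceedance[-1]:
        return flows[-1]
    for k in range(1, len(exceedance)):
        if exceedance[k] >= duration:
            x0, x1 = exceedance[k - 1], exceedance[k]
            y0, y1 = flows[k - 1], flows[k]
            return y0 + (y1 - y0) * (duration - x0) / (x1 - x0)


# Synthetic record of 99 daily flows (1, 2, ..., 99 m3/s), for illustration only.
record = [float(q) for q in range(1, 100)]
exceedance, ordered = flow_duration_curve(record)
quantiles = {d: flow_quantile(exceedance, ordered, d) for d in range(10, 100, 10)}
for d, q in quantiles.items():
    print(f"Q{d} = {q:.1f}")
```

Because the flows are sorted in descending order, *Q*_{10} is a high flow and *Q*_{90} is a low flow, which matches the high-flow and low-flow terminology used for the quantiles in this study.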

### Cluster analysis and homogeneity test

Clustering is a multivariate technique used to identify hydrologically homogeneous regions from hydrological or catchment characteristics. Prior to regionalization, homogeneous groups of catchments are created by grouping the catchments into clusters according to the similarity of their attributes (Yu & Yang 1996; Rao & Srinivas 2006; Isik & Singh 2008). Clustering techniques are generally classified into hierarchical and flat clustering. The usefulness of hybrid cluster analysis (a combination of hierarchical and flat clustering) in regionalization was demonstrated by Rao & Srinivas (2006) for watersheds in Indiana, USA. Hierarchical clustering is preferred over flat clustering when the number of clusters is unknown. Hierarchical clustering works either by merging smaller clusters into bigger ones, known as the agglomerative technique, or by dividing bigger clusters into smaller ones, known as the divisive technique (Rao & Srinivas 2006; Javadinejad 2021). The result is typically displayed as a tree-like figure known as a ‘dendrogram’, which shows how the clusters are organized. As per Demirel (2004), the user must decide the number of clusters to be formed, as the dendrogram itself does not provide the cluster assignments. Since the length of the dendrogram's limbs denotes the proximity of points, data can be clustered by cutting the dendrogram at a desired level (Isik & Singh 2008; Boscarello *et al.* 2016).

Previous research has used hierarchical clustering techniques, such as single linkage, complete linkage, centroid, average distance, and Ward's minimum variance technique for hydrologic regionalization (Li *et al.* 2018). For example, Tasker (1982) adopted a complete linkage algorithm to regionalize watersheds in Arizona. Comparison of different algorithms, including single, complete, and average linkage, centroid, median, and Ward's method, was demonstrated by Nathan & McMahon (1990) using the Statistical Package for the Social Sciences (SPSS) tool. Similarly, Burn *et al.* (1997) used an agglomerative hierarchical clustering algorithm to regionalize watersheds in Canada. Ward's method outperformed the other methods in terms of separation so that clusters are relatively dense with low variability within groups (Boscarello *et al.* 2016). Hence Ward's method was adopted in the present study to group the stations into clusters using hierarchical agglomerative algorithms.

In the distance measure used for clustering, *D*_{P,Q} is the distance between two stations *P* and *Q*; *X*_{Pi} and *X*_{Qi} are the values of the *i*th attribute at stations *P* and *Q*, respectively; and *j* is the total number of selected attributes.
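For reference, a Euclidean distance consistent with these symbol definitions can be written as below. This is a reconstruction under the assumption of the plain (unsquared) Euclidean form; the results section mentions a squared Euclidean measure, in which case the square root would be dropped.

```latex
D_{P,Q} = \sqrt{\sum_{i=1}^{j} \left( X_{Pi} - X_{Qi} \right)^{2}}
```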

#### Identification of clustering variables

Clustering attributes/variables have a strong influence on the results. Hence appropriate cluster attributes/variables should be identified before grouping the homogenous catchments (Yu & Yang 1996). The cluster analysis can categorize groups based on discharge data, topographical and meteorological characteristics (Rao & Srinivas 2006; Boscarello *et al.* 2016). Further, it is sensible to include those attributes that are not highly correlated with each other (Boscarello *et al.* 2016). In this context, cluster analysis of 50 catchments in the present study was carried out using the catchment attributes described (Article 2.3). The cluster analysis for the study was carried out for the identified clustering attributes in the SPSS statistical package tool using Euclidean distance as a similarity measure and Ward's technique for linkages.
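The clustering step can be sketched as follows. The study used SPSS; this illustration swaps in SciPy's `linkage` and `fcluster` functions, and the attribute matrix is synthetic (12 hypothetical catchments with three attributes), so it shows the mechanics of Ward's method on standardized attributes rather than the study's actual configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic attribute matrix: 12 hypothetical catchments x 3 attributes,
# drawn around three well-separated centres (four catchments each).
centres = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [10.0, 0.0, 5.0]])
attributes = np.vstack([c + 0.3 * rng.standard_normal((4, 3)) for c in centres])

# Standardize each attribute (z-score) so that units do not dominate the distance.
z = (attributes - attributes.mean(axis=0)) / attributes.std(axis=0, ddof=1)

# Agglomerative clustering with Ward's minimum variance linkage.
tree = linkage(z, method="ward", metric="euclidean")

# Cut the dendrogram so that exactly three clusters are formed.
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```

Cutting with `criterion="maxclust"` corresponds to choosing the number of clusters in advance; `scipy.cluster.hierarchy.dendrogram(tree)` can be used to inspect the tree before deciding on that number.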

#### Homogeneity test and discordance measure

… *et al.* (2011), the homogeneity within the catchment groups derived from cluster analysis is assessed using the coefficient of variation (CV) test in this study. This test involves the calculation of the mean, standard deviation, and CV of daily rainfall (time series data used for cluster analysis) at each station of the study area. The regional average coefficient of variation (CV_{Avg}) and the standard deviation of CV (*σ*_{CV}) of the river flow information are then computed, where CV_{i} is the coefficient of variation at the *i*th station and *N* is the number of stations used in the study. A region is regarded as homogeneous if the homogeneity measure (CC) defined by Equation (5) is less than or equal to 0.3.

Further, the discordance measure aims to recognize discordant catchment stations within the group that need to be adjusted or excluded to improve the homogeneity of the group. To do so, the discordance method proposed by Hosking & Wallis (1993) was adopted in this study. The discordance measure utilizes the advantages offered by the sampling properties of L-moment ratios. L-moments are expectations of certain linear combinations of order statistics that are more robust than conventional moments to outliers, suffer less from the effects of sample variability, and help to obtain valuable inferences from small samples about an underlying probability distribution (Hosking 1990). This study used the R-Studio software package to determine the first four L-moments (*L*_{1}, *L*_{2}, *L*_{3}, and *L*_{4}) for each identified station. The L-moment ratios are defined as *τ* = *L*_{2}/*L*_{1}, *τ*_{3} = *L*_{3}/*L*_{2}, and *τ*_{4} = *L*_{4}/*L*_{2}, where *τ* is the measure of scale and dispersion, and *τ*_{3} and *τ*_{4} are measures of skewness and kurtosis, respectively.

Higher values of *D*_{i} indicate greater discordance, and the station with the highest value is the most discordant in the group. As per Hosking & Wallis (1993), the stations identified as discordant should be examined thoroughly, as discordancy may result from sampling variability or changes in the attribute values due to localized extreme events. Irrespective of the discordance value, the statistical parameters of a station need to be compared with those of the other stations within the group before the station is declared discordant.
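The homogeneity and discordance checks described in this subsection can be sketched as below. Several points are assumptions rather than the study's confirmed formulas: CC is taken as *σ*_{CV}/CV_{Avg} with the sample (*N* − 1) standard deviation, the L-moments are estimated from unbiased probability weighted moments, and *D*_{i} uses the form (*N*/3)(*u*_{i} − *ū*)ᵀ*A*⁻¹(*u*_{i} − *ū*) with *u*_{i} = (*τ*, *τ*_{3}, *τ*_{4}). The study used R-Studio; NumPy and synthetic series are used here purely for illustration.

```python
import numpy as np


def cv_homogeneity(cv_values):
    """Homogeneity measure CC = sigma_CV / CV_avg (assumed form, N-1 denominator)."""
    cv = np.asarray(cv_values, dtype=float)
    return cv.std(ddof=1) / cv.mean()


def lmoment_ratios(x):
    """Sample L-CV, L-skewness, and L-kurtosis from unbiased PWMs."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return np.array([l2 / l1, l3 / l2, l4 / l2])


def discordancy(ratios):
    """Discordancy D_i for each station from an (N x 3) array of L-moment ratios."""
    u = np.asarray(ratios, dtype=float)
    n = u.shape[0]
    dev = u - u.mean(axis=0)
    a_inv = np.linalg.inv(dev.T @ dev)        # inverse of sum of cross-products
    return (n / 3.0) * np.einsum("ij,jk,ik->i", dev, a_inv, dev)


# Synthetic region: 8 hypothetical stations, 200 values each.
rng = np.random.default_rng(42)
series = [rng.gamma(shape=2.0, scale=10.0, size=200) for _ in range(8)]
ratios = np.array([lmoment_ratios(s) for s in series])
d = discordancy(ratios)
cc = cv_homogeneity([s.std(ddof=1) / s.mean() for s in series])
print("CC =", round(float(cc), 3))
print("D_i =", np.round(d, 2))
```

With this form of the statistic, the *D*_{i} values of a group always average to 1, so a station stands out only when its value is large relative to that baseline.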

### Cluster-wise regionalization through MLR technique

Once the homogeneous regions are delineated using cluster analysis, the regionalization concept is applied in each cluster. Regionalization is one of the commonly used techniques for the analysis of flow characteristics in ungauged catchments by utilizing the information from one or more gauged stations located within the same hydrologically homogeneous region (Blöschl & Sivapalan 1995; Sivapalan *et al.* 2003; Li *et al.* 2010; Bao *et al.* 2012; Yang *et al.* 2017; Guo *et al.* 2020). Regionalization can be carried out by different techniques, viz. regression analysis, area-index, nearest neighbor method, and hydrological similarity method. Among these, MLR analysis is one of the earliest and most widely used techniques globally for regionalization (Li *et al.* 2010; Bao *et al.* 2012; Swain & Patra 2017). This technique aims to develop a relationship between identified catchment characteristics and streamflow information corresponding to gauged stations through an MLR equation. Various researchers have adopted such multivariate regression analysis for analyzing ungauged catchments. For example, Bao *et al.* (2012) compared the regionalization approaches based on regression and similarity methods in 55 catchments of China. Vogel *et al.* (1999) developed a regional regression model relating the hydrologic, geomorphic, and climatic characteristics of a large number of catchments across the United States. Cluster-wise regionalization analysis was investigated by Li *et al.* (2018) for 15 catchments located in the Yangtze and Yellow River basins of China. An attempt was made by Huang *et al.* (2015) to integrate the regression concept of regionalization with clustering analysis, and it turned out to be effective in the Yalong River Basin, China. 
Compared to other methods, the regression approach has more advantages that include integration of catchment and stream flow characteristics in Geographical Information System (GIS) processing, analysis of climate change impacts on water yields, and most importantly, this concept can be used to quantify the mean and variance of the stream flow for any catchment in the region (Vogel *et al.* 1999).

Using the data available, both the parametric and point approaches for the regionalization of the FDCs were implemented. However, preliminary results for the parametric approach (not shown here for brevity) indicated poorer performance in comparison to the point approach, and therefore the latter approach was adopted.

Relationships relating the flow quantiles (*Q*_{10}, *Q*_{20}, …, *Q*_{90}) extracted from the observed FDCs to the identified catchment attributes of the gauged catchments were established. In the general form of the MLR equation used (Equation (11)), *Q*(*D*) is the flow quantile of specific percentage duration *D* (10, 20, …, 90%) for each of the gauged catchments; *X*_{1}, *X*_{2}, …, *X*_{n} are the selected catchment characteristics for the gauged catchments; and the regression coefficients are obtained through the least squares criterion. The optimal regression coefficients obtained from Equation (11) for the identified gauged stations were then utilized to estimate *Q*(*D*) values for the ungauged stations by substituting their catchment attributes (Yu *et al.* 2002; Shu & Ouarda 2012; Nruthya & Srinivas 2015; Silva *et al.* 2019). As per He *et al.* (2011), the performance of the MLR approach mainly depends on the appropriate choice of attributes.
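A minimal sketch of the information transfer step is given below. It assumes a linear form with an intercept, *Q*(*D*) = *β*_{0} + *β*_{1}*X*_{1} + … + *β*_{n}*X*_{n}, where the symbols *β*_{0}, …, *β*_{n} are notation assumed here. The attribute values, the two chosen attributes, and the helper names are hypothetical, and NumPy's least squares solver stands in for the SPSS procedure used in the study.

```python
import numpy as np


def fit_mlr(attributes, quantile):
    """Least squares coefficients [b0, b1, ..., bn] for Q(D) = b0 + sum(b_k * X_k)."""
    x = np.asarray(attributes, dtype=float)
    q = np.asarray(quantile, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, q, rcond=None)
    return beta


def predict_mlr(beta, attributes):
    """Apply fitted coefficients to the attributes of one or more catchments."""
    x = np.atleast_2d(np.asarray(attributes, dtype=float))
    return beta[0] + x @ beta[1:]


# Hypothetical gauged catchments: columns are area (km2) and rainfall (m).
gauged_x = np.array([[200.0, 1.0], [500.0, 2.5], [1000.0, 1.2],
                     [3000.0, 0.8], [1500.0, 3.0]])
# Synthetic Q50 values generated from known coefficients, for illustration only.
gauged_q50 = 2.0 + 0.01 * gauged_x[:, 0] + 3.0 * gauged_x[:, 1]

beta = fit_mlr(gauged_x, gauged_q50)
ungauged_q50 = predict_mlr(beta, [800.0, 1.8])   # hypothetical ungauged catchment
print("coefficients:", np.round(beta, 4))
print("predicted Q50:", np.round(ungauged_q50, 2))
```

One such model would be fitted per flow quantile and per cluster, so that a full FDC for an ungauged catchment is assembled from nine separate predictions.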

For the regionalization approach, catchment attributes were selected and used in the multiple regression analysis of the 50 unregulated stations. The correlation matrix of the catchment attributes (Article 2.3) was studied to reduce multi-collinearity problems (Mohamoud 2008). The correlation matrix analysis, coupled with the step-wise regression procedure, facilitated the identification of the most influential and non-redundant catchment attributes for explaining the observed variability in the flow quantiles (Nathan & McMahon 1990; Vogel *et al.* 1999; Yu *et al.* 2002). The backward step-wise regression analysis tool available in the SPSS statistical package was used in the present study.

### Jackknife cross-validation

Jackknife cross-validation is a commonly used validation technique for evaluating the uncertainties between the developed model and input data (Efron 1981; Shao & Tu 1995) and is especially useful with small data sets. In a comparative study between jackknife and split-sampling methods, McCuen (2005) found that the jackknife test was less sensitive to the sample size variation and provided better model prediction accuracy than the split-sampling technique. Literature has also indicated that the model precision obtained using the jackknife technique is independent of calibration data (McCuen 2005; Shu & Ouarda 2012).

In the jackknife cross-validation technique as applied to the regionalization of FDCs, one gauge station is assumed to be ungauged and the information from the remaining (*n* − 1) gauge stations is utilized to develop the regression model associated with the specified flow quantile. The catchment attributes of the withheld station are then used in the developed regression to synthesize the flow quantile in the assumed ungauged catchment. The station withheld in the first jackknife run is then replaced, and the next gauged station is assumed to be ungauged for the second run. This procedure is continued until all the gauge stations have been utilized to make predictions (McCuen 2005; Shu & Ouarda 2012; Nruthya & Srinivas 2015). The model's reliability is then tested by comparing the jackknife estimates (model predictions) with the observed values. Such cross-validation helps to establish the reliability of the regional regression model (Castellarin *et al.* 2004a).
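The leave-one-out loop can be sketched as follows, assuming a linear regression with an intercept that is refitted on the remaining stations in each run. The six stations and their attribute values are hypothetical, and the attribute set is held fixed across runs, which may differ from a workflow in which step-wise selection is repeated each time.

```python
import numpy as np


def jackknife_predictions(attributes, quantile):
    """Leave-one-out predictions: refit on n-1 stations, predict the withheld one."""
    x = np.asarray(attributes, dtype=float)
    q = np.asarray(quantile, dtype=float)
    n = len(q)
    design = np.column_stack([np.ones(n), x])    # intercept + attributes
    predictions = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # mask out the withheld station
        beta, *_ = np.linalg.lstsq(design[keep], q[keep], rcond=None)
        predictions[i] = design[i] @ beta        # treat station i as ungauged
    return predictions


# Hypothetical stations: columns are area (km2) and rainfall (m).
station_x = np.array([[200.0, 1.0], [500.0, 2.5], [1000.0, 1.2],
                      [3000.0, 0.8], [1500.0, 3.0], [800.0, 1.8]])
# Synthetic, noise-free flow quantile so that the expected result is known.
station_q = 2.0 + 0.01 * station_x[:, 0] + 3.0 * station_x[:, 1]

jackknife_q = jackknife_predictions(station_x, station_q)
print(np.round(jackknife_q, 3))
```

With real data the jackknife estimates differ from the observations, and that difference is what the performance measures in the next subsection summarize.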

### Performance evaluation

The model efficiency in predicting the flow quantile was determined by comparing the model results with the observed flow values corresponding to the station under investigation. The following three efficiency measures were used to assess the performance of the MLR models:

In the above equations, ‘*N*’ denotes the number of selected catchments, ‘*O*’ and ‘*P*’ are the observed and predicted flow values, and $\bar{O}$ and $\bar{P}$ indicate the average values of the observed and predicted flow rates. RMSE indicates the magnitude of a typical error; for an ideal model, the RMSE value should be zero. The coefficient of determination (*R*^{2}) gives the proportion of the variation in the observations that is explained by the model, with a value of 1 signifying ideal model performance (Domínguez *et al.* 2010; Shu & Ouarda 2012). Further, the model's tendency to consistently underpredict or overpredict the observed values is measured using percent bias (PBIAS). Model performance is acceptable if the PBIAS is within 100% (Moriasi *et al.* 2015; Burgan & Aksoy 2020).
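The three measures can be sketched as below under commonly used definitions, which are assumptions here: RMSE as the root of the mean squared error, *R*^{2} as the squared Pearson correlation between observed and predicted values, and PBIAS as 100·Σ(*O* − *P*)/Σ*O*, so that positive values indicate underprediction. The sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted flows."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted flows."""
    o_bar = sum(observed) / len(observed)
    p_bar = sum(predicted) / len(predicted)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def pbias(observed, predicted):
    """Percent bias; positive means the model underpredicts on average."""
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)


# Hypothetical observed and predicted flow quantiles (m3/s).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
print("RMSE  =", round(rmse(obs, pred), 3))
print("R2    =", round(r_squared(obs, pred), 3))
print("PBIAS =", round(pbias(obs, pred), 3))
```

In this example the over- and underpredictions cancel, so PBIAS is zero even though RMSE is not, which is why the measures are read together rather than individually.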

## RESULTS AND DISCUSSION

### Flow duration curve

As mentioned previously, a total of nine flow quantile values, viz 10, 20, 30, 40, 50, 60, 70, 80, and 90% were obtained by interpolation for each of the identified gauge stations.

The extracted flow quantiles were utilized for regression analysis and subsequent generation of FDCs for ungauged stations.

### Cluster analysis

Cluster analysis was carried out to delineate hydrologically homogeneous regions in the study area. A hierarchical agglomerative cluster algorithm, which merges smaller cluster groups into bigger ones, was adopted for clustering the 50 stations using Ward's minimum variance technique with a squared Euclidean similarity measure.

In order to reduce bias, it is crucial to choose a limited set of variables for cluster analysis. The cluster variables shown in Table 2 were identified based on a literature review and analysis of the correlation matrix of the 16 catchment attributes as shown in Table 3. As mentioned earlier, the clustering of 50 basins was carried out with the SPSS software tool. Each of the nine cluster variables was standardized to give equal importance and avoid the issues related to the usage of different measuring units.

Attribute | Units | Abbreviation |
---|---|---|
Maximum elevation | km | MAX_{e} |
Minimum elevation | km | MIN_{e} |
Slope | – | S |
Basin area | km^{2} | A |
Shape factor | – | SF |
Circularity ratio | – | R_{c} |
Elongation ratio | – | R_{L} |
Drainage density | km/km^{2} | DD |
Rainfall | m | Rain |


Variables | MAX_{e} (km) | MIN_{e} (km) | ΔH (km) | ΔH/P | S | A (km^{2}) | P (km) | L (km) | W (km) | L_{p} (km) | FF | SF | R_{c} | R_{L} | DD (km/km^{2}) | Rain (m) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MAX_{e} (km) | 1.00 | |||||||||||||||
MIN_{e} (km) | −0.07 | 1.00 | ||||||||||||||
ΔH (km) | 0.90 | −0.49 | 1.00 | |||||||||||||
ΔH/P | 0.52 | −0.52 | 0.68 | 1.00 | ||||||||||||
S | 0.59 | −0.52 | 0.74 | 0.96 | 1.00 | |||||||||||
A (km^{2}) | −0.01 | 0.21 | −0.10 | −0.58 | −0.54 | 1.00 | ||||||||||
P (km) | −0.02 | 0.26 | −0.13 | −0.67 | −0.63 | 0.95 | 1.00 | |||||||||
L (km) | −0.03 | 0.24 | −0.13 | −0.62 | −0.65 | 0.82 | 0.91 | 1.00 | ||||||||
W (km) | 0.05 | 0.20 | −0.04 | −0.55 | −0.41 | 0.89 | 0.81 | 0.56 | 1.00 | |||||||
L_{p} (km) | −0.03 | 0.19 | −0.11 | −0.61 | −0.61 | 0.81 | 0.89 | 0.96 | 0.60 | 1.00 | ||||||
FF | 0.00 | −0.05 | 0.02 | −0.04 | 0.18 | 0.09 | −0.03 | −0.33 | 0.48 | −0.20 | 1.00 | |||||
SF | −0.06 | −0.04 | −0.03 | 0.01 | −0.19 | −0.06 | 0.09 | 0.42 | −0.45 | 0.32 | −0.87 | 1.00 | ||||
R_{c} | 0.01 | −0.17 | 0.08 | 0.37 | 0.44 | −0.33 | −0.53 | −0.58 | −0.08 | −0.49 | 0.58 | −0.59 | 1.00 | |||
R_{L} | 0.01 | 0.25 | −0.10 | −0.19 | −0.06 | 0.31 | 0.19 | −0.12 | 0.56 | −0.20 | 0.62 | −0.72 | 0.31 | 1.00 | ||
DD (km/km^{2}) | 0.01 | −0.31 | 0.15 | 0.13 | 0.10 | −0.14 | −0.01 | 0.10 | −0.27 | 0.05 | −0.28 | 0.40 | −0.29 | −0.23 | 1.00 | |
Rain (m) | −0.01 | −0.48 | 0.20 | 0.37 | 0.41 | −0.37 | −0.42 | −0.45 | −0.26 | −0.39 | 0.21 | −0.18 | 0.28 | −0.07 | 0.31 | 1.00 |


The dendrogram chart obtained from cluster analysis displayed the distribution of stations into different groups arranged based on the hierarchical agglomerative concept. Three distinctive clusters were identified for regionalization by utilizing the proximity points along the dendrogram limb. The details of the stations associated with derived clusters are provided in Table 4, and their region is depicted in Figure 1.

| Cluster number | No. of gauge stations | Station name | Associated river basin |
|---|---|---|---|
| Cluster 1 | 17 | Thumpaman, Ayilam, Pulamanthole, Kuniyil, Nanipalasan, Ozerkheda | West-Flowing |
| | | Naguleru | Krishna |
| | | Nellithurai, Thengumarahada, Thoppur, Thevur, Kudlur | Cauvery |
| | | Ambasamudram, Kashipatnam, Salur, Seedhi | East-Flowing |
| | | Pedagedadda | Godavari |
| Cluster 2 | 11 | Karathodu, Kalampur, Kalloopara, Avershe, Erinjipuzha, Yennehole, Addoor, Santeguli, Kidangoor, Haladi, Bantwal | West-Flowing |
| Cluster 3 | 22 | Mahuwa | West-Flowing |
| | | Balehonnur, Halia, Khanapur, Kellodu, Navalgund, Talikot, Marol | Krishna |
| | | KMVadi, Bendrahalli, Sakleshpura, Hogenakkal, E_Mangalam, T. Bekuppe | Cauvery |
| | | Gunupur | East-Flowing |
| | | Ramakona, Amabal, Wairagarh, Tumnar, Sonarpal, Gandlapet, Cherribeda | Godavari |


The first cluster contains 17 stations, with the majority of them spread across the West-flowing, Cauvery, and East-flowing river basins. The catchment areas of the stations in this cluster vary from a maximum of 1,998 km^{2} to a minimum of 181 km^{2}, with average elevation varying from 1,710 to 154 m. The mean annual rainfall varies from a maximum of 2,153 mm to a minimum of 805 mm. All the stations in the second cluster are located within the West-flowing river basin, with catchment areas varying between 3,204 and 276 km^{2}. Most of the identified stations on West-flowing rivers are bounded by the Western Ghats mountains, with average elevation ranging from 1,302 m to 8 m at the coast. The mean annual rainfall varies from a maximum of 4,029 mm to a minimum of 2,280 mm. Cluster-3 is the largest, containing 22 stations, most of which are located in the Krishna, Cauvery, and Godavari basins. The majority of the stations in this cluster have larger catchment areas than those in the other clusters (from a maximum of 6,930 km^{2} to a minimum of 601 km^{2}). The average elevations in this cluster vary from 1,220 to 450 m, and mean annual rainfall ranges from a maximum of 2,831 mm to a minimum of 564 mm.

Further, the study utilized the CV test and discordance measure using L-moments to check the credibility of the cluster formations. The homogeneity measure (CC) computed from CV (Equation (5)) was evaluated considering all 50 stations as a single homogeneous region and also separately for each of the 3 regions delineated using cluster analysis. Results of this analysis are shown in Table 5 from which it is revealed that considering a single region fails the homogeneity requirement since the CC value exceeds 0.3. On the other hand, all three delineated clusters yield CC values less than 0.3 and hence they may be considered to be hydrologically homogeneous.
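A CV-based homogeneity check of this kind can be sketched as follows. Equation (5) is not reproduced in this section, so the form used here is an assumption: the coefficient of variation (CV) is computed for each station's flow series, and CC is taken as the standard deviation of the station CVs divided by their mean. The function name and the choice of input series are likewise hypothetical.

```python
import statistics


def homogeneity_cc(site_flows):
    """CC = (std. dev. of site CVs) / (mean of site CVs); assumed form of the CV test."""
    # Coefficient of variation of each station's flow series
    cvs = [statistics.stdev(flows) / statistics.mean(flows) for flows in site_flows]
    return statistics.stdev(cvs) / statistics.mean(cvs)


def is_homogeneous(site_flows, threshold=0.3):
    """Apply the CC < 0.3 criterion quoted in the text."""
    return homogeneity_cc(site_flows) < threshold
```

Under this form, a region whose stations all share the same CV gives CC = 0, and the value grows as the station CVs diverge, which matches the direction of the criterion applied in Table 5.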

Regions | No. of stations | Homogeneity measure (CC) | Region type | Test criteria |
---|---|---|---|---|
All stations | 50 | 0.622 | Non-homogenous | The region is declared homogenous if CC is less than 0.3 (Nobert et al. 2011) |
Cluster-1 | 17 | 0.193 | Homogenous | |
Cluster-2 | 11 | 0.067 | Homogenous | |
Cluster-3 | 22 | 0.134 | Homogenous | |


The discordance measure (*D*_{i}) was calculated cluster-wise using Equations (8)–(10). Figure 4 presents the results of the discordance measure test executed for all stations in a cluster-wise manner. The test reveals that no discordancy was observed in the Cluster-2 and Cluster-3 stations, as most of the *D*_{i} values are less than 3 (Hosking & Wallis 1993). However, one station located at Nellithurai in Cluster-1 has a *D*_{i} value slightly more than 3, making it discordant from the others within the region. As suggested by Hosking & Wallis (1993), the discordant station was reverified with respect to the variation of its statistical parameters, such as L-CV, L-skewness, and L-kurtosis, and its catchment-climatic characteristics relative to the other stations within the cluster group. It was evident from this cross-verification exercise that the discordant station has statistical and catchment-climatic behavior similar to that of the remaining stations within the same group. Hence, it is not worthwhile to shift this station to another region, as it would behave dissimilarly to the sites of other clusters and might affect the regionalization process. Based on these observations, Nellithurai station was retained in Cluster-1 for further analysis.
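For illustration, the discordancy measure of Hosking & Wallis can be computed as below. The sketch assumes that each station is described by its sample L-moment ratios (L-CV, L-skewness, L-kurtosis) and that these have already been estimated; the function name is hypothetical, and the study's Equations (8)–(10) remain the reference for the exact form used.

```python
import numpy as np


def discordancy(lmom_ratios):
    """Discordancy D_i per site from rows of (L-CV, L-skewness, L-kurtosis)."""
    u = np.asarray(lmom_ratios, dtype=float)
    n = u.shape[0]
    dev = u - u.mean(axis=0)       # deviation of each site from the group average
    A = dev.T @ dev                # 3x3 matrix of sums of squares and cross-products
    A_inv = np.linalg.inv(A)
    # D_i = (N / 3) * dev_i^T A^{-1} dev_i, evaluated for every site at once
    return (n / 3.0) * np.einsum("ij,jk,ik->i", dev, A_inv, dev)
```

A useful sanity check is that the *D*_{i} values of a group always sum to the number of sites in it, so a single large value necessarily pulls the others down.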

### Regionalization

The regionalization process in this study was carried out for individual cluster groups separately, resulting in cluster-wise regression relationships in the form of Equation (11). The step-wise regression method (in the SPSS statistical tool) was adopted to develop separate MLR models for each of the nine flow quantiles (*Q*_{10}, *Q*_{20}, …, *Q*_{90}) as response variables, with catchment attributes as predictor variables (Equation (11)). From among the catchment attributes (Article 2.3), only the 15 physiographic attributes, i.e., all except the hydroclimatic variable (Rain), were considered as potential predictor variables, and backward step-wise regression analysis was carried out to identify the most significant catchment attributes. To compare the model results, the regression analysis was first executed by considering all 50 catchments to constitute a single region and subsequently by considering the catchments in each of the 3 delineated homogeneous clusters.

#### Single region analysis

In this analysis, all 50 catchments were considered as one group for step-wise regression, with the observed flow quantile values in each catchment as the response variable and the 15 catchment physiographic attributes as potential predictor variables in Equation (11). The resulting final MLR models (Equation (11)) for each of the nine flow quantiles, expressed in terms of the intercept term (*ψ*_{0}) and regression coefficients (*ψ*_{1}, *ψ*_{2}, …, *ψ*_{N}) for the most significant predictor variables as determined through step-wise regression, are listed in Table 6. From the results shown therein, certain inferences, albeit in a statistical sense only, can be drawn regarding the influence of catchment attributes on the flow quantiles and thereby on the shape of the FDCs. For instance, it can be seen that the maximum elevation (MAX_{e}) is a significant predictor variable for almost all the flow quantiles and seems to have a major influence on the shape of the FDC. On the other hand, while MIN_{e} and circularity ratio (*R*_{c}) have an effect only on the upper half of the FDC, several other attributes such as relative relief (Δ*H*/*P*), catchment area (*A*), basin width (*W*), and form factor (FF) appear to influence only the lower part of the FDC. The longest flow path (*L*_{p}) and elongation ratio (*R*_{L}) influence some parts of the upper and lower portions of the FDC, whereas basin perimeter (*P*) affects only the median flow quantiles. Slope (*S*) is significant only in the case of the two largest flow quantiles and drainage density (*D*_{d}) for only the lowest flow quantile.
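To show how a fitted model of this form is applied, the sketch below evaluates the single-region *Q*_{50} model using the coefficients in the *Q*_{50} row of Table 6. The dictionary keys, the function name, and the attribute values of the example catchment are all hypothetical and serve only to illustrate the arithmetic of Equation (11).

```python
# Q50 row of Table 6 (single-region model); attribute units follow the table headers
Q50_MODEL = {
    "const": 9.7,
    "MAX_e": 9.6,          # maximum elevation (km)
    "MIN_e": -15.4,        # minimum elevation (km)
    "dH_over_P": -1530.2,  # relative relief
    "P": 0.03,             # basin perimeter (km)
    "L": -0.2,             # basin length (km)
}


def predict_quantile(model, attributes):
    """Intercept plus the sum of coefficient * attribute over the retained predictors."""
    return model["const"] + sum(
        coef * attributes[name] for name, coef in model.items() if name != "const"
    )


# Hypothetical ungauged catchment, invented for illustration only
example = {"MAX_e": 1.2, "MIN_e": 0.3, "dH_over_P": 0.005, "P": 150.0, "L": 50.0}
q50 = predict_quantile(Q50_MODEL, example)  # flow estimate in m^3/s
```

Repeating the calculation for all nine quantile models yields the nine points from which the FDC of the ungauged catchment is drawn.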

Flow quantile | Constant | MAX_{e} (km) | MIN_{e} (km) | ΔH (km) | ΔH/P | S | A (km^{2}) | P (km) | L (km) | W (km) | L_{p} (km) | D_{d} (km/km^{2}) | FF | SF | R_{c} | R_{L} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q_{10} | −33.3 | +93.9 | −293.4 | = | = | −4,078.2 | = | = | −8.5 | = | +6.3 | = | = | = | −909.7 | +688.7 |
Q_{20} | −19.6 | +61 | −158.4 | = | = | −2,235 | = | = | −4.2 | = | +3 | = | = | = | −474.4 | +349.4 |
Q_{30} | −27.4 | +18.3 | −68 | = | = | = | = | = | −2.1 | = | +1.7 | = | = | = | −358.5 | +209.7 |
Q_{40} | +23.8 | +23.7 | −49 | = | −4,206.1 | = | = | +0.1 | −0.5 | = | = | = | = | = | = | = |
Q_{50} | +9.7 | +9.6 | −15.4 | = | −1,530.2 | = | = | +0.03 | −0.2 | = | = | = | = | = | = | = |
Q_{60} | +29.5 | +6.7 | = | = | −1,295.5 | = | +0.01 | = | = | −0.7 | −0.2 | = | +32.9 | = | = | −49.1 |
Q_{70} | +21.6 | +4.8 | = | = | −1,085.8 | = | +0.01 | = | = | −0.7 | −0.1 | = | +29.3 | = | = | −35 |
Q_{80} | +12.5 | +2.9 | = | = | −658.1 | = | +0.005 | = | = | −0.5 | −0.1 | = | +21.7 | = | = | −21.3 |
Q_{90} | +0.4 | = | = | = | = | = | +0.002 | = | = | −0.3 | −0.7 | +11.1 | = | = | = |

Columns MAX_{e} to R_{L} list the regression coefficients for the predictor variables.


The performances of the optimal MLR models were evaluated using the performance statistics described in Article 3.5, i.e., *R*^{2}, RMSE, and PBIAS. However, it was found that all the MLR models yielded negligible values of PBIAS, and accordingly only the values of *R*^{2} and RMSE are shown in Table 7. The performances of the MLR models developed considering all 50 catchments to be in a single region, when evaluated in terms of *R*^{2}, were quite poor for all the flow quantiles (*R*^{2} between 0.16 and 0.31), with the performances being slightly better for the high-flow quantiles than for the low-flow quantiles. RMSE values, which depend on the magnitude of flows, ranged between 1.90 and 133.80 m^{3}/s, with the lower values being associated with low-flow quantiles and vice versa.

MLR for flow quantile | Single region R^{2} | Single region RMSE (m^{3}/s) | Cluster-1 R^{2} | Cluster-1 RMSE (m^{3}/s) | Cluster-2 R^{2} | Cluster-2 RMSE (m^{3}/s) | Cluster-3 R^{2} | Cluster-3 RMSE (m^{3}/s) |
---|---|---|---|---|---|---|---|---|
Q_{10} | 0.31 | 133.80 | 0.86 | 32.73 | 0.98 | 9.09 | 0.83 | 26.64 |
Q_{20} | 0.29 | 74.73 | 0.86 | 18.39 | 0.98 | 6.59 | 0.85 | 14.23 |
Q_{30} | 0.26 | 43.50 | 0.98 | 4.74 | 0.98 | 4.73 | 0.83 | 9.64 |
Q_{40} | 0.27 | 22.31 | 0.98 | 1.61 | 0.98 | 2.28 | 0.76 | 7.59 |
Q_{50} | 0.20 | 9.88 | 0.98 | 0.43 | 0.98 | 0.47 | 0.77 | 5.15 |
Q_{60} | 0.18 | 5.95 | 0.92 | 0.80 | 0.96 | 1.39 | 0.79 | 3.53 |
Q_{70} | 0.19 | 4.23 | 0.79 | 0.67 | 0.96 | 1.20 | 0.79 | 2.44 |
Q_{80} | 0.22 | 2.66 | 0.56 | 0.79 | 0.96 | 0.84 | 0.81 | 1.13 |
Q_{90} | 0.16 | 1.90 | 0.72 | 0.48 | 0.96 | 0.71 | 0.77 | 0.55 |
Average | 0.23 | 33.22 | 0.85 | 6.74 | 0.97 | 3.03 | 0.80 | 7.88 |


#### Cluster-wise analysis

The catchment attributes and the flow quantiles related to the 17 catchment stations of Cluster-1 (Table 4) were used to carry out step-wise regression analysis. The forms of the resulting final MLR models for Cluster-1 are listed in Table 8. From these results, it is immediately apparent that, unlike in the previous case of considering a single region (Table 6), all the catchment attributes except MAX_{e} have an influence on some or all flow quantiles. The attribute relief (Δ*H*) is a significant predictor for all flow quantiles, and catchment length (*L*) is significant for all except one flow quantile (Table 8). Attributes *S*, *A*, *W*, and FF appear to have a significant effect on only the low-flow quantiles. It is also interesting to note that a larger number of predictor variables are involved in the models for medium-flow quantiles than in those for the high-flow quantiles, and that the fewest predictors are required for predicting the low-flow quantiles. From a hydrological perspective, the flow characteristics in the majority of catchments within Cluster-1 primarily rely on geometric aspects, such as *S*, *L*, *A*, and *W*. Additionally, the relief factors, particularly MAX_{e}, *S*, and Δ*H*, as well as the areal aspect FF, play a significant role in shaping these flow patterns. Notably, within Cluster-1, a high relief value (Δ*H*), which indicates the overall steepness of the terrain, was observed in the majority of catchments. Furthermore, FF, which signifies the intensity of flow, exhibited higher values at six specific stations (Kuniyil, Ozerkheda, Kudlur, Nellithurai, Thengumarahada, and Seedhi). The shapes of these catchments were more rounded/circular, leading to concentrated flows. Conversely, FF values at the remaining stations indicated an elongated catchment behavior. The performance statistics for the final MLR models for Cluster-1 are listed in Table 7. Results indicate that grouping catchments into homogeneous regions leads to significant improvement in the performances of the developed MLR models in comparison to using a single region. High values of *R*^{2} were obtained for all flow quantiles, and more so for the *Q*_{30}, *Q*_{40}, and *Q*_{50} quantiles, indicating excellent predictive capabilities of the developed MLR models. Also, significantly lower values of RMSE were obtained across all flow quantiles (Table 7).

Flow quantile | Constant | MAX_{e} (km) | MIN_{e} (km) | ΔH (km) | ΔH/P | S | A (km^{2}) | P (km) | L (km) | W (km) | L_{p} (km) | D_{d} (km/km^{2}) | FF | SF | R_{c} | R_{L} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q_{10} | +480 | = | −252.4 | +226.5 | = | −9,177.3 | +0.5 | = | −8.8 | −40.8 | = | = | +1,006.9 | = | −730.1 | = |
Q_{20} | +280.4 | = | −138.3 | +137.3 | = | −5,439.4 | +0.3 | = | −5.2 | −24.3 | = | = | +592.7 | = | −445.4 | = |
Q_{30} | +15.6 | = | −44.5 | +65.2 | +28,452.1 | −9,378.4 | +0.1 | +1.9 | −7.6 | −18.9 | = | = | +466.8 | = | = | = |
Q_{40} | −16.5 | = | −17.2 | +33.4 | +17,715.1 | −5,535.7 | +0.1 | +1.2 | −4.4 | −11.7 | = | +8.3 | +280.3 | = | = | +29.3 |
Q_{50} | −35.3 | = | −11.7 | +19.2 | +6,133.9 | −2,103.7 | +0.03 | +0.4 | −2.2 | −5.9 | +0.6 | +4.5 | +105.9 | = | = | +93.3 |
Q_{60} | −24.2 | = | −9.9 | +11.1 | = | −302.9 | +0.01 | = | −0.8 | −2.3 | +0.6 | = | +21.1 | = | −20.5 | +89.1 |
Q_{70} | −38.8 | = | −5.6 | +3 | = | = | = | = | −0.7 | −0.9 | +0.6 | −3.7 | = | +1.8 | −21.7 | +87.1 |
Q_{80} | −0.28 | = | = | +1.69 | = | = | = | = | = | = | = | −1.95 | = | = | = | = |
Q_{90} | +1.4 | = | = | +3.2 | = | −87.1 | = | −0.03 | −0.1 | = | +0.1 | −1.2 | = | = | −21.1 | +11.6 |

Columns MAX_{e} to R_{L} list the regression coefficients for the predictor variables.


The final forms of the MLR models derived using step-wise regression for catchments in Cluster-2 are listed in Table 9. In this case, it can be seen that 4 (MIN_{e}, Δ*H*, *S*, and SF) out of the potential 15 catchment attributes considered as predictors are significant in the models of all the flow quantiles. Attributes FF, *R*_{c}, and *R*_{L} too are significant for all but a few flow quantiles. The catchment area (*A*) and length of the basin (*L*) are significant for the high-flow and low-flow quantiles, respectively. Overall, the number of significant predictors for this cluster is smaller than that for Cluster-1 but, unlike in the earlier case, the models do not seem to be more parsimonious for low flows. Hydrologically, the flows in the majority of catchments within Cluster-2 rely on the geometric aspect *L*, the relief aspects MIN_{e}, *S*, and Δ*H*, and the areal aspects FF, SF, *R*_{c}, and *R*_{L}. Similar to the preceding cluster, most catchments in this cluster exhibit a moderate relief value (Δ*H*). Moreover, four stations (Santeguli, Yennehole, Bantwal, and Haladi) display moderate values of form factor and elongation ratio and lower values of shape factor and circularity ratio, indicating that these catchments have moderately elongated shapes and are associated with a low-flow response. Among all the cases considered, grouping catchments into Cluster-2 yielded the most accurate MLR models for all flow quantiles, as indicated by the results of performance analysis shown in Table 7. Extremely high values of *R*^{2} (0.96–0.98) and extremely low values of RMSE (0.47–9.09 m^{3}/s) were recorded for the derived MLR models (Table 7).

Flow quantile | Constant | MAX_{e} (km) | MIN_{e} (km) | ΔH (km) | ΔH/P | S | A (km^{2}) | P (km) | L (km) | W (km) | L_{p} (km) | D_{d} (km/km^{2}) | FF | SF | R_{c} | R_{L} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q_{10} | −1,008.4 | = | −4,213.5 | −121.9 | = | +9,197.1 | +0.4 | = | = | = | = | = | +596.6 | +115.5 | = | +543.6 |
Q_{20} | −418.6 | = | −2,858 | −87.6 | = | +5,007.3 | +0.2 | = | = | = | = | = | +160.4 | +50.3 | = | +298.4 |
Q_{30} | −186.4 | = | −2,693.4 | −82.2 | = | +3,836.7 | +0.1 | = | = | = | = | = | +49.8 | +28.7 | +270 | = |
Q_{40} | −93.3 | = | −2,267.1 | −134 | = | +4,306.9 | +0.04 | = | +2.6 | = | = | = | = | +7.4 | +86.3 | = |
Q_{50} | −110.2 | = | −2,471.5 | −150.6 | = | +5,353.6 | = | = | +3.5 | = | = | = | +50 | +7.9 | +108.4 | −96.6 |
Q_{60} | −183.9 | = | −2,502.8 | −156.9 | = | +6,238.9 | −0.02 | = | +4.4 | = | = | = | +111.3 | +10.5 | = | −68.8 |
Q_{70} | −143.4 | = | −2,253.5 | −107.4 | = | +4,768.8 | = | = | +2.4 | = | = | = | +93.1 | +14.6 | +187.5 | −145 |
Q_{80} | −117.3 | = | −1,806.9 | −87.5 | = | +3,963.4 | = | = | +1.9 | = | = | = | +76.63 | +11.8 | +146.8 | −122.1 |
Q_{90} | −93.3 | = | −1,444.9 | −70 | = | +3,164 | = | = | +1.5 | = | = | = | +59.1 | +9.3 | +115.5 | −95.1 |

Columns MAX_{e} to R_{L} list the regression coefficients for the predictor variables.


The derived MLR models for Cluster-3 catchments are listed in Table 10. It is evident that in this case all 15 potential catchment attributes are involved in one or more models for the 9 flow quantiles. In particular, attributes MAX_{e}, MIN_{e}, *A*, and *W* turn out to be significant predictors for all flow quantiles, and attributes Δ*H*/*P*, *P*, and *L*_{p} are important in all but one of the flow quantile models (Table 10). Among the remaining attributes, *L* and FF are significant for the high-flow quantiles and DD and *R*_{c} for the low-flow quantiles. In terms of hydrology, the flow patterns in most catchments within Cluster-3 are mainly influenced by geometric aspects, including *A*, *P*, *W*, and *L*_{p}. Furthermore, the relief factors, particularly MAX_{e}, MIN_{e}, and Δ*H*/*P*, along with the areal aspect DD, play a crucial role in shaping these flow patterns. Notably, Cluster-3 exhibits relatively lower values of relative relief and drainage density, indicating elongated catchments characterized by highly permeable soils and a coarse drainage texture. Performance statistics of the MLR models for Cluster-3 shown in Table 7 indicate that although the accuracies of the derived models are not as good as those for the other two clusters, the performances are still far superior to those for the single-region case. Values of *R*^{2} in the range of 0.76–0.85 and RMSE values between 0.55 and 26.64 m^{3}/s for the derived MLR models are indicative of good performance.

Flow quantile | Constant | MAX_{e} (km) | MIN_{e} (km) | ΔH (km) | ΔH/P | S | A (km^{2}) | P (km) | L (km) | W (km) | L_{p} (km) | D_{d} (km/km^{2}) | FF | SF | R_{c} | R_{L} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q_{10} | +298.8 | −184 | +103.5 | = | +78,998.7 | = | +0.2 | = | −3.6 | −23.4 | +1.1 | = | +765.8 | = | −869.3 | = |
Q_{20} | +35.7 | −151.2 | +101 | = | +70,198.6 | = | +0.1 | +0.4 | −1.6 | −15.6 | = | −14.9 | +435.2 | = | = | = |
Q_{30} | +63.4 | −118.2 | +99.2 | = | +53,307.6 | = | +0.1 | +0.2 | = | −5.4 | −0.8 | = | +208.6 | = | = | −184 |
Q_{40} | +46.5 | −64.8 | +54.8 | = | +31,833.4 | = | +0.02 | +0.1 | = | −0.8 | −0.6 | = | = | = | = | −110.2 |
Q_{50} | −6.6 | −49.7 | +37.9 | = | +25,301 | = | +0.01 | +0.1 | = | −1.2 | −0.3 | −5.8 | = | = | = | = |
Q_{60} | −24.6 | −37.8 | +28.1 | = | +19,403.3 | = | +0.006 | +0.1 | = | −1.1 | −0.2 | −5.2 | = | = | +88.3 | = |
Q_{70} | −17.3 | −25.9 | +19.2 | = | +13,390.9 | = | +0.004 | +0.1 | = | −0.8 | −0.1 | −3.9 | = | = | +62.5 | = |
Q_{80} | −4.1 | −12.2 | +9.4 | = | +6,284.3 | = | +0.004 | +0.1 | = | −0.7 | −0.1 | −1.7 | +19.3 | = | +53 | −22.4 |
Q_{90} | −4.9 | −4.6 | +2.8 | = | = | +535.8 | +0.002 | +0.03 | = | −0.4 | −0.04 | −0.8 | = | = | +32.9 | = |

Columns MAX_{e} to R_{L} list the regression coefficients for the predictor variables.


### Jackknife cross-validation

The jackknife cross-validation process described in Article 3.4 was implemented to check the reliability of the flow quantile MLR models developed for the three clusters. The flow quantile estimate from the validation test was compared with the observed values and performance was evaluated in terms of *R*^{2} (Equation (12)), RMSE (Equation (13)), and PBIAS (Equation (14)).
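A leave-one-out jackknife of an MLR model can be sketched as follows. The procedure of Article 3.4 is not reproduced in this section, so this sketch makes two assumptions: each catchment in turn is treated as ungauged, and the set of predictors is held fixed while only the coefficients are refitted by ordinary least squares. The function name is hypothetical, and the study's own procedure may differ, for example by repeating the step-wise selection at each step.

```python
import numpy as np


def jackknife_predictions(X, y):
    """Leave-one-out predictions from a linear model refitted without each site."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # drop the pseudo-ungauged site
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef            # predict the held-out site
    return preds
```

The held-out predictions are then compared with the observed quantiles using *R*^{2}, RMSE, and PBIAS, which is why validation scores can fall well below the calibration scores when a cluster contains few stations relative to the number of predictors.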

Analysis of the *R*^{2} values reveals that in Cluster-1, high values are recorded for the median flow quantiles, reasonably better values for high-flow quantiles, and low values for the low-flow quantiles. This implies that the MLR models developed for this cluster (Table 8) may be considered reliable for the prediction of medium to high-flow quantiles in ungauged basins. Examination of results for individual stations revealed that the unsatisfactory performance for low-flow quantiles was due to overprediction of the *Q*_{70} and *Q*_{90} quantiles at six stations, namely Nanipalasan, Kudlur, Thoppur, Seedhi, Ambasamudram, and Salur. Conversely, underprediction of these quantiles was seen at four stations (Naguleru, Thengumarahada, Kashipatnam, and Pedagedadda). For Cluster-2, the reliability of the models in terms of *R*^{2} values is reasonably high for high-flow quantiles (except *Q*_{20}) but poor for the low-flow quantiles (Figure 5). Performance was unsatisfactory due to the underprediction of low-flow quantiles at four stations, namely Yennehole (*Q*_{30}, *Q*_{50}, and *Q*_{70}), Bantwal (*Q*_{10}, *Q*_{30}, and *Q*_{60}), Addoor (*Q*_{60}), and Haladi (*Q*_{60}–*Q*_{90}). Overprediction of low flows was observed at Yennehole (*Q*_{60}), Addoor (*Q*_{30} and *Q*_{70}), Kidangoor (*Q*_{10} and *Q*_{20}), and Karathodu (*Q*_{10} and *Q*_{30}). In Cluster-3, *R*^{2} values in the range 0.4–0.5 were recorded for high-flow quantiles but were extremely low for all other flow quantiles. The poor performance of the models is due to overprediction at stations Talikot (*Q*_{60}–*Q*_{90}), Navalgund (all quantiles), Halia (all quantiles), KMVadi (*Q*_{10}), Bendrahalli (all quantiles), Hogenakkal (all quantiles), T. Bekuppe (*Q*_{10}–*Q*_{20}), and Gandlapet (*Q*_{10}–*Q*_{20}). Conversely, underprediction was observed at stations Mahuwa (*Q*_{10}–*Q*_{20}), Kellodu (*Q*_{10}–*Q*_{30}), Balehonnur (all quantiles), E_Mangalam (*Q*_{10}–*Q*_{50}), Amabal (all quantiles), Cherribeda (*Q*_{10}), and Sonarpal (all quantiles).

The RMSE values, on the other hand, indicate higher reliability in the jackknife validation. For instance, Figure 5 shows that in Cluster-1, RMSE is low for all flow quantiles except *Q*_{10} and *Q*_{20}. For the MLR models in Cluster-2, RMSE values are high for high-flow quantiles, moderate for median-flow quantiles, and low for low-flow quantiles. In Cluster-3, low RMSE values were recorded for all flow quantiles except *Q*_{10}. The PBIAS values shown in Figure 5 were less than 50% for all flow quantiles in all three clusters except *Q*_{60} in Cluster-2.

Overall, the jackknife cross-validation procedure gave a mixed picture of the reliability of the MLR models for regionalization. The regression models developed for all three clusters performed very well for high-flow quantiles but were unsatisfactory in predicting the low-flow quantiles. This weaker performance for low-flow quantiles points to possible uncertainty introduced into the low-flow values by zero flows. Further, leaving out one station at a time during jackknife cross-validation may affect the model statistics if the omitted station has a highly influential attribute value (for example, a comparatively large area, elevation, width, or length). Arsenault & Brissette (2016) indicated that poor regionalization performance can be due to uncertainties in data measurement or incorrect selection of catchment attributes. Hence, it is essential to review the information for these stations for errors, uncertainties, and other discrepancies so that appropriate corrections can be applied.

## CONCLUSIONS

The present study implemented hydrologic regionalization and developed simple regression models for flow quantiles in ungauged basins located in Southern Peninsular India. For this purpose, historical flow records of 50 largely unregulated catchments in the region were used. Period-of-record FDCs were developed for each catchment, and nine flow quantiles (*Q*_{10}–*Q*_{90}) were extracted from each by interpolation. In addition, a database of 15 catchment attributes and average annual rainfall values was derived from the DEMs of the catchments and gridded rainfall data. To enable effective regionalization, a hierarchical agglomerative cluster analysis was implemented using Ward's linkage method, and the study area was delineated into three homogeneous clusters: Cluster-1 with 17 catchments, Cluster-2 with 11, and Cluster-3 with 22. All three clusters were found to be homogeneous, without any discordant stations, as per the CV test and the discordancy measure based on L-moment ratios.
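The extraction of flow quantiles from a period-of-record FDC can be sketched in Python as follows. This is an illustrative sketch only: the Weibull plotting position (rank / (n + 1)) and linear interpolation between ranked flows are assumptions, not necessarily the choices made in this study, and the flow values are made up.

```python
def flow_duration_curve(flows):
    """Return (exceedance %, flow) pairs, with flows ranked from highest to lowest."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position (an assumption): P = 100 * rank / (n + 1)
    exceed = [100.0 * rank / (n + 1) for rank in range(1, n + 1)]
    return exceed, ranked


def flow_quantile(flows, p):
    """Flow equalled or exceeded p percent of the time, by linear interpolation."""
    exceed, ranked = flow_duration_curve(flows)
    if p <= exceed[0]:
        return ranked[0]   # below the first plotting position: use the largest flow
    if p >= exceed[-1]:
        return ranked[-1]  # above the last plotting position: use the smallest flow
    for i in range(1, len(exceed)):
        if p <= exceed[i]:
            frac = (p - exceed[i - 1]) / (exceed[i] - exceed[i - 1])
            return ranked[i - 1] + frac * (ranked[i] - ranked[i - 1])


# Hypothetical daily flows (m^3/s) including zero-flow days.
daily_flows = [0.0, 0.0, 0.4, 1.2, 2.5, 3.1, 4.8, 7.9, 12.6, 18.3, 25.0, 41.7]
quantiles = {f"Q{p}": flow_quantile(daily_flows, p) for p in range(10, 100, 10)}
print(quantiles)
```

With this convention, *Q*_{10} is a high flow and *Q*_{90} a low flow. In the made-up series above, the zero-flow days pull the upper quantiles to zero, which mirrors the low-flow difficulty noted in the cross-validation results.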

As the next step in regionalization, MLR models relating each flow quantile (response variable) to the catchment attributes (predictor variables) were developed. A step-wise regression technique was used to arrive at the final forms of the MLR models, which contain only the most significant predictor variables. MLR models were first developed by considering all 50 catchments to be within a single region, and subsequently for each of the three clusters using the catchments located within them. The performance of the developed MLR models was evaluated using the coefficient of determination (*R*^{2}), RMSE, and percentage bias (PBIAS) statistics. Models developed for the clusters performed quite well, with average *R*^{2} values across the nine flow quantiles of 0.85 for Cluster-1, 0.97 for Cluster-2, and 0.80 for Cluster-3. In contrast, treating all 50 catchments as a single group gave unsatisfactory predictions of the flow quantiles, with an average *R*^{2} of 0.23. These results demonstrate the critical need to delineate catchments into homogeneous groups, and hierarchical cluster analysis proved to be an efficient technique for doing so.

The jackknife cross-validation adopted to check the reliability of the MLR models revealed a mixed response across flow quantiles. Performance was very good to satisfactory for high-flow quantiles but unsatisfactory for low-flow quantiles in all three clusters.

Overall, the results of this study demonstrate that hierarchical cluster analysis, combined with historical flow records from a large number of largely unregulated catchments and a variety of catchment attributes, can yield models for predicting FDCs in ungauged catchments that are very accurate and reasonably reliable.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Journal of Hydrology: Regional Studies*

*Journal of Hydrologic Engineering*

Journal of Hydrologic Engineering