## Abstract

Hydrological model calibration is a challenging task, especially in ungauged catchments. Regionalization methods can be used to estimate the model parameters of ungauged sub-catchments. In this article, the model of ungauged sub-catchments is calibrated by a regionalization approach based on automatic clustering. In the clustering procedure, gauged and ungauged sub-catchments are grouped according to the similarity of their physical characteristics. The optimal number of clusters is determined using automatic clustering based on a differential evolution algorithm. For the five clusters obtained, the silhouette measure is 0.56, which is an acceptable value for the goodness of clustering. Calibration is conducted by minimizing the errors in simulated peak flow and total flow volume. The Storm Water Management Model is applied to calibrate a set of 53 sub-catchments in the Gorganrood river basin. Graphical and statistical comparison of simulated and observed runoff, together with the value of the silhouette coefficient, demonstrates that the proposed methodology is a promising approach for hydrological model calibration in ungauged catchments.

## HIGHLIGHTS

The model of ungauged sub-catchments is calibrated by a regionalization approach based on automatic clustering.

The optimal number of clusters is determined using an automatic differential evolution algorithm-based clustering.

Comparing graphically and statistically simulated and observed runoff values and also calculating the value of silhouette coefficient proved the superiority of automatic clustering differential evolution in clustering.

## INTRODUCTION

Hydrological models, as theoretical representations of the real hydrologic cycle, can be employed to predict surface runoff caused by heavy rainfall (Bemmoussat *et al.* 2021). Based on the results of hydrological models, the time, location, and scale of floods can be estimated (Xin *et al.* 2019), and this valuable information can be used for flood risk management in general and for flood forecasting and warning systems in particular. Applying hydrological models in flood control projects generally requires the estimation of model parameters through calibration and validation with observed data (Madsen 2003; Valipour 2017). The parameters of hydrological models are tuned based on the physical characteristics of the basin and the available data and information. Their identification is essential for direct runoff hydrograph estimation (Bardossy 2007; Ghumman *et al.* 2011). There are many cases where estimates of hydrological parameters are required for locations at which streamflow has not been measured (Burn & Boorman 1993). In ungauged catchments, a small number of observations, or even the absence of observations, of key variables that influence hydrological processes limits the applicability of rainfall–runoff models. Early attempts simply used the parameter values derived for neighboring catchments where streamflow data were available (Mosley 1983; Vandewiele & Elias 1995; Wagener *et al.* 2007). According to a survey of the literature, clustering appears to be a promising research tool for prediction in ungauged basins, model parameterization, and catchment development and management. Clustering is the act of partitioning a dataset into groups called clusters, such that similar objects are placed in the same group (Pratima & Nimmakanti 2012). The clustering approach allows parameters to be transferred from gauged to similar ungauged catchments.
Similarity can be defined based on physical properties such as slope and land use (Oudin *et al.* 2008; Samaniego *et al.* 2010; Samuel *et al.* 2011) and geomorphological attributes of the catchments such as catchment form (Merz & Blöschl 2004; Heuvelmans *et al.* 2006; Wagener *et al.* 2007; Bastola *et al.* 2008). This approach, the effectiveness of which depends on the choice of attributes used to define catchment similarity, identifies the donor catchment that is most similar to an ungauged catchment and then transfers a parameter set from the donor to that ungauged catchment for hydrological modeling. Therefore, rainfall–runoff simulation can be carried out in ungauged catchments using the parameters of a gauged catchment with hydrologically similar attributes (Arsenault & Brissette 2016). This avoids the need to calibrate the model everywhere, saving time and computer resources and enabling model use in ungauged catchments (Singh *et al.* 2016). Different clustering approaches have been applied to catchment regionalization on long-term and short-term time scales, including monthly (Kahya & Demirel 2007; Sharghi *et al.* 2018; Pagliero *et al.* 2019; Zamoum & Souag-Gamane 2019), daily (Ding & Haberlandt 2017; He & Molkenthin 2021; Mosavi *et al.* 2021), and hourly (Sengupta *et al.* 2018) scales, with hydrological model parameters, observed streamflow, or hydrological signatures as the regionalization target.

In most previous studies, the number of clusters was defined based on the researchers' knowledge of the study region, the results of earlier studies conducted in that area, or limited data availability, and such a choice is not necessarily optimal. The number of clusters used in grouping data has a substantial effect on the results of regionalization. However, this fact was not considered in the clustering-based methodologies previously proposed for model regionalization in ungauged catchments.

The objective of this article is to present a methodology for the calibration of hydrological models in ungauged catchments based on automatic clustering. The proposed calibration method is used for the parameter setting of a short-term flood forecasting model in basins with low data availability. Short-term flood forecasting model calibration has not been studied in previous works. For this purpose, the EPA Storm Water Management Model (SWMM5) is used as the rainfall–runoff simulation model in a basin with several gauged and ungauged sub-catchments, and an automatic clustering differential evolution (ACDE) algorithm is developed to regionalize the characteristics of gauged sub-catchments to ungauged areas. The possibility of calibrating short-term rainfall–runoff models in basins with limited data is one of the most important advantages of the developed method. As another advantage, the optimal number of clusters is determined automatically. Although the proposed method works well, there are challenges in its implementation. Determining the criteria of sub-catchment similarity for clustering is one of these challenges. To overcome this problem, this study proposes an automatic clustering method based on optimization to find the optimal number of clusters. The similarity indices were finally defined based on the available physical and geological information and engineering judgment. Here, the clustering techniques are used for model calibration. If the sub-catchments grouped in a cluster are all ungauged, it is not possible to determine the calibration parameters in that cluster and therefore to develop a rainfall–runoff model with fully calibrated sub-catchments. To overcome this challenge, a constraint is defined in the optimization problem so that each obtained cluster includes at least one gauged sub-catchment.
The applicability of the proposed methodology is investigated in the Gorganrood river basin, in northern Iran, an area prone to devastating flash floods (Omidvar & Khodaei 2008). The details of the proposed methodology are described in Section 2.

## METHODOLOGY

### Data preparation

In a catchment, rainfall and runoff data for flood events are usually recorded in several sub-catchments, while in other sub-catchments no data may have been recorded. To resolve the lack of observational data in ungauged sub-catchments, various spatial interpolation techniques can be employed (Chen & Liu 2012). Geographical and non-geographical statistics, nearest neighbor, Thiessen polygons, inverse distance weighting (IDW), and the Kriging method (Lam 1983; Jeffrey *et al.* 2001; Yeh *et al.* 2011) are some of the most popular techniques (Yakowitz & Karlsson 1987; Dirks *et al.* 1998; Chen & Liu 2012). In this study, the IDW technique, a widely used interpolation method in hydrology (Robinson & Metternicht 2006), is applied to calculate areal rainfall in all sub-catchments. Furthermore, since sub-catchments that are similar in physical characteristics have similar hydrological behavior (Oudin *et al.* 2010), the physical characteristics of all sub-catchments were collected from the relevant maps and documents. The physical characteristics include slope (derived from the digital elevation model), land use (extracted from the land-use map produced from available multispectral satellite data), soil type (obtained from soil survey classifications), geology (taken from the geological map constructed from observations), shape factor (the sub-catchment width divided by the maximum overland flow length), and the Gravelius coefficient (the ratio of the basin perimeter to the perimeter of a hypothetical circle whose area equals that of the catchment). This information is essential for classifying gauged and ungauged sub-catchments.
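As an illustration of two of the quantities described above, the following is a minimal sketch rather than the implementation used in the study. It assumes that rainfall is interpolated to a single representative point of a sub-catchment (for example, its centroid) and that the IDW exponent is 2; both are assumptions, as are the function names. The Gravelius coefficient follows the definition given in the text.

```python
import math

def idw_rainfall(target_xy, gauges, power=2.0):
    """Inverse-distance-weighted rainfall at target_xy.

    gauges is a list of ((x, y), rainfall) tuples. The exponent `power`
    is an assumed value; the source does not state the one it used.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (gx, gy), rain in gauges:
        dist = math.hypot(target_xy[0] - gx, target_xy[1] - gy)
        if dist == 0.0:
            return rain  # target coincides with a rain gauge
        weight = 1.0 / dist ** power
        weighted_sum += weight * rain
        weight_total += weight
    return weighted_sum / weight_total

def gravelius_coefficient(perimeter, area):
    """Basin perimeter divided by the perimeter of a circle of equal area."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))
```

A perfectly circular basin gives a Gravelius coefficient of 1, and more elongated basins give larger values.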

### SWMM, a rainfall–runoff simulation model

The EPA SWMM5 is a dynamic rainfall–runoff simulation model first developed in 1971 and widely used for a single-event and long-term (continuous) simulation of the rainfall–runoff process (Rossman 2010). Multiple abilities in this model make it appropriate to simulate various hydrological characteristics of the catchment, such as surface runoff, transport through the streams, storage, and receiving water effects (Huber *et al.* 1988; Wang & Altunkaynak 2012).

SWMM is a widely used open-source model available for planning, analysis, and design related to stormwater runoff in urban as well as nonurban drainage systems (Kim *et al.* 2015). The major advantage of SWMM is its capability to incorporate both hydrological and hydraulic models. This model can present water depth by numerically solving Saint-Venant equations in rivers and floodplains, which makes it suitable for flood studies. Moreover, a calibrated and validated SWMM model is much easier to use for accurate flood prediction as it requires only minimal meteorological data like temperature and rainfall (Rai *et al.* 2017). Furthermore, unlike the widely used conceptual Hydrologic Engineering Center - Hydrologic Modeling System (HEC-HMS) model, by using the SWMM model for flood routing in the river, flood depth is obtained at different points. Although the SWMM model was developed for urban applications, it is successfully used to model nonurban areas and natural watersheds by researchers (Jang *et al.* 2007; Jun *et al.* 2010; Chung *et al.* 2011).

In the SWMM model, there are various methods to represent the hydrological processes of a catchment, including rainfall losses, surface runoff hydrographs, flow routing, and groundwater flow. Here, the Soil Conservation Service (SCS) method is applied to calculate precipitation losses because of its simplicity and small number of required parameters. The runoff hydrograph of each sub-catchment is estimated physically over time from the continuity equation and the Manning formula. The computed runoff hydrographs of the sub-catchments can then be routed through the rivers using kinematic or dynamic wave routing. In this research, dynamic wave routing is used because of its higher accuracy.
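To make the role of the curve number concrete, the sketch below evaluates the textbook SCS curve-number relation for event runoff depth in metric units, with potential retention S = 25400/CN − 254 (mm) and the conventional initial abstraction of 0.2·S. This is an illustrative assumption about the general SCS relation; the internal implementation of the curve-number method in SWMM may differ in detail, and the function name and `ia_ratio` argument are hypothetical.

```python
def scs_runoff_depth(rain_mm, cn, ia_ratio=0.2):
    """Direct runoff depth (mm) from the textbook SCS curve-number relation."""
    retention = 25400.0 / cn - 254.0          # potential maximum retention S, mm
    initial_abstraction = ia_ratio * retention
    if rain_mm <= initial_abstraction:
        return 0.0                            # all rainfall is lost before runoff starts
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)
```

A higher CN lowers the retention and therefore increases the runoff depth for the same rainfall, which is why CN appears among the calibration parameters.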

#### Sensitivity analysis and calibration

Sensitivity analysis is performed to establish the most sensitive parameters affecting the amount of surface runoff. This is done by varying the tested parameter over a practical range while keeping the rest of the parameters unchanged (Zaghloul 1983).
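The one-at-a-time procedure described above can be sketched as follows. The sketch varies each parameter over its range while holding the others at their baseline values, records the largest increase and decrease in the model output relative to the baseline, and ranks parameters by the difference between the two. `toy_peak` is a hypothetical stand-in for a SWMM run that returns peak discharge; it is not the model used in the study.

```python
def oat_sensitivity(model, baseline, ranges, steps=11):
    """One-at-a-time sensitivity of model(params) to each parameter.

    Returns {name: (max_increase_pct, max_decrease_pct, spread_pct)}.
    """
    base_output = model(baseline)
    result = {}
    for name, (low, high) in ranges.items():
        changes = []
        for k in range(steps):
            params = dict(baseline)           # all other parameters stay fixed
            params[name] = low + (high - low) * k / (steps - 1)
            changes.append(100.0 * (model(params) - base_output) / base_output)
        increase, decrease = max(changes), min(changes)
        result[name] = (increase, decrease, increase - decrease)
    return result

def rank_by_spread(result):
    """Parameter names ordered from most to least sensitive."""
    return sorted(result, key=lambda name: result[name][2], reverse=True)

def toy_peak(params):
    """Hypothetical peak-discharge response, for illustration only."""
    return 2.0 * params["imperv"] + 0.1 * params["cn"]

sensitivity = oat_sensitivity(
    toy_peak,
    baseline={"imperv": 25.0, "cn": 70.0},
    ranges={"imperv": (5.0, 35.0), "cn": (55.0, 90.0)},
)
ranking = rank_by_spread(sensitivity)
```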

Hydrologic models are typically used in stormwater applications to estimate the discharge characteristics of a catchment, based on the contributing rainfall and the hydrologic attributes of the catchment. The characteristics of a catchment are represented in a model by parameters and relationships that are used to estimate the response of the catchment. Most models have several parameters that can be directly measured or observed (e.g. area of the catchment, percent imperviousness) as well as parameters that are not directly observable (e.g. parameters that define groundwater flow). Parameters that are not directly observable must be estimated using indirect techniques of matching the model output with historical measured data. The process of adjusting model parameters is called calibration or parameter estimation (Gupta *et al.* 1999).

SWMM requires tuning a set of parameters to sufficiently express the complex relationship between rainfall and surface runoff. Model parameters involve impervious percentage, roughness coefficient of impervious surfaces, curve number (CN), and roughness coefficient of pervious surfaces in sub-catchments. A sensitivity analysis with a local method (Saltelli 2002) is performed to obtain information about collinearity among parameters and a ranking of influence. The less-effective parameters are given a fixed value, and for the remaining ones, calibration is performed through the trial-and-error method.

### Automatic clustering technique based on DE

Clustering is known as one of the most challenging problems in machine learning, due to its unknown structural characteristics. Also, clustering can be formally considered a particular kind of optimization problem for the regionalization of catchments (Acreman & Sinclair 1986; Lin & Chen 2006; Ramachandra Rao & Srinivas 2006; Sadri & Burn 2011; Ahani & Mousavi Nadoushani 2016; Ahani *et al.* 2020). The two most widely studied clustering algorithms are partitional and hierarchical clustering. Partitional clustering algorithms require a set of initial clusters that are then improved iteratively while there is no need to specify the initial cluster centers for hierarchical clustering algorithms (Reddy & Vinzamuri 2018). It is one of the major advantages of the hierarchical algorithms for the regionalization of the catchments (Ahani *et al.* 2020).

In many previous studies, the number of clusters is either assumed to be known or specified by the user (MacQueen 1967; Tu *et al.* 2014; Zhang *et al.* 2017), while determining the appropriate number of clusters in a clustering problem is a challenging issue that requires sufficient prior knowledge (Salvador & Chan 2004). To address this problem, numerous heuristic approaches have been suggested over the years. A popular approach to solving clustering problems is applying evolutionary algorithms (EAs). EAs are widely believed to be effective in clustering problems to provide near-optimal solutions in a reasonable time (Hruschka *et al.* 2009).

In some previous studies, EAs have been used to solve clustering problems with a known or predefined number of clusters (Bezdek *et al.* 1994; Fränti *et al.* 1997; Maulik & Bandyopadhyay 2000; Bandyopadhyay & Maulik 2002; Lu *et al.* 2004; Sheng & Liu 2004; Ma *et al.* 2006; Nayak *et al.* 2016; Hancer *et al.* 2020), whereas fewer studies addressed the EAs to optimize the number of clusters (*k*) and the corresponding partitions (Hruschka & Ebecken 2003; Liu *et al.* 2004; Hruschka *et al.* 2006; Lensen *et al.* 2017; Viloria & Lezama 2019).

Clustering methods have been applied in hydrological regionalization to identify similar catchments (Ramachandra Rao & Srinivas 2006), classify the flow regime of streams (Moliere *et al.* 2009), define types of waters in terms of water quality (McNeil *et al.* 2005; Panda *et al.* 2006), group climate stations toward determination of climatic regions (Raju & Kumar 2007), and determine similar hydrological behavior concerning simulated water balances of hydrological response units (Bormann *et al.* 1999). In many studies, the number of clusters has been predetermined or selected through an iterative process. In real-world clustering problems, determining the optimal number of clusters is a challenge. Therefore, automatic clustering approaches are recommended due to their flexibility and effectiveness.

As an alternative approach, automatic clustering can be used in which metaheuristic algorithms are utilized to find the global optimum of the clustering problem (Bhattacharya & Guojun 2003; Das *et al.* 2009; Hu *et al.* 2012; Maier *et al.* 2019). The application of automatic clustering methods in various fields of water engineering such as drainage pipe monitoring points design (Guo *et al.* 2018), groundwater remediation systems development (Vali *et al.* 2021), and hydrograph clustering (Wunsch *et al.* 2022) is expanding, but has not been reported in the calibration of ungauged catchments. In this study, the optimal number of clusters is determined using an automatic clustering optimization technique that has not been seen in previous works. In contrast to most of the existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified. Rather, it determines the optimal number of clusters on the run. This approach was first presented by Das *et al.* (2009) for image processing and recognition. In this study, the approach extends to hydrological application.

Correctness of clustering can be evaluated quantitatively with a global validity index such as the Davies–Bouldin measure (Davies & Bouldin 1979) or the CS measure (Chou *et al.* 2004). The theory of automatic clustering used in this study is borrowed from the study by Das *et al.* (2008) and is done as follows:

Consider a set of $n$ data points in a *d*-dimensional space that is to be clustered. To achieve this goal, an optimization problem is defined. The cluster centers and the number of clusters are decision variables in the optimization problem and are obtained using automatic clustering. The objective function, written here in the standard form of the CS measure (Chou *et al.* 2004), is formulated as follows:

$$\min \; \mathrm{CS}(K) = \frac{\sum_{i=1}^{K}\left[\dfrac{1}{N_i}\sum_{x_j \in C_i}\max_{x_q \in C_i} d(x_j, x_q)\right]}{\sum_{i=1}^{K}\left[\min_{j \ne i} d(m_i, m_j)\right]}, \qquad K \le K_{\max}$$

where

$$m_i = \frac{1}{N_i}\sum_{x_j \in C_i} x_j$$

In the aforementioned equations, CS is a validity measure for automatic clustering, $K_{\max}$ is the maximum number of clusters, $C_1, \dots, C_K$ represents the optimal obtained clusters, $N_i$ is the number of points in cluster $C_i$, and the cluster centroid $m_i$ is obtained by averaging the data of each cluster. Euclidean distances, including the distance between a point and a cluster center, are denoted by $d(\cdot,\cdot)$. As can be seen, CS is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. It focuses on increasing the cluster separation and increasing cohesion (decreasing the distance between members) in each cluster. The minimum value of CS indicates the appropriate partitions.
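The CS measure can be sketched in a few lines. The version below assumes the form commonly attributed to Chou *et al.* (2004): the within-cluster scatter of a cluster is the mean, over its members, of the distance to the farthest member of the same cluster, and the separation of a cluster is the distance from its centroid to the nearest other centroid. The function name and the data layout (a list of coordinate tuples with a parallel list of labels) are assumptions made for illustration.

```python
import math

def cs_measure(points, labels):
    """CS validity index: within-cluster scatter over between-cluster separation.

    Smaller values indicate more compact and better separated clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the CS measure needs at least two clusters")

    scatter = 0.0
    centroids = []
    for members in clusters.values():
        # Mean distance from each member to the farthest member of its cluster.
        scatter += sum(
            max(math.dist(p, q) for q in members) for p in members
        ) / len(members)
        # Centroid: coordinate-wise average of the members.
        centroids.append([sum(coord) / len(members) for coord in zip(*members)])

    # Sum over clusters of the distance to the nearest other centroid.
    separation = sum(
        min(math.dist(m, other) for j, other in enumerate(centroids) if j != i)
        for i, m in enumerate(centroids)
    )
    return scatter / separation
```

In an automatic clustering setting, each candidate solution of the optimizer would be decoded into a label assignment and scored with a function of this kind.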

This is a nonlinear and nonconvex optimization problem and can be solved by one of the EAs. Here, the differential evolution (DE) algorithm is chosen because DE has emerged as one of the fast, robust, and efficient global search EAs of current interest (Das *et al.* 2008).

#### Optimization solver

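DE (Storn & Price 1995) is a population-based metaheuristic search algorithm proposed to solve unconstrained global optimization problems by iteratively improving candidate solutions through an evolutionary process. The advantage of DE compared to the genetic algorithm (GA) is its strong global search operator (Hegerty *et al.* 2009). Compared with classical optimization methods, the weakness of DE, like that of many other EAs, is its local search capability (Alizadeh *et al.* 2018). DE is composed of five steps: initialization, mutation, crossover, selection, and repetition. These steps are briefly summarized as follows, in the standard form of Storn & Price (1997):

Step 1, initialization: Generate the initial population of candidate solutions at random within the bounds of each decision variable:

$$x_i = x_i^{\min} + \mathrm{rand}(0,1)\,\left(x_i^{\max} - x_i^{\min}\right)$$

Step 2, mutation: Generate a trial solution *y*:

$$y_i = a_i + F\,(b_i - c_i)$$

where *i* is the index of the decision variable, *F* is the scale factor, and *a*, *b*, and *c* are randomly selected from the population for each member in the population.

Step 3, crossover: Combine the trial solution *y* with the current member *x*, taking each component from *y* with a probability equal to the crossover rate and from *x* otherwise.

Step 4, selection: Between the current member and the solution produced by crossover, select the one with the smaller objective function value.

Step 5, repetition: Repeat steps 2–4 until the desired convergence criterion is reached.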
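The five steps can be sketched as a compact routine. This is a minimal sketch of the classical DE/rand/1/bin scheme of Storn & Price, not the ACDE code used in the study. The population size, scale factor `f`, crossover rate `cr`, the fixed number of generations used as the stopping rule, and the clipping of mutated values to the bounds are all assumed choices.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize objective(x) over box bounds with classical DE (rand/1/bin).

    pop_size must be at least 4 so that three distinct partners exist.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial population inside the bounds.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(pop_size)]
    cost = [objective(x) for x in pop]

    for _ in range(generations):              # Step 5: repetition
        new_pop, new_cost = [], []
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)       # at least one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == forced:
                    # Steps 2 and 3: mutated component, clipped to the bounds.
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    trial.append(min(max(value, lo), hi))
                else:
                    trial.append(pop[i][j])   # component kept from the parent
            trial_cost = objective(trial)
            # Step 4: keep whichever of parent and trial has the smaller cost.
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost

    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

For clustering, `objective` would decode a candidate vector into cluster centers and return a validity measure such as CS. A constraint such as "each cluster contains at least one gauged sub-catchment" could be handled with a penalty term, which is one common option and not necessarily the approach taken in the study.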

#### Clustering validation

Cluster validation is a technique to evaluate the goodness of clustering algorithm results. There are three classes of cluster validation: internal, external, and relative cluster validation (Halkidi *et al.* 2001). Different types of indices are used to evaluate different types of clustering problems, and index selection depends on the kind of available data. Furthermore, external indices require previous knowledge to assess the results of a clustering problem, whereas internal indices do not (Rendón *et al.* 2011). Silhouette coefficient (Rousseeuw 1987), as a popular internal cluster validation index, is a measurement of how similar the points are to their clusters (cohesion) compared to other clusters (separation). This measure is described as follows:

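Assume the data have been partitioned into *k* clusters by a clustering method. For data point $i$ in the cluster $C_I$, the mean distance between *i* and all other data points in the same cluster is defined as follows:

$$a(i) = \frac{1}{|C_I| - 1}\sum_{j \in C_I,\, j \ne i} d(i, j)$$

where $|C_I|$ is the number of points belonging to cluster $C_I$ and $d(i, j)$ is the distance between data points *i* and *j* in the cluster $C_I$. The mean dissimilarity of point *i* in the cluster $C_I$ to any other cluster $C_J$ ($C_J \ne C_I$), of which *i* is not a member, is defined as follows:

$$b(i) = \min_{J \ne I} \frac{1}{|C_J|}\sum_{j \in C_J} d(i, j)$$

Here, a silhouette (value) of one data point *i* is defined as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \quad \text{if } |C_I| > 1$$

and

$$s(i) = 0, \quad \text{if } |C_I| = 1$$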

The silhouette is the mean of all output values *s*(*i*) and it ranges from −1 to +1, where the value of −1 means that clusters have been assigned in the wrong way. A score of 1 indicates that the object is well matched to its cluster and poorly matched to neighboring clusters. The silhouette score reaching closer to 1 implies that the clustering configuration is appropriate.
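The silhouette computation can be sketched directly from its definition (Rousseeuw 1987): for each point, the mean distance to the other members of its own cluster is compared with the smallest mean distance to the members of any other cluster, and points in single-member clusters receive a silhouette of zero. The function name and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette value s(i) over all points, in the range [-1, 1]."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) = 0 for a single-member cluster
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items() if other_label != label
        )
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(points)
```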

## STUDY AREA

The Gorganrood catchment is located in the north-east of Iran and lies between the latitudes of 36° 30′ and 37° 50′ N and the longitudes of 54° 5′ and 56° 30′ E. The total area of the catchment is 11,380 km². About 59.5% of the Gorganrood catchment is mountainous areas and 40.5% is foothills and plains. Sixty-seven percent of the Golestan province's yearly surface water, i.e. 828 million cubic meters, flows in this catchment. Topographically, the Gorganrood catchment has a mountainous area and flat land. The surface elevation of the area ranges between –30 and 3,662 m. The mean annual precipitation in the study area is 615 mm (Rahmati *et al.* 2016), which is about three times the average annual rainfall in Iran. The main land-use types are water body, range, irrigated agriculture, dryland agriculture, low forest, and dense forest. The upper part of the catchment is mainly covered by forest and pasture, and the lower part is the area under cultivation (Rahmati *et al.* 2016).

The main stream of the Gorganrood river is 350 km long and flows in a general east–west direction in the north of the Gorgan–Gonbad plain. Numerous branches, mostly oriented south–north, join the main stream, which finally flows into the Gulf of Gorgan. The geological maps of the basin show that about 65% of the sub-catchments have high permeability, and the permeability of the rest is average.

| Hydrometric gauge | Calibration event (m/d/yyyy) | Duration (h) | Validation event (m/d/yyyy) | Duration (h) |
|---|---|---|---|---|
| Khormaloo | 10/20/2011 | 24 | 9/6/2012 | 24 |
| Arazkooseh | 2/2/2012 | 61 | 5/14/2007 | 72 |
| Baghesalian | 6/20/2012 | 67 | 10/23/2011 | 96 |
| Tangrah | 8/10/2001 | 76 | 8/9/2005 | 36 |
| Hajighoshan | 10/8/2005 | 55 | 2/22/2010 | 18 |
| Galikesh | 8/9/2005 | 24 | 4/16/2016 | 27 |
| Ghochmaz | 8/25/2011 | 36 | 2/22/2010 | 12 |
| Tamar | 8/9/2005 | 24 | 9/18/2004 | 39 |

## RESULTS

Based on the sensitivity analysis, the model was calibrated by adjusting the CN, the impervious percentage (%Impervious), and the Manning roughness *N* values for the impervious and pervious areas of the sub-catchment (N-impervious and N-pervious, respectively) for the flood events observed in eight hydrometric gauges. The calibration parameters were tuned within their defined practical range. To evaluate the calibration efficiency, the calibrated parameters were then used in the model to simulate other recorded flood events as the validation task. To assess the calibrated parameters, simulated hydrographs are compared with observed ones. Both graphical and statistical comparisons confirm the calibration results. Observed and simulated hydrographs in the calibration and verification processes are depicted in Figures 5 and 6, respectively. To compare the simulated and observed hydrographs, statistical metrics including the correlation coefficient (R), root-mean-square error (RMSE), Nash–Sutcliffe efficiency (NSE), and mean absolute error (MAE) were considered. These metrics and their values are tabulated in Table 3 for the calibration and verification datasets. The ideal value is one for R and NSE and zero for RMSE and MAE.

| Parameter of sensitivity analysis | Initial value | Variation range | Maximum increase in peak flood discharge (%) | Maximum decrease in peak flood discharge (%) | Difference between the previous two columns | Sensitivity rank based on the previous column |
|---|---|---|---|---|---|---|
| CN | 70 | 55–90 | 4.55 | −1.82 | 6.37 | 3 |
| %Impervious | 25 | 5–35 | 20.45 | −81 | 101.45 | 1 |
| Dstore-pervious | 0.05 | 0.01–0.15 | 0.05 | 0.45 | 0.4 | 5 |
| N-pervious | 0.1 | 0.03–0.2 | −0.45 | 0.1 | 0.55 | 4 |
| Slope | Dependent on each catchment | −50 to 200% | 0.18 | 0.54 | 0.36 | 6 |
| N-impervious | 0.01 | 0.005–0.02 | 6.82 | −9.09 | 15.91 | 2 |

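| Criteria | Formulation | Calibration | Verification |
|---|---|---|---|
| Correlation coefficient ($R$) | $\dfrac{\sum_{t}(Q_{o,t}-\bar{Q}_{o})(Q_{s,t}-\bar{Q}_{s})}{\sqrt{\sum_{t}(Q_{o,t}-\bar{Q}_{o})^{2}\,\sum_{t}(Q_{s,t}-\bar{Q}_{s})^{2}}}$ | 0.86 | 0.84 |
| RMSE | $\sqrt{\dfrac{1}{n}\sum_{t}(Q_{o,t}-Q_{s,t})^{2}}$ | 44.15 | 19.81 |
| NSE | $1-\dfrac{\sum_{t}(Q_{o,t}-Q_{s,t})^{2}}{\sum_{t}(Q_{o,t}-\bar{Q}_{o})^{2}}$ | 0.59 | 0.35 |
| MAE | $\dfrac{1}{n}\sum_{t}\lvert Q_{o,t}-Q_{s,t}\rvert$ | 26.78 | 12.84 |

In the formulations, $Q_{o,t}$ and $Q_{s,t}$ are the observed and simulated discharges at time step $t$, $\bar{Q}_{o}$ and $\bar{Q}_{s}$ are their means, and $n$ is the number of time steps.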

According to Figures 5 and 6, a graphical comparison between observed and simulated flood hydrographs at eight hydrometric gauges in various flood events indicates that the flood hydrographs are relatively well simulated by the model, especially the peak discharge. In this study, in view of the intended model application, the simulation of peak discharge was taken as the main calibration objective, with priority over the simulation of flood volume. Therefore, reproducing the general hydrograph shape is accepted as an adequate estimate of the discharge volume. It is worth mentioning that in some verification events, such as 14 May 2007 (Arazkooseh gauge), 23 October 2011 (Baghesalian gauge), and 16 April 2016 (Galikesh gauge), the difference between simulated and observed flood volume is related to gaps in parts of the recorded hyetograph and incomplete precipitation data for these events. Generally, according to Table 3, the results of model calibration and validation confirm the satisfactory accuracy of model calibration.
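The four goodness-of-fit metrics used to compare hydrographs can be computed as in the sketch below, which uses their standard definitions. The function name and the dictionary layout of the result are assumptions, and the inputs are taken to be observed and simulated discharges sampled at the same time steps.

```python
import math

def fit_metrics(observed, simulated):
    """Return R, RMSE, NSE and MAE for paired observed/simulated series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variation = sum((o - mean_obs) ** 2 for o in observed)
    sim_variation = sum((s - mean_sim) ** 2 for s in simulated)
    covariation = sum((o - mean_obs) * (s - mean_sim)
                      for o, s in zip(observed, simulated))
    return {
        "R": covariation / math.sqrt(obs_variation * sim_variation),
        "RMSE": math.sqrt(squared_error / n),
        "NSE": 1.0 - squared_error / obs_variation,
        "MAE": sum(abs(o - s) for o, s in zip(observed, simulated)) / n,
    }
```

A simulation that reproduces the shape of the hydrograph but carries a constant bias keeps R at 1 while lowering NSE, which is one reason the metrics are reported together.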

| Cluster | Number of gauged sub-catchments | Number of ungauged sub-catchments | CN | %Imp | N-pervious | N-impervious |
|---|---|---|---|---|---|---|
| Cluster 1 | 1 | 1 | 57 | 12 | 0.03 | 0.02 |
| Cluster 2 | 1 | 2 | 64 | 19 | 0.06 | 0.01 |
| Cluster 3 | 5 | 19 | 58.8 | 11.02 | 0.064 | 0.018 |
| Cluster 4 | 15 | 6 | 59.47 | 14.73 | 0.11 | 0.01 |
| Cluster 5 | 2 | 1 | 57.5 | 6.1 | 0.13 | 0.02 |

CN, %Imp, N-pervious, and N-impervious are the SWMM model parameters for ungauged sub-catchments.

There are 53 sub-catchments in the Gorganrood catchment. Parameters of 24 gauged sub-catchments from the total sub-catchments have been obtained during the calibration procedure by simulating flood hydrographs in the mentioned eight hydrometric stations.

Parameters of the other 29 ungauged sub-catchments have been estimated based on the characteristics of similarly calibrated sub-catchments through the application of the ACDE algorithm. By implementing this algorithm, five clusters were obtained. The number of gauged and ungauged sub-catchments of each cluster is given in Table 4. In Figure 6, similar sub-catchments have been depicted in a colorful map of the whole catchment. The model parameters for ungauged sub-catchments are calculated by averaging the calibrated parameters of gauged sub-catchments in their clusters.
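The transfer step described above can be sketched as follows. Each ungauged sub-catchment receives the average of the calibrated parameters of the gauged sub-catchments in its cluster, and the routine raises an error when a cluster has no gauged member, mirroring the constraint that every cluster must contain at least one gauged sub-catchment. The function name, the identifiers, and the dictionary layout are hypothetical.

```python
def transfer_parameters(cluster_of, calibrated):
    """Estimate parameters of ungauged sub-catchments by cluster averaging.

    cluster_of: {sub_catchment_id: cluster_id} for all sub-catchments.
    calibrated: {gauged_sub_catchment_id: {parameter_name: value}}.
    Returns {ungauged_sub_catchment_id: {parameter_name: mean_value}}.
    """
    donors_by_cluster = {}
    for sub_id, cluster_id in cluster_of.items():
        if sub_id in calibrated:
            donors_by_cluster.setdefault(cluster_id, []).append(calibrated[sub_id])

    estimates = {}
    for sub_id, cluster_id in cluster_of.items():
        if sub_id in calibrated:
            continue  # gauged sub-catchments keep their own calibration
        donors = donors_by_cluster.get(cluster_id)
        if not donors:
            raise ValueError(f"cluster {cluster_id} has no gauged sub-catchment")
        estimates[sub_id] = {
            name: sum(d[name] for d in donors) / len(donors)
            for name in donors[0]
        }
    return estimates
```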

As mentioned in Section 2, the silhouette coefficient is used to assess the goodness of clustering. For the five clusters obtained, the value of the silhouette measure is 0.56. Given the range of the silhouette coefficient (from −1 to +1), 0.56 is an acceptable value that indicates the goodness of clustering.

| Cluster | Slope (degree) | %Imp | CN | N-perv | N-imp |
|---|---|---|---|---|---|
| Cluster 1 | 17–21 | 12–12 | 75–75 | 0.1–0.1 | 0.01–0.01 |
| Cluster 2 | 5–17 | 19–19 | 64–64 | 0.06–0.06 | 0.01–0.01 |
| Cluster 3 | 1–13 | 5.1–13 | 56–77 | 0.03–0.2 | 0.01–0.01 |
| Cluster 4 | 10–23 | 5–28 | 55–65 | 0.04–0.2 | 0.005–0.03 |
| Cluster 5 | 36–44 | 5.2–7 | 57–58 | 0.07–0.19 | 0.01–0.03 |

| Cluster | Slope (degree) | %Imp | CN | N-perv | N-imp |
|---|---|---|---|---|---|
| Cluster 1 | 19 ± 2 | 12 ± 0 | 75 ± 0 | 0.1 ± 0 | 0.01 ± 0 |
| Cluster 2 | 10 ± 6 | 19 ± 0 | 64 ± 0 | 0.06 ± 0 | 0.01 ± 0 |
| Cluster 3 | 6 ± 4 | 10.42 ± 1.35 | 68 ± 3 | 0.07 ± 0.03 | 0.01 ± 0 |
| Cluster 4 | 17 ± 4 | 14.7 ± 6.5 | 59 ± 3 | 0.11 ± 0.05 | 0.01 ± 0.005 |
| Cluster 5 | 28 ± 5 | 6.1 ± 0.9 | 57 ± 0.5 | 0.13 ± 0.06 | 0.02 ± 0.01 |

Values are given as mean ± standard deviation.

As presented in Table 5, the range of each variable within a cluster is limited. For example, the range of the CN in adjacent clusters 3 and 4 is 67–88 and 55–56, respectively. As another example, the slope of clusters 2 and 3 is in the range of 18–21 and 1–18 degrees, respectively. The closeness of the CN values of the sub-catchments in each cluster shows their physical similarity. Also, according to Table 6, the standard deviations of the physical and geological characteristics in the obtained clusters are low in comparison to the means, which indicates the relative similarity of the sub-catchments within each cluster.

## CONCLUSION

The aim of this article was the calibration of the SWMM model in ungauged sub-catchments of a flood-prone catchment in Iran. A sensitivity analysis showed that impervious percentage, roughness coefficient of impervious surfaces, CN, and roughness coefficient of pervious surfaces are the most influential parameters affecting total runoff and peak discharge. A regionalization approach based on clustering provided a promising tool for the calibration of ungauged sub-catchments by transferring parameters from gauged to similar ungauged sub-catchments. In contrast to most existing clustering techniques, the optimal number of clusters was determined using an automatic clustering optimization technique that has not been seen in previous works. The ACDE method determined both the optimal number of clusters and their members. The correctness of ACDE clustering was evaluated in comparison with the well-known k-means clustering algorithm, and the global validity index proved the superiority of ACDE in clustering and parameter estimation for hydrological model calibration. The proposed calibration method is a useful approach for the parameter setting of a short-term flood forecasting model in basins with low data availability.

In this study, clustering was done by an optimization method; the application of experimental clustering methods based on the researcher's knowledge, and their comparison with the proposed method, can be considered in future studies. Furthermore, combining various clustering techniques into a hybrid clustering approach might give better results than the individual algorithms, and this can also be pursued in future research. In this research, due to the lack of meteorological information, the meteorological properties of the sub-catchments were not used as clustering features. It is suggested that this weakness be resolved in future studies by including meteorological properties of the basin, such as rainfall, temperature, or humidity, in the clustering for hydrological modeling.

Also, similar weights were considered for the various physical properties and geomorphological attributes of the sub-catchments. However, depending on the study area, available data, and expert knowledge, these features might be of varying importance. Therefore, another suggestion for future research is to involve different weighting factors in the proposed automatic clustering approach to improve its effectiveness. Extending the precipitation monitoring network, using observations from adjacent basins, and considering other characteristics such as the river network degree, which indicates the hydrological behavior of sub-catchments, are recommended for future studies to reduce data deficiency and uncertainties.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.