Cluster analysis methods are a type of well-known technique for regionalisation of catchments to perform regional flood frequency analysis. In this study, a fuzzy extension of hybrid clustering algorithms is evaluated. Self-organizing feature maps and four hierarchical clustering algorithms were used to provide the initial cluster centres for fuzzy c-means (FCM) algorithm. The hybrid approach was used for regionalisation of catchments in Sefidroud basin based on feature vectors including five catchment attributes: longitude and latitude, drainage area, runoff coefficient and mean annual precipitation. The results showed that according to the values of both the objective function and the cluster validity indices, the performances of FCM algorithm often was improved by using the proposed hybrid approach. Also, it was evident from the results that in the case of minimizing the objective function, the combination of Ward's algorithm and FCM provided best results, but according to the cluster validity indices, other hybrid algorithms such as combinations of single linkage or complete linkage and FCM algorithm presented the most desirable results. In addition, according to the results, there are two well-defined homogeneous regions in Sefidroud basin identified by all the examined hybrid algorithms.

INTRODUCTION

In hydrology, estimation of the average return period of an event with specified magnitude is called frequency analysis (Hosking & Wallis 1997). Flood frequency analysis (FFA) methods are used to relate the magnitude of floods to their return periods (Rao & Srinivas 2008). In at-site FFA, site-specific hydrologic data is used to estimate flood quantiles at the target site. However, in some cases, the at-site FFA may not be useful to provide reliable flood estimates because of paucity of gauging stations and flood data in target sites. The regional flood frequency analysis (RFFA) methods can solve this problem. In these methods, a number of sites with similar flood generating mechanisms forms a group named homogeneous region. After that, a good-fitting frequency distribution is chosen as regional frequency distribution and then, the site-specific frequency distributions and quantile estimates can be provided for the interested sites.

Over the past five decades, several researchers studied the RFFA. One of the earliest and most widely used RFFA methods is the index-flood procedure of Dalrymple (1960). A suitable frequency distribution is fitted to the flood data gathered from all sites of a homogeneous region. This distribution is named regional frequency distribution. Based on a number of researches about ‘regional PWM (probability weighted moments) algorithm’, Hosking & Wallis (1997) concluded that a suitable regional analysis will yield useful and relatively accurate quantile estimates for many realistic applications. Hosking & Wallis (1997) also proposed an RFFA approach based on L-moments (Hosking 1990). The index-flood method and its L-moment based extension are based on an assumption of scale invariance. This assumption means that the frequency distributions of all sites in a homogeneous region are identical and only a site-specific factor named the index flood differs from site to site. However, it is not always a valid assumption and, therefore, Basu & Srinivas (2013) proposed a mathematical approach in L-moment framework to address this issue. Based on their results, the proposed approach shows a better performance than methods based on index-flood approach.

The first main step in RFFA is regionalisation. Different attributes of sites can be used for regionalisation. For a site with n selected attributes, an n-dimensional feature vector is defined to represent the site. Hydrologists proposed several methods for regionalisation, each method has its advantages and limitations. One of the most widely used regionalisation methods is cluster analysis (e.g. Mosley 1981; Tasker 1982; Wiltshire 1986; Bhaskar & O'Connor 1989; Burn 1989; Nathan & McMahon 1990; Hosking & Wallis 1997; Hall & Minns 1999; Burn & Goel 2000; Hall et al. 2002; Jingyi & Hall 2004; Rao & Srinivas 2006a, 2006b; Sadri & Burn 2011; Basu & Srinivas 2015). Cluster analysis is a procedure for grouping the data points based on the similarities between them. Each data point is represented by a feature vector. The objective is to reach the maximum possible similarity between the feature vectors of a group or cluster and also to reach the maximum possible dissimilarity between the feature vectors of different clusters (Rao & Srinivas 2008).

Hard clustering and fuzzy clustering are two main categories of clustering algorithms. In order to form the hard clusters, each feature vector should belong to only one cluster, but in fuzzy clusters each feature vector can be assigned to all the clusters with degrees of membership in the interval [0,1]. Hard clustering algorithms can be categorized as hierarchical algorithms and partitional algorithms. Some of the well-known hierarchical algorithms include: average linkage (Sokal & Michener 1958; Lance & Williams 1967; Sneath & Sokal 1973), complete linkage (Johnson 1967; Lance & Williams 1967; McQuitty 1967; Sneath & Sokal 1973), single linkage (Florek et al. 1951a, 1951b; Sneath 1957, 1966; Jardine et al. 1967; Johnson 1967; Lance & Williams 1967; Gower & Ross 1969; Sneath & Sokal 1973), and Ward's algorithm (Ward 1963). Also K-means (KM), K-medoids and K-modes are some of the most widely used partitional algorithms. Fuzzy clustering algorithms are founded on fuzzy sets (Zadeh 1965) and fuzzy c-means (FCM) algorithm is the most well-known in regionalisation studies (e.g. Hall & Minns 1999; Rao & Srinivas 2006b; Srinivas et al. 2008; Sadri & Burn 2011; Asong et al. 2015; Basu & Srinivas 2015).

In RFFA studies, several researchers used hierarchical algorithms (e.g. Mosley 1981; Tasker 1982; Nathan & McMahon 1990; Burn et al. 1997) and KM algorithm (e.g. Wiltshire 1986; Bhaskar & O'Connor 1989; Burn 1989; Burn & Goel 2000) for regionalisation of catchments in various study areas. Rao & Srinivas (2006a) proposed a hybrid clustering approach for regionalisation of catchments. They introduced three hybrid algorithms to utilize the advantages of both hierarchical and partitional algorithms and also to tackle their limitations. Three hierarchical algorithms including single linkage, complete linkage and Ward's algorithm were used to provide the initial cluster centres for KM algorithms. They applied these algorithms to the watersheds of Indiana in the USA and concluded that the combination of Ward's algorithm and KM algorithm provides the best results according to some cluster validity indices and heterogeneity measures proposed by Hosking & Wallis (1997).

Hall & Minns (1999) assessed the efficiency of an FCM algorithm for regionalisation of 101 gauging stations in two areas in the UK. Also, Rao & Srinivas (2006b) applied an FCM algorithm to regionalisation of catchments of Indiana state in the USA. They used fuzzy cluster validity measures to find the optimum number of regions and suggested that measures related to both degrees of membership and data structure are more efficient than measures only related to degrees of membership. Sadri & Burn (2011) used an FCM algorithm to regional drought frequency analysis in three Canadian provinces. They also used multivariate L-moments (L-comoments) and a bivariate copula method for regional frequency analysis of droughts. Asong et al. (2015) proposed a new approach for identifying homogeneous regions for regionalisation of precipitation characteristics for the Canadian Prairie Provinces. They utilized a combination of principal component analysis, canonical correlation analysis and FCM clustering to form homogeneous precipitation regions. Basu & Srinivas (2015) proposed an approach to solve the problem of loss of information in defuzzification of the fuzzy clusters to form the final regions in RFA. Also, they stated that the threshold strategies may yield quantile under-estimation.

Furthermore, about 30 years ago, Kohonen (1982) introduced the self-organizing feature map (SOFM) or self-organizing map. SOFM is a type of artificial neural network (ANN) that can be used as a clustering technique and therefore some hydrologists used it for regionalisation in RFFA studies (e.g. Hall & Minns 1999; Hall et al. 2002; Jingyi & Hall 2004). Besides RFFA, SOFM has found several applications in improving performances of river flow forecasting models (Tiwari et al. 2013), rainfall–runoff models (Nourani & Parhizkar 2013) and groundwater quality models (Nourani et al. 2015). Also, SOFM has been used in some studies on water distribution systems (e.g. Chang et al. 2011; Mounce et al. 2016).

In the area of RFFA, Hall & Minns (1999) used SOFM and FCM algorithms for regionalisation of catchments in the southwest of England and Wales. The results indicated the feasibility of employing these methodologies on a country-wide basis. Hall et al. (2002) studied the application of data mining techniques for the regionalisation of hydrological variables. They used one dimensional (1-D) SOFM for regionalisation of three data sets from three study areas. Also, Jingyi & Hall (2004) utilized this type of ANN for regionalisation of 86 gauging stations in two provinces in China. Srinivas et al. (2008) presented a two-level SOFM-based clustering approach by combining SOFM and FCM. The results showed that this combination performs better in estimating flood quantiles at ungauged sites. In addition, Di Prinzio et al. (2011), Razavi & Coulibaly (2013), Toth (2013) and Farsadnia et al. (2014) used combinations of SOFM and various statistical methods and different data-driven techniques to RFFA.

In this study, our objective is to evaluate the potential of a hybrid approach to improve the performance of FCM algorithm for regionalisation of catchments. The used approach may reduce the variation in the clustering results because of the random initialization of FCM algorithm. In this approach, four hierarchical clustering algorithms (average linkage, complete linkage, single linkage and Ward's algorithm) and SOFM are used to provide the initial cluster centres for an FCM algorithm. Effects of proposed hybrid algorithms on the performance of FCM are examined by assessing the changes in the values of the objective function, fuzzy cluster validity indices and cluster sizes. In addition, the capabilities of these hybrid algorithms to form the homogeneous regions are evaluated based on the heterogeneity measures.

METHODS

Study area and data sets

The study area is Sefidroud basin located in the north of Iran (Figure 1). The total basin area is about 63,945 km2 and includes portions of ten provinces: Alborz, Ardebil, Azarbaijan-e-gharbi (west Azarbaijan), Azarbaijan-e-sharghi (east Azarbaijan), Ghazvin, Guilan, Hamedan, Kordestan, Mazandaran and Zanjan. Shahroud and Ghezel-ozan rivers join together and constitute Sefidroud river that flows to the Caspian Sea. Thirty-nine gauging stations over the basin were selected for use in RFFA. The annual peak flows or flood data were obtained from the electronic file provided by Iran Water Resources Management Company (WRMC).
Figure 1

Location map of Sefidroud basin and selected hydrometry stations.

Figure 1

Location map of Sefidroud basin and selected hydrometry stations.

The dependence of flood related statistics includes mean value of annual peak flows (MAF), median value of annual peak flows (MEF), mean annual flood per unit area of drainage basin (MAF/A), median annual flood per unit area of drainage basin (MEF/A), mean annual flood divided by the mean annual precipitation (MAF/P), and median annual flood divided by the mean annual precipitation (MEF/P) on the different attributes by calculating the correlations. There were noticeable values of correlation between some of these statistics and three attributes including drainage area, mean annual precipitation and runoff coefficient. Finally, five attributes were selected to form the feature vectors and evaluate dissimilarity (similarity) between the sites for use in the regionalisation. These attributes include two geographic location attributes: longitude and latitude; and three other catchment characteristics including drainage area, mean annual precipitation and runoff coefficient. In addition, the correlation matrix was used to evaluate the independence between three latter attributes. There was not any significant correlation between these three attributes. The statistical characteristics of the considered attributes in interested sites are presented in Table 1. The attributes used to form the feature vectors were obtained from the technical reports provided by Iran WRMC.

Table 1

Descriptive statistics of selected attributes to form feature vectors for use in the regionalisation

Attribute Range Mean Std. Dev. 
Longitude (decimal degree) 47.05–51.07 48.74 1.30 
Latitude (decimal degree) 35.18–37.53 36.55 0.73 
Drainage area (km229–49,300 5,591.38 11,569.86 
Mean annual precipitation (mm) 184–1,400 456.59 315.19 
Runoff coefficient 0.03–1.32 0.49 0.34 
Attribute Range Mean Std. Dev. 
Longitude (decimal degree) 47.05–51.07 48.74 1.30 
Latitude (decimal degree) 35.18–37.53 36.55 0.73 
Drainage area (km229–49,300 5,591.38 11,569.86 
Mean annual precipitation (mm) 184–1,400 456.59 315.19 
Runoff coefficient 0.03–1.32 0.49 0.34 

The logarithmic transformation was applied to the drainage area values and then the selected attributes were standardized because of their different magnitudes and dimensions. For the jth attribute in the ith initial feature vector in an n-dimensional attribute space, the jth element of the ith rescaled feature vector is defined as: 
formula
1
where represents the rescaled value of , and and denote the mean value and the standard deviation of the jth attribute respectively.

Hierarchical clustering

Hierarchical clustering algorithms are multi-stage or nested clustering procedures that can be categorized into two main groups of algorithms: agglomerative algorithms and divisive algorithms. In the regionalisation studies, the agglomerative algorithms have been used more than the divisive algorithms (e.g. Mosley 1981; Tasker 1982; Acreman & Sinclair 1986; Nathan & McMahon 1990; Burn et al. 1997; Hosking & Wallis 1997; Rao & Srinivas 2006a). For an interested set of N data points, the agglomerative algorithms begin with N clusters and each of them includes one specified feature vector. In the first step, the two nearest or most similar clusters are identified and merged. Distance or similarity between any two sites is evaluated by using a distance or dissimilarity measure such as Euclidean distance. The agglomerative clustering continues finding and merging the two nearest clusters in each step until the interested number of clusters is obtained. In contrast, divisive algorithms begin with one cluster including all feature vectors. Then, in the first step the furthest feature vector from the cluster centroid is found and separated to form a distinct cluster. The process continues until the requested number of clusters is formed (Rao & Srinivas 2008).

In this study, four agglomerative clustering algorithms are used to initialize the clusters for FCM algorithm. These algorithms include: average linkage, complete linkage, single linkage and Ward's algorithm. Due to the abundance of available information on the application of these algorithms, this information is not presented in this paper. For details, see Rao & Srinivas (2008).

SOFMs

A SOFM is a type of ANN with special capabilities in unsupervised learning and pattern recognition and therefore can be used as an efficient tool for data clustering. Two types of SOFM were proposed by Willshaw & von der Malsburg (1976) and Kohonen (1982). The latter model has found more applications in scientific researches. Also, several researchers used SOFM in regionalisation studies (Hall & Minns 1999; Hall et al. 2002; Jingyi & Hall 2004; Srinivas et al. 2008; Di Prinzio et al. 2011; Razavi & Coulibaly 2013; Toth 2013; Farsadnia et al. 2014).

The Kohonen network consists of two layers: an input layer including n-dimensional observations × and an output layer (also called Kohonen layer) consisting of c nodes for the c clusters, each of which is associated with an n-dimensional weight (Everitt et al. 2011). The number of dimensions of the output layer does not have a principal limitation, but the one and two dimensional structures are the most widely used structures in regionalisation studies. The structure of SOFM has been described in Haykin (2009) and Everitt et al. (2011) in more detail.

In this study, the R package named ‘kohonen’ developed by Wehrens & Buydens (2007) was used to perform clustering by SOFM.

Fuzzy clustering and FCM algorithm

Fuzzy clustering is a type of clustering method in which each feature vector or data point can be assigned to more than one cluster and even all clusters simultaneously with membership values in the interval [0,1].

The matrix with c rows and N columns represents the fuzzy partitions, where denotes the degree of membership of the kth feature vector, belongs to the ith cluster (). The matrix U is called the fuzzy partition matrix. Conditions for a fuzzy partition matrix are given by: 
formula
2
 
formula
3

The latter condition implies that there can be no empty cluster and there can be no class that contains all the data points (Ross 2009).

Two approaches can be used to defuzzify the fuzzy partition matrix: the maximum membership and the nearest-centre classifier (Ross 2009), but the results of Rao & Srinivas (2006b) show that fuzzy clustering can improve the homogeneity of the formed regions for RFFA.

A fuzzy cluster includes the sites whose memberships in the cluster is greater than a specified threshold value. As suggested by Rao & Srinivas (2008), the value of 1/c is believed to be a reasonable option for the threshold fuzzy membership. The total number of the feature vectors contained in all regions can increase by using fuzzy clusters and so, more flood data can be contained in the regions. Consequently, the reliable flood estimates can be provided with higher return periods.

FCM algorithm was proposed by Bezdek (1981) and because of its similarities to KM algorithm, it is sometimes called fuzzy KM. As with the KM algorithm, FCM needs the number of clusters to be specified as an input parameter to the algorithm, but the objective function of FCM contains an additional parameter named ‘fuzzifier’ (de Oliviera & Pedrycz 2007).

For a data matrix FCM algorithm is based on minimizing a squared error objective function defined as (de Oliviera & Pedrycz 2007): 
formula
4
where is a fuzzy partition matrix of X, is the ith cluster prototype, which has to be determined, and is a squared inner-product distance norm between the kth feature vector and the ith prototype. In the classical FCM algorithm, the prototypes are centres, ; therefore, the distance can be formed as below: 
formula
5
where A is the distance measure (if there is no prior knowledge, A = I), and parameter m is called the ‘fuzziness index’ or ‘fuzzifier’; it is used to control the fuzziness of each feature vector membership, . There is no theoretical basis for the optimal selection of m, and a value of m = 2.0 is often chosen (Bezdek 1981). Also, when FCM is transformed to classical KM.
The FCM algorithm is an iterative process to minimize Equation (4) while updating the cluster centres and the memberships by de Oliviera & Pedrycz (2007): 
formula
6
 
formula
7
when the difference between the updated and the previous U reaches a predefined minimum threshold or the objective function is below a certain tolerance value, the algorithm can be terminated. Also, the maximum number of iterations can be considered as a termination criterion (de Oliviera & Pedrycz 2007).

Fuzzy cluster validity indices

Specifying the number of clusters (c) is an important problem in fuzzy clustering. Selections of a different number of initial clusters result in different clustering partitions. The problem for finding an optimal c is usually called cluster validity (Wang & Zhang 2007).

Compactness and separation are two criteria proposed for clustering evaluation and selection of an optimal clustering scheme. Compactness measures closeness of feature vectors within each cluster. On the other hand, separation indicates how distinct two clusters are. It computes the ‘distance’ between two different clusters (Wang & Zhang 2007).

Several validity indices have been proposed in the literature. These indices can be categorized into two main groups. The first group indices make use only of membership values and have the advantage of being easy to compute. On the other hand, the indices of the second group involve both partition matrix U and the data set structure. Now, it is widely accepted that a better definition of a validity index always considers both partition matrix U and the data set itself (Wang & Zhang 2007). This conclusion also is verified in RFFA by Rao & Srinivas (2006b). In this study partition coefficient (PC) and partition entropy (PE) from the first group and extended Xie and Beni's index and Kwon index from the second group were used to validate clusters.

Bezdek (1974a, 1974b) presented a performance measure named PC. The index was defined as: 
formula
8

The index values range in [1/c,1].

The PE was proposed by Bezdek (1981) as below: 
formula
9
where a is the base of the logarithm. The index is computed for values of c greater than 1 and its values range in .
Pal & Bezdek (1995) modified a validity function proposed by Xie & Beni (1991) with m = 2 as follows: 
formula
10
They stated that a good partition produces a small value for the compactness, and that well-separated s will produce a high value for the separation (Wang & Zhang 2007). Also, Kwon (1998) proposed an extended version of Xie and Beni's index to eliminate its tendency to monotonically decrease when the number of clusters approaches the number of data points by adding a punishing function to the numerator of Xie and Beni's original validity index. The Kwon index was defined as (Kwon 1998): 
formula
11

The second term in the numerator in Equation (11) is an ad hoc punishing function, used to eliminate the decreasing tendency when the number of clusters c becomes very large and close to the number of data points N (Kwon 1998).

In general, optimal partition corresponds to a minimum value of and , while a maximum value of implies an optimal partition.

Computation of an FCM algorithm in this study was performed by using the R package ‘e1071’ developed by Meyer et al. (2014).

Discordancy and heterogeneity measures

Given a group of sites, the aim of screening the data is to identify those sites that are grossly discordant with the group as a whole (Hosking & Wallis 1997). Hosking & Wallis (1997) defined a discordancy measure, D, in terms of the L-moments of the site's data. They suggested critical values of D to identify discordant sites (for , ). If the value of D for a specified site exceeds the critical value, it is considered as a discordant site.

The identification of homogeneous regions is one of the most important stages of RFFA. Heterogeneity (or homogeneity) measures are used to assess whether formed regions are heterogeneous or homogeneous. Hosking & Wallis (1997) introduced three heterogeneity measures denoted by and , to estimate the degree of heterogeneity in a group of sites and to assess whether the sites might reasonably be treated as a homogeneous region. A region is regarded as ‘acceptably homogeneous’ if H < 1, ‘possibly heterogeneous’ if , and ‘definitely heterogeneous’ if (Hosking & Wallis 1997). However, in some studies, a region is considered as ‘possibly homogeneous’ if (e.g. Rao & Srinivas 2006a, 2006b; Wallis et al. 2007; Srinivas et al. 2008).

Because of the abundance of available information about L-moments theory and its application in RFFA, that information is not presented in this paper. More details can be found in Hosking & Wallis (1997).

The R package ‘lmomRFA’ written by Hosking (2014) was used to compute discordancy measure and heterogeneity measures for sites and formed regions.

A hybrid approach for fuzzy clustering

Let X represent a set of N standardized feature vectors in n-dimensional attribute space, each represents one of the N sites. Feature vectors are grouped in c (the interested number of clusters) clusters by using SOFM and four hierarchical clustering algorithms including average linkage, complete linkage, single linkage and Ward's algorithm. The cluster centres provided by different hierarchical algorithms and SOFM are used to specify initial cluster centres for FCM algorithm, separately. Then, an iterative FCM algorithm with the initialized cluster centres is applied to N rescaled feature vectors to form c fuzzy clusters. The iterative algorithm is repeated until a termination criterion is met. This approach can be considered as a fuzzy extension of hybrid clustering algorithms presented by Rao & Srinivas (2006a).

For each specified number of clusters, c, values of the objective function and fuzzy cluster validity indices obtained for each proposed algorithm can be compared with the average corresponding values related to the FCM initialized randomly in order to evaluate the performance of each hybrid algorithm. In fact, these average values are calculated for a reasonable number of performing FCM algorithms with randomly initialized cluster centres. Although specified values of c and m optimal clustering is determined by minimizing the objective function, the minimum value may depend on selecting initial cluster centres. There is no simple and certain way to determine most optimum initial cluster centres. Therefore, if an approach of selecting initial cluster centres can provide a final value of objective function lower than an average value of objective function provided by other options, then it can be used as a reasonable option to perform clustering.

Besides the items mentioned above, cluster or region size and heterogeneity measures can be used to compare performances of the different combinations of the clustering algorithms with each other. The number of sites belonging to each region is an important issue in RFFA. Jakob et al. (1999) suggested the 5 T rule as a rule of thumb. The 5 T rule means that the sum of the observed data of all stations (in station-years) in a region should be at least equal to 5 T, where T is the return period of interest. In addition, approximate homogeneity is sufficient to ensure that regional frequency analysis is much more accurate than at-site analysis (Hosking & Wallis 1997). Also, larger homogeneous regions can provide flood estimates with higher return periods.

R (2014) programming language and its GUI were used to perform the required computations and plotting graphs.

RESULTS AND DISCUSSION

In the initial stage, the discordancy measure D was applied to all the 39 selected sites. According to the values of the measure D, two sites were identified as discordant (D > 3) and data related to them were removed from data sets. Thus, the study was followed on the 37 remained sites. Further, according to the 5 T rule (Jakob et al. 1999), regionalisation of the 37 remaining sites was examined for a range of cluster numbers, .

Five clustering algorithms including average linkage, complete linkage, single linkage, Ward's algorithm, and SOFM were applied to the 37 feature vectors to initialize cluster centres for FCM algorithm. In this paper, these combinations are denoted by ALFCM, CLFCM, SLFCM, WFCM and SOFMFCM, respectively.

The ability of the hybrid approach to minimize FCM objective function is a key point to evaluate its performance. The values of FCM objective function obtained by each hybrid algorithm can be used to compare its performance with other hybrid algorithms. To evaluate the performances of the hybrid algorithms in comparison with FCM initialized randomly, the values of the objective function were used. As mentioned before, for a number of clusters c, an average value of the FCM objective function was obtained in a number of implementations of FCM algorithm initialized randomly. Each implementation included a required number of iterations to form c steady clusters. The computation procedure can be summarized as below:

  • For pth implementation, the objective function value denoted by is calculated.

  • For pth implementation, represents the mean value of the objective function in p implementation. is calculated as below: 
    formula
    12
  • In each step p, a relative difference value between and are calculated and represented by . is defined as follows: 
    formula
    13
  • The process is continued until in six sequent implementation cycles and then the process is terminated and the final value of p is denoted by pf.

Figure 2 shows, for , how varies with incremental changes in values of p. For , values of pf were approximately equal to 12, 250, 810, 670, 530 and 610, respectively. In this paper, the average values of the objective function and the cluster validity indices related to pf implementations of the randomly initialized FCM are denoted by arFCM.
Figure 2

Relative differences of the average of J values for p implementations of FCM with the random initial cluster centers for (a) two clusters, (b) three clusters, (c) four clusters, (d) five clusters, (e) six clusters, and (f) seven clusters.

Figure 2

Relative differences of the average of J values for p implementations of FCM with the random initial cluster centers for (a) two clusters, (b) three clusters, (c) four clusters, (d) five clusters, (e) six clusters, and (f) seven clusters.

FCM objective function values were calculated for and the fuzzifier values with increments equal to 0.1. Values of J of FCM algorithm for mentioned ranges are presented in Figure 3. Generally, the value of J reduces when c increases for a specified value of the fuzzifier m. It also decreases when the value of fuzzifier m increases for a specified number of clusters.
Figure 3

Optimal value of J of FCM obtained by the proposed hybrid clustering algorithms and arFCM for different values of fuzzifier (m) for (a) c = 2, (b) c = 3, (c) c = 4, (d) c = 5, (e) c = 6, and (f) c = 7. The vertical axis shows the values of objective function (J) for the hybrid algorithms, and the mean value of the objective function in pf implementations for arFCM ().

Figure 3

Optimal value of J of FCM obtained by the proposed hybrid clustering algorithms and arFCM for different values of fuzzifier (m) for (a) c = 2, (b) c = 3, (c) c = 4, (d) c = 5, (e) c = 6, and (f) c = 7. The vertical axis shows the values of objective function (J) for the hybrid algorithms, and the mean value of the objective function in pf implementations for arFCM ().

According to Figure 3(a), for c = 2, the values of the objective function J calculated for clusters formed by the five different hybrid algorithms are identical. Also, the mean values of J related to arFCM in the range are equal to those of the hybrid algorithms. In fact, there are two well-separated clusters based on the defined feature vectors and therefore, all the examined approaches and algorithms result in the same clusters and the same values of J.

As seen in Figure 3(b), for c = 3, a noticeable increase in absolute values of slopes of curves related to SLFCM, CLFCM and ALFCM algorithms occurs in . In addition, for c = 4, it is seen in Figure 3(c) that the value of J computed for SOFMFCM algorithm decreases sharply in the range . Also, in this figure, when the value of m increases from 1.1 to 1.2, a fast change is observed in the slope of the graph related to SLFCM. Furthermore, a similar change is observed in Figure 3(d) in the case of SOFMFCM algorithm. These noticeable changes occur because of sudden significant changes in configuration of the clusters provided by the examined clustering algorithms. The effects of the changes in configuration of the clusters on the values of the cluster validity measures can be seen in Figure 4.
Figure 4

Values of cluster validity indices for the proposed hybrid clustering algorithms and arFCM for different values of fuzzifier for (a) c = 2, (b) c = 3, (c) c = 4, (d) c = 5, (e) c = 6, and (f) c = 7.

Figure 4

Values of cluster validity indices for the proposed hybrid clustering algorithms and arFCM for different values of fuzzifier for (a) c = 2, (b) c = 3, (c) c = 4, (d) c = 5, (e) c = 6, and (f) c = 7.

Four cluster validity indices including partition coefficient (VPC), partition entropy (VPE), extended Xie and Beni's index () and Kwon index (VK) were used to validate the clusters formed by the hybrid algorithms and arFCM. The computed values of these indices are seen in Figure 4. It is evident from Figure 4 that VPC shows a monotonic decrease with an increase in the values of fuzzifier, and on the other hand VPE shows a monotonic increase with an increase in the values of m. In fact, both VPC and VPE always suggest m = 1.1 as the optimal value of fuzzifier. This is consistent with the results presented by Rao & Srinivas (2006b). Based on the results related to these two indices, the quality of resulting clusters always decreases with an increase in their fuzziness. In fact, these indices seem to be appropriate to look for well-separated clusters with crisp boundaries. Also, as reported by Rao & Srinivas (2006b), because of lack of a connection to data structure, appropriate clustering according to VPC and VPE, may not result in homogeneous regions. Also, it is seen in Figure 4 that VPC and VPE may not clearly distinguish between performances of the different clustering algorithms. So, it seems to be more reasonable to pay more attention to the two other indices, and Vk, in order to compare performances of the clustering algorithms.

For c = 2, values of the four cluster validity indices obtained by all the examined algorithms are nearly identical, as expected (Figure 4(a)). This result confirms the results related to the variation of the objective function for c = 2. In fact, all the used clustering algorithms form two identical clusters, because there are two potential well-defined clusters based on the feature vectors of the sites.

According to Figure 4, for c = 3, the values of and Vk for SLFCM, CLFCM and ALFCM algorithms in the range are more suitable than that of arFCM, while values of these indices for the other two algorithms are not desirable in the same range. However, in the range , all the considered algorithms present approximately the same results. For c = 4,5,6,7, based on the values of and Vk, all the hybrid algorithms often present more appropriate performances than that of arFCM, even though there are a few exceptions.

In the case of and Vk, it can be seen in Figure 4 that the values of each index for the different clustering algorithms become closer to each other when the values of the fuzzifier increases. On the other hand, the differences between the values of each index for the different clustering algorithms increases with an increase in the number of clusters.

Similar to the results related to the variation of the objective function, some sudden changes are observed in the absolute values of slopes of the graphs related to the cluster validity indices. Generally, these changes increase with an increase in the number of the clusters. Epecially, for c = 7, a considerable number of shifts in the values of and Vk are seen. These changes occur because of noticeable changes in the configuration of clusters formed by these algorithms in the mentioned ranges as seen in Figures 58.
Figure 5

Number of sites belonging to the regions formed by SLFCM algorithm for different values of fuzzifier for .

Figure 5

Number of sites belonging to the regions formed by SLFCM algorithm for different values of fuzzifier for .

Figure 6

Number of sites belonging to the regions formed by CLFCM algorithm for different values of fuzzifier for .

Figure 6

Number of sites belonging to the regions formed by CLFCM algorithm for different values of fuzzifier for .

Figure 7

Number of sites belonging to the regions formed by ALFCM algorithm for different values of fuzzifier for .

Figure 7

Number of sites belonging to the regions formed by ALFCM algorithm for different values of fuzzifier for .

Figure 8

Number of sites belonging to the regions formed by WFCM algorithm for different values of fuzzifier for .

Figure 8

Number of sites belonging to the regions formed by WFCM algorithm for different values of fuzzifier for .

As seen in Table 2, the mean values of J and the cluster validity indices computed for all the hybrid algorithms for over the range are more desirable than that of arFCM (lower values of J and VPE, and VK measures, and higher values of VPC). These results can show the efficiency of the hybrid approach to improve the performance of the FCM algorithm.

Table 2

Mean values of J and cluster validity indices computed for all the examined clustering algorithms for c = 2, 3, … , 7 over the range m ∈ [1.1, 2.5]

Algorithm Objective function (JVPC VPE VXB,m VK 
arFCM 39.175 0.710 0.575 0.294 19.359 
SLFCM 39.111 0.722 0.554 0.225 15.405 
CLFCM 38.636 0.720 0.557 0.230 15.894 
ALFCM 38.707 0.721 0.556 0.226 15.510 
WFCM 37.939 0.716 0.563 0.258 17.256 
SOFMFCM 38.821 0.714 0.568 0.269 17.360 
Algorithm Objective function (JVPC VPE VXB,m VK 
arFCM 39.175 0.710 0.575 0.294 19.359 
SLFCM 39.111 0.722 0.554 0.225 15.405 
CLFCM 38.636 0.720 0.557 0.230 15.894 
ALFCM 38.707 0.721 0.556 0.226 15.510 
WFCM 37.939 0.716 0.563 0.258 17.256 
SOFMFCM 38.821 0.714 0.568 0.269 17.360 

As evidenced by the results shown in Table 3, using each of the hybrid algorithms produced results better than those obtained by arFCM, at least in 62% of applications. In the case of minimizing the objective function, the WFCM algorithm provided the best results.

Table 3

Percentage of points over the range in which each proposed hybrid algorithm showed better performance than arFCM according to values of the objective function and cluster validity indices for

    Measure
 
Algorithm Objective function (JVPC VPE VXB,m VK 
SLFCM 62 87 88 93 92 
CLFCM 72 79 80 97 91 
ALFCM 68 82 86 96 90 
WFCM 84 65 67 81 76 
SOFMFCM 68 62 62 77 76 
    Measure
 
Algorithm Objective function (JVPC VPE VXB,m VK 
SLFCM 62 87 88 93 92 
CLFCM 72 79 80 97 91 
ALFCM 68 82 86 96 90 
WFCM 84 65 67 81 76 
SOFMFCM 68 62 62 77 76 

To specify final regions and identify sites assigned to them, for a specified value of c () a membership threshold 1/c suggested by Rao & Srinivas (2008) was considered. In fact, for each region or cluster, the sites belonging to it with a degree of membership greater than or equal to 1/c were specified as final members of the interested region. In the case of a site assigned to more than one region, its flood quantiles are computed as a weighted mean of its quantiles in different regions that it is assigned to them.

As seen in Figures 59 for c = 2, the total number of sites belonging to all the clusters using each of the hybrid algorithms is equal to 37. This implies that each site is assigned only to one region and no site is assigned to both regions. For c = 2, the value of the threshold is equal to 1/2. In addition, the condition presented in Equation (2) must be satisfied. Therefore, it is very unlikely that a site belongs to the two regions simultaneously when c = 2. In other words, if we use the threshold value 1/c for c = 2, the final results of regionalisation using a c-means algorithm would be very close to the results of the classic KM algorithm.

For c = 3, SLFCM, CLFCM and ALFCM result in an identical regionalisation. On the other hand, the regionalisations provided by WFCM and SOFMFCM are the same, but different from that related to the three former algorithms. For c = 4 and c = 6, the performances of SLFCM, CLFCM, ALFCM and WFCM algorithms in forming the regions are the same. There is only one exception related to SLFCM in m = 1.1 for c = 4. It should be noted that the region or cluster numbers may vary from algorithm to algorithm. As seen in the figures, for c = 4, the configuration of regions formed by SOFMFCM is different from that of other algorithms. For c = 5 only regionalisations performed by CLFCM and WFCM are completely the same over the range and finally, for c = 7, there are fewer similarities between the regionalisations performed by the five hybrid algorithms.

Figure 5 shows some sudden changes in configuration of the regions formed by the SLFCM algorithm for c = 5 and c = 7. Also, it is evident from Figure 9 that configuration of regions formed by an SOFMFCM algorithm changes in some ranges considerably. These changes or shifts correspond to those shown in Figures 3 and 4 related to the values of the objective function and the cluster validity indices.
Figure 9

Number of sites belonging to the regions formed by SOFMFCM algorithm for different values of fuzzifier for .

Figure 9

Number of sites belonging to the regions formed by SOFMFCM algorithm for different values of fuzzifier for .

In general, as evidenced by Figures 59, for each value of c, the total number of sites belonging to all clusters increases with an increase in value of fuzzifier (m), i.e. increase in fuzziness of clusters. This result confirms the results presented by Rao & Srinivas (2006b). In addition, based on the results shown in the figures, SOFMFCM shows a higher ability to increase the total number of sites belonging to all the regions, especially for c = 6 and c = 7.

Descriptive statistics of the sizes of the regions and the number of the singleton clusters obtained by each used algorithm are presented in Table 4. As seen in Table 4, SLFCM and ALFCM algorithms tend to form regions with noticeable differences in their sizes (i.e. a small number of large regions and several small regions). These findings are consistent with the results reported by Rao & Srinivas (2006a). On the other hand, CLFCM, WFCM and SOFMFCM algorithms often form regions with sizes relatively close to each other. Also, they do not tend to form regions including one site (singletons). In general, singleton regions are not appropriate to perform RFFA and therefore the clustering algorithms that tend to form singleton clusters are not reasonable choices for regionalisation.

Table 4

Main statistical characteristics of sizes of the regions formed by the hybrid algorithms

Algorithm Mean Std. dev. Max Min Singletons Total number of sites belonging to all clusters in m = 2.5 for c = 7 
SLFCM 9.595 5.978 26 65 
CLFCM 9.578 5.757 26 65 
ALFCM 9.560 5.799 26 65 
WFCM 9.567 5.044 26 65 
SOFMFCM 9.564 4.851 26 67 
Algorithm Mean Std. dev. Max Min Singletons Total number of sites belonging to all clusters in m = 2.5 for c = 7 
SLFCM 9.595 5.978 26 65 
CLFCM 9.578 5.757 26 65 
ALFCM 9.560 5.799 26 65 
WFCM 9.567 5.044 26 65 
SOFMFCM 9.564 4.851 26 67 

In some cases, it can be seen that the results related to the objective function values and the cluster validity measures of two hybrid algorithms are different, but their final clusters (regions) are the same. This is related to the finalization process. Finalization of the clusters using a membership threshold decreases differences between clusters formed by different algorithms. To understand this concept, consider two feature vectors j and k that for c = 4, their memberships in four clusters are and for i = 1,2,3,4 respectively. After finalization of the clusters using a threshold 1/4, assignments of the feature vectors to both of them are the same ([1,0,0,0]). For this reason, some differences seen between values of J or cluster validity indices computed for the different clustering algorithms are not recognizable in the final clusters formed by them.

Homogeneity of the regions formed by each hybrid algorithm were assessed by the heterogeneity measures and introduced by Hosking & Wallis (1997). According to the results, for c = 2, all the regions formed by all the hybrid algorithms over the range are homogeneous. As mentioned before, for c = 2, there are two well-defined regions that can be identified by all the hybrid algorithms and so, the heterogeneity measures are identical for the regions provided by the different hybrid algorithms. In general, c = 2 can be chosen as the optimum number of the regions in order to partition Sefidroud basin into homogeneous regions for RFFA.

For the regions formed by SLFCM and ALFCM, in many cases the values of the heterogeneity measures are relatively identical, because the regions formed by these two algorithms are similar to each other. In general, for c = 2,3,4 and c = 6,7 there is no noticeable difference between homogeneity of the regions provided by these two algorithms. Also, the regions formed by the WFCM algorithm are often very similar to the regions provided by the CLFCM algorithm. Therefore the computed heterogeneity measures are very close to each other for c = 4, c = 5 and c = 6. However, it should be noted that computations of H measures are performed using simulation of a number of homogeneous regions with specified regional L-moment ratios, and therefore some limited variations may be seen in values of heterogeneity measures computed for a specified region in two different iterative computations.

Table 5 shows the ratio of the homogeneous regions formed by each hybrid algorithm according to each heterogeneity measure H to all the regions provided by it. The first three columns of the table show the ratio of homogeneous regions (H < 1), each based on one of the heterogeneity measures (). The two last columns of the table are related to the assessment of homogeneity of the formed regions based on the values of the three heterogeneity measures (), simultaneously. It is seen from the table that if the regions with H < 1 are considered as ‘acceptably homogeneous’, then CLFCM, WFCM and SOFMFCM algorithms provide a maximum number of homogeneous regions respectively. However, if the criteria H < 2 suggested by Rao & Srinivas (2006a, 2006b), Wallis et al. (2007) and Srinivas et al. (2008) is used to identify ‘possibly homogeneous’ regions, then the percentage of the homogeneous regions provided by each hybrid algorithm grows considerably, especially in the cases of WFCM and SOFMFCM, and all the formed regions can be considered as homogeneous.

Table 5

Ratio of homogeneous regions formed by each hybrid algorithm to all the regions provided by it

    Ratio (%)
 
Algorithm H1 < 1 H2 < 1 H3 < 1 H1 < 1&H2 < 1&H3 < 1 H1 < 2&H2 < 2&H3 < 2 
SLFCM 86.4 74.8 83.7 65.00 98.89 
CLFCM 87.9 76.0 84.2 68.89 98.89 
ALFCM 87.9 74.3 84.0 63.89 98.89 
WFCM 87.9 73.8 87.7 66.11 100 
SOFMFCM 87.2 76.0 90.6 67.78 100 
    Ratio (%)
 
Algorithm H1 < 1 H2 < 1 H3 < 1 H1 < 1&H2 < 1&H3 < 1 H1 < 2&H2 < 2&H3 < 2 
SLFCM 86.4 74.8 83.7 65.00 98.89 
CLFCM 87.9 76.0 84.2 68.89 98.89 
ALFCM 87.9 74.3 84.0 63.89 98.89 
WFCM 87.9 73.8 87.7 66.11 100 
SOFMFCM 87.2 76.0 90.6 67.78 100 

In general, it seems to be appropriate to divide Sefidroud basin into two regions based on the feature vectors related to the sites. As already mentioned, there are two well-defined regions that can be identified by all the examined hybrid algorithms. The regions are shown in Figure 10. These two regions also satisfy the homogeneity conditions completely. In addition, the lower number of regions usually results in forming larger regions so that it can help to provide flood quantiles related to the higher return periods. Although, the increase in the number of regions often causes the increase in the total number of sites belonging to all the regions, it usually results in smaller regions. Therefore, c = 2 is a reasonable choice for regionalisation in Sefidroud basin. For the current case study, there is no difference between performances of the examined hybrid algorithms for c = 2. However, for the other values of c, WFCM, SOFMFCM and CLFCM algorithms generally show better performances in forming medium-size homogeneous regions.
Figure 10

Final assignment of the sites to two optimal regions in Sefidroud basin.

Figure 10

Final assignment of the sites to two optimal regions in Sefidroud basin.

As seen in Figure 10, the first and second formed regions include 26 and 11 sites respectively. Twenty-five sites of the first region are located in the west and middle area of Sefidroud basin on the Ghezel-ozan river subbasin. On the other hand, all the sites of the second region are located in the east area of Sefidroud basin, eight of them on the Shahroud river subbasin, and the three remaining sites around the main reach of Sefidroud river. Besides the geographical contiguity, a look at the feature vector related to the sites of the second region shows that these sites have smaller drainage areas and greater values of the mean annual precipitation than those of the sites of the first region. Also, they often include higher values of the runoff coefficient compared to the sites of the first region.

According to the results obtained in the current study, compared to the non-hybrid approach, the hybrid approach can often improve the performance of a FCM algorithm to form more desirable clusters based on the values of the objective function and the cluster validity indices. Furthermore, while in the non-hybrid approach the initial cluster centres for FCM algorithm must be defined, it is not required in the hybrid approach, because the initial cluster centres used by FCM algorithm are specific feature vectors provided by the hierarchical clustering algorithms or SOFM. In the non-hybrid approach, the initial cluster centres for FCM algorithm are often defined randomly and so, the final formed clusters may change in each implementation of clustering algorithm with a change in the the initial cluster centres. This problem is also solved by using the hybrid approach. This way, while it is not required to specify initial cluster centres because of using the hierarchical clustering algorithms or SOFM, the dynamic structure of FCM algorithm can improve the quality of the final clustering in the hybrid approach. In fact, contrary to the hierarchical clustering algorithms, in FCM algorithm, the data points can move from one cluster to another to minimize the objective function.

CONCLUSIONS

As mentioned before, using fuzzy clusters in regionalisation can increase the total number of feature vectors included in all regions. This causes a growth in data contained in each region. Consequently, reliable flood estimates can be provided with higher return periods. This may be an important motive to use FCM in RFFA studies. On the other hand, specifying initial cluster centres for an FCM algorithm affects the final resulting clusters. In this paper, a hybrid approach was proposed to improve performance of FCM algorithm for regionalisation of catchments. SOFM and four hierarchical algorithms including average linkage, complete linkage, single linkage and Ward's algorithm were used to determine the initial cluster centres for FCM algorithm.

To evaluate the performances of the proposed hybrid algorithms, for each specified number of clusters, c, values of the objective function of FCM, J, and the four fuzzy cluster validity indices obtained for each proposed algorithm were compared with average corresponding values calculated for a reasonable number of performing FCM algorithm with randomly initialized cluster centres. Results showed that according to the values of both objective function and the cluster validity indices, the performances of the hybrid algorithms are often more appropriate than those of randomly initialized FCM. In the case of minimizing the objective function, the combination of Ward's algorithm and FCM provided the best results, but according to cluster validity indices other hybrid algorithms such as SLFCM and CLFCM presented the most desirable results. Also, the results showed that the PC and the PE may be not efficient enough to determine optimal fuzzy clusters.

The comparison of sizes of the regions formed by the hybrid algorithms showed that SLFCM and ALFCM algorithms tend to form the regions with noticeable differences in their sizes (i.e. a small number of large regions and several small regions). On the other hand, CLFCM, WFCM and SOFMFCM algorithms often form regions with sizes relatively close to each other. Also, they do not tend to form regions including one site (singletons). According to these results, it seems to be better to use the algorithms of the latter group (CLFCM, WFCM and SOFMFCM) when our goal is to perform an RFFA in order to estimate flood quantiles for all sites included in an interested area. However, if flood estimation for a specified site is required, then it may be more reasonable to evaluate the performances of the different hybrid clustering algorithms and to choose the one that can provide reliable flood estimations with the largest return period for the interested site.

In this study, regionalisation is performed by using geographical location, physiographical and meteorological attributes. The heterogeneity measures were computed based on flood data and so there may be no direct relation between the performance of clustering algorithm and homogeneity of regions. However, it was seen from the results that the homogeneity of the regions formed by WFCM, SOFMFCM and CLFCM algorithms were at a higher rate than those of SLFCM and ALFCM algorithms.

In the case of Sefidroud basin, as already mentioned, there are two well-defined regions identified by all the examined hybrid algorithms. Also, these two regions are completely homogeneous based on the values of the heterogeneity measures. Further, c = 2 is an appropriate choice for the number of regions because it results in forming larger regions in comparison with those provided by the greater values of c. For the current case study, there is no difference between performances of the examined hybrid algorithms for c = 2. However, in general, for the other values of c, WFCM, SOFMFCM and CLFCM algorithms shows better performances in forming medium-size homogeneous regions.

For future work, it may be very useful to evaluate the effects of feature selection and feature weighting methods on the performance of hybrid clustering algorithms, especially in RFFA studies.

REFERENCES

REFERENCES
Basu
B.
Srinivas
V. V.
2013
Formulation of a mathematical approach to regional frequency analysis
.
Water Resour. Res.
49
,
6810
6833
.
Bezdek
J. C.
1974a
Numerical taxonomy with fuzzy sets
.
J. Math. Biol.
1
,
57
71
.
Bezdek
J. C.
1974b
Cluster validity with fuzzy sets
.
J. Cybernetics
3
,
58
72
.
Bezdek
J. C.
1981
Pattern Recognition with Fuzzy Objective Function Algorithms
.
Plenum Press
,
New York
.
Bhaskar
N. R.
O'Connor
C. A.
1989
Comparison of method of residuals and cluster analysis for flood regionalization
.
J. Water Resour. Plan. Manage.
115
,
793
808
.
Burn
D. H.
1989
Cluster analysis as applied to regional flood frequency
.
J. Water Resour. Plan. Manage.
115
,
567
582
.
Burn
D. H.
Zrinji
Z.
Kowalchuk
M.
1997
Regionalization of catchments for regional flood frequency analysis
.
J. Hydrol.
2
,
76
82
.
Dalrymple
T.
1960
Flood Frequency Analysis
.
Water Supply Paper 1543A
,
US Geol. Surv.
,
Washington
, p.
51
.
de Oliviera
J. V.
Pedrycz
W.
(eds)
2007
Advances in Fuzzy Clustering and its Applications
.
Wiley
,
New York
.
Di Prinzio
M.
Castellarin
A.
Toth
E.
2011
Data-driven catchment classification: application to the pub problem
.
Hydrol. Earth Syst. Sci.
15
,
1921
1935
.
Everitt
B. S.
Landau
S.
Leese
M.
Stahl
D.
2011
Cluster Analysis
.
Wiley
,
New York
.
Farsadnia
F.
Rostami Kamrood
M.
Moghaddam Nia
A.
Modarres
R.
Bray
M. T.
Han
D.
Sadatinejad
J.
2014
Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps
.
J. Hydrol.
509
,
387
397
.
Florek
K.
Łukaszewicz
J.
Perkal
J.
Steinhaus
H.
Zubrzycki
S.
1951a
Sur la liaison et la division des points d'un ensemble fini
.
Colloq. Math.
3
(
2
),
282
285
.
Florek
K.
Åukaszewicz
J.
Perkal
J.
Zubrzycki
S.
1951b
Taksonomia Wroclawska
.
Przeglad Antropol.
17
,
193
211
.
Gower
J. C.
Ross
G. J. S.
1969
Minimum spanning trees and single linkage cluster analysis
.
Appl. Stat.
18
(
1
),
54
64
.
Hall
M. J.
Minns
A. W.
1999
The classification of hydrologically homogeneous regions
.
Hydrol. Sci. J.
44
,
693
704
.
Hall
M. J.
Minns
A. W.
Ashrafuzzaman
A. K. M.
2002
The application of data mining techniques for the regionalisation of hydrological variables
.
Hydrol. Earth Syst. Sci.
6
,
685
694
.
Haykin
S.
2009
Neural Networks: A Comprehensive Foundation
.
3rd edn
.
Prentice Hall
,
New York
.
Hosking
J. R. M.
1990
L-moments: analysis and estimation of distributions using linear combinations of order statistics
.
J. R. Stat. Soc. B
52
,
105
124
.
Hosking
J. R. M.
2014
Regional frequency analysis using L-moments. R package, version 3.0. Available from: http://CRAN.R-project.org/package=lmomRFA
.
Hosking
J. R. M.
Wallis
J. R.
1997
Regional Frequency Analysis: An Approach Based on L-Moments
.
Cambridge University Press
,
Cambridge
.
Jakob
D.
Reed
D. W.
Robson
A. J.
1999
Choosing a Pooling-Group, Chapter C6. In:
Flood Estimation Handbook, vol. 3, Statistical Procedures for Flood Frequency Estimation
.
Institute of Hydrology
,
Wallingford
,
UK
.
Jardine
C. J.
Jardine
N.
Sibson
R.
1967
The structure and construction of taxonomic hierarchies
.
Math. Biosci.
1
(
2
),
173
179
.
Johnson
S. C.
1967
Hierarchical clustering schemes
.
Psychometrika
32
(
3
),
241
254
.
Kwon
S. H.
1998
Cluster validity index for fuzzy clustering
.
Electron. Lett.
34
,
2176
2177
.
Meyer
D.
Dimitriadou
E.
Hornik
K.
Weingessel
A.
Leisch
F.
2014
e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-3. Available from: http://CRAN.R-project.org/package=e1071
.
Mounce
S. R.
Blokker
E. J. M.
Husband
S. P.
Furnass
W. R.
Schaap
P. G.
Boxall
J. B.
2016
Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems
.
J. Hydroinform.
18
(
1
),
96
114
.
Pal
N. R.
Bezdek
J. C.
1995
On cluster validity for fuzzy c-means model
.
IEEE Trans. Fuzzy Syst.
3
,
370
379
.
Rao
A. R.
Srinivas
V. V.
2006a
Regionalization of watersheds by hybrid cluster analysis
.
J. Hydrol.
318
,
37
56
.
Rao
A. R.
Srinivas
V. V.
2006b
Regionalization of watersheds by fuzzy cluster analysis
.
J. Hydrol.
318
,
57
79
.
Rao
A. R.
Srinivas
V. V.
2008
Regionalization of Watersheds: An Approach Based on Cluster Analysis
.
Springer
,
The Netherlands
.
R Core Team
2014
R: A Language and Environment for Statistical Computing
.
R Foundation for Statistical Computing
,
Vienna
,
Austria
.
Available from: www.R-project.org/
.
Ross
T. J.
2009
Fuzzy Logic with Engineering Applications
.
John Wiley & Sons
,
New York
.
Sneath
P. H.
1957
The application of computers to taxonomy
.
J. Gen. Microbiol.
17
(
1
),
201
226
.
Sneath
P. H.
1966
A comparison of different clustering methods as applied to randomly-spaced points
.
Class. Soc. Bull.
1
(
2
),
2
.
Sneath
P. H.
Sokal
R. R.
1973
Numerical Taxonomy: The Principles and Practice of Numerical Classification
.
W.H. Freeman and Company
,
San Francisco
.
Sokal
R. R.
Michener
C. D.
1958
A statistical method for evaluating systematic relationships
.
Uni. Kansas Sci. Bull.
38
,
1409
1438
.
Srinivas
V. V.
Tripathi
S.
Rao
A. R.
Govindaraju
R. S.
2008
Regional flood frequency analysis by combining self-organizing feature maps and fuzzy clustering
.
J. Hydrol.
348
,
148
166
.
Tasker
G. D.
1982
Comparing methods of hydrologic regionalization
.
Water Resour. Bull.
18
,
965
970
.
Tiwari
M. K.
Song
K. Y.
Chatterjee
C.
Gupta
M. M.
2013
Improving reliability of river flow forecasting using neural networks, wavelets and self-organising maps
.
J. Hydroinform.
15
(
2
),
486
502
.
Wallis
J. R.
Schaefer
M. G.
Barker
B. L.
Taylor
G. H.
2007
Regional precipitation-frequency analysis and spatial mapping for 24-hour and 2-hour durations for Washington State
.
Hydrol. Earth Syst. Sci.
11
(
1
),
415
442
.
Wang
W.
Zhang
Y.
2007
On fuzzy cluster validity indices
.
Fuzzy Sets Syst.
158
,
2095
2117
.
Ward
J. H.
Jr
1963
Hierarchical grouping to optimize an objective function
.
J. Am. Stat. Assoc.
58
,
236
244
.
Wehrens
R.
Buydens
L. M. C.
2007
Self- and super-organising maps in R: the kohonen package
.
J. Stat. Software
21
(
5
),
1
19
.
Willshaw
D. J.
von der Malsburg
C.
1976
How patterned neural connections can be set up by selforganization
.
Proc. R. Stat. Soc. London B.
194
,
431
445
.
Xie
X. L.
Beni
G.
1991
A validity measure for fuzzy clustering
.
IEEE Trans. Pattern Analysis Machine Intell.
13
,
841
847
.
Zadeh
L. A.
1965
Fuzzy sets
.
Inform. Control
8
,
338
353
.