Lack of streamflow data is one of the main limitations in hydrologic studies, and streamflow regionalization is one way of overcoming it. The identification of hydrologically homogeneous regions is the main and most important stage of regionalization. In this study, homogeneous flow regions were identified by fuzzy c-means (FCM) cluster analysis of morpho-climatic and streamflow characteristics at 208 stream gauges in the Amazon region. The optimal number of clusters was identified with the PBM validation index, which was maximized for ten clusters with a fuzzification parameter of 1.6. The resulting ten groups were well defined and reveal the hydrologic similarity of the Amazon region.

Knowledge of the hydrologic behavior and flows in a river basin is very important for water resource planning and management, and it requires flow rates to be quantified from time series recorded at stream gauges. However, such hydrologic information is not always available, either because there are few stream gauges or because the observation periods are short. Moreover, observations are valid only at the measurement sites, while water resource projects are rarely located at stream gauges.

Streamflow data can be estimated by hydrologic regionalization, which transfers information between locations by taking advantage of observed data from another geographical area with similar characteristics. Regionalization techniques can be applied to variables such as rainfall and streamflow, probability distribution parameters, hydrologic indicators in general, rainfall-runoff model parameters and hydrologic functions such as flow duration curves. However, good regionalization requires regions with hydrologically homogeneous behavior, i.e., regions whose physical and climatic characteristics make them hydrologically similar.

Cluster analysis methods generally follow either a hierarchical or a partitional approach. Hierarchical and non-hierarchical algorithms are widely used to identify homogeneous regions of precipitation (Lin & Chen 2006; Álvarez et al. 2011; Nasseri & Zahraie 2011; Farsadnia et al. 2014; Awan et al. 2015) and/or discharge (Kahya et al. 2008; Srinivas et al. 2008; Rianna et al. 2011; Tsakiris et al. 2011; Dikbas et al. 2013; Latt et al. 2015).

In the partitional approach, the fuzzy c-means (FCM) method has frequently been used in hydrology to identify homogeneous streamflow regions (Rao & Srinivas 2006). FCM, generalized from the K-means algorithm by Bezdek (1981), allows a feature vector to belong to more than one cluster, with a different degree of membership in each. Rao & Srinivas (2006) used FCM clustering to determine statistically homogeneous regions in Indiana, USA, for regional flow analysis. Sadri & Burn (2011) employed FCM procedures to define homogeneous regions in the Canadian provinces of Alberta, Saskatchewan and Manitoba. The methodology was applied to the hydrologic records of 36 flow monitoring sites, based on bivariate criteria (severity and duration), and the authors confirmed its usefulness for delimiting homogeneous regions. Satyanarayana & Srinivas (2011) presented an approach based on FCM cluster analysis to identify homogeneous rainfall regions in India from large-scale atmospheric variables, location attributes and rainfall seasonality. In a classification of rainfall series from 188 rain gauges in Turkey, Dikbas et al. (2012) applied the FCM method and found six to be the ideal number of hydrologically homogeneous regions, based on total annual precipitation, its coefficient of variation, latitude and longitude. Sahin & Cigizoglu (2012) applied cluster analysis methods, including the Ward method and a combination of neural network and FCM methods, to identify homogeneous climate and precipitation sub-regions in Turkey; with a performance above 95%, the neuro-fuzzy method proved applicable to cluster analysis problems. Nourani & Komasi (2013) used the Integrated Geomorphological Adaptive Neuro-Fuzzy Inference System (IGANFIS) model for rainfall-runoff modeling at several stations in the Eel River Basin, California, USA; the input data were grouped into clusters (homogeneous groups) by the FCM method to improve the model's efficiency. Goyal & Gupta (2014) identified four homogeneous rainfall regions in northeastern India using FCM cluster analysis. Bharath & Srinivas (2015) employed FCM to delimit homogeneous hydro-meteorological regions, with precipitation and temperature as the two key variables.

Although the FCM algorithm always converges, it does not always reach the global minimum of the objective function, because its results depend strongly on the initial values, which are usually assigned at random. It is therefore necessary to determine which partitions are meaningful. Partitions are evaluated with cluster validity indexes, which establish which partition yields the best grouping structure for a dataset. Several such indexes have been proposed. For example, the PBM index (Pakhira et al. 2004) evaluates the geometric structure of a partition, i.e., whether the clusters generated are compact and well separated. It is a maximization index: the higher the PBM value, the better the partition.

In this study, the FCM method was applied to a database of variables that explain the occurrence of flow rates in order to identify homogeneous streamflow regions in the Amazon region, and the resulting partitions were validated with the PBM index.

Study Area

The study area comprises watersheds in the Amazon region between 5°N and 18°S and between 42°W and 74°W. Within Brazil it includes the states of Acre, Amapá, Amazonas, Mato Grosso, Pará, Roraima, Rondônia and Tocantins, and part of Maranhão. It also extends into neighboring countries such as French Guiana, Venezuela, Colombia, Peru and Bolivia (Figure 1).

Figure 1

Amazon region river basins, and stream and rainfall gauges.


Data

Streamflow behavior in a river basin may be influenced by both morphometry and climate. The drainage area, for instance, defines the limits of the area that supplies the basin with water, and the amount of precipitation contributes to the rivers' streamflow volumes. River length and basin perimeter are related to the basin's shape and may influence the time of concentration and the peak streamflow. The variables used in this study to identify homogeneous flow regions are the drainage area (A), basin perimeter (Pe), river length (L), mean annual precipitation (P) and long-term mean flow (Qm). These variables were chosen, among those related to flow rate, because they are relatively easy to obtain from current Geographic Information Systems (GIS). Rainfall and streamflow data were retrieved from the database system of the Brazilian Water Agency (ANA).

Initially, 208 rainfall and streamflow gauges were selected (Figure 1) from the ANA Hydrological Information System, HIDROWEB (ANA 2016). The stations were chosen because their records were consistent over the period 1975 to 2012. Rainfall and streamflow data were stored in electronic spreadsheets, and the mean annual rainfall and long-term mean streamflow were calculated for each gauge. The drainage area, main-river length and perimeter of each basin were derived in GIS from the Brazilian Digital Elevation Model, MDE (Miranda 2005).
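The text does not state how the annual statistics were aggregated from the daily HIDROWEB series. A minimal sketch, assuming the conventional definitions (mean annual precipitation as the mean of yearly totals, and long-term mean flow as the mean of all daily flows, both restricted to 1975 to 2012), could look as follows. The function names and the (year, value) input layout are hypothetical, and gap handling is deliberately left out.

```python
from collections import defaultdict
from statistics import mean

def mean_annual_precipitation(daily_precip, start=1975, end=2012):
    """Mean of annual precipitation totals (mm) over the study period.

    daily_precip: iterable of (year, precipitation_mm) pairs, one per day.
    Assumes every year present is complete; gap filling is not shown.
    """
    totals = defaultdict(float)
    for year, p in daily_precip:
        if start <= year <= end:
            totals[year] += p
    return mean(totals.values())

def long_term_mean_flow(daily_flow, start=1975, end=2012):
    """Long-term mean flow Qm (m3/s): mean of all daily flows in the period.

    daily_flow: iterable of (year, flow_m3s) pairs, one per day.
    """
    return mean(q for year, q in daily_flow if start <= year <= end)

# Toy usage with made-up values (not HIDROWEB data); 2013 falls outside the period.
precip = [(1975, 10.0), (1975, 5.0), (1976, 20.0), (2013, 99.0)]
flows = [(1975, 100.0), (1976, 200.0), (1976, 300.0)]
print(mean_annual_precipitation(precip))  # mean of the 1975 and 1976 totals
print(long_term_mean_flow(flows))
```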

The variables range from 491 to 3,911,283 km² (A), 134 to 17,231 km (Pe), 36 to 427,531 km (L), 813 to 3,539 mm (P) and 2 to 170,013 m³/s (Qm).

FCM Algorithm

The FCM algorithm was proposed by Dunn (1973) and generalized by Bezdek (1981). It is a multivariate analysis technique that replaces the binary membership of classical set theory with degrees of membership, so that an element may belong to one or more sets with a degree of membership between 0 and 1. Because of this fuzziness, the results can be expected to provide more information for explaining hydrologic processes than conventional (crisp) methods.

The FCM algorithm is iterative: new centroids and degrees of membership are calculated at each iteration. Data are partitioned into fuzzy clusters by minimizing the objective function J_m (Equation (1)), whose value is also used to check the convergence of the algorithm. The function depends on the fuzzification parameter m, which must satisfy 1 < m < ∞.
$$J_m(U,V)=\sum_{k=1}^{n}\sum_{i=1}^{c}\left(u_{ik}\right)^{m}\lVert x_k-v_i\rVert^{2} \qquad (1)$$
where u_ik is the degree of membership of x_k in cluster i, v_i is the centroid of cluster i, n is the number of data points and c is the number of clusters.
According to Hall & Minns (1999), the fuzzification parameter m controls the degree of fuzziness of the classification. As m approaches 1, the clusters acquire crisp boundaries equivalent to those of K-means; as m increases, the boundaries become increasingly diffuse. Ross (1995) recommends values in the range 1.25 ≤ m ≤ 2. The centroids are updated with Equation (2) and the degrees of membership with Equation (3).
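$$v_i^{(t)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(t)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(t)}\right)^{m}} \qquad (2)$$
$$u_{ik}^{(t+1)}=\left[\sum_{j=1}^{c}\left(\frac{\lVert x_k-v_i^{(t)}\rVert}{\lVert x_k-v_j^{(t)}\rVert}\right)^{2/(m-1)}\right]^{-1} \qquad (3)$$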

In Equation (2), v_i^(t) is the centroid of cluster i (i = 1, …, c) at iteration t. Once the centroids have been calculated, the degrees of membership are obtained from Equation (3).
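To illustrate the effect of m described by Hall & Minns (1999), the sketch below evaluates the membership update of Equation (3) for a hypothetical point lying at distance 1 from one centroid and distance 2 from another (c = 2). As m approaches 1, the membership in the nearer cluster tends to 1, a crisp K-means-like assignment, while at m = 2 it falls to 0.8. The distances and the function name are illustrative only.

```python
def membership_two_clusters(d1, d2, m):
    """Degree of membership in cluster 1 for a point at distance d1 from
    centroid 1 and d2 from centroid 2 (Equation (3) with c = 2)."""
    if m <= 1.0:
        raise ValueError("the fuzzification parameter must satisfy m > 1")
    # The j = 1 term of the sum is (d1/d1)^(2/(m-1)) = 1.
    ratio = (d1 / d2) ** (2.0 / (m - 1.0))
    return 1.0 / (1.0 + ratio)

# Hypothetical distances: the point is twice as far from centroid 2.
for m in (1.1, 1.25, 1.6, 2.0):
    print(f"m = {m}: u1 = {membership_two_clusters(1.0, 2.0, m):.4f}")
```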

PBM Validation index

The PBM index proposed by Pakhira et al. (2004) was used to validate the partitions produced by the FCM method. It is the square of the product of three factors (Equation (4)). Maximizing the PBM index favors partitions with a small number of compact, well-separated clusters.
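$$\mathrm{PBM}(K)=\left(\frac{1}{K}\times\frac{E_1}{E_K}\times D_K\right)^{2} \qquad (4)$$
where K is the number of clusters and
$$E_K=\sum_{k=1}^{K}E_k \qquad (5)$$
such that
$$E_k=\sum_{j=1}^{n}u_{kj}\,\lVert x_j-z_k\rVert \qquad (6)$$
and
$$D_K=\max_{i,j=1,\dots,K}\lVert z_i-z_j\rVert \qquad (7)$$
where n is the total number of points in the dataset, x_j is the jth data point, U(X) = [u_kj]_(K×n) is the partition matrix of the data, z_k is the center of the kth cluster, and E_1 is the value of E_K for K = 1, i.e., the sum of the distances of all points from the centroid of the whole dataset. The aim is to maximize the index to obtain the true number of clusters: the higher the PBM index, the better the partition. To find the best partition, the algorithm must be run for several values of K, and the value that yields the highest PBM index is chosen.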
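Equations (4) to (7) translate directly into a few lines of NumPy. The sketch below is a minimal illustration, assuming Euclidean distances; the function name `pbm_index` and the four-point toy dataset are hypothetical. It compares the natural two-cluster split of the toy data with a deliberately poor one.

```python
import numpy as np

def pbm_index(X, U, Z):
    """PBM validity index of Pakhira et al. (2004), Equations (4)-(7).

    X: data matrix (n x p); U: partition matrix (K x n), crisp or fuzzy;
    Z: cluster centers (K x p). Euclidean distances are assumed.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    Z = np.asarray(Z, dtype=float)
    K = Z.shape[0]
    # E_1: E_K for K = 1, i.e. distances of all points to the data centroid.
    E1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # dist[k, j] = ||x_j - z_k||
    dist = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=2)
    # Equations (5) and (6): E_K = sum_k sum_j u_kj * ||x_j - z_k||
    EK = (U * dist).sum()
    # Equation (7): largest distance between any two cluster centers.
    DK = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2).max()
    # Equation (4)
    return ((1.0 / K) * (E1 / EK) * DK) ** 2

# Toy check: four points forming two obvious pairs.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = pbm_index(X, [[1, 1, 0, 0], [0, 0, 1, 1]], [[0.0, 0.5], [10.0, 0.5]])
bad = pbm_index(X, [[1, 0, 1, 0], [0, 1, 0, 1]], [[5.0, 0.0], [5.0, 1.0]])
print(good > bad)  # the natural split scores higher
```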

The FCM algorithm was applied to a data matrix of 208 feature vectors and five variables (A, Pe, L, P and Qm). Values of the fuzzification parameter m between 1.5 and 2.0 were tested, following Ross (1995), together with numbers of clusters c between 2 and 15. A tolerance ε = 0.0001 and a maximum of t_max = 200 iterations were used as stopping criteria. The performance of the algorithm is influenced by several parameters (e.g., m, c, ε and t_max) and even by the order of the rows in the data matrix. The optimal number of clusters was identified by applying the PBM validation index (Table 1, where each column corresponds to a value of m between 1.5 and 2.0), and Figure 2 plots these results. The PBM index reached its maximum for ten clusters (c = 10) with m = 1.6; in other words, the dataset is best divided into 10 groups.
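The iteration of Equations (1) to (3), with the stopping criteria just described, can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes Euclidean distances, random initial memberships drawn with a fixed seed, and a stopping test on the largest change in any degree of membership, since the text does not say which quantity ε is compared with. The function name `fcm` and the toy data are hypothetical.

```python
import numpy as np

def fcm(X, c, m=1.6, eps=1e-4, t_max=200, seed=0):
    """Fuzzy c-means; returns centroids V (c x p), memberships U (c x n)
    and the objective J_m recorded at each iteration."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Random initial memberships; each column (data point) sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    history = []
    for _ in range(t_max):
        W = U ** m
        # Equation (2): centroids as membership-weighted means of the data.
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distances d2[i, k] = ||x_k - v_i||^2.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, np.finfo(float).eps)  # guard against zero distances
        # Equation (1): objective for the current memberships and centroids.
        history.append(float((W * d2).sum()))
        # Equation (3) written with squared distances:
        # u_ik = d2_ik^(-1/(m-1)) / sum_j d2_jk^(-1/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return V, U, history

# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
V, U, J = fcm(X, c=2, m=1.6)
labels = U.argmax(axis=0)  # hard assignment by highest membership
print(labels, len(J))
```

Because the result depends on the random initialization, it is common to repeat the fit with several seeds and keep the run with the lowest final J_m.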

Table 1

Application of the PBM index to FCM algorithm groupings

| Number of clusters | PBM, m = 1.5 | PBM, m = 1.6 | PBM, m = 1.7 | PBM, m = 1.8 | PBM, m = 1.9 | PBM, m = 2.0 |
|---|---|---|---|---|---|---|
| 2 | 4.65E+09 | 4.64E+09 | 1.02E+12 | 1.01E+12 | 1.00E+12 | 4.64E+09 |
| 3 | 2.38E+09 | 2.82E+10 | 2.37E+09 | 2.38E+09 | 2.55E+10 | 1.56E+11 |
| 4 | 1.29E+11 | 1.76E+10 | 1.48E+09 | 4.59E+10 | 4.00E+11 | 1.61E+11 |
| 5 | 1.01E+10 | 3.35E+10 | 1.47E+10 | 1.10E+12 | 3.08E+11 | 6.82E+10 |
| 6 | 2.40E+10 | 1.99E+10 | 1.30E+10 | 9.37E+08 | 2.34E+10 | 9.54E+09 |
| 7 | 1.80E+08 | 8.71E+08 | 2.12E+10 | 6.11E+09 | 1.51E+11 | 5.65E+09 |
| 8 | 2.40E+10 | 6.76E+09 | 1.55E+10 | 2.90E+10 | 6.50E+08 | 5.54E+09 |
| 9 | 3.28E+09 | 5.80E+09 | 5.87E+08 | 5.49E+10 | 6.51E+09 | 7.53E+09 |
| 10 | 5.16E+09 | 1.32E+12 | 2.80E+09 | 4.99E+10 | 2.48E+09 | 1.21E+10 |
| 11 | 1.48E+10 | 1.46E+09 | 1.77E+09 | 2.13E+10 | 5.61E+08 | 3.79E+10 |
| 12 | 1.44E+10 | 3.52E+10 | 1.66E+10 | 1.01E+09 | 9.19E+09 | 1.20E+07 |
| 13 | 1.21E+09 | 1.64E+10 | 3.35E+09 | 1.52E+10 | 7.62E+09 | 5.01E+08 |
| 14 | 1.25E+09 | 5.03E+09 | 1.45E+09 | 4.13E+09 | 9.72E+08 | 3.27E+10 |
| 15 | 5.64E+09 | 4.96E+09 | 1.73E+09 | 1.26E+09 | 1.42E+10 | 8.83E+09 |
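As a check on the reading of Table 1, the short sketch below transcribes the table and selects the (c, m) pair with the largest PBM value. In a full analysis each entry would come from running FCM and computing the PBM index for that pair; here the values are simply copied from the table, and the row labels for c = 2 to 9 are assumed from the row order. The names `PBM_TABLE` and `best_partition` are hypothetical.

```python
# PBM values transcribed from Table 1. Rows: number of clusters c = 2..15
# (labels 2-9 inferred from the row order); columns: m = 1.5..2.0.
M_VALUES = [1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
PBM_TABLE = {
    2: [4.65e9, 4.64e9, 1.02e12, 1.01e12, 1.00e12, 4.64e9],
    3: [2.38e9, 2.82e10, 2.37e9, 2.38e9, 2.55e10, 1.56e11],
    4: [1.29e11, 1.76e10, 1.48e9, 4.59e10, 4.00e11, 1.61e11],
    5: [1.01e10, 3.35e10, 1.47e10, 1.10e12, 3.08e11, 6.82e10],
    6: [2.40e10, 1.99e10, 1.30e10, 9.37e8, 2.34e10, 9.54e9],
    7: [1.80e8, 8.71e8, 2.12e10, 6.11e9, 1.51e11, 5.65e9],
    8: [2.40e10, 6.76e9, 1.55e10, 2.90e10, 6.50e8, 5.54e9],
    9: [3.28e9, 5.80e9, 5.87e8, 5.49e10, 6.51e9, 7.53e9],
    10: [5.16e9, 1.32e12, 2.80e9, 4.99e10, 2.48e9, 1.21e10],
    11: [1.48e10, 1.46e9, 1.77e9, 2.13e10, 5.61e8, 3.79e10],
    12: [1.44e10, 3.52e10, 1.66e10, 1.01e9, 9.19e9, 1.20e7],
    13: [1.21e9, 1.64e10, 3.35e9, 1.52e10, 7.62e9, 5.01e8],
    14: [1.25e9, 5.03e9, 1.45e9, 4.13e9, 9.72e8, 3.27e10],
    15: [5.64e9, 4.96e9, 1.73e9, 1.26e9, 1.42e10, 8.83e9],
}

def best_partition(table, m_values):
    """Return (PBM, c, m) for the largest PBM value in the grid."""
    return max((value, c, m)
               for c, row in table.items()
               for m, value in zip(m_values, row))

pbm, c, m = best_partition(PBM_TABLE, M_VALUES)
print(f"Maximum PBM = {pbm:.2e} at c = {c}, m = {m}")
# prints: Maximum PBM = 1.32e+12 at c = 10, m = 1.6
```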
Figure 2

Application of the PBM index to the FCM algorithm groupings.


With c = 10 and m = 1.6, the FCM algorithm reached the stopping condition in 10 iterations (Figure 3). The objective function J_m was 6.79 × 10¹² at the first iteration and 4.44 × 10¹⁰ at the last.

Figure 3

Convergence of the objective function for the 10 clusters.


Table 2 summarizes the degrees of membership of the 208 streamflow gauges in each group. Each gauge was assigned to the group in which it has the highest degree of membership.
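The assignment rule can be sketched as below, using three rows transcribed from Table 2. The `membership_margin` helper is a hypothetical addition, not part of the paper's method: it reports how far the winning membership exceeds the runner-up, which highlights gauges such as E3 whose membership is split between two groups.

```python
# Membership rows for three gauges, transcribed from Table 2 (groups G1..G10).
MEMBERSHIPS = {
    "E1 Uruará":         [0.99, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    "E3 Barragem Conj.": [0.57, 0.40, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    "E5 Jardim Ouro":    [0.00, 0.86, 0.00, 0.06, 0.00, 0.00, 0.07, 0.00, 0.00, 0.00],
}

def assign_group(memberships):
    """1-based index of the group with the highest degree of membership."""
    return max(range(len(memberships)), key=memberships.__getitem__) + 1

def membership_margin(memberships):
    """Hypothetical helper: difference between the two largest memberships.
    Small values flag gauges lying near the boundary between two groups."""
    first, second = sorted(memberships, reverse=True)[:2]
    return first - second

for gauge, u in MEMBERSHIPS.items():
    print(f"{gauge}: G{assign_group(u)}, margin {membership_margin(u):.2f}")
```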

Table 2

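Degrees of membership of the streamflow gauges

| ID | Code | Gauge | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E1 | 18250000 | Uruará | 0.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| E2 | 17345000 | Base Cachimbo | 0.95 | 0.04 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| E3 | 18121006 | Barragem Conj. | 0.57 | 0.40 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| E4 | 17610000 | Creporizão | 0.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| E5 | 17675000 | Jardim Ouro | 0.00 | 0.86 | 0.00 | 0.06 | 0.00 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00 |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
| E208 | 17121000 | Caiabis | 0.87 | 0.12 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |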

Table 3 shows the distribution of the data in each group. The drainage area column gives the range between the smallest and the largest basin in each group, whereas the average flow and average annual precipitation columns give the group mean.

Table 3

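Distribution of the data by cluster

| Cluster | NG | % | Drainage area (km²) | Average flow (m³/s) | Average annual precipitation (mm) |
|---|---|---|---|---|---|
| 1 | 101 | 48.56 | 491–17,990 | 150 | 1,837 |
| 2 | 55 | 26.44 | 18,394–47,038 | 925 | 1,883 |
| 3 | 21 | 10.10 | 51,147–112,186 | 1,824 | 2,027 |
| 4 | 10 | 4.81 | 133,571–193,372 | 4,646 | 2,343 |
| 5 | 6 | 2.88 | 225,424–293,084 | 13,014 | 2,178 |
| 6 | 4 | 1.92 | 317,967–367,791 | 22,882 | 2,342 |
| 7 | 3 | 1.44 | 456,347–508,733 | 9,083 | 1,829 |
| 8 | 6 | 2.88 | 889,201–1,082,709 | 17,661 | 2,162 |
| 9 | 1 | 0.48 | 1,402,097 | 101,158 | 2,250 |
| 10 | 1 | 0.48 | 3,911,283 | 170,013 | 1,778 |
| Total | 208 | 100 | | | |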

NG – number of gauges.

Drainage area was the explanatory variable with the greatest weight in cluster formation, so the clusters are listed in ascending order of drainage area (Table 3). Unlike area and flow, river length and basin perimeter did not follow a consistent progression across the clusters; their cluster means ranged from 1,109 to 4,275 km and from 571 to 17,231 km, respectively. The groups (homogeneous regions) were determined by the distributions of drainage area, long-term mean flow, mean annual rainfall, river length and basin perimeter. Of these, the first two were the most similar within groups, whereas precipitation differed least between groups (Figure 4). The graph in Figure 4 can be used to estimate the mean flow of a river as a function of its homogeneous region, drainage area and mean annual rainfall.

Figure 4

Groupings according to the drainage area, mean annual precipitation and average long term flow.


Figure 5 shows the spatial distribution of the hydrologically homogeneous streamflow regions in the Amazon determined by the FCM algorithm. Homogeneous regions 9 and 10 have the largest drainage areas (1,402,097 and 3,911,283 km², respectively). Region 10 corresponds to the entire basin of the Amazon River and its tributaries, the Solimões, Negro and Madeira rivers, and extends beyond the Brazilian border into Venezuela, Colombia, Peru and Bolivia, where these rivers rise: the Solimões receives tributaries from Peru, the Negro from Colombia and the Madeira from Bolivia. Because the Amazon Basin is so extensive, it also contains other homogeneous regions, including the Purus, Tapajós and Madeira river basins. Region 9, for example, lies entirely within region 10 and corresponds to part of the Solimões River basin, whose main tributaries include the Purus, Juruá and Japurá rivers.

Figure 5

Map of hydrologically homogeneous flow regions.


Geographic contiguity is not necessary to define a hydrologically homogeneous region (Rao & Srinivas 2006). Note also that regions 6 to 8 contain few stations, about 6% of the total (Table 3), whereas regions 1 to 5 contain most of them (about 93%). Regions 1 to 5 cover most of the Amazon and reveal the region's hydrologic similarity. Some parts of the Brazilian Amazon (shown in white in Figure 5) were not grouped because of the lack of streamflow data.

The FCM methodology grouped the streamflow gauges into homogeneous regions and revealed that drainage area was the explanatory variable with the greatest weight in region formation. The optimal number of clusters in the dataset was identified by applying the PBM validation index, which was maximized for 10 clusters with a fuzzification parameter of 1.6. The 10 regions were well defined and show the general hydrologic similarity of the Amazon. Because the graph in Figure 4 relates the clusters to drainage area, mean flow and mean annual precipitation, it offers a simple way of relating a watershed to a homogeneous region: if the drainage area, average flow and precipitation of a river basin (variables that are relatively easy to obtain) are known, the region to which it belongs can be determined.

Álvarez, O. G., Hotait, S. N. & Sustaita, R. F. 2011 Identificación de regiones hidrológicas homogéneas mediante análisis multivariado [Identification of homogeneous hydrological regions by multivariate analysis] (in Spanish). Ingeniería, investigación y tecnología 12 (3), 277–284.
ANA – Brazilian Water Agency 2016 Stream and rainfall gauges inventory. Available from: http://hidroweb.ana.gov.br (accessed January 2016).
Awan, A. J., Bae, D. & Kim, K. 2015 Identification and trend analysis of homogeneous rainfall zones over the East Asia monsoon region. International Journal of Climatology 35, 1422–1433.
Bezdek, J. C. 1981 Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
Bharath, R. & Srinivas, V. V. 2015 Delineation of homogeneous hydrometeorological regions using wavelet-based global fuzzy cluster analysis. International Journal of Climatology 35 (15), 4707–4727.
Dikbas, F., Firat, M., Cem Koc, A. & Gungor, M. 2012 Classification of precipitation series using fuzzy cluster method. International Journal of Climatology 32, 1596–1603.
Dikbas, F., Firat, M., Cem Koc, A. & Gungor, M. 2013 Defining homogeneous regions for streamflow processes in Turkey using a k-means clustering method. Arabian Journal for Science and Engineering 38, 1313–1319.
Dunn, J. C. A. 1973 A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Cybernetics and Systems 3, 32–57.
Farsadnia, F., Kamrood, M. R., Moghaddam Nia, A., Modarres, R., Bray, M. T., Han, D. & Sadatinejad, J. 2014 Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps. Journal of Hydrology 509, 387–397.
Hall, M. J. & Minns, W. A. 1999 The classification of hydrologically homogeneous regions. Hydrological Sciences Journal 44 (5), 693–704.
Kahya, E., Demirel, M. C. & Bég, O. A. 2008 Hydrologic homogeneous regions using monthly streamflow in Turkey. Earth Science Research Journal 12 (2), 181–193.
Miranda, E. E. 2005 Brasil em Relevo. Embrapa Monitoramento por Satélite, Campinas. Available from: http://www.relevobr.cnpm.embrapa.br (accessed February 2016).
Nasseri, M. & Zahraie, B. 2011 Application of simple clustering on space-time mapping of mean monthly rainfall pattern. International Journal of Climatology 31, 732–741.
Pakhira, M. K., Bandyopadhyay, S. & Maulik, U. 2004 Validity index for crisp and fuzzy clusters. Pattern Recognition 37, 487–501.
Rao, A. R. & Srinivas, V. V. 2006 Regionalization of watersheds by hybrid-cluster analysis. Journal of Hydrology 318, 37–56.
Rianna, M., Russo, F. & Napolitano, F. 2011 Stochastic index model for intermittent regimes: from preliminary analysis to regionalization. Natural Hazards and Earth System Sciences 11, 1189–1203.
Ross, T. J. 1995 Fuzzy Logic with Engineering Applications. McGraw-Hill, New York.
Sahin, S. & Cigizoglu, K. H. 2012 The sub-climate regions and the sub-precipitation regime regions in Turkey. Journal of Hydrology 450–451, 180–189.
Srinivas, V., Tripathi, S., Rao, A. R. & Govindaraju, R. S. 2008 Regional flood frequency analysis by combining self-organizing feature map and fuzzy clustering. Journal of Hydrology 348, 148–166.
Tsakiris, G., Nalbantis, L. & Cavadias, G. 2011 Regionalization of low flows based on canonical correlation analysis. Advances in Water Resources 34, 865–872.