Abstract
Identification of hydrologically homogenous watersheds in the Upper Blue Nile Basin of Ethiopia is challenging due to the large number of watersheds and the lack of consistent and reliable data. Traditional methods, such as expert-based classification, are time-consuming, subjective, and often not reproducible. Therefore, this study aims to identify homogenous gauged watersheds using hydrometeorological and remote sensing data. In this study 76 watersheds were delineated from a 30-m digital elevation model (SRTM-DEM). Twelve watershed characteristics were selected to aid the classification process. Three homogenous climate regions were identified using rainfall data from 42 stations, and for each homogeneous climate region, gauged watersheds were identified. Principal component analysis (PCA) and K-means clustering were used for classification. The PCA reduced 12 watershed characteristics into three principal components using a threshold of 80% accounted variance and eigenvalues greater than one. K-means clustering classified the 76 watersheds into nine homogenous clusters. In the classified regions, vegetation dynamics within three decades have also been analyzed. This helped identify trends in vegetation cover and its spatial and temporal dynamics. The results of the investigation will potentially be used for runoff prediction of ungauged watersheds and for water resource management models in the future.
HIGHLIGHTS
Principal component analysis and K-means cluster analysis were used for homogenous watershed classification.
Hydrometeorological data and remote sensing indices were used for homogenous watershed classification.
Seventy-six watersheds in the Upper Blue Nile were classified into three climatological regions and nine homogenous watershed clusters.
INTRODUCTION
Watershed classification is a method used to group watersheds based on similar attributes such as land use, soil type, climate, topography, geology, and hydrology (Wolfe et al. 2019). This process is useful for predicting streamflow in ungauged basins (Choubin et al. 2019), sustainable environmental planning (Pascucci et al. 2018), and flood frequency analysis (Pallard et al. 2008; Farsadnia et al. 2014). Predicting runoff in ungauged basins also requires the classification of watersheds into hydrologically comparable groups before regionalization (Kanishka & Eldho 2017). This is especially true when there are limited financial resources and a lack of technical knowledge to undertake on ground work (Choubin et al. 2017). The precision of regionalization heavily relies on the precise categorization of comparable watersheds (Razavi & Coulibaly 2013; Kanishka & Eldho 2017; Ayalew et al. 2022). Choubin et al. (2017), Mosavi et al. (2021), and Sardooi et al. (2019) have demonstrated watershed classification based on average hydrological indices, physiographic features, and meteorological characteristics specific to each watershed within the basin. With the help of different machine learning clustering algorithms, such as K-means clustering, agglomerative hierarchical clustering, and hybrid clustering, watersheds are generally grouped into a similar region (Farsadnia et al. 2014; Kanishka & Eldho 2017; Sardooi et al. 2019) using the hydrological characteristics of each watershed. So, it is important to choose these watershed characteristics carefully. However, in the Blue Nile Basin, it is generally difficult to determine and understand the hydrological indices of the ungauged watersheds due to the lack of sufficient data. This issue may be resolved using remote sensing datasets that are widely used for runoff prediction (Choubin et al. 2017). 
Kanishka & Eldho (2020), Palcon (2021), and Chaudhary & Pandey (2022) have explored clustering watersheds into similar regions to enhance predictions for ungauged watersheds. These studies identified homogenous groups of watersheds by applying dimensionality reduction techniques to watershed characteristics. Dimensionality reduction methods include both linear and nonlinear techniques. Principal component analysis (PCA) is a linear technique that projects high-dimensional data onto a smaller number of dimensions. Several hydrology researchers have demonstrated the effectiveness of applying PCA prior to watershed classification (Farhan et al. 2017; Kanishka & Eldho 2017, 2020; Kunnath-Poovakka & Eldho 2018; Palcon 2021). In each of these studies, PCA reduced the dimensionality of the data so that most of the variation was retained in fewer dimensions. Self-organizing maps are one example of a nonlinear dimensionality reduction technique (Swain et al. 2016).
Watershed classification in the Blue Nile River Basin, Ethiopia, is challenging due to the large number of watersheds and the lack of consistent and reliable data. Traditional methods, such as expert-based classification, are time-consuming, subjective, and often not reproducible. The Upper Blue Nile River Basin faces recurrent drought and famine due to inadequate infrastructure, such as a lack of water impounding systems and reliable irrigation schemes to address extended periods of low precipitation (Kim & Kaluarachchi 2008; Gebregiorgis et al. 2013). The region's dependence on rain-fed agriculture and small-scale irrigation makes runoff estimation crucial for small watersheds. In addition, estimating runoff for ungauged watersheds is essential for long-term planning of hydropower generation, large-scale irrigation, and ecological protection.
Understanding the temporal and spatial variability of water yield in the study area is vital for local economies and downstream countries. Despite this significance, previous studies have primarily concentrated on estimating runoff only at the outlet of the gauged watersheds (Tigabu et al. 2015, 2020; Ayele et al. 2016). In addition, Kim & Kaluarachchi (2008) attempted to develop regionalization models without first identifying hydrologically homogenous watersheds. Therefore, before developing a hydrological model, there is a need for a more objective and efficient method for watershed classification in the Blue Nile River Basin of Ethiopia. To address this need, this study uses a linear classification technique, PCA with K-means clustering, to classify 76 watersheds in the Blue Nile River Basin, Ethiopia, using the existing physiographic and meteorological characteristics as well as remote sensing-based watershed characteristics. One of the most important objectives and interests of the International Association of Hydrological Sciences (IAHS) is the use of remote sensing datasets to improve runoff prediction in ungauged watersheds (Sivapalan et al. 2003; Choubin et al. 2017). Remote sensing datasets are particularly important in regions where hydrological data are limited, as they provide additional information about land use, vegetation, and soil properties that can improve the accuracy of runoff prediction models.
MATERIAL AND METHODS
Study area
The Blue Nile River is the most important tributary of the Nile River, providing about 60–70% of the Nile's flow at Aswan Dam (Nawaz et al. 2010). Egypt and, to a lesser extent, Sudan are almost entirely dependent on water from the Nile. This dependency creates water resources management challenges in these regions and is currently a subject of the international law of transboundary rivers (Waterbury 2008). The Upper Blue Nile Basin is the uppermost part of the Blue Nile Basin, located in Ethiopia. The river originates from Lake Tana, which lies at an elevation of just under 1,800 m (Figure 1). It leaves the southeastern corner of the lake, flowing first southeast, before looping back on itself, flowing west and then turning northwest close to the border with Sudan. Numerous tributaries join the main stream in the central and southern highlands of Ethiopia before it reaches the lowlands at the Ethiopian–Sudanese border at El-Diem. By gaining a better understanding of the Upper Blue Nile Basin's homogeneous watersheds, managers can more effectively plan the utilization of water resources and mitigate natural hazards such as erosion and drought, which may be influenced by the geography and climate of the watershed.
Methodology
The methodology of the study included (i) deriving the required watershed characteristics from hydrometeorological and different remote sensing datasets, (ii) normalization of watershed characteristics, (iii) multicollinearity assessment, (iv) K-means clustering, and (v) finding the optimum number of classes according to the classification validation criteria. Various research studies have utilized different methods to characterize watersheds, indicating the need to determine which watershed characteristics significantly affect runoff responses. Expert judgment is required to identify such characteristics. According to previous studies (Choubin et al. 2017; Sardooi et al. 2019; Wolfe et al. 2019), a total of 12 potentially useful watershed characteristics were selected to identify homogeneous watersheds in the Upper Blue Nile Basin. The selected watershed characteristics for the classification of watersheds into homogenous groups are presented in Table 1.
Characteristic | Description | Units | Data source |
---|---|---|---|
Watershed attribute | | | |
Area | The size of each watershed | km² | 30-m SRTM-DEM (https://gdex.cr.usgs.gov/gdex) |
Longitude | Longitudinal centroid value for each watershed | Degrees | |
Latitude | Latitudinal centroid value for each watershed | Degrees | |
Physiographic characteristics | | | |
Elevation | Average elevation for each watershed | m | |
Slope | Average slope for each watershed | % | |
Meteorological characteristics | | | |
Precipitation | Mean areal precipitation for each watershed | mm | Ethiopian National Meteorological Agency (ENMA) |
Temperature | Mean temperature | °C | |
Remote sensing indices | | | |
NDVI | Mean areal normalized difference vegetation index | – | 12-year mean annual Landsat 8 (https://earthexplorer.usgs.gov/) |
EVI | Mean areal enhanced vegetation index | – | |
SAVI | Mean areal soil-adjusted vegetation index | – | |
NDMI | Mean areal normalized difference moisture index | – | |
NDWI | Mean areal normalized difference water index | – | |
Note: All data were obtained on October 2, 2023.
Physiographic and meteorological characteristics
Remote sensing indices
Normalized difference vegetation index
Soil-adjusted vegetation index
Normalized difference moisture index
The NIR band is sensitive to vegetation reflectance, while the MIR band is sensitive to water and moisture content in vegetation and soil. By subtracting the MIR band from the NIR band and normalizing the result, the NDMI index is able to highlight areas of high moisture content and discriminate them from areas of low moisture content.
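The description above corresponds to the usual normalized-difference form. Written out, with NIR and MIR taken to denote the surface reflectance in the near-infrared and mid-infrared bands (the symbol names are assumed here for illustration):

```latex
\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{MIR}}{\mathrm{NIR} + \mathrm{MIR}}
```

Because the difference is divided by the sum, the index is bounded between −1 and 1, with higher values indicating higher moisture content.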
Enhanced vegetation index
The Red band is sensitive to chlorophyll absorption, while the NIR band is sensitive to vegetation structure and biomass. The Blue band helps correct for atmospheric interference and soil background effects. EVI values range from −1 to 1, with higher values indicating greater vegetation density and health (Huete et al. 2002). EVI is widely used in a variety of applications, including monitoring crop yields, tracking deforestation, and assessing the impacts of climate change on vegetation (Fensholt & Proud 2012).
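For reference, the EVI formulation given by Huete et al. (2002) can be written as below. The coefficient values shown are the commonly used defaults and are assumed here; they are not stated in this study.

```latex
\mathrm{EVI} = G \,\frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + C_1\,\mathrm{Red} - C_2\,\mathrm{Blue} + L},
\qquad G = 2.5,\; C_1 = 6,\; C_2 = 7.5,\; L = 1
```

Here the Blue-band term with coefficients C_1 and C_2 provides the atmospheric correction, and L is the adjustment for soil and canopy background.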
Identifying homogenous climate and watersheds
Homogenous climate zones and watersheds were identified using PCA. PCA converts multiple, possibly correlated variables into linearly uncorrelated variables and reconstructs most of the variability in the original dataset using fewer new variables (Jackson 2005). In this study, the watershed attributes (Table 1) were dimensionally reduced using PCA. However, the concepts and the algorithms used to execute a cluster analysis with PCA are inherently different. Because the data matrix is large, eigendecomposition of the covariance matrix proved challenging and prone to round-off errors. As an alternative, singular value decomposition (SVD) is a reliable computational technique frequently used to compute the PCA of a dataset (Ayalew et al. 2022). This involves discarding the less significant basis vectors in the initial SVD matrix. Therefore, the analysis was performed using SVD in the R environment. To determine the number of principal components, a scree plot with the elbow rule was used (Peres-Neto et al. 2005). This approach involves locating the ‘elbow’ on the curve and keeping all components up to the point where the curve flattens out (Holland 2008; Zambelli 2016). In identifying similar watersheds, it is crucial to account for factors that may affect the precision of the outcomes, including variation in climate zones. To reduce this influence, the homogeneous climate zones were first identified using data from 42 rainfall stations, and the homogeneous watersheds were then identified within each homogeneous climate zone.
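The SVD route to PCA described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the study's R code. The input matrix is synthetic stand-in data, and the function name `pca_svd` and the 80% cumulative-variance cut-off (the threshold mentioned in the abstract) are assumptions made for the example.

```python
import numpy as np

def pca_svd(X):
    """PCA of a (samples x variables) matrix via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)        # variances along each PC
    explained = eigenvalues / eigenvalues.sum()  # proportion of variance per PC
    scores = U * s                               # coordinates of samples on the PCs
    loadings = Vt.T                              # columns are unit-length eigenvectors
    return scores, loadings, eigenvalues, explained

# Hypothetical stand-in: 76 watersheds x 12 characteristics driven by 3 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(76, 3))
X = latent @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(76, 12))

scores, loadings, eigenvalues, explained = pca_svd(X)
cumulative = np.cumsum(explained)
# Smallest number of PCs whose cumulative explained variance reaches 80%
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)
print("explained variance:", np.round(explained[:5], 3))
print("components kept:", n_keep)
```

Plotting `explained` against the component index gives the scree plot. Working from the SVD of the centered data avoids forming the covariance matrix explicitly, which is the numerical advantage the text refers to.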
Data normalization
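As an illustration only, one common way to put characteristics with different units (km², m, mm, °C) on a comparable scale before PCA is z-score standardization. The sketch below assumes that choice; the study's exact normalization may differ, and the function name and sample values are hypothetical.

```python
import numpy as np

def zscore(X):
    """Rescale each column to zero mean and unit (sample) standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std

# Hypothetical values for four watersheds: area (km2), elevation (m), precipitation (mm)
X = np.array([
    [120.0, 1850.0, 1200.0],
    [950.0, 2400.0, 1450.0],
    [310.0, 2100.0,  980.0],
    [ 60.0, 2750.0, 1600.0],
])
Z = zscore(X)
print(np.round(Z, 2))
```

After this step no single variable dominates the principal components simply because it is measured in larger numbers.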
RESULTS AND DISCUSSION
The PCA showed that the 42 stations within the Blue Nile Basin fall into three distinct climate regions (Figure 5). Identifying homogenous climate regions first reduced the uncertainty of the subsequent homogenous watershed identification and made the watershed classification results more precise and dependable. The homogenous climate regions also provide valuable information about the spatiotemporal distribution of rainfall within the basin, which can serve as a useful reference for future investigations into the hydrological mechanisms at work in the region.
According to the PCA presented in Figure 5, three homogenous rainfall regions (clusters) can be identified. The vector lines in the figure represent the weather stations; highly correlated stations are considered part of a single, homogeneous climate zone. The points indicate temporal variability at a monthly timescale. Based on their degree of relevance, 11 stations in the upper basin were assigned to homogenous climate region I, 11 stations in the lower basin to homogenous climate region II, and 20 stations in the central basin to homogenous climate region III. The first two principal components (PCs) explain over 95% of the total variability in the dataset, and their standard deviation is greater than 1, providing valuable insights into rainfall variability.
Variable reduction for watershed classification
In Figure 7, the horizontal dashed line represents the expected contribution to each PC, that is, the average contribution if all variables contributed evenly. Variables that contribute substantially to a PC therefore lie above this line. In the first component, eight watershed characteristics (including EVI, Longitude, NDMI, NDVI, Rainfall, and SAVI) have a significant contribution, while Elevation and Temperature have a low contribution (Figure 7). In contrast, the second PC is determined by only two variables, namely, Area and Slope (Figure 7). After variable reduction, the cluster analysis was performed using the K-means clustering algorithm based on these 10 most significant variables.
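The contribution threshold described above can be sketched numerically. The example assumes the usual definition in which a variable's contribution to a PC is its squared loading as a percentage of the column total, so the expected (uniform) contribution is 100/p for p variables. The data are random placeholders, and the function name is hypothetical.

```python
import numpy as np

def variable_contributions(loadings):
    """Percentage contribution of each variable (row) to each PC (column)."""
    squared = loadings**2
    return 100.0 * squared / squared.sum(axis=0)

names = ["Area", "Longitude", "Latitude", "Elevation", "Slope", "Precipitation",
         "Temperature", "NDVI", "EVI", "SAVI", "NDMI", "NDWI"]

# Placeholder data: 76 watersheds x 12 standardized characteristics
rng = np.random.default_rng(1)
X = rng.normal(size=(76, len(names)))
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Rows of Vt are the principal axes, so Vt.T holds loadings as columns
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
contrib = variable_contributions(Vt.T)

expected = 100.0 / len(names)   # the dashed reference line, about 8.33% here
above_pc1 = [n for n, c in zip(names, contrib[:, 0]) if c > expected]
print("expected contribution (%):", round(expected, 2))
print("variables above the line on PC1:", above_pc1)
```

With the real watershed matrix, the variables returned for PC1 and PC2 would be the ones retained for clustering.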
Determine optimal number of clusters
The optimal number of clusters (K) is determined, across a range of candidate values, by selecting the point at which the Average Silhouette is maximized and the decline in the total within-cluster sum of squares (WSS) levels off (the elbow). As shown in Figure 8(a), for homogenous climate region I the WSS changes more slowly beyond K = 4 than it does for other values of K. Therefore, K = 4 is a good choice for the number of clusters in homogenous climate region I. For homogenous climate region II, the WSS changes more slowly beyond K = 3, so K = 3 is considered a good choice. For homogenous climate region III, the WSS changes more slowly beyond K = 2, so K = 2 is considered a good choice.
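The two selection criteria can be illustrated with a small NumPy-only sketch. It is not the study's R workflow. The K-means routine is a plain Lloyd's algorithm with random restarts, the input is synthetic two-dimensional data standing in for PC scores, and all function names are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (labels, total WSS)."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = assign(X, centers)
            # Keep the old center if a cluster happens to become empty
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(X, centers)
        wss = float(((X - centers[labels]) ** 2).sum())
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss

def average_silhouette(X, labels):
    """Mean silhouette width over all points (requires at least two clusters)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    widths = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a singleton cluster gets silhouette 0 by convention
        a = dist[i, own].sum() / (own.sum() - 1)   # mean distance within own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        widths[i] = (b - a) / max(a, b)
    return float(widths.mean())

# Synthetic stand-in for PC scores: three well-separated groups of 10 points
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(10, 2)) for c in blob_centers])

wss, sil = {}, {}
for k in range(2, 7):
    labels, wss[k] = kmeans(X, k)
    sil[k] = average_silhouette(X, labels)

best_k = max(sil, key=sil.get)
print("WSS by K:", {k: round(v, 1) for k, v in wss.items()})
print("average silhouette by K:", {k: round(v, 3) for k, v in sil.items()})
print("K with the highest average silhouette:", best_k)
```

Plotting `wss` against K gives the elbow curve, and plotting `sil` against K gives the silhouette curve; in practice the two are read together, as in Figure 8.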
K-means clustering result
We determined the optimal number of clusters for classifying watersheds in the Upper Blue Nile Basin using the Elbow and Average Silhouette methods (as shown in Figure 8). Based on our analysis, we classified the watersheds into nine homogeneous groups using the K-means classification method. This classification method allowed us to group the watersheds based on their similarities in terms of hydrometeorological and remote sensing data. By doing so, we were able to identify groups of watersheds that have similar hydrological processes and characteristics, which can be valuable for runoff prediction in ungauged watersheds using the regionalization method.
Table 2 presents the cluster membership of the watersheds in the Blue Nile Basin, grouped by their similarity in hydrometeorological and remote sensing data. Homogeneous climate region I consists of four clusters (clusters 1 to 4) with seven, six, seven, and five watersheds, respectively. Homogeneous climate region II comprises three clusters (clusters 1 to 3) with seven, four, and four watersheds, respectively. Homogeneous climate region III comprises two clusters (clusters 1 and 2) with 20 and 16 watersheds, respectively. This cluster membership information can be valuable for designing effective water management strategies and supporting decision-making in the Upper Blue Nile Basin.
Climate region | Cluster 1 watersheds | Cluster 2 watersheds | Cluster 3 watersheds | Cluster 4 watersheds |
---|---|---|---|---|
Region I | Aleltu, Boreda, Desso, Gebreguracha, Jemma, Mechela, and Wenchit | Chacha, Gorfo, Mugher, Roba, Robi-Jida, and Robi-gumero | Beressa, Gerado, Jogola, Kelina, Selgi, Shy, and Wizer | Debis, Guder, Huluka, Tilku Duber, and Tinshu Duber |
Region II | Adiya, Angar, Dabana, Dabus, Little Ang, Uke, and Wama | Didessa, Tamsa, Urgessa, and Yebu | Indris, Neshi, Sifa, and Tato | |
Region III | Abahim, Andassa, Azuari, Bogena, Chemoga, Chena, Dirma, Gemero, Gumara, Megech, Muga, Ribb, Sedie, Shina, Suha, Teme, Tigdar, Tul, Wenka, and Yeda | Abbay, Birr, Chereka, Dondor, Dura, Fettam, Gelgel Abay, Gilgel Beles, Gudla, Jedeb, Koga, Lah, Leza, Main Beles, Missini, and Temcha | | |
Validation of cluster analysis
The spatial distribution of the clusters within each region provides important insights into the hydrological conditions of the Blue Nile Basin. For instance, the presence of cluster 2 in all three regions indicates that there are areas with moderate elevation, abundant rainfall, and good vegetation cover throughout the basin. This suggests that these areas may play a critical role in the hydrology of the basin, such as in the generation of runoff and the maintenance of water quality. In addition, the spatial distribution of cluster 1 in the eastern part of the basin across all three regions suggests that this area may be more prone to water scarcity and drought conditions, while the presence of cluster 2 in the western part of the basin across all three regions suggests that this area may be more resilient to water scarcity due to moderate rainfall and vegetation cover.
From this, we observed that the K-means clustering algorithm successfully divided the watersheds into nine groups that are relatively similar in terms of physiographic and meteorological variability. This is beneficial for understanding the overall condition of the watersheds, as it allows more targeted management practices to be implemented in each group based on its particular characteristics. For watersheds in homogenous climate region I, which are characterized by steep topography and low vegetation cover, management practices may focus on controlling erosion and sedimentation. For homogenous climate region II, with gentle topography and high vegetation cover, practices may focus on water conservation and increasing soil moisture retention. For homogenous climate region III, with intermediate elevation and medium rainfall, both water conservation and erosion and sedimentation control practices may be appropriate.
This watershed clustering approach may also help improve runoff prediction in ungauged watersheds.
Vegetation dynamics in three homogenous climate regions
The upper Blue Nile Basin exhibits significant spatial variability in NDVI values within the three identified regions (Figure 13). In region II, for example, there are areas of high NDVI values, particularly in the western part of the basin, where forests and grasslands are dominant. However, there are also areas of low NDVI values, particularly in the eastern and southern parts of the basin, where cropland and grazing land use dominate. In region III, there is also significant spatial variability in NDVI values, with the highest values observed in the irrigated agricultural areas in the central and western parts of the basin. However, there are also areas of lower NDVI values in the eastern and southern parts of the basin, where rain-fed agriculture and grazing land use are dominant. In region I, there is less spatial variability in NDVI values compared to the other two regions, with the lowest values observed in the barren upland areas and the highest values observed in the southeastern cropland areas. Overall, the spatial variability in NDVI values within the three regions reflects the complex interactions between land use, topography, and environmental factors that influence vegetation growth in the upper Blue Nile Basin. These findings have important implications for sustainable land use management, particularly in the context of climate change and other environmental challenges.
CONCLUSION
In this study, a clustering algorithm was used to identify hydrologically homogenous watersheds of the Upper Blue Nile Basin, Ethiopia. To achieve this goal, physiographic parameters (area, longitude, latitude, elevation, and slope), meteorological parameters (rainfall and temperature), and remote sensing indices (NDVI, SAVI, EVI, NDMI, and NDWI) were used. For watershed classification, a linear dimensionality reduction technique, PCA, followed by K-means clustering was used to classify 76 watersheds of the basin. The number of PCs was determined using a plot of the percentage of variance arranged from largest to smallest (scree plot). In the analysis, 10 parameters were taken into account for the first two PCs, which were then used for K-means clustering. The Elbow and Average Silhouette methods were employed to determine the optimal value of K. With the optimal number of clusters determined, the 76 Upper Blue Nile Basin watersheds in the three homogenous climate regions were classified into nine watershed clusters using the K-means clustering algorithm. The clustering results aligned with the physiographic and meteorological patterns that are known to exist in the Blue Nile Basin. The upper region of the basin (region I) is distinguished by low vegetation coverage, which can be attributed to the area's high percentage of agricultural land use and dense population. Conversely, the lower region of the basin (region II) is recognized for its relatively flat terrain and greater water resources, which foster a greater abundance of vegetation and more extensive agricultural practices. The central part of the basin (region III) possesses intermediate features, making it suitable for a mix of land uses.
Overall, the findings of this study have important implications for sustainable management of water resources in the Upper Blue Nile Basin and other similar basins. The results highlight the need to consider the spatial variability and heterogeneity of land use and environmental factors in hydrological modeling and decision-making. Future studies could further explore the use of clustering analysis and additional remote sensing indices and hydrological parameters to improve the accuracy and applicability of regionalization models in similar watersheds.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.