ABSTRACT
The Earth is currently experiencing severe economic and social consequences as a result of frequent floods. This study is crucial for effective risk management and mitigation, protecting lives and property from potential flood damage in the Deme watershed. This study endeavors to assess the efficacy of a logistic regression model in generating a flood susceptibility map for the Deme watershed in Ethiopia. Fourteen factors contributing to flooding were considered, including digital elevation model, slope, aspect, profile curvature, plane curvature, Topographic Position Index (TPI), Topographic Roughness Index (TRI), flow direction, Topographic Wetness Index (TWI), distance to the river, rainfall, land use/land cover (LULC), Normalized Difference Vegetation Index (NDVI), and soil type. The receiver operating characteristic (ROC) curve method was employed to validate the model. The area under the curve (AUC) values for the model were determined to be 81% for the training dataset and 82% for the validation dataset, indicating its effectiveness in delineating flood-prone areas. The findings revealed that 18% of the watershed is very highly susceptible to flooding, 19% exhibits high susceptibility, 18% shows moderate susceptibility, while 20 and 24% have low and very low susceptibility, respectively. This research provides insights into comprehensive flood prevention and urban development strategies.
HIGHLIGHTS
Flood susceptibility is determined by historical flood patterns and their influencing factors.
Logistic regression can be used to map flood-susceptible areas in a small watershed.
A multicollinearity test is necessary to ensure that flood conditioning factors are not strongly linearly related to one another.
Factors with high multicollinearity should be removed from models to improve prediction accuracy.
INTRODUCTION
Floods are the most frequent natural disasters, affecting the largest number of people worldwide (Ukumo et al. 2022). The Emergency Events Database (EM-DAT) developed by the Center for Research on the Epidemiology of Disasters (CRED) at the Catholic University of Louvain (UC Louvain) listed 5,169 events that affected the world between 1980 and 2023. The resulting economic damage is estimated, by the same source, at 973.14 trillion US dollars; those of 1993, 1998, 2010, 2011, 2013, 2016, 2020, and 2021 were particularly high. The risk of flooding is particularly high in urban areas and some studies suggest that the problem of urban flooding is likely to worsen with the increased frequency of extreme precipitation due to climate change (Edamo et al. 2022a). Floods, along with windstorms, account for more than a third of all economic losses caused by natural disasters around the world, and floods are the most prevalent natural disasters (Edamo et al. 2022b). Flooding is the most devastating, ubiquitous, and prevalent natural disaster globally. Climate change and high population density accelerate the flooding problem (Edamo et al. 2022c).
Accelerated urban development contributes to the transformation of natural areas into impermeable areas, which increases urban runoff and, consequently, the exposure of populations to the risk of flooding (Das 2018). All regions of the world are affected by floods; however, developing countries are particularly vulnerable to natural hazards like floods due to their low adaptive capacity and inadequate infrastructure to cope with these disasters (Ukumo et al. 2022).
Urban areas in Africa are particularly affected by flooding phenomena during the rainy season. This is the case, for example, of cities in East Africa, an arid region south of the Sahara, which has experienced a significant increase in extreme precipitation and an exponential increase in damage caused by flooding in recent decades (Ouma & Tateishi 2014). For example, in 2003, floods caused the deaths of at least 12 people and economic losses in Ethiopia, Mali, Mauritania, Niger, and Senegal (Edamo et al. 2022c). During the 2005 rainy season in Senegal, flooding caused serious material damage. The West African region suffered further in August 2007 due to a delayed monsoon and extreme rainfall. The region was again hit hard by severe flooding in 2020, 2021, 2022, and 2023. More recently, in 2022, several East African cities suffered the full brunt of torrential rains. For example, in Sudan, some neighborhoods were flooded for several days due to heavy rains in August 2022. This flooding caused the collapse of 9,000 houses, affected 50,000 people, and caused 12 deaths.
Since the late 20th century, Ethiopia, alongside other East African nations, has witnessed a surge in flooding occurrences, inflicting severe repercussions on the populace. In this country, the frequency of floods has notably escalated, transitioning from an average of one flood per year between 1986 and 2005 to five floods annually between 2006 and 2016, with a third of these floods impacting the Deme watershed (Edamo et al. 2022c). In September 2022, Ethiopia was profoundly affected by intense rainfall, resulting in approximately 41 fatalities, 112 injuries, and over 100,000 individuals being adversely affected (Ukumo et al. 2022). This deluge severely impacted more than 2,300 households within the Deme watershed. Presently, the Deme watershed continues to grapple with significant flood events, notably those occurring in 2009, 2016, and 2021.
Flood susceptibility mapping is a critical aspect of disaster management, particularly in small watersheds where even minor changes in environmental conditions can significantly impact flooding (Edamo et al. 2022c). Various methods have been employed to predict flood-prone areas, including hydrological models, geographic information systems (GIS), and statistical approaches (Tehrany et al. 2014a). Among these, logistic regression has gained prominence due to its simplicity, interpretability, and effectiveness in binary classification problems (flooded vs. non-flooded areas). Studies such as Tehrany et al. (2014b) and Mojaddadi et al. (2017) have demonstrated the utility of logistic regression in environmental hazard modeling, showcasing its potential in flood susceptibility analysis.
Despite the success of logistic regression in broader contexts, its application specifically to small watersheds remains underexplored. Most existing research has focused on large-scale floodplain areas, where the dynamics and influencing factors of flooding differ significantly from those in smaller watersheds (Edamo et al. 2022c). For instance, research by Rahmati et al. (2016) predominantly addresses larger river basins, highlighting a gap in the literature concerning smaller, more confined hydrological systems. This study aims to fill this gap by evaluating the capability of logistic regression in identifying flood-susceptible areas within a small watershed, thus contributing to more localized and precise flood risk management strategies.
The novelty of this investigation lies in its focus on a small watershed context, where unique hydrological and topographical features may influence flood susceptibility differently than in larger basins. By employing logistic regression, this research seeks to provide a robust, data-driven method for flood susceptibility mapping that can be easily implemented and interpreted by local authorities and policymakers. Previous studies, such as those by Edamo et al. (2022c) and Tehrany et al. (2014b), have established logistic regression as a reliable tool in environmental hazard prediction, but there remains a need to adapt and validate these methodologies specifically for small watershed applications.
Faced with the intensification of flooding in the Deme watershed, improving flood management practices has become a priority for the government of Ethiopia. To reduce the exposure of populations to the risk of flooding, the government adopted, after the major flood of 1 September 2009, a decree locating and demarcating flood zones in the Deme watershed. The administrative limits of non-constructible and submersible zones are defined (and delimited locally) at distances of 100 and 300 m, respectively, from the high-water mark. According to the decree, an unbuildable flood zone is defined by easements of 100 m on either side of the primary rainwater drainage channels, as well as by areas located below the water level of the backwaters (Dero et al. 2021). However, the zoning maps do not take into account the topography of the site or the hydrology of the small watershed. The mapping of flood zones corresponds only to the definition of the safety perimeters of the primary canals and the water levels of the dams. This representation reduces the risk of flooding in the Deme watershed to the natural overflow of water bodies or the backflow of pipes and does not consider flooding due to stormwater runoff. Such a map therefore cannot be considered a hazard map. In addition, some authors have attempted to map susceptibility to flooding in the Deme watershed. However, all these studies established the flood susceptibility map by adding or multiplying arbitrarily chosen flood susceptibility factors without considering their diversity and interaction. Moreover, these flood susceptibility factors appear to be subject to collinearity bias, and the resulting maps were not validated. The result is an arbitrary, subjective, and incomplete presentation of flood susceptibility, which tends to oversimplify the complexity of the flood phenomenon.
Thus, existing flood susceptibility maps remain problematic for decision-making regarding flood risk management in the Deme watershed. It is to remedy this gap that our study was envisaged. The main objective of this research is to produce a flood susceptibility map for the Deme watershed using a logistic regression model and a geospatial database of flood conditioning factors. This research work highlights the flood hazard in its primary sense of the term, which is a natural phenomenon, without integrating the anthropogenic causes, which are an aggravating factor of the phenomenon. The interest of this study is the fight against the risk of flooding by improving knowledge of flood zones in the Deme watershed. The study uses various statistical analyses (e.g., Pearson correlation and multicollinearity analyses) to ensure the accuracy of flood prediction. To our knowledge, this is the first application of an inductive approach taking into account so many factors in the evaluation and mapping of the susceptibility of the Deme watershed to flood risks.
MATERIALS AND METHODS
Study area
The Gamo highland is located in southwestern Ethiopia, along the northwestern escarpment of the Great East African Rift Valley System, within the southern region of the Gamo zone administration. Within this administration, Dita woreda is one of 14 rural and 4 town administrative woredas of the zone. The upper Deme watershed is positioned 562 km southwest of Addis Ababa and 58 km north of Arbaminch. Geographically, it is situated in Dita woreda between 6°16′10″ and 6°23′40″ North, and 37°27′15″ and 37°33′0″ East. It lies in the Omo basin, marking the starting point of the Deme River in Lanta/Mayilo mountain on the mountain path of Mt. Gugie.
The watershed spans across highland areas in the north, gradually descending to the lowland plains in the south. The elevation ranges from approximately 620 to >3,300 m above sea level, contributing to varied climatic conditions. The region experiences a tropical monsoon climate with a distinct wet season from May to October and a dry season from November to April. Annual rainfall averages between 1,100 and 1,400 mm, primarily occurring during the wet season, which significantly influences the watershed's hydrology.
Hydrologically, the Deme watershed is characterized by numerous rivers and streams that converge to form a complex network of watercourses. The major river in the watershed is the Deme River, which originates from the highlands and flows southward, eventually joining the Omo River. The watershed's diverse landscape includes forest, agricultural land, grasslands, and water bodies, each supporting distinct ecosystems. These environments host a wide variety of flora and fauna, with several species endemic to the region (Dero et al. 2021). The soil composition varies from fertile volcanic soils in the highlands to alluvial soils in the lowlands, supporting agriculture and contributing to the region's economy. Environmental challenges in the watershed include deforestation, soil erosion, and the impacts of climate change, which necessitate sustainable management practices to preserve its ecological integrity.
Data collection
ID | Data | Resolution | Format | Sources | Derived maps |
---|---|---|---|---|---|
1 | DEM | 30 m | Raster | https://urs.earthdata.nasa.gov | Elevation, slope, aspect, plan curvature, profile curvature, TWI, TPI, TRI, flow direction |
2 | Landsat 8 Imagery (2022) | 30 m | Raster | https://earthexplorer.usgs.gov | LULC map, NDVI |
3 | Soil data | – | Vector | National Soil Services in Ethiopia (BUNASOLS) | Soil map |
4 | Hydrography | – | Vector | Topographic databases | Distance to the river |
5 | Rainfall data (2000–2023) | – | Vector | https://crudata.uea.ac.uk | Average annual rainfall map |
6 | Flood sites | – | Vector | Household survey | Flood inventory map |
Description of the general framework of the study
Flood conditioning factors
In addition, the categorical factors (LULC, soil type) for each pixel of the training and validation samples are assigned normalized values (between 0 and 1).
Topographic factors
DEM: Elevation has been identified as a major factor for flood modeling. It has an inverse relationship with flood sensitivity, meaning that low-lying areas are more sensitive to flooding compared to higher ground since water flows along topographic gradients. In this study, the elevation map was automatically extracted from the DEM with a pixel size of 30 × 30 m (Figure 3(a)).
Slope: Slope is generally of great importance in mapping flood susceptibility. The slope angle determines surface runoff, the velocity of water flow, the aggravation of soil erosion as well as vertical percolation and, therefore, significantly influences the physical predispositions to flooding. It is in areas of low slope that flooding frequently occurs. In this study, the slope map was automatically extracted using the DEM in ArcGIS (Figure 3(g)).
Aspect: The aspect measures for each cell the direction of the downward slope measured clockwise in degrees from 0 to 360, where 0 is north-facing, 90 is east-facing, 180 is south-facing, and 270 is west-facing. The value −1 is assigned to flat areas with no downward slope. Generally, this aspect has implications for soil moisture content; in fact, north-facing slopes are generally of higher moisture content, which can lead to soil fragility in the face of flood risk. In this study, the aspect map was automatically extracted using the DEM in ArcGIS (Figure 3(b)).
Plan curvature: The plan curvature reflects the slope of the exposure, describes the horizontal shape of the topography and highlights converging (concave curvature) or diverging (convex curvature) water flows. Plan curvature represents a flood conditioning variable. In this study, the plan curvature map was automatically extracted using the DEM in ArcGIS (Figure 3(d)).
Profile curvature: The profile curvature reflects the complexity of the terrain and can be described as the rate of change of slope (the slope of the slope). It is a flood conditioning variable. It is measured parallel to the slope, in the direction of the maximum slope, and affects the acceleration and deceleration of flow across the surface. Profile curvature values can be negative, positive, or zero. A negative value means that the surface is upwardly convex at that cell, and the flow will be decelerated (Asmare 2023). A positive value means that the surface is upwardly concave at that cell, and the flow will be accelerated. A value of zero indicates that the surface is linear (no curvature). In this study, the profile curvature map was automatically extracted using the DEM in ArcGIS (Figure 3(f)).
TPI: The TPI measures the relative position of a point in the landscape by comparing its elevation to the mean elevation within a specified radius, indicating whether it is on a ridge, flat area, or valley (Edamo et al. 2022a). Positive values indicate that the central cell is situated higher than its neighborhood, negative values indicate that it is situated lower and are usually represented by valleys, and values near zero suggest a flat or gently sloping area.
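The TPI definition above can be illustrated with a minimal sketch. It assumes a square neighborhood around the cell and uses small hypothetical elevation grids; the function name `tpi`, the `radius` parameter, and the sample values are illustrative choices, not the procedure or data used in this study.

```python
from statistics import mean

def tpi(grid, row, col, radius=1):
    """Elevation of a cell minus the mean elevation of its square neighborhood.

    Assumption: the neighborhood is a (2 * radius + 1) square window that
    excludes the central cell and is clipped at the grid edges.
    """
    rows, cols = len(grid), len(grid[0])
    neighbours = [
        grid[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
        if (r, c) != (row, col)
    ]
    return grid[row][col] - mean(neighbours)

# Hypothetical 3 x 3 elevation windows in metres (not data from the study).
ridge = [[100, 100, 100], [100, 120, 100], [100, 100, 100]]
valley = [[100, 100, 100], [100, 80, 100], [100, 100, 100]]

print(tpi(ridge, 1, 1))   # positive: the cell is higher than its surroundings
print(tpi(valley, 1, 1))  # negative: the cell is lower than its surroundings
```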
Flow direction: One of the fundamental tasks in surface hydrology is determining the flow direction of each raster pixel. Flow direction indicates how overland flow is distributed over a catchment and is a key parameter in hydrological modeling, such as flood forecasting. The flow direction map is a grid in which the value of each cell represents the direction of steepest descent from that cell, that is, the direction in which water flows out of it. In this study, the flow direction map was automatically extracted using the DEM in ArcGIS (Figure 3(c)).
Hydrological factors
Distance from the river: The distance to the river is an important factor in determining susceptibility to flooding, as the areas most affected during a flood are those located close to the river bank due to overflowing water (Nandi et al. 2017). In this study, the river distance map (Figure 4(i)) was extracted from the hydrographic network map using the Euclidean distance tool in ArcGIS.
Rainfall: Rainfall is the main driver of runoff and flooding (Kazakis et al. 2015). Floods are generally preceded by heavy and prolonged rainfall. However, low-intensity rainfall can cause flooding in an area if the soil is already saturated. In this study, the average annual rainfall over the period (2000–2023) was used (Figure 4(j)).
LULC: LULC is also an important factor in identifying flood-prone areas because it influences the infiltration rate and runoff volume. Urban areas, which are mainly made up of impermeable surfaces, and bare land increase water runoff. The LULC map is based on Landsat 8 satellite images downloaded from the US government website https://earthexplorer.usgs.gov (Figure 4(l)).
Complementary factors
Soil type: Soil type directly affects the drainage process due to inherent soil characteristics such as texture, permeability level, and structure (Tehrany et al. 2013). The impact of soil typology on flooding is quite significant since it controls the volumes of water that can infiltrate or run to the surface. The mapping of soil type encountered was based on data from Bureau of National and Regional Soil Laboratories (BUNASOLS) for the Deme watershed (Figure 5(m)).
Logistic regression model
The logistic regression model employed in this flood mapping study is designed to predict the probability of flood occurrence in a given area based on a set of predictor variables. Logistic regression is a statistical method used for binary classification problems, where the outcome variable is categorical and can take one of two possible outcomes: flood or no flood (Chowdhuri et al. 2020). The model estimates the relationship between the predictor variables and the log odds of the outcome occurring (Christensen et al. 2007). By fitting the model to the data, we obtain a set of coefficients that quantify the impact of each predictor variable on the likelihood of a flood (Yin et al. 2021).
The setup of the logistic regression model began with the careful selection of predictor variables, which are crucial for accurately capturing the factors that influence flood risk (Al-Juaidi et al. 2018). In this study, the predictor variables included topographical features (such as elevation, slope, and aspect), hydrological factors (like distance to a river and soil type), and meteorological data (rainfall). These variables were chosen based on their relevance to flood dynamics and the availability of data. Additionally, historical flood records were used as the response variable to train the model (Bui et al. 2019).
Model training involved several key steps. First, the data were preprocessed to handle missing values, normalize continuous variables, and encode categorical variables (Janizadeh et al. 2019). Then, the dataset was split into training and testing subsets to evaluate the model's performance. The training set was used to fit the logistic regression model, where the algorithm iteratively adjusted the coefficients to minimize the error between the predicted and actual outcomes. Regularization techniques, such as L1 (lasso) or L2 (ridge) regularization, were employed to prevent overfitting by penalizing large coefficients and promoting a simpler, more generalizable model.
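The training steps described above can be sketched as follows. This is a minimal illustration only: it fits a logistic regression by batch gradient descent with an L2 (ridge) penalty on a tiny hypothetical sample of two normalized factors. The function names, the learning rate, the penalty strength, and the data are all assumptions made for the example and do not reproduce the study's R workflow or results.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(intercept, coefs, x):
    """Predicted flood probability for one sample x."""
    return sigmoid(intercept + sum(w * v for w, v in zip(coefs, x)))

def fit_logistic(X, y, l2=0.01, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss with an L2 penalty on the coefficients.

    The intercept is left unpenalized. All hyperparameters are illustrative.
    """
    n, k = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * k
    for _ in range(epochs):
        errors = [predict(intercept, coefs, xi) - yi for xi, yi in zip(X, y)]
        grad_b = sum(errors) / n
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n for j in range(k)]
        intercept -= lr * grad_b
        coefs = [w - lr * (g + l2 * w) for w, g in zip(coefs, grad_w)]
    return intercept, coefs

# Hypothetical samples: [normalized elevation, normalized rainfall]; label 1 = flooded.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.8, 0.3], [0.9, 0.2], [0.7, 0.4]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[0.15, 0.6], [0.85, 0.1]]  # held-out samples for evaluation
y_test = [1, 0]

intercept, coefs = fit_logistic(X_train, y_train)
for x, label in zip(X_test, y_test):
    print(x, "observed:", label, "predicted probability:", round(predict(intercept, coefs, x), 3))
```

In this toy sample the fitted elevation coefficient is negative and the rainfall coefficient is positive, which only reflects how the hypothetical data were constructed.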
After training the model, its performance was evaluated using the testing subset. The area under the receiver operating characteristic (ROC) curve was calculated to assess the model's predictive capabilities. This metric provided insight into how well the model could distinguish between flooded and non-flooded areas (Edamo et al. 2022c). The final model was then used to generate flood probability maps, which serve as valuable tools for identifying flood-prone regions and informing flood risk management strategies.
Logistic regression is a statistical model that predicts the probability of an event occurring or not, based on the optimization of regression coefficients. The model delivers a binary (dichotomous) outcome limited to two possible values (0 and 1), while the corresponding probability can vary between 0 and 1 (Edamo et al. 2022c).
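For reference, the model is usually written in the following standard form. The notation ($z$, $\beta_0$, $\beta_i$, $x_i$, $n$) is an assumption adopted here for illustration and may differ from the authors' own symbols.

```latex
P = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
```

Here $P$ is the probability of flood occurrence, $x_i$ are the $n$ conditioning factors, $\beta_i$ are the regression coefficients, and $\beta_0$ is the intercept.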
This flood probability (P) corresponds to the flood susceptibility in this study. The flood susceptibility estimated by logistic regression varies between 0 and 1, and the closer the value is to 1, the higher the probability of flood occurrence.
Validation method
To assess the performance of the logistic regression model used for creating flood maps, we employed the ROC curve as a validation method. The ROC curve is a graphical representation that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied (Asmare 2023). To construct the ROC curve, we plotted the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various threshold settings. This approach helps in understanding how well the model distinguishes between flooded and non-flooded areas under different conditions (Saha et al. 2021).
The area under the ROC curve (AUC) was then calculated to quantify the overall performance of the logistic regression model (Asmare 2023). The AUC quantifies the overall ability of the model to discriminate between positive and negative classes by measuring the area under the ROC curve. The AUC value ranges from 0 to 1, where a value of 0.5 indicates no discriminative power, equivalent to random guessing, and a value of 1.0 represents perfect classification (Ha et al. 2022). In our study, the AUC values were derived by integrating the area under the ROC curve using numerical methods. Higher AUC values indicate better model performance, as they reflect a higher true positive rate and a lower false positive rate across all thresholds. Interpreting these values, we could objectively compare the effectiveness of our logistic regression model and make necessary adjustments to improve the accuracy and reliability of the flood maps.
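The construction of the ROC curve and the numerical integration of its area can be sketched as below. This is a minimal example assuming trapezoidal integration and a small hypothetical set of labels and predicted probabilities; the function names and data are illustrative and are not taken from the study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold over all scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by trapezoidal integration."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical observations (1 = flooded) and predicted flood probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(round(auc(points), 3))
```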
The ROC analysis was carried out using a confusion matrix in order to check the extent to which the areas of high flood susceptibility predicted by the logistic regression model were consistent with the flood inventory map derived from the surveys (Saha et al. 2021). A confusion matrix is a table (Table 2) that reports the results of the classifiers using specific terms, such as ‘True Positives (TP)’, the predicted and actual positive results; ‘False Positives (FP)’, the predicted positive but actual negative results; ‘True Negatives (TN)’, the predicted and actual negative results; and ‘False Negatives (FN)’, the predicted negative but actual positive results (Edamo et al. 2022c).
Flood susceptibility | Flood inventory: flooded site | Flood inventory: non-flooded site |
---|---|---|
Susceptible location | True positive (TP) | False positive (FP) |
Non-susceptible location | False negative (FN) | True negative (TN) |
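The confusion matrix terms can be illustrated with a short sketch that counts TP, FP, FN, and TN at a probability threshold and derives the usual rates from them. The threshold default, the function names, and the sample data are assumptions for illustration only.

```python
def confusion_counts(observed, probabilities, threshold=0.5):
    """Count TP, FP, FN, TN for predictions made at the given threshold."""
    tp = fp = fn = tn = 0
    for obs, p in zip(observed, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and obs == 1:
            tp += 1
        elif predicted == 1 and obs == 0:
            fp += 1
        elif predicted == 0 and obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """Sensitivity (true positive rate), specificity, and overall accuracy."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical inventory (1 = flooded site) and predicted probabilities.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]

counts = confusion_counts(observed, probabilities)
print(counts, rates(*counts))
```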
In ROC analysis, the performance of a test method can be assessed by calculating the AUC. The closer the AUC is to 1, the better the detection performance, while the closer the AUC is to 0, the worse the detection performance. AUC can be evaluated using the following numerical and qualitative classifications: less than 0.5 (test not useful), 0.5–0.6 (bad), 0.6–0.7 (sufficient), 0.7–0.8 (good), 0.8–0.9 (very good), and 0.9–1 (excellent). The AUC calculated using the training dataset gives the success rate, and the AUC calculated using the test dataset gives the prediction rate.
Susceptibility mapping
The logistic regression model is trained on a collected dataset where the dependent variable is binary, indicating whether flooding is present in historical records. The model estimates the probability of flooding for each location based on these factors (Ha et al. 2022). The output is a probability map indicating the likelihood of flooding, with values ranging from 0 to 1, where higher values indicate greater susceptibility.
The probability values obtained from the logistic regression model are then classified into different susceptibility categories to create a more interpretable flood susceptibility map. These categories, such as very high, high, moderate, and low, are defined based on threshold values that segment the continuous probability scale into discrete intervals (Janizadeh et al. 2019). For example, areas with a probability of flooding above 0.8 might be classified as very high susceptibility, while those between 0.6 and 0.8 might have high susceptibility, 0.4–0.6 as moderate, and below 0.4 as low. These categories are mapped spatially, allowing for a clear visualization of flood risk zones, which can be used for urban planning, emergency response, and flood mitigation strategies (Edamo et al. 2022b).
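The example thresholds mentioned above can be expressed as a small classification function. This is a sketch only: the thresholds (0.8, 0.6, 0.4) are the illustrative values from the text, and the handling of values that fall exactly on a boundary is an assumption.

```python
def susceptibility_class(p):
    """Map a flood probability in [0, 1] to a susceptibility category.

    Assumption: boundary values 0.8 and 0.6 fall in the lower class,
    and 0.4 falls in the moderate class.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p > 0.8:
        return "very high"
    if p > 0.6:
        return "high"
    if p >= 0.4:
        return "moderate"
    return "low"

# Hypothetical pixel probabilities.
for p in (0.85, 0.70, 0.50, 0.10):
    print(p, susceptibility_class(p))
```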
Statistical analysis
To map flood susceptibility, it is necessary to collect a multitude of data from various sources and analyze them in a GIS environment. In developing flood maps using logistic regression, we conducted additional statistical analyses to enhance the robustness of the findings. These included tests for multicollinearity to ensure the independence of predictor variables, correlation analysis, ROC analysis to evaluate the model's performance, and validation techniques such as cross-validation to assess predictive accuracy. The data analysis and modeling were performed in R, using the glm function for logistic regression. GIS software, specifically Quantum Geographic Information System (QGIS), was employed for spatial data handling and visualization, ensuring precise mapping of flood-prone areas. These tools collectively ensured a comprehensive and reliable approach to flood risk assessment and mapping.
Pearson's correlation
RXY values correspond to a specific level, including the absence of correlation (|RXY| = 0), the very weak correlation (0 < |RXY|< 0.2), the weak correlation (0.2 <|RXY|< 0.4), the medium correlation (0.4 < |RXY|< 0.6), the strong correlation (0.6 <|RXY|< 0.8) and the very strong correlation (0.8 < |RXY|< 1) (Tehrany et al. 2019).
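As an illustration of the classification above, the sketch below computes Pearson's correlation coefficient between two factors and labels its strength. The sample values are hypothetical, and because the intervals in the text use strict inequalities, assigning a boundary value (e.g., exactly 0.2) to the upper class is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_strength(r):
    """Qualitative label for |r|, following the intervals given in the text."""
    a = abs(r)
    if a == 0:
        return "no correlation"
    if a < 0.2:
        return "very weak"
    if a < 0.4:
        return "weak"
    if a < 0.6:
        return "medium"
    if a < 0.8:
        return "strong"
    return "very strong"

# Hypothetical factor values at five sample points (not data from the study).
elevation = [600, 1200, 1800, 2400, 3000]
rainfall = [1100, 1180, 1250, 1330, 1400]

r = pearson_r(elevation, rainfall)
print(round(r, 3), correlation_strength(r))
```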
Multicollinearity test
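For reference, tolerance (TOL) and the variance inflation factor (VIF) are conventionally defined as follows. The notation is an assumption adopted for illustration, with $R_j^2$ denoting the coefficient of determination obtained when factor $j$ is regressed on all the other conditioning factors.

```latex
\mathrm{TOL}_j = 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j}
```

A commonly cited rule of thumb, not specific to this study, treats VIF values above roughly 5 to 10 (TOL below roughly 0.1 to 0.2) as a sign of problematic multicollinearity.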
RESULTS AND DISCUSSION
Model performance and validation
The logistic regression model developed for flood mapping exhibited robust performance, as evidenced by the AUC values obtained from both the training and validation datasets. For the training dataset, the model achieved an AUC of 0.89, indicating a high level of accuracy in distinguishing between flood-prone and non-flood-prone areas. Similarly, the validation dataset yielded an AUC of 0.88, underscoring the model's strong generalizability and reliability in predicting flood susceptibility across different data subsets. These AUC values suggest that the logistic regression model effectively captures the underlying patterns and relationships critical for flood prediction, providing a reliable tool for flood risk assessment.
Validation of the flood susceptibility map
In this study, the probability threshold above which a location is identified as flood-prone is set at 0.5. In training, TP = 579, FP = 140, FN = 218, and TN = 501; that is, 579 of the 718 flooded sites were predicted correctly and 139 were predicted incorrectly, so the proportion of flooded sites correctly predicted reached 80.64% (579/718). During validation, TP = 241, FP = 102, FN = 66, and TN = 205; that is, 241 of the 308 flooded sites were predicted correctly and 67 were predicted incorrectly, so the proportion of flooded sites correctly predicted reached 78.25% (241/308).
ROC analysis was carried out to confirm quantitatively whether the predicted flood-susceptible areas overlap with areas that have flooded in the past. The ROC curve of the logistic regression is shown in Figure 7, with AUC values of approximately 81% and 89% for the prediction rate and the success rate, respectively. On this basis, it is arguable that the model has a good predictive capacity, but the prediction of non-flooded sites is still insufficient considering that, for a perfect model, the AUC is equal to 100%. Although future work could be undertaken to improve the performance of this model, the flood susceptibility map obtained could be used as a decision support tool for better flood management, helping municipal authorities take preventive operational measures to protect the population and reduce flood damage.
Statistical validation analyses
Conditioning factors | TOL | VIF |
---|---|---|
Soil type | 0.93 | 1.08 |
LULC | 0.97 | 1.03 |
Rainfall | 0.69 | 1.44 |
TPI | 0.59 | 1.79 |
NDVI | 0.94 | 1.06 |
Distance from river | 0.78 | 1.28 |
TRI | 0.98 | 1.02 |
TWI | 0.61 | 1.63 |
Flow direction | 0.97 | 1.03 |
Plan curvature | 0.61 | 1.64 |
Profile curvature | 0.64 | 1.57 |
Aspect | 0.96 | 1.04 |
Slope | 0.79 | 1.26 |
Elevation | 0.58 | 1.73 |
Logistic regression estimation
For logistic regression, the training sample is used to estimate the slope coefficients for all independent variables. The results of the logistic regression analysis for all independent variables are shown in Table 4.
Conditioning factors | Estimated coefficient (β) | Standard errors | z-value | p-value |
---|---|---|---|---|
Intercept | −2.47 | 2.90 | −0.85 | 0.39 |
Soil type | 0.09 | 0.22 | −0.43 | 0.67 |
LULC | 0.38 | 0.26 | 1.47 | 0.14 |
Rainfall | 3.23 | 0.39 | 8.57 | < 2 × 10−16*** |
TPI | 1.62 | 2.35 | 0.69 | 0.49 |
NDVI | −3.60 | 1.40 | −2.57 | 0.01* |
Distance to the river | −3.90 | 0.44 | −8.95 | < 2.2 × 10−16*** |
TRI | 0.29 | 0.64 | 0.45 | 0.65 |
TWI | −0.68 | 0.77 | −0.89 | 0.38 |
Flow direction | −0.08 | 0.23 | −0.34 | 0.75 |
Plan curvature | 8.28 | 3.75 | 2.21 | 0.03* |
Profile curvature | 1.92 | 3.21 | 0.60 | 0.55 |
Aspect | 0.27 | 0.22 | 1.26 | 0.21 |
Slope | −1.17 | 1.60 | −0.73 | 0.47 |
Elevation | −6.10 | 0.65 | −9.45 | < 2 × 10−16*** |
***p-value ≤ 0.001; **p-value ≤ 0.01; *p-value ≤ 0.05.
The z-value and p-value columns represent the statistical significance of each coefficient in the model. A higher absolute value of z indicates that the estimated coefficient is more statistically significant. A lower p-value indicates that the coefficient is more statistically significant, and a value less than 0.05 is often considered as evidence to reject the null hypothesis. The results reveal that the null hypothesis is rejected for the predictors such as rainfall, NDVI, distance to river, plan curvature, and elevation (Table 4), indicating that these predictors are statistically significant (p-value < 0.05 and │z│ > 1.96) at a 95% confidence level, in the logistic regression model. In addition, a positive value of coefficient (β) means that the effect of the relevant variable increases the likelihood of flooding. However, a negative value of β means that the presence of the decision variable decreases the probability of flooding occurrence. Table 5 shows the new regression coefficients obtained by implementing the logistic regression model using only the five statistically significant predictors.
Conditioning factors | Estimated coefficient (β) | Standard errors | z-value | p-value |
---|---|---|---|---|
Intercept | −0.98 | 1.42 | −0.69 | 0.49 |
Plan curvature | 9.43 | 3.06 | 3.08 | 2.04 × 10−3** |
Rainfall | 3.16 | 0.38 | 8.30 | < 2.2 × 10−16*** |
NDVI | −3.79 | 1.41 | −2.69 | 7.11 × 10−3** |
Distance to river | −3.80 | 0.45 | −8.43 | < 2.2 × 10−16*** |
Elevation | −5.85 | 0.63 | −9.31 | < 2.2 × 10−16*** |
***p-value ≤ 0.001; **p-value ≤ 0.01; *p-value ≤ 0.05.
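The z-values and p-values in the regression tables can be related to the coefficients and standard errors with a short sketch of the Wald test, in which z is the coefficient divided by its standard error and the p-value is the two-sided tail probability of a standard normal distribution. The function name is hypothetical, and because the inputs are the rounded values from Table 5, the results only approximately reproduce the reported figures.

```python
import math


def wald_test(beta, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = beta / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Coefficients and standard errors as reported in Table 5.
z_ndvi, p_ndvi = wald_test(-3.79, 1.41)
z_plan, p_plan = wald_test(9.43, 3.06)

# A predictor is significant at the 95% level when |z| > 1.96, i.e. p < 0.05.
significant = {"NDVI": p_ndvi < 0.05, "Plan curvature": p_plan < 0.05}
```

The same check applied to a non-significant row of Table 4, such as slope (β = −1.17, standard error 1.60), gives an absolute z-value well below 1.96.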
Flood susceptibility analysis
The flood susceptibility map generated using logistic regression provides a comprehensive analysis of the flood risk within the Deme watershed. The model categorizes the area into five susceptibility classes: very high, high, medium, low, and very low. According to the model's output, 18% of the watershed falls in the very high class and 19% in the high class, indicating regions at significant risk of flooding. These areas are predominantly located near riverbanks and in low-lying regions where water accumulation is most likely. The medium class covers 18% of the watershed, representing areas with a moderate risk of flooding, typically found in regions with slight elevation variations and near smaller tributaries. The remaining 20 and 24% of the watershed fall in the low and very low classes, respectively, encompassing higher-elevation areas and regions with better drainage, where the risk of flooding is minimal.
Visual aids such as detailed maps and charts are crucial for illustrating the spatial distribution of these flood-prone areas. The flood susceptibility map vividly highlights the high-risk zones in red, moderate-risk areas in yellow, and low-risk zones in green, providing a clear and immediate understanding of the flood risk distribution across the watershed. In addition, bar charts can be used to present the percentage distribution of each susceptibility category, enhancing the visual comprehension of the data. These visual tools not only facilitate the identification of critical areas that require immediate attention and mitigation efforts but also serve as a valuable resource for urban planning, emergency response, and community awareness initiatives aimed at reducing the impact of potential flooding events in the Deme watershed.
The corresponding susceptibility class statistics were extracted with the zonal statistics toolbox and are shown in Table 6.
Susceptibility classes | Area (km²) | Proportion (%) |
---|---|---|
Very high | 293.3 | 18 |
High | 482.01 | 19 |
Medium | 395.03 | 18 |
Low | 370.058 | 20 |
Very low | 397.04 | 24 |
Total | 1,937.3 | 100 |
Factors contributing to flood susceptibility
The analysis of the 14 factors contributing to flood susceptibility revealed that some variables had a more pronounced impact on flood risks than others. High rainfall levels directly contribute to increased surface runoff, which can overwhelm drainage systems and lead to flooding. Similarly, steeper slopes facilitate rapid water flow, reducing infiltration and increasing the likelihood of flash floods. These factors are crucial in predicting flood-prone areas because they directly influence the volume and speed of water flow during heavy rainfall events.
Soil type and land use were also critical factors affecting flood susceptibility. Certain soil types, such as clay, have low permeability, leading to higher surface runoff during heavy rains. Conversely, sandy soils, which are highly permeable, are less likely to contribute to flooding. Land use patterns, such as urbanization, significantly impact flood risks. Urban areas, with their extensive impermeable surfaces, such as roads and buildings, prevent water absorption into the ground, leading to higher runoff and greater flood risks. The conversion of natural landscapes to urban areas thus plays a substantial role in increasing flood susceptibility.
Other factors, including proximity to water bodies, vegetation cover, and drainage network density, also influenced flood susceptibility but to a lesser extent. Areas close to rivers, lakes, or seas are naturally more prone to flooding due to potential overflow during heavy rains. Vegetation cover can mitigate flooding by enhancing water absorption and reducing runoff, while a dense drainage network can help manage excess water more effectively. Each of these factors interplays with the others, creating a complex web of influences that determines overall flood risk. Understanding the relative importance of these factors is essential for effective flood management and mitigation strategies, allowing for targeted interventions in the most vulnerable areas.
In the analysis of flood susceptibility, the five factors elevation (DEM), distance to river, rainfall, plan curvature, and NDVI emerge as the most significant contributors. Elevation directly influences flood potential, as lower-lying areas are more prone to water accumulation. Proximity to rivers is crucial because areas closer to water bodies are at higher risk of flooding due to overflow. Rainfall intensity and frequency are primary drivers of flood events; increased rainfall can lead to saturated soil and excess runoff. Plan curvature affects water flow and accumulation on the landscape; concave areas tend to collect water, heightening flood risk. NDVI, which indicates vegetation density, influences the rate of water infiltration and runoff, with dense vegetation typically reducing flood susceptibility by promoting water absorption into the soil.
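To make the direction of these effects concrete, the following sketch evaluates the logistic model with the five coefficients reported in Table 5. It assumes that each predictor has been rescaled to the range 0 to 1, which the text does not state, and the two sites and their values are hypothetical.

```python
import math

# Coefficients of the five-predictor model as reported in Table 5.
INTERCEPT = -0.98
COEFFICIENTS = {
    "plan_curvature": 9.43,
    "rainfall": 3.16,
    "ndvi": -3.79,
    "distance_to_river": -3.80,
    "elevation": -5.85,
}


def flood_probability(site):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = INTERCEPT + sum(COEFFICIENTS[k] * site[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical sites with predictors rescaled to [0, 1] (an assumption).
valley_site = {"plan_curvature": 0.5, "rainfall": 0.8, "ndvi": 0.2,
               "distance_to_river": 0.05, "elevation": 0.1}
ridge_site = {"plan_curvature": 0.5, "rainfall": 0.8, "ndvi": 0.2,
              "distance_to_river": 0.9, "elevation": 0.9}

p_valley = flood_probability(valley_site)
p_ridge = flood_probability(ridge_site)
```

With identical rainfall, NDVI, and curvature, the low-lying site close to the river receives a probability above the 0.5 threshold and the elevated site far from the river receives one below it, reflecting the negative coefficients of elevation and distance to river.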
Beyond these five critical factors, the other nine also play a role in flood susceptibility, albeit to varying extents. Aspect and slope influence the direction and speed of water flow, affecting how quickly water can drain from a given area. TWI reflects the potential for water accumulation based on terrain, while TRI and TPI provide insights into the landscape's roughness and relative elevation, respectively, affecting water movement and pooling. Flow direction dictates the path of surface runoff, while profile curvature describes the terrain's shape along the slope, affecting water concentration. LULC and soil type help determine infiltration rates and runoff patterns, with different land covers and soil compositions either facilitating or impeding water absorption. While these factors are relevant, their impact is often mediated by the dominant influence of elevation, distance to river, rainfall, plan curvature, and NDVI in determining flood-prone areas.
Comparison with previous studies
Many factors can cause flooding, and future floods are very difficult to predict accurately (Edamo et al. 2022c). For this reason, it is important to collect as many flood-related variables as possible and to choose a suitable model for predicting future floods (Robi et al. 2018). This study combined the logistic regression model with 14 geospatial variables related to flooding in the study area: elevation, slope, aspect, plan curvature, profile curvature, TPI, TRI, TWI, flow direction, distance to the river, rainfall, LULC, NDVI, and soil type. The resulting flood susceptibility map provides satisfactory and reliable results that can serve as a geospatial database for flood risk management decision-making and the development of the Deme watershed. The results highlighted five statistically significant factors (p-value < 0.05) that affect the probability of flooding: elevation, distance to river, rainfall, plan curvature, and NDVI. The probability of flooding increases as elevation decreases (Strathie et al. 2017). This finding is in line with previous research (Németh et al. 2019; Shahiri Tabarestani & Afzalimehr 2021; Edamo et al. 2022a, 2022c). Another factor that directly affects the occurrence of flooding in the study area is the distance to the river: the closer sites are to rivers, the more likely they are to flood. This trend is to be expected, as most floods are caused by rivers overflowing during heavy rainfall. Some research has shown that the higher the rainfall, the higher the probability of flooding, and our results point in the same direction. Our results also show that plan curvature affects the probability of flooding.
The more concave the curvature, the higher the probability of flooding. This finding is in line with previous research. When the plan curvature is concave, the flow of water is concentrated in the trough, which increases the soil's moisture content and the length of time the soil remains saturated. The role of NDVI was also notable. Since lower NDVI values are associated with water, the probability of flooding was expected to be higher in lower NDVI classes. The results for the Deme watershed thus reveal several variables that affect susceptibility to flooding and that are in line with research on other territories. However, a case study of flood susceptibility mapping in the Swat River basin, eastern Hindukush region, Pakistan, found that slope was the most important variable affecting susceptibility to flooding. In contrast, slope was not significant in describing susceptibility to flooding in the Deme watershed, since its morphology is monotonous, with gentle slopes ranging from 0.6 to 1%. The Deme watershed presents a different topography and geomorphology from other areas, so the importance of the variables studied changes with the physical characteristics of the area. Therefore, flood susceptibility studies in other areas should first identify the causal factors relevant to the area under study in order to reveal its peculiarities. In assessing the performance of the model used in this study, it was noted that, in some cases, flooding occurred in areas with a susceptibility less than 0.5, and in other cases, no flooding occurred even in areas with a susceptibility higher than 0.5 (Dagnachew et al. 2024).
Therefore, it cannot be assumed that areas with relatively low susceptibility are safe. Areas with high susceptibility require special management because of their high probability of flooding; however, low-susceptibility areas also need to be carefully managed in order to reduce flood damage. Another shortcoming of this study is the resolution of the datasets used. Free low-resolution datasets were used to derive the predictive variables from the digital terrain model and from the soil and precipitation data. Data resolution affects the accuracy of prediction results. In future research, higher-resolution datasets may be used.
Logistic regression was employed to create flood susceptibility maps, a method that has shown considerable effectiveness in previous research. Like earlier studies, our findings indicate that variables such as elevation, rainfall, and proximity to rivers are significant predictors of flood-prone areas. Consistent with the findings of Ha et al. (2022), our logistic regression model highlighted low-lying areas near rivers and streams as highly susceptible to flooding. This congruence underscores the robustness of logistic regression as a predictive tool in flood susceptibility mapping, reaffirming its utility across different geographical contexts.
However, our study also revealed some differences when compared to past research. Notably, we observed that urban land use had a higher predictive value for flood susceptibility in our model compared to rural areas, which contrasts with the findings of Tehrany et al. (2013), who reported minimal differences between urban and rural flood susceptibility. This discrepancy could be attributed to variations in data resolution, study area characteristics, or the specific urban infrastructure developments in our region of study that exacerbate flooding risks. In addition, our model incorporated more recent climate data, potentially capturing the effects of recent weather pattern changes more accurately than older studies.
Several factors might explain these similarities and differences. The use of updated and high-resolution spatial data could enhance the predictive accuracy of logistic regression models, explaining why our findings align closely with recent studies while differing from older ones. Furthermore, the urban–rural discrepancy in flood susceptibility might reflect regional differences in land use management and urban planning policies. Future research could benefit from integrating more localized variables and exploring the impacts of climate change on flood patterns to provide even more tailored and accurate flood susceptibility maps.
Implications for policy and planning
The study's findings on flood risk in the Deme watershed, derived through logistic regression, provide crucial insights for policymakers and urban planners. By identifying key variables that contribute to flood susceptibility, such as land use, soil type, topography, and rainfall intensity, the study offers a scientific basis for developing targeted interventions. Policymakers can utilize these insights to prioritize areas for flood mitigation efforts, ensuring resources are allocated efficiently. This data-driven approach can significantly enhance the effectiveness of flood management strategies, reducing the overall risk to communities and infrastructure in the watershed.
Specific strategies that urban planners can implement based on the study's findings include the enforcement of stringent zoning regulations to restrict development in high-risk flood zones. In addition, planners can advocate for the creation and maintenance of green spaces and wetlands, which act as natural buffers and help absorb excess rainfall. Investing in improved drainage systems and flood defenses, such as levees and retention basins, particularly in identified high-risk areas, can also mitigate potential flood damage. Incorporating these measures into urban planning and development processes will help build more resilient communities that are better equipped to handle extreme weather events.
Moreover, the study underscores the importance of integrating flood risk assessment into the broader framework of sustainable urban development. Policymakers should consider implementing early warning systems and community awareness programs to educate residents about flood risks and preparedness strategies (Ukumo et al. 2022). This proactive approach can enhance community resilience and ensure timely evacuation and response during flood events. By aligning urban planning initiatives with the findings of this study, policymakers can foster a safer, more sustainable environment that minimizes flood risks and enhances the overall quality of life in the Deme watershed.
The practical implications of this study are substantial for policymakers and urban planners. The accurate flood susceptibility maps generated can inform the creation of more effective flood prevention strategies, such as the strategic placement of flood barriers and the design of improved drainage systems (Markantonis et al. 2013). In addition, urban development plans can leverage these insights to avoid construction in high-risk areas, thereby reducing future flood damage. Future research should focus on integrating more diverse data sources and refining the logistic regression model to enhance its predictive accuracy. Ultimately, this study underscores the critical need for proactive measures in mitigating flood impacts, advocating for a data-driven approach to protect vulnerable communities in the Deme watershed.
Limitations and future research
In creating flood maps using logistic regression, it is crucial to acknowledge the limitations inherent in the study. One significant constraint is the quality and granularity of the data. Flood prediction heavily relies on historical flood records, topographical data, land use, and climate variables, which may not always be available at the necessary resolution or accuracy. In addition, the model assumptions of logistic regression, such as linearity between the independent variables and the log odds of the dependent variable, might not fully capture the complex, non-linear nature of flood phenomena. This simplification can lead to less accurate predictions, particularly in regions with highly variable or extreme climatic conditions.
Future research should focus on enhancing flood susceptibility mapping and risk assessment by addressing these limitations. Integrating higher-resolution data, including real-time hydrological and meteorological inputs, could significantly improve model precision. Exploring advanced machine learning techniques, such as random forests or neural networks, might better capture the intricate relationships and interactions between variables influencing flood risk. Moreover, developing hybrid models that combine physical flood simulation models with statistical approaches could offer a more robust framework for predicting flood events. These advancements would not only improve the accuracy of flood maps but also enhance the decision-making processes for disaster preparedness and mitigation strategies (Ugural & Burgan 2021).
CONCLUSION
The main objective of this research was to produce a flood susceptibility map for the Deme watershed using a logistic regression model. The flood inventory includes 207 flood sites identified through a household survey in the Deme watershed. These data were used to calibrate (70%) and validate (30%) the flood susceptibility model. Five flood conditioning factors (elevation, distance to river, rainfall, plan curvature, and NDVI) were identified as statistically significant (p-value < 0.05 and │z│ > 1.96) at a 95% confidence level and used to produce the flood susceptibility map. The factors can be ranked according to their importance as follows: elevation, distance to river, rainfall, plan curvature, and NDVI. The accuracy of the flood susceptibility mapping was assessed with the ROC method. The AUC values indicate very good accuracy of the flood susceptibility map (AUC = 0.89 for training and AUC = 0.88 for validation). This flood susceptibility map is a useful tool for taking preventive measures to mitigate flooding and for planning urban development. In areas where flood susceptibility is high to very high, residents have to be careful during the rainy season and pay attention to possible flood disasters. The communal authorities should also commit to disaster prevention and control by prioritizing the extension of stormwater drainage, the protection of flood risk areas, and public awareness campaigns on flood risks, in order to support watershed development and the safety of the population and infrastructure.
The logistic regression model has proven to be highly effective in mapping flood susceptibility in the small Deme watershed. By analyzing various environmental and climatic factors, the model accurately identifies areas at high risk of flooding, providing valuable insights into the dynamics of flood occurrences in this region. This study significantly contributes to our understanding of flood risks, emphasizing the importance of incorporating predictive modeling in flood management strategies. The findings highlight the potential for logistic regression to serve as a robust tool for policymakers, aiding in the formulation of targeted flood prevention measures and guiding urban development to minimize flood impacts. Future research should focus on integrating more dynamic and high-resolution data to enhance the accuracy of flood susceptibility maps. Overall, the study emphasizes the need for proactive measures in mitigating flood impacts, and ensuring the safety and resilience of vulnerable communities.
ACKNOWLEDGEMENT
We would like to thank all institutions and individuals for providing the necessary data for this study.
FUNDING
No funding was received from any source.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.