ABSTRACT
Floods are among the natural calamities that cause significant destruction in many regions of the world each year. Consequently, precise flood prediction is crucial for mitigating human and financial losses and for effectively managing water resources. To this end, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models were used in this study to map flood hazard in the Aji Chay watershed. Flood locations were recorded in the study area and, together with non-flood points generated using the Absence Point Generation (APG) technique, were divided into two groups: 70% of the data served as the training dataset for model construction, while the remaining 30% formed the testing dataset for validation. Seven key flood-conditioning factors, namely, precipitation, land use, Normalized Difference Vegetation Index (NDVI), drainage density, flow direction, topographic wetness index (TWI), and terrain ruggedness index (TRI), were identified through the Leave-One-Feature-Out (LOFO) approach and employed in the modeling process. The LSTM model, with a Kolmogorov–Smirnov (KS) statistic of 88.14, was selected as the best model based on the KS plot. The results revealed that approximately 37% of the study area fell into the high and very high flood risk classes. These findings can support the effective management of flood-prone areas and the reduction of flood damage.
HIGHLIGHTS
Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models predict floods in the Aji Chay watershed.
The key factors are precipitation, land use, and Normalized Difference Vegetation Index.
The LSTM model (Kolmogorov–Smirnov statistic: 88.14) classifies about 37% of the area as high or very high flood risk.
Valuable insights for flood-prone area management are discussed in this study.
Flood risk zoning in the catchment area is evaluated.
INTRODUCTION
MATERIALS AND METHODS
1. It provides an estimate of the interpolation error or uncertainty, allowing for a quantitative assessment of the reliability of the interpolated values.
2. It can handle clustered or irregularly spaced data points, which is common in precipitation monitoring networks.
3. It allows for the incorporation of additional covariates or auxiliary variables, such as elevation or terrain features, to improve the accuracy of the interpolation.
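The properties listed above are characteristic of kriging-based interpolation; because the method's code is not shown in the paper, the following is a minimal, hypothetical sketch of ordinary kriging of station precipitation onto a regular grid using the `pykrige` package (station coordinates and values are placeholders). Incorporating covariates such as elevation would require a variant such as universal kriging or kriging with external drift, which is only noted here.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical rain-gauge records: projected x/y coordinates (metres) and
# annual precipitation (mm). Real station data would replace these arrays.
x = np.array([512_300.0, 545_800.0, 530_100.0, 560_400.0, 521_700.0])
y = np.array([4_205_900.0, 4_231_200.0, 4_218_500.0, 4_210_300.0, 4_240_100.0])
precip = np.array([310.0, 285.0, 402.0, 267.0, 355.0])

# Ordinary kriging with a spherical variogram; the kriging variance returned
# below is the interpolation-uncertainty estimate mentioned in point 1.
ok = OrdinaryKriging(x, y, precip, variogram_model="spherical")

grid_x = np.linspace(x.min(), x.max(), 100)
grid_y = np.linspace(y.min(), y.max(), 100)
z_interp, z_var = ok.execute("grid", grid_x, grid_y)  # estimates + variances
```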
The formula uses M0 to represent the elevation of the point being assessed, Mn to represent the elevation of the surrounding grid, and n as the total number of neighboring points considered in the evaluation. The index yields values close to zero for flat areas, positive values for higher elevations than the surroundings, and negative values for valleys or depressions (Olanrewaju & Umeuduji 2017). In the study area, the TPI values ranged from 0.95 to 120.6. The TPI, TWI, and TRI maps were generated using SAGA-GIS software.
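The formula itself is not reproduced in this excerpt; the standard TPI definition consistent with the description above can be written as

$$\mathrm{TPI} = M_{0} - \frac{1}{n}\sum_{i=1}^{n} M_{i},$$

where $M_{0}$ is the elevation of the point being assessed and $M_{i}$ (the $M_{n}$ of the description) are the elevations of the $n$ neighboring grid cells.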
The NDVI formula uses the reflectance of the near-infrared and red bands to calculate values ranging from −1 to 1, which serve as an index of vegetation cover density. Higher NDVI values indicate denser vegetation cover, whereas lower values indicate a lack of vegetation. The NDVI data for this layer were acquired from Collection 1 Landsat 8 OLI/TIRS imagery on 13 March 2018 (path 117, row 43). The NDVI values in the study area vary from 0.19 to 0.88. The drainage density map is calculated by dividing the total length of all rivers and watercourses in the watershed by the total area of that region (Desalegn & Mulu 2021). Using the Line Density function in ArcGIS software, the drainage density map for the Aji Chay watershed was generated, with values ranging from 0 to 0.68 km/km². The flow direction represents, for each cell in the raster data, the direction toward the lower neighboring cell, determined from the slope of each cell relative to its adjacent cells (Hadibasyir & Fikriyah 2023). The flow direction values in the study area range from 1 to 128. The flow direction map used in this study was derived from the DEM data as the input raster. Table 1 provides the breakdown of land use classification based on zoning regulations.
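For reference, the standard expressions consistent with the NDVI and drainage density descriptions above are

$$\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}}, \qquad D_{d} = \frac{\sum_{i} L_{i}}{A},$$

where $\rho_{\mathrm{NIR}}$ and $\rho_{\mathrm{Red}}$ are the near-infrared and red reflectances, $\sum_{i} L_{i}$ is the total stream length in the watershed, and $A$ is its area.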
Table 1 | Land use classes in the Aji Chay watershed

| Land use | Area (ha) | Area (%) |
|---|---|---|
| Agricultural land | 2,962,995.9 | 37.87 |
| Bareland | 89,983.8 | 1.15 |
| City | 49,686.9 | 0.64 |
| Forest | 233,695.7 | 2.99 |
| Island | 8,577.6 | 0.11 |
| Range | 3,824,794.9 | 48.88 |
| Rock | 10,627.1 | 0.14 |
| Saltland | 64,187.4 | 0.82 |
| Water | 441,245.2 | 5.64 |
| Wetland | 26,176.7 | 0.33 |
| Woodland | 112,261.7 | 1.43 |
The first and most crucial step in creating a flood hazard map is to prepare a flood inventory map. There are various methods used for this purpose, and in the study area, the locations affected by floods were identified through on-site surveys. Using a GPS device, flood-affected locations were recorded during the period from 2018 to 2020. Generating absence data is an important step in binary modeling of environmental issues, which typically involves random sampling. In this study, the Absence Point Generation (APG) tool, a Python-based ArcGIS toolbox, was employed to automatically generate absence data for the flood study area. The APG toolbox was developed to automate the creation of absence datasets for geospatial studies. It utilizes a frequency ratio analysis of key factors such as altitude, slope degree, TWI, and distance from rivers to define low potential zones for generating absence datasets. The importance of incorporating absence points in environmental binary modeling is highlighted by Naghibi et al. (2021), who found that the APG toolbox significantly improved the performance of benchmark algorithms such as random forest (RF) and boosted regression trees compared to traditional absence sampling methods, resulting in higher area under the curve (AUC) values. This underscores the necessity of incorporating absence points to enhance the accuracy and reliability of predictive models for various hazards like landslides, floods, and erosion (Bui et al. 2019; Naghibi et al. 2021).
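As an illustration of the frequency ratio analysis underlying the APG approach (a generic sketch, not the toolbox's actual implementation), the ratio for each class of a conditioning factor can be computed as below; classes with FR well below 1 indicate low flood potential and are candidates for absence sampling.

```python
import pandas as pd

def frequency_ratio(factor_class, flood_label):
    """Frequency ratio per factor class: the share of flood points falling in
    the class divided by the class's share of all pixels (FR < 1 suggests
    low flood potential)."""
    df = pd.DataFrame({"cls": factor_class, "flood": flood_label})
    flood_share = df.loc[df["flood"] == 1, "cls"].value_counts(normalize=True)
    area_share = df["cls"].value_counts(normalize=True)
    return (flood_share / area_share).fillna(0.0)
```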
LOFO feature selection
Feature selection is a crucial step in model development that aims to reduce the number of input variables. Its main objectives are to decrease computational costs and potentially enhance the model's performance (Zebari et al. 2020). In certain predictive modeling scenarios, a large number of variables can hinder model development and demand substantial system memory. Furthermore, including irrelevant or redundant input variables may lead to a decline in the model's performance (Brownlee 2019). To address these challenges, feature selection methods are employed to eliminate unnecessary or redundant variables, thereby reducing the number of input features. Various approaches exist to assess the relative importance of variables in a dataset. In this study, the Leave-One-Feature-Out (LOFO) approach was utilized to identify the most influential features in the modeling process (Roseline & Geetha 2021). The LOFO method involves evaluating the model's performance initially using all input features based on the ROC-AUC metric. Subsequently, one feature is iteratively removed at a time, and the model is retrained and evaluated on a validation set (Gholami et al. 2021). This process yields the importance of each feature, allowing for the elimination of features with low importance from the modeling process.
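The paper does not include code for this step; the sketch below illustrates the LOFO idea under simplifying assumptions, using a generic scikit-learn classifier as a stand-in for the deep models and cross-validated ROC-AUC as the score (all names are illustrative).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def lofo_importance(X, y, feature_names, cv=5):
    """Leave-One-Feature-Out: drop each feature in turn and measure the
    change in cross-validated ROC-AUC relative to the full feature set."""
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    base_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

    importances = {}
    for j, name in enumerate(feature_names):
        X_reduced = np.delete(X, j, axis=1)            # remove one feature
        auc = cross_val_score(model, X_reduced, y, cv=cv,
                              scoring="roc_auc").mean()
        importances[name] = base_auc - auc             # drop in AUC = importance
    return base_auc, importances
```

Features whose removal barely changes the AUC (or improves it) carry low importance and are candidates for elimination.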
FLOOD HAZARD MODELING
Convolutional neural network
1- The first layer is a Convolution1D layer with input_dim = 7, where the number 7 represents the seven selected features in the feature selection stage. The layer has 256 neurons with the activation function set to relu.
2- A max pooling layer follows the first Convolution1D layer.
3- The next layer is another Convolution1D layer with 128 neurons and relu activation function.
4- Another max pooling layer is applied after the second Convolution1D layer.
5- The process continues with another Convolution1D layer with 64 neurons and relu activation function.
6- The last max pooling layer is used after the third Convolution1D layer.
7- The flatten layer is applied to transform the output into a one-dimensional vector.
8- The final layer is a fully connected layer with one neuron.
In the compilation step, the binary_crossentropy loss function is chosen, and the optimizer is set to adam. In addition, accuracy is selected as the evaluation metric.
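A minimal Keras sketch consistent with the architecture and compilation settings described above; the kernel sizes, the `(7, 1)` input shaping (seven factors treated as a length-7 sequence with one channel), and the sigmoid activation on the final neuron are assumptions, as the paper does not state them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    # Seven conditioning factors per sample, reshaped to (7, 1) for Conv1D.
    Conv1D(256, kernel_size=2, activation="relu", padding="same",
           input_shape=(7, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(128, kernel_size=2, activation="relu", padding="same"),
    MaxPooling1D(pool_size=2),
    Conv1D(64, kernel_size=2, activation="relu", padding="same"),
    MaxPooling1D(pool_size=1),        # only one step remains at this depth
    Flatten(),
    Dense(1, activation="sigmoid"),   # flood / non-flood probability
])

model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```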
Optimization of CNN model parameters
1. Changing the batch_size: As mentioned earlier, the initial value for this parameter was set to 10. In this stage, we experimented with two different values: 8 and 15. The highest accuracy was achieved with a batch_size of 10.
2. Changing the number of epochs: The initial value for this parameter was set to 2,000. In this stage, we tried two different values: 500 and 1,500. The highest accuracy was obtained with 1,500 epochs.
3. Changing the optimizer type: Initially, we used the adam optimizer. In this stage, we switched to the Adamax optimizer. However, the highest accuracy was still achieved with the adam optimizer.
These optimization steps were performed to fine-tune the CNN model and find the best combination of hyperparameters that yield the highest accuracy.
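The authors varied one hyperparameter at a time; for illustration only, the same search can be written as a small grid over the three hyperparameters. The data arrays below are placeholders for the 70/30 split described earlier, with shapes `(n_samples, 7, 1)` for the features and `(n_samples,)` for the 0/1 flood labels.

```python
from itertools import product
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_cnn():
    """Rebuild the CNN sketched above (weights re-initialized each trial)."""
    return Sequential([
        Conv1D(256, 2, activation="relu", padding="same", input_shape=(7, 1)),
        MaxPooling1D(2),
        Conv1D(128, 2, activation="relu", padding="same"),
        MaxPooling1D(2),
        Conv1D(64, 2, activation="relu", padding="same"),
        MaxPooling1D(1),
        Flatten(),
        Dense(1, activation="sigmoid"),
    ])

# X_train, y_train, X_test, y_test are placeholders for the training/testing split.
best = {"accuracy": 0.0}
for batch_size, epochs, opt in product([8, 10, 15], [500, 1500, 2000],
                                       ["adam", "adamax"]):
    model = build_cnn()
    model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    _, acc = model.evaluate(X_test, y_test, verbose=0)
    if acc > best["accuracy"]:
        best = {"accuracy": acc, "batch_size": batch_size,
                "epochs": epochs, "optimizer": opt}
```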
LSTM model
1- In the first layer, an LSTM with 256 units was utilized. In addition, a dropout rate of 0.2 and recurrent dropout rate of 0.2 were applied in this layer.
2- In the second layer, an LSTM with 128 units was employed. The dropout rate and recurrent dropout rate in this layer were both set to 0.2.
3- The third layer featured an LSTM with 64 units. Similar to the previous layers, a dropout rate of 0.2 and recurrent dropout rate of 0.2 were used here.
4- The fourth layer consisted of a dense layer with one unit, and the activation function employed was Sigmoid.
In the compilation step, the binary_crossentropy loss function was chosen, and the optimizer was set to adam. In addition, the accuracy metric was selected.
In the execution or fitting step of the model, the number of epochs was set to 2,000. Moreover, a batch size of 10 was chosen.
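A minimal Keras sketch consistent with the LSTM configuration described above; the `(1, 7)` input shaping (each sample presented as a single time step with seven features) is an assumption, as the paper does not state it.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # Stacked LSTMs: all but the last layer must return full sequences.
    LSTM(256, dropout=0.2, recurrent_dropout=0.2,
         return_sequences=True, input_shape=(1, 7)),
    LSTM(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True),
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation="sigmoid"),   # flood / non-flood probability
])

model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Fitting as described in the text (X_train has shape (n_samples, 1, 7)):
# model.fit(X_train, y_train, epochs=2000, batch_size=10)
```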
Optimization of LSTM model parameters
Optimizing the LSTM model parameters involved three key steps:
1- Batch Size Modification: The initial batch_size was set to 10. During this optimization phase, we experimented with two different values: 8 and 15. After thorough evaluation, we found that the batch_size of 10 resulted in the highest accuracy.
2- Epochs Adjustment: Initially, the number of epochs was set to 2,000. In this optimization stage, we tried two different values: 500 and 1,500. However, after analyzing the results, we determined that the best accuracy was achieved with 2,000 epochs.
3- Optimizer Type Change: We started with the adam optimizer, but in this optimization step, we switched to the Adamax optimizer. Surprisingly, the highest accuracy was still obtained using the adam optimizer.
By performing these optimization steps, we fine-tuned the LSTM model and identified the optimal hyperparameter combination that resulted in the highest accuracy.
Evaluation of flood risk models
In this study, four charts were utilized to evaluate the performance of the models, and below, we present an introduction to these four charts:
The Gain chart
The Gain chart is a graphical representation used to directly compare the performance of different models with a baseline random model, which serves as the best guess for labeling samples (Soukup & Davidson 2002). It is a valuable metric for evaluating model effectiveness. The random line on the chart represents results obtained randomly. Models that deviate further from the baseline diagonal line provide more value and perform better (Halawi et al. 2022). Conversely, models closer to the baseline perform more like random guessing and are less valuable. The wizard line on the chart represents the best achievable result by a model. Thus, the closer the model's performance line is to the wizard line, the better its performance. In summary, the distance between these lines represents a range spanning from the worst possible result to the best possible result.
The Lift chart
The Lift chart is a type of cumulative gain chart used to evaluate model performance. In the cumulative gain chart, the x-axis displays the predicted values by the model and the y-axis represents the cumulative gain relative to the entire dataset. However, in the Lift chart, the x-axis remains the same with the predicted values by the model, while the y-axis represents the Lift instead of cumulative gain. Lift indicates the relative improvement of each point compared to the baseline, which represents the ratio of positives to negatives in the input data. In summary, the Lift chart demonstrates how much the model has improved compared to a random baseline. A larger area between the Lift curve and the baseline indicates better model performance.
The Decile-wise Lift chart
The Decile-wise Lift chart is a visualization that involves sorting the test data based on the model's predicted probabilities and then dividing it into 10 equal deciles. For each decile, the lift value is calculated by comparing the ratio of actual positive instances to the predicted positive instances from the model with the ratio of actual positive instances to predicted positive instances in the entire test dataset. The lift values are plotted on the vertical axis, and the deciles are plotted on the horizontal axis. If the lift value for the first decile is 2, it indicates that the model performed twice as well as a random model in predicting the data in that decile. Lift values less than 1 suggest that the model performed worse than a random model, while lift values greater than 1 indicate that the model outperformed a random model. By studying the Decile-wise Lift chart, we can identify which decile model made the best predictions and where its performance declined. Overall, the Decile-wise Lift chart provides insights into how much better the model performed compared to a random model when predicting the test data.
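The following is a generic sketch (not the authors' code) of how decile-wise lift can be computed from a model's predicted probabilities; `y_true` holds the observed flood/non-flood labels and `y_prob` the predicted probabilities on the test set.

```python
import pandas as pd

def decile_lift(y_true, y_prob, n_bins=10):
    """Decile-wise lift: positive rate within each probability-sorted decile
    divided by the overall positive rate (lift > 1 beats a random model)."""
    df = pd.DataFrame({"y": y_true, "p": y_prob})
    # Rank by predicted probability (highest first) and cut into 10 equal groups.
    df["decile"] = pd.qcut(df["p"].rank(method="first", ascending=False),
                           n_bins, labels=list(range(1, n_bins + 1)))
    return df.groupby("decile", observed=True)["y"].mean() / df["y"].mean()
```

A value of 2 in decile 1 then carries exactly the interpretation given above: the model found twice as many actual positives in that decile as a random model would.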
The KS chart
The Kolmogorov–Smirnov (KS) chart is a graphical tool used to assess the classification performance of models. It specifically quantifies the distinction between the distributions of positive and negative instances. The KS value ranges from 0 to 100, and higher values indicate that the model is better at distinguishing between positive and negative cases (Abubakar & Muhammad Sabri 2021). A KS value of 0 suggests that the model cannot differentiate between positives and negatives and performs like random guessing. In summary, the KS chart provides a metric for evaluating how effectively a classification model separates positive and negative instances, and higher KS values imply better performance in this regard.
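Following the definition above, the KS value is the maximum gap between the cumulative score distributions of the positive and negative classes; a brief, generic sketch using `scipy`:

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_statistic(y_true, y_prob):
    """KS = maximum gap between the cumulative distributions of predicted
    scores for the positive and the negative class, scaled to 0-100."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    pos_scores = y_prob[y_true == 1]
    neg_scores = y_prob[y_true == 0]
    return 100 * ks_2samp(pos_scores, neg_scores).statistic
```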
Interpretability of deep learning models
The complexity and opacity of neural network models have created a strong need for a clear understanding of their decision-making process (Kokhlikyan et al. 2020). To address this, various methods were used in this study to enhance the interpretability of deep learning models. Shapley value plots were utilized to assess the significance of the criteria employed during model development. Shapley value, derived from cooperative game theory, has gained prominence in recent years with diverse applications in machine learning models (Rozemberczki et al. 2022). Multiple approaches were implemented in this research to generate Shapley value plots. Among the conventional methods, the beeswarm chart was employed to visualize the Shapley value results. Using colors, this approach illustrates the primary value of each feature in individual instances, enabling users to grasp the importance of features and their influence on the model's output more easily. Another approach used for explaining deep learning models is through Individual Conditional Expectation (ICE) plots (ICEPs). These plots visualize how each input parameter's change affects the model's prediction (Lorentzen & Mayer 2020). ICEPs are similar to Partial Dependence Plots (PDPs) but differ in that PDP calculates the average over the marginal distribution, while ICE retains all individual instances. Each line in the ICEP represents predictions for a specific output. By avoiding averaging across all samples, ICE reveals nonhomogeneous relationships, but it is limited to a specific target feature, as two features may create overlapping levels that are challenging to identify visually (Molnar 2020).
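The paper does not specify which SHAP explainer was used; as an illustrative sketch only, the model-agnostic `KernelExplainer` from the `shap` package can produce the beeswarm-style summary described above. Array names such as `X_train_flat` and `X_test_flat` are placeholders for 2-D matrices of the seven selected factors, and `model` refers to the trained LSTM sketched earlier.

```python
import numpy as np
import shap

# Wrap the Keras model so shap receives a flat (n, 7) input and a 1-D output.
predict_fn = lambda data: model.predict(data.reshape(-1, 1, 7)).ravel()

# A small background sample keeps KernelExplainer tractable.
background = X_train_flat[np.random.choice(len(X_train_flat), 50, replace=False)]

explainer = shap.KernelExplainer(predict_fn, background)
shap_values = explainer.shap_values(X_test_flat, nsamples=200)

# Beeswarm-style summary: each dot is one sample, coloured by feature value.
shap.summary_plot(shap_values, X_test_flat,
                  feature_names=["precipitation", "land use", "NDVI",
                                 "drainage density", "flow direction",
                                 "TWI", "TRI"])
```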
In this study, another approach for interpretability was employed, known as Deep Learning Model Sensitivity Analysis. This method examines how predictions change with variations in each input parameter (Li et al. 2021). The research involved altering the inputs by increments of 5, 10, 20, and 40% and then calculating the corresponding changes in predictions. Subsequently, all the results were visualized on a chart, allowing for the identification of the most sensitive parameters.
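A minimal sketch of the sensitivity analysis described above, assuming the trained Keras model from the earlier sketch and a 2-D matrix `X` of the seven input factors (placeholder names): each factor is increased by 5, 10, 20, and 40% in turn and the mean absolute change in the predicted flood probability is recorded.

```python
import numpy as np

def sensitivity_analysis(model, X, feature_names,
                         perturbations=(0.05, 0.10, 0.20, 0.40)):
    """Mean absolute change in prediction when one feature is increased by a
    given fraction while all other features are held fixed."""
    base = model.predict(X.reshape(-1, 1, X.shape[1])).ravel()
    results = {}
    for j, name in enumerate(feature_names):
        changes = []
        for frac in perturbations:
            X_pert = X.copy()
            X_pert[:, j] *= (1.0 + frac)               # perturb a single feature
            pred = model.predict(X_pert.reshape(-1, 1, X.shape[1])).ravel()
            changes.append(float(np.mean(np.abs(pred - base))))
        results[name] = changes
    return results   # larger values indicate more sensitive parameters
```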
RESULTS
Feature selection
Flood hazard maps
| Model | Very low (ha) | Very low (%) | Low (ha) | Low (%) | Moderate (ha) | Moderate (%) | High (ha) | High (%) | Very high (ha) | Very high (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| CNN | 2,098,091.14 | 26.93 | 1,350,974.96 | 17.34 | 1,157,493.70 | 14.85 | 1,244,107.98 | 15.97 | 1,941,442.69 | 24.92 |
| LSTM | 1,458,259.57 | 18.71 | 1,813,729.73 | 23.28 | 1,647,958.36 | 21.15 | 1,419,699.60 | 18.22 | 1,452,328.99 | 18.64 |
Evaluation of CNN and LSTM models
In this research, we utilized four evaluation methods to assess the performance of the CNN and LSTM models in flood modeling: Lift charts, Decile-wise Lift charts, cumulative gains charts, and the KS statistic.

The Lift chart depicts how much better the CNN and LSTM models perform compared to a random model. In the earliest deciles, the Lift values of the CNN model were approximately 1.9 (training) and 1.7 (testing) times those of the random model. Moving toward the later deciles, the Lift gradually decreases and eventually converges to the random model line. This is because the higher probability scores are concentrated in the upper deciles, which is also evident in the cumulative gains chart; consequently, the lower deciles contain lower probability values and resemble the performance of the random model.

The Decile-wise Lift chart displays the percentage of target class observations in each decile. Decile 1 exhibits the highest percentage, which decreases across the higher deciles, even falling below the random model line at a particular point. This is because the model places fewer target observations in the higher deciles, while the random model distributes observations uniformly at random.

The cumulative gains chart, an accumulative curve, shows that the cumulative gain curves of the CNN and LSTM models flatten after decile 6, indicating that deciles 6–10 contain few or no target records. The wizard model reaches 100% at decile 5, representing ideal performance with essentially perfect predictions.

The KS statistic chart compares two distributions, with the KS value corresponding to the point of maximum difference between them; it thus reflects the deep learning models' capability to distinguish between the two events. The KS values for the CNN model (training and testing) and the LSTM model (training and testing) are 75.02, 56.09, 78.65, and 88.14, respectively, occurring at decile 5 on the chart.
Interpretability results of LSTM model for flood risk prediction
Ranked from maximal to minimal impact, the top four features identified in these plots had the greatest influence on the model output when predicting flood risk in the study area. Several studies (Ayazi et al. 2010; Huang & Weile 2011) have likewise reported groundwater depletion as the most important controlling factor for landslide hazard in Iran and China.
The results show that flood risk is higher in areas with high drainage density owing to rapid runoff accumulation. Rainfall and flood occurrence are also directly related in the study area: as rainfall increases, the number of flooded pixels and the weights of the flood risk classes increase. Elevation is another influential factor; flood risk increases in the higher elevation classes, mainly because the steeper slopes of hilly and mountainous areas cause rainwater to flow faster. Overall, the results indicate higher flood risk in areas that combine high elevation, rainfall, and drainage density with low vegetation cover and agricultural land use.
DISCUSSION
Feature selection
In Figure 8, the LOFO results are presented for feature selection in flood hazard assessment in the Aji Chay region. Among the 13 features studied, the LOFO algorithm identified 7 important features for flood hazard assessment, namely, rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI (Al-Areeq et al. 2022; Sachdeva & Kumar 2022; Yang et al. 2022).
Flood hazard maps
Two flood hazard maps were generated using the CNN and LSTM models, shown in Figures 10 and 11, respectively. The flood hazard classes were categorized into five levels, namely, very high, high, moderate, low, and very low, using the natural breaks method (Cao et al. 2016; Ghosh et al. 2022). According to the resulting maps and the associated table, the CNN and LSTM models identified 40.89% (3,185,550.67 hectares) and 36.86% (2,872,028.58 hectares) of the total area, respectively, as falling into the very high and high flood hazard classes. Conversely, only 26.93% (2,098,091.14 hectares) and 18.71% (1,458,259.57 hectares) of the area are categorized in the very low-risk class by the CNN and LSTM models, respectively. The regions classified as prone to flood hazards by both models include parts of the western, northeastern, and central areas of the Aji Chay region. The main contributing factors to this classification are land use changes and reduced vegetation cover in the river basins due to recurrent droughts in recent years, leading to decreased water infiltration into the soil and an increase in surface runoff.
Performance evaluation of deep learning models
General conclusion
The main aim of this study was to utilize machine learning models for creating a flood hazard map and analyzing the variables influencing flood occurrence within the studied area.
To identify flood-prone areas, both CNN and LSTM models were employed to generate the flood hazard map. This involved using 13 criteria and 170 flood and non-flood data points. The non-flood points were extracted using the APG toolbox in ArcGIS software.
The results highlighted seven criteria, namely, rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI, as the most significant factors contributing to flood risk, identified through the LOFO feature selection algorithm.
In addition, SHAP (SHapley Additive exPlanations) plots demonstrated that these criteria played a major role in the modeling process. Based on the evaluation metrics computed on the training and testing datasets, the LSTM model outperformed the CNN model, with KS values (at decile 5) of 78.35 and 88.14 for training and testing, respectively.
The flood hazard map effectively pinpointed locations prone to flooding, offering valuable insights for planners to implement preventive measures and enabling crisis managers and rescue teams to identify areas requiring evacuation and assistance during flood events. Moreover, it can enhance public awareness and minimize flood-related damages.
In conclusion, the flood hazard map proves to be an essential tool for flood risk management and mitigation in the studied region, providing valuable and precise information about potential flood hazards to decision-makers and planners.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.