Floods are among the most destructive natural hazards, causing significant damage in regions worldwide each year. Precise flood prediction is therefore crucial for mitigating human and financial losses and for effectively managing water resources. To this end, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models were used in this study to map flood hazards in the Aji Chay watershed. Flood data points were collected from the study area, and absence points were generated with the Absence Point Generation technique; the combined dataset was divided into two groups, with 70% of the data serving as the training dataset for model construction and the remaining 30% forming the testing dataset for validation. Seven key flood-influencing factors, namely precipitation, land use, Normalized Difference Vegetation Index, drainage density, flow direction, topographic wetness index, and terrain ruggedness index, were identified through the Leave-One-Feature-Out (LOFO) approach and employed in the modeling process. Based on the Kolmogorov–Smirnov (KS) plot, the LSTM model, with a KS statistic of 88.14, was chosen as the best model. The results revealed that approximately 37% of the study area falls into the high and very high flood risk classes. These findings can support the effective management of flood-prone areas and the reduction of flood damages.

  • Convolutional Neural Network/Long Short-Term Memory (LSTM) models predict Aji Chay watershed floods.

  • The key factors are precipitation, land use, and Normalized Difference Vegetation Index.

  • The LSTM model (Kolmogorov–Smirnov statistic: 88.14) classifies about 37% of the area as high or very high flood risk.

  • Valuable insights for flood-prone area management are discussed in this study.

  • Flood risk zoning in the catchment area is evaluated.

In contemporary times, the escalating risk of floods is attributed to factors such as intense rainfall, hurricanes, urbanization near rivers and coastal zones, and population growth. A recent analysis by the Global Resource Institute indicates that the global population affected by floods is projected to double by 2030 (Global Resource Institute [GRI] 2021). This underscores the potential of floods to cause substantial damage to both property and urban infrastructure (Costache 2019). Accurate information on flood occurrences plays a crucial role in guiding decision-makers toward effective strategies for tackling flood-related challenges (Cao et al. 2019), and such knowledge can be acquired through precise predictions of the timing and location of floods. Geographic Information Systems (GIS) play a crucial role in disaster management by providing a framework for storing, managing, analyzing, and visualizing large amounts of data related to geological disasters, pathogen contamination disasters, and emergency situations (Bassi 2023). Researchers employ various methodologies to study floods (Sahana et al. 2020), including multicriteria decision-making approaches (Chen et al. 2011; Das 2019; Thimmaiah et al. 2020), the integration of remote sensing data with GIS technology, frequency ratio analysis (Khosravi et al. 2019), logistic regression (Youssef et al. 2015), fuzzy logic, random forest (RF) (Chapi et al. 2017; Paul et al. 2019), artificial neural networks (Falah et al. 2019; Antzoulatos et al. 2022), and support vector machines (Razavi Termeh et al. 2018). Sophisticated techniques can achieve high-resolution, precise flood prediction. Predicting flood hazard variables with physical methods may entail a sequence of hydraulic and hydrological models that describe the physical aspects of floods.
While such models provide a deeper comprehension of floods, they often necessitate intricate computations and extensive data. Hence, machine learning methods are employed as an alternative approach to the challenges of flood prediction (Hassanien & Darwish 2023), and incorporating them alongside conventional flood prediction methods can enhance flood preparedness. Machine learning algorithms play a crucial role in identifying patterns within data, such as hyperspectral data for pattern recognition (Hassan & Sabha 2023), and in using these patterns to make precise predictions for new data instances. Deep learning techniques have been increasingly employed in flood management because they can overcome such limitations and achieve higher accuracy by leveraging inductive biases to better process the spatial characteristics of flooding events (Bentivoglio et al. 2022). The rationale for employing deep learning models in this study is their capability to capture complex relationships and nuances in flood data (Enhancing Flood Risk Assessment Through Machine Learning & Open Data 2023). Deep learning models can learn the underlying physical phenomena, eliminating the need for manual parameter setting and reducing the computational burden, which is crucial for real-time predictions in large urban areas (Kevin et al. 2023). While simpler models may suffice for basic predictions, deep learning models outperform them in handling more complex spatial data (Kevin et al. 2023). These advanced capabilities make deep learning models essential for precise flood risk assessment in the face of increasing urban floods and changing environmental conditions (Jones & Dawson 2023). In recent times, Convolutional Neural Network (CNN) models (Kabir et al. 2020; Khosravi et al. 2020; Hosseiny 2021; Seydi et al. 2022) and Recurrent Neural Network (RNN) models (Ngo et al. 2021) have gained substantial attention in the field of flood analysis.
This study aims to develop a flood risk map for the Aji Chay region using CNN and Long Short-Term Memory (LSTM) models. The study area, the Aji Chay watershed, is located in northwestern Iran and is one of the significant sub-basins of Lake Urmia. Positioned between latitudes 37°42′ and 38°30′ north and longitudes 45°40′ and 47°53′ east, it shares boundaries with the Aras River basin to the north, the Qezel Uzan watershed to the east, the Qezel Uzan watershed and the West Sahand sub-basin to the south, and the North Lake Urmia sub-basin to the west. Covering an area of approximately 12,790 km², the Aji Chay watershed predominantly extends across the eastern part of the Lake Urmia basin. Its main branches are the Sini Khay, Tajyar, Nahand, and Ojan Chay. The Aji Chay River serves as the central drainage channel of this sub-basin. The study area is shown in Figure 1.
Figure 1

Location map of the Aji Chay basin.

The study focuses on floods, natural occurrences influenced by multiple factors such as climate, hydrology, geomorphology, topography, and land use. Through a thorough investigation and a review of previous research, 13 significant flood-influencing factors were identified for this study: precipitation, land use, elevation, slope, aspect, Normalized Difference Vegetation Index (NDVI), distance from the river, drainage density, flow direction, geology, Topographic Wetness Index (TWI), Topographic Position Index (TPI), and Terrain Ruggedness Index (TRI) (as shown in Figures 2–4). Elevation and slope are crucial parameters in flood risk and vulnerability mapping: variations in elevation have a definitive impact on climate characteristics (Samanta et al. 2011), while slope controls surface runoff, the velocity of water flow, and vertical percolation (Youssef et al. 2011; Adiat et al. 2012). The land use/land cover factor directly or indirectly influences infiltration, evapotranspiration, and surface runoff generation (Samanta et al. 2018). The TWI describes the spatial distribution of wetness and controls overland water flow, having a significant impact on flood mapping (Samanta et al. 2018). Aspect, defined as the direction of the maximum slope, is considered one of the most important flood hazard indicators (Taromideh et al. 2022). The NDVI is included because flood risk in bare areas is higher than in areas with denser vegetation. Distance from rivers and drainage density are crucial factors, as flood risk is higher in areas closer to rivers when they exceed their capacity (Samanta et al. 2018). Flow direction is important for determining water accumulation in flood hazard mapping. Precipitation is considered the most important and fundamental factor for flood risk assessment. Geology and land use are also important considerations for various flood types. The TPI and TRI likewise play crucial roles in flood hazard mapping.
TPI helps identify the position of a point in relation to its surroundings, aiding in understanding flow accumulation and potential flood pathways (Omayma et al. 2022). TRI quantifies terrain ruggedness, indicating areas where water flow might be impeded or accelerated, influencing flood dynamics (Pankaj et al. 2023). By incorporating these 13 factors, which have been extensively studied and validated in the literature, the authors aim to capture the complex interplay of various elements that shape flood events, ultimately enhancing the predictive capabilities of the flood risk map for the study area. The methodology for generating maps for each of these factors was subsequently explained. Raster maps with a pixel size of 10 m were utilized to prepare various parameters required for the study. Slope, an essential topographic parameter, is determined by calculating the height difference between two points on the ground surface divided by the horizontal distance within a unit. It causes water to flow toward lower slopes, increasing the velocity of water along its path (Liu & Cho 2001). The slope map was generated from a Digital Elevation Model (DEM) with a spatial resolution of 10 m using the Slope function in ArcGIS software. The slope values range from 0 to 210°. The direction of the slope directly influences soil moisture and flood occurrences in the area (Costache et al. 2023). In northern slopes, where less sunlight reaches the ground, water and moisture tend to be retained. In contrast, southern slopes, receiving more sunlight, result in drier and less moist soil. Consequently, the slope direction is a critical factor in flood and soil moisture prediction. The slope direction map is derived from the DEM map. Elevation also significantly impacts floods. At a larger scale, flood dynamics differ between high-altitude regions (such as mountains) and low-altitude regions (such as plains). 
At a smaller scale, the elevation above the ground surface determines the flow paths of water on the ground or areas where water accumulates (Antzoulatos et al. 2001). The study area's DEM with a resolution of 10 m was obtained from the website https://earthexplorer.usgs.gov/ and depicts elevation varying from 128 to 4,782 m. The lowest elevations are found in the northeastern and southeastern parts of the area. The precipitation map was prepared using daily rainfall data from 28 rain gauge stations spanning 20 years. The kriging interpolation method in ArcGIS software was employed to classify precipitation. This method uses a statistical model-based algorithm to estimate precipitation values in rain gauge stations with missing data through interpolation. The decision to use kriging was based on its well-established advantages and suitability for spatial interpolation of environmental variables, particularly precipitation data, and its ability to provide accurate predictions in regions with limited data (Yan et al. 2019). Kriging is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values at unsampled locations. This method is widely used in various fields, including hydrology, meteorology, and environmental sciences, for interpolating spatially distributed data. The rationale for choosing the kriging method over other interpolation techniques, such as inverse distance weighting or spline interpolation, is its ability to provide accurate estimates and minimize errors by accounting for spatial autocorrelation and local variations. Kriging is particularly suitable for interpolating precipitation data because it can effectively capture the spatial patterns and heterogeneity of rainfall distribution, which is often influenced by factors such as topography, elevation, and atmospheric conditions (Yao & Wang 2014; Lucas et al. 2022). 
Furthermore, kriging offers several advantages over other interpolation methods.
  • 1. It provides an estimate of the interpolation error or uncertainty, allowing for a quantitative assessment of the reliability of the interpolated values.

  • 2. It can handle clustered or irregularly spaced data points, which is common in precipitation monitoring networks.

  • 3. It allows for the incorporation of additional covariates or auxiliary variables, such as elevation or terrain features, to improve the accuracy of the interpolation.
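To illustrate the mechanics behind these points (this is a didactic sketch, not the authors' ArcGIS workflow), ordinary kriging can be written in a few lines of NumPy. The Gaussian semivariogram and the sill/range values below are arbitrary assumptions for demonstration:

```python
import numpy as np

def gaussian_semivariogram(h, sill=1.0, rng=50.0, nugget=0.0):
    """Gaussian semivariogram model: gamma(0) = nugget, gamma(inf) -> sill."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-(h / rng) ** 2))

def ordinary_kriging(coords, values, target, sill=1.0, rng=50.0):
    """Estimate the value at `target` from observed (coords, values).

    Solves the ordinary-kriging system with a Lagrange multiplier so the
    weights sum to one (an unbiased estimator).
    """
    n = len(values)
    # Pairwise semivariances between observation points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gaussian_semivariogram(d, sill, rng)
    A[n, n] = 0.0
    # Semivariances between observations and the target point
    b = np.ones(n + 1)
    b[:n] = gaussian_semivariogram(np.linalg.norm(coords - target, axis=1), sill, rng)
    w = np.linalg.solve(A, b)[:n]
    return float(w @ values)
```

Because the semivariogram is zero at distance zero (no nugget), the estimator reproduces observed values exactly at station locations, which is why kriging is called an exact interpolator.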

Figure 2

Flood influencing factors: aspect, DEM, flow direction, NDVI, precipitation, and river density.

Figure 3

Flood influencing factors: river distance, slope, TPI, TRI, TWI, and land use.

Figure 4

Flood influencing factors: geology.

By employing the kriging method, the authors aimed to leverage its ability to account for spatial autocorrelation and local variations, as well as its suitability for regions with limited data, resulting in a more accurate and reliable representation of the precipitation distribution across the study area. This is crucial for accurately assessing the influence of precipitation on flood risk and ensuring the robustness of the subsequent analyses and mapping efforts. Subsequently, ArcGIS software was used to perform precipitation classification in the designated area, resulting in a map displaying the distribution of precipitation. The study area's precipitation values range from 195 to 902 mm. Land use is a significant and influential factor in relation to floods (Alshammari et al. 2023). Different land use patterns lead to diverse runoff conditions in various regions. The type of vegetation cover directly affects the soil's ability to absorb water, and urban areas, including residential zones, gardens, farmlands, and tree plantations, also exhibit variability in land use. Regions with vegetation cover experience reduced surface runoff due to the higher water infiltration capacity of the soil, resulting in a mitigated impact of floods. Conversely, urban areas, often characterized by impermeable surfaces, exhibit higher surface runoff and significantly lower water infiltration rates. The predominant land use in the study area consists of grassland and agricultural land. The geomorphology of the region plays a crucial role in the occurrence of floods as it impacts the flow and runoff (Rogger et al. 2017). Quaternary deposits are prevalent and highly influential geological formations in the Aji Chay region. The proximity to rivers and drainage channels is significant in determining the impact of floods (Glenn et al. 2012). To create the river distance map, the Euclidean distance tool in ArcGIS software was utilized. 
The Euclidean distances to the river system were calculated as continuous values ranging from 0 to 24,516 m across the study area. The TWI is a topohydrological parameter that reflects the potential moisture accumulation at each pixel. Higher TWI values signify greater moisture and the existence of areas with higher moisture density, usually observed in regions with gentle slopes, which are more prone to floods (Oh et al. 2011). The TWI is calculated using Equation (1):
$$\mathrm{TWI} = \ln\!\left(\frac{A_s}{\tan\alpha}\right) \tag{1}$$
In the given formula, the variable As represents the flow accumulation at each pixel, α indicates the slope in degrees at each pixel, and ln denotes the natural logarithm. The TWI values observed in the study area range from 8.2 to 24.1. The TRI was introduced by Riley et al. (1999). It is computed by assessing the differences in elevation between a central cell and its eight adjacent cells. TRI is an indicator of surface roughness, which causes hydrodynamic friction with objects present on the surface, such as plants and buildings (Dorn et al. 2014). The TRI values in the study area vary from 0 to 183. In addition, the TPI for a cell is obtained by subtracting the average elevation of its neighboring cells from the cell's own elevation value (Gervasi et al. 2020). TPI is expressed using Equation (2):
$$\mathrm{TPI} = M_0 - \frac{\sum_{n} M_n}{n} \tag{2}$$

The formula uses M0 to represent the elevation of the point being assessed, Mn to represent the elevation of the surrounding grid, and n as the total number of neighboring points considered in the evaluation. The index yields values close to zero for flat areas, positive values for higher elevations than the surroundings, and negative values for valleys or depressions (Olanrewaju & Umeuduji 2017). In the study area, the TPI values ranged from 0.95 to 120.6. The TPI, TWI, and TRI maps were generated using SAGA-GIS software.
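The three terrain indices can be computed from gridded data roughly as follows (an illustrative NumPy sketch, not the SAGA-GIS implementation; here the specific catchment area A_s is approximated as flow accumulation times cell size, and TRI follows Riley's summed-squared-difference definition):

```python
import numpy as np

def terrain_indices(dem, flow_acc, slope_deg, cell=10.0):
    """Compute TWI, TRI, and TPI grids from a DEM (illustrative sketch).

    dem, flow_acc, and slope_deg are 2-D arrays; `cell` is the pixel size
    in metres. Border cells are left as NaN for TRI/TPI since they lack
    eight neighbours.
    """
    # TWI = ln(A_s / tan(alpha)); clip slope away from 0 to avoid division by zero
    twi = np.log(np.maximum(flow_acc, 1.0) * cell /
                 np.tan(np.radians(np.clip(slope_deg, 0.1, 89.9))))

    tri = np.full(dem.shape, np.nan)
    tpi = np.full(dem.shape, np.nan)
    for i in range(1, dem.shape[0] - 1):
        for j in range(1, dem.shape[1] - 1):
            block = dem[i - 1:i + 2, j - 1:j + 2]
            neighbours = np.delete(block.ravel(), 4)   # the 8 adjacent cells
            # TRI: square root of the summed squared elevation differences
            tri[i, j] = np.sqrt(np.sum((neighbours - dem[i, j]) ** 2))
            # TPI: centre elevation minus mean neighbour elevation
            tpi[i, j] = dem[i, j] - neighbours.mean()
    return twi, tri, tpi
```

On a flat surface both TRI and TPI are zero; a cell higher than its surroundings yields a positive TPI, and a valley cell a negative one, matching the interpretation above.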

NDVI is a crucial vegetation index extensively utilized for precise vegetation cover assessment. It is a spectral index calculated as the difference between the reflectance of the near-infrared and red light bands divided by their sum (Gao et al. 2023). Equation (3) defines the NDVI as follows:
$$\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}} \tag{3}$$
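Equation (3) can be computed per pixel with a few lines of NumPy (a sketch; for Landsat 8, NIR and Red are conventionally bands 5 and 4, but the authors' exact preprocessing is not specified):

```python
import numpy as np

def ndvi(nir, red, eps=1e-10):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise.

    nir and red are reflectance values or arrays; eps guards against
    division by zero over completely dark pixels.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```

For example, a densely vegetated pixel with NIR reflectance 0.5 and red reflectance 0.05 yields an NDVI of about 0.82, while equal reflectances give 0.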

The NDVI formula uses the reflectance of the near-infrared and red light bands to calculate values ranging from −1 to 1, which serve as an index of vegetation cover density. Higher NDVI values indicate denser vegetation cover, whereas lower NDVI values indicate a lack of vegetation. The NDVI data for this layer were acquired from C1 LANDSAT 8 OLI/TIRS on 13 March 2018, with path number 117 and row number 43. The NDVI values in the study area vary from 0.19 to 0.88. The drainage density map is calculated by dividing the total length of all rivers and watercourses in the watershed by the total area of that region (Desalegn & Mulu 2021). Using the Line Density function in ArcGIS software, the drainage density map for the Aji Chay watershed was generated, with values ranging from 0 to 0.68 km/km². The flow direction raster encodes, for each cell, the direction toward the lower neighboring cell, determined from the slope of each cell relative to its adjacent cells (Hadibasyir & Fikriyah 2023). The flow direction values in the study area range from 1 to 128. The flow direction map used in this study was derived from the DEM data as the input raster. Table 1 provides the breakdown of the land use classification based on zoning regulations.

Table 1

Descriptions and corresponding areas of the land use in the study area

Land use           Area (ha)     Area (%)
Agricultural land  2,962,995.9   37.87
Bareland           89,983.8      1.15
City               49,686.9      0.64
Forest             233,695.7     2.99
Island             8,577.6       0.11
Range              3,824,794.9   48.88
Rock               10,627.1      0.14
Saltland           64,187.4      0.82
Water              441,245.2     5.64
Wetland            26,176.7      0.33
Woodland           112,261.7     1.43

The flowchart in Figure 5 provides a visual depiction of the research methodology.
Figure 5

Flowchart of the methodology in this study.


The first and most crucial step in creating a flood hazard map is to prepare a flood inventory map. There are various methods used for this purpose, and in the study area, the locations affected by floods were identified through on-site surveys. Using a GPS device, flood-affected locations were recorded during the period from 2018 to 2020. Generating absence data is an important step in binary modeling of environmental issues, which typically involves random sampling. In this study, the Absence Point Generation (APG) tool, a Python-based ArcGIS toolbox, was employed to automatically generate absence data for the flood study area. The APG toolbox was developed to automate the creation of absence datasets for geospatial studies. It utilizes a frequency ratio analysis of key factors such as altitude, slope degree, TWI, and distance from rivers to define low potential zones for generating absence datasets. The importance of incorporating absence points in environmental binary modeling is highlighted by Naghibi et al. (2021), who found that the APG toolbox significantly improved the performance of benchmark algorithms like RF and boosted regression trees compared to traditional absence sampling methods, resulting in higher area under the curve (AUC) values. This underscores the necessity of incorporating absence points to enhance the accuracy and reliability of predictive models for various hazards like landslides, floods, and erosion (Bui et al. 2019; Naghibi et al. 2021).
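The frequency ratio analysis underlying the APG toolbox can be illustrated as follows. This is a simplified sketch of the general FR formula, not the toolbox's actual code: for each class of a conditioning factor, FR is the share of flood cells in that class divided by the share of all cells in it, and classes with low FR delineate low-potential zones from which absence points can be sampled:

```python
import numpy as np

def frequency_ratio(factor_classes, flood_mask):
    """Frequency ratio per factor class.

    FR = (% of flood cells falling in a class) / (% of all cells in that
    class). FR < 1 marks classes with below-average flood occurrence.
    """
    fr = {}
    total_cells = factor_classes.size
    total_floods = flood_mask.sum()
    for c in np.unique(factor_classes):
        in_class = factor_classes == c
        pct_floods = flood_mask[in_class].sum() / total_floods
        pct_cells = in_class.sum() / total_cells
        fr[int(c)] = float(pct_floods / pct_cells)
    return fr
```

Applying this to classified rasters of altitude, slope, TWI, and distance from rivers, and intersecting the low-FR classes, yields the low-potential zones described above.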

The dataset consisted of a total of 170 flood and non-flood points generated by the toolbox, which were then divided into two subsets for training (70% of the data, n = 120) and evaluation (30% of the data, n = 50) purposes (Figure 6). The training data were used to construct and calibrate the model, while the evaluation data were used to assess the model's accuracy (Tokar & Johnson 1999).
Figure 6

Flood inventory map.


LOFO feature selection

Feature selection is a crucial step in model development that aims to reduce the number of input variables. Its main objectives are to decrease computational costs and potentially enhance the model's performance (Zebari et al. 2020). In certain predictive modeling scenarios, a large number of variables can hinder model development and demand substantial system memory. Furthermore, including irrelevant or redundant input variables may lead to a decline in the model's performance (Brownlee 2019). To address these challenges, feature selection methods are employed to eliminate unnecessary or redundant variables, thereby reducing the number of input features. Various approaches exist to assess the relative importance of variables in a dataset. In this study, the Leave-One-Feature-Out (LOFO) approach was utilized to identify the most influential features in the modeling process (Roseline & Geetha 2021). The LOFO method involves evaluating the model's performance initially using all input features based on the ROC-AUC metric. Subsequently, one feature is iteratively removed at a time, and the model is retrained and evaluated on a validation set (Gholami et al. 2021). This process yields the importance of each feature, allowing for the elimination of features with low importance from the modeling process.
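The LOFO procedure can be sketched as follows. The simple holdout split, the rank-based AUC computation, the `fit_predict` learner interface, and the nearest-centroid example model are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def lofo_importance(X, y, feature_names, fit_predict):
    """Leave-One-Feature-Out: retrain without each feature in turn and
    record the drop in validation ROC-AUC relative to the full model.
    `fit_predict(X_train, y_train, X_valid) -> scores` is any learner."""
    n = len(y)
    tr, va = np.arange(n) % 2 == 0, np.arange(n) % 2 == 1  # simple holdout split
    base = roc_auc(y[va], fit_predict(X[tr], y[tr], X[va]))
    importance = {}
    for i, name in enumerate(feature_names):
        keep = [j for j in range(X.shape[1]) if j != i]
        auc_i = roc_auc(y[va], fit_predict(X[tr][:, keep], y[tr], X[va][:, keep]))
        importance[name] = base - auc_i   # drop in AUC without feature i
    return base, importance

def centroid_model(X_tr, y_tr, X_va):
    """Toy stand-in learner: score = distance to the negative-class mean
    minus distance to the positive-class mean."""
    mu1 = X_tr[y_tr == 1].mean(axis=0)
    mu0 = X_tr[y_tr == 0].mean(axis=0)
    return np.linalg.norm(X_va - mu0, axis=1) - np.linalg.norm(X_va - mu1, axis=1)
```

Features whose removal barely reduces (or even improves) the AUC are the low-importance candidates for elimination, which is how the seven final factors were retained out of the original thirteen.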

Convolutional neural network

The CNN is a specialized type of deep neural network widely utilized for image classification tasks. A deep CNN model is constructed from different types of layers, including convolutional, pooling, and fully connected layers (Albawi et al. 2017). In the convolutional layer, a filter matrix, whose size is determined by the model's architecture, is applied to the input. The pooling layer performs downsampling, usually after the convolution, and does not involve any learning process. Various pooling operations exist, with max pooling being a commonly used method: a set of values is taken as input, and the maximum value from that set is selected as the output. The fully connected layer is the final part of the CNN; here, all nodes are interconnected, and a learning process is employed to predict the probabilities of each output using softmax or sigmoid functions (Fujita et al. 2021). Figure 7 illustrates the structure of the convolutional model used in this study, comprising the following layers:
  • 1- The first layer is a Convolution1D layer with input_dim = 7, where the number 7 represents the seven selected features in the feature selection stage. The layer has 256 neurons with the activation function set to relu.

  • 2- A max pooling layer follows the first Convolution1D layer.

  • 3- The next layer is another Convolution1D layer with 128 neurons and relu activation function.

  • 4- Another max pooling layer is applied after the second Convolution1D layer.

  • 5- The process continues with another Convolution1D layer with 64 neurons and relu activation function.

  • 6- The last max pooling layer is used after the third Convolution1D layer.

  • 7- The flatten layer is applied to transform the output into a one-dimensional vector.

  • 8- The final layer is a fully connected layer with one neuron.
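The convolution, ReLU activation, and max-pooling operations that make up the layers above can be illustrated in plain NumPy (a didactic sketch of the operations only, not the Keras model used in the study):

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution: x has length L, kernels has shape
    (n_filters, k); returns an (L - k + 1, n_filters) feature map."""
    k = kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return windows @ kernels.T + bias

def relu(x):
    """Rectified linear unit: negative activations are set to zero."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the first (time) axis."""
    L = (x.shape[0] // size) * size
    return x[:L].reshape(-1, size, x.shape[1]).max(axis=1)
```

For instance, a single finite-difference filter applied to an increasing sequence produces a constant feature map, which max pooling then downsamples by half.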

Figure 7

CNN architecture showing the three main layers.


In the compilation step, the binary_crossentropy loss function is chosen, the optimizer is set to adam, and accuracy is selected as the evaluation metric.

Optimization of CNN model parameters

  • 1. Changing the batch_size: As mentioned earlier, the initial value for this parameter was set to 10. In this stage, we experimented with two different values: 8 and 15. The highest accuracy was achieved with a batch_size of 10.

  • 2. Changing the number of epochs: The initial value for this parameter was set to 2,000. In this stage, we tried two different values: 500 and 1,500. The highest accuracy was obtained with 1,500 epochs.

  • 3. Changing the optimizer type: Initially, we used the adam optimizer. In this stage, we switched to the Adamax optimizer. However, the highest accuracy was still achieved with the adam optimizer.

These optimization steps were performed to fine-tune the CNN model and find the best combination of hyperparameters that yield the highest accuracy.
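The manual tuning described above amounts to a small grid search over the three hyperparameters. A generic sketch follows (with a stand-in scoring function, since retraining the CNN is outside the scope of this snippet; `train_and_score` is a hypothetical interface, not the authors' code):

```python
import itertools

def grid_search(train_and_score, grid):
    """Exhaustive search over a hyperparameter grid.

    `train_and_score(params) -> accuracy` is assumed to train the model
    with the given settings and return a validation score; the best
    (params, score) pair is returned.
    """
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice the grid would be, e.g., `{"batch_size": [8, 10, 15], "epochs": [500, 1500, 2000], "optimizer": ["adam", "adamax"]}`, mirroring the three steps above.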

LSTM model

LSTM is an artificial neural network extensively applied in the domains of artificial intelligence and deep learning. It belongs to the family of RNNs and incorporates feedback connections (Jiang et al. 2019). The unique capability of an LSTM model lies in its capacity to retain information from long-term past time series data while automatically deciding which features are important to keep and which ones are irrelevant and should be disregarded in its memory cell state. Figure 8 provides an overview of the LSTM architecture. Within the LSTM model, three gates play a significant role in capturing long-term dependencies. These gates are the input gate, output gate, and forget gate, which enable the LSTM to decide whether to retain newly acquired information in its memory cell or forget it, based on its relevance and significance (Lee & Noh 2023). In this study, the following LSTM structure was employed for the modeling process. Figure 8 illustrates the structure of the LSTM model used in this study, comprising the following layers:
  • 1- In the first layer, an LSTM with 256 units was utilized. In addition, a dropout rate of 0.2 and recurrent dropout rate of 0.2 were applied in this layer.

  • 2- In the second layer, an LSTM with 128 units was employed. The dropout rate and recurrent dropout rate in this layer were both set to 0.2.

  • 3- The third layer featured an LSTM with 64 units. Similar to the previous layers, a dropout rate of 0.2 and recurrent dropout rate of 0.2 were used here.

  • 4- The fourth layer consisted of a dense layer with one unit, and the activation function employed was Sigmoid.
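For reference, the three gates described above implement the standard LSTM update equations (σ is the logistic sigmoid, ⊙ the element-wise product, and W, U, b the learned weights and biases):

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state c_t is what lets the model retain or discard information from long-term past inputs, as noted above.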

Figure 8

Structure of the LSTM.


In the compilation step, the binary_crossentropy loss function was chosen, and the optimizer was set to adam. In addition, the accuracy metric was selected.

In the execution or fitting step of the model, the number of epochs was set to 2,000. Moreover, a batch size of 10 was chosen.

Optimization of LSTM model parameters

Optimizing the LSTM model parameters involved three key steps:

  • 1- Batch Size Modification: The initial batch_size was set to 10. During this optimization phase, we experimented with two different values: 8 and 15. After thorough evaluation, we found that the batch_size of 10 resulted in the highest accuracy.

  • 2- Epochs Adjustment: Initially, the number of epochs was set to 2,000. In this optimization stage, we tried two different values: 500 and 1,500. However, after analyzing the results, we determined that the best accuracy was achieved with 2,000 epochs.

  • 3- Optimizer Type Change: We started with the adam optimizer, but in this optimization step, we switched to the Adamax optimizer. Surprisingly, the highest accuracy was still obtained using the adam optimizer.

By performing these optimization steps, we fine-tuned the LSTM model and identified the optimal hyperparameter combination that resulted in the highest accuracy.

Evaluation of flood risk models

In this study, four charts were utilized to evaluate the performance of the models, and below, we present an introduction to these four charts:

The Gain chart

The Gain chart is a graphical representation used to directly compare the performance of different models against a baseline random model, which serves as the best guess for labeling samples (Soukup & Davidson 2002). It is a valuable tool for evaluating model effectiveness. The random line on the chart represents results obtained by chance. Models that deviate further from the baseline diagonal line provide more value and perform better (Halawi et al. 2022); conversely, models closer to the baseline behave more like random guessing and are less valuable. The wizard line on the chart represents the best achievable result, so the closer a model's curve is to the wizard line, the better its performance. In summary, the distance between these lines spans the range from the worst possible result to the best possible result.
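The quantity plotted on a gain chart can be computed directly from labels and predicted scores (a sketch of the standard definition, not tied to any particular plotting package):

```python
import numpy as np

def cumulative_gain(y_true, y_score, n_points=10):
    """Cumulative gain: the fraction of all positives captured within the
    top x% of samples when ranked by predicted score (descending)."""
    order = np.argsort(-np.asarray(y_score))
    y_sorted = np.asarray(y_true)[order]
    total_pos = y_sorted.sum()
    fractions = np.linspace(0.1, 1.0, n_points)
    gains = np.array([y_sorted[:int(round(f * len(y_sorted)))].sum() / total_pos
                      for f in fractions])
    return fractions, gains
```

A random model's gain tracks the diagonal (gain ≈ fraction), while the "wizard" model reaches 1.0 as soon as the fraction equals the overall positive rate.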

The Lift chart

The Lift chart is a type of cumulative gain chart used to evaluate model performance. In the cumulative gain chart, the x-axis displays the predicted values by the model and the y-axis represents the cumulative gain relative to the entire dataset. However, in the Lift chart, the x-axis remains the same with the predicted values by the model, while the y-axis represents the Lift instead of cumulative gain. Lift indicates the relative improvement of each point compared to the baseline, which represents the ratio of positives to negatives in the input data. In summary, the Lift chart demonstrates how much the model has improved compared to a random baseline. A larger area between the Lift curve and the baseline indicates better model performance.
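Lift at a given depth is the positive rate among the top-ranked samples divided by the overall positive rate (the baseline); a self-contained sketch:

```python
import numpy as np

def lift_curve(y_true, y_score, n_points=10):
    """Lift at depth x%: (positive rate among the top x% of samples,
    ranked by predicted score) divided by the overall positive rate."""
    order = np.argsort(-np.asarray(y_score))
    y_sorted = np.asarray(y_true)[order]
    base_rate = y_sorted.mean()
    fractions = np.linspace(0.1, 1.0, n_points)
    lifts = []
    for f in fractions:
        k = max(1, int(round(f * len(y_sorted))))
        lifts.append(y_sorted[:k].mean() / base_rate)
    return fractions, np.array(lifts)
```

A lift of 1.0 everywhere corresponds to the random baseline; the larger the area between the curve and that baseline, the better the model, as described above.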

The Decile-wise Lift chart

The Decile-wise Lift chart is a visualization that involves sorting the test data based on the model's predicted probabilities and then dividing it into 10 equal deciles. For each decile, the lift value is calculated by comparing the ratio of actual positive instances to the predicted positive instances from the model with the ratio of actual positive instances to predicted positive instances in the entire test dataset. The lift values are plotted on the vertical axis, and the deciles are plotted on the horizontal axis. If the lift value for the first decile is 2, it indicates that the model performed twice as well as a random model in predicting the data in that decile. Lift values less than 1 suggest that the model performed worse than a random model, while lift values greater than 1 indicate that the model outperformed a random model. By studying the Decile-wise Lift chart, we can identify which decile model made the best predictions and where its performance declined. Overall, the Decile-wise Lift chart provides insights into how much better the model performed compared to a random model when predicting the test data.
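The decile-wise computation described above can be sketched as follows (the standard definition, independent of any plotting library):

```python
import numpy as np

def decile_lift(y_true, y_score):
    """Lift per decile: the positive rate within each decile of samples
    (sorted by predicted score, best first) divided by the overall
    positive rate. Values > 1 beat a random model in that decile."""
    order = np.argsort(-np.asarray(y_score))
    y_sorted = np.asarray(y_true)[order]
    deciles = np.array_split(y_sorted, 10)
    base_rate = y_sorted.mean()
    return np.array([d.mean() / base_rate for d in deciles])
```

For a well-ranked model the lift is highest in the first deciles and falls below 1 in the later ones, which is exactly the pattern the chart is read for.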

The KS chart

The Kolmogorov–Smirnov (KS) chart is a graphical tool used to assess the classification performance of models. It quantifies the maximum separation between the cumulative distributions of positive and negative instances. The KS value ranges from 0 to 100, and higher values indicate that the model is better at distinguishing between positive and negative cases (Abubakar & Muhammad Sabri 2021); a KS value of 0 means the model cannot differentiate between positives and negatives and performs like random guessing. In summary, the KS chart provides a metric for evaluating how effectively a classification model separates positive and negative instances.
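A minimal numpy sketch of the KS statistic on the 0–100 scale used in the paper (labels and scores are hypothetical):

```python
import numpy as np

def ks_statistic(y_true, y_score):
    """KS = max gap between the cumulative score distributions of the
    positive and negative classes (0-100; higher = better separation)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    thresholds = np.sort(np.unique(y_score))
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    cdf_pos = np.searchsorted(np.sort(pos), thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(np.sort(neg), thresholds, side="right") / len(neg)
    return 100 * np.max(np.abs(cdf_pos - cdf_neg))

# Perfectly separated scores give KS = 100; identical distributions give 0.
print(ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 100.0
```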

Interpretability of deep learning models

The complexity and opacity of neural network models have created a strong need for a clear understanding of their decision-making process (Kokhlikyan et al. 2020). To address this, several methods were used in this study to enhance the interpretability of the deep learning models. Shapley value plots were used to assess the significance of the criteria employed during model development. The Shapley value, derived from cooperative game theory, has gained prominence in recent years with diverse applications in machine learning (Rozemberczki et al. 2022). Among the conventional visualizations, the beeswarm chart was employed to present the Shapley value results: using color, it shows the underlying value of each feature in individual instances, making it easier to grasp the importance of features and their influence on the model's output. Another approach used for explaining deep learning models is the Individual Conditional Expectation (ICE) plot (ICEP), which visualizes how a change in each input parameter affects the model's prediction (Lorentzen & Mayer 2020). ICEPs are similar to Partial Dependence Plots (PDPs), but whereas the PDP averages over the marginal distribution, ICE retains all individual instances: each line in an ICEP represents the predictions for a single instance. By avoiding averaging across all samples, ICE reveals heterogeneous relationships, although it is limited to a specific target feature, since two features may create overlapping levels that are hard to identify visually (Molnar 2020).
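A minimal sketch of how ICE curves are computed, assuming a generic `predict` function; the linear toy model here is a hypothetical stand-in for the trained network, not the study's model:

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Individual Conditional Expectation: for every instance, sweep one
    feature over `grid` while holding the others fixed, and record the
    model's prediction. Averaging the rows yields the PDP."""
    curves = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value        # overwrite the target feature
        curves[:, j] = predict(X_mod)
    return curves                        # one row per instance

# Hypothetical stand-in for the trained network: a simple linear response
predict = lambda X: 2 * X[:, 0] + X[:, 1]
X = np.array([[0.1, 1.0], [0.5, 2.0]])
curves = ice_curves(predict, X, feature=0, grid=np.array([0.0, 1.0]))
print(curves)               # each row is one instance's ICE line
print(curves.mean(axis=0))  # the PDP: the average over all instances
```

The last line makes the ICE/PDP relationship in the text concrete: the PDP is simply the per-grid-point average of the individual curves.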

In this study, another interpretability approach was employed, known as deep learning model sensitivity analysis, which examines how predictions change as each input parameter is varied (Li et al. 2021). The inputs were perturbed by 5, 10, 20, and 40%, and the corresponding changes in the predictions were calculated. All results were then visualized on a chart, allowing the most sensitive parameters to be identified.
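The perturbation scheme can be sketched as follows; the lambda standing in for the trained LSTM and its weights are purely hypothetical:

```python
import numpy as np

def input_sensitivity(predict, X, deltas=(0.05, 0.10, 0.20, 0.40)):
    """Perturb each input feature by +-delta (relative) and measure the
    mean absolute change in the prediction, as in the paper's setup."""
    base = predict(X)
    sens = np.zeros(X.shape[1])
    for f in range(X.shape[1]):
        changes = []
        for d in deltas:
            for sign in (+1, -1):
                X_pert = X.copy()
                X_pert[:, f] *= (1 + sign * d)
                changes.append(np.mean(np.abs(predict(X_pert) - base)))
        sens[f] = np.mean(changes)       # larger value = more sensitive input
    return sens

# Hypothetical stand-in model that heavily weights feature 0
predict = lambda X: 5 * X[:, 0] + 0.1 * X[:, 1]
X = np.ones((4, 2))
s = input_sensitivity(predict, X)
print(s)  # feature 0 is far more sensitive than feature 1
```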

Feature selection

The results of feature selection using the LOFO method are presented in Figure 9. Variables whose boxplot median importance lies above zero are considered selected features. On this basis, 7 of the 13 factors, namely rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI, were identified as influential for flood risk in the Aji Chay region, while the remaining six factors, elevation, slope, aspect, distance from the river, geology, and TPI, were excluded from the analysis. In other words, the controlling factors for flood risk vary with the specific conditions of the region.
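The LOFO idea can be sketched independently of the paper's deep models: remove one feature at a time and measure the drop in cross-validated score. The nearest-centroid classifier and the synthetic "signal"/"noise" features below are illustrative assumptions, not the study's setup:

```python
import numpy as np

def cv_accuracy(X, y, folds=4):
    """k-fold accuracy of a simple nearest-class-centroid classifier."""
    idx = np.arange(len(y))
    scores = []
    for f in range(folds):
        test = idx % folds == f
        train = ~test
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        scores.append(np.mean((d1 < d0).astype(int) == y[test]))
    return float(np.mean(scores))

def lofo_importance(X, y, feature_names):
    """Leave-One-Feature-Out: importance = drop in CV score when the
    feature is removed; positive values mark genuinely useful features."""
    full = cv_accuracy(X, y)
    return {name: full - cv_accuracy(np.delete(X, i, axis=1), y)
            for i, name in enumerate(feature_names)}

# Synthetic demo: only "signal" carries class information
rng = np.random.default_rng(0)
signal = rng.normal(size=400)
noise = rng.normal(size=400)
X = np.column_stack([signal, noise])
y = (signal > 0).astype(int)
imp = lofo_importance(X, y, ["signal", "noise"])
print(imp)  # dropping "signal" hurts accuracy; dropping "noise" barely matters
```

Features with importance near or below zero are the ones a LOFO boxplot would reject, matching the selection rule described above.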
Figure 9

Feature selection by LOFO importance.


Flood hazard maps

The main objective of this study was to create flood hazard maps for the Aji Chay region using CNN and LSTM models. To accomplish this goal, the seven significant flood-related factors retained by feature selection, namely rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI, were used in the modeling. The flood hazard maps were then categorized into five classes (very high, high, moderate, low, and very low) using the natural breaks method. The high and very high classes together cover 40.89% of the study area for the CNN model and 36.86% for the LSTM model; the individual class shares range from 14.85 to 26.93% (CNN) and from 18.22 to 23.28% (LSTM). These maps provide valuable insight into potential flood risk across the study area. The results of both models, along with the corresponding areas in hectares and percentages for each flood hazard class, are presented in Figures 10 and 11 and Table 2.
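The natural breaks (Jenks) classification used for the five hazard classes is commonly approximated by one-dimensional k-means, which places class boundaries in the largest gaps between clusters of similar scores; a sketch on synthetic hazard scores (the clump values are hypothetical):

```python
import numpy as np

def natural_breaks(values, k=5, iters=50):
    """Approximate Jenks natural breaks with 1-D k-means: class boundaries
    end up in the largest gaps between clusters of similar values."""
    v = np.sort(np.asarray(values, dtype=float))
    centers = np.quantile(v, np.linspace(0, 1, k))   # spread initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        centers = np.array([v[labels == j].mean() for j in range(k)])
    # the upper limit of each class except the last serves as a break point
    return [float(v[labels == j].max()) for j in range(k - 1)]

# Hazard scores with five obvious clumps: breaks should land between them
rng = np.random.default_rng(1)
scores = np.concatenate([c + rng.normal(0, 0.01, 10)
                         for c in (0.1, 0.3, 0.5, 0.7, 0.9)])
breaks = natural_breaks(scores, k=5)
print(breaks)  # four ascending break values near 0.1, 0.3, 0.5 and 0.7
```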
Table 2 | The area of flood hazard classes affected by the deep learning models

Model | Very low (ha / %) | Low (ha / %) | Moderate (ha / %) | High (ha / %) | Very high (ha / %)
CNN | 2,098,091.14 / 26.93 | 1,350,974.96 / 17.34 | 1,157,493.70 / 14.85 | 1,244,107.98 / 15.97 | 1,941,442.69 / 24.92
LSTM | 1,458,259.57 / 18.71 | 1,813,729.73 / 23.28 | 1,647,958.36 / 21.15 | 1,419,699.60 / 18.22 | 1,452,328.99 / 18.64
Figure 10

Flood hazard maps produced using CNN model.

Figure 11

Flood hazard maps produced using LSTM model.

Figure 12 compares the CNN and LSTM models during flood model training. The accuracy curves show that the LSTM model reached a slightly higher training accuracy than the CNN model but was less stable, while the CNN model's training accuracy was comparatively steadier. Ultimately, the results suggest that the LSTM model extracts information more effectively for generating flood hazard predictions, as its accuracy continues to improve over successive passes through the dataset.
Figure 12

Accuracy for training data in (a) CNN and (b) LSTM models.


Evaluation of CNN and LSTM models

In this research, four evaluation methods were used to assess the performance of the CNN and LSTM models in flood modeling: Lift charts, Decile-wise Lift charts, cumulative gains, and the KS statistic. The Lift chart depicts how much better the CNN and LSTM models perform than a random model. In the early deciles of the training and testing stages, the CNN model's Lift values were approximately 1.9 and 1.7 times those of the random model. As depth increases, the Lift gradually decreases and converges to the random model line: the higher probability scores are concentrated in the upper deciles (also evident in the cumulative gains chart), so the lower deciles carry lower probability values and resemble the random model.

The Decile-wise Lift chart displays the percentage of target class observations in each decile. Decile 1 exhibits the highest percentage, which decreases toward the higher deciles, even falling below the random model line at a certain point, because the model places few positive observations in the higher deciles while the random model distributes observations uniformly. The cumulative gains curves for the CNN and LSTM models become smooth after decile 6, signifying that deciles 6–10 contain few or no positive records; the wizard model attains 100% at decile 5, representing ideal, perfect predictions. Finally, the KS statistic chart compares two distributions, with the KS value corresponding to the point of maximum difference between them; it thus summarizes the deep learning models' capability to distinguish between the two events.
At decile 5 on the chart, the KS values for the CNN model in the training and testing stages are 75.02 and 56.09, and for the LSTM model 78.35 and 88.14, respectively.

Interpretability results of LSTM model for flood risk prediction

Figure 13(a) shows the interpretability results of the LSTM model for flood hazard prediction. The beeswarm chart displays the importance of different features for the model's prediction, highlighting the most influential ones. The heatmap chart presents the model's output as a color-coded map, with the overall importance of each input represented as a bar chart on its right side. The results indicate that drainage density, rainfall, and TWI have the greatest impact on the model's output. A SHAP plot, based on the SHapley Additive exPlanations technique, is a valuable tool for interpreting black-box machine learning models by decomposing predictions into fair feature contributions; for boosted trees models with additively modeled features, the SHAP dependence plot closely aligns with the partial dependence plot, differing only by a vertical shift, as demonstrated with XGBoost (Mayer 2022). The corresponding SHAP heatmap is shown in Figure 13(b). The ranking of feature importance for the model output, from maximal to minimal impact, follows from these plots.
Figure 13

(a) ICEP for all features used in (b) heatmap plot of SHAP values constructed by game theory.


Based on these plots, the top four features, in order of importance, had the greatest impact on the model output for predicting flood risk in the study area. Several studies (Ayazi et al. 2010; Huang & Weile 2011) have similarly recognized groundwater depletion as the most important controlling factor for landslide hazard in Iran and China.

The results show that flood risk is higher in areas with high drainage density, where runoff accumulates rapidly. Rainfall and flood occurrence are directly related in the study area: as rainfall increases, the number of flooded pixels and the flood risk class weights also increase. Elevation is another influential factor, with flood risk increasing in the higher elevation classes; the main reason for flooding at high elevations, especially in hilly and mountainous areas, is the land slope, which accelerates rainwater flow. Overall, flood risk is highest where high elevation, rainfall, and drainage density coincide with low vegetation cover and agricultural land use, leading to more flooding in these areas.

The graphs in Figure 14 are ICEPs, illustrating how each feature affects the predictions. Each line displays the prediction changes for a single instance as the feature varies, providing insight into the prediction's dependence on individual features per instance. The red lines correspond to PDPs. According to these plots, drainage density is the most critical feature for flood hazard in the studied region: the flood hazard increases significantly when drainage density exceeds 0.25 and rainfall exceeds 300 mm.
Figure 14

ICEP for all features used in the mapping process.

The chart in Figure 15 presents the input sensitivity analysis. It shows that drainage density, rainfall, and TWI are the most sensitive features for the model; variations in the values of all other variables had minimal impact on the model's output.
Figure 15

Model sensitivity analysis for LSTM model.

The charts in Figures 16 and 17 compare model predictions with actual observations for the LSTM and CNN models, respectively. They show that the LSTM model tracks the observations more reliably.
Figure 16

Comparison of model predictions and actual observations for LSTM.
Figure 17

Comparison of model predictions and actual observations for CNN.
The bar chart in Figure 18 illustrates the association of land use with flood occurrence, categorized into four types: agricultural, city, forest, and pastures. Agricultural and forest land uses account for the largest shares of flood occurrence in the study area.
Figure 18

Association of each land use with flood occurrence.

Feature selection

Figure 9 presents the LOFO results for feature selection in flood hazard assessment in the Aji Chay region. Among the 13 features studied, the LOFO algorithm identified 7 as important for flood hazard assessment, namely rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI (Al-Areeq et al. 2022; Sachdeva & Kumar 2022; Yang et al. 2022).

Flood hazard maps

Two flood hazard maps were generated using the CNN and LSTM models, shown in Figures 10 and 11, respectively. The flood hazard classes were categorized into five levels, namely, very high, high, moderate, low, and very low, using the natural breaks method (Cao et al. 2016; Ghosh et al. 2022). According to the result maps and the associated table, the CNN and LSTM models place 40.89% (3,185,550.67 hectares) and 36.86% (2,872,028.58 hectares) of the total area in the very high and high flood hazard classes, respectively. Conversely, only 26.93% (2,098,091.14 hectares) and 18.71% (1,458,259.57 hectares) of the area fall into the very low-risk class for the CNN and LSTM models, respectively. Both models classify parts of the western, northeastern, and central areas of the Aji Chay region as prone to flood hazards. The main contributing factors are land use change and reduced vegetation cover in the river basins, driven by recurrent droughts in recent years, which decrease water infiltration into the soil and increase surface runoff.

Performance evaluation of deep learning models

Figures 19 and 20 display the statistical metrics (KS, Lift, cumulative gain, and Decile-wise Lift) used to evaluate the CNN and LSTM models on the training and testing datasets. The KS values at threshold 5 on the training dataset were 75.02% for the CNN model and 78.35% for the LSTM model; on the testing dataset they were 56.09 and 88.14%, respectively, once again showing the LSTM model's superiority. The LSTM model's line (red) also lies closer to the wizard line (blue dashed) than the CNN model's; the closer the model line is to the wizard line, the better the predictions align with the optimal results. Based on these performance metrics, the LSTM model outperformed the CNN model in generating flood hazard maps for both datasets.

Various studies, such as those by Kim et al. (2021), Kewat (2023), and Ranasinghe & Ilmini (2020), have demonstrated the excellent modeling capabilities of the LSTM model. The LSTM's success can be attributed to its recurrent architecture, which maintains long-term memory and addresses issues like vanishing or exploding gradients; its distinct gated cell helps preserve information across long sequences. Overall, the LSTM model's superior performance stems from its effectiveness in handling sequential data and retaining essential information over extended timeframes, making it a potent tool for flood hazard prediction.
Figure 19

Lift, Decile-wise Lift, cumulative gains, and KS statistic plot of the CNN model in the (a) training and (b) test steps.

Figure 20

Lift, Decile-wise Lift, cumulative gains, and KS statistic plot of the LSTM model in the (a) training and (b) test steps.


General conclusion

The main aim of this study is to utilize machine learning models for creating a flood hazard map and analyzing influential variables in flood occurrences within the studied area.

To identify flood-prone areas, both CNN and LSTM models were employed to generate the flood hazard map. This involved using 13 criteria and 170 flood and non-flood data points. The non-flood points were extracted using the APG toolbox in ArcGIS software.

The results highlighted seven criteria, namely, rainfall, land use, NDVI, drainage density, flow direction, TWI, and TRI, as the most significant factors contributing to flood risk, identified through the LOFO feature selection algorithm.

In addition, SHAP plots demonstrated that these criteria played a major role in the modeling process. Based on evaluation metrics using training and testing datasets, the LSTM model outperformed the CNN model with KS values (at threshold 5) of 78.35 and 88.14%, respectively.

The flood hazard map effectively pinpointed locations prone to flooding, offering valuable insights for planners to implement preventive measures and enabling crisis managers and rescue teams to identify areas requiring evacuation and assistance during flood events. Moreover, it can enhance public awareness and minimize flood-related damages.

In conclusion, the flood hazard map proves to be an essential tool for flood risk management and mitigation in the studied region, providing valuable and precise information about potential flood hazards to decision-makers and planners.

All relevant data are included in the paper or its Supplementary Information.

The authors declare no conflict of interest.

Abubakar, H. & Muhammad Sabri, S. R. 2021 Simulation Study on Modified Weibull Distribution for Modelling of Investment Return. Pertanika Journal of Science and Technology 29 (4). https://doi.org/10.47836/PJST.29.4.29.
Adiat
K. A. N.
,
Nawawi
M. N. M.
&
Abdullah
K.
2012
Assessing the accuracy of GIS-based elementary multi criteria decision analysis as a spatial prediction tool – A case of predicting potential zones of sustainable groundwater resources
.
Journal of Hydrology
440–441
,
75
89
.
https://doi.org/10.1016/j.jhydrol.2012.03.028
.
Albawi
S.
,
Mohammed
T. A.
&
Al-Zawi
S.
2017
Understanding of a convolutional neural network
. In
2017 International Conference on Engineering and Technology (ICET)
.
IEEE
, pp.
1
6
.
Alshammari
E. Z.
,
Rahman
A. A.
,
Rainis
R.
,
Seri
N. A.
&
Fuzi
N. F. A.
2023
The impacts of land use changes in urban hydrology, runoff and flooding: a review
.
Current Urban Studies
11
(
01
),
1
22
.
https://doi.org/10.4236/cus.2023.111007
.
Antzoulatos
G.
,
Kouloglou
I. O.
,
Bakratsas
M.
,
Moumtzidou
A.
,
Gialampoukidis
I.
,
Karakostas
A.
&
Kompatsiaris
I.
2022
Flood hazard and risk mapping by applying an explainable machine learning framework using satellite imagery and GIS data
.
Sustainability
14
(
6
),
3251
.
Ayazi
M. H.
,
Pirasteh
S.
,
Arvin
A. K. P.
,
Pradhan
B.
,
Nikouravan
B.
&
Mansor
S.
2010
Disasters and risk reduction in groundwater: Zagros mountain southwest Iran using geoinformatics techniques
.
Disaster Advances
.
Bassi
P.
2023
GIS and geospatial studies in disaster management
. In:
International Handbook of Disaster Research
(
Singh
A.
, ed.).
Springer
,
Cham
.
https://doi.org/10.1007/978-981-16-8800-3_214-1
.
Bentivoglio
R.
,
Isufi
E.
,
Jonkman
S. N.
&
Taormina
R.
2022
Deep learning methods for flood mapping: A review of existing applications and future research directions
.
Hydrology and Earth System Sciences
26
(
16
),
4345
4378
.
Brownlee
J.
2019
How to choose a feature selection method for machine learning
.
Machine Learning Mastery
10
. https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
Bui
D. T.
,
Tsangaratos
P.
,
Nguyen
V. T.
,
Van Liem
N.
&
Trinh
P. T.
2019
Comparing performance of two soft computing models in shallow landslide susceptibility mapping: A case study at Mu Cang Chai District, Yemen Bai Province, Vietnam
.
Geomatics, Natural Hazards and Risk
10
(
1
),
1084
1105
.
https://doi.org/10.1080/19475705.2018.1559293
.
Cao
Q.
,
Mehran
A.
,
Ralph
F. M.
&
Lettenmaier
D. P.
2019
The role of hydrological initial conditions on Atmospheric River floods in the Russian River basin
.
Journal of Hydrometeorology
20
(
8
),
1667
1686
.
Chapi
K.
,
Singh
V. P.
,
Shirzadi
A.
,
Shahabi
H.
,
Bui
D. T.
,
Pham
B. T.
&
Khosravi
K.
2017
A novel hybrid artificial intelligence approach for flood susceptibility assessment
.
Environmental Modelling & Software
95
,
229
245
.
Costache
R.
,
Arabameri
A.
,
Costache
I.
,
Crăciun
A.
,
Islam
A. R. M. T.
,
Abba
S. I.
&
Pham
B. T.
2023
Flood hazard potential evaluation using decision tree state-of-the-art models
.
Risk Analysis
44 (2), 439–458.
Dorn
H.
,
Vetter
M.
&
Höfle
B.
2014
GIS-based roughness derivation for flood simulations: a comparison of orthophotos, LiDAR and crowdsourced geodata
.
Remote Sensing
6
(
2
),
1739
1759
.
https://doi.org/10.3390/RS6021739
.
Enhancing Flood Risk Assessment Through Machine Learning and Open Data
2023
https://doi.org/10.31223/x5xw95
.
Falah
F.
,
Rahmati
O.
,
Rostami
M.
,
Ahmadisharaf
E.
,
Daliakopoulos
I. N.
&
Pourghasemi
H. R.
2019
Artificial neural networks for flood susceptibility mapping in data-scarce urban areas
. In:
Spatial Modeling in GIS and R for Earth and Environmental Sciences
(Pourghasemi, H. R. & Gokceoglu, C., eds.).
Elsevier
, Amsterdam, pp.
323
336
.
Fujita
H.
,
Selamat
A.
,
Lin
J. C. W.
&
Ali
M.
2021
Advances and trends in artificial intelligence
. In
Artificial Intelligence Practices: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021
,
Kuala Lumpur, Malaysia
,
July 26–29, 2021
.
Proceedings, Part I
.
Gao
P.
,
Du
W.
,
Lei
Q.
,
Li
J.
,
Zhang
S.
&
Li
N.
2023
NDVI Forecasting model based on the combination of time series decomposition and CNN – LSTM
.
Water Resources Management
37
(
4
),
1481
1497
.
https://doi.org/10.1007/s11269-022-03419-3
.
Gervasi
O.
,
Murgante
B.
,
Misra
S.
,
Garau
C.
,
Blečić
I.
,
Taniar
D.
&
Karaca
Y.
2020
Computational science and its applications – ICCSA 2020
. In
20th International Conference, Cagliari, Italy, July 1–4, 2020, Proceedings, Part VII
,
Vol. 12255
.
Springer Nature, Cham
.
Gholami
H.
,
Mohammadifar
A.
,
Malakooti
H.
,
Esmaeilpour
Y.
,
Golzari
S.
,
Mohammadi
F.
&
Collins
A. L.
2021
Integrated modelling for mapping spatial sources of dust in central Asia – An important dust source in the global atmospheric system
.
Atmospheric Pollution Research
12
(
9
),
101173
.
Glenn
E. P.
,
Morino
K.
,
Nagler
P. L.
,
Murray
R. S.
,
Pearlstein
S.
&
Hultine
K. R.
2012
Roles of saltcedar (Tamarix spp.) and capillary rise in salinizing a non-flooding terrace on a flow-regulated desert river
.
Journal of Arid Environments
79
,
56
65
.
Global Resource Institute [GRI]
2021
Increasing Flood Risk: A Global Analysis of Flood Risk From 2020 to 2030. Available from: https://www.globalresourceinstitute.org/flood-risk-analysis.
Hadibasyir
H. Z.
&
Fikriyah
V. N.
2023
Proceedings of the International Conference of Geography and Disaster Management (ICGDM 2022)
, Vol. 755
.
Springer Nature, Cham
.
Halawi
L.
,
Clarke
A.
&
George
K.
2022
Harnessing the Power of Analytics
.
Springer
,
New York
, pp.
51
59
.
Hassan
A. D.
&
Sabha
M.
2023
Feature extraction for image analysis and detection using machine learning techniques
.
International Journal of Advanced Networking and Applications
14
(
04
),
5499
5508
.
https://doi.org/10.35444/ijana.2023.14401
.
Hassanien
A. E.
&
Darwish
A.
2023
The Power of Data: Driving Climate Change with Data Science and Artificial Intelligence Innovations
, Vol.
118
.
Springer Nature, Cham
.
Hosseiny
H.
2021
A deep learning model for predicting river flood depth and extent
.
Environmental Modelling & Software
145
,
105186
.
Huang
R.
&
Weile
L.
2011
Formation, distribution and risk control of landslides in China
.
Journal of Rock Mechanics and Geotechnical Engineering
3 (2), 97–116.
Jiang
Q.
,
Tang
C.
,
Chen
C.
,
Wang
X.
&
Huang
Q.
2019
Stock price forecast based on LSTM neural network
. In:
Proceedings of the Twelfth International Conference on Management Science and Engineering Management
(
Xu
J.
,
Cooke
F.
&
Gen
M.
, eds).
Springer International Publishing
,
Cham
, pp.
393
-
408
.
Jones
A. E.
&
Dawson
G.
2023
ML approaches to flood susceptibility mapping at the country scale. EGUsphere. https://doi.org/10.5194/egusphere-egu23-8534
.
Kabir
S.
,
Kabir
S.
,
Patidar
S.
,
Xia
X.
,
Liang
Q.
,
Neal
J.
&
Pender
G.
2020
A deep convolutional neural network model for rapid prediction of fluvial flood inundation
.
Journal of Hydrology
590
,
125481
.
https://doi.org/10.1016/J.JHYDROL.2020.125481
.
Kevin
I.
,
Stricker
M.
,
Miyamoto
T.
,
Nuske
M.
&
Dengel
A.
2023
On the Importance of Feature Representation for Flood Mapping using Classical Machine Learning Approaches. arXiv. https://doi.org/10.48550/arXiv.2303.00691
.
Kewat
K.
2023
Plant disease classification using Alex Net. Preprint. Doi:HYPERLINK "http://dx.doi.org/10.21203/rs.3.rs-2612739/v1"10.21203/rs.3.rs-2612739/v1
.
Khosravi
K.
,
Shahabi
H.
,
Pham
B. T.
,
Adamowski
J.
,
Shirzadi
A.
,
Pradhan
B.
,
Pradhan
B.
,
Dou
J.
,
Ly
H.-B.
,
Gróf
G.
,
Ho
H. L.
,
Hong
H.
,
Chapi
K.
&
Prakash
I.
2019
A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods
.
Journal of Hydrology
573
,
311
323
.
https://doi.org/10.1016/J.JHYDROL.2019.03.073
.
Khosravi
K.
,
Panahi
M.
,
Golkarian
A.
,
Keesstra
S.
,
Keesstra
S.
,
Saco
P. M.
,
Bui
D. T.
&
Lee
S.
2020
Convolutional neural network approach for spatial prediction of flood hazard at national scale of Iran
.
Journal of Hydrology
591
,
125552
.
https://doi.org/10.1016/J.JHYDROL.2020.125552
.
Kim
M. H.
,
Kim
J. H.
,
Lee
K.
&
Gim
G.-Y.
2021
The prediction of COVID-19 using LSTM algorithms
.
International Journal of Networked and Distributed Computing
9
(
1
),
19
24
.
https://doi.org/10.2991/IJNDC.K.201218.003
.
Kokhlikyan
N.
,
Miglani
V.
,
Martin
M.
,
Wang
E.
,
Alsallakh
B.
,
Reynolds
J.
&
Reblitz-Richardson
O.
2020
Captum: A unified and generic model interpretability library for pytorch. arXiv preprint. arXiv:2009.07896
.
Li
K.
,
Long
Y.
,
Wang
H.
&
Wang
Y. F.
2021
Modeling and sensitivity analysis of concrete creep with machine learning methods
.
Journal of Materials in Civil Engineering
33
(
8
),
04021206
.
Liu
J.
&
Cho
H.-R.
2001
Effects of topographic slopes on hydrological processes and climate
.
Advances in Atmospheric Sciences
18
(
5
),
733
741
.
https://doi.org/10.1007/BF03403498
.
Lorentzen
C.
&
Mayer
M.
2020
Peeking into the black box: An actuarial case study for interpretable machine learning
.
Lucas
M. P.
,
Longman
R. J.
,
Giambelluca
T. W.
,
Frazier
A. G.
,
McLean
J.
,
Cleveland
S. B.
&
Huang
Y.-F.
2022
Optimizing automated kriging to improve spatial interpolation of monthly rainfall over complex terrain
.
Journal of Hydrometeorology
23
(
8
),
1419
1439
.
https://doi.org/10.1175/JHM-D-21-0171.1
.
Mayer
M.
2022
SHAP for additively modeled features in a boosted trees model
.
https://doi.org/10.48550/arXiv.2207.14490
.
Molnar
C.
2020
Interpretable machine learning. Lulu.com
.
Naghibi
S. A.
,
Hashemi
H.
&
Pradhan
B.
2021
APG: A novel python-based ArcGIS toolbox to generate absence-datasets for geospatial studies
.
Geoscience Frontiers
12
(
5
),
101232
.
https://doi.org/10.1016/j.gsf.2021.101232
.
Ngo
P. T. T.
,
Panahi
M.
,
Khosravi
K.
,
Ghorbanzadeh
O.
,
Kariminejad
N.
,
Cerdà
A.
&
Lee
S.
2021
Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran
.
Geoscience Frontiers
12
(
2
),
505
519
.
https://doi.org/10.1016/J.GSF.2020.06.013
.
Oh
H.-J.
,
Pradhan
B.
&
Pradhan
B.
2011
Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area
.
Computers & Geosciences
37
(
9
),
1264
1276
.
https://doi.org/10.1016/J.CAGEO.2010.10.012
.
Olanrewaju
L.
&
Umeuduji
J. E.
2017
Exploration of hydro-geomorphological indices for coastal floodplain characterization in Rivers State, Nigeria
.
Ghana Journal of Geography
9
(
1
),
67
87
.
Omayma
A.
,
El Morabiti
K.
,
Maftei
C.
,
Papatheodorou
C.
,
Buta
C.
,
Bounab
A.
&
Ouchar Al-Djazouli
M.
2022
Topographic indices and two-dimensional hydrodynamic modelling for flood hazard mapping in a data-scarce plain area: A case study of Oued Laou catchment (Northern of Morocco)
.
Geocarto International
1
24
.
https://doi.org/10.1080/10106049.2022.2082548
.
Pankaj
R. D.
,
Joshi
Y.
,
Rajib
A.
,
Thakur
P. K.
,
Nikam
B. R.
&
Aggarwal
S. P.
2023
Evaluating topography-based approaches for fast floodplain mapping in data-scarce complex-terrain regions: Findings from a Himalayan basin
.
Journal of Hydrology
619
,
Article 129309
.
https://doi.org/10.1016/j.jhydrol.2023.129309
.
Paul
G. C.
,
Saha
S.
&
Hembram
T. K.
2019
Application of the GIS-based probabilistic models for mapping the flood susceptibility in Bansloi sub-basin of Ganga-Bhagirathi river and their comparison
.
Remote Sensing in Earth System Sciences
2
(
2
),
120
146
.
https://doi.org/10.1007/S41976-019-00018-6
.
Ranasinghe
R. M.
&
Ilmini
W. M. K. S.
2020
Introducing an LSTM based Flood Forecasting Model for the Nilwala river basin with a Mobile Application–a Review
.
Razavi Termeh
S. V.
,
Kornejady
A.
,
Pourghasemi
H. R.
,
Keesstra
S.
&
Keesstra
S.
2018
Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms
.
Science of the Total Environment
615
,
438
451
.
https://doi.org/10.1016/J.SCITOTENV.2017.09.262
.
Riley
S.
,
Degloria
S.
&
Elliot
S. D.
1999
A terrain ruggedness index that quantifies topographic heterogeneity
.
International Journal of Science
5
,
23
27
.
Rogger
M.
,
Agnoletti
M.
,
Alaoui
A.
,
Bathurst
J. C.
,
Bodner
G.
,
Borga
M.
,
Chaplot
V.
,
Gallart
F.
,
Glatzel
G.
,
Hall
J.
,
Holden
J.
,
Holko
L.
,
Horn
R.
,
Kiss
A.
,
Kohnová
S.
,
Leitinger
G.
,
Lennartz
B.
,
Parajka
J.
,
Perdigão
R. A. P.
&
Blöschl
G.
2017
Land use change impacts on floods at the catchment scale: challenges and opportunities for future research
.
Water Resources Research
53
(
7
),
5209
5219
.
https://doi.org/10.1002/2017WR020723
.
Roseline
S. A.
&
Geetha
S.
2021
Android malware detection and classification using LOFO feature selection and tree-based models
.
Journal of Physics: Conference Series
1911
,
012031
.
Rozemberczki
B.
,
Watson
L.
,
Bayer
P.
,
Yang
H. T.
,
Kiss
O.
,
Nilsson
S.
&
Sarkar
R.
2022
The Shapley value in machine learning. arXiv preprint. arXiv:2202.05594
.
Sahana, M., Rehman, S., Sajjad, H. & Hong, H. 2020 Exploring effectiveness of frequency ratio and support vector machine models in storm surge flood susceptibility assessment: A study of Sundarban Biosphere Reserve, India. CATENA 189, 104450. https://doi.org/10.1016/J.CATENA.2019.104450.
Samanta, S., Pal, D. K. & Palsamanta, B. 2018 Flood susceptibility analysis through remote sensing, GIS and frequency ratio model. Applied Water Science 8, Article 66. https://doi.org/10.1007/s13201-018-0710-1.

Seydi, S. T., Kanani-Sadat, Y., Hasanlou, M., Sahraei, R., Chanussot, J. & Amani, M. 2022 Comparison of machine learning algorithms for flood susceptibility mapping. Remote Sensing 15(1), 192. https://doi.org/10.3390/rs15010192.

Soukup, T. & Davidson, I. 2002 Visual Data Mining: Techniques and Tools for Data Visualization and Mining. John Wiley & Sons, New York.
Taromideh, F., Fazloula, R., Choubin, B., Emadi, A. & Berndtsson, R. 2022 Urban flood-risk assessment: Integration of decision-making and machine learning. Sustainability 14(8), Article 4483. https://doi.org/10.3390/su14084483.
Thimmaiah, G. N., Tavakkoli Piralilou, S., Gholamnia, K., Ghorbanzadeh, O., Rahmati, O. & Blaschke, T. 2020 Flood susceptibility mapping with machine learning, multi-criteria decision analysis and ensemble using Dempster Shafer Theory. Journal of Hydrology 590, 125275. https://doi.org/10.1016/J.JHYDROL.2020.125275.
Tokar, A. S. & Johnson, P. A. 1999 Rainfall-runoff modeling using artificial neural networks. Journal of Hydrologic Engineering 4(3), 232–239.
Yan, Y., Liu, Y., Wang, H., Jin, C. & Dong, D. 2019 Improved spatio-temporal kriging and its application to regional precipitation prediction. In 2019 International Conference on Intelligent Computing and its Emerging Applications (ICEA), pp. 165–170. https://doi.org/10.1109/IDAACS.2019.8924333.
Yao, X. & Wang, G. 2014 Kriging interpolation method based on Delaunay and GPU. In 2014 International Conference on Audio, Language and Image Processing, pp. 782–786. https://doi.org/10.1109/ICALIP.2014.7009906.
Youssef, A. M., Pradhan, B. & Hassan, A. M. 2011 Flash flood risk estimation along the St. Katherine road, southern Sinai, Egypt using GIS-based morphometry and satellite imagery. Environmental Earth Sciences 62, 611–623. https://doi.org/10.1007/s12665-010-0551-1.
Youssef, A. M., Al-Kathery, M. & Pradhan, B. 2015 Landslide susceptibility mapping at Al-Hasher area, Jizan (Saudi Arabia) using GIS-based frequency ratio and index of entropy models. Geosciences Journal 19(1), 113–134. https://doi.org/10.1007/S12303-014-0032-8.
Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D. & Saeed, J. 2020 A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. Journal of Applied Science and Technology Trends 1(1), 56–70.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).