Because the physical processes underlying floods are complex, data-driven machine learning (ML) models offer a cost-efficient approach to flood modeling. The innovation of the current study is the development of tree-based ML models, including Rotation Forest (ROF), Alternating Decision Tree (ADTree), and Random Forest (RF), optimized via binary particle swarm optimization (BPSO), to estimate flood susceptibility in the Maneh and Samalqan watershed, Iran. To implement the models, 370 flood-prone locations in the case study were identified (2016–2019). In addition, 20 hydrogeological, topographical, geological, and environmental criteria affecting flood occurrence in the study area were extracted to predict flood susceptibility. The area under the curve (AUC) and a variety of other statistical indicators were used to evaluate the performance of the models. The results showed that RF-BPSO (AUC=0.935) had the highest accuracy, followed by ADTree-BPSO (AUC=0.923) and ROF-BPSO (AUC=0.904). In addition, the findings illustrated that the chance of flooding in the center of the study area is greater than elsewhere due to lower elevation, lower slope, and proximity to rivers. The ensemble framework proposed here can therefore also be used to produce flood susceptibility maps in other regions with similar geo-environmental characteristics for flood management and prevention.

  • Comparative assessment of tree-based machine learning models to classify locations as either flooded or non-flooded.

  • Development of machine learning models using the BPSO algorithm.

  • A total of 20 geo-environmental criteria were used for flood susceptibility mapping.

  • Determining flood-affecting criteria using the BPSO algorithm.

  • Sensitivity analysis of 20 geo-environmental criteria in predicting flood susceptibility.

Graphical Abstract


Every year various natural disasters such as floods, landslides, and earthquakes cause extensive human and financial losses worldwide (Smith & Ward 1998; Chapi et al. 2017). Floods have been identified as one of the most devastating and destructive natural disasters around the globe, negatively affecting humans and ecosystems (Alam et al. 2021). Statistics show that floods were responsible for more than half of the damage caused by natural disasters in the past five decades globally (Nachappa et al. 2020; Imani et al. 2021). In Iran, the occurrence of floods is not specific to certain regions and the entirety of the country faces this issue; however, based on the features of each region, the type of floods that occur and the damages caused by this natural disaster vary (Giang et al. 2020; Haer et al. 2020; Saedi et al. 2020). Despite the efforts of experts, policymakers, stakeholders, and government officials to reduce the effects of floods in the last few decades, the number of occurrences has been on the rise around the world (Johann & Leismann 2017; Kocaman et al. 2020). Due to this increase in floods, especially in cities, and the various hazards therefrom, identifying efficient flood prediction measures is of great importance (Hong et al. 2018; Chen et al. 2019). Therefore, identifying these measures can assist in a more effective prevention of this phenomenon while making use of public education, efficient management policies, and more extensive monitoring to combat the factors stimulating the increase of floods (Du et al. 2013; Minea 2013). Flood susceptibility refers to the probability that an area will experience flooding. Essentially, it is the likelihood of flooding of a particular type in a given location. It refers to the spatial likelihood or probability (either qualitatively or quantitatively) of a flood in the future (Hervás & Bobrowsky 2009). 
In other words, maps of flood susceptibility can be defined as quantitative or qualitative assessments of the classification, area, and spatial distribution of floods both existing and potentially occurring in a region (Santangelo et al. 2011). Prediction of flood susceptibility maps has been found to be a crucial step in the prevention and management of future floods (Khosravi et al. 2016; Youssef et al. 2016). One of the methods to reduce flood risks is the preparation of flood susceptibility maps, which provide valuable information about the nature of floods and their effects on floodplain lands and river boundaries (Zuo et al. 2015; Khosravi et al. 2016). As a result, it is possible to send appropriate warnings in case of flood danger and facilitate rescue operations. In flood zoning for functional control and land development, floodplain areas are divided into zones with different levels of susceptibility. In recent years, metaheuristic techniques, numerical simulations, physical hydrological models, and machine learning (ML) models have been used to prepare flood susceptibility maps in watersheds worldwide (Liu et al. 2016; Chapi et al. 2017; Razavi Termeh et al. 2018; Choubin et al. 2019; Khosravi et al. 2019). Flood susceptibility maps can identify and predict the risks of future floods based on statistical or deterministic methods (Mosavi et al. 2018). However, flood events occur under complex conditions that make reliable prediction difficult (Pham et al. 2020). It can be concluded that spatial prediction of natural hazards using models built from spatial data yields susceptibility maps, which are the most appropriate solution for land use planning in watersheds to prevent these events. 
Nevertheless, due to the various complex criteria affecting floods, it is extremely difficult to predict this phenomenon reliably (Pourghasemi et al. 2020). In recent years, the combination of Geographic Information System (GIS) with Remote Sensing (RS) technology has dramatically increased the accuracy of flood susceptibility prediction (Safaripour et al. 2012; Chen et al. 2015; Zuo et al. 2015). Swift access to RS satellite data and improved business practices have increased the use of GIS in the prediction of flood susceptibility maps. Therefore, GIS is a useful tool for analyzing complex phenomena like floods (Hong et al. 2018). In previous studies, knowledge-driven and data-driven approaches were used for the zoning of flood susceptible regions (Khosravi et al. 2016; Wang & Liu 2019). On the one hand, data-driven approaches are highly efficient in well-known regions where the amount of known evidence is statistically sufficient. On the other hand, knowledge-driven approaches are more efficient in less known regions or where there are fewer targets in the region. In terms of flood susceptibility prediction, most models are data-driven and rely on simple assumptions (Lohani et al. 2014). Over the past two decades, ML models have provided better performance and cost-effective solutions by mimicking the complex mathematical expressions of flooding processes (Mosavi et al. 2018). Physically based models were used to predict hydrological events, such as storms, rainfall, runoff, and models of hydraulic flow, including the impact of atmospheric, oceanic, and flood events (Zhao & Hendon 2009; Borah 2011; Costabile et al. 2013; Fernández-Pato et al. 2016; Xia et al. 2017). Despite their capacity to predict a wide range of flood scenarios, physical models often require a variety of hydro-geomorphological monitoring datasets and intensive calculations, which makes short-term prediction impossible (Nayak et al. 2005). 
In addition, establishing physically based models requires an in-depth understanding of hydrological parameters which is proving to be quite challenging (Kim et al. 2015). Also, studies have found that physical models have a short-term prediction capability gap (Costabile & Macchione 2015). Physically based models have certain disadvantages, which leads to the use of advanced data-driven models, such as ML models. This popularity is due, in part, to the fact that flood nonlinearity can be numerically derived from historical data without having to understand the relevant physical processes (Mosavi et al. 2018). ML models can develop faster and require less input than traditional models using data-driven prediction. The field of ML is based on artificial intelligence (AI) that aims to develop patterns, provide easier implementation with low computing cost, as well as rapid training, validation, testing, and evaluation with high performance compared to physical hydrological models (Mekanik et al. 2013). With the continual development of ML models in the last two decades, they have proven to perform more accurately than conventional models in predicting flood susceptibility (Mosavi et al. 2018).

For flood modeling studies, physical hydrological models, including HEC-HMS (Feldman 2000), SWAT (Arnold et al. 1998), and HSPF (Bicknell et al. 1997), have been used. Although such models are useful, they require a large number of field measurements and tedious parameterization methods (Fenicia et al. 2008). In addition, they still provide estimates of flood risk only at the site using local flow data recorded at hydrometric stations, so they are not suitable for regional flood assessments (Tien Bui et al. 2016). Using GIS to predict flood susceptibility is a valuable tool to reduce the risks associated with future floods (Wang et al. 2019). Through geostatistical tools for managing large amounts of spatial data, GIS has made significant contributions to flood susceptibility prediction studies (Tien Bui et al. 2016; Wang et al. 2019). Various statistical and data-driven techniques along with GIS techniques to identify flood susceptible regions have been proposed and used in the literature. Common knowledge-driven approaches used are the Analytic Hierarchy Process (AHP), frequency ratio (FR), and weight of evidence (WOE) (Rahmati et al. 2016; Khosravi et al. 2016; Seejata et al. 2018). Khosravi et al. (2016) prepared flood susceptibility maps for the Haraz river watersheds in Mazandaran using four different models, including FR, WOE, Analytical Hierarchy Process (AHP), and a combination of frequency ratio and analytical hierarchy process (FR-AHP). To implement the proposed models, 10 flood-affecting criteria, including slope angle, plain curvature, elevation, Topographic Wetness Index (TWI), Stream Power Index (SPI), rainfall, distance to river, lithology, land use, and Normalized Difference Vegetation Index (NDVI) were extracted. The results showed that the FR model had the most area under the curve (AUC) compared to the other models. Rahmati et al. (2016) used two knowledge-driven FR and WOE models for flood susceptibility mapping in Golestan Province, Iran. 
The final results showed that the FR and WOE models produced similar and reasonable results. However, there are drawbacks to the use of these methods in generating a flood susceptibility map. For example, AHP results are subject to uncertainty because of ambiguous judgments and the FR method is highly dependent on the sample size (Miles & Snow 1984; Sajedi-Hosseini et al. 2018). Models based on ML can provide information directly from data without assuming anything prior to analysis. They reduce operating costs, improve the speed of data analysis, and are important when dealing with spatial data analysis (Jaafari et al. 2019). Various ML models, including support vector machines (SVMs), genetic algorithms (GAs), adaptive neuro-fuzzy inference systems (ANFIS), artificial neural networks (ANNs), and tree-based models, have been proposed and developed for flood susceptibility prediction (Kia et al. 2012; Seckin et al. 2013; Tien Bui et al. 2016, 2018; Chapi et al. 2017; Zhao et al. 2018; Khosravi et al. 2019; Wang et al. 2019). Liu et al. (2016) used the Naïve Bayes (NB) method to evaluate the flood susceptibility of the Bowen watershed in Australia. To this end, four criteria were used, including elevation, slope angle, soil type, and drainage density. It was found that elevation and slope angle have significant effects on flood susceptibility compared to the other criteria. Hong et al. (2018) used a combination of various methods, including Logistic Regression (LR), Random Forest (RF), and SVM with WOE to predict flood susceptibility maps for the Poyang region in China. The findings indicated that the SVM-WOE combination model had the highest AUC compared to the other combination models (LR-WOE and RF-WOE). Choubin et al. 
(2019) predicted flood susceptibility maps in the Khiyav-Chai watershed in Iran using different ML models such as multivariate discriminant analysis (MDA), SVM, and classification and regression tree (CART) models. The results showed that all models have high performance with an AUC greater than 0.8. Nachappa et al. (2020) used two Multi-Criteria Decision Analysis (MCDA) models, including AHP and Analytical Network Process (ANP), and two ML models, including RF and SVM, to prepare flood susceptibility maps for the city of Salzburg, Austria. The AUC findings indicated that the RF (AUC=87.8%) and SVM (AUC=87%) models performed better than the multi-criteria decision models. Costache et al. (2021) used six ML models, including SVM, J48 decision tree, ANFIS, RF, ANN, and Alternating Decision Tree (ADTree) to predict flood susceptibility maps for the Buzău river watershed in Romania. To this end, 12 criteria including slope angle, elevation, aspect, Topographic Position Index (TPI), TWI, Convergence Index (CI), plane curvature, soil type, land use, distance to river, lithology, and rainfall were used. The results showed that the RF and ADTree models had the highest accuracy, while the J48 model had the lowest. In addition, the results showed that ML performance can be further improved by combining with other ML models, metaheuristic techniques, numerical simulations, and physical models (Mosavi et al. 2018; Razavi Termeh et al. 2018; Wang et al. 2019). Razavi Termeh et al. (2018) used a combination of ANFIS with Ant Colony Optimization (ACO), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO) to predict the flood susceptibility of Jahrom in Fars province, Iran. To this end, nine hydrogeological, topographical, geological, and environmental criteria were extracted. 
The AUC resulting from Receiver Operating Characteristic (ROC) showed accuracies of 91.8, 92.6, and 94.5% for ANFIS-ACO, ANFIS-GA, and ANFIS-PSO combination models, respectively. They found that the ANFIS metaheuristic model has the most practical application in terms of reproducing the highly focused flood susceptibility map. To predict flood susceptibility in Dingnan County in China, Wang et al. (2019) combined the ANFIS model with two metaheuristic methods, including biogeography-based optimization (BBO) and imperialistic competitive algorithm (ICA). The results showed that the two ensemble ML models had superior effectiveness compared to ANFIS in the study area. However, ML algorithms have important characteristics that must be carefully considered. First, as a rule, they are as good as their training, since they learn the task through past data, and second, the capability of each ML model varies depending on the type of task (Faizollahzadeh Ardabili et al. 2018; Mosavi et al. 2018). This is also referred to as the ‘generalization problem’, because it shows how well the trained system can predict beyond the scope of the training data. It may be possible for some algorithms to perform well for short-term predictions, but not for long-term predictions. It is important to clarify these characteristics of the ML models based on the type of training data (Mosavi et al. 2018).

In all of the above studies, researchers have compared ML and metaheuristic models to select the best model for predicting flood susceptibility. It is desirable to identify more efficient and accurate ML models that can be used with minimal field data to spatially predict flood susceptibility. Although a number of ensemble ML models have been used to predict flood susceptibility, there is no perfect way to accurately predict flood susceptibility. Based on the results of previous research, tree-based ML models including RF, ADTree, and Rotation Forest (ROF) have high accuracy and efficiency in predicting natural hazards (Xia et al. 2017; Tien Bui et al. 2019; Costache et al. 2021). The advantage of the ROF algorithm is in the fact that variability in the data set and accuracy in the clustering process increase (Phong et al. 2021). The ADTree model uses fewer repetitions in its processes and can turn the classes in the model into binary to analyze the training data sets and validate them (Wu et al. 2020). In addition, the RF model is very flexible and can control regression and classification tasks with a high degree of accuracy (Chen et al. 2021). Due to the ability to identify more accurately and predict more efficiently, predicting flood susceptible regions using new hybrid tree-based ML models with metaheuristic algorithms is extremely important. There are very few studies that report on the optimization of tree-based ML models for flood susceptibility mapping. Accordingly, the objectives of this research are framed based on previous research gaps. 
Therefore, the main objectives are (1) to evaluate how far the Binary Particle Swarm Optimization (BPSO) algorithm improves the performance of tree-based ML models, (2) to determine the effective criteria in predicting flood susceptibility using BPSO along with tree-based ML models, and (3) to produce flood susceptibility maps using the new hybrid tree-based ML models and identify the most susceptible regions in the study area.

The reason for combining ML models with the BPSO metaheuristic algorithm is to improve the accuracy of tree-based ML models and to determine the criteria affecting flood susceptibility in the study area, which is the innovation of the present study. Thus, the present research is distinguished from previous studies because of the use of improved tree-based ML models, including Optimized Rotation Forest (ROF-BPSO), Optimized Alternating Decision Tree (ADTree-BPSO), and Optimized Random Forest (RF-BPSO), to identify the effective criteria and predict the flood susceptibility of the Maneh and Samalqan watershed in North Khorasan province, Iran. Finally, AUC and statistical indicators were used to evaluate and compare the effectiveness of the proposed models in the case study.
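Since AUC is the main indicator used to compare the models, a minimal sketch of how it can be computed from validation labels and predicted susceptibility scores may be useful. The function name `auc_score` and the toy inputs are illustrative assumptions, not part of the study's code; the pairwise (Mann-Whitney) formulation is used here for clarity rather than speed.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (flood, non-flood) pairs ranked correctly.

    labels: 1 for flood points, 0 for non-flood points.
    scores: predicted susceptibility for each point (higher = more susceptible).
    Tied pairs count as half a correct ranking.
    """
    flood = [s for l, s in zip(labels, scores) if l == 1]
    non_flood = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for f in flood:
        for n in non_flood:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(flood) * len(non_flood))


# Hypothetical toy example: two flood and two non-flood validation points.
example_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flood and non-flood points.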

The research design adopted for the present study was descriptive-analytical, and the research was applied in purpose. QGIS 3.1, SAGA GIS 7.9, and Google Earth Engine were used to process the data, while Python was used for quantitative calculations and the development of methods. The six-step procedure of the current research is shown in Figure 1:

  • 1. Preparation of a flood reference map to define the dependent variable (flood-prone locations in the case study).

  • 2. Extraction of flood-affecting spatial criteria in the case study to define independent variables (20 hydrogeological, topographical, geological, and environmental criteria).

  • 3. Use of a multi-collinearity test to investigate the independence of the spatial flood-affecting criteria in the case study.

  • 4. Identifying spatial flood-affecting criteria in the region using BPSO combined with tree-based ML models.

  • 5. Predicting flood susceptibility based on the development of tree-based ML approaches, such as ROF-BPSO, ADTree-BPSO, and RF-BPSO.

  • 6. Evaluating and comparing the performance of the developed ML models through the use of appropriate statistical indicators.

Figure 1

Flowchart of the research.
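Step 4 of the procedure (selecting flood-affecting criteria with BPSO) can be illustrated with a minimal sketch. It assumes the common sigmoid-transfer form of binary PSO, in which each particle is a 0/1 mask over the criteria. The study wraps tree-based models (ROF, ADTree, RF); to keep the example self-contained, a simple nearest-centroid classifier is substituted as the fitness function. All names and hyperparameter values (`bpso_select`, `w`, `c1`, `c2`, swarm size, iteration count) are hypothetical and not taken from the study.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Stand-in fitness: nearest-centroid training accuracy on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty criteria subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y)


def bpso_select(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: returns the best 0/1 criteria mask found and its fitness."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [centroid_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, vel[i][j]))  # clamp velocity
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = centroid_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data (hypothetical): column 0 is informative, columns 1 and 2 are noise.
_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[t + _rng.gauss(0, 0.1), _rng.random(), _rng.random()] for t in y]
mask, score = bpso_select(X, y)
```

In a setting like the study's, the fitness function would instead train one of the tree-based models on the masked criteria and score it on held-out points, so that the selected mask and the tuned classifier are evaluated together.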


Study area

The city of Maneh and Samalqan is located in the northwestern region of North Khorasan province. It is surrounded by the city of Raz and Jorgalan to the north, Golestan province to the west, the cities of Jajerm and Garmeh to the south, and the city of Bojnurd to the east. Additionally, this city shares 8 km of its border with Turkmenistan. Geographically, it lies between 37°17′ and 38°7′ N latitude and between 55°59′ and 57°17′ E longitude. The city of Maneh and Samalqan has an area of 6,053 km² and an elevation ranging from 314 to 2,785 m. It has varied terrain and topography and is located in a region with an arid and semi-arid climate, where the spatial and temporal distribution of rainfall is highly variable. Rainfall usually occurs during cold and damp seasons, resulting in an increased chance of flooding and soil erodibility. In addition, the existence of steep slopes, extensive use of sand mines, and the general lack of vegetation and trees in the region have resulted in more floods in the past few years, causing immense damage. The hydrographic network of the city consists of seasonal and permanent rivers and is part of the Atrak watershed. These rivers originate in the highlands outside the city and flow down through the area following the general slope. The limits of Maneh and Samalqan are shown in Figure 2.

Figure 2

Geographical location of the study area.


Preparation of information layers

Flood reference map

Before predicting the occurrence of future floods in the study area, previous floods must be analyzed. Flood points play an important role in establishing the relationship between the flood-affecting criteria and the occurrence of floods. In other words, historical flood occurrences are indicators of future floods, in that previously affected areas are more susceptible to floods in the future (Rahmati et al. 2016). Flood inventory maps indicating historic flood locations are required for spatial modeling of flood susceptibility (Tehrany et al. 2014). There are various means for the preparation of reference flood maps, such as interpreting digital satellite imagery or making use of past flood databases (Chen et al. 2019). In total, 370 flood-prone points (with a value equal to one) have been recorded in the watershed of the study area by the Iranian forests, rangelands, and watershed organization (2016–2019). In this study, 70% of these points (259 flood-prone points) were randomly chosen to train the models while the remaining 30% (111 flood-prone points) were used for validation. This ratio is accepted and recommended by many researchers in the field (Khosravi et al. 2016; Costache et al. 2021). Additionally, based on the findings of previous studies and in order to make the findings of the present study more realistic, 370 non-flood points (with a value equal to zero) were randomly created, using topographic maps and Google Earth Engine, in areas such as hills and mountains where floods cannot occur (Hong et al. 2018; Costache et al. 2021). In this research, the dependent variable was defined by merging flood and non-flood points (values of zero and one). Furthermore, the flood inventory maps used in this study and previous ones only used binary values (0, 1) for the absence or presence of floods (Tehrany et al. 2014; Rahmati et al. 2016). As a result, flood locations were given equal weight to predict flood susceptibility maps (Khosravi et al. 2016; Hong et al. 2018; Costache et al. 2021). In fact, it can be said that the binary definition of flood and non-flood data can indicate the susceptibility of the study area to flood events. For all flood and non-flood points (740 points), the raster values of the independent variables were extracted to the points. A data matrix consisting of a column of flood and non-flood labels (0 or 1) and 20 columns of the raster values of the independent variables was created. This data matrix can then be entered as independent and dependent variables in the optimized tree-based ML models. Therefore, by analyzing the relationships between the dependent variable and its influencing criteria (independent variables) using optimized tree-based ML models, the past flood records can be used for predicting future flood events in the study area (Choubin et al. 2019; Khosravi et al. 2019). Finally, the results were imported back into QGIS to produce the flood susceptibility maps. Field surveys were conducted in order to validate the flood-prone points in the area of research (Figure 3).
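The construction of the data matrix and the 70/30 split described above can be sketched as follows. This is an illustration under stated assumptions: the feature values are random placeholders standing in for raster values sampled at each point, the non-flood points are assumed to be split in the same 70/30 ratio as the flood points, and the function names `build_samples` and `stratified_split` are hypothetical.

```python
import random


def build_samples(n_flood=370, n_non_flood=370, n_criteria=20, seed=42):
    """Return rows of (label, features); label 1 = flood, 0 = non-flood."""
    rng = random.Random(seed)
    rows = []
    for label, count in ((1, n_flood), (0, n_non_flood)):
        for _ in range(count):
            # Placeholder values; in practice these are the raster values of
            # the 20 criteria extracted at the point location.
            features = [rng.random() for _ in range(n_criteria)]
            rows.append((label, features))
    return rows


def stratified_split(rows, train_fraction=0.7, seed=42):
    """Randomly split each class separately so both sets keep the 1:1 ratio."""
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [r for r in rows if r[0] == label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


rows = build_samples()
train_rows, valid_rows = stratified_split(rows)
```

With 370 points per class, this yields 259 training and 111 validation points for each class, matching the counts reported for the flood points.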

Figure 3

Parts of the floodplain areas in the study area.


Flood-affecting criteria

Based on previous studies and the availability of data, 20 hydrogeological, topographical, geological, and environmental criteria were extracted. These criteria include Curve Number (CN), Horizontal Overland Flow Distance (HOFD), Modified Fournier Index (MFI), Vertical Overland Flow Distance (VOFD), NDVI, TWI, SPI, Terrain Ruggedness Index (TRI), TPI, Digital Elevation Model (DEM), Slope angle, Flow Accumulation, Aspect, Plane curvature, Distance to a fault, Distance to the road, Distance to river, Lithology, Soil type, and Land use (Khosravi et al. 2016; Mojaddadi et al. 2017; Chen et al. 2019; Costache et al. 2021). Each of these criteria was prepared in the form of raster maps with 30 m pixel sizes. The topographical map of the region at a scale of 1:50,000 was obtained from the Iranian national cartographic center. Satellite imagery with a resolution of 30 m from the Shuttle Radar Topography Mission (SRTM) was used to obtain the DEM layer, and the slope angle, aspect, and plane curvature layers were created from this DEM layer. Elevation has a significant role in flood occurrence, in that areas with lower elevation are more susceptible to floods (Tien Bui et al. 2016). Also, slope angle is one of the most important topographical criteria for flood control, and due to its direct influence on surface runoff, it is used in many flood susceptibility studies (Hong et al. 2018; Razavi Termeh et al. 2018). In addition, aspect can affect hydrological conditions, as processes such as evaporation and transpiration, weathering, and vegetation growth are greatly impacted by this parameter (Tien Bui et al. 2016). The morphology and topography of the region are reflected in the plane curvature map (Costache et al. 2021). Rivers influence the stability of slopes, as the distance to rivers can significantly affect the speed and size of floods (Razavi Termeh et al. 2018). 
The distance-to-river layer was obtained using topographic maps with a scale of 1:50,000. Roads have been found to alter the stability of slopes and result in vertical cuts (Kalantari et al. 2014). In fact, roads, as surfaces with a very low permeability coefficient in watersheds, play an important role in creating runoff. The distance-to-road layer was obtained using topographic maps with a scale of 1:50,000. Faults, which are tectonic fractures that usually reduce pressure on stone landscapes, can cause floods if their activity increases and they start moving (Hong et al. 2018). The distance-to-fault layer was extracted using topographic maps with a scale of 1:100,000. Land use and vegetation also have significant effects on the stability of hillsides, and changes in land use can disrupt the natural balance of the region and cause instability (Khosravi et al. 2019). Vegetation, on the other hand, reduces the speed of surface runoff and has a great effect on preventing floods in watersheds. The land-use layer was obtained using satellite imagery from the Operational Land Imager (OLI) provided by Google Earth Engine. Changes in land use, extensive felling of trees, and overgrazing have contributed to reduced permeability, hence increasing the risk of flooding (Khosravi et al. 2016). In this regard, the 4th (red) and 5th (near-infrared) bands of the OLI sensor were used to prepare the NDVI layer. A lower NDVI value means more flood susceptibility. Rainfall has also been identified as a criterion influencing floods. Hence, the rainfall intensity map was prepared using MFI. To this end, six synoptic stations in the study area were used, and the index can be calculated as Equation (1) (Kanani-Sadat et al. 2019):
$$\mathrm{MFI} = \sum_{i=1}^{12} \frac{p_i^{2}}{p} \tag{1}$$
In which $p_i$ is the average rainfall for the ith month and $p$ is the average annual rainfall. The greater the MFI value, the higher the susceptibility to floods. Additionally, TWI, which has many applications in determining the effects of topography in the region, the size of the saturated area, and runoff production, can be calculated as Equation (2) (Jancewicz et al. 2019; Eini et al. 2020):
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{2}$$
where $A_s$ is the specific catchment area (m²/m) and β is the slope angle in degrees. The lower the TWI value, the higher the susceptibility to floods. Also, the CN layer, which defines the potential of the basin runoff, can be calculated as Equation (3) (Eini et al. 2020):
$$\mathrm{CN} = \frac{25400}{254 + 5\left(P + 2R - \sqrt{4R^{2} + 5PR}\right)} \tag{3}$$
where R is the elevation of the direct runoff and P is rainfall elevation. There is a direct relationship between the CN value and flood susceptibility. SPI relates to the erosion power of floods and has a direct relationship with slope angle and watershed area. This index can be calculated as Equation (4) (Hong et al. 2018):
$$\mathrm{SPI} = A_s \tan\beta \tag{4}$$
In which $A_s$ and β are the specific catchment area (m²/m) and the slope angle in degrees, respectively. Therefore, as the speed of the surface current increases, the SPI value, and in turn the risk of flooding, increases. TPI indicates the difference between the elevation of each cell and the average elevation of its neighboring cells. The lower the TPI value, the higher the susceptibility to floods. This index can be calculated as Equation (5) (Kanani-Sadat et al. 2019):
$$\mathrm{TPI} = Z_0 - \bar{Z} \tag{5}$$
where $Z_0$ is the cell elevation and $\bar{Z}$ is the average elevation of neighboring cells. TRI indicates the difference in elevation of adjacent cells in an elevation grid and can be calculated as Equation (6) (Kalantari et al. 2017):
$$\mathrm{TRI} = \sqrt{\left|\max^{2} - \min^{2}\right|} \tag{6}$$

In which min and max are the lowest and highest elevations of the adjacent cells, respectively. This index is indicative of the topographical features of the region and can significantly influence flood occurrence (Kalantari et al. 2017). Soil type has been found to be an important mechanism regarding permeability, runoff production, and flooding (Kanani-Sadat et al. 2019). The soil type map of the region with a scale of 1:250,000 was obtained from the Iranian forests, rangelands, and watershed organization. Additionally, different lithology units show noteworthy changes in the instability of hillsides and also affect surface runoff. The lithology map of the case study with a scale of 1:100,000 was prepared using the Iranian Geological map (26 geological units). The flow accumulation contains the cumulative number of cells upstream of a cell and shows the amount of current flowing from the upstream cells to that cell (Kanani-Sadat et al. 2019). The higher the flow accumulation value, the higher the susceptibility to floods. The HOFD is the actual movement of water from one cell to another cell (Pourghasemi et al. 2020). The lower the HOFD value, the higher the susceptibility to floods. The VOFD indicates the vertical distance between the height of each cell and the height calculated for the flow network (Pourghasemi et al. 2020). The lower the VOFD value, the higher the susceptibility to floods. The flow accumulation, HOFD, and VOFD criteria were calculated using appropriate analyses in the QGIS 3.1 and SAGA GIS 7.9 software. An overview of the data used in the study is provided in Table 1.
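The per-cell indices described in this section can be sketched in a few lines. The sketch assumes the commonly used forms of each index: MFI as the sum of squared monthly rainfall over annual rainfall, TWI as ln(A_s / tan β), SPI as A_s · tan β, TPI as cell elevation minus the neighborhood mean, TRI as the square root of |max² − min²|, and NDVI as (NIR − red) / (NIR + red) from OLI bands 5 and 4. The function names and sample values are hypothetical; the study computed these layers with GIS software rather than with this code.

```python
import math


def mfi(monthly_rain_mm):
    """Modified Fournier Index from 12 average monthly rainfall values (mm)."""
    annual = sum(monthly_rain_mm)
    return sum(p * p for p in monthly_rain_mm) / annual


def twi(specific_area, slope_deg):
    """Topographic Wetness Index; slope must be > 0 degrees to avoid division by zero."""
    return math.log(specific_area / math.tan(math.radians(slope_deg)))


def spi(specific_area, slope_deg):
    """Stream Power Index from specific catchment area (m2/m) and slope (degrees)."""
    return specific_area * math.tan(math.radians(slope_deg))


def tpi(cell_elevation, neighbor_elevations):
    """Topographic Position Index: cell elevation minus mean neighbor elevation."""
    return cell_elevation - sum(neighbor_elevations) / len(neighbor_elevations)


def tri(neighbor_elevations):
    """Terrain Ruggedness Index from the min and max elevations of adjacent cells."""
    lo, hi = min(neighbor_elevations), max(neighbor_elevations)
    return math.sqrt(abs(hi ** 2 - lo ** 2))


def ndvi(nir, red):
    """NDVI from near-infrared (OLI band 5) and red (OLI band 4) reflectance."""
    return (nir - red) / (nir + red)


# Hypothetical sample values for a single cell.
sample_mfi = mfi([10.0] * 12)          # uniform 10 mm per month
sample_tpi = tpi(100.0, [110.0] * 8)   # cell sits 10 m below its neighbors
sample_tri = tri([3.0, 4.0, 5.0])
sample_ndvi = ndvi(0.5, 0.1)
```

In a raster workflow these functions would be applied cell by cell (or vectorized) over the 30 m grids, with flat cells handled separately for TWI because tan β is zero there.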

Table 1

Data set used in the present study

| Category | Unit | Source | Data type | Scale/Spatial resolution |
| --- | --- | --- | --- | --- |
| MFI | Millimeter | Six synoptic stations | Grid | 30 × 30 m |
| Slope | Degree | DEM | Grid | 30 × 30 m |
| DEM | meter | SRTM satellite (https://earthexplorer.usgs.gov) | Grid | 30 × 30 m |
| CN | – | DEM and rainfall | Grid | 30 × 30 m |
| Distance to river | meter | Topographical map from Iranian national cartographic center | Grid | 1:50,000 |
| NDVI | – | OLI satellite (https://earthengine.google.com) | Grid | 30 × 30 m |
| Plane curvature | 100/meter | DEM | Grid | 30 × 30 m |
| TWI | – | DEM | Grid | 30 × 30 m |
| Aspect | Degree | DEM | Grid | 30 × 30 m |
| Distance to road | meter | Topographical map from Iranian national cartographic center | Grid | 1:50,000 |
| VOFD | – | DEM | Grid | 30 × 30 m |
| Distance to fault | meter | Topographical map from Iranian national cartographic center | Grid | 1:100,000 |
| Soil type | – | Iranian forests, rangelands and watershed organization | Vector | 1:250,000 |
| Land use | – | OLI satellite (https://earthengine.google.com) | Vector | 30 × 30 m |
| Lithology | – | Iranian Geological map | Vector | 1:100,000 |
| HOFD | – | DEM | Grid | 30 × 30 m |
| Flow accumulation | – | DEM | Grid | 30 × 30 m |
| SPI | – | DEM | Grid | 30 × 30 m |
| TPI | – | DEM | Grid | 30 × 30 m |
| TRI | – | DEM | Grid | 30 × 30 m |

Methods

Multi-collinearity test

Collinearity occurs when an independent variable has a linear relationship with one or more other independent variables in multiple regression, and can therefore be expressed as a linear combination of those variables. When collinearity exists between the criteria of a model, the coefficients are not reliable, because the estimated effect of one independent variable on the dependent variable also includes the effects of the other variables in the model (Khosravi et al. 2016; Arabameri et al. 2017). Therefore, the criteria of an ML model need to be independent of one another in order to obtain accurate and valid results. Correlation analyses indicate that strong relationships between variables can bias the results. Two indicators were used to investigate multi-collinearity among the criteria in the present study, tolerance (TOL) and Variance Inflation Factor (VIF), which can be calculated as Equations (7) and (8), respectively (Choubin et al. 2019):
$$\mathrm{TOL} = 1 - R_j^2$$
(7)
$$\mathrm{VIF} = \frac{1}{\mathrm{TOL}}$$
(8)
where $R_j^2$ is the coefficient of determination of the regression of the j-th variable on all of the other variables. In cases where the TOL value is lower than 0.1 and the VIF value is higher than 5, high multi-collinearity exists between the predicting variables (Choubin et al. 2019).
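The two indicators can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in the study: it relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix, with TOL as its reciprocal. The function names and the small synthetic columns are assumptions made for this example.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sum(d * d for d in dev) ** 0.5
        centred.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centred]
            for ci in centred]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def tol_vif(columns):
    """Return (TOL, VIF) lists, one value per input column."""
    inv = invert(correlation_matrix(columns))
    vif = [inv[j][j] for j in range(len(columns))]
    tol = [1.0 / v for v in vif]
    return tol, vif

# Synthetic data: x1, x2, x3 are mutually uncorrelated; x4 is almost a
# linear combination of x1 and x2 and should be flagged as collinear.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
x4 = [a + b + 0.1 * c for a, b, c in zip(x1, x2, x3)]

tol_indep, vif_indep = tol_vif([x1, x2, x3])
tol_col, vif_col = tol_vif([x1, x2, x4])
```

With the thresholds quoted above, the uncorrelated columns pass the check, while the near-linear combination `x4` has a TOL well below 0.1 and a VIF far above 5.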

Rotation forest

In the ROF model, principal component analysis (PCA) is used to rotate the feature axes on which each individual base classifier is trained (Pham et al. 2017). To create a training data set for each base classifier, the feature set F is randomly split into K subsets and PCA is performed on each of these subsets. All of the principal components are retained in order to preserve the variability information in the data. Therefore, K axis rotations take place to form the new features used to train a base classifier (Kuncheva & Rodríguez 2007). The advantage of the ROF algorithm is that it increases both the diversity within the ensemble and the accuracy of the individual classifiers (Phong et al. 2021). Extracting features separately for each base classifier increases diversity, while retaining all of the principal components preserves the accuracy of the base classifiers (Xia et al. 2013). The following steps were performed when creating the ROF model (Kuncheva & Rodríguez 2007; Phong et al. 2021): (1) the features of the training data set were randomly divided into K subsets. (2) The PCA algorithm was performed for each subset. (3) The rotation matrix was rearranged according to the original order of the features. (4) The results of the individual classifiers were combined. (5) Each pixel of the training data set was assigned a class label before all of the pixels of the area were classified. In recent years, the ROF algorithm has mostly been used for classification and has shown satisfactory performance, as confirmed by AUC and other statistical indicators (Xia et al. 2013).
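Steps (1) to (3) can be illustrated with a deliberately simplified sketch. To stay dependency-free, it assumes an even number of features, splits them into random subsets of exactly two features, and uses the closed-form PCA rotation of a 2 × 2 covariance matrix. It omits the bootstrap sampling and the base classifiers of a full rotation forest, and all names and data are hypothetical.

```python
import math
import random

def pca_rotation_2d(xs, ys):
    """2x2 PCA rotation for two features; columns are the principal axes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / n
    b = sum((y - my) ** 2 for y in ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = 0.5 * math.atan2(2 * c, a - b)  # direction of the first principal axis
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def rotate_features(samples, rng):
    """Split features into random pairs, run PCA per pair, rotate every sample."""
    n_features = len(samples[0])
    order = list(range(n_features))
    rng.shuffle(order)                                  # step 1: random subsets
    pairs = [order[i:i + 2] for i in range(0, n_features, 2)]
    rotated = [row[:] for row in samples]
    for i, j in pairs:
        rot = pca_rotation_2d([s[i] for s in samples],  # step 2: PCA per subset
                              [s[j] for s in samples])
        for src, dst in zip(samples, rotated):          # step 3: original feature order kept
            dst[i] = src[i] * rot[0][0] + src[j] * rot[1][0]
            dst[j] = src[i] * rot[0][1] + src[j] * rot[1][1]
    return rotated, pairs

rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
for row in data:
    row[1] += 2.0 * row[0]  # make two of the features correlated
rotated, pairs = rotate_features(data, rng)
```

Because every principal component is kept, the transformation is a pure rotation: no information is discarded, and each base classifier simply sees the same samples along different axes.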

Alternating decision tree

The ADTree algorithm is an enhanced decision tree that is more accurate than traditional decision tree algorithms and is more reliable when solving classification and prediction problems (Bhowmick et al. 2010; Pham et al. 2016). The ADTree algorithm can be defined as the repeated invocation of a decision tree learner within a boosting procedure. In other words, the ADTree is built by repeatedly adding decision rules conditioned on the data (Chen et al. 2020). The ADTree creates an optimal tree structure in response. To create this tree, an index is attributed to each feature. The root node has an index of zero and holds the number of available records, while nodes with an index of j expand from every node with an index of i (i > j). Therefore, all of the features expand from the root node. Combining decision trees with boosting yields more reliable classifiers and creates classification rules that are smaller and easier to interpret (Bhowmick et al. 2010). The advantages of ADTree include a smaller number of nodes and ease of interpretation (Pham et al. 2016). In addition, the ADTree model requires fewer iterations and can convert the classes in the model to binary form in order to analyze and validate the training data sets (Wu et al. 2020).
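To make the structure concrete, the sketch below shows how a trained alternating decision tree is commonly evaluated: every rule whose precondition holds contributes a prediction value, and the sign of the summed score gives the class. The rules, feature names, thresholds, and scores are entirely hypothetical and are not taken from the fitted model of this study.

```python
# Each rule: (precondition, condition, value_if_true, value_if_false).
# All names, thresholds and scores are hypothetical illustration values.
ROOT_SCORE = 0.2
RULES = [
    (lambda x: True,             lambda x: x["distance_to_river"] < 500.0, 0.9, -0.6),
    (lambda x: True,             lambda x: x["slope"] < 5.0,               0.4, -0.3),
    (lambda x: x["slope"] < 5.0, lambda x: x["elevation"] < 900.0,         0.5, -0.2),
]

def adtree_score(x):
    """Sum the prediction values of all rules reachable for sample x."""
    score = ROOT_SCORE
    for precondition, condition, if_true, if_false in RULES:
        if precondition(x):
            score += if_true if condition(x) else if_false
    return score

def adtree_class(x):
    """1 = flood, 0 = non-flood, decided by the sign of the score."""
    return 1 if adtree_score(x) > 0 else 0

valley = {"distance_to_river": 120.0, "slope": 2.0, "elevation": 850.0}
ridge = {"distance_to_river": 2400.0, "slope": 18.0, "elevation": 1500.0}
```

The magnitude of the score can be read as a confidence measure, which is one reason such trees are considered easy to interpret.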

Random forest

The RF algorithm is one of the most popular algorithms used for classification and regression problems (Chen et al. 2021). This algorithm has a low sensitivity to multi-collinearity and its results are relatively stable with regard to missing or irregular data (Chen et al. 2020). It is an extension of the single decision tree that builds a large collection of classification and regression trees (Hong et al. 2018). The RF algorithm is essentially an ensemble method in which several tree algorithms are combined in order to create a single prediction of any phenomenon (Rahmati et al. 2016). Generally, individual decision trees are more prone to overfitting and have little generalizability. During the creation of a decision tree, small changes in the training data can cause noteworthy changes in the shape of the tree. The predictive RF model is based on the aggregated output of all of its decision trees and performs classification for most data sets with high accuracy (De Santana et al. 2018). The forest receives an input vector, classifies it with every tree, and outputs the class label that receives the majority of votes. In order to classify a new item, the input vector is passed down each tree in the RF, and each tree returns a classification and essentially votes for that class. The class that receives the highest number of votes from the trees in the forest is chosen. The performance of the RF is mainly determined by the number of decision trees (ntree) and the number of features considered in each subset (mtry) (De Santana et al. 2018; Chen et al. 2020). Larger numbers of trees can result in longer modeling times, while smaller numbers of trees can lead to error (Costache et al. 2021).
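The voting step described above can be sketched as follows. The three "trees" are hypothetical one-rule stand-ins used only to show the aggregation; a real forest would contain ntree trees, each grown on a bootstrap sample with mtry candidate features per split.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the majority-vote label and the share of trees voting flood (1)."""
    votes = [tree(x) for tree in trees]
    label = Counter(votes).most_common(1)[0][0]
    return label, votes.count(1) / len(votes)

# Hypothetical one-rule trees; thresholds are illustration values only.
trees = [
    lambda x: 1 if x["elevation"] < 900.0 else 0,
    lambda x: 1 if x["slope"] < 5.0 else 0,
    lambda x: 1 if x["distance_to_river"] < 500.0 else 0,
]

sample = {"elevation": 850.0, "slope": 8.0, "distance_to_river": 300.0}
label, flood_share = forest_predict(trees, sample)
```

The share of trees voting for the flood class gives a value in [0, 1] that can serve as a continuous susceptibility score rather than a hard label.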

Binary particle swarm optimization algorithm

PSO is a collective search algorithm that has been modeled after the social behavior of fish and birds (Aghbashlo et al. 2019). PSO is a population-based stochastic optimization algorithm that is less prone to becoming trapped in local minima (Abed & Ahmad 2020). Depending on the problem domain, this algorithm can be defined in two modes, continuous and discrete. In accordance with the research questions of the present study, binary PSO (BPSO) was used. In the algorithm's binary mode, each dimension of a particle is limited to zero or one, and the particle velocity and position vectors are updated as Equations (9) and (10), respectively (Beheshti 2020):
$$V_i(t+1) = w\,V_i(t) + c_1 r_1 \bigl(pbest_i - X_i(t)\bigr) + c_2 r_2 \bigl(gbest - X_i(t)\bigr)$$
(9)
$$X_i(t+1) = \begin{cases} 1, & \rho < \dfrac{1}{1 + e^{-V_i(t+1)}} \\ 0, & \text{otherwise} \end{cases}$$
(10)

In which $X_i(t)$ is the position of the i-th particle, $X_i(t+1)$ is the position of the i-th particle in the next iteration, $V_i(t)$ is the velocity of the i-th particle, $V_i(t+1)$ is the velocity of the i-th particle in the next iteration, $pbest_i$ is the best position experienced by the i-th particle, $gbest$ is the best position experienced by all of the particles, $c_1$ is the individual learning coefficient, $c_2$ is the collective learning coefficient, $w$ is the inertia weight, and $r_1$, $r_2$, and ρ are random numbers in the range [0, 1]. The steps of the BPSO algorithm are based on Figure 4 (Beheshti 2020).
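A single update of one binary particle can be sketched as below, assuming the widely used sigmoid transfer function for the position update. The default values of `w`, `c1`, and `c2` are placeholders chosen for this example, not the tuned values of the study; the velocity bound of 4 follows the range given in Table 3.

```python
import math
import random

def bpso_step(x, v, pbest, gbest, rng, w=1.0, c1=2.0, c2=2.0, vmax=4.0):
    """One velocity and position update of a binary particle (equal-length lists)."""
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vel = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        vel = max(-vmax, min(vmax, vel))         # clamp the velocity to [-4, 4]
        prob = 1.0 / (1.0 + math.exp(-vel))      # sigmoid transfer function
        new_v.append(vel)
        new_x.append(1 if rng.random() < prob else 0)
    return new_x, new_v

rng = random.Random(42)
x = [rng.randint(0, 1) for _ in range(19)]       # one bit per criterion
v = [0.0] * 19
new_x, new_v = bpso_step(x, v, pbest=x, gbest=[1] * 19, rng=rng)
```

In this usage example the global best selects every criterion, so dimensions that are currently zero receive a non-negative velocity and become more likely to switch to one.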

Figure 4

Calculation steps of the proposed models.


Evaluation of the performance of the models

Statistical indicators

In the present study, 70% of the 370 flood-prone points in the Maneh and Samalqan watershed (259 points) were used for training, while the remaining 30% (111 points) were used to validate the models. A number of statistical indicators were used for the evaluation and validation of the models, including Positive Predictive Rate (PPR), Negative Predictive Rate (NPR), sensitivity (SST), specificity (SPC), and accuracy (ACC) (Chen et al. 2020; Phong et al. 2021). PPR is the proportion of locations predicted as flood that are actually flood locations, while NPR is the proportion of locations predicted as non-flood that are actually non-flood locations. SST is the proportion of flood locations that have been correctly classified, while SPC is the proportion of non-flood locations that have been correctly classified. ACC is the proportion of all locations, flood and non-flood, that have been correctly classified. All of these criteria are calculated using a confusion matrix, which consists of four potential outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP is the number of areas that have been correctly identified as high flood potential areas, while FP is the number of areas that have been incorrectly identified as high flood potential areas. Additionally, TN and FN are the numbers of areas that have been correctly and incorrectly identified as non-flood potential areas, respectively. These statistical criteria can be calculated using the following equations (Chen et al. 2020; Phong et al. 2021):
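$$\mathrm{PPR} = \frac{TP}{TP + FP}$$
(11)
$$\mathrm{NPR} = \frac{TN}{TN + FN}$$
(12)
$$\mathrm{SST} = \frac{TP}{TP + FN}$$
(13)
$$\mathrm{SPC} = \frac{TN}{TN + FP}$$
(14)
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
(15)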
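As a worked check of the five indicators, the sketch below applies the standard confusion-matrix definitions to the ADTree-BPSO validation counts reported in Table 6 (TP = 99, TN = 96, FP = 12, FN = 15). The function name is an assumption made for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix indicators used for model evaluation."""
    return {
        "PPR": tp / (tp + fp),                   # positive predictive rate
        "NPR": tn / (tn + fn),                   # negative predictive rate
        "SST": tp / (tp + fn),                   # sensitivity
        "SPC": tn / (tn + fp),                   # specificity
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # accuracy
    }

# ADTree-BPSO validation counts as listed in Table 6.
metrics = classification_metrics(tp=99, tn=96, fp=12, fn=15)
```

The resulting values agree with the ADTree-BPSO validation column of Table 6 (0.891, 0.864, 0.868, 0.888, and 0.878) to within rounding.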
In addition, to evaluate the performance of the ML model results, three well-known statistical measures, including Root Mean Square Error (RMSE), Coefficient of Determination (R2), and Mean Absolute Error (MAE) were used (Fotheringham & Oshan 2016; Farhadi & Najafzadeh 2021):
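$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
(16)
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
(17)
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
(18)

In which n is the number of observations, $y_i$ is the value for observation i, $\hat{y}_i$ is the predicted value for observation i, and $\bar{y}$ is the mean value of the observations.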
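The three error measures can be sketched directly from their standard definitions. The coefficient of determination is written here in its 1 − SS_res/SS_tot form, which is an assumption about the exact variant used in Equation (17); the observed and predicted values are made up for illustration.

```python
import math

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical flood (1) / non-flood (0) labels and predicted susceptibilities.
observed = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.7, 0.8, 0.1]
scores = (rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two can rank models differently when a few predictions are far off.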

Receiver operating characteristic

In the current study, the ROC method was used to evaluate the performance of the models, with the true positive and false positive rates on the Y and X axes, respectively (Chapi et al. 2017). This method uses the AUC for quantitative evaluation; the curve plots sensitivity (Y-axis) against 1 − specificity (X-axis), and the AUC is calculated as Equation (19) (Rahmati et al. 2018):
$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N}$$
(19)

In which P and N are the total numbers of flood and non-flood locations, respectively, and TP and TN are as defined in the previous section. The ROC curve is a widely used method for quantitatively estimating the accuracy of a predictive model. In this method, the AUC values can range between 0.5 and 1, and the closer the value is to 1, the higher the accuracy of the model (Yesilnacar & Topal 2005). The qualitative interpretation of AUC values is presented in Table 2.
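For continuous susceptibility scores, the area under the ROC curve is often estimated with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen flood location receives a higher score than a randomly chosen non-flood location. The sketch below uses that standard estimator rather than Equation (19), and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: share of (flood, non-flood) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation points: 1 = flood, 0 = non-flood.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine flood/non-flood pairs are ordered correctly, so the estimate is 8/9, which falls in the "Very Good" band of Table 2.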

Table 2

Quantitative–qualitative correlation of the area under the curve in ROC method

| Qualitative | Poor | Average | Good | Very Good | Excellent |
| --- | --- | --- | --- | --- | --- |
| Quantitative | 0.5–0.6 | 0.6–0.7 | 0.7–0.8 | 0.8–0.9 | 0.9–1 |

Tuning of the machine learning and BPSO algorithm parameters

Because RMSE is considered one of the most important evaluation measures for data-driven models, the fitness function of the BPSO algorithm was defined so as to minimize the RMSE of the ML models (Fotheringham & Oshan 2016). RMSE measures the spread of the model residuals (Fotheringham & Oshan 2016). The values of the BPSO algorithm parameters listed in Table 3 were chosen through trial and error. To simplify the procedure, a fixed number of iterations was used as the stop condition. Because of the randomized nature of the BPSO algorithm, and following previous studies, the algorithm was run 10 times and the best of these 10 runs was taken as the final output (Saeidian et al. 2018).

Table 3

Set parameters in the BPSO algorithm

| Parameters | Value | Parameters | Value |
| --- | --- | --- | --- |
| Swarm size | 30 | C2 | |
| Total iterations | 100 | W | |
| C1 | | Minimum and maximum velocity | [−4, 4] |

In addition, tuning the input parameters was one of the most important steps in implementing the ML algorithms of the present study. To tune the parameters of the random forest algorithm (ntree and mtry), the k-fold cross-validation technique was used to stabilize the error values and increase prediction reliability (Arabameri et al. 2020). In k-fold cross-validation, the training data is divided into k folds, in which one of the folds is assigned the role of validation, while the rest of the folds (k − 1) are allocated for training (Witten et al. 2011). Thus, the ten-fold cross-validation implemented in the present study involved the random division of the training data into 10 groups, after which the best parameter values were determined. Because the number of trees has an inverse relationship with the error rate (more trees mean fewer errors), 2,000 trees (ntree = 2,000) were deemed appropriate for the present study, and the number of features per subset was set to mtry = 10. In addition, in order to avoid overfitting, ten-fold cross-validation was used to tune the parameters needed for the implementation of the ROF and ADTree algorithms (Wang et al. 2006; Pham et al. 2016).
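The splitting scheme can be sketched as follows for the 259 training points and k = 10. Only the index bookkeeping is shown; the model fitting and error averaging that would run inside each fold are omitted, and the function name and seed are assumptions made for this example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)         # random division of the data
    folds = [idx[i::k] for i in range(k)]    # k nearly equal groups
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# 259 training points (70% of the 370 flood-prone locations), ten folds.
splits = list(k_fold_indices(259, 10))
fold_sizes = [len(val) for _, val in splits]
```

Each point is used for validation exactly once and for training nine times, so every candidate parameter setting is scored on data it was not fitted to.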

Preparation of data

As was previously mentioned, 20 hydrogeological, topographical, geological, and environmental criteria were used to predict the flood susceptibility of the Maneh and Samalqan watershed. These criteria include CN, HOFD, MFI, VOFD, NDVI, TWI, SPI, TRI, TPI, DEM, Slope angle, Flow Accumulation, Aspect, Plane curvature, Distance to fault, Distance to road, Distance to river, Lithology, Soil type, and Land use. Figure 5 shows maps of the mentioned criteria based on kriging interpolation (Eftekhari et al. 2021).

Figure 5

The maps of spatial criteria affecting the flood: (a) CN, (b) Aspect, (c) DEM, (d) Distance to Fault, (e) Distance to Road, (f) Distance to River, (g) HOFD, (h) Flow Accumulation, (i) Lithology, (j) Land use, (k) NDVI, (l) MFI, (m) Slope, (n) Plan curvature, (o) SPI, (p) Soil type, (q) TWI, (r) TPI, (s) VOFD, and (t) TRI.


Investigation of variable independence

The existence of multi-collinearity between the criteria was explored, and the results are presented in Table 4. Based on these results, TRI did not satisfy the conditions for the absence of multi-collinearity (TOL > 0.1 and VIF < 5), since its TOL is 0.048 and its VIF is 13.95; it was therefore removed from the set of modeling inputs in the present study. No high multi-collinearity was observed among the other criteria. In addition, T and sig values were considered for the coefficient test. The higher the T value, the weaker the assumption that the coefficient is zero, and therefore the greater the role of that criterion in modeling (Choubin et al. 2019). This can also be investigated through the significance level, where a significance of less than 0.05 indicates that the null hypothesis (no effect of the criterion in modeling) can be rejected (Choubin et al. 2019).

Table 4

The result of multi-collinearity test

| Order | Criteria | B | Std error | Standardized coefficients Beta | T | Sig | TOL | VIF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MFI | 0.87 | 0.043 | 0.18 | 1.02 | 0.045 | 0.92 | 1.02 |
| 2 | Slope | 0.12 | 0.037 | 0.12 | 2.08 | 0.025 | 0.19 | 2.82 |
| 3 | DEM | 0.03 | 0.025 | 0.65 | 2.02 | 0.023 | 0.52 | 1.95 |
| 4 | CN | 0.04 | 0.068 | 0.03 | 2.25 | 0.007 | 0.25 | 1.12 |
| 5 | Distance to river | 0.14 | 0.075 | 0.36 | 3.36 | 0.039 | 0.32 | 2.22 |
| 6 | NDVI | 0.22 | 0.077 | 0.74 | 2.73 | 0.017 | 0.19 | 2.95 |
| 7 | Plane curvature | 0.12 | 0.066 | 0.36 | 2.95 | 0.001 | 0.42 | 4.85 |
| 8 | TWI | 0.03 | 0.042 | 0.58 | 1.35 | 0.081 | 0.44 | 1.35 |
| 9 | Aspect | 0.01 | 0.036 | 0.19 | 1.58 | 0.025 | 0.74 | 2.36 |
| 10 | Distance to road | 0.14 | 0.031 | 0.27 | 1.95 | 0.075 | 0.39 | 2.22 |
| 11 | VOFD | 0.18 | 0.041 | 0.75 | 2.33 | 0.009 | 0.33 | 1.63 |
| 12 | Distance to fault | 0.17 | 0.042 | 0.23 | 1.97 | 0.025 | 0.63 | 3.25 |
| 13 | Soil type | 0.10 | 0.048 | 0.33 | 1.25 | 0.018 | 0.27 | 4.16 |
| 14 | Land use | 0.07 | 0.034 | 0.19 | 2.84 | 0.026 | 0.74 | 1.75 |
| 15 | Lithology | 0.12 | 0.062 | 0.44 | 1.33 | 0.074 | 0.15 | 2.65 |
| 16 | HOFD | 0.23 | 0.071 | 0.26 | 2.39 | 0.032 | 0.22 | 2.32 |
| 17 | Flow accumulation | 0.15 | 0.042 | 0.29 | 1.74 | 0.005 | 0.83 | 2.08 |
| 18 | SPI | 0.35 | 0.028 | 0.24 | 1.46 | 0.019 | 0.55 | 3.02 |
| 19 | TPI | 0.18 | 0.021 | 0.23 | 2.59 | 0.013 | 0.49 | 1.96 |
| 20 | TRI | 0.12 | 0.079 | 0.66 | 3.52 | 0.098 | 0.048 | 13.95 |

Identification of criteria affecting flood susceptibility

The reason for using binary PSO is to determine the optimal combination of effective criteria in predicting flood susceptibility. In the algorithm's binary mode, the dimensions of each particle are limited to zero and one. According to Table 4 and Figure 6, the BPSO algorithm used in this study includes 19 dimensions, each of which corresponds to a criterion. After initializing the dimensions of each particle (0 or 1), for dimensions that have a value of one, the criteria with raster values are entered as input into the ML algorithm and the fitness function value for the corresponding particle is calculated. The process continues until the BPSO algorithm reaches the 100th iteration, which is the end of the algorithm in this research. When the BPSO algorithm reaches the stop condition, the particle with the best value of the fitness function is selected. Then, the dimensions with a value of one indicate the optimal combination of effective criteria in predicting flood susceptibility.
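The mapping between a 19-dimensional binary particle and the set of criteria passed to an ML model can be sketched as below. The ordering of the criteria follows Table 4 without TRI, which is an assumption about how the dimensions are arranged; the example particle encodes the 11 criteria reported later in the text for the ROF-BPSO model.

```python
# The 19 criteria remaining after the multi-collinearity test (Table 4 order, TRI removed).
CRITERIA = ["MFI", "Slope", "DEM", "CN", "Distance to river", "NDVI",
            "Plane curvature", "TWI", "Aspect", "Distance to road", "VOFD",
            "Distance to fault", "Soil type", "Land use", "Lithology", "HOFD",
            "Flow accumulation", "SPI", "TPI"]

def decode(particle):
    """Return the names of the criteria whose dimension is set to one."""
    if len(particle) != len(CRITERIA):
        raise ValueError("particle must have one bit per criterion")
    return [name for bit, name in zip(particle, CRITERIA) if bit == 1]

# Particle corresponding to the ROF-BPSO selection reported in the text.
rof_particle = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
selected = decode(rof_particle)
```

Only the rasters of the decoded criteria would then be fed to the ML model whose RMSE serves as the fitness value of that particle.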

Figure 6

Swarm structure of BPSO algorithm in this research.


After training the ROF, ADTree, and RF ML models using the BPSO algorithm, the best RMSE value of each model was plotted (Figure 7). Based on Figure 7, the best RMSE values for the ROF-BPSO, ADTree-BPSO, and RF-BPSO models were 0.335, 0.289, and 0.218, respectively. The RF-BPSO model is therefore the most accurate of the three in predicting the flood susceptibility of the study area. The effective criteria in predicting flood susceptibility for the ROF-BPSO, ADTree-BPSO, and RF-BPSO hybrid tree-based models were identified (Figure 8). The results showed that for the ROF-BPSO, ADTree-BPSO, and RF-BPSO models, 11 criteria (MFI, Slope, DEM, CN, Distance to river, Plane curvature, VOFD, Distance to fault, Land use, Lithology, and SPI), 12 criteria (MFI, Slope, DEM, Distance to river, NDVI, Aspect, Distance to fault, Soil type, Land use, Lithology, HOFD, and SPI), and 15 criteria (MFI, Slope, DEM, CN, Distance to river, NDVI, TWI, Aspect, Distance to road, Soil type, Land use, Lithology, HOFD, SPI, and TPI) were found to be effective in predicting flood susceptibility, respectively. Seven criteria (MFI, Slope, DEM, Distance to river, Land use, Lithology, and SPI) were selected by all three models and can therefore be considered of greater importance in predicting flood susceptibility in the study area.

Figure 7

The best value of fitness function by combining ML models and BPSO algorithm.

Figure 8

The effective criteria in predicting flood susceptibility: (a) ROF-BPSO, (b) ADTree-BPSO, and (c) RF-BPSO.

Figure 9

Flood susceptibility prediction maps: (a) ROF-BPSO, (b) ADTree-BPSO, and (c) RF-BPSO.


Prediction of the flood susceptibility map

Although different flood susceptibility models give similar results in terms of identifying flood-prone areas, a major goal of flood susceptibility modeling is to select the most accurate model. The greater the number of flood-prone locations in an area, the higher the likelihood of flood occurrence, and the lower the number, the lower the likelihood of flood occurrence. In the present study, flood susceptibility maps were produced using the ROF-BPSO, ADTree-BPSO, and RF-BPSO hybrid tree-based models with the training data (70%) and were validated with the remaining data (30%). As mentioned, combining the BPSO metaheuristic algorithm with tree-based ML models provides an optimal combination of flood-affecting criteria based on the value of the fitness function and predicts flood susceptibility maps according to these optimal criteria. According to Figure 9, the predicted degree of susceptibility (based on the flood-prone locations) obtained by combining tree-based ML models and the BPSO algorithm lies in the range [0, 1]. The predicted degree of susceptibility is classified into five output classes according to the natural break classification method, shown qualitatively from a very low to a very high degree of susceptibility. This classification method was used because it is based on the natural groupings in the data: the break points between the groupings are identified in such a way that similar values are grouped together and the differences between the classes are maximized (Razavi Termeh et al. 2018; Chen et al. 2020; Eslaminezhad et al. 2022). Class boundaries are therefore placed where relatively large changes in the data values occur. According to Figure 9, these maps show good reliability, since they are compatible with the floods that have occurred: a high or very high level of susceptibility was predicted at the locations of many previous events. These maps indicate that the chances of flooding in the center of the area in question are higher than in other parts due to lower elevations and lower slope angles (Kanani-Sadat et al. 2019). Furthermore, due to a decrease in forest lands and severe changes in land use in recent years, and the resulting increase in urban areas and dry lands, the likelihood of flood damage has increased. In addition, the damage caused by floods has increased due to buildings that are built too close to the river.
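The natural break idea can be illustrated with a small dynamic-programming sketch that chooses class boundaries minimizing the within-class sum of squared deviations (the Fisher-Jenks criterion). This is an assumed, simplified implementation for illustration; GIS packages use optimized variants, and the susceptibility values below are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Upper bound of each class, minimising within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    s1, s2 = [0.0], [0.0]                      # prefix sums of v and v^2
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):          # last class covers data[i:j]
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], split[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):          # walk back through the splits
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]

# Hypothetical susceptibility values in [0, 1] forming five loose groups.
susceptibility = [0.05, 0.07, 0.1, 0.3, 0.32, 0.5, 0.55, 0.75, 0.8, 0.95, 0.97]
class_upper_bounds = natural_breaks(susceptibility, 5)
```

The cost of this sketch grows with the square of the number of values, so for a full raster it would be applied to a sample or a histogram of the predicted susceptibilities rather than to every pixel.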

Figure 10

Percentage of each flood risk class in the study area.


The percentages of the flood susceptibility classes predicted by the ROF-BPSO, ADTree-BPSO, and RF-BPSO tree-based models are shown in Figure 10. In the RF-BPSO model, the very high and high susceptibility classes covered a higher percentage of the area than the same classes in the ROF-BPSO and ADTree-BPSO models; this model placed 50.5% and 29.05% of the area in the very low and very high flood susceptibility classes, respectively. The moderate class in the ADTree-BPSO model covered a higher percentage of the area than the same class in the ROF-BPSO and RF-BPSO models; this model placed 56.23% and 23.27% of the area in the very low and very high flood susceptibility classes, respectively. The very low and low classes in the ROF-BPSO model covered a higher percentage of the area than the same classes in the RF-BPSO and ADTree-BPSO models; this model placed 62.92% and 17.15% of the area in the very low and very high flood susceptibility classes, respectively. Averaged over the three models, 56.55% of the area in question was classified in the very low flood susceptibility class.

Figure 11

ROC curve and AUC value for proposed research methods: (a) training data sets and (b) validation data sets.

Figure 12

Final flood susceptibility map obtained by the combination of three ML models.


Validation of the flood susceptibility prediction models

Validating the predicted maps is a necessary step in the evaluation of the quality of these maps; thus, the ROC curve method was used to evaluate the performance of the models. The ROC curve for the training and validation data sets of the ROF-BPSO, ADTree-BPSO, and RF-BPSO tree-based models has been presented in Figure 11. Generally, the training data sets show the ability of the models in predicting flood susceptibility, while the validation data sets show the predictive skill of the models. For the training data sets, RF-BPSO was found to have the highest AUC value (0.961), followed by ROF-BPSO (0.957) and ADTree-BPSO (0.942) (Figure 11). Furthermore, for the validation data sets, RF-BPSO was found to have the highest accuracy (AUC = 0.935) compared to ROF-BPSO (AUC = 0.904) and ADTree-BPSO (AUC = 0.923). Therefore, while all of the models have good predictive power, the RF-BPSO model has the best accuracy and performance in regard to predicting the flood susceptibility of the area of research. These findings conform to those of previous studies (Khosravi et al. 2016; Hong et al. 2018).

Training (Table 5) and validation (Table 6) data sets were used to evaluate the flood susceptibility predicting abilities of the models. For the classification of flood pixels, the RF-BPSO model was found to have the highest SST values for the training and validation data sets (0.929 and 0.887, respectively). For the classification of non-flood pixels, the RF-BPSO model was found to have the highest SPC for the training and validation data sets (0.963 and 0.924, respectively). In addition, the RF-BPSO model was found to have the highest positive and negative prediction rates and the highest accuracy for the two data sets. The validation findings show that the RF-BPSO model performs better than the other two models in flood susceptibility prediction for the area of study. This may be due to its ability to handle large databases and to combine large numbers of input variables without changing them (Rahmati et al. 2016). Previous studies have also shown that the RF model performs at a high level when predicting flood susceptibility maps (Hong et al. 2018; Zhao et al. 2018; Tang et al. 2020). The RF model exploits the high variance among individual trees by aggregating their class votes (Chen et al. 2017). Furthermore, studies on various issues such as fire, groundwater potential, and earthquake susceptibility have shown that RF can be used to predict a variety of phenomena (Levy et al. 2007; Mukerji et al. 2009; Sahoo et al. 2009; Tien Bui et al. 2018).

Table 5

Performance of models using training data sets

| Number | Criteria | ROF-BPSO | ADTree-BPSO | RF-BPSO |
| --- | --- | --- | --- | --- |
| 1 | TP | 238 | 244 | 250 |
| 2 | TN | 233 | 235 | 240 |
| 3 | FP | 21 | 15 | |
| 4 | FN | 26 | 24 | 19 |
| 5 | PPR | 0.918 | 0.942 | 0.965 |
| 6 | NPR | 0.899 | 0.907 | 0.926 |
| 7 | SST | 0.901 | 0.910 | 0.929 |
| 8 | SPC | 0.917 | 0.940 | 0.963 |
| 9 | ACC | 0.909 | 0.924 | 0.945 |

Table 6

Performance of models using validation data sets

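| Number | Criteria | ROF-BPSO | ADTree-BPSO | RF-BPSO |
| --- | --- | --- | --- | --- |
| 1 | TP | 95 | 99 | 103 |
| 2 | TN | 91 | 96 | 98 |
| 3 | FP | 16 | 12 | |
| 4 | FN | 20 | 15 | 13 |
| 5 | PPR | 0.855 | 0.891 | 0.927 |
| 6 | NPR | 0.819 | 0.864 | 0.882 |
| 7 | SST | 0.826 | 0.868 | 0.887 |
| 8 | SPC | 0.850 | 0.888 | 0.924 |
| 9 | ACC | 0.837 | 0.878 | 0.905 |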

In addition, the performance of the three flood susceptibility models on the training and validation data sets, measured with RMSE, MAE, and the coefficient of determination (R²), is presented in Table 7. These results show that the RF-BPSO model has the lowest RMSE, the lowest MAE, and the highest R² of the proposed models for both the training and validation data sets, which indicates that it predicts flood susceptibility with the fewest errors. According to Table 7, the statistical performance of the optimized tree-based ML models ranks as follows: RF-BPSO > ADTree-BPSO > ROF-BPSO.

Table 7

Assessing the accuracy of three optimized tree-based ML models using different error measures for training and validation data sets

Models  ROF-BPSO              ADTree-BPSO           RF-BPSO
        Training  Validation  Training  Validation  Training  Validation
RMSE    0.312     0.335       0.269     0.289       0.198     0.218
MAE     0.191     0.193       0.175     0.176       0.145     0.147
R2      0.842     0.813       0.886     0.877       0.931     0.926
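
For readers who want to reproduce measures of the kind shown in Table 7, the following is a minimal sketch of RMSE, MAE, and R2 computed from observed labels (1 = flood, 0 = non-flood) and predicted susceptibility values. The sample data are hypothetical and are not taken from the study. R2 is computed here as 1 − SS_res/SS_tot; this is an assumption, since the text does not state whether the study used this form or the squared correlation coefficient.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: 1 = flood location, 0 = non-flood location.
y_true = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted susceptibility values in [0, 1].
y_pred = [0.9, 0.8, 0.6, 0.2, 0.1, 0.4]

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"R2   = {r2(y_true, y_pred):.3f}")
```

Lower RMSE and MAE and higher R2 indicate a closer fit, which is the basis of the ranking read from Table 7.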

Consistent with Khosravi et al. (2016) and Rahmati et al. (2016), the results of the present study showed that DEM, distance to river, and MFI have a major effect on flood susceptibility and are therefore useful inputs to flood susceptibility prediction models. A direct relationship was found between MFI and flood susceptibility: more rainfall increases the chance of flooding and the weight of the flood susceptibility layer. This direct relationship between MFI and flooding agrees with the studies of Pham et al. (2020) and Razavi Termeh et al. (2018). Another criterion found to affect flood susceptibility is DEM: the higher the elevation, the lower the chance of flooding. The highest flood susceptibility was found in low-elevation hillside areas, which may be due to the accumulation of rainwater in these areas. This finding also agrees with previous studies (Khosravi et al. 2016; Liu et al. 2016). Based on the findings of the present study, most of the accumulation and spreading of floodwater occurs near rivers and in low-elevation regions. In other words, areas with high flood susceptibility are low-lying, flat or gently sloping, and close to rivers.

Final flood susceptibility map

All three optimized tree-based ML models (RF-BPSO, ADTree-BPSO, and ROF-BPSO) performed excellently (according to Table 2). Therefore, to produce the final map, the outputs of the three models were combined on the basis of their AUC values using the following equation:
formula
(20)

Here, F is the value of the raster layer resulting from the combination of the three models, AUCi is the area under the curve of each model, m is the flood susceptibility layer of each model, and n is the number of models. Figure 12 shows the final flood susceptibility map of the study area, which combines the ROF-BPSO, ADTree-BPSO, and RF-BPSO models. The combined map assigns 54.03% and 26.21% of the area to the very low and very high flood susceptibility classes, respectively. The central parts of the study area, toward the east and west, have high and very high flood susceptibility, mainly because of the lower elevation and minimum slope angle of these areas.
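
As an illustration of how three susceptibility layers could be merged according to their AUC values, the sketch below computes an AUC-weighted mean for each raster cell. The exact form of Equation (20) is not reproduced in this text, so the normalization used here (dividing by the sum of the AUC values) is an assumption and may differ from the authors' formula. The AUC values are those reported for the three models in the abstract; the 2 × 2 grids are hypothetical stand-ins for the raster layers, and `combine_layers` is an illustrative name.

```python
def combine_layers(layers, aucs):
    """AUC-weighted mean of equally shaped susceptibility grids.

    Assumed form: F = sum_i(AUC_i * m_i) / sum_i(AUC_i).
    """
    if len(layers) != len(aucs):
        raise ValueError("one AUC value is required per layer")
    total_auc = sum(aucs)
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [
            sum(auc * layer[r][c] for auc, layer in zip(aucs, layers)) / total_auc
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# AUC values reported for the three models in the abstract.
auc_rf, auc_adtree, auc_rof = 0.935, 0.923, 0.904

# Hypothetical 2 x 2 susceptibility layers (values in [0, 1]).
layer_rf = [[0.9, 0.2], [0.5, 0.1]]
layer_adtree = [[0.8, 0.3], [0.5, 0.2]]
layer_rof = [[0.7, 0.1], [0.5, 0.3]]

final_map = combine_layers(
    [layer_rf, layer_adtree, layer_rof],
    [auc_rf, auc_adtree, auc_rof],
)
for row in final_map:
    print([round(value, 3) for value in row])
```

With this normalization, a cell on which all three models agree keeps its value, and every combined value lies between the smallest and largest of the three inputs for that cell.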

Floods are among the most devastating and destructive forces of nature. With appropriate spatial analyses of flood susceptibility, much of the damage caused by these natural disasters can be avoided. The aim of the present study was to predict the flood susceptibility of the Maneh and Samalqan watershed using three ML models, i.e. ROF-BPSO, ADTree-BPSO, and RF-BPSO. In recent years, few studies have used tree-based ML models to predict flood susceptibility (Khosravi et al. 2016; Costache et al. 2021). Therefore, the performances of the three ML models were evaluated and compared using statistical indicators and AUC values. The findings indicated that RF-BPSO outperformed the other two models. RF is an algorithm that can produce more satisfactory results, with higher accuracy and lower variance and bias (Hong et al. 2018; Zhao et al. 2018; Chen et al. 2020; Nachappa et al. 2020). Therefore, the RF-BPSO model performed best in predicting the flood susceptibility of the Maneh and Samalqan watershed. For the ADTree-BPSO model, performance varied across some of the training and validation data, which reduces the model's stability. In addition, the overall statistical performance of the ADTree-BPSO model was better than that of the ROF-BPSO model. However, the ROF-BPSO model showed a higher AUC value for the training data. This means that multiple statistical indicators must be taken into account to evaluate the performance of models comprehensively. All three models produced acceptable flood susceptibility predictions. Regarding distance to rivers, the greater the distance, the lower the chance of flooding.
Regarding elevation, one of the most influential criteria in flood susceptibility, the chance of flooding was greater at lower elevations; floods usually occur in the lowest-lying areas. Regarding the MFI, rainfall tends to increase with elevation, yet the chance of flooding at higher elevations is lower. Land use also influences flood susceptibility, as the type of land use can affect surface permeability and increase runoff speed. Human influences and changes in land use reduce the strength of the waterways, thereby influencing the chance of flooding. These findings agree with previous studies (Khosravi et al. 2016; Liu et al. 2016; Rahmati et al. 2016; Razavi Termeh et al. 2018; Pham et al. 2020). Based on these findings and the predicted flood susceptibility maps, appropriate management measures can be taken to reduce the human and financial damage caused by floods. Finally, the use of data mining and GIS to predict flood potential can be beneficial, especially in developing countries where hydrogeological data are difficult to access.

The analysis of flood susceptibility maps using modern ML models can help policymakers reduce the effects and damage of floods in high-risk areas. This study proposed new hybrid tree-based ML models to identify the most susceptible regions in the Maneh and Samalqan watershed, Iran. Three tree-based ML models, ROF-BPSO, ADTree-BPSO, and RF-BPSO, were evaluated for predicting flood susceptibility in the study area. High-resolution satellite images, Google Earth, and primary data on previous flood locations were used to prepare the inventory map, and the 370 flood-prone locations were divided into training (70%) and testing (30%) sets for the construction and validation of the three ML models. The findings of this study can be summarized as follows:

  1. The results showed that the three optimized ML models, RF-BPSO, ADTree-BPSO, and ROF-BPSO, perform well in predicting the flood susceptibility of the study area. RF-BPSO was found to be superior in terms of PPR, NPR, sensitivity, specificity, ACC, and AUC.

  2. The results indicated that seven criteria (MFI, slope, DEM, distance to river, land use, lithology, and SPI) were of greater importance in predicting flood susceptibility in the study area.

  3. The findings indicated that the chance of flooding in the center of the study area is greater than elsewhere due to lower elevation, lower slope angle, more drylands, and proximity to rivers.
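
The 70%/30% partition of the 370 flood-prone locations mentioned in this section can be sketched as follows. A simple seeded random shuffle is assumed here; the text does not say how the authors drew the split, and the function name, seed, and integer identifiers are illustrative.

```python
import random


def split_locations(location_ids, train_fraction=0.7, seed=42):
    """Randomly split location identifiers into training and testing subsets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(location_ids)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 370 flood-prone locations.
flood_ids = list(range(370))
train_ids, test_ids = split_locations(flood_ids)
print(len(train_ids), len(test_ids))
```

The split gives 259 training and 111 testing locations. The sample totals behind Tables 5 and 6 (518 and 222) are exactly twice these sizes, which would be consistent with an equal number of non-flood points having been added to each subset, although that reading is an inference from the tables.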

Due to the satisfactory performance and high accuracy of the RF-BPSO model in predicting flood susceptibility, it is suggested that future studies use this model and take hydrogeological, topographical, geological, and environmental criteria into account in order to reduce the damage caused by floods in other regions.

However, there are limitations to the practical use of the resulting flood susceptibility maps. In particular, these maps do not include flood depth, duration, severity, or frequency. Furthermore, the flood inventory maps used in this study and in previous ones record only binary values (0, 1) for the absence or presence of floods and do not include flood frequency. As a result, all flood locations were given equal weight when predicting flood susceptibility. Future research should account for the frequency of floods at each flood location in flood susceptibility maps.

Data cannot be made publicly available; readers should contact the corresponding author for details.

There is no conflict of interest in this article.

Abed K. A. & Ahmad A. A. 2020 The best parameters selection using PSO algorithm to solving for ITO system by new iterative technique. Indonesian Journal of Electrical Engineering and Computer Science 18 (3), 1638–1645. https://doi.org/10.1103/PhysRevD.90.112016.
Aghbashlo M., Tabatabaei M., Nadian M. H., Davoodnia V. & Soltanian S. 2019 Prognostication of lignocellulosic biomass pyrolysis behavior using ANFIS model tuned by PSO algorithm. Fuel 253, 189–198. https://doi.org/10.1016/j.fuel.2019.04.169.
Alam Z., Sun L., Zhang C., Su Z. & Samali B. 2021 Experimental and numerical investigation on the complex behaviour of the localised seismic response in a multi-storey plan-asymmetric structure. Structure and Infrastructure Engineering 17 (1), 86–102. https://doi.org/10.1080/15732479.2020.1730914.
Arabameri A., Pourghasemi H. R. & Shirani K. 2017 Zoning of flood sensitivity using a new ensemble method of Bayesian theory hierarchical analysis process (Case study: Neka watershed – Mazandaran province). Eco-Hydrology 4 (2), 447–462. https://doi.org/10.22059/ije.2017.61481.
Arabameri A., Karimi-Sangchini E., Pal S. C., Saha A., Chowdhuri I., Lee S. & Tien Bui D. 2020 Novel credal decision tree-based ensemble approaches for predicting the landslide susceptibility. Remote Sensing 12 (20), 3389. https://doi.org/10.3390/rs12203389.
Arnold J. G., Srinivasan R., Muttiah R. S. & Williams J. R. 1998 Large area hydrologic modeling and assessment part I: model development 1. JAWRA Journal of the American Water Resources Association 34 (1), 73–89. https://doi.org/10.1111/j.1752-1688.1998.tb05961.x.
Beheshti Z. 2020 A time-varying mirrored S-shaped transfer function for binary particle swarm optimization. Information Sciences 512, 1503–1542. https://doi.org/10.1016/j.ins.2019.10.029.
Bhowmick S., Eijkhout V., Freund Y., Fuentes E. & Keyes D. 2010 Application of alternating decision trees in selecting sparse linear solvers. In: Software Automatic Tuning: From Concepts to State-of-the-Art Results (Naono K., Teranishi K., Cavazos J. & Suda R., eds.). Springer New York, New York, NY, pp. 153–173.
Bicknell B. R., Imhoff J. C., Kittle Jr J. L., Donigian Jr A. S. & Johanson R. C. 1997 Hydrological Simulation Program – FORTRAN User's Manual for Version 11. Environmental Protection Agency Report No. EPA/600/R-97/080. US Environmental Protection Agency, Athens, GA. https://doi.org/EPA/600/R-97/080.
Borah D. K. 2011 Hydrologic procedures of storm event watershed models: a comprehensive review and comparison. Hydrological Processes 25 (22), 3472–3489. https://doi.org/10.1002/hyp.8075.
Chapi K., Singh V. P., Shirzadi A., Shahabi H., Tien Bui D., Pham B. T. & Khosravi K. 2017 A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environmental Modelling & Software 9, 229–245. https://doi.org/10.1016/j.envsoft.2017.06.012.
Chen H., Ito Y., Sawamukai M. & Tokunaga T. 2015 Flood hazard assessment in the Kujukuri Plain of Chiba Prefecture Japan based on GIS and multicriteria decision analysis. Natural Hazards 78 (1), 105–120. https://doi.org/10.1007/s11069-015-1699-5.
Chen W., Shirzadi A., Shahabi H., Ahmad B. B., Zhang S., Hong H. & Zhang N. 2017 A novel hybrid artificial intelligence approach based on the rotation forest ensemble and Naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County China. Geomatics, Natural Hazards and Risk 8 (2), 1955–1977. https://doi.org/10.1080/19475705.2017.1401560.
Chen W., Hong H., Li S., Shahabi H., Wang Y., Wang X. & Ahmad B. B. 2019 Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. Journal of Hydrology 575, 864–873. https://doi.org/10.1016/j.jhydrol.2019.05.089.
Chen W., Li Y., Xue W., Shahabi H., Li S., Hong H., Wang X., Bian H., Zhang S., Pradhan B. & Ahmad B. B. 2020 Modeling flood susceptibility using data-driven approaches of Naïve Bayes tree, alternating decision tree, and random forest methods. Science of The Total Environment 701, 134979. https://doi.org/10.1016/j.scitotenv.2019.134979.
Chen Y., Li J., Lu H. & Yan P. 2021 Coupling system dynamics analysis and risk aversion programming for optimizing the mixed noise-driven shale gas-water supply chains. Journal of Cleaner Production 278, 123209. https://dx.doi.org/10.1016/j.jclepro.2020.123209.
Choubin B., Moradi E., Golshan M., Adamowski J., Sajedi-Hosseini F. & Mosavi A. 2019 An ensemble prediction of flood susceptibility using multivariate discriminant analysis classification and regression trees and support vector machines. Science of the Total Environment 651 (Pt2), 2087–2096. https://dx.doi.org/10.1016/j.scitotenv.2018.10.064.
Costabile P. & Macchione F. 2015 Enhancing river model set-up for 2-D dynamic flood modelling. Environmental Modelling & Software 67, 89–107. https://doi.org/10.1016/j.envsoft.2015.01.009.
Costabile P., Costanzo C. & Macchione F. 2013 A storm event watershed model for surface runoff based on 2D fully dynamic wave equations. Hydrological Processes 27 (4), 554–569. https://doi.org/10.1002/hyp.9237.
Costache R., Arabameri A., Elkhrachy I., Ghorbanzadeh O. & Pham Q. B. 2021 Detection of areas prone to flood risk using state-of-the-art machine learning models. Geomatics, Natural Hazards and Risk 12 (1), 1488–1507. https://dx.doi.org/10.1080/19475705.2021.1920480.
De Santana F. B., de Souza A. M. & Poppi R. J. 2018 Visible and near infrared spectroscopy coupled to random forest to quantify some soil quality parameters. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 191, 454–462. https://dx.doi.org/10.1016/j.saa.2017.10.052.
Du J., Fang J., Xu W. & Shi P. 2013 Analysis of dry/wet conditions using the standardized precipitation index and its potential usefulness for drought/flood monitoring in Hunan Province China. Stochastic Environmental Research and Risk Assessment 27 (2), 377–387. https://doi.org/10.1007/s00477-012-0589-6.
Eftekhari M., Eslaminezhad S. A., Haji Elyasi A. & Akbari M. 2021 Geostatistical evaluation with Drinking Groundwater Quality Index (DGWQI) in Birjand Plain Aquifer. Environment and Water Engineering 7 (2), 268–279. https://doi.org/10.22034/jewe.2021.256731.1464.
Eini M., Kaboli H. S., Rashidian M. & Hedayat H. 2020 Hazard and vulnerability in urban flood risk mapping: machine learning techniques and considering the role of urban districts. International Journal of Disaster Risk Reduction 50, 101687. https://doi.org/10.1016/j.ijdrr.2020.101687.
Eslaminezhad S. A., Eftekhari M., Akbari M., Elyasi A. H. & Frahadian H. 2022 Predicting flood prone areas using advanced machine learning models (Birjand Plain). Water and Irrigation Management 11 (4), 885–904. https://doi.org/10.22059/jwim.2022.332875.934.
Faizollahzadeh Ardabili S., Najafi B., Alizamir M., Mosavi A., Shamshirband S. & Rabczuk T. 2018 Using SVM-RSM and ELM-RSM approaches for optimizing the production process of methyl and ethyl esters. Energies 11 (11), 2889. https://doi.org/10.3390/en11112889.
Farhadi H. & Najafzadeh M. 2021 Flood risk mapping by remote sensing data and random forest technique. Water 13 (21), 3115. https://doi.org/10.3390/w13213115.
Feldman A. 2000 Hydrologic Modeling System HEC-HMS, Technical Reference Manual. US Army Corps of Engineers. Hydrologic Engineering Center. https://doi.org/CDP-74B.
Fenicia F., Savenije H. H., Matgen P. & Pfister L. 2008 Understanding catchment behavior through stepwise model concept improvement. Water Resources Research 44 (1). https://doi.org/10.1029/2006WR005563.
Fernández-Pato J., Caviedes-Voullième D. & García-Navarro P. 2016 Rainfall/runoff simulation with 2D full shallow water equations: sensitivity analysis and calibration of infiltration parameters. Journal of Hydrology 536, 496–513. https://doi.org/10.1016/j.jhydrol.2016.03.021.
Fotheringham A. S. & Oshan T. M. 2016 Geographically weighted regression and multicollinearity: dispelling the myth. Journal of Geographical Systems 18 (4), 303–329. https://doi.org/10.1007/s10109-016-0239-5.
Giang P. Q., Trang N. T. M., Anh T. T. H. & Binh N. T. 2020 Prediction of economic loss of rice production due to flood inundation under climate change impacts using a modeling approach: a case study in Ha Tinh Province, Vietnam. Climate Change 6, 52–63.
Haer T., Husby T. G., Botzen W. W. & Aerts J. C. 2020 The safe development paradox: an agent-based model for flood risk under climate change in the European Union. Global Environmental Change 60, 102009. https://doi.org/10.1016/j.gloenvcha.2019.102009.
Hervás J. & Bobrowsky P. 2009 Mapping: inventories, susceptibility, hazard and risk. In: Landslides–Disaster Risk Reduction. Springer, Berlin, Heidelberg, pp. 321–349. https://doi.org/10.1007/978-3-540-69970-5_19.
Hong H., Tsangaratos P., Ilia I., Liu J., Zhu A. X. & Chen W. 2018 Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China. Science of the Total Environment 625, 575–588. https://doi.org/10.1016/j.scitotenv.2017.12.256.
Imani S., Hassanoli S., Farkhnia A., Javadi F. & Najafi M. 2021 Evaluating the efficiency of WRF-hydro model for development of flood forecasting systems (Case study: Kashkan Watershed). Iran-Water Resources Research 16 (4), 225–240. https://doi.org/20.1001.1.17352347.1399.16.4.15.5.
Jaafari A., Panahi M., Pham B. T., Shahabi H., Bui D. T., Rezaie F. & Lee S. 2019 Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 175, 430–445. https://doi.org/10.1016/j.catena.2018.12.033.
Jancewicz K., Migoń P. & Kasprzak M. 2019 Connectivity patterns in contrasting types of tableland sandstone relief revealed by Topographic Wetness Index. Science of the Total Environment 656, 1046–1062. https://doi.org/10.1016/j.scitotenv.2018.11.467.
Johann G. & Leismann M. 2017 How to realise flood risk management plans efficiently in an urban area – the Seseke project. Journal of Flood Risk Management 10 (2), 173–181. https://doi.org/10.1111/jfr3.12075.
Kalantari Z., Nickman A., Lyon S. W., Olofsson B. & Folkeson L. 2014 A method for mapping flood hazard along roads. Journal of Environment Management 133, 69–77. https://doi.org/10.1016/j.jenvman.2013.11.032.
Kalantari Z., Ferreira C. S. S., Walsh R. P. D., Ferreira A. J. D. & Destouni G. 2017 Urbanization development under climate change: hydrological responses in a peri-urban Mediterranean catchment. Land Degradation & Development 28 (7), 2207–2221. https://doi.org/10.1002/ldr.2747.
Kanani-Sadat Y., Arabsheibani R., Karimipour F. & Nasseri M. 2019 A new approach to flood susceptibility assessment in data-scarce and ungauged regions based on GIS-based hybrid multi criteria decision-making method. Journal of Hydrology 572, 17–31. https://doi.org/10.1016/j.jhydrol.2019.02.034.
Khosravi K., Shahabi H., Pham B. T., Adamowski J., Shirzadi A., Pradhan B., Dou J., Ly H. B., Gróf G. & Ho H. L. 2019 A comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning methods. Journal of Hydrology 573, 311–323. https://doi.org/10.1016/j.jhydrol.2019.03.073.
Kia M. B., Pirasteh S., Pradhan B., Mahmud A. R., Sulaiman W. N. A. & Moradi A. 2012 An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environmental Earth Sciences 67 (1), 251–264. https://doi.org/10.1007/s12665-011-1504-z.
Kim B., Sanders B. F., Famiglietti J. S. & Guinot V. 2015 Urban flood modeling with porous shallow-water equations: a case study of model errors in the presence of anisotropic porosity. Journal of Hydrology 523, 680–692. https://doi.org/10.1016/j.jhydrol.2015.01.059.
Kocaman S., Tavus B., Nefeslioglu H. A., Karakas G. & Gokceoglu C. 2020 Evaluation of floods and landslides triggered by a meteorological catastrophe (Ordu, Turkey, August 2018) using optical and radar data. Geofluids, 1–18. https://doi.org/10.1155/2020/8830661.
Kuncheva L. I. & Rodriguez J. J. 2007 An experimental study on rotation forest ensembles. International Workshop on Multiple Classifier Systems. Springer, pp. 459–468.
Levy J. K., Hartmann J., Li K. W., An Y. & Asgary A. 2007 Multi-criteria decision support systems for flood hazard mitigation and emergency response in urban watersheds. Journal of the American Water Resources Association 43, 346–358. https://doi.org/10.1111/j.1752-1688.2007.00027.x.
Liu R., Chen Y., Wu J., Gao L., Barrett D., Xu T., Li L., Huang C. & Yu J. 2016 Assessing spatial likelihood of flooding hazard using Naïve Bayes and GIS: a case study in Bowen Basin, Australia. Stochastic Environmental Research and Risk Assessment 30 (6), 1575–1590. https://doi.org/10.1007/s00477-015-1198-y.
Lohani A. K., Goel N. K. & Bhatia K. K. S. 2014 Improving real time flood forecasting using fuzzy inference system. Journal of Hydrology 509, 25–41. https://doi.org/10.1016/j.jhydrol.2013.11.021.
Mekanik F., Imteaz M. A., Gato-Trinidad S. & Elmahdi A. 2013 Multiple regression and artificial neural network for long-term rainfall forecasting using large scale climate modes. Journal of Hydrology 503, 11–21. https://doi.org/10.1016/j.jhydrol.2013.08.035.
Miles R. E. & Snow C. C. 1984 Designing strategic human resources systems. Organizational Dynamics 13 (1), 36–52. https://doi.org/10.1016/0090-2616(84)90030-5.
Minea G. 2013 Assessment of the flash flood potential of Basca River Catchment (Romania) based on physiographic factors. Open Geosciences 5, 344–353. https://doi.org/10.2478/s13533-012-0137-4.
Mojaddadi H., Pradhan B., Nampak H., Ahmad N. & Ghazali A. H. B. 2017 Ensemble machine-learning based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomatics, Natural Hazards and Risk 8 (2), 1080–1102. https://doi.org/10.1080/19475705.2017.1294113.
Mosavi A., Ozturk P. & Chau K. W. 2018 Flood prediction using machine learning models: literature review. Water 10 (11), 1536. https://doi.org/10.3390/w10111536.
Mukerji A., Chatterjee C. & Raghuwanshi N. S. 2009 Flood forecasting using ANN, neuro-fuzzy, and neuro-GA models. Journal of Hydrologic Engineering 14 (6), 647–652. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000040.
Nachappa T. G., Piralilou S. T., Gholamnia K., Ghorbanzadeh O., Rahmati O. & Blaschke T. 2020 Flood susceptibility mapping with machine learning, multi-criteria decision analysis and ensemble using Dempster Shafer Theory. Journal of Hydrology 590, 125275. https://doi.org/10.1016/j.jhydrol.2020.125275.
Nayak P. C., Sudheer K. P., Rangan D. M. & Ramasastri K. S. 2005 Short-term flood forecasting with a neurofuzzy model. Water Resources Research 41 (4). https://doi.org/10.1029/2004WR003562.
Pham B. T., Tien Bui D., Dholakia M. B., Prakash I. & Pham H. V. 2016 A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotechnical and Geological Engineering 34 (6), 1807–1824. https://doi.org/10.1007/s10706-016-9990-0.
Pham B. T., Tien Bui D. & Prakash I. 2017 Landslide susceptibility assessment using bagging ensemble based alternating decision trees, logistic regression and j48 decision trees methods: a comparative study. Geotechnical and Geological Engineering 35 (6), 2597–2611. https://doi.org/10.1007/s10706-017-0264-2.
Pham B. T., Avand M., Janizadeh S., Phong T. V., Al-Ansari N., Ho L. S. & Jafari F. 2020 GIS based hybrid computational approaches for flash flood susceptibility assessment. Water 12 (683), 1–30. https://doi.org/10.3390/w12030683.
Phong T. V., Pham B. T., Trinh P. T., Ly H. B., Vu Q. H., Ho L. S., Le H. V., Phong L. H., Avand M. & Prakash I. 2021 Groundwater potential mapping using GIS-based hybrid artificial intelligence methods. Groundwater 59 (5), 745–760. https://doi.org/10.1111/gwat.13094.
Pourghasemi H. R., Razavi-Termeh S. V., Kariminejad N., Hong H. & Chen W. 2020 An assessment of metaheuristic approaches for flood assessment. Journal of Hydrology 582, 124536. https://doi.org/10.1016/j.jhydrol.2019.124536.
Rahmati O., Pourghasemi H. R. & Zeinivand H. 2016 Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto International 31 (1), 42–70. https://doi.org/10.1080/10106049.2015.1041559.
Rahmati O., Naghibi S. A., Shahabi H., Tien Bui D., Pradhan B., Azareh A. & Melesse A. M. 2018 Groundwater spring potential modelling: comprising the capability and robustness of three different modeling approaches. Journal of Hydrology 565, 248–261. https://doi.org/10.1016/j.jhydrol.2018.08.027.
Razavi Termeh S. V., Kornjady A., Pourghasemi H. R. & Keesstra S. 2018 Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Science of the Total Environment 615, 438–451. https://doi.org/10.1016/j.scitotenv.2017.09.262.
Saedi A., Saghafian B. & Moazami S. 2020 Uncertainty of flood forecasts via ensemble precipitation forecasts of seven NWP models for Spring 2019 Golestan Flood. Iran-Water Resources Research 16 (1), 347–359. https://doi.org/20.1001.1.17352347.1399.16.1.23.7.
Saeidian B., Mesgari M. S., Pradhan B. & Ghodousi M. 2018 Optimized location-allocation of earthquake relief centers using PSO and ACO, complemented by GIS, clustering, and TOPSIS. ISPRS International Journal of Geo-Information 7 (8), 292. https://doi.org/10.3390/ijgi7080292.
Safaripour M., Monavari M., Zare M., Abedi Z. & Gharagozlou A. 2012 Flood risk assessment using GIS (Case study: Golestan Province, Iran). Polish Journal of Environmental Studies 21 (6), 1817–1824.
Sahoo G. B., Schladow S. G. & Reuter J. E. 2009 Forecasting stream water temperature using regression analysis, artificial neural network, and chaotic non-linear dynamic models. Journal of Hydrology 378, 325–342. https://doi.org/10.1016/j.jhydrol.2009.09.037.
Sajedi-Hosseini F., Malekian A., Choubin B., Rahmati O., Cipullo S., Coulon F. & Pradhan B. 2018 A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Science of the Total Environment 644, 954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054.
Santangelo N., Santo A., Di Crescenzo G., Foscari G., Liuzza V., Sciarrotta S. & Scorpio V. 2011 Flood susceptibility assessment in a highly urbanized alluvial fan: the case study of Sala Consilina (southern Italy). Natural Hazards and Earth System Sciences 11 (10), 2765–2780. https://doi.org/10.5194/nhess-11-2765-2011.
Seckin N., Cobaner M., Yurtal R. & Haktanir T. 2013 Comparison of artificial neural network methods with L-moments for estimating flood flow at ungauged sites: the case of East Mediterranean River Basin, Turkey. Water Resources Management 27 (7), 2103–2124. https://doi.org/10.1007/s11269-013-0278-3.
Seejata K., Yodying A., Wongthadam T., Mahavik N. & Tantanee S. 2018 Assessment of flood hazard areas using analytical hierarchy process over the Lower Yom Basin, Sukhothai Province. Procedia Engineering 212, 340–347. https://doi.org/10.1016/j.proeng.2018.01.044.
Smith K. & Ward R. 1998 Mitigating and Managing Flood Losses. Floods: Physical Processes and Human Impacts. John Wiley and Sons Ltd, Chichester, UK.
Tang X., Li J., Liu M., Liu W. & Hong H. 2020 Flood susceptibility assessment based on a novel random Naïve Bayes method: a comparison between different factor discretization methods. Catena 190, 104536. https://doi.org/10.1016/j.gsf.2021.101253.
Tehrany M. S., Pradhan B. & Jebur M. N. 2014 Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. Journal of Hydrology 512, 332–343. https://doi.org/10.1016/j.jhydrol.2014.03.008.
Tien Bui D., Pradhan B., Nampak H., Bui Q. T., Tran Q. A. & Nguyen Q. P. 2016 Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. Journal of Hydrology 540, 317–330. https://doi.org/10.1016/j.jhydrol.2016.06.027.
Tien Bui D., Panahi M., Shahabi H., Singh V. P., Shirzadi A., Chapi K. & Ahmad A. A. 2018 Novel hybrid evolutionary algorithms for spatial prediction of floods. Scientific Reports 8 (1), 1–14. https://doi.org/10.1038/s41598-018-33755-7.
Tien Bui D., Shirzadi A., Shahabi H., Geertsema M., Omidvar E., Clague J. J., Thai Pham B., Dou J., Talebpour Asl D., Bin Ahmad B. & Lee S. 2019 New ensemble models for shallow landslide susceptibility modeling in a semi-arid watershed. Forests 10 (9), 743. https://doi.org/10.3390/f10090743.
Wang X. & Liu H. 2019 A knowledge-and data-driven soft sensor based on deep learning for predicting the deformation of an air preheater rotor. IEEE Access 7, 159651–159660. https://doi.org/10.1109/ACCESS.2019.2950661.
Wang L. M., Li X. L., Cao C. H. & Yuan S. M. 2006 Combining decision tree and Naive Bayes for classification. Knowledge-Based Systems 19 (7), 511–515. https://doi.org/10.1016/j.knosys.2005.10.013.
Wang Y., Hong H., Chen W., Li S., Panahi M., Khosravi K., Shirzadi A., Shahabi H., Panahi S. & Costache R. 2019 Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. Journal of Environmental Management 247, 712–729. https://doi.org/10.1016/j.jenvman.2019.06.102.
Witten I. H., Frank E. & Mark A. H. 2011 Data Mining: Practical Machine Learning Tools and Techniques (third edition). Morgan Kaufmann, Burlington, USA.
Wu Y., Ke Y., Chen Z., Liang S., Zhao H. & Hong H. 2020 Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. Catena 187, 104396. https://doi.org/10.3390/rs12203389.
Xia J., Du P., He X. & Chanussot J. 2013 Hyperspectral remote sensing image classification based on rotation forest. IEEE Geoscience and Remote Sensing Letters 11 (1), 239–243. https://doi.org/10.1109/LGRS.2013.2254108.
Xia X., Liang Q., Ming X. & Hou J. 2017 An efficient and stable hydrodynamic model with novel source term discretization schemes for overland flow and flood simulations. Water Resources Research 53 (5), 3730–3759. https://doi.org/10.1002/2016WR020055.
Youssef A. M., Pradhan B. & Sefry S. A. 2016 Flash flood susceptibility assessment in Jeddah city (Kingdom of Saudi Arabia) using bivariate and multivariate statistical models. Environmental Earth Sciences 75, 12. https://doi.org/10.1007/s12665-015-4830-8.
Zhao M. & Hendon H. H. 2009 Representation and prediction of the Indian Ocean dipole in the POAMA seasonal forecast model. Quarterly Journal of the Royal Meteorological Society 135 (639), 337–352. https://doi.org/10.1002/qj.370.
Zhao G., Pang B., Xu Z., Yue J. & Tu T. 2018 Mapping flood susceptibility in mountainous areas on a national scale in China. Science of the Total Environment 615, 1133–1142. https://doi.org/10.1016/j.scitotenv.2017.10.037.
Zuo C., Chen Q., Tian L., Waller L. & Asundi A. 2015 Transport of intensity phase retrieval and computational imaging for partially coherent fields: the phase space perspective. Optics and Lasers in Engineering 71, 20–32. https://doi.org/10.1016/j.optlaseng.2015.03.006.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).