Abstract
The objective of this study was the development of an approach based on machine learning and GIS, namely Adaptive Neuro-Fuzzy Inference System (ANFIS), Gradient-Based Optimizer (GBO), Chaos Game Optimization (CGO), Sine Cosine Algorithm (SCA), Grey Wolf Optimization (GWO), and Differential Evolution (DE), to construct flood susceptibility maps in the Ha Tinh province of Vietnam. The database includes 13 conditioning factors and 1,843 flood locations, which were split by a ratio of 70/30 between those used to build and those used to validate the model, respectively. Various statistical indices, namely root mean square error (RMSE), area under the ROC curve (AUC), mean absolute error (MAE), accuracy, and R2 score, were applied to validate the models. The results show that all the proposed models performed well, with an AUC value of more than 0.95. Of the proposed models, ANFIS-GBO was the most accurate, with an AUC value of 0.96. Analysis of the flood susceptibility maps shows that approximately 32–38% of the study area is located in the high and very high flood susceptibility zone. The successful performance of the proposed models over a large-scale area can help local authorities and decision-makers develop policies and strategies to reduce the threats related to flooding in the future.
HIGHLIGHTS
Flood susceptibility modeling was done using hybrid machine learning approaches.
The proposed models achieved high accuracy and surpassed the reference models.
ANFIS-GBO and ANFIS-SCA were the best models.
INTRODUCTION
The consequences of global warming and urban sprawl contribute to both the growing severity and number of floods, landslides, and droughts (Mukherjee et al. 2018; Feng et al. 2020; Ghorbanzadeh et al. 2020). Particularly, climate change leads to an increase in the intensity and frequency of torrential rains, which are the main aggravators of flooding (Prein et al. 2017; Nachappa et al. 2020; Tabari 2020). According to statistics from the World Bank, the occurrence of flooding has increased by more than 40% in the last two decades, leading to serious economic consequences (Alfieri et al. 2017; Andaryani et al. 2021). Globally, flooding affected an estimated 109 million people between 1995 and 2015 and caused damage worth approximately USD 75 billion each year (Andaryani et al. 2021). This figure may increase to USD 300 billion by 2050, by which time an estimated 1.3 billion people will be living on floodplains (Giang et al. 2020).
Vietnam has a coastline of more than 3,200 km and 2,360 rivers with a total length of about 41,900 km, which makes it one of the countries most affected by floods. Between 1989 and 2015, floods in Vietnam led to at least 14,867 deaths or missing persons (Luu et al. 2018). The Central region of Vietnam is particularly affected by flooding as it has a relatively high rainfall of 1,800–2,500 mm/year and is densely populated (Pham et al. 2021f). An example of this vulnerability came in October 2020, when flooding resulted in an estimated 129 deaths or missing persons and 214 injuries; 111,200 houses were damaged, with losses of approximately USD 1 billion.
Despite the efforts of experts and policymakers to reduce the effects of flooding in recent years, its impact is in fact increasing, all over the world (Pham et al. 2021d, 2021e). We urgently need a better understanding of what makes a given area vulnerable to flooding. The term ‘flood susceptibility’ describes the spatial and temporal probability of a flood event, and flood susceptibility maps are considered crucial to the management of flood risk in the future (Pham et al. 2020). The literature shows that there are three central methods to determine areas at risk of flooding: remote sensing, physics-based models, and data-driven models.
In the case of remote sensing, researchers have used both radar and optical data to detect flood zones (Sun et al. 2000; Schumann et al. 2007; Anusha & Bharathi 2020). Although this approach can generate highly accurate flood maps at a low cost, especially when the model is integrated with GIS, it has significant limitations related to spatial and temporal resolutions and is affected by clouds. In addition, remote sensing requires the manual adjustment of the threshold parameters to obtain a good result; this process is time-consuming and laborious. Critically, remote sensing is also unable to present the original causes of the flood (Nguyen et al. 2020).
Physics-based models – such as MIKE FLOOD (Patro et al. 2009; Kadam & Sen 2012) and HEC-RAS (Shustikova et al. 2019; Ongdas et al. 2020) – have received considerable attention from researchers looking to map flood risk. Although this approach has proven capable of modeling future flood scenarios, unfortunately these models are affected by changes in topography, initial or boundary conditions, or other parameters such as coefficients of friction, diffusion, or degradation (often inaccessible to direct measurement). Moreover, these models require detailed data at location – such as hydro-geomorphological data – requiring intensive calculations and thus making short-term forecasting difficult (Eslaminezhad et al. 2022). Previous research has demonstrated that physics-based models only have short-term predictive ability (Eslaminezhad et al. 2022; Prasad et al. 2022). One final challenge is that the establishment of these models requires a thorough understanding of hydrological parameters (Pham et al. 2021f).
The drawbacks of physics-based models have led many to the use of data-based models such as statistical modeling and machine learning. Statistical modeling has been used extensively to predict flood risk, with methods including frequency ratio (FR; Samanta et al. 2018; Tehrany & Kumar 2018), logistic regression (LR; Nandi et al. 2016), fuzzy logic (Pulvirenti et al. 2011; Perera & Lahat 2015), and weight of evidence (Rahmati et al. 2016). These models have proven effective in assessing flood susceptibility in several regions around the world. However, the inundation process is often nonlinear and time-varying, which presents a significant challenge for statistical models, particularly at the regional scale (Ha et al. 2022).
Some scientists have used machine learning, integrated with data from satellite images, to determine areas that are susceptible to floods. Machine learning is quicker and requires a smaller amount of input data than physics-based models. Moreover, it can resolve the nonlinear characteristics of past flood events from the data alone; it does not need to represent the physical processes involved (Islam et al. 2021). The use of machine learning is predicated on the relationship between the input data and flooding remaining unchanged in the future (Bui et al. 2019). Examples of this methodology include Support Vector Machine (Tehrany et al. 2014; Pham et al. 2019), Random Forest (RF; Lee et al. 2017; Chen et al. 2020), Adaboost (Hong et al. 2018a; Pham et al. 2021g), Artificial Neural Networks (Falah et al. 2019; Bui et al. 2020), and Adaptive Neuro-Fuzzy Logic (Hong et al. 2018b; Tabbussum & Dar 2021).
Several previous studies have claimed that machine learning is more appropriate than traditional approaches to flood prediction (Mosavi et al. 2017), but traditional machine learning methods are becoming less effective. A particular problem is overfitting: machine learning models predict well during training, because they learn the task from past data, but if the data are missing or insufficiently diverse, they cannot predict accurately during testing (Mosavi et al. 2018; Bui et al. 2020; Chou et al. 2021).
In recent years, hybrid models have become a more popular way to resolve the problems of mapping flood susceptibility (Nguyen et al. 2021a; Yen et al. 2021). Hybrid models combine individual models with metaheuristic algorithms. The advantage is that hybrid models can eliminate the weak points of individual models, to obtain more accurate results (Saha et al. 2021). Moreover, metaheuristic algorithms explore the entire search space, thus limiting local optimization problems (Zhao et al. 2020). Optimization algorithms have been applied effectively in different domains, such as economics, earth science, and engineering. These can be divided into three groups: evolution-based – such as Genetic Algorithm (GA) (Shahabi et al. 2021) and DE (Razavi-Termeh et al. 2021) – physics-based – such as Henry Gas Solubility Optimization (HGSO; Nguyen et al. 2021b), Atom Search Optimization (ASO; Mundra et al. 2022) – and swarm-based – such as Whale Optimization Algorithm (WOA; Liu et al. 2020), GWO (Nguyen et al. 2021a), and Bat Algorithm (BA; Ahmadlou et al. 2019).
One example is that of Termeh et al. (2018), who integrated ANFIS with three optimization algorithms (Ant Colony Optimization, GA, and Particle Swarm Optimization (PSO)) to produce flood susceptibility maps for the Fars province of southern Iran. The hybrid models were successful, outperforming the reference models. Bui et al. (2019) combined the Extreme Learning Machine model with PSO to evaluate flood susceptibility in a mountain region of northern Vietnam. This combination displayed promising precision, with an AUC of over 0.95. Bui et al. (2020) utilized the swarm intelligence algorithms, namely GWO, Grasshopper Optimization Algorithm, and Social Spider Optimization algorithm to improve the performance of the DNN model to determine flood susceptibility in an area of Lai Chau province. The hybrid models outperformed the Support Vector Machine and RF reference models.
Such excellent results have inspired the implementation and development of hybrid models in the modeling of many natural hazards. However, key limitations of machine learning models remain, namely the generalization problem (i.e. models can perform well within the range of the training dataset but cannot predict reliably beyond that range) and the no-free-lunch problem (i.e. no single model can solve all problems in all regions).
Researchers therefore recommend developing new ways to build flood susceptibility maps and related natural hazard simulations, and so the aim of this study was to develop new hybrid models based on ANFIS and five optimization algorithms to predict flood susceptibility in Ha Tinh province, Vietnam. ANFIS is a popular neuro-fuzzy model for solving highly complex, nonlinear problems. In addition, ANFIS can adapt automatically to the problem at hand. Because it is built on a fuzzy model, ANFIS is particularly strong at interpreting problems, which in turn improves its ability to generalize. ANFIS has gained recognition for its strength in fuzzifying crisp inputs and producing crisp outputs from fuzzy rules (Chopra et al. 2021). The model has proven effective in several different areas, such as flood susceptibility mapping and streamflow prediction (Adnan et al. 2022). However, ANFIS also has disadvantages, namely the selection of the type, number, and position of the membership functions (Chopra et al. 2021). Therefore, it is necessary to integrate ANFIS with optimization algorithms such as the Gradient-Based Optimizer (GBO), Chaos Game Optimization (CGO), and the Sine Cosine Algorithm (SCA). These optimization algorithms have proven effective in several different areas although, as yet, they have rarely been applied in the field of environmental science and natural hazard management.
We have created new methods, based on ANFIS and three novel optimization algorithms (GBO, CGO, and SCA), to generate spatial flood susceptibility maps with minimal field data for the study area, which is a province that is regularly affected by flooding. The performance of these models was compared with the reference models ANFIS-DE and ANFIS-GWO. Our initial hypothesis was that the three proposed models would successfully build a flood susceptibility map and that their performance would surpass the reference models ANFIS-DE and ANFIS-GWO.
The novel elements of this study are twofold. This is the first time flood susceptibility in Ha Tinh has been mapped in such a rigorous way; it is also the first time ANFIS has been integrated with GBO, CGO, and SCA. To the knowledge of the authors, these hybrid models have not previously been used to construct such a map, and so the findings of this study represent a novel approach to the use of machine learning algorithms in generating flood susceptibility maps.
STUDY AREA AND DATA USE
Study area
Due to the combined effects of tropical depressions, cold waves, and steep slopes, orographic precipitation over the Central region of Vietnam is very high (Le et al. 2021). The study area lies in the tropical monsoon region and experiences two distinct seasons: the rainy season (November–March) and the dry season (April–October). The average annual precipitation for the period 1958–2018 was relatively high – between 1,142 and 4,391 mm – with 75% of the precipitation concentrated in the rainy season. Peak rainfall is in September and October, when the monthly average is 500–800 mm, which is very high compared with other regions of Vietnam (Le et al. 2021). The study area has a dense river network with a total length of more than 400 km and an average annual flow of 195 m3/s. There are three main river systems: Ngan Sau, Ngan Pho, and the coastal river system.
Forests currently cover about 66% of the surface of the province, although illegal deforestation, infrastructure construction, and forest fires are causing this figure to rapidly decrease. According to the Department of Agriculture and Rural Development, the forested area decreased by 1,728.17 ha between 2019 and 2020. This increases the flood risk in the study area. According to the Institute of Science and Technology, between 1961 and 2015, 44 storms affected Ha Tinh province. Storms can combine with heavy precipitation over a large area to result in serious floods. In October 2020, the province suffered two particularly destructive floods, which killed 6 people, destroyed 3,765 houses, and caused damage of VND 723 billion.
In recent years, the provincial government's flood strategy has focused on two main objectives: the adoption of structural measures – such as containment or flood mitigation through actions like the construction of dams – and the integration of flood depth maps, produced using hydrodynamic modeling, into land-use planning strategies to minimize future damage to human life and the economy. However, as mentioned above, hydrodynamic modeling has critical limitations that influence the accuracy of the flood depth map. Van den Honert & McAneney (2011) pointed out that the inability of hydrodynamic modeling to accurately predict floods had led to major damage in Queensland, Australia. Therefore, in this study, machine learning was selected for the construction of the map.
Geospatial data
Flood inventory
The first step when using a machine learning approach to create a flood susceptibility model is building the flood inventory map, which details past flood locations and their relations with the conditioning factors at those locations (Mojaddadi et al. 2017; Nachappa et al. 2020; Pham et al. 2021a). In this study, flood locations were obtained from the Department of Natural Resources and Environment of Ha Tinh (DONRE of Ha Tinh). In addition, field missions were carried out in 2020 and 2021 to measure flood marks, and Landsat 8 OLI and Sentinel 1A satellite images were used to detect floodplains during the floods of October 2010 and October 2020. Details of the methodology were presented by Kumar (2019) and Tiampo et al. (2022). A total of 1,843 samples were acquired to establish the model for Ha Tinh, including 943 flood samples and 900 non-flood samples. Flood and non-flood points were coded as 1 and 0, respectively. These data were split into two sets: 70% of the dataset was used to build the model and 30% to validate it.
Conditioning factors
Elevation is an important factor because it is linked to the capacity of water accumulation. Flooding is most likely to happen in lower-lying areas. Elevation affects flood regulation because with decreasing altitude, rainfall and river flow increase (Choubin et al. 2019; Nachappa et al. 2020). In the study area, the altitude value is 0–2,280 m.
Slope is the most vital conditioning factor, because it has effects on flow velocity and water accumulation capacity. Gentle slopes are usually more prone to flooding, as precipitation runs off steeper slopes into lowland water bodies (Yariyan et al. 2020). The slope value varied from 0° to 70° in this study.
Aspect is an important factor in hydrological response. It influences the local climatic conditions, soil moisture, and infiltration. Although aspect affects flooding indirectly, several researchers have considered it essential to susceptibility maps (Nachappa et al. 2020). The aspect value varied from 0° to 360° in this study.
Several researchers have underlined the critical role of curvature in flood susceptibility modeling since it directly impacts flow accumulation (Mirzaei et al. 2021). In Ha Tinh province, curvature varied from −79 to 79.
NDVI is one of the indices to measure vegetation density in the study area. The higher the vegetation density is, the lower the probability of flooding, and conversely (Dodangeh et al. 2020; Nachappa et al. 2020). The NDVI value ranged from −1 to 1 in this study.
NDBI calculates building density. This is important because buildings directly affect surface permeability (Nguyen et al. 2021a). The NDBI value ranged from −1 to 1 in this study.
Density to river was critical, as we were looking at fluvial flooding, where the river plays an essential role in flood expansion. Most areas near rivers are prone to flooding, as water overflows the riverbank (Vojtek & Vojteková 2019; Chowdhuri et al. 2020). The density to river value varied from 0 to 6.4 m in this study.
Density to road is also a primary factor as it directly affects the infiltration capacity of the surface (Ahmadlou et al. 2021; Linh et al. 2022). In Ha Tinh province, the national Highway 1A acts as a dam that blocks the flow of water toward the sea. The density to road value varied from 0 to 7.4 m in this study.
TWI describes soil moisture and the ability of the soil to erode spatially (Nachappa et al. 2020). The TWI value varied from 1.3 to 23.1 in this study.
CTI is the quotient of the flow accumulation and the slope. It describes the capacity of water resources per unit area. A low CTI value represents a basin with a steep slope and a small water surface (Azedou et al. 2021). The CTI value varied from 1.3 to 23.1 in this study.
Land use is key when predicting flood occurrence because each type of land use has a different water impermeability capacity (Andaryani et al. 2021; El-Haddad et al. 2021). Beckers et al. (2013) pointed out that the evolution of land cover is the main cause of increased flood risk. We categorized land use as either forest, water, agricultural area, artificial area, barren land, or fish-farming areas.
Various previous studies have highlighted the relationship between precipitation and flood occurrence. Rainfall is a trigger factor for flooding and an increase in the intensity of rainfall can lead to an increase in the intensity of a flood (Chapi et al. 2017; Pham et al. 2021f). The rainfall value varied from 2,013 to 2,519 mm in this study.
Although soil type influences the process of flooding indirectly, studies have shown its importance in modeling as it controls the process of water infiltration from the surface (Bui et al. 2019; Costache et al. 2020). Therefore, in this study, soil type was divided into humic acrisols, haplic arenosols, alisols, salic fluvisols, xanthic ferralsols, thionic fluvisols, fluvisols, dystric gleysols, ferralic acrisols, ferralsols, arenic acrisols, lithic leptosols, plinthic acrisols, and hyperdystric acrisols.
METHODS
(i) We collected flood inventory data from several field missions and from Landsat 8 OLI and Sentinel 1A imagery. The 13 conditioning factors selected were elevation, aspect, slope, curvature, NDBI, NDVI, density to river, density to road, CTI, TWI, land use, rainfall, and soil type. Data were normalized using the min–max normalization technique:
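For reference, a standard form of the min–max normalization applied to each conditioning factor is:

\[
x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\]

where x is the original value of a conditioning factor and x_min and x_max are its minimum and maximum values over the study area, so that all normalized values lie in [0, 1].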
In addition, we used RF to select important factors and filter out non-predictive factors, because data redundancy could increase the complexity of models and could reduce the accuracy of prediction models.
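As an illustration of this step (not the exact implementation used in the study), factor importances of this kind can be obtained from scikit-learn's RandomForestClassifier; the data and factor names below are synthetic placeholders:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 1,843 samples, 13 conditioning factors, 0/1 flood label.
factor_names = [f"factor_{i}" for i in range(1, 14)]   # hypothetical names
X = pd.DataFrame(np.random.rand(1843, 13), columns=factor_names)
y = np.random.randint(0, 2, size=1843)

rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(X, y)

# Factors are ranked by mean decrease in impurity; factors with importance
# near zero are candidates for removal before training the ANFIS hybrids.
importances = pd.Series(rf.feature_importances_, index=factor_names).sort_values(ascending=False)
print(importances)
```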
(ii) Two sets of data must be considered in any statistical model: the first is used to train the models and a separate second set is used to evaluate them. In this study, a total of 1,843 flood and non-flood points, each described by the 13 conditioning factors, were divided into training (70%) and validation (30%) sets. We used five different models, implemented in Python, to develop the flood susceptibility map. The network topology was determined by the dataset. The model parameter initialization process comprised two steps: the initialization of the ANFIS hyper-parameters using the trial-and-error method, and the initialization of the five optimization algorithms.
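A minimal sketch of the 70/30 split described above, using scikit-learn; the placeholder arrays and the stratified split are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix (13 conditioning factors) and 0/1 flood labels.
X = np.random.rand(1843, 13)
y = np.random.randint(0, 2, size=1843)

# 70% of the points train the models, 30% validate them; stratifying keeps the
# flood/non-flood balance in both subsets (an assumption, not stated in the text).
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
```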
The structure of the ANFIS model comprises five layers, of which the first is the input layer, consisting of the 13 conditioning factors (see Figure 2). The final output layer contains two classes of results: flood and non-flood. The ANFIS model is governed by its hyper-parameters (number of fuzzy memberships, membership function type, batch size, optimizer, and loss).
Optimization algorithms allow us to enhance the accuracy of the models by tuning these parameters. In this study, ANFIS models were developed using five optimization algorithms (GBO, CGO, SCA, GWO, and DE) to create the fuzzy inference system (FIS). After the generation of the FIS, the parameters of the membership functions were generated and stored in a matrix. In the ANFIS algorithm, the fuzzification and defuzzification layers have trainable parameters, while the rule and normalization layers do not. At each iteration, ANFIS was trained with the problem size created by the optimization algorithms, and each algorithm generated different candidate solutions starting from the set of input weights and other algorithm-specific parameters. RMSE was used as the objective function to determine the best solution for each of the five optimization algorithms. If the stopping condition was met – that is, the best result with a sufficiently small RMSE was obtained – the optimization stopped; otherwise, the process was repeated until the model achieved the best results.
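The following simplified sketch illustrates this workflow: a candidate parameter vector (membership-function centres and widths plus consequent weights) is scored by the RMSE of the resulting ANFIS-like output, and a metaheuristic proposes new candidates until the stopping criterion is met. The random perturbation used here is only a stand-in for the GBO, CGO, SCA, GWO, and DE update rules, and all function names are illustrative:

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Layer-1 Gaussian membership degree of input x."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_like_predict(params, X):
    """Tiny stand-in for an ANFIS forward pass with 2 Gaussian memberships
    per conditioning factor, squashed to a susceptibility index in [0, 1]."""
    n = X.shape[1]
    c = params[:2 * n].reshape(n, 2)                           # membership centres
    sigma = np.abs(params[2 * n:4 * n]).reshape(n, 2) + 1e-6   # membership widths
    w = params[4 * n:]                                         # consequent weights + bias
    memberships = np.stack(
        [gaussian_mf(X[:, i:i + 1], c[i], sigma[i]) for i in range(n)], axis=1
    ).reshape(X.shape[0], -1)
    return 1.0 / (1.0 + np.exp(-(memberships @ w[:-1] + w[-1])))

def rmse(params, X, y):
    """Objective function: RMSE between the predicted index and the 0/1 labels."""
    return np.sqrt(np.mean((anfis_like_predict(params, X) - y) ** 2))

def optimise(X, y, iterations=200, pop=30, seed=0):
    """Generic metaheuristic skeleton: keep the best solution found so far and
    perturb it; GBO, CGO, SCA, GWO, and DE each define smarter update rules."""
    rng = np.random.default_rng(seed)
    n_params = 6 * X.shape[1] + 1
    best = rng.normal(size=n_params)
    best_score = rmse(best, X, y)
    for _ in range(iterations):
        candidates = best + rng.normal(scale=0.1, size=(pop, n_params))
        scores = np.array([rmse(cand, X, y) for cand in candidates])
        if scores.min() < best_score:                # acceptance of improved solutions
            best, best_score = candidates[scores.argmin()], scores.min()
    return best, best_score
```

Calling optimise(X_train, y_train) with the arrays from the earlier split would return a tuned parameter vector and its training RMSE; in the study itself, the five metaheuristics of Table 1 replace the random perturbation.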
See Table 1 for the parameters of the proposed models.
The parameters of ANFIS optimized by GBO, CGO, SCA, GWO, and DE
| Parameters | ANFIS-GBO | ANFIS-CGO | ANFIS-SCA | ANFIS-GWO | ANFIS-DE |
|---|---|---|---|---|---|
| Inputs | 13 | 13 | 13 | 13 | 13 |
| Number of fuzzy memberships | 2 | 2 | 2 | 2 | 2 |
| Batch size | 32 | 32 | 32 | 32 | 32 |
| Membership function | Gaussian | Gaussian | Gaussian | Gaussian | Gaussian |
| Optimizer | SGD | SGD | SGD | SGD | SGD |
| Loss | RMSE | RMSE | RMSE | RMSE | RMSE |
| Epochs | 200 | 200 | 200 | 200 | 200 |
| Problem size | 114,740 | 114,740 | 114,740 | 114,740 | 114,740 |
The five proposed algorithms were used to calculate the weights of the model; the total number of weights optimized (the problem size) was 114,740.
(iii) The ROC curve is one of the techniques most often used to assess the accuracy of a flood susceptibility model on the validation dataset. The curve was generated by overlaying the validation dataset onto the prediction results of the five proposed models; the AUC represents the accuracy of the models.
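A minimal sketch of this validation step, assuming scikit-learn is used to compute the ROC curve and AUC; the observed labels and predicted scores below are synthetic placeholders:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholders: observed labels at the validation points (1 = flood, 0 = non-flood)
# and the susceptibility indices predicted for them by one of the ANFIS hybrids.
y_valid = np.random.randint(0, 2, size=500)
scores = 0.6 * y_valid + 0.4 * np.random.rand(500)

fpr, tpr, _ = roc_curve(y_valid, scores)     # points of the ROC curve
print(f"AUC = {roc_auc_score(y_valid, scores):.3f}")
```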
(iv) After constructing the proposed models, the map was generated by feeding all the pixels of the study area, together with the values of the conditioning factors, into the models. The output value is the flood susceptibility index, which lies within the range 0 to 1. These values were split into five classes using the natural breaks method: very low, low, moderate, high, and very high.
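A sketch of this reclassification step, assuming the jenkspy package is used for the natural breaks (Jenks) method; the susceptibility array is a synthetic placeholder:

```python
import numpy as np
import jenkspy  # assumed package providing Jenks natural breaks

# Placeholder array: flood susceptibility index of every pixel, in [0, 1].
susceptibility = np.random.rand(10_000)

breaks = jenkspy.jenks_breaks(susceptibility.tolist(), 5)   # 6 break values for 5 classes
classes = np.digitize(susceptibility, breaks[1:-1], right=True)
labels = ["very low", "low", "moderate", "high", "very high"]
# classes now holds 0..4, indexing the labels above, for every pixel of the map
```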
Adaptive Neuro-Fuzzy Inference System
Gradient-Based Optimizer


Chaos Game Optimization
CGO is a metaheuristic optimization algorithm proposed by Talatahari & Azizi (2021) and is based on chaos theory. Chaos theory explains how a small change in initial conditions can cause an extreme deviation later; based on this theory, the current state of a system can be used to determine its future state. In mathematics, the chaos game is a method of constructing fractals using an initial polygon and a randomly selected initial point. The aim is to create a sequence of points by applying the method iteratively (Talatahari & Azizi 2021). The fractal is built on the vertices of a polygon, so these vertices must be appropriately defined. Next, an initial point is randomly chosen to start the fractal construction. Each subsequent point is determined from the distance between the current point and a randomly selected vertex of the polygon. This process is repeated continuously and, with the random selection of initial points and vertices after each iteration, a fractal is constructed (Talatahari & Azizi 2021). In this way, the fractal on a triangle is constructed.
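To make the fractal-construction idea concrete, the following small sketch plays the chaos game on a triangle, moving halfway toward a randomly chosen vertex at each step (which produces the Sierpiński triangle); it illustrates the mechanism that CGO borrows, not the optimizer itself:

```python
import numpy as np

rng = np.random.default_rng(42)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])  # triangle vertices

point = rng.random(2)                       # random initial point
history = [point]
for _ in range(5000):
    vertex = vertices[rng.integers(3)]      # randomly selected vertex
    point = (point + vertex) / 2.0          # move halfway toward it
    history.append(point)

fractal = np.array(history)   # scatter-plotting these points reveals the fractal
```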



More details of this algorithm can be found in Talatahari & Azizi (2021). CGO has already been successfully applied in fields such as engineering and energy.
Sine Cosine Algorithm
where X_{i,d}^{t} represents individual i at iteration t in the dth dimension; P_{d}^{t} is the best position obtained so far at iteration t in the dth dimension; and r1, r2, and r3 are random parameters.
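For completeness, a sketch of the standard SCA position-update equations that these symbols refer to (in the usual formulation a further random number $r_4 \in [0,1]$ switches between the sine and cosine branches):

\[
X_{i,d}^{t+1} =
\begin{cases}
X_{i,d}^{t} + r_1 \sin(r_2)\left| r_3 P_{d}^{t} - X_{i,d}^{t} \right|, & r_4 < 0.5\\[4pt]
X_{i,d}^{t} + r_1 \cos(r_2)\left| r_3 P_{d}^{t} - X_{i,d}^{t} \right|, & r_4 \geq 0.5
\end{cases}
\]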
More details of this algorithm can be found in Gabis et al. (2021). SCA has performed well in contexts such as energy and earth sciences.
Grey Wolf Optimization
where A and C are coefficient vectors, D is the distance vector, Xvictim(t) and Xwolf(t) are the current positions of the victim (prey) and the wolf, respectively, and t is the current iteration.
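A sketch of the standard GWO encircling and attacking equations from which these symbols come, using the paper's victim/wolf terminology (with $\vec{r}_1$ and $\vec{r}_2$ random vectors in $[0,1]$ and $a$ decreasing linearly from 2 to 0 over the iterations):

\[
\vec{D} = \left| \vec{C} \cdot \vec{X}_{\mathrm{victim}}(t) - \vec{X}_{\mathrm{wolf}}(t) \right|, \qquad
\vec{X}_{\mathrm{wolf}}(t+1) = \vec{X}_{\mathrm{victim}}(t) - \vec{A} \cdot \vec{D}
\]
\[
\vec{A} = 2a\,\vec{r}_1 - a, \qquad \vec{C} = 2\,\vec{r}_2
\]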
More details of this algorithm can be found in Liu et al. (2021). GWO has been successfully applied in many different applications including the prediction of forest fires, landslides, and floods.
Differential Evolution
where r1, r2, and r3 are three mutually distinct random indices drawn from the NP members of the population, and F is the mutation scaling factor, which varies from 0 to 2.
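The mutation step these symbols describe is, in the standard DE/rand/1 scheme (a sketch of the usual formulation, where $\vec{v}_i$ is the mutant vector generated for member $i$):

\[
\vec{v}_i = \vec{x}_{r_1} + F\left(\vec{x}_{r_2} - \vec{x}_{r_3}\right)
\]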
In the DE algorithm, all members have an equal chance of becoming parents, and only members that outperform their parents pass to the next generation. DE has the advantage of not requiring derivative information about the objective function to reach a good solution; however, to tune the assigned problem, it maintains a large population of existing solutions and initiates new solutions in parallel with them (Al-Sudani et al. 2019). More details of this algorithm can be found in Razavi-Termeh et al. (2021).
Accuracy assessment
TP is the number of flood pixels correctly classified as flood, while TN is the number of non-flood pixels correctly classified as non-flood. P is the total number of flood pixels and N is the total number of non-flood pixels.
FP is the number of non-flood points incorrectly classified as flood, and FN is the number of flood points incorrectly classified as non-flood. Pp is the proportion of samples correctly classified as flood or non-flood (observed agreement), and Pexp is the expected agreement by chance.
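For reference, the standard definitions of these indices, written with the symbols above ($y_i$ and $\hat{y}_i$ denote the observed and predicted values of the $n$ samples):

\[
\mathrm{ACC} = \frac{TP + TN}{P + N}, \qquad
K = \frac{P_{p} - P_{exp}}{1 - P_{exp}}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
\]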
R2 is a popular statistical index that measures the accuracy of a regression model. It can be interpreted as the square of the correlation coefficient between the observed and predicted values of the dependent variable, and it expresses the part of the variance of the dependent variable that is explained by the independent variables. R2 takes values up to 1: a negative value indicates a malfunctioning model, a value of 0 means the model cannot explain the variability of the response data around its mean, and a value of 1 indicates a perfect model. These properties are the main advantages of the R2 index compared with other indices such as AUC, RMSE, or MAE (Chicco et al. 2021).
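One common form of R2, consistent with the description above ($\bar{y}$ is the mean of the observed values):

\[
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
\]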
RESULTS
Feature selection analysis
Model performance
The validation of the five models using ROC curves, based on training and validation points.
The accuracy of the five models based on training and validation data was analyzed using RMSE, MAE, AUC, ACC, K, and R2 (Table 2). The results show that in the training process, the proposed ANFIS-GBO model outperformed the other models, followed by ANFIS-SCA, ANFIS-GWO, ANFIS-DE, and ANFIS-CGO, respectively. In validation, the ANFIS-GBO model was also more precise than the other models, followed by ANFIS-SCA, ANFIS-GWO, ANFIS-DE, and then ANFIS-CGO. The ANFIS-GBO model had the best overall performance and ANFIS-CGO the poorest.
Model performance of ANFIS-GBO, ANFIS-CGO, ANFIS-SCA, ANFIS-GWO, and ANFIS-DE
Training dataset:

| Model | RMSE | MAE | AUC | ACC | K | R2 |
|---|---|---|---|---|---|---|
| ANFIS-GBO (proposed) | 0.14 | 0.09 | 0.994 | 0.98 | 0.97 | 0.91 |
| ANFIS-CGO (proposed) | 0.2 | 0.15 | 0.992 | 0.98 | 0.96 | 0.82 |
| ANFIS-SCA (proposed) | 0.17 | 0.12 | 0.996 | 0.98 | 0.97 | 0.88 |
| ANFIS-GWO (reference) | 0.17 | 0.14 | 0.994 | 0.98 | 0.97 | 0.87 |
| ANFIS-DE (reference) | 0.18 | 0.14 | 0.993 | 0.98 | 0.97 | 0.86 |

Validation dataset:

| Model | RMSE | MAE | AUC | ACC | K | R2 |
|---|---|---|---|---|---|---|
| ANFIS-GBO (proposed) | 0.23 | 0.16 | 0.99 | 0.96 | 0.86 | 0.63 |
| ANFIS-CGO (proposed) | 0.26 | 0.2 | 0.98 | 0.95 | 0.83 | 0.54 |
| ANFIS-SCA (proposed) | 0.25 | 0.17 | 0.99 | 0.95 | 0.85 | 0.58 |
| ANFIS-GWO (reference) | 0.25 | 0.17 | 0.987 | 0.95 | 0.85 | 0.57 |
| ANFIS-DE (reference) | 0.26 | 0.19 | 0.984 | 0.95 | 0.84 | 0.56 |
Flood susceptibility maps
The distribution of each class of ANFIS-GBO, ANFIS-CGO, ANFIS-SCA, ANFIS-GWO, and ANFIS-DE
| Model | Very low (%) | Low (%) | Moderate (%) | High (%) | Very high (%) |
|---|---|---|---|---|---|
| ANFIS-GBO | 28.49 | 28.55 | 7.9 | 19.71 | 15.35 |
| ANFIS-CGO | 13.52 | 27.43 | 20.83 | 19.43 | 18.78 |
| ANFIS-SCA | 24.64 | 28.58 | 14.55 | 17.54 | 14.68 |
| ANFIS-GWO | 19.56 | 25.95 | 18.07 | 19.5 | 16.92 |
| ANFIS-DE | 18.79 | 24.14 | 18.59 | 18.65 | 19.82 |
DISCUSSION
Accurate estimation of the level of flood susceptibility in a given area is vital to protect inhabitants and develop effective mitigation strategies. Flood susceptibility mapping has become central to global flood risk management efforts (Pham et al. 2021b; Shahabi et al. 2021; Towfiqul Islam et al. 2021). Powerful new techniques continue to be developed, improving the accuracy of results and better supporting those responsible for flood risk management. In this study, several models – namely ANFIS, ANFIS-GBO, ANFIS-CGO, ANFIS-SCA, ANFIS-GWO, and ANFIS-DE – were developed to prepare flood susceptibility maps for a region that is often affected by flooding but does not have in place adequately powerful or effective measures to mitigate against flood damage.
In this study, 1,843 historic flood locations were gathered from both satellite images and field missions. Several different techniques highlighted in previous studies have been used to detect flood locations. For example, indices such as the Enhanced Vegetation Index (EVI), NDWI, and Normalized Difference Surface Water Index (NDSWI) can be derived from optical remote sensing data such as Landsat imagery to detect flood locations (Tong et al. 2018; Du et al. 2021). However, these data may be influenced by cloud cover, and they also suffer from limited spatial and temporal resolutions. Several researchers have applied radar images such as Sentinel 1 to detect flood locations (Martinis et al. 2018). These images use long wavelengths, so they are not affected by atmospheric conditions and can acquire information day and night. Moreover, they can distinguish water from non-water areas using threshold determination. For these reasons, we used radar data to collect flood locations in this study.
This study utilized RF to assess the importance of each of the 13 factors used to create the flood susceptibility model. The importance depends on the study area's geo-environmental, climatic, hydrological, and anthropic activities, and on the methodology utilized. In addition, the flood susceptibility model used in this study was data-driven, so the importance of each factor depends on data distribution (Andaryani et al. 2021; Luu et al. 2021). Importance was calculated by assigning weights to each factor. The higher the weight, the more important the factor, and vice versa. If the weight equals 0, the factor does not influence the probability of the occurrence of the flood. Shahabi et al. (2021) pointed out that slope, distance to river, drainage density, and TWI were the most important of the 10 factors used to construct a flood map of Iran's Haraz watershed, using the ORAE technique. Pham et al. (2021a) reported that soil type, distance from river, river density, and geology were most influential on flood occurrence in Vietnam's Nghe An province, using the RF technique. Luu et al. (2021) used OneR to assess the importance of conditioning factors when building flood susceptibility models for the Quang Binh province in Vietnam; they pointed out that land use, geology, slope, and rainfall were the most important factors.
Our RF analysis showed that the most important factors that cause flooding in Ha Tinh province were, in descending order, land use (0.25), elevation (0.2), slope (0.14), and distance from road (0.13). These results are consistent with previous studies. Land use ranks highly because flooding can only occur in the region after the soil has been saturated. The clearing of forests in the mountains and the construction of infrastructure allowing for the development of floodplains are the main causes of aggravation of floods. The reduction in the regulation and retention capacity of lakes, which is directly related to land use, is a crucial factor in the intensification of floods. The results show that all residential and agricultural areas are located in places with high and very high flood susceptibility. Residential areas are more vulnerable to floods because they have impermeable surfaces. In addition, agricultural areas are located on the low-lying coastal plains. Elevation and slope came second and third, and are still important: these physical characteristics strongly influence the occurrence of flooding related to the determination capacity of surface runoff. The higher the elevation and the slope, the faster the flow velocity and the faster the water accumulates in the delta area (Abedi et al. 2021). Towfiqul Islam et al. (2021) used machine learning to assess the flood susceptibility in the Teesta River Basin of Bangladesh. The authors emphasized that land use, distance to road, elevation, and slope are the most important factors contributing to the probability of flood occurrence. El-Haddad et al. (2021) reported that distance to river, land use, lithology, and slope were the most influential factors on the likelihood of flood occurrence in Wadi Qena Basin of Egypt. Arora et al. (2021) showed that distance to river, curvature, slope, river density and land use were the most important factors for the flood susceptibility model in the Ganga River Basin of India. Pham et al. (2021c) used 16 conditioning factors to establish flood susceptibility maps for the Quang Nam province of Vietnam. Of the factors used, elevation, rainfall, slope, and land cover/land use had the biggest effect on the likelihood of flood.
Finally, there is a question linked to the importance of precipitation in the flood susceptibility model. Why is it less influential than other factors such as land use, elevation, slope, and distance from road, while still being a flood trigger factor? Based on different types of floods, an appropriate conditioning factor is chosen for the flood susceptibility model. Moreover, for each type of flood, the importance of each conditioning factor is different. Towfiqul Islam et al. (2021) and Saha et al. (2021) eliminated precipitation factors when assessing flood susceptibility in the Teesta River Basin in northern Bangladesh. Luu et al. (2021) reported that rainfall ranked fourth out of the ten factors used to establish flood susceptibility maps in Quang Binh province of Vietnam. In the study of Nguyen et al. (2021a), rainfall ranked sixth out of the 13 factors used to map flood susceptibility in Quang Ngai province.
In the Ha Tinh province, there are fluvial and coastal floodplains. Coastal flooding has less of an impact because Ha Tinh province is less influenced by the micro-tidal regime and storm surges, the two important drivers of coastal flooding. River flooding in the area is caused by torrential rains in the mountains, and the intensity of the rains tends to decrease as they reach the plain. Moreover, several previous studies have pointed out that flooding can only occur after soil saturation (Luu et al. 2021); that is why, in most cases, floods usually occur after heavy rains or during the rainy season. The national Highway 1A acts as a dam, holding back the water draining toward the sea; this is one of the important causes of flooding in Ha Tinh province. Therefore, the importance of rainfall is lower than that of land use, elevation, slope, or distance from road, as has been substantiated by several previous studies. NDVI had less influence on the model. The importance of land use, elevation, slope, and distance from road can be related to the diversity of characteristics of these factors. As mentioned above, the model applied in this study is data-driven, so the distribution of the data strongly influences the results; the lower impact of NDVI may therefore be due to the large area of vegetation (RF importance close to 0).
The final result confirmed our initial hypothesis that the models proposed in this study would successfully build flood susceptibility maps for Ha Tinh and that they would outperform the reference models.
One of the advantages of metaheuristic algorithms is the usefulness of the stochastic system, which helps avoid local optimization problems in order to converge on a near-optimal solution (Lu et al. 2021). The objective here is not to obtain the best solution, but to have a near-optimal solution within a reasonable computation time. The system works first by exploration (to discover the promising spaces within which to search) and exploitation (the search for high-quality solutions in these promising spaces). Improving the performance of a metaheuristic algorithm requires a balance between the process of exploration and exploitation (Bui et al. 2020). The GBO and SCA algorithms had the advantage of a balance between these two processes (Abualigah & Diabat 2021; Deb et al. 2021). That is why two models, ANFIS-GBO and ANFIS-SCA, performed better than others.
Besides the simple structure, fast search speed, and high search precision, a big advantage of the GWO algorithm is that there are fewer setting parameters and it is easier to combine with other algorithms (Arora & Banyal 2021). The ANFIS-GWO model came third in terms of performance.
ANFIS-DE was fourth. Besides its simple structure and ease of use, DE has the advantage of avoiding the local optimization problem (Rout et al. 2013).
The ANFIS-CGO algorithm was less accurate than all the other models (RMSE = 0.26, MAE = 0.2, ACC = 0.95, K = 0.84, R2 = 0.54). CGO faces key limitations in the process of exploring the search space. This leads to local optimization problems (Talatahari & Azizi 2021).
The ANFIS model and associated hybrids have been applied in various studies to predict the likelihood of flood occurrence around the world. Costache et al. (2020) used ANFIS and hybrids thereof to analyze flood susceptibility in the Trotuș River Basin in Romania. The precision of the results varied from 0.85 to 0.94 for the value of AUC. Termeh et al. (2018) evaluated flood susceptibility in the Jahrom Township in Fars province in Iran using ANFIS and three optimization algorithms (Ant Colony Optimization, GA, and PSO). In this case, the value of AUC ranged from 0.91 to 0.94. Another study (Hong et al. 2018b) used ANFIS and two optimization algorithms (GA and DE) to predict flood susceptibility in Hengfeng County in China. In this study, the maximum AUC value was 0.87. Vafakhah et al. (2020) analyzed flood susceptibility in the Gilan Province of Iran using ANFIS. The accuracy score was 63%. We can conclude that the accuracy of the models proposed in this study was consistent with the accuracy of previous significant studies.
The flood susceptibility levels in Ha Tinh province are also consistent with the flood susceptibility maps obtained using the AHP method (Nguyen et al. 2022a). They are also corroborated by previous studies showing the extent to which the coastal plains in the Central region of Vietnam are often subject to large floods (Luu et al. 2021; Pham et al. 2021e; Nguyen et al. 2022b). Therefore, the flood susceptibility maps constructed in this study can be seen to be a suitable alternative solution for local organizations responsible for assessing flood susceptibility.
Flood management is now a critical task. As the climate changes, sea levels rise, and floods become more common and more destructive. In Ha Tinh and elsewhere, informed land-use planners must limit new construction or the concentration of populations in areas with high and very high probability of flooding. The accuracy of the models in this study surpassed the reference models in previous studies. Therefore, the results can support planners to develop necessary strategies to diminish the damage. Although this study applies to Vietnam, the results may be applied to other regions around the world that are similarly affected by flooding.
CONCLUSION
In the context of climate change, the severity and violence of floods are increasing in Asia, and especially in Vietnam. Therefore, building a flood susceptibility model with high accuracy is one of the most important tools available to policy makers who are responsible for formulating strategies to reduce impact. This study proposed five hybrid models – ANFIS-GBO, ANFIS-CGO, ANFIS-SCA, ANFIS-GWO, and ANFIS-DE – to determine areas with probability of occurrence of flooding for Ha Tinh province in Vietnam.
ANFIS and its hybrids are among the most powerful models for determining areas with a probability of flood occurrence; however, their accuracy depends on the structure of the input data, so data preprocessing must be done properly.
This was the first time these five models had been used to build such maps, which represents the main novelty of this study. The models are characterized by high precision (AUC > 0.95; RMSE < 0.26) and complement the current literature on flood susceptibility. Additionally, the proposed models can be applied to assess flood susceptibility levels in other regions of the world, particularly where data are limited.
Ha Tinh province was divided according to five levels of flood susceptibility. Between 32 and 38% of the study area was located in the high and very high flood susceptibility zone. These regions are mainly concentrated along the river and on the eastern coastal plain, where there is a high density of population and infrastructure.
Once potential flood-prone areas have been identified, local government agencies can take appropriate action to reduce the extreme negative impacts of flooding. The results of this study can also be applied to the prediction and assessment of other types of disasters.
This study still has limitations related to the model input data, and reducing these uncertainties would further improve the accuracy of the predictions. More conditioning factors should be considered, and a feature-selection method should then be used to remove the non-predictive ones. Furthermore, only a limited level of detail is available, particularly regarding flood depth and velocity. In addition, the flood inventory used in this study represented flood and non-flood points with binary values (0, 1) and did not include flood frequency; therefore, all flood locations were given equal weight when predicting flood susceptibility. In future research, we will record the flood frequency at each flood point and put these data to use.
Flooding is changing due to climate change and land-use changes; therefore, the application of machine learning to assess the effects of these phenomena on flooding is extremely useful for decision-makers and others tasked with building effective strategies to reduce future damage to property and loss of life.
FUNDING STATEMENT
No funding was received for this study.
AUTHOR CONTRIBUTIONS STATEMENT
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by H.D.N. The first draft of the manuscript was written by H.D.N. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.