ABSTRACT
The objective of this study was to develop a new machine learning model based on a radial basis function neural network (RBFNN) to build flood susceptibility maps and assess flood damage for the Phu Yen province of Vietnam. The model was optimized using five algorithms, namely Giant Trevally Optimization (GTO), Golden Jackal Optimization (GJO), Brown-Bear Optimization (BBO), Gray Wolf Optimizer (GWO), and Whale Optimization Algorithm (WOA), to identify the best model for establishing the flood susceptibility map. The models were evaluated using statistical indices: root mean square error (RMSE), mean absolute error (MAE), receiver operating characteristic (ROC), area under the curve (AUC), and coefficient of determination (R²). The results showed that the hybrid model RBFNN–BBO had the highest performance, with AUC = 0.998 and R² = 0.8, and the RBFNN–GTO model had the lowest performance, with AUC = 0.755 and R² = 0.65. The regions identified as having high and very high flood susceptibility (1,075 km²) were concentrated on the plain and along the three largest rivers in Phu Yen province.
HIGHLIGHTS
The RBFNN–BBO model performed well in improving accuracy and predicting flood susceptibility.
Model performance was evaluated using RMSE, R², and MAE.
The developed models achieved high accuracy for flood damage assessment.
INTRODUCTION
Both the intensity and number of flooding events worldwide have seen a sharp increase in recent years. The phenomenon has become a major concern for local and national governments alike (Mahdizadeh Gharakhanlou & Perez 2023; Riazi et al. 2023). These concerns have taken on a global dimension due to climate change, economic development, and measures to regulate waterways (Saikh & Mondal 2023). Typically, floods are accompanied by loss of life, destruction of infrastructure, and devastation to the agricultural sector, which can in turn lead to economic instability and food insecurity (Adnan et al. 2023; Rana & Mahanta 2023). In 2021, floods caused US$82 billion in damage globally, and according to the World Meteorological Organization, the number of people at risk in Asia is expected to increase from 70 million in 2023 to 156 million in 2040, with similar trends in Africa (25–34 million) and South America (6–12 million). These figures may yet be conservative, as they do not take into account population growth and increasing urbanization.
The risk is highest in Asia, as are flood-related deaths, where warning systems are still inadequate (Saleem Ashraf et al. 2017). Vietnam is particularly vulnerable to flooding, due to its geographic location (Hung et al. 2023; Nguyen et al. 2023a), where natural contributing factors include altitude, precipitation, tide, and typhoons. According to a report from Vietnam's Ministry of Natural Resources and Environment, floods in 2020 led to the destruction of 360 schools, the loss of two million head of livestock, and the devastation of nearly 30,000 ha of agricultural land.
There is a growing number of studies on flood risk assessment. Traditional approaches based on physical models include MIKE FLOOD (Nigussie & Altunkaynak 2019; Jahandideh-Tehrani et al. 2020) and HEC-RAS (Hicks & Peacock 2005; Ogras & Onen 2020). These models simulate the physical process of flow generation, which controls watershed response, and use physics-based equations to describe the process (Nguyen et al. 2023c). One issue with such models is that they require the adjustment of several parameters during the flood simulation process, which requires users to have a good understanding of hydrological processes. Furthermore, physics-based models require detailed site data, which can be expensive to obtain, so they are not well suited to countries with limited financial resources. Data-driven approaches have become more prevalent in recent years. These techniques have been widely used in previous studies to predict natural hazards such as landslides (Rafiei-Sardooi et al. 2018; Ren et al. 2022; Ha et al. 2024a, 2024b). More recently, they have been used to predict flood susceptibility by identifying relationships between training data and flood-influencing factors (Adnan et al. 2023; Mahdizadeh Gharakhanlou & Perez 2023). Examples include logistic regression (Özay & Orhan 2023), support vector machines (Mirkazemi et al. 2023; Salvati et al. 2023), random forest (Aydin & Iban 2023; Razavi-Termeh et al. 2023a), and AdaBoost (Aydin & Iban 2023). The problem with these models is that, used individually, they are limited in their ability to handle nonlinear problems.
More recently, deep learning models such as convolutional neural networks (Jiang et al. 2023), radial basis function neural networks (RBFNNs) (Nguyen 2022), adaptive neuro-fuzzy inference system (Hong et al. 2018; Termeh et al. 2018), and recurrent neural networks (Fang et al. 2021) have been used to build flood susceptibility models. Deep learning has advantages in its ability to process complex data, which is critical when building accurate models because floods are influenced by multiple factors. Such models are also flexible enough to be adapted to different regions and changing environmental conditions. Furthermore, they can be continuously updated with new data, making them suitable for continuous flood monitoring (Bentivoglio et al. 2021; Pham et al. 2021a, 2021b; Shahabi et al. 2021). However, a number of researchers have questioned the accuracy of such individual models, highlighting potential global optimization and extrapolation problems. For these reasons, some studies have integrated optimization algorithms.
The initial objective of this study was to compare the optimization capabilities of five algorithms to improve the prediction of flood susceptibility using the RBFNN model in the Phu Yen province. These algorithms were Giant Trevally Optimization (GTO), Golden Jackal Optimization (GJO), Brown-Bear Optimization Algorithm (BBO), Gray Wolf Optimizer (GWO), and Whale Optimization Algorithm (WOA). The second objective was to estimate flood damage using the flood susceptibility map produced by the proposed models.
Optimization algorithms have been used to improve the prediction capacity of RBFNNs in several past studies. Wang et al. (2023) proposed a PSO-RBF model that combines particle swarm optimization (PSO) with an RBFNN to predict diesel engine performance. Sun (2023) developed an optimized neural network algorithm by determining the optimal training function and number of hidden layer nodes using an empirical formula method. Liu et al. (2023) developed an IBWO-RBF model that uses the improved black widow optimization (IBWO) algorithm to enhance the generalization ability of the RBFNN. These studies demonstrate the effectiveness of optimization algorithms in improving the prediction capacity of RBFNNs. However, to our knowledge, ours is the first study to use the GTO, GJO, BBO, GWO, and WOA algorithms to improve the predictive ability of an RBFNN for flood susceptibility mapping.
In addition, machine learning and deep learning techniques have been applied to estimate flood-related damage in various regions of Vietnam. For example, Vu et al. (2023) developed machine learning models using support vector machine and random forest algorithms to evaluate the effects of climate and land-use change on flood susceptibility in Thua Thien Hue province. Shahi et al. (2023) developed a Bayesian network model to estimate flood loss in Ho Chi Minh City, considering factors such as flood intensity, household characteristics, and damage. Ha et al. (2023) investigated the importance of different attributes of flood damage, including housing damage and fatalities, using machine learning techniques. One study on flood mapping in Vietnam used a convolutional neural network to isolate flooded pixels in satellite radar imagery (La & Van Ngo 2022). However, to our knowledge, ours is the first study to apply an RBFNN with optimization algorithms to estimate flood-related damage in a region of Vietnam. This localized approach makes it possible to develop appropriate flood risk management and adaptation strategies, helping to strengthen the resilience of communities to floods. The results of this study should therefore improve the understanding of flood dynamics in land-use planning and support decision-makers in sustainable land management, in order to reduce flood-related damage.
STUDY AREA AND MATERIALS
Study area
Phu Yen province lies in the tropical monsoon region and is influenced by the ocean climate, so it has two distinct seasons: the rainy season from September to December and the dry season from January to August. The average rainfall in the study area ranges from 1,500 to 3,000 mm/year, of which about 70–80% falls in the rainy season (Phu Yen Statistical Office 2022). The flow in the study area varies in both space and time. Flow distribution is correlated with the distribution of rainfall, with flow during the four months of the rainy season accounting for 70–75% of total annual flow. The hydrological system of Phu Yen province is highly developed. There are three main rivers (the Ky Lo, the Ba, and the Ban Thach), with a total basin area of 16,400 km² and a total flow of 11.8 billion m³. The land resources in Phu Yen are diverse, with eight main soil groups: red and yellow, sandy, saline, alluvial, gray, black, red and yellow mountainous, and sloping valley (Pham et al. 2021a, 2021b).
MATERIAL
Flood inventory
Flood inventory maps show past flood events. They are very important for predicting future floods, as they help to highlight the relationships between flooded areas and flood triggers (Nguyen 2023; Nguyen et al. 2023b). In this study, we obtained 505 flood points from flood events in the study area in 1993, 2013, and 2021 using data from the Department of Natural Resources and Environment of Phu Yen province, our 2022 and 2023 field missions, and (for 2022) via Sentinel-1 SAR images. Our study used deep learning to predict flood susceptibility, requiring a large dataset, so data augmentation was essential.
We used a binary classification model, so the collection of non-flood points was necessary. These were randomly selected from regions of high elevation and slope in the northwest and south of the province. In total, 394 non-flood points were collected. All the flood and non-flood points, shown in Figure 1, were divided into two parts: 70% of the points were used to train the models and the remaining 30% to validate them. This ratio has been used in several previous studies (Samanta et al. 2018; Nguyen et al. 2023a).
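The 70/30 division described above can be illustrated with a minimal sketch. It assumes a simple shuffled split with a fixed seed; the source does not state whether the split was stratified or how it was seeded, and the `inventory` list and `split_points` helper are hypothetical placeholders rather than the authors' code.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle labelled points and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = points[:]              # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder inventory: 505 flood points (label 1) and 394 non-flood points (label 0).
# Each entry is (point_id, label); real entries would also carry the 13 factor values.
inventory = [(i, 1) for i in range(505)] + [(505 + i, 0) for i in range(394)]

train, valid = split_points(inventory)
print(len(train), len(valid))  # 629 270
```

With 899 points in total, a 70% share rounds to 629 training points and leaves 270 for validation.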
Flood conditioning factors
The careful selection of conditioning factors is important to identify the potential for the occurrence of flooding in a specific region. Chosen correctly, they allow us to better understand the characteristics of the environment, hydrology, climate, and human activity in a given region and present the relationships with flood locations in the past (Khosravi et al. 2019; Nachappa et al. 2020). There is no one set of factors universally agreed upon as the most effective.
ID | Variable | Format | Resolution | Source |
---|---|---|---|---|
1 | Elevation | Raster | 10 × 10 m | Elevation, aspect, curvature, slope, and flow direction were extracted from the digital elevation model (DEM) at a resolution of 10 m, and constructed using a topography map at a scale of 1:50,000 (available at the Ministry of Natural Resources and Environment in Vietnam). These processes were carried out using the Spatial Analyst tool in ArcGIS 10.6. Distance to the river and distance to the road were extracted from a topography map with a scale of 1:50,000 using the ArcGIS Euclidean Distance tool. |
2 | Curvature | | | |
3 | Aspect | | | |
4 | Slope | | | |
5 | Flow direction | | | |
6 | Distance to river | | | |
7 | Distance to road | | | |
8 | Rainfall | | | Annual rainfall data were downloaded from https://chrsdata.eng.uci.edu/. |
9 | Land use | | | The land use in 2021 was sourced from the Ministry of Natural Resources and Environment of Vietnam. |
10 | NDBI | | | NDVI, NDBI, and NDMI were constructed from Landsat 8 OLI imagery from March 2021, which was downloaded from https://earthexplorer.usgs.gov/. NDVI, NDBI, and NDMI were calculated using the ArcGIS Raster Calculator. |
11 | NDVI | | | |
12 | NDMI | | | |
13 | Soil type | | | Soil type maps were sourced from the Ministry of Natural Resources and Environment of Vietnam. |
The topographic factors were elevation, aspect, curvature, and slope. Elevation is an essential component in identifying areas with the potential for the occurrence of flooding, as water tends to accumulate in areas with low elevation. In addition, elevation also influences other characteristics such as slope, aspect, and curvature (Zhao et al. 2018; Dahri et al. 2022). Aspect describes the direction of the slope, as well as the precipitation that areas may receive (Nachappa et al. 2020). Curvature strongly influences the capacity of surface water accumulation; flat or concave surfaces are more likely to flood (Mirzaei et al. 2021). Slope controls the speed of surface flow; low-slope regions are also more vulnerable to flooding (Mirzaei et al. 2021).
The hydrometeorological factors were rainfall and flow direction. Rainfall is considered an essential factor in identifying regions with the potential for the occurrence of flooding. It determines the flow of the river, which is the main cause of flooding. Concentrated rainfall over a short period, combined with a short river and a steep slope, leads to flooding downstream (Nhu et al. 2020; Tella & Balogun 2020). Flow direction simply describes direct surface flow (Luu et al. 2022).
Land use was the only anthropogenic factor in this study. It directly influences the evaporation capacity, infiltration, flow speed, and quality of river flow (Saleh et al. 2020). Agricultural and forested areas have different characteristics than built-up areas. Concrete-covered land has reduced the water-infiltration capacity of the soil, increasing the volume of river flow. Changing land use can cause changes in natural water flow. In the study area, the reduction in the forested area due to agricultural development and urban growth is reducing the water retention capacity of the forest (Nguyen et al. 2022a, 2022b). All of this leads to an increased flood risk in the study area.
Soil type is an important geological factor that determines the water-infiltration capacity of the soil, which can affect the occurrence and severity of a flood event (Hammami et al. 2019).
In terms of conditioning factors related to location, distance to the river is key. In the Central region in general and the province of Phu Yen in particular, flooding is fluvial, so rivers and precipitation play an essential role. In general, the closer an area is to a river, the more susceptible it will be to flooding (Giovannettone et al. 2018; Choubin et al. 2019). Similarly, although being near a road can have the advantage of providing faster evacuation for residents, road distance is a key factor in flooding, as road construction can directly impact infiltration capacity, which increases the potential for flooding to occur (Choudhury et al. 2022).
The normalized difference vegetation index (NDVI) plays an important role in the probability of the occurrence of floods in any region, as it represents the density of vegetation. Regions with a high density of vegetation can reduce surface runoff volumes, which decreases the likelihood of flooding (Khosravi et al. 2019). In contrast, the normalized difference built-up index (NDBI) reflects building density. Regions with high building density can increase surface runoff volumes by reducing the infiltration capacity of the soil (Li et al. 2023b). Finally, the normalized difference moisture index (NDMI) indicates the moisture content of a region. The value of the NDMI is proportional to the probability of the occurrence of a flood (Gharakhanlou & Perez 2023). The values of NDVI, NDBI, and NDMI vary from −1 to 1.
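As an illustration of how the three indices are formed, the sketch below uses their commonly cited normalized-difference definitions on a single pixel. The source does not state which band combinations were used in the ArcGIS Raster Calculator, so the formulas are an assumption; for Landsat 8 OLI, red, near-infrared, and shortwave-infrared 1 correspond to bands 4, 5, and 6. The reflectance values and function names are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def spectral_indices(red, nir, swir1):
    """Compute NDVI, NDBI and NDMI for one pixel from band reflectances."""
    return {
        "NDVI": normalized_difference(nir, red),     # vegetation density
        "NDBI": normalized_difference(swir1, nir),   # built-up density
        "NDMI": normalized_difference(nir, swir1),   # moisture content
    }

# Hypothetical reflectances for a vegetated pixel.
pixel = spectral_indices(red=0.05, nir=0.45, swir1=0.20)
print(pixel)
```

Each index is bounded by −1 and 1 because it is a ratio of a difference to a sum of non-negative reflectances. Under these particular definitions NDBI and NDMI differ only in sign.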
METHODOLOGY
(i) Geospatial data preparation: Geospatial data used to build a precise flood susceptibility map were divided into two types: historical flood locations and conditioning factors. The flood locations were collected from the field and by flood detection using Sentinel-1A imagery on the Google Earth Engine platform. The 13 conditioning factors were chosen in five distinct groups: topographic, hydrometeorological, anthropogenic, geological, and location-specific factors. These factors were collected from various sources, such as satellite imagery, the Ministry of Natural Resources and Environment, and departments specializing in these areas. However, these factors were measured in different units. It was therefore necessary to normalize the values to make the factors comparable, rescaling each factor to the range 0 to 1 while preserving the relative differences between its values. There are several normalization techniques; in this study, we used max-min normalization based on the equation X′ = (X − Xmin)/(Xmax − Xmin), where X′ is the normalized value and Xmin and Xmax are the minimum and maximum values of the factor.
(ii) Construction of models: The construction of flood susceptibility models was divided into two stages: first, the construction of the RBFNN model, and then the optimization of the RBFNN model using the algorithms BBO, GWO, WOA, GTO, and GJO.
We determined the optimal number of neurons in the hidden layer of the network and selected the most appropriate radial basis functions. We assigned weights to those functions.
The network training process began by initializing parameters, such as radial function centers, associated weights, and standard deviations. These parameters can be set randomly or by techniques such as trial and error. We used trial and error, setting the number of neurons to 12, the variable dimension to 13, the maximum number of iterations to 500, and the fitness function to the root mean square error (RMSE).
The five optimization algorithms were then used to optimize the parameters of the RBFNN model. This approach aimed to reduce the value of loss functions by iteratively adjusting the weights of the neural network. This iterative process involved repeatedly presenting training data to the network, comparing predicted values with observed values, and adjusting weights to reduce the gap between predicted and observed values.
At the beginning of the training process, the weights of the neural network were initialized with random values. The choice of weights is very important because it directly affects the training time and accuracy of the RBFNN model. In each iteration, the weights were updated using each of the proposed optimization algorithms. Because the weights changed in every iteration, the initial values passed to each optimization algorithm were fixed, to ensure that the RMSE values were the same in the first iteration. The process ended once the RMSE had stopped decreasing or the maximum number of iterations was reached.
For each of the five algorithms, the number of neurons (12), dimension of variables (13), problem size (181), number of iterations (500), and population size (100) stayed the same. The difference lay in the specific parameters: ‘lb’: [−2] * problem_size, ‘ub’: [2] * problem_size for GTO, WOA, GJO, and GWO; and ‘lb’: [−1] * problem_size, ‘ub’: [1] * problem_size for BBO. The structure of the RBFNN model included 181 weight values, which corresponded to the number of weight matrix elements transferred to the RBFNN model. The five optimization algorithms were used to optimize these weights to reduce the RMSE value. The parameters lb and ub represent the lower and upper limits of the weights. These limits significantly influenced the convergence capacity of the model. In this study, we developed the models using the Python programming language, with libraries such as TensorFlow and sklearn, and self-developed libraries for the models.
(iii) Evaluation and comparison of the RBFNN and hybrid models: The statistical indices RMSE, mean absolute error (MAE), AUC, and R² were used to evaluate and compare the precision of the proposed models in order to construct maps of flood susceptibility.
(iv) Analysis of maps: After validation of the models, the models were used to generate flood susceptibility maps for the study area.
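The max-min normalization described in step (i) can be sketched as follows. This is a minimal illustration on a plain list of values; the function name and the sample elevations are hypothetical, and the handling of a constant factor (returning zeros) is an assumption that the source does not address.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant factor: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

elevation_m = [2.0, 15.0, 120.0, 850.0]    # hypothetical elevations in metres
normalized = min_max_normalize(elevation_m)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, so factors measured in metres, millimetres, or index units end up on the same scale.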
RADIAL BASIS FUNCTION NEURAL NETWORKS
RBFNNs are a type of artificial neural network, first presented by Broomhead & Lowe (1988). This powerful algorithm maps input data into a higher-dimensional space to solve nonlinear problems or approximate complex functions. The efficiency of RBFNNs has been demonstrated in multiple previous studies. Cho & Wang (1996) used RBFNNs in plant identification and chaotic time series prediction, obtaining good results with a low RMSE value (0.0668). Buchtala et al. (2005) showed the exceptional performance of RBFNNs in solving data mining problems in the fields of computer networking, biology, marketing, and chemistry. Lin et al. (2020) used RBFNNs to forecast concentrations of haloacetic acids and demonstrated the superiority of RBFNNs over multiple linear regression models in improving the accuracy of haloacetic acid estimation.
It is for these reasons that we chose RBFNNs for this study. In addition, the architecture of an RBFNN is simple, with three main layers: input, hidden, and output. The radial basis function was implemented in the hidden layer as an activation function for each hidden node with a center point and a width parameter. The center selection process, or how to choose an optimized number of radial basis functions, is one of the limitations of RBFNNs. However, this can be controlled by optimization algorithms (Bemporad 2020; Yang et al. 2020).
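To make the three-layer architecture concrete, the sketch below shows one way a flat vector of 181 values could parameterize an RBFNN with 13 inputs and 12 hidden neurons, and how RMSE could serve as the fitness function that an optimizer minimizes. The layout (centres, widths, output weights, bias) and the Gaussian basis function are assumptions chosen because they reproduce the count of 181; the source does not specify how its weight vector is organized. All names and the synthetic data are hypothetical.

```python
import math
import random

N_INPUTS = 13    # conditioning factors
N_HIDDEN = 12    # hidden RBF neurons

# Assumed layout of the flat vector: centres, widths, output weights, bias.
PROBLEM_SIZE = N_HIDDEN * N_INPUTS + N_HIDDEN + N_HIDDEN + 1   # 156 + 12 + 12 + 1 = 181

def decode(solution):
    """Split a flat 181-element vector into RBFNN parameters."""
    n_c = N_HIDDEN * N_INPUTS
    centres = [solution[j * N_INPUTS:(j + 1) * N_INPUTS] for j in range(N_HIDDEN)]
    widths = solution[n_c:n_c + N_HIDDEN]
    weights = solution[n_c + N_HIDDEN:n_c + 2 * N_HIDDEN]
    bias = solution[-1]
    return centres, widths, weights, bias

def predict(solution, x):
    """Gaussian RBF hidden layer followed by a single linear output node."""
    centres, widths, weights, bias = decode(solution)
    out = bias
    for c, s, w in zip(centres, widths, weights):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        sigma2 = s * s + 1e-12         # widths drawn from [lb, ub] may be zero or negative
        out += w * math.exp(-dist2 / (2.0 * sigma2))
    return out

def rmse_fitness(solution, samples, labels):
    """Fitness to minimize: RMSE between network outputs and 0/1 labels."""
    sq = [(predict(solution, x) - y) ** 2 for x, y in zip(samples, labels)]
    return math.sqrt(sum(sq) / len(sq))

rng = random.Random(0)
lb, ub = [-2.0] * PROBLEM_SIZE, [2.0] * PROBLEM_SIZE          # bounds of the search space
candidate = [rng.uniform(l, u) for l, u in zip(lb, ub)]       # one random solution
samples = [[rng.random() for _ in range(N_INPUTS)] for _ in range(20)]  # synthetic inputs in [0, 1]
labels = [rng.randint(0, 1) for _ in range(20)]                         # synthetic flood labels
print(PROBLEM_SIZE, rmse_fitness(candidate, samples, labels))
```

Under this layout a metaheuristic only needs the `rmse_fitness` function and the bounds; it never has to know anything about the network's internals.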
Giant Trevally Optimization
GTO is a new nature-inspired metaheuristic optimization algorithm, developed by Sadeeq & Abdulazeez (2022) based on the hunting strategy of the giant trevally fish (also known as the giant king fish). The purpose of GTO is to find optimal solutions for complex problems. The first step is similar to many swarm-based optimization algorithms, in that it randomly generates an initial population of fish. The second step is an extensive search, which simulates the hunting movement of the fish in the search space and applies a Lévy distribution to avoid convergence to local optima (Sadeeq & Abdulazeez 2022). The third step is to select the best area for foraging in the search space. The final step is to mimic the behavior of the fish as they attack. The advantages of GTO were demonstrated by a performance comparison with six other optimization algorithms: Differential Evolution, Gravitational Search Algorithm, GWO, moth flame optimization, particle swarm optimization, and WOA. Although GTO has only existed for a short time, it is already being applied in several fields, such as energy (Sadeeq & Abdulazeez 2022) and engineering (Bhavana et al. 2023). At present, no research has implemented GTO for flood damage assessment, which is why we included GTO in this study, to evaluate its effectiveness in this field.
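The Lévy-distributed steps mentioned in the extensive search phase are often generated with Mantegna's algorithm. The sketch below shows that generic technique only; it is not the GTO position update itself, and the exponent `beta = 1.5` and the function name are assumptions rather than values taken from the source.

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:                    # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

rng = random.Random(1)
steps = [levy_step(rng=rng) for _ in range(1000)]
print(min(steps), max(steps))
```

Most draws are small, but occasional very large steps occur, which is what lets a search agent jump out of a local optimum.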
Golden Jackal Optimization
GJO is another new metaheuristic optimization algorithm that was introduced by Chopra & Ansari (2022). The purpose of this algorithm is to mimic the foraging behavior of golden jackals to solve complex engineering problems in the real world. The deployment process of GJO consists of three main steps: prey searching, enclosing, and pouncing.
The first step, also known as exploration, is to seek potential solutions in the search space. The second and third (exploitation) steps are to find the best solution for the problem. GJO has many advantages, such as powerful convergence speed, few tuning parameters, easy parallelization, good scalability, and easy implementation (Arini et al. 2022). The algorithm has demonstrated its efficiency by evaluating 23 unimodal, multimodal, and fixed-dimension multimodal functions (Chopra & Ansari 2022). GJO has so far been applied in computer science (Yuan et al. 2022), medical (Houssein et al. 2022), chemical (Najjar et al. 2022), and energy (Rezaie et al. 2022b) contexts.
BBO algorithm
BBO is another novel nature-inspired optimization algorithm, presented by Prakash et al. (2023). The purpose of this algorithm was to solve the economic dispatch problem. As with other algorithms that simulate animal behavior, the core idea of BBO is to mimic the behavior of brown bears in their natural environment. A number of groups of bears are randomly generated. Then comes the first step of foraging, which is the pedal scent-marking behavior phase, where gait, careful stepping, and the twisting of feet are key characteristics. The purpose of this phase is to identify the best local solution and help avoid the local optimum problem. The second foraging phase is sniffing behavior (exploration), which determines the best global solution for the problem. The bears randomly sniff pedal marks in their territory, start moving toward the nearest pedal mark, and leave the others, until the final solution is found (Prakash et al. 2023).
The performance of BBO has been compared with six optimization algorithms (PSO, imperialist competitive algorithm (ICA), WGO, WOA, ant lion optimizer (ALO), and GWO) and showed superior results to GWO, ALO, WOA, and PSO on the 19th, 20th, and 21st Congress on Evolutionary Computation (CEC) benchmark functions. To our knowledge, it has not yet been used in applied research in other fields, which is why BBO was selected for this study.
Gray Wolf Optimizer
GWO is one of the most popular nature-inspired optimization algorithms. It was proposed by Mirjalili et al. (2014) based on the foraging behavior and pack structure of gray wolves in the wild. A gray wolf pack comprises alpha, beta, and delta animals. The alpha wolf represents the best solution to the problem. Beta and delta wolves represent other potential solutions. The position of alpha, beta, and delta wolves is updated through three main steps: encircling, attacking, and searching for prey. In the encircling step, the wolves adjust their positions to surround the best solution found so far. This step promotes the exploration of the search space. In the attacking step, the wolves move toward the best solution, exploiting the promising regions of the search space. Finally, in the search for prey, the wolves adjust their positions based on a mathematical formula that considers the positions of the alpha, beta, and delta wolves. GWO continues to iterate until a stopping condition is reached, such as the end of iteration or the best solution found. The algorithm has been applied in many research fields, including computer science (Gupta & Deep 2020), energy (Jayabarathi et al. 2016), chemical engineering (Sharma et al. 2020), and medicine (Babu et al. 2018).
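The leader-guided position update described above can be sketched in a few lines. This is a minimal illustration of the standard GWO update rule (Mirjalili et al. 2014) applied to a toy sphere function, not the configuration used in this study; the function name, population size, iteration count, and seed are hypothetical.

```python
import random

def gwo_minimize(fitness, dim, lb, ub, pop_size=20, iterations=100, seed=0):
    """Minimize `fitness` over [lb, ub]^dim with a basic Gray Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    best = None
    for t in range(iterations):
        # Rank the pack: the three best wolves act as alpha, beta and delta.
        leaders = [w[:] for w in sorted(wolves, key=fitness)[:3]]
        if best is None or fitness(leaders[0]) < fitness(best):
            best = leaders[0][:]
        a = 2.0 * (1 - t / iterations)     # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                pulled = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a      # |A| > 1 favours exploration
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])     # distance to this leader
                    pulled += leader[d] - A * D
                # New coordinate: average of the three leader-guided moves, clipped to bounds.
                w[d] = min(max(pulled / 3.0, lb), ub)
    final = min(wolves, key=fitness)
    return final if fitness(final) < fitness(best) else best

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_position = gwo_minimize(sphere, dim=5, lb=-2.0, ub=2.0)
print(sphere(best_position))
```

In the setting of this study the toy objective would be replaced by the RMSE of the RBFNN, and the vector being optimized would hold the 181 network weights.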
Whale Optimization Algorithm
WOA is a common swarm-based metaheuristic algorithm that mimics the social structure and bubble-net hunting strategy of humpback whales. It was first presented by Mirjalili, creator of GWO, and Andrew Lewis (Mirjalili & Lewis 2016). WOA has been applied in a number of research fields. For example, Nasiri & Khiyabani (2018) used the algorithm to solve the clustering problem in data mining, and Mafarja & Mirjalili (2017) combined WOA with simulated annealing to address the feature-selection problem.
The hunting strategy of humpback whales consists of three main phases: encircling, bubble-net attacking, and searching for prey. In the encircling phase, humpback whales identify the location of the prey in order to surround it. The bubble-net (exploitation) phase combines two mechanisms, the shrinking encircling mechanism and the spiral updating position: the whales attack the prey by swimming around it within a shrinking circle and along a spiral-shaped path. In the final (exploration) phase, whales search randomly according to the positions of other individuals to find the best solution (Rezaie et al. 2022a).
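The three behaviors can be written as a single position-update rule. The sketch below follows the standard WOA formulation (Mirjalili & Lewis 2016) with scalar coefficients per whale, which is a common simplification; it is an illustration on a toy sphere function rather than the setup used in this study, and all names and parameter values are hypothetical.

```python
import math
import random

def woa_update(whale, best, population, a, rng, b=1.0):
    """Return the new position of one whale under the standard WOA rules."""
    A = 2 * a * rng.random() - a
    C = 2 * rng.random()
    if rng.random() < 0.5:
        if abs(A) < 1:                       # shrinking encircling (exploitation)
            target = best
        else:                                # search for prey (exploration)
            target = rng.choice(population)
        return [t - A * abs(C * t - x) for t, x in zip(target, whale)]
    l = rng.uniform(-1, 1)                   # spiral updating position (exploitation)
    return [abs(bst - x) * math.exp(b * l) * math.cos(2 * math.pi * l) + bst
            for bst, x in zip(best, whale)]

def woa_minimize(fitness, dim, lb, ub, pop_size=20, iterations=200, seed=0):
    """Minimize `fitness` over [lb, ub]^dim with a basic Whale Optimization Algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)[:]
    for t in range(iterations):
        a = 2.0 * (1 - t / iterations)       # decreases linearly from 2 towards 0
        pop = [[min(max(v, lb), ub) for v in woa_update(w, best, pop, a, rng)]
               for w in pop]
        challenger = min(pop, key=fitness)
        if fitness(challenger) < fitness(best):
            best = challenger[:]
    return best

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_whale = woa_minimize(sphere, dim=5, lb=-2.0, ub=2.0)
print(sphere(best_whale))
```

As the coefficient `a` shrinks, `|A|` falls below 1 more often, so the population shifts from random search toward refinement around the best solution found so far.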
Model assessment
The evaluation metrics applied to assess the performance of the models were RMSE, MAE, receiver operating characteristic (ROC), area under the curve (AUC), and R². RMSE measures the average difference between the predicted and actual values in a regression problem. It is the square root of the mean of the squared differences and provides a measure of overall model performance. MAE is the average absolute difference between the predicted and actual values. It is also used in regression tasks and is easier to interpret than RMSE. R² measures the proportion of the variance in the actual values that is explained by the model. ROC is a graphical representation of the performance of a binary classification model. It plots the true positive rate against the false positive rate at various classification thresholds. The area under the ROC curve, known as AUC, is a popular metric for assessing a model's discriminatory power. A higher AUC indicates better model performance.
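The four numeric indices can be computed as in the following sketch, which uses only the Python standard library and a small set of hypothetical labels and scores. AUC is computed here through its pairwise-ranking interpretation, which is equivalent to the area under the ROC curve; `sklearn.metrics` offers ready-made equivalents such as `mean_absolute_error`, `r2_score`, and `roc_auc_score`.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def auc(y_true, scores):
    """Share of (flood, non-flood) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = flood, 0 = non-flood) and model outputs.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

print(round(rmse(y_true, y_score), 3))       # 0.344
print(round(mae(y_true, y_score), 3))        # 0.283
print(round(r_squared(y_true, y_score), 3))  # 0.527
print(round(auc(y_true, y_score), 3))        # 0.889
```

In this example one flood point (score 0.4) is ranked below one non-flood point (score 0.5), so 8 of the 9 pairs are ordered correctly and the AUC is 8/9.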
RESULTS
Relationship between training data and conditioning factors
During the occurrence of flooding, the most recognized features were elevation and slope. This has been found in several previous studies (Nguyen et al. 2022b). Low elevation and slope lead to major flooding in the study area. In addition, a large part of the study area was agricultural land, and in recent years, much of this area has been built on. These characteristics make flooding more likely.
Extensive road building in recent years has inhibited the evacuation of water into the sea. The replacement of soil surface with concrete surface has resulted in soil that cannot absorb rainwater, further increasing the risk of flooding. The study area has a high river density, which increases the volume of water during heavy rain. For these reasons, elevation, slope, land use, distance to the road, and distance to the river were the factors with the most influence on the occurrence of flooding. However, there is no predictive value in the case of curvature, aspect, and flow direction due to the distribution of flood points along the elevation, slope, and land-use directions.
Model validation and comparison
In terms of training data, with an AUC value of 0.998, the RBFNN–BBO model outperformed the other models. It was followed by RBFNN–GWO (AUC = 0.994), RBFNN–WOA (AUC = 0.993), RBFNN (AUC = 0.92), RBFNN–GJO (AUC = 0.86), and finally RBFNN–GTO (AUC = 0.8).
In terms of validation data, the RBFNN–BBO model also performed best, with an AUC value of 0.998. Next came RBFNN–GWO (AUC = 0.993), RBFNN–WOA (AUC = 0.98), RBFNN (AUC = 0.91), RBFNN–GJO (AUC = 0.82), and then RBFNN–GTO (AUC = 0.75).
For the BBO model, the RMSE value was more stable after the first 100 iterations, and it did not change after reaching 0.17.
The GWO model required 400 iterations before the RMSE stabilized, and it did not change after reaching 0.22.
The RMSE curve of the WOA model was similar in shape to that of the GWO model. In particular, the RMSE dropped abruptly during the first 150 iterations (from 1 to 0.45) and then decreased slowly between the 150th and 400th iterations. It stabilized after 400 iterations, reaching a value of 0.24 at 500 iterations.
The shape of the RMSE of the RBFNN model remained almost unchanged throughout the iterative process, finally reaching an RMSE value of 0.43 after 500 iterations.
For the GJO model, the RMSE value experienced a significant change between the 150th and 250th iterations, before stabilizing at 250 iterations. The RMSE reached 0.44 after 500 iterations.
The GTO model barely changed over the course of 500 iterations, starting at 0.46 and ending at 0.45.
This study also used the MAE and R2 indices to evaluate the performance of the proposed models (Table 2). The RBFNN–BBO model exhibited the highest accuracy on both training and validation data (MAE = 0.48, R2 = 0.81 for training data and MAE = 0.49, R2 = 0.8 for validation data), followed by RBFNN–GWO (MAE = 0.49, R2 = 0.79 and MAE = 0.49, R2 = 0.79), RBFNN–WOA (MAE = 0.49, R2 = 0.76 and MAE = 0.49, R2 = 0.75), RBFNN (MAE = 0.5, R2 = 0.75 and MAE = 0.5, R2 = 0.74), RBFNN–GJO (MAE = 0.51, R2 = 0.69 and MAE = 0.52, R2 = 0.65), and finally RBFNN–GTO (MAE = 0.51, R2 = 0.68 and MAE = 0.51, R2 = 0.65).
Model | Training RMSE | Training MAE | Training AUC | Training R2 | Validation RMSE | Validation MAE | Validation AUC | Validation R2 |
---|---|---|---|---|---|---|---|---|
RBFNN | 0.42 | 0.5 | 0.922 | 0.75 | 0.433 | 0.506 | 0.91 | 0.74 |
RBFNN–BBO | 0.16 | 0.48 | 0.998 | 0.81 | 0.17 | 0.49 | 0.998 | 0.8 |
RBFNN–GJO | 0.43 | 0.51 | 0.863 | 0.69 | 0.44 | 0.5 | 0.82 | 0.68 |
RBFNN–GTO | 0.44 | 0.51 | 0.809 | 0.68 | 0.45 | 0.51 | 0.755 | 0.65 |
RBFNN–GWO | 0.21 | 0.49 | 0.994 | 0.79 | 0.22 | 0.49 | 0.993 | 0.76 |
RBFNN–WOA | 0.23 | 0.49 | 0.993 | 0.76 | 0.24 | 0.49 | 0.985 | 0.75 |
Flood susceptibility mapping
For the RBFNN model, approximately 482 km2 of Phu Yen province was in the very low flood susceptibility zone, 773 km2 was in the low zone, 623 km2 in the moderate zone, 794 km2 in the high zone, and 501 km2 in the very-high zone.
For the RBFNN–BBO model, 493 km2 of the study area was in the very low zone, 857 km2 low, 747 km2 moderate, 633 km2 high, and 442 km2 very high.
For the RBFNN–GJO model, 657 km2 was very low, 868 km2 low, 560 km2 moderate, 220 km2 high, and 868 km2 very high.
For the RBFNN–GTO model, the very low zone covered approximately 463 km2 of the study area, the low zone 889 km2, the moderate zone 534 km2, the high zone 823 km2, and the very-high zone 463 km2.
The figures for RBFNN–GWO were 468 km2 very low, 858 km2 low, 717 km2 moderate, 671 km2 high, and 459 km2 very high.
For RBFNN–WOA, 204 km2 was very low, 743 km2 low, 649 km2 moderate, 1,114 km2 high, and 462 km2 very high.
Model | Very low (km2) | Low (km2) | Moderate (km2) | High (km2) | Very high (km2) |
---|---|---|---|---|---|
RBFNN | 482.3476 | 773.0453 | 623.7308 | 794.331 | 501.1681 |
RBFNN–BBO | 493.2623 | 857.5994 | 747.3255 | 633.5942 | 442.8414 |
RBFNN–GJO | 657.3023 | 868.1354 | 560.7418 | 220.4124 | 868.0309 |
RBFNN–GTO | 463.0023 | 889.9834 | 534.6636 | 823.417 | 463.5565 |
RBFNN–GWO | 468.4679 | 858.0456 | 717.1629 | 671.5991 | 459.5517 |
RBFNN–WOA | 204.7861 | 743.1152 | 649.2245 | 1,114.9086 | 462.7928 |
In general, although the models differ little, the regions with high and very-high flood susceptibility are concentrated along the coastline and in the Ba River watershed (south of Phu Yen province). These regions are characterized by low elevation and high building density, which reduces the infiltration capacity of the soil.
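The zone areas tabulated above can be aggregated to compare the combined high and very-high area across models. The short script below is an illustrative sketch only: the dictionary layout and variable names are assumptions, and the values are copied from the table (model names are written with ASCII hyphens). For RBFNN–BBO the combined area is about 1,076 km2, in line with the figure of 1,075 km2 given in the abstract.

```python
# Zone areas in km2, in the order: very low, low, moderate, high, very high.
areas_km2 = {
    "RBFNN":     [482.3476, 773.0453, 623.7308, 794.331, 501.1681],
    "RBFNN-BBO": [493.2623, 857.5994, 747.3255, 633.5942, 442.8414],
    "RBFNN-GJO": [657.3023, 868.1354, 560.7418, 220.4124, 868.0309],
    "RBFNN-GTO": [463.0023, 889.9834, 534.6636, 823.417, 463.5565],
    "RBFNN-GWO": [468.4679, 858.0456, 717.1629, 671.5991, 459.5517],
    "RBFNN-WOA": [204.7861, 743.1152, 649.2245, 1114.9086, 462.7928],
}

summary = {}
for model, zones in areas_km2.items():
    total = sum(zones)                 # mapped area for this model
    high_plus = zones[3] + zones[4]    # high + very high
    summary[model] = (high_plus, total)
    share = 100.0 * high_plus / total
    print(f"{model}: {high_plus:.1f} km2 high or very high ({share:.1f}% of {total:.1f} km2)")
```

The row totals agree to within a fraction of a square kilometre, which is a quick consistency check on the tabulated values.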
DISCUSSION
Implications for coastal spatial planning
Most of the areas severely affected by flooding are in the coastal area, downstream of the Ba River (Da Rang River), Ban Thach River, Phu Ngan River, and Vet River.
The Tuy Hoa coastal plain is the rice granary of the entire south-central coastal plain of Vietnam. Annual floods lead to rice losses for thousands of farming households and affect the province's food security. Therefore, it is necessary to construct a system of dikes and culverts to prevent river flooding, divert coastal flooding, and modify land-use structures in high- to very-high flood-risk areas to avoid long-term damage.
Phu Yen has 1,957 ha of pond aquaculture, concentrated along the shores of lagoons, bays, and estuaries, and 1,650 ha of floating cage farming (lobster farming), which takes place in Xuan Dai Bay, Vung Ro, and the O Loan and Cu Mong lagoons (data collected from the Department of Fisheries of Phu Yen in 2020). Phu Yen is the province with the largest lobster production in the country. River floods in the province often occur after storms and heavy rain. The resulting flow drags away farmers' cages, rafts, and other property. Floodwater from the basin pours into aquaculture areas, lowering the salinity of the surface water in lagoons and bays. Lower salinity leads to a decline in lobster production (Jury et al. 1994; Ross & Behringer 2019). Mass die-offs of lobsters have occurred in Phu Yen following flooding, resulting in heavy losses for farmers (Table 3). To minimize damage, the government therefore needs to build an early-warning system for rain and flood currents so that farmers can respond promptly, for example by reinforcing aquaculture infrastructure, changing the depth of lobster cages, and limiting the use of cages and rafts in the surface layer to prevent lobsters from dying as salinity falls. At the same time, the province needs to reduce farming density and re-plan aquaculture areas to avoid locations strongly affected by flood currents.
Phu Yen's coastal industrial parks are underdeveloped and concentrated mainly in Xuan Hai commune, Song Cau town. They can be strongly affected by storms and storm surges but are less affected by floods because the parks are mainly built on 3–7 m-high coastal sand dunes. By contrast, many densely populated areas and sections of coastal infrastructure are at risk from the cumulative impact of storms and subsequent flooding (Table 4). The People's Committee of Phu Yen Province should re-plan residential areas located in areas highly sensitive to floods, identify evacuation areas, adapt housing, and strengthen infrastructure. When constructing residential areas and coastal infrastructure, it is necessary to avoid blocking flood flows. At the same time, the province needs to invest in a monitoring and early-warning system.
Date | Loss and damage description | Source |
---|---|---|
15/10/2022, after the No. 5 storm | The seawater surface for lobster farming is altered by persistently high rainfall. There were a lot of dead lobsters in Xuan Thanh commune. About two tons of lobster are lost for every 21 farming households. Up to two billion Vietnamese dong (VND) were estimated as total damages. | People's Committee of Song Cau town |
30/11/2021–31/11/2021 | | Phu Yen Steering Committee for Natural Disaster Prevention and Control |
13/11/2020, after the No. 12 storm | In Song Cau town, 169 cage aquaculture households suffered damage with more than 1,520 lobster cages, an estimated loss of nearly 40 billion VND. Of which, Xuan Phuong commune has 105 households with 50,340 blue lobsters; Xuan Yen ward has 11 households; Xuan Dai ward has 3 households with 39,800 blue lobsters; and Xuan Thanh ward has 50 households with 51,940 commercial blue lobsters and 203,000 small blue lobsters. The total value of the damage caused by the direct impact of the No. 12 storm and floods in Phu Yen province from 10 to 11 December 2023 is nearly 455 billion VND. | Department of Agriculture and Rural Development of Phu Yen Province |
08/11/2016 | About 250 lobster farming households in Song Cau town were damaged by floods. Particularly in Hoa Loi village, Xuan Canh commune, about 230 farming households suffered damage with about 286,000 lobsters dying, an estimated loss of more than 31 billion VND. Other localities such as Hoa Thanh village, Xuan Canh commune had about 6,000 dead lobsters, Xuan Thinh commune had more than 10,000 dead lobsters. Many white-leg shrimp farming areas and Oc Huong snails in Song Cau town were also affected. Floodwater caused inundation and riverbank erosion. The damage to aquaculture in Song Cau town was estimated at more than 35 billion VND. | People's Committee of Song Cau town |
11/11/2009 | Storms and floods caused 2,970 billion VND in damage. It is estimated that damages in Tuy An, Dong Xuan, Song Cau, Phu Hoa, Tay Hoa, Song Hinh, Dong Hoa, Tuy Hoa, and Son Hoa districts are 869, 824, 400, 150, 124, 83, 62, 50, and 25 billion VND, respectively. | Department of Agriculture and Rural Development of Phu Yen Province |
IMPLICATIONS OF RBFNNS FOR FLOOD MODELING
Model complexity is an important factor when data are limited. Models that are too complex are prone to overfitting, which occurs when the model learns the noise and biases in the training dataset and therefore cannot generalize to new datasets. The corresponding problem of underfitting occurs when models are too simple to capture the patterns in the training data.
Several studies have found RBFNNs to be effective in limiting both overfitting and underfitting. They can also handle nonlinear problems and are less susceptible to becoming trapped in local optima during training (Leonard & Kramer 1991; Yu et al. 2011). However, they are less successful in determining the positions of the centers of the radial basis functions, and they can be very sensitive to hyperparameters and their tuning (Leonard & Kramer 1991). For these reasons, it is necessary to use optimization algorithms.
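To make the role of the centers and hyperparameters concrete, the following is a minimal RBFNN sketch in NumPy. It is not the configuration used in this study: the Gaussian kernel, the single shared width, the random choice of centers from the training points, the least-squares fit of the output weights, and the synthetic two-feature dataset are all assumptions made for illustration. The centers and the width are the quantities that an optimization algorithm would typically tune.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian hidden-layer activations exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def design_matrix(X, centers, width):
    """Hidden-layer activations plus a constant bias column."""
    H = rbf_features(X, centers, width)
    return np.hstack([H, np.ones((H.shape[0], 1))])

def fit_rbfnn(X, y, centers, width):
    """Solve for the linear output weights by least squares."""
    weights, *_ = np.linalg.lstsq(design_matrix(X, centers, width), y, rcond=None)
    return weights

def predict_rbfnn(X, centers, width, weights):
    return design_matrix(X, centers, width) @ weights

# Synthetic data (hypothetical): a nonlinear 0/1 target on two features.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)

centers = X[rng.choice(len(X), size=20, replace=False)]  # assumed: random centers
width = 0.5                                              # assumed: hand-picked width
weights = fit_rbfnn(X, y, centers, width)
pred = predict_rbfnn(X, centers, width, weights)
train_rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"training RMSE: {train_rmse:.3f}")
```

Changing `width` or the number of centers changes the fit noticeably, which illustrates the sensitivity to hyperparameters noted above.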
Of the five algorithms we tested, BBO was the most effective in improving the performance of the RBFNN model. BBO can efficiently explore the search space of an optimization problem, and it can escape local optima and pass over regions of the search space that rarely contain optimal solutions (Prakash et al. 2023).
GWO was the second-most effective. In addition to its simplicity and ease of implementation, GWO converges rapidly and avoids local optima. It also requires no derivative or gradient information, so it can be applied to problems for which this information is unavailable or difficult to obtain (Miao et al. 2020; Zhang et al. 2021). The third was WOA, which balances exploration and exploitation and can therefore discover many potential optimal solutions (Mirjalili & Lewis 2016). Neither GTO nor GJO was able to improve the forecasting ability of the RBFNN model. Both have weak exploitation ability and are prone to becoming trapped in local optima (Sadeeq & Abdulazeez 2022; Mohapatra & Mohapatra 2023; Zhang et al. 2023).
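As an illustration of how a derivative-free metaheuristic of this kind searches a parameter space, the sketch below implements the standard GWO position update (three leaders, with a control coefficient decreasing linearly from 2 towards 0) on a toy objective. It is a simplified sketch under stated assumptions, not the code used in this study: the function name `gwo_minimize`, the population size, the iteration count, and the sphere test function are all hypothetical. In a hybrid model, the objective would instead return the RMSE of an RBFNN built from the candidate parameter vector.

```python
import numpy as np

def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimal Gray Wolf Optimizer; returns best position, best value, history."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    wolves = rng.uniform(low, high, size=(n_wolves, dim))
    best_pos, best_val = None, np.inf
    history = []
    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        # The three best wolves of the current population lead the pack.
        alpha, beta, delta = (wolves[i].copy() for i in order[:3])
        if fitness[order[0]] < best_val:
            best_val, best_pos = float(fitness[order[0]]), alpha.copy()
        history.append(best_val)  # best value found so far
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        candidates = []
        for leader in (alpha, beta, delta):
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)
            candidates.append(leader - A * D)
        # Each wolf moves to the average of the three leader-guided positions.
        wolves = np.clip(np.mean(candidates, axis=0), low, high)
    return best_pos, best_val, history

def sphere(w):
    """Toy objective with its minimum (0) at the origin."""
    return float(np.sum(w ** 2))

best_pos, best_val, history = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

No gradient of the objective is ever evaluated, which is why such algorithms can be wrapped around a model whose error surface is difficult to differentiate with respect to its structural parameters.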
The flood maps produced in this study using machine learning are consistent with maps produced by other methods such as hydraulic modeling (MIKE FLOOD) (Tra et al. 2023; Tuan et al. 2024). Moreover, the AUC values of the models in this study were slightly higher than those reported in previous studies (Tien Bui et al. 2020; Shahabi et al. 2021). Therefore, the method used in this study can be considered an effective tool to help decision-makers develop sustainable land-use planning strategies.
Managing flood risk and developing preventive measures are the main objectives of constructing flood susceptibility maps. Climate change and improper planning, as well as population growth, have been shown to increase the risk of flooding around the world (Razavi-Termeh et al. 2023b). Due to the change in focus of flood risk management and preventive measures, flood prevention is no longer the primary objective of risk management. Residents of areas frequently affected by flooding are often well aware of the potential consequences (Saikh & Mondal 2023). Local authorities and planners should ensure that information is provided to the population on the assessment of each area's susceptibility to flooding, as well as the building codes applicable in each area.
CHALLENGES
Flooding is increasingly influenced by human activity and climate change. This makes it difficult to predict. Several previous studies have proven that data-driven methods can successfully predict flooding, but the data must be sufficient (Solomatine & Ostfeld 2008; Wang et al. 2019). This can be difficult to achieve in Vietnam and other developing countries, due to strict data-sharing policies, as well as insufficient funding (Nguyen et al. 2024b).
Another challenge when using machine learning is the extrapolation problem, i.e. the models cannot be applied reliably to cases outside the scope of the training data. Machine learning models are fitted to specific training datasets. Theoretically, this should not be a problem if the model has enough training data, including all possible conditions. As mentioned above, however, it is difficult to collect enough of the right data.
Two solutions have been put forward to resolve this problem. The first is the use of physics-based models such as hydrodynamic modeling (e.g. MIKE FLOOD) to improve the quality of training data, particularly in the case of extremes (Nguyen et al. 2020b), although such models require more detailed data on topography, hydrology, and climate (Li et al. 2023a). The second is the integration of machine learning models with other methods, such as linear regression, to improve their ability to handle the extrapolation problem (Nguyen et al. 2024a).
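A simple way to see where the extrapolation problem may arise is to flag prediction points whose conditioning factors fall outside the range covered by the training data. The snippet below is a hedged sketch of such a check, not a procedure used in this study; the function name and the two example features (elevation and slope) are hypothetical.

```python
import numpy as np

def outside_training_range(X_train, X_new):
    """Return True for each row of X_new with any feature outside the training min/max."""
    X_train = np.asarray(X_train, float)
    X_new = np.asarray(X_new, float)
    low, high = X_train.min(axis=0), X_train.max(axis=0)
    return np.any((X_new < low) | (X_new > high), axis=1)

# Hypothetical features: elevation (m) and slope (degrees).
X_train = [[0.0, 0.0], [10.0, 5.0]]
X_new = [[5.0, 2.0], [12.0, 1.0], [3.0, -1.0]]
flags = outside_training_range(X_train, X_new)
print(flags)
```

A per-feature range check of this kind is only a coarse screen, since a point can lie inside every individual range and still be far from any training sample.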
LIMITATIONS AND FUTURE RESEARCH
Several studies have pointed out that the risk to life and the economy is strongly influenced by the depth and speed of floods (Abt et al. 1989; Schmuck 2012). However, this study only evaluated developed areas, agricultural areas, and aquaculture areas in regions with high flood potential, with the aim of avoiding new construction or economic activities in these areas. In future research, we will focus on damage estimation using flood depth and speed from hydrodynamic models.
Topography factors were extracted from DEM at a resolution of 10 m, which provides only limited information on terrain surfaces. Several studies have, however, concluded that this is still adequate to estimate flood susceptibility (Manfreda & Samela 2019; Doorga et al. 2022). Other technologies, such as unmanned aerial vehicle (UAV) or LiDAR, can capture more details, including flow direction, slope direction, and human construction such as dikes and roads, although they are very expensive and therefore limited to larger-scale applications.
We used Landsat OLI imagery at a resolution of 30 m to calculate the NDVI, NDBI, and NDWI indices, whereas Sentinel-2A imagery at 10 m resolution would better reflect land surface characteristics such as vegetation density and construction density. In future research, we will use data with a better resolution.
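All three indices are normalized band differences and can be computed with one helper. The sketch below assumes the commonly used definitions, NDVI = (NIR - Red) / (NIR + Red), NDBI = (SWIR1 - NIR) / (SWIR1 + NIR), and the McFeeters form NDWI = (Green - NIR) / (Green + NIR); the paper does not state which NDWI variant was used, so that choice is an assumption, as are the function name and the toy reflectance values. For Landsat 8 OLI, green, red, NIR, and SWIR1 correspond to bands 3, 4, 5, and 6.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b) per pixel, returning NaN where a + b is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Hypothetical surface reflectance values for two pixels.
green = np.array([0.08, 0.0])
red = np.array([0.10, 0.0])
nir = np.array([0.50, 0.0])
swir1 = np.array([0.30, 0.0])

ndvi = normalized_difference(nir, red)    # vegetation density
ndbi = normalized_difference(swir1, nir)  # built-up density
ndwi = normalized_difference(green, nir)  # open water (McFeeters form, assumed)
print(ndvi, ndbi, ndwi)
```

The second pixel has zero reflectance in every band and is returned as NaN rather than raising a division warning.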
Several studies have highlighted that climate change and urban growth both have a significant influence on the frequency and intensity of flooding (Ionno et al. 2024; Ma et al. 2024; Nguyen et al. 2024c). Jafarpour Ghalehteimouri et al. (2024) found that urbanization is one of the main causes of the increase in flood susceptibility in Kuala Lumpur. Lan et al. (2024) reported that climate change is an important driver of increased flood risk in the Yellow River Basin. Therefore, evaluating these changes in flood models is crucial to inform decision-makers so they may avoid new construction in high- and very-high flood zones. While this study successfully developed accurate new flood susceptibility models using machine learning, it did not evaluate the effects of climate change and land-cover change on the flood model. Future research will integrate climate change and socio-economic scenarios into the flood susceptibility model to assess the impacts of these changes on flooding.
This study used RMSE as the objective function for the five optimization algorithms, as well as for machine learning in general. Therefore, considering the overfitting problem was essential when building the model (Nguyen et al. 2020a; Van Phong et al. 2020). We applied several techniques to mitigate this problem, such as limiting the scope of the research, but improving the quality of the training dataset, such as by increasing the data at different geographic locations and dates, would further improve model performance.
One of the most important factors when developing a machine learning model is the quality of the dependent data, such as the flood inventory map, which plays a crucial role in determining the accuracy of the model. In many cases, these data are collected in the field. However, this method may reduce model accuracy due to incomplete data updating. An advantage of this study was the use of satellite imagery to monitor flooding and improve the quality of the flood inventory map, but we plan to further improve flood mapping using hydraulic or hydrodynamic modeling in future studies.
This study was successful in building a flood susceptibility model for Phu Yen province. It was based on localized environmental and socio-economic factors. Flood patterns elsewhere may differ in terms of terrain, land use, rainfall characteristics, etc. So, in future studies, collecting flood data from other neighboring regions would allow us to evaluate how well the models generalize, and identify areas needing recalibration before operational use.
CONCLUSION
The most effective way to reduce the effect of flooding is often through the demarcation of areas of flood susceptibility. This study constructed flood susceptibility maps using a hybrid machine learning approach. The models were RBFNN, RBFNN–BBO, RBFNN–GWO, RBFNN–WOA, RBFNN–GTO, and RBFNN–GJO, and the study area was the province of Phu Yen in Vietnam.
The integration of models is very useful for the precise assessment of flood susceptibility. These hybrid models offer considerable versatility, making it possible to assess flood susceptibility in any region, particularly in areas where available data are limited.
Among the proposed models, the RBFNN–BBO model outperformed the other models. The resulting maps showed 1,000–1,500 km2 of the surface of Phu Yen province to be in the region of high- or very-high flood susceptibility. Most of this was on the coastal plain. Aquaculture, rice fields, salt production, rural and urban settlements, and urban infrastructure were identified as land-use types that are severely damaged by storms and floods every year. The Phu Yen government must integrate the impacts of flooding into spatial development plans. In addition, it is necessary to establish an early-warning system and restructure economic activities and coastal land use to reduce flood damage.
Flood risk management is increasingly being seen as a top priority, particularly in Vietnam, which is especially badly affected by flooding. Flood control measures have not yet been tested in many watersheds and regions, so this study was carried out in Phu Yen province using hybrid models to help develop an understanding of flood risk to support future mitigation strategies.
ACKNOWLEDGEMENT
We are thankful to the anonymous referees for reviewing and helping us to improve our paper. This study has been supported by the Ministry of Science and Technology (MOST, Vietnam) under the grant number ĐTĐLCN-91/21.
FUNDING STATEMENT
This study has been supported by the Ministry of Science and Technology (MOST, Vietnam) under the grant number ĐTĐLCN-91/21.
AUTHOR CONTRIBUTIONS STATEMENT
All authors contributed to the study conception and design. Material preparation and data collection and analysis were performed by V.T.T., H.D.N., and Q.-H.N. The first draft of the manuscript was written by H.D.N. and V.T.T. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.