ABSTRACT
The objective of this study was to develop a theoretical framework based on machine learning, hydrodynamic modelling, and the analytic hierarchy process (AHP) to assess flood risk in the downstream area of the Ba River in Phu Yen province. The framework was made up of three main components: flood hazard, flood exposure, and flood vulnerability. Hazard was calculated from flood depth, flood velocity, and flood susceptibility, of which depth and velocity were calculated using the hydrodynamic model, and flood susceptibility was built using machine learning, namely, support vector machines, decision trees, AdaBoost, and CatBoost. Flood exposure was constructed by combining population density, distance to the river, and land use/land cover. Flood vulnerability was constructed by combining poverty level and road density. The indices of each component were integrated using the AHP. The results showed that the hydraulic model was successful in simulating flood events in 1993 and 2020, with Nash–Sutcliffe efficiency values of 0.95 and 0.79, respectively. All machine learning models performed well, with area under curve (AUC) values of more than 0.90; among them, AdaBoost was the most accurate, with an AUC value of 0.99.
HIGHLIGHTS
Flood risk assessment was performed using machine learning and hydrodynamic modelling.
Model performance was evaluated using root-mean-square error, R², and mean absolute error.
Flood risk increases due to urban growth.
INTRODUCTION
Flooding, which is one of the most common natural disasters in the world, causes devastating natural, economic, and social damage (Merz et al. 2014; Tsakiris 2014; De Moel et al. 2015). Floods claim around 20,000 lives per year (Adhikari et al. 2010) and affected approximately 2 billion people worldwide from 1998 to 2017 (Saravanan & Abijith 2022). Vietnam is especially affected by floods due to its geographical location. The country is considered one of the eight countries most affected by extreme climate conditions (Nguyen et al. 2022).
From October 6 to November 15, 2020, Vietnam was hit by nine typhoons, causing significant flooding in eight provinces in the central region. According to a report by the Vietnam Red Cross, these events caused 357 deaths, 876 injuries, and 30,000 hectares of devastated crops. If projections are accurate, the combination of an increase in the size of the population in flood-prone areas, deforestation, the disappearance of marshlands, and an increase in average sea level will lead to more frequent catastrophic floods, which will affect a growing number of people (Nguyen et al. 2023a; Ramadhan et al. 2023).
Historically, local authorities primarily dealt with flood risks through two main approaches:
(1) Hazard control: This involved large-scale dike construction aiming to contain worst-case flooding scenarios. Despite achieving some success, this strategy proved costly and often ineffective, particularly for smaller flooding events. Moreover, it often neglected the social impacts of community disruption and the modification of natural landscapes (Nguyen et al. 2022; Chakraborty et al. 2023).
(2) Flood management and preparedness: This approach focused on educating communities and preparing them for potential flooding. Although valuable, it lacked a comprehensive understanding of flood risks and their spatial distribution (Mahmoodi et al. 2023; Saber et al. 2023).
With advancements in scientific and technological tools, flood risk management has emerged as a critical element in flood mitigation strategies. This approach goes beyond hazard control and instead assesses the likelihood and potential consequences of flooding in specific areas. Flood risk assessments integrate various analyses, including technical, cost, policy, and damage analyses (Li et al. 2023; Liu et al. 2023).
Spatial mapping is a crucial part of a robust flood prevention plan. It identifies areas where the socioeconomic consequences of flooding would be very high (Foudi et al. 2015). However, this is only part of a fully integrated flood risk management strategy, which is a key for risk assessment and policy design (Chen et al. 2011; Rana & Routray 2018). Such a strategy considers the relationships between risk management measures and analyses of their cost and effectiveness in the context of environmental and socioeconomic change. It attempts to avoid the transfer of risks from upstream to downstream and limit transfers between sectors (residential, agricultural, and non-agricultural) (Samuels et al. 2006; Dawson et al. 2011; Klijn et al. 2015).
Effective flood risk management depends on the assessment of floods and their consequences. Such assessments open discussions on how to respond in the event of flooding (De Wrachien et al. 2011; Lawrence et al. 2014). Furthermore, such assessment frameworks can be used for standardization purposes, helping local authorities suggest possible measures that can be implemented and prioritize areas of support before, during, and after a flood (Ferrier & Haque 2003). In recent decades, much research has focused on developing such flood risk assessment frameworks at different spatial scales and for different purposes (Plate 2002; Merz et al. 2010; Disse et al. 2020; Huang et al. 2020). There are, however, notable differences in the methodology and use of criteria in these assessments. These differences arise from the availability of data and the applicability of methods at different scales. However, increases in computing power and the number of available data sources allow for the generalization of theoretical frameworks at different scales and the use of global methods for more localized areas, especially where data are scarce. It is still necessary to understand the limitations of the theoretical frameworks and flood risk assessment methods of previous studies.
The conception of flood risk as a combination of hazard, exposure, and vulnerability has been widely used in previous studies. Flood hazard is defined as the possibility of causing damage or loss to people or property. This possibility depends on the potential for the occurrence of flooding in a given region (da Silva et al. 2020; Shah et al. 2020; Thapa et al. 2020). Flood exposure is defined as the presence of property and humans in flood zones (Baky et al. 2020; Hadipour et al. 2020; Tate et al. 2021). Vulnerability represents the characteristics of communities and infrastructure in the flood zone (Baky et al. 2020; Jha et al. 2020). Different natural and anthropogenic factors have effects on flooding and its intensity (rainfall, topography, land use/land cover (LULC), etc.). Specific climatic conditions and local physical characteristics (such as intense rainy episodes in the fall or the presence of poorly permeable soils) may make a territory particularly vulnerable to flooding. The significant development of artificial surfaces in recent decades has reinforced such vulnerability (urbanization of flood expansion zones, development of watercourses, waterproofing of soils, etc.) (Hemmati et al. 2020; Feng et al. 2021).
In the literature, flood risk is the result of the interaction between human and natural factors. Flood exposure and flood vulnerability are often linked to socioeconomic and human characteristics, whereas flood hazard represents the physical characteristics of floods. Previous studies have often combined flood depth and flood velocity using hydrodynamic models such as MIKE FLOOD (Kadam & Sen 2012; Robi et al. 2019) and HEC-RAS (Farooq et al. 2019; El Bilali et al. 2021) to assess flood risk. However, hydrodynamic modelling requires detailed data over a long period, such as topographic, bathymetric, and climatic data. Thus, these models are limited in their application on a large regional scale (Tehrany et al. 2015a, 2015b). Some recent studies have integrated flood susceptibility into flood hazard assessment by applying machine learning algorithms. These algorithms include random forest (RF) (Lee et al. 2017; Wang et al. 2020), AdaBoost (ADB) (Saravanan et al. 2023; Jahanbani et al. 2024), XGBoost (Madhuri et al. 2021; Linh et al. 2022), CatBoost (CB) (Saber et al. 2022), and artificial neural networks (Falah et al. 2019; Arora et al. 2021). Machine learning has the advantage of being able to analyse complex topographical, meteorological, and hydrological data, which enables in-depth analysis to predict high-risk regions. It can also automate and accelerate data analysis, reducing the time needed to generate flood susceptibility models. Compared to hydrodynamic modelling, machine learning is better able to resolve complex nonlinear relationships from large datasets. Moreover, machine learning models can learn from new data to improve their accuracy over time (Nachappa et al. 2020; Islam et al. 2021).
To enhance the accuracy of machine learning models, several studies have integrated individual or deep learning models with optimization algorithms: Nguyen et al. (2023b) used deep neural networks along with the Aquila optimizer algorithm, Sea Lion optimization, Elephant Herding optimization, the Naked Mole-Rat algorithm, and stochastic gradient descent to evaluate flood susceptibility in the Binh Dinh province of Vietnam. Gharakhanlou & Perez (2023) applied decision trees (DTs), adaptive boosting, RF, multilayer perceptrons, logistic regression, and support vector machines to evaluate flood susceptibility in India. Saravanan & Abijith (2022) evaluated flood susceptibility in Northeast Coastal district of Tamil Nadu using gradient boosting machines, rotation forest, XGBoost, support vector machines, and Naive Bayes.
In general, previous studies have focused on assessing flood risk locally, regionally, or nationally, both spatially and temporally, by integrating hydrological, hydrodynamic (flood depth, flood velocity), and socioeconomic conditions. Studies rarely integrate flood susceptibility into flood risk assessment. Pham et al. (2021) pointed out that flood risk also depends on the probability of the occurrence of flooding in a region. Integrating flood susceptibility can therefore strengthen flood risk assessment. In addition, while various machine learning algorithms have been employed to create flood susceptibility maps, there is no universal guideline for selecting the best algorithm for all regions, as each region has different characteristics.
The Ba River watershed in Vietnam frequently suffers from flooding, causing significant damage to human life and the economy. However, there are so far no studies on flood risk assessment in this region (Cu Thi et al. 2018). One of the most popular multicriteria analysis methods, the analytic hierarchy process (AHP), was used in this study to generate indices of the different aspects of flooding (hazard, exposure, and vulnerability) and to develop the flood risk maps. AHP has been applied in several previous studies to assess flood risk, and it allows for a structured analysis that takes into account the multiple factors at play and their interdependencies.
In this study, the main objective was to develop a theoretical framework for assessing flood risk by integrating machine learning, the hydrodynamic model, and the AHP. The study area was downstream of the Ba River in Phu Yen province. It was chosen because it is affected by frequent and heavy flooding. We used machine learning models, namely, SVM, DTs, ADB, and CB, to construct flood susceptibility maps, while MIKE FLOOD hydraulic modelling was used to construct flood depth and flood velocity maps. These indices were combined with the socioeconomic data to assess flood risk. These machine learning and hydraulic models were selected because they are among the most popular and widely used models in previous studies (Wu et al. 2020; Nguyen et al. 2022; Youssef et al. 2022), so the approach can be readily reproduced in other regions. The novel element of this study is that it is one of the first to integrate machine learning, hydrodynamic modelling, and the AHP to develop a framework for assessing flood risk. The study also presents an effective tool for the construction of flood risk maps, which can support decision-makers in future flood risk management.
STUDY AREA AND MATERIAL
Study area
Phu Yen province is located in the South-Central region of Vietnam and occupies an area of 5,060 km². It features a very diverse landscape of plateaus, tall hills, valleys, and coastal plains. Phu Yen borders the Cu Mong mountain range to the north (elevation of 245 m above sea level), the Vong Phu–Ca range to the south (elevation of 706 m), and the Truong Son range to the west (elevation of 2,600 m). Located in the tropical monsoon region, the province receives relatively high rainfall, ranging from 1,500 to 3,000 mm/year, mainly concentrated in the period September–December, which accounts for about 80% of the total annual rainfall.
The Phu Yen river system is very rich and fairly evenly distributed throughout the province, with more than 50 rivers, including the main three, the Ky Lo, Ba, and Ban Thach. These all originate in the Truong Son mountain range. The Ba River alone has a basin area of 13,900 km². The entire Ba River system has a flow rate of 302 m³/s, with an average total water volume of 9,527 million m³/year. The river system is characterized by numerous short rivers and steep slopes, causing high flow velocities.
Material
Flood inventory maps
Flood inventory plays an important role in generating a flood susceptibility map via the machine learning method. It represents the spatial relationship between the location of historical flooding and environmental, hydrological, climatic, and anthropogenic factors (Nachappa et al. 2020; Arora et al. 2021). Phu Yen province is affected by frequent flood events each year due to its unique geographical characteristics.
The model used to build the flood susceptibility maps was a binary classification model, so collecting non-flood points was also required. Approximately 200 non-flood points were collected from areas that had never flooded, such as those at high altitudes and/or with high slope values.
Finally, the flood and non-flood points were used as input data for the flood susceptibility models. These data were divided into two parts: 70% to train the flood susceptibility models and 30% to validate them. Other split ratios, such as 60/40 and 80/20, were also tried. However, the 70/30 split gave the most accurate results and was therefore selected to construct the machine learning models.
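The 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the stratification by class, the fixed seed, the function name `stratified_split`, and the 260/200 class counts are all assumptions introduced here.

```python
import random

def stratified_split(points, train_fraction=0.7, seed=42):
    """Split (features, label) records into train/test sets, class by class."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted({lab for _, lab in points}):
        group = [p for p in points if p[1] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])      # first 70% of each class for training
        test.extend(group[cut:])       # remaining 30% for validation
    return train, test

# Hypothetical inventory: 260 flood points (label 1) and 200 non-flood points (label 0).
inventory = [((i,), 1) for i in range(260)] + [((i,), 0) for i in range(200)]
train_set, test_set = stratified_split(inventory)
```

Splitting within each class keeps the flood/non-flood proportion the same in both subsets, which makes validation scores easier to compare across the candidate ratios (60/40, 70/30, 80/20).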
Flood-influencing factors
These factors were collected from different sources. Elevation, slope, aspect, curvature, topographic wetness index (TWI), and sediment transport index (STI) were extracted from a digital elevation model (DEM) with a resolution of 12.5 m (available at https://search.asf.alaska.edu/#/?zoom=3.000&center=-97.494,39.673). Distance to river and distance to road were extracted from a topographic map with a scale of 1:50,000. Soil type and LULC in 2021 were collected from the Ministry of Natural Resources and the Environment. Rainfall was collected from the Center for Hydrometeorology and Remote Sensing at a resolution of 0.25° × 0.25°. The normalized difference built-up index (NDBI), normalized difference vegetation index (NDVI), and normalized difference moisture index (NDMI) were calculated from Landsat 8 OLI imagery from March 2021. All these factors were converted to a raster format with a resolution of 12.5 m.
Elevation is an essential factor in determining a region's potential for the occurrence of flooding, as water tends to concentrate in regions of low altitude (Msabi & Makonyo 2021; Luu et al. 2022). In Phu Yen province, the eastern region has a very low altitude of 0–2 m, and so this region often faces major flooding.
Slope is crucial as it exerts a significant influence on the direction and the speed of flow (Wang et al. 2023). The province of Phu Yen is characterized by a steep incline, inducing a rapid concentration of water in the coastal region. This causes major flooding.
Aspect and curvature are essential because they are directly linked to the accumulation capacity of surface water. Regions with low aspect and curvature tend to be more prone to flooding (Nachappa et al. 2020). The coastal part of Phu Yen province has low values.
Distance to river is an essential parameter. The predominant type of flooding in Vietnam in general, and the study area in particular, is fluvial in nature, where a sudden rise in river level results mainly from heavy rainfall in mountainous regions. This situation leads to flooding, which impacts the banks of the river (Mirzaei et al. 2021).
Distance to road also plays a crucial role in assessing flood susceptibility. Although roads are of major importance in supporting the evacuation of residents during a flood event, they have the effect of reducing the permeability of the soil, leading to an increase in the volume of water (Arora et al. 2021; Parajuli et al. 2023). In recent years, Phu Yen province has been identified as one of the provinces experiencing the most rapid urban growth in Vietnam. Many new roads have been built as part of this economic expansion.
LULC has direct effects on the potential for flood occurrence in the study area. Indeed, it has an influence on the infiltration capacity of soil as well as the flow speed. Floods are more frequent in urbanized areas because these areas limit the infiltration of water into the ground, leading to more surface water. Furthermore, the process of deforestation contributes to an increase in the volume of surface water, which can lead to significant flooding downstream (Chowdhuri et al. 2020; Roy et al. 2020). In addition to Phu Yen's rapid urbanization, the forested area in the province is diminishing at speed, due to urban growth, agriculture, and illegal logging.
Soil type directly influences the occurrence of flooding in a region. Different soil types allow different levels of rain infiltration into the soil, thereby affecting the generation of surface runoff. The higher the permeability of the soil, the lower the risk of flooding, and vice-versa (Priscillia et al. 2021). In Phu Yen province, there are xanthic ferralsols, cambic fluvisols, chromic luvisols, ferralic acrisols, gleyic fluvisols, haplic andosols, humic acrisols, luvic arenosols, orthi-thionic fluvisols, plinthic acrisols, rhodic arenosols, rhodic ferralsols, and dystric fluvisols.
STI is another factor that must be considered, as riverbeds deposit a lot of sediment, reducing water storage capacity and causing flooding. In addition, large sediment deposits can reduce drainage capacity, increasing the risk of flooding (Hasanuzzaman et al. 2022; Al-Juaidi 2023).
TWI represents the drainage capacity in a region. It shows the amount of flow collected by any area, which is an important factor in estimating the capacity for flood occurrence in a region (Priscillia et al. 2021).
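For reference, the two DEM-derived indices above are commonly computed from the specific catchment area and the local slope. The sketch below uses the widely cited formulations (TWI as ln(a / tan β) and the Moore and Burch form of STI); whether the study used exactly these forms is an assumption, and the function names and the minimum-slope guard are illustrative.

```python
import math

MIN_SLOPE_RAD = 1e-6  # assumed guard so flat cells do not divide by zero

def twi(specific_catchment_area, slope_rad):
    """Topographic wetness index: ln(a / tan(beta))."""
    slope = max(slope_rad, MIN_SLOPE_RAD)
    return math.log(specific_catchment_area / math.tan(slope))

def sti(specific_catchment_area, slope_rad):
    """Sediment transport index: (a / 22.13)^0.6 * (sin(beta) / 0.0896)^1.3."""
    slope = max(slope_rad, MIN_SLOPE_RAD)
    return ((specific_catchment_area / 22.13) ** 0.6) * ((math.sin(slope) / 0.0896) ** 1.3)

# Hypothetical cell values: area in m^2 per unit contour width, slope in radians.
example_twi = twi(100.0, math.atan(1.0))       # 45 degree slope, so TWI = ln(100)
example_sti = sti(22.13, math.asin(0.0896))    # reference cell, so STI = 1
```

Higher TWI values indicate cells that collect more flow on gentler slopes, while STI grows with both contributing area and slope.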
Rainfall is considered a triggering factor for flooding in a region, because heavy rainfall over a short period leads to flooding (Band et al. 2020). In the study area, the annual rainfall varied from 1,400 to 2,300 mm in 2021.
NDVI describes the density of vegetation in a region. A high density reduces the volume of surface water, thereby reducing the potential for flooding, and vice-versa (Soltani et al. 2021).
NDBI represents building density. Increasing concrete surface area reduces water infiltration capacity, leading to an increase in surface water volume, which can lead to significant flooding (Saha et al. 2021).
NDMI is used to assess the amount of water present in vegetation. Regions with higher NDMI values may be at a lower risk of flooding; a lower NDMI means less hydrated vegetation, which can contribute to increased flood risk (Gharakhanlou & Perez 2023).
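The three spectral indices share the same normalized-difference form. The sketch below shows the standard definitions in terms of surface reflectance; for Landsat 8 OLI the red, near-infrared, and shortwave-infrared 1 bands are bands 4, 5, and 6. The function names and the sample reflectance values are illustrative assumptions.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 where both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total

def ndvi(nir, red):
    """Vegetation index: high for dense green vegetation."""
    return normalized_difference(nir, red)

def ndbi(swir1, nir):
    """Built-up index: high for impervious, built-up surfaces."""
    return normalized_difference(swir1, nir)

def ndmi(nir, swir1):
    """Moisture index: high for well-hydrated vegetation."""
    return normalized_difference(nir, swir1)

# Hypothetical reflectances for a single vegetated pixel.
red, nir, swir1 = 0.1, 0.5, 0.3
pixel_indices = {"NDVI": ndvi(nir, red), "NDBI": ndbi(swir1, nir), "NDMI": ndmi(nir, swir1)}
```

With these definitions NDBI and NDMI are exact negatives of each other for the same pixel, which is worth keeping in mind when both are used as conditioning factors.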
METHODOLOGY
(i) Construction of the flood hazard index
The flood hazard index was constructed by combining flood susceptibility, flood depth, and flood velocity. Flood susceptibility was modelled using four machine learning models, namely, support vector machine (SVM), DT, ADB, and CB. A total of 460 flood and non-flood points and 14 conditioning factors were used: 70% of the data to train the machine learning models and 30% to validate them. The models were developed in Python, using tools such as TensorFlow. The model parameters are presented in Table 1. During model construction, the trial-and-error method was used to tune the parameters. The final parameters were C = 1, kernel = 'sigmoid', gamma = 'auto', probability = True, max_iter = 500 for the SVM model; max_depth = 3, max_features = 'sqrt', random_state = None for the DT model; n_estimators = 100, learning_rate = 0.1, random_state = None for ADB; and iterations = 100, depth = 3, loss_function = 'CrossEntropy', eval_metric = RmseMetric() for CB. The models were evaluated using several statistical indices, namely, root-mean-square error (RMSE), mean absolute error (MAE), area under the curve (AUC), and R². The trained models were then used to build the flood susceptibility map. The flood depth and flood velocity maps were produced using MIKE FLOOD.
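The four evaluation indices can be written out in a few lines. The sketch below is a plain-Python illustration under stated assumptions: R² is taken as the coefficient of determination (the text does not say whether it is this or a squared correlation), AUC is computed with the rank-based pairwise definition, and the sample labels and probabilities are invented.

```python
import math

def rmse(y_true, y_prob):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

def mae(y_true, y_prob):
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)

def r2(y_true, y_prob):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_prob))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def auc(y_true, y_prob):
    """Probability that a random flood point scores above a random non-flood point."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels (1 = flood) and predicted flood probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
scores = {"RMSE": rmse(labels, probs), "MAE": mae(labels, probs),
          "R2": r2(labels, probs), "AUC": auc(labels, probs)}
```

RMSE and MAE are better when lower, whereas R² and AUC are better when closer to 1.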
Models | Parameters |
---|---|
SVM | C = 1, kernel = 'sigmoid', gamma = 'auto', probability = True, max_iter = 500, tol = 1e-6 |
DT | criterion = 'entropy', max_depth = 3, max_features = 'sqrt' |
ADB | n_estimators = 100, learning_rate = 0.1, algorithm = 'SAMME', random_state = None |
CB | iterations = 100, depth = 3, loss_function = 'CrossEntropy', eval_metric = RmseMetric() |
(ii) Construction of flood exposure index
Flooding is the most dangerous natural disaster in the study area. In recent years, the flood risk has increased significantly due to population growth and changing land use. Flood exposure was constructed by combining population density, LULC, and distance to river. Population density in 2020 was collected from the statistical offices in all districts in the study area. Land use was collected from the Ministry of Natural Resources and Environment of Vietnam.
(iii) Construction of flood vulnerability index
The flood vulnerability map was built by combining the poverty level and road density. Several studies have highlighted that the level of vulnerability depends on the economic situation of the inhabitants. Road density plays an important role in the evacuation of population during flooding.
In this study, the poverty level in 2020 was collected from the statistical offices in all districts in the study area. Road density was extracted from a topographic map with a scale of 1:50,000, available from the Ministry of Natural Resources and Environment of Vietnam.
(iv) Construction of flood risk map
Flood risk was constructed by combining flood hazard, flood exposure, and flood vulnerability. The risk level was then divided into five classes (very low, low, moderate, high, and very high) using the natural breaks method. This method is based on Jenks' optimization algorithm, which minimizes the variance within each class. It has the advantage of defining the final classes automatically and highlighting disparities more clearly.
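The natural breaks classification can be illustrated with a small dynamic-programming sketch that finds the class boundaries minimizing the total within-class squared deviation. This is an assumption-laden illustration rather than the GIS routine used in the study: the function names and sample values are hypothetical, and the cubic-time search shown here suits only small samples, not full rasters.

```python
import bisect

def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for the optimal 1-D partition."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back through the stored cut points to recover the class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = cut[k][j]
    return bounds[::-1]

def classify(value, bounds):
    """Return the 0-based class index of a value given the class upper bounds."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)

# Hypothetical risk scores forming three obvious groups.
sample = [1, 2, 3, 10, 11, 12, 20, 21, 22]
sample_bounds = jenks_breaks(sample, 3)
```

For the five risk classes in the text, the same function would be called with `n_classes=5` on a sample of the risk index values.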
Machine learning
Support vector machines
SVMs are popular and powerful machine learning algorithms that are widely used for classification and regression tasks. The original algorithm was introduced by Cortes & Vapnik (1995). It is known for its ability to handle both linear and nonlinear data by finding an optimal hyperplane or decision boundary that separates different classes or predicts continuous values (Tehrany et al. 2014). At their core, SVMs work by mapping the input data into a higher-dimensional feature space, where it becomes easier to find a hyperplane that separates the classes. The algorithm aims to find the hyperplane with the largest margin, which is the distance between the hyperplane and the nearest data points of each class. This margin maximization helps SVMs achieve good generalization and robustness against noise (Youssef et al. 2022). One of the key advantages of SVMs is their ability to handle high-dimensional data efficiently. They are particularly useful when the number of features is larger than the number of samples, as they are less affected by the curse of dimensionality. SVMs also have a solid theoretical foundation, which supports their performance and generalization ability. The performance of SVMs depends on the choice of kernel function and hyperparameter values. SVMs have been used in research in various fields, including hydrology (Deka 2014), bioinformatics (Byvatov & Schneider 2003), chemistry (Ivanciuc 2007), and computer science (Ganapathiraju et al. 2004). SVMs have regularly been selected as the benchmark against which new machine learning algorithms are evaluated.
The SVM model is very effective in high dimensions and is also effective where the dimension of the space is larger than the number of training samples (Lin et al. 2013; Deka 2014). An SVM was used in this study for this reason.
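To make the role of the kernel concrete, the sketch below evaluates a sigmoid kernel, K(x, z) = tanh(γ·x·z + r), for two feature vectors. The parameter names in this paper resemble those of scikit-learn's `SVC`; assuming that convention, `gamma='auto'` corresponds to 1 / n_features and the offset `coef0` defaults to 0. The function name and sample vectors are illustrative.

```python
import math

def sigmoid_kernel(x, z, gamma=None, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    if gamma is None:
        gamma = 1.0 / len(x)  # assumed 'auto' convention: 1 / number of features
    dot = sum(a * b for a, b in zip(x, z))
    return math.tanh(gamma * dot + coef0)

# Two hypothetical, already-scaled feature vectors with two conditioning factors each.
similarity = sigmoid_kernel([1.0, 0.0], [1.0, 0.0])
```

The kernel replaces the explicit mapping into a higher-dimensional space: the classifier only ever needs these pairwise similarity values.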
Decision trees
A DT is a supervised machine learning algorithm that can be used to solve classification and regression tasks. The operation of DTs is simple and is based mainly on the decisions located at the leaves of the tree, at the end of each branch (Tehrany et al. 2015a, 2015b). These decisions can be adjusted based on the selections made at each node, taking into account the previous rule (Tien Bui et al. 2012). The DT structure has three parts: root node, leaf nodes, and branches. The tree always starts with a specific decision, represented by a box, called the root node. Each leaf node holds a final decision. Each branch is a path towards the final result. DTs are non-parametric, require little data pre-processing, and are easy to both interpret and train (Pradhan 2013).
The DT model has the advantage of working well with multiple output data. It uses a white box model, which makes the results easy to explain. This model tends to be accurate, even if the assumptions of the source data are not met (Khosravi et al. 2018). This is why this model was selected to construct the flood susceptibility map in this study.
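Since Table 1 lists the entropy criterion for the DT, the split selection at a node can be illustrated with an information-gain calculation. This is a minimal sketch: the function names, the single numeric threshold, and the elevation values are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels, threshold):
    """Entropy reduction obtained by splitting at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

# Hypothetical elevations (m) with flood (1) and non-flood (0) labels.
elevation = [1, 2, 3, 50, 60, 70]
flooded = [1, 1, 1, 0, 0, 0]
gain_at_3m = information_gain(elevation, flooded, threshold=3)
```

A tree builder evaluates candidate thresholds for each conditioning factor and keeps the split with the highest gain, repeating the process until the depth limit is reached.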
AdaBoost
ADB is a boosting algorithm that was proposed by Freund & Schapire (1997). It can be used to solve classification and regression tasks. It improves the accuracy of binary classification models by combining multiple weak classifiers. The classifiers are weighted in such a way that those which predict correctly receive a larger weight than those whose predictions are incorrect (Wu et al. 2020). During training, the sample weights are continually updated, allowing the model to focus on difficult or misclassified samples, thereby improving the prediction ability of the ensemble. The final model is the weighted combination of all weak classifiers, with weights based on their accuracies, which yields a strong classifier. The ADB model is a powerful and efficient algorithm that has been applied in multiple domains. It is particularly effective at handling large nonlinear datasets (Tien Bui et al. 2016). In this study, ADB was used to address the problem of susceptibility to flooding, which was treated as a nonlinear problem. It has significant advantages in terms of precision and adaptability (Liu et al. 2017).
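One boosting round can be sketched as follows, using the classic discrete form of the algorithm with labels coded as +1 and -1. Library variants (for example the SAMME option listed in Table 1) scale the classifier weight differently and apply a learning rate, so this is an illustration of the principle rather than the exact configuration used; the function name and data are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred, eps=1e-10):
    """Return the classifier weight alpha and the renormalized sample weights."""
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, eps), 1.0 - eps)  # keep the logarithm finite
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p = -1) gain weight, correct ones lose weight.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Four hypothetical samples with uniform weights; the last one is misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

After the update the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.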
CatBoost
CB is a powerful machine learning algorithm that can solve both classification and regression problems. It was introduced by Prokhorenkova et al. (2018). The underlying idea of CB is to combine weak classifiers into a more powerful one. The algorithm works by iteratively adding decision trees to the ensemble, with each subsequent tree fitted to correct the errors made by the previous trees. At each iteration, the CB algorithm calculates the negative gradient of the loss function for the current prediction. It then uses this gradient value to improve the accuracy of subsequent predictions by incorporating an adjusted version of the gradient (Saber et al. 2022).
During the DT construction process, CB uses a gradient optimization technique, whereby each DT is fitted to the negative gradient of the loss function, allowing CB to focus on important regions of the feature space. This helps improve the accuracy of the model. The CB model was selected to build the flood susceptibility map primarily because it is considered one of the best-performing algorithms for nonlinear problems and has produced highly accurate models (Van Phong et al. 2023; Wang & Qian 2023).
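The gradient step described above can be illustrated with a generic gradient-boosting loop for the cross-entropy loss, in which each new tree (here a one-split stump) is fitted to the negative gradient, y minus the predicted probability. This is deliberately not CatBoost itself: it omits ordered boosting, symmetric trees, and categorical-feature handling, and the learning rate, function names, and data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, scores):
    """Mean cross-entropy of labels y given raw (pre-sigmoid) scores."""
    return -sum(t * math.log(sigmoid(s)) + (1 - t) * math.log(1 - sigmoid(s))
                for t, s in zip(y, scores)) / len(y)

def fit_stump(x, residuals):
    """Best single-threshold regression stump by squared error."""
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda xi: lm if xi <= thr else rm

def boost(x, y, iterations=100, learning_rate=0.1):
    scores = [0.0] * len(x)
    for _ in range(iterations):
        # Negative gradient of the cross-entropy loss with respect to the raw score.
        residuals = [t - sigmoid(s) for t, s in zip(y, scores)]
        stump = fit_stump(x, residuals)
        scores = [s + learning_rate * stump(xi) for s, xi in zip(scores, x)]
    return scores

# Hypothetical single factor (elevation, m): low cells flooded, high cells did not.
elev = [1, 2, 3, 8, 9, 10]
flood = [1, 1, 1, 0, 0, 0]
final_scores = boost(elev, flood)
```

Each iteration moves the raw scores a small step in the direction that most reduces the loss, which is why the loss after boosting is lower than the starting value of ln 2.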
Hydrodynamic modelling
To assess flooding in the study area, we used the MIKE Powered by DHI suite. MIKE 11, a one-dimensional hydraulic model, calculates river flow using an implicit finite difference approach (DHI 2018). Key parameters, such as the Manning roughness coefficient (representing bed and channel friction), were adjusted based on comparisons with observed values (water level, discharge, and velocity) at monitoring stations. In addition, the roughness coefficient was adjusted for each area (or river section) according to its land cover conditions. The calibrated roughness values were applied to both the calibration and verification periods and were then kept unchanged for the simulation scenarios.
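Two quantities sit behind this calibration step: the Manning relation, through which the roughness coefficient controls flow velocity, and a goodness-of-fit score between simulated and observed series such as the Nash–Sutcliffe efficiency reported in the abstract. The sketch below shows both in their textbook forms; the function names and the sample numbers are hypothetical and are not taken from the MIKE 11 setup.

```python
def manning_velocity(n, hydraulic_radius, slope):
    """Mean velocity (m/s) from Manning's equation in SI units."""
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

# Hypothetical channel: n = 0.03, hydraulic radius 1 m, bed slope 0.0009.
velocity = manning_velocity(0.03, 1.0, 0.0009)

# Hypothetical water levels (m): a perfect run and a run that only matches the mean.
nse_perfect = nash_sutcliffe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
nse_mean_only = nash_sutcliffe([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Raising the roughness coefficient lowers the computed velocity, so calibration amounts to adjusting n per river section until the efficiency against the gauge records is acceptably close to 1.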
For floodplain characteristics (water level, inundation depth, velocity, and discharge), we employed MIKE 21 FM, a two-dimensional hydraulic model. MIKE 21 FM solves the two-dimensional flow equations on a flexible mesh, discretizing the spatial derivatives with the finite volume method (Nigussie & Altunkaynak 2019). The model's spatial domain is divided into non-overlapping elements using an unstructured mesh of triangles and other polygons. Calibration parameters, such as bed resistance and eddy viscosity coefficients, were determined based on observational data.
MIKE FLOOD integrates MIKE 11 and MIKE 21 FM to simulate interactions among rivers, rainfall, and overflows. By leveraging the strengths of both one- and two-dimensional models, MIKE FLOOD overcomes limitations in resolution and accuracy seen when using MIKE 11 or MIKE 21 alone. Widely applied in Vietnam's river basins, the MIKE FLOOD model offers comprehensive insights (Nguyen et al. 2022, 2023).
Meteorological, hydrological, and topographic datasets, essential for computational purposes, were systematically acquired in the lower reaches of the Ba River. Specifically, these encompassed discharge data sourced from the Cung Son station, as well as rainfall and water level measurements obtained from the Phu Lam station. The dataset spanned the temporal range from 2009 to 2022 and originated from the General Department of Meteorology and Hydrology. Complementing these meteorological and hydrological datasets, detailed channel cross-sectional topography data and a DEM with a fine-grained resolution of 12.5 × 12.5 m were accessed via the Ministry of Natural Resources and Environment.
The MIKE 11 model was employed to simulate the dynamic hydraulic regime in the downstream region of the Ba River. The hydraulic network comprises the Ba River, spanning a length of 49.9 km from the Cung Son station to the Da Dien estuary and featuring 24 cross-sections, as well as the Ban Thach River, extending 39.5 km from the My Lam reservoir to the Da Nong estuary and encompassing 45 cross-sections. Two tributaries connecting the Ba and Ban Thach Rivers have a combined length of 32.3 km with 31 cross-sections. The upper boundary of the hydraulic network consists of hourly discharge values at the Cung Son hydrological station and outflow from the My Lam reservoir. For the lower boundary, offshore tidal levels were incorporated from the Global Tidal Model using the MIKE Zero Toolbox. These tidal levels were specifically derived for two geographical coordinates (longitude and latitude) corresponding to the locations selected for the lower boundary in the MIKE 11 model.
Water levels and discharge in the rivers simulated by the MIKE 11 model were coupled with the flood flow calculated from the hydraulic simulations of MIKE 21 through lateral links incorporated within the MIKE FLOOD model.
Analytic hierarchy process
In this study, the AHP was used to determine indicator weights (wi), which reflect the relative importance of each indicator. Each component of flood risk is made up of several indices: flood hazard (flood depth, flood velocity, and flood susceptibility), flood exposure (population density, LULC, and distance to river), and flood vulnerability (poverty level and road density). The AHP uses pairwise comparison to evaluate the importance of each index relative to the others (Chen et al. 2011; Radwan et al. 2019).
The determination of the weights of each index comprised three main stages (Tables 2 and 3):
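(i) Construction of the comparison matrix between the indices using the Saaty ranking scale, assigning a value from 1 to 9 to each pairwise comparison.
(ii) Determination of the weight of each index by normalizing each column of the matrix and averaging the normalized values across each row (the 'Average' column in Tables 2 and 3).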
(iii) Calculation of the consistency index (CI) and consistency ratio (CR) according to the following equation: CR = CI/RI, where RI represents the random index and CI = (λmax − n)/(n − 1), with λmax being the largest eigenvalue.
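The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the pairwise judgments in `example` are hypothetical (the paper does not print its comparison matrices), λmax is approximated from the weighted row sums rather than computed as an exact eigenvalue, and the random index values are the ones commonly tabulated for Saaty's method.

```python
# Saaty's random index (RI) for matrix sizes 1..9 (commonly tabulated values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Return (weights, lambda_max, CI, CR) for a pairwise comparison matrix."""
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Stage (ii): normalize each column, then average across each row.
    normalized = [[matrix[r][c] / col_sums[c] for c in range(n)] for r in range(n)]
    weights = [sum(row) / n for row in normalized]
    # Stage (iii): approximate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[r][c] * weights[c] for c in range(n)) for r in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0  # CR is taken as 0 when RI is 0 (n <= 2)
    return weights, lambda_max, ci, cr


# Hypothetical judgments (an assumption, not taken from the paper):
# index 1 is moderately more important than index 2 (3) and strongly
# more important than index 3 (5); index 2 is moderately more important
# than index 3 (3).
example = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, lambda_max, ci, cr = ahp_weights(example)
print([round(w, 2) for w in weights], round(cr, 3))
```

With these assumed judgments the weights come out at roughly 0.63, 0.26, and 0.11, and the CR is well below 0.1, the threshold conventionally taken to indicate acceptably consistent judgments.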
Hazard | Flood depth | Velocity | Flood susceptibility | Average |
---|---|---|---|---|
Flood depth | 0.63 | 0.66 | 0.57 | 0.62 |
Velocity | 0.2 | 0.22 | 0.28 | 0.23 |
Flood susceptibility | 0.15 | 0.05 | 0.14 | 0.11 |

Exposure | Population density | LULC | Distance to river | Average |
---|---|---|---|---|
Population density | 0.65 | 0.69 | 0.55 | 0.63 |
LULC | 0.21 | 0.23 | 0.33 | 0.25 |
Distance to river | 0.13 | 0.07 | 0.11 | 0.1 |
Vulnerability | Poverty level | Road density | Average |
---|---|---|---|
Poverty level | 0.75 | 0.75 | 0.75 |
Road density | 0.24 | 0.25 | 0.25 |
Flood hazard = 0.62 × flood depth + 0.23 × velocity + 0.11 × flood susceptibility.
Flood exposure = 0.63 × population density + 0.25 × LULC + 0.1 × distance to river.
Vulnerability = 0.75 × poverty level + 0.25 × road density.
Flood risk = flood hazard × flood exposure × flood vulnerability.
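As a rough illustration of how these four expressions combine for a single grid cell, the sketch below applies the weights exactly as printed (so the hazard and exposure weights do not sum exactly to 1). It assumes that every input layer has already been rescaled to a score between 0 and 1; the paper's own rescaling and classification steps are not reproduced here, and the function and variable names are hypothetical.

```python
# Weights as printed in the text (AHP 'Average' columns).
HAZARD_WEIGHTS = (0.62, 0.23, 0.11)    # depth, velocity, susceptibility
EXPOSURE_WEIGHTS = (0.63, 0.25, 0.10)  # population density, LULC, distance to river
VULNERABILITY_WEIGHTS = (0.75, 0.25)   # poverty level, road density


def weighted_sum(values, weights):
    """Linear combination of index scores and their AHP weights."""
    return sum(v * w for v, w in zip(values, weights))


def flood_risk(depth, velocity, susceptibility,
               population, lulc, river_distance,
               poverty, road_density):
    """Combine index scores (assumed to lie in [0, 1]) for one cell."""
    hazard = weighted_sum((depth, velocity, susceptibility), HAZARD_WEIGHTS)
    exposure = weighted_sum((population, lulc, river_distance), EXPOSURE_WEIGHTS)
    vulnerability = weighted_sum((poverty, road_density), VULNERABILITY_WEIGHTS)
    return {
        "hazard": hazard,
        "exposure": exposure,
        "vulnerability": vulnerability,
        "risk": hazard * exposure * vulnerability,
    }


# Hypothetical cell: deep, slow flooding in a dense, poor neighbourhood.
cell = flood_risk(depth=0.9, velocity=0.2, susceptibility=0.8,
                  population=0.7, lulc=0.6, river_distance=0.5,
                  poverty=0.8, road_density=0.3)
print({name: round(value, 3) for name, value in cell.items()})
```

Because the three factors are multiplied, a cell scores high on risk only when hazard, exposure, and vulnerability are all high; a zero in any one factor drives the risk to zero.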
Model assessment.
RESULTS
Flood hazard
Hydrodynamic model evaluation
To calibrate and validate the one- and two-dimensional hydraulic models for the study area, the hydrometeorological datasets from the October 1993 and November 2020 flood events were used to adjust the Manning roughness coefficient and the viscosity coefficient. The Nash–Sutcliffe efficiency (NSE) index and the peak error, defined as the difference between the simulated and observed peak water levels, were used to evaluate model performance at the Phu Lam hydrological station.
The NSE values were interpreted against the performance ratings proposed by Moriasi et al. (2015). With an NSE of 0.95, the hydrodynamic model performed well for the October 1993 flood event, while the NSE of 0.79 for the 2020 flood event was satisfactory.
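For readers unfamiliar with the two criteria, the sketch below computes the NSE in its standard form, 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², together with the peak error. The water levels are made-up values for illustration only, not the Phu Lam observations, and the function names are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def peak_error(observed, simulated):
    """Difference between the simulated and observed peak water levels."""
    return max(simulated) - max(observed)


# Hypothetical hourly water levels (m) for a single flood event.
obs = [1.2, 2.5, 4.1, 3.3, 2.0]
sim = [1.0, 2.7, 3.9, 3.4, 2.2]
print(round(nash_sutcliffe(obs, sim), 3), round(peak_error(obs, sim), 2))
```

An NSE of 1 means the simulation reproduces the observations exactly, while an NSE of 0 means it is no better than using the mean observed value throughout.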
Based on the calibration and validation results, the established hydraulic model simulates flood events within the study area well. The coupled hydraulic model can therefore be reliably employed for forecasting or simulating various flood scenarios.
Flood depth (m) | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | > 3 |
---|---|---|---|---|---|---|---|
Area (km2) | 37.7144 | 44.6572 | 39.6708 | 33.962 | 33.7196 | 21.4308 | 50.7568 |
In general, flooding above 3 m was mainly found along the river in Phu Hoa and Tay Hoa districts, as well as in a small part of Dong Hoa district. The entire mountainous area and the coastal sand dunes, extending from south of the Da Dien estuary of the Ba River to north of the Da Nong estuary of the Ban Thach River, remained unaffected by flooding.
Velocity (m/s) | 0.29 | 0.29–0.6 | 0.6–1 | 1–1.5 | 1.5–2.5 | 2.5–5 | > 5 |
---|---|---|---|---|---|---|---|
Area (km2) | 87.3976 | 86.7652 | 47.9752 | 17.2984 | 3.7896 | 0.8432 | 0.0916 |
Flood susceptibility assessment maps
Important factors used to build the flood susceptibility maps
The selection of appropriate conditioning factors plays an important role in estimating the potential for flooding, as each region has different environmental, climatic, hydrological, and anthropogenic characteristics that influence the likelihood of a flood event. Furthermore, there is no general agreement on which conditioning factors are best. Each study therefore uses its own technique to assess the importance of the conditioning factors before using the selected factors to construct the flood susceptibility model. In this study, RF was used to select the factors: each factor was assigned a weight based on the relationship between past flood occurrences and the conditioning factors (Nguyen et al. 2022). This value was calculated as a percentage and normalized from 0 to 1.
The results showed that elevation was the most important factor in the occurrence of flooding in Phu Yen province, with an RF importance value of 0.52. This applies not only to Phu Yen province but also to the entire central region of Vietnam, as has been shown by previous studies (Nguyen et al. 2023; Vu et al. 2023).
Flooding occurs in regions of low altitude and gentle slope, which is why slope was the second most important factor, with an RF importance value of 0.32. In the alluvial areas along the Ba River and the Ban Thach River, including the Tuy Hoa and Dong Hoa Delta, the terrain is low and the slope is gentle, resulting in deep flooding and high flow velocities. LULC was the third most important factor, with an RF importance value of 0.25. In recent years, with increasingly rapid urban growth in Phu Yen province, a large amount of agricultural land has been given over to construction; this is one of the causes of the increasing flood risk in the study area, as previous studies have shown (Liu et al. 2023; Wang et al. 2023).
Model comparison and validation
For the validation data, the ADB model outperformed other models in terms of accuracy (AUC = 0.999), followed by CB (0.998), SVM (0.97), and DT (0.91).
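The AUC reported here can be read as the probability that a randomly chosen flooded point receives a higher susceptibility score than a randomly chosen non-flooded point. The sketch below computes it directly from that definition by comparing every flooded/non-flooded pair; it is an illustration with made-up labels and scores, not the evaluation code used in the study, and `auc_score` is a hypothetical name.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one flooded and one non-flooded point")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation points: 1 = flooded, 0 = not flooded.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc_score(labels, scores))
```

The pairwise form is quadratic in the number of points, so for large validation sets a rank-based or library implementation would normally be used instead.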
In addition to the AUC value, the RMSE, MAE, and coefficient of determination (R2) were used to evaluate the accuracy of the flood susceptibility models. These indices have been widely used in previous studies (Wu et al. 2020). For the training data, the ADB model performed best (RMSE = 0.25, MAE = 0.13, R2 = 0.74), followed by CB (RMSE = 0.3, MAE = 0.18, R2 = 0.63), SVM (RMSE = 0.31, MAE = 0.31, R2 = 0.59), and DT (RMSE = 0.36, MAE = 0.36, R2 = 0.44). For the validation data, the ADB model again outperformed the other models (RMSE = 0.25, MAE = 0.13, R2 = 0.73), followed by CB (RMSE = 0.32, MAE = 0.19, R2 = 0.56), SVM (RMSE = 0.31, MAE = 0.31, R2 = 0.55), and DT (RMSE = 0.37, MAE = 0.36, R2 = 0.44; Table 6).
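The three error indices can be computed as in the sketch below. The inputs are made-up labels and predicted scores, and R2 is taken here as 1 − SS_res/SS_tot; the paper does not state which definition of R2 it uses, so that choice is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical flood labels (1 = flooded) and predicted susceptibility scores.
actual = [0, 1, 1, 0]
predicted = [0.1, 0.8, 0.6, 0.3]
print(round(rmse(actual, predicted), 3),
      round(mae(actual, predicted), 3),
      round(r_squared(actual, predicted), 3))
```

Lower RMSE and MAE and higher R2 indicate a closer match between predicted scores and observed flood labels, which is the sense in which the models are ranked in the text.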
Model | Training RMSE | Training MAE | Training AUC | Training R2 | Validation RMSE | Validation MAE | Validation AUC | Validation R2 |
---|---|---|---|---|---|---|---|---|
ADB | 0.250149 | 0.132994 | 0.999902 | 0.745774 | 0.253984 | 0.138348 | 0.999937 | 0.738055 |
CB | 0.301492 | 0.181795 | 0.999189 | 0.630704 | 0.32562 | 0.1931 | 0.999875 | 0.569454 |
SVM | 0.314371 | 0.311693 | 0.976842 | 0.598478 | 0.315105 | 0.312023 | 0.976908 | 0.55 |
DT | 0.369347 | 0.368437 | 0.93686 | 0.445768 | 0.370429 | 0.369377 | 0.917032 | 0.442805 |
Overall, the ADB model performed better than the other models, so it was used to construct the flood susceptibility map.
Flood susceptibility maps
Class | Very low | Low | Moderate | High | Very high |
---|---|---|---|---|---|
Area (km2) | 1170.676 | 185.0089 | 319.2173 | 310.1428 | 821.3406 |
Class | Very low | Low | Moderate | High | Very high |
---|---|---|---|---|---|
Area (km2) | 74.9572 | 75.5 | 57.5368 | 29.8984 | 6.0696 |
Although the Tuy Hoa Delta has high and very high flood susceptibility, its flood depths and velocities are in the very low to moderate range. Because the weight of flood susceptibility is low relative to the weights of flood depth and velocity in the flood hazard assessment, the Tuy Hoa Delta has a very low to moderate level of flood hazard.
Flood exposure
Class | Very low | Low | Moderate | High | Very high |
---|---|---|---|---|---|
Area (km2) | 2146.398 | 413.3784 | 237.2958 | 39.70891 | 4.848906 |
Flood vulnerability
Class | Very low | Low | Moderate | High | Very high |
---|---|---|---|---|---|
Area (km2) | 537.9747 | 1083.861 | 632.6531 | 112.0234 | 475.4303 |
Flood risk
Class | Very low | Low | Moderate | High | Very high |
---|---|---|---|---|---|
Area (km2) | 64.5816 | 84.5852 | 63.9624 | 25.122 | 5.7104 |
DISCUSSION
Flooding is a major threat to human life, infrastructure, and economic development around the world in general and in Vietnam in particular (Tran et al. 2008; Nguyen et al. 2021b, 2023a). The effects of flooding are predicted to increase in the future due to climate change and urban growth (Bouwer et al. 2010). Flood risk assessment is therefore an important step in supporting decision-makers to establish effective protection measures that reduce flood-related damage. The objective of this study was to construct a theoretical framework to assess flood risk by integrating machine learning and hydrodynamic modelling. Its main contribution is a high-precision theoretical framework for mapping the spatial distribution of flood risk to support long-term decisions.
It is necessary to ‘get out of the water box’ in the management of water resources. That is to say, integrated flood risk management explicitly recognizes the relationships between risk management measures and places them in the context of changing socioeconomic conditions (Gober 2013; Egan & de Loë 2020). Despite their obvious relevance, the relationships between the physical and socioeconomic dimensions of flood risk are rarely explored in detail (Waghwala & Agnihotri 2019), and a technical, industry-based approach to risk management is usually preferred. To go beyond technocratic approaches, this study constructs a theoretical framework to assess integrated flood risk, building on previous studies that have integrated several socioeconomic and ecological indices linked to flood risk. This theoretical framework was applied downstream of the Ba River in the province of Phu Yen, which is often affected by flooding. Flood risk in the study area is predicted to become increasingly severe in the future due to climate change and urban growth.
The AHP was used to analyse flood risk. The method has several advantages, such as the systematization of criteria and subcriteria, as well as suitability for integration with GIS (Luu et al. 2020). In this study, three indices (flood depth, flood velocity, and flood susceptibility) were used to develop the flood risk map. To our knowledge, this study is the first to integrate flood susceptibility maps into the construction of flood risk maps. In addition, flood exposure and flood vulnerability were assessed by considering factors such as LULC, population density, and socioeconomic conditions, and their relationship with flood risk. The combination of high hazard (flood depth) and high exposure (population density and infrastructure development) and low vulnerability (poverty level) explains the high flood risk in the study area. According to the planning report of Phu Yen for 2030, the local government sees Tuy Hoa town as the main driving force, and the coastal economic plan is expected to contribute about 75% of the total economic potential of the province. The project aims to build an urban centre for intercontinental maritime tourism via Tuy Hoa Airport. This airport is not only a main point for the movement of goods but also a training ground for human resources and an important transit centre for economic and residential activities through the north–south road and rail system. The region has also been earmarked for new industrial and high-tech zones, creating favourable conditions for international economic development and leading to increasingly strong urban growth.
Urban growth is hampered by the scarcity of land outside the flood zone. Nevertheless, despite the risk inherent to the site, the urbanization of the study area continues with a resistance strategy based on embankments. These structures limit the expansion of floods, thus transforming protected land into urbanizable areas (Nguyen 2022). As urban growth accelerates, traditional land uses such as croplands have been replaced by urban surfaces, even though traditional land uses are often more resilient in flood-prone areas. This change also leads to environmental modifications such as the imminent disappearance of mangroves and ponds, although wetlands could play a significant, natural role in reducing the risk of flooding (Schanze 2006; Klijn et al. 2015).
Although this study managed to successfully construct a theoretical framework for assessing flood risk, three main issues require resolution. First, the components and subcomponents of flood risk were weighted using the AHP method based on the authors' experience and the scientific literature. Although the hydrodynamic model and machine learning present reliable results with over 80% accuracy, the weights are assigned to flood depth, flood velocity, and flood susceptibility in a subjective way. Likewise, this subjectivity is found in the flood exposure map and in flood vulnerability. Second, in the LULC map, the weight assigned to agricultural land is uniform, when agricultural losses strongly depend on the type of agriculture (crop type, growth stage, etc.) and flood characteristics, thus introducing uncertainty when using the AHP. Finally, the development and construction of buildings and infrastructure can reduce poverty and increase the capacity to support populations after floods, leading to reduced flood risks. However, this approach is less common in Vietnam.
In the central region of Vietnam, tropical depressions and flooding often affect a large area, with consequences for livelihoods and human life (Vu & Ranzi 2017; Luu et al. 2018; Nguyen et al. 2023). However, flood preparedness, mitigation, and response actions have not yet received due attention. This is a major limitation that needs to be resolved. The results of this study can act as a reference for decision-makers to build effective protection actions, particularly in regions with high and very high flood risk.
Although this study was successful in establishing a theoretical framework for assessing flood risk, it is limited by treating flood hazard, flood exposure, and flood vulnerability as static, that is, as not changing in time and space. Future adaptation measures, such as changing residents' perceptions of flooding or implementing flood mitigation measures, can reduce the risk of flooding. In addition, improving these measures may lead to development in the flood zone, requiring a more interactive approach between humans and water to assess flood risk. Ultimately, the flood risk will be strongly influenced by climate change and socioeconomic growth. In this study, we used the DEM produced by ALOS PALSAR with a resolution of 12.5 m as input data for constructing the flood susceptibility model. This DEM represents only the surface terrain and does not include detailed information about surface features such as buildings or vegetation. In contrast, DEMs produced by LiDAR or UAV provide more detailed information about surface features. However, these DEMs are costly, which limits their application in large regions.
CONCLUSION
This study developed a theoretical framework based on the integration of machine learning, hydrodynamic modelling, and the AHP to assess the flood risk downstream of the Ba River in Phu Yen province. A flood risk theoretical framework was constructed by combining flood hazard, flood exposure, and flood vulnerability. The findings are summarized as follows:
– This study assessed flood risk by integrating machine learning, hydraulic modelling, and the AHP. The resulting framework can be used to assess flood risk in the study area and to support decision-makers in establishing effective protection measures to reduce future damage.
– The machine learning models constructed the flood susceptibility map with high accuracy. All proposed models achieved an AUC value above 0.9. Among them, the ADB model performed best, with an AUC value of 0.99, RMSE of 0.25, MAE of 0.13, and R2 of 0.73. For the hydraulic modelling, the NSE was 0.95 for the 1993 flood event and 0.79 for the 2020 event.
– The flood risk in the study area is mainly linked to flood depth, population density, urban growth, and poverty level.
– The areas along the river and on the coast are most affected by high and very high flood risk. About 64.5 km2 of the study area is classified as very low risk, 84.5 km2 as low, 63.9 km2 as moderate, 25 km2 as high, and 5.7 km2 as very high.
The flood risk map can provide useful information for decision-makers in establishing risk mitigation measures such as sustainable land use planning. The theoretical framework in this study must also be verified in other parts of the world that are often affected by flooding to confirm the feasibility of the methodology. It should be noted that all models used in this study were freely available, which reinforces the potential for reproducibility of this theoretical framework. In the future, we will evaluate flood risk under different climate change and land use change scenarios.
AUTHOR CONTRIBUTIONS STATEMENT
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Huu Duy Nguyen and Dinh Kha Dang. The first draft of the manuscript was written by Huu Duy Nguyen and Quan-Hai Truong. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.