Abstract
Flood damage is becoming increasingly severe in the context of climate change and changes in land use. Assessing the effects of these changes on floods is important to help decision-makers and local authorities understand the causes of worsening floods and propose appropriate measures. The objective of this study was to evaluate the effects of climate and land use change on flood susceptibility in Thua Thien Hue province, Vietnam, using machine learning techniques (support vector machine (SVM) and random forest (RF)) and remote sensing. The machine learning models used a flood inventory of 1,864 flood locations and 11 conditioning factors for 2017 and 2021 as input data. The predictive capacity of the proposed models was assessed using the area under the curve (AUC), the root mean square error (RMSE), and the mean absolute error (MAE). Both proposed models were successful in predicting the effects of climate and land use change on flood susceptibility, with AUC values exceeding 0.95. The RF model, with AUC = 0.98, outperformed the SVM model (AUC = 0.97). The areas most susceptible to flooding increased between 2017 and 2021 due to increased built-up area.
HIGHLIGHTS
Machine learning algorithms were applied for flood susceptibility modeling.
The RF model had the highest AUC value (0.98).
The areas highly susceptible to flooding increased between 2017 and 2021.
INTRODUCTION
Floods affect infrastructure and the environment, not just the social economy (Khosravi et al. 2019; Islam et al. 2021). Between 1995 and 2015, approximately 150,000 flood events occurred worldwide and caused around 157,000 deaths (Islam et al. 2021). Several studies point out that flooding affects approximately 250 million people and causes USD 40 billion in losses each year worldwide (Nguyen 2022a; Nguyen et al. 2022b).
Floods can be classified by the nature of their occurrence as fluvial, flash, coastal, or urban. Floods are one of the most devastating natural events, causing damage to human life and infrastructure owing to their onset characteristics and rapid flow speeds (Islam et al. 2021). In developing countries, flood damage often has more impact than in developed countries due to climate change (Janizadeh et al. 2021). Climate change can lead to weather conditions outside predicted ranges: changes in precipitation and/or heat wave frequency, and increases in the number and intensity of extreme weather events such as typhoons or floods (Janizadeh et al. 2021). Changes in flood frequency may be caused by changes in climatic conditions (e.g., precipitation and/or temperature) and may be apparent in low-lying, riverside, coastal, and urban areas. Flooding occurs not only under extreme climatic conditions but also through the effects of changing ecosystems. Globally, human activities such as deforestation, dam construction, and population and/or urban growth can all change flood intensity (Junger et al. 2022; Pal et al. 2022; Roy et al. 2022). Since the ‘Doi Moi (Innovation)’ renewal policy in 1986, Vietnam has embarked on sustained urban growth. This growth is more rapid in coastal regions, which are often affected by typhoons or flooding. Therefore, improving flood risk assessment using state-of-the-art methods can reduce flood risk (Nguyen et al. 2022a).
Floods are caused by four groups of factors: topographic, climatic, hydrologic, and human activity. Topographic factors include elevation, slope, curvature, and aspect, which are relatively stable (Ghosh et al. 2022; Prasad et al. 2022). Climatic and hydrologic factors, such as precipitation and drainage networks, are dynamic because they change every year. Chen et al. (2022) report that climate change, particularly changes in precipitation, significantly influences the probability of flooding and the speed of flow. At the same time, human activity factors such as land use and land cover change (LULC), distance to road, and distance to settlement are considered strongly dynamic, especially LULC. Several studies note that human activity can change the flow and speed of floods, making flood risk more severe. Deforestation can influence the mechanical disturbance capacity of soils. Changes in the type of land cover have an important effect on the soil's infiltration capacity (Nguyen et al. 2022b; Sugianto et al. 2022). All of this can influence the likelihood of flood occurrence. Assessing the effects of climate change and LULC on flood susceptibility is therefore critical to understanding the causes of change in flood susceptibility, and can support decision-makers in proposing appropriate measures to reduce flood damage.
Previous studies assessing the effects of climate change and LULC on flood susceptibility can be grouped into four main approaches: physical models, remote sensing, statistical models, and machine learning models. Physical models, including Mike Flood (Mangukiya & Yadav 2022), HEC-RAS (Afzal et al. 2022; Shah et al. 2022), and the Soil and Water Assessment Tool (SWAT) (Liu et al. 2022), are considered promising as they can simulate floods in detail, including flow velocity, flood depth, and duration. However, physical models require a deep understanding of hydrologic processes to adjust parameters, which poses challenges for researchers in model development. These models also require detailed in situ data, limiting their application to data-rich regions. Over past decades, remote sensing data, including synthetic aperture radar (SAR) data and optical sensor images combined with GIS, have been used to assess floods (Dewan et al. 2006; Alarifi et al. 2022). These methods have been used to build flood occurrence inventories. Many studies use remote sensing databases to assess flood hazards, combined with statistical techniques such as frequency ratio (FR), logistic regression (LR), weights of evidence (WoE), and fuzzy logic (FL) (Nandi et al. 2016; Rahmati et al. 2016; Sepehri et al. 2020; Ullah & Zhang 2020). These techniques can process spatial data to predict the likelihood of flood occurrence with high accuracy.
Several researchers have applied machine learning to develop flood susceptibility models. These include artificial neural networks (ANNs), random forest (RF), support vector machine (SVM), LR, and decision trees (DT) (Tehrany et al. 2014; Al-Juaidi et al. 2018; Chen et al. 2020; Khoirunisa et al. 2021; Wang et al. 2021). However, developing flood susceptibility models with machine learning poses several challenges, such as choosing among hundreds of candidate models, each of which gives different results. Therefore, testing the models in different regions and for different types of natural hazards is recommended.
This study's objective was to test and compare machine learning models in assessing flood susceptibility in the context of climate change and land use change in Thua Thien Hue province, Vietnam. This is the first time flooding has been assessed in relation to climate change and LULC in the province, which is one of those most affected by climate change and LULC in Vietnam. The key contributions of this study are: (i) selection of appropriate models to determine flood susceptibility zones; (ii) construction of flood susceptibility maps for different periods; and (iii) evaluation of the effects of climate change and LULC on flood susceptibility.
DATA USE AND METHODS
Study area
The study area has a dense hydrologic system, comprising the rivers Perfume, Bo, and Ong Lau. The province also has two lagoons, Tam Giang – Cau Hai and Lang Co. The study area's climate has two main seasons: rainy and dry. The rainy season, from September to December, accounts for 70% of annual precipitation. Rainfall during this season is often brief but intense, leading to floods that can cause significant socio-economic and environmental damage.
Thua Thien Hue province is frequently affected by storms. According to the Ministry of Natural Resources and Environment, the study area is impacted by approximately 30% of the storms that occur in the country.
Flood inventory
Flood inventory maps are vital in developing flood susceptibility models. They present records of past flood events and can be used to predict the probability of future flood occurrence by evaluating the relationships between past events and conditioning factors (Ghosh et al. 2022; Yaseen et al. 2022). In this study, a flood inventory map was constructed using flood marks measured in 2020 and 2021. Sentinel-1A images from October 16, 2020 were also used to determine flood points, to ensure the flood susceptibility model's reliability. In total, 986 flood points were collected to build the model. Because the models perform binary classification, non-flood points must also be collected, for example in high-elevation, steep regions that have never been affected by floods.
In the end, 1,864 flood and non-flood points were divided into two groups: 70% as training data to train the models and 30% as validation data to evaluate the models.
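The 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the shuffling seed, the rounding rule, and the `(point_id, label)` representation are assumptions, and the number of non-flood points is simply the total minus the 986 flood points reported above.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle labelled points and split them into training and validation sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical inventory: (point_id, label), label 1 = flood, 0 = non-flood.
inventory = [(i, 1 if i < 986 else 0) for i in range(1864)]
train, valid = split_points(inventory)
print(len(train), len(valid))
```

A stratified split, which keeps the flood to non-flood ratio the same in both subsets, would be a reasonable alternative; the text does not state which was used.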
Conditioning factors
Variable | Meaning | Source |
---|---|---|
Elevation | The elevation of an area directly affects its ability to accumulate water, and there is an inverse relation between elevation and flooding. Elevations in the province are between 0 and 1,819 m. | Choubin et al. (2019), Nachappa et al. (2020) |
Slope | Slope controls the flow speed and capacity to accumulate water. In the study area, the slope varies from 0° to 73°. | Yariyan et al. (2020) |
Aspect | Aspect influences flow rate and soil moisture directly, and the occurrence of flooding indirectly. The aspect varies from 0° to 360° | Mojaddadi et al. (2017) |
Curvature | Curvature has an essential influence on a region's heterogeneity. The curvature varies from −48 to 43. | Mirzaei et al. (2021) |
NDVI | NDVI measures the area's vegetation density. Flooding affects regions with low vegetation density more because it influences soil infiltration strongly. In the study area, NDVI varied from −0.16 to 0.19 in 2021 and −0.24 to 0.81 in 2017. | Arabameri et al. (2022) |
NDBI | NDBI represents a region's building density and is vital in flood susceptibility modeling. Regions with high construction density have low soil permeability, which influences flood occurrence strongly. In this study area, NDBI ranged from 0.3 to 0.58 in 2021 and −0.4 to 0.34 in 2017. | Vilasan & Kapse (2022) |
Land use | Each type of land use has a different infiltration capacity, and the capacity of agricultural or forest surfaces is higher than construction surfaces. The extent of construction in the study area increased ever faster from 2017 to 2021, and is one of the causes of the increased flood risk. | Paule-Mercado et al. (2017) |
Distance to river | The floods in the study area are fluvial, so areas near the river tend to be flooded more frequently. Distance to the river ranges from 0 to 3,184 m in the Thua Thien Hue province. | Vojtek & Vojteková (2019) |
Distance to road | Distance to road has crucial effects on water infiltration capacity. In the study area, national road 1A was critical and led to flooding due to the obstruction of floodwater drainage to the sea. Distance from the road ranges from 0 to 13,818 m. | Nachappa et al. (2020) |
Rainfall | Heavy rain in a short time leads to high water levels, with the low ground and dense hydrologic network causing flooding. Thua Thien Hue province has the highest mean annual precipitation in Vietnam. Rainfall ranged from 1,032 to 2,032 mm in 2021 and from 1,689 to 2,999 mm in 2017. | Saravanan & Abijith (2022) |
TWI | TWI represents soil moisture content. In Thua Thien Hue province, the eastern region has the highest TWI | Nguyen (2022a) |
Elevation, aspect, curvature, slope, and TWI were calculated using a DEM generated from the 1:50,000 scale topographic map (available from the Vietnam Department of Survey, Mapping and Geographic Information (https://www.bandovn.vn/en/topographic-maps-7)). NDVI and NDBI for 2017 and 2021 were calculated from Landsat 8 images. Land use data for 2017 and 2021 were downloaded from https://www.arcgis.com. Distance to river and distance to road were extracted from the 1:50,000 scale topographic map using Line Density in ArcGIS 10.6. Rainfall data for 2017 and 2021 were downloaded from https://chrsdata.eng.uci.edu/. All factors were transformed into raster format at 10 m resolution.
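For reference, the two spectral indices can be sketched with their standard normalized-difference definitions. The text does not list the bands used; the sketch assumes the usual Landsat 8 OLI choices (band 4 red, band 5 near infrared, band 6 shortwave infrared 1), and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 where the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)

def ndbi(swir1, nir):
    """Normalized difference built-up index: (SWIR1 - NIR) / (SWIR1 + NIR)."""
    return normalized_difference(swir1, nir)

# Hypothetical surface reflectance values for two pixels.
vegetated = {"red": 0.05, "nir": 0.45, "swir1": 0.20}
built_up = {"red": 0.20, "nir": 0.25, "swir1": 0.35}

for name, px in (("vegetated", vegetated), ("built-up", built_up)):
    print(name, round(ndvi(px["nir"], px["red"]), 2), round(ndbi(px["swir1"], px["nir"]), 2))
```

Dense vegetation gives a high NDVI and a negative NDBI, while built-up surfaces give a low NDVI and a positive NDBI, which is why the two indices serve as proxies for infiltration capacity.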
METHODS
- (i) Data in this study include historical flood points and conditioning factors. Past flood points were collected during field surveys in 2020 and 2021 and extracted from Sentinel-1A images from October 2020.
- (ii) The SVM and RF models were coded in Python with the TensorFlow library. The hardest part is determining the model hyperparameters for SVM (C and Gamma) and RF (number of trees). Hyperparameter selection for these models is based mainly on trial and error. The final values of C and Gamma in SVM were 10 and 0.2, respectively, and 200 trees were used in RF.
- (iii) Model evaluation: several statistical indices (AUC, RMSE, and MAE) were used to evaluate model performance.
- (iv) Analysis: after validation, SVM and RF were used to build the 2017 and 2021 flood susceptibility maps. The model output values vary from 0 to 1 and were grouped into five classes (very low, low, moderate, high, and very high) using the natural break method. Notably, the value ranges of the classes in 2017 and 2021 are similar, allowing changes in flood susceptibility over time to be assessed.
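The final step, assigning model outputs to five classes with comparable value ranges in both years, can be sketched as below. The break values are hypothetical placeholders (the study derived its breaks with the natural break method and does not report them), and the score lists are made up for illustration.

```python
from bisect import bisect_right

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]
# Hypothetical class boundaries; not the values used in the study.
BREAKS = [0.2, 0.4, 0.6, 0.8]

def classify(score, breaks=BREAKS):
    """Map a susceptibility score in [0, 1] to one of five classes."""
    return CLASS_NAMES[bisect_right(breaks, score)]

def class_counts(scores):
    """Count how many cells fall into each susceptibility class."""
    counts = {name: 0 for name in CLASS_NAMES}
    for score in scores:
        counts[classify(score)] += 1
    return counts

# Hypothetical model outputs for the same five cells in two years.
scores_2017 = [0.05, 0.15, 0.35, 0.55, 0.90]
scores_2021 = [0.05, 0.45, 0.65, 0.85, 0.95]
print(class_counts(scores_2017))
print(class_counts(scores_2021))
```

Using the same boundaries for both years means that a cell moving from one class to another reflects a change in the model output rather than a change in how the classes are defined.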
Support vector machine
SVM originates from work by Vapnik in 1965 and was formalized as the algorithm used today in 1995 (Cortes & Vapnik 1995). SVMs were originally linear binary classifiers, based on maximizing the separation margin between two classes to predict a binary qualitative variable. Kernel functions and related extensions, however, now allow SVMs to handle nonlinear and multiclass problems. SVM's basic principle is to divide a dataset into two subsets by a separating hyperplane (Tehrany et al. 2014, 2015). The goal is to determine an optimal hyperplane that maximizes the distance to the nearest training data (known as support vectors). This distance is the ‘margin’.
The accuracy of SVM depends on the settings of C and Gamma. The higher the value of C, the more heavily misclassified training points are penalized and the more closely the training points are separated. Conversely, if the value of Gamma is large, only points near the dividing line influence its position.
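The role of Gamma can be illustrated with the Gaussian (RBF) kernel, in which Gamma controls how quickly the influence of a training point decays with distance. The kernel type is an assumption here, since the text does not state which kernel was used; 0.2 is the Gamma value reported in the methods, and the other values and the points are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

point = (0.0, 0.0)
near, far = (0.5, 0.0), (3.0, 0.0)

# Compare the similarity of a near and a far point for three Gamma values.
for gamma in (0.02, 0.2, 2.0):
    print(gamma, rbf_kernel(point, near, gamma), rbf_kernel(point, far, gamma))
```

As Gamma grows, the kernel value for the distant point falls towards zero much faster than for the nearby point, so each training point affects only its immediate neighbourhood.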
Random forest
RF, developed by Breiman (2001), can solve both classification and regression problems, and combines a bagging algorithm with the random subspace method. RF works by constructing many decision trees (DTs) in three main steps:
- (1) Using the bootstrap method to build a data subset of similar size to the original dataset.
- (2) Building the DTs from these subsets.
- (3) Predicting the outcome by majority voting.
RF can handle multidimensional data and can also work well for missing data (Chen et al. 2020). Its performance depends on the number of decision trees (Ntree) and the number of variables used to build the tree (mtry).
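The three steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model used in the study: one-split "stumps" stand in for full decision trees, each stump assumes that low feature values indicate flooding, and the data points are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows, feature):
    """Step 2 (simplified): choose the threshold on one feature with the fewest
    errors. The stump predicts 1 (flood) when the value is <= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x[feature] for x, _ in rows}):
        errors = sum((1 if x[feature] <= threshold else 0) != y for x, y in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold

def fit_forest(rows, n_trees, rng):
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        feature = rng.randrange(n_features)  # random subspace: one feature per stump
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, x):
    """Step 3: majority vote over the individual stump predictions."""
    votes = [1 if x[feature] <= threshold else 0 for feature, threshold in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical data: (elevation_m, distance_to_river_m) -> 1 flood / 0 non-flood.
rows = [((2, 50), 1), ((5, 120), 1), ((8, 300), 1), ((12, 200), 1),
        ((150, 2500), 0), ((300, 1800), 0), ((600, 3000), 0), ((900, 2200), 0)]
forest = fit_forest(rows, n_trees=25, rng=random.Random(0))
print(predict_forest(forest, (4, 100)), predict_forest(forest, (700, 2800)))
```

In this sketch `n_trees` plays the role of Ntree, and the single randomly chosen feature per stump corresponds to mtry = 1.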
Assessment model
Assessing the model's accuracy is a critical step in building a flood susceptibility model. If the model performs well, it can be generalized to assess flood susceptibility elsewhere. In this study, statistical indicators including the receiver operating characteristic (ROC) curve, RMSE, and MAE were used to assess training and validation performance.
The ROC curve shows a model's ability to predict the occurrence of floods correctly and is based on the true positive rate (sensitivity) and false-positive rate (1–specificity). The true positive rate is the proportion of actual positives that are correctly identified as positives by the model, while the false-positive rate is the proportion of actual negatives that are incorrectly identified as positives by the model. The area under the ROC curve (AUC) reflects model accuracy.
RMSE and MAE measure the differences between the predicted and observed values (Wang et al. 2019; Nguyen 2022a).
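The three indicators can be computed as in the following sketch, which uses their standard definitions. AUC is computed through its rank interpretation (the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point), which is equivalent to the area under the ROC curve. The observed labels and predicted scores are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed labels and predicted scores."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error between observed labels and predicted scores."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def auc(observed, predicted):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for o, p in zip(observed, predicted) if o == 1]
    neg = [p for o, p in zip(observed, predicted) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation points: 1 = flood, 0 = non-flood.
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

AUC depends only on the ranking of the scores, whereas RMSE and MAE also penalize scores that are correctly ranked but far from 0 or 1, which is why the three are reported together.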
RESULTS
Land use change analysis
Factor selection
Model assessment
Table 2 gives AUC, RMSE, and MAE for the flood susceptibility models. The RF model performed better than the SVM model on both error measures during training, with RMSE = 0.21 and MAE = 0.06 compared with 0.22 and 0.08 for SVM. The same was true in validation, where the RF model (RMSE = 0.22, MAE = 0.09) outperformed the SVM model (RMSE = 0.23, MAE = 0.12).
Model | Training AUC | Training RMSE | Training MAE | Validation AUC | Validation RMSE | Validation MAE |
---|---|---|---|---|---|---|
SVM | 0.98 | 0.22 | 0.08 | 0.97 | 0.23 | 0.12 |
RF | 0.99 | 0.21 | 0.06 | 0.98 | 0.22 | 0.09 |
Assessment effects of land use and rainfall on flood susceptibility
Model | Year | Very low (km²) | Low (km²) | Moderate (km²) | High (km²) | Very high (km²) |
---|---|---|---|---|---|---|
SVM | 2017 | 2,636 | 415 | 337 | 393 | 988 |
SVM | 2021 | 988 | 513 | 287 | 277 | 1,436 |
RF | 2017 | 1,673 | 978 | 994 | 551 | 573 |
RF | 2021 | 1,578 | 915 | 657 | 476 | 1,143 |
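As a quick arithmetic aid for reading the table, the change in the very high susceptibility class between the two years can be expressed as a relative change. This is only one possible way of expressing the change (a share of total province area would be another), so the choice of measure is an assumption; the areas are copied from the table.

```python
# Very high susceptibility area (km²) for each model and year, from the table.
very_high_km2 = {
    "SVM": {2017: 988, 2021: 1436},
    "RF": {2017: 573, 2021: 1143},
}

def relative_change(before, after):
    """Change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before

for model, area in very_high_km2.items():
    change = relative_change(area[2017], area[2021])
    print(f"{model}: {change:+.1%}")
```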
DISCUSSION
Although several different methods have been developed and used to predict the spatial distribution of flood susceptibility, assessment is still considered an obstacle to effective flood risk management (Gharakhanlou & Perez 2023; Wu et al. 2023). Floods are caused mainly by the distribution and intensity of precipitation and are often amplified by topographic conditions and landscape characteristics, including water infiltration capacity. Both climate and LULC are expected to influence flood susceptibility in the future.
The results of this study confirmed that altitude, NDBI, NDVI, distance from the river, and rainfall are the most critical factors for the probability of flood occurrence in Thua Thien Hue province. Altitude is important because it is directly related to water storage capacity. In Thua Thien Hue province, floods occur mainly in the lower-altitude eastern region. NDBI and NDVI ranked second and third in importance for flood susceptibility because they directly influence water infiltration capacity. For example, in the study area, the increase in built-up area and decrease in vegetated area are thought to have changed the spatial distribution of floods from 2017 to 2021. Distance from the river was the fourth most important factor for flood occurrence, because floods in the study area are fluvial and occur near rivers. Rainfall ranked only fifth, also because floods in the study area are fluvial, occurring at low altitude and driven by LULC. Previous studies support these findings. Nguyen (2022b) found that elevation and land use were critical factors in flood occurrence in Central Vietnam and Ha Tinh province. Ha et al. (2022) indicated that elevation, rainfall, and distance to river were the most important factors for flood susceptibility in Thua Thien Hue province, Vietnam.
In this study, two popular models, SVM and RF, were proposed to assess the effects of climate and LULC change on flood susceptibility in Thua Thien Hue province, Vietnam. The results show that the RF model, with an AUC value of 0.98, outperformed the SVM model (AUC = 0.97) in assessing these effects. RF is one of the most effective and widely used existing models because it is among the most easily explained (or interpreted). In addition, it is more transparent in relation to the use made of training data. RF also has the advantage of using all of its initial data more intelligently, in order to limit its errors (Pal 2005; Uddin & Uddiny 2015). SVM has the advantage of resisting overfitting; however, it is less suitable for nonlinear problems (Karamizadeh et al. 2014).
The models proposed in this study use RMSE as an objective function. Overfitting is therefore an important challenge, as it is for machine learning in general. To reduce its effects, several techniques were applied in this study, such as setting a dropout rate and limiting the search range. However, other methods can also be tried, such as collecting training data from different geographical locations (Nguyen et al. 2020).
The study's results also confirmed that the very high flood susceptibility surfaces, concentrated mainly in the east where there were major changes in land use, increased from 2017 to 2021 in the study area. This trend is common worldwide. Nguyen et al. (2022a) show that flood susceptibility in the Nhat Le – Kien Giang watershed (Vietnamese central region) will increase from 2005 to 2050 due to land use and climate change. Hagos et al. (2022) report that flooding will become more frequent due to climate change and land use change in the Awash River basin in central Ethiopia. Nirupama & Simonovic (2007) show an increase in flood risk in the province of Ontario, Canada, due to urban growth. This was confirmed by Hettiarachchi et al. (2018), who highlighted the increased flood risk in the South Washington Watershed District (SWWD) in Minnesota, United States, related to climate change and urban growth. This study provides additional evidence for the effects of climate change and land use change on flood susceptibility.
CONCLUSIONS
The objective of this study was to evaluate the impact of climate change and LULC on flood susceptibility in Thua Thien Hue province, Vietnam, using machine learning models and remote sensing.
The proposed models use 1,864 flood locations and 11 conditioning factors for 2017 and 2021 as input data. The results show that elevation, NDVI, NDBI, distance to river, and rainfall were the most important factors for flood occurrence in the study area.
The combination of machine learning techniques (SVM and RF) with remote sensing successfully assessed the impact of climate change and LULC on flooding, with AUC values exceeding 0.95. The method can be generalized to determine flood susceptibility elsewhere, particularly in data-limited areas.
Very high flood susceptibility areas increased by about 30% from 2017 to 2021 in Thua Thien Hue. The changes are related to the increased construction area – the surfaces are concentrated mainly in the east – and precipitation.
FUNDING STATEMENT
This work was supported by the national research program of MOST (Vietnam), project number ĐTĐL.CN-93/21.
AUTHOR CONTRIBUTIONS
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by V.T.V., H.D.N., P.L.V., M.C.H., V.D.B., T.O.N., V.H.H., T.K.H.N. The first draft of the manuscript was written by V.T.V. and H.D.N. All authors read and approved the final manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.