Flash floods are a frequent and highly destructive natural hazard in China. To prevent and manage these disasters, decision-makers need GIS-based flash flood susceptibility maps. In this study, we present an improved Blending approach, RF-Blending (Reserve Feature Blending), which differs from the Blending approach in that it preserves the original feature dataset during meta-learner training. Our objectives were to demonstrate the performance improvement of the RF-Blending approach and to use it to produce flash flood susceptibility maps for all catchments in Jiangxi Province. The Blending approach employs a double-layer structure: support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) serve as base learners at level-0, and the output of level-0 is used as the meta-feature dataset for the level-1 meta-learner, which is logistic regression (LR). RF-Blending trains the meta-learner on the output of level-0 together with the original feature dataset. To develop flood susceptibility maps, we applied these approaches to historical flash flood points and catchment-based factors. Our results indicate that the RF-Blending approach outperformed the other approaches. These findings can significantly aid catchment-based flash flood susceptibility mapping and assist managers in controlling and remediating flood-induced damages.

  • Catchments as basic study units.

  • Producing flash flood susceptibility maps using machine learning approaches.

  • An improved Blending approach.

Graphical Abstract


Flash floods are among the most catastrophic hazards that cause extensive damage and disruption to the environment and society (Khajehei et al. 2020). In China, flash floods have caused an average of 356 casualties between 2011 and 2020, representing approximately 60% of flood mortality (Ministry of Water Resources of China (MWR) 2021; Guo et al. 2018a). Flash flood susceptibility mapping is a challenging task due to the complex factors that affect flash flood generation, including catchment properties and rainfall characteristics (Rozalis et al. 2010). The Sendai Framework for Disaster Risk Reduction, signed in 2015 under the leadership of the United Nations Office for Disaster Risk Reduction (UNDRR), prioritizes the strengthening of disaster risk governance to manage disaster risk (UNDRR 2015). One way to strengthen flash flood risk management is to assess and map flash flood susceptibility (Youssef et al. 2016; Ha et al. 2021). The Chinese government has significantly increased investment and attention to flash flood prevention and control. The MWR has conducted the National Flash Flood Disasters Investigation and Evaluation project, which covered 30 provinces, 305 cities, and 2,138 counties. The project has established a database of investigation and evaluation results for national flash flood disasters to provide a solid data foundation for flash flood monitoring and warning systems, disaster management, and mitigation research (Guo et al. 2017).

To produce accurate flash flood susceptibility maps using machine learning methods, reliable historical flash flood points and factors are required (Costache et al. 2020a; Hosseini et al. 2020). Historical flash flood points are usually derived from the survey records of local authorities (Chen et al. 2020). The conditioning factors for flash floods are complex, and there is no agreement on how to apply these factors accurately (Rozalis et al. 2010; Chen et al. 2019). The conditioning factors can be classified into three categories: geometric characteristics (GC), environmental characteristics (EC), and hydrological characteristics (HC). The factors and their corresponding references are shown in Table 1. Among the GC, the topography is closely related to flood susceptibility, mainly due to the elevation and slope (Zhou et al. 2000; Ragettli et al. 2017). Catchment area, catchment flow channel length, and their ratio are considered to have a strong correlation with flash flood hazards in catchments (Fan et al. 2012). EC includes vegetation, soils, and rivers, quantified accordingly as the normalized difference vegetation index (NDVI), topographic wetness index (TWI), and the density of the river network or the distance to rivers (Soulsby et al. 2010; Huang et al. 2012; Reager et al. 2014; Miao et al. 2016; Zhao et al. 2016; Liu et al. 2019). Short-duration heavy rainfall is usually regarded as the direct cause of flash floods, and the related HC (Peak discharges per unit area, curve number, and time of concentration) also have a remarkable influence (Guo et al. 2018b).

Table 1

Factors and their corresponding references

| Classification | Conditioning factor | Description | Reference |
| --- | --- | --- | --- |
| Geometric characteristics | Slope | The slope measures the undulation of the ground, which affects surface runoff and vertical percolation. | Zhou et al. (2000), Liu et al. (2019), Zhao et al. (2022), Zhong et al. (2019, 2020), Meraj et al. (2015) |
| Geometric characteristics | Elevation | Water generally flows from areas of high elevation and accumulates in areas of low elevation. The elevation used is the elevation of the catchment centroid. | Zhou et al. (2000), Liu et al. (2019), Zhong et al. (2019, 2020) |
| Geometric characteristics | Shape factor | The shape factor is the ratio of the catchment area to the square of the longest flow path. | Ragettli et al. (2017), Meraj et al. (2015) |
| Geometric characteristics | Concentration length or gradient | The concentration gradient is the gradient of the longest flow path and is defined as the ratio of its height (difference of elevations at origin and outlet) to its length. | Fan et al. (2012), Meraj et al. (2015), Lazaro et al. (2014) |
| Environmental characteristics | Topographic wetness index (TWI) | TWI is a proxy of soil moisture, indicating the degree of water accumulation in a catchment. | Soulsby et al. (2010), Miao et al. (2016), Ma et al. (2021) |
| Environmental characteristics | Normalized difference vegetation index (NDVI) | NDVI is used as a proxy of vegetation conditions. Areas with lower vegetation density are often more prone to flooding. | Huang et al. (2012), Liu et al. (2019), Zhong et al. (2020), Meraj et al. (2015) |
| Environmental characteristics | Soil | As an important environmental characteristic, soil (soil moisture, soil type, etc.) can affect the infiltration of runoff, which can cause flash floods. | Meraj et al. (2015), Zhong et al. (2019), Liu et al. (2019), Ragettli et al. (2017) |
| Environmental characteristics | Land use | Land use is related to human activity and is considered to be an important driver of global environmental change, with connections to flash floods. | Liu et al. (2019), Ragettli et al. (2017) |
| Environmental characteristics | Distance to the nearest river | Distance is a common measure of proximity. Areas near rivers are often more prone to flooding. | Reager et al. (2014), Zhao et al. (2016) |
| Environmental characteristics | Drainage density | Drainage density indicates the length of the river within a unit area of a catchment, i.e. the ratio of the total length of the river to the area. | Zhong et al. (2019, 2020), Meraj et al. (2015), Zhao et al. (2016) |
| Hydrological characteristics | Rainfall | Rainfall is the main source of runoff. The rainfall adopted here is the maximum rainfall within 10 min in a 2-year return period. | Guo et al. (2018b), Zhao et al. (2022), Zhong et al. (2019, 2020), Li & Wan (2017) |
| Hydrological characteristics | Peak discharges per unit area | Peak discharges per unit area is the ratio of peak discharges (per second) to the catchment area. | Guo et al. (2018a), Li et al. (2017) |
| Hydrological characteristics | Time of concentration | Time of concentration refers to the time for water to travel across a catchment's longest flow path to reach the catchment outlet. It is often used to assess the response of a catchment to rainfall and associated flood risk. | Guo et al. (2018a), Li et al. (2017), Lazaro et al. (2014) |
| Hydrological characteristics | Curve number | The curve number is an empirical parameter used in hydrology for predicting direct runoff or infiltration from rainfall excess. It is based on the soil, land use, treatment, and hydrologic condition. | Zhao et al. (2022), Lazaro et al. (2014) |

Several methods have been utilized to identify areas susceptible to flash floods. Hydrological models have been developed to predict flash flood susceptibility or rainfall thresholds for flash floods (Miao et al. 2016; Nguyen et al. 2016). However, these models require substantial data inputs that are often difficult to obtain (Rozalis et al. 2010; Hapuarachchi et al. 2011). Studies can only be carried out for a single catchment or watershed, and it is difficult to model a larger area (Zhang et al. 2021). The GIS-based spatial analysis approaches have also been used to investigate flash flood susceptibility. These approaches discriminate the susceptibility of different regions by analyzing the spatial heterogeneity and geographical similarity of the conditioning factors (Xiong et al. 2020). These approaches have performed well in a single catchment or watershed, but their performance in large study areas remains a topic of discussion (Abdelkareem 2017; Abdo 2020).

With the development of machine learning, an increasing number of studies apply machine learning to geography and GIS (Gao 2020; Janowicz et al. 2020). Researchers have found that machine learning performs well in solving nonlinear problems, feature selection, and data mining (Liu et al. 2022). A variety of machine learning approaches are widely used to assess flash flood susceptibility, including decision tree (DT) (Tehrany et al. 2013), SVM (Tehrany et al. 2014), and KNN (Costache et al. 2019). Cao et al. (2020) incorporated the Pearson correlation coefficient and Geodetector into LR to filter the evaluation indicators and then mapped the susceptibility to flash floods in Fujian Province, China. Chiang et al. (2007) employed the recurrent neural network (RNN) model to study flash floods by combining multiple precipitation data sources. Fang et al. (2021) added local spatial information of grid cells to long short-term memory (LSTM) so that the sequential model could process both the attribute information and the spatial relationships of flash floods.

The performance of a single model is always limited. To improve predictive power, researchers attempt to combine these single models in hybrid and ensemble models. Hybrid ideas are usually divided into two categories.

The first is the hybrid model, which combines completely different, heterogeneous machine learning approaches. Primary prediction models, such as neural networks, including adaptive neuro-fuzzy inference systems, are combined with stochastic optimization algorithms that tune the hyperparameters of the neural networks. The resulting hybrid models outperform benchmark models in both convergence speed and prediction results (Bui et al. 2016; Hong et al. 2018; Termeh et al. 2018). Costache et al. (2020a, 2020b) used a bivariate statistical method to preprocess the input data and then fed the resulting statistical index into the machine learning model as a new predictive variable. Although the final predictions are still output by a single model, other approaches are combined into the overall experimental process, which distinguishes these models from pure single machine learning models.

The other category is the ensemble model, also known as ensemble learning, which consists of homogeneous machine learning approaches, called base learners. Single or multiple base learners are combined into ensemble models according to different strategies to reduce the impact of the bias of base learners and improve predictive power (Choubin et al. 2019). Bagging, boosting, and stacking are the three most commonly used strategies. Bagging has been frequently used in flash flood susceptibility mapping (Chapi et al. 2017; Bui et al. 2019; Chen et al. 2019; Arabameri et al. 2020; Ha et al. 2021). In addition, boosting approaches, including Adaptive Boosting (AdaBoost) (Pham et al. 2020; Ha et al. 2021), Gradient Boosting Decision Tree (GBDT) (Chen et al. 2021), and eXtreme Gradient Boosting (XGBoost) (Chen et al. 2021; Ma et al. 2021), have been shown to perform well in flash flood susceptibility assessment. Stacking and blending have also shown superior performance in flash flood susceptibility mapping, as demonstrated by Yao et al. (2022), but relatively few studies have applied them.

In this study, the Blending approach and the RF-Blending approach are selected as the main approaches. The study covers all the catchments in Jiangxi Province, China. The catchment is used as the unit of study because catchments may be better suited to mapping flash flood susceptibility over large areas (Ragettli et al. 2017). SVM, KNN, RF, and LR have all demonstrated good performance in predicting flash flood susceptibility (Tehrany et al. 2014; Costache et al. 2020b; Rahman et al. 2021). They also outperformed most methods in our previous experiments. Yao et al. (2022) used the stacking and blending approaches for flash flood mapping in Jiangxi Province, with SVM, KNN, and RF as base learners and linear regression as the meta-learner. For this reason, SVM, KNN, and RF are selected as the base learners. Linear regression is not suitable as a meta-learner for RF-Blending since the meta-learner still needs to learn the features in the input dataset. LR, a linear model that has performed well in the prediction of flash flood susceptibility, is therefore selected as the meta-learner.

This paper aims to perform flash flood susceptibility mapping in Jiangxi Province, China, using a novel GIS-based approach combined with the RF-Blending model. Firstly, a literature review of the factors and approaches for flash floods is introduced and used as the basis for the selection of factors and methods for this study. Secondly, the study area and data employed are presented. The RF-Blending approach, the Blending approach, and the benchmark models, as well as the metrics for model performance, are then explained. Finally, the results of the model and flash flood susceptibility maps are presented. The differences in model performance, the distribution of different levels of susceptibility, and the limitations of this study are discussed. The final section summarizes the content and highlights the main finding, which is that the RF-Blending has a good performance in flash flood susceptibility assessment and mapping.

Study area

Jiangxi Province is located in the southeastern region of China and is a sub-tributary watershed on the south bank of the middle reaches of the Yangtze River. It shares borders with Anhui Province to the north, Zhejiang Province to the northeast, Fujian Province to the east, Guangdong Province to the south, Hunan Province to the west, and Hubei Province to the northwest. The province covers an area of about 166,900 km² and has 11 cities: Nanchang, Jiujiang, Ganzhou, Ji'an, Pingxiang, Yingtan, Xinyu, Yichun, Shangrao, Jingdezhen, and Fuzhou. The study area is shown in Figure 1.
Figure 1

The location, topography, and water of the study area.


The south and northwest of Jiangxi Province are mostly hilly and at higher altitudes, while the north is relatively flatter and at lower altitudes. The vegetation cover is high in Jiangxi Province, with significantly more vegetation cover in the south than in the north, and less vegetation in the basins such as Poyang Lake. There are more than 2,400 rivers in the province, with a total length of 18,000 km, most of which converge to the Poyang Lake. The five main rivers are Ganjiang River, Xinjiang River, Fuhe River, Xiuhe River, and Raohe River.

Jiangxi belongs to the subtropical monsoon climate zone, with four distinct seasons, a humid and hot summer and a cool winter. The annual average temperature is 19 °C and the average annual rainfall is 1,400–2,200 mm, with uneven distribution between seasons and regions. The northeast of the province receives the highest amount of rain, while the northwest receives the least. The frequency of short-duration intense rainfall in Jiangxi Province is higher in the east and lower in the west, with high-frequency areas distributed in Jingdezhen and Shangrao in the northeast, and low-frequency areas located in Yichun and western Ganzhou. Temporally, rainfall is mainly concentrated in the period from April to September each year, peaking in June, with a clear trend of increasing precipitation frequency year by year (Tang et al. 2018).

Data

Establishing reliable historical flash flood records is a crucial initial step in machine learning-based flash flood susceptibility mapping. The historical flash flood records of Jiangxi Province used in this study are from the historical flash flood disaster database completed in 2015, which covers flash flood events recorded from 1950 to 2015. Flash flood susceptibility prediction using machine learning is typically treated as a binary classification problem, where positive samples are catchments with recorded flash flood events and negative samples are catchments without such events. To keep the spatial distribution of the negative samples relatively uniform, the subset tool in ArcGIS was used to randomly select catchments that have not experienced flash floods, equal in number to the positive samples. After removing significant outliers, the sample dataset comprised 1,911 observations, with 940 positive and 971 negative samples. The outcome variable has a value of 1 (flash flood, 940) or 0 (non-flash flood, 971). The dataset was divided randomly using the data split function in scikit-learn: 70% (1,337 catchments) formed the training dataset and the remaining 30% (574 catchments) formed the test dataset. The spatial distribution of the catchments for the sample dataset is shown in Figure 2.
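The random 70/30 division can be illustrated with a minimal sketch. It assumes that the "data split function" is scikit-learn's `train_test_split`; the feature values are synthetic placeholders and the random seed is an arbitrary choice, so only the sample counts mirror the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the catchment samples: 940 positive and 971
# negative rows, each with 10 conditioning factors (values are random).
rng = np.random.default_rng(0)
n_pos, n_neg, n_factors = 940, 971, 10
X = rng.normal(size=(n_pos + n_neg, n_factors))
y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

# Random 70/30 split; the seed is an arbitrary choice for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape[0], X_test.shape[0])
```

With 1,911 rows and `test_size=0.3`, this split yields 1,337 training rows and 574 test rows, which agrees with the counts reported above.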
Figure 2

The spatial distribution of the samples.

Ten conditioning factors were selected as variables for this study, based on the actual situation in Jiangxi Province and relevant literature. These factors are slope, elevation, shape factor, concentration gradient, TWI, NDVI, distance to the nearest river, rainfall, peak discharges per unit area, and time of concentration. For factors defined on raster data, that is, where the original unit of measure is not a catchment, the average value was calculated for each catchment. Their sources are shown in Table 2, and their spatial distribution is shown in Figure 3.
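The per-catchment averaging of raster factors can be sketched as a zonal mean. This is only an illustration in plain Python: the study used ArcGIS for data processing, and the function name `zonal_mean`, the grid values, and the catchment identifiers below are hypothetical.

```python
from collections import defaultdict

def zonal_mean(values, zones, nodata=None):
    """Average raster cell values per zone (here: per catchment ID)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            # Skip cells outside any catchment and cells without data.
            if zone is None or value == nodata:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

# Hypothetical 2 x 3 slope raster and the catchment ID of each cell.
slope = [[2.0, 4.0, 10.0],
         [6.0, 8.0, 20.0]]
catchment_id = [[1, 1, 2],
                [1, 2, 2]]

means = zonal_mean(slope, catchment_id)
print(means)
```

Each catchment then carries a single value per factor, which is the form required by the sample dataset described above.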
Table 2

Factors used in spatial modeling for flood susceptibility mapping

| Classification | Conditioning factor | Data source |
| --- | --- | --- |
| Geometric characteristics | (a) Slope | DEM Dataset of China |
| | (b) Elevation | |
| | (c) Shape factor | |
| | (d) Concentration gradient | |
| Environmental characteristics | (e) Topographic wetness index (TWI) | |
| | (f) Normalized difference vegetation index (NDVI) | Landsat 7 Collection 1 Tier 1 Annual NDVI Composite (Landsat-7 image courtesy of the U.S. Geological Survey) |
| | (g) Distance to the nearest river | River System in China |
| Hydrological characteristics | (h) Rainfall | Statistical Parameter Atlas of Rainstorms in China |
| | (i) Peak discharges per unit area | |
| | (j) Time of concentration | |
Figure 3

Flash flood conditioning factors. (a) Slope, (b) Elevation, (c) Shape factor, (d) Concentration gradient, (e) TWI, (f) NDVI, (g) Distance to nearest river, (h) Rainfall, (i) Peak discharge per unit area, and (j) Time of concentration.

The framework of this study is shown in Figure 4. The complete experiment process includes the following parts: data preparation, data preprocessing, model building, accuracy validation and comparison, and flash flood susceptibility mapping.
Figure 4

The framework of the Blending approach in this study.


The conditioning factors are first selected and the catchment dataset is collected. Subsequently, the data are normalized and split into training and test sets. The RF-Blending approach used in this paper consists of a three-level structure: the base learners (level-0), the construction of a meta-feature dataset (level-1), and the meta-learner (level-2). Based on trial and error and the literature review, four models are used: RF, KNN, SVM, and LR, where LR serves as the meta-learner and the remaining three serve as base learners. The base learners are first fitted on the training set and then output predictions for the training and test sets, respectively. These predictions are combined with the training and test sets to form the meta-feature dataset, and the meta-learner is then fitted on the meta-feature dataset. Because only a single training/test split is used for model building and performance comparison, ten-fold cross-validation (CV) is additionally used to explore how the trained models differ when different training and test sets are used and to verify the stability of the models. The CV results are used to compare the performance and generalization ability of the original Blending model, the RF-Blending model, and the four benchmark models. Finally, flash flood susceptibility is assessed and mapped for all catchments in the study area. The remainder of this section details the basic definition of each model, hyperparameter tuning, and model evaluation metrics.

The experimentation process, including model training, hyperparameter tuning, and performance evaluation, is implemented using Python and Scikit-learn (https://scikit-learn.org/stable/index.html). ArcGIS was used for data management, processing, and visualization.
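The ten-fold cross-validation mentioned above can be sketched as follows. This is a plain-Python illustration of how the folds partition the samples, not the study's implementation; the helper name `k_fold_indices` and the shuffling seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

# Partition the 1,911 samples of this study into ten folds.
folds = list(k_fold_indices(1911, k=10))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)
```

Every sample appears in exactly one test fold, so each model is trained ten times and evaluated once on every catchment in the sample dataset.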

Blending approach

Blending is an ensemble approach and a variant of stacking. In contrast to bagging and boosting, it combines multiple different base learners in the training phase so that the ensemble benefits from the strengths of each. In general, there are two levels: level-0 consists of the base learners and level-1 consists of the meta-learner. The base learners (level-0) learn from the original dataset (the input dataset) to produce the meta-feature dataset, and the meta-learner (level-1) uses the meta-feature dataset to obtain the final results. The main difference between stacking and blending is that stacking uses k-fold cross-validation when training each learner, while blending uses the hold-out method to set aside a part of the input dataset. The advantages of blending are that it avoids overfitting due to data leakage and that it is faster to train than stacking. The meta-feature dataset for the RF-Blending approach used in this study consists of the output from the base learners in level-0 together with the original dataset, with the aim of further improving the model. RF-Blending has three levels: level-0 consists of the base learners, the meta-feature dataset is created at level-1, and level-2 consists of the meta-learner. In this way, the contribution of factors from the original dataset to the final prediction is retained, while the predictions from the base learners are used to enhance model performance (Figure 5).
Figure 5

Flowchart of Blending approach construction. (a) Blending and (b) RF-Blending.


The RF-Blending modeling process is as follows.

  • 1. Train the three base learners individually on the training set.

  • 2. Input the training set and test set into the three trained base learners to get the predictions.

  • 3. Combine the predictions obtained in step 2 with the training and test sets to obtain the meta-feature data training set and test set.

  • 4. Train the meta-learner with the meta-feature data training set.

  • 5. Input the meta-feature data test set into the trained meta-learner to get the predictions and verify the model performance.
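The five steps above can be sketched with scikit-learn as follows. This is a minimal illustration under several assumptions rather than the study's code: the data are synthetic (`make_classification`), the normalization is done with `StandardScaler`, the base learners pass predicted class labels (not probabilities) to the meta-learner, and only part of the reported hyperparameters is set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the catchment dataset (10 factors, binary label).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Normalize with statistics from the training set only (assumed scaler).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: train the three base learners (level-0) on the training set.
base_learners = [
    SVC(kernel="poly", degree=2),
    KNeighborsClassifier(n_neighbors=15, p=1),
    RandomForestClassifier(n_estimators=20, max_depth=10, random_state=0),
]
for model in base_learners:
    model.fit(X_train, y_train)

def base_predictions(X_part):
    """Step 2: one column of predicted class labels per base learner."""
    return np.column_stack([m.predict(X_part) for m in base_learners])

# Step 3: meta-features = base-learner outputs + original factors.
meta_train = np.hstack([base_predictions(X_train), X_train])
meta_test = np.hstack([base_predictions(X_test), X_test])

# Step 4: train the LR meta-learner on the meta-feature training set.
meta_learner = LogisticRegression(C=94, solver="saga", max_iter=5000)
meta_learner.fit(meta_train, y_train)

# Step 5: evaluate on the meta-feature test set.
test_accuracy = meta_learner.score(meta_test, y_test)
print(meta_train.shape, meta_test.shape, round(test_accuracy, 3))
```

Dropping `X_train` and `X_test` from the two `np.hstack` calls gives the plain Blending variant, in which the meta-learner sees only the three base-learner outputs.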

Basic learners

In this study, SVM, KNN, and RF were selected as the base learners at level-0, and LR was employed as the meta-learner.

  • (1) Logistic regression

LR is a classical linear classifier characterized by the fact that it does not require the input data to be normally distributed. The feature factors can be either continuous or discrete. It is mostly used to solve binary classification problems and is often used as a meta-learner in the Blending approach. The logistic function can be represented by Equation (1):

$$f(x) = \frac{1}{1 + e^{-(x - \mu)/s}} \tag{1}$$

where $\mu$ is a location parameter (the midpoint of the curve), $s$ is a scale parameter, and $x$ is the input data. In a practical problem, the logistic function can be rewritten as Equation (2):

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}} \tag{2}$$

where $\beta_i$ (i = 0, 1, 2, …, n) are the coefficients of the LR and $x_i$ (i = 1, 2, …, n) are the factors of the model. The output P is between 0 and 1 and represents the probability that the corresponding y will be unity (true). When classification results are exported, a threshold of 0.5 is applied to P: predictions higher than 0.5 are classified as the true class and those below 0.5 as the false class.
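As a small worked example of Equation (2) and the 0.5 threshold, the sketch below computes P for one sample. The coefficient and factor values are hypothetical, and assigning a probability of exactly 0.5 to the true class is an assumption, since the text only specifies the behavior above and below the threshold.

```python
import math

def logistic_probability(x, beta):
    """Equation (2): beta[0] is the intercept, beta[1:] the factor weights."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, threshold=0.5):
    """Return 1 (flash flood) when p reaches the threshold, else 0."""
    return 1 if p >= threshold else 0

# Hypothetical coefficients and factor values, for illustration only.
beta = [-1.0, 2.0, -0.5]
p = logistic_probability([1.0, 1.0], beta)
print(p, classify(p))
```

Here the linear term equals 0.5, so P is about 0.62 and the sample is assigned to the positive class. The probability itself is what is used for susceptibility mapping, while the thresholded label is used for performance evaluation.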
  • (2) K-nearest neighbor

KNN is a classical nonparametric machine learning method that determines the category of a sample to be classified according to the categories of the K samples closest to it. Not all classification problems have an actual physical distance, so in most cases the similarity between samples is used in its place. Several distance measures are available for calculating sample similarity, including the Minkowski distance and its special cases, the Manhattan distance (p = 1) and the Euclidean distance (p = 2). In this study, the Minkowski distance was used. The performance of the KNN model mainly depends on the selection of the K value and the distance measure.
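The role of the distance measure and the K value can be illustrated with a minimal plain-Python sketch. The sample values and the function names are hypothetical; the study itself relied on scikit-learn.

```python
from collections import Counter

def minkowski_distance(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski_distance(train_X[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy two-factor samples (hypothetical values): label 1 = flash flood.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [0.75, 0.8], k=3, p=1))
```

The three nearest neighbors of the query all carry label 1, so the query is classified as a flash flood sample. Changing `k` or `p` changes which neighbors vote, which is why both are treated as hyperparameters.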

  • (3) Support vector machine

SVM is a classical binary classification model that aims to find the separating hyperplane with the largest geometric margin that correctly divides the dataset, that is, to build a linear learner with the largest margin in the feature space. The hyperplane divides the feature space into two parts, thereby distinguishing positive from negative samples. For nonlinear datasets, kernel functions map the data to a high-dimensional space through a nonlinear transformation, in which the SVM is then learned. The decision function can be represented by Equation (3):

$$f(a) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, a) + b\right) \tag{3}$$

where $i$ is the index of the sample point and $n$ is the sample size; $\alpha_i$ and $b$ are parameters; $y_i$ and $x_i$ are the outcome and the feature vector of sample $i$; $a$ is a new sample point to be classified; and $K(x_i, a)$ is a kernel function. The performance of the SVM model depends on the selection of a suitable kernel function, which can be the polynomial kernel (Poly), sigmoid kernel (SIG), radial basis function (RBF), or linear kernel (LN). In this study, the polynomial kernel was selected. For further details of the SVM model, refer to Noble (2006) and Tehrany et al. (2014).
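The kernel-based decision function described above can be sketched in a few lines. The polynomial kernel is written in the common form (gamma · ⟨x, a⟩ + coef0)^degree, which is an assumption about the parameterization; the support vectors, multipliers, and intercept are hypothetical values that were not fitted to any data.

```python
def polynomial_kernel(x, a, gamma=1.0, coef0=0.0, degree=2):
    """K(x, a) = (gamma * <x, a> + coef0) ** degree."""
    dot = sum(xi * ai for xi, ai in zip(x, a))
    return (gamma * dot + coef0) ** degree

def svm_decision(a, support_X, support_y, alphas, b, **kernel_args):
    """Sign of sum_i alpha_i * y_i * K(x_i, a) + b, with labels in {-1, +1}."""
    score = sum(alpha * y * polynomial_kernel(x, a, **kernel_args)
                for alpha, y, x in zip(alphas, support_y, support_X)) + b
    return 1 if score >= 0 else -1

# Hypothetical support vectors and multipliers, not fitted to any data.
support_X = [[1.0, 0.0], [0.0, 1.0]]
support_y = [1, -1]
alphas = [0.5, 0.5]
print(svm_decision([2.0, 0.5], support_X, support_y, alphas, b=0.0,
                   gamma=1.0, coef0=1.0, degree=2))
```

For the query point (2.0, 0.5) the two kernel values are 9 and 2.25, so the weighted sum is 3.375 and the point falls on the positive side of the hyperplane.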
  • (4) Random forest

RF is, in essence, an improvement of the tree-based bagging method. It typically consists of a series of random decision trees. First, the training data are sampled with replacement (bootstrap sampling) and features are randomly selected, forming multiple sub-datasets. These sub-datasets and the corresponding features are then input into the random trees for training. Finally, the predictions from all decision trees are integrated and the final prediction is calculated by voting or averaging. Compared with a single decision tree, RF is less prone to overfitting the training data because the trees that comprise it are trained independently of each other. Furthermore, it is robust to outliers, missing values, and noisy inputs. The RF is often used as a benchmark model to evaluate the performance of flash flood susceptibility prediction results. For further details of the RF, refer to Costache et al. (2020a) and Breiman (2001).
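The sampling-and-voting idea behind RF can be illustrated with a deliberately simplified sketch. It replaces full decision trees with one-split stumps, each trained on a bootstrap sample and a single randomly chosen feature, so it shows the bagging mechanism rather than the actual RF algorithm; all names and data values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature (fewest training errors)."""
    majority = Counter(y).most_common(1)[0][0]
    best = (len(y) + 1, None)
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else majority
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if errors < best[0]:
            best = (errors, (feature, threshold, left_label, right_label))
    return best[1]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples with one random feature each."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X) rows with replacement.
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        # A random feature per stump de-correlates the ensemble members.
        feature = rng.randrange(len(X[0]))
        forest.append(
            fit_stump([X[i] for i in rows], [y[i] for i in rows], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy samples with two factors (hypothetical values): label 1 = flash flood.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.95, 0.9]), predict_forest(forest, [0.05, 0.05]))
```

An individual stump can be wrong when its bootstrap sample happens to contain a single class, but the vote across many stumps averages such errors out, which is the intuition behind the reduced overfitting of the ensemble.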

Hyperparameter optimization

Hyperparameters are a critical component of machine learning models and significantly affect model training and performance. A model typically has several hyperparameters, whose values must be fixed before training and chosen so that the model performs as well as possible on the given dataset. Hyperparameter optimization searches for the combination of values that achieves this. In this study, Bayesian optimization is used to select hyperparameters. First, several points are sampled at random within a given range and used to fit a surrogate model. The most promising next point is then selected with an acquisition function, evaluated, and added to the data used to update the surrogate model. These steps are repeated, and the optimal hyperparameters are obtained after a set number of iterations. The hyperparameter optimization is carried out using the test set to avoid overfitting. The resulting hyperparameters are listed in Table 3.
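The loop described above (random initial points, surrogate fit, acquisition-driven selection, surrogate update) can be sketched in a few lines. The study does not state which surrogate or acquisition function was used; this sketch assumes a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition, which is a common choice. The one-dimensional search space scaled to [0, 1], the candidate grid, and the objective `toy_score` are all hypothetical.

```python
import math
import random


def rbf(a, b, length=0.2):
    """RBF covariance between two scalar inputs scaled to [0, 1]."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, solve(K, k))), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the best score so far (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective, low, high, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]   # random initial points
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        best = max(ys)
        # Pick the candidate that maximizes the acquisition function.
        x_next = max(grid, key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best))
        xs.append(x_next)
        ys.append(objective(x_next))                       # update the surrogate's data
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_score(u):
    """Hypothetical validation score as a function of one scaled hyperparameter."""
    return -(u - 0.3) ** 2


best_u, best_score = bayes_opt(toy_score, 0.0, 1.0)
```

In practice the objective would train a model with the candidate hyperparameters and return its score on held-out data, and the scaled value would be mapped back to the real hyperparameter range.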

Table 3

Model parameter inventory

| Model | Parameter settings (default values where not specified) |
| --- | --- |
| RF-Blending | C = 94, solver = 'saga' |
| Blending | C = 94, solver = 'saga' |
| LR | C = 94, solver = 'saga' |
| RF | n_estimators = 20, max_depth = 10, max_features = 10, min_samples_leaf = 25, min_samples_split = 40 |
| KNN | n_neighbors = 15, leaf_size = 28, p = 1 |
| SVM | C = 8.55, gamma = 0.53, coef = 0.33, degree = 2 |

Model evaluation metrics

Model evaluation is critical for measuring the performance of machine learning approaches. In this study, flash flood susceptibility prediction is treated as a binary classification problem with two outputs: the probability that a catchment is a positive sample, which is used for flash flood susceptibility mapping, and the predicted class of the catchment (positive or negative), which is used for model performance evaluation. Classification models are typically assessed with accuracy, precision, recall (also called sensitivity), specificity, and F1-score. Because these metrics depend on the threshold used for classification, the receiver operating characteristic (ROC) curve is also commonly used as a more intuitive, threshold-independent evaluation criterion.

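The accuracy, precision, recall (sensitivity), and specificity metrics depend on how the predicted positive and negative samples are distributed relative to the actual values. They can be represented by Equations (4)–(7), respectively.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{7}$$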
where TP represents true positives, which are correctly classified as positive samples, and TN represents true negatives, which are correctly classified as negative samples. FP represents false positives, which are incorrectly classified as positive samples but are actually negative samples, and FN represents false negatives, which are incorrectly classified as negative samples but are actually positive samples.
The F1-score is defined as the harmonic mean of precision and recall and therefore balances the two.
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$
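As a worked check of these definitions, the short sketch below computes the five metrics from the four confusion-matrix counts, using the standard formulas. The counts are those reported for the Blending model on the test set in Table 5; the function name and dictionary keys are illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-dependent metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Blending model, test set (Table 5): TP = 212, FP = 18, TN = 274, FN = 70.
blending_test = classification_metrics(tp=212, fp=18, tn=274, fn=70)
```

The results agree with the percentages listed for Blending in Table 5 (accuracy about 84.67%, precision about 92.17%, recall about 75.18%, specificity about 93.84%, F1-score about 82.81%).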
The true positive rate (TPR) and false positive rate (FPR) are effective metrics for model evaluation: the higher the TPR and the lower the FPR, the better the model performance. However, both depend on the threshold used to separate positive from negative samples, so improving one typically worsens the other. The ROC curve is therefore introduced: the TPR is plotted on the y-axis against the FPR on the x-axis as the classification threshold is varied from 0 to 1. The area under the curve (AUC) is used to quantify the ROC curve. For a classifier that performs no worse than chance, the AUC lies between 0.5 and 1, and the larger the value, the better the model performance. A value close to 0.5 corresponds to random guessing, comparable to a coin flip.
$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N} \tag{9}$$

In Equation (9), TP is the number of true positive samples, TN is the number of true negative samples, P is the number of positive samples in the input data, and N is the number of negative samples.
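The threshold sweep behind the ROC curve can be sketched as follows. The sketch uses every distinct predicted score as a threshold and integrates the resulting curve with the trapezoidal rule, which is the usual way of computing the area under an ROC curve; it may differ from the exact computation used in the study. The labels and scores are toy values.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points obtained by lowering the threshold
    through every distinct score, starting from (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: three positives and three negatives (illustrative values).
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(y_true, y_score))
```

Here eight of the nine positive-negative pairs are ranked correctly, so the area equals 8/9, which illustrates the interpretation of the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.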

Model performance and comparison

The values of the evaluation metrics for the models on the training and test sets are shown in Tables 4 and 5. Among the six models, the largest difference in accuracy between the training and test sets is observed for the Blending model (1.49%), while RF shows no difference. The largest difference in F1-score is observed for the KNN model (2.41%), while RF has the smallest (0.05%). The largest difference in AUC is observed for the RF-Blending model (0.0516), while LR has the smallest (0.0042). None of the models exhibited significant underfitting or overfitting. On the test set, the RF-Blending model has the highest accuracy (85.37%), slightly higher than the Blending model (84.67%) and RF (84.14%). The RF-Blending model also has the highest F1-score (83.20%) and AUC (0.8950), slightly higher than those of the Blending model (F1-score = 82.81%, AUC = 0.8919) and RF (F1-score = 81.16%, AUC = 0.8946). The accuracies of LR, KNN, and SVM are much lower than those of the other three models, at 70.38, 72.82, and 73.17%, respectively. The differences in precision and specificity among the six models are more pronounced, with RF-Blending, Blending, and RF having much higher values than the other three. Sensitivity (recall) is relatively balanced across the models on the test set, with a maximum of 75.18% (Blending) and a minimum of 69.50% (RF). The ROC curves for the six models on the test set are shown in Figure 6. The six curves fall into two groups: RF-Blending, Blending, and RF in one, and LR, SVM, and KNN in the other, with the first group performing better than the second.
Table 4

Model performance evaluation results in the training set

| Metric | RF-Blending | Blending | LR | RF | KNN | SVM |
| --- | --- | --- | --- | --- | --- | --- |
| TP | 515 | 520 | 482 | 455 | 510 | 518 |
| FP | 35 | 47 | 231 | 9 | 202 | 200 |
| TN | 644 | 632 | 448 | 670 | 477 | 479 |
| FN | 143 | 138 | 176 | 203 | 148 | 140 |
| Accuracy | 86.69% | 86.16% | 69.56% | 84.14% | 73.82% | 74.57% |
| Precision | 93.63% | 91.71% | 67.60% | 98.06% | 71.63% | 72.14% |
| Sensitivity/Recall | 78.27% | 79.03% | 73.25% | 69.15% | 77.51% | 78.72% |
| Specificity | 94.85% | 93.08% | 65.98% | 98.67% | 70.25% | 70.54% |
| F1-score | 85.26% | 84.90% | 70.31% | 81.11% | 74.45% | 75.29% |
| AUC | 0.9466 | 0.9326 | 0.7682 | 0.9326 | 0.8118 | 0.8148 |
Table 5

Model performance evaluation results in the test set

| Metric | RF-Blending | Blending | LR | RF | KNN | SVM |
| --- | --- | --- | --- | --- | --- | --- |
| TP | 208 | 212 | 204 | 196 | 201 | 207 |
| FP | 10 | 18 | 92 | 5 | 75 | 79 |
| TN | 282 | 274 | 200 | 287 | 217 | 213 |
| FN | 74 | 70 | 78 | 86 | 81 | 75 |
| Accuracy | 85.37% | 84.67% | 70.38% | 84.14% | 72.82% | 73.17% |
| Precision | 95.41% | 92.17% | 68.92% | 97.51% | 72.82% | 72.38% |
| Sensitivity/Recall | 73.76% | 75.18% | 72.34% | 69.50% | 71.28% | 73.40% |
| Specificity | 96.58% | 93.84% | 68.49% | 98.29% | 74.32% | 72.95% |
| F1-score | 83.20% | 82.81% | 70.59% | 81.16% | 72.04% | 72.89% |
| AUC | 0.8950 | 0.8919 | 0.7640 | 0.8946 | 0.7889 | 0.7886 |
Figure 6

The ROC curve based on the test set.

The values of accuracy, F1-score, and AUC obtained from cross-validation (CV) are shown in Figure 7, where the green line indicates the median, and the average (ave) and standard deviation (σ) for each model are listed at the top of each box plot. RF-Blending, Blending, and RF have better overall performance than SVM, LR, and KNN. The RF-Blending model has the highest average accuracy (85.29%), average F1-score (83.15%), and average AUC (0.8950). Blending is slightly inferior to RF-Blending on these metrics, but it has a smaller standard deviation and more stable results. This may be because the meta-feature dataset of Blending has three dimensions, while that of RF-Blending has 13, and the increase in dimensionality increases the standard deviation. Even so, the standard deviations of RF-Blending are only 0.0023 (accuracy), 0.0027 (F1-score), and 0.0013 (AUC), which indicates that RF-Blending generalizes well and that the choice of training and test sets has little effect on model construction. The previously trained model is therefore used directly to produce the flash flood susceptibility maps in the next section. LR has the worst overall performance, followed by KNN and SVM.
Figure 7

Evaluation metric values from CV. (a) Accuracy, (b) F1-Score, and (c) AUC. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/nh.2023.139.
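The difference in meta-feature dimensionality noted above (three for Blending, 13 for RF-Blending) follows directly from how the level-1 training rows are assembled. The sketch below is a minimal illustration: the function name, the order of the base-learner outputs (SVM, KNN, RF), and the sample values are hypothetical, and the count of 10 original features is inferred from 13 minus the three base-learner outputs rather than stated explicitly in this section.

```python
def build_meta_features(base_outputs, original_features, reserve_features):
    """Assemble the level-1 (meta-learner) input rows.

    base_outputs: per-sample level-0 outputs, e.g. [p_svm, p_knn, p_rf].
    original_features: per-sample original catchment factors.
    reserve_features: False gives plain Blending, True gives RF-Blending.
    """
    if not reserve_features:
        return [list(p) for p in base_outputs]
    return [list(p) + list(x) for p, x in zip(base_outputs, original_features)]


# One hypothetical catchment: three base-learner outputs and 10 original factors.
base_outputs = [[0.91, 0.80, 0.95]]
original_features = [[0.1 * k for k in range(10)]]

blending_rows = build_meta_features(base_outputs, original_features, reserve_features=False)
rf_blending_rows = build_meta_features(base_outputs, original_features, reserve_features=True)
```

The logistic regression meta-learner is then trained on these rows, so in RF-Blending it can weigh the original catchment factors alongside the base-learner outputs.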


In summary, RF-Blending outperforms the other models on the test set, with higher values for accuracy, F1-score, and AUC compared to both Blending and RF, and much higher values compared to the other three benchmarks. While the RF-Blending model is not the highest in terms of precision, specificity, and recall, it is still among the top-performing models. The cross-validation results demonstrate similar characteristics to the test set, with RF-Blending having the highest accuracy, F1-score, and AUC value, followed by Blending and RF. Although RF-Blending is slightly less stable than Blending, its standard deviation is still low and does not significantly impact the model's performance.

Flash flood susceptibility map

The RF-Blending model was employed to assess and map the flash flood susceptibility for all catchments in the study area. The results are presented in Figure 8, where the flash flood susceptibility is categorized using three sets of division thresholds: ten classes, three classes, and two classes.
Figure 8

Flash flood susceptibility map in Jiangxi Province. (a) Ten classes, (b) three classes, and (c) two classes.


As shown in Figure 8(a), flash flood susceptibility is categorized into ten classes, where a higher susceptibility value indicates a higher risk of flash floods in the corresponding catchment. The map shows that areas with high flash flood susceptibility are mainly concentrated in the north, northeast, and southwest of Jiangxi Province, while catchments in the central, eastern, and southeastern parts of the province have a low susceptibility to flash floods. In most parts of Jiangxi Province, the high- and low-risk areas are distributed in a striated pattern. A comparison of Figure 8(b), where flash flood susceptibility is classified into three classes, with Figure 8(c), where it is classified into two classes, shows that few catchments have medium susceptibility. This is corroborated by Figure 8(a), where most of the catchments with low susceptibility have values below 0.3 and most of the catchments with high susceptibility have values above 0.9. Unlike the catchments with high susceptibility, those with low susceptibility do not exhibit significant clustering, with only a few of them concentrated in the central and eastern parts of Jiangxi Province. Most catchments with low susceptibility are distributed randomly throughout the study area, except in the northeast. In addition, Poyang Lake has low flash flood susceptibility because it is mostly a lake area itself, but most of the catchments around Poyang Lake, especially those in the southwest region, have high flash flood susceptibility.

Three pie charts show the number and proportion of catchments at each flash flood susceptibility level. Figure 9(a)–9(c) correspond to Figure 8(a)–8(c), respectively. With a threshold of 0.5, a total of 7,059 catchments are identified as being at risk of flash floods, accounting for approximately 57.2% of all catchments. When flash flood susceptibility is divided into three classes, there are still 6,795 catchments in the high-risk zone (about 55.1%), 1,255 catchments in the medium-risk zone (10.2%), and 4,259 catchments in the low-risk zone (about 34.5%). When flash flood susceptibility is classified into ten classes, 6,568 catchments have a susceptibility above 0.9, accounting for about 53.2%, and only 4% of catchments have a susceptibility between 0.5 and 0.9.
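The mapping from a predicted probability to a susceptibility class, and the tally behind such pie charts, can be sketched as below. Only the two-class threshold of 0.5 is taken from the text. The three-class cut points (0.3 and 0.7) are hypothetical placeholders because this section does not state them, the ten classes are assumed to be equal-width bins of 0.1, and the probabilities are made-up values.

```python
from bisect import bisect_right
from collections import Counter

TWO_CLASS_CUTS = [0.5]                          # from the text
THREE_CLASS_CUTS = [0.3, 0.7]                   # hypothetical placeholders
TEN_CLASS_CUTS = [k / 10 for k in range(1, 10)]  # assumed equal-width bins


def classify(probability, cuts, labels):
    """Return the label of the bin containing the probability.
    A value equal to a cut point falls into the higher class."""
    return labels[bisect_right(cuts, probability)]


def two_class(p):
    return classify(p, TWO_CLASS_CUTS, ["low", "high"])


def three_class(p):
    return classify(p, THREE_CLASS_CUTS, ["low", "medium", "high"])


def ten_class(p):
    return classify(p, TEN_CLASS_CUTS, list(range(1, 11)))


def class_shares(probabilities, classifier):
    """Count catchments per class and return (count, share) for each class."""
    counts = Counter(classifier(p) for p in probabilities)
    total = len(probabilities)
    return {label: (n, n / total) for label, n in counts.items()}


# Made-up susceptibility values for five catchments.
probabilities = [0.05, 0.20, 0.55, 0.93, 0.97]
shares = class_shares(probabilities, two_class)
```

Because most predicted probabilities in the study lie below 0.3 or above 0.9, the choice of intermediate cut points changes the class counts only slightly, which is consistent with the small medium-risk share reported above.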
Figure 9

Pie charts showing the number and proportion of different flash flood susceptibility levels. (a) Ten classes, (b) three classes, and (c) two classes.


This study has two main objectives: first, to demonstrate that the RF-Blending model, which incorporates the original dataset into the meta-feature dataset, is superior in assessing flash flood susceptibility; and second, to assess and map flash flood susceptibility in Jiangxi Province, China. This section discusses model performance, the practical implications of the flash flood susceptibility map, the limitations of this study, and possible future research directions.

The results show that the strengths and weaknesses of the models vary across the different evaluation metrics. The RF-Blending model demonstrated a clear advantage over the Blending model and the benchmark models in terms of accuracy, F1-score, and AUC (see Tables 4 and 5) on both the training and test sets. The CV box plots confirmed this advantage, with the RF-Blending model exhibiting much higher metrics than KNN, SVM, and LR, and slightly higher metrics than Blending and RF. Although the RF-Blending model was found to be less stable than the Blending model, its meta-feature dataset contains 13 features, whereas that of the Blending model contains only 3. Furthermore, the standard deviation of the RF-Blending model was smaller than that of the benchmark models. Overall, the RF-Blending model used in this study outperformed the Blending model and the benchmark models.

The flash flood susceptibility maps and pie charts for Jiangxi Province indicated that most catchments had either very low or very high susceptibility to flash floods. The binary classification approach, which uses 1 and 0 as outcome variables, may be responsible for this result, as most catchments predicted to be in the positive class had a susceptibility higher than 0.9. Further classification of these catchments is necessary to refine the assessment of their flash flood susceptibility.

The flash flood susceptibility map can be used as a reference for flash flood management in Jiangxi Province. For instance, Jingdezhen, Shangrao, the western part of Ganzhou, and the western part of Jiujiang were identified as highly susceptible to flash floods and should therefore be prioritized for flood management and prevention. The high-risk areas correlate with the distribution of rivers, and exploring this connection could help reduce flash flood risk through better management of river channels.

The study acknowledges several limitations, such as the lack of highly accurate spatiotemporal historical flash flood records and the need to consider more catchment factors relevant to flash floods. Additionally, different combinations of machine learning models, including deep learning models, could be explored to improve the performance of the Blending model.

This study demonstrates that the RF-Blending model is effective in assessing flash flood susceptibility in Jiangxi Province, China, outperforming the Blending model and the benchmark models. The flash flood susceptibility maps show that about half of the catchments in Jiangxi Province are highly susceptible to flash floods, particularly in the north, northeast, and southwest regions (Jingdezhen, Shangrao, the western part of Jiujiang, and the western part of Ganzhou). These results can serve as a reference for both further research and flash flood management and prevention, ultimately contributing to reducing disaster mortality, the number of affected people, and direct disaster economic loss in accordance with the Sendai Framework.

This work was supported by the National Key R&D Program of China [grant No. 2019YFC1510601].

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Arabameri A., Saha S., Chen W., Roy J., Pradhan B. & Bui D. T. 2020 Flash flood susceptibility modelling using functional tree and hybrid ensemble techniques. Journal of Hydrology 587, 125007.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Bui D. T., Ngo P. T. T., Pham T. D., Jaafari A., Minh N. Q., Hoa P. V. & Samui P. 2019 A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. Catena 179, 184–196.
Cao Y., Jia H., Xiong J., Cheng W., Li K., Pang Q. & Yong Z. 2020 Flash flood susceptibility assessment based on geodetector, certainty factor, and logistic regression analyses in Fujian Province, China. ISPRS International Journal of Geo-Information 9, 748.
Chapi K., Singh V. P., Shirzadi A., Shahabi H., Bui D. T., Pham B. T. & Khosravi K. 2017 A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environmental Modelling & Software 95, 229–245.
Chen W., Hong H. Y., Li S. J., Shahabi H., Wang Y., Wang X. J. & Bin Ahmad B. 2019 Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. Journal of Hydrology 575, 864–873.
Chen W., Li Y., Xue W., Shahabi H., Li S., Hong H., Wang X., Bian H., Zhang S., Pradhan B. & Bin Ahmad B. 2020 Modeling flood susceptibility using data-driven approaches of naive Bayes tree, alternating decision tree, and random forest methods. Science of the Total Environment 701, 134979.
Chiang Y.-M., Hsu K.-L., Chang F.-J., Hong Y. & Sorooshian S. 2007 Merging multiple precipitation sources for flash flood forecasting. Journal of Hydrology 340 (3), 183–196.
Choubin B., Moradi E., Golshan M., Adamowski J., Sajedi-Hosseini F. & Mosavi A. 2019 An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Science of the Total Environment 651, 2087–2096.
Costache R., Pham Q. B., Sharifi E., Thuy Linh N. T., Abba S. I., Vojtek M., Vojteková J., Thao Nhi P. T. & Khoi D. N. 2019 Flash-flood susceptibility assessment using multi-criteria decision making and machine learning supported by remote sensing and GIS techniques. Remote Sensing 12 (1), 106.
Costache R., Pham Q. B., Avand M., Thuy Linh N. T., Vojtek M., Vojteková J., Lee S., Khoi D. N., Thao Nhi P. T. & Dung T. D. 2020b Novel hybrid models between bivariate statistics, artificial neural networks and boosting algorithms for flood susceptibility assessment. Journal of Environmental Management 265, 110485.
Fan J., Shan J., Guan M. & Xu X. 2012 Analysis of critical rainfall calculation for flash floods in catchments in Jiangxi Province. Meteorological Monthly (in Chinese) 38, 1110–1114.
Fang Z. C., Wang Y., Peng L. & Hong H. Y. 2021 Predicting flood susceptibility using LSTM neural networks. Journal of Hydrology 594, 125734.
Gao S. 2020 A review of recent researches and reflections on geospatial artificial intelligence. Geomatics and Information Science of Wuhan University 45, 1865–1874.
Guo L., Zhang X., Liu R., Liu Y. & Liu Q. 2017 Achievements and preliminary analysis on China National Flash Flood Disasters Investigation and Evaluation. Journal of Geo-Information Science 19, 1548–1556.
Guo L., Ding L., Sun D., Liu C., He B. & Liu R. 2018a Key techniques of flash flood disaster prevention in China. Journal of Hydraulic Engineering 49, 1123–1136.
Guo L., He B. S., Ma M. H., Chang Q. R., Li Q., Zhang K. & Hong Y. 2018b A comprehensive flash flood defense system in China: overview, achievements, and outlook. Natural Hazards 92, 727–740.
Ha H., Luu C., Bui Q. D., Pham D. H., Hoang T., Nguyen V. P., Vu M. T. & Pham B. T. 2021 Flash flood susceptibility prediction mapping for a road network using hybrid machine learning models. Natural Hazards 109, 1247–1270.
Hapuarachchi H. A. P., Wang Q. J. & Pagano T. C. 2011 A review of advances in flash flood forecasting. Hydrological Processes 25, 2771–2784.
Hong H. Y., Panahi M., Shirzadi A., Ma T. W., Liu J. Z., Zhu A. X., Chen W., Kougias I. & Kazakis N. 2018 Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Science of the Total Environment 621, 1124–1141.
Hosseini F. S., Choubin B., Mosavi A., Nabipour N., Shamshirband S., Darabi H. & Haghighi A. T. 2020 Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: application of the simulated annealing feature selection method. Science of the Total Environment 711, 135161.
Janowicz K., Gao S., McKenzie G., Hu Y. & Bhaduri B. 2020 GeoAI: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. International Journal of Geographical Information Science 34, 625–636.
Khajehei S., Ahmadalipour A., Shao W. & Moradkhani H. 2020 A place-based assessment of flash flood hazard and vulnerability in the contiguous United States. Scientific Reports 10, 448.
Lazaro J. M., Navarro J. A. S., Gil A. G. & Romero V. E. 2014 Sensitivity analysis of main variables present in flash flood processes. Application in two Spanish catchments: Aras and Aguilon. Environmental Earth Sciences 71, 2925–2939.
Li H. & Wan Q. 2017 Study on rainfall index selection for hazard analysis of mountain torrents disaster of small watersheds. Journal of Geo-Information Science 19, 425–435.
Li Q., Wang Y., Li H., Zhang M., Li C. & Chen X. 2017 Rainfall threshold for flash flood early warning based on flood peak modulus. Journal of Geo-Information Science 19, 1643–1652.
Liu Y., Yang Z., Huang Y. & Liu C. 2019 Spatiotemporal evolution and driving factors of China's flash flood disasters since 1949. Science China Earth Sciences 49, 408–420.
Liu Y., Guo H., Li H., Dong W. & Pei T. 2022 A note on GeoAI from the perspective of geographical laws. Acta Geodaetica et Cartographica Sinica 51 (6), 1062–1069.
Ma M., Zhao G., He B., Li Q., Dong H., Wang S. & Wang Z. 2021 XGBoost-based method for flash flood risk assessment. Journal of Hydrology 598, 126382.
Meraj G., Romshoo S. A., Yousuf A. R., Altaf S. & Altaf F. 2015 Assessing the influence of watershed characteristics on the flood vulnerability of Jhelum basin in Kashmir Himalaya. Natural Hazards 77, 153–175.
Ministry of Water Resources of China (MWR). 2021 Bulletin of Flood and Drought Disasters in China. Ministry of Water Res. of China, Beijing. Available from: http://www.mwr.gov.cn.
Nguyen P., Thorstensen A., Sorooshian S., Hsu K. L., AghaKouchak A., Sanders B., Koren V., Cui Z. T. & Smith M. 2016 A high resolution coupled hydrologic-hydraulic model (HiResFlood-UCI) for flash flood modeling. Journal of Hydrology 541, 401–420.
Noble W. S. 2006 What is a support vector machine? Nature Biotechnology 24 (12), 1565–1567.
Pham B. T., Avand M., Janizadeh S., Phong T. V., Al-Ansari N., Ho L. S., Das S., Le H. V., Amini A., Bozchaloei S. K., Jafari F. & Prakash I. 2020 GIS based hybrid computational approaches for flash flood susceptibility assessment. Water 12 (3), 683.
Rahman M., Chen N. S., Elbeltagi A., Islam M. M., Alam M., Pourghasemi H. R., Tao W., Zhang J., Tian S. F., Faiz H., Baig M. A. & Dewan A. 2021 Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. Journal of Environmental Management 295, 113086.
Reager J. T., Thomas B. F. & Famiglietti J. S. 2014 River basin flood potential inferred using GRACE gravity observations at several months lead time. Nature Geoscience 7, 589–593.
Soulsby C., Tetzlaff D. & Hrachowitz M. 2010 Spatial distribution of transit times in montane catchments: conceptualization tools for management. Hydrological Processes 24, 3283–3288.
Tang C., Xu A., Ma F. & Dai Z. 2018 Spatial and temporal variations of short-duration heavy precipitation in Jiangxi during 1961–2015. Torrential Rain and Disasters 37, 421–427.
Termeh S. V. R., Kornejady A., Pourghasemi H. R. & Keesstra S. 2018 Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Science of the Total Environment 615, 438–451.
UNDRR (United Nations Office for Disaster Risk Reduction) 2015 Sendai Framework for Disaster Risk Reduction 2015–2030. UNDRR, Geneva.
Xiong J. N., Pang Q., Fan C. K., Cheng W. M., Ye C. C., Zhao Y. L., He Y. R. & Cao Y. F. 2020 Spatiotemporal characteristics and driving force analysis of flash floods in Fujian Province. ISPRS International Journal of Geo-Information 9 (2), 133.
Yao J., Zhang X. X., Luo W. C., Liu C. J. & Ren L. L. 2022 Applications of Stacking/Blending ensemble learning approaches for evaluating flash flood susceptibility. International Journal of Applied Earth Observation and Geoinformation 112, 102932.
Zhang T., Zi L., Yang W. & Wang J. 2021 Applicability of SCS model in flash flood forecasting and early warning. Journal of Yangtze River Scientific Research 38, 71–76.
Zhao G., Pang B., Xu Z., Wang Z. & Shi R. 2016 Assessment on the hazard of flash flood disasters in China. Journal of Hydraulic Engineering 47, 1133–1142 + 1152.
Zhao G., Liu R., Yang M., Tu T., Ma M., Hong Y. & Wang X. 2022 Large-scale flash flood warning in China using deep learning. Journal of Hydrology 604, 127222.
Zhong M., Jiang T., Li K., Lu Q. Q., Wang J. & Zhu J. J. 2020 Multiple environmental factors analysis of flash flood risk in Upper Hanjiang River, Southern China. Environmental Science and Pollution Research 27, 37218–37228.
Zhou C., Wang Q., Huang S. & Cheng D. 2000 A GIS-based approach to flood risk zonation. Acta Geographica Sinica 1, 15–24.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).