Abstract

In an effort to improve tools for effective flood risk assessment, we applied machine learning algorithms to predict flood-prone areas in Amol city (Iran), a site with recent floods (2017–2018). An ensemble approach was then implemented to predict hazard probabilities using the best machine learning algorithms (boosted regression tree, multivariate adaptive regression spline, generalized linear model, and generalized additive model), selected on the basis of a receiver operator characteristic-area under the curve (ROC-AUC) assessment. The algorithms were all trained and tested on 92 randomly selected points, information from a flood inundation survey, and geospatial predictor variables (precipitation, land use, elevation, slope percent, curve number, distance to river, distance to channel, and depth to groundwater). The ensemble model achieved AUC values of 0.925 and 0.892 for the training and testing data, respectively. We then created a vulnerability map from data on building density, building age, population density, and socio-economic conditions and assessed risk as the product of hazard and vulnerability. The results indicated that distance to channel, land use, and runoff generation were the most important factors associated with flood hazard, while population density and building density were the most important factors determining vulnerability. Areas of highest and lowest flood risk were identified, leading to recommendations on where to implement flood risk reduction measures to guide flood governance in Amol city.

INTRODUCTION

In general, floods occur when the runoff water volume exceeds the transport capacity of channels (Lytle & Poff 2004; Tockner et al. 2010; Borga et al. 2011; Petroselli et al. 2019). Flooding is intensified when drainage systems are improperly designed and/or maintained (Sadegh et al. 2018). Flood vulnerability increases when development proceeds along river channels and on floodplains (Zhou et al. 2012; Liu et al. 2016). A common narrative in many Asian cities where flooding is prevalent is increased vulnerability to flood disasters owing to uncontrolled development in flood-prone areas (Julien et al. 2009; Ghani et al. 2012; Dewan 2013; Engeland et al. 2018). In future, the flood risk is projected to increase with an intensification of hydrological and climate variables (Muller 2007; Roy 2009; Zhou et al. 2012; Yin et al. 2015; Hinkel et al. 2014; Muis et al. 2015).

Flooding impacts include the loss of life, direct (property/asset loss or damage) and indirect (the loss of livelihoods) economic damage, and damage to transportation, utility, infrastructure, and communication systems (Sharif et al. 2016; Wu et al. 2018). Impacts may persist for months or years, and secondary impacts, including health-related problems, may emerge (Tapsell et al. 2002; Sinnakaudan et al. 2003). As more than half the world's population already lives in urban areas and this proportion is projected to increase to two-thirds by 2050 (UNPD 2014; El Alfy 2016), there is an immediate need to reduce flood risks in urban population centers worldwide.

Floods in cities are often devastating because high densities of people and of assets are concentrated in areas where the flood potential is exacerbated by disturbances to nature (Kjeldsen 2010; Cherqui et al. 2015; Darabi et al. 2019). For example, urbanization is usually associated with a high proportion of impervious features (e.g., roads, walkways, and car parks), disturbed river/stream channels, and artificial storm drainage systems (Kundzewicz et al. 2010; Suriya & Mudgal 2012). Hydrological changes associated with urbanization include reductions in infiltration, evapotranspiration, and groundwater recharge and an increase in the volume of fast-flowing surface water during most storms (Schueler 1994; Mulligan & Crampton 2005; Haghighi et al. 2019; Pirnia et al. 2019). These changes affect the severity, timing, and extent of flooding (Nirupama & Simonovic 2007; Dewan & Yamaguchi 2008; Du et al. 2010; Suriya & Mudgal 2012).

Urban flood mapping is an evolving challenge in flood risk reduction planning by city managers and policymakers (Noh et al. 2016; Darabi et al. 2018; Yaraghi et al. 2019). Effective risk and vulnerability assessment requires a thorough knowledge of the conditions affecting flooding and exacerbating flood impacts (Ouma & Tateishi 2014). At the heart of such assessments are flood risk maps, which can be created by a number of approaches, including hydrological and hydraulic models (Brimicombe & Bartlett 1996; Booij 2005; Masood & Takeuchi 2012), the integration of analytic hierarchy process (AHP) and geographic information system (GIS) techniques (Ouma & Tateishi 2014), frequency ratio (Khosravi et al. 2016), multi-criteria evaluation (Meyer et al. 2009), systems simulation (Amendola et al. 2000), and probability-based analysis (Jalayer et al. 2014). An emerging technology involves machine learning methods (Chau et al. 2005; Maier et al. 2010; Lamovec et al. 2013; Choubin et al. 2018; Termeh et al. 2018; Zhao et al. 2018). Despite recent advances in producing maps for flood risk assessments (Choubin et al. 2018), limitations still exist and new approaches are still needed.

The main objectives of this study were to (i) combine machine learning techniques, GIS, and data on environmental conditioning factors to produce flood risk maps, (ii) compare the new machine learning techniques with previous ones, and (iii) apply the ensemble model and rank the importance of different conditioning factors in urban flood hazard prediction to evaluate flood vulnerability in Amol city, Iran.

The novelty of the study lies in (i) comparing formerly used algorithms with new algorithms, including support vector machine (SVM), random forest (RF), maximum entropy (Maxent), boosted regression tree (BRT), multivariate adaptive regression spline (MARS), generalized linear model (GLM), and generalized additive model (GAM); (ii) developing a spatial framework for urban flood vulnerability mapping by applying new urban conditioning factors; and (iii) introducing an ensemble model for urban flood risk reduction. The ensemble combines the predictions of the best-performing individual models into a single integrated model, exploiting their respective advantages to increase prediction accuracy.

STUDY AREA AND DATA

Amol city (36°26′02″–36°29′45″N; 52°19′14″–53°23′50″E) had a population of 237,528 in 2016, making it the third largest city in Mazandaran Province in northern Iran (Lotfi et al. 2016). It is located at an altitude of 59–137 m above mean sea level (masl), and its area expanded from 21 to 27 km² between 1998 and 2017 (Figure 1). Amol is located on the Haraz River, which passes through the middle of the city on its way to the Caspian Sea. The residential areas of the city are surrounded mainly by agricultural land, orchards, and high mountains covered by forest. The mean annual rainfall in the region is 680 mm, and the climate is semi-humid (Sahin 2012; Choubin et al. 2018). Over recent decades, urban development has led to extensive changes in hydrological processes and drainage systems in the city. As a result, the number of urban flood events has increased over roughly the past ten years. Notable flood events occurred on 12 November 2012 (with damage to more than 40 residential areas), 16 September 2015 (31 areas), 24 November 2015 (40 areas), and 15 October 2016 (10 areas) (Sedaghat et al. 2016).

Figure 1

(Top) Location of Amol city in northern Iran and the (center) map of the city showing points with a history of flooding and (bottom) previous flooding at (a) the Amol suspension bridge, (b) Banovan park, and (c) Bidestan park.

METHODS

Prediction of flood-prone areas

Machine learning models

We applied the following seven machine learning algorithms to predict flooded areas using geospatial predictor variables:

Support vector machine: A supervised model that enables the computer to learn how to analyze data for classification and regression (Chen et al. 2017). SVMs are popular because of their good empirical performance (compared with other models, such as artificial neural networks), easy training process, avoidance of local minima, relatively suitable mathematics for multi-dimensional data, and tradeoff between complexity and error (Chen et al. 2017).

Random forest (suggested by Breiman 2001): A classification and regression technique based on assembling a large number of decision trees. Specifically, it is an ensemble of trees constructed from a training dataset and internally validated to predict a dependent variable from given independent variables (Boulesteix et al. 2012). RF draws on two powerful machine learning techniques: bagging and random feature selection. In bagging, each tree is trained on a bootstrap sample of the training data, and predictions are made by a majority vote of the trees. In addition, the model randomly selects a subset of the features as candidates for the split at each node when growing a tree (Jiang et al. 2007).

Maximum entropy (Maxent) (proposed by Phillips et al. (2006) and specifically designed for ecological modeling and spatial distribution modeling): A general-purpose machine learning technique for making predictions from independent variables. Maxent uses the principle of maximum entropy to relate presence-only data (the dependent variable) to environmental variables in order to estimate a potential geographical distribution. Because it works with presence-only data, Maxent indirectly solves a discriminative problem through Bayes' rule (Phillips et al. 2006; Urbani et al. 2015; Rahmati et al. 2016; Chen et al. 2017).

Boosted regression tree (belonging to the gradient boosting modeling family): A tree-based model that combines a large number of machine learning and regression tree models to learn and weigh them (by assigning individual weights to every sample point of the training dataset), in order to describe the relationship between the independent and dependent variables. It uses several techniques to improve the performance of a single model, e.g., by creating an ensemble of regression models (Littke et al. 2017; Hu et al. 2018; Wang et al. 2018).

Multivariate adaptive regression spline (introduced by Friedman (1991) to organize relationships between a set of independent variables and the dependent variable): A machine learning technique that can estimate general functions of high-dimensional arguments. Further, it is an adaptive modeling process based on non-linear and non-parametric statistics (Samui 2013). MARS makes no assumptions about the underlying functional relationships between the input (independent) and target (dependent) variables (Zhang & Goh 2016). It allows complex relationship modeling between a dependent variable and independent variables, and its simple rule-based functions facilitate the prediction of spatial distributions using independent variables (Leathwick et al. 2006).

Generalized linear model: A model that allows a curvilinear (non-linear) relationship between the dependent variable and the independent variables. A GLM is defined by three components: a random component that specifies the distribution of the response variable, a systematic component that relates a linear predictor to the predictor variables, and a link function that connects the random and systematic components (Guisan et al. 2002; Koubbi et al. 2011).

Generalized additive model: A GLM for which the linear predictor is specified as a sum of smooth functions of some or all of the covariates. GAM also provides estimates using a combination of the local scoring algorithm and the back-fitting algorithm (Dominici et al. 2002). It is best used for more than purely exploratory analysis, for which its smoothing parameter is a key component (Hastie & Tibshirani 1990). GAM is a semi-parametric extension of GLM. Like GLM, GAM uses a link function to establish a relationship between the mean of the dependent variable and a ‘smoothed’ function of the independent variables. The strength of GAM lies in its ability to deal with highly non-linear and non-monotonic relationships between the dependent and independent variables (Guisan et al. 2002).

Inputs to the hazard model

Flooded points in Amol were identified based on a flood inventory of inundated areas during the floods in 2017–2018 and on documents obtained from the municipal authority of Amol showing historical flood inundation areas. Historical points were validated with photographs showing the severity of flooding. These points served as dependent variables for our prediction models. In the preparation of the flood hazard map, we selected 92 flood-prone points (assigned a value of 1, for the flooded zone) and 60 non-flooded points (assigned a value of 0).

Based on the literature, we included eight geospatial predictors as model conditioning factors (independent variables) related to flood inundation. These were precipitation, land use/land cover (LULC), elevation, slope percent, curve number (CN), distance to river, distance to channel, and depth to groundwater (Fernández & Lutz 2010; Ouma & Tateishi 2014). All variables were converted to 5-m grids, and the hazard maps were created by running the machine learning algorithms as spatial distribution models (SDMs) in the R programming environment.

Daily precipitation data for 16 weather stations (Ramsar, Noushahr, Siahbisheh, Gharakhil-Ghaemshr, Firouzkooh, Sari, Kiasar, Amol, Polsefid, Alasht, Bandaramirabad, Galogah, Kojoor, Baladeh, Babolsar, and Dasht-E-Naz) were obtained from the Iranian Meteorological Organization (IRIMO). These data were used to prepare a precipitation depth map (for the period 2001–2016) for Mazandaran Province, using the inverse-distance weighted (IDW) interpolation method in ArcGIS 10.4. The interpolated amounts vary from 672 mm in the east of the study area to 684 mm in the west (Figure 2(a)). Mean annual precipitation in Amol city is 680 mm, based on the nearby Amol weather station (Figure 1).
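The IDW step can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation used in the study: the function name `idw`, the distance exponent of 2, and the station coordinates are assumptions made for illustration, and only the precipitation values (672, 680, and 684 mm) echo figures quoted in the text.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse-distance weighted estimate at point (x, y).

    stations: list of (sx, sy, value) tuples.
    power: distance exponent; 2 is a common default (an assumption here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the point coincides with a station
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical station positions (km) with annual precipitation (mm).
stations = [(0.0, 0.0, 672.0), (10.0, 0.0, 684.0), (5.0, 8.0, 680.0)]
estimate = idw(5.0, 0.0, stations)
print(round(estimate, 1))
```

Because each weight decays with distance, the estimate at any location stays within the range of the station values, which is consistent with the narrow 672–684 mm spread reported for the study area.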

Figure 2

Conditioning factors incorporated within the hazard model as GIS layers: (a) rainfall, (b) land cover/land use, (c) digital elevation map (DEM), (d) slope, (e) SCS CN, (f) distance to river, (g) distance to channels, and (h) depth to groundwater.

Using the 2016 LULC map obtained from the Amol city authority (Figure 2(b)), we identified five LULC types: agricultural area (21.67%), orchard (9.45%), park (1.50%), residential (including residential buildings and street areas) (66.20%), and the Haraz River (1.20%). We also created a 5-m resolution digital elevation map (DEM) that represents the 59–137 masl variation across the study site (Figure 2(c)). We derived a slope map from the 5-m DEM using the Slope tool of the Spatial Analyst extension in ArcGIS 10.4. The slope varies from 0% to more than 7.21% in the study area (Figure 2(d)).

Then, we estimated the U.S. Soil Conservation Service (SCS) CN for the study area (Figure 2(e)). The CN summarizes key information on the infiltration capacity and runoff retention of a specific area, as determined by factors such as land use and land cover (Zeng et al. 2017). We derived these values from land use and hydrologic soil group data using the ArcCN-runoff tool in the GIS software (Darabi et al. 2014, 2016; Menberu et al. 2014).
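To show why the CN serves as a proxy for runoff generation, the sketch below applies the standard SCS relationship between CN, potential retention S, and runoff depth Q. This calculation is not a step reported by the authors, who used the CN itself as a predictor; the function name `scs_runoff_mm`, the initial abstraction ratio of 0.2, and the example storm depth are illustrative assumptions.

```python
def scs_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Runoff depth (mm) from storm rainfall using the SCS curve number method.

    S  = 25400 / CN - 254   (potential maximum retention, mm)
    Ia = ia_ratio * S       (initial abstraction; 0.2 is the conventional ratio)
    Q  = (P - Ia)^2 / (P - Ia + S) when P > Ia, otherwise 0
    """
    if not 0 < cn <= 100:
        raise ValueError("CN must be in the interval (0, 100]")
    s = 25400.0 / cn - 254.0
    ia = ia_ratio * s
    if precip_mm <= ia:
        return 0.0  # all rainfall is abstracted before runoff starts
    return (precip_mm - ia) ** 2 / (precip_mm - ia + s)

# Hypothetical 50-mm storm on surfaces with different curve numbers.
for cn in (60, 80, 98):
    print(cn, round(scs_runoff_mm(50.0, cn), 1))
```

Higher CN values (more impervious surfaces) convert a larger share of the same rainfall into runoff, which is the behavior the CN layer is meant to capture in the hazard model.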

The distance to the river plays an important role in urban flood inundation mapping (Ghani et al. 2009; Leow et al. 2009). In Amol city, according to records from the field survey and local authorities, many, but not all, areas affected by flooding lie near the Haraz River. The Euclidean distance to the Haraz River was calculated using the distance module in ArcGIS 10.4 (Figure 2(f)).

According to existing records, the areas most affected by flood inundation are close to areas with poor urban drainage systems. Therefore, the Euclidean distance to other channels (as collectors of surface water) was also extracted using the distance module in ArcGIS 10.4 (Figure 2(g)).
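The distance layers can be pictured with a small brute-force sketch of a Euclidean distance raster. It is not the ArcGIS distance module: the function name `distance_raster`, the grid size, and the channel cell positions are hypothetical, while the 5-m cell size follows the grid resolution stated in the text.

```python
import math

def distance_raster(n_rows, n_cols, source_cells, cell_size=5.0):
    """Euclidean distance (m) from every cell to the nearest source cell.

    source_cells: list of (row, col) indices marking the river or channel.
    cell_size: grid resolution in metres (5 m, as in the study grids).
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            nearest = min(math.hypot(r - sr, c - sc) for sr, sc in source_cells)
            row.append(nearest * cell_size)
        grid.append(row)
    return grid

# Hypothetical channel running down the middle column of a 3 x 5 grid.
channel = [(0, 2), (1, 2), (2, 2)]
dist = distance_raster(3, 5, channel)
for row in dist:
    print(row)
```

The brute-force search is adequate for a toy grid; production GIS tools use more efficient distance transforms for city-scale rasters.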

The groundwater level can lead to an increase or decrease in the groundwater recharge rate in unsaturated and saturated zones (Tam & Nga 2018). The groundwater level data used in this study were obtained from the Iranian Water Resources Management Company (IWRMC), and an IDW interpolation method was applied to identify the depth to groundwater (Figure 2(h)).

Inputs to the vulnerability model

To determine the vulnerability factors, ArcGIS 10.4 maps were constructed for each class in 5-m raster grids for the assignment of the weight/rank values, and an analysis was carried out using the AHP method (Saaty 2006; Fernández & Lutz 2010). The building density is important because it has significant impacts on the damage caused by urban floods. It was divided into four classes: high (>300 dwellings per hectare), medium (200–300 dwellings per hectare), low (100–200 dwellings per hectare), and very low (<100 dwellings per hectare) (Figure 3(a)). Building age was divided into five classes: recently completed (≤5 years), new (10–19 years), medium (20–29 years), old (30–39 years), and very old (≥40 years) (Figure 3(b)). The population density refers to the number of people inhabiting a given urbanized area, where high levels reflect the population at risk from floods (Güneralp et al. 2017). It was divided into four classes: high (≥1,500/km²), medium (1,000–1,500/km²), low (500–1,000/km²), and very low (≤500/km²) (Figure 3(c)). Socio-economic conditions refer to the inherent properties and behavior of humans and society within a specific urbanized region, and are assessed based on economic conditions and social welfare data on people in a given urban area. This type of information is valuable in taking into account the otherwise indirect and intangible impacts of flood hazards (Kaspersen & Halsnæs 2017). The socio-economic condition map for Amol was divided into five classes: very good, good, moderate, weak, and very weak (Figure 3(d)). Open spaces were also included as a part of Amol city in each vulnerability map. All class divisions are based on suggestions by Güneralp et al. (2017) and Darabi et al. (2019), and all the data were obtained from the municipal authority of Amol.

Figure 3

Conditioning factors incorporated within the vulnerability model as GIS layers: (a) building density, (b) building age, (c) population density, and (d) socio-economic conditions.

Model training

Our modeling approach used a variety of models (Naimi & Araújo 2016; Darabi et al. 2019) to relate response variables (here, urban flood mapping) to predictor variables (conditioning factors). In this study, each model was run based on learning algorithms using flood points and predictor variables. Each model was trained on one portion of the data and then tested on another portion.

Testing and performance

Before building the ensemble model, we assessed the prediction performance of the machine learning algorithms by considering the receiver operator characteristic-area under the curve (ROC-AUC) values based on the test data (Choubin et al. 2018). The ROC curves were created by plotting the true positive rate (TPR, or the sensitivity or probability of detection) versus the false positive rate (FPR or fallout or a false alarm, calculated as 1 – specificity). TPR and FPR were calculated as follows: 
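$$\mathrm{TPR} = \frac{\text{true positives}}{\text{condition positive}} \tag{1}$$

$$\mathrm{FPR} = \frac{\text{false positives}}{\text{condition negative}} \tag{2}$$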
where a true positive is a successful prediction that a point is a flood location; a false positive is an erroneous prediction that a point is a flood location; and condition positive/negative refers to the total number of points where floods occurred or did not occur (92 and 60, respectively). Values for TPR and FPR range from 0.0 to 1.0; high values of TPR and low values of FPR are ideal (a TPR of 1.0 with an FPR of 0.0 means a perfect prediction). In the ROC-AUC assessment, AUC refers to the area under the curve formed by plotting TPR (y-axis) versus FPR (x-axis). Again, values approaching 1.0 are ideal (Tehrany et al. 2014). A curve lying on the dotted 1:1 line (AUC = 0.5) indicates that the prediction power is equal to random selection; a curve below the line indicates that the prediction power is less than random.
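The ROC-AUC calculation described above can be sketched in a few lines of Python. This is an illustrative implementation, not the R routine used in the study; the function names `tpr_fpr` and `roc_auc`, the trapezoidal integration, and the labels and scores are assumptions made for the example.

```python
def tpr_fpr(labels, scores, threshold):
    """TPR and FPR when points with score >= threshold are predicted as flooded."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    condition_positive = sum(labels)
    condition_negative = len(labels) - condition_positive
    return tp / condition_positive, fp / condition_negative

def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    points = [(0.0, 0.0)]  # strictest threshold: nothing predicted as flooded
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(labels, scores, t)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical test points: 1 = flooded, 0 = non-flooded, with model scores.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
print(tpr_fpr(labels, scores, 0.5))
print(roc_auc(labels, scores))
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1); a model that ranks every flooded point above every non-flooded point would reach an AUC of 1.0.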

Ensemble model

Based on the performance of the individual models, we built the ensemble model from the four best machine learning models. The process for building the ensemble model involved the following steps: (1) selecting the best models based on the ROC-AUC, (2) combining the selected models using an R program, to exploit all the advantages of the selected models, and (3) assessing the ensemble model using the ROC-AUC and selecting the most important conditioning factors (environmental variables) in the urban flood hazard.
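The selection and combination steps can be sketched as follows. The text does not state how the R program combined the selected models, so the AUC-weighted mean of predicted probabilities used here is an assumption, as are the function names and the per-cell probabilities; the testing AUC values are those reported in Table 1.

```python
# Testing AUC values as reported in Table 1.
TEST_AUC = {
    "SVM": 0.774, "RF": 0.649, "Maxent": 0.764, "BRT": 0.824,
    "MARS": 0.815, "GLM": 0.833, "GAM": 0.846,
}

def select_best(auc_by_model, k=4):
    """Step 1: keep the k models with the highest testing AUC."""
    return sorted(auc_by_model, key=auc_by_model.get, reverse=True)[:k]

def ensemble_probability(probs, auc_by_model):
    """Step 2 (assumed rule): AUC-weighted mean of the model probabilities."""
    total_weight = sum(auc_by_model[m] for m in probs)
    return sum(auc_by_model[m] * p for m, p in probs.items()) / total_weight

best = select_best(TEST_AUC)
# Hypothetical flood probabilities predicted for one grid cell.
cell_probs = {"GAM": 0.80, "GLM": 0.70, "BRT": 0.60, "MARS": 0.75}
hazard = ensemble_probability({m: cell_probs[m] for m in best}, TEST_AUC)
print(best, round(hazard, 3))
```

Weighting by AUC lets better-performing models contribute slightly more; an unweighted mean or median would be equally plausible choices and would change the result only marginally when the AUC values are this close.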

Urban flood risk

We produced urban flood risk maps by multiplying hazard probabilities by vulnerability (Dewan 2013). 
$$\text{Risk} = \text{Hazard} \times \text{Vulnerability} \tag{3}$$
where the hazard maps were determined from the following conditioning factors: precipitation, LULC, elevation, slope percent, CN, distance to river, distance to channel, and depth to groundwater using the ensemble model; and the vulnerability map was based on the following factors: (1) building density, (2) building age, (3) population density, and (4) socio-economic conditions. Data to determine these factors were obtained from the municipal authority of Amol (Sedaghat et al. 2016).
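As a minimal illustration of the risk calculation, the sketch below multiplies a hazard grid by a vulnerability grid cell by cell. The function name `risk_map` and the grid values are hypothetical; in the study this operation was applied to the 5-m raster layers.

```python
def risk_map(hazard, vulnerability):
    """Cell-by-cell product of two equally sized grids (lists of rows)."""
    if len(hazard) != len(vulnerability):
        raise ValueError("grids must have the same number of rows")
    result = []
    for h_row, v_row in zip(hazard, vulnerability):
        if len(h_row) != len(v_row):
            raise ValueError("rows must have the same length")
        result.append([h * v for h, v in zip(h_row, v_row)])
    return result

# Hypothetical 2 x 2 grids of hazard probability and vulnerability score.
hazard = [[0.9, 0.5], [0.2, 0.0]]
vulnerability = [[0.5, 0.5], [1.0, 0.8]]
risk = risk_map(hazard, vulnerability)
print(risk)
```

Because the two layers are multiplied, a cell scores high on risk only when both hazard and vulnerability are high, and a cell with zero hazard has zero risk regardless of its vulnerability.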

RESULTS AND DISCUSSION

Performance of hazard models

The performance of the SVM, RF, Maxent, BRT, MARS, GLM, and GAM approaches, assessed using the ROC-AUC, is shown in Table 1. The AUC values during testing were highest for the GAM (0.85), GLM (0.83), MARS (0.82), and BRT (0.82) models. The performance was lower for the SVM (0.77), Maxent (0.76), and RF (0.65) approaches.

Table 1

Validation of the seven machine learning models and the ensemble model using the ROC-AUC method

Models      AUC (training)   AUC (testing)
SVM         0.818            0.774
RF          0.788            0.649
Maxent      0.806            0.764
BRT         0.838            0.824
MARS        0.915            0.815
GLM         0.876            0.833
GAM         0.892            0.846
Ensemble    0.925            0.892

Based on these performance data, we built the ensemble model from the BRT, MARS, GLM, and GAM approaches. The AUC of the ensemble model was 0.925 and 0.892 for the training and testing data, respectively (Table 1; Figure 4).

Figure 4

Validation of the ensemble model using the ROC-AUC method.

Urban flood hazard maps

Urban flood maps were constructed from the machine learning algorithms for regions with a high and low hazard of urban flooding. The importance of the conditioning factors was determined based on ensemble functions and the impact of the variables from the flooded points (flood inventory). Only the four most important factors were included in the ensemble model. These were the distance to the channel (0.92), LULC (0.88), CN (0.84), and elevation (0.81) (Figure 5). All models demonstrated that zones with high hazard probability are mostly located in the north and center of Amol city (Figure 6(a)–(h)). The zones with a high hazard probability were mostly identified by the GAM, and these areas were considered as having a high flood risk (Figure 6).

Figure 5

Importance of different conditioning factors in urban flood hazard prediction based on the ensemble model.

Figure 6

Urban flood hazard probability maps for the study area created using the individual models (a) SVM, (b) RF, (c) MAXENT, (d) BRT, (e) MARS, (f) GLM, and (g) GAM, and (h) the ensemble model.

Urban flood risk map

Weight and rank values were assigned to the factors and classes according to their importance in the case study. Based on expert knowledge and using AHP results to evaluate the relative importance of urban flood vulnerability factors, the factor with the greatest weight was population density (0.38), followed by building density (0.29), building age (0.19), and socio-economic conditions (0.14). The weights obtained for each class of the building density, building age, population density, and socio-economic conditions factors are shown in Table 2. In weighting factors and class ranking, total scores were applied and then each pixel of the output map was assigned a value reflecting its factor and normalized weight (Figure 7).
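One way to picture the pixel-level scoring is a weighted linear combination of the factor weights and class rankings listed in Table 2. The text does not spell out the exact aggregation rule, so the weighted sum below is an assumption, as are the function name `vulnerability_score`, the dictionary keys, and the example cell; the numerical weights and rankings are those of Table 2 (open-space classes omitted for brevity).

```python
# Factor weights and class rankings as listed in Table 2.
FACTOR_WEIGHTS = {
    "population_density": 0.38,
    "building_density": 0.29,
    "building_age": 0.19,
    "socio_economic": 0.14,
}
CLASS_RANKS = {
    "population_density": {"high": 0.402, "medium": 0.281, "low": 0.204, "very low": 0.111},
    "building_density": {"high": 0.354, "medium": 0.283, "low": 0.202, "very low": 0.152},
    "building_age": {"newest": 0.049, "new": 0.089, "moderate": 0.234,
                     "old": 0.269, "very old": 0.358},
    "socio_economic": {"very good": 0.002, "good": 0.041, "moderate": 0.195,
                       "poor": 0.332, "very poor": 0.429},
}

def vulnerability_score(cell):
    """Assumed rule: sum over factors of factor weight x class ranking."""
    return sum(FACTOR_WEIGHTS[f] * CLASS_RANKS[f][cell[f]] for f in FACTOR_WEIGHTS)

# Hypothetical cell that falls in the most vulnerable class of every factor.
worst_cell = {
    "population_density": "high",
    "building_density": "high",
    "building_age": "very old",
    "socio_economic": "very poor",
}
print(round(vulnerability_score(worst_cell), 4))
```

Under this rule the four factor weights sum to 1, so the score of any cell is bounded by the largest class ranking among its factors.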

Table 2

Weight and rank values assigned to the different classes of the building density, building age, population density, and socio-economic factors

Factor                      Weighting   Class                 Ranking
Building density            0.29        High                  0.354
                                        Medium                0.283
                                        Low                   0.202
                                        Very low              0.152
                                        Open space            0.009
                                        Inconsistency ratio   0.017
Population density          0.38        High                  0.402
                                        Medium                0.281
                                        Low                   0.204
                                        Very low              0.111
                                        Open space            0.002
                                        Inconsistency ratio   0.014
Building age                0.19        Newest                0.049
                                        New                   0.089
                                        Moderate              0.234
                                        Old                   0.269
                                        Very old              0.358
                                        Open space            0.001
                                        Inconsistency ratio   0.019
Socio-economic conditions   0.14        Very good             0.002
                                        Good                  0.041
                                        Moderate              0.195
                                        Poor                  0.332
                                        Very poor             0.429
                                        Open space            0.002
                                        Inconsistency ratio   0.021

Figure 7

Urban flood vulnerability maps for Amol city showing: (a) flood vulnerability value and (b) flood vulnerability zones.

Urban flood risk index map

The spatial distribution of the ensemble model output shows the urban flood risk index (Figure 8(a)). Using the natural break method in ArcGIS 10.4, the risk index map was classified into very low, low, moderate, high, and very high, representing 3.40%, 16.41%, 14.17%, 18.58%, and 47.45% of the total area, respectively (Figure 8(b)). The risk index map confirmed that northern and central areas of Amol city have the highest risk of flooding. A histogram assessment of the ensemble model (Figure 9) showed that the probability of flood point occurrence (for the test data) in the very high, high, moderate, low, and very low-risk areas was 5.49%, 18.68%, 15.38%, 17.58%, and 42.86% of total points, respectively. Hence, the ratio of flood point occurrence (%) to the area of risk class (%) was 1.641, 1.139, 1.086, 0.946, and 0.903, for the very high, high, moderate, low, and very low classes, respectively (Figure 9). Accordingly, areas with a very high risk of flooding had the highest density of flood points, and areas with a very low risk had the lowest density (in a given area).

Figure 8

Spatial distribution of flood risk in Amol city predicted using the ensemble model, depicted as (a) flood risk index and (b) flood risk zones.

Figure 9

Histogram assessment of the ensemble model, showing flood point occurrence in five different flood risk zones (very low to very high).

Urban flooding has been modeled previously using models such as MIKE and LISFLOOD-FP. However, these hydrological and hydraulic models simulate the physical processes of urban flooding and require sophisticated datasets and abundant computations. Thus, in recent years, machine learning algorithms have been widely used in environmental and especially flood risk studies, in which mapping-based models are important. Models for mapping flood-prone areas have been used by researchers worldwide, and most agree that developing a specific model and identifying appropriate flood conditioning factors are important. In this study, machine learning algorithms were used to identify flood-prone areas in an urban environment, Amol city in Iran. In these models, urban flood potential areas are determined by a flood inventory map, which shows the maximum probability of urban flooding occurring in those locations. The identification of the risk of urban flood-prone areas is important in Amol, because most recent flooding has occurred within the city center. However, mapping of flood-prone areas showed very high risks across Amol city, and also areas with a lower risk around the edges of the city.

In general, urbanization increases the magnitude and frequency of floods and may expose vulnerable communities to greater risk. Machine learning and data mining techniques have now become more popular in the field of spatial distribution analysis, modeling, and urban flood mapping. Floods are affected by many factors, such as land use and meteorological, hydrological, and topographical conditions. The consideration of conditioning factors (precipitation, slope, CN, distance to river, distance to channels, depth to groundwater, land use, and elevation (DEM)) proved useful in developing a flood hazard map for Amol city using several machine learning algorithms combined into a novel ensemble model. A key advantage of this approach is that limited knowledge is required. Moreover, the methods used are parsimonious since in areas where climate, hydrological, and hydraulic data are lacking, predictive variables, namely hydrological, topographical, or land use properties, can be used instead for urban flood modeling. Thus, the method is transferable to other urban areas.

The main limitation of the approach is that some of the above-mentioned conditioning factors vary over time, and therefore influence the results. For example, precipitation is dynamic, yet we were forced to use a rainfall value reflecting the spatial distribution of the long-term mean. Moreover, our approach does not take into account potential changes in LULC and associated changes in the infiltration capacity. If necessary, new data on the conditioning factors can be added in the future to determine flood hazards, particularly if new information on inundation is made available following future floods. Finally, we expect the most important conditioning factors (here the building density, building age, population density, and socio-economic conditions) to change in future versions of the flood risk map.

CONCLUSIONS

Preparing a flood risk map that reflects both the severity and extent of flood impact is a prerequisite for sound flood governance in urban areas, especially in regions with intensive and prolonged storms that cause recurrent floods. To overcome input data limitations, we combined seven machine learning algorithms to assess flood risk in the case study city of Amol, Iran. Mapping accuracy differed between the individual algorithms, with the MARS, GAM, GLM, and BRT models proving most accurate. We then integrated these four algorithms into an ensemble prediction model to derive a final flood risk map, which was more accurate than the maps from the individual algorithms. The four conditioning factors of greatest importance in determining susceptibility to future flood damage were building density, building age, population density, and socio-economic conditions. The vulnerability map produced was useful in identifying high-risk areas where flood mitigation actions are needed. According to the ensemble model, areas surrounding the city and in the vicinity of the Haraz River have the lowest vulnerability to flooding, whereas the central city district is the most vulnerable and should be a high priority in future flood mitigation planning. Overall, our ensemble approach combining several algorithms gave high accuracy in urban flood risk mapping for the study area. The accuracy could be improved with better data on rainfall distribution during flood-producing storms and on the location of inundation zones during floods. This information was not available to us but is likely to become available in the future as Amol develops a stronger flood governance program.
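Because risk was assessed as a product of hazard and vulnerability, the final mapping step can be illustrated with a short Python sketch that multiplies two gridded layers cell by cell and assigns a qualitative class to each cell. The grid values, the class breaks (0.25 and 0.5), and the function names are hypothetical and serve only to show the calculation; they are not the values or thresholds used for Amol.

```python
from typing import List

Grid = List[List[float]]


def risk_map(hazard: Grid, vulnerability: Grid) -> Grid:
    """Cell-by-cell risk = hazard x vulnerability, both scaled to [0, 1]."""
    if len(hazard) != len(vulnerability):
        raise ValueError("grids must have the same number of rows")
    risk: Grid = []
    for h_row, v_row in zip(hazard, vulnerability):
        if len(h_row) != len(v_row):
            raise ValueError("grids must have the same number of columns")
        for value in (*h_row, *v_row):
            if not 0.0 <= value <= 1.0:
                raise ValueError("hazard and vulnerability must lie in [0, 1]")
        risk.append([h * v for h, v in zip(h_row, v_row)])
    return risk


def classify(risk: Grid, low_break: float = 0.25, high_break: float = 0.5) -> List[List[str]]:
    """Assign a class per cell; the two breaks are illustrative assumptions."""
    def label(r: float) -> str:
        if r < low_break:
            return "low"
        if r < high_break:
            return "moderate"
        return "high"
    return [[label(r) for r in row] for row in risk]


# Hypothetical 2 x 2 grids (rows are north to south, columns west to east).
hazard = [[0.9, 0.2], [0.5, 0.1]]
vulnerability = [[0.8, 0.9], [0.5, 0.1]]

risk = risk_map(hazard, vulnerability)
classes = classify(risk)
```

A cell needs both a high hazard probability and a high vulnerability score to reach the highest risk class, which is why a highly vulnerable cell with low hazard (top right in the example) still falls in the lowest class.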
