Urban flood risk mapping using data-driven geospatial techniques for a flood-prone case area in Iran


In an effort to improve tools for effective flood risk assessment, we applied machine learning algorithms to predict flood-prone areas in Amol city (Iran), a site with recent floods (2017–2018). An ensemble approach was then implemented to predict hazard probabilities using the best machine learning algorithms (boosted regression tree, multivariate adaptive regression spline, generalized linear model, and generalized additive model) based on a receiver operating characteristic area under the curve (ROC-AUC) assessment. The algorithms were all trained and tested on 92 randomly selected points, information from a flood inundation survey, and geospatial predictor variables (precipitation, land use, elevation, slope percent, curve number, distance to river, distance to channel, and depth to groundwater). The ensemble model had 0.925 and 0.892 accuracy for training and testing data, respectively. We then created a vulnerability map from data on building density, building age, population density, and socio-economic conditions and assessed risk as a product of hazard and vulnerability. The results indicated that distance to channel, land use, and runoff generation were the most important factors associated with flood hazard, while population density and building density were the most important factors determining vulnerability. Areas of highest and lowest flood risks were identified, leading to recommendations on where to implement flood risk reduction measures to guide flood governance in Amol city.


INTRODUCTION
In general, floods occur when the runoff water volume exceeds the transport capacity of channels (Lytle & Poff transportation, utility, infrastructure, and communication systems (Sharif et al. ; Wu et al. ). Impacts may persist for months or years, and secondary impacts, including health-related problems, may emerge (Tapsell et al. ; Sinnakaudan et al. ). As more than half the world's population already lives in urban areas and this proportion is projected to increase to two-thirds by 2050 (UNPD ; El Alfy ), there is an immediate need to reduce flood risks in urban population centers worldwide. For example, urbanization is usually associated with a high proportion of impervious features (e.g., roads, walkways, and car parks), disturbed river/stream channels, and artificial storm drainage systems (Kundzewicz et

STUDY AREA AND DATA
Amol city (36°26′02″-36°29′45″N; 52°19′14″-53°23′50″E) has a population of 237,528 (in 2016), making it the third largest city in Mazandaran Province in northern Iran (Lotfi et al. ). It is located at an altitude of 59-137 m above mean sea level (masl), and its area expanded from 21 to 27 km² between 1998 and 2017 (Figure 1). Amol is located on the Haraz River, which passes through the middle of the city on its way to the Caspian Sea. The residential areas of the city are surrounded mainly by agricultural land, orchards, and high mountains covered by forest. The mean annual rainfall in the region is 680 mm, and the climate is semi-humid (Sahin ; Choubin et al. ). Over recent decades, urban development has led to extensive changes in hydrological processes and drainage systems in the city. As a result, the number of urban flood events has increased year on year over roughly the past decade. Notable flood events occurred on 12 November

Machine learning models
We applied the following seven machine learning algorithms to predict flooded areas using geospatial predictor variables.
Generalized additive model: A GAM is a GLM in which the linear predictor is specified as a sum of smooth functions of some or all of the covariates. GAM estimates are obtained using a combination of the local scoring algorithm and the back-fitting algorithm (Dominici et al. ). GAM is useful for more than purely exploratory analysis, and its smoothing parameter is a key component (Hastie & Tibshirani ). GAM is a semi-parametric extension of GLM. Like GLM, GAM uses a link function to establish a relationship between the mean of the dependent variable and a 'smoothed' function of the independent variables. The strength of GAM lies in its ability to deal with highly non-linear and non-monotonic relationships between the dependent and independent variables (Guisan et al. ).
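As an illustration of the structure described above, a GAM can be written in the following general form. The notation is ours and is not taken from the study: g is the link function, β₀ an intercept, f_j an unspecified smooth function of the j-th predictor x_j, and p the number of predictors. For a binary response such as flooded/non-flooded, a logit link is a common choice, but the link actually used is not stated here.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```

Setting each f_j to a linear term β_j x_j recovers the GLM, which is why GAM is described as a semi-parametric extension of GLM.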
We derived a slope map from the 5-m DEM using the Slope tool (Spatial Analyst) in ArcGIS 10.4. The slope varies from 0% to more than 7.21% in the study area (Figure 2(d)).
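The slope derivation can be illustrated with a minimal sketch of a common 3×3 finite-difference scheme (Horn's method) for percent slope. This is an assumption-laden illustration: the function name `slope_percent`, the toy DEM, and the choice to leave edge cells undefined are ours, and we do not know which settings of the GIS tool were used in the study.

```python
import math

def slope_percent(dem, cell_size):
    """Percent slope for interior cells of a DEM (list of rows of elevations in m).

    Uses Horn's 3x3 weighted finite differences; edge cells are left as None
    (an illustrative simplification, not necessarily what a GIS tool does).
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # 3x3 neighbourhood, labelled a..i row by row (e is the centre cell)
            a, b, c3 = dem[r - 1][c - 1], dem[r - 1][c], dem[r - 1][c + 1]
            d, f = dem[r][c - 1], dem[r][c + 1]
            g, h, i = dem[r + 1][c - 1], dem[r + 1][c], dem[r + 1][c + 1]
            dz_dx = ((c3 + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
            dz_dy = ((g + 2 * h + i) - (a + 2 * b + c3)) / (8 * cell_size)
            out[r][c] = 100.0 * math.hypot(dz_dx, dz_dy)  # rise/run as a percentage
    return out

# Hypothetical 4x4 DEM on a 5-m grid: a plane rising 0.25 m per cell eastwards
dem = [[59.0 + 0.25 * col for col in range(4)] for _ in range(4)]
slopes = slope_percent(dem, cell_size=5.0)
```

On this toy plane every interior cell has a slope of 0.25 m / 5 m, i.e. 5%.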
Then, we estimated the U.S. Soil Conservation Service curve number (CN)
According to existing records, the areas most affected by flood inundation are close to areas with poor urban drainage systems. Therefore, the Euclidean distance to other channels (as collectors of surface water) was also extracted using the distance module in ArcGIS 10.4 (Figure 2(g)).
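A Euclidean distance raster of the kind described above can be sketched as follows. This is a brute-force illustration for small grids only; the function name `euclidean_distance_raster`, the boolean channel mask, and the toy grid are hypothetical and do not reproduce the GIS distance module.

```python
import math

def euclidean_distance_raster(channel_mask, cell_size):
    """Distance (in map units) from every cell centre to the nearest channel cell.

    channel_mask: list of rows, truthy where a cell contains a channel.
    Brute force over all channel cells; fine for a sketch, slow for real rasters.
    """
    channel_cells = [(r, c)
                     for r, row in enumerate(channel_mask)
                     for c, is_channel in enumerate(row) if is_channel]
    if not channel_cells:
        raise ValueError("channel_mask contains no channel cells")
    return [[cell_size * min(math.hypot(r - cr, c - cc) for cr, cc in channel_cells)
             for c in range(len(row))]
            for r, row in enumerate(channel_mask)]

# Hypothetical 3x3 grid with 5-m cells and a single channel cell in one corner
mask = [[1, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]
dist = euclidean_distance_raster(mask, cell_size=5.0)
```

Production tools use far more efficient distance transforms, but the output has the same meaning: each pixel stores the straight-line distance to the nearest channel.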
The groundwater level can lead to an increase or decrease in the groundwater recharge rate in unsaturated and saturated zones (Tam & Nga ). The groundwater level data used in this study were obtained from the Iranian Water Resources Management Company (IWRMC), and an IDW interpolation method was applied to identify the depth to groundwater (Figure 2(h)).
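The IDW interpolation mentioned above can be sketched in a few lines. The power parameter of 2, the function name `idw`, and the well coordinates and depths are assumptions for illustration; the settings used in the study are not given here.

```python
import math

def idw(x, y, wells, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    wells: iterable of (x, y, depth_to_groundwater) observations.
    power: distance exponent (2 is a common default, assumed here).
    """
    numerator = 0.0
    denominator = 0.0
    for wx, wy, depth in wells:
        d = math.hypot(x - wx, y - wy)
        if d == 0.0:
            return depth  # exactly on an observation point: return its value
        weight = 1.0 / d ** power
        numerator += weight * depth
        denominator += weight
    return numerator / denominator

# Hypothetical observation wells: (x in m, y in m, depth to groundwater in m)
wells = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0)]
midpoint_depth = idw(5.0, 0.0, wells)
```

Nearer wells dominate the estimate: at the midpoint both wells contribute equally, while a point closer to the first well is pulled towards its value.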

Inputs to the vulnerability model
To determine the vulnerability factors, ArcGIS 10.4 maps were constructed for each class in 5-m raster grids for the assignment of the weight/rank values, and an analysis was carried out using the AHP method  (here, urban flood mapping) to predictor variables (conditioning factors). In this study, each model was run based on learning algorithms using flood points and predictor variables. Each model was trained on one portion of the data and then tested on another portion.
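The train/test procedure can be illustrated with a minimal sketch of a random split and a ROC-AUC calculation. The 70/30 split ratio, the random seed, and the function names `train_test_split` and `roc_auc` are assumptions for illustration; the proportions actually used are not stated in this section.

```python
import random

def train_test_split(points, test_fraction=0.3, seed=0):
    """Randomly split sample points into (train, test); 70/30 is an assumed ratio."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a flooded point outscores a non-flooded one.

    labels: 1 for flooded, 0 for non-flooded; scores: predicted hazard probabilities.
    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flooded and one non-flooded point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical point identifiers standing in for the 92 sample points
train, test = train_test_split(range(92))
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flooded from non-flooded points.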

Testing and performance
Before building the ensemble model, we assessed the prediction performance of the machine learning algorithms using ROC-AUC.

Urban flood risk
We produced urban flood risk maps by multiplying hazard probabilities by vulnerability (Dewan ):
$$\text{Risk} = \text{Hazard} \times \text{Vulnerability}$$
where the hazard maps were determined from the following conditioning factors: precipitation, LULC, elevation, slope percent, CN, distance to river, distance to channel, and depth to groundwater using the ensemble model; and the vulnerability map was based on the following factors: (1) building density, (2) building age, (3) population density, and (4) socio-economic conditions.
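The product of hazard and vulnerability can be computed pixel by pixel, as in the following minimal sketch. The function name `risk_map` and the 2×2 example grids are hypothetical, and both layers are assumed to be co-registered rasters scaled to the 0-1 range.

```python
def risk_map(hazard, vulnerability):
    """Pixel-wise risk = hazard probability x vulnerability for two aligned grids."""
    if len(hazard) != len(vulnerability) or any(
            len(h_row) != len(v_row) for h_row, v_row in zip(hazard, vulnerability)):
        raise ValueError("hazard and vulnerability grids must have the same shape")
    return [[h * v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(hazard, vulnerability)]

# Hypothetical 2x2 rasters (values are illustrative, not study results)
hazard = [[0.5, 1.0],
          [0.0, 0.25]]
vulnerability = [[0.5, 0.5],
                 [1.0, 1.0]]
risk = risk_map(hazard, vulnerability)
```

Because the layers are multiplied, a pixel with zero hazard has zero risk regardless of how vulnerable it is, and vice versa.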

RESULTS AND DISCUSSION
Performance of hazard models
Based on these performance data, we built the ensemble model from the best-performing algorithms.

Based on expert knowledge and the AHP results used to evaluate the relative importance of urban flood vulnerability factors, the factor with the greatest weight was population density (0.38), followed by building density (0.29), building age (0.19), and socio-economic conditions (0.14). The weights obtained for each class of the building density, building age, population density, and socio-economic conditions factors are shown in Table 2. In weighting factors and ranking classes, total scores were applied, and each pixel of the output map was then assigned a value reflecting its factor and normalized weight (Figure 7).
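The way the factor weights combine into a per-pixel vulnerability value can be sketched as a weighted sum. The factor weights below are those reported above; the class scores for the example pixel, the dictionary keys, and the function name `vulnerability_index` are hypothetical (the actual class weights are those in Table 2, which is not reproduced here).

```python
# Factor weights as reported in the text (AHP results)
FACTOR_WEIGHTS = {
    "population_density": 0.38,
    "building_density": 0.29,
    "building_age": 0.19,
    "socio_economic": 0.14,
}

def vulnerability_index(class_scores, weights=FACTOR_WEIGHTS):
    """Weighted sum of normalized class scores (0-1) for one pixel.

    class_scores: factor name -> normalized score of the class the pixel falls in.
    Dividing by the weight total keeps the index in 0-1 even if weights are rounded.
    """
    if set(class_scores) != set(weights):
        raise ValueError("class_scores must cover exactly the weighted factors")
    total_weight = sum(weights.values())
    return sum(weights[name] * class_scores[name] for name in weights) / total_weight

# Hypothetical pixel: dense population, medium building density, new buildings
pixel = {
    "population_density": 1.0,
    "building_density": 0.5,
    "building_age": 0.0,
    "socio_economic": 0.5,
}
index = vulnerability_index(pixel)
```

With these weights, population density alone accounts for 38% of the maximum attainable index, which is consistent with its ranking as the most influential vulnerability factor.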

Urban flood risk index map
The spatial distribution of the ensemble model output shows the urban flood risk index (Figure 8(a)). Using the natural breaks method in ArcGIS 10.4, the risk index map was classified into very low, low, moderate, high, and very high classes, representing 3.40%, 16.41%, 14.17%, 18.58%, and 47.45% of the total area, respectively (Figure 8(b)). The risk index map confirmed that northern and central areas of Amol

The main limitation of the approach is that some of the above-mentioned conditioning factors vary over time and therefore influence the results. For example, precipitation is dynamic, yet we were forced to use a rainfall value