Development of algorithms for evaluating performance of flood simulation models with satellite-derived flood

Flood inundation simulation models are widely used for simulating severe events of flood, generating hazard maps, risk assessment, and to identify flood vulnerable locations. It is important to assess the degree of accuracy of flood model results as these results may be one of the triggering parameters considered in developing flood hazard maps, flood mitigation policies, and land using planning where multi-criteria analysis is approached. In the present study, an algorithm is developed in order to know the performance of flood models by validating it with flood footprints extracted from synthetic aperture radar (SAR) images using multi-segmentation and Otsu’s thresholding technique. Evaluation of the performance of the model is based on two best fit criteria called F1 and F2. For this, HEC-RAS model is used for simulating the severe event of flood witnessed in Mahanadi River in Odisha stretching between Tikarpara and Mundali during September 2008. Three simulations were made by considering three different Manning’s roughness for river and floodplain. The model gives appreciable results and best fit F11⁄4 0.85 and F21⁄4 0.74 was found for Manning’s roughness 0.020.


INTRODUCTION
Flooding is a global phenomenon that causes human and property loss as mentioned by Teng et al. (2017). A flood inundation hydraulic model predicts flood extent and it may be an important source of information for flood risk assessment studies and in determining flood-vulnerable areas (Liu et al. 2018;Shastry & Durand 2019). Flood models also play an important role in flood preparedness and flood risk reduction as stated strongly by Chen et al. (2019). Flood hydrodynamic models give the result in the form of depth, water surface elevation and velocities with respect to time. Researchers and hydraulic modelers simulate flood for a river basin to study its impact assessment and to design hydraulic structures. They generally validate their simulated flood extent results (total area) with observed flood extent (ground truth or satellite observed) total area. Validating by considering only total area simulated versus total area observed cannot be a correct practice because simulated extent may be more than observed extent at some places and at some places the reverse may be true. Thus, results compared in terms of total area may give appreciable results, however, there will be ambiguity in the validation. Hence, simply evaluating total area flood simulated versus total area flood observed cannot give a correct picture of hydraulic/flood model performance. The simulated results must be assessed pixel-wise with satellite observed extent in order to evaluate flood model performance. There are many statistical approaches for evaluating flood model performance, one of which was developed by SENAGUA (2014) to compare the water surface elevation result from the model to the ground scenario. Bates & De Roo (2000) evaluated the model performance based on flood extent. Horrit et al. (2007) and Di Baldassarre et al. (2009) worked on evaluating flood extent in terms of measures of fit F1 and F2 which was proposed for flood studies by Moya-Quirogaa et al. (2016). The present study uses Horrit and Di Baldassarre techniques, which were further modified to develop an algorithm in Python for evaluating the performance of flood model (F1 and F2) automatically by inputting two rasters, i.e., flood observed and flood simulated. This technique is selected as it evaluates flood extent in terms of measures of fit, which is limited to the scope of the present study. The significance of the developed algorithm over the existing technique is that the algorithm minimizes the number of GIS operations and computational efforts on two rasters (flood simulated and flood observed) in order to evaluate fitting values F1 and F2 and estimate cell values, which are correctly predicted, overpredicted, and underpredicted by the model. To test the algorithm, 2D flood simulation model is set up for part of Mahanadi River in Odisha in the HEC-RAS model. HEC-RAS (Hydrologic Engineering Centre -River Analysis System) is a freely available and popular hydraulic model developed by the United States Army Corps of Engineers (USACE). The flood inundation can be simulated by using 1D (one-dimensional) or 2D (two-dimensional) models. HEC-RAS version prior to 5.0.1 was limited to simulating 1D flow, which can be suitable to analyze the behavior of flow in open channels (natural or artificial) where water is flowing below danger level, and thus can be considered to be 1D flow as the water is flowing only in the longitudinal direction. When water levels exceed threat (danger) level and water approaches floodplains, there is the possibility of water moving in two directions, thus 2D flow modeling becomes essential for simulating floods with greater accuracy. Taking this into account, USACE in February 2016 developed version 5.0.1, which has the capability to simulate 2D flow. HEC-RAS is open source hydraulic software which performs better and achieves high accuracy in inundation modeling, Surwase et al. (2019) concluded in their flood inundation simulation study. The study uses HEC-RAS 2D model version 5.0.1 February 2016 release to simulate the floods for the September 2008 Mahanadi river flood. The simulations are performed by providing the daily river discharge values as an input which were recorded at the time of the flood event and obtained from Central Water Commission. The model shows the simulation results in the form of flood extent with depth and velocity. The results were further validated with RADARSAT SAR satellite image. SAR has proven to be an effective tool for water surface detection as stated by Tholey et al. (2015), Hoque et al. (2011), Henry et al. (2006, Kuenzer et al. (2013), for estimation of flood depth studied by Iervolino et al. (2015), and flooding beneath vegetation canopies in certain conditions observed by Richards et al. (1987), Hess et al. (1990), Townsend (2002aTownsend ( , 2002b, Kasischke & Bourgeau-Chavez (1997). Commonly used SAR-based flood extent mapping approaches include simple visual interpretation (Oberstadler et al. 1997), supervised classification (De Roo et al. 1999;Townsend 2002aTownsend , 2002b, image texture algorithms (Schumann et al. 2005), histogram thresholding (Schumann et al. 2010), various multitemporal change detection methods (Bazi et al. 2005), and active contour models (Bates et al. 1997;Matgen et al. 2007). Among these, Cao et al.
(2019) mentioned that histogram thresholding is a common and simple approach which can be used to separate flood and non-flood areas in a large area (swathe) due to the specular backscattering characteristics of active radar pulses on plain water surfaces and the resultant low signal return.
Visual inspection and manual editing can help in training regions of interest (ROIs) and reducing false positives/negatives, demonstrated by Pulvirenti et al. (2010). As manual editing is time-consuming and manpower-driven, it cannot be applied to rapid response to flood disasters, especially during back-to-back flood events . The common technique used is the thresholding technique which is a pixel-based operation and considered as the most efficient way for rapid mapping. The specular reflection properties of non-obstructed water have driven numerous endeavors to decide a limit, beneath which, pixels are recognized as water, which was demonstrated by Yamada (2001), Giustarini et al. (2013), Hirose et al. (2001), and Matgen et al. (2011). A single threshold may not hold well in large-area water bodies (Tan et al. 2004) or for the whole swathe of a SAR image since it suffers from the heterogeneity of the environment, caused by wind-roughening and satellite system parameters (Martinis et al. 2009). Martinis & Rieke (2015) demonstrated the temporal heterogeneity of the backscattering of permanent water bodies, implying the temporal variability of the threshold. To address the spatial variability, Martinis et al. (2009) applied a split-based approach (SBA) together with an object-oriented (OO) segmentation method as described by Baatz (1999). Matgen et al. (2011) introduced a histogram segmentation method, and Giustarini et al. (2013) generalized the calibration process. Pulvirenti et al. (2011) developed an image segmentation method consisting of dilution and erosion operators to remove isolated groups of water pixels and small holes in water bodies, which are believed to be caused by speckle. Giustarini et al. (2013), Matgen et al. (2011), andLu et al. (2014) employed the region growing algorithm (RGA) to extend inundation areas from detected water pixels. The RGA starts from seeding pixels and then keeps absorbing homogenous pixels from neighbors until no more homogenous pixels exist in neighboring areas. Martinis et al. (2009) applied SBA, as described by Bovolo & Bruzzone (2007), to determine the global threshold for binary (water and non-water) classification. In SBA, a SAR image is first divided into splits (sub-tiles) to determine their individual thresholds using the Kittler and Illingworth (KI) method, global minimum, and quality index described by Kittler & Illingworth (1986). Then, only global threshold can be obtained by qualified splits showing sufficient water and non-water pixels.
In this study, automation of the delineation of the flood is carried out using a multi-segmentation technique combined with Otsu's threshold technique and executed with Python scripts. The post-processing component includes removal of stray pixels and applying a water mask to delineate the final flood layer. The GDAL module in Python was used to read the pre-processed image/intermediate product and water mask, along with their projection system. Later, the image is processed for classification with numpy and scikit-image modules. The final output is exported using GDAL module with the same input projection system.

STUDY AREA
The flood simulation was carried out for a stretch of the Mahanadi River in Odisha state, India. The study area is from Tikarpara to Mundali and lies between latitudes 20°07 0 58.8″ N and 20°43 0 8.4″ N and longitudes 84°44 0 42″ E and 85°46 0 4.8″ E. Mahanadi is the major river flowing through Odisha state and in the study area. A location map of the study area is shown in Figure 1.
There are two gauge stations, Tikarapara and Mundali, where discharge is measured. The study area is located at the downstream side of Hirakud dam, which intercepts a catchment area of 48,700 km 2 that is responsible for floods in Mahanadi deltaic area (Parhi et al. 2012). Odisha witnessed peak floods in September 2008 affecting millions of people, damaging agricultural crops and infrastructure.
Hence, flood inundation simulation for the September 2008 flood event is carried out for the study area.

DATA USED
Flood inundation simulation was carried out by using Carto-DEM 10 meter spatial resolution. Landuse land cover (LULC) at 1:250,000 scale derived from Advanced Wide Field Sensor (AWiFS) image acquired in 2016-2017, was used for assigning Manning's roughness coefficient of the terrain. Discharge data collected from Central Water Commission (CWC) of Tikarpara gauge station were used as inflow boundary condition of the setup model in HEC-RAS. The study simulated the flood event between 2 September 2008 and 30 September 2008. Radarsat satellite SAR image was used to validate the simulated results.

METHODS
The methodology is split into three categories, namely, flood simulation, development of an algorithm for extraction of flood footprints from SAR image, and development of an algorithm for evaluating the performance of flood models.

2D flood simulation
Flood is simulated by using HEC-RAS version 5.0.1 for the flood event between 5 September and 30 September, 2008. The study area is defined by 2D closed polygon. The 2D polygon is an area where flood inundation is expected to occur. The polygon is digitized by connecting higher elevation like hills and mountains where flood inundation will not extend beyond the same features. After defining the boundary of a polygon, the computational square cells are generated. The cells are in a square grid called a staggered grid and at the boundary of the 2D polygon the cells are in a non-staggered grid which is a composition of polygons having three to a maximum of eight sides. The spacing of the square grid was kept at an interval 10 m Â 10 m in x and y directions which was kept equal to spatial resolution of the digital elevation model (DEM) used. Further, two boundary conditions were assigned to the 2D polygon. The first one is the inflow boundary condition which represents observed discharge in the form of a hydrograph at Tikarpara gauge station. The observed inflow hydrograph in between the simulation period is shown in Figure 2. The width of the river at the gauge station is 760 m which covers 76 cells, and the inflow boundary condition was applied to these 76 cells. The second boundary condition is the outflow boundary condition normal depth assigned at Mundali station. Considering topographic conditions from DEM, normal depth slope 0.001 was assigned. Three simulations were carried out for three different roughness coefficients for river banks (floodplains), with roughness coefficient for the channel being kept the same for all three simulations (Table 1).  The simulation is carried out using full momentum Equations (1) and (2): where, h is the water depth in meters, p and q are specific flow in x and y directions (m 2 /sec), s is the surface elevation in meters, n is the Manning's resistance, g is gravitational acceleration in m/sec 2 , ρ (rho) is the density of water in kg/m 3 , f is the Coriolis force (/sec) and τ xx , τ xy , and τ yy are the components of effective shear stress. Computational time step was evaluated based on the Courant-Friedrichs-Lewy condition given in Equation (3): where, cr is Courant's number, c is celerity in m/sec; Dt is a computational time step in seconds, Dx is grid cell size in meters. The celerity 'c' was calculated considering the maximum depth of water observed at Tikarpara gauge station. Thus, Equation (3) can be returned as: , g is gravitational acceleration (m/sec 2 ) and h is the maximum depth of water observed at Tikarpara gauge station.

Development of algorithm for evaluating performance of flood simulation models
The developed algorithm is based on measure of fit of two raster datasets, i.e., simulated raster R1 and satellite-derived observed flood raster R2. Both of these rasters are of the order of one bit format. In order to find the correctly predicted cells (A), overpredicted cells (B), and underpredicted cells (C), the process starts with raster calculator 2 Â R1 À R2. If the value of the cell is 1 then the corresponding cell values are correctly predicted. If the value of the cell is 2, then the corresponding cell values are overpredicted and if the value of the cell is À1, then the corresponding cell values are underpredicted. The evaluation is carried out using the formula shown by Equations (5) and (6). The performance evaluation algorithm was scripted/coded in ArcPy, which is available in the scripting module within Arc-Map (ArcGIS software module) version 10.6.1. The script requires a spatial analyst toolbox license and extension to execute the code. The map algebra function under the spatial analyst tool was used to extract the cells that are correctly predicted, underpredicted, and overpredicted. Code developed for performance evaluation of the flood model is shown in Figure 3. Figure 4 depicts the simulated layer, observed flood extent, correctly predicted, overpredicted, and underpredicted cells. The adopted methodology for evaluation of flood models is shown in Figure 5. The best fit value F1 and F2 is given by Equations (5) and (6), respectively.
The value of F1 ranges from 0 to 1; F2 value ranges from À1 to 1. If the value of F1 and F2 is closer to 1, it means that the model performance is better, and if F1 and F2 values are exactly 1, it means all the cells are correctly predicted and there is not a single cell which is under-or overpredicted.
Development of algorithm for extraction of flood footprints from SAR image SAR images should be pre-processed before carrying out the analysis. The pre-processing steps include applying orbit file, calibration, speckle filtering, terrain correction, linear to decibel conversion, adding elevation band, and deriving land/water mask using GPT tool with the help of Python scripts. After pre-processing, the image is divided into segments, each of size 100 Â 100 pixels. Each segment is further divided into four sub-segments of size 50 Â 50 pixels. The size of segments and sub-segments was fixed after testing on more than 50 images of different areas. The procedure followed by  is used for the selection of segments automatically. The mean    intensity values are calculated in each sub-segment and using the respective mean values of four sub-segments, the standard deviation of the segment is calculated. Two conditions were checked, namely, first, the standard deviation of the segment must be high (.95% quantile). This criterion helps in finding the segments where there is a greater probability of distribution of more than one class within it and the number of pixels of each class is of good ratio. Second, the mean intensity value of the segment must be less than the mean intensity value of the whole image. This ensures that the selected segments have a good number of water pixels. If the number of segments which are satisfying the above-mentioned conditions is less than five then the size of segments and sub-segments is reduced to half and the quantile for the choice of tile is reduced to (90%) make sure that the segments also are chosen having information of a comparatively low extent of water surfaces or with smaller spread water bodies. Further, the top five segments in terms of standard deviation are selected for further analysis, since that is sufficient for threshold computation (Twele et al. 2016).
Otsu's thresholding technique is used to automatically perform clustering-based image threshold (Sezgin & Sankur 2004). The algorithmic rule assumes that the SAR image contains two categories of pixels which follow a bi-modal bar-chart, termed as foreground and background pixels, it then evaluates the optimum threshold separating the two categories so that their combined spread (intra-class variance, defined as a weighted sum of variances of the two classes) is minimal, or equivalently, so that their inter-class variance is maximal by (Otsu 1979). This technique has been employed by Vala & Baxi (2013) as it is effective and found to give reliable results to discover the threshold for each segment which was selected in the previous stage. The arithmetic mean of the thresholds of the five selected segments is finally generated and it is used as the global threshold for applying to the whole image to get a preliminary classified image. The preliminary classified image is refined using SRTM 1Sec HGT DEM data, applying majority filter and water mask. The overclassified water pockets are eliminated using SRTM 1Sec HGT DEM data obtained in the pre-processing stage from the terrain-corrected input image and areas where elevation ought to be above 15 meters are eliminated and classified as water. This ultimately reduces the commission errors which are usually caused because of shadows in the hills. Majority filter is also known as mode/modal filter. It is used to replace the central pixel value within a pre-mentioned size kernel with the pixel value which is reiterated a number of times in the same kernel. This operation reduces the stray pixels on the preliminary classified image and produces a smoothened image. SRTM water mask of the study area is used to exclude the permanent water bodies so as to generate a flood layer. Figure 6 shows the methodology adopted for extracting flood footprints from the SAR image. Table 2 shows the technical specifications of hardware and software used for automation.

RESULTS AND DISCUSSION
HEC-RAS-simulated output results deliver temporal flood extent with depth, velocity of water, and surface water elevation. The simulated output results are stored in tiff file format with floating point type. The simulated output results can be generated for the desired date and time from the simulated period. For the present study, the simulated result for the desired date and time of 23 September, 2008 at 0600 hours was exported from the model to validate with RADARSAT satellite SAR data acquired on the same date and time. Delineated flood footprints from RADARSAT SAR image using multi-segmentation and Otsu's thresholding technique are shown in Figure 7. The flood simulated result for roughness coefficient 0.02 in floodplains is shown in Figure 8. Similarly, floodsimulated results for flood roughness coefficient 0.030 and 0.040 are shown in Figures 9 and 10,   respectively. The F1 and F2 measures of fit have been evaluated for all three simulations using the algorithm and are shown in Tables 3 and 4. From the results, it is observed that F1 and F2 values are closer to 1 for the roughness coefficient 0.020 for floodplains when compared to others. Further, the inundated areas for simulated and observed is calculated by multiplying the number of pixels with a single pixel area (100 m 2 ) as 10 m is the spatial resolution. It is observed that the total inundated area for this coefficient is matching 94% with the simulated. as shown in Table 4.

CONCLUSION
The present study uses HEC-RAS model for simulating the severe flood event of Mahanadi River in Odisha. Three simulations were run using different Manning's coefficient for floodplains.    Satellite-based flood footprints were extracted by multi-segmentation and Otsu's thresholding technique which will be helpful for disaster responders to prepare flood hazard maps in near real time without any manual intervention. The development of an algorithm for performance evaluation of the HEC-RAS model for three different simulations in validation with extracted flood prints from RADARSAT shows appreciable results for Manning's roughness coefficient 0.020 for floodplains and 0.032 for channel as measures of fit F1 and F2 are 0.85 and 0.74, respectively, which are closest to 1 among the other two simulation values. From the results it is inferred that the HEC-RAS model is sensitive towards the Manning's roughness coefficient. The developed algorithm for performance evaluation can be used for any flood model (MIKE, Tuflow, ANUGA, etc.) which has the capability to export the flood inundation extent result in GIS formats.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.