Urban floods pose a significant threat to human communities, making their prediction essential for comprehensive flood risk assessment and the formulation of effective resource allocation strategies. Data-driven deep learning approaches have gained traction in urban emergency flood prediction, addressing the efficiency constraints of physical models. However, the spatial structure of rainfall, which has a profound influence on urban flooding, is often overlooked in deep learning investigations. In this study, we introduce a novel deep learning model, CRU-Net, equipped with an attention mechanism to predict inundation depths in urban terrains based on spatiotemporal rainfall patterns. The method takes eight topographic parameters related to the height of urban waterlogging, combined with spatial rainfall data, as model inputs. Comparative evaluations between CRU-Net and two other deep learning models, U-Net and ResU-Net, reveal that CRU-Net adeptly interprets the spatiotemporal traits of rainfall and accurately estimates flood depths, emphasizing deep inundation and flood-vulnerable regions. The model demonstrates high accuracy, evidenced by a root mean square error of 0.054 m and a Nash–Sutcliffe efficiency of 0.975. CRU-Net also accurately predicts over 80% of inundation locations with depths exceeding 0.3 m. Remarkably, CRU-Net delivers predictions for 3 million grids in 2.9 s, showcasing its efficiency.

  • Spatial rainfall distribution is incorporated into an urban flood forecasting model.

  • Integration of attention mechanisms enhanced the identification of high-risk flood areas.

  • The developed model can predict rapid floods within seconds.

Extreme precipitation events associated with global climate change have rendered urban flooding an increasingly pervasive challenge across metropolitan regions worldwide (Yang et al. 2016). Flooding has significant socioeconomic consequences, affecting various aspects of daily life, safety, and property. Thus, developing accurate predictive models that enable real-time flood warnings constitutes an urgent research priority (Chen et al. 2015).

Traditional hydrological and hydrodynamic modeling helps us understand the causes and dynamics of floods by solving sophisticated mathematical equations, such as the two-dimensional shallow water equations. Industry-standard models, like InfoWorks and MIKE Flood, along with open-source alternatives like LISFLOOD-FP (Neal et al. 2011) and TELEMAC (Galland et al. 1991), have advanced riverine flood and urban inundation simulations. However, these traditional tools have inherent drawbacks, including high computational intensity, long running times, and a strong dependence on accurate geospatial and land use data. As a result, traditional models are often too slow for real-time early warning of urban flooding, making it difficult to meet the needs of flood prevention and risk avoidance in an urban context (Garzon et al. 2022).

Considerable effort has been dedicated to exploring methodologies aimed at alleviating the computational load through: (1) employing GPU acceleration for parallel computing (Sharifian et al. 2023); (2) circumventing the use of shallow water equations and utilizing cellular automata methods (Guidolin et al. 2016; Jamali et al. 2019); and (3) harnessing data-driven artificial intelligence, particularly deep learning techniques, to quickly emulate historical flood data or results from physical models (Berkhahn et al. 2019; He et al. 2023). Deep learning models, especially convolutional neural networks (CNNs), with their intrinsic ability to extract spatiotemporal features from huge datasets, are gaining increasing attention. Kabir et al. (2020) and Donnelly et al. (2022) demonstrate the power of CNNs in rapid river flood forecasting. Furthermore, the contributions by Löwe et al. (2021) and Liao et al. (2023) highlight the adaptability and effectiveness of deep learning agents compared to established physical models. In the present era, characterized by the prominence of large models, deep learning methods exhibit enhanced generalizability. By augmenting datasets or transferring trained models, these methods can be effectively extended to unexplored watersheds, showcasing their potential for broader applicability (Guo et al. 2020; Seleem et al. 2023).

Despite deep learning techniques becoming increasingly prevalent in flood forecasting, their limitations in handling spatiotemporal data and extracting meaningful features from rainfall data have become apparent. In flood prediction, some input features, such as topography, land cover, and inundation depth, should be emphasized. To identify and adjust the importance of these features across spatial dimensions and focus on crucial information, attention mechanisms are adopted in the deep learning model. These mechanisms allow neural networks to selectively prioritize certain data segments, potentially improving model accuracy by emphasizing areas susceptible to flooding and accounting for the unique spatial characteristics of rainfall and inundation. Zhang et al. (2023) developed a flood forecasting model that combines Long Short-Term Memory networks with attention mechanisms, concentrating on key features. The model demonstrated high predictive accuracy, achieving a coefficient of determination over 0.85, a peak flow error below 0.015 m, and a peak arrival time error under 2 min. Farahmand et al. (2023) introduced a spatiotemporal graph deep learning model for real-time urban flood forecasting. This model combines physical-based features with human-perceived data, utilizing an Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN), thus focusing on the most impactful dynamic change features. Although these studies have demonstrated the potential of attention mechanisms in improving flood prediction accuracy, particularly by focusing on crucial monitoring data to enhance the predictive capability of models, their scope has primarily been limited to predicting inundation depths at selected monitoring points or nodes. While these approaches prove partially effective, they fail to provide high-resolution flood depth prediction maps across the entire region. The prediction of high-resolution inundation depth, including details at the community scale, is crucial for comprehensive flood risk assessment and the development of effective resource allocation strategies.

The spatial distribution and intensity fluctuations of intense rainfall are another key factor in determining flood occurrence and progression (Cristiano et al. 2017). Zhou et al. (2021) conducted a comprehensive analysis of the spatiotemporal dynamics of rainfall and its impact on flood frequency in river basins, highlighting the significant role of rainfall's spatial diversity during extreme events. They found a strong link between rainfall patterns and flood peak sizes, noting an average 50% increase in flood peak magnitudes. This aligns with Zhu et al. (2018), who emphasize the importance of including spatial and temporal rainfall data in flood frequency analyses, even for small urban catchments. Their work shows that variations in rainfall distribution critically influence both the volume and peak of runoff, especially in areas with impervious surfaces. Collectively, these studies highlight the critical importance of understanding rainfall's spatial variability for accurate hydrological response predictions and effective flood forecasting. Unfortunately, a majority of existing research tends to neglect the spatial heterogeneity of rainfall and assumes that rainfall intensity is uniformly distributed across urban catchments. This assumption is an oversimplification that may not accurately capture real-world scenarios. Moreover, much of the existing research focuses on river-based flooding and on modeling inundation in large watersheds, overlooking the intricate interactions of urban drainage systems in flood mitigation and water evacuation.

Recognizing this research gap, our study endeavors to design an attention-enhanced deep learning network specifically for predicting inundation depths in urban areas, considering the spatiotemporal dynamics of rainfall. This approach enables the processing of high-resolution rainfall data grids, effectively capturing the spatial and temporal variations of intense rainfall, offering faster and more accurate alternatives for simulating and predicting urban floods. This advancement is crucial for the development of rapid warning systems, facilitating timely responses and improved preparedness measures.

A schematic of the proposed methodology is given in Figure 1. First, we collect key information, such as urban topography and the pipe network system, and construct spatially inhomogeneous rainfall that reflects spatial and temporal variability. These data are then fed into a mechanistic model to simulate inundation depth. In the data configuration step, the topographic and rainfall data are preprocessed through integration and normalization to form datasets together with the flood depth maps. Finally, these datasets are used as the inputs and outputs of the deep learning model for supervised training, yielding a surrogate model for inundation prediction. This design aims to account for the intricate patterns and variations in rainfall over time and space. By training on the rainfall and inundation data, the model learns the relationships between them, enabling it to make accurate predictions of inundation depths.
Figure 1

Schematic of the proposed method.


Data collection and mechanism model

Case study

As shown in Figure 2, a drainage system located in Zhuhai, China, was selected as the case study. Zhuhai's geographical environment, including its low-lying urban topography and coastal location, makes urban waterlogging particularly prominent. The system consists of 812 nodes and 751 drainage pipes, with a total length of 65.42 km and a total drainage area of 13.36 km². The drainage system transports rainwater to 61 outfalls, marked by red triangles. Given Zhuhai's geographical and climatic characteristics, these short, small-diameter pipes can effectively deal with common rainfall events. However, actual rainfall is often spatially and temporally inhomogeneous, which places additional stress on the drainage system.
Figure 2

Research area topography and urban drainage system map.


Spatial rainfall distribution

The spatial variability and uncertainty of rainfall intensity during actual storms play a crucial role in the accumulation and propagation of runoff, ultimately determining the occurrence and progression of urban flooding. To account for such complexity, this study employed a spatially variable rainfall model with a fixed storm-center intensity ratio, following the method of Lin et al. (2022). Starting from the uniform rainfall of a design storm hyetograph, the model maps the spatial distribution of rainfall intensity at each time step using a two-dimensional truncated Gaussian distribution. Its core principle lies in fixing the ratio of maximum intensity between the spatially varying rainfall field and the uniform rainfall field in the control equation, while ensuring that the total precipitation volume is consistent with that of the uniform field. As depicted in Figure 3, the storm center exhibits the highest rainfall intensity, and the intensity decreases as the distance from the center increases. In this study, the recurrence interval of rainfall intensity was set at 100 years, the duration of rainfall was set at 2 h, and the storm center was treated as a variable that moves randomly within the study area.
Figure 3

Spatial rainfall based on the increase of rainfall intensity at the fixed rainstorm center.
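
The fixed-peak-ratio principle described above can be illustrated with a minimal sketch. It is not the formulation of Lin et al. (2022): the isotropic Gaussian shape, the truncation at the domain boundary, the bisection search for the spread `sigma`, and all function and parameter names are assumptions made for illustration. The sketch scales a Gaussian so that its peak equals a fixed multiple of the uniform intensity, then tunes `sigma` until the domain-mean intensity matches the uniform field.

```python
import numpy as np

def gaussian_rain_field(uniform_intensity, peak_ratio, center, shape, cell=3.0):
    """Rainfall-intensity grid with a Gaussian peak at `center` = (x, y) in metres.

    Assumptions (not from the source): isotropic Gaussian truncated at the
    domain edge; sigma is chosen so the domain mean equals `uniform_intensity`
    while the peak equals peak_ratio * uniform_intensity.
    """
    if peak_ratio <= 1.0:
        raise ValueError("peak_ratio must exceed 1")
    rows, cols = shape
    y, x = np.mgrid[0:rows, 0:cols] * cell          # cell coordinates in metres
    d2 = (x - center[0]) ** 2 + (y - center[1]) ** 2

    def mean_kernel(sigma):
        # Domain-mean of the unit-peak Gaussian; increases monotonically with sigma.
        return np.exp(-d2 / (2.0 * sigma ** 2)).mean()

    target = 1.0 / peak_ratio                        # mean needed to conserve volume
    lo, hi = cell * 1e-3, cell * max(rows, cols) * 1e3
    if not (mean_kernel(lo) < target < mean_kernel(hi)):
        raise ValueError("peak_ratio cannot be matched on this grid")
    for _ in range(100):                             # bisection on log(sigma)
        mid = np.sqrt(lo * hi)
        if mean_kernel(mid) < target:
            lo = mid
        else:
            hi = mid
    sigma = np.sqrt(lo * hi)
    return peak_ratio * uniform_intensity * np.exp(-d2 / (2.0 * sigma ** 2))
```

Calling the function once per time step of the design storm, with the uniform intensity of that step, would yield a spatially varying series whose total volume matches the uniform case under these assumptions.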


1D/2D coupling mechanistic model

To more accurately simulate the inundation of urban areas under uneven rainfall conditions, a 1D/2D coupling model based on LISFLOOD-FP and the Storm Water Management Model (SWMM) was designed and implemented. It has demonstrated its applicability in simulating both surface and subsurface flows and has been effectively utilized in areas such as flood risk assessment, predicting the impact of climate change on floods, and managing river basins (Wu et al. 2017; Zeng et al. 2022). Building on the proven accuracy of the SWMM-LISFLOOD coupled model, we set up a high-resolution modeling system over a dense urban catchment using cutting-edge geospatial datasets. A total of 809 virtual rain gauges were configured in the SWMM to capture the spatial variability of rainfall, driving a LISFLOOD-FP overland flow model with surcharge manhole data. A high-resolution digital elevation model (DEM) of 3 × 3 m is utilized to represent the complex urban surface terrain. Details on the parameter settings of the coupled model are comprehensively presented in Table S1 (provided in the Supplementary material).

Data processing and configuration

In this study, the model's input features are categorized into static and dynamic types. Specifically, the model incorporates eight static feature parameters, whereas the rainfall characteristic is regarded as a dynamic feature, with its number of features being adjustable dynamically in accordance with time intervals.

Static feature images

Topography and pipe network features are utilized as static features, and they provide topography labels for different rainfall scenarios. These parameters encompass eight items, such as elevation, slope, pipe network density, and pipe diameter, which remain constant throughout the forecast period. The reasons for their selection are elaborated in Table 1. These attributes are instrumental in regulating the accumulation and flow direction of surface runoff, subsequently aiding the model's understanding of urban flooding responses under various terrain environments and pipe network topologies, thus enhancing the model's generalization ability. In this study, the spatial analysis tools of ArcGIS software were utilized to process the DEM, extracting raster terrain data such as elevation, slope, aspect, curvature, topographic wetness index (TWI), and flow accumulation. In addition, the total length and average diameter of the pipes within each raster were calculated by analyzing the drainage network's Shapefile data, thereby obtaining the drainage network's density and diameter. The resolution of all data was 3 m.

Table 1

Topographic features used as spatial markers for deep learning models

| Input feature | Normalized range | Reason |
| --- | --- | --- |
| Elevation | [0, 1] | Ground elevation determines water flow direction and speed. Water flows from higher to lower elevations as elevation changes. |
| Slope | [0, 1] | Slope determines the speed of water flow. Steep slopes may cause water to flow faster, while gentle slopes may slow down water flow. |
| Aspect | [−1, 1] | Aspect impacts solar radiation intensity, affecting evaporation and soil moisture. It also determines water flow direction. |
| Curvature | [−1, 1] | Terrain curvature concentrates or disperses water flow. Concave surfaces concentrate flow. Convex surfaces disperse flow. |
| TWI | [0, 1] | TWI indicates soil moisture. It considers local slopes and upstream areas. A high TWI means water pooling is likely. |
| Flow accumulation | [0, 1] | This feature indicates the cumulative amount of upstream flow at a given point and can help determine which areas are likely to suffer greater flooding impacts. |
| Link density | [0, 1] | In urban areas, a high density of pipe networks indicates better drainage capacity, while a low density may indicate a risk of poor drainage. |
| Link diameter | [0, 1] | Pipe diameter limits urban floodwater drainage. Smaller pipes overflow faster after heavy rainfall. |

Dynamic feature images

High-resolution grid rainfall data serves as a dynamic feature input and undergoes detailed temporal discretization, segmented into several time intervals. The cumulative rainfall amount for each interval is extracted and utilized as an input feature and is normalized to the range of (0,1). This approach enables the accurate capture of the spatiotemporal variation characteristics of complex rainfall processes, thereby facilitating a more precise simulation of the subsequent hydrological process.
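
As an illustration of this temporal discretization, the sketch below accumulates a gridded intensity series into per-interval rainfall depths and scales the result. The input layout `(T, H, W)`, the mm/h unit, the division by the maximum value as the scaling rule, and the function names are assumptions; the source only states that interval totals are normalized.

```python
import numpy as np

def interval_rainfall(intensity, dt_min, interval_min):
    """Sum a (T, H, W) intensity series [mm/h] into per-interval depths [mm]."""
    steps = interval_min // dt_min
    if steps < 1 or intensity.shape[0] % steps != 0:
        raise ValueError("series length must be a whole number of intervals")
    depth = intensity * (dt_min / 60.0)              # mm fallen in each time step
    t, h, w = depth.shape
    return depth.reshape(t // steps, steps, h, w).sum(axis=1)

def scale_by_max(depth):
    """Scale depths by their maximum so values lie between 0 and 1 (assumed rule)."""
    peak = depth.max()
    return depth / peak if peak > 0 else np.zeros_like(depth)
```

For a 2 h storm sampled every minute, a 10 min interval would give 12 rainfall feature maps per scenario.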

In the assembly of the dataset, the position of the storm center within the rainfall characteristics serves as the delineation criterion. The data collection methodology for both the training and validation subsets incorporates a fixed-grid sampling approach, with a uniform spacing of 600 m to ensure consistent coverage throughout the study area. This strategy precludes potential sampling bias induced by random selection. The test set is sampled via Monte Carlo simulation to guarantee randomness and representativeness. The storm centers adopted in the sampling process are depicted in Figure 4.
Figure 4

Spatial distribution of rainstorm center.
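
The two sampling schemes can be sketched as follows. The rectangular domain, the half-spacing offset from the boundary, the uniform distribution for the Monte Carlo draw, and the function names are illustrative assumptions; the actual study area is irregular and the exact placement of the grid is not given in the text.

```python
import random

def grid_centers(width_m, height_m, spacing_m=600.0):
    """Storm centres on a fixed grid, offset half a spacing from the domain edge."""
    centers = []
    y = spacing_m / 2.0
    while y < height_m:
        x = spacing_m / 2.0
        while x < width_m:
            centers.append((x, y))
            x += spacing_m
        y += spacing_m
    return centers

def monte_carlo_centers(width_m, height_m, n, seed=0):
    """Storm centres drawn uniformly at random; seeded for reproducibility."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, width_m), rng.uniform(0.0, height_m)) for _ in range(n)]
```

Grid centres would feed the training and validation scenarios, while the randomly drawn centres would define the test scenarios.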


Surrogate model

Deep learning model design

U-Net, an augmented model based on a fully convolutional network, has been widely used in image segmentation (Ronneberger et al. 2015; Zhou et al. 2018). Owing to its encoder–decoder architecture, U-Net can efficiently capture contextual information in images while preserving detailed information, ensuring higher segmentation accuracy (Huang et al. 2020; Siddique et al. 2021). Despite U-Net's promising performance in previous flood forecasting research, doubts persist regarding its ability to effectively leverage deep features for complex data.

To address the aforementioned challenges, we propose a novel deep learning architecture, referred to as the CRU-Net model. This model incorporates a Convolutional Block Attention Module (CBAM module) and Residual Blocks (Res Blocks), aiming to significantly enhance the network's ability to extract features from image-like inputs. The structure of the proposed CRU-Net model is shown in Figure 5.
Figure 5

The structure of the proposed prediction model: (a) The operation process of the CRU-Net, (b) components of the CBAM module; and (c) components of the Res Block.


CBAM module: To adeptly capture the intricate spatial and temporal variations in rainfall, we introduce the CBAM module, as shown in Figure 5(b). This mechanism automatically identifies and adjusts the importance of features across spatial dimensions and channels, effectively focusing on crucial information while reducing irrelevant noise (Woo et al. 2018). In flood prediction, CBAM is applied to emphasize important spatial features (e.g. topography, land cover) and channel features (e.g. multi-temporal meteorological data). By integrating CBAM into a convolutional network, the model is able to more accurately identify and weigh key factors that influence flood occurrence, such as extreme rainfall patterns and watershed response, thus improving prediction accuracy and model generalization.

Res Block: In flood prediction modeling, the incorporation of residual structures addresses the challenge of handling numerous input features. These structures as shown in Figure 5(c) bolster the model's proficiency in discerning complex nonlinear relationships, while averting issues like vanishing gradients and network degradation (He et al. 2016).

As depicted in Figure 5(a), CRU-Net inherits the encoding–decoding architecture of U-Net, comprising an input layer, downsampling layers, transition layers, upsampling layers, and an output layer. In the downsampling phase, we substituted the generic convolutional layers in U-Net with our proposed Res Blocks (Figure 5(c)). With each downsampling operation, the image dimensions were halved while the channel numbers were doubled, and vice versa during upsampling. We have integrated the CBAM module following each Res Block (Figure 5(b)). This integration allows the model to effectively discern the significance of specific channels and spatial locations within the network, thereby improving its overall ability to capture and process salient information. Finally, a 3 × 3 convolutional layer transformed the eight-channel output of the last residual unit into a flood depth map.

CRU-Net uses two Res Block variants. The encoder blocks use 7 × 7 convolution kernels to enlarge the receptive field and extract high-level features, whereas the decoder blocks use 3 × 3 kernels to balance computational efficiency and performance. Each block contains two convolution layers, followed by Batch Normalization and Leaky ReLU activation, with a skip connection that enables identity mapping by joining the block input to the post-operation output. In the CBAM module, the channel attention branch summarizes the global context of the input features via average and max pooling, passes the pooled descriptors through two 1 × 1 convolutions (used in place of dense layers), aggregates the results, and applies a sigmoid activation to obtain the channel attention weights. The spatial attention branch merges the average- and max-pooled maps computed across channels, applies a 7 × 7 convolution, and uses a sigmoid activation to produce the spatial attention map.
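
To make the two attention branches concrete, the following NumPy sketch runs a CBAM-style forward pass on a single feature tensor. It follows the general design of Woo et al. (2018) rather than the authors' implementation: the ReLU between the two 1 × 1 convolutions, the channel reduction implied by the weight shapes, the summation of the two pooled branches, and the function names are assumptions, and the weights would normally be learned rather than supplied by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns weights of shape (C,)."""
    avg = x.mean(axis=(1, 2))                  # global average pooling
    mx = x.max(axis=(1, 2))                    # global max pooling

    def shared_mlp(v):
        # Two 1x1 convolutions act as two matrix products on a pooled vector.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))

def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k) with odd k. Returns a map of shape (H, W)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(k):                         # zero-padded k x k cross-correlation
        for j in range(k):
            out += (kernel[:, i, j, None, None] * padded[:, i:i + h, j:j + w]).sum(axis=0)
    return sigmoid(out)

def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]
```

Because both attention maps lie between 0 and 1, the module can only rescale features downward; what it learns is which channels and locations to preserve.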

Model input

Depicted in Figure 6, this model employs a multichannel matrix input, where rainfall, terrain, pipe network, and other data serve as input features, comprising a total of nine feature images. Each feature is represented as an independent feature map and integrated in the channel dimension of the matrix, enabling the model to simultaneously fuse multiple parameters to extract features.
Figure 6

Multidimensional data input: terrain and rainfall feature concatenation matrix.
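
A minimal sketch of how such a nine-channel input could be assembled is given below. The target ranges follow Table 1, but min-max scaling as the normalization rule, the channel order, and the dictionary keys are assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical channel order; the source does not specify one.
CHANNEL_ORDER = ["elevation", "slope", "aspect", "curvature", "twi",
                 "flow_accumulation", "link_density", "link_diameter"]
SIGNED = {"aspect", "curvature"}               # scaled to [-1, 1] per Table 1

def min_max(a, lo=0.0, hi=1.0):
    """Linearly rescale an array to [lo, hi]; constant arrays map to lo."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    if span == 0:
        return np.full_like(a, lo)
    return lo + (a - a.min()) * (hi - lo) / span

def build_input(static, rainfall):
    """static: dict of (H, W) rasters; rainfall: (H, W), already scaled.

    Returns a (9, H, W) float32 array: eight static channels plus one rainfall channel.
    """
    channels = []
    for name in CHANNEL_ORDER:
        if name in SIGNED:
            channels.append(min_max(static[name], -1.0, 1.0))
        else:
            channels.append(min_max(static[name]))
    channels.append(np.asarray(rainfall, dtype=float))
    return np.stack(channels).astype(np.float32)
```

With several rainfall intervals, additional rainfall maps could be appended along the same channel axis.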


To optimize the application of our model to watersheds of varying sizes and rainfall patterns, within the confines of hardware memory limitations, we implemented a patch-based processing workflow. This involved dividing the watershed into 1,024 × 1,024 pixel blocks, processed individually to manage the computational load effectively. The partitioning was conducted with a fixed step size of 512 pixels, ensuring uniformity in model inputs and preventing variations in performance (Guo et al. 2022). Ultimately, following the aforementioned processing, the training set, validation set, and test set comprise 624, 160, and 160 sets of input data, respectively.
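
The patch layout can be sketched as below. The window size and step follow the text; the handling of the domain edge (adding one final patch flush with the boundary) and the function names are assumptions, since the source does not describe how partial windows are treated.

```python
def patch_origins(length, patch=1024, stride=512):
    """Start indices of patches covering [0, length) along one axis."""
    if length <= patch:
        return [0]                             # a single patch; padding would be needed
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:            # add a final patch flush with the edge
        starts.append(length - patch)
    return starts

def patch_windows(height, width, patch=1024, stride=512):
    """Top-left (row, col) corners of all patches over a height x width raster."""
    return [(r, c)
            for r in patch_origins(height, patch, stride)
            for c in patch_origins(width, patch, stride)]
```

With a step of half the window size, every interior cell is covered by up to four patches, which is what makes blending of overlapping predictions possible at inference time.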

Training strategies and evaluation index

In the training phase, the proposed prediction model aims to minimize the deviation between the model output map and the flood depth map generated by the mechanistic model. Throughout training, we adhere to the parameters listed in Table S2 (provided in the Supplementary material). The learning rate (LR) follows an adaptive strategy: it starts at 10⁻³ and is reduced by 50% whenever the validation set loss fails to improve after three epochs, down to a minimum of 10⁻⁵. To mitigate overfitting, early stopping is implemented: training ceases if the validation set loss fails to improve over 10 successive epochs, safeguarding the model's generalizability. The upper limit for training epochs is 150. The evolution of the loss curve throughout training is shown in Figure S1 (provided in the Supplementary material).
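
The interplay between the learning-rate schedule and early stopping can be sketched by replaying a validation-loss history. The counters, the rule that an improvement is any strict decrease in loss, and the function name are assumptions; framework schedulers may count patience slightly differently.

```python
def run_schedule(val_losses, lr0=1e-3, lr_min=1e-5, factor=0.5,
                 lr_patience=3, stop_patience=10, max_epochs=150):
    """Replay a validation-loss history.

    Returns (learning rate used in each epoch, number of epochs actually run).
    """
    lr, best = lr0, float("inf")
    since_best = since_cut = 0
    lrs = []
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        lrs.append(lr)
        if loss < best:                        # strict improvement resets both counters
            best, since_best, since_cut = loss, 0, 0
        else:
            since_best += 1
            since_cut += 1
        if since_cut >= lr_patience:           # halve the LR, bounded below by lr_min
            lr, since_cut = max(lr * factor, lr_min), 0
        if since_best >= stop_patience:        # early stopping
            return lrs, epoch
    return lrs, len(lrs)
```

Under these rules, a run whose validation loss stalls is cut to at most three learning-rate reductions before the 10-epoch early-stopping limit ends training.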

To conduct an exhaustive performance assessment of the deep learning models, both regression and classification metrics were utilized herein to contrast the predicted inundation depths between hydrodynamic and data-driven approaches. The specific indicators are elaborated in Table S3 (provided in the Supplementary material). In alignment with preceding studies, water depths below 0.05 m were discarded before analysis to preclude their effects. In computing the critical success index (CSI), thresholds of 0.05 and 0.3 m were adopted to enable accurate appraisal of prediction accuracy across different inundation levels.
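
For reference, the sketch below computes the metrics named in this section using their standard definitions; the exact formulas in Table S3 are not reproduced here. Applying the 0.05 m cut-off to the reference (mechanistic model) depths, the argument order, and the function names are assumptions.

```python
import numpy as np

def regression_metrics(obs, pred, wet=0.05):
    """RMSE, MAE and NSE over cells whose reference depth is at least `wet` metres."""
    mask = obs >= wet
    o, p = obs[mask], pred[mask]
    err = p - o
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    nse = float(1.0 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2))
    return rmse, mae, nse

def classification_metrics(obs, pred, threshold):
    """Precision, recall, F1 and CSI for the event `depth >= threshold`.

    No guard against empty classes is included; the caller should ensure that
    both observed and predicted wet cells exist.
    """
    o, p = obs >= threshold, pred >= threshold
    tp = np.sum(o & p)
    fp = np.sum(~o & p)
    fn = np.sum(o & ~p)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    csi = tp / (tp + fp + fn)
    return float(precision), float(recall), float(f1), float(csi)
```

Evaluating `classification_metrics` at thresholds of 0.05 and 0.3 m would give the two CSI values used in the comparison.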

Model validation and evaluation

ResU-Net and U-Net are two widely used deep learning models for flood depth prediction. To assess the effectiveness of the proposed CRU-Net in handling complex data, both were also used to predict flood depth with the same parameter settings as CRU-Net, and all three models were evaluated on the test set. Table 2 details the performance of the three models in the inundation simulation task at a 10 min step size. Overall, CRU-Net matches or outperforms the other two models on most evaluation metrics. In terms of regression metrics, CRU-Net achieves a lower root mean square error (RMSE: 0.054 m) and mean absolute error (MAE: 0.011 m) than ResU-Net (0.068 and 0.017 m) and U-Net (0.095 and 0.020 m), indicating its superior accuracy in capturing varying extents of inundation. In terms of classification metrics, while ResU-Net exhibits slightly higher precision (0.890 vs. 0.871), CRU-Net attains the highest recall (0.942, compared with 0.936 for ResU-Net and 0.901 for U-Net), indicating that it misses fewer inundated cells. Moreover, the F1 scores of CRU-Net and ResU-Net are comparable, at 0.905 and 0.909, respectively.

Table 2

Comparing the performance metrics of CRU-Net, ResU-Net, and U-Net

| Model | RMSE (m) | MAE (m) | NSE | Precision (0.05 m) | CSI (0.05 m) | CSI (0.3 m) | Recall (0.05 m) | F1 score (0.05 m) | Parameters |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CRU-Net | 0.054 | 0.011 | 0.975 | 0.871 | 0.827 | 0.903 | 0.942 | 0.905 | 2.87 × 10⁷ |
| ResU-Net | 0.068 | 0.017 | 0.961 | 0.890 | 0.824 | 0.881 | 0.936 | 0.909 | 2.85 × 10⁷ |
| U-Net | 0.095 | 0.020 | 0.923 | 0.835 | 0.765 | 0.822 | 0.901 | 0.867 | 1.25 × 10⁷ |

The outstanding performance of CRU-Net can be primarily attributed to its custom-designed attention mechanism. By enabling automatic focus on salient regions, the model can better learn the characteristics of various inundation levels. This provides an explanation for CRU-Net's notable advantage in recall. Overall, the integrated architecture of CRU-Net, incorporating convolutional modules, attention mechanisms, and multi-scale features, allows for a stronger emphasis on prominent flooded areas. This, in turn, enhances the model's suitability for the complex inundation simulation task and validates the effectiveness of this architectural design.

Flood depth prediction

In this study, a slice-splicing approach was utilized to generate complete flood depth maps. To avoid noticeable stitching artifacts between adjacent image slices, a weighted averaging technique was employed to blend overlapping predictions. This ensured smooth transitions and maintained overall continuity and coherence in the synthesized maps. Research demonstrates that under nonuniform heavy rainfall conditions, waterlogging radiates outward from the storm center, leading to a spatially concentrated distribution. In these scenarios, the maximum submerged water depth significantly exceeds that observed under uniform rainfall with identical intensity, increasing by 12.4–31.8%. Deep learning models effectively capture this spatial and depth variation pattern. To showcase the capabilities of the developed CRU-Net, three representative storm centers were selected for visualization in Figure 7. The proposed CRU-Net architecture demonstrates impressive proficiency in accurately identifying inundation boundaries. It effectively distinguishes between deep and shallow flooded areas while also displaying notable precision in detecting finer features such as buildings and streets within the flooded regions. This ability is particularly noteworthy since identifying building and street outlines in complex flood scenarios is often challenging. CRU-Net successfully captures such intricate spatial details through its sophisticated deep network design and carefully designed training process.
Figure 7

Comparison of CRU-Net and LISFLOOD-FP in predicting flood depth on three typical samples in the test set.
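
The weighted averaging used to splice overlapping slices can be sketched as follows. The weight shape (a linear ramp that tapers toward each patch edge), square patches, and the function name are assumptions; the source states only that overlapping predictions are blended by weighted averaging.

```python
import numpy as np

def blend_patches(patches, origins, out_shape):
    """Weighted average of overlapping square patches.

    patches: list of (S, S) arrays; origins: list of top-left (row, col) corners.
    Assumes every output cell is covered by at least one patch.
    """
    size = patches[0].shape[0]
    # Weights grow linearly from 1 at the patch edge toward the patch centre.
    ramp = np.minimum(np.arange(1, size + 1), np.arange(size, 0, -1)).astype(float)
    weight = np.outer(ramp, ramp)
    acc = np.zeros(out_shape)
    wsum = np.zeros(out_shape)
    for patch, (r, c) in zip(patches, origins):
        acc[r:r + size, c:c + size] += patch * weight
        wsum[r:r + size, c:c + size] += weight
    return acc / wsum
```

Down-weighting patch borders lets the prediction from whichever patch has a cell nearest its centre dominate, which tends to suppress seams where neighbouring patches disagree.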

We conducted a comprehensive comparison between the outputs of the deep learning model and the raw raster data generated by the mechanistic model. To visualize the correlation between the two datasets, a scatter plot is shown in Figure 8(a). The dense point cloud and the close linear trend indicate strong agreement between the deep learning predictions and the physical model outputs across various scenarios. To further investigate the distributional differences between the predicted and target data, a histogram of the prediction errors is also shown in Figure 8(a). These plots allow for a detailed examination of peaks, ranges, and potential deviations. Notably, most of the prediction errors are concentrated in the range of −0.1 to 0.1 m, and their frequency falls off rapidly as the error magnitude grows. This substantiates that the deep learning approach is capable of accurately capturing the fundamental characteristics and dynamics of the flood model.
Figure 8

(a) Scatter plot and error distribution histogram of water depth. (b) Probabilistic heat map for prediction distribution.


To conduct a more detailed analysis of the model's performance, the data were divided into specific flood risk categories, including slight risk (0–0.1 m), low risk (0.1–0.3 m), medium risk (0.3–1.0 m), and extreme risk (>1.0 m). This hierarchical classification provides a framework for evaluating the model's proficiency at a finer level of resolution. Figure 8(b) illustrates the probabilistic heatmaps generated from the model's predictions across the different risk categories. Notably, the occurrences along the diagonal of the heatmap are significantly more prevalent than in the rest of the chart. This visual evidence supports the high accuracy of the deep learning model in classifying the water depth data. In other words, the model assigns the majority of water depth values to their appropriate risk categories. The results reveal that, although the probability exceeds 80% for the medium and extreme risk classes, there are instances of misclassification. These misclassifications are predominantly concentrated at the lower boundaries within each category. We propose that this phenomenon may arise from the inherent tendency of convolutional networks to overly smooth outputs. When a large number of shallow depth values are present in the simulation, excessive smoothing can lead to underestimations for certain data points. This finding is consistent with the analysis of the scatter plot.
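
A class-probability matrix of this kind can be computed as in the sketch below. The class boundaries follow the text, but assigning a boundary depth to the higher class, normalizing each row by the number of reference cells in that class, and the function names are assumptions about how the heatmap was constructed.

```python
import numpy as np

EDGES = [0.1, 0.3, 1.0]                        # class boundaries in metres
LABELS = ["slight", "low", "medium", "extreme"]

def risk_class(depth):
    """Map depths to class indices 0..3; a boundary value goes to the higher class."""
    return np.digitize(depth, EDGES)

def class_probability_matrix(obs, pred):
    """Rows: reference class; columns: predicted class; each non-empty row sums to 1."""
    o = risk_class(np.asarray(obs)).ravel()
    p = risk_class(np.asarray(pred)).ravel()
    counts = np.zeros((len(LABELS), len(LABELS)))
    np.add.at(counts, (o, p), 1)               # accumulate one count per cell
    row = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
```

Entries on the diagonal give the fraction of cells in each reference class that the model places in the same class, and entries below the diagonal correspond to underestimated risk.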

Model interpretability analysis

Features importance analysis

To understand the role of individual features in flood inundation depth prediction, this study employs a commonly used deep learning interpretability technique called replacement feature importance evaluation, which is akin to permutation feature importance. The technique randomly perturbs one feature while keeping the others constant and examines the resulting change in model accuracy. By quantifying the change in error, we can approximate the relative contribution of each feature to the model's output.

In the original feature space, each channel was randomly perturbed 10 times, and the features were ranked by the average change in RMSE and CSI. The classification metric (CSI) evaluates whether each grid cell is correctly predicted as submerged or not, while the regression metric (RMSE) measures the precision of the predicted flood depth at each grid cell. The evaluation uses the same trained model described previously, to determine which variables most strongly influence the prediction. As shown in Figure 9, Elevation had the most significant impact, while Flow Accumulation, Curvature, Aspect, and TWI had relatively comparable influences, which aligns with previous findings. In contrast, Link Density and Diameter contributed little. This could be attributed to the substantial number of null values for these two features in some input slices, resulting from incomplete coverage of the drainage network during preprocessing. Where such information is sparse, these features offer the model little usable signal, and features that predominantly contain missing values tend to have diluted overall effects on model predictions.
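The perturb-and-score procedure can be sketched as below. This is an illustrative sketch under stated assumptions, not the study's implementation: the perturbation is taken to be a random shuffle of one channel across cells, the wet/dry threshold of 0.05 m used for CSI is hypothetical, and `toy_model` is a stand-in for the trained network.

```python
import math
import random

def rmse(pred, obs):
    """Root mean square error between predicted and reference depths."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def csi(pred, obs, wet=0.05):
    """Critical success index: hits / (hits + misses + false alarms)."""
    hits = sum(1 for p, o in zip(pred, obs) if p > wet and o > wet)
    misses = sum(1 for p, o in zip(pred, obs) if p <= wet and o > wet)
    false_alarms = sum(1 for p, o in zip(pred, obs) if p > wet and o <= wet)
    denom = hits + misses + false_alarms
    return hits / denom if denom else 1.0

def permutation_importance(model, features, obs, repeats=10, seed=0):
    """Mean RMSE increase and CSI decrease when one channel is shuffled."""
    rng = random.Random(seed)
    base = model(features)
    base_rmse, base_csi = rmse(base, obs), csi(base, obs)
    scores = {}
    for name in features:
        d_rmse = d_csi = 0.0
        for _ in range(repeats):
            shuffled = dict(features)
            shuffled[name] = features[name][:]  # copy, then shuffle this channel only
            rng.shuffle(shuffled[name])
            pred = model(shuffled)
            d_rmse += rmse(pred, obs) - base_rmse
            d_csi += base_csi - csi(pred, obs)
        scores[name] = (d_rmse / repeats, d_csi / repeats)
    return scores

def toy_model(features):
    """Hypothetical stand-in for the network: depth depends on elevation only."""
    return [max(0.0, 1.0 - z) for z in features["elevation"]]

features = {
    "elevation": [0.1, 0.4, 0.9, 1.5, 2.0, 0.2, 0.7, 1.2],
    "aspect": [10.0, 90.0, 180.0, 270.0, 45.0, 135.0, 225.0, 315.0],
}
obs = toy_model(features)  # plays the role of the physical-model target here
scores = permutation_importance(toy_model, features, obs)
for name, (d_rmse, d_csi) in scores.items():
    print(f"{name}: RMSE +{d_rmse:.3f} m, CSI -{d_csi:.3f}")
```

Because the toy model ignores aspect, shuffling that channel changes neither metric, while shuffling elevation degrades both. Ranking the channels by these average changes gives an importance ordering of the kind summarized in Figure 9.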
Figure 9

Comparison of the importance of terrain features.


Interpretation of model visualization based on Grad-CAM

The preceding analyses support the remarkable inundation depth prediction capabilities of CRU-Net. However, the complex nature of deep neural networks with a vast number of parameters makes it challenging to provide a clear theoretical explanation of their intricate internal mechanisms. As the adoption of deep learning continues to expand across various fields, the need for model interpretability has become increasingly important. We utilized Gradient-weighted Class Activation Mapping (Grad-CAM) for this purpose. Grad-CAM, a technique designed for highlighting key regions influencing predictions, allows us to visually decode these areas during the model's inference in classification or regression tasks, enhancing our comprehension of the model's functioning (Selvaraju et al. 2017).

Figure 10 visualizes the activation maps generated by Grad-CAM at different layers of the CRU-Net model, which are (a) the encoder's layer 2 activation map highlighting low-level visual features; (b) the encoder's layer 5 activation map detecting higher-level features; (c) the decoder's layer 2 activation map beginning to reconstruct spatial information; (d) the decoder's layer 5 activation map recovering further semantic details; (e) the output layer activation map producing the final prediction; and (f) the original flood image for reference. Each subfigure reveals the inner workings of the network, demonstrating how different layers progressively learn and extract flood-relevant features as the image passes through the encoder–decoder architecture. The highlighted regions indicate the areas of focus and attention for CRU-Net. Through analyzing our model, we have observed the following patterns:
Figure 10

Grad-CAM visualization for CRU-Net: (a) encoder layer 2 activation map; (b) encoder layer 5 activation map; (c) decoder layer 2 activation map; (d) decoder layer 5 activation map; (e) output layer activation map; and (f) flood image for reference.


During the progression from encoding to decoding, the attention of our CNN gradually shifts toward the regions within the waterlogging distribution. In the encoder part, the heat map exhibits a larger coverage area. This suggests that during the encoding process, the model extensively focuses on multiple regions within the input data. This behavior of the encoder can be attributed to its attempt to capture and comprehend a wide range of features and information from the input, which is crucial for subsequent data processing and feature extraction. Conversely, the heat map of the decoder section shows a decreasing range. As the model delves deeper layer by layer, the decoder increasingly concentrates on specific key regions that likely contain vital information required for decoding or regression tasks. This behavior indicates that the decoder tends to disregard less critical information when dealing with high-level, more abstract features. The output heat map of the final layer aligns with our expected results. This implies that during the final decision-making phase, the model focuses on the most relevant parts related to the task at hand. This validation of the model's efficiency and relevance further strengthens its performance. Overall, the Grad-CAM visualization provides valuable insights into the prediction process of CRU-Net, shedding light on how the model attends to different regions of the input data throughout its encoding and decoding stages.
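For readers unfamiliar with Grad-CAM, the step that turns a layer's feature maps into a heat map can be sketched as follows. This is a minimal sketch of the combination rule from Selvaraju et al. (2017), written with plain nested lists. It assumes the feature maps and the gradients of a scalar target score with respect to them have already been extracted from the network (in practice by an automatic differentiation framework). The choice of score for a depth regression, for example the summed predicted depth, is an assumption here and is not specified in this section.

```python
def grad_cam(activations, gradients):
    """Combine feature maps A_k and gradients dy/dA_k into a Grad-CAM heat map.

    Both arguments are lists of K maps, each an H x W nested list of floats.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: the gradient averaged over all spatial positions.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only the regions that push the target score up.
    cam = [[max(0.0, v) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam

# Two toy 2 x 2 feature maps with constant gradients (hypothetical values).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
heat = grad_cam(activations, gradients)
print(heat)
```

The second channel has a negative weight, so the cells it activates are suppressed by the ReLU and only the cells supported by the first channel remain highlighted. Applying the same rule to successive encoder and decoder layers yields one map per layer, as in Figure 10.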

Computational efficiency

Table 3 compares the simulation speeds of CRU-Net and the LISFLOOD-FP-based coupled model (LIS-SWMM in Table 3) for reproducing a 6-h inundation caused by a specific short-duration design storm. CRU-Net generates predictions for 3 million grids in 2.9 s, against 22 min for the physically based model, a speed-up of roughly 450 times. Such acceleration makes real-time forecasting and decision support feasible during flooding emergencies, enabling officials to take prompt and effective damage control actions.
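The speed-up can be checked directly from the runtimes in Table 3. The short calculation below uses only those reported values; the variable names are illustrative. With the simulation time rounded to 3 s the ratio is 440, and with the 2.9 s listed in Table 3 it is about 455. Note that Table 3 lists a CPU for the physically based model and a GPU for the networks, so the ratio describes the two setups as run rather than the algorithms on equal hardware.

```python
# Runtimes taken from Table 3; the physically based run is reported in minutes.
physical_seconds = 22 * 60
surrogate_seconds = {"CRU-Net": 2.9, "ResU-Net": 2.2, "U-Net": 1.9}

# Speed-up of each network relative to the physically based model.
speedup = {name: physical_seconds / t for name, t in surrogate_seconds.items()}
for name, factor in speedup.items():
    print(f"{name}: about {factor:.0f} times faster")

# Throughput of CRU-Net over the 3 million grid cells of the study area.
cells_per_second = 3_000_000 / surrogate_seconds["CRU-Net"]
print(f"CRU-Net: about {cells_per_second:,.0f} cells per second")
```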

Table 3

The runtimes of the CRU-Net and coupling model for simulating

| Model | Number of grids | Training time | Simulation time | Computing device |
| --- | --- | --- | --- | --- |
| LIS-SWMM | 3 × 10⁶ | – | 22 min | AMD Ryzen 7 6800H |
| CRU-Net | 3 × 10⁶ | 6 h | 2.9 s | NVIDIA GeForce RTX 3060 |
| ResU-Net | 3 × 10⁶ | 6 h | 2.2 s | NVIDIA GeForce RTX 3060 |
| U-Net | 3 × 10⁶ | 4 h | 1.9 s | NVIDIA GeForce RTX 3060 |

The impact of rainfall heterogeneity on urban pluvial flooding characteristics, such as the increase in total system inundation volume and the prolongation of flooding duration, is a critical aspect of urban flood modeling. The uneven distribution of intense rainfall can exacerbate flood risks and management challenges (Lin et al. 2022). Our model has shown promising results in dealing with such heterogeneous rainfall inputs, ensuring robust performance even under the complex conditions that uneven precipitation patterns introduce. By effectively managing these variabilities, the CRU-Net model stands out as a valuable tool for urban flood mitigation strategies.

With advanced weather radar technologies providing quantitative precipitation estimates at high spatiotemporal resolution, detailed analysis and forecasting of spatial rainfall become feasible. The CRU-Net proposed in this paper handles high-resolution spatial rainfall patterns with different structural distributions, rather than relying on traditional rainfall hyetographs (Rasheed et al. 2022; Zhang et al. 2023). Radar observations can supply essential initial and boundary conditions for pluvial flood models, enabling coupled forecasting of weather systems and hydrologic responses. This can improve both the lead time and the accuracy of urban flood warnings.

In this research, we introduce a CNN-based model with effective end-to-end processing capabilities, enabling the generation of high-resolution flood depth maps across the entire study area. This contrasts with graph neural network approaches (Farahmand et al. 2023; Feng & Mao 2024), which are restricted to predefined monitoring nodes. In terms of performance, our model surpasses those based on generative adversarial networks (do Lago et al. 2023), achieving higher accuracy in distinguishing between submerged and non-submerged conditions. Compared to the U-Net models used in previous literature (Guo et al. 2020; Löwe et al. 2021; Seleem et al. 2023), the model designed here performs spatiotemporal deep learning for waterlogging prediction by extracting temporal information between channels and employing a spatial attention module to focus on critical spatial regions. However, its current capabilities remain limited to predicting peak inundation depths, without modeling the entire surface flow process.

While the tiling approach employed in this study may compromise basin-scale geomorphological features, it provides standardized input data. By incorporating terrain labels for each patch along with precipitation inputs, the model can learn inundation characteristics across different regions. This has the potential to improve the transferability of the model to arbitrary basins and rainfall conditions as more data is accumulated.
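As an illustration of what tiling involves, the sketch below cuts a raster into fixed-size square patches. It is not the preprocessing used in this study: the patch size, the stride, the decision to drop incomplete edge patches, and the function name `tile_raster` are all assumptions chosen for a compact example.

```python
def tile_raster(raster, patch, stride=None):
    """Split a 2D grid (list of rows) into square patches.

    Returns (row, col, patch_rows) tuples, where (row, col) is the top-left
    corner. Edge remainders smaller than `patch` are dropped in this sketch.
    """
    stride = stride or patch
    rows, cols = len(raster), len(raster[0])
    tiles = []
    for r in range(0, rows - patch + 1, stride):
        for c in range(0, cols - patch + 1, stride):
            window = [row[c:c + patch] for row in raster[r:r + patch]]
            tiles.append((r, c, window))
    return tiles

# Toy 4 x 6 elevation grid (hypothetical values) cut into 2 x 2 patches.
dem = [[r * 10 + c for c in range(6)] for r in range(4)]
tiles = tile_raster(dem, patch=2)
print(len(tiles))
print(tiles[0])
```

Applying the same window positions to every terrain channel and to the rainfall grid would give each patch its own stack of inputs, and the stored corner coordinates allow the predicted patches to be mosaicked back into a full depth map.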

It is worth noting that despite the requirement for extensive pre-training, CRU-Net's efficient inference allows for versatile applicability to diverse precipitation patterns after training. This expedites the generation of flood alerts, enhancing its utility in real-time inundation forecasting. However, it is important to ensure the availability of sufficient training data for robust deep learning predictions, which should be taken into consideration during deployment.

Urban flooding prediction is crucial for comprehensive flood risk assessment and the development of effective resource allocation strategies. Deep learning approaches have gained traction in urban emergency flood prediction. However, the spatial structure of rainfall, which has a profound influence on urban flooding, is often overlooked in many deep learning investigations. This study introduces a novel deep learning framework equipped with an attention mechanism to predict inundation depths in urban terrains while accounting for spatiotemporal rainfall patterns. The proposed CRU-Net model can effectively capture the intricate relationships between rainfall inputs and surface inundation dynamics by learning the inherent spatial features of precipitation. This capability extends to unseen rainfall scenarios, allowing for generalization. With an RMSE of 0.054 m and a Nash–Sutcliffe efficiency (NSE) of 0.975, the CRU-Net model demonstrates high accuracy in flood prediction. Specifically, the model achieves over 80% accuracy in predicting inundation locations with depths greater than 0.3 m. The CRU-Net model, incorporating attention mechanisms, not only accurately predicts flood evolution but also prioritizes high-risk areas such as underpasses and low-lying terrains, enhancing the model's effectiveness. Moreover, CRU-Net can complete predictions for approximately 3 million grid cells in just 2.9 s. The superior efficiency of data-driven approaches compared to conventional physically based models makes them well-suited for real-time flood forecasting and emergency response.

Moving forward, further optimization of deep learning architectures, such as exploring large transformer-based structures and augmenting the diversity of training data by simulating flooding results across different cities, will be pursued. Integration of state-of-the-art rainfall forecasting capabilities is expected to enhance the predictive power and applicability of the models. Overall, this study opens avenues for leveraging deep learning approaches in urban flood modeling and prediction, paving the way for improved flood management and response systems.

This study was supported by the National Natural Science Foundation of China (Grant 52270095, 52200119).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Berkhahn S., Fuchs L. & Neuweiler I. 2019 An ensemble neural network model for real-time prediction of urban floods. Journal of Hydrology 575, 743–754. https://doi.org/10.1016/j.jhydrol.2019.05.066.
Chen Y. B., Zhou H. L., Zhang H., Du G. M. & Zhou J. H. 2015 Urban flood risk warning under rapid urbanization. Environmental Research 139, 3–10. https://doi.org/10.1016/j.envres.2015.02.028.
Cristiano E., ten Veldhuis M. C. & van De Giesen N. 2017 Spatial and temporal variability of rainfall and their effects on hydrological response in urban areas – A review. Hydrology and Earth System Sciences 21 (7), 3859–3878. https://doi.org/10.5194/hess-21-3859-2017.
do Lago C. A. F., Giacomoni M. H., Bentivoglio R., Taormina R., Gomes M. N. & Mendiondo E. M. 2023 Generalizing rapid flood predictions to unseen urban catchments with conditional generative adversarial networks. Journal of Hydrology 618, 129276. https://doi.org/10.1016/j.jhydrol.2023.129276.
Donnelly J., Abolfathi S., Pearson J., Chatrabgoun O. & Daneshkhah A. 2022 Gaussian process emulation of spatio-temporal outputs of a 2D inland flood model. Water Research 225. https://doi.org/10.1016/j.watres.2022.119100.
Farahmand H., Xu Y. & Mostafavi A. 2023 A spatial–temporal graph deep learning model for urban flood nowcasting leveraging heterogeneous community features. Scientific Reports 13 (1), 6768. https://doi.org/10.1038/s41598-023-32548-x.
Feng J. & Mao Y. 2024 Distribution-adaptive graph attention networks for flood forecasting. In: PRICAI 2023: Trends in Artificial Intelligence, pp. 340–352. https://doi.org/10.1007/978-981-99-7019-3_32.
Galland J. C., Goutal N. & Hervouet J. M. 1991 TELEMAC – A new numerical-model for solving shallow-water equations. Advances in Water Resources 14 (3), 138–148. https://doi.org/10.1016/0309-1708(91)90006-a.
Garzon A., Kapelan Z., Langeveld J. & Taormina R. 2022 Machine learning-based surrogate modeling for urban water networks: Review and future research directions. Water Resources Research 58 (5). https://doi.org/10.1029/2021wr031808.
Guidolin M., Chen A. S., Ghimire B., Keedwell E. C., Djordjevic S. & Savic D. A. 2016 A weighted cellular automata 2D inundation model for rapid flood analysis. Environmental Modelling & Software 84, 378–394. https://doi.org/10.1016/j.envsoft.2016.07.008.
Guo Z., Leitão J. P., Simões N. E. & Moosavi V. 2020 Data-driven flood emulation: Speeding up urban flood predictions by deep convolutional neural networks. Journal of Flood Risk Management 14 (1). https://doi.org/10.1111/jfr3.12684.
Guo Z., Moosavi V. & Leitão J. P. 2022 Data-driven rapid flood prediction mapping with catchment generalizability. Journal of Hydrology 609. https://doi.org/10.1016/j.jhydrol.2022.127726.
He K., Zhang X., Ren S. & Sun J. 2016 Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. https://doi.org/10.1109/CVPR.2016.90.
He J., Zhang L., Xiao T., Wang H. & Luo H. 2023 Deep learning enables super-resolution hydrodynamic flooding process modeling under spatiotemporally varying rainstorms. Water Research 239, 120057. https://doi.org/10.1016/j.watres.2023.120057.
Huang H. M., Lin L. F., Tong R. F., Hu H. J., Zhang Q. W., Iwamoto Y., Han X. H., Chen Y. W. & Wu J. 2020 UNet 3+: A full-scale connected UNet for medical image segmentation. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055–1059. https://doi.org/10.1109/icassp40776.2020.9053405.
Jamali B., Bach P. M., Cunningham L. & Deletic A. 2019 A Cellular Automata fast flood evaluation (CA-ffe) model. Water Resources Research 55 (6), 4936–4953. https://doi.org/10.1029/2018wr023679.
Kabir S., Patidar S., Xia X., Liang Q., Neal J. & Pender G. 2020 A deep convolutional neural network model for rapid prediction of fluvial flood inundation. Journal of Hydrology 590. https://doi.org/10.1016/j.jhydrol.2020.125481.
Liao Y., Wang Z., Chen X. & Lai C. 2023 Fast simulation and prediction of urban pluvial floods using a deep convolutional neural network model. Journal of Hydrology 624. https://doi.org/10.1016/j.jhydrol.2023.129945.
Lin R., Zheng F., Ma Y., Duan H.-F., Chu S. & Deng Z. 2022 Impact of spatial variation and uncertainty of rainfall intensity on urban flooding assessment. Water Resources Management 36 (14), 5655–5673. https://doi.org/10.1007/s11269-022-03325-8.
Löwe R., Böhm J., Jensen D. G., Leandro J. & Rasmussen S. H. 2021 U-FLOOD – Topographic deep learning for predicting urban pluvial flood water depth. Journal of Hydrology 603. https://doi.org/10.1016/j.jhydrol.2021.126898.
Neal J., Schumann G., Fewtrell T., Budimir M., Bates P. & Mason D. 2011 Evaluating a new LISFLOOD-FP formulation with data from the summer 2007 floods in Tewkesbury, UK. Journal of Flood Risk Management 4 (2), 88–95. https://doi.org/10.1111/j.1753-318X.2011.01093.x.
Rasheed Z., Aravamudan A., Sefidmazgi A. G., Anagnostopoulos G. C. & Nikolopoulos E. I. 2022 Advancing flood warning procedures in ungauged basins with machine learning. Journal of Hydrology 609. https://doi.org/10.1016/j.jhydrol.2022.127736.
Ronneberger O., Fischer P. & Brox T. 2015 U-Net: Convolutional networks for biomedical image segmentation. In: 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28.
Seleem O., Ayzel G., Bronstert A. & Heistermann M. 2023 Transferability of data-driven models to predict urban pluvial flood water depth in Berlin, Germany. Natural Hazards and Earth System Sciences 23 (2), 809–822. https://doi.org/10.5194/nhess-23-809-2023.
Selvaraju R. R., Cogswell M., Das A., Vedantam R., Parikh D. & Batra D. 2017 Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: 16th IEEE International Conference on Computer Vision (ICCV), pp. 618–626. https://doi.org/10.1007/s11263-019-01228-7.
Sharifian M. K., Kesserwani G., Chowdhury A. A., Neal J. & Bates P. 2023 LISFLOOD-FP 8.1: New GPU-accelerated solvers for faster fluvial/pluvial flood simulations. Geoscientific Model Development 16 (9), 2391–2413. https://doi.org/10.5194/gmd-16-2391-2023.
Siddique N., Paheding S., Elkin C. P. & Devabhaktuni V. 2021 U-Net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 9, 82031–82057. https://doi.org/10.1109/ACCESS.2021.3086020.
Woo S. H., Park J., Lee J. Y. & Kweon I. S. 2018 CBAM: Convolutional block attention module. In: 15th European Conference on Computer Vision (ECCV), pp. 3–19. https://doi.org/10.1007/978-3-030-01234-2_1.
Wu X. S., Wang Z. L., Guo S. L., Liao W. L., Zeng Z. Y. & Chen X. H. 2017 Scenario-based projections of future urban inundation within a coupled hydrodynamic model framework: A case study in Dongguan City, China. Journal of Hydrology 547, 428–442. https://doi.org/10.1016/j.jhydrol.2017.02.020.
Yang L., Smith J. A., Baeck M. L. & Zhang Y. 2016 Flash flooding in small urban watersheds: Storm event hydrologic response. Water Resources Research 52 (6), 4571–4589. https://doi.org/10.1002/2015wr018326.
Zeng Z. Y., Wang Z. L. & Lai C. G. 2022 Simulation performance evaluation and uncertainty analysis on a coupled inundation model combining SWMM and WCA2D. International Journal of Disaster Risk Science 13 (3), 448–464. https://doi.org/10.1007/s13753-022-00416-3.
Zhang L., Qin H. P., Mao J. Q., Cao X. Y. & Fu G. T. 2023 High temporal resolution urban flood prediction using attention-based LSTM models. Journal of Hydrology 620. https://doi.org/10.1016/j.jhydrol.2023.129499.
Zhou Z. W., Siddiquee M. M. R., Tajbakhsh N. & Liang J. M. 2018 UNet++: A nested U-Net architecture for medical image segmentation. In: 4th International Workshop on Deep Learning in Medical Image Analysis (DLMIA)/8th International Workshop on Multimodal Learning for Clinical Decision Support (ML-CDS). https://doi.org/10.1007/978-3-030-00889-5_1.
Zhou Z. Z., Smith J. A., Baeck M. L., Wright D. B., Smith B. K. & Liu S. G. 2021 The impact of the spatiotemporal structure of rainfall on flood frequency over a small urban watershed: An approach coupling stochastic storm transposition and hydrologic modeling. Hydrology and Earth System Sciences 25 (8), 4701–4717. https://doi.org/10.5194/hess-25-4701-2021.
Zhu Z. H., Wright D. B. & Yu G. 2018 The impact of rainfall space-time structure in flood frequency analysis. Water Resources Research 54 (11), 8983–8998. https://doi.org/10.1029/2018wr023550.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
