ABSTRACT
Urban floods pose a significant threat to human communities, making their prediction essential for comprehensive flood risk assessment and the formulation of effective resource allocation strategies. Data-driven deep learning approaches have gained traction in urban emergency flood prediction, addressing the efficiency constraints of physical models. However, the spatial structure of rainfall, which has a profound influence on urban flooding, is often overlooked in many deep learning investigations. In this study, we introduce a novel deep learning model, CRU-Net, equipped with an attention mechanism to predict inundation depths in urban terrains based on spatiotemporal rainfall patterns. This method utilizes eight topographic parameters related to the height of urban waterlogging, combined with spatial rainfall data, as inputs to the model. Comparative evaluations between the developed CRU-Net and two other deep learning models, U-Net and ResU-Net, reveal that CRU-Net adeptly interprets the spatiotemporal traits of rainfall and accurately estimates flood depths, emphasizing deep inundation and flood-vulnerable regions. The model demonstrates high accuracy, evidenced by a root mean square error of 0.054 m and a Nash–Sutcliffe efficiency of 0.975. CRU-Net also accurately predicts over 80% of inundation locations with depths exceeding 0.3 m. Remarkably, CRU-Net delivers predictions for 3 million grids in 2.9 s, showcasing its efficiency.
HIGHLIGHTS
This study incorporates the spatial distribution of rainfall into urban flood forecasting models.
Integration of attention mechanisms enhanced the identification of high-risk flood areas.
The developed model can predict flood inundation within seconds.
INTRODUCTION
Extreme precipitation events associated with global climate change have rendered urban flooding an increasingly pervasive challenge across metropolitan regions worldwide (Yang et al. 2016). Flooding has significant socioeconomic consequences, affecting various aspects of daily life, safety, and property. Thus, developing accurate predictive models that enable real-time flood warnings constitutes an urgent research priority (Chen et al. 2015).
Traditional hydrological and hydrodynamic modeling helps us understand the causes and dynamics of floods by employing sophisticated mathematical equations, such as the two-dimensional shallow water equations. Industry-standard models, like InfoWorks and MIKE Flood, along with open-source alternatives like LISFLOOD-FP (Neal et al. 2011) and TELEMAC (Galland et al. 1991), have advanced riverine flood and urban inundation simulations. However, these traditional tools have inherent drawbacks, including high computational intensity, long running times, and heavy dependence on accurate geospatial and land use data. As a result, traditional models are often too slow for real-time early warning of urban flooding, making it difficult to meet the needs of flood prevention and risk avoidance in an urban context (Garzon et al. 2022).
Considerable effort has been dedicated to exploring methodologies aimed at alleviating the computational load through: (1) employing GPU acceleration for parallel computing (Sharifian et al. 2023); (2) circumventing the use of shallow water equations and utilizing cellular automata methods (Guidolin et al. 2016; Jamali et al. 2019); and (3) harnessing data-driven artificial intelligence, particularly deep learning techniques, to quickly emulate historical flood data or results from physical models (Berkhahn et al. 2019; He et al. 2023). Deep learning models, especially convolutional neural networks (CNNs), with their intrinsic ability to extract spatiotemporal features from huge datasets, are gaining increasing attention. Kabir et al. (2020) and Donnelly et al. (2022) demonstrate the power of CNNs in rapid river flood forecasting. Furthermore, the contributions by Löwe et al. (2021) and Liao et al. (2023) highlight the adaptability and effectiveness of deep learning agents compared to established physical models. In the present era, characterized by the prominence of large models, deep learning methods exhibit enhanced generalizability. By augmenting datasets or transferring trained models, these methods can be effectively extended to unexplored watersheds, showcasing their potential for broader applicability (Guo et al. 2020; Seleem et al. 2023).
Despite deep learning techniques becoming increasingly prevalent in flood forecasting, their limitations in handling spatiotemporal data and extracting meaningful features from rainfall data have become apparent. In flood prediction, some input features, such as topography, land cover, and inundation depth, should be emphasized. To identify and adjust the importance of these features across spatial dimensions and focus on crucial information, attention mechanisms are adopted in the deep learning model. These mechanisms allow neural networks to selectively prioritize certain data segments, potentially improving model accuracy by emphasizing areas susceptible to flooding and accounting for the unique spatial characteristics of rainfall and inundation. Zhang et al. (2023) developed a flood forecasting model that combines Long Short-Term Memory networks with attention mechanisms, concentrating on key features. The model demonstrated high predictive accuracy, achieving a coefficient of determination over 0.85, a peak flow error below 0.015 m, and a peak arrival time error under 2 min. Farahmand et al. (2023) introduced a spatiotemporal graph deep learning model for real-time urban flood forecasting. This model combines physical-based features with human-perceived data, utilizing an Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN), thus focusing on the most impactful dynamic change features. Although these studies have demonstrated the potential of attention mechanisms in improving flood prediction accuracy, particularly by focusing on crucial monitoring data to enhance the predictive capability of models, their scope has primarily been limited to predicting inundation depths at selected monitoring points or nodes. While these approaches prove partially effective, they fail to provide high-resolution flood depth prediction maps across the entire region.
The prediction of high-resolution inundation depth, including details at the community scale, is crucial for comprehensive flood risk assessment and the development of effective resource allocation strategies.
The spatial distribution and intensity fluctuations of intense rainfall are another key factor in determining flood occurrence and progression (Cristiano et al. 2017). Zhou et al. (2021) conducted a comprehensive analysis of the spatiotemporal dynamics of rainfall and its impact on flood frequency in river basins, highlighting the significant role of rainfall's spatial diversity during extreme events. They found a strong link between rainfall patterns and flood peak sizes, noting an average 50% increase in flood peak magnitudes. This aligns with Zhu et al. (2018), emphasizing the importance of including spatial and temporal rainfall data in flood frequency analyses, even for small urban catchments. Their work shows that variations in rainfall distribution critically influence both the volume and peak of runoff, especially in areas with impervious surfaces. Collectively, these studies highlight the critical importance of understanding rainfall's spatial variability for accurate hydrological response predictions and effective flood forecasting. Unfortunately, a majority of existing research tends to neglect the spatial heterogeneity of rainfall and assume that the rainfall intensity is uniformly distributed across urban catchments. This assumption is an oversimplification that may not accurately capture real-world scenarios. Moreover, much of the existing research mainly focuses on river-based flooding and modeling inundation in large watersheds, overlooking the intricate interactions of urban drainage systems in flood mitigation and water evacuation.
Recognizing this research gap, our study endeavors to design an attention-enhanced deep learning network specifically for predicting inundation depths in urban areas, considering the spatiotemporal dynamics of rainfall. This approach enables the processing of high-resolution rainfall data grids, effectively capturing the spatial and temporal variations of intense rainfall, offering faster and more accurate alternatives for simulating and predicting urban floods. This advancement is crucial for the development of rapid warning systems, facilitating timely responses and improved preparedness measures.
METHODS AND MATERIALS
Data collection and mechanism model
Case study
Spatial rainfall distribution
1D/2D coupling mechanistic model
To more accurately simulate the inundation of urban areas under uneven rainfall conditions, a 1D/2D coupling model based on LISFLOOD-FP and the Storm Water Management Model (SWMM) was designed and implemented. It has demonstrated its applicability in simulating both surface and subsurface flows and has been effectively utilized in areas such as flood risk assessment, predicting the impact of climate change on floods, and managing river basins (Wu et al. 2017; Zeng et al. 2022). Building on the proven accuracy of the SWMM-LISFLOOD coupled model, we set up a high-resolution modeling system over a dense urban catchment using cutting-edge geospatial datasets. A total of 809 virtual rain gauges were configured in the SWMM to capture the spatial variability of rainfall, driving a LISFLOOD-FP overland flow model with surcharge manhole data. A high-resolution digital elevation model (DEM) of 3 × 3 m is utilized to represent the complex urban surface terrain. Details on the parameter settings of the coupled model are comprehensively presented in Table S1 (provided in the Supplementary material).
Data processing and configuration
In this study, the model's input features are categorized into static and dynamic types. Specifically, the model incorporates eight static feature parameters, whereas rainfall is treated as a dynamic feature whose number of input channels varies with the number of time intervals.
Static feature images
Topography and pipe network features are utilized as static features, and they provide topography labels for different rainfall scenarios. These parameters encompass eight items, such as elevation, slope, pipe network density, and pipe diameter, which remain constant throughout the forecast period. The reasons for their selection are elaborated in Table 1. These attributes are instrumental in regulating the accumulation and flow direction of surface runoff, subsequently aiding the model's understanding of urban flooding responses under various terrain environments and pipe network topologies, thus enhancing the model's generalization ability. In this study, the spatial analysis tools of ArcGIS software were utilized to process the DEM, extracting raster terrain data such as elevation, slope, aspect, curvature, topographic wetness index (TWI), and flow accumulation. In addition, the total length and average diameter of the pipes within each raster were calculated by analyzing the drainage network's Shapefile data, thereby obtaining the drainage network's density and diameter. The resolution of all data was 3 m.
Input feature | Normalized range | Reason |
---|---|---|
Elevation | [0,1] | Ground elevation determines water flow direction and speed. Water flows from higher to lower elevations as elevation changes. |
Slope | [0,1] | Slope determines the speed of water flow. Steep slopes may cause water to flow faster, while gentle slopes may slow down water flow. |
Aspect | [−1,1] | Aspect impacts solar radiation intensity, affecting evaporation and soil moisture. It also determines water flow direction. |
Curvature | [−1,1] | Terrain curvature concentrates or disperses water flow. Concave surfaces concentrate flow. Convex surfaces disperse flow. |
TWI | [0,1] | TWI indicates soil moisture. It considers local slopes and upstream areas. A high TWI means water pooling is likely. |
Flow accumulation | [0,1] | This feature indicates the cumulative amount of upstream flow at a given point and can help determine which areas are likely to suffer greater flooding impacts. |
Link density | [0,1] | In urban areas, a high density of pipe networks indicates better drainage capacity, while a low density may indicate a risk of poor drainage. |
Link diameter | [0,1] | Pipe diameter limits urban floodwater drainage. Smaller pipes overflow faster after heavy rainfall. |
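The normalization ranges listed in Table 1 can be illustrated with a simple min-max rescaling. The sketch below is an assumption: the text does not state the exact scaling formula, the function name `min_max_scale` is hypothetical, and the sample values are illustrative rather than taken from the study area.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale a list of numbers to the interval [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant raster carries no contrast; map every cell to the lower bound.
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


# Illustrative values only (not from the study area).
elevation = [12.0, 15.0, 18.0]   # metres, target range [0, 1]
curvature = [-0.4, 0.0, 0.4]     # signed feature, target range [-1, 1]

elevation_scaled = min_max_scale(elevation)
curvature_scaled = min_max_scale(curvature, -1.0, 1.0)
print(elevation_scaled)
print(curvature_scaled)
```

Signed features such as aspect and curvature keep their sign under the [−1,1] range only when the raw values are roughly symmetric about zero, which is one reason the exact scaling used in the study may differ from this sketch.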
Dynamic feature images
High-resolution grid rainfall data serves as a dynamic feature input and undergoes detailed temporal discretization, segmented into several time intervals. The cumulative rainfall amount for each interval is extracted and utilized as an input feature and is normalized to the range of (0,1). This approach enables the accurate capture of the spatiotemporal variation characteristics of complex rainfall processes, thereby facilitating a more precise simulation of the subsequent hydrological process.
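The temporal discretization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `accumulate_intervals` and `scale_by_max`, the 2 × 2 grids, and the scaling by the global maximum are all hypothetical, since the text does not specify the interval length or the normalization formula.

```python
def accumulate_intervals(frames, steps_per_interval):
    """Sum consecutive rainfall frames into one cumulative grid per interval.

    frames: list of 2D grids (lists of rows), one per time step, in mm.
    Returns one 2D grid per interval; each grid would be one input channel.
    """
    if len(frames) % steps_per_interval != 0:
        raise ValueError("number of frames must be a multiple of steps_per_interval")
    channels = []
    for start in range(0, len(frames), steps_per_interval):
        block = frames[start:start + steps_per_interval]
        rows, cols = len(block[0]), len(block[0][0])
        total = [[sum(f[r][c] for f in block) for c in range(cols)]
                 for r in range(rows)]
        channels.append(total)
    return channels


def scale_by_max(channels):
    """Divide every cell by the global maximum so all values fall within 0 to 1."""
    peak = max(v for grid in channels for row in grid for v in row)
    if peak == 0:
        return channels
    return [[[v / peak for v in row] for row in grid] for grid in channels]


# Four illustrative time steps on a 2 x 2 grid, grouped into two intervals.
frames = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[1.0, 2.0], [0.0, 1.0]],
    [[4.0, 0.0], [0.0, 0.0]],
    [[4.0, 2.0], [1.0, 0.0]],
]
channels = accumulate_intervals(frames, steps_per_interval=2)
scaled = scale_by_max(channels)
print(channels)
print(scaled)
```

Because each interval becomes its own channel, the spatial pattern of the storm is preserved within a channel while its temporal evolution is encoded across channels.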
Surrogate model
Deep learning model design
U-Net, an augmented model based on a fully convolutional network, has been widely used in image segmentation (Ronneberger et al. 2015; Zhou et al. 2018). Due to the adoption of the encode–decode architecture, the U-Net can efficiently capture contextual information in images while preserving detailed information, ensuring higher segmentation accuracy (Huang et al. 2020; Siddique et al. 2021). Despite U-Net's promising performance in previous flood forecasting research, doubts persist regarding its ability to effectively leverage deep features for complex data.
CBAM module: To adeptly capture the intricate spatial and temporal variations in rainfall, we introduce the CBAM module, as shown in Figure 5(b). This mechanism automatically identifies and adjusts the importance of features across spatial dimensions and channels, effectively focusing on crucial information while reducing irrelevant noise (Woo et al. 2018). In flood prediction, CBAM will be applied to emphasize important spatial features (e.g. topography, land cover) and channel features (e.g. multi-temporal meteorological data). By integrating CBAM in a convolutional network, the model is able to more accurately identify and weigh key factors that influence flood occurrence, such as extreme rainfall patterns and watershed response, thus improving prediction accuracy and model generalization.
Res Block: In flood prediction modeling, the incorporation of residual structures addresses the challenge of handling numerous input features. These structures, as shown in Figure 5(c), bolster the model's proficiency in discerning complex nonlinear relationships while averting issues like vanishing gradients and network degradation (He et al. 2016).
As depicted in Figure 5(a), CRU-Net inherits the encoding–decoding architecture of U-Net, comprising an input layer, downsampling layers, transition layers, upsampling layers, and an output layer. In the downsampling phase, we substituted the generic convolutional layers in U-Net with our proposed Res Blocks (Figure 5(c)). With each downsampling operation, the image dimensions were halved while the channel numbers were doubled, and vice versa during upsampling. We have integrated the CBAM module following each Res Block (Figure 5(b)). This integration allows the model to effectively discern the significance of specific channels and spatial locations within the network, thereby improving its overall ability to capture and process salient information. Finally, a 3 × 3 convolutional layer transformed the eight-channel output of the last residual unit into a flood depth map.
The CRU-Net integrates two Res Blocks. The encoder utilizes 7 × 7 convolution kernels to enhance the receptive field and extract advanced features. Conversely, the decoder employs 3 × 3 kernels for an optimal blend of computational efficiency and performance. Each block contains dual convolution layers, succeeded by Batch Normalization and Leaky ReLU activation, with a skip connection enabling identity mapping by joining the input to the post-operation output. The CBAM module uses channel attention to process global input feature context via average and max pooling. This substitutes a dense layer with dual 1 × 1 convolutions and aggregates features for channel weighting, obtaining attention weights through sigmoid activation. Spatially, global average and max pooling across channels merge, followed by a 7 × 7 convolution to determine spatial attention weights, which a sigmoid activation refines into the spatial attention map.
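The channel and spatial attention steps described above can be sketched as a forward pass in NumPy. This is not the authors' implementation: the weights are random rather than trained, the reduction ratio of 4 and the ReLU between the two 1 × 1 convolutions are assumptions taken from the common CBAM formulation, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W). w1: (C//r, C) and w2: (C, C//r) act as the two 1x1 convolutions."""
    avg = x.mean(axis=(1, 2))   # global average pooling, shape (C,)
    mx = x.max(axis=(1, 2))     # global max pooling, shape (C,)

    def shared_mlp(v):
        # Assumed ReLU between the two 1x1 convolutions.
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(shared_mlp(avg) + shared_mlp(mx))   # one weight per channel
    return x * weights[:, None, None]


def spatial_attention(x, kernel):
    """kernel: (2, k, k), applied to the stacked channel-wise mean and max maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(padded, kernel.shape[-2:], axis=(1, 2))  # (2, H, W, k, k)
    response = np.einsum("chwij,cij->hw", windows, kernel)
    return x * sigmoid(response)[None, :, :]              # one weight per pixel


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


rng = np.random.default_rng(0)
channels, reduction = 8, 4                      # hypothetical sizes
x = rng.standard_normal((channels, 16, 16))
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))
kernel = rng.standard_normal((2, 7, 7))         # 7 x 7 spatial kernel, as in the text
out = cbam(x, w1, w2, kernel)
print(out.shape)
```

Both attention maps pass through a sigmoid, so every feature value is scaled by a factor between 0 and 1; the module can only re-weight features, never amplify them.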
Model input
To optimize the application of our model to watersheds of varying sizes and rainfall patterns, within the confines of hardware memory limitations, we implemented a patch-based processing workflow. This involved dividing the watershed into 1,024 × 1,024 pixel blocks, processed individually to manage the computational load effectively. The partitioning was conducted with a fixed step size of 512 pixels, ensuring uniformity in model inputs and preventing variations in performance (Guo et al. 2022). Ultimately, following the aforementioned processing, the training set, validation set, and test set comprise 624, 160, and 160 sets of input data, respectively.
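The sliding-window partitioning can be sketched as below. The function name `patch_origins` and the raster size in the example are hypothetical, and the handling of raster edges that do not align with the stride is not described in the text, so this sketch simply keeps full patches only.

```python
def patch_origins(height, width, patch=1024, stride=512):
    """Return the top-left (row, col) of every full patch inside the raster."""
    if height < patch or width < patch:
        raise ValueError("raster is smaller than one patch")
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


# Illustrative raster of 2,048 x 3,072 pixels (not the study area).
origins = patch_origins(2048, 3072)
print(len(origins))
print(origins[:3])
```

With a stride of half the patch size, neighbouring patches overlap by 50%, so most interior pixels appear in several patches.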
Training strategies and evaluation index
In the training phase, the proposed prediction model aims to minimize the deviation between the model output map and the flood depth map generated by the mechanistic model. Throughout training, we adhere to the parameters listed in Table S2 (provided in the Supplementary material). The learning rate (LR) follows an adaptive strategy, starting at 10⁻³ and allowed to decrease to 10⁻⁵: if the validation set loss fails to improve after three epochs, the rate is reduced by 50%. To mitigate overfitting, early stopping is implemented. Training ceases if the validation set loss fails to improve over 10 successive epochs, safeguarding the model's generalizability. The upper limit for training epochs is 150. The evolution of the loss curve throughout training is shown in Figure S1 (provided in the Supplementary material).
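The learning-rate and early-stopping rules can be replayed on a sequence of validation losses, as in the plain-Python sketch below. The function name `run_schedule` and the loss sequence are hypothetical, and the exact counting convention (for instance, whether the cut happens on the third or fourth non-improving epoch) is an assumption, since deep learning frameworks differ on this point.

```python
def run_schedule(val_losses, lr=1e-3, min_lr=1e-5, factor=0.5,
                 lr_patience=3, stop_patience=10, max_epochs=150):
    """Replay validation losses; return (LR used after each epoch, stopping epoch)."""
    best = float("inf")
    since_best = 0        # epochs since the best validation loss
    since_lr_change = 0   # non-improving epochs since the last LR cut
    history = []
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr_change = 0
        else:
            since_best += 1
            since_lr_change += 1
            if since_lr_change >= lr_patience:
                lr = max(lr * factor, min_lr)   # halve, but never below the floor
                since_lr_change = 0
        history.append(lr)
        if since_best >= stop_patience:
            return history, epoch               # early stopping
    return history, len(history)


# Illustrative losses: two improving epochs, then a plateau.
losses = [1.0, 0.8] + [0.9] * 20
history, stop_epoch = run_schedule(losses)
print(stop_epoch)
print(history)
```

In this replay the rate is halved three times during the plateau before early stopping ends training, well short of the 150-epoch cap.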
To conduct an exhaustive performance assessment of the deep learning models, both regression and classification metrics were utilized herein to contrast the predicted inundation depths between hydrodynamic and data-driven approaches. The specific indicators are elaborated in Table S3 (provided in the Supplementary material). In alignment with preceding studies, water depths below 0.05 m were discarded before analysis to preclude their effects. In computing the critical success index (CSI), thresholds of 0.05 and 0.3 m were adopted to enable accurate appraisal of prediction accuracy across different inundation levels.
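Since Table S3 is not reproduced here, the sketch below uses the standard textbook definitions of RMSE, NSE, and CSI; treating these as the definitions used in the study is an assumption, and the sample depths are illustrative only.

```python
import math


def rmse(obs, pred):
    """Root mean square error between reference and predicted depths."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over reference variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def csi(obs, pred, threshold):
    """Critical success index: hits / (hits + misses + false alarms)."""
    hits = sum(o >= threshold and p >= threshold for o, p in zip(obs, pred))
    misses = sum(o >= threshold and p < threshold for o, p in zip(obs, pred))
    false_alarms = sum(o < threshold and p >= threshold for o, p in zip(obs, pred))
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


# Illustrative depths in metres (reference from the mechanistic model, then prediction).
obs = [0.0, 0.1, 0.4, 0.6]
pred = [0.0, 0.2, 0.4, 0.2]

print(rmse(obs, pred))
print(nse(obs, pred))
print(csi(obs, pred, 0.05), csi(obs, pred, 0.3))
```

In this toy case the prediction identifies every wet cell at the 0.05 m threshold but misses one of the two cells deeper than 0.3 m, which is why reporting CSI at both thresholds is informative.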
RESULTS
Model validation and evaluation
The ResU-Net and U-Net models are two widely used deep learning models for flood depth prediction. To assess the effectiveness of the proposed CRU-Net in handling complex data, the ResU-Net and U-Net models were also used to predict the flood depth using the same parameter settings as CRU-Net. The prediction results of the ResU-Net and U-Net models were compared with those of the developed CRU-Net, and their performances were evaluated on the test set. Table 2 gives details of the performance of the three models in the inundation simulation task at a 10 min step size. Overall, CRU-Net outperforms the other two models on most evaluation metrics. In terms of regression metrics, CRU-Net achieves lower values for root mean square error (RMSE: 0.054) and mean absolute error (MAE: 0.011) than ResU-Net (0.068 and 0.017) and U-Net (0.095 and 0.020), indicating its superior accuracy in capturing varying extents of inundation. In terms of classification metrics, while ResU-Net exhibits slightly higher precision (0.890 vs. 0.871), CRU-Net achieves the highest recall (0.942, compared with 0.936 for ResU-Net and 0.901 for U-Net), highlighting its advantage in distinguishing different degrees of inundation. Moreover, for comprehensive F1 scores, CRU-Net and ResU-Net perform comparably at 0.905 and 0.909, respectively.
Model | RMSE | MAE | NSE | Precision (0.05 m) | CSI (0.05 m) | CSI (0.3 m) | Recall (0.05 m) | F1 score (0.05 m) | Parameters |
---|---|---|---|---|---|---|---|---|---|
CRU-Net | 0.054 | 0.011 | 0.975 | 0.871 | 0.827 | 0.903 | 0.942 | 0.905 | 2.87 × 10⁷ |
ResU-Net | 0.068 | 0.017 | 0.961 | 0.890 | 0.824 | 0.881 | 0.936 | 0.909 | 2.85 × 10⁷ |
U-Net | 0.095 | 0.020 | 0.923 | 0.835 | 0.765 | 0.822 | 0.901 | 0.867 | 1.25 × 10⁷ |
The outstanding performance of CRU-Net can be primarily attributed to its custom-designed attention mechanism. By enabling automatic focus on salient regions, the model can better learn the characteristics of various inundation levels. This provides an explanation for CRU-Net's notable advantage in recall. Overall, the integrated architecture of CRU-Net, incorporating convolutional modules, attention mechanisms, and multi-scale features, allows for a stronger emphasis on prominent flooded areas. This, in turn, enhances the model's suitability for the complex inundation simulation task and validates the effectiveness of this architectural design.
Flood depth prediction
To conduct a more detailed analysis of the model's performance, the data were divided into specific flood risk categories, including slight risk (0–0.1 m), low risk (0.1–0.3 m), medium risk (0.3–1.0 m), and extreme risk (>1.0 m). This hierarchical classification provides a framework for evaluating the model's proficiency at a finer level of resolution. Figure 8(b) illustrates the probabilistic heatmaps generated from the model's predictions across the different risk categories. Notably, the occurrences along the diagonal of the heatmap are significantly more prevalent compared to the rest of the chart. This visual evidence supports the high accuracy of the deep learning model in classifying the water depth data. In other words, the model assigns the majority of water depth values to their appropriate risk categories. The results reveal that, although the probability exceeds 80% for the medium and extreme risk classes, there are instances of misclassification. These misclassifications are predominantly concentrated at the lower boundaries within each category. We propose that this phenomenon may arise from inherent tendencies of convolutional networks to overly smooth outputs. When a large number of shallow depth values are present in the simulation, excessive smoothing can lead to underestimations for certain data points. This finding is consistent with the analysis from the scatterplot.
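The risk-category heatmap can be thought of as a row-normalized confusion matrix over the four depth classes. The sketch below is illustrative: the depths are invented, the function names are hypothetical, and assigning a boundary value such as exactly 0.3 m to the lower class is an assumption the text does not settle.

```python
from bisect import bisect_left

BOUNDS = [0.1, 0.3, 1.0]                          # class boundaries in metres
LABELS = ["slight", "low", "medium", "extreme"]


def risk_class(depth):
    """Index of the risk class; a depth equal to a boundary falls in the lower class."""
    return bisect_left(BOUNDS, depth)


def row_normalised_matrix(reference, predicted):
    """Rows: reference class. Columns: predicted class. Each non-empty row sums to 1."""
    n = len(LABELS)
    counts = [[0] * n for _ in range(n)]
    for ref, pred in zip(reference, predicted):
        counts[risk_class(ref)][risk_class(pred)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Illustrative depths in metres; two cells near a lower class boundary are underestimated.
reference = [0.05, 0.2, 0.35, 0.5, 1.2, 1.5]
predicted = [0.04, 0.25, 0.28, 0.6, 1.1, 0.9]

matrix = row_normalised_matrix(reference, predicted)
for label, row in zip(LABELS, matrix):
    print(label, row)
```

The off-diagonal entries in this toy example sit immediately left of the diagonal, mirroring the underestimation near lower class boundaries discussed above.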
Model interpretability analysis
Features importance analysis
To comprehensively investigate flood inundation depth prediction and understand the role of individual features, this study employs a commonly used deep learning interpretability analysis technique called replacement feature importance evaluation. This technique involves randomly perturbing one feature while keeping others constant and examining the resulting changes in model accuracy. By quantifying the changes in error, we can approximate the relative contribution of each feature to the model's output.
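The perturbation procedure can be sketched with a toy model in plain Python. Everything here is a stand-in: the linear `toy_model`, the tabular inputs, and the function names are hypothetical, whereas in the study the perturbed features are raster channels fed to CRU-Net.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                     # perturb feature j only
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [model(r) for r in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def toy_model(row):
    """Stand-in predictor that depends on feature 0 and ignores feature 1."""
    return 2.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

A feature the model ignores produces no change in error when shuffled, while a feature the model relies on produces a clear increase; ranking features by this increase gives the relative importance.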
Interpretation of model visualization based on Grad-CAM
The preceding analyses support the remarkable inundation depth prediction capabilities of CRU-Net. However, the complex nature of deep neural networks with a vast number of parameters makes it challenging to provide a clear theoretical explanation of their intricate internal mechanisms. As the adoption of deep learning continues to expand across various fields, the need for model interpretability has become increasingly important. We utilized Gradient-weighted Class Activation Mapping (Grad-CAM) for this purpose. Grad-CAM, a technique designed for highlighting key regions influencing predictions, allows us to visually decode these areas during the model's inference in classification or regression tasks, enhancing our comprehension of the model's functioning (Selvaraju et al. 2017).
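The core Grad-CAM computation can be sketched in NumPy once the feature maps of a layer and the gradients of a scalar score with respect to them are available. In practice the gradients come from a framework's automatic differentiation; here they are passed in as arrays, and the choice of scalar score for a depth-regression output (for example, the summed predicted depth) is an assumption, as are the function name and the toy arrays.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heat map from one layer.

    activations: (K, H, W) feature maps of the chosen layer.
    gradients:   (K, H, W) gradient of the scalar score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(1, 2))              # one weight per feature map
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over maps, (H, W)
    cam = np.maximum(cam, 0.0)                         # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # rescale to [0, 1] for display


# Toy example: map 0 supports the score, map 1 opposes it.
activations = np.zeros((2, 3, 3))
activations[0, 1, 1] = 2.0
activations[1, 0, 0] = 1.0
gradients = np.stack([np.ones((3, 3)), -np.ones((3, 3))])

heat = grad_cam(activations, gradients)
print(heat)
```

Repeating this computation for layers at different depths yields one heat map per layer, which is how encoder and decoder stages can be compared.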
During the progression from encoding to decoding, the attention of our CNN gradually shifts toward the regions within the waterlogging distribution. In the encoder part, the heat map exhibits a larger coverage area. This suggests that during the encoding process, the model extensively focuses on multiple regions within the input data. This behavior of the encoder can be attributed to its attempt to capture and comprehend a wide range of features and information from the input, which is crucial for subsequent data processing and feature extraction. Conversely, the heat map of the decoder section shows a decreasing range. As the model delves deeper layer by layer, the decoder increasingly concentrates on specific key regions that likely contain vital information required for decoding or regression tasks. This behavior indicates that the decoder tends to disregard less critical information when dealing with high-level, more abstract features. The output heat map of the final layer aligns with our expected results. This implies that during the final decision-making phase, the model focuses on the most relevant parts related to the task at hand. This validation of the model's efficiency and relevance further strengthens its performance. Overall, the Grad-CAM visualization provides valuable insights into the prediction process of CRU-Net, shedding light on how the model attends to different regions of the input data throughout its encoding and decoding stages.
Computational efficiency
Table 3 presents a comparison of the simulation speeds between CRU-Net and LISFLOOD-FP for reproducing a 6-h inundation caused by a specific short-duration design storm. The results demonstrate CRU-Net's computational efficiency: it generates simulations for 3 million grids in about 3 s (2.9 s in Table 3), approximately 440 times faster than the 22 min required by the mechanistic model. Such an acceleration supports real-time forecasting and decision-making during flooding emergencies, enabling officials to take prompt and effective damage control actions.
Model | Grid numbers | Train time | Simulate time | Computing device |
---|---|---|---|---|
LIS-SWMM | 3 × 10⁶ | – | 22 min | AMD Ryzen 7 6800H |
CRU-Net | 3 × 10⁶ | 6 h | 2.9 s | NVIDIA GeForce RTX 3060 |
ResU-Net | 3 × 10⁶ | 6 h | 2.2 s | NVIDIA GeForce RTX 3060 |
U-Net | 3 × 10⁶ | 4 h | 1.9 s | NVIDIA GeForce RTX 3060 |
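The speed-up factor can be checked directly from the times in Table 3. The short calculation below is only a worked example; the variable names are hypothetical, and the comparison is between different devices (CPU for the mechanistic model, GPU for the networks), as listed in the table.

```python
mechanistic_s = 22 * 60      # 22 min for the coupled mechanistic model, in seconds
cru_net_s = 2.9              # CRU-Net inference time in seconds
grid_cells = 3_000_000

speedup_rounded = mechanistic_s / 3.0        # using the rounded 3 s
speedup_exact = mechanistic_s / cru_net_s    # using the tabulated 2.9 s
cells_per_second = grid_cells / cru_net_s

print(speedup_rounded)
print(round(speedup_exact, 1))
print(round(cells_per_second))
```

Rounding the inference time to 3 s gives the factor of about 440, while the tabulated 2.9 s gives a slightly larger factor of about 455.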
DISCUSSION
The impact of rainfall heterogeneity on urban pluvial flooding characteristics, such as the increase in total system inundation volume and the prolongation of flooding duration, is a critical aspect of urban flood modeling. The uneven distribution of intense rainfall can exacerbate flood risks and management challenges (Lin et al. 2022). Our model has shown promising results in dealing with such heterogeneous rainfall inputs, ensuring robust performance even under the complex conditions that uneven precipitation patterns introduce. By effectively managing these variabilities, the CRU-Net model stands out as a valuable tool for urban flood mitigation strategies.
With advanced weather radar technologies providing quantitative precipitation estimates at high spatiotemporal resolution, detailed analysis and forecasting of spatial rainfall become feasible. The CRU-Net proposed in this paper handles high-resolution spatial rainfall patterns with different structural distributions, rather than relying on traditional rainfall hyetographs (Rasheed et al. 2022; Zhang et al. 2023). These observations can supply essential initial and boundary conditions for pluvial flood models, enabling coupled forecasting of weather systems and hydrologic responses. This can improve the lead time and accuracy of urban flood warnings.
In this research, we introduce a model based on CNNs that demonstrates effective end-to-end processing capabilities, enabling the generation of high-resolution flood depth maps across the entire study area. This advancement contrasts with graph neural network approaches (Farahmand et al. 2023; Feng & Mao 2024), which are restricted to predefined monitoring nodes. In terms of performance comparison, our model surpasses those based on generative adversarial networks (do Lago et al. 2023), demonstrating higher accuracy in distinguishing between submerged and non-submerged conditions. Compared to the U-Net model used in previous literature (Guo et al. 2020; Löwe et al. 2021; Seleem et al. 2023), the model designed here realizes spatiotemporal deep learning for waterlogging prediction by extracting temporal information between channels and employing a spatial attention module to focus on critical spatial plots. However, current capabilities remain limited to predicting peak inundation depths without modeling the entire surface flow process.
While the tiling approach employed in this study may compromise basin-scale geomorphological features, it provides standardized input data. By incorporating terrain labels for each patch along with precipitation inputs, the model can learn inundation characteristics across different regions. This has the potential to improve the transferability of the model to arbitrary basins and rainfall conditions as more data is accumulated.
It is worth noting that despite the requirement for extensive pre-training, CRU-Net's efficient inference allows for versatile applicability to diverse precipitation patterns after training. This expedites the generation of flood alerts, enhancing its utility in real-time inundation forecasting. However, it is important to ensure the availability of sufficient training data for robust deep learning predictions, which should be taken into consideration during deployment.
CONCLUSION
Urban flooding prediction is crucial for comprehensive flood risk assessment and the development of effective resource allocation strategies. Deep learning approaches have gained traction in urban emergency flood prediction. However, the spatial structure of rainfall, which has a profound influence on urban flooding, is often overlooked in many deep learning investigations. This study introduces a novel deep learning framework equipped with an attention mechanism to anticipate inundation depths in urban terrains with the spatiotemporal rainfall patterns considered. The proposed CRU-Net model can effectively capture the intricate relationships between rainfall inputs and surface inundation dynamics by learning the inherent spatial features of precipitation. This capability extends to unseen rainfall scenarios, allowing for generalization. With an RMSE of 0.054 m and a Nash–Sutcliffe efficiency (NSE) of 0.975, the CRU-Net model demonstrates high accuracy in flood prediction. Specifically, the model achieves over 80% accuracy in predicting inundation locations with depths greater than 0.3 m. The CRU-Net model, incorporating attention mechanisms, not only accurately predicts flood evolution but also prioritizes high-risk areas such as underpasses and low-lying terrains, enhancing the model's effectiveness. Moreover, the CRU-Net can complete predictions for approximately 3 million grid cells in just 2.9 s. The superior efficiency of data-driven approaches compared to conventional physically based models makes them well-suited for real-time flood forecasting and emergency response.
Moving forward, further optimization of deep learning architectures, such as exploring large transformer-based structures and augmenting the diversity of training data by simulating flooding results across different cities, will be pursued. Integration of state-of-the-art rainfall forecasting capabilities is expected to enhance the predictive power and applicability of the models. Overall, this study opens avenues for leveraging deep learning approaches in urban flood modeling and prediction, paving the way for improved flood management and response systems.
ACKNOWLEDGEMENTS
This study was supported by the National Natural Science Foundation of China (Grants 52270095 and 52200119).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.