Abstract
Scientific and effective prediction of urban waterlogging risk can improve a city's capacity to prevent waterlogging disasters. Coupling a numerical simulation model with a data-driven model yields a predictive model that maintains prediction accuracy while improving timeliness. This paper therefore established an urban waterlogging risk predictive model based on the coupling of the BP neural network and the SWMM model. Five input patterns were set, and the accumulative precipitation process together with the precipitation characteristics was finally selected as the input to predict regional waterlogging risks under different urban rainstorm scenarios. The results show that the overall performance of the pipe drainage system in the study area is poor and that it cannot withstand rainstorms with higher return periods. Moreover, the total waterlogging risk of the southern old city is higher than that of the northern new city. The prediction model constructed in this paper runs thousands of times faster than the numerical model, which meets the timeliness requirements of forecasting.
HIGHLIGHTS
Using fine data to build the SWMM model and various methods to verify the SWMM model.
Using web crawler technology to extract ponding points.
Using entropy weight method for risk quantization.
Using set pair analysis to evaluate the accuracy of the BP neural network.
Coupling SWMM model with BP neural network for risk prediction.
INTRODUCTION
Urban waterlogging occurs when, under high-intensity, short-duration rainstorms, the runoff volume exceeds the carrying capacity of the urban drainage system, so that rainwater drainage fails and water accumulates over large areas of the urban surface (Xu 2021). With climate change and accelerating urbanization, urban waterlogging is becoming increasingly serious (Xu et al. 2020; Liu et al. 2021). In the summer of 2021, a torrential rainstorm in Zhengzhou City, China, caused a severe urban waterlogging disaster that completely paralyzed traffic, with 88.534 billion yuan in direct economic losses and 302 casualties. In 2022, continuous heavy rains of a severity not seen in more than 60 years hit eastern South Africa; many cities entered a state of emergency, nearly 4,000 houses were completely destroyed, and more than 40,000 people were displaced. Rapid and accurate prediction of urban waterlogging risk is therefore the key to scientific waterlogging prevention (Berkhahn et al. 2019; Wang et al. 2019).
Scholars have carried out much research in this field, including urban flood forecasting (Yoon & Nakakita 2015; Wu et al. 2020), waterlogging risk warning (Wei et al. 2020; Zhou et al. 2022), and urban waterlogging simulation (Xue et al. 2016; Chen et al. 2021). For urban waterlogging risk prediction, numerical simulation models based on physical mechanisms are commonly used, such as SWMM, MIKE, and InfoWorks ICM. Among them, the SWMM model has been widely used because it is powerful, simple to operate, and open source (Pells & Pells 2016; Behrouz et al. 2020; Dell et al. 2021). However, these numerical models usually involve complex hydrological and hydraulic processes, many parameters to be calibrated, and long run times that cannot meet the timeliness requirements of urban waterlogging emergency response. It is therefore necessary to seek a new method that predicts urban waterlogging risk rapidly and accurately, so as to reduce waterlogging disaster losses (Liu et al. 2022).
In recent years, data-driven models based on artificial intelligence algorithms have been successfully applied in hydrological forecasting (Araghinejad et al. 2021; Feng et al. 2022), rainfall and runoff simulation (Yang & Li 2012), waterlogging risk assessment (Cai et al. 2020; Wu et al. 2021), and so on. Compared with numerical models, data-driven models generally consider only the data input and output; they have excellent nonlinear mapping ability and fast computation with simple steps, which can effectively satisfy the timeliness demands of urban waterlogging forecasting. The back propagation (BP) neural network, a widely used data-driven model, is a network structure that imitates the structure and function of a biological neural network by connecting a large number of processing units (neurons), so it can describe the relations between the hydraulic elements of storm waterlogging well (Gu et al. 2011; Yan et al. 2020). However, data-driven models require data samples of high quality and in large quantity, which are often difficult for practitioners to obtain. This study therefore proposes coupling the numerical model with the data-driven model. The former provides sufficient and accurate data samples and the latter provides fast computation, so that urban waterlogging risk can be predicted accurately and efficiently.
Therefore, this paper first established a SWMM model to analyze the drainage performance and waterlogging risk under different rainfall conditions. Secondly, considering the continuity of the storm disaster-causing process and the rain pattern characteristics, five rainfall input patterns were set up. Then with the coupling of the BP neural network and the SWMM model (referred to as the BP-SWMM coupling model), the urban waterlogging risk prediction model was constructed to predict the waterlogging risk under different rainstorm scenarios.
MATERIALS AND METHODOLOGY
Study area
Data sources
SWMM model
The stormwater management model (SWMM) was developed by the US Environmental Protection Agency in 1971 and has been widely applied by many countries (Leutnant et al. 2019; Pachaly et al. 2021). The model consists of four modules: hydrological module (precipitation, evaporation, runoff), hydraulic module (pipe network, channel, surface ponding), water quality module (pollutants, erosion compound), and low impact development module (LID). In this study, only hydrology and one-dimensional hydrodynamic modules are involved, and three simulation processes are exhibited as follows.
Surface runoff-yield process
Surface confluence process
Pipeline confluence process
The pipeline confluence in the SWMM model can be computed by steady (constant) flow, kinematic wave, or dynamic wave routing. Steady flow routing assumes that the flow in the pipeline is uniform and constant, which is inconsistent with the actual situation. Kinematic wave routing assumes that the slope of the water surface equals the slope of the pipeline; it cannot simulate pressurized flow, backwater, or stagnant water, and it only supports tree-like pipe networks. Dynamic wave routing accounts for the entrance and exit losses of the pipe and can simulate pressurized flow and other complex, changeable flow states in the pipeline. Therefore, dynamic wave routing is selected to calculate the pipeline confluence in this paper.
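For reference, dynamic wave routing in SWMM solves the one-dimensional Saint-Venant equations for each conduit. The form below follows the general presentation in the SWMM documentation rather than anything stated in this paper, and the symbols are introduced here for illustration only: A is the flow cross-sectional area, Q the flow rate, H the hydraulic head, S_f the friction slope, h_L the local energy loss per unit length, g the gravitational acceleration, x the distance along the conduit, and t the time.

```latex
\begin{aligned}
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} &= 0 \\
\frac{\partial Q}{\partial t} + \frac{\partial (Q^{2}/A)}{\partial x}
  + gA\,\frac{\partial H}{\partial x} + gA\,S_f + gA\,h_L &= 0
\end{aligned}
```

The first line is the continuity equation and the second is the momentum equation. The kinematic wave simplification keeps the continuity equation but reduces the momentum equation to the condition that the friction slope equals the conduit slope, which is why it cannot represent backwater or pressurized flow.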
BP neural network
The BP neural network is a classic feed-forward neural network whose learning process consists of two stages: forward propagation of the signal and backpropagation of the error (Svozil et al. 1997). During forward propagation, the samples enter the network through the input layer, are processed by the hidden layer, and leave through the output layer. During backpropagation, the error between the output values and the target values is fed back to the hidden layer, and the weight coefficients of each layer are modified until the sum of squared network errors reaches the stated threshold.





The error between the expected output value R and the actual output value is then computed. If this error does not meet the set convergence value, the network enters the reverse feedback process to modify the weights and thresholds between layers and then conducts the next round of training; if the error satisfies the convergence value, training stops.
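The forward and backward passes described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the configuration used in this paper: it uses a single hidden layer and plain per-sample gradient descent instead of the Levenberg-Marquardt training reported later, and the function name `train_bp`, the toy data, and all hyperparameter values are hypothetical.

```python
import math
import random


def train_bp(samples, targets, hidden=3, lr=0.1, epochs=2000, tol=1e-3, seed=0):
    """Train a 1-hidden-layer network (tanh hidden units, linear output).

    Returns the sum of squared errors recorded for each epoch.
    Assumption: plain gradient descent, not the L-M algorithm of the paper.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, r in zip(samples, targets):
            # Forward propagation of the signal: input -> hidden -> output.
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(w1, b1)]
            y = sum(w * hj for w, hj in zip(w2, h)) + b2
            err = y - r
            sse += err * err
            # Backpropagation of the error: adjust weights and thresholds.
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        history.append(sse)
        if sse < tol:  # stop once the convergence value is met
            break
    return history


# Hypothetical toy data: one input in [0, 1], target is its square.
toy_inputs = [[i / 10] for i in range(11)]
toy_targets = [x[0] ** 2 for x in toy_inputs]
sse_history = train_bp(toy_inputs, toy_targets)
print(f"first epoch SSE = {sse_history[0]:.4f}, last epoch SSE = {sse_history[-1]:.4f}")
```

The loop mirrors the stopping rule in the text: training continues while the squared error exceeds the convergence value and ends as soon as it is satisfied or the epoch limit is reached.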
Entropy weight method

Set pair analysis

Technical route
MODEL BUILDING
SWMM model
Pipe network generalization
Sub-catchments delineation
There are two methods to divide sub-catchments: manual drawing and automatic delineation. Because the study area is large, manual delineation is not suitable, so the Thiessen Polygon Method was applied to delineate catchments with the analysis tools of GIS. The final sub-catchments were obtained by refining the details according to practical requirements (as shown in Figure 6).
Parameters determination
Parameters such as the average slope, area, and percentage of imperviousness of each catchment were acquired by processing DEM and land use data. The characteristic width was calculated from the sub-catchment area A following Chen et al. (2013). The Horton infiltration formula was applied in the permeable area with an initial infiltration rate of 60 mm/h, a minimum infiltration rate of 3 mm/h, and a decay rate constant of 4 h⁻¹. Other parameters were obtained by referring to relevant documents (Li 2016).
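To make the infiltration parameters concrete, the standard Horton curve f(t) = f_c + (f_0 − f_c)·e^(−kt) can be evaluated with the values quoted above. The function name `horton_rate` is a hypothetical helper, and the sketch shows only the textbook formula, not how SWMM applies it internally during a simulation.

```python
import math

F0_MM_H = 60.0    # initial infiltration rate (mm/h), from the text
FC_MM_H = 3.0     # minimum infiltration rate (mm/h), from the text
DECAY_PER_H = 4.0 # decay rate constant (1/h), from the text


def horton_rate(t_hours, f0=F0_MM_H, fc=FC_MM_H, k=DECAY_PER_H):
    """Infiltration capacity (mm/h) after t_hours of continuous wetting."""
    return fc + (f0 - fc) * math.exp(-k * t_hours)


for minutes in (0, 15, 30, 60, 120):
    print(f"t = {minutes:3d} min -> f = {horton_rate(minutes / 60):6.2f} mm/h")
```

With a decay constant of 4 h⁻¹ the capacity falls to within about 1 mm/h of the minimum rate after roughly one hour, so for storms lasting several hours the permeable area infiltrates at close to 3 mm/h for most of the event.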
Create INP file of the SWMM model
The SWMM model is saved as an INP file, which is a plain-text file that can be opened in a text editor such as Notepad. However, editing the INP file in a text editor is inconvenient for modelers who need to input and edit large amounts of data. An Excel spreadsheet handles this better. By combining the built-in tools of the SWMM software with an Excel startup program, the SWMM model data can be exchanged with Excel spreadsheet data.
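Because the INP file is plain text organized into bracketed sections, it can also be read programmatically before being passed to a spreadsheet. The sketch below is one possible approach, not the Excel-based workflow used in this paper; the function name `parse_inp`, the junction name `J1`, and the sample values are hypothetical.

```python
def parse_inp(text):
    """Split SWMM INP text into {section name: list of whitespace-split rows}.

    Lines starting with ';' are comments in the INP format and are skipped.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].upper()
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split())
    return sections


# Hypothetical fragment of an INP file for illustration.
SAMPLE_INP = """
[OPTIONS]
;;Option        Value
FLOW_ROUTING    DYNWAVE
INFILTRATION    HORTON

[JUNCTIONS]
;;Name  Elevation  MaxDepth
J1      98.5       2.0
"""

inp_sections = parse_inp(SAMPLE_INP)
print(inp_sections["OPTIONS"])
print(inp_sections["JUNCTIONS"])
```

Each section becomes a list of rows, which could then be written to CSV with the standard `csv` module and opened in Excel for editing.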
SWMM model validation
Flow process verification
Comparison of observed and simulated outlet discharges of two historical rainstorms.
Overflow points verification
BP neural network model
Data sample set acquisition
In this study, the risk of waterlogging nodes is the main basis for evaluating the risk of urban rainstorm waterlogging. Therefore, for the BP neural network to be constructed, the rainfall and node risk samples are the input set and output set of the model respectively to form the data sample set. The specific formation process of the rainfall-node risk sample set is as follows:
According to statistics, the main disaster-causing rainfall type for urban waterlogging in Zhengzhou City is the short-duration rainstorm. Therefore, 400 short-duration rainstorms with multiple temporal and spatial distributions and various rainfall patterns were screened out as the rainfall sample set. These 400 rainfall events were then imported into the SWMM model in turn, and 400 waterlogging node risk samples were obtained as the output sample set of the BP neural network. The original format of the rainfall sample set was the rainfall process at 10-min intervals. Considering the continuity of the rainstorm disaster-causing process and the main characteristics of the rain pattern (such as total precipitation, average rainfall intensity, peak-to-times ratio, peak precipitation, and peak coefficient), five rainfall input patterns were finally formed (Table 1), giving a total of 1,500 rainfall samples as the input sample set of the BP network. Since the rainfall samples of the five input patterns derived from each original rainfall sample all correspond to the same node risk sample, a total of 1,500 rainfall-node risk samples were acquired as the data sample set of the BP network.
Rainfall input patterns
Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 | Pattern 5 |
---|---|---|---|---|
Accumulative precipitation and precipitation characteristics | Accumulative precipitation | Interval precipitation and precipitation characteristics | Interval precipitation | Precipitation characteristics |
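The five input patterns can be derived from a single 10-min rainfall series, as the following sketch illustrates. The exact definitions of the precipitation characteristics are not given in the text, so the four features computed here (total precipitation, average intensity, peak precipitation, and relative peak position) are assumptions chosen for illustration, as are the function name `build_patterns` and the sample storm.

```python
def build_patterns(interval_mm, step_min=10):
    """Return the five input patterns for one rainfall event.

    interval_mm: rainfall depth (mm) in each time step.
    Feature definitions are illustrative assumptions, not the paper's.
    """
    cumulative = []
    running = 0.0
    for depth in interval_mm:
        running += depth
        cumulative.append(running)

    duration_h = len(interval_mm) * step_min / 60.0
    peak = max(interval_mm)
    features = [
        cumulative[-1],                                    # total precipitation (mm)
        cumulative[-1] / duration_h,                       # average intensity (mm/h)
        peak,                                              # peak precipitation (mm)
        (interval_mm.index(peak) + 1) / len(interval_mm),  # relative peak position
    ]
    return {
        1: cumulative + features,         # accumulative + characteristics
        2: cumulative,                    # accumulative only
        3: list(interval_mm) + features,  # interval + characteristics
        4: list(interval_mm),             # interval only
        5: features,                      # characteristics only
    }


# Hypothetical 1-hour storm sampled every 10 minutes (mm per step).
storm = [1.0, 4.0, 9.0, 3.0, 2.0, 1.0]
patterns = build_patterns(storm)
for number, vector in patterns.items():
    print(number, vector)
```

All five vectors come from the same event and are paired with the same node risk sample, which is why the patterns can be compared on equal terms.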
Network model design
The BP neural network was constructed in MATLAB by calling the newff function. Of the data set, 90% was used as the training set and 10% as the test set. The number of iterations was set at 1,000 and the learning rate at 0.01. The L-M optimization algorithm (trainlm) was chosen as the training function, the tangent sigmoid function (tansig) as the activation function of the hidden layers, and the linear function (purelin) as the activation function of the output layer. After repeated training trials, two hidden layers were adopted, with 3 and 7 neurons, respectively.
RESULTS AND DISCUSSION
Urban waterlogging risk status


Urban drainage performance analysis
As can be seen from Table 2, as the rainfall return period increases, the number of overloaded conduits and junctions gradually increases and the degree of overloading deepens. In a 1-year rainstorm, only a few conduits and junctions are overloaded; in a 3-year rainstorm, the number of overloaded conduits and junctions increases significantly; in a 5-year rainstorm, the overload rate of conduits exceeds 40% and that of junctions exceeds 35%; when the rainfall return period reaches 10 years or more, more than half of the conduits and junctions are overloaded. The overall performance of the drainage system in the study area is therefore poor, and it cannot withstand rainstorms with a high return period.
Overload number and rate of pipes and nodes in different rainfall return periods
Return period | Number of overloaded pipes | Overload rate of pipes (%) | Number of overloaded nodes | Overload rate of nodes (%) |
---|---|---|---|---|
1a | 28 | 8.04 | 25 | 6.87 |
3a | 88 | 25.29 | 82 | 22.53 |
5a | 145 | 41.67 | 130 | 35.71 |
10a | 210 | 60.34 | 183 | 50.27 |
20a | 248 | 71.27 | 228 | 62.64 |
Nodes risk analysis
Regional risk analysis
By extracting ponding points in Zhengzhou and combining them with the simulation results of the SWMM model, eight obvious waterlog-prone areas were identified in Zhongyuan District (Figure 9). In this paper, the natural disaster risk expression (risk = danger + vulnerability) proposed by Maskey (1989) was applied to assess the waterlogging risk in the waterlog-prone areas. Danger is based on natural attributes, including the disaster-causing factors and the hazard-inducing environment. For urban waterlogging, the disaster-causing factor is the urban ponding caused by short-term heavy rainfall, and the hazard-inducing environment comprises the environmental factors that cause waterlogging disasters to occur. Vulnerability is based on social attributes and reflects the degree of damage caused by disasters, including aspects such as the population, the economy, and the urban transportation system. The node risk, areal slope, elevation, and impermeability were chosen as danger indexes, and the population, points of interest, and road network were chosen as vulnerability indexes. After the values of each indicator were normalized, the weight of each index was determined by the entropy weight method, and the risk value of each waterlog-prone area was obtained as the weighted sum. The K-means Clustering Method was then used to classify the waterlogging risk levels of the waterlog-prone areas. The resulting risk value ranges of 0 to 0.1, 0.1 to 0.3, 0.3 to 0.5, 0.5 to 0.7, and 0.7 to 1 correspond to grade I, grade II, grade III, grade IV, and grade V, respectively, where grade I represents the lowest risk and grade V the highest. The risk values of the eight waterlog-prone areas under diverse precipitation scenarios are shown in Table 3.
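The weighting step described above can be sketched as follows. This is a generic entropy weight calculation under stated assumptions: the indicator values are hypothetical, only three of the seven indexes are shown, and treating elevation as a negative indicator (lower elevation means higher risk) is an assumption rather than something stated in the text. The helper names `min_max` and `entropy_weights` are likewise illustrative.

```python
import math


def min_max(column, positive=True):
    """Normalize to [0, 1]; reverse the scale for negative indicators."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    if positive:
        return [(v - lo) / (hi - lo) for v in column]
    return [(hi - v) / (hi - lo) for v in column]


def entropy_weights(columns):
    """Entropy weight of each normalized indicator column."""
    k = 1.0 / math.log(len(columns[0]))
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # a constant column carries no information
            continue
        entropy = 0.0
        for v in col:
            p = v / total
            if p > 0:
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)
    scale = sum(divergences)
    return [d / scale for d in divergences]


# Hypothetical indicator values for four areas (not data from this study).
node_risk = [0.2, 0.5, 0.9, 0.4]
elevation_m = [105.0, 98.0, 92.0, 101.0]
population = [1200, 3400, 5100, 2500]

normalized = [
    min_max(node_risk),
    min_max(elevation_m, positive=False),  # assumed negative indicator
    min_max(population),
]
weights = entropy_weights(normalized)
area_risk = [sum(w * col[i] for w, col in zip(weights, normalized))
             for i in range(len(node_risk))]
print("weights:", [round(w, 3) for w in weights])
print("risk values:", [round(r, 3) for r in area_risk])
```

Indicators whose values differ more strongly between areas have lower entropy and therefore receive larger weights, so the weighted sum emphasizes the indexes that actually discriminate between areas.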
Waterlogging risk value of each waterlog-prone area
Return period | Waterlog-prone area 1 | Waterlog-prone area 2 | Waterlog-prone area 3 | Waterlog-prone area 4 | Waterlog-prone area 5 | Waterlog-prone area 6 | Waterlog-prone area 7 | Waterlog-prone area 8 |
---|---|---|---|---|---|---|---|---|
1a | 0 | 0.021 | 0.052 | 0 | 0.099 | 0.037 | 0.094 | 0.015 |
3a | 0.026 | 0.120 | 0.263 | 0.056 | 0.342 | 0.180 | 0.403 | 0.116 |
5a | 0.177 | 0.294 | 0.463 | 0.175 | 0.540 | 0.320 | 0.627 | 0.305 |
10a | 0.359 | 0.437 | 0.656 | 0.473 | 0.690 | 0.572 | 0.754 | 0.587 |
20a | 0.619 | 0.732 | 0.874 | 0.684 | 0.913 | 0.825 | 0.921 | 0.803 |
As can be seen from Table 3 and Figure 9, the risk value of each waterlog-prone area increases to varying degrees as the rainstorm recurrence interval grows. Most of these waterlog-prone areas are located in commercial areas or at intersections of metropolitan zones in Zhongyuan District, where the terrain is relatively low.
The waterlogging risk in the north of the study area is lower than that in the south. The reason is that the north of Zhongyuan District is a new city with a low degree of development, low impermeability of the underlying surface, and better rainwater storage capacity. As for the pipe network, although the density of drainage conduits in the southern old city is high, the pipe diameters are smaller and most conduits are aging and in poor repair, with insufficient capacity. Consequently, the risk of waterlogging in the southern old city is higher when heavy rains occur.
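The grade boundaries quoted earlier can be applied to the Table 3 values with a simple lookup. The text does not say which grade a value exactly on a boundary belongs to, so assigning it to the lower grade is an assumption here; none of the tabulated values falls on a boundary. The function name `risk_grade` is hypothetical.

```python
from bisect import bisect_left

UPPER_BOUNDS = [0.1, 0.3, 0.5, 0.7]  # upper limits of grades I to IV
GRADES = ["I", "II", "III", "IV", "V"]


def risk_grade(value):
    """Map a risk value in [0, 1] to its grade (boundary goes to lower grade)."""
    return GRADES[bisect_left(UPPER_BOUNDS, value)]


RETURN_PERIODS = ["1a", "3a", "5a", "10a", "20a"]
# Risk values copied from Table 3 for two of the eight areas.
AREA_1 = [0.0, 0.026, 0.177, 0.359, 0.619]
AREA_7 = [0.094, 0.403, 0.627, 0.754, 0.921]

grades_area_1 = [risk_grade(v) for v in AREA_1]
grades_area_7 = [risk_grade(v) for v in AREA_7]
for period, g1, g7 in zip(RETURN_PERIODS, grades_area_1, grades_area_7):
    print(f"{period:>3}: area 1 = grade {g1}, area 7 = grade {g7}")
```

Under these boundaries, area 7 already reaches grade III in a 3-year storm and grade V in a 10-year storm, whereas area 1 reaches grade IV only in the 20-year storm.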
BP-SWMM coupling model construction and verification
Comparison of prediction results of different rainfall input patterns
Results of set pair analysis
Set pair | Degree of identity | Degree of discrepancy | Degree of opposition | Degree of connection |
---|---|---|---|---|
(R1, P) | 0.78 | 0.18 | 0.05 | 0.82 |
(R2, P) | 0.73 | 0.23 | 0.04 | 0.80 |
(R3, P) | 0.70 | 0.26 | 0.04 | 0.79 |
(R4, P) | 0.72 | 0.22 | 0.06 | 0.77 |
(R5, P) | 0.68 | 0.25 | 0.07 | 0.74 |
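In set pair analysis the connection degree is commonly written as μ = a + b·i + c·j, where a, b, and c are the degrees of identity, discrepancy, and opposition, j = −1, and i lies in [−1, 1]. The value of i used in this study is not stated; the tabulated connection degrees are reproduced to within rounding when i = 0.5, so that value is adopted below purely as an inferred assumption. The function name `connection_degree` is hypothetical.

```python
# (identity a, discrepancy b, opposition c, reported connection degree)
SET_PAIRS = {
    "R1": (0.78, 0.18, 0.05, 0.82),
    "R2": (0.73, 0.23, 0.04, 0.80),
    "R3": (0.70, 0.26, 0.04, 0.79),
    "R4": (0.72, 0.22, 0.06, 0.77),
    "R5": (0.68, 0.25, 0.07, 0.74),
}


def connection_degree(a, b, c, i=0.5, j=-1.0):
    """mu = a + b*i + c*j; i = 0.5 is an assumption inferred from the table."""
    return a + b * i + c * j


computed = {name: connection_degree(a, b, c)
            for name, (a, b, c, _) in SET_PAIRS.items()}
for name, (_, _, _, reported) in SET_PAIRS.items():
    print(f"{name}: computed {computed[name]:.3f}, reported {reported:.2f}")

# Relative advantage of pattern 1 over patterns 2 and 4, from reported values.
gain_over_p2 = (0.82 - 0.80) / 0.80 * 100
gain_over_p4 = (0.82 - 0.77) / 0.77 * 100
print(f"pattern 1 vs 2: {gain_over_p2:.1f}%, pattern 1 vs 4: {gain_over_p4:.1f}%")
```

Pattern 1 has both the highest degree of identity and the highest connection degree, which is the basis for selecting it as the model input.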
Node risk predictive results
Figure 12 shows that the predictive value curves of the BP model are in good agreement with the true value curves of the SWMM model, which preliminarily indicates that the prediction model performs well. To further verify the rationality of the prediction model, the mean absolute error (MAE), root mean square error (RMSE), and Nash-Sutcliffe efficiency (NSE) coefficient are introduced for analysis, as shown in Table 5.
Evaluation index values of BP model at each node
Evaluation index | A | B | C | D |
---|---|---|---|---|
NSE | 0.943 | 0.948 | 0.980 | 0.965 |
MAE | 0.040 | 0.038 | 0.024 | 0.025 |
RMSE | 0.059 | 0.054 | 0.032 | 0.039 |
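For readers who want to reproduce such indexes, the standard definitions of MAE, RMSE, NSE, and the mean relative error can be written as below. The sample series are hypothetical and are not the node risk data behind Table 5; skipping zero-valued observations in `mre` is a choice made here to avoid division by zero.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def mre(obs, pred):
    """Mean relative error over non-zero observations."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o != 0]
    return sum(abs(o - p) / abs(o) for o, p in pairs) / len(pairs)


# Hypothetical SWMM (true) and BP (predicted) node risk values.
true_risk = [0.1, 0.4, 0.6, 0.9]
predicted_risk = [0.15, 0.35, 0.65, 0.85]
print(f"MAE  = {mae(true_risk, predicted_risk):.3f}")
print(f"RMSE = {rmse(true_risk, predicted_risk):.3f}")
print(f"NSE  = {nse(true_risk, predicted_risk):.3f}")
print(f"MRE  = {mre(true_risk, predicted_risk):.3f}")
```

Because the relative error divides by the true value, the same absolute error weighs more heavily on low-risk nodes than on high-risk ones.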
In Figure 12, the overall fitting errors between the predictive value curves and the true value curves are smaller for points on the upper part of the curves than for points on the lower part; that is, the predictive accuracy for nodes with higher risk appears better than that for nodes with lower risk. To confirm this conjecture, the mean relative error (MRE) was calculated for the junctions in Figure 12(b), where the difference is less obvious. As shown in Figure 12(b), the predictive results of the test set samples contain 24 mild, 11 moderate, and 5 severe waterlog points, with MRE values of 20.4, 10.1, and 5.2%, respectively. The MRE for severe waterlog points is 4.9 percentage points lower than that for moderate waterlog points and 15.2 percentage points lower than that for mild waterlog points. Therefore, the BP model has the highest accuracy for the prediction of severe waterlog points; that is, it has better applicability for junctions with higher risk levels.
Since there are no nodes in waterlog-prone area 7 other than these four, once the risks of the four nodes are known, the waterlogging risk value of waterlog-prone area 7 can be obtained by the calculation method of regional waterlogging risk described above. The mapping relationship between rainfall and regional waterlogging risk is thus established, which completes the construction of the BP-SWMM coupling model.
Regional risk predictive results
The established BP-SWMM coupling model was used to predict the waterlogging risk of waterlog-prone area 7 under rainfall conditions of five return periods. The predicted risk values of the four nodes are shown in Table 6.
Node risk values of waterlog-prone area 7
Return period | Rainfall time | A | B | C | D |
---|---|---|---|---|---|
1a | 1 h 30 min | 0.118 | 0.082 | 0.106 | 0.046 |
3a | 2 h | 0.360 | 0.261 | 0.337 | 0.242 |
5a | 2 h 30 min | 0.517 | 0.509 | 0.552 | 0.403 |
10a | 3 h | 0.734 | 0.715 | 0.763 | 0.657 |
20a | 3 h 30 min | 0.908 | 0.877 | 0.920 | 0.816 |
Table 6 gives the risk values of the four junctions in waterlog-prone area 7 under the 1-year, 3-year, 5-year, 10-year, and 20-year rainstorm events. Combining them with the other indexes (elevation, impermeability, population, and so on) through the entropy weight method gives regional waterlogging risk values of 0.110, 0.368, 0.601, 0.730, and 0.898, with MRE values of 17, 8.7, 4.1, 3.2, and 2.5%, respectively. The MRE values in these five events are all small, indicating that the coupling model performs well for regional risk prediction. The MRE decreases as the regional risk level increases, which is consistent with the previous conclusion.
In terms of prediction timeliness, the BP-SWMM coupling model took only 0.1 s to calculate the waterlogging risk value of waterlog-prone area 7 for one rainfall event. In contrast, the SWMM model took 172, 195, 217, 245, and 270 s to run the five rainstorm events with return periods of 1, 3, 5, 10, and 20 years, respectively, and exporting and processing the results takes additional time. The coupling model is therefore thousands of times faster than the numerical model. If the research range expands and the number of pipes and nodes increases, the computation time of the numerical model will increase further, and its efficiency cannot satisfy real-time simulation demands (Gui et al. 2021). The method suggested in this paper combines high simulation accuracy with fast running speed, which meets the needs of urban waterlogging emergency work.
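The reported relative errors can be checked directly against the SWMM-based risk values of waterlog-prone area 7 in Table 3. The short calculation below uses only numbers that appear in the text and tables; the variable names are illustrative.

```python
RETURN_PERIODS = ["1a", "3a", "5a", "10a", "20a"]
SWMM_RISK = [0.094, 0.403, 0.627, 0.754, 0.921]     # area 7, Table 3
COUPLED_RISK = [0.110, 0.368, 0.601, 0.730, 0.898]  # BP-SWMM predictions

relative_error_pct = [
    round(abs(pred - true) / true * 100, 1)
    for pred, true in zip(COUPLED_RISK, SWMM_RISK)
]
for period, err in zip(RETURN_PERIODS, relative_error_pct):
    print(f"{period:>3}: relative error = {err}%")
```

The results match the quoted 17, 8.7, 4.1, 3.2, and 2.5%. The absolute differences stay between about 0.016 and 0.035 across all five events, so the falling relative error mainly reflects the growing denominator as the risk value increases.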
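The size of the speed-up follows from the quoted run times. The sketch below divides each SWMM run time by the 0.1 s taken by the coupling model, pairing the five run times with the five return periods used in the tables; the SWMM figures exclude the extra time needed to export and process results.

```python
COUPLED_MODEL_S = 0.1
SWMM_RUNTIME_S = {"1a": 172, "3a": 195, "5a": 217, "10a": 245, "20a": 270}

speedup = {period: round(seconds / COUPLED_MODEL_S)
           for period, seconds in SWMM_RUNTIME_S.items()}
for period, factor in speedup.items():
    print(f"{period:>3}: about {factor} times faster")
```

The ratio ranges from roughly 1,700 to 2,700, which is the basis for describing the coupling model as thousands of times faster.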
CONCLUSION
In this paper, a SWMM model and a BP-SWMM coupling model were established for waterlogging risk analysis and prediction in the study area. The main conclusions are as follows:
- (1) The drainage system in the study area has insufficient capacity to withstand rainstorms with high recurrence intervals. As the rainfall return period increases, the risk level of each waterlog-prone area shifts from low to high, and the overall waterlogging risk of the southern old city is higher than that of the northern new city.
- (2) The BP-SWMM coupling model performs best when pattern 1 (accumulative precipitation and precipitation characteristics) is taken as its input: the degree of connection in the set pair analysis (0.82) is 2.5% higher than that of pattern 2 (accumulative precipitation, 0.80) and 6.5% higher than that of pattern 4 (interval precipitation, 0.77).
- (3) For urban waterlogging prediction, the coupling model not only satisfies the prediction accuracy (its NSE coefficient is above 0.9) but also improves the prediction timeliness (its calculation speed is thousands of times higher than that of the numerical model), which can effectively support urban waterlogging emergency work.
ACKNOWLEDGEMENTS
This research is supported by the National Key R&D Program of China (Grant No. 2021YFC3200205), the National Natural Sciences Foundation of China (Grant No. 52379028), and the Natural Sciences Foundation of Henan Province (Grant No. 212300410404).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.