Research on nowcasting prediction technology for flooding scenarios based on data-driven and real-time monitoring

With the impact of global climate change and urbanization, the risk of urban flooding has increased rapidly, especially in developing countries. Real-time monitoring and prediction of flooding extent and drainage system conditions are the foundation of effective urban flood emergency management. Therefore, this paper presents a rapid nowcasting prediction method for urban flooding based on data-driven modelling and real-time monitoring. The proposed method first uses a small number of monitoring points to deduce the urban global real-time water level with a machine learning algorithm. Then, a data-driven method is developed to achieve dynamic urban flooding nowcasting prediction from the real-time monitoring data and high-accuracy precipitation prediction. The results show that the average MAE and RMSE of the water level deduction are 0.101 and 0.144 for urban flooding and 0.124 and 0.162 for the conduit system, respectively, and probabilistic statistical analysis shows that the flooding depth deduction is more stable than that of the conduit system. Moreover, the urban flooding nowcasting method can accurately predict the flooding depth, with R² values as high as 0.973 and 0.962 in testing. The urban flooding nowcasting prediction method provides technical support for emergency flood risk management.


Introduction
With the impact of global climate change, extreme weather events are becoming more and more frequent (Güneralp et al. 2015; Li and Willems 2020). Meanwhile, the increase in urban impervious surface area due to rapid global urbanization will lead to greater flood risk, especially in developing countries (Ding et al. 2022; Wang et al. 2022). Rapidly predicting urban flooding is critical for giving decision-makers enough time to take action and thus minimize damage (Hou et al. 2021; Yan et al. 2021).
To better predict flooding scenarios, hydraulic models based on physical methods are used extensively for simulating urban floods (Chang et al. 2021). The most commonly used software packages, such as InfoWorks ICM (Innovyze 2019), LISFLOOD-FP (Bates and De Roo 2000) and HEC-RAS (de Arruda Gomes et al. 2021), are based on the shallow water equations (SWEs) and their simplified forms (Guidolin et al. 2016). However, solving the SWEs at high spatial resolution is very complex and incurs significant computational costs (Zhao et al. 2020; Buttinger-Kreuzhuber et al. 2022). Therefore, different prediction methods have been proposed for rapid and accurate urban flood prediction.
Machine learning methods have emerged in recent years as surrogate models for urban flooding prediction (Mosavi et al. 2018). They provide an efficient and accurate prediction approach that does not need to consider complex physical processes such as nonlinear fluid motion (Kabir et al. 2020; Chu et al. 2020). Recently, these methods have been used extensively in urban flood prediction and risk assessment (Youssef et al. 2022; Zahura and Goodall 2022; Madayala et al. 2022). However, these models are usually based on hydraulic model data or historical data and can only provide static predictions of urban flooding conditions (Wu et al. 2020; Zhou et al. 2021, 2022). Understanding the dynamics of flooding processes and the associated variable effects on flood zone objects (e.g., buildings) can provide valuable information for better management of flood disasters (Gao et al. 2021).
Moreover, the data-driven method also has shortcomings in dynamic prediction of urban flooding. Changes in actual conditions will impact the forecast accuracy and make it difficult to develop a dynamic nowcasting prediction (Mancini et al. 2022). Therefore, to develop a highly accurate data-driven model, effective model calibration based on real-time monitored data is essential (Fattoruso et al. 2015). Nevertheless, deploying monitoring sensors requires a lot of human and material resources, and it is not practical to install a large number of them (Banik et al. 2015; Yazdi 2018). Thus, it is important to realize dynamic prediction of urban flooding based on real-time monitoring with a small number of monitoring sensors.
The present work aims to develop a method based on data-driven modelling and real-time monitoring for rapid nowcasting prediction of urban flooding. The main components are as follows: (i) determining the location and number of monitoring sensors, and developing a deduction method for the water level of the whole drainage system based on the monitoring sensors; (ii) proposing a method based on a data-driven model and real-time monitoring for rapid nowcasting prediction of urban flooding.

Methodology
An urban flooding nowcasting prediction method was developed to enhance the dynamic nowcasting of urban flooding based on a data-driven model and real-time monitoring data. As shown in Fig. 1, the proposed method consists of three modules: data sources (establishing the dataset with a hydraulic model), real-time monitoring (selecting monitoring sites and deducing the urban global water level through a machine learning (ML) algorithm) and rapid urban flooding nowcasting through a data-driven method.
Figure 1 Flowchart for construction of the urban flooding prediction with real-time data (UFP-RD) method

Data sources
A hydraulic model was constructed using InfoWorks ICM software to simulate the urban drainage system and flooding scenarios, including the conduit drainage model and the urban flood model. The river pump gate operation data were obtained from the management department, and the model was calibrated with these data and actual measurement data from the sensors. In several rainfall events used for validation, the R² of the hydraulic model at each flow meter reached 0.92-0.98. Then, 120 different rainfall events were simulated and used for training the UFP-RD model.

Real-time monitoring
Optimizing the arrangement of monitoring sites is an important part of real-time monitoring; its essence is to obtain as much information about the urban drainage system as possible from a limited number of monitoring sites. Thus, we adopt the Principal Component Analysis (PCA) method to reduce the data dimensions for monitoring site optimization. The PCA results on the training data set are first ranked by eigenvalue, and for each principal component the point with the maximum loading is taken as a candidate monitoring site. Then, the top-ranked monitoring sites are selected according to the number of required monitoring points.
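The selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `select_monitoring_sites`, the use of standardized water levels (correlation-matrix PCA), and the rule for skipping a node that has already been chosen by a higher-ranked component are all assumptions introduced here.

```python
import numpy as np

def select_monitoring_sites(levels, n_sites):
    """Pick monitoring nodes from simulated water levels.

    levels : array of shape (n_samples, n_nodes), one column per node.
    Returns a list of node indices, ordered by component rank.
    """
    # Standardize each node (assumption: PCA on the correlation matrix).
    std = levels.std(axis=0)
    std[std == 0] = 1.0
    z = (levels - levels.mean(axis=0)) / std
    corr = np.cov(z, rowvar=False)

    # eigh returns eigenvalues in ascending order; rank them descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]

    sites = []
    for k in order:
        loadings = np.abs(eigvecs[:, k])
        # Take the node with the largest absolute loading on this component.
        # Assumption: if it was already selected, fall back to the next one.
        for node in np.argsort(loadings)[::-1]:
            if int(node) not in sites:
                sites.append(int(node))
                break
        if len(sites) == n_sites:
            break
    return sites
```

With this sketch, `select_monitoring_sites(levels, 6)` would return six node indices, one per leading principal component, mirroring the ranking by eigenvalue described in the text.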
We adopt ML techniques to quantify the cross-scale correlation between local and global water levels. XGBoost is selected as the ML algorithm because it is fast, efficient, accurate and fault-tolerant compared with other ML methods (Chen and Guestrin 2016). The inputs of the XGBoost model are the water depths at the monitoring sites and the simulated precipitation events from the training set, and the output is the urban flooding water depth at all points other than the monitoring sites. The specific theories and calculation process are shown in the supplementary materials.
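To make the input and output structure of the local-global mapping concrete, the sketch below fits a multi-output regression from monitored depths and rainfall to the depths at all remaining nodes. Note that it deliberately swaps in an ordinary linear least-squares fit as a stand-in for XGBoost, so that the example depends only on NumPy; the function names `fit_local_global` and `deduce_global` and the array shapes are assumptions made for illustration.

```python
import numpy as np

def _features(monitor_depths, rainfall):
    # Columns: depth at each monitoring site, rainfall, constant bias term.
    return np.column_stack([monitor_depths, rainfall, np.ones(len(rainfall))])

def fit_local_global(monitor_depths, rainfall, other_depths):
    """Fit a linear stand-in for the local-global deduction model.

    monitor_depths : (n_samples, n_monitor) depths at monitoring sites
    rainfall       : (n_samples,) precipitation for the same time steps
    other_depths   : (n_samples, n_other) depths at all non-monitored nodes
    """
    X = _features(monitor_depths, rainfall)
    # Least-squares solution, one coefficient column per non-monitored node.
    coef, *_ = np.linalg.lstsq(X, other_depths, rcond=None)
    return coef

def deduce_global(coef, monitor_depths, rainfall):
    """Deduce depths at non-monitored nodes from monitored depths."""
    return _features(monitor_depths, rainfall) @ coef
```

A tree-based regressor such as the one used in this study would replace the least-squares step while keeping the same feature matrix and target layout.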

Urban flooding risk prediction
A data-driven method is used to predict the water level at the monitoring sites for developing the urban flooding prediction model. In this study, the XGBoost model is adopted to develop a time-series model of water level at the monitoring sites, where the inputs of the model are accurate precipitation predictions and the monitored water level data for the corresponding time (Fang et al. 2021). Then, the predicted water level at the monitoring sites is used as the input of the local-global deduction (LGD) model (section 2.2) to obtain the urban flooding prediction results.

For a given manhole, the terms of the risk expression are the flood risk of the manhole in this rainfall event, the predicted water level at the most unfavorable moment, the elevation of the manhole bottom, and the elevation of the ground level at the manhole. For the urban ground segment, the risk expression uses the flood risk of the ground point in this rainfall event, the predicted water level at the most unfavorable moment at that point, and the elevation of the ground level at that point.

The evaluation indicators are computed from the measured values of the FMPs, the predicted values of the FMPs, the corresponding average value, and the number of samples. For MAXE, MAE and RMSE, values closer to 0 indicate better model results. The R² value is between 0 and 1, and the closer it is to 1, the better the fit (Chu et al. 2020).
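As an illustration of the four evaluation indicators, the sketch below uses their common textbook definitions. These definitions are an assumption here, since the exact expressions are not reproduced in this text; in particular, R² is computed as the coefficient of determination, which can fall below 0 for a very poor fit, whereas a squared-correlation definition would stay within 0 and 1. The function name `evaluate` is hypothetical.

```python
import math

def evaluate(measured, predicted):
    """Return MAXE, MAE, RMSE and R2 for paired measured/predicted values."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]

    maxe = max(abs(e) for e in errors)                 # maximum absolute error
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean square error

    # Coefficient of determination (assumed definition of R2).
    mean_measured = sum(measured) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAXE": maxe, "MAE": mae, "RMSE": rmse, "R2": r2}
```

For example, `evaluate([1, 2, 3, 4], [1, 2, 3, 5])` gives MAXE = 1, MAE = 0.25, RMSE = 0.5 and R² = 0.8.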
Apart from the evaluation indicators, probability indicators including the Cumulative Distribution Function (CDF) Consistency Histogram (CCH) and the Metric Consistency Deviation (CD) were used to evaluate the consistency of the two frequency distribution histograms (Chen et al. 2020); they were slightly modified in this study to make them more reasonable. The CCH is a bar chart with the same number of columns as the corresponding CDF, and a value is defined for each column. The CD is then defined in terms of the number of columns in the CCH and the value of each column. The CD value is between 0 and 1, and the closer it is to 0, the better the stability of the model.
3 Case study

Study area
J city is located in the Hangjiahu plain, in the north of Zhejiang Province, China. It is one of the commercial centers of the Yangtze River Delta region. J city has a dense network of urban rivers, with an average annual rainfall of up to 1168 mm. To control urban flooding, sluice gates and pumping stations were constructed in low-lying flood-prone urban areas, and some scattered small urban polders have been separated within the city. The SGT polder was selected as the case study area, as shown in Fig. 3. The SGT polder is mainly used for commercial and residential land, and the four main land cover types are roofs (35%), urban green areas (28%), tarmacadam (22%) and brick (10%). The region has an area of approximately m² with an average altitude of 3.5 m (Yellow Sea elevation datum of China) and is served by a separate sewer system. The drainage system has a design return period of less than one year, and thus the SGT polder is highly prone to urban flooding when a disaster-causing rainstorm occurs (Tang et al. 2020).

Meteorological and hydrological data
Meteorological data were taken from historical meteorological monitoring data provided by the WheatA agro weather big data system. To obtain real-time observed data, sensors were installed at the sites given by the optimal arrangement of monitoring sites. The installation positions of these sensors are shown in Fig. 3. Three water level meters (HOBO U20L-01) were installed at different locations along the river. Two rain gauges (L99-YL, China) were also installed to record precipitation events. Finally, six flow-level meters (Isco 2150) were installed at optimized nodes of the drainage system as monitoring sites.

Real-time monitoring of urban drainage system
The number of principal components and the corresponding total variance explained are shown in Fig. 4. The results show that 99.3% of the variance of the water level can be explained by 14 principal components. The percentage of variance explained indicates the degree to which the total variance of the sample can be explained by the predictor variables (Gewers et al. 2022). This means that the water level in almost the entire drainage system of the SGT area can be calculated if the water levels in 14 manholes are known. Meanwhile, Fig. 4 also shows that adding principal components beyond 14 does not effectively improve the variance explained; in other words, it cannot improve the accuracy of the urban global water level. In addition, according to Kaiser's rule, only the principal components with eigenvalues greater than 1 are of significance in PCA (Gewers et al. 2022). The principal components with eigenvalue > 1 are the same 14 principal components described above. The point corresponding to the maximum loading of each principal component is presented for selection in Table S1. To balance economy and accuracy, six monitoring sensors (95.3% explanation rate) were installed in the case study area, with two used for validation (Fig. 3).
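The two criteria used above, cumulative variance explained and Kaiser's rule, can be sketched in a few lines. This is an illustrative sketch only: the function names and the example eigenvalues are hypothetical, and Kaiser's threshold of 1 is meaningful when the eigenvalues come from standardized variables (correlation-matrix PCA), which is assumed here.

```python
def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `target` (a fraction between 0 and 1)."""
    ranked = sorted(eigenvalues, reverse=True)
    total = sum(ranked)
    cumulative = 0.0
    for k, value in enumerate(ranked, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ranked)

def kaiser_count(eigenvalues):
    """Number of components retained by Kaiser's rule (eigenvalue > 1)."""
    return sum(1 for value in eigenvalues if value > 1.0)

# Hypothetical eigenvalues, for illustration only.
example = [5.0, 3.0, 1.5, 0.3, 0.2]
```

For the hypothetical eigenvalues above, three components are needed to reach a 0.9 share of the variance, and Kaiser's rule also retains three.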

Figure 4 The total variance explained by different numbers of principal components
The LGD model was validated against the hydraulic model simulation using ten rainfall events from the test set (Table S2). As shown in Table 1, the average R² values of the surface nodes and manholes reach 0.909 and 0.899, respectively. In addition, the average MAE and RMSE values between the two models are relatively low: 0.101 and 0.144 for the urban surface, and 0.124 and 0.162 for the conduit system. These results reveal a strong fit between the LGD model and the hydraulic model, indicating the reliable accuracy of the results deduced by the LGD model. Figure 5 presents the most unfavorable flooding scenario under a disaster-causing rainfall event. A comparison of the LGD model and the hydraulic model reveals that the two models simulate similar flood extents, which are also consistent with the recorded flood spots in the SGT polder. This shows the reasonableness of the results deduced by the LGD model. Meanwhile, we compared the conduit system water level in a non-disaster-causing rainfall event. The most unfavorable moment of the drainage network is shown in Fig. 6. Table 1 shows that the average values of MAXE, MAE and RMSE of the conduit system were 0.218, 0.124 and 0.162, respectively, while R² is 0.899. The results were similar to those for the urban surface.
Figure 6 The most unfavorable moment of drainage network: (a) conduit simulated by PXM model and (b) conduit simulated by hydraulic model

Table 1 Results for validation of the LGD model and hydraulic model
To quantitatively assess the performance of the LGD model relative to the hydraulic model in representing the probability distribution of flooding results, the CDF consistency histograms are provided in Fig. 7. On each subplot, the dashed reference line (0.1) represents a perfectly flat histogram. Figure 7 clearly illustrates that the CCH column for the water level in the conduit system is high at the 0.3-0.4 quantile compared to the reference line. This indicates that the frequency of simulated water level values at this quantile is high compared to the test values, i.e., there is an overestimation. Similarly, there is an underestimation at the 0.5-0.9 quantile. In addition, the CCH columns for the urban surface almost coincide with the reference line, which indicates that the LGD model is more effective in calculating the water level at the urban surface. The CD values of the conduit system and the urban surface are 0.272 and 0.013, respectively. The closer the CD value is to 0, the better the stability of the model. Thus, these results reveal that the urban surface is more stable than the conduit system when predicting water level, which is also consistent with the RMSE results (Table 1).

The flooding risk points identified by the UFP-RD model and the XGBoost model are generally similar, but the specific flooding risk (water depth) differs, as shown by the zones marked with yellow boxes. Similar results were also found in the conduit system (Fig. 9). The specific evaluation indicator values are shown in Table 2. At the two monitoring sites, the R² of the XGBoost model was 0.933 and 0.914, whereas the R² of the UFP-RD model was higher, at 0.973 and 0.962, respectively. In addition, the other evaluation indicators also show that the proposed UFP-RD model achieves high accuracy during the dynamic prediction process. The UFP-RD model improves the prediction accuracy over the traditional machine learning model because it uses the real-time observed data to eliminate the cumulative error at each time step. Moreover, the UFP-RD model retains the high computational efficiency and short run time of the traditional machine learning model, with total calculation times of 0.29 (CPU) and 0.25 (GPU) (Fig. S3). Thus, the proposed model is well suited to nowcasting prediction of urban flooding scenarios.
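The effect of re-anchoring on real-time observations can be illustrated with a small sketch. It is a simplified stand-in, not the UFP-RD implementation: `step_model` is a hypothetical one-step predictor (any callable taking the current water level and the forecast rainfall), and the function name `nowcast` and its arguments are assumptions made for illustration.

```python
def nowcast(step_model, initial_level, rain_forecast, observations=None):
    """Roll a one-step water level model forward in time.

    step_model    : callable (level, rain) -> predicted next level
    initial_level : water level at the start of the forecast
    rain_forecast : forecast rainfall for each time step
    observations  : optional observed levels; observations[t] is the level
                    measured at the end of step t
    """
    predictions = []
    level = initial_level
    for t, rain in enumerate(rain_forecast):
        predicted = step_model(level, rain)
        predictions.append(predicted)
        if observations is not None:
            # With monitoring: restart from the observed level, so the
            # model error of this step is not carried into the next one.
            level = observations[t]
        else:
            # Without monitoring: feed the prediction back in, so errors
            # accumulate from step to step.
            level = predicted
    return predictions
```

If the one-step model has a small constant bias, the run without observations drifts further from the true level at every step, whereas the run with observations keeps the error at the size of a single-step bias. This is the sense in which real-time data remove the cumulative error.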

Conclusions
This study develops a rapid nowcasting prediction method for urban flooding risk based on data-driven modelling and real-time monitoring. Our approach provides an LGD model based on an ML algorithm that quantifies the cross-scale relationship between local and global water levels. The following conclusions were obtained from a case study in the SGT polder of J City:
1. The LGD model can deduce the urban global water level based on a small number of monitoring sites. The average values of MAE and RMSE in the LGD model were as low as 0.101 and 0.144 for the urban surface and 0.124 and 0.162 for the conduit system, respectively. Meanwhile, the LGD model was more stable when deducing water levels at the urban surface than in the conduit system.
2. This study implements a nowcasting prediction method for urban flooding scenarios based on a data-driven model and real-time monitoring data, named the UFP-RD model. The proposed UFP-RD model has higher accuracy than the traditional ML algorithm in predicting the water depth of urban floods, and it retains the advantage of high computational efficiency. It can provide an important technical reference for the early warning and control of urban flooding disasters.

Figures

Figure 2 illustrates the schematic of the proposed UFP-RD model used for urban flooding risk prediction. The observed water depth values at the monitoring sites for the current time are used in the UFP-RD model to predict urban flooding during the prediction step. The UFP-RD model is then run a second time with the observed water level values of the next time step, and the prediction results are updated. The predicted water depth is simply converted to a probabilistic flood risk map. For the conduit component of the drainage system, the risk can be expressed as:

Figure 2 The schematic of UFP-RD model prediction process

2.4 Model evaluation

In the CCH definition, the terms are the frequency value of the predicted values in each CDF column, the frequency value of the measured values in the same column, and the number of columns in the CDF. When the model works very well, the CDF figures of the predicted and measured values are exactly the same. At that point the values of all columns in the CCH are uniform. Because reading the CCH is strongly subjective, Chen et al. proposed the CD to evaluate the CCH more objectively (Chen et al. 2020).

Figure 3 Schematic diagram of the location of the study area

Figure 5 The most unfavorable moment of urban flood: (a) urban surface simulated by LGD model, (b) urban surface simulated by hydraulic model

Figure 7 Cumulative distribution function (CDF) consistency histogram of the LGD model

4.2 Urban flooding risk prediction

Table 2 Specific results of the model with real-time monitoring and without real-time monitoring

It is clear from the comparison analysis above that the urban surface part of this paper lacks a comparison between simulation results and actual flooding depth. This is because monitoring data for this area are difficult to obtain.