ABSTRACT
Predicting water levels in urban storm-water sewer systems provides vital information for reducing the risk of flooding. This study proposed a new ensemble model that integrates a meta-learner, residual-error correction, and a multiple-output framework. To build the meta-learner, three multiple-output data-driven (MOD) sewer flooding models employing support vector regression (SVR), k-nearest neighbor regression (KNR), and categorical gradient boosting regression (CGBR) were constructed and applied to predict the short-duration evolution of water levels at seven storm-water gauging sites in Taipei city, Taiwan, using 10-min datasets spanning nearly 6 years (2016–2021). Bayesian optimization was used in the training phase of all models to limit overfitting and underfitting. Feature importance was also analyzed with the SHapley Additive exPlanation (SHAP) algorithm to explore model interpretability. The outputs of the storm-water management model (SWMM) were used as benchmark solutions. In the model validation phase, the proposed integrated model improved the lead-time-averaged Nash–Sutcliffe efficiency of the single KNR, SVR, and CGBR models by 174.5, 42.4, and 69.4%, respectively, suggesting that the proposed model could be useful for urban flood warning systems.
HIGHLIGHTS
A new ensemble model based on the integration of meta-learning, residual-error correction, and multiple-output framework was proposed.
The proposed model improved the lead-time-averaged Nash–Sutcliffe efficiency of single KNR, SVR, and CGBR models by 174.5, 42.4, and 69.4%, respectively.
The proposed model was found to be an effective and precise method for predicting short-duration storm-water levels.
INTRODUCTION
Flooding is one of the most common natural hazards worldwide. Between 2000 and 2019, floods affected 1.6 billion people, resulting in an estimated economic loss of $651 billion (USD). The frequency, impact, duration, and intensity of floods are increasing due to rapid economic growth in urban areas and the global climate (Devitt et al. 2023). In Taiwan, the frequency of short-duration heavy rainfall events has significantly increased. Urban storm-water drainage systems are usually unable to meet the needs of existing and future urban development. When rainwater exceeds the design capacity of urban storm-water sewer systems during the rainy or typhoon season, the main sewers can overflow, causing further surface flooding in the city. In 2016, Typhoon Megi caused severe damage to areas such as Tainan and Kaohsiung, resulting in severe flooding and agricultural losses of NTD 3.3 billion. In 2017, Typhoons Nesat and Haitang severely damaged Tainan city and Pingtung County and caused severe flooding. On 22 July 2019, between 15:00 and 18:00, a strong convection brought severe short-duration heavy rainfall to areas in Taipei city and New Taipei city, causing 248 flood disasters and affecting driving safety. On 4 June 2021, Taiwan was affected by the approaching front and peripheral circulation of Typhoon Choi-wan, which caused short-duration heavy rainfall in many areas of Taipei city and New Taipei city. The observational data showed that the maximum hourly rainfall was 137.5 mm at the Fuzhou station in the Daan district of Taipei city. Therefore, the impact of flooding in urban areas on the safety of people and on property is considerable.
To reduce disaster losses due to urban flooding, numerical simulation analysis of urban flooding is one of the most essential approaches to nonengineering disaster mitigation. With regard to dynamic urban flood simulation, several physical process-based (PP) models have been developed and applied in practical areas, such as the SOBEK model (Poretti & De Amicis 2011), the DHI MIKE model (Nigussie & Altunkaynak 2019; Zhou et al. 2022), the two-dimensional flood routing model (FLO-2D) (Wang et al. 2022), the storm-water management model (SWMM) (Farina et al. 2023; Zhuang et al. 2023), and the Hydrologic Engineering Center's River Analysis System (HEC-RAS) 2D flood model (Yazdi et al. 2022; Shaikh et al. 2023). Generally, these PP models can achieve relatively accurate performances because they consider hydrological processes. However, the PP models usually employ some assumed conditions to simplify the governing equations. Therefore, the simulation results may not be entirely consistent with the actual situation. In addition, some field-measured data may be difficult to obtain, which could make it more complicated to implement data preprocessing and parameter calibrations of the model. Moreover, PP models are time-consuming and thus inefficient for real-time early flood warning and flood operation.
Compared to PP models, the data-driven (DD) model proposed in recent years overcomes the assumptions and restrictions of PP models. The DD model also does not require hydrology or geographical parameters for model calibrations. The DD model only requires time series data to implement model training, validation, and testing, leading to fast prediction processing. Several DD models have been applied to river flooding simulations, such as river stage predictions and rainfall–runoff simulations, which play essential roles in preventing and mitigating flood disasters. For instance, Badrzadeh et al. (2015) proposed and compared four DD models, namely, artificial neural networks (ANNs), adaptive neuro-fuzzy inference systems (ANFISs), wavelet neural networks, and hybrid wavelet-based models, for flood forecasting with lead times of 1–48 h at the Richmond River, Australia. Their results indicated that the hybrid model produced the best performance compared to the ANFIS and ANN models. Wang et al. (2019) proposed a dilated causal convolutional neural network (CNN) model for river stage forecasting in the Yilan River basin in Taiwan. The proposed model was also compared with support vector regression (SVR) and multilayer perceptron regression (MLPR) models, which achieved the best prediction performance for lead times of 1–6 h. Dazzi et al. (2021) employed SVR, MLPR, and long short-term memory (LSTM) models to forecast the river stage with 1–9 h lead times in the Parma River, Italy. The results suggested that the LSTM model is the most accurate in their study area and can be helpful for the development of a flood operational forecasting system. Guo et al. (2021) proposed and employed four DD models, namely, SVR, random forest regression (RFR), MLPR, and light gradient boosting machine (LightGBM), to predict the river stage at the tidal reach of the Lan-Yang River Basin, Taiwan. 
Based on their simulated results with 1–6 h lead times, the LightGBM model achieved the best overall performance among the four tested models. Belyakova et al. (2022) employed three DD models, namely, the M5P decision tree model, extreme gradient boosting (XGBoost), and MLPR model, to predict river stage at lead times of 1–20 h for the Pshish and Mzymta Rivers in the western Caucasus. Their results revealed that the most suitable models for the Pshish and Mzymta Rivers were XGBoost and MLPR. Kumar et al. (2023) conducted a comparative analysis of several DD models for predicting the river streamflow of the Garudeshwar watershed, India. They presented several models, including k-nearest neighbor regression (KNR), RFR, LightGBM, linear regression, MLPR, XGBoost, and categorical gradient boosting regression (CGBR). Their results revealed that the CGBR model achieved the best performance. Vizi et al. (2023) employed the LSTM model for water level prediction on the Tisza River, Central Europe. They found that the LSTM model provided accurate and reasonable 7-day-ahead forecasts.
Regarding the application of DD models to urban inundation simulation, the datasets for the observational inundation depth and area in the urban study area are relatively smaller than those for the river stage or discharge. Therefore, to enhance the performance, further advanced studies could combine the DD and PP models for urban flood simulations. When the dataset in the study urban area is limited, the PP model can be employed to simulate flooding and create the required dataset. The generated dataset is then combined with the DD model to learn the nonlinear relationship between inputs and outputs. Accordingly, the hybrid model could achieve accurate performance consistent with that of the PP model and save more significant investment in computational time. For example, Lin et al. (2013) proposed an urban flood prediction model by combining the k-means clustering algorithm and the SVR for forecasting flood levels with 1–3-h lead times in Yunlin County, Taiwan. Their results indicated that the proposed model, combined with point and spatial expansion forecasting, can quickly and accurately simulate both flood depth and inundation area. Yan et al. (2018) investigated the integration of the SVR with the numerical MIKE flood model for simulating urban floods in the Jinlong River Basin, Hangzhou, China. Berkhahn et al. (2019) employed a 2D hydrodynamic model to produce a dataset for training and testing an ANN model. Their model was successfully applied to predict flood maps, including the maximum floodwater levels, in two distinct urban catchments. Bermúdez et al. (2019) presented the application of the SVR model to the spatial prediction of urban floods in the urban area of Vilagarcia de Arousa, Northwest Spain. The results demonstrated that the SVR model produced reasonable and appropriate predictions for flood hazard mapping in coastal urban areas. Kabir et al. (2020) proposed a rapid prediction model for fluvial flood inundation. 
Their proposed model combined the CNN model with the hydraulic LISFLOOD-FP model. They found that the CNN model had an outstanding ability to model the floodwater depth in real time. Kim & Han (2020) employed the nonlinear autoregressive with exogenous inputs (NARX) model in conjunction with a self-organizing map to propose a rapid simulation model for urban flood prediction. Their proposed model successfully predicted the accumulative overflow at all manhole positions in various drainage districts in Seoul within a computational time of 2 min. Yan et al. (2021) proposed a novel neural network-based DD model coupled with the personal computer-based SWMM to predict the maximum flood depth at stations with multiple risks. Using the weighted redistribution approach, Hou et al. (2021) employed the RFR and KNR models to propose an integrated DD model for simulating urban flooding. Their results showed that the coupling DD model can enhance the prediction accuracy in comparison to the single RFR and KNR models. Moon et al. (2023) employed the SWMM model to simulate urban flooding and generated the dataset for training the LSTM model. The results indicated that the proposed hybrid model achieved a remarkable prediction performance, with a Nash–Sutcliffe efficiency (NSE) of over 0.8 for the case study in the Dorim Stream Basin, Korea. Xu et al. (2023) proposed a rapid urban flood prediction model based on the combination of SWMM with the LightGBM model. The results of their application to Taidian Island, Hainan Province, China, indicated that the LightGBM model achieved superior performance to the RFR, XGBoost, and KNR models.
According to the literature reviewed, the hybrid model was found to be the most commonly used method for predicting the urban flooding. First, well-calibrated PP models, such as the MIKE or SOBEK models, were adopted to create the required dataset. Then, sufficient data were provided for the DD models during the model training and testing periods. Therefore, the hybrid model not only achieves the proper and satisfactory prediction performance when compared to the PP model but can also satisfy the requirement of fast real-time prediction. However, the quality of the dataset could affect the prediction performance of the models. The data created by the PP models will differ from the actual observational data. More recently, sensors based on Internet of Things (IoT) technology have been increasingly employed to monitor flood detection (Sood et al. 2018; Goudarzi et al. 2021). The IoT sensor data can be utilized to produce a large dataset for the application of DD models. Yang & Chang (2020) combined IoT sensor data with the simulated results obtained from the SOBEK inundation model for predicting the regional average inundation depth in the Erren River basin, Taiwan. Their results suggested that adding IoT sensor data to the inputs of DD models can significantly reduce prediction errors. Therefore, this information provides promising evidence that the application of IoT sensor data can substantially improve the performance of DD models. Since the amount of IoT sensor data in Taipei City, Taiwan, is increasing annually, this study aims to utilize IoT sensor data to train and construct a DD model without considering the dataset generated by the PP model.
Therefore, this study proposed a new ensemble model based on the combination of three multiple-output data-driven (MOD) models with residual-error correction for short-duration predictions of water levels in urban storm-water sewer systems. Three MOD models, KNR, SVR, and CGBR, were applied to predict the storm-water levels at seven stations in Taipei city. The performances of the three MOD models were evaluated and analyzed using several evaluation criteria, including the correlation coefficient (CC), mean absolute error (MAE), root mean square error (RMSE), NSE, peak water level error (PWE), generalization ability (GA), reduction in the percentage error of the lead-time-averaged RMSE (RRMSE), and improvement in the percentage of error of the lead-time-averaged NSE (INSE) (Getirana et al. 2020; Guo et al. 2023). To further understand the interpretability of the model, the SHapley Additive exPlanation (SHAP) algorithm was utilized to investigate the importance of both global and local features. A sensitivity analysis was also conducted to highlight the applicability of the models by examining the impact of Bayesian optimization (BO) and input combinations on the prediction performance. Moreover, the prediction results obtained by the SWMM were compared with those achieved by the three MOD models for the selection of the most appropriate model. The most appropriate sewer flooding model was further employed, associated with the residual-error time series dataset for training and constructing a new ensemble model. To evaluate the capability of the proposed new ensemble model, analyses of the model validation with independent data were performed and studied. Comparisons of the results of the present study with those of other previous studies were presented to show the performance, potential, and reliability of the proposed model.
Although the three individual MOD techniques are not new, the diversity among them provides a suitable architecture for constructing a new ensemble model. By coupling the three MOD models with residual-error correction, the present study proposed a new ensemble-based multiple-output model. The novelty of the present study therefore lies in applying the proposed model to urban storm-water level prediction under hydrological challenges (i.e., short-duration conditions, limited datasets, and extreme storm events), which provides a demanding test of the model's potential. The importance and originality of this study are as follows: (1) it investigated the performance of three MOD models and a newly proposed ensemble model for predicting short-duration water levels with lead times of 10–60 min in urban storm-water sewers; (2) it contributed a deeper investigation of feature importance and model interpretability for urban flooding using the SHAP algorithm; and (3) it is the first study to explain the differences between MOD-based sewer flooding models and the SWMM, highlighting the advantages of the proposed models.
STUDY AREA AND DATA
With respect to disaster prevention and maintenance management, obtaining real-time monitoring data from storm-water sewer systems enables the most accurate monitoring of current storm-water runoff situations in sewers. Further analysis of real-time monitoring data can offer insights into the state of urban storm-water sewers, which could be valuable for early flood warnings.
The Hydraulic Engineering Office, Public Works Department, Taipei City Government, Taiwan, provided the monitoring data. The real-time data were recorded by pressure-type water level sensors from 2016 to 2021. Because the storm-water level hydrograph may change significantly during typhoon or storm periods, this study selected several major events to produce an event-based 10-min dataset for model training, validation, and testing. Table 1 summarizes the detailed information for the seven stations, including the area of the drainage region, the number of datasets, and the storm-water level statistics. Among the seven stations, the Yucheng 8 station has the largest drainage region (1,623 ha), and its storm-water level ranges from 0.84 to 5.4 m. The Bailing 1 station has the largest dataset (1,385 records), and its storm-water level ranges from 0.6 to 2.76 m. Furthermore, rainfall is the primary and direct factor influencing storm-water level hydrographs in storm-water sewer systems. Therefore, this study also collected 10-min rainfall data from four rainfall stations, Jiyan, Fude, Donghu Elementary School, and Civic Center, as shown in Figure 1.
District | Station | Name of drainage region | Area of drainage region (ha) | Number of datasets | Number of events | Minimum storm-water level (m) | Average storm-water level (m) | Maximum storm-water level (m)
---|---|---|---|---|---|---|---|---
Beitou | Bailing 1 | Bailing | 638.74 | 1,385 | 30 | 0.60 | 1.28 | 2.76
Shilin | Shuangsi right-4A | Wunchang | 56.26 | 703 | 19 | 1.22 | 2.32 | 4.55
 | Shezih 2 | Shezih | 67.95 | 305 | 13 | −0.13 | 0.57 | 1.95
 | Dihua 2 | Dihua | 170.47 | 808 | 39 | 0.52 | 0.97 | 1.87
Neihu | Kangle 1 | Kangle | 189.91 | 341 | 15 | 5.45 | 5.79 | 7.36
 | Kangle 2 | Kangle | 189.91 | 421 | 24 | 6.21 | 7.16 | 8.46
Xinyi | Yucheng 8 | Yucheng | 1,623.01 | 664 | 15 | 0.84 | 2.20 | 5.40
SEWER FLOODING MODELS
SWMM model
Urban flooding areas are usually affected by the functions of urban drainage facilities. Rainfall surface runoff is the primary source of water entering drainage systems. Rainwater flows into storm-water sewers via streets and gutters and is transported to water gates or pumping stations. It is then discharged into the adjacent drainage system, river, or sea. Hence, urban flood simulations should consider all the dynamic processes in urban storm-water drainage systems, including rainfall-runoff, storm-water sewer systems, water gates, and pumping station operations, to improve the prediction performance.
The SWMM was developed by the United States Environmental Protection Agency (EPA) and is widely used in storm-water drainage system planning, analysis, and design (Farina et al. 2023; Zhuang et al. 2023). The model is primarily based on the one-dimensional (1D) continuity equation and dynamic flow theory and comprises the surface runoff (RUNOFF) and extended transport (EXTRAN) modules.
With respect to the simulation analysis using the SWMM, it is essential to gather diverse data, including the drainage conditions of regional storm-water sewer systems, the urban planning subzone of land use, and precipitation type data. Therefore, the SWMM model inputs include rainfall time series data, digital elevation data, pipeline data, manhole data, pumping station data, and drainage system data. The outputs of the SWMM model are the hydrographs of the urban storm-water levels.
To evaluate the performance of DD models, the most common method is to compare simulation results with measured data. In addition, the SWMM, which is a type of PP model, is commonly used as a benchmark model for comparison with DD models, including rainfall–runoff modeling (Wang & Altunkaynak 2012; Granata et al. 2016) and urban sewer flood simulation (Xu et al. 2023). Therefore, the SWMM is one of the most commonly used urban flood prediction models, particularly in Taiwan. Due to the increasing amount of observed data from monitored stations in urban Taiwan areas, the basic dataset required for the SWMM is updated annually. Parameter calibrations and model verifications are also performed in flood-prone urban areas. Moreover, the advantage of using SWMM is that it can be quickly and suitably combined with a 2D flood model to produce operational flood forecasts as well as flood potential maps (Chang et al. 2021). Furthermore, the SWMM could be integrated with cloud computing services to establish a real-time storm sewer simulation system for early flood warning in urban areas (Lin et al. 2023). In this study, based on the advantages mentioned earlier, the SWMM was employed as a benchmark for comparisons with the proposed models.
Proposed MOD models
In a multistep time series problem, the outputs at several future time steps must be predicted. Four frameworks are commonly used for multistep prediction: direct, recursive, hybrid direct-recursive, and multiple-output (Ben Taieb et al. 2010; Bontempi et al. 2013). The direct framework trains a separate prediction model for each time step. The recursive framework applies a one-step model repeatedly, using the prediction for the previous time step as an input to predict the next time step. The direct and recursive frameworks can also be combined into a hybrid methodology that aims to achieve the benefits of both.
In contrast, the multiple-output framework builds a single model that predicts the outputs for all lead times at once. It has a more complex structure than the other three frameworks, but it can learn the dependencies between the inputs and the outputs as well as among the outputs themselves. The added complexity can slow training, but it can improve accuracy in multistep time series prediction. The multiple-output framework can be applied to time series problems with multitarget outputs or multistep-ahead prediction. This study used the multiple-output framework as the primary methodology to construct three MOD models for multistep-ahead prediction using the KNR, SVR, and CGBR techniques.
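To make the multiple-output framing concrete, the following minimal sketch shows how a 10-min series could be arranged so that one input vector maps to all six lead times (10–60 min) at once. The function name, the number of lagged inputs, and the use of a single rainfall series are illustrative assumptions; the study's actual input combinations are defined by its own equations.

```python
def make_multioutput_samples(levels, rain, n_lags=3, n_leads=6):
    """Arrange two aligned 10-min series into (input, multi-lead target) pairs.

    Hypothetical helper: n_lags and the choice of inputs are assumptions,
    not the input combinations used in the study.
    """
    X, Y = [], []
    # t is the index of the current time step; stop early enough that all
    # n_leads future water levels exist.
    for t in range(n_lags - 1, len(levels) - n_leads):
        lagged_levels = list(levels[t - n_lags + 1 : t + 1])
        lagged_rain = list(rain[t - n_lags + 1 : t + 1])
        X.append(lagged_levels + lagged_rain)
        # One target vector holds every lead time: t+1 ... t+n_leads.
        Y.append(list(levels[t + 1 : t + 1 + n_leads]))
    return X, Y


levels = [float(i) for i in range(12)]
rain = [0.0] * 12
X, Y = make_multioutput_samples(levels, rain)
```

A single regressor trained on such pairs predicts the whole 10–60 min trajectory in one call, whereas the direct framework would need six separate models.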
KNR technique
The KNR model is a straightforward nonparametric DD model for supervised learning. For a given query, the KNR model identifies the k training samples closest to the query inputs, and its output is based on the average of the target values of these k nearest neighbors.
The algorithm of the KNR model for continuous variable estimation (i.e., the time series prediction problem) can be summarized as follows (Hou et al. 2021; Beskopylny et al. 2022):
1. The Mahalanobis or Euclidean distance between the queried and labeled examples is calculated.
2. The distance is scaled according to the order of the labeled examples.
3. Based on the cross-validation using the RMSE indicator, the optimal number of nearest neighbors k is estimated.
4. The k nearest multivariate neighbors are used to compute the inverse-distance-weighted average.
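The steps above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a fixed k; the cross-validated choice of k (step 3) is omitted, and the function name and the small `eps` guard against division by zero are assumptions introduced here.

```python
import math


def knr_predict(train_X, train_Y, query, k=3, eps=1e-9):
    """Inverse-distance-weighted k-nearest-neighbor regression (multi-output)."""
    # Step 1: Euclidean distance between the query and each labeled example.
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query)))
        for x in train_X
    ]
    # Step 2: order the examples by distance and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Step 4: inverse-distance weights (eps avoids division by zero).
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    n_outputs = len(train_Y[0])
    return [
        sum(w * train_Y[i][j] for w, i in zip(weights, nearest)) / total
        for j in range(n_outputs)
    ]


train_X = [[0.0], [1.0], [2.0], [10.0]]
train_Y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [10.0, 20.0]]
pred = knr_predict(train_X, train_Y, [0.5], k=2)
```

Because each neighbor carries a full target vector, the same weights produce predictions for every lead time simultaneously, which is how the technique fits the multiple-output framework.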
SVR technique
CGBR technique
Consequently, the CGBR model has several desirable features. For instance, the CGBR model utilized ordered boosting to overcome model overfitting and applied symmetric trees for faster execution, making it computationally efficient, robust, and accurate for model training (Prokhorenkova et al. 2018).
Proposed new ensemble model
To further improve on the performance of the three MOD models (KNR, SVR, and CGBR), this study employed them as the base learners of a new ensemble model. Stacking ensemble learning has recently been applied in several different studies, such as reservoir inflow simulation (Zhang et al. 2021), flash flood susceptibility (Yao et al. 2022), and the prediction of customer lifetime (Gadgil et al. 2023). Stacking ensemble learning comprises two levels of procedures (Mienye & Sun 2022). In the first-level procedure, the individual models, referred to as the base learners, are trained and used to produce predictions. In the second-level procedure, a meta-learner combines the predictions made by the base learners.
Greater diversity among the base learners generally leads to higher ensemble accuracy, so this study employed three different MOD models. The KNR and SVR models are traditional and commonly used machine learning methods, whereas CGBR is based on gradient boosting, which can reduce the bias error of the model and help control overfitting. Recently, gradient boosting methods such as LightGBM, XGBoost, and CGBR have been applied to several time series problems: particulate matter estimation (Mampitiya et al. 2024), oil formation volume forecasting (Kharazi Esfahani et al. 2023), wind power prediction (Ponkumar et al. 2023), and stock price prediction (Hartanto et al. 2023). The main difference among these three methods is the tree growth strategy. LightGBM grows trees leaf-wise, while XGBoost grows them level-wise. Unlike XGBoost and LightGBM, the CGBR model uses a balanced symmetric tree architecture, which helps control overfitting and reduces prediction time. In a previous study by Guo et al. (2023), the CGBR model was found to be suitable and accurate for predicting river flood stages in a steep mountain river basin. Therefore, the CGBR model was also selected in this study to test its reliability, potential, and performance in predicting storm-water levels.
In addition to stacking ensemble learning, residual-error correction is also a robust and efficient method for improving prediction performance. According to the study reported by Phan & Nguyen (2020), there are two methods for correcting residual errors. The first method is to train and model the residual error for correcting the time series predictions, referred to herein as the error-modeling-based correction. The alternative method is to use the residual error as the input factor for training and correcting the time series predictions, referred to herein as the error-factor-based correction. Although the residual-error correction is not a new method, this study extended the error-factor-based correction in combination with the multiple-output framework, leading to a new method.
The detailed algorithm of the proposed model is summarized as follows:
1. Three MOD models (base learners) were trained and tested using the first-level procedure with the prepared inputs, expressed in Equations (3)–(9).
2. Three predictions were obtained from the MOD models, and the most appropriate regression model was analyzed based on several indicators.
3. The most appropriate model was selected as the regression function of meta-learning.
4. Based on the second-level procedure expressed in Equations (14)–(17), the time series residual errors were estimated by three MOD models and combined with the input vectors to achieve the final predictions.
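The error-factor-based idea in step 4 can be illustrated with a small sketch of how second-level inputs might be assembled. It assumes the residual is the observed level minus each base learner's prediction at the latest time step for which an observation is available; the function and argument names are hypothetical, and the study's exact formulation is given by its own equations.

```python
def build_meta_inputs(X, observed, base_predictions):
    """Append one residual per base learner to each original input vector.

    Hypothetical helper illustrating error-factor-based correction:
    residual = observed - predicted (an assumed sign convention).
    """
    meta_X = []
    for x, obs, preds in zip(X, observed, base_predictions):
        residuals = [obs - p for p in preds]  # e.g. KNR, SVR, CGBR residuals
        meta_X.append(list(x) + residuals)
    return meta_X


X = [[1.0, 2.0]]
observed = [1.5]
base_predictions = [[1.0, 1.5, 2.0]]  # one prediction per base learner
meta_X = build_meta_inputs(X, observed, base_predictions)
```

The augmented vectors would then be passed to the regression model selected as the meta-learner, so the residuals act as additional input factors rather than being modeled as a separate series.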
Bayesian optimization
DD models rely on multiple hyperparameters when learning the relationship between inputs and outputs, and the chosen values considerably affect predictive capability. Determining the optimal hyperparameters is therefore essential when training DD models. Several optimization methods, such as grid search, random search, and BO, can be used to identify the best hyperparameters (Bai et al. 2021; Yin et al. 2023).
Grid search is a simple method that evaluates all possible combinations of hyperparameters and therefore requires extensive computation time in practical applications. Random search is more efficient because it evaluates only randomly selected combinations. BO relies on two core techniques, a probabilistic model of the objective and an acquisition function, which make the search efficient, allow it to handle noisy evaluations, and direct it toward a global optimum.
The BO process can be summarized as follows.
1. A few arbitrary hyperparameter combinations are selected and evaluated. The probabilistic model is then fitted to these evaluations to estimate the performance metric.
2. The acquisition function is used to determine the next hyperparameter combination to evaluate, and the DD model is trained with that combination.
3. The new evaluation is added to the dataset, the probabilistic model is refitted, and the acquisition function is updated.
4. Steps 2 and 3 are repeated until the convergence criterion is met.
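The loop above can be sketched for a single hyperparameter scaled to [0, 1]. This is an illustrative toy, not the implementation used in the study: it assumes a Gaussian-process surrogate with a squared-exponential kernel, the expected-improvement acquisition function, a fixed iteration budget as the convergence criterion, and a hypothetical quadratic "validation loss" as the objective.

```python
import math
import numpy as np


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two 1D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def expected_improvement(mu, sigma, best):
    """Expected improvement (for minimization) at one candidate point."""
    if sigma < 1e-9:
        return 0.0
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, n_iter=10):
    candidates = np.linspace(0.0, 1.0, 101)
    xs = [0.1, 0.5, 0.9]                      # step 1: initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):                   # step 4: fixed budget (assumed)
        X, y = np.array(xs), np.array(ys)
        y_mean = y.mean()
        # Fit the surrogate: GP posterior mean and variance at the candidates.
        K = rbf(X, X) + 1e-6 * np.eye(len(X))
        Ks = rbf(candidates, X)
        mu = y_mean + Ks @ np.linalg.solve(K, y - y_mean)
        v = np.linalg.solve(K, Ks.T)
        var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 1e-12, None)
        # Step 2: pick the candidate that maximizes the acquisition function.
        ei = [expected_improvement(m, s, y.min())
              for m, s in zip(mu, np.sqrt(var))]
        x_next = float(candidates[int(np.argmax(ei))])
        # Step 3: evaluate it and add the result to the dataset.
        xs.append(x_next)
        ys.append(objective(x_next))
    i = int(np.argmin(ys))
    return xs[i], ys[i]


best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2)
```

In practice the objective would be a cross-validated error of the DD model for a given hyperparameter combination, and a maintained optimization library would normally replace this hand-written surrogate.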
SHAP technique
KernelSHAP and TreeSHAP are two suitable algorithms for computing SHAP values. Both support global interpretation, local interpretation, and visualization. Compared to KernelSHAP, the TreeSHAP algorithm exploits the structure of tree-based models to compute the SHAP values, which decreases the computational complexity for large datasets. Consequently, TreeSHAP has the advantage of fast implementation for enhancing model interpretability; therefore, it was utilized in this study.
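For intuition about the quantity that KernelSHAP approximates and TreeSHAP computes efficiently, the following sketch evaluates exact Shapley values by brute force for a tiny model. It is not TreeSHAP: it enumerates every feature subset, which is feasible only for a handful of features, and it assumes that "missing" features are replaced by a fixed baseline value. The model and names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Features outside the subset are set to their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear model: 2 * (lagged water level) + 3 * (rainfall).
phi = shapley_values(lambda z: 2.0 * z[0] + 3.0 * z[1], [1.0, 1.0], [0.0, 0.0])
```

The contributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that makes SHAP values usable for both local and global feature importance.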
Performance evaluation metrics
The closer the CC or NSE value is to 1, the better the prediction; likewise, the closer the RMSE or MAE is to 0, the better the prediction. The PWE indicator measures the ability of the model to predict the peak storm-water level of a sewer system: the smaller the PWE, the better the prediction of the peak storm-water level. The GA indicator assesses the generalization ability of the DD model. A GA close to 1 indicates the best learning performance, whereas in practical applications a GA smaller than 1 indicates that the model is underfit and a GA greater than 1 indicates that it is overfit. The performance of the proposed model also depends on the prediction lead time, so the overall performance can be assessed by averaging the prediction results over the different lead times (Getirana et al. 2020). Therefore, this study followed the evaluation method presented by Guo et al. (2023) and used the RRMSE and INSE indicators to quantify the percentage improvement of the proposed model over the three single MOD models. Larger RRMSE and INSE values indicate greater improvement.
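The standard definitions of RMSE, MAE, and NSE can be written compactly as below. The `percent_improvement` helper is only an assumed form of an INSE-style indicator (relative change in the lead-time-averaged score, in percent); the study's exact RRMSE, INSE, PWE, and GA definitions follow the cited references and are not reproduced here.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def lead_time_average(scores):
    """Average a metric over all lead times (e.g. 10, 20, ..., 60 min)."""
    return sum(scores) / len(scores)


def percent_improvement(single, proposed):
    """Assumed INSE-style indicator: relative gain over a single model (%)."""
    return (proposed - single) / abs(single) * 100.0


obs = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]
```

Under this assumed form, a single-model lead-time-averaged NSE of 0.4 raised to 0.8 by the ensemble would correspond to an improvement of 100%.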
Study flowchart and model implementation
(1) Data preprocessing:
To develop the three MOD models, this study collected precipitation and water level data for urban storm-water sewer systems. The dataset was normalized using the min–max normalization method and divided into training and test datasets at the widely used ratio of 7:3. The data required by the SWMM were also collected for the implementation of the PP model.
(2) Model development:
Using the collected data, three MOD models, namely, the KNR, SVR, and CGBR models, were developed. The sensitivity and interpretability of the models were also investigated. Additionally, the SWMM was employed to simulate the dynamic process of the water level in urban storm-water sewer systems.
(3) Performance evaluations:
Intense storm events were selected, simulated, and analyzed using the three MOD models. The results predicted by the SWMM were compared with those from the MOD models. Several indicators were used to evaluate the model performance and determine the most appropriate regression model.
(4) Proposing a new ensemble model:
The three MOD models were used as the base learners, and their predictions served as inputs to the next stage. The most appropriate model was selected as the meta-learner and coupled with the residual-error correction. The final outputs were then predicted, and the model was validated with independent data. Finally, the improvement in performance achieved by the proposed model was analyzed and discussed.
The proposed models were programmed and implemented in Python 3.7 with the Keras and Scikit-Learn open-source libraries. For the PP model, this study employed SWMM version 4.4h. The input file of the SWMM 4.4h model is a text file, which can be created using a standard text editor. The simulation can be run either in the disk operating system (DOS) environment or through the PCSWMM Windows interface.
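The data preprocessing step of the flowchart can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names are hypothetical, the split is assumed to be chronological, and the scaling bounds are fitted on the training portion only (a common precaution against information leakage; the paper does not state which portion was used). The water levels are invented.

```python
def chronological_split(series, ratios):
    """Split a time-ordered series into consecutive parts, e.g. (7, 3)."""
    total = sum(ratios)
    n = len(series)
    parts, start = [], 0
    for k, ratio in enumerate(ratios):
        # The last part takes whatever remains so that no sample is lost.
        end = n if k == len(ratios) - 1 else start + round(n * ratio / total)
        parts.append(series[start:end])
        start = end
    return parts


def min_max_fit(values):
    """Return the (minimum, maximum) used for min-max normalization."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Invented 10-min water levels (m).
levels = [0.2, 0.4, 1.0, 2.2, 1.6, 0.8, 0.5, 0.3, 0.9, 1.4]

train, test = chronological_split(levels, (7, 3))
lo, hi = min_max_fit(train)
train_scaled = min_max_transform(train, lo, hi)
test_scaled = min_max_transform(test, lo, hi)
print(len(train), len(test))
print(train_scaled)
print(test_scaled)

# The same helper covers three-way splits such as 60:20:20.
train3, val3, test3 = chronological_split(levels, (60, 20, 20))
print(len(train3), len(val3), len(test3))
```

Because the bounds come from the training portion, scaled test values can fall outside the range 0 to 1 when a test storm exceeds the training extremes.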
RESULTS AND DISCUSSION
Analysis of the influence of BO on predictions
Most studies have examined the direct influence of BO on prediction results, whereas the relative (percentage) improvement in performance has received less attention. Therefore, this section examines the impact of BO on the predictions and analyses the corresponding percentage improvements.
This study selected the Yucheng 8 station as a test case to evaluate the effectiveness of BO in combination with the three proposed MOD models. The MOD models without BO were also run to highlight the benefits of using BO; their parameters were left at the default values of the Python libraries.
MOD models | Optimal parameter sets
---|---
KNR | Number of neighbors: 10; algorithm: auto
SVR | Kernel: RBF; gamma: 0.02; C: 20; epsilon: 0.01
CGBR | Max depth: 2; number of estimators: 80; learning rate: 0.1
Table 3 summarizes the performances of the three MOD models with and without BO. To assess the overall performance, the results were averaged over all lead times. Without BO, the lead-time-averaged NSE values of the KNR, SVR, and CGBR models were 0.50, 0.73, and 0.72, respectively. With BO, the corresponding values were 0.55, 0.76, and 0.76. The three MOD models with BO therefore increased the lead-time-averaged NSE by 8.94, 4.79, and 5.34%, respectively, compared to those without BO. As shown in Table 3, BO also reduced the lead-time-averaged RMSE of the three models by 4.79, 7.53, and 6.30%, respectively. These results demonstrate that all three MOD models benefited from BO.
Models | Lead time (min) | Without BO: CC | Without BO: MAE (m) | Without BO: RMSE (m) | Without BO: NSE | With BO: CC | With BO: MAE (m) | With BO: RMSE (m) | With BO: NSE
---|---|---|---|---|---|---|---|---|---
KNR | 10 | 0.84 | 0.25 | 0.37 | 0.67 | 0.87 | 0.22 | 0.34 | 0.72
KNR | 20 | 0.79 | 0.29 | 0.42 | 0.57 | 0.82 | 0.27 | 0.40 | 0.61
KNR | 30 | 0.74 | 0.32 | 0.46 | 0.49 | 0.78 | 0.30 | 0.43 | 0.54
KNR | 40 | 0.72 | 0.33 | 0.48 | 0.44 | 0.76 | 0.32 | 0.45 | 0.49
KNR | 50 | 0.72 | 0.34 | 0.49 | 0.43 | 0.75 | 0.33 | 0.47 | 0.47
KNR | 60 | 0.72 | 0.34 | 0.49 | 0.43 | 0.74 | 0.34 | 0.48 | 0.45
SVR | 10 | 0.96 | 0.11 | 0.18 | 0.92 | 0.98 | 0.07 | 0.13 | 0.96
SVR | 20 | 0.92 | 0.16 | 0.26 | 0.83 | 0.93 | 0.15 | 0.25 | 0.85
SVR | 30 | 0.87 | 0.21 | 0.33 | 0.74 | 0.88 | 0.20 | 0.32 | 0.75
SVR | 40 | 0.83 | 0.26 | 0.37 | 0.66 | 0.85 | 0.24 | 0.36 | 0.69
SVR | 50 | 0.81 | 0.28 | 0.39 | 0.62 | 0.84 | 0.26 | 0.37 | 0.67
SVR | 60 | 0.80 | 0.30 | 0.41 | 0.60 | 0.83 | 0.27 | 0.37 | 0.67
CGBR | 10 | 0.97 | 0.10 | 0.16 | 0.93 | 0.97 | 0.10 | 0.17 | 0.93
CGBR | 20 | 0.92 | 0.16 | 0.26 | 0.83 | 0.91 | 0.16 | 0.26 | 0.83
CGBR | 30 | 0.86 | 0.22 | 0.33 | 0.73 | 0.87 | 0.21 | 0.32 | 0.75
CGBR | 40 | 0.83 | 0.26 | 0.38 | 0.66 | 0.85 | 0.24 | 0.35 | 0.71
CGBR | 50 | 0.80 | 0.29 | 0.40 | 0.61 | 0.84 | 0.25 | 0.36 | 0.69
CGBR | 60 | 0.79 | 0.31 | 0.42 | 0.58 | 0.82 | 0.27 | 0.38 | 0.66
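The lead-time averaging described for Table 3 can be reproduced with a few lines of Python. The sketch below uses the NSE values transcribed from the table; the helper names are hypothetical. Because the tabulated values are rounded to two decimals, the percentage changes computed here can differ slightly from the percentages quoted in the text, which were presumably derived from unrounded results.

```python
# NSE values transcribed from Table 3 for lead times of 10, 20, ..., 60 min.
nse_table3 = {
    "KNR": {
        "without_bo": [0.67, 0.57, 0.49, 0.44, 0.43, 0.43],
        "with_bo": [0.72, 0.61, 0.54, 0.49, 0.47, 0.45],
    },
    "SVR": {
        "without_bo": [0.92, 0.83, 0.74, 0.66, 0.62, 0.60],
        "with_bo": [0.96, 0.85, 0.75, 0.69, 0.67, 0.67],
    },
    "CGBR": {
        "without_bo": [0.93, 0.83, 0.73, 0.66, 0.61, 0.58],
        "with_bo": [0.93, 0.83, 0.75, 0.71, 0.69, 0.66],
    },
}


def lead_time_average(values):
    """Arithmetic mean over all lead times."""
    return sum(values) / len(values)


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


summary = {}
for model, runs in nse_table3.items():
    before = lead_time_average(runs["without_bo"])
    after = lead_time_average(runs["with_bo"])
    summary[model] = (before, after, percent_change(before, after))
    print(f"{model}: {before:.2f} -> {after:.2f} ({summary[model][2]:+.1f}%)")
```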
Sensitivity analysis for feature importance
The prediction performance of the DD models may be influenced by various factors, such as the quality, quantity, and features contained in the dataset. In this section, a sensitivity analysis of the combination of feature factors was conducted to explore the feature importance in water level predictions of urban storm-water sewer systems.
Table 4 lists five combinations of input variables built from the rainfall and storm-water level. Combination C1 considered only the antecedent storm-water levels from time steps t − 6 to t. Combination C2 employed only the antecedent rainfall from time steps t − 6 to t. Combination C3 combined the antecedent and future rainfall inputs from time steps t − 6 to t + 6. Combination C4 used the antecedent storm-water levels and antecedent rainfall from time steps t − 6 to t. For C5, all the input features were adopted.
Combinations | Antecedent rainfall | Future rainfall | Antecedent water levels
---|---|---|---
C1 | – | – | Ht−6, …, Ht−1, Ht
C2 | Rt−6, …, Rt−1, Rt | – | –
C3 | Rt−6, …, Rt−1, Rt | Rt+1, …, Rt+6 | –
C4 | Rt−6, …, Rt−1, Rt | – | Ht−6, …, Ht−1, Ht
C5 | Rt−6, …, Rt−1, Rt | Rt+1, …, Rt+6 | Ht−6, …, Ht−1, Ht
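The input combinations of Table 4 can be sketched as a sliding-window construction. The code below is an illustration under assumptions: `COMBOS` and `build_samples` are hypothetical names, the six targets are assumed to be the water levels at t + 1 to t + 6 (lead times of 10–60 min at 10-min steps), and future rainfall is taken from the observed series, whereas an operational system would use rainfall forecasts. The series are invented.

```python
# (use antecedent rainfall, use future rainfall, use antecedent water levels)
COMBOS = {
    "C1": (False, False, True),
    "C2": (True, False, False),
    "C3": (True, True, False),
    "C4": (True, False, True),
    "C5": (True, True, True),
}


def build_samples(rain, level, combo, n_lags=6, n_leads=6):
    """Build input rows X and multiple-output targets Y for one combination."""
    past_rain, future_rain, past_level = COMBOS[combo]
    X, Y = [], []
    for t in range(n_lags, len(level) - n_leads):
        row = []
        if past_rain:
            row += rain[t - n_lags : t + 1]        # R(t-6), ..., R(t)
        if future_rain:
            row += rain[t + 1 : t + n_leads + 1]   # R(t+1), ..., R(t+6)
        if past_level:
            row += level[t - n_lags : t + 1]       # H(t-6), ..., H(t)
        X.append(row)
        Y.append(level[t + 1 : t + n_leads + 1])   # H(t+1), ..., H(t+6)
    return X, Y


# Invented 10-min rainfall (mm) and water level (m) series of 20 steps.
rain = [float(i % 5) for i in range(20)]
level = [1.0 + 0.1 * i for i in range(20)]

for name in COMBOS:
    X, Y = build_samples(rain, level, name)
    print(name, len(X), len(X[0]), len(Y[0]))
```

Each row of `Y` holds all six lead times at once, which is what allows a single multiple-output model to be trained instead of one model per lead time.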
Table 5 lists the performance of the CGBR model with the five input combinations based on four indicators for the training and test phases. For the test phase, the lead-time-averaged CCs for the models with C1, C2, C3, C4, and C5 were 0.63, 0.68, 0.78, 0.75, and 0.88, respectively, suggesting that the CGBR model with C5 performed the best, while the model with C1 performed the worst. In addition, the lead-time-averaged RMSEs for the models with C1, C2, C3, C4, and C5 were 0.47, 0.57, 0.51, 0.40, and 0.31 m, respectively, indicating that the CGBR model with C5 had the lowest predictive error. Regarding the GA performance, the lead-time-averaged GA values for the CGBR models with C1, C2, C3, C4, and C5 were 1.16, 0.88, 0.85, 1.12, and 1.08, respectively. The GA of the model with C5 was the closest to 1, indicating the best GA performance among the five combinations. The results also demonstrated that the antecedent storm-water level and future rainfall variables contribute considerably to predicting water levels in urban storm-water sewer systems.
Combinations | Lead time (min) | Training: CC | Training: MAE (m) | Training: RMSE (m) | Training: NSE | Test: CC | Test: MAE (m) | Test: RMSE (m) | Test: NSE
---|---|---|---|---|---|---|---|---|---
C1 | 10 | 0.98 | 0.12 | 0.16 | 0.96 | 0.95 | 0.12 | 0.19 | 0.91
C1 | 20 | 0.95 | 0.20 | 0.28 | 0.89 | 0.84 | 0.22 | 0.35 | 0.70
C1 | 30 | 0.90 | 0.27 | 0.38 | 0.80 | 0.71 | 0.32 | 0.47 | 0.46
C1 | 40 | 0.85 | 0.33 | 0.47 | 0.70 | 0.56 | 0.39 | 0.55 | 0.25
C1 | 50 | 0.79 | 0.38 | 0.54 | 0.61 | 0.42 | 0.45 | 0.60 | 0.12
C1 | 60 | 0.74 | 0.43 | 0.59 | 0.53 | 0.29 | 0.49 | 0.64 | 0.01
C2 | 10 | 0.68 | 0.51 | 0.64 | 0.45 | 0.77 | 0.38 | 0.53 | 0.30
C2 | 20 | 0.70 | 0.50 | 0.62 | 0.48 | 0.77 | 0.38 | 0.53 | 0.31
C2 | 30 | 0.69 | 0.50 | 0.63 | 0.47 | 0.75 | 0.38 | 0.54 | 0.29
C2 | 40 | 0.67 | 0.51 | 0.64 | 0.43 | 0.69 | 0.39 | 0.57 | 0.21
C2 | 50 | 0.62 | 0.53 | 0.68 | 0.37 | 0.60 | 0.42 | 0.61 | 0.09
C2 | 60 | 0.56 | 0.55 | 0.71 | 0.31 | 0.50 | 0.45 | 0.65 | −0.02
C3 | 10 | 0.71 | 0.49 | 0.62 | 0.49 | 0.77 | 0.36 | 0.52 | 0.32
C3 | 20 | 0.72 | 0.49 | 0.61 | 0.50 | 0.78 | 0.36 | 0.52 | 0.34
C3 | 30 | 0.73 | 0.48 | 0.60 | 0.51 | 0.78 | 0.36 | 0.51 | 0.35
C3 | 40 | 0.73 | 0.48 | 0.60 | 0.51 | 0.79 | 0.35 | 0.51 | 0.38
C3 | 50 | 0.74 | 0.47 | 0.59 | 0.52 | 0.78 | 0.35 | 0.51 | 0.37
C3 | 60 | 0.74 | 0.47 | 0.59 | 0.52 | 0.79 | 0.34 | 0.50 | 0.39
C4 | 10 | 0.99 | 0.11 | 0.14 | 0.97 | 0.97 | 0.11 | 0.17 | 0.93
C4 | 20 | 0.96 | 0.17 | 0.23 | 0.93 | 0.90 | 0.18 | 0.28 | 0.81
C4 | 30 | 0.93 | 0.24 | 0.33 | 0.85 | 0.82 | 0.25 | 0.37 | 0.66
C4 | 40 | 0.88 | 0.29 | 0.41 | 0.77 | 0.71 | 0.31 | 0.47 | 0.47
C4 | 50 | 0.83 | 0.34 | 0.48 | 0.68 | 0.60 | 0.37 | 0.53 | 0.32
C4 | 60 | 0.78 | 0.39 | 0.54 | 0.60 | 0.50 | 0.42 | 0.58 | 0.19
C5 | 10 | 0.99 | 0.10 | 0.14 | 0.97 | 0.97 | 0.10 | 0.17 | 0.93
C5 | 20 | 0.97 | 0.16 | 0.22 | 0.94 | 0.91 | 0.16 | 0.26 | 0.83
C5 | 30 | 0.95 | 0.21 | 0.28 | 0.89 | 0.87 | 0.21 | 0.32 | 0.75
C5 | 40 | 0.93 | 0.24 | 0.32 | 0.86 | 0.85 | 0.24 | 0.35 | 0.71
C5 | 50 | 0.91 | 0.27 | 0.35 | 0.83 | 0.84 | 0.25 | 0.36 | 0.69
C5 | 60 | 0.90 | 0.29 | 0.38 | 0.80 | 0.82 | 0.27 | 0.38 | 0.66
Global and local analysis of model interpretability
DD models are frequently referred to as black-box models because they do not consider the modeling of physical processes. To further understand the importance of features in DD model prediction, the interpretability of the model was assessed and analyzed in this section. Information about the interpretability of the model could be helpful for verifying and improving the proposed model.
Based on the global and local analysis results in this section, Ht is the most critical feature for forecasting the storm-water level. For predictions with longer lead times, future rainfall also contributes substantially to the prediction results. Therefore, for the future development of real-time operational sewer flooding systems, real-time water level monitoring data are vital for providing Ht. Furthermore, to achieve appropriate forecasting performance at longer lead times, forecasts of future rainfall are also important components of the operational system.
Performance assessment of seven stations by three MOD models
To further discuss the performance of the three MOD models, Table 6 presents the results of the models in predicting 10–60-min lead times at seven gauged sites. The results were evaluated using four commonly used indicators on the test datasets. The SVR and CGBR models performed well in terms of the lead-time-averaged CC values, which ranged from 0.57 to 0.88. Both models also yielded low lead-time-averaged RMSE values, ranging from 0.14 to 0.36 m. However, the SVR and CGBR models showed poor lead-time-averaged NSE performance at the Shezih 2 station; this can be attributed to the scarcity of the data recorded at that station, which also resulted in a high lead-time-averaged GA of 1.41 for both models. Nevertheless, based on the average prediction results from all seven stations, the SVR model had an acceptable overall performance, with a CC of 0.75, an MAE of 0.20 m, an RMSE of 0.26 m, an NSE of 0.41, and a GA of 1.13; these results are very similar to those of the CGBR model (CC = 0.73, MAE = 0.21 m, RMSE = 0.27 m, NSE = 0.38, and GA = 1.21). Therefore, the SVR and CGBR models are considered suitable for the overall simulation of storm-water levels at the seven stations.
Values are averaged over lead times of 10–60 min; CC, MAE, RMSE, and NSE are based on the test datasets.
Stations | MOD models | CC | MAE (m) | RMSE (m) | NSE | GA
---|---|---|---|---|---|---
Bailing 1 | KNR | 0.62 | 0.25 | 0.31 | 0.33 | 1.17
Bailing 1 | SVR | 0.76 | 0.19 | 0.25 | 0.54 | 1.18
Bailing 1 | CGBR | 0.71 | 0.21 | 0.26 | 0.51 | 1.25
Shuangsi right-4A | KNR | 0.58 | 0.24 | 0.32 | 0.27 | 0.99
Shuangsi right-4A | SVR | 0.61 | 0.24 | 0.31 | 0.29 | 1.01
Shuangsi right-4A | CGBR | 0.57 | 0.24 | 0.31 | 0.31 | 1.00
Shezih 2 | KNR | 0.51 | 0.22 | 0.27 | 0.03 | 1.17
Shezih 2 | SVR | 0.76 | 0.23 | 0.28 | −0.13 | 1.41
Shezih 2 | CGBR | 0.71 | 0.22 | 0.28 | −0.04 | 1.41
Dihua 2 | KNR | 0.66 | 0.14 | 0.17 | 0.35 | 1.06
Dihua 2 | SVR | 0.74 | 0.11 | 0.14 | 0.51 | 1.02
Dihua 2 | CGBR | 0.74 | 0.14 | 0.17 | 0.33 | 1.09
Kangle 1 | KNR | 0.73 | 0.21 | 0.29 | 0.33 | 1.16
Kangle 1 | SVR | 0.84 | 0.15 | 0.20 | 0.67 | 1.06
Kangle 1 | CGBR | 0.84 | 0.16 | 0.22 | 0.60 | 1.09
Kangle 2 | KNR | 0.59 | 0.38 | 0.46 | −0.31 | 1.50
Kangle 2 | SVR | 0.67 | 0.27 | 0.35 | 0.24 | 1.24
Kangle 2 | CGBR | 0.67 | 0.29 | 0.36 | 0.21 | 1.57
Yucheng 8 | KNR | 0.79 | 0.30 | 0.43 | 0.55 | 1.05
Yucheng 8 | SVR | 0.88 | 0.20 | 0.30 | 0.76 | 1.00
Yucheng 8 | CGBR | 0.88 | 0.21 | 0.31 | 0.76 | 1.08
Comparisons of the simulated results with the SWMM
The SWMM was calibrated for the urban districts of Taipei city, and the related results are shown in Table 7. For the five selected storm events, the absolute PWE did not exceed 0.5 m, indicating that the SWMM achieved acceptable performance in model calibration.
Events | Measured peak storm-water level (m) | Simulated peak storm-water level (m) | PWE (m)
---|---|---|---
Storm at the Yucheng 9 station (4 Jun 2021 from 13:00 to 19:30) | 4.34 | 4.10 | −0.24
Storm at the Dihua 1 station (4 Jun 2021 from 13:00 to 19:30) | 0.95 | 0.99 | 0.04
Storm at the Yucheng 7 station (4 Jun 2021 from 13:00 to 19:30) | 6.69 | 7.19 | 0.50
Storm at the Dihua 1 station (22 Jul 2019 from 14:00 to 18:00) | 1.26 | 1.31 | 0.05
Storm at the Yucheng 4 station (22 Jul 2019 from 14:00 to 18:00) | 4.01 | 4.33 | 0.32
Table 8 compares the performance of the four models for three selected storm events based on three indicators. All models achieved satisfactory CC values for the three storm events, varying from 0.88 to 0.99. In terms of the NSE, all models performed well at the Yucheng 8 station. Moreover, the SVR model yielded the best NSE performance at the Shezih 2 and Kangle 2 stations, whereas the lowest NSE values were obtained by the SWMM at the Shezih 2 station and by the KNR model at the Kangle 2 station.
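No. | Events | Indicators | SWMM | KNR | SVR | CGBR
---|---|---|---|---|---|---
1 | Storm at the Shezih 2 station (4 Jun 2021 from 13:00 to 19:30) | CC | 0.91 | 0.91 | 0.98 | 0.96
1 | Storm at the Shezih 2 station (4 Jun 2021 from 13:00 to 19:30) | NSE | 0.24 | 0.70 | 0.95 | 0.88
1 | Storm at the Shezih 2 station (4 Jun 2021 from 13:00 to 19:30) | PWE (m) | 0.29 | −0.21 | 0.02 | −0.01
2 | Storm at the Kangle 2 station (4 Jun 2021 from 13:00 to 17:00) | CC | 0.91 | 0.88 | 0.95 | 0.90
2 | Storm at the Kangle 2 station (4 Jun 2021 from 13:00 to 17:00) | NSE | 0.33 | 0.18 | 0.84 | 0.48
2 | Storm at the Kangle 2 station (4 Jun 2021 from 13:00 to 17:00) | PWE (m) | 0.29 | −0.01 | 0.27 | 0.38
3 | Storm at the Yucheng 8 station (22 Jul 2019 from 14:00 to 18:00) | CC | 0.99 | 0.97 | 0.99 | 0.99
3 | Storm at the Yucheng 8 station (22 Jul 2019 from 14:00 to 18:00) | NSE | 0.89 | 0.92 | 0.99 | 0.97
3 | Storm at the Yucheng 8 station (22 Jul 2019 from 14:00 to 18:00) | PWE (m) | 0.16 | −0.46 | −0.09 | 0.02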
Validation of the proposed model with independent data
Training-validation-test data splitting ratio | RMSE (m): Training | RMSE (m): Validation | RMSE (m): Test | NSE: Training | NSE: Validation | NSE: Test
---|---|---|---|---|---|---
SP1 (60:20:20) | 0.105 | 0.117 | 0.152 | 0.928 | 0.910 | 0.813
SP2 (80:10:10) | 0.106 | 0.161 | 0.148 | 0.929 | 0.798 | 0.818
SP3 (50:20:30) | 0.091 | 0.180 | 0.158 | 0.941 | 0.820 | 0.826
All RMSE and NSE values are lead-time-averaged.
Table 9 lists the performance of the proposed model with three SP ratios in terms of the lead-time-averaged RMSE and NSE. The proposed model using SP1 yielded the best overall lead-time-averaged NSE performance, with values of 0.93, 0.91, and 0.81 for the training, validation, and test phases, respectively. Its validation NSE was clearly higher than those obtained with SP2 (0.80) and SP3 (0.82), whereas the test NSE values of the three ratios were similar (0.81–0.83). In terms of the lead-time-averaged RMSE, the proposed model using SP1 also produced superior overall performance, with values of 0.11, 0.12, and 0.15 m for the training, validation, and test phases, respectively. Accordingly, the proposed model produced its best overall performance when using the SP1 dataset with a ratio of 60:20:20.
Performance evaluation of the proposed ensemble model
Phase | Indicators (lead-time-averaged) | KNR | SVR | CGBR | Proposed
---|---|---|---|---|---
Training | RMSE (m) | 0.260 | 0.203 | 0.205 | 0.105
Training | NSE | 0.549 | 0.709 | 0.713 | 0.928
Validation | RMSE (m) | 0.317 | 0.223 | 0.261 | 0.117
Validation | NSE | 0.332 | 0.639 | 0.538 | 0.910
Test | RMSE (m) | 0.321 | 0.287 | 0.250 | 0.152
Test | NSE | 0.179 | 0.313 | 0.495 | 0.813
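The improvement percentages of the proposed model over each single model can be checked against the validation-phase values in the table above. The sketch below assumes that the RRMSE is the relative reduction in RMSE and the INSE is the relative increase in NSE, both in percent; these definitions are inferred rather than quoted from the paper, and they approximately reproduce the improvements reported in the study (small differences arise because the tabulated values are rounded). The function and variable names are hypothetical.

```python
def rrmse(rmse_single, rmse_proposed):
    """Assumed definition: percentage reduction in RMSE relative to a single model."""
    return 100.0 * (rmse_single - rmse_proposed) / rmse_single


def inse(nse_single, nse_proposed):
    """Assumed definition: percentage increase in NSE relative to a single model.

    Not meaningful when the single-model NSE is zero or negative.
    """
    return 100.0 * (nse_proposed - nse_single) / nse_single


# Validation-phase, lead-time-averaged (RMSE in m, NSE) from the table above.
validation = {
    "KNR": (0.317, 0.332),
    "SVR": (0.223, 0.639),
    "CGBR": (0.261, 0.538),
}
proposed_rmse, proposed_nse = 0.117, 0.910

improvements = {}
for model, (single_rmse, single_nse) in validation.items():
    improvements[model] = (
        rrmse(single_rmse, proposed_rmse),
        inse(single_nse, proposed_nse),
    )
    print(f"{model}: RRMSE = {improvements[model][0]:.1f}%, "
          f"INSE = {improvements[model][1]:.1f}%")
```

The very large INSE for the KNR model mainly reflects its low baseline NSE, so the two indicators are best read together.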
Comparisons with previous studies
In this study, the proposed model was trained using the outputs predicted by three MOD models to forecast storm-water levels with lead times of 10–60 min. The trained model was then applied to simulate storm-water levels at the Bailing 1 station. The results indicated that the proposed model with SP1 achieved superior prediction accuracy based on the overall lead-time-averaged performance. To place the present study in context, Table 11 lists the differences between the present study and previous studies on urban flood simulation. First, the studies by Yang & Chang (2020) and Xu et al. (2023) employed hydrodynamic models (SOBEK and SWMM, respectively) to generate data required for model training, whereas the present study used only observed data without any data created from the SWMM. Regarding the time scale, Yang & Chang (2020) and Xu et al. (2023) investigated urban floods on an hourly time scale, whereas Moon et al. (2023) and the present study made predictions on a 10-min time scale. Although the studies in Table 11 applied different models, all of them reported reasonable overall performance in terms of NSE values. However, the proposed model still achieved superior performance compared to other models, even when adopting the least amount of data. Therefore, the proposed model shows promise, robustness, and applicability in predicting storm-water levels.
Concepts of comparisons | Yang & Chang (2020) | Xu et al. (2023) | Moon et al. (2023) | Present study (2024)
---|---|---|---|---
Study area | Erren River basin, Taiwan | Haidian Island, China | Dorim stream, South Korea | Seven water level monitoring stations, Taipei City, Taiwan
Study hydrology issue | Forecasting regional flood inundation depth | Prediction of urban flood depth | Forecasting urban flood water level | Short-duration prediction of urban storm-water levels
Hydrologic data | 631 hourly datasets, including measured data and data simulated by the SOBEK model | 490 hourly datasets, including seven return period simulations by SWMM | 10-min dataset collected over 21 years (2001–2021) | 10-min datasets spanning nearly 6 years (2016–2021)
Methodology | The recurrent nonlinear autoregressive with exogenous inputs model (RNARX) | LightGBM, RFR, XGBoost, and KNR | LSTM combined with SWMM | The three MOD models (KNR, SVR, and CGBR) and a proposed ensemble-interpretable-based model
Conclusions | The RNARX model achieved reliable forecasts for the next 3 h with an NSE value of 0.84 | The LightGBM model obtained a better NSE performance of 0.98 compared to the other three models | The LSTM model yielded good performance in terms of NSE, with a value greater than 0.8 | The proposed model achieved high accuracy in terms of NSE, with values of 0.93, 0.91, and 0.81 in the training, validation, and test stages, respectively
Suggestions for future application of the proposed ensemble model
For time series prediction using DD models, two major factors can affect the prediction performance: the quality of the dataset and the model algorithm. To explore the influence of the dataset on the predictions, this study examined three different SP ratios using the proposed model; the results indicated that SP1 (60:20:20) was accurate and suitable for the present study area. However, the results may vary depending on the application area. Therefore, it is suggested that a data-splitting test be conducted before model validation to ensure the accuracy of the predictions.
In this study, the proposed model is based mainly on the framework of ensemble learning. By combining several DD models, ensemble learning aims to improve the prediction performance for a single target. However, if there are many target stations, the method needs to be modified or improved. For prediction purposes involving multiple different tasks, a meta-learning method can further adapt individual models to become more robust. This study employed three MOD models as the base learners, and the CGBR model was selected as the meta-learner, leading to the proposed integrated model. On the basis of the present results, the proposed model can still improve the prediction performance when using only three base learners. Future research could consider additional base learners to increase model diversity, which may enhance model performance.
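The idea of combining base-learner predictions through a meta-learner and then correcting the remaining error can be sketched in plain Python. This is not the study's implementation: an ordinary least-squares meta-learner stands in for the CGBR meta-learner, the residual-error correction is assumed to be a first-order autoregressive (AR(1)) model of the meta-learner's residuals, only a single lead time is considered, and all numbers and function names are invented for illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta(base_preds, obs):
    """Least-squares weights (intercept first) mapping base predictions to obs."""
    rows = [[1.0] + list(p) for p in base_preds]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * o for r, o in zip(rows, obs)) for i in range(k)]
    return solve(xtx, xty)


def meta_predict(weights, preds):
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds))


def fit_residual_ar1(residuals):
    """Least-squares AR(1) coefficient: e[t] is approximated by rho * e[t-1]."""
    num = sum(e1 * e0 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e0 * e0 for e0 in residuals[:-1])
    return num / den if den else 0.0


def sse(obs, sim):
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


# Invented observed levels (m) and base-learner predictions for one lead time.
obs = [1.0, 1.2, 1.8, 2.6, 3.1, 2.9, 2.4, 1.9, 1.5, 1.2, 1.1, 1.0]
knr = [1.1, 1.1, 1.5, 2.2, 2.8, 3.0, 2.6, 2.1, 1.7, 1.3, 1.2, 1.1]
svr = [0.9, 1.3, 1.9, 2.4, 3.0, 2.8, 2.5, 1.8, 1.6, 1.1, 1.0, 1.1]
cgbr = [1.0, 1.1, 1.7, 2.7, 3.3, 2.7, 2.3, 2.0, 1.4, 1.3, 1.1, 0.9]

base_preds = list(zip(knr, svr, cgbr))
weights = fit_meta(base_preds, obs)
meta = [meta_predict(weights, p) for p in base_preds]

# Residual-error correction: add rho times the last observed residual.
resid = [o - m for o, m in zip(obs, meta)]
rho = fit_residual_ar1(resid)
corrected = [meta[0]] + [m + rho * e for m, e in zip(meta[1:], resid[:-1])]

print("meta SSE:", sse(obs, meta))
print("corrected SSE (t >= 1):", sse(obs[1:], corrected[1:]))
```

The sketch fits and evaluates on the same samples only to keep it short; in practice the meta-learner and the residual model would be fitted on training data and assessed on held-out validation and test data.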
Furthermore, the BO and SHAP methods presented in this study are effective and useful methods for identifying the optimal parameters and exploring the feature importance of the model. The results suggested that the inputs related to future rainfall significantly contributed to the predictions. Therefore, accurately forecasting future rainfall is crucial for establishing an operational sewer flooding forecasting system.
CONCLUSION
This study proposed three MOD models, KNR, SVR, and CGBR, for urban storm-water sewer flooding predictions with lead times up to 60 min. The three MOD models were further extended into a new ensemble model with residual-error correction through a multiple-output framework. The performances of the three MOD models and the new ensemble model were evaluated at urban storm-water gauging sites in Taipei city, Taiwan. In addition, the interpretability of the proposed model for storm-water level predictions was investigated using the SHAP method. Moreover, the proposed model was validated with independent data based on three different ratios of training-validation-test data splitting.
The most important findings of this study are summarized as follows: (1) The SVR and CGBR models yielded reasonable overall performances at seven stations compared to those of the KNR and SWMM models. (2) Compared to the single KNR, SVR, and CGBR models, the proposed new ensemble-based sewer flooding model has superior performance, with RRMSE improvements of 63.1, 47.6, and 55.2%, respectively, and INSE improvements of 174.5, 42.4, and 69.4%, respectively. (3) Based on the comparisons of the present study with previous studies, the proposed integrated model was found to be an effective and precise method for predicting short-duration storm-water levels in urban areas without requiring additional synthetic datasets from PP models.
The lead time considered in this study was limited to 60 min. To enhance the flood disaster response, extending the lead time beyond 60 min may be necessary. Future studies will investigate extending the lead time while maintaining the prediction accuracy. In addition, the proposed models were trained primarily on individual stations. Including feature factors from neighboring stations may affect the prediction results of the target station. Recently, the graph convolutional network (GCN) has been successfully applied as a robust and advanced technique for traffic forecasting. This technique uses spatial-temporal time series data to train graph networks and can be applied to hydrology problems. Therefore, our future research will utilize GCN techniques to explore the potential for predicting longer-duration water levels in urban storm-water sewers.
ACKNOWLEDGEMENTS
The authors thank the Hydraulic Engineering Office, Public Works Department, Taipei City Government, Taiwan, for providing the rainfall and water level data at the study stations.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.