Predicting water levels in urban storm-water sewer systems provides vital information for reducing the risk of flooding. This study proposed a new ensemble model that integrates a meta-learner model, residual-error correction, and a multiple-output framework. To build the meta-learner model, three multiple-output data-driven (MOD) sewer flooding models employing support vector regression (SVR), k-nearest neighbor regression (KNR), and categorical gradient boosting regression (CGBR) techniques were constructed and applied to predict the short-duration evolution of water levels at seven storm-water gauging sites in Taipei city, Taiwan, using 10-min datasets spanning nearly 6 years (2016–2021). The Bayesian optimization algorithm was used in the training phase of all models to avoid overfitting or underfitting. Feature importance was also analyzed with the SHapley Additive exPlanation (SHAP) algorithm to explore model interpretability. The outputs of the storm-water management model (SWMM) were used as benchmark solutions. In the model validation phase, the proposed integrated model improved the lead-time-averaged Nash–Sutcliffe efficiency of the single KNR, SVR, and CGBR models by 174.5, 42.4, and 69.4%, respectively, showing that the proposed model could be useful for urban flood warning systems.

  • A new ensemble model based on the integration of meta-learning, residual-error correction, and multiple-output framework was proposed.

  • The proposed model improved the lead-time-averaged Nash–Sutcliffe efficiency of single KNR, SVR, and CGBR models by 174.5, 42.4, and 69.4%, respectively.

  • The proposed model was found to be an effective and precise method for predicting short-duration storm-water levels.

Flooding is one of the most common natural hazards worldwide. Between 2000 and 2019, floods affected 1.6 billion people, resulting in an estimated economic loss of $651 billion (USD). The frequency, impact, duration, and intensity of floods are increasing due to rapid economic growth in urban areas and the global climate (Devitt et al. 2023). In Taiwan, the frequency of short-duration heavy rainfall events has significantly increased. Urban storm-water drainage systems are usually unable to meet the needs of existing and future urban development. When rainwater exceeds the design capacity of urban storm-water sewer systems during the rainy or typhoon season, the main sewers can overflow, causing further surface flooding in the city. In 2016, Typhoon Megi caused severe damage to areas such as Tainan and Kaohsiung, resulting in severe flooding and agricultural losses of NTD 3.3 billion. In 2017, Typhoons Nesat and Haitang severely damaged Tainan city and Pingtung County and caused severe flooding. On 22 July 2019, between 15:00 and 18:00, a strong convection brought severe short-duration heavy rainfall to areas in Taipei city and New Taipei city, causing 248 flood disasters and affecting driving safety. On 4 June 2021, Taiwan was affected by the approaching front and peripheral circulation of Typhoon Choi-wan, which caused short-duration heavy rainfall in many areas of Taipei city and New Taipei city. The observational data showed that the maximum hourly rainfall was 137.5 mm at the Fuzhou station in the Daan district of Taipei city. Therefore, the impact of flooding in urban areas on the safety of people and on property is considerable.

To reduce disaster losses due to urban flooding, numerical simulation analysis of urban flooding is one of the most essential approaches to nonengineering disaster mitigation. With regard to dynamic urban flood simulation, several physical process-based (PP) models have been developed and applied in practical areas, such as the SOBEK model (Poretti & De Amicis 2011), the DHI MIKE model (Nigussie & Altunkaynak 2019; Zhou et al. 2022), the two-dimensional flood routing model (FLO-2D) (Wang et al. 2022), the storm-water management model (SWMM) (Farina et al. 2023; Zhuang et al. 2023), and the Hydrologic Engineering Center's River Analysis System (HEC-RAS) 2D flood model (Yazdi et al. 2022; Shaikh et al. 2023). Generally, these PP models can achieve relatively accurate performances because they consider hydrological processes. However, the PP models usually employ some assumed conditions to simplify the governing equations. Therefore, the simulation results may not be entirely consistent with the actual situation. In addition, some field-measured data may be difficult to obtain, which could make it more complicated to implement data preprocessing and parameter calibrations of the model. Moreover, PP models are time-consuming and thus inefficient for real-time early flood warning and flood operation.

Compared to PP models, the data-driven (DD) model proposed in recent years overcomes the assumptions and restrictions of PP models. The DD model also does not require hydrology or geographical parameters for model calibrations. The DD model only requires time series data to implement model training, validation, and testing, leading to fast prediction processing. Several DD models have been applied to river flooding simulations, such as river stage predictions and rainfall–runoff simulations, which play essential roles in preventing and mitigating flood disasters. For instance, Badrzadeh et al. (2015) proposed and compared four DD models, namely, artificial neural networks (ANNs), adaptive neuro-fuzzy inference systems (ANFISs), wavelet neural networks, and hybrid wavelet-based models, for flood forecasting with lead times of 1–48 h at the Richmond River, Australia. Their results indicated that the hybrid model produced the best performance compared to the ANFIS and ANN models. Wang et al. (2019) proposed a dilated causal convolutional neural network (CNN) model for river stage forecasting in the Yilan River basin in Taiwan. The proposed model was also compared with support vector regression (SVR) and multilayer perceptron regression (MLPR) models, which achieved the best prediction performance for lead times of 1–6 h. Dazzi et al. (2021) employed SVR, MLPR, and long short-term memory (LSTM) models to forecast the river stage with 1–9 h lead times in the Parma River, Italy. The results suggested that the LSTM model is the most accurate in their study area and can be helpful for the development of a flood operational forecasting system. Guo et al. (2021) proposed and employed four DD models, namely, SVR, random forest regression (RFR), MLPR, and light gradient boosting machine (LightGBM), to predict the river stage at the tidal reach of the Lan-Yang River Basin, Taiwan. 
Based on their simulated results with 1–6 h lead times, the LightGBM model achieved the best overall performance among the four tested models. Belyakova et al. (2022) employed three DD models, namely, the M5P decision tree model, extreme gradient boosting (XGBoost), and MLPR model, to predict river stage at lead times of 1–20 h for the Pshish and Mzymta Rivers in the western Caucasus. Their results revealed that the most suitable models for the Pshish and Mzymta Rivers were XGBoost and MLPR. Kumar et al. (2023) conducted a comparative analysis of several DD models for predicting the river streamflow of the Garudeshwar watershed, India. They presented several models, including k-nearest neighbor regression (KNR), RFR, LightGBM, linear regression, MLPR, XGBoost, and categorical gradient boosting regression (CGBR). Their results revealed that the CGBR model achieved the best performance. Vizi et al. (2023) employed the LSTM model for water level prediction on the Tisza River, Central Europe. They found that the LSTM model provided accurate and reasonable 7-day-ahead forecasts.

Regarding the application of DD models to urban inundation simulation, the datasets for the observational inundation depth and area in the urban study area are relatively smaller than those for the river stage or discharge. Therefore, to enhance the performance, further advanced studies could combine the DD and PP models for urban flood simulations. When the dataset in the study urban area is limited, the PP model can be employed to simulate flooding and create the required dataset. The generated dataset is then combined with the DD model to learn the nonlinear relationship between inputs and outputs. Accordingly, the hybrid model could achieve accurate performance consistent with that of the PP model and save more significant investment in computational time. For example, Lin et al. (2013) proposed an urban flood prediction model by combining the k-means clustering algorithm and the SVR for forecasting flood levels with 1–3-h lead times in Yunlin County, Taiwan. Their results indicated that the proposed model, combined with point and spatial expansion forecasting, can quickly and accurately simulate both flood depth and inundation area. Yan et al. (2018) investigated the integration of the SVR with the numerical MIKE flood model for simulating urban floods in the Jinlong River Basin, Hangzhou, China. Berkhahn et al. (2019) employed a 2D hydrodynamic model to produce a dataset for training and testing an ANN model. Their model was successfully applied to predict flood maps, including the maximum floodwater levels, in two distinct urban catchments. Bermúdez et al. (2019) presented the application of the SVR model to the spatial prediction of urban floods in the urban area of Vilagarcia de Arousa, Northwest Spain. The results demonstrated that the SVR model produced reasonable and appropriate predictions for flood hazard mapping in coastal urban areas. Kabir et al. (2020) proposed a rapid prediction model for fluvial flood inundation. 
Their proposed model combined the CNN model with the hydraulic LISFLOOD-FP model. They found that the CNN model had an outstanding ability to model the floodwater depth in real time. Kim & Han (2020) employed the nonlinear autoregressive with exogenous inputs (NARX) model in conjunction with a self-organizing map to propose a rapid simulation model for urban flood prediction. Their proposed model successfully predicted the accumulative overflow at all manhole positions in various drainage districts in Seoul within a computational time of 2 min. Yan et al. (2021) proposed a novel neural network-based DD model coupled with the personal computer-based SWMM to predict the maximum flood depth at stations with multiple risks. Using the weighted redistribution approach, Hou et al. (2021) employed the RFR and KNR models to propose an integrated DD model for simulating urban flooding. Their results showed that the coupling DD model can enhance the prediction accuracy in comparison to the single RFR and KNR models. Moon et al. (2023) employed the SWMM model to simulate urban flooding and generated the dataset for training the LSTM model. The results indicated that the proposed hybrid model achieved a remarkable prediction performance, with a Nash–Sutcliffe efficiency (NSE) of over 0.8 for the case study in the Dorim Stream Basin, Korea. Xu et al. (2023) proposed a rapid urban flood prediction model based on the combination of SWMM with the LightGBM model. The results of their application to Taidian Island, Hainan Province, China, indicated that the LightGBM model achieved superior performance to the RFR, XGBoost, and KNR models.

According to the literature reviewed, the hybrid model was found to be the most commonly used method for predicting the urban flooding. First, well-calibrated PP models, such as the MIKE or SOBEK models, were adopted to create the required dataset. Then, sufficient data were provided for the DD models during the model training and testing periods. Therefore, the hybrid model not only achieves the proper and satisfactory prediction performance when compared to the PP model but can also satisfy the requirement of fast real-time prediction. However, the quality of the dataset could affect the prediction performance of the models. The data created by the PP models will differ from the actual observational data. More recently, sensors based on Internet of Things (IoT) technology have been increasingly employed to monitor flood detection (Sood et al. 2018; Goudarzi et al. 2021). The IoT sensor data can be utilized to produce a large dataset for the application of DD models. Yang & Chang (2020) combined IoT sensor data with the simulated results obtained from the SOBEK inundation model for predicting the regional average inundation depth in the Erren River basin, Taiwan. Their results suggested that adding IoT sensor data to the inputs of DD models can significantly reduce prediction errors. Therefore, this information provides promising evidence that the application of IoT sensor data can substantially improve the performance of DD models. Since the amount of IoT sensor data in Taipei City, Taiwan, is increasing annually, this study aims to utilize IoT sensor data to train and construct a DD model without considering the dataset generated by the PP model.

Therefore, this study proposed a new ensemble model based on the combination of three multiple-output data-driven (MOD) models with residual-error correction for short-duration predictions of water levels in urban storm-water sewer systems. Three MOD models, KNR, SVR, and CGBR, were applied to predict the storm-water levels at seven stations in Taipei city. The performances of the three MOD models were evaluated and analyzed using several evaluation criteria, including the correlation coefficient (CC), mean absolute error (MAE), root mean square error (RMSE), NSE, peak water level error (PWE), generalization ability (GA), reduction in the percentage error of the lead-time-averaged RMSE (RRMSE), and improvement in the percentage of error of the lead-time-averaged NSE (INSE) (Getirana et al. 2020; Guo et al. 2023). To further understand the interpretability of the model, the SHapley Additive exPlanation (SHAP) algorithm was utilized to investigate the importance of both global and local features. A sensitivity analysis was also conducted to highlight the applicability of the models by examining the impact of Bayesian optimization (BO) and input combinations on the prediction performance. Moreover, the prediction results obtained by the SWMM were compared with those achieved by the three MOD models for the selection of the most appropriate model. The most appropriate sewer flooding model was further employed, associated with the residual-error time series dataset for training and constructing a new ensemble model. To evaluate the capability of the proposed new ensemble model, analyses of the model validation with independent data were performed and studied. Comparisons of the results of the present study with those of other previous studies were presented to show the performance, potential, and reliability of the proposed model.
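For readers unfamiliar with the evaluation criteria listed above, the following minimal sketch computes four of them (MAE, RMSE, NSE, and CC) using their standard textbook definitions. The function names and the sample values are illustrative assumptions and are not taken from the study; the remaining criteria (PWE, GA, RRMSE, and INSE) are not reproduced here because their exact definitions are not given in this section.

```python
import math

def mae(observed, simulated):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)

def rmse(observed, simulated):
    """Root mean square error."""
    return math.sqrt(
        sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    )

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus (error variance / observed variance)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - numerator / denominator

def cc(observed, simulated):
    """Pearson correlation coefficient."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov / math.sqrt(var_obs * var_sim)

# Hypothetical water levels (m), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

print(mae(observed, simulated), rmse(observed, simulated))
print(nse(observed, simulated), cc(observed, simulated))
```

An NSE of 1 indicates a perfect match, whereas values at or below 0 indicate that the model performs no better than the mean of the observations.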

Although the three individual MOD techniques are not new, the diversity among them provides a suitable architecture for constructing a new ensemble model. By coupling the three MOD models with residual-error correction, the present study proposed a new ensemble-based multiple-output model. Hence, the novelty of the present study lies in applying the proposed model to urban storm-water level prediction under hydrological challenges (i.e., short-duration conditions, limited datasets, and extreme storm events), which provides a good opportunity to test the potential of the model. Consequently, the importance and originality of this study are as follows: (1) it investigated the performances of three MOD models and a newly proposed ensemble model for predicting short-duration water levels with lead times of 10–60 min in urban storm-water sewers; (2) it contributed to a deeper investigation of feature importance and model interpretability for urban flooding using the SHAP algorithm; and (3) it is the first study to explain the differences between MOD-based sewer flooding models and the SWMM in order to highlight the advantages of the proposed models.

With respect to disaster prevention and maintenance management, obtaining real-time monitoring data from storm-water sewer systems enables the most accurate monitoring of current storm-water runoff situations in sewers. Further analysis of real-time monitoring data can offer insights into the state of urban storm-water sewers, which could be valuable for early flood warnings.

This study focused on four administrative districts in Taipei city, namely, Beitou, Shilin, Neihu, and Xinyi, as shown in Figure 1. Seven water level monitoring stations, namely, Bailing 1, Shuangsi right-4A, Shezih 2, Dihua 2, Kangle 1, Kangle 2, and Yucheng 8, were further selected as the targets of the storm-water gauged sites. During typhoons or storms, the rainfall often exceeds the designed capacity at these vulnerable stations, thus leading to urban flooding.
Figure 1

The map of the urban area studied, showing the rainfall stations, water level gauging stations, and storm sewer systems.


The Hydraulic Engineering Office, Public Works Department, Taipei City Government, Taiwan, provided the monitoring data. The real-time data were recorded by pressure-type water level sensors from 2016 to 2021. The hydrograph of the storm-water level may change significantly during typhoon or storm periods. Therefore, this study selected several major events to produce an event-based 10-min dataset for model training, validation, and testing. Table 1 summarizes the detailed information for these seven stations, including the area of the drainage region, the number of datasets, and the storm-water level statistics. Among the seven stations, the Yucheng 8 station has the largest drainage region (1,623 ha), and its storm-water level ranges from 0.84 to 5.40 m. The Bailing 1 station has the largest dataset (1,385 records), and its storm-water level ranges from 0.6 to 2.76 m. Furthermore, rainfall is the primary and direct factor influencing storm-water level hydrographs in storm-water sewer systems. Therefore, this study also collected 10-min rainfall data from four rainfall stations, Jiyan, Fude, Donghu Elementary School, and Civic Center, as shown in Figure 1.

Table 1

The summarized information related to the study stations

| District | Station | Name of drainage region | Area of drainage region (ha) | Number of datasets | Number of events | Minimum storm-water level (m) | Average storm-water level (m) | Maximum storm-water level (m) |
|---|---|---|---|---|---|---|---|---|
| Beitou | Bailing 1 | Bailing | 638.74 | 1,385 | 30 | 0.6 | 1.28 | 2.76 |
| Shilin | Shuangsi right-4A | Wunchang | 56.26 | 703 | 19 | 1.22 | 2.32 | 4.55 |
| | Shezih 2 | Shezih | 67.95 | 305 | 13 | −0.13 | 0.57 | 1.95 |
| | Dihua 2 | Dihua | 170.47 | 808 | 39 | 0.52 | 0.97 | 1.87 |
| Neihu | Kangle 1 | Kangle | 189.91 | 341 | 15 | 5.45 | 5.79 | 7.36 |
| | Kangle 2 | Kangle | 189.91 | 421 | 24 | 6.21 | 7.16 | 8.46 |
| Xinyi | Yucheng 8 | Yucheng | 1,623.01 | 664 | 15 | 0.84 | 2.20 | 5.40 |

SWMM model

Urban flooding areas are usually affected by the functions of urban drainage facilities. Rainfall surface runoff is the primary source of water entering drainage systems. Rainwater flows into storm-water sewers via streets and gutters and is transported to water gates or pumping stations. It is then discharged into the adjacent drainage system, river, or sea. Hence, urban flood simulations should consider all the dynamic processes in urban storm-water drainage systems, including rainfall-runoff, storm-water sewer systems, water gates, and pumping station operations, to improve the prediction performance.

The SWMM was developed by the United States Environmental Protection Agency (EPA) and is widely used in storm-water drainage system planning, analysis, and design (Farina et al. 2023; Zhuang et al. 2023). The model is primarily based on the one-dimensional (1D) continuity equation and dynamic flow theory and comprises the surface runoff (RUNOFF) and extended transport (EXTRAN) modules.

After rain falls on the ground, the resulting surface runoff flows into the main drainage pipes. The SWMM employs the RUNOFF module to simulate the rainfall–runoff hydrograph of each catchment area. In the EXTRAN module, an iterative method is employed to solve the dynamic flow equations and calculate the discharge of the main storm-water drainage system and the water volume overflowing from manholes. The 1D Saint–Venant equations can be expressed as (Farina et al. 2023):
$$\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0 \tag{1}$$
$$\frac{\partial V}{\partial t} + V \frac{\partial V}{\partial x} + g \frac{\partial y}{\partial x} = g \left( S_0 - S_f \right) \tag{2}$$
in which $A$ is the cross-sectional flow area; $Q$ denotes the discharge; $x$ is the spatial coordinate along the direction of flow; $V$ is the cross-sectional averaged velocity; $y$ is the water depth; $t$ stands for the time; $g$ represents the acceleration of gravity; $n$ denotes the Manning roughness coefficient, which is used to evaluate $S_f$; $S_0$ stands for the bed slope term; and $S_f$ is the energy slope term. Equations (1) and (2) represent the continuity and momentum equations, respectively, for 1D gradually varied unsteady flow without lateral inflow.

With respect to the simulation analysis using the SWMM, it is essential to gather diverse data, including the drainage conditions of regional storm-water sewer systems, the urban planning subzone of land use, and precipitation type data. Therefore, the SWMM model inputs include rainfall time series data, digital elevation data, pipeline data, manhole data, pumping station data, and drainage system data. The outputs of the SWMM model are the hydrographs of the urban storm-water levels.

The most common way to evaluate the performance of DD models is to compare simulation results with measured data. In addition, the SWMM, which is a type of PP model, is commonly used as a benchmark for comparison with DD models in applications including rainfall–runoff modeling (Wang & Altunkaynak 2012; Granata et al. 2016) and urban sewer flood simulation (Xu et al. 2023). The SWMM is also one of the most commonly used urban flood prediction models, particularly in Taiwan. Due to the increasing amount of observed data from monitored stations in urban areas of Taiwan, the basic dataset required for the SWMM is updated annually. Parameter calibrations and model verifications are also performed in flood-prone urban areas. Moreover, an advantage of the SWMM is that it can be readily combined with a 2D flood model to produce operational flood forecasts as well as flood potential maps (Chang et al. 2021). Furthermore, the SWMM can be integrated with cloud computing services to establish a real-time storm sewer simulation system for early flood warning in urban areas (Lin et al. 2023). Based on these advantages, the SWMM was employed in this study as a benchmark for comparison with the proposed models.

Proposed MOD models

Multistep time series prediction, in which the outputs at several future time steps must be predicted, is a widely studied problem. The four commonly used frameworks for performing multistep predictions are the direct, recursive, hybrid direct-recursive, and multiple-output frameworks (Ben Taieb et al. 2010; Bontempi et al. 2013). The direct framework employs a separate one-step prediction model for each time step. The recursive framework applies a one-step model repeatedly, where the prediction for the previous time step is used as an input to predict the next time step. The direct and recursive frameworks can be combined, resulting in a hybrid methodology that aims to achieve the benefits of both.

Furthermore, the multiple-output framework creates a single model capable of predicting all outputs in a single time step. The multiple-output framework has a more complex structure than the other three frameworks. However, it can learn the dependence relationships between the inputs and the outputs and among the outputs themselves. Increased complexity can lead to slower training times, but it can enhance the accuracy in the multistep time series prediction problem. The multiple-output framework can be applied to time series problems with multitarget outputs or multistep-ahead prediction. This study utilized the multiple-output framework as the primary methodology to propose three MOD models with multistep-ahead prediction using the KNR, SVR, and CGBR techniques.
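The difference between the direct, recursive, and multiple-output frameworks can be illustrated with a minimal sketch. The toy models below simply extrapolate the latest trend; they are hypothetical stand-ins for trained regressors and are not the models used in this study.

```python
def recursive_forecast(one_step_model, history, n_steps):
    """Recursive framework: reuse one one-step model, feeding predictions back in."""
    window = list(history)
    forecasts = []
    for _ in range(n_steps):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return forecasts

def direct_forecast(models_per_step, history):
    """Direct framework: one separate model per lead time."""
    return [model(history) for model in models_per_step]

def multiple_output_forecast(multi_output_model, history):
    """Multiple-output framework: a single model returns all lead times at once."""
    return list(multi_output_model(history))

def trend(window):
    return window[-1] - window[-2]

def one_step(window):
    return window[-1] + trend(window)

def make_direct_model(lead):
    return lambda window: window[-1] + lead * trend(window)

def multi_model(window):
    return [window[-1] + lead * trend(window) for lead in (1, 2, 3)]

history = [1.0, 2.0, 3.0]  # hypothetical water levels (m)
recursive_result = recursive_forecast(one_step, history, 3)
direct_result = direct_forecast([make_direct_model(h) for h in (1, 2, 3)], history)
multi_result = multiple_output_forecast(multi_model, history)
print(recursive_result, direct_result, multi_result)
```

With these linear toy models all three frameworks agree; with real regressors they generally differ, because the recursive framework accumulates its own prediction errors whereas the multiple-output framework learns all lead times jointly.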

By applying the multiple-output framework for predicting urban storm-water levels, the framework of the proposed MOD model can be expressed as follows (Solomatine & Ostfeld 2008; Bontempi et al. 2013):
$$Y = f(X) \tag{3}$$
where Y is the output vector and X represents the inputs, which are composed of three vectors, expressed by (Guo et al. 2023):
(4)
with
(5)
(6)
(7)
where is the vector of antecedent rainfall, is the vector of future rainfall, is the vector of antecedent storm-water level, L is the lead time, t is the time step and f is the regression function of the model.
By integrating all three vectors of inputs, Equation (4) can be further expressed as follows (Guo et al. 2023):
(8)
In this study, the multiple-output framework is employed in the DD model structure, and thus, the output vectors can be expressed as (Guo et al. 2023):
(9)
in which denotes the water level in storm-water sewer systems at a lead time of t + L.
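As an illustration of how such input and output vectors could be assembled from 10-min time series, the following minimal sketch builds training samples for a multiple-output model. The function name, the numbers of antecedent time steps (`n_lag_rain`, `n_lag_level`), and the sample values are assumptions made for this example only; the lag structure actually used in the study is not specified here.

```python
def build_samples(rain, level, n_lag_rain, n_lag_level, max_lead):
    """Assemble (X, Y) pairs for a multiple-output model.

    Each input row concatenates antecedent rainfall, future rainfall, and
    antecedent water levels; each output row holds the water levels at lead
    times t+1 ... t+max_lead.
    """
    rain, level = list(rain), list(level)
    inputs, outputs = [], []
    first_t = max(n_lag_rain, n_lag_level) - 1
    last_t = len(level) - max_lead - 1
    for t in range(first_t, last_t + 1):
        antecedent_rain = rain[t - n_lag_rain + 1 : t + 1]
        future_rain = rain[t + 1 : t + max_lead + 1]
        antecedent_level = level[t - n_lag_level + 1 : t + 1]
        inputs.append(antecedent_rain + future_rain + antecedent_level)
        outputs.append(level[t + 1 : t + max_lead + 1])
    return inputs, outputs

# Hypothetical series; one value per 10-min time step.
rain = [0, 1, 2, 3, 4, 5, 6, 7]
level = [10, 11, 12, 13, 14, 15, 16, 17]
X, Y = build_samples(rain, level, n_lag_rain=2, n_lag_level=2, max_lead=3)
print(X[0], Y[0])
```

For lead times of 10–60 min at a 10-min resolution, `max_lead` would be 6, so that each output row contains six future water levels.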

KNR technique

The KNR model is a straightforward DD model suitable for nonparametric supervised learning. In practical applications, the KNR model identifies the k training samples that are nearest to the queried input. The output of the KNR model is then based on the average value of these k closest neighbors.

The algorithm of the KNR model for continuous variable estimation (i.e., the time series prediction problem) can be summarized as follows (Hou et al. 2021; Beskopylny et al. 2022):

  • 1. The Mahalanobis or Euclidean distance between the queried and labeled examples is calculated.

  • 2. The distance is scaled according to the order of the labeled examples.

  • 3. Based on the cross-validation using the RMSE indicator, the optimal number of nearest neighbors k is estimated.

  • 4. The k-nearest multivariate neighbors are utilized to compute the inverse-distance-weighted average.
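The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it uses the Euclidean distance only, omits the distance scaling of step 2, and selects k by leave-one-out cross-validation; the function names and data are hypothetical.

```python
import math

def knr_predict(train_X, train_Y, query, k, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest training outputs."""
    neighbors = sorted(
        zip(train_X, train_Y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    # eps avoids division by zero when the query coincides with a sample.
    weights = [1.0 / (math.dist(x, query) + eps) for x, _ in neighbors]
    total = sum(weights)
    n_outputs = len(train_Y[0])
    return [
        sum(w * y[j] for w, (_, y) in zip(weights, neighbors)) / total
        for j in range(n_outputs)
    ]

def select_k(train_X, train_Y, candidates):
    """Pick the k with the lowest leave-one-out RMSE."""
    best_k, best_rmse = None, float("inf")
    for k in candidates:
        squared_error, count = 0.0, 0
        for i in range(len(train_X)):
            rest_X = train_X[:i] + train_X[i + 1 :]
            rest_Y = train_Y[:i] + train_Y[i + 1 :]
            pred = knr_predict(rest_X, rest_Y, train_X[i], k)
            squared_error += sum((p - y) ** 2 for p, y in zip(pred, train_Y[i]))
            count += len(pred)
        rmse = math.sqrt(squared_error / count)
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k

# Hypothetical data: one input feature, two outputs (e.g., two lead times).
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_Y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [10.0, 20.0]]
prediction = knr_predict(train_X, train_Y, [0.5], k=2)
chosen_k = select_k(train_X, train_Y, candidates=[1, 2, 3])
print(prediction, chosen_k)
```

Because each neighbor carries a full output vector, the same routine returns predictions for all lead times at once, which matches the multiple-output framework.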

SVR technique

SVR is a commonly used DD model. The SVR model utilizes the concept of structural risk minimization to express the regression function as follows (Faruq et al. 2021; Masood et al. 2023):
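$$f(X) = w \cdot \varphi(X) + b \tag{10}$$
where $f$ denotes the regression function, $X$ is the input vector, $w$ is the weight vector of the regression function, $\varphi$ is a nonlinear mapping function, and $b$ represents the bias value.
By introducing two slack variables, the regression problem can be further rewritten as (Bazrkar & Chu 2022):
$$\min_{w,\, b,\, \xi,\, \xi^*} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{m} \left( \xi_i + \xi_i^* \right) \tag{11}$$
$$\text{subject to} \quad \begin{cases} Y_i - f(X_i) \le \varepsilon + \xi_i \\ f(X_i) - Y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{12}$$
where $i$ is the dataset index, $m$ is the total number of datasets, $Y$ is the output of the prediction target, $C$ denotes the penalty coefficient, $\xi_i$ and $\xi_i^*$ represent the slack variables, and $\varepsilon$ represents the parameter of the insensitive loss function.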
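To make the role of the penalty coefficient C and the insensitive-loss parameter ε concrete, the following minimal sketch evaluates the standard ε-insensitive SVR objective for a given linear model. It assumes an identity mapping (a linear kernel) and uses hypothetical data; it evaluates the objective rather than solving the optimization, and it is not the implementation used in the study.

```python
def svr_primal_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Value of 0.5*||w||^2 + C * sum of slacks for a linear model.

    For fixed w and b, the optimal total slack of a sample equals
    max(0, |y_i - f(x_i)| - epsilon): errors inside the epsilon tube cost nothing.
    """
    regularization = 0.5 * sum(wi * wi for wi in w)
    total_slack = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(wi * xij for wi, xij in zip(w, xi)) + b
        total_slack += max(0.0, abs(yi - prediction) - epsilon)
    return regularization + C * total_slack

# Hypothetical samples: the third point lies 0.5 away from the line f(x) = x.
X = [[1.0], [2.0], [3.0]]
y = [1.05, 2.0, 3.5]
objective = svr_primal_objective([1.0], 0.0, X, y, C=10.0, epsilon=0.1)
print(objective)
```

Only the third sample falls outside the ε tube, so it alone contributes slack; a larger C penalizes such deviations more heavily, while a larger ε tolerates them.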

CGBR technique

The CGBR model employs the gradient boosting framework to handle categorical features and to avoid model overfitting. In addition, greedy target-based statistics are used to handle categorical features by introducing prior distribution terms. Accordingly, the feature transformation value of the CGBR model can be expressed as (Prokhorenkova et al. 2018):
$$\hat{x}_k^i = \frac{\sum_{j=1}^{n} \mathbb{1}_{\{x_j^i = x_k^i\}} \, y_j + aP}{\sum_{j=1}^{n} \mathbb{1}_{\{x_j^i = x_k^i\}} + a} \tag{13}$$
where $\hat{x}_k^i$ are the transformed feature values, $\mathbb{1}_{\{\cdot\}}$ denotes the indicator function, $P$ is the prior value, $a$ is the prior weight, $n$ is the number of datasets ($j = 1, 2, \ldots, n$), and $y_j$ represents the output value of the $j$th dataset.
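A minimal sketch of the greedy target statistic described above is given below. The function name and the sample data are hypothetical, and taking the mean target as the prior value is a common choice assumed here rather than something stated in this section.

```python
def greedy_target_statistic(categories, targets, prior, prior_weight):
    """Replace each category by a smoothed mean of the targets sharing it.

    encoded = (sum of targets in the category + prior_weight * prior)
              / (count of the category + prior_weight)
    """
    sums, counts = {}, {}
    for category, target in zip(categories, targets):
        sums[category] = sums.get(category, 0.0) + target
        counts[category] = counts.get(category, 0) + 1
    return [
        (sums[c] + prior_weight * prior) / (counts[c] + prior_weight)
        for c in categories
    ]

# Hypothetical categorical feature and targets.
categories = ["a", "b", "a", "a"]
targets = [1.0, 0.0, 3.0, 2.0]
prior = sum(targets) / len(targets)  # assumed prior: the mean target (1.5)
encoded = greedy_target_statistic(categories, targets, prior, prior_weight=1.0)
print(encoded)
```

The prior term pulls rarely observed categories toward the prior value, which stabilizes the encoding when a category has few samples.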

Consequently, the CGBR model has several desirable features. For instance, it uses ordered boosting to overcome model overfitting and applies symmetric trees for faster execution, making it computationally efficient, robust, and accurate for model training (Prokhorenkova et al. 2018).

Proposed new ensemble model

To further improve the performance of the three MOD models (KNR, SVR, and CGBR), this study employed them as the base learners of a new ensemble model. Recently, stacking ensemble learning has been applied in several different studies, such as reservoir inflow simulation (Zhang et al. 2021), flash flood susceptibility (Yao et al. 2022), and the prediction of customer lifetime (Gadgil et al. 2023). The core idea of stacking ensemble learning comprises two levels of procedures (Mienye & Sun 2022). In the first-level procedure, the individual models, referred to as the base learners, are trained and employed to produce the predictions. In the second-level procedure, a meta-learner is used to combine the predictions made by the base learners.

Greater diversity among the base learners generally leads to better ensemble accuracy. This study therefore employed three different MOD models. The KNR and SVR models are traditional and commonly used machine learning methods. By contrast, CGBR is based on gradient boosting, which can reduce the bias error of the model, and it includes mechanisms to limit overfitting. Recently, machine learning methods using gradient boosting, namely LightGBM, XGBoost, and CGBR, have been applied to several time series problems: particulate matter estimation (Mampitiya et al. 2024), oil formation volume forecasting (Kharazi Esfahani et al. 2023), wind power prediction (Ponkumar et al. 2023), and stock price prediction (Hartanto et al. 2023). The main difference among these three models is the tree growth technique. LightGBM uses leafwise tree growth, while XGBoost employs levelwise tree growth. Unlike XGBoost and LightGBM, the CGBR model applies a balanced symmetric tree architecture that has the benefits of controlling model overfitting and reducing prediction time. In a previous study by Guo et al. (2023), the CGBR model was found to be suitable and accurate for predicting river flood stages in a steep mountain river basin. Therefore, the CGBR model was also selected in this study to test its reliability, potential, and performance in predicting storm-water levels.

In addition to stacking ensemble learning, residual-error correction is also a robust and efficient method for improving prediction performance. According to the study reported by Phan & Nguyen (2020), there are two methods for correcting residual errors. The first method is to train and model the residual error for correcting the time series predictions, referred to herein as the error-modeling-based correction. The alternative method is to use the residual error as the input factor for training and correcting the time series predictions, referred to herein as the error-factor-based correction. Although the residual-error correction is not a new method, this study extended the error-factor-based correction in combination with the multiple-output framework, leading to a new method.

Accordingly, this study followed the stacking ensemble learning (Mienye & Sun 2022) and further employed the residual-error correction (Phan & Nguyen 2020) to propose a new ensemble model:
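$$\hat{H}_{\mathrm{ens}} = f_{\mathrm{meta}}(\mathbf{X}) \tag{14}$$
where $\hat{H}_{\mathrm{ens}}$ is the output of the storm-water level predicted by the proposed ensemble model, $f_{\mathrm{meta}}$ represents the regression function of the meta-learner model, and $\mathbf{X}$ represents the model inputs. Considering the residual-error time series obtained by subtracting the predicted value from the observed value, the following equations were derived (Phan & Nguyen 2020):
$$\mathbf{X} = [\mathbf{X}_{H}, \mathbf{E}] \tag{15}$$
with
$$\mathbf{X}_{H} = [\hat{H}_{\mathrm{KNR}}, \hat{H}_{\mathrm{SVR}}, \hat{H}_{\mathrm{CGBR}}] \tag{16}$$
$$\mathbf{E} = [H_{\mathrm{obs}} - \hat{H}_{\mathrm{KNR}},\; H_{\mathrm{obs}} - \hat{H}_{\mathrm{SVR}},\; H_{\mathrm{obs}} - \hat{H}_{\mathrm{CGBR}}] \tag{17}$$
in which $\mathbf{X}_{H}$ is the input vector of the storm-water level composed of predictions by three MOD models and $\mathbf{E}$ represents the time series residual error estimated from three MOD models.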
The framework of the proposed model is presented in Figure 2. Due to its integration with a multiple-output framework, the proposed model can predict outputs at multiple lead times simultaneously. For the implementation of the proposed model, a meta-learner is required to conduct model training, validation, and testing. Any DD model can be employed as a meta-learner. If the optimal regression model is used as the meta-learner, there could be an improvement in overall performance. However, there may not be an optimal regression model that is suitable for all regional areas. Therefore, the aim of this study is to explore the most appropriate regression model among the three proposed MOD models. The objective is to determine which model is the most robust and accurate for predicting urban storm-water levels in the present case study area.
Figure 2

The framework of the proposed model representing the inputs, core algorithm, and outputs.

The detailed algorithm of the proposed model is summarized as follows:

  • 1. Three MOD models (base learners) were trained and tested using the first-level procedure with the prepared inputs, expressed in Equations (3)–(9).

  • 2. Three predictions were obtained from the MOD models, and the most appropriate regression model was analyzed based on several indicators.

  • 3. The most appropriate model was selected as the regression function of meta-learning.

  • 4. Based on the second-level procedure expressed in Equations (14)–(17), the time series residual errors were estimated by three MOD models and combined with the input vectors to achieve the final predictions.
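
The second-level procedure can be illustrated with a minimal sketch. It is not the implementation used in this study: the three base learners below are simple hypothetical stand-ins for the trained KNR, SVR, and CGBR models, the meta-learner is a plain k-nearest-neighbor regressor, the water-level series is synthetic, and the lead times are handled in a loop rather than by a single multiple-output regressor. The meta-learner inputs combine the base predictions with the most recent known residual errors (observed minus predicted), which is the error-factor-based correction described above.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbour regression: mean target of the k closest training rows."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical base learners standing in for the trained KNR, SVR and CGBR models.
def persistence(h, t, lead):
    return h[t]                                   # last observed level

def moving_average(h, t, lead):
    return (h[t] + h[t - 1] + h[t - 2]) / 3.0     # mean of the last three levels

def linear_trend(h, t, lead):
    return h[t] + lead * (h[t] - h[t - 1])        # extrapolate the latest slope

BASE_LEARNERS = (persistence, moving_average, linear_trend)

def meta_inputs(h, t, lead):
    """Second-level inputs: base predictions for t+lead plus the latest known residuals."""
    predictions = [f(h, t, lead) for f in BASE_LEARNERS]
    # Residual = observed - predicted, for the forecast issued at t-lead and verified at t.
    residuals = [h[t] - f(h, t - lead, lead) for f in BASE_LEARNERS]
    return predictions + residuals

def ensemble_forecast(h, lead, n_train, k=3):
    """Fit the kNN meta-learner on the first n_train samples and predict the rest."""
    times = list(range(lead + 2, len(h) - lead))  # needs t-lead-2 >= 0 and t+lead < len(h)
    X = [meta_inputs(h, t, lead) for t in times]
    y = [h[t + lead] for t in times]
    preds = [knn_predict(X[:n_train], y[:n_train], q, k) for q in X[n_train:]]
    return preds, y[n_train:]

# Synthetic 10-min water-level series (illustrative only, not the study data).
levels = [1.0 + 0.5 * math.sin(0.3 * i) for i in range(120)]
rmse_by_lead = {}
for lead in range(1, 7):                          # lead times of 10 to 60 min
    preds, obs = ensemble_forecast(levels, lead, n_train=80)
    rmse_by_lead[lead] = math.sqrt(
        sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs))
```

In this sketch the residual for a given lead time is taken from the forecast that verifies at the current time step, so only information available at forecast time enters the meta-learner.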

Bayesian optimization

The DD models use multiple parameters to train links between inputs and outputs. The various parameters used in different DD models considerably affect their predictive capabilities. Thus, determining the optimal parameters for model training is essential when processing using DD models. Several optimization methods exist, such as grid search, random search, and BO, to identify the best parameters (Bai et al. 2021; Yin et al. 2023).

Grid search is a simple method that evaluates all possible combinations of hyperparameters and therefore requires extensive computation time in practical applications. By contrast, random search is more efficient because it evaluates only randomly selected combinations of parameters. BO employs two core techniques, namely, probabilistic modeling and an acquisition function, which provide several benefits, including algorithmic efficiency, tolerance of noisy evaluations, and a global search over the hyperparameter space.
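
The difference in cost between grid search and random search can be sketched as follows. The search space and the error surface are hypothetical and chosen only for illustration; they are not the ones used in this study.

```python
import itertools
import random

def grid_search(space, score):
    """Evaluate every combination in `space`; return (best_params, n_evaluations)."""
    names = list(space)
    best, n_eval = None, 0
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        n_eval += 1
        if best is None or s < best[0]:
            best = (s, params)
    return best[1], n_eval

def random_search(space, score, n_iter, seed=0):
    """Evaluate n_iter randomly drawn combinations; return (best_params, n_evaluations)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in space.items()}
        s = score(params)
        if best is None or s < best[0]:
            best = (s, params)
    return best[1], n_iter

# Hypothetical SVR-like search space and a toy validation-error surface.
space = {"C": [1, 5, 10, 20, 50],
         "gamma": [0.005, 0.01, 0.02, 0.05],
         "epsilon": [0.01, 0.05, 0.1]}

def toy_error(p):
    return (p["C"] - 20) ** 2 / 400 + (p["gamma"] - 0.02) ** 2 * 1e3 + p["epsilon"]

grid_best, grid_n = grid_search(space, toy_error)
rand_best, rand_n = random_search(space, toy_error, n_iter=15)
```

Grid search always pays for the full product of the option lists, whereas random search uses a fixed budget and may miss the best combination.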

The BO process can be summarized as follows.

  • 1. A few arbitrary hyperparameter combinations are selected and evaluated to obtain initial values of the performance metric, and the probabilistic model is fitted to these evaluations.

  • 2. The acquisition function is used to determine the next hyperparameter combination to evaluate, and the DD model is trained with that combination.

  • 3. The new evaluation is added to the dataset, the probabilistic model is refitted, and the acquisition function is updated.

  • 4. Steps 2 and 3 are repeated until the convergence criterion is met.
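
The BO steps listed above can be sketched in a few dozen lines. This is an illustrative toy, not the optimizer used in this study: it assumes a single scaled hyperparameter, a Gaussian-process surrogate with a squared-exponential kernel, a lower-confidence-bound acquisition function, and a fixed evaluation budget as the convergence criterion. All names and settings are hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    offset = sum(ys) / len(ys)                        # constant prior mean
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    weights = solve(K, [y - offset for y in ys])
    mean = offset + sum(k * w for k, w in zip(k_new, weights))
    var = 1.0 - sum(k * v for k, v in zip(k_new, solve(K, k_new)))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, candidates, init, n_iter=8, kappa=2.0):
    xs = list(init)                                   # step 1: initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):                           # step 4: fixed budget as stop rule
        pool = [c for c in candidates if c not in xs]

        def lcb(c):                                   # step 2: acquisition function
            mean, std = gp_posterior(xs, ys, c)
            return mean - kappa * std

        x_next = min(pool, key=lcb)
        xs.append(x_next)                             # step 3: add the new evaluation;
        ys.append(objective(x_next))                  # the surrogate is refitted next loop
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def toy_error(x):
    """Hypothetical validation error as a function of one scaled hyperparameter."""
    return (x - 0.3) ** 2

grid = [i / 50 for i in range(51)]
best_x, best_y, history = bayes_opt(toy_error, grid, init=[0.0, 0.5, 1.0])
```

The parameter `kappa` controls the balance between exploring uncertain regions and exploiting regions where the surrogate already predicts a low error.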

SHAP technique

The DD model is frequently considered a ‘black box’ that is quite challenging to further explain, making it difficult to identify the rationale behind the decision or prediction. Therefore, researchers continue to propose innovative techniques to improve model interpretability to better understand various problems. The SHAP method, proposed by Lundberg & Lee (2017), is suitable for explaining global and individual predictions. According to the optimal Shapley value, the explanation model can be expressed as (Lundberg & Lee 2017; Song et al. 2023):
$$G(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j \tag{18}$$
in which $G$ denotes the explanation model, $z' \in \{0, 1\}^{M}$ is the simplified feature or the coalition vector, $M$ denotes the maximum coalition size, $\phi_j$ represents the Shapley value of the feature attribution for a feature $j$, and $\phi_0$ stands for the contribution without any inputs.
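
The additive form of the explanation model can be checked on a small example by computing exact Shapley values through enumeration of all coalitions. The sketch below assumes that an absent feature is replaced by a baseline value; the two-feature model, the inputs, and the baseline are hypothetical. This brute-force approach is feasible only for a handful of features, which is why approximations such as KernelSHAP and TreeSHAP are used in practice.

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    m = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            weight = (math.factorial(size) * math.factorial(m - size - 1)
                      / math.factorial(m))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return value(set()), phi                          # (phi_0, [phi_1, ..., phi_M])

def toy_model(z):
    """Hypothetical water-level model of the current level z[0] and rainfall z[1]."""
    return 0.8 * z[0] + 0.05 * z[1] + 0.01 * z[0] * z[1]

x = [2.0, 10.0]           # sample to explain
baseline = [1.0, 0.0]     # reference input representing "feature absent"
phi0, phi = shapley_values(toy_model, x, baseline)
reconstructed = phi0 + sum(phi)   # equals toy_model(x) when all features are present
```

With all coalition entries equal to 1, the sum of the base value and the Shapley values recovers the model prediction for the sample.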

For computing SHAP values, KernelSHAP and TreeSHAP are promising and suitable algorithms. Both support global interpretation, local interpretation, and visualization. Compared to KernelSHAP, the TreeSHAP algorithm exploits the structure of tree-based models to compute the SHAP values, which decreases the computational complexity for large datasets. Consequently, TreeSHAP offers fast implementation for enhancing model interpretability and was therefore utilized in this study.

Performance evaluation metrics

This study employed eight performance evaluation metrics to analyze the prediction accuracy of the models. The indicators include the CC, MAE, RMSE, NSE, PWE, GA, RRMSE, and INSE, which can be expressed as follows (Unnikrishnan & Jothiprakash 2018; Getirana et al. 2020; Sezen & Partal 2022; Guo et al. 2023; Piadeh et al. 2023):
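$$\mathrm{CC} = \frac{\sum_{i=1}^{N} (H_i^{m} - \bar{H}^{m})(H_i^{p} - \bar{H}^{p})}{\sqrt{\sum_{i=1}^{N} (H_i^{m} - \bar{H}^{m})^2}\,\sqrt{\sum_{i=1}^{N} (H_i^{p} - \bar{H}^{p})^2}} \tag{19}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| H_i^{p} - H_i^{m} \right| \tag{20}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (H_i^{p} - H_i^{m})^2} \tag{21}$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (H_i^{m} - H_i^{p})^2}{\sum_{i=1}^{N} (H_i^{m} - \bar{H}^{m})^2} \tag{22}$$
$$\mathrm{PWE} = H_{\mathrm{peak}}^{p} - H_{\mathrm{peak}}^{m} \tag{23}$$
$$\mathrm{GA} = \frac{\mathrm{RMSE}_{\mathrm{test}}}{\mathrm{RMSE}_{\mathrm{train}}} \tag{24}$$
$$\mathrm{RRMSE} = \frac{\overline{\mathrm{RMSE}}_{\mathrm{MOD}} - \overline{\mathrm{RMSE}}_{\mathrm{prop}}}{\overline{\mathrm{RMSE}}_{\mathrm{MOD}}} \times 100\% \tag{25}$$
$$\mathrm{INSE} = \frac{\overline{\mathrm{NSE}}_{\mathrm{prop}} - \overline{\mathrm{NSE}}_{\mathrm{MOD}}}{\overline{\mathrm{NSE}}_{\mathrm{MOD}}} \times 100\% \tag{26}$$
where $H_i^{m}$ and $H_i^{p}$ represent the measured and predicted storm-water levels, respectively; $\bar{H}^{m}$ and $\bar{H}^{p}$ denote the means of the measured and predicted storm-water levels in the sewer system, respectively; $N$ is the number of data points; $H_{\mathrm{peak}}^{m}$ and $H_{\mathrm{peak}}^{p}$ are the measured and predicted values of the peak storm-water level, respectively; $\mathrm{RMSE}_{\mathrm{train}}$ and $\mathrm{RMSE}_{\mathrm{test}}$ are the RMSE values of the training and test phases, respectively; $\overline{\mathrm{RMSE}}_{\mathrm{MOD}}$ and $\overline{\mathrm{NSE}}_{\mathrm{MOD}}$ are the lead-time-averaged RMSE and NSE obtained by the single MOD models (i.e., KNR, SVR, or CGBR); and $\overline{\mathrm{RMSE}}_{\mathrm{prop}}$ and $\overline{\mathrm{NSE}}_{\mathrm{prop}}$ represent the lead-time-averaged RMSE and NSE obtained by the proposed model, respectively.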

The closer the CC or NSE value is to 1, the better the prediction. Similarly, the closer the RMSE or MAE is to 0, the better the prediction. The PWE indicator describes the ability of the model to predict the peak storm-water level of a sewer system; the smaller the absolute PWE is, the better the prediction of the peak storm-water level. The GA is a helpful indicator for assessing the generalization ability of the DD model. As the GA approaches 1, the model exhibits the optimal learning performance. In practical applications, if the GA is smaller than 1, the model is underfit, whereas if the GA is greater than 1, the model is overfit. The performance of the proposed model also depends on the lead time of the prediction. The overall performance of the proposed model can be assessed effectively and appropriately by averaging the prediction results over the different lead times (Getirana et al. 2020). Therefore, this study conducted a model evaluation following the method presented by Guo et al. (2023) and used the RRMSE and INSE indicators to assess the percentage improvement of the proposed model over the three single MOD models. The larger the RRMSE and INSE values are, the greater the improvement achieved by the proposed model.
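
The eight indicators can be sketched as plain Python functions. The CC, MAE, RMSE, and NSE follow their standard definitions; the forms of the PWE (predicted minus measured peak), GA (test RMSE divided by training RMSE), RRMSE, and INSE are assumptions inferred from how the indicators are described and tabulated in this study. The short series at the end is hypothetical.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def cc(obs, pred):
    """Pearson correlation coefficient."""
    mo, mp = _mean(obs), _mean(pred)
    num = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    den = math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    return num / den

def mae(obs, pred):
    return _mean([abs(p - o) for o, p in zip(obs, pred)])

def rmse(obs, pred):
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency."""
    mo = _mean(obs)
    return 1.0 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
                  / sum((o - mo) ** 2 for o in obs))

def pwe(obs, pred):
    """Peak water-level error: predicted peak minus measured peak (assumed sign)."""
    return max(pred) - max(obs)

def ga(rmse_train, rmse_test):
    """Generalization ability as the ratio of test to training RMSE (assumed form)."""
    return rmse_test / rmse_train

def rrmse(rmse_mod, rmse_prop):
    """Percentage reduction in lead-time-averaged RMSE (assumed form)."""
    return 100.0 * (rmse_mod - rmse_prop) / rmse_mod

def inse(nse_mod, nse_prop):
    """Percentage increase in lead-time-averaged NSE (assumed form)."""
    return 100.0 * (nse_prop - nse_mod) / nse_mod

# Hypothetical measured and predicted levels (m).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {"CC": cc(measured, predicted), "MAE": mae(measured, predicted),
          "RMSE": rmse(measured, predicted), "NSE": nse(measured, predicted),
          "PWE": pwe(measured, predicted)}
```

A negative PWE under this sign convention means that the model underestimates the peak.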

Study flowchart and model implementation

Figure 3 illustrates the methodology presented in this study, which can be summarized into four main parts:
  • (1) Data preprocessing:

Figure 3

The flowchart in the present study representing the four main processes.

To develop the three MOD models, this study collected precipitation and water level data for urban storm-water sewer systems. The dataset was normalized using the min–max normalization method and divided into training and test datasets at the widely used ratio of 7:3. The data required by the SWMM were also collected for the implementation of the PP model.

  • (2) Model development:

By applying the collected data, three MOD models, namely, the KNR, SVR, and CGBR models, were proposed. The sensitivity and interpretability of the models were also investigated. Additionally, the SWMM was employed to simulate the dynamic process of the water level in urban storm-water sewer systems.

  • (3) Performance evaluations:

The intense storm events were selected, simulated, and analyzed using the three MOD models. The predicted results from the SWMM were also compared with those from the MOD models. The different indicators were utilized to evaluate the model performance and determine the most appropriate regression model.

  • (4) Proposing a new ensemble model:

The predictions from three MOD models were further selected as the base learners. The most appropriate model was selected as the meta-learner and coupled with the residual-error correction. The final outputs were predicted, and model validation with independent data was performed. Consequently, the improvement in performance achieved by the proposed model was analyzed and discussed.

The proposed models were programmed and implemented in Python 3.7 with Keras and Scikit-Learn open-source libraries. With respect to the PP model, this study employed the SWMM version 4.4 h. The input file of the SWMM 4.4 h model is a text file, which can be created using a standard text editor. The model can be run in the disk operating system (DOS) environment or the PCSWMM window environment interface to perform the SWMM simulation.
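
The preprocessing in part (1) can be sketched as follows. The water-level values are hypothetical, and the chronological ordering of the 7:3 split is an assumption, since the splitting order is not specified here.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the bounds needed to undo the scaling."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)

def min_max_restore(scaled, bounds):
    """Map scaled values back to the original units."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]

def split_7_3(samples):
    """Chronological 7:3 split into training and test subsets (assumed ordering)."""
    cut = (7 * len(samples)) // 10
    return samples[:cut], samples[cut:]

# Hypothetical 10-min storm-water levels (m).
levels_m = [0.42, 0.45, 0.61, 1.30, 2.05, 1.72, 1.10, 0.80, 0.58, 0.47]
scaled, bounds = min_max_normalize(levels_m)
train, test = split_7_3(scaled)
restored = min_max_restore(scaled, bounds)
```

A common alternative is to compute the scaling bounds from the training subset only, so that no information from the test period enters the training inputs.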

Analysis of the influence of BO on predictions

Most researchers have investigated the direct influence of BO on prediction results. However, the resulting improvement in performance has rarely been quantified in proportional terms. Therefore, this section examines the impact of BO on predictions and analyzes the corresponding improvement percentages.

This study selected the Yucheng 8 station as a test case to evaluate the effectiveness of BO treatment in combination with three proposed MOD models. The MOD models without BO treatment were also employed to highlight the benefits of using BO. The parameter sets of the three MOD models without BO were set as ‘default’ in the Python libraries for simulations.

Table 2 lists the results of the optimal parameter sets for the three MOD models. It is noted that the algorithm option in KNR is set to ‘auto,’ meaning that the algorithm attempts to determine the optimal algorithm by evaluating the values inputted into the fit method. In addition, the radial basis function (RBF) is a popular kernel function of SVR. Figure 4 compares the measured and simulated storm-water levels using the CGBR model with and without BO for three different lead times of 10, 30, and 60 min. For a lead time of 10 min, both results using the CGBR model with and without BO are similar to those of the measured storm-water levels. As the lead time increases to 30 and 60 min, the difference between using the CGBR model with and without BO significantly increases.
Table 2

The results of the optimal parameter sets for three MOD models

| MOD models | Optimal parameter sets |
|---|---|
| KNR | Number of neighbors: 10; algorithm: auto |
| SVR | Kernel: RBF; gamma: 0.02; C: 20; epsilon: 0.01 |
| CGBR | Max depth: 2; number of estimators: 80; learning rate: 0.1 |
Figure 4

Comparisons of measured and simulated water levels using the CGBR model with and without BO at the Yucheng 8 station for lead times of (a) 10, (b) 30, and (c) 60 min.

Table 3 lists the summary results representing the performances of the three MOD models with and without BO. To investigate the overall performance, the results over all lead times were averaged. The lead-time-averaged NSE values obtained from the KNR, SVR, and CGBR models without BO were 0.50, 0.73, and 0.72, respectively. Using BO, the lead-time-averaged NSE values achieved from the KNR, SVR, and CGBR models were 0.55, 0.76, and 0.76, respectively. Therefore, the three MOD models with BO exhibited increases in the lead-time-averaged NSE of 8.94, 4.79, and 5.34%, respectively, compared to those without BO. As shown in Table 3, the three MOD models with BO also reduced the lead-time-averaged RMSE by 4.79, 7.53, and 6.30%, respectively, in contrast to the models without BO. The results demonstrated that the three MOD models with BO achieved overall improvements compared to those without BO.
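
The lead-time averaging and the percentage increases can be retraced from the NSE columns of Table 3, as in the sketch below. Because the table entries are rounded to two decimals, the recomputed percentages only approximate the reported values of 8.94, 4.79, and 5.34%.

```python
# NSE values from Table 3 for lead times of 10, 20, 30, 40, 50 and 60 min.
nse_without_bo = {
    "KNR": [0.67, 0.57, 0.49, 0.44, 0.43, 0.43],
    "SVR": [0.92, 0.83, 0.74, 0.66, 0.62, 0.60],
    "CGBR": [0.93, 0.83, 0.73, 0.66, 0.61, 0.58],
}
nse_with_bo = {
    "KNR": [0.72, 0.61, 0.54, 0.49, 0.47, 0.45],
    "SVR": [0.96, 0.85, 0.75, 0.69, 0.67, 0.67],
    "CGBR": [0.93, 0.83, 0.75, 0.71, 0.69, 0.66],
}

def lead_time_average(values):
    return sum(values) / len(values)

def percent_increase(before, after):
    return 100.0 * (after - before) / before

summary = {}
for model in nse_without_bo:
    before = lead_time_average(nse_without_bo[model])
    after = lead_time_average(nse_with_bo[model])
    summary[model] = (before, after, percent_increase(before, after))
    print(f"{model}: NSE {before:.3f} -> {after:.3f} ({summary[model][2]:.2f}% increase)")
```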

Table 3

Summary results for the performance of three MOD models with and without BO

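| Models | Lead time (min) | CC (without BO) | MAE (m) (without BO) | RMSE (m) (without BO) | NSE (without BO) | CC (with BO) | MAE (m) (with BO) | RMSE (m) (with BO) | NSE (with BO) |
|---|---|---|---|---|---|---|---|---|---|
| KNR | 10 | 0.84 | 0.25 | 0.37 | 0.67 | 0.87 | 0.22 | 0.34 | 0.72 |
| | 20 | 0.79 | 0.29 | 0.42 | 0.57 | 0.82 | 0.27 | 0.40 | 0.61 |
| | 30 | 0.74 | 0.32 | 0.46 | 0.49 | 0.78 | 0.30 | 0.43 | 0.54 |
| | 40 | 0.72 | 0.33 | 0.48 | 0.44 | 0.76 | 0.32 | 0.45 | 0.49 |
| | 50 | 0.72 | 0.34 | 0.49 | 0.43 | 0.75 | 0.33 | 0.47 | 0.47 |
| | 60 | 0.72 | 0.34 | 0.49 | 0.43 | 0.74 | 0.34 | 0.48 | 0.45 |
| SVR | 10 | 0.96 | 0.11 | 0.18 | 0.92 | 0.98 | 0.07 | 0.13 | 0.96 |
| | 20 | 0.92 | 0.16 | 0.26 | 0.83 | 0.93 | 0.15 | 0.25 | 0.85 |
| | 30 | 0.87 | 0.21 | 0.33 | 0.74 | 0.88 | 0.20 | 0.32 | 0.75 |
| | 40 | 0.83 | 0.26 | 0.37 | 0.66 | 0.85 | 0.24 | 0.36 | 0.69 |
| | 50 | 0.81 | 0.28 | 0.39 | 0.62 | 0.84 | 0.26 | 0.37 | 0.67 |
| | 60 | 0.80 | 0.30 | 0.41 | 0.60 | 0.83 | 0.27 | 0.37 | 0.67 |
| CGBR | 10 | 0.97 | 0.10 | 0.16 | 0.93 | 0.97 | 0.10 | 0.17 | 0.93 |
| | 20 | 0.92 | 0.16 | 0.26 | 0.83 | 0.91 | 0.16 | 0.26 | 0.83 |
| | 30 | 0.86 | 0.22 | 0.33 | 0.73 | 0.87 | 0.21 | 0.32 | 0.75 |
| | 40 | 0.83 | 0.26 | 0.38 | 0.66 | 0.85 | 0.24 | 0.35 | 0.71 |
| | 50 | 0.80 | 0.29 | 0.40 | 0.61 | 0.84 | 0.25 | 0.36 | 0.69 |
| | 60 | 0.79 | 0.31 | 0.42 | 0.58 | 0.82 | 0.27 | 0.38 | 0.66 |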

Sensitivity analysis for feature importance

The prediction performance of the DD models may be influenced by various factors, such as the quality, quantity, and features contained in the dataset. In this section, a sensitivity analysis of the combination of feature factors was conducted to explore the feature importance in water level predictions of urban storm-water sewer systems.

Table 4 lists five sets of input variables concerning the rainfall and storm-water level. The combination of C1 considered only the antecedent storm-water levels from time step t to t − 6. The combination of C2 only employed the antecedent rainfall from time step t to t − 6. Moreover, the test case of C3 combined the antecedent and subsequent rainfall inputs from time steps t − 6 to t + 6. Furthermore, the test case of C4 used a combination of the antecedent storm-water level and rainfall variables at time steps t to t − 6. For C5, all the input features were adopted.
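
The five input combinations can be assembled with a small helper such as the one below, which is a sketch rather than the preprocessing code of this study. It assumes 10-min time steps, so that the six targets H(t+1) to H(t+6) correspond to lead times of 10 to 60 min; the function name and the toy series are hypothetical.

```python
def build_samples(rain, level, combo, n_lag=6, n_lead=6):
    """Return (X, Y): one input row per time step t and targets H(t+1)..H(t+n_lead)."""
    if combo not in ("C1", "C2", "C3", "C4", "C5"):
        raise ValueError("unknown combination: " + combo)
    use_past_rain = combo in ("C2", "C3", "C4", "C5")
    use_future_rain = combo in ("C3", "C5")
    use_past_level = combo in ("C1", "C4", "C5")
    X, Y = [], []
    for t in range(n_lag, len(level) - n_lead):
        row = []
        if use_past_rain:
            row += rain[t - n_lag:t + 1]            # R(t-6), ..., R(t)
        if use_future_rain:
            row += rain[t + 1:t + n_lead + 1]       # R(t+1), ..., R(t+6)
        if use_past_level:
            row += level[t - n_lag:t + 1]           # H(t-6), ..., H(t)
        X.append(row)
        Y.append(level[t + 1:t + n_lead + 1])       # multiple-output targets
    return X, Y

# Toy series used only to show the resulting shapes.
rain = [float(i) for i in range(20)]
level = [100.0 + i for i in range(20)]
shapes = {}
for combo in ("C1", "C2", "C3", "C4", "C5"):
    X, Y = build_samples(rain, level, combo)
    shapes[combo] = (len(X), len(X[0]), len(Y[0]))
```

In an operational setting the future rainfall inputs of C3 and C5 would have to come from rainfall forecasts, because observations beyond time step t are not yet available.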

Table 4

The designed scenarios of input combinations consisting of storm-water level and rainfall

| Combinations | Antecedent rainfall | Future rainfall | Antecedent water levels |
|---|---|---|---|
| C1 | – | – | Ht−6, …, Ht−1, Ht |
| C2 | Rt−6, …, Rt−1, Rt | – | – |
| C3 | Rt−6, …, Rt−1, Rt | Rt+1, …, Rt+6 | – |
| C4 | Rt−6, …, Rt−1, Rt | – | Ht−6, …, Ht−1, Ht |
| C5 | Rt−6, …, Rt−1, Rt | Rt+1, …, Rt+6 | Ht−6, …, Ht−1, Ht |

Figure 5 compares the measured and simulated storm-water levels using the CGBR model with five combinations of inputs for 10-, 30-, and 60-min lead times. For a lead time of 10 min, the CGBR models incorporating C1, C4, and C5 showed comparable agreement with the measured data, whereas the models using C2 and C3 performed relatively poorly. When the lead time increased to 30 and 60 min, the CGBR models with C1, C4, and C5 produced larger prediction errors. However, the model with C5 still provided the best predictions of the storm-water level among the five combinations.
Figure 5

Comparisons of measured and simulated water levels using the CGBR model with five inputs at the Yucheng 8 station for lead times of (a) 10, (b) 30, and (c) 60 min.

Table 5 lists the performance results of the CGBR model with five different input combinations based on four indicators during the training and test phases. For the test phase, the lead-time-averaged CCs for the models with C1, C2, C3, C4, and C5 were 0.63, 0.68, 0.78, 0.75, and 0.88, respectively; this suggested that the CGBR model with C5 performed the best, while the model with C1 had the worst performance. In addition, the lead-time-averaged RMSEs for the models with C1, C2, C3, C4, and C5 were 0.47, 0.57, 0.51, 0.40, and 0.31 m, respectively; this indicated that the predictive error of the CGBR model with C5 was the lowest. Regarding the GA performance, the lead-time-averaged GA values for the CGBR models with C1, C2, C3, C4, and C5 were 1.16, 0.88, 0.85, 1.12, and 1.08, respectively. The CGBR model with C5, whose GA was the closest to 1, therefore had the best GA performance. The results also demonstrated that the antecedent storm-water level and future rainfall variables are crucial for predicting water levels in urban storm-water sewer systems due to their considerable contributions.

Table 5

Performance results of the CGBR model with five different input combinations based on four indicators

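| Combinations | Lead time (min) | CC (training) | MAE (m) (training) | RMSE (m) (training) | NSE (training) | CC (test) | MAE (m) (test) | RMSE (m) (test) | NSE (test) |
|---|---|---|---|---|---|---|---|---|---|
| C1 | 10 | 0.98 | 0.12 | 0.16 | 0.96 | 0.95 | 0.12 | 0.19 | 0.91 |
| | 20 | 0.95 | 0.20 | 0.28 | 0.89 | 0.84 | 0.22 | 0.35 | 0.70 |
| | 30 | 0.90 | 0.27 | 0.38 | 0.80 | 0.71 | 0.32 | 0.47 | 0.46 |
| | 40 | 0.85 | 0.33 | 0.47 | 0.70 | 0.56 | 0.39 | 0.55 | 0.25 |
| | 50 | 0.79 | 0.38 | 0.54 | 0.61 | 0.42 | 0.45 | 0.60 | 0.12 |
| | 60 | 0.74 | 0.43 | 0.59 | 0.53 | 0.29 | 0.49 | 0.64 | 0.01 |
| C2 | 10 | 0.68 | 0.51 | 0.64 | 0.45 | 0.77 | 0.38 | 0.53 | 0.30 |
| | 20 | 0.70 | 0.50 | 0.62 | 0.48 | 0.77 | 0.38 | 0.53 | 0.31 |
| | 30 | 0.69 | 0.50 | 0.63 | 0.47 | 0.75 | 0.38 | 0.54 | 0.29 |
| | 40 | 0.67 | 0.51 | 0.64 | 0.43 | 0.69 | 0.39 | 0.57 | 0.21 |
| | 50 | 0.62 | 0.53 | 0.68 | 0.37 | 0.60 | 0.42 | 0.61 | 0.09 |
| | 60 | 0.56 | 0.55 | 0.71 | 0.31 | 0.50 | 0.45 | 0.65 | −0.02 |
| C3 | 10 | 0.71 | 0.49 | 0.62 | 0.49 | 0.77 | 0.36 | 0.52 | 0.32 |
| | 20 | 0.72 | 0.49 | 0.61 | 0.50 | 0.78 | 0.36 | 0.52 | 0.34 |
| | 30 | 0.73 | 0.48 | 0.60 | 0.51 | 0.78 | 0.36 | 0.51 | 0.35 |
| | 40 | 0.73 | 0.48 | 0.60 | 0.51 | 0.79 | 0.35 | 0.51 | 0.38 |
| | 50 | 0.74 | 0.47 | 0.59 | 0.52 | 0.78 | 0.35 | 0.51 | 0.37 |
| | 60 | 0.74 | 0.47 | 0.59 | 0.52 | 0.79 | 0.34 | 0.50 | 0.39 |
| C4 | 10 | 0.99 | 0.11 | 0.14 | 0.97 | 0.97 | 0.11 | 0.17 | 0.93 |
| | 20 | 0.96 | 0.17 | 0.23 | 0.93 | 0.90 | 0.18 | 0.28 | 0.81 |
| | 30 | 0.93 | 0.24 | 0.33 | 0.85 | 0.82 | 0.25 | 0.37 | 0.66 |
| | 40 | 0.88 | 0.29 | 0.41 | 0.77 | 0.71 | 0.31 | 0.47 | 0.47 |
| | 50 | 0.83 | 0.34 | 0.48 | 0.68 | 0.60 | 0.37 | 0.53 | 0.32 |
| | 60 | 0.78 | 0.39 | 0.54 | 0.60 | 0.50 | 0.42 | 0.58 | 0.19 |
| C5 | 10 | 0.99 | 0.10 | 0.14 | 0.97 | 0.97 | 0.10 | 0.17 | 0.93 |
| | 20 | 0.97 | 0.16 | 0.22 | 0.94 | 0.91 | 0.16 | 0.26 | 0.83 |
| | 30 | 0.95 | 0.21 | 0.28 | 0.89 | 0.87 | 0.21 | 0.32 | 0.75 |
| | 40 | 0.93 | 0.24 | 0.32 | 0.86 | 0.85 | 0.24 | 0.35 | 0.71 |
| | 50 | 0.91 | 0.27 | 0.35 | 0.83 | 0.84 | 0.25 | 0.36 | 0.69 |
| | 60 | 0.90 | 0.29 | 0.38 | 0.80 | 0.82 | 0.27 | 0.38 | 0.66 |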

Global and local analysis of model interpretability

DD models are frequently referred to as black-box models because they do not consider the modeling of physical processes. To further understand the importance of features in DD model prediction, the interpretability of the model was assessed and analyzed in this section. Information about the interpretability of the model could be helpful for verifying and improving the proposed model.

Using all datasets at the Yucheng 8 station, Figure 6 shows the SHAP values obtained via the CGBR model considering all feature factors. Each point in Figure 6 depicts an individual data sample, with the y-axis on the right representing its corresponding feature values. The red coloration signifies larger feature values, while the blue coloration denotes smaller values. In addition, each row represents a feature, and the features are ranked by their average absolute SHAP value. As shown in Figure 6, the global analysis indicated that the samples of the Ht feature are the most dispersed along the x-axis and that Ht has the highest ranking on the y-axis among all features. Therefore, Ht is the most essential feature factor for all lead times. As the lead time increases to 60 min, the influence of other features on the predictions, such as Rt+2, Rt+3, and Rt+4, increases, suggesting that including future precipitation features improves the forecasts at longer lead times.
Figure 6

Global summary plots illustrating the feature importance using the CGBR model with SHAP values at the Yucheng 8 station for lead times of (a) 10 min, (b) 20 min, (c) 40 min, and (d) 60 min.

The storm event between 2 pm and 6 pm on 22 July 2019 at the Yucheng 8 station was selected to analyze the local feature interpretability using the CGBR model. Using the data on the peak storm-water level as an example, Figure 7 shows the waterfall plots at four different lead times for the local explanation of the model, in which f(x) represents the prediction value of the peak storm-water level for the 2019 storm event, and E[f(x)] is the average value of all predictions. The findings revealed that the Ht feature positively contributed to the peak storm-water level in the 2019 storm event for all considered lead times. As the lead time increased to 60 min, Rt+2, Rt+3, and Rt+4 made more negative contributions to the peak storm-water level.
Figure 7

Waterfall plots illustrating the local explanation using the CGBR model with SHAP values at the Yucheng 8 station for lead times of (a) 10 min, (b) 20 min, (c) 40 min, and (d) 60 min.

Based on the global and local analysis results in this section, Ht is the most critical feature for forecasting the storm-water level. For predictions with longer lead times, future rainfall also contributes substantially to the prediction results. Therefore, in future work on the development of real-time operational sewer flooding systems, real-time water level monitoring data will be vital for providing Ht. Furthermore, to achieve appropriate forecasting performance at longer lead times, forecasts of future rainfall are also important components of the operational system.

Performance assessment of seven stations by three MOD models

Following the sensitivity analysis, this section evaluates the performances of the three MOD models at seven stations. By employing the BO and the input combination of C5, Figure 8 compares the measured and simulated storm-water levels using the three MOD models for a lead time of 60 min at seven stations. For the Bailing 1 station, the SVR model appeared to capture the peak storm-water levels well, but it overestimated the water levels during certain events. For the Shuangsi right-4A station, the three MOD models produced significant time-to-peak errors in certain events; however, the overall outcomes from these three MOD models were satisfactory. At the Shezih 2 station, the three MOD models simulated the storm-water level evolution well, although the KNR model provided slightly poorer results. For the Dihua 2 station, all models showed good agreement with the measured data, particularly regarding the time-to-peak error. Moreover, there were distinguishable differences among the three MOD models at the Kangle 1 station, where the SVR model exhibited remarkable performance across most time series events. Furthermore, the KNR model performed better at the Kangle 2 and Yucheng 8 stations.
Figure 8

Comparisons of measured and simulated water levels using three DD models for a lead time of 60 min at the (a) Bailing 1, (b) Shuangsi right-4A, (c) Shezih 2, (d) Dihua 2, (e) Kangle 1, (f) Kangle 2, and (g) Yucheng 8 stations.

To further discuss the performance of the three MOD models, Table 6 presents the performance results of the models in predicting 10–60-min lead times at seven gauged sites. The results were evaluated using four commonly used indicators on the test datasets. The results demonstrated that the SVR and CGBR models performed well in terms of the lead-time-averaged CC values, ranging from 0.6 to 0.88. Additionally, both models yielded the lowest lead-time-averaged RMSE values, ranging from 0.14 to 0.36 m. However, it should be noted that the SVR and CGBR models demonstrated poor lead-time-averaged NSE performance at the Shezih 2 station; this can be attributed to the scarcity of the dataset recorded at the Shezih 2 station, resulting in the worst lead-time-averaged GA performance of 1.41. However, based on the average prediction results from all seven stations, the SVR model has an acceptable total average performance with a CC of 0.75, an MAE of 0.2 m, an RMSE of 0.26 m, an NSE of 0.41 and a GA of 1.13; these results are very similar to those of the CGBR model (CC = 0.73, MAE = 0.21 m, RMSE = 0.27 m, NSE = 0.38 and GA = 1.21). Therefore, the SVR and CGBR models are reliable, accurate, and suitable for the overall simulation of storm-water levels at seven stations.

Table 6

Performance results of the three MOD models for lead times of 10–60 min at seven gauged sites

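| Stations | MOD models | CC | MAE (m) | RMSE (m) | NSE | GA |
|---|---|---|---|---|---|---|
| Bailing 1 | KNR | 0.62 | 0.25 | 0.31 | 0.33 | 1.17 |
| | SVR | 0.76 | 0.19 | 0.25 | 0.54 | 1.18 |
| | CGBR | 0.71 | 0.21 | 0.26 | 0.51 | 1.25 |
| Shuangsi right-4A | KNR | 0.58 | 0.24 | 0.32 | 0.27 | 0.99 |
| | SVR | 0.61 | 0.24 | 0.31 | 0.29 | 1.01 |
| | CGBR | 0.57 | 0.24 | 0.31 | 0.31 | 1.00 |
| Shezih 2 | KNR | 0.51 | 0.22 | 0.27 | 0.03 | 1.17 |
| | SVR | 0.76 | 0.23 | 0.28 | −0.13 | 1.41 |
| | CGBR | 0.71 | 0.22 | 0.28 | −0.04 | 1.41 |
| Dihua 2 | KNR | 0.66 | 0.14 | 0.17 | 0.35 | 1.06 |
| | SVR | 0.74 | 0.11 | 0.14 | 0.51 | 1.02 |
| | CGBR | 0.74 | 0.14 | 0.17 | 0.33 | 1.09 |
| Kangle 1 | KNR | 0.73 | 0.21 | 0.29 | 0.33 | 1.16 |
| | SVR | 0.84 | 0.15 | 0.20 | 0.67 | 1.06 |
| | CGBR | 0.84 | 0.16 | 0.22 | 0.60 | 1.09 |
| Kangle 2 | KNR | 0.59 | 0.38 | 0.46 | −0.31 | 1.50 |
| | SVR | 0.67 | 0.27 | 0.35 | 0.24 | 1.24 |
| | CGBR | 0.67 | 0.29 | 0.36 | 0.21 | 1.57 |
| Yucheng 8 | KNR | 0.79 | 0.30 | 0.43 | 0.55 | 1.05 |
| | SVR | 0.88 | 0.20 | 0.30 | 0.76 | 1.00 |
| | CGBR | 0.88 | 0.21 | 0.31 | 0.76 | 1.08 |

Note: CC, MAE, RMSE, and NSE are averaged over lead times of 10–60 min based on the test datasets; GA is averaged over lead times of 10–60 min.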

Comparisons of the simulated results with the SWMM

The SWMM was calibrated for the urban districts of Taipei city, and the related results are shown in Table 7. For the five selected storm events, the absolute PWE was within 0.5 m, indicating that the SWMM achieved acceptable performance in model calibration.

Table 7

Model calibration of the SWMM for five selected storm events

| Events | Measured peak storm-water level (m) | Simulated peak storm-water level (m) | PWE (m) |
|---|---|---|---|
| Storm at the Yucheng 9 station (4 Jun 2021 from 13:00 to 19:30) | 4.34 | 4.10 | −0.24 |
| Storm at the Dihua 1 station (4 Jun 2021 from 13:00 to 19:30) | 0.95 | 0.99 | 0.04 |
| Storm at the Yucheng 7 station (4 Jun 2021 from 13:00 to 19:30) | 6.69 | 7.19 | 0.50 |
| Storm at the Dihua 1 station (22 Jul 2019 from 14:00 to 18:00) | 1.26 | 1.31 | 0.05 |
| Storm at the Yucheng 4 station (22 Jul 2019 from 14:00 to 18:00) | 4.01 | 4.33 | 0.32 |

Three storm events were selected to compare the three MOD models with the SWMM. Figure 9 compares the measured and simulated storm-water levels using the four models for the three selected events. For the storm at the Shezih 2 station (13:00–19:30 on 4 June 2021), the SWMM overpredicted the peak storm-water level and underestimated the storm-water level during the recession stage, whereas both the SVR and CGBR models demonstrated good agreement with the measured data. For the storm event at the Kangle 2 station, the SWMM generated a larger error in the peak storm-water level. The KNR model achieved adequate performance for the peak storm-water level, but it produced larger prediction errors over the hydrograph as a whole. For the storm event at the Yucheng 8 station, the SVR and CGBR models performed better overall than the SWMM and KNR models.
Figure 9

Comparisons of measured and simulated water levels using four models for (a) storm No. 1 at the Shezih 2 station, (b) storm No. 2 at the Kangle 2 station, and (c) storm No. 3 at the Yucheng 8 station.


Table 8 presents the performance comparisons of the four models for the three selected storm events based on three indicators. All models achieved satisfactory CC values for the three storm events, ranging from 0.88 to 0.99. In terms of the NSE, all models performed well for the Yucheng 8 station. Moreover, the SVR model yielded the best NSE at the Shezih 2 and Kangle 2 stations, whereas the lowest NSE values were obtained by the SWMM at the Shezih 2 station (0.24) and by the KNR model at the Kangle 2 station (0.18).

Table 8

Performance comparisons of the four models for three selected storm events based on three indicators

| No. | Events | Indicators | SWMM | KNR | SVR | CGBR |
|---|---|---|---|---|---|---|
| 1 | Storm at the Shezih 2 station (4 Jun 2021 from 13:00 to 19:30) | CC | 0.91 | 0.91 | 0.98 | 0.96 |
| | | NSE | 0.24 | 0.70 | 0.95 | 0.88 |
| | | PWE (m) | 0.29 | −0.21 | 0.02 | −0.01 |
| 2 | Storm at the Kangle 2 station (4 Jun 2021 from 13:00 to 17:00) | CC | 0.91 | 0.88 | 0.95 | 0.90 |
| | | NSE | 0.33 | 0.18 | 0.84 | 0.48 |
| | | PWE (m) | 0.29 | −0.01 | 0.27 | 0.38 |
| 3 | Storm at the Yucheng 8 station (22 Jul 2019 from 14:00 to 18:00) | CC | 0.99 | 0.97 | 0.99 | 0.99 |
| | | NSE | 0.89 | 0.92 | 0.99 | 0.97 |
| | | PWE (m) | 0.16 | −0.46 | −0.09 | 0.02 |
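
The three indicators reported in Table 8 can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: CC is taken to be the Pearson correlation coefficient, NSE the standard Nash–Sutcliffe efficiency, and PWE the simulated peak minus the measured peak (the sign convention implied by the tabulated values). The function names and the short hydrograph are hypothetical.

```python
import math


def correlation_coefficient(obs, sim):
    """Pearson correlation coefficient (CC) between measured and simulated levels."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    spread_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    return cov / (spread_obs * spread_sim)


def nash_sutcliffe_efficiency(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_term


def peak_water_level_error(obs, sim):
    """PWE (m): simulated peak minus measured peak (assumed sign convention)."""
    return max(sim) - max(obs)


# Hypothetical 10-min hydrograph (m), for illustration only.
measured = [0.5, 0.9, 1.8, 3.0, 2.4, 1.5, 0.9]
simulated = [0.6, 1.0, 1.6, 2.8, 2.5, 1.7, 1.0]

print("CC :", round(correlation_coefficient(measured, simulated), 3))
print("NSE:", round(nash_sutcliffe_efficiency(measured, simulated), 3))
print("PWE:", round(peak_water_level_error(measured, simulated), 3), "m")
```

A negative PWE indicates that the simulated peak falls below the measured peak, as for the KNR model at the Yucheng 8 station.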

Figure 10 shows the overall performance comparisons of the four models at the three selected stations. Overall, the three MOD models achieved better prediction performance than the SWMM. Compared to the SWMM, the KNR, SVR, and CGBR models increased the NSE by 22.54, 89.51, and 59.35%, respectively, and decreased the peak storm-water level error by 6.87, 48.69, and 45.40%, respectively. These findings demonstrate that the three MOD models can significantly improve upon the performance of the SWMM in simulating water levels in urban storm-water sewer systems.
Figure 10

Overall performance comparisons of four models at three selected stations.


Validation of the proposed model with independent data

To investigate the performance of the proposed ensemble model, the Bailing 1 station, which has the most datasets, was selected as the test case in this section. According to the results in the previous sections, the SVR model is as accurate as the CGBR model. However, the CGBR model is built on the gradient boosting method, which was considered more suitable as the regression model for constructing the proposed ensemble model. Hence, the CGBR model was selected as the meta-learner. In addition, the collected dataset was divided into three parts (training, validation, and testing) to explore the influence of the independent data on the predictions. As shown in Table 9, three ratios of training-validation-test data splitting were designed: SP1 (60:20:20), SP2 (80:10:10), and SP3 (50:20:30). Following the algorithm of the proposed model, the prediction results for the three SP ratios at a lead time of 60 min are presented in Figure 11. In the training phase (Figure 11(a)), the proposed model achieved reasonable overall performance with all three SP ratios; however, it underestimated the storm-water level when the level exceeded 2.0 m. In the validation phase (Figure 11(b)), the proposed model again underestimated high storm-water levels with all three SP ratios. Figure 11(c) shows the predicted results in the test phase, indicating acceptable performance for storm-water levels ranging from 0.5 to 1.5 m.
Table 9

The influence of the three training-validation-test data splitting ratios on the predictions

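| Training-validation-test data splitting ratio | Lead-time-averaged RMSE (m), training | Lead-time-averaged RMSE (m), validation | Lead-time-averaged RMSE (m), test | Lead-time-averaged NSE, training | Lead-time-averaged NSE, validation | Lead-time-averaged NSE, test |
|---|---|---|---|---|---|---|
| SP1 (60:20:20) | 0.105 | 0.117 | 0.152 | 0.928 | 0.910 | 0.813 |
| SP2 (80:10:10) | 0.106 | 0.161 | 0.148 | 0.929 | 0.798 | 0.818 |
| SP3 (50:20:30) | 0.091 | 0.180 | 0.158 | 0.941 | 0.820 | 0.826 |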
Figure 11

Comparison of simulated storm-water levels and measured data by the proposed model with three different ratios of SPs for a lead time of 60 min in the (a) training, (b) validation, and (c) test phases.


Table 9 lists the performance of the proposed model with the three SP ratios in terms of the lead-time-averaged RMSE and NSE. The proposed model using SP1 yielded the best overall lead-time-averaged NSE performance, with values of 0.93, 0.91, and 0.81 for the model training, validation, and test phases, respectively. For the lead-time-averaged RMSE, SP1 also produced superior overall performance, with values of 0.11, 0.12, and 0.15 m for the three phases, respectively. Although SP2 and SP3 gave marginally better values in individual phases, SP1 performed most consistently across the three phases, most notably in validation (NSE of 0.91 versus 0.80 and 0.82). Accordingly, the proposed model produced excellent overall performance when using the SP1 dataset with a ratio of 60:20:20.
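
The three splitting ratios examined above can be illustrated with a short Python sketch. It assumes that the records are split in chronological order (earliest records for training, latest for testing), which the text does not state explicitly; the function name `chronological_split` and the placeholder record list are hypothetical.

```python
def chronological_split(records, ratios):
    """Split time-ordered records into training, validation, and test parts.

    `ratios` is a (training, validation, test) tuple such as (60, 20, 20).
    Chronological ordering is an assumption of this sketch.
    """
    total = sum(ratios)
    n = len(records)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]  # remainder goes to the test part
    return train, val, test


# The three ratios listed in Table 9.
SPLITS = {"SP1": (60, 20, 20), "SP2": (80, 10, 10), "SP3": (50, 20, 30)}

# Hypothetical placeholder for a time-ordered series of 10-min records.
records = list(range(1000))

for name, ratios in SPLITS.items():
    train, val, test = chronological_split(records, ratios)
    print(name, len(train), len(val), len(test))
```

Keeping the test part strictly later in time than the training part is one way to ensure that the test data remain independent of the data used for fitting.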

Performance evaluation of the proposed ensemble model

To further explore the potential of the proposed ensemble model, SP1 was selected to compare the three MOD models with the proposed ensemble model. The storm-water levels predicted by the four models at the Bailing 1 station are presented in Figure 12 and Table 10. Figure 12 shows that the SVR model produced larger prediction errors for the peak storm-water level in some events, whereas the proposed model agreed well with the measured data overall, especially for the peak storm-water levels. As shown in Table 10, the proposed model yielded the smallest lead-time-averaged RMSE values and the largest lead-time-averaged NSE values in all three phases, whereas the KNR model produced the largest RMSE values and the smallest NSE values. The proposed model was therefore more accurate than the three MOD models. In the model validation phase, the proposed model reduced the lead-time-averaged RMSE, expressed as the RRMSE, by approximately 63.1, 47.6, and 55.2% relative to the KNR, SVR, and CGBR models, respectively. Moreover, it outperformed the KNR, SVR, and CGBR models by approximately 174.5, 42.4, and 69.4%, respectively, in terms of the INSE. Accordingly, in the model validation phase, the proposed model achieved an INSE of at least 42.4% and an RRMSE of at least 47.6% relative to the three MOD models.
Table 10

The comparison results of the four models based on the training, validation and test phases

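| Phase | Indicators (lead-time-averaged) | KNR | SVR | CGBR | Proposed |
|---|---|---|---|---|---|
| Training | RMSE (m) | 0.260 | 0.203 | 0.205 | 0.105 |
| Training | NSE | 0.549 | 0.709 | 0.713 | 0.928 |
| Validation | RMSE (m) | 0.317 | 0.223 | 0.261 | 0.117 |
| Validation | NSE | 0.332 | 0.639 | 0.538 | 0.910 |
| Test | RMSE (m) | 0.321 | 0.287 | 0.250 | 0.152 |
| Test | NSE | 0.179 | 0.313 | 0.495 | 0.813 |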
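
The relative improvements quoted for the validation phase can be approximately reproduced from the Table 10 values. The sketch below assumes that the RRMSE is the percentage reduction in RMSE and the INSE the percentage increase in NSE, both relative to the single model; these definitions are inferred from the tabulated numbers, and the function names are hypothetical. Because Table 10 reports rounded values, the results differ slightly from the percentages given in the text.

```python
def rmse_reduction_percent(rmse_reference, rmse_proposed):
    """Assumed RRMSE: percentage reduction in RMSE relative to the reference model."""
    return 100.0 * (rmse_reference - rmse_proposed) / rmse_reference


def nse_increase_percent(nse_reference, nse_proposed):
    """Assumed INSE: percentage increase in NSE relative to the reference model."""
    return 100.0 * (nse_proposed - nse_reference) / nse_reference


# Validation-phase values from Table 10: (lead-time-averaged RMSE in m, NSE).
single_models = {
    "KNR": (0.317, 0.332),
    "SVR": (0.223, 0.639),
    "CGBR": (0.261, 0.538),
}
proposed_rmse, proposed_nse = 0.117, 0.910

for name, (rmse, nse) in single_models.items():
    rrmse = rmse_reduction_percent(rmse, proposed_rmse)
    inse = nse_increase_percent(nse, proposed_nse)
    print(f"{name}: RRMSE = {rrmse:.1f}%, INSE = {inse:.1f}%")
```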
Figure 12

Comparison of simulated storm-water levels and measured data by four models with SP1 for a lead time of 60 min in the (a) training, (b) validation, and (c) test phases.

Furthermore, a Taylor diagram was employed to assess the performance of the proposed ensemble model in terms of the lead-time-averaged CC, RMSE, and standard deviation. Figure 13 shows the Taylor diagram for the results in the model validation and test phases. The proposed model lay closest to the measured data among all the models, demonstrating that it is the most accurate in terms of the lead-time-averaged performance.
Figure 13

Taylor plots of the four models based on the lead-time-averaged results for the model (a) validation and (b) test phases.
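
The quantities summarized in a Taylor diagram can be computed as in the following Python sketch. It relies on the standard Taylor diagram geometry, in which the centered (bias-removed) root-mean-square difference E' satisfies E'^2 = σ_obs^2 + σ_sim^2 − 2 σ_obs σ_sim CC; whether the RMSE shown in Figure 13 is centered in this way is an assumption. The function name and the short series are hypothetical.

```python
import math


def taylor_statistics(obs, sim):
    """Return (std_obs, std_sim, CC, centered RMS difference) for two series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    covariance = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    cc = covariance / (std_obs * std_sim)
    # Centered RMS difference: the means are removed before differencing.
    crmsd = math.sqrt(
        sum(((s - mean_sim) - (o - mean_obs)) ** 2 for o, s in zip(obs, sim)) / n
    )
    return std_obs, std_sim, cc, crmsd


# Hypothetical measured and simulated storm-water levels (m).
measured = [0.5, 0.9, 1.8, 3.0, 2.4, 1.5, 0.9]
simulated = [0.6, 1.0, 1.6, 2.8, 2.5, 1.7, 1.0]

std_obs, std_sim, cc, crmsd = taylor_statistics(measured, simulated)
# The law-of-cosines identity that lets one point encode all three statistics.
identity_rhs = std_obs ** 2 + std_sim ** 2 - 2.0 * std_obs * std_sim * cc
print(round(crmsd ** 2, 6), round(identity_rhs, 6))
```

On the diagram, a model plots closer to the reference point when its standard deviation matches that of the measurements and its CC approaches 1, which is the sense in which one model is "closest" to the measured data.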


Comparisons with previous studies

In this study, the proposed model was trained using the outputs predicted by the three MOD models to forecast storm-water levels with lead times of 10–60 min. The trained model was then applied to simulate storm-water levels at the Bailing 1 station. The results indicated that the proposed model with SP1 achieved superior prediction results based on the overall lead-time-averaged performance. To place the present study in context, Table 11 lists the differences between the present study and previous studies on urban flood simulation. First, Yang & Chang (2020) and Xu et al. (2023) used datasets that included simulations from numerical models (SOBEK and the SWMM, respectively) for model training, whereas the present study used only observed data, without any simulated data. Regarding the time scale, Yang & Chang (2020) and Xu et al. (2023) investigated urban floods on an hourly time scale, whereas Moon et al. (2023) and the present study made predictions on a minute time scale. Although the studies presented in Table 11 applied different models, all reported reasonable overall performance in terms of NSE values. However, the proposed model still achieved superior performance compared to other models, even when adopting the least amount of data. Therefore, the proposed model shows promise, robustness, and applicability in predicting storm-water levels.

Table 11

Comparison of the results of the present study with those of previous studies

| Concepts of comparisons | Yang & Chang (2020) | Xu et al. (2023) | Moon et al. (2023) | Present study (2024) |
|---|---|---|---|---|
| Study area | Erren River basin, Taiwan | The Haidian Island, China | The Dorim stream, South Korea | Seven water level monitoring stations, Taipei City, Taiwan |
| Study hydrology issue | Forecasting regional flood inundation depth | Prediction of urban flood depth | Forecasting urban flood water level | Short-duration prediction of urban storm-water levels |
| Hydrologic data | 631 hourly datasets, including measured data and simulated data by SOBEK model | 490 hourly datasets, including seven return period simulations by SWMM | Over 21 years (2001–2021) 10-min collected dataset | Near 6 years (2016–2021) 10-min datasets |
| Methodology | The recurrent nonlinear autoregressive with exogenous inputs model (RNARX) | LightGBM, RFR, XGBoost, and KNR | LSTM combined with SWMM | The three MOD models (KNR, SVR and CGBR) and a proposed ensemble-interpretable-based model |
| Conclusions | RNARX model achieved reliable forecasts for future 3 h with an NSE value of 0.84 | LightGBM model obtained a better NSE performance of 0.98 compared to other three models | LSTM model yielded good performance in terms of NSE, with a value greater than 0.8 | The proposed model achieved high accuracy in terms of NSE, with values of 0.93, 0.91, and 0.81 in the training, validation, and test stages, respectively |

Suggestions for future application of the proposed ensemble model

For time series prediction using DD models, two major factors can affect the prediction performance: the quality of the dataset and the model algorithm. To explore the influence of the dataset on the predictions, this study examined three different SP ratios with the proposed model, and the results indicated that SP1 (60:20:20) was accurate and suitable for the present study area. However, the results may vary depending on the application area. Therefore, it is suggested that a data-splitting test be conducted before model validation to ensure prediction accuracy.

In this study, the proposed model is based mainly on the framework of ensemble learning. By combining several DD models, ensemble learning aims to improve the prediction performance for a single target. However, if there are many target stations, the method needs to be modified or improved. For prediction purposes involving multiple different tasks, a meta-learning method can further adapt individual models to become more robust. This study employed three MOD models as the base learners, and the CGBR model was selected as the meta-learner, leading to the proposed integrated model. On the basis of the present results, the proposed model can still improve the prediction performance when using only three base learners. Future research could consider additional base learners to increase model diversity, which may enhance model performance.
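
The base-learner and meta-learner arrangement described above can be illustrated with a minimal stacking sketch. This is not the study's implementation: the three base learners are replaced by hypothetical biased, noisy predictors, ordinary least squares (via NumPy) stands in for the CGBR meta-learner, the residual-error correction step is omitted, and all data are synthetic. It only shows how multiple-output predictions from several base learners can be combined by a second-level model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_leads = 400, 6  # six lead times, e.g. 10, 20, ..., 60 min

# Synthetic "measured" water levels (m) for each lead time.
observed = rng.uniform(0.5, 2.5, size=(n_samples, n_leads))

# Hypothetical base-learner outputs: the truth plus a bias and random noise.
base_predictions = [
    observed + 0.20 + rng.normal(0.0, 0.30, observed.shape),
    observed - 0.10 + rng.normal(0.0, 0.20, observed.shape),
    observed + 0.05 + rng.normal(0.0, 0.25, observed.shape),
]


def meta_features(predictions):
    """Concatenate all base-learner outputs and append an intercept column."""
    stacked = np.hstack(predictions)
    return np.hstack([stacked, np.ones((stacked.shape[0], 1))])


def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


# Fit the stand-in meta-learner on the first 300 samples, test on the rest.
train, test = slice(0, 300), slice(300, None)
x_train = meta_features([p[train] for p in base_predictions])
x_test = meta_features([p[test] for p in base_predictions])
weights, *_ = np.linalg.lstsq(x_train, observed[train], rcond=None)
ensemble_test = x_test @ weights  # one column per lead time

base_rmse = [rmse(observed[test], p[test]) for p in base_predictions]
ensemble_rmse = rmse(observed[test], ensemble_test)
print("base learners:", [round(v, 3) for v in base_rmse])
print("stacked model:", round(ensemble_rmse, 3))
```

In this synthetic setting the stacked model has a lower test RMSE than any single base predictor because the meta-learner removes the biases and averages out independent noise; real base learners with correlated errors would benefit less.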

Furthermore, the BO and SHAP methods presented in this study are effective and useful methods for identifying the optimal parameters and exploring the feature importance of the model. The results suggested that the inputs related to future rainfall significantly contributed to the predictions. Therefore, accurately forecasting future rainfall is crucial for establishing an operational sewer flooding forecasting system.

This study proposed three MOD models, KNR, SVR, and CGBR, for urban storm-water sewer flooding predictions with lead times up to 60 min. The three MOD models were further extended into a new ensemble model with residual-error correction through a multiple-output framework. The performances of the three MOD models and the new ensemble model were evaluated at urban storm-water gauging sites in Taipei city, Taiwan. In addition, the interpretability of the proposed model for storm-water level predictions was investigated using the SHAP method. Moreover, the proposed model was validated with independent data based on three different ratios of training-validation-test data splitting.

The most important findings in this study are summarized as follows: (1) The SVR and CGBR models yielded better overall performance at the seven stations than the KNR and SWMM models. (2) Compared to the single KNR, SVR, and CGBR models, the proposed new ensemble-based sewer flooding model has superior performance, with RRMSE improvements of 63.1, 47.6, and 55.2%, respectively, and INSE improvements of 174.5, 42.4, and 69.4%, respectively. (3) Based on the comparisons of the present study with previous studies, the proposed integrated model was found to be an effective and precise method for predicting short-duration storm-water levels in urban areas without requiring additional synthetic datasets from PP models.

The lead time considered in this study was limited to 60 min. To enhance the flood disaster response, extending the lead time beyond 60 min may be necessary. Future studies will investigate extending the lead time while maintaining the prediction accuracy. In addition, the proposed models are primarily trained on individual stations. Including feature factors from neighboring stations may affect the prediction results at the target station. Recently, the graph convolutional network (GCN) has been successfully applied as a robust and advanced technique for traffic forecasting. This technique utilizes spatial-temporal time series data to train graph networks and can be applied to hydrology problems. Therefore, our future research will utilize GCN techniques to explore the potential for predicting longer-duration water levels in urban storm-water sewers.

The authors thank the Hydraulic Engineering Office, Public Works Department, Taipei City Government, Taiwan, for providing the rainfall and water level data at the study stations.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Badrzadeh H., Sarukkalige R. & Jayawardena A. W. 2015 Hourly runoff forecasting for flood risk management: Application of various computational intelligence models. Journal of Hydrology 529, 1633–1643.
Bai H., Li G., Liu C., Li B., Zhang Z. & Qin H. 2021 Hydrological probabilistic forecasting based on deep learning and Bayesian optimization algorithm. Hydrology Research 52 (4), 927–943.
Belyakova P. A., Moreido V. M., Tsyplenkov A. S., Amerbaev A. N., Grechishnikova D. A., Kurochkina L. S., Filippov V. A. & Makeev M. S. 2022 Forecasting water levels in Krasnodar Krai rivers with the use of machine learning. Water Resources 49 (1), 10–22.
Ben Taieb S., Sorjamaa A. & Bontempi G. 2010 Multiple-output modeling for multistep-ahead time series forecasting. Neurocomputing 73 (10–12), 1950–1957.
Berkhahn S., Fuchs L. & Neuweiler I. 2019 An ensemble neural network model for real-time prediction of urban floods. Journal of Hydrology 575, 743–754.
Bermúdez M., Cea L. & Puertas J. 2019 A rapid flood inundation model for hazard mapping based on least squares support vector machine regression. Journal of Flood Risk Management 12 (S1), e12522.
Beskopylny A. N., Stel'makh S. A., Shcherban’ E. M., Mailyan L. R., Meskhi B., Razveeva I., Chernil'nik A. & Beskopylny N. 2022 Concrete strength prediction using machine learning methods CatBoost, k-nearest neighbors, support vector regression. Applied Sciences 12 (21), 10864.
Bontempi G., Ben Taieb S. & Le Borgne Y. A. 2013 Machine learning strategies for time series forecasting. Lecture Notes in Business Information Processing 138, 62–77.
Devitt L., Neal J., Coxon G., Savage J. & Wagener T. 2023 Flood hazard potential reveals global floodplain settlement patterns. Nature Communications 14 (1), 2801.
Farina A., Di Nardo A., Gargano R., van der Werf J. A. & Greco R. 2023 A simplified approach for the hydrological simulation of urban drainage systems with SWMM. Journal of Hydrology 623, 129757.
Faruq A., Marto A. & Abdullah S. S. 2021 Flood forecasting of Malaysia Kelantan river using support vector regression technique. Computer Systems Science and Engineering 39 (3), 297–306.
Gadgil K., Gill S. S. & Abdelmoniem A. M. 2023 A meta-learning based stacked regression approach for customer lifetime value prediction. Journal of Economy and Technology 1, 197–207.
Getirana A., Jung H. C., Arsenault K., Shukla S., Kumar S., Peters-Lidard C., Maigari I. & Mamane B. 2020 Satellite gravimetry improves seasonal streamflow forecast initialization in Africa. Water Resources Research 56 (2), e2019WR026259.
Goudarzi S., Soleymani S. A., Anisi M. H., Ciuonzo D., Kama N., Abdullah S., Azgomi M. A., Chaczko Z. & Azmi A. 2021 Real-time and intelligent flood forecasting using UAV-assisted wireless sensor network. Computers, Materials and Continua 70 (1), 715–738.
Guo W. D., Chen W. B. & Chang C. H. 2023 Error-correction-based data-driven models for multiple-hour-ahead river stage predictions: A case study of the upstream region of the Cho-Shui river, Taiwan. Journal of Hydrology: Regional Studies 47, 101378.
Hartanto A. D., Kholik Y. N. & Pristyanto Y. 2023 Stock price time series data forecasting using the light gradient boosting machine (LightGBM) model. International Journal on Informatics Visualization 7 (4), 2270–2279.
Hou J., Zhou N., Chen G., Huang M. & Bai G. 2021 Rapid forecasting of urban flood inundation using multiple machine learning models. Natural Hazards 108 (2), 2335–2356.
Kabir S., Patidar S., Xia X., Liang Q., Neal J. & Pender G. 2020 A deep convolutional neural network model for rapid prediction of fluvial flood inundation. Journal of Hydrology 590, 125481.
Kharazi Esfahani P., Peiro Ahmady Langeroudy K. & Khorsand Movaghar M. R. 2023 Enhanced machine learning-ensemble method for estimation of oil formation volume factor at reservoir conditions. Scientific Reports 13 (1), 15199.
Kim H. I. & Han K. Y. 2020 Data-driven approach for the rapid simulation of urban flood prediction. KSCE Journal of Civil Engineering 24 (6), 1932–1943.
Lin G. F., Lin H. Y. & Chou Y. C. 2013 Development of a real-time regional-inundation forecasting model for the inundation warning system. Journal of Hydroinformatics 15 (4), 1391–1407.
Lin S. S., Zhu K. Y., Zhang X. H., Liu Y. C. & Wang C. Y. 2023 Development of a microservice-based storm sewer simulation system with IoT devices for early warning in urban areas. Smart Cities 6 (6), 3411–3426.
Lundberg S. M. & Lee S. L. 2017 A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Long Beach, CA, 4-9 December 2017. Curran Associates Inc, Red Hook, pp. 4768–4777. https://dl.acm.org/doi/10.5555/3295222.3295230.
Mampitiya L., Rathnayake N., Hoshino Y. & Rathnayake U. 2024 Forecasting PM10 levels in Sri Lanka: A comparative analysis of machine learning models PM10. Journal of Hazardous Materials Advances 13, 100395.
Moon H., Yoon S. & Moon Y. 2023 Urban flood forecasting using a hybrid modeling approach based on a deep learning technique. Journal of Hydroinformatics 25 (2), 593–610.
Piadeh F., Behzadian K., Chen A. S., Campos L. C., Rizzuto J. P. & Kapelan Z. 2023 Event-based decision support algorithm for real-time flood forecasting in urban drainage systems using machine learning modeling. Environmental Modeling and Software 167, 105772.
Poretti I. & De Amicis M. 2011 An approach for flood hazard modeling and mapping in the medium Valtellina. Natural Hazards and Earth System Science 11 (4), 1141–1151.
Prokhorenkova L., Gusev G., Vorobev A., Dorogush A. V. & Gulin A. 2018 Catboost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, Montreal, 3-8 November 2018. Curran Associates Inc, Red Hook, pp. 6639–6649. https://dl.acm.org/doi/10.5555/3327757.3327770.
Shaikh A. A., Pathan A. I., Waikhom S. I., Agnihotri P. G., Islam M. N. & Singh S. K. 2023 Application of latest HEC-RAS version 6 for 2D hydrodynamic modeling through GIS framework: A case study from coastal urban floodplain in India. Modeling Earth Systems and Environment 9 (1), 1369–1385.
Solomatine D. P. & Ostfeld A. 2008 Data-driven modeling: Some past experiences and new approaches. Journal of Hydroinformatics 10, 3–22.
Sood S. K., Sandhu R., Singla K. & Chang V. 2018 IoT, big data and HPC based smart flood management framework. Sustainable Computing: Informatics and Systems 20, 102–117.
Song H., Li Y., Zou X., Hu P. & Liu T. 2023 Elite male table tennis matches diagnosis using SHAP and a hybrid LSTM-BPNN algorithm. Scientific Reports 13 (1), 11533.
Vizi Z., Batki B., Rátki L., Szalánczi S., Fehérváry I., Kozák P. & Kiss T. 2023 Water level prediction using long short-term memory neural network model for a lowland river: A case study on the Tisza River, Central Europe. Environmental Sciences Europe 35 (1), 92.
Wang K. H. & Altunkaynak A. 2012 Comparative case study of rainfall-runoff modeling between SWMM and fuzzy logic approach. Journal of Hydrologic Engineering 17 (2), 283–291.
Wang J. H., Lin G. F., Chang M. J., Huang I. H. & Chen Y. R. 2019 Real-time water-level forecasting using dilated causal convolutional neural networks. Water Resources Management 33 (11), 3759–3780.
Wang S., Luo P., Xu C., Zhu W., Cao Z. & Ly S. 2022 Reconstruction of historical land use and urban flood simulation in Xi'an, Shannxi, China. Remote Sensing 14 (23), 6067.
Yan J., Jin J., Chen F., Yu G., Yin H. & Wang W. 2018 Urban flash flood forecast using support vector machine and numerical simulation. Journal of Hydroinformatics 20 (1), 232–245.
Yao J., Zhang X., Luo W., Liu C. & Ren L. 2022 Applications of stacking/blending ensemble learning approaches for evaluating flash flood susceptibility. International Journal of Applied Earth Observation and Geoinformation 112, 102932.
Yazdi J., Heydari Mofrad H. & Heydari Mofrad M. 2022 Development of a risk-based optimization approach to improve the performance of urban drainage systems. Hydrological Sciences Journal 67 (5), 689–702.
Yin Y., Zhang X., Guan Z., Chen Y., Liu C. & Yang T. 2023 Flash flood susceptibility mapping based on catchments using an improved blending machine learning approach. Hydrology Research 54 (4), 557–579.
Zhou R., Zheng H., Liu Y., Xie G. & Wan W. 2022 Flood impacts on urban road connectivity in southern China. Scientific Reports 12 (1), 16866.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).