Recently, urban waterlogging prevention and the treatment of black–odorous rivers have become social concerns, and the upgrading of drainage systems and the development of river runoff pollution control projects have accelerated. The use of deep tunnels to upgrade old drainage systems and to meet pollution control objectives has complicated the operational control of drainage systems. Traditional operational control relies mainly on human experience or model simulation. This study takes a machine-learning perspective on drainage system operation and explores whether operating suggestions for the system's facilities can be given in real time using only real-time data, thereby avoiding the complex model simulation process. Five drainage systems were used as examples. The initial water level of the pipeline, the water level and flow at key points, the water level of the pump station forebay, and the water level at the most unfavorable point were selected as relevant variables, and four machine-learning discrimination methods were used to analyze the weir-lowering operation of a deep tunnel. The average error rate of the linear discrimination method was <10%, indicating satisfactory performance. This study provides insights for improving the operation of complex drainage systems.

  • ML can be used to address the switching problem in a deep tunnel, which is important for the tunnel's functionality.

  • This study provides insight into improving the operation of complex drainage systems by using ML to give real-time operating suggestions for weirs based only on real-time data.

  • ML explains physical phenomena from the perspective of probability distribution, providing a new way to solve switching problems.

Graphical Abstract


For efficient drainage, a deep tunnel (deeply buried storage and drainage tunnel) is buried underground at a depth of >20 m. Deep tunnels usually have a large storage capacity for rainwater or combined sewage storage and transportation. Recently, extensive research has been conducted on the application and development trends of deep drainage tunnel technology, optimization of deep tunnel system control, and analysis of pollution control effects (Wang et al. 2016; Tan et al. 2018; Liao et al. 2019; Liu et al. 2019; Wei et al. 2019).

The construction of deep tunnel projects has two major objectives (Figure 1). The first objective is flood prevention, which improves the safety of the drainage system through standard upgrading and waterlogging prevention. The other objective is pollution control, which reduces initial rainwater pollution and combined sewer overflow (CSO) emissions. Because the new drainage system comprises both the deep tunnel system and the original drainage system, the two must work together to achieve these objectives. To raise the design standard of the original rainwater system, the deep tunnel adds inflow points and changes the hydraulic line of the original system, thereby reducing the quantity of water that exceeds the system's design standard.

Figure 1

Schematic of a deep tunnel.


In deep tunnel projects, a unified and contradictory complex relationship exists between flood prevention and pollution control. The unity implies that the two engineering objectives must be realized in the same project and coordinated with each other. However, contradiction arises because the requirements and operating modes of the two engineering objectives do not match completely and in some cases, there are conflicts between the two engineering objectives. For example, under certain conditions, to achieve the objective of a pollution control project of not releasing a certain number of millimeters of rain (10 mm of initial rain) into a river, the municipal pump is expected to start after the first 10 mm of rain enters the deep tunnel system. However, to improve the system's drainage capacity, the deep tunnel system must be used at a peak-shaving storage capacity, i.e., the system will not be activated until the rain peak is achieved. In essence, the two objectives are realized through the deep tunnel's storage space, and storage space limitation is the main cause of contradiction.

A deep tunnel is equipped with weirs at new inflow points, allowing a small rainfall's water to be lifted to the sewage pipe network through the established system's interception pump. This way, deep tunnels are avoided, and the cost of power consumption is reduced. To control the inflow of a deep tunnel, an adjustable weir with a flexible control form is used.

The adjustment of a weir is critical for realizing the functionality of a deep tunnel. The model's calculation results show that lowering the weir too early or too late may impair this functionality. On the one hand, if the weir is lowered too early, the deep tunnel system may fill up and close prematurely. The deep tunnel system will then be unable to participate in the peak-clipping process, resulting in water accumulation in the system. Furthermore, discharging rainwater into the deep tunnel before peak runoff may result in the incomplete startup of municipal pumps and inadequate use of the shallow system's drainage capacity. On the other hand, if the weir is lowered too late, the inflow weir may be unable to contribute to reducing the rain peak: if the shallow pipe network is already running at maximum load when the rain peak arrives and the inflow weir is not lowered in time, the insufficient flow over the weir will lead to water accumulation in the system. Therefore, it is critical to lower the weir at the appropriate time to increase the flow and peak-cutting capacities. Hence, this study investigates how to establish a relationship between the data obtained from drainage system monitoring and the time of lowering the weir.

Machine learning (ML), as a type of artificial intelligence technology, can predict the future based on the large amount of collected data (Zalavadia et al. 2021). The ML technology mainly uses algorithms to analyze data and make inferences or predictions based on learning (Bernardelli et al. 2020). Given the large amount of continuously updated effective data generated during deep tunnel research and actual operation, ML can analyze the data to determine the time of the weir lowering. Because lowering the weir in the deep tunnel is a simple switching problem (subject to supervised learning) (Ki et al. 2018), the data (training data) in the database has a clear judgment result. Based on the aforementioned conditions, the established prediction model is continuously adjusted by comparing the ML prediction results with the actual training data results, to achieve a higher judgment accuracy rate (Fleuren et al. 2020).

ML has been widely used in finance, transportation, medicine, and other fields. For example, ML has been used in the prevention, management, and monitoring of new coronaviruses (Rodríguez-Tomàs et al. 2021). By analyzing large amounts of data (including medical information, human behavior patterns, and environmental conditions), ML can assist judgment and decision-making. The application of ML technology in drainage systems is now common; in fact, there are numerous studies on CSO control using ML technology (Hong et al. 2017; Gudaparthi et al. 2020). In this study, five drainage systems in Shanghai, China, were used as examples to obtain a training database from each system for ML (Figure 2). Then, the time at which a deep tunnel should lower the weir was determined using different discriminant analysis algorithms. Thus, when a discrimination algorithm identifies the data sequence obtained from the monitoring center as true, a command is sent to the inflow weir to perform the weir-lowering operation, so that the deep tunnel helps the rainwater system meet its upgraded design standard.

Figure 2

Drainage systems (a–e).


The features of the five drainage systems are shown in Table 1.

Table 1

Features of the five drainage systems

| Feature | System A | System B | System C | System D | System E |
| --- | --- | --- | --- | --- | --- |
| Type of system | Established diversion system | Established diversion system | Diversion system planned to be built | Established diversion system | Established diversion system |
| Service area (km²) | 3.50 | 2.93 | 3.28 | 2.90 | 1.30 |
| Rainstorm return period (year) | | | | | |
| Integrated runoff coefficient (the ratio of surface runoff to rainfall for a certain catchment area) | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 |
| Main trunk pipes of system | The system had two main trunk pipes, and the rainwater in the two trunk pipes merged into a 3500 × 2400 rainwater tank culvert. | The main pipe diameter was DN1800–DN2700. | The main pipe diameter was DN800–DN3000. | Three main pipes | Three main pipes |

Selection of control factors and control methods

Selection of control factors

This study predicts the working conditions based on the characteristic changes of relevant variables at key points to guide the control and scheduling of related facilities. The key monitoring points for the relevant variables are as follows: the initial water level, the inflow points of the secondary and tertiary pipelines (where water level and flow are monitored simultaneously), the pumping station forebay (located at the most downstream point of the system, where the system's water level can be monitored), and the most unfavorable point (where the water level is monitored). Water is most likely to accumulate in the upper and middle parts of the system, where the terrain is low. This study investigated the relationship between real-time data and the weir-lowering operation based on the data (the above-mentioned factors) used for training the ML model.

Control method

Control objective: Under the premise of ensuring the safety of flood control (five-year rainfall without ponding), pollution control should be achieved as much as possible (10-mm initial rainfall into a deep tunnel).

Control elements: adjustable weir and municipal pump.

  1. After the real-time data were analyzed and identified as ‘true,’ the adjustable weir would be lowered to the bottom.

  2. Water accumulation at an unfavorable point (main pipe) triggered the pump immediately.

Selection of discriminant analysis

To achieve the control objectives, we determined the operation of relevant variables based on discriminant analysis. Discriminant analysis is a method for classifying samples of unknown categories. After classifying the research objects, the discriminant formula and criterion were established on the basis of the extracted samples; subsequently, the categories of the unknown samples were determined.

Discriminant analysis is useful for various applications. For example, in archaeology (Kovarovic et al. 2011), the age of a tomb, its identity, and the sex of the owner are identified using unearthed objects. In medicine, the type of disease is determined by analyzing a patient's clinical symptoms and laboratory results (Stühler et al. 2011). In the field of pattern recognition, the analysis is used for text recognition, speech recognition, fingerprint recognition, etc.

In this study, the following four methods were selected to analyze the weir-lowering operation of a deep tunnel: three distance discriminant methods (linear, linear diagonal matrix, and quadratic) and one Bayesian discriminant method (naive Bayes). The calculation principles of the distance and Bayesian discriminant methods are given as follows.

  1. Distance discriminant method (Mahalanobis distance): A sample is assigned to the population to which its Mahalanobis distance, computed from that population's mean vector and covariance matrix, is the shortest; its Mahalanobis distance to every other population should be larger. The calculation principle is as follows:

Assuming G is a p-dimensional population, the mean vector (μ) and covariance matrix (CM) of its distribution are
$$\mu = (\mu_1, \mu_2, \ldots, \mu_p)^{\mathrm{T}}, \qquad \mathrm{CM} = (\sigma_{ij})_{p \times p} \tag{1}$$
Here, $x = (x_1, x_2, \ldots, x_p)^{\mathrm{T}}$ and $y = (y_1, y_2, \ldots, y_p)^{\mathrm{T}}$ are assumed to be two samples taken from the population G. If CM > 0 (CM is a positive definite matrix), the square of the Mahalanobis distance (d) between x and y is defined as
$$d^2(x, y) = (x - y)^{\mathrm{T}}\, \mathrm{CM}^{-1}\, (x - y) \tag{2}$$
Meanwhile, the squared Mahalanobis distance from x to the population G is defined as
$$d^2(x, G) = (x - \mu)^{\mathrm{T}}\, \mathrm{CM}^{-1}\, (x - \mu) \tag{3}$$
  2. Bayesian discriminant method: People's existing cognition of the research object may affect the result of the judgment; however, the distance discriminant method does not consider this cognition. First, the Bayesian discriminant assumes a prior probability to describe the existing cognition. Then, the sample is used to correct the prior probability and obtain the posterior probability. Finally, the decision is made on the basis of the posterior probability. The calculation principle is as follows:

The k p-dimensional populations are assumed to be G1, G2, …, Gk, with probability density functions $f_1(x)$, $f_2(x)$, …, $f_k(x)$, respectively. Assuming the prior probability of sample x coming from the population Gi is pi (i = 1, 2, …, k), then $\sum_{i=1}^{k} p_i = 1$. Based on the Bayesian theory, the posterior probability that sample x comes from the population Gi is
$$P(G_i \mid x) = \frac{p_i f_i(x)}{\sum_{j=1}^{k} p_j f_j(x)}, \quad i = 1, 2, \ldots, k \tag{4}$$
If the misjudgment cost is considered, Ri is used to represent the set of all samples that are classified as Gi (i = 1, 2, …, k) according to a certain criterion. In addition, c(j|i) (i, j = 1, 2, …, k) is used to represent the cost of misclassifying a sample x from Gi as coming from Gj, so that c(i|i) = 0. The conditional probability of misjudging a sample x from Gi as coming from Gj is
$$P(j \mid i) = \int_{R_j} f_i(x)\, \mathrm{d}x \tag{5}$$
The expected cost of misclassification (ECM) of any discriminant rule can be obtained as
$$\mathrm{ECM}(R_1, \ldots, R_k) = \sum_{i=1}^{k} p_i \sum_{j=1}^{k} c(j \mid i)\, P(j \mid i) \tag{6}$$
The discriminant rule that minimizes the ECM is given by
$$R_i = \Big\{ x : \sum_{j \ne i} p_j f_j(x)\, c(i \mid j) = \min_{1 \le h \le k} \sum_{j \ne h} p_j f_j(x)\, c(h \mid j) \Big\}, \quad i = 1, 2, \ldots, k \tag{7}$$

That is, a sample is classified as Gi if the expected misclassification cost of assigning it to Gi is smaller than that of assigning it to any other population.
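The two calculation principles above can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the multivariate normal form of the densities, the function names, and the two toy populations are all assumptions made for illustration. The cost matrix is indexed as `cost[j][i] = c(i|j)`, the cost of assigning a sample from population j to population i.

```python
import numpy as np


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance from sample x to a population (mu, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov * z = diff rather than forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))


def gaussian_pdf(x, mu, cov):
    """Multivariate normal density, an assumed form of f_i(x)."""
    p = len(mu)
    norm = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq(x, mu, cov)) / norm)


def posterior(x, priors, mus, covs):
    """Posterior probabilities P(G_i | x) from Bayes' theorem."""
    weighted = np.array([p * gaussian_pdf(x, m, c)
                         for p, m, c in zip(priors, mus, covs)])
    return weighted / weighted.sum()


def min_ecm_class(x, priors, mus, covs, cost):
    """Index i that minimizes sum over j != i of p_j f_j(x) c(i|j)."""
    weighted = np.array([p * gaussian_pdf(x, m, c)
                         for p, m, c in zip(priors, mus, covs)])
    cost = np.asarray(cost, dtype=float)  # cost[j, i] = c(i|j), zero diagonal
    expected = weighted @ cost            # expected[i] = sum_j p_j f_j(x) c(i|j)
    return int(np.argmin(expected))


# Two hypothetical 2-D populations with equal priors and unit covariance.
mus = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
x = [1.0, 1.0]

d2 = [mahalanobis_sq(x, m, c) for m, c in zip(mus, covs)]
post = posterior(x, priors, mus, covs)
equal_cost = min_ecm_class(x, priors, mus, covs, [[0, 1], [1, 0]])
skewed_cost = min_ecm_class(x, priors, mus, covs, [[0, 1], [100, 0]])
```

With equal costs, the sample is assigned to the nearer population. When wrongly assigning a sample from the second population to the first is made 100 times more costly, the minimum-ECM rule switches the decision, which is the kind of asymmetry the distance method alone cannot express.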

Model selection

Selection of mathematical model

Urban drainage network models can be divided into three categories: hydrological, hydraulic, and comprehensive models. The hydrological model mainly adopts a black- or gray-box approach to simulate the influence of rainfall on runoff generation and confluence (Susanna et al. 2016). The hydraulic model mainly applies physical laws, such as the continuity and momentum equations, to simulate the flow of rainwater and sewage over slopes and in pipe networks, especially changes in hydraulic elements such as flow velocity and volumetric flow. The comprehensive model combines the hydrological and hydraulic models and simulates both the discharge and the transport of rainwater and sewage. Current urban drainage system models are mainly comprehensive models, in which individual modules are hydrological or hydraulic in nature. For example, in the Storm Water Management Model (SWMM), the RUNOFF module is hydrological, whereas the TRANSPORT and EXTRAN modules are hydraulic.

In terms of model origin and development, the United States Environmental Protection Agency proposed SWMM and a storage, treatment, overflow, runoff model in the early 1970s, which were continuously updated in the later period. Stormwater hydraulics and quality models such as Distributed Routing Rainfall-Runoff Model-Quality (DR3M-QUAL), Hydrologic Simulation Program-Fortran, Hydro-works, Wallingford, and Model of Urban Sewers (MOUSE) have emerged internationally since then. Later, Hydro-works further evolved into an InfoWorks model. The InfoWorks CS series was the first model that was launched, followed by the InfoWorks ICM series in 2011.

Numerous studies have been conducted worldwide on urban drainage network models. The most widely used comprehensive models are SWMM, InfoWorks, and MOUSE (belonging to the MIKE series software). The comparison of the three models is presented in Table 2.

Table 2

Comparison of similarities and differences among the SWMM, InfoWorks, and MOUSE models

| Factor | SWMM | InfoWorks | MOUSE |
| --- | --- | --- | --- |
| Software type | Comprehensive software | Comprehensive software | Comprehensive software |
| Simulation method | Single and continuous event simulation | Single and continuous event simulation | Single and continuous event simulation |
| Water quantity simulation | Yes | Yes | Yes |
| Water quality simulation | Yes | Yes | Yes |
| Inflow mode | Node inflow | Node inflow | Nodal and lateral inflow |
| Rainfall-runoff module | Three types of runoff modules and one type of confluence module | 13 types of runoff modules and nine types of confluence modules | Five types of production and confluence modules |
| Data interface | Connect with pictures | Connect with AutoCAD and GIS | Connect with AutoCAD and GIS |
| Modalities of property rights | Free | Paid | Paid |
| Software maturity | Sometimes secondary development is needed | Relatively mature | Relatively mature |
| User self-development | Yes | No | No |

SWMM, InfoWorks, and MOUSE are all relatively complete in terms of functionality. They can simulate not only the quantity of rainwater but also the water quality of rainwater runoff and drainage networks. As commercial software, InfoWorks and MOUSE are more integrated and mature, whereas SWMM may require some secondary development. Furthermore, InfoWorks provides the most choices of runoff generation and confluence modules (Zalavadia & Gildin 2021), which may make it more applicable to different cities and regions. Therefore, InfoWorks was selected for the simulation in this study.

Selection of rainfall-runoff model

In this study, three commonly used runoff models were selected for comparison: the integrated runoff coefficient method, fixed runoff coefficient method, and Horton method. Their overview and application scope are as follows:

  (1) Integrated runoff coefficient method (proportional loss model): It can directly define the proportion of rainfall entering the system, i.e., net rainfall is a fixed proportion of rainfall intensity. Instead of subdividing different land-use types, the entire catchment area adopts a fixed proportion.

  (2) Fixed runoff coefficient method: This model defines a fixed percentage of net rainfall, which becomes runoff. Different coefficients can be used for different catchment areas.

  (3) Horton method: This method considers the soil's infiltration capacity and its time-variation (Yang et al. 2020).
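To make the time variation in the Horton method concrete, the sketch below evaluates the classical Horton infiltration curve, f(t) = fc + (f0 − fc)·exp(−k·t). The default values are illustrative only: the initial and steady rates echo the benchmark values of 70 and 2.5 mm/h used in this study, and the decay constant of 2 h⁻¹ is an assumed round figure. The sketch gives potential infiltration, i.e., it assumes rainfall intensity always exceeds the infiltration capacity.

```python
import math


def horton_rate(t_h, f0=70.0, fc=2.5, k=2.0):
    """Infiltration capacity (mm/h) at t_h hours after the start of rainfall."""
    return fc + (f0 - fc) * math.exp(-k * t_h)


def horton_cumulative(t_h, f0=70.0, fc=2.5, k=2.0):
    """Cumulative potential infiltration (mm) from 0 to t_h hours.

    This is the closed-form integral of horton_rate over [0, t_h].
    """
    return fc * t_h + (f0 - fc) / k * (1.0 - math.exp(-k * t_h))


# Capacity decays from the initial rate toward the steady rate.
rates = [horton_rate(t) for t in (0.0, 0.5, 1.0, 2.0, 4.0)]
```

The rapid decay in the first hour is why the choice of initial rate matters mostly for short, intense events, whereas the steady rate governs losses in long storms.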

In this study, two commonly used confluence models were selected for further comparison: the Wallingford model and SWMM. Their overview and scope of application are as follows:

  (1) Wallingford model: Its storage-routing model is based on a dual quasilinear reservoir model. For each surface type, two reservoirs are used in a series, with each reservoir having an equivalent storage–output relationship.

  (2) SWMM: Flow is routed using a single nonlinear reservoir, whose routing coefficient depends on the surface roughness, surface area, ground slope, and catchment width.
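The single nonlinear reservoir can be sketched as follows, using the SWMM outflow relation in SI units, Q = (W/n)·(d − ds)^(5/3)·S^(1/2), together with the continuity equation for the ponded depth d. The explicit Euler time stepping, the function name, and all parameter values are assumptions for illustration; they are not the calibrated settings of this study, and SWMM itself uses a more robust integration scheme.

```python
def nonlinear_reservoir(rain_mm_h, dt_s, area_m2, width_m, slope, n, ds_m=0.0):
    """Route net rainfall through one nonlinear reservoir.

    rain_mm_h: net rainfall intensity per time step (mm/h)
    Returns the outflow at the start of each step (m^3/s).
    """
    depth = 0.0  # ponded depth on the surface (m)
    outflows = []
    for intensity in rain_mm_h:
        inflow = intensity / 1000.0 / 3600.0      # mm/h -> m/s
        head = max(depth - ds_m, 0.0)             # depth above depression storage
        q = width_m / n * head ** (5.0 / 3.0) * slope ** 0.5
        # Continuity: d(depth)/dt = inflow - q / area (explicit Euler step).
        depth = max(depth + (inflow - q / area_m2) * dt_s, 0.0)
        outflows.append(q)
    return outflows


# Hypothetical 1 ha impervious catchment under a constant 36 mm/h for 2 h.
flows = nonlinear_reservoir([36.0] * 720, dt_s=10.0, area_m2=10_000.0,
                            width_m=100.0, slope=0.01, n=0.0175)
```

Under constant rainfall the outflow approaches the inflow rate (here 0.1 m³/s), and the time taken to get there depends on roughness, slope, and width, which is why these are the parameters adjusted during calibration.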

First, the abovementioned three runoff generation models and two confluence models were combined to create six rainfall-runoff models. The combinations are shown in Table 3. Then, taking system A as the research object, two typical rainfall processes in June 2015 and August 2015 were selected to compare the consistency between the simulation results (simulated water level curve of the pumping station forebay) and the actual data (measured water level curve of the pumping station forebay). Finally, the optimal combination of the runoff generation and confluence models for the study area was selected.

Table 3

Six rainfall-runoff models

| Runoff generation models | Confluence models |
| --- | --- |
| Integrated runoff coefficient method | SWMM |
| Integrated runoff coefficient method | Wallingford |
| Fixed runoff coefficient method | SWMM |
| Fixed runoff coefficient method | Wallingford |
| Horton method | SWMM |
| Horton method | Wallingford |

In this study, the parameters of runoff generation and confluence models were benchmark values applicable to the Shanghai area (Table 4), which were selected through normative and literature review (ASCE 1992; McCuen 1996; Rossman 2015; Rossman & Huber 2016).

Table 4

Parameters of runoff generation and confluence model

| Object | Type | Reasonable range | Benchmark |
| --- | --- | --- | --- |
| Runoff coefficient | Composite | Specified value | Specified value |
| | Permeable area | 0.85–0.95 | 0.9 |
| | Impervious area | 0.1–0.2 | 0.15 |
| Confluence parameters of SWMM (Manning roughness coefficient) | Composite | 0.03–0.10 | Optimal value by calibration in a reasonable range |
| | Impervious area | 0.01–0.32 | 0.0175 |
| | Permeable area | 0.1–0.3 | 0.2 |
| Confluence parameters of Wallingford | Composite | 5–8 | Optimal value by calibration in a reasonable range |
| | Permeable area | 5–7 | |
| | Impervious area | About 10 | 10 |
| Parameters of Horton model | Initial permeability (mm/h) | 67–120 | 70 |
| | Steady permeability (mm/h) | 0.6–25.4 | 2.5 |
| | Attenuation coefficient (h⁻¹) | About 2 | |
| Initial loss value (mm) | Permeable area | 6–10 | |
| | Impervious area | 0.5–2.5 | 1.5 |

Input and output

First, we assume that the weir operation is closely related to the initial water level, the water level and flow at the inflow point of the secondary and tertiary pipelines, the water level of the pump station forebay, and the water level at the most unfavorable point. Second, we define the recorded operation categories as 0 and 1. In operation category 1, the monitoring data in the same record indicate that the weir-lowering operation should be performed. By contrast, in operation category 0, the monitoring data in the same record indicate that the weir-lowering operation should not be performed. Then, we record six and twelve sets of observations with operation categories 1 and 0, respectively. Subsequently, we record one set of data as unknown operation data. Finally, we establish a sample matrix (including all the above data), a training matrix (including all the data except the unknown record), and a weir-lowering operation matrix (group 1).

Based on the training matrix and group 1, we use different discriminant methods to obtain group 2 of the sample matrix. The difference between group 1 and group 2 is defined as the misjudgment ratio (group 1 will be exactly equal to group 2 only if the sample completely conforms to the overall distribution).
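The workflow described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the study's code: the data are synthetic, the five feature columns are hypothetical stand-ins for the monitored variables, and the classifier is a plain linear discriminant (pooled covariance, equal priors, smallest Mahalanobis distance) chosen as one of the four methods named in the paper.

```python
import numpy as np


def fit_linear_discriminant(train, group):
    """Class means and pooled covariance matrix of the training matrix."""
    classes = np.unique(group)
    means = {c: train[group == c].mean(axis=0) for c in classes}
    centered = np.vstack([train[group == c] - means[c] for c in classes])
    pooled = centered.T @ centered / (len(train) - len(classes))
    return classes, means, pooled


def classify(samples, classes, means, pooled):
    """Assign each row to the class with the smallest squared Mahalanobis distance."""
    inv = np.linalg.inv(pooled)
    d2 = np.array([[(x - means[c]) @ inv @ (x - means[c]) for c in classes]
                   for x in samples])
    return classes[np.argmin(d2, axis=1)]


# Synthetic records with five hypothetical features: initial level, inflow-point
# level, inflow-point flow, forebay level, and unfavorable-point level.
rng = np.random.default_rng(0)
lower = rng.normal(loc=[2.0, 3.0, 1.5, 3.5, 4.0], scale=0.2, size=(6, 5))   # category 1
hold = rng.normal(loc=[1.0, 1.5, 0.5, 2.0, 2.5], scale=0.2, size=(12, 5))   # category 0
unknown = np.array([[1.9, 2.9, 1.4, 3.4, 3.9]])                             # unlabeled record

training = np.vstack([lower, hold])          # training matrix (18 records)
group1 = np.array([1] * 6 + [0] * 12)        # recorded operation categories
sample = np.vstack([training, unknown])      # sample matrix (19 records)

model = fit_linear_discriminant(training, group1)
group2 = classify(sample, *model)            # predicted categories for all records
misjudgment_ratio = np.mean(group2[:-1] != group1)
```

Here the misjudgment ratio is computed by resubstitution, i.e., on the same records used for fitting, which matches the group 1 versus group 2 comparison in the text but tends to be optimistic; a held-out or cross-validated estimate would be a stricter check.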

The calculation procedure is shown in Figure 3.

Figure 3

Calculation procedure.


Model simulation

Taking system A as an example, when the runoff generation model used the integrated runoff coefficient method, the curves obtained using the two confluence models (SWMM and Wallingford model) are shown in Figures 4 and 5 for the June and August 2015 rainfall conditions, respectively. The simulated water level curve of the pump station forebay is closer in shape to the measured process line when the integrated runoff coefficient method is combined with SWMM as the confluence model.

Figure 4

Comparison of simulated and measured process lines under rainfall conditions in June 2015 (the runoff generation model using the integrated runoff coefficient method; the confluence model using the SWMM and Wallingford model).

Figure 5

Comparison of simulated and measured process lines under rainfall conditions in August 2015 (the runoff generation model using the integrated runoff coefficient method; the confluence model using the SWMM and Wallingford model).


When the runoff model adopted the fixed runoff coefficient method, the curves obtained from different confluence models are as shown in Figures 6 and 7.

Figure 6

Comparison of simulated and measured process lines under rainfall conditions in June 2015 (the runoff generation model using the fixed runoff coefficient method; the confluence model using the SWMM and Wallingford model).

Figure 7

Comparison of simulated and measured process lines under rainfall conditions in August 2015 (the runoff generation model using the fixed runoff coefficient method; the confluence model using the SWMM and Wallingford model).


Figures 6 and 7 show that, when the fixed runoff coefficient method was used for runoff generation, the water level process lines obtained with both the SWMM and Wallingford confluence models are similar to the process lines measured in June and August 2015, respectively. However, the process line obtained with the Wallingford model shows no forebay water level peak corresponding to the measured rainfall peak at around 9 a.m. on August 24, 2015, whereas the process line obtained with SWMM does. Although the peak value generated by SWMM is higher than the actual water level, overestimating the peak is the more conservative outcome from an engineering safety standpoint.

Therefore, the SWMM is more favorable with respect to engineering safety when the runoff generation model adopts the fixed runoff coefficient method.

The curves obtained using different confluence models when the runoff generation model adopts the Horton method (wherein permeable area can be obtained using the Horton method and impervious area can be obtained using the fixed runoff coefficient method) are as shown in Figures 8 and 9.

Figure 8

Comparison of simulated and measured process lines under rainfall conditions in June 2015 (the runoff generation model using the Horton method; the confluence model using the SWMM and Wallingford).

Figure 9

Comparison of simulated and measured process lines under rainfall conditions in August 2015 (the runoff generation model using the Horton method; the confluence model using the SWMM and Wallingford).


Figures 8 and 9 show that when the runoff generation model adopts the Horton method, the shapes of the process lines drawn by the two confluence models are similar to that of the rainfall process in June 2015. However, the curve obtained with SWMM captures the peak water level corresponding to the peak rainfall more completely, which is conducive to engineering safety. Furthermore, the water level process line obtained with the Wallingford model shows no forebay water level peak corresponding to the measured rainfall peak at around 9 a.m. on August 23, 2015, whereas the process line obtained with SWMM does. As in the preceding analysis, although the peak value generated by SWMM is higher than the actual water level, overestimating the peak is the more conservative outcome from an engineering safety standpoint.

Therefore, when the runoff generation model adopts the Horton method, it is more beneficial to choose SWMM as the confluence model with respect to engineering safety.

Figures 10 and 11 show the simulation results of all the rainfall-runoff models for the two periods of rainfall in June and August 2015, respectively. Considering shape similarity and engineering safety, the Horton method and SWMM were recommended for the runoff generation and confluence models, respectively. Additionally, the relative error between the simulated and actual values of each peak water level was estimated to evaluate the degree of conformity between the model and the actual processes (Table 5). Table 5 supports the same conclusion as Figures 10 and 11: the combination of the Horton method and SWMM has a relative error of only 4% under both rainfall conditions.
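For reference, the relative error of a peak water level can be computed as in the short sketch below. The sign convention (positive when the simulation overestimates the measured peak) is an assumption, and the two water levels in the example are hypothetical values, not measurements from the study.

```python
def relative_error_pct(simulated_peak, measured_peak):
    """Signed relative error of a simulated peak water level, in percent."""
    return (simulated_peak - measured_peak) / measured_peak * 100.0


# Hypothetical forebay peak levels in metres.
overestimate = relative_error_pct(2.08, 2.00)   # simulated peak above measured
underestimate = relative_error_pct(1.50, 2.00)  # simulated peak below measured
```

Under this convention, a small positive value corresponds to a slightly conservative simulation, whereas a large negative value means the model misses part of the measured peak.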

Table 5

Relative error of water level of pumping station forebay with six rainfall-runoff models

| Runoff generation models | Confluence models | Relative error (%), rainfall conditions in June 2015 | Relative error (%), rainfall conditions in August 2015 |
| --- | --- | --- | --- |
| Integrated runoff coefficient method | SWMM | −37 | −27 |
| Integrated runoff coefficient method | Wallingford | 22 | 4 |
| Fixed runoff coefficient method | SWMM | 16 | 6 |
| Fixed runoff coefficient method | Wallingford | −34 | −16 |
| Horton method | SWMM | 4 | 4 |
| Horton method | Wallingford | −38 | −36 |

Figure 10

Effect evaluation of different runoff generation and confluence combination models under rainfall conditions in June 2015.

Figure 11

Effect evaluation of different rainfall-runoff models under rainfall conditions in August 2015.


Evaluating the runoff generation and confluence models of the other systems led to the same conclusions.

The model parameters in five systems were further calibrated and verified using the recommended rainfall-runoff model, and the results are shown in Table 6.

Table 6

Calibration results of model parameters for five drainage systems

Parameter | System A | System B | System C | System D | System E
Initial loss value of impervious area (mm) 0.5 0.5 0.5 0.5 0.5 
Initial loss value of green area (mm) 
Initial infiltration rate of green space (mm/h) 70 65 70 70 70 
Steady infiltration rate of green space (mm/h) 2.5 2.5 2.5 2.5 
Characteristic width (m) Defaults Defaults*0.5 Defaults*0.8 Defaults Defaults 
Manning roughness of impervious area 0.023 0.0175 0.0175 0.023 0.023 
Manning roughness of green space 0.2 0.2 0.2 0.2 0.2 

With the rainfall-runoff model and parameters selected, the next step was to simulate the flow and water level at the inflow points and the judgment result of weir lowering under the design conditions.
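To illustrate how the calibrated green-space infiltration rates enter the Horton runoff generation method, the sketch below evaluates the standard Horton infiltration curve, f(t) = fc + (f0 − fc) exp(−kt). The initial and steady rates (70 and 2.5 mm/h) are the values listed in Table 6. The decay constant `K` is not reported in this section and is a hypothetical value chosen only for illustration.

```python
import math


def horton_rate(t_hours, f0, fc, k):
    """Horton infiltration capacity f(t) = fc + (f0 - fc) * exp(-k * t), in mm/h."""
    return fc + (f0 - fc) * math.exp(-k * t_hours)


F0 = 70.0  # initial infiltration rate of green space (mm/h), Table 6
FC = 2.5   # steady infiltration rate of green space (mm/h), Table 6
K = 2.0    # decay constant (1/h): hypothetical, not given in the source

# Infiltration capacity at a few elapsed times (h) since the start of rainfall
times = (0.0, 0.5, 1.0, 2.0)
rates = [horton_rate(t, F0, FC, K) for t in times]
for t, f in zip(times, rates):
    print(f"t = {t:.1f} h, f = {f:.2f} mm/h")
```

The capacity starts at the initial rate and decays toward the steady rate, so rainfall in excess of this capacity becomes runoff earlier in an event when the decay constant is larger.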

Procedure and results of calculation

Original data and standardized conversion

First, system A was taken as an example, and the variables mentioned in Section 2.1.1 were assembled into the sample matrix based on the model's simulation results (the operation category of the last sample was treated as unknown). Table 7 lists the relevant variables of system A.

Table 7

Element table of relevant variables

Number | Operation category | Initial water level (m) | Water level at the inflow point of Road X (m) | Water level at the inflow point of Road Y (m) | Water level at the most unfavorable point (m) | Water level of pumping station forebay (m) | Inflow point flow of Road X (m³/s) | Inflow point flow of Road Y (m³/s)
1.3 2.06 1.91 2.15 2.17 6.04 4.32 
1.3 2.1 1.93 2.15 2.23 6.45 4.52 
1.3 2.09 1.92 2.08 2.24 6.34 4.46 
1.3 1.94 1.84 1.87 2.08 4.6 3.56 
0.25 1.47 1.15 1.63 2.02 12.25 7.81 
0.25 1.79 1.3 2.23 2.3 17.45 9.87 
1.3 2.25 2.02 2.8 2.32 8.36 5.49 
0.25 1.68 1.24 2.79 1.99 15.58 8.99 
0.25 1.35 1.06 1.77 1.5 10.48 6.65 
10 −4.32 −0.4 0.03 1.8 0.8 24.42 7.62 
11 −4.32 −0.61 −0.11 1.69 0.4 20.54 5.99 
12 −4.32 −0.75 −0.21 1.62 0.08 17.97 4.76 
13 −4.32 −0.99 −0.44 1.52 −0.37 14.08 2.51 
14 1.3 2.25 2.05 2.8 2.32 8.36 5.49 
15 1.3 2.34 2.07 3.07 2.46 9.56 6.13 
16 1.3 1.98 1.86 2.16 5.11 3.79 
17 1.3 1.79 1.73 1.85 1.79 3.08 2.53 
18 1.3 1.47 1.47 1.48 1.47 0.64 0.62 
19 NaN 0.25 1.08 0.88 1.42 1.15 6.93 4.58 
The variables, such as water level and flow, differ in dimension and order of magnitude. The data were therefore standardized first to remove the effect of dimension and order of magnitude and to facilitate the subsequent statistical analysis. The data were standardized as follows:
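$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_j^2}} \tag{8}$$
$$s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2 \tag{9}$$
where $s_j^2$ is the variance of the variable $x_j$, and $\bar{x}_j$ is the mean value of the $j$th column.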
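The standardization step can be sketched in a few lines. This is a minimal illustration that assumes a column-wise z-score using the sample standard deviation (divisor n − 1). The function name is hypothetical, and only two of the Table 7 variables are used to keep the example short.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Return column-wise z-scores: (x - column mean) / column standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample std (n - 1); an assumption
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Rows 1, 5 and 10 of Table 7 (system A):
# forebay water level (m) and inflow point flow of Road X (m3/s)
samples = [
    [2.17, 6.04],
    [2.02, 12.25],
    [0.80, 24.42],
]
z = standardize_columns(samples)
for row in z:
    print([round(v, 3) for v in row])
```

After this step, every column has zero mean and unit variance, so water levels in metres and flows in cubic metres per second contribute on a comparable scale.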

Calculation method and misjudgment rate

In the linear discriminant method, each group was assumed to follow a p-variate normal distribution with a common covariance matrix, and the judgment was obtained after the pooled covariance matrix was estimated from the samples. In the linear diagonal matrix method, this common covariance matrix was estimated as a diagonal matrix. In the quadratic discriminant method, each group was again assumed to follow a p-variate normal distribution, but the covariance matrices of the groups were not assumed to be the same (Peck et al. 1988). In the Bayesian discriminant method, a naive Bayesian classifier was fitted to the samples and then used to predict future samples. The ML results of the four methods are presented in Table 8.
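As an illustration of the simplest of these classifiers, the sketch below implements a linear discriminant with a pooled diagonal covariance matrix using only the Python standard library. It is not the implementation used in the study. The function names, the toy training data and the label coding (1 = lower the weir, 0 = do not lower it) are assumptions made for the example.

```python
import math


def fit_diag_linear(X, y):
    """Fit a linear discriminant with a pooled, diagonal covariance matrix."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / n
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    # Pooled within-group variance of each variable (covariances are ignored)
    pooled = [
        sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y))
        / (n - len(classes))
        for j in range(p)
    ]
    return classes, means, priors, pooled


def predict_diag_linear(model, x):
    """Assign x to the group with the highest discriminant score."""
    classes, means, priors, pooled = model

    def score(c):
        # Log prior minus half the variance-scaled squared distance to the group mean
        dist = sum((x[j] - means[c][j]) ** 2 / pooled[j] for j in range(len(x)))
        return math.log(priors[c]) - 0.5 * dist

    return max(classes, key=score)


# Hypothetical standardized samples (two variables), for illustration only
X_train = [[-1.0, -0.8], [-0.9, -1.1], [-1.2, -0.7],
           [1.0, 0.9], [0.8, 1.2], [1.1, 0.7]]
y_train = [0, 0, 0, 1, 1, 1]

model = fit_diag_linear(X_train, y_train)
print(predict_diag_linear(model, [0.9, 1.0]))
print(predict_diag_linear(model, [-1.0, -1.0]))
```

Because all groups share the same variances, the quadratic terms in x cancel when two group scores are compared, which is what makes the decision boundary linear. Allowing each group its own covariance matrix would give the quadratic variant.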

Table 8

Machine learning results of the linear discriminant, linear diagonal matrix, quadratic discriminant, and Bayes discriminant methods

Number | Linear discriminant method | Linear diagonal matrix | Quadratic discriminant method | Bayes discriminant method
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 

P(j|i) (i, j = 0, 1; i ≠ j) denotes the probability that a sample originally belonging to group i is misjudged as belonging to group j. Taking the linear discriminant method applied to system B as an example, P(1|0) = 0/6 = 0 and P(0|1) = 7/12 ≈ 0.58, so the misjudgment rate is Err = 0.5 P(1|0) + 0.5 P(0|1) ≈ 0.29. The misjudgment rate of the linear discriminant method for system B was therefore 29%.
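The misjudgment rate defined above can be computed as in the following sketch. The function name is hypothetical. The label lists reproduce only the counts quoted for system B (0 of 6 group-0 samples and 7 of 12 group-1 samples misjudged); the order of the samples is arbitrary.

```python
def misjudgment_rate(actual, predicted, weights=(0.5, 0.5)):
    """Err = w0 * P(1|0) + w1 * P(0|1) for samples labelled 0 or 1."""
    n0 = sum(1 for a in actual if a == 0)
    n1 = sum(1 for a in actual if a == 1)
    if n0 == 0 or n1 == 0:
        raise ValueError("both groups must be present")
    # Group-0 samples judged as 1, and group-1 samples judged as 0
    wrong0 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    wrong1 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return weights[0] * wrong0 / n0 + weights[1] * wrong1 / n1


# Counts from the system B example: 6 samples in group 0, 12 in group 1
actual = [0] * 6 + [1] * 12
predicted = [0] * 6 + [0] * 7 + [1] * 5
err = misjudgment_rate(actual, predicted)
print(round(err, 2))  # 0.29
```

Weighting the two groups equally means the rate is not dominated by the larger group, which matters here because the two groups contain 6 and 12 samples.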

The misjudgment rates of the five drainage systems A, B, C, D, and E were calculated in the same way, and the results are shown in Table 9.

Table 9

Misjudgment rates of the linear discriminant, linear diagonal matrix, quadratic discriminant, and Bayes discriminant methods

| Drainage system | Linear discriminant method (%) | Linear diagonal matrix (%) | Quadratic discriminant method (%) | Bayes discriminant method (%) |
| --- | --- | --- | --- | --- |
| A | 0 | 30 | 17 | 17 |
| B | 29 | 46 | 46 | 46 |
| C |  | 30 | 17 | 25 |
| D | 4 | 40 | 40 | 32 |
| E |  |  |  |  |

In system A, the misjudgment rates of the linear, quadratic, Bayesian, and diagonal matrix discriminant methods were 0, 17, 17, and 30%, respectively. The linear discriminant method performed best in deciding whether to lower the weir. All the discriminant methods judged correctly when the weir could be lowered, so the misjudgments occurred only when the weir should not have been lowered. When the rainfall level was below the yellow warning (a rainfall of more than 50 mm in 6 h), this type of misjudgment was less harmful. However, when the rainfall level reached the orange (a rainfall of more than 50 mm within 3 h) or red (a rainfall of more than 100 mm within 3 h) warning, the deep tunnel was filled in advance, making it difficult to achieve the objective of improving the flood prevention standard.

In system B, the discriminant methods were not ideal for judging whether the weir should be lowered; the misjudgment rates ranged from 29 to 46%. Although the linear discriminant method had the lowest misjudgment rate of 29%, its misjudgments occurred in cases where the weir was to be lowered. The other methods judged correctly when the weir should have been lowered but misjudged when the weir did not need to be lowered. The unsatisfactory result may be due to the large deviation between the sample and the overall distribution.

The linear discriminant method performed relatively well in systems C and E. All the methods made correct decisions when the weir should have been lowered; all misjudgments occurred only when the weir was not to be lowered.

In system D, the misjudgment rates of the linear, quadratic, diagonal matrix, and Bayesian discriminant methods were 4, 40, 40, and 32%, respectively. The linear discriminant method performed better than the other methods. Furthermore, with the last three methods, there were two misjudgments indicating that the weir should not be lowered.

In summary, among the distance discrimination methods, linear discrimination performed best, while the Bayesian method performed poorly.
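The rainfall warning thresholds quoted above can be expressed as a small lookup, which makes it easier to see in which situations an unnecessary weir lowering becomes harmful. This is only a sketch of the three thresholds as stated in the text; the function name and the "none" label are hypothetical.

```python
def rainfall_warning_level(rain_3h_mm, rain_6h_mm):
    """Classify a rainfall event using the thresholds quoted in the text."""
    if rain_3h_mm > 100:
        return "red"     # more than 100 mm within 3 h
    if rain_3h_mm > 50:
        return "orange"  # more than 50 mm within 3 h
    if rain_6h_mm > 50:
        return "yellow"  # more than 50 mm in 6 h
    return "none"


# Hypothetical rainfall totals (mm in 3 h, mm in 6 h), for illustration only
for totals in [(120, 130), (60, 70), (30, 55), (10, 20)]:
    print(totals, rainfall_warning_level(*totals))
```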

In this study, five drainage systems in Shanghai, China, were taken as examples. After the relevant variables were selected, four discriminant ML methods were used to assist decision-making on key steps of drainage system control. Among them, the linear discriminant method gave the best judgments, with an average misjudgment rate of less than 10%, indicating that it can effectively support decision-making. However, the misjudgment rates differed among the drainage systems because of the drainage system characteristics, the representativeness of the control factors, the discriminant method, the number of samples, and the proximity of the samples to the population. Furthermore, for some drainage systems, relying entirely on auxiliary decision-making for operation control carried a high risk. Nevertheless, because the effect of ML depends on the number of training samples, the training set can be continuously expanded by accumulating large amounts of effective data through static simulation and actual operation. The judgment effect should improve as the training samples come closer to the population. Therefore, the application of ML in auxiliary decision-making under complex conditions still has theoretical and practical significance.

All relevant data are included in the paper or its Supplementary Information.

American Society of Civil Engineers 1992 Urban Water Resources Research Council & Federation, W. E. Design and Construction of Urban Stormwater Management Systems. American Society of Civil Engineers, Water Environment Federation.
Bernardelli A., Marsili-Libelli S., Manzini A., Stancari S. & Venier S. 2020 Real-time model predictive control of a wastewater treatment plant based on machine learning. Water Science & Technology 81 (11). doi:10.2166/wst.2020.298.
Fleuren L. M., Klausch T., Zwager C. L., Schoonmade L. J. & Elbers P. 2020 Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Medicine 46 (11). doi:10.1007/s00134-019-05872-y.
Gudaparthi H., Johnson R., Challa H. & Niu N. 2020 Deep learning for smart sewer systems: assessing nonfunctional requirements. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS '20). Association for Computing Machinery, New York, NY, USA, pp. 35–38. doi:10.1145/3377815.3381379.
Hong N., Zhu P. & Liu A. 2017 Modelling heavy metals build-up on urban road surfaces for effective stormwater reuse strategy implementation. Environmental Pollution 231 (Pt 1), 821–828. doi:10.1016/j.envpol.2017.08.056.
Kovarovic K., Aiello L. C., Cardini A. & Lockwood C. A. 2011 Discriminant function analyses in archaeology: are classification rates too good to be true? Journal of Archaeological Science 38 (11), 3006–3018. doi:10.1016/j.jas.2011.06.028.
Liao L., An R., Li J., Yi W., Liu X., Meng W. & Zhu L. 2019 Hydraulic characteristics of stepped spillway dropshafts for urban deep tunnel drainage systems-a case study of Chengdu city. Water Science and Technology. doi:10.2166/wst.2019.405.
Liu J., Xia L., Mei C., Shao W., Haijun Y. U. & Jianming M. A. 2019 Effects of deep tunnel drainage system in urban waterlogging prevention. Journal of Basic Science and Engineering. doi:10.16058/j.issn.1005-0930.2019.02.002.
McCuen R. 1996 Hydrology. Federal Highway Administration, Washington, DC.
Peck R., Linda J. W. & Dean Y. M. 1988 A comparison of several biased estimators for improving the expected error rate of the sample quadratic discriminant function. Journal of Statistical Computation & Simulation 29 (2), 143–156. doi:10.1080/00949658808811057.
Rodríguez-Tomàs E., Iftimie S., Castañé H., Baiges-Gaya G., Hernández-Aguilera A., González-Viñas M., Casteo A., Camps J. & Joven J. 2021 Clinical performance of paraoxonase-1-related variables and novel markers of inflammation in coronavirus disease-19. A machine learning approach. Antioxidants 10 (6), 991. doi:10.3390/antiox10060991.
Rossman L. A. 2015 Storm Water Management Model User's Manual, Version 5.1. National Risk Management Research Laboratory, Office of Research and Development, US Environmental Protection Agency, Cincinnati, OH, US.
Rossman L. A. & Huber W. C. 2016 Storm Water Management Model Reference Manual Volume I – Hydrology (Revised). Environmental Protection Agency, Office of Research and Development, National Risk Management Laboratory, Cincinnati, OH, US.
Stühler E., Platsch G., Weih M., Kornhuber J., Kuwert T. & Merhof D. 2011 Multiple discriminant analysis of SPECT data for Alzheimer's disease, frontotemporal dementia and asymptomatic controls. IEEE. doi:10.1109/NSSMIC.2011.6153848.
Susanna E., Sharon M., Eylon S., Karletta C. & Kelly M. L. 2016 Opening the black box: using a hydrological model to link stakeholder engagement with groundwater management. Water 8 (5), 216. doi:10.3390/w8050216.
Tan Q., Zhang J. & Shi Z. 2018 Introduction on Thames Tideway Tunnel Project in London. Shanghai Water.
Wang G. H., Chen Y., Zhou J. H., Chen Y. L., Wen-Tao L. I., Yang X. H. & Tao L. Y. 2016 Discussion on application and development trend of deep tunnel drainage technology. China Water & Wastewater.
Wei Z. L., Shang Y. Q., Sun H. Y., Xu H. D. & Wang D. F. 2019 The effectiveness of a drainage tunnel in increasing the rainfall threshold of a deep-seated landslide. Landslides 16 (1). doi:10.1007/s10346-019-01241-4.
Yang M., Zhang Y. & Pan X. 2020 Improving the Horton infiltration equation by considering soil moisture variation. Journal of Hydrology 586 (4), 124864. doi:10.1016/j.jhydrol.2020.124864.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).