ABSTRACT
Cascade gates and pumps are common hydraulic structures in the open-canal sections of water transfer projects; they are characterized by high energy consumption and substantial costs and are challenging to regulate. By using cascade gate regulation to control the hydraulic process, the lift distribution of pump stations can be optimized, thereby enhancing operational efficiency and reducing energy consumption. However, selecting control models and optimizing their parameters is difficult because hydraulic processes are nonlinear, high-dimensional, strongly coupled, and time-varying, and exhibit large hysteresis. This study takes the minimum energy consumption of pump stations as the regulation objective and employs a reinforcement learning (RL) algorithm for optimization regulation (OR) in a typical canal section of the Jiaodong Water Transfer Project. Our results demonstrate that OR can precisely control the water level so that the pump station operates within its high-efficiency lift interval, enhancing efficiency by 4.12–6.02% compared with previous operation. Moreover, with the optimized hyperparameter group, the RL model proves robust under different working conditions. The proposed method is suitable for complex hydraulic processes, highlighting its potential to support more effective decision-making in water resources regulation.
HIGHLIGHTS
An open-canal segment of the Jiaodong Water Transfer Project was modeled.
Reinforcement learning was used to optimize the hydraulic control process.
Optimization regulation was robust under variable model conditions.
Low-energy consumption automatic regulation was achieved by the model.
ABBREVIATIONS
- OR
optimization regulation
- RL
reinforcement learning
- WTP
water transfer project
- OCWTP
open-canal of water transfer project
- SISO
single-input and single-output
- MIMO
multiple-input and multiple-output
- PI
proportional integral
- LQR
linear quadratic regulator
- MPC
model predictive control
- ID
integral delay
- DDPG
deep deterministic policy gradient
INTRODUCTION
A multitude of water transfer projects (WTPs) have been built globally, including the California State Water Project and Central Arizona Project in the United States, the Provence Water Project in France, and the South-to-North Water Diversion Project in China. Cascade gates and pumps are the prevalent configurations of hydraulic structures in open-canal water transfer projects (OCWTPs), characterized by large engineering scale, numerous units, significant lift variation, high energy consumption, and substantial costs (Horváth et al. 2022). Optimization regulation (OR) can enhance the benefits of WTPs and the efficiency of their hydraulic structures, and plays an increasingly crucial role in OCWTPs because of its high efficiency, low labor cost, and satisfactory performance (Wang et al. 2022).
Primarily, research on the OR of water resources has focused on the development of appropriate models, including interval-chance constrained programming models (Zhao et al. 2023), fuzzy programming models (Li et al. 2022), and stochastic programming models (Liao et al. 2020). With continuous innovation in technological solutions, there has been a gradual shift toward more accurate automatic control (Sun et al. 2023). These automatic control methods can be broadly categorized into single-input single-output (SISO) and multiple-input multiple-output (MIMO) controllers (Kong et al. 2022). In the SISO approach, a single check gate is controlled based on a single water level input, as in the proportional integral (PI) control method. In the MIMO approach, all check gates are controlled simultaneously according to the water level inputs from all monitoring points, as in the linear quadratic regulator (LQR) and model predictive control (MPC) (Kong et al. 2022).
The PI control method simplifies multivariate functions by utilizing transformation functions and exhibits high reliability. However, the interdependence of its parameters poses challenges for comprehensive tuning and compromises its robustness. Current research has mainly focused on the process of determining algorithm parameters (Kong et al. 2019). Arauz et al. (2020) introduced a novel parameter optimization approach for the PI method based on linear matrix inequality, which minimized the actual maximum error and undesirable mutual interactions between canal pools. Zhong et al. (2018) proposed an LQR model to determine the key parameters of a PI controller and tested it in different canals. Common methods for optimizing the parameters of PI control also include the automatic tuning method (Litrico et al. 2007), the frequency response method (Weyer 2008), optimization theory (van Overloop et al. 2005), and neural networks (Cheng & Wu 2023). The LQR control method exhibits notable performance in linear systems. In the nonlinear case, Boubacar Kirgni & Wang (2023) converted a nonlinear reactor model into a linear parameter-varying system and designed a reference control law based on the linear model by integrating LQR-based control with terminal sliding-mode control. Mathiyalagan & Sangeetha (2022) discussed the robust finite-time stability of conic-type nonlinear systems with time-varying delays by utilizing LQR in nonlinear systems using Lyapunov-Krasovskii stability theory and the linear matrix inequality approach. Given that the hydrodynamics in OCWTPs exhibit strong nonlinearity, it may be more convenient for decision makers to directly select a method that performs well in nonlinear cases. MPC is a MIMO control technique that integrates prediction methods, control theory, and optimization methods. For the OR of hydraulic structures, the prediction step uses hydraulic models to calculate the hydraulic process. Kong et al. (2023) presented a closed-loop MPC method for pumping stations in the CH-BWH lake based on an integral delay (ID) predictive method. Rodriguez et al. (2020) used a centralized linear MPC to stabilize an irrigation system whose prediction process was represented by an ID model. Zheng et al. (2019) formulated an MPC for a cascaded irrigation canal system in Gansu, China using an ID model. These studies demonstrate that the ID model is a typical choice of predictive method in MPC. This is attributed to the differential form of the Saint-Venant equations, which makes it difficult to directly establish the relationship between the water level and the discharge at the control point. Because the ID model is derived from linearizing the Saint-Venant equations near a stable operating point, it is theoretically suitable only for small-scale study areas and linear cases (Schuurmans 1997). Thus, more sophisticated predictive methods and advanced MPC variants have been developed, such as adaptive MPC (Liu et al. 2023), robust MPC (Chen et al. 2021a, 2021b), and nonlinear MPC (Aydin et al. 2022). However, the calculation process of these MPCs is considerably long and inefficient (Ding et al. 2018), especially in high-dimensional cases (Ion et al. 2014).
In practice, in addition to nonlinearity, the hydraulic process exhibits the following challenging characteristics: (a) high dimensionality: massive amounts of high-dimensional information, such as discharge, water level, lift, and gate combinations, must be considered; (b) potential data scarcity: information sometimes cannot be collected completely or simultaneously, posing considerable challenges for water resource management and control (Benra et al. 2021), and existing research falls short of extracting useful information from high-dimensional but potentially scarce data (Zeyou et al. 2023); (c) strong time variability: the flow process is stochastic and complicated (Halil 2022). Dynamic optimization problems cannot be solved solely with traditional algorithms, as the objective function, constraints, and Pareto front may change over time (Farina et al. 2004). Widely used particle swarm optimization and genetic algorithms perform only moderately on dynamic optimization problems (Jordehi 2014) because their variants struggle to balance all-period optimality against the computational effort of dynamic updates (Kim et al. 2014). With the rapid development of artificial intelligence algorithms in recent decades, neural network algorithms have overcome some limitations of previous studies and attracted widespread attention. However, it remains challenging to construct the aforementioned mathematical models and traditional neural network models for large-scale, long-distance WTPs that lack comprehensive measured regulatory data (Gan 2022). Reinforcement learning (RL), which has self-learning and adaptive capabilities, handles nonlinear and high-dimensional conditions effectively, and requires no labeled data, has gained attention in various fields. Compared with mathematical models, an RL model is relatively simple and can achieve fast response and high accuracy, even in basins with limited data (Aydin et al. 2022). It has been applied in areas such as water resource regulation (Lee & Labadie 2007; Castelletti et al. 2010, 2013, 2014; Madani & Hooshyar 2014), real-time flood control (Saliba et al. 2020; Bowes et al. 2022), and wastewater treatment (Chen et al. 2021a, 2021b; Lu et al. 2021; Zhou et al. 2022). Although RL has great potential for hydraulic control, to the best of our knowledge, studies that have utilized RL for low-energy consumption automatic hydraulic regulation in OCWTPs (Hu et al. 2023) are scarce. The possible reasons are as follows: (a) although RL training does not require large amounts of labeled data, a usable environment requires data from complex hydraulic models, which is not easy to construct; (b) the hyperparameters of RL are difficult to design; and (c) computationally, the environment's response to the actions generated by RL is a complex process that is difficult to converge, which may be the primary reason simple models have typically been used.
Currently, in China, the regulation of hydraulic structures in OCWTPs still relies mainly on manual, subjective experience (Sun et al. 2023), which can lead to excessive energy consumption and waste at pump stations (Wang et al. 2022). The purpose of this study is to present a low-energy consumption automatic regulation model for a typical canal of the Jiaodong WTP. By employing cascade gate regulation to achieve canal hydraulic control, the optimal operating condition of the pump station can be reached, thereby reducing the energy required for water transportation at the pump station and achieving a low-energy redistribution of water resources. The RL algorithm is well suited to the nonlinear, high-dimensional, strongly coupled, time-varying hydraulic process with large hysteresis, and can enhance the efficiency of pump stations and boost the overall benefits of WTPs. The main novelties of this paper lie in the selection of the control method and the optimization of its parameters. It is hoped that the concepts, initial results, and formulations provided in this study will help build a foundation to support RL as a viable option for hydraulic control.
METHODOLOGY
This study investigates the low-energy consumption automatic regulation of cascade gates and pump in open canals for the Jiaodong WTP, based on the RL model. The research steps were as follows:
Typical canal sections were selected for the study and generalized. Subsequently, a hydraulic model was constructed using the HEC-RAS software to provide an environment foundation for the RL model. Through secondary development of HEC-RAS using Python, programmatic modification of the hydraulic model was achieved.
An OR model based on RL was established, and the constraint conditions and optimization objective were implemented through the design of the reward function in the RL model. The optimization objective was to minimize the energy consumption of the pump station. The constraint conditions included the water level constraint, lift constraint, and discharge constraint. The model parameter settings were then discussed.
Finally, typical operating conditions were selected to validate the RL model.
RL for optimal regulation of hydraulic structures
In RL, the agent executes actions with the goal of maximizing the long-term return R. R is formulated to reflect the specific control objectives. The agent receives a negative reward to prevent unexpected regulation schemes, such as an unsafe water level or costly pump lift, and receives a positive reward to encourage actions that satisfy the objective function. By maximizing these rewards in response to various actions over time, the agent learns a control strategy to achieve the desired objective.
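As a minimal illustration of this interaction (the `env` and `agent` objects are hypothetical stand-ins for the hydraulic model and the RL controller, not the project's implementation), the agent-environment loop can be sketched as follows:

```python
# Minimal agent-environment interaction loop (illustrative sketch; `env` and
# `agent` are hypothetical stand-ins for the hydraulic model and RL controller).

def run_episode(env, agent, max_steps=120):
    """Run one training episode and return the accumulated return R."""
    state = env.reset()                      # initial hydraulic state
    total_return = 0.0
    for _ in range(max_steps):
        action = agent.act(state)            # e.g. a gate-opening adjustment
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state, done)
        total_return += reward               # negative for unsafe levels or costly lift,
        state = next_state                   # positive when the objective is satisfied
        if done:
            break
    return total_return
```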
Various types of RL algorithms are currently available, and the classifications of RL and its applications in water research are shown in Table 1.
| Category | Subcategory | Algorithm | Pros | Cons | References |
|---|---|---|---|---|---|
| Model-based | Given model | AlphaZero | Effective for simple tasks | Applications are limited by the requirement of a deterministic environment (such as Go, with a deterministic winning outcome), making it unsuitable for complex hydraulic processes | – |
| | Learning the model | World Models, I2A, MBVE, MBMF | | | – |
| Model-free | Value-based | C51, QR-DQN, HER | Good convergence and low computational cost | Only applicable to discrete action spaces, whereas gate opening is a continuous process; suitable only for simple examples such as single-gate scenarios | – |
| | | DQN | | | Water reservoirs (Lee & Labadie 2007; Castelletti et al. 2010; Madani & Hooshyar 2014), water heaters (Amasyali et al. 2021) |
| | Policy-based | Policy gradient | Good robustness and high efficiency | High computational cost and high memory consumption | – |
| | | A2C/A3C | | | Water energy (Yuansheng et al. 2021), irrigation (Alibabaei et al. 2022) |
| | | PPO | | | Hydraulic structures (Lu et al. 2021; Xu et al. 2021) |
| | | TRPO | | | – |
| | Combination of value- and policy-based methods | DDPG | Trade-off between convergence speed and computational cost | Sensitive to hyperparameters | Water desalination (Bonny et al. 2022), water reactor (Chen & Asok Ray 2022), water tank (Likun & Jiang 2023), storm water (Saliba et al. 2020), water diversion strategy (Jiang et al. 2024) |
| | | TD3, SAC | | | – |
Owing to the requirement of continuous adjustment of the gate opening and the practical operating conditions, the combination of value- and policy-based methods is an appropriate option for the OR of hydraulic structures. The deep deterministic policy gradient (DDPG) algorithm is a typical option: it possesses strong nonlinear and high-dimensional modeling capabilities and exhibits notable real-time optimization performance.
DDPG and its network architecture
The performance of tabular estimates of Qπ(s, a) in RL is not always satisfactory. First, when a Q-value table is used to record the returns of executing various actions in different states, the high dimensionality of the state-action space can lead to a 'curse of dimensionality'. Moreover, the initial Q value obtained may be inefficient and may require multiple visits to improve. Furthermore, the algorithm is susceptible to uncertainty, which leads to unstable convergence.
Figure 2 reveals that DDPG incorporates four neural networks, thereby resulting in a multitude of algorithmic parameters. Consequently, the convergence of the algorithm is highly sensitive to the parameter settings.
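The following PyTorch sketch illustrates how the four networks (actor, critic, and their target copies) interact during one DDPG update; the discount factor, soft-update factor, and network interfaces are assumptions for illustration, not the project's exact settings.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.98, tau=0.005):
    """One DDPG update over a replay-buffer batch (gamma, tau are assumed values)."""
    s, a, r, s_next, done = batch            # torch tensors sampled from the buffer

    # Critic: regress Q(s, a) toward the bootstrapped target built from the
    # two target networks.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. maximize Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) update of both target networks with factor tau.
    for target, source in ((target_actor, actor), (target_critic, critic)):
        for tp, sp in zip(target.parameters(), source.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```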
Hydraulic calculation and software introduction
By incorporating the boundary conditions of the river section and employing the Newton–Raphson iteration, it is feasible to directly calculate the changes in discharge and water level at the initial and terminal sections, as well as the increments in water level and discharge at different intermediate sections.
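The full unsteady-flow computation is handled inside HEC-RAS, but the Newton–Raphson idea can be illustrated on a much simpler single-variable problem. The snippet below solves for the normal depth of a rectangular canal from Manning's equation; all parameter values are assumed purely for illustration.

```python
def newton_raphson(f, df, x0, tol=1e-8, max_iter=50):
    """Generic Newton-Raphson iteration: x_(k+1) = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        dx = f(x) / df(x)
        x -= dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative example: normal depth h of a rectangular canal from Manning's
# equation Q = (1/n) * A * R^(2/3) * sqrt(S), with A = b*h and R = A/(b + 2h).
Q, b, n, S = 29.0, 10.0, 0.015, 1e-4   # assumed discharge, width, roughness, slope

def residual(h):
    A, P = b * h, b + 2.0 * h
    return (1.0 / n) * A * (A / P) ** (2.0 / 3.0) * S ** 0.5 - Q

def d_residual(h, eps=1e-6):           # simple central-difference derivative
    return (residual(h + eps) - residual(h - eps)) / (2.0 * eps)

print(f"normal depth = {newton_raphson(residual, d_residual, x0=2.0):.3f} m")
```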
Construction of an OR model
This study presents a low-energy consumption OR model for the cascade pump station and gates in the open canal of the Jiaodong WTP based on the DDPG algorithm.
Study area
An overview of the hydraulic construction in the typical canal is provided in Table 2.
| Hydraulic construction | Indicator | Value |
|---|---|---|
| Wangnou Pump Station | Maximum water level of pump outlet pool (LM-p) | 12.55 m |
| | Minimum water level of pump outlet pool (Lm-p) | 10.63 m |
| | Maximum net lift (HM-p) | 10.05 m |
| | Minimum net lift (Hm-p) | 9.30 m |
| Gate 1: Jiagou Gate | Design front water level of Gate 1 (Ld-g1f) | 11.71 m |
| | Design rear water level of Gate 1 (Ld-g1r) | 11.50 m |
| | Design discharge of Gate 1 (Qd1) | 29.50 m³/s |
| Gate 2: Huaihexidi Gate | Design front water level of Gate 2 (Ld-g2f) | 11.45 m |
| | Design rear water level of Gate 2 (Ld-g2r) | 11.40 m |
| | Design discharge of Gate 2 (Qd2) | 29.00 m³/s |
| Gate 3: Huaihe Gate | Design front water level of Gate 3 (Ld-g3f) | 11.39 m |
| | Design rear water level of Gate 3 (Ld-g3r) | 10.68 m |
| | Design discharge of Gate 3 (Qd3) | 29.00 m³/s |
| Gate 4: Huaihezhongdi Gate | Design front water level of Gate 4 (Ld-g4f) | 10.56 m |
| | Design rear water level of Gate 4 (Ld-g4r) | 10.50 m |
| | Design discharge of Gate 4 (Qd4) | 29.00 m³/s |
Relevant data include geospatial data (river network, cross sections, geospatial data of hydraulic structures), flow data (unsteady flow discharge), hydraulic data (Manning coefficient, canal roughness, slope gradient), boundary conditions (upstream, downstream, and internal boundary conditions), and the information presented in Figure 4. These data were provided by the Shandong Province Bureau of the Jiaodong Water Diversion. Because the Jiaodong WTP is well maintained, the hydraulic parameters were taken as the design values.
Construction of the hydraulic model
Construction of the DDPG model
(1) Environment construction: The main function of the environment is to connect the DDPG model with the hydraulic model, thereby providing a basic environment for the DDPG model. At a given moment, the DDPG agent selects an action (gate opening) based on the state (hydraulic information) and its control strategy. The action and state are then input into the environment module (hydraulic model). By calculating the unsteady flow in the hydraulic model, the state is updated at the next time step and compared with the target value. The DDPG model receives feedback through the reward function, which provides a directional judgment of the update effect, until the termination condition is reached. The construction of the environment module comprised the following steps (a minimal wrapper sketch is given after the list).
(1) Reading of HEC-RAS: Python was used to complete the reading of the HEC-RAS files, including geometric topology, hydraulic boundary conditions, and unsteady flow operating condition information in the hydraulic model, and to provide hydraulic information in the current state to the DDPG model.
(2) Modification of the gate opening: The gate opening was extracted and modified using functions built in Python.
(3) Operation of the hydraulic model: Run the modified unsteady flow operation file of HEC-RAS using the Python function.
(4) Output of result: Read the result file of the HEC-RAS, obtain hydraulic information, such as the water level and discharge at the next time step, calculate the reward according to the reward function, and then provide feedback to the DDPG model.
(5) Preservation of hydraulic model: Save the hydraulic model after running.
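A skeleton of such an environment wrapper is sketched below. The HEC-RAS file handling of steps (1)–(5) is hidden behind hypothetical helper functions (`write_gate_openings`, `run_unsteady_plan`, `read_results`) because the exact file operations depend on the local HEC-RAS project; the episode length and state layout are likewise assumptions.

```python
import numpy as np

# Hypothetical helpers wrapping steps (1)-(5); their names and signatures are
# placeholders, not an actual HEC-RAS API.
from hecras_io import write_gate_openings, run_unsteady_plan, read_results
from reward_sketch import compute_reward   # see the reward-function sketch in item (4) below

class CanalEnv:
    """Couples the DDPG agent with the HEC-RAS hydraulic model."""

    def __init__(self, project_path, target_level, episode_steps=12):
        self.project_path = project_path
        self.target_level = target_level    # L*, optimal outlet-pool water level
        self.episode_steps = episode_steps  # e.g. 24 h at a 2 h control interval
        self.t = 0

    def reset(self):
        self.t = 0
        outlet_level, gate_levels, gate_discharges, lift = read_results(
            self.project_path, step=0)
        return np.concatenate([[outlet_level, lift], gate_levels, gate_discharges])

    def step(self, gate_openings):
        # (2) modify the gate openings and (3) re-run the unsteady-flow plan
        write_gate_openings(self.project_path, gate_openings, step=self.t)
        run_unsteady_plan(self.project_path)

        # (4) read the updated hydraulic state and compute the reward
        outlet_level, gate_levels, gate_discharges, lift = read_results(
            self.project_path, step=self.t + 1)
        reward = compute_reward(outlet_level, self.target_level,
                                gate_levels, gate_discharges, lift)
        self.t += 1
        done = self.t >= self.episode_steps
        state = np.concatenate([[outlet_level, lift], gate_levels, gate_discharges])
        return state, reward, done
```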
(2) Construction of agent: An agent comprises a strategy and a set of actions. The strategy represents the current estimate of the action distribution. This study employs a noise strategy for action selection to strike a balance between exploration and exploitation.
(3) Establishment of state function: In DDPG, the state function can represent various types of information about the agent. In this study, the water levels and discharges at the control points and the pump lift at the current time were taken as the state. By setting a fixed time step for the state transition in HEC-RAS, state information from the previous moment can be transmitted to the next moment.
(4) Establishment of reward function: The optimization goal is to minimize the energy consumption of the pump station. By regulating the cascade gates, the water level of the pump outlet pool Lp can be controlled to reach the target water level L*, thereby achieving the high-efficiency lift H* of the pump station (calculated from Figure 4); namely, min{|Lp − L*|}. Accordingly, the reward function is defined as f = k1 − k2·|Lp − L*|^C, where k1, k2, and C are constants. In addition, if any parameter violates the constraint conditions, a large penalty value is assigned to the reward. Three constraints are included: the water level, lift, and discharge constraints. For the water level constraint, the water level of the pump outlet pool Lp must remain between its minimum and maximum values, Lm−p < Lp < LM−p; the water level at gate i, Li (i = 1, 2, 3, 4), must not exceed its design value, Li < Ld−gi; and the rate of water level decline must not exceed 0.30 m/day. The lift constraint specifies that the lift of the pump station must remain between its minimum and maximum values, Hm−p < Hp < HM−p. Finally, the discharge constraint requires that the discharge of gate i not exceed its design value, Qi < Qdi. Considering the Compilation of Construction and Management Documents of Yellow River Diversion Project in Jiaodong Area and programming experience, the reward function f was set as in Equation (10), where C is the power of the reward function: when C = 1, the reward function is linear; when C = 2, it is quadratic.
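A sketch of this reward in code is given below; the constants k1, k2, C and the penalty value are illustrative assumptions (the paper states only that they are constants), the constraint bounds are taken from Table 2, and the water-level decline-rate constraint is omitted for brevity.

```python
# Sketch of the reward in Equation (10); K1, K2, C and the penalty value are
# illustrative assumptions (the paper states only that they are constants).
# Constraint bounds are the Table 2 design values; the 0.30 m/day decline-rate
# constraint is omitted for brevity.
K1, K2, C = 1.0, 10.0, 2                   # C = 1: linear reward, C = 2: quadratic
PENALTY = -100.0                           # large negative reward when constraints fail

L_MIN_P, L_MAX_P = 10.63, 12.55            # outlet-pool water level bounds (m)
H_MIN_P, H_MAX_P = 9.30, 10.05             # pump net lift bounds (m)
L_DESIGN = [11.71, 11.45, 11.39, 10.56]    # design (front) water levels of Gates 1-4 (m)
Q_DESIGN = [29.50, 29.00, 29.00, 29.00]    # design discharges of Gates 1-4 (m3/s)

def compute_reward(outlet_level, target_level, gate_levels, gate_discharges, lift):
    """Reward f = K1 - K2*|Lp - L*|^C, with a penalty for any violated constraint."""
    violated = (
        not (L_MIN_P < outlet_level < L_MAX_P)
        or not (H_MIN_P < lift < H_MAX_P)
        or any(l >= ld for l, ld in zip(gate_levels, L_DESIGN))
        or any(q >= qd for q, qd in zip(gate_discharges, Q_DESIGN))
    )
    if violated:
        return PENALTY
    return K1 - K2 * abs(outlet_level - target_level) ** C
```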
(5) Model setting: For the model setting in DDPG, the architecture of the neural network determines its ability to fit complex relationships, its convergence, and its training time (Kapanova et al. 2018). In this study, the network architectures of the actor and critic were set as shown in Table 3 (a code sketch of these architectures follows the table). The optimizer used in the training process was the efficient Adam optimizer (Zhang 2018).
| Component of architecture | Actor | Critic |
|---|---|---|
| Input layer | States | States, action |
| Size of first hidden layer | 256 | 256 |
| Activation function of the first layer | ReLU | ReLU |
| Output processing of the first layer | Gaussian noise | – |
| Size of second hidden layer | 256 | 256 |
| Activation function of the second layer | ReLU | ReLU |
| Output processing of the second layer | Gaussian noise | – |
| Output layer | Action | Q value |
| Dimension of the output layer | 1 | 1 |
| Activation function of the output layer | Tanh | ReLU |
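A PyTorch rendering of the Table 3 architectures might look as follows; the standard deviation of the Gaussian noise applied to the hidden-layer outputs is an assumed value, and the ReLU output activation of the critic is kept exactly as listed in Table 3.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor per Table 3: two 256-unit ReLU layers with Gaussian noise on their
    outputs and a Tanh output of dimension 1 (the gate action)."""
    def __init__(self, state_dim, noise_std=0.01):      # noise_std is an assumed value
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 256)
        self.fc2 = nn.Linear(256, 256)
        self.out = nn.Linear(256, 1)
        self.noise_std = noise_std

    def forward(self, s):
        x = torch.relu(self.fc1(s))
        if self.training:                               # Gaussian noise, first layer
            x = x + self.noise_std * torch.randn_like(x)
        x = torch.relu(self.fc2(x))
        if self.training:                               # Gaussian noise, second layer
            x = x + self.noise_std * torch.randn_like(x)
        return torch.tanh(self.out(x))                  # action in [-1, 1]

class Critic(nn.Module):
    """Critic per Table 3: input (states, action), two 256-unit ReLU layers,
    scalar Q-value output with the ReLU output activation listed in Table 3."""
    def __init__(self, state_dim, action_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(state_dim + action_dim, 256)
        self.fc2 = nn.Linear(256, 256)
        self.out = nn.Linear(256, 1)

    def forward(self, s, a):
        x = torch.relu(self.fc1(torch.cat([s, a], dim=-1)))
        x = torch.relu(self.fc2(x))
        return torch.relu(self.out(x))
```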
Suitable hyperparameter settings foster efficient exploration in complex environments and expedite the identification of optimal policies. The main hyperparameters include the learning rate, the soft-update factor τ, the noise value, the capacity of the experience pool D, and the discount factor γ. The Sobol method was used to analyze the sensitivity of each parameter to the objective function, as described in Zhang et al. (2015).
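Such an analysis can be performed with the SALib package, for example as sketched below; the hyperparameter bounds and the `evaluate_objective` wrapper (which would train the DDPG model once per sample and return the objective value) are placeholders, not the settings actually used in this study.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Hyperparameter ranges are placeholders; evaluate_objective() is a hypothetical
# wrapper that trains the DDPG model with one hyperparameter set and returns
# the objective value.
problem = {
    "num_vars": 5,
    "names": ["learning_rate", "tau", "noise", "buffer_size", "gamma"],
    "bounds": [[1e-4, 1e-3], [1e-3, 1e-2], [0.001, 0.5], [1e3, 1e5], [0.90, 0.99]],
}

X = saltelli.sample(problem, 64)                  # N*(2D+2) parameter samples
Y = np.array([evaluate_objective(x) for x in X])  # one model evaluation per sample
Si = sobol.analyze(problem, Y)
print(Si["S1"], Si["ST"])                         # first-order and total-order indices
```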
(6) Evaluation of results: To evaluate the effectiveness of the DDPG model, the following indicators were used: the degree of reward improvement (I1, %), the efficiency improvement of the pump station (I2, %), the rate at which the pump lift lies within the high-efficiency interval (I3, %), the rate at which the pump lift lies between its maximum and minimum values (I4, %), the rate at which gate water levels do not exceed their design values (I5, %), and the rate at which gate discharges do not exceed their design values (I6, %), defined in Equation (11), where P1 and P2 are the operating efficiencies of the Wangnou Pump Station without and with OR, respectively; NL is the number of data points within the high-efficiency lift interval; Nl is the number of data points within the maximum and minimum lift values; Nh is the number of data points not exceeding the design water level of the gates; Nd is the number of data points not exceeding the design discharge of the gates; and N is the total number of data points.
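A sketch of how indicators I2–I6 can be computed from the regulation results is given below (I1 is omitted because its exact formula is not reproduced here; the array layout is an assumption).

```python
import numpy as np

def evaluate_indicators(P1, P2, lift, lift_band, lift_limits,
                        gate_levels, gate_level_design, gate_q, gate_q_design):
    """Compute I2-I6 of Equation (11); arrays cover the N regulation time steps.
    lift_band = (lower, upper) of the high-efficiency lift interval;
    lift_limits = (Hm-p, HM-p); gate_levels and gate_q have shape (N, 4)."""
    I2 = (P2 - P1) / P1 * 100                                    # efficiency improvement
    I3 = np.mean((lift >= lift_band[0]) & (lift <= lift_band[1])) * 100
    I4 = np.mean((lift >= lift_limits[0]) & (lift <= lift_limits[1])) * 100
    I5 = np.mean(np.all(gate_levels <= gate_level_design, axis=1)) * 100
    I6 = np.mean(np.all(gate_q <= gate_q_design, axis=1)) * 100
    return I2, I3, I4, I5, I6
```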
RESULTS AND DISCUSSION
Results
(1) Model setting: The good performance of DDPG benefited from a rational architecture, suitable hyperparameter settings, and an appropriate reward function. The main settings included the learning rate, the soft-update factor τ, the noise value, the capacity of the experience pool D, the discount factor γ, and the power of the reward function C. However, these hyperparameters lack a uniform standard in DDPG, and their values were based on experience and experimentation (Liessner et al. 2019). The Sobol method was used to analyze the sensitivity of each model setting to the objective function. The number of samples was set to 500. The results are presented in Figure 6.
| Scenario group | ID | Settings |
|---|---|---|
| A: Learning rate | A1 | LR of actor network = 0.001, LR of critic network = 0.001 |
| | A2 | LR of actor network = 0.001, LR of critic network = 0.0001 |
| | A3 | LR of actor network = 0.0001, LR of critic network = 0.001 |
| | A4 | LR of actor network = 0.0001, LR of critic network = 0.0001 |
| B: Noise value | B1 | Initial noise value = 0.1, final noise value = 0.001, descent factor = 0.999 |
| | B2 | Initial noise value = 0.1, final noise value = 0.001, descent factor = 0.99 |
| | B3 | Initial noise value = 0.5, final noise value = 0.001, descent factor = 0.999 |
| | B4 | Initial noise value = 0.5, final noise value = 0.001, descent factor = 0.99 |
| C: Discount factor γ | C1 | γ = 0.98 |
| | C2 | γ = 0.95 |
| | C3 | γ = 0.90 |
| D: Reward function | D1 | Linear reward function, C = 1 |
| | D2 | Quadratic reward function, C = 2 |
(2) The OR scheme: A study case, 2021/02/03 0:00 to 02/05 0:00, was selected as a typical example, as shown in Figure 8. The Wangnou Pump Station is located at the upper boundary of the typical canal, and its discharge is shown in Figure 8(a). Four water pumps were operated at the Wangnou Pump Station during the study period, so the single-pump discharge was obtained by dividing the data in Figure 8(a) by four. The water level of the pump inlet pool (Figure 8(b)) was determined under these specific inflow conditions. The pump lift equals the difference between the water levels of the outlet and inlet pools. By implementing cascade gate regulation to control the hydraulic process, the optimal water level of the pump outlet pool L* can be achieved, and hence the high-efficiency lift of the pump station can be attained. The high-efficiency lift was calculated by inputting the single-pump discharge into the curve in Figure 4, as shown in Figure 8(c). According to the Compilation of Construction and Management Documents of Yellow River Diversion Project in Jiaodong Area, the high-efficiency interval was L* ± 0.25 m, as shown in Figure 8(d).
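The computation described above can be condensed as follows; the tabulated discharge-lift points stand in for the Figure 4 curve and are placeholders, not the published pump characteristics.

```python
import numpy as np

# Hypothetical tabulation of the single-pump discharge vs. high-efficiency lift
# curve (Figure 4); the numbers are placeholders, not the published pump curve.
q_points = np.array([5.0, 6.0, 7.0, 8.0])      # single-pump discharge (m3/s)
h_points = np.array([10.0, 9.8, 9.6, 9.4])     # corresponding high-efficiency lift (m)

def target_outlet_band(total_discharge, inlet_level, n_pumps=4, half_width=0.25):
    """Return (L*, lower, upper) for the pump outlet pool, as in Figure 8(c)-(d)."""
    q_single = total_discharge / n_pumps                # single-pump discharge
    h_star = np.interp(q_single, q_points, h_points)    # high-efficiency lift H*
    l_star = inlet_level + h_star                       # lift = outlet level - inlet level
    return l_star, l_star - half_width, l_star + half_width
```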
Figure 9 shows that the DDPG model converged well (Figure 9(a)); the water level of the pump outlet pool could be precisely controlled within the optimal interval (Figure 9(c)); and the efficiency of the pump station was enhanced by 4.28% after OR (Figure 9(d)). The opening of Gate 1 varied significantly from the initial value of 2, whereas the openings of Gates 2–4 remained approximately 2. This may be because Gate 1 is closest to the pump station, so the hydraulic response of the Wangnou Pump Station to Gate 1 was faster and stronger.
(3) Robustness analysis: Multiple operating conditions with different characteristics were selected to verify the robustness of the proposed model. The evaluation of different operating conditions is listed in Table 5. The regulation results are shown in Figure 10.
| Case | Time | Note | I1, first 50 episodes (%) | I1, after 50 episodes (%) | I2 (%) | I3 (%) | I4 (%) | I5 (%) | I6 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Case 1 | 2022/03/19 0:00 ∼ 03/20 0:00 | High discharge, low single pump discharge | 137.98 | 12.89 | 6.02 | 100 | 99.17 | 100 | 70.25 |
| Case 2 | 2021/12/11 0:00 ∼ 12/12 0:00 | Low discharge, large fluctuations in control target | 242.54 | 29.90 | 4.15 | 99.17 | 100 | 100 | 100 |
| Case 3 | 2023/03/17 0:00 ∼ 03/18 0:00 | Large discharge variance | 204.63 | 23.55 | 5.43 | 100 | 100 | 100 | 100 |
| Case 4 | 2022/06/04 0:00 ∼ 06/05 0:00 | Small discharge variance | 181.09 | 29.17 | 5.74 | 100 | 100 | 100 | 100 |
| Case 5 | 2021/05/27 0:00 ∼ 05/28 0:00 | High single pump discharge | 234.26 | 22.10 | 4.12 | 100 | 100 | 100 | 100 |
| Case 6 | 2021/03/31 0:00 ∼ 04/01 0:00 | Low single pump discharge | 139.29 | 44.01 | 5.20 | 100 | 98.33 | 99.17 | 100 |
| Case 7 | 2021/03/13 0:00 ∼ 03/14 0:00 | Large fluctuations in control target | 72.28 | 19.75 | 5.34 | 97.50 | 100 | 100 | 100 |
Table 5 indicates that the reward function converged after approximately 50 episodes. After regulation, the efficiency of the pump station improved by 4.12–6.02% (I2), effectively reducing the energy consumption of the Wangnou Pump Station. The pump lift could be precisely controlled within the high-efficiency lift interval (I3 ≥ 97.50%) and between its design minimum and maximum values (I4 ≥ 98.33%), and the gate water levels could be kept below their design values (I5 ≥ 99.17%). I3 of Cases 2 and 7 did not reach 100%, probably because their control targets fluctuated significantly, making it difficult to precisely control the level within the fluctuating interval. I4 of Cases 1 and 6 and I5 of Case 6 did not reach 100%, probably because the single-pump discharges of Cases 1 and 6 were small; according to the discharge-lift-efficiency curve of a single pump (Figure 4), the corresponding high-efficiency lift was high, resulting in a higher corresponding water level of the pump outlet pool, which was therefore more likely to exceed the upper bound. I6 of Case 1 was relatively low because Case 1 had a high discharge that exceeded the design value. For floods exceeding the design standard, the safety requirements may still be difficult to meet even after regulation; other measures should therefore be adopted to address flooding, such as reservoir and flood-detention-area regulation or sponge-city approaches.
In general, the constructed DDPG model exhibited good performance under different operating conditions and demonstrated clear robustness, which is attributed to its deep learning architecture, the parameter sharing across its four networks, and the experience pool technique. These features enabled the model to capture commonalities between different conditions during training, resulting in improved adaptability and strong generalization when encountering new conditions.
Discussion
(1) Tips that facilitate convergence: Convergence can be facilitated through several strategies (a noise-schedule sketch follows this list).
a) Generating a targeted experience pool: To obtain the best performance from the neural network, targeted training samples are necessary (Mudunuru et al. 2022). Based on the measured hydraulic data, positive and negative sampling can be used simultaneously when constructing the experience pool during training to avoid the 'curse of dimensionality' and overfitting, and to facilitate convergence and value function estimation. A poor programming choice is to initialize the buffer with all zeros or leave it empty rather than performing positive and negative sampling based on prior knowledge; DDPG can still learn a good strategy in this case, but it requires much more running time.
b) Embedding prior knowledge into the neural network: Incorporating prior knowledge into a neural network can also reduce overfitting and accelerate convergence (Raissi et al. 2019). For instance, if the regulation objective is to raise the water level at a specific river cross section, the opening degrees of the downstream gates may tend to decrease, although the magnitude of the decrease is unknown. This experience serves as crucial prior knowledge for the actor network, which can be given a higher prior probability of taking reduced values.
c) Baseline of the reward function: In this study, the reward function f = positive baseline − |agent's state − control target|. When the control effect was ideal, the reward was positive; otherwise, it was negative. The convergence speed is higher with a positive baseline (Wang et al. 2005).
d) Noise value setting: The noise value decreases as training proceeds, leading to less exploration and more exploitation, so the agent must acquire sufficient useful knowledge before the noise decays to a relatively small value. Consequently, a suitable noise setting should account for the size of the experience pool, the learning rate, and the other hyperparameters. After the activation function maps the actor output to [−1, 1] or [0, 1], adding noise can push values outside this interval, causing the actor to frequently take extreme values; adjusting the noise decline speed, optimizing the network structure, and applying regularization can mitigate this.
e) Information dimension and the network's tensor structure: When the proposed method is applied to unsteady flow, discharge information must be added to the input layer of the neural network, which may require a redesign of the network's tensor structure. Based on programming experience, DDPG can learn a good strategy even without discharge information, but it requires much more running time.
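For item (d), the exponential noise schedule of scenario group B in Table 4 and the clipping of noisy actions back into the Tanh range can be written as follows (the episode count and usage are illustrative):

```python
import random

def noisy_action(actor_output, noise_value, low=-1.0, high=1.0):
    """Add Gaussian exploration noise, then clip back into the Tanh action range."""
    return max(low, min(high, actor_output + random.gauss(0.0, noise_value)))

# Exponential noise decay as in Table 4, scenario B1
# (initial 0.1, final 0.001, descent factor 0.999); episode count is illustrative.
noise_value, final_noise, descent = 0.1, 0.001, 0.999
for episode in range(500):
    # ... run one training episode, calling noisy_action() at each control step ...
    noise_value = max(final_noise, noise_value * descent)
```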
(2) Limitations of the study: Certain limitations are evident in the extent of the study implications. Because of the continuous water supply process of the Wangnou Pump Station, the on/off states of the pump units were not considered in the current model. In addition, the number of gate adjustments also affects energy consumption (Sun et al. 2023). The control time interval was set to 2 h according to practical conditions; a larger interval may compromise the control effect, whereas a smaller interval entails more frequent adjustments and thus more energy consumption. Subsequent research could incorporate minimizing the number of gate adjustments into the optimization objectives to further reduce the regulation energy consumption.
(3) Suggestions for future applications: Because RL learns strategies through interactions between the agent and its environment, the agent can improve its strategy in dynamic and uncertain environments, giving the approach broad applicability. RL can be used to achieve high-dimensional, multivariable collaborative control. The regulatory objective of this study is to reduce energy consumption, but the RL algorithm can serve various regulatory objectives. In future applications, when minimum water shortage, maximum improvement in water quality, flood-control water level exceedance, or ecological water level satisfaction is included in the objective function, water resource regulation, water quality-quantity regulation, flood control regulation, or ecological regulation can be achieved. Furthermore, instead of requiring a large amount of pre-annotated training data, the RL algorithm learns continuously through interaction with its environment, making it applicable to open canals with insufficient data. In addition, the convergence of the DDPG algorithm is sensitive to parameter settings, yet there is currently no systematic, authoritative setting method applicable across research areas; we suggest obtaining the optimal hyperparameter settings through experimental comparison. This article identifies suitable model parameter groups and objective function parameter groups through experiments, hoping to provide setting experience for related research. The data used in this study were limited to the Jiaodong WTP in China, but the model can be expected to work in other scenarios as well.
CONCLUSIONS
This study takes a typical open canal of the Jiaodong WTP as an example and uses RL to investigate OR in OCWTPs. By employing cascade gate regulation to achieve canal hydraulic control, the high-efficiency operating lift of the pump station can be attained, thereby reducing the energy consumption required for water transportation. The purpose of this study is to achieve a low-energy redistribution of water resources.
The unique contribution of this study is the selection of a suitable method for hydraulic control after fully considering the complex characteristics of the hydraulic process, together with the optimization of the model parameters. Owing to the nonlinearity, large hysteresis, strong coupling, high dimensionality, and time-varying nature of the hydraulic process, real-time regulation in OCWTPs is exceptionally challenging. Existing research on hydraulic control falls short of extracting useful information from high-dimensional (discharge, water level, lift, gate combinations, etc.) but potentially scarce data (hydraulic data may not be measured or may be difficult to collect in some cases). The proposed model, which couples the RL algorithm with the HEC-RAS software, overcomes the limitations of previous control methods. RL, with self-learning and adaptive capabilities that effectively handle nonlinear and high-dimensional conditions and require no labeled data, achieves high control accuracy even in basins with limited data. To the best of our knowledge, this is the first instance in which it has been explicitly adopted for the low-energy consumption automatic regulation of hydraulic structures in an OCWTP. The results demonstrate a good coupling effect between the RL and hydraulic models. The OR can precisely control the hydraulic process to keep the pump lift within its high-efficiency interval, improving the efficiency of the pump station by 4.12–6.02% compared with previous operations. Moreover, the noise value and learning rate are important parameters for the model results, and their uncertainties should be carefully considered when establishing the OR model. Using well-designed hyperparameters, OR proved robust under different uncertainties in the model parameters.
The concepts, initial results, and formulations provided in this study should help build a foundation to support RL as a viable option for hydraulic control.
ACKNOWLEDGEMENTS
We wish to thank the Shandong province Jiaodong Water Transfer Bureau for providing the required data. We also thank the reviewers and editors for their insightful comments and suggestions that improved the clarity of the paper.
AUTHOR CONTRIBUTIONS
H.L., Z. and T.G. conceptualized the study; T.G. performed the methodology and validated the study; H.D. and J.Y., H. performed the formal analysis; T.G. investigated the study; Y.Z., J. and H.L., Z. collected the resources; T.G. wrote the draft; H.L., Z. and T.G. visualized the study; Y.Z., J. and H.L., Z. supervised the study; Y.Z., J. and H.L., Z. performed project administration; Y.Z., J. and H.L., Z. acquired funding. All authors have read and agreed to the published version of the manuscript.
FUNDING
This work was supported by the National Natural Science Foundation of China (grant no. 52130907) and the Shandong Province Water Diversion Project Operation and Maintenance Center Cooperation Project (no. SDGP37000000202102002416).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.