Cascade gates and pumps are common hydraulic structures in the open-canal sections of water transfer projects; their high energy consumption and substantial costs make them challenging to regulate. By regulating cascade gates to control the hydraulic process, the lift distribution of pump stations can be optimized, thus enhancing operational efficiency and reducing energy consumption. However, the selection of control models and the optimization of their parameters are difficult because hydraulic processes are nonlinear, high-dimensional, strongly coupled, and time-varying, with large hysteresis. This study takes the minimum energy consumption of pump stations as the regulation objective and employs a reinforcement learning (RL) algorithm for optimization regulation (OR) within a typical canal section of the Jiaodong Water Transfer Project. Our results demonstrate that OR precisely controls the water level so that the pump station operates within its high-efficiency lift interval, enhancing efficiency by 4.12–6.02% compared with previous operation. Moreover, with an optimized hyperparameter group, the RL model proves robust under different working conditions. The proposed method is suitable for complex hydraulic processes, highlighting its potential to support more effective decision-making in water resources regulation.

  • An open-canal segment of the Jiaodong Water Transfer Project was modeled.

  • Reinforcement learning was used to optimize the hydraulic control process.

  • Optimization regulation was robust under variable model conditions.

  • Low-energy consumption automatic regulation was achieved by the model.

OR: optimization regulation

RL: reinforcement learning

WTP: water transfer project

OCWTP: open-canal water transfer project

SISO: single-input and single-output

MIMO: multiple-input and multiple-output

PI: proportional integral

LQR: linear quadratic regulator

MPC: model predictive control

ID: integral delay

DDPG: deep deterministic policy gradient

A multitude of water transfer projects (WTPs) have been built globally, including the California State Water Project and Central Arizona Project in the United States, the Provence Water Project in France, and the South-to-North Water Diversion Project in China. Cascade gates and pumps are the prevalent configurations of hydraulic structures in open-canal water transfer projects (OCWTPs), characterized by a large engineering scale, numerous units, significant lift variation, high energy consumption, and substantial costs (Horváth et al. 2022). Optimization regulation (OR) can enhance the benefits of WTPs and the efficiency of their hydraulic structures, playing an increasingly crucial role in OCWTPs because of its high efficiency, low labor cost, and satisfactory performance (Wang et al. 2022).

Primarily, research on the OR of water resources has focused on the development of appropriate models, including interval-chance constrained programming models (Zhao et al. 2023), fuzzy programming models (Li et al. 2022), and stochastic programming models (Liao et al. 2020). With continuous innovation in technological solutions, there has been a gradual shift toward achieving more accurate automation control ideas (Sun et al. 2023). These automatic control methods can be broadly categorized into single-input single-output (SISO) and multiple-input multiple-output (MIMO) controllers (Kong et al. 2022). In the SISO control method, a single check gate is controlled based on a single water level input, such as the proportional integral (PI) control method. In the MIMO control method, all check gates are controlled simultaneously according to the water level inputs from all monitoring points, such as the linear quadratic regulator (LQR) and model predictive control (MPC) (Kong et al. 2022).

The PI control method simplifies multivariate functions by utilizing transformation functions and exhibits high reliability. However, the interdependence of its parameters poses challenges for a comprehensive adjustment and compromises its robustness. Current research has mainly focused on the process of determining algorithm parameters (Kong et al. 2019). Arauz et al. (2020) introduced a novel parameter optimization approach for the PI method based on linear matrix inequality, which minimized the actual maximum error and undesirable mutual interactions between canal pools. Zhong et al. (2018) proposed an LQR model to determine the key parameters of a PI controller and tested it in different canals. Common methods for optimizing the parameters of PI control also include the automatic tuning change method (Litrico et al. 2007), frequency response method (Weyer 2008), optimization theory (van Overloop et al. 2005), and neural networks (Cheng & Wu 2023). The LQR control method exhibits notable performance in linear systems. In the nonlinear case, Boubacar Kirgni & Wang (2023) converted a nonlinear reactor model into a linear parameter-varying system and designed a reference control law based on the linear model by integrating an LQR-based control with terminal sliding-mode control. Mathiyalagan & Sangeetha (2022) discussed the robust finite-time stability of conic-type nonlinear systems with time-varying delays by utilizing LQR in nonlinear systems using the Lyapunov-Krasovskii stability theory and the linear matrix inequality approach. Given that hydrodynamics in the OCWTP exhibit strong nonlinearity, it may be more convenient for decision makers to directly select a method that exhibits a good application effect in nonlinear cases. MPC is a MIMO control technology that integrates prediction methods, control theory, and optimization methods. For the OR of hydraulic structures, prediction methods use hydraulic models to calculate the hydraulic process. Kong et al. 
(2023) presented a closed-loop MPC method for pumping stations in the CH-BWH lake based on an integral delay (ID) predictive method. Rodriguez et al. (2020) used a centralized linear MPC to stabilize an irrigation system whose prediction process was represented by an ID model. Zheng et al. (2019) formulated an MPC for a cascaded irrigation canal system in Gansu, China using an ID model. These studies demonstrate that the ID model is a typical choice of predictive method in MPC. This is attributed to the differential characteristics of the Saint-Venant equations, which make it difficult to directly establish the relationship between the water level and the discharge at the control point. Because the ID model is derived from the linearization of the Saint-Venant equations near a stable point, it is theoretically suitable only for small-scale study areas and linear cases (Schuurmans 1997). Thus, more sophisticated predictive methods and advanced MPC methods have been developed, such as adaptive MPC (Liu et al. 2023), robust MPC (Chen et al. 2021a, 2021b), and nonlinear MPC (Aydin et al. 2022). However, the calculation process for these MPC methods is considerably long and inefficient (Ding et al. 2018), especially in high-dimensional cases (Ion et al. 2014).

In practice, in addition to nonlinearity, the hydraulic process exhibits the following challenging characteristics: (a) High dimensionality: massive amounts of high-dimensional information must be considered, such as discharge, water level, lift, and gate combinations. (b) Potential scarcity: information sometimes cannot be collected completely or simultaneously, posing considerable challenges for water resource management and control (Benra et al. 2021); existing research falls short of extracting useful information from high-dimensional but potentially scarce data (Zeyou et al. 2023). (c) Strong time variability: the flow process is stochastic and complicated (Halil 2022), and solving dynamic optimization problems cannot rely solely on traditional algorithms, as the objective function, constraints, and Pareto front may change over time (Farina et al. 2004). The prevalent particle swarm optimization and genetic algorithms perform only moderately on dynamic optimization problems (Jordehi 2014), because their variants struggle to balance all-period optimality against the dynamic computational effort (Kim et al. 2014). With the rapid development of artificial intelligence algorithms in recent decades, neural network algorithms have overcome some limitations of previous studies and attracted widespread attention. However, it is equally challenging to construct the aforementioned mathematical models and traditional neural network models for large-scale, long-distance WTPs that lack comprehensively measured regulatory data (Gan 2022). Reinforcement learning (RL), with self-learning and adaptive capabilities, effectively handles nonlinear and high-dimensional conditions, requires no labeled data, and has therefore gained attention in various fields. Compared with mathematical models, the RL model is relatively simple and can deliver fast response and high accuracy, even in basins with limited data (Aydin et al. 2022).
It has been applied in areas such as water resource regulation (Lee & Labadie 2007; Castelletti et al. 2010, 2013, 2014; Madani & Hooshyar 2014), real-time flood control (Saliba et al. 2020; Bowes et al. 2022), and wastewater treatment (Chen et al. 2021a, 2021b; Lu et al. 2021; Zhou et al. 2022). Although RL has great potential for hydraulic control, to the best of our knowledge, studies that utilize the RL algorithm for low-energy-consumption automatic hydraulic regulation in OCWTPs (Hu et al. 2023) are scarce. The possible reasons are as follows: (a) although RL training does not require large amounts of labeled data, a usable environment requires data from complex hydraulic models, which is not easy to construct; (b) the hyperparameters of RL are difficult to design; and (c) computationally, the environment's response to the actions generated by RL is a complex process that is difficult to converge, which may be the primary reason simple models have been preferred.

Currently, in China, the regulation of hydraulic structures in OCWTPs still relies mainly on manual subjective experience (Sun et al. 2023), which can lead to excessive consumption and energy wastage at pump stations (Wang et al. 2022). The purpose of this study is to present a low-energy-consumption automatic regulation model for a typical canal of the Jiaodong WTP. By employing cascade gate regulation to achieve canal hydraulic control, the optimal operating condition of the pump station can be reached, thereby reducing the energy consumption required for water transportation and achieving a low-energy redistribution of water resources. The RL algorithm is a suitable approach for nonlinear, high-dimensional, strongly coupled, time-varying hydraulic processes with large hysteresis, enhancing the efficiency of pump stations and boosting the overall benefits of WTPs. The main novelties of this paper are the selection of the control method and the optimization of its parameters. It is hoped that the concepts, initial results, and formulations provided in this study will help build a foundation to support RL as a viable option for hydraulic control.

This study investigates the low-energy-consumption automatic regulation of cascade gates and pumps in the open canal of the Jiaodong WTP, based on the RL model. The research steps were as follows:

  • Typical canal sections were selected for the study and generalized. Subsequently, a hydraulic model was constructed using the HEC-RAS software to provide an environment foundation for the RL model. Through the secondary development of HEC-RAS using Python, programmatic modification of the hydraulic model could be achieved.

  • An OR model based on RL was established, and constraint conditions and optimization objectives were implemented through the designation of reward function in the RL model. The optimization objective was to minimize energy consumption of the pump station. The constraint conditions included the water level constraint, lift constraint, and discharge constraint. Then, the model parameter settings were discussed.

  • Finally, typical operating conditions were selected to validate the RL model.

RL for optimal regulation of hydraulic structures

When formulated as an RL problem, the OR of the hydraulic structures in an OCWTP can be fully described by the agent and environment (Figure 1).
Figure 1

Framework of RL.

Figure 2

Framework of DDPG algorithm.

The environment represents the hydraulic model that calculates the hydraulic process. The agent represents the hydraulic structure control system. The control process can be described as a Markov decision process <S, A, P, R>, where S is the state space, A is the action space, P is the state transition function, and R is the reward function; the policy is represented by π. After executing an action from A (e.g., opening a valve or turning on a pump), the state in S (e.g., discharge or water level) changes, yielding an immediate reward (an estimate of the action's quality). Based on the reward for each action, the agent accumulates a long-term return R (Equation (1)). Owing to the delay in the agent's reward, it is inaccurate to evaluate the quality of an action solely from the immediate reward. Therefore, the state value function Vπ(s) and the action value function Qπ(s,a) are used for evaluation in RL. Vπ(s) describes the expected return R when the agent is in a specific state of the environment, quantifying how favorable that state is (Equation (2)); Qπ(s,a) describes the expected return R when the agent takes a specific action in a specific state (Equation (3)). To solve these via bootstrapping, the Bellman equation form of Vπ(s) is represented as Equation (4):
R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}
(1)
V_{\pi}(s) = E_{\pi}\left[ R_t \mid s_t = s \right]
(2)
Q_{\pi}(s,a) = E_{\pi}\left[ R_t \mid s_t = s, a_t = a \right]
(3)
V_{\pi}(s) = E_{\pi}\left[ r_{t+1} + \gamma V_{\pi}(s_{t+1}) \mid s_t = s \right]
(4)
where π is the policy to be optimized, Eπ is the expectation under π, γ ∈ [0,1] is the discount factor applied at each step, and t represents time.
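As a concrete illustration of Equations (1) and (4), the discounted return and an iterative Bellman evaluation can be computed on a toy two-state problem (the rewards and transitions below are hypothetical, chosen only for illustration):

```python
# Toy illustration of the discounted return and the Bellman backup.
GAMMA = 0.9

def discounted_return(rewards, gamma=GAMMA):
    """Equation (1): R_t = sum_k gamma^k * r_{t+k+1}."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Deterministic toy transitions: state -> (reward, next state)
mdp = {"low": (1.0, "high"), "high": (2.0, "high")}

def evaluate_policy(n_sweeps=200, gamma=GAMMA):
    """Iterative policy evaluation via the Bellman equation (4)."""
    v = {s: 0.0 for s in mdp}
    for _ in range(n_sweeps):
        v = {s: r + gamma * v[s2] for s, (r, s2) in mdp.items()}
    return v

print(round(discounted_return([1, 1, 1]), 2))  # 1 + 0.9 + 0.81 = 2.71
v = evaluate_policy()
print(round(v["high"], 1))  # converges to 2 / (1 - 0.9) = 20.0
```

The fixed point of the Bellman backup is the state value function, which is why repeated sweeps converge to V(s) regardless of the initial guess.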

In RL, the agent executes actions with the goal of maximizing the long-term return R. R is formulated to reflect the specific control objectives. The agent receives a negative reward to prevent unexpected regulation schemes, such as an unsafe water level or costly pump lift, and receives a positive reward to encourage actions that satisfy the objective function. By maximizing these rewards in response to various actions over time, the agent learns a control strategy to achieve the desired objective.

Various types of RL algorithms are currently available, and the classifications of RL and its applications in water research are shown in Table 1.

Table 1

The classification of RL algorithms

| Category | Algorithm | Pros | Cons | References |
|---|---|---|---|---|
| Model-based (given model) | AlphaZero | Effective with simple tasks | Applications are limited by the requirement of a deterministic environment (e.g., Go, with a deterministic winning outcome), making it unsuitable for complex hydraulic processes | / |
| Model-based (learning the model) | World Models, I2A, MBVE, MBMF | | | / |
| Model-free (value-based) | C51, QR-DQN, HER, DQN | Good convergence and low computational cost | Only applicable to discrete action spaces, whereas gate opening is continuous; can only be adopted for simple examples, such as single-gate scenarios | DQN: water reservoirs (Lee & Labadie 2007; Castelletti et al. 2010; Madani & Hooshyar 2014), water heaters (Amasyali et al. 2021) |
| Model-free (policy-based) | Policy gradient, A2C/A3C, PPO, TRPO | Good robustness and high efficiency | High computational cost and high memory consumption | A2C/A3C: water energy (Yuansheng et al. 2021), irrigation (Alibabaei et al. 2022); PPO: hydraulic structures (Lu et al. 2021; Xu et al. 2021) |
| Combination of value- and policy-based | DDPG, TD3, SAC | A trade-off between convergence speed and computational cost | Sensitive to hyperparameters | DDPG: water desalination (Bonny et al. 2022), water reactor (Chen & Asok Ray 2022), water tank (Likun & Jiang 2023), storm water (Saliba et al. 2020), water diversion strategy (Jiang et al. 2024) |

Owing to the requirement of continuous adjustment of the gate opening and the practical operating conditions, the combination of value and policy methods is an appropriate option for the OR of hydraulic structures. The deep deterministic policy gradient (DDPG) algorithm is a typical choice among these options, possessing strong nonlinear and high-dimensional modeling capabilities and exhibiting notable real-time optimization performance.

DDPG and its network architecture

The performance of Qπ(s,a) in RL is not always satisfactory in many tasks. First, when using a Q value table to record the returns of executing various actions in different states, the high dimensionality of state action space can lead to a ‘curse of dimensionality’. Moreover, the initial Q value obtained may be inefficient and may require multiple visits to improve the corresponding Q value. Furthermore, the algorithm is susceptible to uncertainty, which leads to unstable convergence.

DDPG utilizes several techniques to address these issues. First, it generates actions directly from a neural network, thereby demonstrating excellent performance and generalization capabilities in continuous action spaces. Moreover, DDPG is designed on the actor-critic architecture, where the actor is responsible for generating actions and the critic evaluates their merits. The actor and critic collaborate to achieve concurrent optimization, enabling better convergence in the search for the optimal policy and making the algorithm suitable for complex dynamic environments. Furthermore, DDPG adopts an experience replay mechanism that maintains an experience pool: after each action, the agent deposits the transition into the pool, and during each training step, random samples are drawn from the pool to eliminate correlations in the observation sequence. The main actor network μ(s|θ^μ), target actor network μ′(s|θ^μ′), main critic network Q(s,a|θ^Q), and target critic network Q′(s,a|θ^Q′) participate in the training process simultaneously. As training progresses, the target networks periodically track the parameters of the main networks, enabling the agent to learn a better strategy from the environment. This is accomplished through a soft transition, which is employed to accelerate convergence. The definition of the soft transition is shown in Equation (5):
\theta^{Q'} \leftarrow \tau \theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1-\tau)\,\theta^{\mu'}
(5)
where τ is the update factor; and θ^μ, θ^μ′, θ^Q, and θ^Q′ represent the parameters of the main actor network, target actor network, main critic network, and target critic network, respectively.
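Equation (5) amounts to a Polyak averaging step; a minimal illustration on plain parameter lists (standing in for network weights) is:

```python
def soft_update(target, main, tau):
    """Soft transition of Equation (5): target <- tau*main + (1-tau)*target."""
    return [tau * m + (1 - tau) * t for m, t in zip(main, target)]

main = [1.0, 2.0]    # main-network parameters
target = [0.0, 0.0]  # target-network parameters
target = soft_update(target, main, tau=0.1)
print(target)  # [0.1, 0.2]: the target drifts slowly toward the main network
```

A small τ keeps the target networks slowly varying, which stabilizes the bootstrapped critic targets.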
During the optimization process, the action of the agent in each state can be gradually optimized according to Equation (6).
a_t = \mu(s_t \mid \theta^{\mu}) + n_t
(6)
where n_t is the exploration noise, which strikes a balance between exploration and exploitation.
In the initial stage, a larger noise value allows the algorithm to perform more exploration, preventing it from becoming stuck in a local optimal solution. As the iteration proceeds, the noise is gradually reduced, causing the algorithm to depend more on the current optimal policy, thereby improving the convergence speed. nt is defined as shown in Equation (7):
n_t = n_{final} + (n_{initial} - n_{final})\, e^{-\sigma t}
(7)
where n_initial, n_final, and σ represent the initial value, final value, and descent factor of the exploration noise, respectively.
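Assuming the exponential-decay reading of Equation (7), the noise schedule can be sketched as follows (the numeric values are illustrative placeholders, not the study's settings):

```python
import math

def exploration_noise(t, n_initial=1.0, n_final=0.01, sigma=0.01):
    """Exploration-noise scale decaying from n_initial toward n_final
    with descent factor sigma (assumed exponential form of Equation (7))."""
    return n_final + (n_initial - n_final) * math.exp(-sigma * t)

print(round(exploration_noise(0), 2))     # 1.0  (full exploration at the start)
print(round(exploration_noise(1000), 2))  # 0.01 (near-greedy late in training)
```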
The framework of the DDPG algorithm is illustrated in Figure 2. The pseudocode for the DDPG algorithm is as follows:
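A standard DDPG training loop, matching the networks and update rules above, can be sketched as follows (a generic reconstruction of the algorithm; minibatch size N and episode/step counts are implementation choices):

```
Initialize main networks μ(s|θ^μ), Q(s,a|θ^Q); copy them to targets θ^μ′ ← θ^μ, θ^Q′ ← θ^Q
Initialize experience pool D
for each episode:
    receive initial state s_1 from the environment (hydraulic model)
    for each time step t:
        select action a_t = μ(s_t|θ^μ) + n_t                     # Equation (6)
        execute a_t (gate opening), observe reward r_t and next state s_{t+1}
        store transition (s_t, a_t, r_t, s_{t+1}) in D
        sample a random minibatch of N transitions (s_i, a_i, r_i, s_{i+1}) from D
        y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′) | θ^Q′)
        update critic θ^Q by minimizing L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²
        update actor θ^μ by the deterministic policy gradient
        soft-update the target networks                           # Equation (5)
```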
Figure 3

Study area: (a) Jiaodong WTP; (b) geographic information of typical canal; and (c) section view of typical canal.


Figure 2 reveals that DDPG incorporates four neural networks, thereby resulting in a multitude of algorithmic parameters. Consequently, the convergence of the algorithm is highly sensitive to the parameter settings.

Hydraulic calculation and software introduction

Flood simulations can be effectively performed using one-dimensional hydraulic models that enable macroscopic descriptions of flood movement and have been extensively applied in practice. HEC-RAS is a widely used hydraulic model designed to simulate natural river networks or artificial canal hydraulics. Owing to its excellent interactivity, HEC-RAS was selected as the hydraulic model for this study. In HEC-RAS, unsteady flow calculations are performed using governing equations based on the continuity equation and the momentum equation (Equation (8)):
\frac{\partial A_T}{\partial t} + \frac{\partial Q}{\partial x} - q_l = 0, \qquad \frac{\partial Q}{\partial t} + \frac{\partial (VQ)}{\partial x} + gA\left(\frac{\partial l}{\partial x} + S_f\right) = 0
(8)
where AT is the area of the microelement, Q is the discharge of the river canal, ql is the lateral convergence per unit length, V is the average discharge velocity, g is gravitational acceleration, l is the water level, and Sf is the friction slope.
Currently, the finite-difference method is a prevalent computational approach. Based on practical requirements, the canal is divided into multiple segments, and the basic equations are discretized in time and space. For each segment, the differential equations are simplified and solved as algebraic equations. The algebraic system for a segment contains only four unknown variables: the changes in discharge and water level at its upstream and downstream cross sections over the time step, ΔQ_j, ΔQ_{j+1}, ΔL_j, and ΔL_{j+1}. The final equation system is shown in Equation (9).
A_{1j}\,\Delta Q_j + B_{1j}\,\Delta L_j + C_{1j}\,\Delta Q_{j+1} + D_{1j}\,\Delta L_{j+1} = E_{1j}, \qquad A_{2j}\,\Delta Q_j + B_{2j}\,\Delta L_j + C_{2j}\,\Delta Q_{j+1} + D_{2j}\,\Delta L_{j+1} = E_{2j}
(9)
where θ is the weight coefficient (0 ≤ θ ≤ 1) that ensures the stability of the difference equation; and r, η, β, and ɑ enter the coefficients A_{1j}, …, E_{1j} and A_{2j}, …, E_{2j}.

By incorporating the boundary conditions of the river section and employing the Newton–Raphson iteration, it is feasible to directly calculate the changes in discharge and water level at the initial and terminal sections, as well as the increments in water level and discharge at different intermediate sections.
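As a generic numerical illustration of the Newton–Raphson iteration mentioned above (shown on a scalar equation rather than the full difference system):

```python
def newton_raphson(f, dfdx, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:  # stop when the update is below tolerance
            break
    return x

# Example: solve x^2 - 2 = 0, whose positive root is sqrt(2)
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(round(root, 6))  # 1.414214
```

In the hydraulic model the same idea is applied to the vector-valued residual of the difference system, with the Jacobian replacing the scalar derivative.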

Construction of an OR model

This study presents a low-energy-consumption OR model for the cascade pumps and gates in the open canal of the Jiaodong WTP based on the DDPG algorithm.

Study area

The Jiaodong WTP is a large-scale, inter-basin, long-distance WTP in Shandong Province, China. The canal section incorporates hydraulic structures consisting of cascade pumps and gates. The Wangnou–Weihezhongdi section of the Jiaodong WTP is a typical canal, which is 7.844 km long, with a trapezoidal cross-sectional shape. The geographical location of the study area and a schematic of a typical canal are shown in Figure 3.
Figure 4

Operational conditions: (a) discharge-lift curve and (b) discharge-lift-efficiency curve of one water pump.


An overview of the hydraulic structures in the typical canal is provided in Table 2.

Table 2

Hydraulic structures

| Hydraulic structure | Indicator | Value |
|---|---|---|
| Wangnou Pump Station | Maximum water level of pump outlet pool (L_M-p) | 12.55 m |
| | Minimum water level of pump outlet pool (L_m-p) | 10.63 m |
| | Maximum net lift (H_M-p) | 10.05 m |
| | Minimum net lift (H_m-p) | 9.30 m |
| Gate 1: Jiagou Gate | Design front water level of Gate 1 (L_d-g1f) | 11.71 m |
| | Design rear water level of Gate 1 (L_d-g1r) | 11.50 m |
| | Design discharge of Gate 1 (Q_d1) | 29.50 m³/s |
| Gate 2: Huaihexidi Gate | Design front water level of Gate 2 (L_d-g2f) | 11.45 m |
| | Design rear water level of Gate 2 (L_d-g2r) | 11.40 m |
| | Design discharge of Gate 2 (Q_d2) | 29.00 m³/s |
| Gate 3: Huaihe Gate | Design front water level of Gate 3 (L_d-g3f) | 11.39 m |
| | Design rear water level of Gate 3 (L_d-g3r) | 10.68 m |
| | Design discharge of Gate 3 (Q_d3) | 29.00 m³/s |
| Gate 4: Huaihezhongdi Gate | Design front water level of Gate 4 (L_d-g4f) | 10.56 m |
| | Design rear water level of Gate 4 (L_d-g4r) | 10.50 m |
| | Design discharge of Gate 4 (Q_d4) | 29.00 m³/s |

The Wangnou Pump Station is located 7 km west of Changyi City and 250 m south of the Yanzhao Highway. The main water pumps of the Wangnou Pump Station comprise six 1600HD-9.5 (1.6HL-50A) guide vane mixed-flow pumps. According to the total lift loss calculation of the inlet and outlet water pool, the operating condition of one of the six water pumps is shown in Figure 4.
Figure 5

Construction of the HEC-RAS model: (a) river network; (b) cross section; (c) boundary condition; and (d) hydraulic structure.


Relevant data include geospatial data (river network, cross sections, geospatial data of hydraulic structures), flow data (unsteady flow discharge), hydraulic data (Manning coefficient, canal roughness, slope gradient), boundary conditions (upstream, downstream, and internal boundary conditions), and the information presented in Figure 4. These data were provided by the Shandong Province Bureau of the Jiaodong Water Diversion. Owing to the good maintenance of the Jiaodong WTP, the hydraulic parameters were selected based on the design values.

Construction of the hydraulic model

The HEC-RAS model is constructed through the following steps: (a) create a new engineering project; (b) import geographical topological data from ArcGIS; (c) set up the hydraulic structures involved; (d) define flow data and boundary conditions; (e) set nonconstant flow operation plans; (f) conduct hydraulic calculations using the software; and (g) perform model operation and processing analysis. Figure 5 shows the construction process of the HEC-RAS model.
Figure 6

Sensitivity analysis of parameters to objective function.


Construction of the DDPG model

  • (1) Environment construction: The main function of the environment is to connect the DDPG model with hydraulic models, thereby providing a basic environment for the DDPG model. At a given moment, in DDPG, the agent selects an action (gate opening) based on state (hydraulic information) and control strategies. The action and state are then input into the environment module (hydraulic model). By calculating the unsteady flow in the hydraulic model, the state is updated at the next time step and compared with the target value. DDPG receives feedback in the form of a reward function, providing a directional judgment of the updating effect until the end condition is reached. The construction of the environmental module included the following steps.

    • (1) Reading of HEC-RAS: Python was used to complete the reading of the HEC-RAS files, including geometric topology, hydraulic boundary conditions, and unsteady flow operating condition information in the hydraulic model, and to provide hydraulic information in the current state to the DDPG model.

    • (2) Modification of the gate opening: The gate opening was extracted and modified using functions built in Python.

    • (3) Operation of the hydraulic model: Run the modified unsteady flow operation file of HEC-RAS using the Python function.

    • (4) Output of result: Read the result file of the HEC-RAS, obtain hydraulic information, such as the water level and discharge at the next time step, calculate the reward according to the reward function, and then provide feedback to the DDPG model.

    • (5) Preservation of hydraulic model: Save the hydraulic model after running.
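Steps (1)–(5) can be sketched as a gym-style environment wrapper. The underscore-prefixed helper methods below (`_write_gate_openings`, `_run_unsteady`, `_read_state`) are hypothetical placeholders standing in for the HEC-RAS file operations, not actual HEC-RAS API calls, and the level response is faked for illustration:

```python
class CanalEnv:
    """Gym-style wrapper linking the DDPG agent to the hydraulic model."""

    def __init__(self, target_level, k1=1.0, k2=1.0, c=2):
        self.target_level = target_level  # target pump outlet pool level L*
        self.k1, self.k2, self.c = k1, k2, c

    def _write_gate_openings(self, action):
        self.gate_openings = action        # step (2): modify the gate openings

    def _run_unsteady(self):
        pass                               # step (3): run the unsteady-flow model

    def _read_state(self):
        # step (4): read water level/discharge from the result file.
        # Here a fake level response proportional to the mean opening is used.
        mean_open = sum(self.gate_openings) / len(self.gate_openings)
        return {"level": self.target_level + 0.5 - mean_open}

    def step(self, action):
        self._write_gate_openings(action)
        self._run_unsteady()
        state = self._read_state()
        # Reward shaped as f = k1 - k2*|Lp - L*|^C (constraint penalties omitted)
        reward = self.k1 - self.k2 * abs(state["level"] - self.target_level) ** self.c
        return state, reward

env = CanalEnv(target_level=11.5)
state, reward = env.step([0.5, 0.5, 0.5, 0.5])
print(round(reward, 2))  # 1.0: the toy level lands exactly on target
```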

  • (2) Construction of agent: An agent comprises a set of strategies and actions, where the strategy represents the current estimate of the action distribution. This study employs a noise strategy for action selection to strike a balance between exploration and exploitation.

  • (3) Establishment of state function: In DDPG, the state function can represent the various types of information available to the agent. In this study, the water level and discharge at the control points and the pump lift at the current time were taken as the state. By setting a fixed time step for the state transition in HEC-RAS, the state information of the previous moment can be transmitted to the next moment.

  • (4) Establishment of reward function: The optimization goal is to minimize the energy consumption of the pump station. By regulating the cascade gates, the water level of the pump outlet pool Lp can be controlled to reach the target water level L*, thereby achieving the high-efficiency lift H* of the pump station (calculated through Figure 4); namely, min{|Lp − L*|}. As a result, the reward function is defined as f = k1 − k2·|Lp − L*|^C (k1, k2, and C are constants). In addition, if the parameters exceed the constraint conditions, a large penalty value is assigned to the reward function. Three constraints are included: the water level, lift, and discharge constraints. For the water level constraint, the water level of the pump outlet pool Lp must not exceed its maximum and minimum values, Lm−p < Lp < LM−p; the water level of gate i, Li (i = 1, 2, 3, 4), must not exceed its design value, Li < Ld−gi; and the rate of water level decline must not exceed 0.30 m/day. The lift constraint specifies that the lift of the pump station must not exceed the maximum and minimum values, Hm−p < Hp < HM−p. Finally, the discharge constraint requires that the discharge of gate i must not exceed its design value, Qi < Qdi. Considering the Compilation of Construction and Management Documents of the Yellow River Diversion Project in Jiaodong Area and programming experience, the reward function f was set as Equation (10):
    f = \begin{cases} k_1 - k_2\,|L_p - L^*|^{C}, & \text{if all constraints are satisfied} \\ \text{large negative penalty}, & \text{otherwise} \end{cases}
    (10)
    where C is the power of the reward function. When C = 1, the reward function is linear; when C = 2, it is quadratic.
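This reward shaping can be sketched in a few lines; the constants below are illustrative placeholders for the calibrated k1, k2, C, and penalty values, and the boolean `constraints_ok` stands in for the water level, lift, and discharge checks:

```python
def reward(Lp, L_target, constraints_ok, k1=1.0, k2=1.0, C=2, penalty=-100.0):
    """Reward of Equation (10): f = k1 - k2*|Lp - L*|^C, with a large
    negative penalty when any constraint is violated.
    All constant values here are illustrative placeholders."""
    if not constraints_ok:
        return penalty
    return k1 - k2 * abs(Lp - L_target) ** C

print(reward(11.5, 11.5, True))   # 1.0  (on target)
print(reward(11.0, 11.5, True))   # 0.75 (quadratic falloff with deviation)
print(reward(13.0, 11.5, False))  # -100.0 (constraint violated)
```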
  • (5) Model setting: For model setting in DDPG, the architecture of the neural network is related to its ability to fit complex relationships, convergence, and training time (Kapanova et al. 2018). In this study, the network architectures of the actor and critic were set as shown in Table 3. The optimizer used in the training process was an efficient Adam optimizer (Zhang 2018).

Table 3

The network architecture of the Actor and the Critic in DDPG

Component of architecture | Actor | Critic
Input layer | States | States, action
Size of first hidden layer | 256 | 256
Activation function of the first layer | ReLU | ReLU
Output processing of the first layer | Gaussian noise | –
Size of second hidden layer | 256 | 256
Activation function of the second layer | ReLU | ReLU
Output processing of the second layer | Gaussian noise | –
Output layer | Action | Q value
Dimension of the output layer | |
Activation function of the output layer | Tanh | ReLU
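The actor column of Table 3 can be sketched as a plain NumPy forward pass: two 256-unit ReLU hidden layers with Gaussian noise applied to their outputs, and a Tanh output layer. The weight initialization, state dimension (here 6), and action dimension (here 4 gates) are illustrative assumptions, not the paper's trained network.

```python
import numpy as np

# Hedged NumPy sketch of the actor network described in Table 3.
rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

STATE_DIM, ACTION_DIM = 6, 4          # assumed: levels/discharges -> 4 gate actions
W1, b1 = make_layer(STATE_DIM, 256)
W2, b2 = make_layer(256, 256)
W3, b3 = make_layer(256, ACTION_DIM)

def actor(state, noise_std=0.0):
    h = np.maximum(0.0, state @ W1 + b1)       # first hidden layer, ReLU
    h += rng.normal(0, noise_std, h.shape)     # Gaussian noise (Table 3)
    h = np.maximum(0.0, h @ W2 + b2)           # second hidden layer, ReLU
    h += rng.normal(0, noise_std, h.shape)     # Gaussian noise (Table 3)
    return np.tanh(h @ W3 + b3)                # Tanh output: actions in [-1, 1]

action = actor(np.ones(STATE_DIM))
```

The Tanh output bounds each action component to [−1, 1], which must then be scaled to physical gate openings.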

Suitable hyperparameter settings foster efficient exploration in complex environments and expedite the identification of optimal policies. The main hyperparameters include the learning rate, the update factor of the soft transition τ, the noise value, the capacity of the experience pool D, and the discount factor γ. The Sobol method was used to analyze the sensitivity of each parameter to the objective function, as described in Zhang et al. (2015).
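First-order Sobol indices can be estimated with the Saltelli sampling scheme, as used here to rank hyperparameter sensitivity. The sketch below uses a toy linear objective as an illustrative stand-in for the DDPG reward; it is not the paper's model or exact analysis setup.

```python
import numpy as np

# Hedged sketch: first-order Sobol indices via the Saltelli estimator,
# S_i = E[f(B) * (f(AB_i) - f(A))] / Var(Y), with a toy objective.
def sobol_first_order(f, n_params, n_samples=20_000, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n_samples, n_params))
    B = rng.uniform(size=(n_samples, n_params))
    yA, yB = f(A), f(B)
    var = np.var(np.concatenate([yA, yB]))
    S = np.empty(n_params)
    for i in range(n_params):
        AB = A.copy()
        AB[:, i] = B[:, i]                        # replace column i with B's
        S[i] = np.mean(yB * (f(AB) - yA)) / var   # Saltelli estimator
    return S

# Toy objective: parameter 0 dominates (true S = [0.8, 0.2] for this function).
toy = lambda X: 4.0 * X[:, 0] + 2.0 * X[:, 1]
S = sobol_first_order(toy, n_params=2)
```

The index S_i is the fraction of output variance attributable to parameter i alone, which is what Figure 6 reports for each hyperparameter.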

  • (6) Evaluation of results: To evaluate the effectiveness of the DDPG model, it was assessed using six indicators: the degree of reward improvement (I1, %), the degree of efficiency improvement of the pump station (I2, %), the rate of pump lift within the high efficiency interval (I3, %), the rate of pump lift between the maximum and minimum values (I4, %), the rate of gate water levels not exceeding the design value (I5, %), and the rate of gate discharges not exceeding the design value (I6, %), which are defined as Equation (11):
    formula
    (11)
    where P1 and P2 are the operating efficiencies of the Wangnou Pump Station without OR and after OR, respectively; NL is the number of data points within the high efficiency lift interval; Nl is the number of data points within the maximum and minimum lift values; Nh is the number of data points not exceeding the design water level of the gates; Nd is the number of data points not exceeding the design discharge of the gates; and N is the total number of data points.
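The count-based indicators I3–I6 of Equation (11) are simple ratios of compliant data points to the total N. The sketch below illustrates one plausible reading; the exact definition of I2 (here taken as the absolute efficiency difference P2 − P1) and the helper names are assumptions.

```python
# Hedged sketch of the evaluation indicators I2-I6 in Equation (11).
def evaluate(P1, P2, lifts, levels, discharges,
             lift_hi_interval, lift_minmax, level_design, q_design):
    N = len(lifts)
    I2 = P2 - P1                                   # assumed: absolute efficiency gain, %
    I3 = sum(lift_hi_interval[0] <= h <= lift_hi_interval[1]
             for h in lifts) / N * 100             # NL / N: within high efficiency interval
    I4 = sum(lift_minmax[0] <= h <= lift_minmax[1]
             for h in lifts) / N * 100             # Nl / N: within min/max lift
    I5 = sum(l <= level_design for l in levels) / N * 100      # Nh / N
    I6 = sum(q <= q_design for q in discharges) / N * 100      # Nd / N
    return I2, I3, I4, I5, I6
```

For instance, if 3 of 4 recorded lifts fall in the high efficiency interval, I3 = 75%.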

Results

The selection of control time intervals for the OR requires a comprehensive consideration of the lag time of the channel and actual control needs. According to the Compilation of Construction and Management Documents of Yellow River Diversion Project in Jiaodong Area, the control time interval was set to 2 h.
  • (1) Model setting: The good performance of DDPG depends on a rational architecture, hyperparameter settings, and the choice of reward function. The main settings include the learning rate, the update factor of the soft transition τ, the noise value, the capacity of the experience pool D, the discount factor γ, and the power of the reward function C. However, these hyperparameters lack a uniform standard in DDPG, and their values are chosen based on experience and experimentation (Liessner et al. 2019). The Sobol method was used to analyze the sensitivity of each model setting to the objective function, with the number of samples set to 500. The results are presented in Figure 6.

Figure 7

The impact of different model settings. (a)–(d) present scenario groups A–D, respectively.
As shown in Figure 6, the noise value and learning rate have the greatest impact on model effectiveness, γ and C have a moderate impact, and τ and D have a limited impact. We therefore selected the important hyperparameters in DDPG and conducted a scenario analysis to identify the optimal settings (Table 4 and Figure 7). The capacity of the experience pool was set to 10,000 and τ was set to 0.01.
Table 4

Model settings in DDPG under different scenarios

Scenario group | ID | Settings
A: Learning rate | A1 | LR of actor network = 0.001, LR of critic network = 0.001
 | A2 | LR of actor network = 0.001, LR of critic network = 0.0001
 | A3 | LR of actor network = 0.0001, LR of critic network = 0.001
 | A4 | LR of actor network = 0.0001, LR of critic network = 0.0001
B: Noise value | B1 | Initial noise value = 0.1, final noise value = 0.001, descent factor = 0.999
 | B2 | Initial noise value = 0.1, final noise value = 0.001, descent factor = 0.99
 | B3 | Initial noise value = 0.5, final noise value = 0.001, descent factor = 0.999
 | B4 | Initial noise value = 0.5, final noise value = 0.001, descent factor = 0.99
C: γ | C1 | γ = 0.98
 | C2 | γ = 0.95
 | C3 | γ = 0.90
D: Reward function | D1 | Linear reward function, C = 1
 | D2 | Quadratic reward function, C = 2
Figure 8

Typical study case: (a) discharge of the Wangnou pump station; (b) water level of pump inlet pool; (c) low-energy consumption head; and (d) low-energy consumption head interval.
As shown in Figure 7, the optimal settings are A4, B1, C1, and D2. The results indicate that the noise value is an important parameter for the model results, and its uncertainty should be carefully considered during the establishment of the DDPG model.
  • (2) The OR scheme: A study case, 2021/02/03 0:00–02/05 0:00, was selected as a typical example, as shown in Figure 8. The Wangnou Pump Station is located at the upper boundary of the typical canal. The discharge of the Wangnou Pump Station is shown in Figure 8(a). Four water pumps were operated at the Wangnou Pump Station during the study period, so the single-station discharge was obtained by dividing the data in Figure 8(a) by four. The water level of the pump inlet pool (Figure 8(b)) was determined under these specific inflow conditions. The pump lift equals the difference between the water levels of the inlet and outlet pools. By implementing cascade gates regulation to control the hydraulic process, the optimal water level of the pump outlet pool L* can be achieved, and thus the high efficiency lift of the pump station can be attained. The high efficiency lift was calculated by inputting the single-station discharge into Figure 4, as shown in Figure 8(c). According to the Compilation of Construction and Management Documents of Yellow River Diversion Project in Jiaodong Area, the high efficiency lift interval was ±0.25 m around L*, as shown in Figure 8(d).

Figure 9

Regulation results of a typical study case. (a) Reward of a typical case; (b) water level before and after regulation; (c) regulation scheme of Gate 1 to Gate 4; and (d) efficiency of the pump station.
According to prior knowledge obtained from HEC-RAS, the initial gate opening was set to 2, and subsequent gate openings were adjusted relative to this value. The regulation results are shown in Figure 9.
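Because the actor's Tanh output lies in [−1, 1], it must be mapped to physical gate openings around the prior-knowledge baseline of 2. The adjustment range delta and the opening bounds below are illustrative assumptions; the paper does not state its exact scaling.

```python
# Hedged sketch: scale actor actions in [-1, 1] to gate openings adjusted
# around the HEC-RAS prior-knowledge baseline of 2. delta and the opening
# bounds are illustrative assumptions.
def action_to_openings(actions, baseline=2.0, delta=1.0,
                       min_open=0.0, max_open=4.0):
    openings = []
    for a in actions:                 # one action per gate
        o = baseline + a * delta      # adjust relative to the baseline
        openings.append(min(max(o, min_open), max_open))
    return openings
```

A zero action leaves a gate at the baseline opening of 2, which matches the observation that Gates 2–4 stayed near 2 while Gate 1 departed from it.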
Figure 10

Regulation results of different operating conditions. (a)–(g) represent rewards of Cases 1–7, respectively; (h)–(n) represent regulation results of Cases 1–7, respectively.

Figure 9 shows that the DDPG model converged well (Figure 9(a)); the water level of the pump outlet pool could be precisely controlled within the optimal interval (Figure 9(b)); and the efficiency of the pump station improved by 4.28% after OR (Figure 9(d)). The opening of Gate 1 varied significantly from the initial value of 2, whereas the openings of Gates 2–4 remained approximately 2. This may be because Gate 1 is the closest to the pump station, so the hydraulic response of the Wangnou Pump Station to Gate 1 was faster and stronger.

  • (3) Robustness analysis: Multiple operating conditions with different characteristics were selected to verify the robustness of the proposed model. The evaluation of different operating conditions is listed in Table 5. The regulation results are shown in Figure 10.

Table 5

Evaluation of different operating conditions

Case | Time | Note | I1 first 50 episodes (%) | I1 after 50 episodes (%) | I2 (%) | I3 (%) | I4 (%) | I5 (%) | I6 (%)
Case 1 | 2022/03/19 0:00–03/20 0:00 | High discharge, low single-pump discharge | 137.98 | 12.89 | 6.02 | 100 | 99.17 | 100 | 70.25
Case 2 | 2021/12/11 0:00–12/12 0:00 | Low discharge, large fluctuations in control target | 242.54 | 29.90 | 4.15 | 99.17 | 100 | 100 | 100
Case 3 | 2023/03/17 0:00–03/18 0:00 | Large discharge variance | 204.63 | 23.55 | 5.43 | 100 | 100 | 100 | 100
Case 4 | 2022/06/04 0:00–06/05 0:00 | Small discharge variance | 181.09 | 29.17 | 5.74 | 100 | 100 | 100 | 100
Case 5 | 2021/05/27 0:00–05/28 0:00 | High single-pump discharge | 234.26 | 22.10 | 4.12 | 100 | 100 | 100 | 100
Case 6 | 2021/03/31 0:00–04/01 0:00 | Low single-pump discharge | 139.29 | 44.01 | 5.20 | 100 | 98.33 | 99.17 | 100
Case 7 | 2021/03/13 0:00–03/14 0:00 | Large fluctuations in control target | 72.28 | 19.75 | 5.34 | 97.50 | 100 | 100 | 100

Table 5 indicates that the reward function converges after approximately 50 episodes. After regulation, the efficiency of the pumping station improved by 4.12–6.02% (I2), effectively reducing the energy consumption of the Wangnou Pump Station. The lift of the pump station could be precisely controlled within the high efficiency lift interval (I3 ≥ 97.50%), the pump lift could be kept between its design minimum and maximum values (I4 ≥ 98.33%), and the water level of the gates could be kept under the design value (I5 ≥ 99.17%). I3 of Case 2 and Case 7 did not reach 100%, probably because their control targets fluctuated significantly, making it difficult to precisely control the level within the fluctuating interval. I4 of Cases 1 and 6 and I5 of Case 6 did not reach 100%, probably because the single-pump discharges of Cases 1 and 6 were small: according to the discharge-lift-efficiency curve of one water pump (Figure 4), the corresponding high-efficiency lift was high, resulting in a higher corresponding water level of the pump outlet pool and thus a greater likelihood of exceeding the upper bounds. I6 of Case 1 was relatively low because Case 1 had a high discharge that exceeded the design value. For floods exceeding the design standard, the safety requirements may still be difficult to meet even after regulation; other measures should therefore be adopted to address flooding, such as regulation of reservoirs and flood detention areas or sponge-city measures.

In general, the constructed DDPG model exhibited good performance under different operating conditions and was distinctly robust, which is attributed to its deep learning architecture, the soft parameter updates across its four networks, and the experience pool technique. These methods enabled the model to capture commonalities between different conditions during training, resulting in improved adaptability and strong generalization when encountering new conditions.

Discussion

  • (1) Tips to facilitate convergence: Convergence can be facilitated through several strategies. a) Generating a targeted experience pool: To obtain the best performance from the neural network, targeted training samples must be generated (Mudunuru et al. 2022). Based on the measured hydraulic data, positive and negative sampling can be used simultaneously when constructing the experience pool during training, which avoids the 'curse of dimensionality' and overfitting and facilitates convergence and value-function estimation. A mediocre programming scenario is initializing the buffer with all zeros, or leaving it empty, rather than filling it with positive and negative samples based on prior knowledge; DDPG can still learn a good strategy in this way, but requires much more running time. b) Embedding prior knowledge into the neural network: Incorporating prior knowledge into a neural network can also reduce overfitting and accelerate convergence (Raissi et al. 2019). For instance, if the regulation objective is to raise the water level at a specific river cross section, the opening degrees of gates located downstream tend to decrease, though the magnitude of the decline is unclear. This experience serves as crucial prior knowledge for the actor network: we can give the actor a higher prior probability of taking reduced values. c) Baseline of the reward function: In this study, the reward function f = positive baseline − |agent's state − its control target|. When the control effect was ideal, the reward was positive; otherwise, it was negative. The convergence speed is higher with a positive baseline (Wang et al. 2005). d) Noise value setting: The noise value decreases as training proceeds, leading to less exploration and more exploitation, so the agent must acquire useful knowledge before the noise decays to a relatively small value. Consequently, a suitable noise setting should take the size of the experience pool, the learning rate, and the other hyperparameters into consideration. After the activation function maps the actor's output to [−1, 1] or [0, 1], adding noise is likely to push it outside this interval, causing the actor to frequently take extreme values; as a solution, the noise decline speed can be adjusted, the network structure optimized, and regularization employed. e) Information dimension and the network's tensor structure: When the proposed method was applied to unsteady flow, it was necessary to add discharge information to the input layer of the neural network, which may require a redesign of the network's tensor structure. Based on programming experience, DDPG can still learn a good strategy without discharge information, but requires much more running time.
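Tip (d) can be sketched as an exponentially decaying noise schedule with a floor, matching the scenario-B settings in Table 4 (initial value, final value, descent factor). The generator structure is an illustrative assumption.

```python
# Hedged sketch of tip (d): exponentially decaying exploration noise with a
# floor, mirroring the scenario-B settings (initial value 0.5, final value
# 0.001, descent factor 0.999).
def noise_schedule(initial=0.5, final=0.001, descent=0.999):
    sigma = initial
    while True:
        yield sigma
        sigma = max(final, sigma * descent)   # decay, but never below the floor

sched = noise_schedule()
first = next(sched)   # 0.5 in the first episode
```

Because the floor keeps some residual exploration, the actor can still escape mildly suboptimal gate schedules late in training without destabilizing the learned policy.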

  • (2) Limitations: Certain limitations are evident in the scope of the study. Because of the continuous water supply process of the Wangnou Pump Station, the on/off states of the pump units were not considered in the current model. In addition, the number of gate adjustments also affects energy consumption (Sun et al. 2023). The control time interval was set to 2 h according to practical conditions; a larger time interval may compromise the control effect, whereas a smaller time interval may incur more adjustments and more energy consumption. Subsequent research could incorporate minimizing the number of gate adjustments into the optimization objectives to further reduce the regulation energy consumption.
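The suggested extension of penalizing gate adjustments could take the form of an extra reward term, as sketched below. The penalty weight k3 and the movement tolerance are hypothetical values introduced purely for illustration; the paper does not define this term.

```python
# Hedged sketch of the suggested future extension: augment the reward with
# a penalty on the number of gate adjustments so that frequent gate movements
# are discouraged. k3 and tolerance are hypothetical assumptions.
def reward_with_adjustment_penalty(base_reward, prev_openings, new_openings,
                                   k3=0.05, tolerance=1e-3):
    # Count gates whose opening actually changed beyond a small tolerance
    n_adjust = sum(abs(n - p) > tolerance
                   for n, p in zip(new_openings, prev_openings))
    return base_reward - k3 * n_adjust
```

Leaving all gates unchanged incurs no penalty, so the agent trades control precision against actuation effort.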

  • (3) Suggestions for future applications: As RL learns strategies through interactions between the agent and its environment, agents can improve strategies in dynamic and uncertain environments, resulting in strong applicability. RL can be used to achieve high-dimensional multivariable collaborative control. The regulatory objective of this study is to reduce energy consumption, but the RL algorithm is suitable for various regulatory objectives. In future applications, when minimum water shortage, highest improvement in water quality, flood control water level exceedance, or ecological water level satisfaction are included in the objective function, water resources regulation, water quality-quantity regulation, flood control regulation, or ecological regulation can be achieved. Furthermore, instead of requiring a large amount of pre-annotated training data, the RL algorithm continuously learns through interactions with its environment, making it applicable to open canals with insufficient data. In addition, the convergence of the DDPG algorithm is sensitive to the parameter settings, yet there is currently no systematic, authoritative model setting method applicable across research areas; we suggest obtaining the optimal hyperparameter settings through experimental comparison. This article identifies suitable model parameter groups and objective-function parameter groups through experiments, hoping to provide setting experience for related research. The data used in this study were limited to the Jiaodong WTP in China, but the model can be expected to work under different scenarios as well.

This study takes a typical open-canal section of the Jiaodong WTP as an example and uses RL to conduct research on OR in the OCWTP. By employing cascade gates regulation to achieve canal hydraulic control, the high efficiency operating lift of the pump can be attained, thereby reducing the energy consumption required for water transportation. The purpose of this study is to achieve a low-energy redistribution of water resources.

The unique contribution of this study is the selection of a suitable method for hydraulic control after fully considering the complex characteristics of the hydraulic process and optimizing the model parameters. Owing to the nonlinear, large-hysteresis, strongly coupled, high-dimensional, and time-varying nature of the hydraulic process, real-time regulation in an OCWTP is exceptionally challenging. Existing research on hydraulic control falls short of extracting useful information from high-dimensional (discharge, water level, lift, gate combinations, etc.) but potentially scarce data (hydraulic data may not be measured or may be difficult to collect in some cases). The proposed model, which combines the RL algorithm and the HEC-RAS software, overcomes the limitations of previous control methods. RL, with self-learning and adaptive capabilities that effectively handle nonlinear and high-dimensional conditions and require no labels, achieves high control accuracy even in basins with limited data. To the best of our knowledge, this is the first instance in which it has been explicitly adopted for low-energy consumption automatic regulation of hydraulic structures in an OCWTP. The results demonstrate a good coupling effect between the RL and hydraulic models. The OR can precisely control the hydraulic process to achieve the high efficiency pump lift interval, improving the efficiency of the pump station by 4.12–6.02% compared to previous operations. Moreover, the noise value and learning rate are important parameters for the model results, and their uncertainties should be carefully considered during the establishment of the OR model. Using well-designed hyperparameters, OR proved to be robust under different uncertainties in the model parameters.

The concepts, initial results, and formulations provided in this study should help build a foundation to support RL as a viable option for hydraulic control.

We wish to thank the Shandong province Jiaodong Water Transfer Bureau for providing the required data. We also thank the reviewers and editors for their insightful comments and suggestions that improved the clarity of the paper.

H.L., Z. and T.G. conceptualized the study; T.G. developed the methodology and validated the study; H.D. and J.Y., H. performed the formal analysis; T.G. carried out the investigation; Y.Z., J. and H.L., Z. collected the resources; T.G. wrote the draft; H.L., Z. and T.G. performed the visualization; Y.Z., J. and H.L., Z. supervised the study; Y.Z., J. and H.L., Z. administered the project; Y.Z., J. and H.L., Z. acquired funding. All authors have read and agreed to the published version of the manuscript.

This work was supported by the National Natural Science Foundation of China (Grant No. 52130907) and the Shandong Province Water Diversion Project Operation and Maintenance Center Cooperation Project (SDGP37000000202102002416).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alibabaei K., Gaspar P. D., Assunção E., Alirezazadeh S., Lima T. M., Soares V. N. G. J. & Caldeira J. M. L. P. 2022 Comparison of on-policy deep reinforcement learning A2C with off-policy DQN in irrigation optimization: A case study at a site in Portugal. Computers 11, 104. https://doi.org/10.3390/computers11070104.

Amasyali K., Munk J., Kurte K., Kuruganti T. & Zandi H. 2021 Deep reinforcement learning for autonomous water heater control. Buildings 11, 548. https://doi.org/10.3390/buildings11110548.

Arauz T., Maestre J. M., Tian X. & Guan G. 2020 Design of PI controllers for irrigation canals based on linear matrix inequalities. Water 12, 855. https://doi.org/10.3390/w12030855.

Aydin B. E., Oude Essink G. H. P. O., Delsman J. R., van de Giesen N. & Abraham E. 2022 Nonlinear model predictive control of salinity and water level in polder networks: Case study of Lissertocht catchment. Agric. Water Manage. 264, 107502. https://doi.org/10.1016/j.agwat.2022.107502.

Benra F., De Frutos A., Gaglio M., Álvarez-Garretón C., Felipe-Lucia M. & Bonn A. 2021 Mapping water ecosystem services: Evaluating InVEST model predictions in data scarce regions. Environ. Modell. Software 138, 104982. https://doi.org/10.1016/j.envsoft.2021.104982.

Bonny T., Kashkash M. & Ahmed F. 2022 An efficient deep reinforcement machine learning-based control reverse osmosis system for water desalination. Desalination 522, 115443. https://doi.org/10.1016/j.desal.2021.115443.

Boubacar Kirgni H. B. & Wang J. 2023 LQR-based adaptive TSMC for nuclear reactor in load following operation. Prog. Nucl. Energy 156, 104560. https://doi.org/10.1016/j.pnucene.2022.104560.

Bowes B. D., Wang C., Ercan M. B., Culver T. B., Beling P. A. & Goodall J. L. 2022 Reinforcement learning-based real-time control of coastal urban stormwater systems to mitigate flooding and improve water quality. Environ. Sci. Water Res. Technol. 8, 2065–2086. https://doi.org/10.1039/D1EW00582K.

Castelletti A., Galelli S., Restelli M. & Soncini-Sessa R. 2010 Tree-based reinforcement learning for optimal water reservoir operation. Water Resour. Res. 46, W09507. https://doi.org/10.1029/2009WR008898.

Castelletti A., Pianosi F. & Restelli M. 2013 A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run. Water Resour. Res. 49, 3476–3486. https://doi.org/10.1002/wrcr.20295.

Castelletti A., Yajima H., Giuliani M., Soncini-Sessa R. & Weber E. 2014 Planning the optimal operation of a multioutlet water reservoir with water quality and quantity targets. J. Water Resour. Plann. Manage. 140, 496–510.

Chen X. & Asok Ray D. 2022 Reinforcement learning control of a boiling water reactor. IEEE Trans. Nucl. Sci. 69, 1820–1832.

Chen K. H., Wang H. C., Valverde-Pérez B., Zhai S. Y., Vezzaro L. & Wang A. J. 2021a Optimal control towards sustainable wastewater treatment plants based on multi-agent reinforcement learning. Chemosphere 279, 130498. https://doi.org/10.1016/j.chemosphere.2021.130498.

Chen W.-H., Shang C., Zhu S., Haldeman K., Santiago M., Stroock A. D. & You F. 2021b Data-driven robust model predictive control framework for stem water potential regulation and irrigation in water management. Control Eng. Pract. 113, 104841. https://doi.org/10.1016/j.conengprac.2021.104841.

Ding Y., Wang L., Li Y. & Li D. L. 2018 Model predictive control and its application in agriculture: A review. Comput. Electron. Agric. 151, 104–117. https://doi.org/10.1016/j.compag.2018.06.004.

Gan T. 2022 Application Research on Flood Regulation Based on Data Mining in the Upper Reaches of Gongjia Sluice of Tuhai River. Shandong University, Jinan City.

Halil I. B. 2022 Comparison of different ANN (FFBP, GRNN, RBF) algorithms and Multiple Linear Regression for daily streamflow prediction in Kocasu River, Turkey. Fresenius Environmental Bulletin 31 (5), 4699–4708.

Horváth K., van Esch B., Vreeken T., Piovesan T., Talsma J. & Pothof I. 2022 Potential of model predictive control of a polder water system including pumps, weirs and gates. J. Process Control 119, 128–140. https://doi.org/10.1016/j.jprocont.2022.10.003.

Hu S., Gao J., Zhong D., Wu R. & Liu L. 2023 Real-time scheduling of pumps in water distribution systems based on exploration-enhanced deep reinforcement learning. Systems 11, 56. https://doi.org/10.3390/systems11020056.

Ion N., Stoican F., Clipici D., Patrascu A. & Hovd M. 2014 A linear MPC algorithm for embedded systems with computational complexity guarantees. In: Proceedings of the 18th International Conference on System Theory, Control and Computing, Sinaia, Romania, pp. 17–19.

Jiang Q., Li J., Sun Y., Huang J., Zou R., Wenjing M., Guo H., Wang Z. & Liu Y. 2024 Deep-reinforcement-learning-based water diversion strategy. Environ. Sci. Ecotechnol. 17, 100298.

Jordehi A. R. 2014 Particle swarm optimisation for dynamic optimisation problems: A review. Neural Comput. Appl. 25 (7–8), 1507–1516.

Kapanova K. G., Dimov I. & Sellier J. M. 2018 A genetic approach to automatic neural network architecture optimization. Neural Comput. Appl. 29, 1481–1492.

Kong L. Z., Lei X. H., Wang M. N., Shang Y. Z., Quan J. & Wang H. 2019 A regulation algorithm for automatic control of canal systems under emergency conditions. Irrig. Drain. 68, 646–656. https://doi.org/10.1002/ird.2353.

Kong L. Z., Lei X. H., Wang H., Long Y., Lu L. B. & Yang Q. 2022 A model predictive water-level difference control method for automatic control of irrigation canals. Water 11, 762. https://doi.org/10.3390/w11040762.

Kong L., Li Y., Tang H., Yuan S., Yang Q., Ji Q., Li Z. & Chen R. 2023 Predictive control for the operation of cascade pumping stations in water supply canal systems considering energy consumption and costs. Appl. Energy 341, 121103. https://doi.org/10.1016/j.apenergy.2023.121103.

Lee J. H. & Labadie J. W. 2007 Stochastic optimization of multireservoir systems via reinforcement learning. Water Resour. Res. 43, W11408. https://doi.org/10.1029/2006WR005627.

Li P. Y., Yang H., He W., Yang L. Z., Hao N., Sun P. X. & Li Y. 2022 Optimal water resources allocation in the Yinma River Basin in Jilin Province, China, using fuzzy programming. Water 14, 2119. https://doi.org/10.3390/w14132119.

Liessner R., Schmitt J., Dietermann A. & Bäker B. 2019 Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: 11th International Conference on Agents and Artificial Intelligence (ICAART). SciTePress, Prague, Czech Republic, pp. 134–144. https://doi.org/10.5220/0007364701340144.

Litrico X., Malaterre P.-O., Baume J.-P., Vion P.-Y. & Ribot-Bruno J. 2007 Automatic tuning of PI controllers for an irrigation canal pool. J. Irrig. Drain. Eng. 133, 27–37. https://doi.org/10.1061/(ASCE)0733-9437(2007)133:1(27).

Liu J., Wang Z., Yang Z. & Zhang T. 2023 An adaptive predictive control algorithm for comprehensive dendritic canal systems. J. Irrig. Drain. Eng. 149, 04022046. https://doi.org/10.1061/(ASCE)IR.1943-4774.0001736.

Lu L., Zheng H., Jie J., Zhang M. & Dai R. 2021 Reinforcement learning-based particle swarm optimization for sewage treatment control. Complex Intell. Syst. 7, 2199–2210. https://doi.org/10.1007/s40747-021-00395-w.

Mudunuru M. K., Cromwell E. L. D., Wang H. & Chen X. 2022 Deep learning to estimate permeability using geophysical data. Adv. Water Resour. 167, 104272. https://doi.org/10.1016/j.advwatres.2022.104272.

Raissi M., Perdikaris P. & Karniadakis G. E. 2019 Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707. https://doi.org/10.1016/j.jcp.2018.10.045.

Rodriguez L. P., Maestre J. M., Camacho E. F. & Sánchez M. C. 2020 Decentralized ellipsoidal state estimation for linear model predictive control of an irrigation canal. J. Hydroinf. 22, 593–605. https://doi.org/10.2166/hydro.2020.150.

Saliba S. M., Bowes B. D., Adams S., Beling P. A. & Goodall J. L. 2020 Deep reinforcement learning with uncertain data for real-time stormwater system control and flood mitigation. Water 12, 3222. https://doi.org/10.3390/w12113222.

Schuurmans J. 1997 Control of Water Levels in Open-Channels. Delft University of Technology, Delft.

Sun A. J., Hu D., Shan C. J. & Wang J. 2023 Automatic scheduling and control technology of pump gate clusters of regional water conservancy project. Desalin. Water Treat. 293, 82–88. https://doi.org/10.5004/dwt.2023.29421.

van Overloop J., Schuurmans J., Brouwer R. & Burt C. 2005 Multiple-model optimization of proportional integral controllers on canals. J. Irrig. Drain. Eng. 131, 190–196.

Wang X. N., Xu X. & Wu T. 2005 Optimal reward baseline for policy-gradient reinforcement learning. Chin. J. Comput. 6, 1021–1026.

Wang Y., Wang M., Wang D. & Chang Y. 2022 Stochastic configuration network based cascade generalized predictive control of main steam temperature in power plants. Int. J. Inf. Sci. 587, 123–141. https://doi.org/10.1016/j.ins.2021.12.006.

Weyer E. 2008 Control of irrigation channels. IEEE Trans. Control Syst. Technol. 16, 664–675. https://doi.org/10.1109/TCST.2007.912122.

Xu J., Wang H., Rao J. & Wang J. 2021 Zone scheduling optimization of pumps in water distribution networks with deep reinforcement learning and knowledge-assisted learning. Soft Comput. 25, 14757–14767. https://doi.org/10.1007/s00500-021-06177-3.

Yuansheng H., Mengshu S., Weiye W. & Hongyu L. 2021 A two-stage planning and optimization model for water – hydrogen integrated energy system with isolated grid. J. Cleaner Prod. 313, 127889. https://doi.org/10.1016/j.jclepro.2021.127889.

Zhang Z. J. 2018 Improved Adam optimizer for deep neural networks. In: IEEE International Symposium on Quality of Service (IWQoS), pp. 1–2. https://doi.org/10.1109/IWQoS.2018.8624183.

Zhang X. Y., Trame M. N., Lesko L. J. & Schmidt S. 2015 Sobol sensitivity analysis: A tool to guide the development and evaluation of systems pharmacology models. CPT Pharmacometrics Syst. Pharmacol. 4, 69–79. https://doi.org/10.1002/psp4.6.

Zheng Z., Wang Z., Zhao J. & Zheng H. 2019 Constrained model predictive control algorithm for cascaded irrigation canals. J. Irrig. Drain. Eng. 145, 104841. https://doi.org/10.1061/(ASCE)IR.1943-4774.0001390.

Zhong K., Guan G., Mao Z., Liao W., Xiao C. & Su H. 2018 Linear quadratic optimal controller design for constant downstream water-level PI feedback control of open-canal systems. MATEC Web Conf. 246, 01056. https://doi.org/10.1051/matecconf/201824601056.

Zhou P., Wang X. & Chai T. Y. 2022 Multiobjective operation optimization of waste water treatment process based on reinforcement self-learning and knowledge guidance. IEEE Trans. Cybern. 53, 6896–6909.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).