ABSTRACT
The water supply network is a crucial component of urban construction and plays a pivotal role in the normal operation of the city. The complexity and periodicity of pressure variations in the water supply networks pose significant challenges to traditional prediction models. In this study, we introduce a novel short-term pressure prediction model, termed the 1DCNN-GRU-multi-head attention (CGMA) model, which incorporates a multi-head attention mechanism alongside an integrated network consisting of one-dimensional convolutional neural networks (1DCNNs) and gated recurrent units (GRUs). Initially, the model employs a 1DCNN network to extract features from the pressure data. Subsequently, the extracted features are input into the GRU neural network, leveraging its long-term dependency capabilities to improve prediction accuracy. Then, the attention mechanism is incorporated into the GRU component to highlight key information, enabling the model to focus on more important data features, thereby improving the prediction performance. We implemented this model in a real urban water distribution system equipped with five pressure sensors. The model achieved a mean absolute error of 0.00197 and a root mean square error of 0.00262. When compared with alternative approaches, our method demonstrated superior predictive performance for pressure data, thereby confirming its efficacy in practical applications.
HIGHLIGHTS
The model uses the GRU as its central framework and adds further modules to form a 1DCNN-GRU-multi-head attention composite model.
The 1DCNN is responsible for capturing local features, while the attention mechanism focuses on identifying key information within the data.
This integrative approach substantially enhances the model's comprehension of time series data.
INTRODUCTION
The water supply network plays a pivotal role in providing essential water resources to residents, industrial sectors, and public facilities. Within this network, the distribution segment holds a critical responsibility: to convey treated potable water to end-users while maintaining the requisite flow rate, pressure, and quality standards (Ostfeld & Salomons 2004). The water supply network is intrinsically complex, encompassing not only conventional pipelines but also valves, reservoirs, and sophisticated hydraulic monitoring apparatus (Günther et al. 2015). Effective pressure management, crucial for minimizing the risk of pipeline bursts and ensuring consumer satisfaction, relies on strategies such as variable frequency pumps, zonal divisions, and real-time monitoring (Creaco et al. 2015). Accurate short-term pressure predictions enhance management efficiency and adaptability to fluctuating demands, promoting a robust and responsive network (Negharchi & Shafaghat 2022).
In the water supply network, predicting short-term pressure fluctuations is essential for efficient water supply management, impacting resource allocation and the detection of faults within the infrastructure. Due to user activities, pressure fluctuations in the water distribution network can be both long-term and short-term (Marsili et al. 2023; Mazzoni et al. 2024). The challenge intensifies during atypical usage patterns, like extreme weather events or holidays, where predictive model reliability is paramount (Bello et al. 2019). Advanced analytical approaches are necessary to maintain precision and effectiveness in short-term pressure forecasting, contributing significantly to the reliability, safety, and economic efficiency of urban water supply systems (Stańczyk et al. 2022).
Traditional hydraulic models rely on explicit mathematical equations, which often require simplifications and assumptions to make the calculations feasible. However, water supply networks are highly nonlinear and complex, involving numerous interactions and dynamic changes, which poses significant challenges for traditional hydraulic models. Conversely, deep learning models take a data-driven approach, capturing complex nonlinear relationships within the data for learning and computation, without the need for explicit mathematical descriptions. Thanks to advancements in the Internet of Things (IoT), the speed and scope of information collection have significantly improved. Deep learning models can use vast amounts of historical data for training, predicting future outcomes by identifying and learning patterns and trends within the data. This capability makes them particularly effective in handling high-dimensional, large-scale datasets and in extracting valuable information from historical operational data to predict future pressure. Although traditional hydraulic equations are theoretically robust and effective under specific conditions, deep learning offers more flexible and precise tools, particularly when dealing with complex and variable real-world applications.
The traditional approach to analyzing time series data of pipeline states has predominantly been through statistical methods. Sinha & McKim (2007) employed a time-dependent Markov prediction model to forecast pipeline deterioration, facilitating more effective pipeline management. Francis et al. (2014) employed Bayesian belief networks in water supply network rupture prediction, aiming to predict or understand the rate of pipe rupture in a given potable water supply system, even when faced with potentially incomplete or imperfect data. Chen & Guikema (2020) discovered that locally weighted density scans are the most accurate method for identifying areas of high breakage. This application allows statistical prediction models to more accurately prioritize high-risk pipelines, thereby enhancing their performance. Młyński et al. (2021) regard the Colwell indexes as a reliable tool for evaluating the seasonality of water pipeline failure events.
These statistical methods subsequently evolved with the integration of machine learning algorithms, including artificial neural networks (ANNs), support vector machines (SVMs), and random forests, which have been progressively employed to enhance the predictive accuracy of water supply networks. Perea et al. (2019) leveraged genetic algorithms to optimize ANNs, particularly targeting scenarios characterized by limited data sets to refine water demand predictions. Ping et al. (2014) developed an RBF-SVM model, expressly designed to predict actual pressures. Xue et al. (2012) proposed an improved PCA-SVM model, elevating both the accuracy and efficiency of water demand forecasting. Herrera et al. (2010) conducted a comparative analysis of various models, revealing the superior precision of the support vector regression (SVR) model. Mouatadid & Adamowski (2017) introduced the application of an extreme learning machine (ELM) model in water demand forecasting, which uniquely incorporates a range of variables, including water demand, total precipitation, and peak temperature, thereby augmenting its predictive precision. Xia et al. (2021) presented a novel approach combining genetic algorithms with backpropagation (BP) neural networks, aiming at forecasting flow and pressure dynamics in water supply networks. Tiwari & Adamowski (2013) proposed a wavelet–bootstrap–neural network (WBNN) hybrid model that is tailored specifically for short-term urban water demand forecasting, contributing to the sophisticated array of predictive tools in this crucial field.
Due to the complex structure of water supply networks, machine learning methods face challenges when training models with large-scale, intricate data. To address the processing of voluminous and complex data, deep learning has been progressively applied to pressure prediction in water distribution networks. Zhe et al. (2015) developed a nonlinear autoregressive (NARX) network model, which uses both real-time and historical operational data to estimate demands at various nodes and to construct functional correlations among principal variables within the network, allowing it to track and forecast the dynamic fluctuations inherent in water supply systems. Xu et al. (2020) adopted long short-term memory (LSTM) models for their exceptional capacity in nonlinear mapping, which makes them particularly adept at handling time series data such as pressure in pipe networks. Their research introduced a parallel LSTM and deep neural network (PLDNN) model, which combines the respective strengths of the LSTM and DNN architectures and is specifically designed to improve the efficiency of feature extraction from both control and state variables, resulting in enhanced predictive accuracy. Kavya et al. (2023) compared the results of various models in predicting short-term water demand and found that, for multivariate data, LSTM yields the best predictions. Liao et al. (2022) proposed a pressure prediction method based on a spatiotemporal neural network (PP-STNN), which captures the spatial and temporal correlations of the pipeline network using a graph convolutional network (GCN) and gated recurrent units (GRUs), respectively, linking the spatial and temporal dynamics of pressure at the nodes of the water supply network. Zanfei et al. (2022) proposed an anomaly detection method based on GCN, which relies on pressure and flow data.
It has been observed that machine learning methods are prone to losing sequence information and exhibit higher error rates when processing excessively long input sequences. Additionally, other deep learning models show limitations in extracting local features and in adequately prioritizing important features, as shown in Table 1. Compared with these models, the model proposed in this study predicts future pressures in water distribution networks more accurately by learning from historical pressure data, with the added benefits of attention weights and fast computation. The study presents an innovative CGMA model, in which the GRU model acts as the core framework and is supplemented by additional modules to improve the accuracy of pressure prediction in water supply networks. The architecture of the GRU model has been carefully tailored to the characteristics of these networks. This study represents the first application of the CGMA model to the prediction of pressure dynamics in water supply networks. This integration substantially refines the model's understanding of time series data, which is a notable contribution of this research.
Table 1. Comparison of the advantages and disadvantages of each model
Model | Advantage | Disadvantage |
---|---|---|
SVM | Suitable for small- and medium-sized data, the prediction speed is fast and the accuracy is high | Less effective on noisier datasets with overlapping classes; performance and speed degrade with larger datasets |
ARIMA | Good for time series data with a strong seasonal pattern, straightforward to implement and interpret | Requires the data to be stationary |
GA-BP | Combines GA's global search capability with BP's local search ability | Computationally expensive, and can be slow to converge to the optimal solution; risk of overfitting |
LSTM | Excellent at capturing long-term dependencies in time series data | Computationally intensive, challenging to tune due to many hyperparameters |
CNN-GRU | Leverages CNN's feature extraction capabilities and GRU's efficiency in sequence prediction, suitable for spatiotemporal data | Complex model structure that can be hard to interpret; requires large amounts of data to train effectively |
METHODOLOGY AND DATA
1DCNN component
Table 2. Comparison of 1DCNN and CNN features
Feature | 1DCNN | CNN |
---|---|---|
Data dimension | Processes one-dimensional data (e.g., time series) | Processes two-dimensional data (e.g., images) |
Convolution operation | Convolution kernel moves along the time axis of the data | Convolution kernel moves across the height and width of the data |
Application areas | Audio processing, time series analysis, feature extraction from sequences | Image and video processing, image classification |
Network structure | Simple structure | More complex structure |
The convolutional operations in the two convolutional layers are configured differently. For the first type, the kernel size is set to a small value to extract local patterns and structures in the data (Figure 2). For the second type, the kernel size matches the period of the time series so that long-term information in the series is captured. Each convolutional operation is followed by ReLU activation, batch normalization, and dropout, which are intended to improve training and prevent overfitting. Compared with traditional activation functions such as Sigmoid or Tanh, the nonsaturating ReLU activation function passes the gradient more efficiently and mitigates gradient vanishing during backpropagation (BP). Batch normalization renders the model less sensitive to weight initialization and serves as a regularizer, which helps to prevent overfitting. Dropout compels the network to operate without relying on any specific combination of neurons, akin to training multiple networks that share weights, thereby preventing the model from overfitting the training data and improving its generalization ability. Finally, after the two types of convolutional operations, the output of the convolutional layers becomes the input for the next phase.
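The difference between the two kernel sizes can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `conv1d_valid` and `relu`, the pressure values, the hand-picked kernels, and the toy period of 4 samples are all assumptions made for illustration (in a trained 1DCNN the kernel weights are learned).

```python
def conv1d_valid(series, kernel):
    """Slide `kernel` along the time axis of `series` (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(series[t + i] * kernel[i] for i in range(k))
        for t in range(len(series) - k + 1)
    ]


def relu(values):
    """Non-saturating activation: negative responses are clipped to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical pressure readings that repeat every 4 samples.
pressure = [0.30, 0.32, 0.28, 0.26] * 3

# Short kernel (first difference): responds to local rises in pressure.
local_kernel = [-1.0, 1.0]
# Period-length kernel (moving average over one full cycle).
period_kernel = [0.25, 0.25, 0.25, 0.25]

local_features = relu(conv1d_valid(pressure, local_kernel))
period_features = conv1d_valid(pressure, period_kernel)
```

In this toy case the short kernel reacts to step-to-step changes, while the period-length kernel sees one whole cycle at a time and therefore returns the same cycle average at every position.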
GRU component
Deep learning technology, a branch of neural network research, has progressed significantly with the development of recurrent neural networks (RNNs). Renowned for their exceptional nonlinear mapping capabilities, RNNs have demonstrated particular efficacy in the analysis and prediction of time series data. Theoretically, RNNs can process historical information of indefinite length. In practice, however, processing such information requires unrolling the network into a proportionate number of layers, akin to a multilayer feedforward neural network. This expansion often leads to gradient vanishing and other training-related challenges, which limit the RNN's ability to handle extensive historical data efficiently. To solve this problem, Cho et al. (2014) proposed the GRU model.










At the core of the GRU model's operation is its capacity to modulate information flow dynamically, depending on the current inputs and the historical data of the sequence. The combined action of the update and reset gates enables the GRU to capture long-term dependencies in time series data while avoiding the gradient vanishing problem commonly encountered in traditional RNNs. Additionally, the GRU has fewer parameters than the LSTM because it has no output gate. This makes the GRU faster and computationally cheaper to train.
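The interplay of the update and reset gates can be sketched for a single scalar hidden unit as follows. This is a minimal illustration that follows the standard formulation of Cho et al. (2014), not the notation or code of this study; the function `gru_step`, the parameter dictionary, and all weight values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU update for a scalar input and a scalar hidden state.

    `p` holds hypothetical (input weight, recurrent weight, bias) triples
    for the update gate "z", the reset gate "r", and the candidate state "h".
    """
    wz, uz, bz = p["z"]
    wr, ur, br = p["r"]
    wh, uh, bh = p["h"]
    z = sigmoid(wz * x_t + uz * h_prev + bz)  # update gate
    r = sigmoid(wr * x_t + ur * h_prev + br)  # reset gate
    # The reset gate scales how much of the old state enters the candidate.
    h_cand = math.tanh(wh * x_t + uh * (r * h_prev) + bh)
    # The update gate blends the old state with the candidate state.
    return z * h_prev + (1.0 - z) * h_cand


# Hypothetical weights and a short hypothetical pressure sequence.
params = {"z": (0.5, 0.5, 0.0), "r": (0.5, 0.5, 0.0), "h": (1.0, 1.0, 0.0)}
h = 0.0
states = []
for x in [0.30, 0.32, 0.28, 0.26]:
    h = gru_step(x, h, params)
    states.append(h)
```

When the update gate saturates near 1, the previous state is carried forward almost unchanged, which is how information can persist across many time steps.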
Attention component
Contrary to conventional time series models, which uniformly weigh each time step, attention mechanisms introduce a nuanced approach by attributing differential weights to various time steps. This feature proves pivotal in contexts where specific historical data points possess greater predictive value for future outcomes (Niu et al. 2021). In the realm of time series, the attention mechanism employs the constructs of query (Q), key (K), and value (V). The query typically corresponds to the current state within the sequence being forecasted, whereas the keys and values are extrapolated from historical data. The model computes attention scores through a comparative analysis of the query against each key. These scores are indicative of the relative significance of each time step's data (value) in the prediction of the imminent time step (Vaswani et al. 2017). The normalization of these scores is achieved using a softmax function, thereby converting them into a probabilistic distribution that functions as weights for the values. This leads to a contextual focus where each value, representing a historical data point, is assigned weights based on the softmax-derived outputs. Consequently, the model formulates a context vector for each predictive time step, selectively emphasizing certain historical moments over others. In the final phase of output prediction, this context vector, which encapsulates a concentrated historical narrative, is utilized in conjunction with neural networks, to forecast the subsequent time step in the series.
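The query, key, and value computation described above can be sketched in a few lines. The example uses scaled dot-product scoring in the style of Vaswani et al. (2017), which is an assumption here since the exact scoring function is not stated; the function names and all vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Return (weights, context) for one query over all historical steps."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    # Context vector: weighted sum of the value vectors.
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


# Three hypothetical historical steps; the third key matches the query best.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.30], [0.32], [0.28]]
query = [1.0, 1.0]

weights, context = attention(query, keys, values)
```

The time step whose key is most similar to the query receives the largest weight, so its value dominates the context vector.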
To augment the GRU model's capability in extracting diverse data features, a sophisticated variant of the attention mechanism, known as the multi-head attention mechanism, is implemented. This mechanism concurrently processes information across various subspaces, thereby intricately capturing the complex interrelations embedded within the data (Su et al. 2023). Each ‘head’ in this mechanism independently assimilates the input data's characteristics from distinct perspectives, enriching the model's comprehension with a broader spectrum of information. The amalgamation of insights gleaned from these multiple heads culminates in a data representation that is substantially more nuanced and potent than what is achievable through single-head attention. This multi-faceted approach enables simultaneous focus on data from assorted dimensions, substantially elevating the learning process in terms of efficiency and efficacy. While the insights derived from each head may differ, their integration facilitates a more holistic and comprehensive representation of the data.
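A simplified sketch of how several heads attend to different subspaces is given below. For brevity it omits the learned projection matrices that a full multi-head attention layer applies before and after the heads, and simply splits each vector into equal slices; the function names and the toy states are assumptions for illustration.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vec, num_heads):
    """Cut a feature vector into `num_heads` equal, contiguous slices."""
    size = len(vec) // num_heads
    return [vec[i * size:(i + 1) * size] for i in range(num_heads)]


def multi_head_attention(query, keys, values, num_heads):
    """Run attention per slice, then concatenate the per-head contexts."""
    if len(query) % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    q_heads = split_heads(query, num_heads)
    k_heads = [split_heads(k, num_heads) for k in keys]
    v_heads = [split_heads(v, num_heads) for v in values]
    output, all_weights = [], []
    for h in range(num_heads):
        d = len(q_heads[h])
        scores = [
            sum(a * b for a, b in zip(q_heads[h], k[h])) / math.sqrt(d)
            for k in k_heads
        ]
        w = softmax(scores)
        all_weights.append(w)
        for j in range(d):
            output.append(sum(w[t] * v_heads[t][h][j] for t in range(len(values))))
    return output, all_weights


# Hypothetical hidden states for three time steps, four features each.
states = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
output, all_weights = multi_head_attention(states[-1], states, states, num_heads=2)
```

Here the two heads produce different weight patterns over the same three time steps, which is the behavior the paragraph above describes. If the hidden state were split evenly in this way, the 160 hidden units and 10 heads reported in Table 3 would correspond to 16 features per head.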
The multi-head attention mechanism addresses the critical issue of long-range dependencies in sequential data processing. By emphasizing the most salient segments of the data sequence, it improves the model's predictive precision and refines the GRU's analysis by concentrating the model's capacity on pivotal data points. As a result, combining the multi-head attention mechanism with the GRU framework yields a more capable approach to sequential data analysis, one that overcomes the limitations traditionally associated with modeling extensive temporal correlations.
The integrated model of 1DCNN-GRU-multi-head attention
Sequential data typically exhibit significant dependencies between preceding and subsequent elements in the time dimension. Temporally, the pressures in the water supply network demonstrate proximity and similarity, which suggests that future output is intimately linked to the past state. Consequently, an innovative CGMA model is introduced to handle such data more effectively. Initially, local characteristics in the time series data are captured via a one-dimensional convolutional layer, extracting nuanced, localized features from the dataset. Subsequently, the output from the CNN is further processed in the GRU layer that excels in identifying and capturing long-term dependencies, effectively discerning important trends and patterns in the time series through analyzing successive progressions and dynamic fluctuations. This layer's processing is imperative for understanding long-term behavioral patterns in complex time-series data. Ultimately, the output of the GRU layer is directed to an attention mechanism module, which primarily focuses the model's attention on the most critical segments of the time series. The introduced attention mechanism quantitatively assigns importance weights to each specific time step in the sequential features, aiming to mitigate the attentional dispersion defects of the traditional GRU.
Thus, the model effectively identifies the historical information most critical to the current prediction, improving the overall prediction performance.
The prediction steps of the CGMA model delineated in this paper are as follows:
(1) Collect the pressure data of the water supply network and conduct data preprocessing.
(2) Retrieve the pressure state of the water supply network from the dataset and build a one-dimensional convolutional neural network to extract local features.
(3) Feed the features extracted by the one-dimensional convolutional network into the GRU network, which extracts the long-term temporal characteristics of the pressure data.
(4) Implement the attention mechanism to optimize the weight distribution automatically. Multiply and sum the output vectors of the GRU's hidden layer at different time points with the corresponding weights to emphasize the important feature components.
(5) Integrate the temporal characteristics of the water distribution network pressure into the regression prediction layer as input, and compute the corresponding prediction results. Define the model's loss function and iteratively optimize the model parameters based on the loss function's value using the BP algorithm. Real-time water distribution network pressure data is continuously fed into the model to enable short-term predictions.
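Step (1) and the shape of the samples passed to the later steps can be sketched as follows. The preprocessing used in this study is not reproduced here, so min-max scaling, the window length of 4, the one-step horizon, and the function names are all assumptions chosen for illustration.

```python
def min_max_scale(series):
    """Rescale a series to the range [0, 1] (assumed preprocessing step)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def make_windows(series, window, horizon=1):
    """Build (input window, target) pairs for short-term prediction."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window + horizon - 1]
        samples.append((x, y))
    return samples


# Hypothetical pressure readings from one monitoring point.
raw = [0.30, 0.32, 0.28, 0.26, 0.31, 0.33, 0.29, 0.27]
scaled = min_max_scale(raw)
samples = make_windows(scaled, window=4)
```

Each sample pairs a window of past pressures with the value that follows it, which is the input and target format that a 1DCNN-GRU-attention stack trained by backpropagation would consume.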
Data presentation
Data preprocessing
Experimental environment and determination of model hyperparameters
The experimental environment consisted of an Intel(R) Xeon(R) Platinum 8222CL CPU @ 3.00 GHz and an NVIDIA GeForce RTX 4070 Ti GPU. We used Python 3.11 and the PyTorch framework to train and run the 1DCNN-GRU-attention deep neural network prediction model.
The hyperparameters of the model include the learning rate, the batch size, the number of GRU layers, and the number of attention heads. The learning rate dictates the step size for adjusting model weights. An excessively high learning rate may prevent the loss function from converging during training, whereas an excessively low learning rate slows training and may leave the model trapped in local minima. Batch size denotes the number of data samples processed simultaneously during training. The number of GRU layers directly influences the model's complexity and learning capacity. Additional layers enable the model to learn more complex features but may also lead to overfitting and increased training costs. In the multi-head attention mechanism, the number of heads dictates how many different information streams the model can process in parallel. More heads enable the model to consider more perspectives during sequence processing but also increase the computational burden.
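The effect of the learning rate can be seen on a toy problem. The sketch below runs gradient descent on f(w) = w², which is an illustrative stand-in for the training loss and not the model used in this study; the function name and the three learning rates are hypothetical.

```python
def descend(lr, steps=50, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w


w_high = descend(1.1)    # step too large: the iterate overshoots and grows
w_good = descend(0.1)    # moderate step: the iterate approaches the minimum at 0
w_low = descend(0.001)   # step too small: the iterate has barely moved
```

On this toy loss each update multiplies w by (1 - 2·lr), so the iterate shrinks toward the minimum only when that factor has magnitude below 1, and it shrinks very slowly when the factor is close to 1.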
Selecting hyperparameters often relies on empirical methods, which may be inefficient and may fail to identify the best parameter combinations. To overcome this challenge, Bayesian optimization is employed for hyperparameter selection in our model. Bayesian optimization begins by constructing a surrogate model, typically a Gaussian process, to approximate the objective function. The surrogate model provides predictions of the objective function and quantifies the uncertainty of these predictions (i.e., confidence intervals). Based on the surrogate model, an acquisition function, which balances exploration in areas of high uncertainty against exploitation in regions with known high performance, guides the search process. In each iteration, the point selected by the acquisition function is evaluated and the surrogate model is updated with the result, so the process is refined iteratively (Chung et al. 2022). Each iteration improves the understanding of the objective function, thereby simplifying the search for the optimal solution. The hyperparameters selected from the optimization results and based on experience are listed in Table 3.
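The trade-off made by an acquisition function can be illustrated with expected improvement, one common choice. The acquisition function used in this study is not specified, so this choice is an assumption, and the candidate learning rates and the surrogate means and standard deviations below are hypothetical numbers rather than results.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization.

    mu, sigma: surrogate mean and standard deviation at a candidate point.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        return max(0.0, best - mu)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical surrogate predictions of validation error per learning rate:
# learning rate -> (predicted mean, predicted standard deviation).
candidates = {
    0.001: (0.0040, 0.0001),  # confidently worse than the best so far
    0.03: (0.0030, 0.0005),   # slightly better, fairly certain
    0.3: (0.0045, 0.0030),    # worse on average, but highly uncertain
}
best_so_far = 0.0032

ei = {lr: expected_improvement(mu, s, best_so_far) for lr, (mu, s) in candidates.items()}
next_point = max(ei, key=ei.get)
```

With these made-up numbers the highly uncertain candidate receives the largest score, which shows how exploration can outweigh a modest, well-understood gain.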
Table 3. Hyperparameters of the 1DCNN-GRU-attention model
No. | Hyperparameter | Best value |
---|---|---|
1 | Batch size | 64 |
2 | Number of output channels | 6 |
3 | Kernel size | 24 |
4 | Learning rate | 0.0322 |
5 | Number of hidden units | 160 |
6 | Number of GRU layers | 2 |
7 | Number of heads | 10 |
Evaluation indicators of the model
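The error measures reported in this paper (MAE, MSE, and RMSE) can be computed as in the following sketch, which uses their standard definitions. The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical observed and predicted pressures.
observed = [0.30, 0.32, 0.28, 0.26]
predicted = [0.31, 0.31, 0.28, 0.28]

scores = {
    "MAE": mae(observed, predicted),
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
}
```

Under these definitions the RMSE is the square root of the MSE, which is consistent with the values reported in the conclusion: 0.00262 squared is about 6.9 × 10⁻⁶, or 0.00001 at five decimal places.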

RESULTS AND DISCUSSION
Because the CGMA model is faster and more stable than machine learning models when processing large-scale data, this section focuses on its accuracy, first by comparing the results of the CGMA model with those of other machine learning models. We then decompose the complete CGMA model into its components and compare them with the full CGMA model, analyzing the experimental results from multiple viewpoints.
Comparison with benchmark models
Pressure prediction effect of five monitoring points: (a) Monitoring point 1; (b) Monitoring point 2; (c) Monitoring point 3; (d) Monitoring point 4; and (e) Monitoring point 5.
Effect of 1DCNN and multi-head attention mechanism
CONCLUSION
In this study, we employed the CGMA model to predict pressure within LS City's water supply network. This method combines 1DCNN and GRU structures for training time series data and outperforms competing models in extracting temporal features from the input data. Additionally, an attention mechanism was introduced to enhance learning by focusing on key features in the temporal data. The proposed CGMA model demonstrated superior predictive accuracy, with an MAE of 0.00197, an MSE of 0.00001, and an RMSE of 0.00262. Throughout the entire testing period, the CGMA had the lowest prediction error, indicating that CGMA is a viable option for pressure prediction within water supply networks. This study utilizes a deep learning method to enhance the accuracy of pressure prediction, thus optimizing pressure management and ensuring a more efficient and reliable water supply. By reducing waste and ensuring the optimal use of resources, enhanced pressure prediction supports sustainable water management.
Because the CGMA model primarily relies on feature extraction and learning from historical data, it can be adapted to other water distribution networks with similar or different characteristics. The model can also be used to predict other hydraulic parameters in the water distribution network, including water demand and flow rate. When it is used to predict pressure or flow in another water distribution network, training and application require only the corresponding historical data. That is, as long as sufficient historical data are available, the model can be applied to any water distribution network. However, when the data are scarce or of poor quality, its effectiveness decreases significantly. This study has some limitations. First, the relationships between nodes were not thoroughly investigated. Second, external variables that substantially influence daily water pressure patterns, including weather and holidays, were not considered. Furthermore, the data collection interval for pressure in this study was 1 h. Reducing the time interval and increasing the data volume can mitigate the uncertainties associated with long-term periods. To enhance accuracy, the pressure data collection interval could be reduced to 1, 5, or 15 min, enabling the model to extract more detailed information. Incorporating these factors into future research will be crucial.
ACKNOWLEDGEMENTS
This work was supported by the National Key R & D Program of China (Grant number [2022YFC3801000]), the Program for Innovative Research Team (in Science and Technology) in the University of Henan Province (Grant number [23IRTSTHN004]), and Scientific Research Projects of Power China Railway Construction Investment Group Co. Ltd (DJ-ZSLJ-2023-01).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.