The water supply network is a crucial component of urban construction and plays a pivotal role in the normal operation of the city. The complexity and periodicity of pressure variations in the water supply networks pose significant challenges to traditional prediction models. In this study, we introduce a novel short-term pressure prediction model, termed the 1DCNN-GRU-multi-head attention (CGMA) model, which incorporates a multi-head attention mechanism alongside an integrated network consisting of one-dimensional convolutional neural networks (1DCNNs) and gated recurrent units (GRUs). Initially, the model employs a 1DCNN network to extract features from the pressure data. Subsequently, the extracted features are input into the GRU neural network, leveraging its long-term dependency capabilities to improve prediction accuracy. Then, the attention mechanism is incorporated into the GRU component to highlight key information, enabling the model to focus on more important data features, thereby improving the prediction performance. We implemented this model in a real urban water distribution system equipped with five pressure sensors. The model achieved a mean absolute error of 0.00197 and a root mean square error of 0.00262. When compared with alternative approaches, our method demonstrated superior predictive performance for pressure data, thereby confirming its efficacy in practical applications.

  • The model in this study uses the GRU as its central framework and adds further modules to form the composite 1DCNN-GRU-multi-head attention model.

  • The 1DCNN is responsible for capturing local features, while the attention mechanism focuses on identifying key information within the data.

  • This integrative approach substantially enhances the model's comprehension of time series data.

The water supply network plays a pivotal role in providing essential water resources to residents, industrial sectors, and public facilities. Within this network, the distribution segment holds a critical responsibility: to convey treated potable water to end-users while maintaining the requisite flow rate, pressure, and quality standards (Ostfeld & Salomons 2004). The water supply network is intrinsically complex, encompassing not only conventional pipelines but also valves, reservoirs, and sophisticated hydraulic monitoring apparatus (Günther et al. 2015). Effective pressure management, crucial for minimizing the risk of pipeline bursts and ensuring consumer satisfaction, relies on strategies such as variable frequency pumps, zonal divisions, and real-time monitoring (Creaco et al. 2015). Accurate short-term pressure predictions enhance management efficiency and adaptability to fluctuating demands, promoting a robust and responsive network (Negharchi & Shafaghat 2022).

In the water supply network, predicting short-term pressure fluctuations is essential for efficient water supply management, impacting resource allocation and the detection of faults within the infrastructure. Due to user activities, pressure fluctuations in the water distribution network can be both long-term and short-term (Marsili et al. 2023; Mazzoni et al. 2024). The challenge intensifies during atypical usage patterns, like extreme weather events or holidays, where predictive model reliability is paramount (Bello et al. 2019). Advanced analytical approaches are necessary to maintain precision and effectiveness in short-term pressure forecasting, contributing significantly to the reliability, safety, and economic efficiency of urban water supply systems (Stańczyk et al. 2022).

Traditional hydraulic models rely on explicit mathematical equations, which often need to be simplified and assumed to make the calculations feasible. However, water supply networks are highly nonlinear and complex, involving numerous interactions and dynamic changes, posing significant challenges for traditional hydraulic models. Conversely, deep learning models take a data-driven approach, capturing complex nonlinear relationships within the data for learning and computation, without the need for explicit mathematical descriptions. Thanks to advancements in the Internet of Things (IoT), the speed and scope of information collection have significantly improved. Deep learning models can utilize vast amounts of historical data for training, predicting future outcomes by identifying and learning patterns and trends within the data. This capability makes them particularly effective in handling high-dimensional, large-scale datasets, extracting valuable information from historical operational data to predict future pressure. Although traditional hydraulic equations are theoretically robust and effective under specific conditions, deep learning offers more flexible and precise tools, particularly when dealing with complex and variable real-world applications.

The traditional approach to analyzing time series data of pipeline states has predominantly been through statistical methods. Sinha & McKim (2007) employed a time-dependent Markov prediction model to forecast pipeline deterioration, facilitating more effective pipeline management. Francis et al. (2014) employed Bayesian belief networks in water supply network rupture prediction, aiming to predict or understand the rate of pipe rupture in a given potable water supply system, even when faced with potentially incomplete or imperfect data. Chen & Guikema (2020) discovered that locally weighted density scans are the most accurate method for identifying areas of high breakage. This application allows statistical prediction models to more accurately prioritize high-risk pipelines, thereby enhancing their performance. Młyński et al. (2021) regard the Colwell indexes as a reliable tool for evaluating the seasonality of water pipeline failure events.

These statistical methods subsequently evolved with the integration of machine learning algorithms, including artificial neural networks (ANNs), support vector machines (SVMs), and random forests, which have been progressively employed to enhance the predictive accuracy of water supply networks. Perea et al. (2019) leveraged genetic algorithms to optimize ANNs, particularly targeting scenarios characterized by limited data sets to refine water demand predictions. Ping et al. (2014) developed an RBF-SVM model, expressly designed to predict actual pressures. Xue et al. (2012) proposed an improved PCA-SVM model, elevating both the accuracy and efficiency of water demand forecasting. Herrera et al. (2010) conducted a comparative analysis of various models, revealing the superior precision of the support vector regression (SVR) model. Mouatadid & Adamowski (2017) introduced the application of an extreme learning machine (ELM) model in water demand forecasting, which uniquely incorporates a range of variables, including water demand, total precipitation, and peak temperature, thereby augmenting its predictive precision. Xia et al. (2021) presented a novel approach combining genetic algorithms with backpropagation (BP) neural networks, aiming at forecasting flow and pressure dynamics in water supply networks. Tiwari & Adamowski (2013) proposed a wavelet–bootstrap–neural network (WBNN) hybrid model that is tailored specifically for short-term urban water demand forecasting, contributing to the sophisticated array of predictive tools in this crucial field.

Due to the complex structure of water supply networks, conventional machine learning methods face challenges when training models with large-scale, intricate data. To process such voluminous and complex data, deep learning has been progressively applied to pressure prediction in water distribution networks. Zhe et al. (2015) developed a nonlinear autoregressive with exogenous inputs (NARX) network model, which uses both real-time and historical operational data to estimate demands at various nodes and to construct functional correlations among the principal variables within the network, allowing it to track and forecast the dynamic fluctuations inherent in water supply systems. Xu et al. (2020) adopted long short-term memory (LSTM) models for their exceptional capacity in nonlinear mapping, which makes them particularly adept at handling time series data such as pressure in pipe networks. Their research introduced a parallel LSTM and deep neural network (PLDNN) model, which combines the respective strengths of the LSTM and DNN architectures and is specifically designed to improve the feature extraction efficiency for both control and state variables, resulting in enhanced predictive accuracy. Kavya et al. (2023) compared the results of various models in predicting short-term water demand and found that, for multivariate data, LSTM yields the best predictions. Liao et al. (2022) proposed a pressure prediction method based on a spatiotemporal neural network (PP-STNN), which captures the spatial and temporal correlations of the pipeline network using a graph convolutional network (GCN) and gated recurrent units (GRUs), respectively, linking the spatial and temporal dynamics of pressure at the nodes of the water supply network. Zanfei et al. (2022) proposed an anomaly detection method based on GCN, which relies on pressure and flow data.

It has been observed that machine learning methods are prone to losing sequence information and exhibit higher error rates when processing excessively long input sequences. Additionally, other deep learning models show limitations in extracting local features and in adequately prioritizing important features, as shown in Table 1. Compared with these models, the model proposed in this study predicts future pressures in water distribution networks more accurately by learning from historical pressure data while benefiting from attention weights and fast computation. The study presents an innovative CGMA model, in which the GRU model acts as the core framework and is supplemented by additional modules to improve the accuracy of pressure prediction in water supply networks. The architecture of the GRU model has been carefully tailored to complement the unique characteristics of these networks. This study represents the first application of the CGMA model to the prediction of pressure dynamics in water supply networks. This integration substantially refines the model's understanding of time series data, which is a notable contribution of this research.

Table 1

Comparison of advantages and disadvantages of each model

| Model | Advantage | Disadvantage |
|---|---|---|
| SVM | Suitable for small- and medium-sized data; the prediction speed is fast and the accuracy is high | Less effective on noisier datasets with overlapping classes; performance and speed degrade with larger datasets |
| ARIMA | Good for time series data with a strong seasonal pattern; straightforward to implement and interpret | Requires the data to be stationary |
| GA-BP | Combines GA's global search capability with BP's local search ability | Computationally expensive, and can be slow to converge to the optimal solution; risk of overfitting |
| LSTM | Excellent at capturing long-term dependencies in time series data | Computationally intensive; challenging to tune due to many hyperparameters |
| CNN-GRU | Leverages CNN's feature extraction capabilities and GRU's efficiency in sequence prediction; suitable for spatiotemporal data | Complex model structure that can be hard to interpret; requires large amounts of data to train effectively |

Methodology and data

The CGMA model introduced in this article operates as depicted in Figure 1, segmented into data input, data preparation, pressure prediction, and pressure output stages. Initially, raw pressure data undergoes cleaning and transformation for preprocessing. The model, integrating a 1DCNN, GRU, and an attention mechanism, extracts features from the pressure data, facilitating subsequent predictions of future pressure states following substantial training. This section provides a focused introduction to these components. Following this, the preprocessing steps are elucidated through a case study. The exposition concludes by detailing the extensive hyperparameters employed in the model.
Figure 1

The architecture of the CGMA method.


1DCNN component

The CNN is a deep learning architecture renowned for its prowess in processing image data and is predominantly employed for feature detection in two-dimensional images. The 1DCNN is structurally similar to the CNN, being composed of a series of convolutional layers, but the two also differ in several respects, as detailed in Table 2. This study employs a 1DCNN to leverage the feature extraction capabilities inherent in CNNs, facilitating the identification and extraction of features from pressure data (Huang et al. 2019). Despite operating along only one dimension, the 1DCNN retains the CNN's translation invariance in feature recognition, and the one-dimensional nature of the convolution kernel means that a large kernel does not result in an excessive number of parameters and calculations (Wang et al. 2021). The 1DCNN slides the convolutional kernel along the input data (time series) and computes the dot product of the kernel with the input data at each position. Taking the output feature value of the convolution operation at time step t as an example:
$$y(t) = \sum_{s=0}^{K-1} \omega(s)\, x(t+s) + b \tag{1}$$
where x(t + s) represents an element in the input sequence covered by the convolutional kernel, starting at time step t, s is the relative position within the kernel. ω(s) is the weight of the convolutional kernel at position s. Each convolutional kernel performs a dot product operation with the local region of the input data through its weights, thereby extracting specific features. K, the size of the convolutional kernel, determines the range of the input data covered by the kernel. b is the bias term, which enhances the flexibility and expressiveness of the model. y(t) is the value of the output feature map at time step t resulting from the convolution operation.
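As a minimal sketch of the convolution in Equation (1), the following pure-Python function slides a kernel along a sequence and returns one output per valid position. The zero-based offset s = 0, ..., K − 1, the absence of padding, and the sample values are assumptions made for illustration, not settings taken from the study.

```python
def conv1d_valid(x, w, b=0.0):
    """Return y(t) = sum_{s=0}^{K-1} w[s] * x[t + s] + b for every valid t."""
    K = len(w)
    return [
        sum(w[s] * x[t + s] for s in range(K)) + b
        for t in range(len(x) - K + 1)
    ]


# Made-up normalized pressure readings and a hypothetical 2-tap averaging kernel.
pressure = [0.30, 0.32, 0.35, 0.33, 0.31]
kernel = [0.5, 0.5]
features = conv1d_valid(pressure, kernel)
print(features)
```

A sequence of length N and a kernel of size K yield N − K + 1 outputs here; deep learning frameworks usually add padding and stride options on top of this basic operation.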
Table 2

Comparison of 1DCNN and CNN features

| Feature | 1DCNN | CNN |
|---|---|---|
| Data dimension | Processes one-dimensional data (e.g., time series) | Processes two-dimensional data (e.g., images) |
| Convolution operation | Convolution kernel moves along the time axis of the data | Convolution kernel moves across the height and width of the data |
| Application areas | Audio processing, time series analysis, feature extraction from sequences | Image and video processing, image classification |
| Network structure | Simple structure | More complex structure |
Figure 2 includes four different operational layers, defined as follows. The convolutional layer (Conv Layer) uses multiple convolutional kernels to perform convolution operations on the input data, extracting local features. The batch normalization layer (Batch Norm) standardizes the output of the convolutional layer by normalizing the data for each batch, thereby reducing internal covariate shift. The rectified linear unit (ReLU) is an activation function that sets input values less than 0 to 0 while keeping input values greater than or equal to 0 unchanged, thus introducing nonlinearity. The dropout layer randomly sets the output of some neurons to zero during training.
Figure 2

1DCNN network structure.


The convolutional operations in the two convolutional layers are set differently. For the first type, the kernel size is set to a smaller value to extract local patterns and structures in the data (Figure 2). For the other type, the kernel size is consistent with the period of the time series to capture long-term information in the time series. After each convolutional operation, the application of ReLU activation, batch normalization, and dropout is intended to improve training effects and prevent overfitting. Compared with traditional activation functions such as Sigmoid or Tanh, the nonsaturating activation function ReLU is able to pass the gradient more efficiently and mitigate the issue of gradient vanishing during BP. Batch normalization renders the model less sensitive to weight initialization and serves as a regularizer, which helps to prevent overfitting of the model. Dropout compels the network to run without relying on any specific combination of neurons, akin to training multiple networks and sharing their weights, thereby preventing the model from overfitting the training data and improving the model's generalization ability. Finally, after undergoing two different types of convolutional operations, the output data of the convolutional layers become the input for the next phase.
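The three operations applied after each convolution can be sketched in a few lines of plain Python. This is an illustrative simplification under stated assumptions: the batch normalization here has no learned scale or shift and normalizes a single list rather than each channel across a batch, the dropout uses the common "inverted" rescaling by 1/(1 − p), and the input values and dropout rate are made up.

```python
import math
import random


def relu(values):
    """Set negative inputs to 0 and leave non-negative inputs unchanged."""
    return [max(0.0, v) for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def dropout(values, p, rng, training=True):
    """Zero each value with probability p; rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


rng = random.Random(0)                    # fixed seed for repeatability
conv_out = [-1.2, 0.4, 2.0, -0.3, 0.9]    # made-up convolution outputs
activated = relu(conv_out)
normalized = batch_norm(activated)
dropped = dropout(normalized, p=0.2, rng=rng)
print(activated, normalized, dropped, sep="\n")
```

At inference time dropout is switched off (`training=False`), so the layer passes its input through unchanged.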

GRU component

Deep learning technology, a branch of neural network research, has progressed significantly with the development of recurrent neural networks (RNNs). Renowned for their exceptional nonlinear mapping capabilities, RNNs have demonstrated particular efficacy in the analysis and prediction of time series data. Theoretically, RNNs can process historical information of indefinite length. In practice, however, processing such information requires unrolling the network into a proportionate number of layers, akin to a multilayer feedforward neural network. This unrolling often leads to vanishing gradients and other training-related challenges, which limit the RNN's ability to handle extensive historical data efficiently. To solve this problem, Cho et al. (2014) proposed the GRU model.

The GRU model adeptly processes the historical state information of pipe network pressure and accommodates historical data of varying lengths. This capability allows it to overcome the limitations of traditional RNNs, and it is therefore used here for predictive analysis. The GRU model enhances the efficacy of conventional RNNs by integrating two distinct units, the update gate and the reset gate, the structure of which is shown in Figure 3.
Figure 3

GRU structure diagram.

The GRU consists of four main components: the update gate, the reset gate, the candidate hidden state, and the final hidden state. The update gate, denoted as $z_t$, functions similarly to a combination of the forget and input gates in the LSTM model. It plays a crucial role in modulating the influence of previous hidden states on the current state by determining the extent of information retention from past states and the magnitude of new information incorporation from the current time step. The reset gate, denoted as $r_t$, dictates the degree to which preceding hidden state information is disregarded during the computation of the current candidate hidden state. The candidate hidden state, denoted as $\tilde{h}_t$, is derived from the output of the reset gate and introduces fresh informational content into the hidden state at the current time step. The final hidden state, denoted as $h_t$, amalgamates information from past hidden states with that of the newly computed candidate hidden state, moderated by the update gate (Niu et al. 2023).
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{2}$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{3}$$
$$\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \tag{4}$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{5}$$
$$\sigma(a) = \frac{1}{1 + e^{-a}} \tag{6}$$
$$\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} \tag{7}$$
where $W_z$ represents the weights associated with the update gate, $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, and $\sigma$ denotes the sigmoid activation function. $W_r$ represents the weights associated with the reset gate. $W$ represents the weights associated with the candidate hidden state, and tanh denotes the hyperbolic tangent activation function, which maps real-valued inputs to the range (−1, 1), providing a smooth, bounded output that aids in capturing complex, nonlinear relationships.
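To make the gate interactions concrete, here is a minimal sketch of a single GRU step in plain Python. It assumes the common formulation in which each gate has one weight matrix applied to the concatenation of the previous hidden state and the current input, with no bias terms, and in which an update gate value close to 1 favours the candidate state. The helper names and the tiny weight matrices are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; every weight matrix acts on [h_prev, x_t] (no biases)."""
    concat = h_prev + x_t
    z = [sigmoid(a) for a in matvec(W_z, concat)]        # update gate
    r = [sigmoid(a) for a in matvec(W_r, concat)]        # reset gate
    gated = [r_i * h_i for r_i, h_i in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in matvec(W_h, gated)]  # candidate state
    # Blend the previous state and the candidate according to the update gate.
    return [(1 - z_i) * h_i + z_i * c_i
            for z_i, h_i, c_i in zip(z, h_prev, h_cand)]


# Hidden size 2, input size 1, so each weight matrix is 2 x 3 (made-up values).
W_z = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
W_r = [[0.2, 0.1, 0.0], [-0.3, 0.2, 0.5]]
W_h = [[0.5, -0.4, 0.2], [0.1, 0.3, -0.2]]

h = [0.0, 0.0]
for reading in [0.30, 0.32, 0.35]:   # made-up normalized pressures
    h = gru_step([reading], h, W_z, W_r, W_h)
print(h)
```

Some libraries swap the roles of z and 1 − z in the final blend (PyTorch's `nn.GRU` documentation does, for example); the two conventions are equivalent up to relabelling the gate.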

At the crux of the GRU model's operational mechanism is its capacity to dynamically modulate information flow, contingent upon the current inputs and historical data of the sequence. The synergistic operation of the update and reset gates empowers the GRU to adeptly capture long-term dependencies in time series data, circumventing the gradient vanishing dilemma commonly encountered in traditional RNNs. Additionally, the GRU is characterized by a parameter-optimized architecture, possessing fewer parameters (sans output gates) in comparison with the LSTM. This attribute renders the GRU more expedient and computationally economical during the training phase.

Attention component

Contrary to conventional time series models, which uniformly weigh each time step, attention mechanisms introduce a nuanced approach by attributing differential weights to various time steps. This feature proves pivotal in contexts where specific historical data points possess greater predictive value for future outcomes (Niu et al. 2021). In the realm of time series, the attention mechanism employs the constructs of query (Q), key (K), and value (V). The query typically corresponds to the current state within the sequence being forecasted, whereas the keys and values are extrapolated from historical data. The model computes attention scores through a comparative analysis of the query against each key. These scores are indicative of the relative significance of each time step's data (value) in the prediction of the imminent time step (Vaswani et al. 2017). The normalization of these scores is achieved using a softmax function, thereby converting them into a probabilistic distribution that functions as weights for the values. This leads to a contextual focus where each value, representing a historical data point, is assigned weights based on the softmax-derived outputs. Consequently, the model formulates a context vector for each predictive time step, selectively emphasizing certain historical moments over others. In the final phase of output prediction, this context vector, which encapsulates a concentrated historical narrative, is utilized in conjunction with neural networks, to forecast the subsequent time step in the series.
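The query, key, and value computation described above can be sketched as follows. The score function is assumed to be the scaled dot product of Vaswani et al. (2017); the text does not say which comparison the study uses, and the vectors below are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Return the context vector and the attention weights for one query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(width)]
    return context, weights


# Three historical time steps (keys and values) and one current-state query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.30], [0.40], [0.35]]
context, weights = attention([1.0, 0.5], keys, values)
print(weights, context)
```

The weights show which historical steps the query aligns with most strongly, and the context vector is the correspondingly weighted average of the values.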

To augment the GRU model's capability in extracting diverse data features, a sophisticated variant of the attention mechanism, known as the multi-head attention mechanism, is implemented. This mechanism concurrently processes information across various subspaces, thereby intricately capturing the complex interrelations embedded within the data (Su et al. 2023). Each ‘head’ in this mechanism independently assimilates the input data's characteristics from distinct perspectives, enriching the model's comprehension with a broader spectrum of information. The amalgamation of insights gleaned from these multiple heads culminates in a data representation that is substantially more nuanced and potent than what is achievable through single-head attention. This multi-faceted approach enables simultaneous focus on data from assorted dimensions, substantially elevating the learning process in terms of efficiency and efficacy. While the insights derived from each head may differ, their integration facilitates a more holistic and comprehensive representation of the data.
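A simplified view of the multi-head idea is to split each hidden vector into equal slices, run attention separately on each slice, and concatenate the per-head results. The sketch below does exactly that and nothing more: it omits the learned projection matrices that a full multi-head layer applies before and after the heads, which is an assumption made to keep the example short, and the hidden states are made up.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(query, keys, values):
    """Scaled dot-product attention for one head; returns the context."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(width)]


def multi_head_attention(query, states, num_heads):
    """Attend over `states` independently in each slice, then concatenate."""
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    size = d // num_heads
    context = []
    for h in range(num_heads):
        lo, hi = h * size, (h + 1) * size
        sub_states = [s[lo:hi] for s in states]
        context.extend(head_attention(query[lo:hi], sub_states, sub_states))
    return context


# Made-up hidden states for three time steps; the last one acts as the query.
states = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.9], [0.2, 0.2, 0.8, 0.1]]
context = multi_head_attention(states[-1], states, num_heads=2)
print(context)
```

Because each head computes its own weights, one head can emphasise a different time step than another, which is the sense in which the heads view the sequence from distinct perspectives.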

The incorporation of the multi-head attention mechanism adeptly addresses the critical issue of long-range dependencies in sequential data processing. By methodically accentuating the most salient segments of the data sequence, this advanced mechanism substantially enhances the model's predictive precision. It refines the analytical acumen of the GRU by concentrating on pivotal data points, thereby amplifying computational efficacy. This innovative methodology facilitates prudent resource distribution, effectively augmenting the system's proficiency and operational capabilities in intricate sequence analysis endeavors. As a result, the synergy of the multi-head attention mechanism with the GRU framework results in a more formidable and exhaustive approach to sequential data analysis, effectively circumventing the historical limitations associated with managing extensive temporal correlations.

The integrated model of 1DCNN-GRU-multi-head attention

Sequential data typically exhibit significant dependencies between preceding and subsequent elements in the time dimension. Temporally, the pressures in the water supply network demonstrate proximity and similarity, which suggests that future output is intimately linked to the past state. Consequently, an innovative CGMA model is introduced to handle such data more effectively. Initially, local characteristics in the time series data are captured via a one-dimensional convolutional layer, extracting nuanced, localized features from the dataset. Subsequently, the output from the CNN is further processed in the GRU layer that excels in identifying and capturing long-term dependencies, effectively discerning important trends and patterns in the time series through analyzing successive progressions and dynamic fluctuations. This layer's processing is imperative for understanding long-term behavioral patterns in complex time-series data. Ultimately, the output of the GRU layer is directed to an attention mechanism module, which primarily focuses the model's attention on the most critical segments of the time series. The introduced attention mechanism quantitatively assigns importance weights to each specific time step in the sequential features, aiming to mitigate the attentional dispersion defects of the traditional GRU.

Thus, the model effectively identifies the historical information most critical to the current prediction, improving the overall prediction performance.

The prediction steps of the CGMA model delineated in this paper are as follows:

  • (1) Collect the pressure data of the water supply network and conduct data preprocessing.

  • (2) Retrieve the pressure state of the water supply network from the dataset and build a one-dimensional convolutional neural network to extract local features.

  • (3) A one-dimensional convolutional network is utilized to process the water distribution network's pressure features, providing input for the GRU network. This approach enables the extraction of long-term temporal characteristics from the pressure data.

  • (4) Implement the attention mechanism to optimize the weight distribution automatically. Multiply and sum the output vectors of the GRU's hidden layer at different time points with the corresponding weights to emphasize the important feature components.

  • (5) Integrate the temporal characteristics of the water distribution network pressure into the regression prediction layer as input, and compute the corresponding prediction results. Define the model's loss function and iteratively optimize the model parameters based on the loss function's value using the BP algorithm. Real-time water distribution network pressure data is continuously fed into the model to enable short-term predictions.

Overall, the prediction model introduced in this paper is adept at effectively processing pressure data in water supply networks and yielding accurate time-series predictions. The model's complete structure, as shown in Figure 4, integrates 1DCNN, GRU, and the attention mechanism, and not only enables a more comprehensive understanding and prediction of complex time series data but also assists in optimizing the operation and management of the water supply network.
Figure 4

Flowchart of the CGMA pressure prediction model.


Data presentation

In the experiment, the dataset was obtained from five pressure monitoring sensors located within the water supply network of WTQ District, LS City, as shown in Figure 5. Five black dots indicate the locations of the key pressure monitoring points, and the black square represents the location of the water plant. The total length of the water supply network is 300 km, serving a population of 213,100. The main pipeline shown in Figure 5 is a 400 mm polyethylene (PE) pipe installed one year ago. The average elevation of the area is 424 m, located in the urban–rural fringe, primarily meeting residential water demand, with minimal industrial water usage. The data were collected from 15 August 2023 to 27 September 2023, with a sampling interval of 1 h. In total, each sensor recorded 1,057 values within the dataset. During the training process, 90% of the total samples were used to optimize the parameters of the developed model, while the remaining 10% were used to evaluate the model's performance.
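One way to turn an hourly series of this length into supervised samples and a 90/10 split is sketched below. The 24-step input window, the one-step horizon, and the choice to split chronologically after windowing are assumptions for illustration; the text states only the series length and the split ratio. The series itself is a placeholder.

```python
def make_windows(series, window, horizon=1):
    """Pair each run of `window` readings with the value `horizon` steps later."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


def chronological_split(samples, train_fraction=0.9):
    """Keep time order: the earliest samples train, the latest ones evaluate."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Placeholder series with the same length as one sensor's record (1,057 values).
series = [float(i) for i in range(1057)]
samples = make_windows(series, window=24)
train, test = chronological_split(samples)
print(len(samples), len(train), len(test))
```

Splitting chronologically rather than at random keeps future readings out of the training set, which matters for a forecasting task.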
Figure 5

LS city WTQ district water supply network.


Data preprocessing

The nodal pressures of the pipe network exhibit cyclical patterns, characterized by periodic variations likely related to water consumption, resulting in pressure fluctuations on different scales (daily, weekly, and seasonal). However, the magnitude of these changes varies at specific moments. Short-term pressure prediction in a water supply network entails forecasting the pressure at a future moment. To facilitate the network's training process, data preprocessing that includes outlier removal, missing value imputation, and normalization is essential. First, the box plot method was employed to identify outliers, and missing values were imputed using the median value. Finally, the data were normalized by rescaling each variable to the range 0 to 1, which is imperative for training the pressure prediction model. This normalization method, a linear scaling technique, is particularly effective when the input data span various ranges, as is often the case in deep learning model training. In the case of multi-feature inputs, the value ranges of different features can vary greatly. If normalization is not performed, features with larger value ranges may have a greater impact on the loss function, causing the model to be biased toward these features. By normalizing, all features can be brought to similar scales, ensuring that the model treats each feature more fairly. Min-max scaling, as depicted in Equation (8), was applied in this study:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{8}$$
where the normalization value, X′, is obtained by dividing the difference between X and Xmin by the difference between Xmax and Xmin.
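The preprocessing chain can be sketched with the Python standard library. Several details are assumptions rather than settings reported in the study: outliers are flagged with the usual 1.5 × IQR box-plot fences, flagged readings are treated as missing, and all gaps are filled with the median of the remaining values. The readings are made up, with `None` marking a missing sample.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Box-plot fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean(series):
    """Treat box-plot outliers as missing, then fill gaps with the median."""
    observed = [v for v in series if v is not None]
    low, high = iqr_bounds(observed)
    kept = [v if v is not None and low <= v <= high else None for v in series]
    median = statistics.median(v for v in kept if v is not None)
    return [median if v is None else v for v in kept]


def min_max(series):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]


raw = [0.30, 0.31, None, 0.29, 0.32, 0.95, 0.30, 0.31]  # made-up readings
cleaned = clean(raw)
scaled = min_max(cleaned)
print(cleaned)
print(scaled)
```

In a forecasting setup the minimum and maximum are typically taken from the training portion only and reused for the test portion, so that no information from the evaluation period leaks into the scaling.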

Experimental environment and determination of model hyperparameters

The experimental environment consisted of an Intel(R) Xeon(R) Platinum 8222CL CPU @ 3.00 GHz and an NVIDIA GeForce RTX 4070 Ti GPU. We used Python 3.11 and the PyTorch framework to train and run the 1DCNN-GRU-attention deep prediction model.

The hyperparameters of the model include the learning rate, batch size, the number of GRU layers, and the number of attention heads. The learning rate dictates the step size for adjusting model weights. An excessively high learning rate may prevent the loss function from converging during training, whereas an excessively low learning rate can slow the training process and leave the model stuck in local minima. Batch size denotes the number of data samples processed simultaneously during training. The number of GRU layers directly influences the model's complexity and learning capacity. Additional layers can enable the model to learn more complex features but may also lead to overfitting and increased training costs. In the multi-head attention mechanism, the number of heads dictates how many different information streams the model can process in parallel. An increased number of heads enables the model to consider more perspectives during sequence processing but also increases the computational burden of the model.
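The effect of the learning rate can be illustrated on a toy problem rather than on the prediction model itself. The sketch below runs plain gradient descent on f(w) = w², a hypothetical stand-in for a loss function, with three made-up step sizes.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2 * w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w


too_high = gradient_descent(lr=1.1)    # each step overshoots; |w| grows
moderate = gradient_descent(lr=0.3)    # contracts quickly toward the minimum
too_low = gradient_descent(lr=0.001)   # barely moves within 20 steps
print(too_high, moderate, too_low)
```

Each update multiplies w by (1 − 2·lr), so the iterates diverge when that factor exceeds 1 in magnitude and crawl when it is only slightly below 1, mirroring the two failure modes described above.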

Selecting hyperparameters often relies on empirical methods, which may be inefficient and suboptimal for identifying the best parameter combinations. To overcome this challenge, Bayesian optimization is employed for hyperparameter selection in our model. Bayesian optimization begins by constructing a surrogate model, typically a Gaussian process, to approximate the objective function. The surrogate model provides predictions of the objective function and quantifies the uncertainty of these predictions (i.e., confidence intervals). Based on the surrogate model, an acquisition function, which balances exploration in areas of high uncertainty against exploitation in regions with known high performance, guides the search process. At each iteration, the point selected by the acquisition function is evaluated and the surrogate model is updated with the result (Chung et al. 2022). Each iteration improves the surrogate's approximation of the objective function, thereby narrowing the search for the optimal solution. The hyperparameters selected from the optimization results and based on experience are listed in Table 3.

Table 3

1DCNN-GRU-attention model hyperparameters

No.  Hyperparameter               Best value
1    Batch size                   64
2    Number of output channels
3    Kernel size                  24
4    Learning rate                0.0322
5    Number of hidden units       160
6    Number of GRU layers
7    Number of heads              10
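
The Bayesian optimization loop described above (fit a Gaussian process surrogate, score candidates with an acquisition function, evaluate the chosen point, update) can be sketched in pure Python for a single hyperparameter. Everything specific here is an assumption for illustration rather than the configuration used in this study: the toy objective standing in for a validation loss, the search range for the base-10 logarithm of the learning rate, the squared-exponential kernel, the lower-confidence-bound acquisition function, and all function names.

```python
import math

def rbf(a, b, length):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_gp(xs, ys, length, noise=1e-4):
    """Fit a zero-mean GP surrogate; returns a function x -> (mean, std)."""
    n = len(xs)
    k_mat = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    weights = solve(k_mat, ys)
    # Rows of the inverse kernel matrix (symmetric, so rows equal columns).
    k_inv = [solve(k_mat, [float(i == j) for i in range(n)]) for j in range(n)]

    def predict(x_new):
        k_vec = [rbf(x, x_new, length) for x in xs]
        mean = sum(k * w for k, w in zip(k_vec, weights))
        quad = sum(k_vec[i] * k_inv[i][j] * k_vec[j]
                   for i in range(n) for j in range(n))
        return mean, math.sqrt(max(1.0 - quad, 0.0))

    return predict

def bayes_opt(objective, low, high, n_init=3, n_iter=10, kappa=2.0, grid=101):
    """Minimize objective on [low, high] with a lower-confidence-bound acquisition."""
    length = 0.2 * (high - low)
    xs = [low + (high - low) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    candidates = [low + (high - low) * i / (grid - 1) for i in range(grid)]
    for _ in range(n_iter):
        offset = sum(ys) / len(ys)  # center observations for the zero-mean GP
        predict = fit_gp(xs, [y - offset for y in ys], length)

        def lcb(x):
            mean, std = predict(x)
            return mean - kappa * std  # low mean = exploit, high std = explore

        x_next = min((c for c in candidates if c not in xs), key=lcb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = min(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best]

def toy_validation_loss(log10_lr):
    """Hypothetical stand-in for training the model and returning validation RMSE."""
    return ((log10_lr + 2.8) / 4.0) ** 2

best_log_lr, best_loss = bayes_opt(toy_validation_loss, -4.0, 0.0)
```

In a real search, the objective would train the network with the proposed hyperparameters and return the validation error, and several hyperparameters would be optimized jointly, typically with an established library rather than a hand-written surrogate.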

Evaluation indicators of the model

The mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE) were utilized as metrics to assess the model's accuracy; MAE and RMSE are defined in Equations (9) and (10) (Plutowski et al. 1996), and MSE is the square of RMSE. Considering RMSE's more frequent application in preceding studies, it was adopted as the principal evaluation metric.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{9}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{10}$$
where $n$ is the number of data points, $y_i$ is the measured node pressure, and $\hat{y}_i$ is the predicted node pressure.
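
As a minimal sketch, the three metrics can be computed as follows. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE."""
    return sqrt(mse(y_true, y_pred))

# Hypothetical normalized pressures, for illustration only.
y_true = [0.50, 0.52, 0.48, 0.51]
y_pred = [0.49, 0.53, 0.48, 0.50]
scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
}
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and penalizes large individual errors more heavily.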

Because the CGMA model processes large-scale data faster and more stably than traditional machine learning models, this section first tests its accuracy by comparing its results with those of other machine learning models. We then decompose the complete CGMA model into its components and compare each reduced model with the full CGMA model, analyzing the experimental results from multiple viewpoints.

Comparison with benchmark models

This study compares the SVM, ARIMA, and VAR models with the CGMA model, as shown in Figure 6. Traditional models suffer from low training efficiency on large-scale data, limited robustness, and inadequate generalization. Compared with traditional models, the CGMA model addresses these weaknesses, resulting in a significant improvement in prediction accuracy. The predictive performance of the ARIMA and VAR models is superior to that of the SVM model, particularly in real-time sequence data modeling. The CGMA model surpasses the ARIMA and VAR models in predictive capability due to the inherent advantages of neural networks in handling nonlinear time series data. Neural networks can learn complex nonlinear mappings through multilayer neuron activation functions, effectively capturing relationships within the data. Additionally, compared with traditional machine learning models, deep learning models can enhance generalization capability through large-scale data training. The CGMA model proposed in this study demonstrates a marked improvement over other machine learning methods.
Figure 6

Comparison of the prediction methods.

The variation of the RMSE of the CGMA model on the validation set is depicted in Figure 7. An epoch is one complete pass of the model through the entire training dataset. The RMSE of the CGMA model on the validation set decreases as the number of epochs increases and eventually stabilizes after a period of oscillation, which implies that the pressure within the water network can be precisely predicted. Figure 8 illustrates the CGMA model's predictive performance at five monitoring points across a span of 4 days.
Figure 7

The RMSE of the CGMA model.

Figure 8

Pressure prediction effect of five monitoring points: (a) Monitoring point 1; (b) Monitoring point 2; (c) Monitoring point 3; (d) Monitoring point 4; and (e) Monitoring point 5.

Effect of 1DCNN and multi-head attention mechanism

GRU, GRU-attention, and 1DCNN-GRU are utilized as baseline models to analyze and compare the predictive performance of the proposed model. The results of the four prediction models are shown in Figure 9. The CGMA model exhibits the most significant performance advantage among the four methods. The performance comparison of the other three models highlights the advantages of the different modules in the 24-h pressure prediction composite model. The RMSE of the GRU model is greater than that of the 1DCNN-GRU model, indicating that the extraction of local features and key information plays a critical role in optimizing pressure prediction performance. The addition of the 1DCNN module enhances the model's ability to recognize and interpret local and temporal aspects. The RMSE of the CGMA model is 53.4%, 50.8%, and 36.3% lower than that of the GRU, GRU-attention, and 1DCNN-GRU models, respectively, demonstrating that combining the local feature extraction ability of 1DCNN with the attention mechanism's focus on key information forms a robust framework for predicting water pipeline pressure data. The 1DCNN first processes the input data, capturing important local patterns and features. Then, the GRU identifies long-term temporal patterns. Finally, the attention mechanism weights the key information in these features for prediction.
Figure 9

Comparison of the reference models.
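
The percentage reductions quoted for the ablation comparison follow the usual relative-difference formula, sketched below. The baseline RMSE values are not listed in the text, so the sketch only back-calculates what they would be under the assumption that the percentages are measured against the reported CGMA RMSE of 0.00262; the function names are hypothetical and the implied values are arithmetic consequences of that assumption, not reported results.

```python
def relative_reduction(baseline_rmse, proposed_rmse):
    """Percentage by which proposed_rmse is lower than baseline_rmse."""
    return (baseline_rmse - proposed_rmse) / baseline_rmse * 100.0

def implied_baseline(proposed_rmse, reduction_pct):
    """Baseline RMSE implied by a stated percentage reduction."""
    return proposed_rmse / (1.0 - reduction_pct / 100.0)

cgma_rmse = 0.00262  # RMSE reported for the CGMA model
stated = {"GRU": 53.4, "GRU-attention": 50.8, "1DCNN-GRU": 36.3}

# Assumption: each percentage is computed against the same CGMA RMSE.
implied = {name: implied_baseline(cgma_rmse, pct) for name, pct in stated.items()}
```

Under this assumption the ordering of the implied errors is GRU, then GRU-attention, then 1DCNN-GRU, then CGMA, which matches the qualitative ranking described in the text.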

In this study, we employed the CGMA model to predict pressure within LS City's water supply network. This method combines 1DCNN and GRU structures for training time series data and outperforms competing models in extracting temporal features from the input data. Additionally, an attention mechanism was introduced to enhance learning by focusing on key features in the temporal data. The proposed CGMA model demonstrated superior predictive accuracy, with an MAE of 0.00197, an MSE of 0.00001, and an RMSE of 0.00262. Throughout the entire testing period, the CGMA had the lowest prediction error, indicating that CGMA is a viable option for pressure prediction within water supply networks. This study utilizes a deep learning method to enhance the accuracy of pressure prediction, thus optimizing pressure management and ensuring a more efficient and reliable water supply. By reducing waste and ensuring the optimal use of resources, enhanced pressure prediction supports sustainable water management.

Given that the CGMA model primarily focuses on feature extraction and learning from historical data, it can be adapted to other water distribution networks with similar or different characteristics. The model can also be used to predict other hydraulic parameters in the water distribution network, including water demand and flow rate. To predict pressure or flow in another water distribution network, the model only needs to be trained on the corresponding historical data; that is, as long as sufficient historical data are available, the model can be applied to any water distribution network. However, when the data are scarce or of poor quality, its effectiveness decreases significantly. This study has some limitations. First, the relationships between nodes were not thoroughly investigated. Second, external variables, including weather and holidays, substantially influence daily water pressure patterns but were not considered. Furthermore, the data collection interval for pressure in this study was 1 h. Reducing the time interval and increasing the data volume can mitigate the uncertainties associated with long-term periods. To enhance accuracy, the pressure data collection interval could be reduced to 1, 5, or 15 min, enabling the model to extract more detailed information. Incorporating these factors into future research will be crucial.

This work was supported by the National Key R & D Program of China (Grant number [2022YFC3801000]), the Program for Innovative Research Team (in Science and Technology) in the University of Henan Province (Grant number [23IRTSTHN004]), and Scientific Research Projects of Power China Railway Construction Investment Group Co. Ltd (DJ-ZSLJ-2023-01).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Bello O., Abu-Mahfouz A. M., Hamam Y., Page P. R., Adedeji K. B. & Piller O. 2019 Solving management problems in water distribution networks: A survey of approaches and mathematical models. Water 11 (3), 562.
Chen T. Y.-J. & Guikema S. D. 2020 Prediction of water main failures with the spatial clustering of breaks. Reliability Engineering & System Safety 203, 107108.
Cho K., Merrienboer B. v., Gülçehre Ç., Bahdanau D., Bougares F., Schwenk H. & Bengio Y. 2014 Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Creaco E., Farmani R., Kapelan Z., Vamvakeridou-Lyroudia L. & Savic D. 2015 Considering the mutual dependence of pulse duration and intensity in models for generating residential water demand. Journal of Water Resources Planning and Management 141 (11), 04015031.
Francis R. A., Guikema S. D. & Henneman L. 2014 Bayesian belief networks for predicting drinking water distribution system pipe breaks. Reliability Engineering & System Safety 130, 1–11.
Günther M., Camhy D., Steffelbauer D., Neumayer M. & Fuchs-Hanusch D. 2015 Showcasing a smart water network based on an experimental water distribution system. Procedia Engineering 119, 450–457.
Herrera M., Torgo L., Izquierdo J. & Pérez-García R. 2010 Predictive models for forecasting hourly urban water demand. Journal of Hydrology 387 (1–2), 141–150.
Kavya M., Mathew A., Shekar P. R. & Sarwesh P. 2023 Short term water demand forecast modelling using artificial intelligence for smart water management. Sustainable Cities and Society 95, 104610.
Liao Z., Liu M., Du B., Zhou H. & Li L. 2022 A temporal and spatial prediction method for urban pipeline network based on deep learning. Physica A: Statistical Mechanics and its Applications 608, 128299.
Marsili V., Mazzoni F., Alvisi S., Maietta F., Capponi C., Meniconi S., Brunone B. & Franchini M. 2023 Investigation of pressure transients induced on a real water service line by user's activity. AQUA – Water Infrastructure, Ecosystems and Society 72 (12), 2331–2346.
Mazzoni F., Blokker M., Alvisi S. & Franchini M. 2024 An enhanced method for automated end-use classification of household water data. Journal of Hydroinformatics 26 (2), 408–423.
Młyński D., Bergel T., Młyńska A. & Kudlik K. 2021 A study of the water supply system failure in terms of the seasonality: Analysis by statistical approaches. AQUA – Water Infrastructure, Ecosystems and Society 70 (3), 289–302.
Mouatadid S. & Adamowski J. 2017 Using extreme learning machines for short-term urban water demand forecasting. Urban Water Journal 14 (6), 630–638.
Niu Z., Zhong G. & Yu H. 2021 A review on the attention mechanism of deep learning. Neurocomputing 452, 48–62.
Niu Z., Zhong G., Yue G., Wang L.-N., Yu H., Ling X. & Dong J. 2023 Recurrent attention unit: A new gated recurrent unit for long-term memory of important parts in sequential data. Neurocomputing 517, 1–9.
Ostfeld A. & Salomons E. 2004 Optimal operation of multiquality water distribution systems: Unsteady conditions. Engineering Optimization 36 (3), 337–359.
Perea R. G., Poyato E. C., Montesinos P. & Díaz J. A. R. 2019 Optimisation of water demand forecasting by artificial intelligence with short data sets. Biosystems Engineering 177, 59–66.
Ping J., Wang R., Sun J. & Xiao C. 2014 Pressure prediction of a water distribution network based on SVM. In: ICPTT 2014: Creating Infrastructure for a Sustainable World (Ma B., Najafi M. & Tang H., eds.). ASCE, Reston, VA, USA, pp. 155–168.
Plutowski M., Cottrell G. & White H. 1996 Experience with selecting exemplars from clean data. Neural Networks 9 (2), 273–294.
Sinha S. K. & McKim R. A. 2007 Probabilistic based integrated pipeline management system. Tunnelling and Underground Space Technology 22 (5–6), 543–552.
Stańczyk J., Kajewska-Szkudlarek J., Lipiński P. & Rychlikowski P. 2022 Improving short-term water demand forecasting using evolutionary algorithms. Scientific Reports 12 (1), 13522.
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser Ł. & Polosukhin I. 2017 Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). Curran Associates Inc., Long Beach, CA, USA, pp. 6000–6010.
Wang K., Ma C., Qiao Y., Lu X., Hao W. & Dong S. 2021 A hybrid deep learning model with 1DCNN-LSTM-Attention networks for short-term traffic flow prediction. Physica A: Statistical Mechanics and its Applications 583, 126293.
Xia W., Wang Y., Liu R. & Wang S. 2021 Research on flow and pressure prediction of urban water supply pipeline network based on GA-BP algorithm. Journal of Physics: Conference Series 1792 (1), 012045.
Xue X. H., Xue X. F. & Xu L. 2012 Study on improved PCA-SVM model for water demand prediction. Advanced Materials Research 591, 1320–1324.
Zanfei A., Menapace A., Brentan B. M., Righetti M. & Herrera M. 2022 Novel approach for burst detection in water distribution systems based on graph neural networks. Sustainable Cities and Society 86, 104090.
Zhe X., Jie Y., Huaqiang C., Yaguang K. & Bishi H. 2015 Water distribution network modeling based on NARX. IFAC-PapersOnLine 48 (11), 72–77.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).