One goal of efficient water supply management is to maintain stable water pressure that meets consumers' needs. Pressure control is closely tied to consumer demand, so accurate prediction of water consumption is critical for effective water supply management. However, the flow in water supply networks is uncertain and unstable, which makes it difficult to predict. To obtain more accurate flow predictions, this work proposes a new adaptive, robust time series prediction method called Complete Ensemble Empirical Mode Decomposition with Adaptive Noise–Kernel Principal Component Analysis–Long Short-Term Memory (CEEMDAN–KPCA–LSTM), in which CEEMDAN and KPCA are combined as preprocessing steps for flow prediction. First, the flow data are decomposed using the CEEMDAN algorithm to reduce non-stationarity. Then, KPCA is used to extract the key influencing factors from the feature series. Finally, an LSTM network is constructed to predict the water supply network flow from the outputs of the CEEMDAN and KPCA steps. The proposed scheme has significant application prospects for water supply systems.

  • The water flow data are decomposed by Complete Ensemble Empirical Mode Decomposition with Adaptive Noise to extract more informative features.

  • Kernel Principal Component Analysis is used to select the important decomposed sequences, which reduces the computational load of the model.

  • The developed method could effectively improve the prediction accuracy.

  • The proposed model has a shorter runtime than the CEEMDAN–LSTM model without dimensionality reduction.

  • The scheme has significant potential for use in water supply systems.

Abbreviation        Expansion
ANN                 artificial neural network
ARIMA               AutoRegressive Integrated Moving Average
CEEMDAN             Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN                 convolutional neural network
EMD                 empirical mode decomposition
EEMD                ensemble empirical mode decomposition
GRU                 gated recurrent unit
IMF                 intrinsic mode function
KPCA                Kernel Principal Component Analysis
LSTM                Long Short-Term Memory
MAE                 mean absolute error
MAPE                mean absolute percentage error
RNN                 recurrent neural network
RMSE                root mean square error
SWAT                Soil and Water Assessment Tool
VMD                 variational mode decomposition
CEEMDAN–AM–LSTM     Complete Ensemble Empirical Mode Decomposition with Adaptive Noise–Amplitude Modulation–Long Short-Term Memory

The water supply system is an important part of urban infrastructure: it directly affects the lives of city residents and has a great impact on the city's economic development (Saleem et al. 2021). Flow prediction for a water supply network is the process of using historical data and mathematical models to estimate the trend and pattern of water flows in the network over a future period. For the government, flow prediction is important for optimizing the design, operation, control, and management of the water supply network, improving the efficiency of water resource utilization, and reducing fluctuations in water pressure and water quantity (Xia et al. 2022). Residents benefit from a more stable supply, because the water supply department can plan the pressure of each area according to the predicted flow. Therefore, studying water flow prediction is of great significance both for the regional planning of water supply networks and for improving residents' water consumption experience.

Over the past few decades, scholars have conducted a great deal of research on urban water demand prediction. As the flow data from a water supply network form a typical time series, traditional statistical and machine learning methods are the most common solutions. Brentan et al. used an adaptive Fourier series to improve support vector regression, eliminating the errors and most of the biases inherent in the regression structure, so that water demand could be predicted in near real time (Brentan et al. 2017). Zubaidi et al. aimed to provide a suitable and reliable technique for predicting municipal water demand by combining the gravitational search algorithm and the backtracking search algorithm with an artificial neural network (ANN) (Zubaidi et al. 2018). Wu et al. proposed a back-propagation neural network based on principal component analysis to predict the water demand in Taiyuan, Shanxi Province, China, and the model outperformed other models on a variety of evaluation metrics (Wu et al. 2021). Saranya and Vinish evaluated, for the first time, a neural network autoregression model as a replacement for the physically based Soil and Water Assessment Tool hydrologic model for predicting streamflow under data-poor conditions and for obtaining immediate, high-quality modeling results (Saranya & Vinish 2023). Direct prediction of water flow often performs poorly because the series is nonlinear and non-stationary and contains noise and outliers. Therefore, to improve the prediction accuracy, the data are usually decomposed and reduced in dimensionality. Empirical mode decomposition (EMD) is an adaptive and efficient method for decomposing nonlinear and non-stationary signals (Battista et al. 2009). It extracts a set of intrinsic mode functions (IMFs) from the analyzed signal by stepwise sifting. The ensemble EMD (EEMD) algorithm of Wu & Huang (2004), which adds Gaussian white noise to the signal to be decomposed, successfully solved the mode mixing problem of EMD. Rezaiy and Shabri coupled EEMD with the AutoRegressive Integrated Moving Average (ARIMA) model for drought prediction (Rezaiy & Shabri 2024). However, some residual white noise always remains in the IMFs generated by the EEMD algorithm, which makes subsequent processing and analysis difficult.

To address these issues, Cao et al. adopted the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) (Cao et al. 2019). At each decomposition stage, CEEMDAN adds Gaussian white noise and immediately computes the ensemble mean. Wang et al. proposed a hybrid CEEMDAN–AM–LSTM model for NOx prediction in thermal power plants, which predicted the NOx concentration more accurately than the methods it was compared with (Wang et al. 2022). CEEMDAN has also been combined with Long Short-Term Memory (LSTM) and bi-directional LSTM to predict precipitation and lower Yellow River discharge, respectively, with high accuracy (Guo et al. 2022; Zhang et al. 2023). Jiao et al. proposed a hybrid water quality prediction model based on variational mode decomposition (VMD) optimized by a sparrow search algorithm and a bidirectional gated recurrent unit (GRU), providing technical support for river water quality protection and pollution prevention (Dong et al. 2023; Jiao et al. 2024). Gou and Ning established a deep convolutional neural network (CNN) based on Kernel Principal Component Analysis (KPCA) and a coding scheme for power prediction, with high prediction accuracy and good robustness (Gou & Ning 2021). However, these methods do not fundamentally reduce the dimensionality of the data, so the prediction models take longer to run and do not generalize well.

The LSTM model has a long-term memory function, which effectively mitigates the exploding and vanishing gradient problems of the recurrent neural network (RNN). Guo et al. established a threshold GRU network model for water demand forecasting, which outperformed the ANN and ARIMA models (Guo et al. 2018). Mu et al. used an LSTM model to predict short-term urban water demand in Hefei, China, and demonstrated that external parameters did not improve the performance of the LSTM model (Mu et al. 2020). Wang et al. proposed a short-term water quality prediction model in which VMD is used to optimize LSTM, and the model performed well in short-term water quality prediction (Wang et al. 2023). Yao et al. proposed a hybrid model based on CNN and LSTM (CNN–LSTM) for runoff prediction, whose wide applicability was verified on several datasets (Yao et al. 2023). These studies show that LSTM neural networks are widely used in water flow prediction, with high accuracy and low computational complexity. In this work, an LSTM is used to study water flow prediction in a water supply network. In addition, the structure of the LSTM is modified to suit the characteristics of a water supply network. The specific contributions are as follows:

  • The CEEMDAN–KPCA–LSTM urban water supply network flow prediction model is established in this article. Compared with the preprocessing and prediction techniques in previous studies, CEEMDAN–KPCA can effectively deal with the nonlinear and non-smooth problems of the data, reduce the noise and dimensionality of the flow data, and extract the intrinsic features of the flow signal. It can improve the performance of flow prediction models by improving the input data quality and reducing the computational complexity.

  • In the flow prediction studies of water supply networks, there is a lack of a unified model evaluation standard or framework that integrates the predictive performance, computational complexity, interpretability and scalability of models. In this article, starting from the characteristics of water supply network flow data with time series data, combined with modal decomposition and feature extraction, the flow of the water supply network is predicted by the prediction model.

The rest of this article is organized as follows. Section 2 briefly introduces the basic principles of CEEMDAN, KPCA, and LSTM. Our model, which blends CEEMDAN and KPCA algorithms to achieve higher accuracy and lower time complexity, is presented in Section 3. In Section 4, the model is verified in an urban water supply network system.

In this section, we will briefly introduce the basic elements of the model, including CEEMDAN, KPCA, and LSTM.

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

CEEMDAN is an extension of EEMD. It is more efficient than EMD and EEMD at eliminating mode mixing and reducing computational cost, and it requires fewer sifting iterations than EEMD (Poongadan & Lineesh 2024). CEEMDAN adds adaptive Gaussian white noise a finite number of times during the decomposition, which reduces the residual noise in the final result and solves the mode mixing problem that occurs in EMD. The specific steps of the CEEMDAN decomposition are as follows.

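  • (1) $x(t)$ is the original input water flow sequence, $w^{i}(t)$ is the added Gaussian white noise, and $\varepsilon_0$ is the signal-to-noise ratio. $x^{i}(t)$ represents the sequence to be decomposed, which is constructed over $T$ trials:
    $x^{i}(t) = x(t) + \varepsilon_0 w^{i}(t), \quad i = 1, 2, \dots, T$    (1)
  • (2) The first EMD decomposition produces the first mode component $\mathrm{IMF}_1(t)$ and the first-stage residual $r_1(t)$, as in Equations (2)–(3). $\mathrm{IMF}_{k+1}(t)$ is generated by performing EMD on the noise-added residual $r_k(t) + \varepsilon_k E_k(w^{i}(t))$ and averaging, as in Equations (4)–(5). $E_k(\cdot)$ denotes the $k$th IMF component obtained by EMD decomposition. After $T$ cycles of sequence decomposition, each IMF is obtained as the ensemble mean.
    $\mathrm{IMF}_1(t) = \frac{1}{T} \sum_{i=1}^{T} E_1\left(x^{i}(t)\right)$    (2)
    $r_1(t) = x(t) - \mathrm{IMF}_1(t)$    (3)
    $\mathrm{IMF}_{k+1}(t) = \frac{1}{T} \sum_{i=1}^{T} E_1\left(r_k(t) + \varepsilon_k E_k\left(w^{i}(t)\right)\right)$    (4)
    $r_{k+1}(t) = r_k(t) - \mathrm{IMF}_{k+1}(t)$    (5)
  • (3) The final CEEMDAN decomposition of the original signal into $K$ IMFs and a residual $R(t)$ is given by the following equation:
    $x(t) = \sum_{k=1}^{K} \mathrm{IMF}_k(t) + R(t)$    (6)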
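
As a rough illustration of step (1) only, the sketch below builds a set of noise-added copies of a short series and checks that their point-by-point mean stays close to the original, which is the property the ensemble averaging relies on. It does not implement the EMD sifting itself. The function names `noisy_ensemble` and `ensemble_mean`, the toy series, and the choice to scale the noise by the standard deviation of the series are assumptions made for illustration; the amplitude 0.1 and the 100 trials mirror the settings reported in the validation section.

```python
import math
import random
import statistics


def noisy_ensemble(x, eps0=0.1, trials=100, seed=0):
    """Return `trials` copies of x, each with independent Gaussian white noise.

    The noise amplitude is eps0 times the standard deviation of x
    (an assumption for this sketch).
    """
    rng = random.Random(seed)
    scale = eps0 * statistics.pstdev(x)
    return [[v + scale * rng.gauss(0.0, 1.0) for v in x] for _ in range(trials)]


def ensemble_mean(copies):
    """Average the copies point by point."""
    n = len(copies)
    return [sum(col) / n for col in zip(*copies)]


# Toy hourly series with a 24 h cycle (illustrative data, not from the study).
x = [200.0 + 50.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(48)]
copies = noisy_ensemble(x)
mean = ensemble_mean(copies)

noise_std = 0.1 * statistics.pstdev(x)
max_dev = max(abs(m - v) for m, v in zip(mean, x))
print(f"noise std per copy: {noise_std:.3f}")
print(f"max deviation of ensemble mean from x: {max_dev:.3f}")
```

The deviation of the ensemble mean shrinks roughly with the square root of the number of trials, which is why a larger ensemble leaves less residual noise at the price of more decompositions.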

Kernel Principal Component Analysis

KPCA uses a kernel function to map the data into a high-dimensional feature space, where the data become more linearly separable, and then applies the traditional PCA algorithm to reduce the dimension of the mapped data. In this way, data that are not linearly separable in the original space can be handled.

The kernel function used for dimensionality reduction is the Gaussian kernel, also called the radial basis function kernel, which is robust to noise in the data. The kernel matrix of this function can be expressed as
$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$    (7)
where $x_i$ and $x_j$ represent the data points, and $\sigma$ represents the smoothing factor, which controls the width of the kernel and thereby the effect of the dimensionality reduction. The eigenvalues $\lambda_i$ and eigenvectors of the kernel matrix are then calculated.
The cumulative contribution h of the eigenvalues can be calculated as
$h = \frac{\sum_{i=1}^{P} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$    (8)
where P is the number of principal components extracted and n is the total number of eigenvalues.
To preserve as much detail as possible from the original data, the cumulative contribution threshold is set to q% (98%), and the first P principal components whose cumulative contribution reaches this threshold are selected as the data after dimensionality reduction (Chen et al. 2024). The obtained principal components are the input variables to the LSTM model. As different principal components often have different orders of magnitude, which would affect the analysis results, the data are normalized in MATLAB so that each principal component has the same order of magnitude. The normalized form is shown in the following equation:
$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$    (9)
where x is the input flow data, $x_{\min}$ and $x_{\max}$ are its minimum and maximum values, and y is the flow data after normalization.
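
A minimal Python sketch of two pieces of this subsection, the Gaussian kernel matrix and the max–min normalization, may help make them concrete. It assumes the common kernel form k(a, b) = exp(−‖a − b‖² / (2σ²)) and the normalization y = (x − min) / (max − min). The function names and toy inputs are illustrative and are not taken from the original study, which used MATLAB. Centering of the kernel matrix and the eigendecomposition are omitted.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two points given as equal-length sequences."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(points, sigma):
    """Symmetric kernel matrix with K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(a, b, sigma) for b in points] for a in points]


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy inputs (illustrative only).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(points, sigma=1.0)
scaled = min_max_normalize([120.0, 180.0, 150.0])

for row in K:
    print([round(v, 4) for v in row])
print(scaled)
```

A smaller `sigma` makes the kernel more local, so distant points contribute almost nothing to each other's entries in the matrix.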

Long Short-Term Memory

The LSTM is a type of RNN that is well suited to time series forecasting because of its ability to capture long-term dependencies in sequential data. Unlike traditional RNNs, LSTMs are designed to avoid the vanishing gradient problem that arises with long-term dependencies (Pires & Martins 2024). LSTM introduces three gating units on top of the RNN: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$, which allow the LSTM to selectively remember valid information and forget invalid information in the cell state $c_t$. $x_t$ and $h_{t-1}$ are the inputs, and $h_t$ is the output of this cell. The cell structure of the LSTM is shown in Figure 1.
Figure 1

Structure of the LSTM model.

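The three LSTM gates are computed as follows:
$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$    (10)
$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$    (11)
$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$    (12)
where $\sigma$ is the sigmoid activation function; $W_f$, $W_i$, and $W_o$ are the weight matrices; and $b_f$, $b_i$, and $b_o$ are the bias vectors.
The candidate state $\tilde{c}_t$ can be calculated with the external state of the previous moment $h_{t-1}$ and the input of the current moment $x_t$.
$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$    (13)
where tanh, $W_c$, and $b_c$ are the activation function, the weight matrix, and the bias vector, respectively.
The current cell state $c_t$ is computed by combining the forget gate, the input gate, the cell state at the previous moment $c_{t-1}$, and the candidate state.
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$    (14)
Finally, $c_t$ is activated with the tanh function to obtain the output of this unit.
$h_t = o_t \odot \tanh(c_t)$    (15)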
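
The gate computations can be sketched in plain Python as a single forward step of a standard LSTM cell. This is an illustrative sketch under the usual textbook formulation (sigmoid gates, a tanh candidate state, and weights acting on the concatenation of the previous hidden state and the current input), not the authors' MATLAB implementation. The names `lstm_step` and `affine` and the layout of `params` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W and list vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps each of "f", "i", "o", "c" to a (W, b) pair, where W acts on
    the concatenation [h_prev, x_t].
    """
    v = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*params["f"], v)]      # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], v)]      # input gate
    o_t = [sigmoid(z) for z in affine(*params["o"], v)]      # output gate
    c_hat = [math.tanh(z) for z in affine(*params["c"], v)]  # candidate state
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Hidden state: output gate applied to the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy cell with one hidden unit and one input feature, all weights zero, so
# every gate evaluates to sigmoid(0) = 0.5 and the candidate to tanh(0) = 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step(x_t=[0.3], h_prev=[0.0], c_prev=[1.0], params=params)
print(h_t, c_t)
```

With zero weights the step simply halves the previous cell state, which makes the roles of the forget and output gates easy to check by hand.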

The flow sequence in a large-scale water supply network is often linked to users' water habits and shows a cyclical trend. In reality, events such as pipeline maintenance occur frequently and can disrupt the periodicity of the flow sequence. As a result, the sequence is subject to uncertainty and random disturbances over time and exhibits chaotic characteristics.

CEEMDAN needs to introduce noise into each decomposition stage and perform multiple decompositions, which significantly increases the computation. Compared to traditional EMD or EEMD, CEEMDAN is more expensive to compute, especially for high-dimensional data or long time series, where the computation time can be significantly extended. However, flow prediction models in a water supply network often need to respond to real-time flow to adjust future predictions. Therefore, an algorithm to reduce the dimensionality of the data is needed. We chose the KPCA algorithm because it uses a kernel function to map data into a high-dimensional feature space, which can effectively process data with a complex nonlinear structure and extract more information and patterns, whereas traditional PCA can only capture linear features.

In this context, we propose a CEEMDAN–KPCA–LSTM model, which adopts the method of dimensionality reduction and noise reduction of data. It could reduce the random disturbance of water flow series and retain the periodic information of the water flow series to the greatest extent. Finally, this method could filter out the series data with strong periodicity and strong trends. The LSTM model has good prediction performance for this kind of data. The framework diagram is shown in Figure 2. The effect of CEEMDAN is to decompose the flow data into different components of IMF and residual. As we can see in Figure 2(e), the water flow series contains less and less valid information as the decomposition progresses. At the end of IMF10, the series is almost a straight line, while the series at IMF1 still retains a large amount of sequence information. The purpose of this method is to decompose the anomalies and residuals in the sequence to provide a basis for subsequent KPCA dimensionality reduction. The KPCA is used as dimensionality reduction for all IMF data. The principal components are sorted according to their contribution rates. In Figure 2(f), we decompose the entire IMF by using the KPCA method. The dimensionality reduction of each data point is represented as a small blue dot in space, and the red sphere represents the contribution boundary of 98% of the effective contribution of the data. It is obvious that some outliers are not within the boundary, so we decide to exclude them and further smooth the water flow sequence. The purpose of this method is to smooth out unnecessary residual terms and abnormal sequences, which could improve the degree of fit of the LSTM model to the curve and reduce the computation time.
Figure 2

Flowchart of the CEEMDAN–KPCA–LSTM algorithm.


Before the LSTM is trained, both the dimension-reduced data and the original data must be normalized to ensure accurate predictions. Here, max–min normalization is used. The principal components are used as input variables to build the LSTM model, and the model outputs the prediction results, from which the evaluation indices are computed. Importantly, the rectified linear unit (ReLU) is used as the activation function in the LSTM. Traditional saturating activation functions, such as sigmoid and tanh, can lead to vanishing gradients, whereas non-saturating activation functions such as ReLU are much less prone to this (Tollner et al. 2024). Compared to saturating activation functions, ReLU can also accelerate the convergence of the model. Deep learning models using ReLU can achieve similar or better results without pre-training prior to supervised training.
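
The difference between saturating and non-saturating activations can be seen directly from their derivatives. The short sketch below, with function names chosen here for illustration, evaluates the derivative of sigmoid, tanh, and ReLU at a few positive inputs; it is a numerical illustration of the argument, not part of the authors' model.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid function."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def tanh_grad(z):
    """Derivative of tanh."""
    return 1.0 - math.tanh(z) ** 2


def relu_grad(z):
    """Derivative of ReLU (taken as 0 at z = 0)."""
    return 1.0 if z > 0 else 0.0


for z in (0.5, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.2e}  "
          f"tanh'={tanh_grad(z):.2e}  relu'={relu_grad(z):.1f}")
```

For large inputs the sigmoid and tanh derivatives collapse toward zero, so repeated multiplication during backpropagation shrinks the gradient, while the ReLU derivative stays at 1 for any positive input.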

In this section, we describe how the CEEMDAN–KPCA–LSTM model is applied to an urban water supply network and compare it with other mainstream prediction models in terms of accuracy and running time.

Data source

This study uses real flow data from a water supply network of a city in Anhui, China. These flow data were obtained from some sensors and electromagnetic flowmeters deployed in the water supply network. The models of the sensors and flow meters are TDS-100W and KEFN-XXX-103-G3, respectively. In addition, the whole collection period is from March 1 to November 26, 2019. The sampling interval of the flow data was 1 h, and the study period was from 00:00 to 23:00.

In this water supply network, a total of 14 monitoring nodes were used to monitor water pressure and flow. The network topology diagram is shown in Figure 3. To simplify the water supply network model, only the monitoring nodes are shown and connected; other nodes are not marked. Figure 4 shows the March flow sequence at one of the network monitoring nodes. The flow sequence shows a vague periodicity and trend. Our model is intended to predict this kind of flow data.
Figure 3

Topography of the study area.

Figure 4

Chart of hourly flow changes during March.


Validation of CEEMDAN decomposition

The flow data of the water supply network were decomposed by the EMD and CEEMDAN algorithms using MATLAB. The Gaussian white noise standard deviation is set to 0.1 and the number of noise additions is set to 100. The flow data were decomposed into 12 IMF components and a residual. The results of the CEEMDAN decomposition are shown in Figure 5. The CEEMDAN algorithm ensures the completeness of the decomposition. It reduces the problems of modal aliasing and endpoint effects and achieves better decomposition results.
Figure 5

CEEMDAN-based decomposition results.


From Figure 5 it can be seen that the shortest wavelength and the highest frequency are generated from IMF1. Starting from IMF6, the frequency gradually decreases, the wavelength becomes longer, and the amplitude decreases. From March 1 to November 26, 2019, the residual term represents the overall trend of the change in the flow of the water supply network in the city. It can be seen that the flow gradually increases in March, reaches its maximum in July and August, and then starts to decrease. The overall trend is consistent with the seasonal changes.

Implementation of dimensionality reduction based on the KPCA algorithm

KPCA can use kernel functions to map the data into a high-dimensional space where nonlinear features can be separated. In addition, this method is particularly suitable for dealing with complex nonlinear signals, whereas PCA may miss important nonlinear features.

The results of the CEEMDAN decomposition were analyzed using the KPCA algorithm, and the contributions of the KPCA principal components are shown in Figure 6.
Figure 6

KPCA-based principal component contribution rates.


Based on the previously mentioned contribution requirement of q% (98%), six principal components are retained. The contributions of the first six principal components are 47.75, 31.74, 8.44, 5.09, 2.85, and 2.27%, respectively. Their cumulative contribution rate is 98.14%, as shown in the cumulative contribution rate line graph, which meets the contribution rate requirement. The screened principal components are highly representative and contain most of the information characteristics. Reducing the 13-dimensional data to six dimensions reduces the complexity of the data and the computation time.
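
The selection rule can be checked with a few lines of Python using the six contribution rates quoted above. The helper name `components_needed` is hypothetical, and only the six reported rates are used, since the rates of the remaining components are not given.

```python
def components_needed(rates_percent, threshold_percent):
    """Return (count, cumulative) for the smallest number of leading components
    whose cumulative contribution reaches the threshold, or None if the
    threshold is never reached with the rates supplied."""
    total = 0.0
    for count, rate in enumerate(rates_percent, start=1):
        total += rate
        if total >= threshold_percent:
            return count, total
    return None


# Contribution rates (%) of the first six principal components, as reported.
rates = [47.75, 31.74, 8.44, 5.09, 2.85, 2.27]
result = components_needed(rates, 98.0)
print(result)

# With only the first five components the 98% requirement is not met.
print(components_needed(rates[:5], 98.0))
```

The first five components sum to 95.87%, so the sixth is the one that carries the cumulative rate past the 98% requirement.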

Evaluation metrics

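To measure the performance of the proposed methodology for predicting flows in urban water supply networks, the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the coefficient of determination (R2) are used as the evaluation metrics.
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}$    (16)
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$    (17)
$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$    (18)
$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}}$    (19)
where $y_i$ is the actual flow value at time i; $\hat{y}_i$ is the predicted flow value at time i; $\bar{y}$ is the average value of the actual flow; and n is the length of the time series.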
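
The four metrics can be computed in a few lines of Python. The sketch below uses their standard definitions (MAPE expressed in percent, R2 as one minus the ratio of residual to total sum of squares); the function names and the three-point toy series are illustrative and unrelated to the study's data.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values (illustrative only).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(rmse(actual, predicted), mae(actual, predicted),
      mape(actual, predicted), r2(actual, predicted))
```

RMSE and MAE are in the units of the flow data, while MAPE and R2 are scale-free, which is why the latter two are easier to compare across months with different flow levels.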

Flow prediction implementation

The LSTM model is used to predict the flow data of the urban water supply network. The screened principal components are the input variables to the LSTM model. The dataset is normalized and divided into a training set and a test set in a ratio of 7:3. After repeated tuning, the parameters used to build the LSTM model for urban water supply network flow prediction are as follows: the number of input layer time steps is 24, the number of input layer dimensions is 6, the number of output layer dimensions is 1, the number of hidden layers is 1, the number of hidden layer nodes is 100, and the gradient threshold is set to 1 to prevent exploding gradients.
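
One way to arrange the data for such a model is sketched below: each sample holds 24 consecutive time steps of six features, and the data are split 7:3. The article does not state the forecast horizon or how the split was drawn, so the one-step-ahead target and the chronological split used here are assumptions, as are the function names and the toy data.

```python
def make_windows(features, target, steps=24):
    """Pair each block of `steps` consecutive feature rows with the next target
    value (one-step-ahead forecasting, an assumption of this sketch)."""
    X, y = [], []
    for start in range(len(features) - steps):
        X.append(features[start:start + steps])
        y.append(target[start + steps])
    return X, y


def chronological_split(X, y, train_fraction=0.7):
    """Split samples in time order so that the test set follows the training set."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Toy data: 100 hourly rows with 6 features each (illustrative only).
features = [[float(t + k) for k in range(6)] for t in range(100)]
target = [float(t) for t in range(100)]

X, y = make_windows(features, target)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(len(X), len(X[0]), len(X[0][0]))
print(len(train_X), len(test_X))
```

Splitting in time order keeps future observations out of the training set, which avoids leaking information from the test period into the model.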

Based on the above parameters, the monthly flow data of the urban water supply network were predicted by the LSTM model to verify the predictive performance of the model in this article. Due to the small amount of data in April and June, the data from March, May, July, and August were selected for the monthly prediction. The flow data for March, May, July, and August were decomposed by the CEEMDAN algorithm into 10, 10, 10, and 9 decomposition components, respectively. KPCA dimensionality reduction was then applied to these components, retaining 7, 4, 5, and 4 principal components, respectively. The monthly flow prediction results are shown in Figure 7. The predicted data and the actual data are largely in agreement. The predictions for March and July were slightly less accurate at some of the extreme values, and the prediction for May was slightly less accurate after 70 h.
Figure 7

Monthly flow prediction results for the following months: (a) March; (b) May; (c) July; and (d) August.


To assess the prediction performance of the CEEMDAN–KPCA–LSTM model quantitatively, the RMSE, MAE, MAPE, and R2 are computed for each month. As Table 1 shows, the MAPE values for March, May, July, and August are all around 5% and every R2 exceeds 0.97. The MAPE for May is 5.173%, the highest of the four months. The results show that the developed model achieves high accuracy in monthly flow prediction.

Table 1

Evaluation of monthly performances

Model               Month   RMSE     MAE      MAPE (%)   R2
CEEMDAN–KPCA–LSTM   Mar     13.102   9.485    4.857      0.972
                    May     14.086   11.294   5.173      0.983
                    Jul     13.129   10.055   4.338      0.973
                    Aug     12.003   10.488   4.669      0.979

To further verify the superiority of the CEEMDAN–KPCA–LSTM model, the flow data of the urban water supply network are also predicted by other models. The prediction results of the CEEMDAN–KPCA–LSTM model are compared with those of existing algorithms, including GRU (Gao et al. 2020), LSTM (Yu et al. 2019), EMD–LSTM (Hao et al. 2022), CEEMDAN–GRU (Zhang & Yang 2020), CEEMDAN–LSTM (Cao et al. 2019), EMD–KPCA–LSTM (Jin & Ran 2023), and CEEMDAN–KPCA–GRU. All 4,586 flow data points were used in this experiment. The CEEMDAN decomposition is shown in Figure 5, where the decomposition yields 12 IMFs and one residual. The KPCA dimensionality reduction is shown in Figure 6, where six principal components are retained. As the dataset is too large to plot in full, 100 data points are selected for plotting, and the prediction results of the compared models are shown in Figure 8. The comparison of the performance metrics RMSE, MAE, MAPE, and R2 for each model is shown in Table 2.
Table 2

Results of performance evaluation

Model               RMSE     MAE      MAPE (%)   R2
GRU                 33.675   24.178   11.341     0.843
LSTM                32.726   23.23    11.312     0.852
EMD–LSTM            14.928   13.172   7.240      0.969
CEEMDAN–GRU         16.132   11.96    5.784      0.964
CEEMDAN–LSTM        13.278   11.09    5.947      0.976
EMD–KPCA–LSTM       7.696    6.026    3.399      0.991
CEEMDAN–KPCA–GRU    8.646    7.023    3.805      0.989
CEEMDAN–KPCA–LSTM   7.113    5.481    2.756      0.993
Figure 8

Prediction results of different models.


Compared with the GRU, LSTM, EMD–LSTM, CEEMDAN–GRU, CEEMDAN–LSTM, EMD–KPCA–LSTM, and CEEMDAN–KPCA–GRU models, the developed model has the best prediction performance. Unlike the single LSTM model, the CEEMDAN–KPCA–LSTM model performs signal decomposition and dimensionality reduction before prediction, which yields higher accuracy on every metric. Relative to the single LSTM, the RMSE, MAE, and MAPE of the CEEMDAN–KPCA–LSTM model are reduced by 78.3, 76.4, and 75.6%, respectively. Relative to the CEEMDAN–LSTM model without dimensionality reduction, the reductions are 46.4, 50.6, and 53.7%. The combined models significantly improve the prediction accuracy compared to the single models, and these results indicate that the KPCA dimensionality reduction is necessary.
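
The quoted percentages follow from the Table 2 values as relative error reductions, (baseline − proposed) / baseline. The short check below reproduces them; the helper name `improvement` and the dictionary layout are illustrative.

```python
def improvement(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (RMSE, MAE, MAPE) taken from Table 2.
table2 = {
    "LSTM": (32.726, 23.23, 11.312),
    "CEEMDAN-LSTM": (13.278, 11.09, 5.947),
    "CEEMDAN-KPCA-LSTM": (7.113, 5.481, 2.756),
}

proposed = table2["CEEMDAN-KPCA-LSTM"]
gains = {
    name: tuple(round(improvement(b, p), 1) for b, p in zip(values, proposed))
    for name, values in table2.items()
    if name != "CEEMDAN-KPCA-LSTM"
}

for name, (d_rmse, d_mae, d_mape) in gains.items():
    print(f"vs {name}: RMSE -{d_rmse}%, MAE -{d_mae}%, MAPE -{d_mape}%")
```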

To verify the effect of the KPCA dimensionality reduction on the running time of the algorithm, the running time of the three prediction models was calculated in five rounds, and the results are shown in Table 3. It shows that the LSTM model with direct input data has the shortest running time. The CEEMDAN–LSTM model, which performs the CEEMDAN decomposition, has the longest running time because it uses the 13-dimensional data obtained from the decomposition as input variables. Compared with the CEEMDAN–LSTM model, the running time of our model with the KPCA dimensionality reduction is significantly reduced. From the above, it can be seen that performing the KPCA dimensionality reduction can reduce the running time of the algorithm in this study.

Table 3

Running time of the algorithm

Model               Round 1 (s)   Round 2 (s)   Round 3 (s)   Round 4 (s)   Round 5 (s)
LSTM                52.54         56.77         59.99         56.03         54.80
CEEMDAN–LSTM        87.51         93.75         93.08         92.04         91.47
CEEMDAN–KPCA–LSTM   65.38         70.16         69.56         70.52         68.81
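
To summarize the five rounds, the mean running time per model and the relative saving from the KPCA step can be computed directly from the Table 3 values. The variable names below are illustrative.

```python
from statistics import mean

# Running times in seconds for the five rounds, taken from Table 3.
runtimes_s = {
    "LSTM": [52.54, 56.77, 59.99, 56.03, 54.80],
    "CEEMDAN-LSTM": [87.51, 93.75, 93.08, 92.04, 91.47],
    "CEEMDAN-KPCA-LSTM": [65.38, 70.16, 69.56, 70.52, 68.81],
}

avg = {name: mean(values) for name, values in runtimes_s.items()}

# Relative saving of the KPCA step compared with CEEMDAN-LSTM, in percent.
reduction = 100.0 * (avg["CEEMDAN-LSTM"] - avg["CEEMDAN-KPCA-LSTM"]) / avg["CEEMDAN-LSTM"]

for name, value in avg.items():
    print(f"{name}: {value:.2f} s")
print(f"reduction vs CEEMDAN-LSTM: {reduction:.1f}%")
```

On these figures the KPCA step cuts the mean running time of the decomposition-based model by roughly a quarter, although the single LSTM remains the fastest of the three.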

In short, the advantage of the CEEMDAN–KPCA–LSTM model is that it can effectively extract the multi-scale features of the time series, reduce the dimensionality and complexity of the data, and improve the prediction performance of the LSTM network. However, the model has some shortcomings. In dimensionality reduction, the choice of different kernel functions will lead to different reduction effects, and the kernel matrix will also take up a lot of memory and computational resources when the data volume is large.

In this article, the CEEMDAN decomposition, the KPCA dimensionality reduction, and the LSTM model were combined to build the CEEMDAN–KPCA–LSTM model, which was applied to the flow prediction of an urban water supply network. The results were compared with related baseline models, and the main findings are as follows:

  • The original flow data were decomposed by the CEEMDAN algorithm to obtain different IMFs and residual terms. The various scale fluctuations and trends present in the flow data were decomposed. The KPCA algorithm was used to extract the six principal components that reflect the original information. It can remove redundant features, reduce the dimensionality of model input parameters, and improve model efficiency and performance. Reducing high-dimensional data to low-dimensional data speeds up the operation of the algorithm and improves the accuracy of the model.

  • The prediction results of the water supply network flow data from March 1 to November 26, 2019 show that the CEEMDAN–KPCA–LSTM model has a high prediction performance. The model prediction result of the RMSE is 7.113, the MAE is 5.481, the MAPE is 2.756%, and the R2 is 0.993. These indicators show that the model in this article outperforms other models when it comes to flow prediction.

Our model focuses primarily on data optimization and learning from historical data, so it can be adapted to other distribution networks with either similar or different characteristics. The developed model can also be used to predict other hydraulic parameters in the distribution network that often exhibit periodic patterns, such as water demand and flow rate. To predict flow in another distribution network, the model only needs to be trained on the corresponding historical data. In other words, as long as there is sufficient historical data, this model can be applied to any distribution network, and it shows significant advantages when data are scarce or of poor quality. Most importantly, the KPCA dimensionality reduction shortens the runtime relative to the CEEMDAN–LSTM model without it, which has positive implications for real-time scheduling of water supply networks. In addition, the data collection interval for pressure in this study was set to 1 h. By shortening the time interval and increasing the amount of data, uncertainties caused by long-term correlations can be effectively reduced. To improve the accuracy, the pressure data collection interval could be reduced to 1, 5, or 15 min, allowing the model to extract more detailed information. Incorporating these factors into future research will be crucial.

This work was supported in part by the Demonstration Project of Water Supply Safety Guarantee and Optimized Operation of Chaohu Pipeline Network and Major Scientific Research Project of Universities in Anhui Province (2024AH040039).

W.X. conceptualized, visualized, and investigated the study. Y.C. conceptualized the study, developed the methodology, wrote the original draft, and reviewed and edited the manuscript. K.N. curated the data and reviewed the manuscript. Z.M. reviewed and edited the manuscript and supervised the study.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).