This study proposes a novel approach for predicting variations in water quality at wastewater treatment plants (WWTPs), which is crucial for optimizing process management and pollution control. The model combines convolutional bi-directional gated recurrent units (CBGRUs) with adaptive bandwidth kernel function density estimation (ABKDE) to address multivariate time series interval prediction of WWTP water quality. First, the wavelet transform (WT) was used to smooth the water quality data, reducing noise and fluctuations. The linear correlation coefficient (CC) and non-linear mutual information (MI) were then used to select input variables. The CBGRU model was applied to capture temporal correlations in the time series, and a multi-head attention (MHA) mechanism was integrated to strengthen the model's ability to learn complex relationships within the data. ABKDE, supplemented by the bootstrap, was used to establish the upper and lower bounds of the prediction intervals. Ablation experiments and comparisons with benchmark models confirmed the superior performance of the model in point prediction, interval prediction, forecast-horizon analysis, and fluctuation detection for water quality data. The study also verifies the model's broad applicability and its robustness to anomalous data. These results contribute to improved effluent treatment efficiency and water quality control in WWTPs.

  • Proposing a new hybrid interval prediction model for water quality in WWTPs.

  • Identifying multivariate input variables using linear CC and non-linear MI techniques.

  • Reducing water quality noise and revealing data trends using wavelet transforms.

  • Using GSS and bootstrap methods to determine the optimal bandwidth and the upper and lower interval limits.

  • Predictions of peaks, abrupt changes, and different time periods validate the model's applicability.

ABKDE

adaptive bandwidth kernel function density estimation

AI

artificial intelligence

ANN

artificial neural network

AM

attention mechanism

BPNN

back propagation neural network

BiGRU

Bi-directional Gated Recurrent Unit

R²

coefficient of determination

CRPS

continuous ranked probability score

CBGRU

Convolutional Bi-directional Gated Recurrent Unit

CNN

convolutional neural network

CC

correlation coefficient

CWC

coverage width-based criterion

DL

deep learning

GRU

gated recurrent unit

GSS

Golden Section Search

KNN

k-nearest neighbor

LSTM

long short-term memory

ML

machine learning

MAE

mean absolute error

MAPE

mean absolute percentage error

MHA

Multi-Head Attention

MI

mutual information

PICP

PI Coverage Probability

PINAW

PI Normalized Averaged Width

PP

point prediction

PI

Prediction Interval

RF

random forest

RMSE

root mean square error

RNN

recurrent neural network

SVM

support vector machine

TCN

temporal convolutional network

WWTP

wastewater treatment plant

WT

wavelet transform

As urbanization accelerates and populations grow alongside increasing industrial and agricultural activity, wastewater treatment has become a central environmental concern. Effective wastewater treatment is crucial not only for human health and quality of life but also for maintaining ecosystems (Onu et al. 2023). The wastewater treated in wastewater treatment plants (WWTPs) contains significant amounts of organic substances, heavy metals, micro-organisms, and other harmful components (Saravanan et al. 2021). If these pollutants are discharged directly into water bodies without scientific and effective management, they can severely degrade water quality and ecological balance and endanger human health (Rathi et al. 2021). Therefore, to optimize and manage wastewater treatment, it is essential to study and forecast water quality indicators in WWTPs (Aghdam et al. 2023).

WWTPs have long been built worldwide, yet accurate and scientific forecasting of wastewater quality still faces several challenges (Roohi et al. 2024). First, WWTPs involve intricate biological and chemical processes, so their discharges are highly variable and of many different types. This complexity stems from the diverse sources of wastewater, including industrial effluents, domestic discharges, and commercial and public waste streams. As a result, WWTP operation is inherently complex owing to the varied compositions, pH levels, and flow rates encountered (Yang et al. 2024). Second, water quality characteristics and treatment needs vary significantly between regions. Industrial cities typically receive a higher proportion of industrial wastewater than other cities, whereas predominantly residential cities mainly treat household wastewater (Macedo et al. 2022). Third, climate change and natural disasters introduce uncertainty into wastewater systems. Elevated temperatures can reduce the efficiency of microbial degradation of organic matter and thereby impair treatment performance. In addition, water pollution events near WWTPs, such as eutrophication and hydrophobization, can further affect treatment processes and the ability to predict water quality outcomes (Malairajan & Namasivayam 2021).

Historically, mathematical models and linear regression techniques have been the predominant tools for forecasting water quality in WWTPs. These models, such as statistical and fuzzy models, are used to simulate and understand the parametric relationships among variables during the operational stages of wastewater treatment (Tan et al. 2023; Wang et al. 2023; Bankole et al. 2024).

Computer modeling enables real-time monitoring and control of key treatment parameters. By adjusting treatment processes, controls, operations, and chemical dosing at each stage, treatment constraints can be met with minimal resources and cost. For instance, Wang et al. (2022) developed a prediction model for chemical oxygen demand (COD) in WWTP effluent using statistical correlation methods, while Abouzari et al. (2021) used linear and non-linear statistical models to estimate COD in WWTPs. However, relying solely on these traditional methods can be time-consuming and fails to capture the non-linear relationships and complex dynamics of water treatment processes. These methods may also struggle to adapt to changing treatment requirements (Safeer et al. 2022). Moreover, traditional methodologies often simplify processes through unrealistic assumptions and idealized conditions (Sharafi et al. 2024), highlighting the need for more flexible and adaptable approaches (Wu et al. 2023b).

Artificial intelligence (AI) is a prominent and rapidly advancing technology in many fields (Ye et al. 2021). Recent studies have used AI techniques to examine how WWTPs meet effluent quality standards (Rajaei & Nazif 2022). AI methods include machine learning (ML), of which deep learning (DL) is a subfield well suited to modeling highly complex real-time problems (Xu et al. 2023). DL techniques include artificial neural networks (ANNs) (Zhang et al. 2021) and long short-term memory (LSTM) networks (Joseph et al. 2024), among others. ML approaches include support vector machine (SVM), k-nearest neighbor (KNN) (Xu et al. 2022), and random forest (RF) (Kok et al. 2021). Furthermore, combining multiple AI methods with optimization techniques such as genetic algorithms and gradient descent (Jana et al. 2022; Kovacs et al. 2022) can improve results. The modeling capabilities of AI offer significant benefits for predicting water quality and optimizing wastewater treatment processes: AI models can autonomously analyze, evaluate, and predict from input water quality data, optimize system variables, and issue alerts so that outputs can be adjusted. This reduces human error and improves productivity (Safeer et al. 2022; Gao et al. 2023).

In AI-based modeling of WWTPs, the focus is usually on point prediction (PP) (Farhi et al. 2021). A PP model maps one input or a set of inputs to a single output value. However, PP methods frequently overlook the inherent uncertainty of the modeled outputs. Among the many techniques for quantifying uncertainty, the prediction interval (PI) is one of the most effective (Nourani et al. 2023). A PI consists of upper and lower bounds that define the range within which a future observation is expected to fall at a specified confidence level (Chen et al. 2024a). Existing interval prediction methods for WWTPs rely predominantly on ANNs (Bagherzadeh et al. 2021; Mehrani et al. 2022; Nourani et al. 2023; Sun et al. 2023). However, these networks, typically multilayer perceptrons (MLPs), have certain limitations (Pang et al. 2023). For instance, the performance of an MLP depends heavily on data representation and feature engineering (Gao et al. 2024). MLPs are also prone to overfitting, which degrades their generalization (Xu et al. 2024). Furthermore, MLPs are not designed for sequential data and may perform worse than models such as the gated recurrent unit (GRU) on time series (Yao et al. 2023; Chen et al. 2024b; Cui et al. 2024; Dong et al. 2024; Wang et al. 2024a, b).

In wastewater treatment, effluent quality is one of the key indicators of treatment effectiveness and environmental compliance. In particular, pH is a critical chemical parameter of effluent that directly affects ecosystem health and compliance with legal discharge standards. Predicting effluent pH is important for the following reasons:

  • (1) Process control and optimization. pH is an important indicator of water stability and can be used to monitor it. In wastewater treatment, keeping pH within a desirable range helps ensure the effectiveness of biochemical processes such as activated sludge, which are highly pH-sensitive. By predicting pH trends, chemical dosing, aeration, and other operating parameters can be adjusted in advance, optimizing the overall treatment process, reducing operating costs, and improving treatment efficiency.

  • (2) Environmental protection. The acceptable discharge range is pH 6–9, and large pH fluctuations, especially excursions outside this range, can affect downstream water bodies and disrupt the ecological balance of the receiving water. Accurate pH prediction can therefore serve as an early warning of potential environmental risks.

  • (3) Data-driven decision support. By continuously monitoring and predicting pH, valuable data can be collected that can be used to analyze long-term trends in the treatment process and guide future plant upgrades and technology choices.

Therefore, this study aims to enhance the management level and effluent safety of WWTPs by accurately predicting the pH value of effluent water. This prediction is of significant practical importance for achieving process control and optimization, environmental protection, and data-driven decision support.

To address the inadequate prediction of multivariate water quality data in WWTPs, and the correlations among multiple wastewater indicators that traditional prediction models struggle to capture (Ni et al. 2023), this paper introduces a novel approach: a multivariate adaptive bandwidth kernel density interval prediction model based on CBGRU for predicting water quality in WWTPs. The model is evaluated on real water quality data and compared with other multivariate time series forecasting models. The results indicate that the model performs very well in both point and interval prediction of WWTP water quality. The key contributions of this study are as follows:

  • (1) A feature attention mechanism (AM) based on MHA is developed to better identify potential correlations among WWTP indicators, improving the robustness of multivariate time series prediction.

  • (2) The model incorporates convolutional neural network (CNN) and Bi-directional Gated Recurrent Unit (BiGRU) layers to effectively capture local and global features.

  • (3) Interval prediction is achieved through ABKDE, with regression models used to establish the upper and lower bounds and the model parameters updated by stochastic gradient descent.

  • (4) Experimental results show that the model can predict water quality data with peaks and large fluctuations and performs well in both point and interval prediction of WWTP water quality. The model also shows broad applicability and robustness, supporting its reliability in real-world applications.

Although the effluent quality of WWTPs is primarily determined by influent quality and process parameters, historical effluent quality data are also valuable in practice. Using real-time monitoring and historical effluent data, changes in effluent pH can be forecast in advance, which is crucial for short-term control at WWTPs. Compared with complex multivariate process modeling, prediction based on effluent data reduces modeling complexity and can partly capture the lagged response of the treatment system. The proposed model based on effluent pH data therefore aims to provide a simplified yet effective early warning mechanism for WWTPs and greater flexibility in effluent quality management.

The remainder of this paper is organized as follows. Section 2 describes the main models used in the study and the evaluation metrics. Section 3 analyzes the data and models and presents and interprets the results. Section 4 presents ablation experiments and comparisons with point and interval prediction models, and analyzes the forecast horizon, special data handling, applicability, and robustness. Finally, Section 5 summarizes the key findings and the issues discussed in the study.

Feature selection

Modeling WWTP water quality parameters becomes inefficient when there are many candidate inputs and outputs. Using every variable other than the output as an input can increase noise (Alvi et al. 2023). Filtering and optimizing these candidates is therefore essential to identify the most effective input sets and combinations, reducing irrelevant inputs and computational cost.

Feature selection is the process of identifying the most valuable subset of features from the original feature set (Dhal & Azad 2022). In this study, feature selection examines the correlations among variables in the WWTP water quality data, and the variables with higher correlations are selected as predictors and target variables. The method combines the Pearson linear correlation coefficient (CC) with non-linear mutual information (MI). The Pearson CC measures the degree of linear correlation between two continuous variables; computing it between the input and output variables shows whether a linear relationship exists between them:
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \tag{1} $$

where n is the number of data points, and $\bar{x}$ and $\bar{y}$ are the means of the x and y samples, respectively. The closer r is to 1 or −1, the stronger the linear relationship between the two variables; the closer it is to 0, the weaker the relationship. A positive r indicates a positive correlation and a negative r a negative correlation.
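As a minimal sketch of Eq. (1), the function below computes the Pearson CC of two equal-length series in plain Python. The name `pearson_cc` and the sample series are hypothetical illustrations, not the authors' code.

```python
import math

def pearson_cc(x, y):
    """Pearson linear correlation coefficient r of Eq. (1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (sx * sy)

# Hypothetical example: a candidate predictor that rises with the target.
target = [7.1, 7.3, 7.2, 7.6, 7.8]
candidate = [2.0, 2.4, 2.3, 3.0, 3.3]
print(round(pearson_cc(target, candidate), 3))
```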

MI, by contrast, measures non-linear dependence between two variables. Computing the MI between input and output variables captures more complex relationships than linear ones alone, and thus gives a more complete picture of the potential associations between variables. The entropy of a discrete variable X taking N possible values is defined as follows, where $p(x_i)$ denotes the probability of the ith value of X:
$$ H(X) = -\sum_{i=1}^{N} p(x_i)\log p(x_i) \tag{2} $$
Similarly, the joint entropy of two random variables X and Y is defined using their joint probability $p(x_i, y_j)$:
$$ H(X,Y) = -\sum_{i}\sum_{j} p(x_i, y_j)\log p(x_i, y_j) \tag{3} $$
The MI between a candidate predictor and the target is then computed as follows; a higher MI indicates a larger reduction in uncertainty and a stronger dependence between the two variables:
$$ I(X;Y) = H(X) + H(Y) - H(X,Y) \tag{4} $$
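The paper does not state how the probabilities in Eqs. (2)–(4) are estimated from continuous measurements. The sketch below assumes a simple equal-width histogram estimator (the `bins` value is an arbitrary choice) and natural logarithms, and evaluates MI through the entropy identity of Eq. (4). The example shows a purely non-linear dependence that MI detects even though the Pearson r of the same data is zero.

```python
import math
from collections import Counter

def _bin(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def _entropy(labels):
    """Shannon entropy (nats) of discrete labels, as in Eqs. (2) and (3)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(x, y, bins=10):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), Eq. (4), from histogram probabilities."""
    bx, by = _bin(x, bins), _bin(y, bins)
    return _entropy(bx) + _entropy(by) - _entropy(list(zip(bx, by)))

# Hypothetical example: y depends on x only non-linearly (Pearson r = 0).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
mi = mutual_information(x, y, bins=5)
```

With few samples, histogram MI is biased upward, so in practice the bin count (or a k-nearest-neighbor estimator) would need to be chosen with the data size in mind.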

Predictors were selected by evaluating the Pearson CC and MI between the candidate predictors and the target variable. The predictor with the highest MI and CC values was chosen as the primary predictor for modeling. While CC is commonly used to select inputs for linear regression algorithms, MI quantifies how much information one random variable conveys about another (Gong et al. 2024). Therefore, for the WWTP effluent parameters, both CC and MI were used to establish the relationships between variables (Nourani et al. 2023).

Wavelet transform

Water quality data collected from WWTPs inherently contain noise. To improve reliability, the data were smoothed to remove noise and abrupt fluctuations, yielding a more consistent dataset. Data smoothing reveals underlying trends by suppressing noise and irregularities. This study uses the wavelet transform (WT) for this purpose. WT refines the localization idea of the short-time Fourier transform and removes its limitation of a fixed window size across frequencies: it provides a time–frequency window that adapts to different frequencies, which makes it a valuable tool for time–frequency analysis and signal processing. As a multiscale technique, WT decomposes a signal into components at different scales and frequencies; its core consists of decomposition and reconstruction. Wavelet packet decomposition yields coefficients for each scale and frequency band. All coefficients except those of the low-frequency band are set to zero, preserving the main trend of the signal, and wavelet packet reconstruction from the processed coefficients then yields the smoothed signal:
$$ W_f(a,b) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty} f(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \tag{5} $$

where $W_f(a,b)$ denotes the wavelet coefficients, $f(t)$ the original signal, $\psi(t)$ the wavelet function, $\psi^{*}$ its complex conjugate, a the scale parameter, and b the translation parameter. The optimal combination of wavelet function and parameters was determined by experimenting with different options.
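To illustrate the "zero the high-frequency coefficients, then reconstruct" idea, the sketch below uses a plain multi-level Haar discrete wavelet transform rather than the wavelet packet decomposition used in the study, and the wavelet, level, and synthetic series are assumptions. For the Haar wavelet, keeping only the level-L approximation is equivalent to replacing each block of $2^L$ samples by its mean. A library such as PyWavelets would normally be used for other wavelet families.

```python
import math
import random

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    approx = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out

def haar_smooth(x, level=2):
    """Decompose `level` times, zero every detail band, and reconstruct."""
    if len(x) % (2 ** level):
        raise ValueError("len(x) must be divisible by 2**level")
    approx, detail_sizes = list(x), []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        detail_sizes.append(len(detail))
    for size in reversed(detail_sizes):
        approx = haar_idwt(approx, [0.0] * size)  # high-frequency part set to zero
    return approx

# Hypothetical noisy pH-like series of 64 samples.
rng = random.Random(3)
noisy = [7.0 + 0.1 * math.sin(t / 4) + rng.gauss(0.0, 0.05) for t in range(64)]
smooth = haar_smooth(noisy, level=3)
```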

Because the variables span different ranges, feeding raw data directly into the model may slow training and reduce prediction accuracy, so normalization is essential. The model uses min–max normalization to scale the data to the range 0–1:
$$ Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{6} $$

where Y is the normalized data, X is the original data, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of X, respectively.
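A small sketch of Eq. (6) and its inverse follows. Fitting $X_{\min}$ and $X_{\max}$ on the training portion only, and inverting the scaling to report predictions in original units, are common practice but are assumptions here; the paper does not specify either step. The function names and pH values are hypothetical.

```python
def fit_min_max(train):
    """Return (x_min, x_max) estimated on the training portion only."""
    return min(train), max(train)

def min_max_scale(values, x_min, x_max):
    """Eq. (6): map values to [0, 1] using the fitted range."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(v - x_min) / span for v in values]

def min_max_inverse(scaled, x_min, x_max):
    """Undo Eq. (6) so predictions can be reported in original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]

ph_train = [6.8, 7.0, 7.4, 7.2]          # hypothetical effluent pH values
lo, hi = fit_min_max(ph_train)
scaled = min_max_scale(ph_train, lo, hi)
restored = min_max_inverse(scaled, lo, hi)
```

Test-period values outside the training range will map slightly below 0 or above 1, which is usually acceptable and avoids leaking test information into the scaling.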

CBGRU

The CBGRU model combines a CNN with a BiGRU and is used here to predict multivariate time series. The CNN extracts features from the time series of the various variables; these features are then fed into a BiGRU for sequence modeling, and a fully connected layer produces the final prediction.

CNNs are inspired by visual neuroscience and consist mainly of convolutional and pooling layers. Convolutional layers capture local features of the input while preserving spatial relationships. Pooling layers reduce the dimensionality of hidden layers, using operations such as max or average pooling, which lowers computational cost and provides a degree of invariance to small translations. The architecture of a CNN is illustrated in Supplementary material, Figure S1.

A convolutional layer applies a set of convolution kernels to the input data: at each position, a kernel multiplies the overlapping input elements element-wise and sums the results. Small kernels, however, may provide inadequate coverage of the input and limit the model's expressiveness, and zero padding is commonly used to control the output size. During training, a set of K convolution kernels of size $k_h \times k_w \times c$ (where c equals the input depth D) slides over an input of fixed size $H \times W \times D$. The output size of the convolutional layer is:
$$ H_{\text{out}} = \left\lfloor \frac{H - k_h + 2P}{S} \right\rfloor + 1,\quad W_{\text{out}} = \left\lfloor \frac{W - k_w + 2P}{S} \right\rfloor + 1,\quad D_{\text{out}} = K \tag{7} $$
where H, W, and D are the height, width, and depth of the input, respectively, P is the padding, S is the stride, and K is the number of kernels.
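Eq. (7) can be checked numerically with a small helper; floor division handles strides that do not divide the input evenly. The function name and the example shapes are hypothetical. The last call treats a 24-step window of 6 variables as a $24 \times 1 \times 6$ input convolved with 32 kernels of size $3 \times 1$, one plausible way to apply Eq. (7) to time series.

```python
def conv_output_shape(h, w, kh, kw, n_kernels, padding=0, stride=1):
    """Eq. (7): output (height, width, depth) of a convolutional layer."""
    h_out = (h - kh + 2 * padding) // stride + 1
    w_out = (w - kw + 2 * padding) // stride + 1
    return h_out, w_out, n_kernels

same_size = conv_output_shape(28, 28, 3, 3, 16, padding=1)   # "same" padding
strided = conv_output_shape(32, 32, 5, 5, 8, stride=2)
time_window = conv_output_shape(24, 1, 3, 1, 32)
```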
The activation function determines a neuron's output from its inputs: the neuron computes a weighted sum of its inputs, and the activation function applies a non-linear transformation to that sum. This paper uses the ReLU activation function:
$$ f(x) = \max(0, x) \tag{8} $$
Its derivative is:
$$ f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{9} $$
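Eqs. (8) and (9) translate directly into code. Note that the derivative at exactly $x = 0$ is undefined, and assigning it 0 (as in Eq. (9)) is a convention. The function names are illustrative.

```python
def relu(x):
    """Eq. (8)."""
    return max(0.0, x)

def relu_grad(x):
    """Eq. (9); the value at x = 0 is a convention (the derivative is undefined there)."""
    return 1.0 if x > 0 else 0.0

activations = [relu(v) for v in (-2.0, 0.0, 3.5)]
gradients = [relu_grad(v) for v in (-2.0, 0.0, 3.5)]
```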

A pooling layer acts as a downsampling mechanism, combining the outputs of a cluster of neurons in one layer into a single neuron in the next layer. Pooling is applied after the non-linear activation function. Its main purposes are to reduce the number of parameters, which helps avoid overfitting, and to act as a filter that removes unwanted noise (Duan et al. 2022).

For a pooling window of size F applied with stride S, the output dimensions are:
$$ H_{\text{out}} = \left\lfloor \frac{H - F}{S} \right\rfloor + 1,\quad W_{\text{out}} = \left\lfloor \frac{W - F}{S} \right\rfloor + 1,\quad D_{\text{out}} = D \tag{10} $$
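The following sketch computes the pooling output shape of Eq. (10) and applies max pooling to a single one-dimensional feature map, as would happen along the time axis of a sequence. The window size, stride, and values are hypothetical.

```python
def pool_output_shape(h, w, depth, window, stride):
    """Eq. (10): pooling keeps the depth and shrinks height and width."""
    return (h - window) // stride + 1, (w - window) // stride + 1, depth

def max_pool_1d(seq, window, stride):
    """Max pooling over a 1-D sequence, e.g. one feature map along time."""
    return [max(seq[i:i + window]) for i in range(0, len(seq) - window + 1, stride)]

pooled_shape = pool_output_shape(28, 28, 16, window=2, stride=2)
pooled = max_pool_1d([1, 3, 2, 5, 4, 0], window=2, stride=2)
```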

BiGRU is a neural network structure for processing sequential data and a variant of the GRU model. The GRU is a type of recurrent neural network (RNN) designed mainly to mitigate the vanishing gradient problem that traditional RNNs face on long sequences. Whereas a standard GRU processes data in one direction only, a BiGRU processes the sequence in both the forward (past-to-future) and backward (future-to-past) directions. This bidirectional processing makes BiGRU well suited to tasks in which the temporal context of the data is crucial. The architecture of BiGRU is illustrated in Supplementary material, Figure S2.

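The BiGRU model can be described mathematically as follows.

For an input sequence $X = (x_1, x_2, \ldots, x_T)$, the forward and backward hidden states of the BiGRU model are computed as:
$$ \overrightarrow{h}_t = \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right) \tag{11} $$
$$ \overleftarrow{h}_t = \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t+1}\right) \tag{12} $$
where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the left-to-right and right-to-left hidden states, respectively, GRU denotes a GRU unit, and $x_t$ is the tth element of the input sequence.
The hidden states from both directions are then concatenated to obtain the final hidden state $h_t$:
$$ h_t = \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right] \tag{13} $$
Finally, $h_t$ is passed to a fully connected layer to obtain the output $y_t$:
$$ y_t = \mathrm{softmax}\left(W h_t + b\right) \tag{14} $$
where W and b are the weight and bias of the fully connected layer, respectively, and softmax is the softmax activation function.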
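To make Eqs. (11)–(13) concrete, the pure-Python sketch below runs one GRU left to right and a second GRU right to left over a short window and concatenates their hidden states. It is an illustration under stated assumptions, not the study's implementation: the weights are random and untrained, the output layer of Eq. (14) is omitted, and the GRU cell follows the common formulation in which the reset gate scales the previous state before the recurrent matrix. In practice a framework layer such as PyTorch's `torch.nn.GRU` with `bidirectional=True` would be used. All names are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def make_gru(input_dim, hidden_dim, rng):
    """Random (untrained) parameters for one GRU direction."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for gate in ("z", "r", "n"):  # update gate, reset gate, candidate state
        p["W" + gate] = mat(hidden_dim, input_dim)
        p["U" + gate] = mat(hidden_dim, hidden_dim)
        p["b" + gate] = [0.0] * hidden_dim
    return p

def gru_step(p, x, h):
    """One GRU update of the hidden state h given input x."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def bigru(fwd, bwd, xs):
    """Concatenated forward/backward hidden states for each time step."""
    hidden_dim = len(fwd["bz"])
    h, forward = [0.0] * hidden_dim, []
    for x in xs:                              # left to right, Eq. (11)
        h = gru_step(fwd, x, h)
        forward.append(h)
    h, backward = [0.0] * hidden_dim, [None] * len(xs)
    for t in range(len(xs) - 1, -1, -1):      # right to left, Eq. (12)
        h = gru_step(bwd, xs[t], h)
        backward[t] = h
    return [f + b for f, b in zip(forward, backward)]  # Eq. (13)

rng = random.Random(0)
n_features, hidden = 3, 4
fwd = make_gru(n_features, hidden, rng)
bwd = make_gru(n_features, hidden, rng)
window = [[rng.random() for _ in range(n_features)] for _ in range(5)]
states = bigru(fwd, bwd, window)
```

Each element of `states` has length 2 × `hidden`, and the backward half at the last step depends only on the last input, which is exactly why the bidirectional state carries context from both ends of the window.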

CNNs reduce the need for manual feature engineering by learning features directly from the data, which is especially beneficial for tasks with intricate patterns. Their multi-layer architecture learns features progressively from low to high levels, with each convolutional layer extracting a different level of information, so complex patterns can be identified. This makes CNNs efficient for spatially or temporally correlated data such as images and sequences.

Even after feature selection and normalization, intricate patterns and relationships may remain in the data. CNNs refine these features through automated feature learning, which increases their usefulness for prediction. Tuning hyperparameters such as the number of convolutional layers, the kernel size, and the stride can further improve prediction accuracy and generalization.

BiGRU models are particularly valuable for time series data. By combining past and future information through their bidirectional structure, they can capture both long-term and short-term dependencies, improving the representation of the current state and forecasting accuracy. The model is also well suited to forecasting tasks affected by instrument delays and irregular sampling rates, as it can reveal intricate patterns and dependencies that traditional statistical methods might overlook.

In summary, combining a CNN with a BiGRU allows complex relationships within time series data to be captured and integrated, which makes the combination effective for understanding and predicting data with intricate dependencies.

Multi-head AM

To model the interactions among multiple water quality indicators of WWTPs, a feature AM based on the MHA mechanism is used. MHA can attend to multiple positions in a sequence simultaneously, allowing complex relationships among different segments to be examined. This strengthens the model's ability to detect important features, especially in long sequences, and yields a more comprehensive representation of the system dynamics.

As illustrated in Supplementary material, Figure S3, the query, key, and value matrices of the attention mechanism are denoted by Q, K, and V, respectively (Guo et al. 2024). They are obtained by linear transformations of the input matrix X:
$$ Q = XW^{Q} \tag{15} $$
$$ K = XW^{K} \tag{16} $$
$$ V = XW^{V} \tag{17} $$
where $W^{Q}$, $W^{K}$, and $W^{V}$ are learnable weight matrices.
Each head then computes a similarity matrix from the dot product of its query and key matrices, using the sigmoid activation function σ:
$$ A_i = \sigma\!\left(Q_i K_i^{\top}\right) \tag{18} $$
where $Q_i$ and $K_i$ are the query and key matrices of the ith head.
The resulting similarity scores are then applied to the values:
$$ \mathrm{head}_i = A_i V_i \tag{19} $$
where $V_i$ is the value matrix of the ith head.
Finally, the outputs of all heads are concatenated, and a final linear transformation with the matrix $W^{O}$ produces the output Y:
$$ Y = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O} \tag{20} $$
where h is the number of heads and $W^{O}$ is the final linear transformation matrix.
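The sketch below traces Eqs. (15)–(20) with small list-based matrices. Two assumptions should be noted: the rows of X are treated as time steps (the study's feature attention may arrange the input differently), and besides the sigmoid scoring described in Eq. (18), a `scoring="softmax"` option shows the more common scaled dot-product softmax for comparison. Weights are random, and all names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        e = [math.exp(v - top) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out

def sigmoid_rows(m):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads, scoring="sigmoid"):
    if scoring not in ("sigmoid", "softmax"):
        raise ValueError("scoring must be 'sigmoid' or 'softmax'")
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)        # Eqs. (15)-(17)
    d_model = len(Q[0])
    if d_model % n_heads:
        raise ValueError("model width must be divisible by n_heads")
    d_k = d_model // n_heads
    heads = []
    for i in range(n_heads):
        cols = slice(i * d_k, (i + 1) * d_k)                     # columns of head i
        Qi, Ki, Vi = ([row[cols] for row in M] for M in (Q, K, V))
        scores = matmul(Qi, transpose(Ki))                       # Q_i K_i^T
        if scoring == "sigmoid":
            A = sigmoid_rows(scores)                             # Eq. (18)
        else:
            A = softmax_rows([[s / math.sqrt(d_k) for s in r] for r in scores])
        heads.append(matmul(A, Vi))                              # Eq. (19)
    concat = [sum((head[t] for head in heads), []) for t in range(len(X))]
    return matmul(concat, Wo)                                    # Eq. (20)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

n_steps, d_in, d_model, n_heads = 4, 3, 6, 2
X = rand_matrix(n_steps, d_in)
Wq, Wk, Wv = (rand_matrix(d_in, d_model) for _ in range(3))
Wo = rand_matrix(d_model, d_model)
Y = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads)
Y_softmax = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads, scoring="softmax")
```

With softmax each row of attention weights sums to 1, whereas sigmoid scores each pair independently in (0, 1); this is the practical difference between the two scoring options.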

ABKDE

Water quality data from WWTPs typically consist of time series observations of various parameters of the wastewater treatment process. These data often show complex interrelationships between variables, seasonal or cyclical patterns, and potential outliers. ABKDE is used to estimate the probability density function of the data, giving a better understanding of the distribution's characteristics and of potential relationships between variables.

In this study, the Gaussian kernel is selected as the kernel function because it suits most distributions and is mathematically smooth. The kernel density estimate used to calculate the local loss function is:
$$ \hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right), \qquad K(u) = \frac{1}{\sqrt{2\pi}}e^{-u^{2}/2} \tag{21} $$
where $\hat{f}(x)$ is the estimated density function, N is the number of samples, h is the bandwidth parameter that controls the width of the kernel function, $K(\cdot)$ is the kernel function that assigns weights to the sample points, and $x_i$ is the location of the ith sample point.
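A minimal sketch of Eq. (21) with the Gaussian kernel follows. The bandwidth h = 0.05 and the sample values are arbitrary assumptions, and the adaptive bandwidth selection is not shown here. The Riemann sum at the end is a quick sanity check that the estimate integrates to about 1, as any density must.

```python
import math

def gaussian_kde(samples, h):
    """Return the density estimate f_hat(x) of Eq. (21) with a Gaussian kernel."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return f_hat

# Hypothetical sample values.
values = [-0.12, -0.05, 0.0, 0.03, 0.08, 0.15]
f = gaussian_kde(values, h=0.05)

# Riemann sum over [-1, 1]; the kernels are negligible outside this range.
step = 0.001
area = sum(f(-1.0 + k * step) for k in range(2001)) * step
```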
Because the bandwidth h directly affects the smoothness of the estimate and its sensitivity to features of the data, the bandwidth must be optimized; iterative optimization of the bandwidth improves the accuracy of the kernel density estimate. In this paper, the Golden Section Search (GSS) method is used to determine the optimal bandwidth. GSS finds the minimum of a function within a specified interval, using the golden ratio to shrink the search interval efficiently. The method starts by initializing its parameters: a and b are the initial boundaries of the search interval, usually set to a very small and a very large value, respectively, and $\varphi = (\sqrt{5} - 1)/2 \approx 0.618$ is the golden ratio coefficient. In each iteration, two interior points $x_1$ and $x_2$ are calculated:
$$ x_1 = b - \varphi\,(b - a) \tag{22} $$
$$ x_2 = a + \varphi\,(b - a) \tag{23} $$

The cost function is evaluated at both points, giving $f(x_1)$ and $f(x_2)$, and the interval [a, b] is updated by comparing them: if $f(x_1) < f(x_2)$, set $b = x_2$; otherwise, set $a = x_1$. The iteration stops when $|b - a|$ falls below a given tolerance. The optimal bandwidth is taken from the final interval between a and b, which contains the minimum of the objective function.
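The search loop of Eqs. (22)–(23) can be sketched as follows. The paper does not specify the bandwidth cost function, so the leave-one-out negative log-likelihood of a Gaussian KDE is used here as an assumed stand-in, together with an assumed search range [0.001, 5]. GSS only guarantees the minimum for a unimodal cost, which this criterion usually, but not always, satisfies. Each iteration reuses one previous point because $\varphi^2 = 1 - \varphi$, so only one new cost evaluation is needed per step.

```python
import math
import random

INV_PHI = (math.sqrt(5) - 1) / 2  # golden ratio coefficient, about 0.618

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal f on [a, b] using the points of Eqs. (22)-(23)."""
    x1 = b - INV_PHI * (b - a)
    x2 = a + INV_PHI * (b - a)
    f1, f2 = f(x1), f(x2)
    while abs(b - a) > tol:
        if f1 < f2:                   # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - INV_PHI * (b - a)
            f1 = f(x1)
        else:                         # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + INV_PHI * (b - a)
            f2 = f(x2)
    return (a + b) / 2

def loo_neg_log_likelihood(data, h):
    """Assumed cost: leave-one-out negative log-likelihood of a Gaussian KDE."""
    n = len(data)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2 * math.pi))
    cost = 0.0
    for i, xi in enumerate(data):
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for j, xj in enumerate(data) if j != i)
        cost -= math.log(max(dens, 1e-300))  # guard against log(0)
    return cost

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(60)]
h_opt = golden_section_search(lambda h: loo_neg_log_likelihood(data, h), 1e-3, 5.0, tol=1e-4)
```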

The upper and lower bounds of the interval are determined with the bootstrap method. Multiple datasets are generated by random sampling with replacement, with the number of samples determined by a Poisson distribution. For each bootstrap sample, the local bandwidth is optimized and kernel density estimation is performed to obtain a probability density function; repeating this process yields density estimates for many bootstrap samples. For each sample point, quantiles at the chosen confidence level are then computed across all bootstrap samples. The resulting upper and lower bounds provide confidence intervals that allow statistical inference on the estimated probability density function.
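The bootstrap step can be sketched as pointwise quantiles of many bootstrap KDE curves. Two simplifications are assumptions and differ from the description above: each replicate draws exactly n points with replacement (rather than a Poisson-determined sample size), and a fixed bandwidth is used instead of re-optimizing it per replicate. The data, grid, bandwidth, and replicate count are hypothetical.

```python
import math
import random

def gaussian_kde(samples, h):
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def quantile(sorted_vals, q):
    """Linear interpolation between order statistics of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def bootstrap_density_band(data, h, grid, n_boot=200, confidence=0.9, seed=0):
    """Pointwise (alpha/2, 1 - alpha/2) quantiles of bootstrap KDE curves."""
    rng = random.Random(seed)
    alpha = 1.0 - confidence
    curves = [[] for _ in grid]
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        f_hat = gaussian_kde(resample, h)
        for j, x in enumerate(grid):
            curves[j].append(f_hat(x))
    lower, upper = [], []
    for values in curves:
        values.sort()
        lower.append(quantile(values, alpha / 2))
        upper.append(quantile(values, 1 - alpha / 2))
    return lower, upper

rng = random.Random(2)
data = [rng.gauss(7.2, 0.2) for _ in range(40)]   # hypothetical pH-like values
grid = [6.6 + 0.06 * k for k in range(21)]
lower, upper = bootstrap_density_band(data, h=0.08, grid=grid)
```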

Model performance indicators

To evaluate the time series forecasting models in this study and the other PP methods and models, the metrics used are mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination ($R^2$). These metrics are defined as follows:
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{24} $$
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \tag{25} $$
$$ \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{26} $$
$$ R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \tag{27} $$
where n is the number of data points, $y_i$ is the observed value, $\hat{y}_i$ is the corresponding predicted value, $|\cdot|$ denotes the absolute value, and $\bar{y}$ is the mean of the observed values.

MAE is non-negative, and a smaller MAE indicates a better model. RMSE measures the magnitude of the prediction errors and is more sensitive to large errors; again, smaller is better. MAPE expresses the error as a percentage, and a smaller MAPE indicates a better model. $R^2$ measures how much of the variance of the target variable the model explains, with values closer to 1 indicating a better fit. If $R^2 < 0$, the model performs worse than a baseline that always predicts the mean of the observations. Because $R^2$ is dimensionless, it makes the differences between models easy to compare.
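For reference, Eqs. (24)–(27) can be written as short functions; the names and toy values are illustrative only. MAPE is undefined when an observed value is zero, which is not an issue for pH but would be for some other indicators.

```python
import math

def mae(y, y_hat):
    """Eq. (24)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Eq. (25)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Eq. (26), in percent; undefined if any observed value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Eq. (27); 0 for a mean predictor, negative if worse than the mean."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y_obs, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
scores = {"MAE": mae(y_obs, y_pred), "RMSE": rmse(y_obs, y_pred),
          "MAPE": mape(y_obs, y_pred), "R2": r2(y_obs, y_pred)}
```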

Commonly used assessment metrics when evaluating interval predictions are PI Coverage Probability (PICP), PI Normalized Averaged Width (PINAW), coverage width-based criterion (CWC), and continuous ranked probability score (CRPS).

PICP is a measure of PI calibration with a predefined confidence level that indicates the rate at which the true value falls within the upper and lower bounds of the PI, defined as:
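$$\mathrm{PICP}=\frac{1}{n}\sum_{i=1}^{n}c_i \qquad (28)$$
where $c_i=1$ if $y_i\in[L_i,U_i]$, otherwise $c_i=0$; $L_i$ and $U_i$ are the lower and upper bounds of the $i$th PI, respectively.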
PINAW is defined to measure the narrowness of the PIs.
$$\mathrm{PINAW}=\frac{1}{nR}\sum_{i=1}^{n}\left(U_i-L_i\right) \qquad (29)$$
where R is the range of target values.
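Equations (28) and (29) translate directly into code. In this sketch, R is taken as the range of the observed targets being evaluated. That choice is an assumption, since R could also be the range of the full dataset. The helper names `picp` and `pinaw` are illustrative.

```python
import numpy as np


def picp(y_true, lower, upper):
    """Eq. (28): share of observations inside [L_i, U_i]."""
    y = np.asarray(y_true, dtype=float)
    c = (y >= np.asarray(lower, dtype=float)) & (y <= np.asarray(upper, dtype=float))
    return float(np.mean(c))


def pinaw(y_true, lower, upper):
    """Eq. (29): mean interval width normalised by the target range R."""
    y = np.asarray(y_true, dtype=float)
    r = y.max() - y.min()               # assumption: R = range of the evaluated targets
    width = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    return float(np.mean(width) / r)


obs = [7.0, 7.2, 7.4, 7.1]              # illustrative values only
low = [6.9, 7.0, 7.45, 7.0]             # third interval misses its observation
up = [7.1, 7.3, 7.6, 7.2]
print(picp(obs, low, up), pinaw(obs, low, up))
```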
In order to combine the coverage and narrowness of the PIs, CWC is proposed, which is defined as follows:
$$\mathrm{CWC}=\mathrm{PINAW}+\gamma(\mathrm{PICP})\,e^{-\eta(\mathrm{PICP}-\mu)} \qquad (30)$$
where $\gamma(\mathrm{PICP})$ is determined by PICP:
$$\gamma(\mathrm{PICP})=\begin{cases}0, & \mathrm{PICP}\ge\mu\\ 1, & \mathrm{PICP}<\mu\end{cases} \qquad (31)$$

$\eta$ and $\mu$ are the parameters that determine the penalty level. $\mu$ is set by the confidence level; for example, if the confidence level is 90%, then $\mu=0.9$. CWC amplifies the difference between PICP and $\mu$, and $\eta$ controls the magnitude of this amplification. The smaller $\eta$, the smaller the penalty for PICP falling short of the confidence level and the less weight this shortfall carries. If PICP reaches the confidence level, then CWC = PINAW.
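Equations (30) and (31) can be written in a few lines of code. This is a sketch: the penalty parameter η is not stated in the text, and the value η = 0.5 used here is an assumption. With it, PICP = 0.9454, PINAW = 0.0814 and μ = 0.95 give CWC ≈ 1.0837, which matches the 95% entry for Dataset A in Table 1. The function name `cwc` is illustrative.

```python
import math


def cwc(picp, pinaw, mu, eta=0.5):
    """Eqs (30)-(31): PINAW plus an exponential penalty when PICP < mu.

    eta = 0.5 is an assumed value; the text does not state eta.
    """
    gamma = 0.0 if picp >= mu else 1.0          # Eq. (31)
    return pinaw + gamma * math.exp(-eta * (picp - mu))


print(round(cwc(0.9454, 0.0814, mu=0.95), 4))   # prints 1.0837
```

When coverage meets the nominal level, the penalty vanishes and CWC reduces to PINAW, as stated above.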

CRPS measures the inconsistency between predicted and observed probability distributions, which is calculated as follows:
$$\mathrm{CRPS}(F,x)=\int_{-\infty}^{\infty}\left(F(y)-H(y-x)\right)^2\,dy \qquad (32)$$
where $F$ is the predicted cumulative distribution function, $x$ is the observed value, and $y$ is the integration variable. $F(y)$ is the probability, under the predicted distribution, of a value less than or equal to $y$. $H$ is the Heaviside step function, with $H(y-x)=0$ if $y<x$ and $H(y-x)=1$ if $y\ge x$. Thus $H(y-x)$ jumps to 1 at the observation $x$ and acts as the cumulative distribution function of the observation. The term $\left(F(y)-H(y-x)\right)^2$ is the squared difference between the predicted and observed distributions at $y$. Integrating it over the entire real line gives a comprehensive assessment of the whole predicted distribution. The smaller the CRPS, the more accurate the predicted probability distribution and the better its agreement with the observations.
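Equation (32) can be evaluated numerically for any predictive CDF. The sketch below makes three choices for illustration:

- It assumes a normal predictive distribution. The paper's distributions come from ABKDE.
- It integrates the squared difference over a finite range with the midpoint rule.
- It cross-checks the result against the known closed form for the normal case.

All names and example values are hypothetical.

```python
import math


def normal_cdf(y, mean, std):
    """CDF of a normal predictive distribution (used here as F)."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def crps_numeric(cdf, x, lo, hi, n=20000):
    """Midpoint-rule approximation of Eq. (32), truncated to [lo, hi]."""
    dy = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * dy
        step = 1.0 if y >= x else 0.0            # Heaviside H(y - x)
        total += (cdf(y) - step) ** 2 * dy
    return total


def crps_gaussian(mean, std, x):
    """Closed-form CRPS of N(mean, std^2) at observation x, for cross-checking."""
    z = (x - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return std * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def forecast_cdf(y):
    """Illustrative pH forecast: normal with mean 7.2 and std 0.1."""
    return normal_cdf(y, 7.2, 0.1)


print(crps_numeric(forecast_cdf, 7.3, 6.2, 8.2), crps_gaussian(7.2, 0.1, 7.3))
```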

Model framework

The proposed model framework is illustrated in Figure 1. The data flow of the model is illustrated in Figure 2. The key steps outlined in this study include:
  • (1) Data preparation and pre-processing, which involves dataset preparation from two WWTPs, feature selection, and data pre-processing.

  • (2) Constructing the network layer, where the network architecture is assembled with components such as CNN, BiGRU, and MHA.

  • (3) Interval prediction, where the merged representations are fed into a fully connected layer (or another suitable output layer) and interval prediction is performed with ABKDE. The optimal bandwidth is determined using the GSS method, and the upper and lower bounds of the interval prediction are obtained with the bootstrap method.

  • (4) Model prediction and evaluation, which includes predicting and evaluating results on the test sets of the two WWTPs, with additional post-processing of the results as needed.
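The bandwidth search in step (3) can be illustrated with a golden-section search over the kernel bandwidth. This sketch rests on two assumptions, neither of which the text specifies:

- GSS refers to golden-section search.
- The objective is the leave-one-out negative log-likelihood of a Gaussian KDE. `loo_neg_log_likelihood` is a hypothetical objective.

Golden-section search only requires the objective to be unimodal on the search interval. It reuses one function evaluation per iteration.

```python
import math
import random


def golden_section_search(f, a, b, tol=1e-3):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0       # 1/phi, about 0.618
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                              # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                    # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def loo_neg_log_likelihood(samples, h):
    """Hypothetical GSS objective: leave-one-out negative log-likelihood of a Gaussian KDE."""
    n = len(samples)
    norm = (n - 1) * h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i, xi in enumerate(samples):
        s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                for j, xj in enumerate(samples) if j != i)
        total -= math.log(max(s / norm, 1e-300))   # guard against log(0)
    return total


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]   # synthetic samples
h_opt = golden_section_search(lambda h: loo_neg_log_likelihood(data, h), 0.01, 3.0)
print(round(h_opt, 3))
```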

Figure 1

Framework of the model.

Figure 2

The data flow of the model.


The CNN–BiGRU–MHA model provides precise point predictions that allow managers to monitor water quality in real time and make prompt adjustments. The ABKDE model provides interval predictions of water quality metrics that help decision-makers assess uncertainty and devise more robust treatment plans. For example, in day-to-day operations, managers could use the CNN–BiGRU–MHA point predictions to adjust the treatment process quickly while relying on the ABKDE interval predictions to prepare for potential changes in water quality. Used together in this way, the two models complement each other in real-world applications.

Data description

Two datasets are used in this study, one collected from a WWTP in Shanghai, China, and the other from a WWTP in Zhejiang, China. Both contain water quality information on the effluent of the WWTPs, and their geographical locations are shown in Supplementary material, Figure S4. For simplicity, they are denoted Dataset A and Dataset B, and both cover the period from 8 November 2020 to 31 December 2023. Measurements were taken at 4-h intervals, at 0:00, 4:00, 8:00, 12:00, 16:00, and 20:00 each day, with each measurement time forming one data point; however, data are missing at certain time points. The raw data include monitoring time, water temperature, pH, pH category, dissolved oxygen, dissolved oxygen category, permanganate index, permanganate category, ammonia nitrogen, ammonia nitrogen category, total phosphorus, total nitrogen, conductivity, turbidity, chlorophyll and algal density. The pH category, dissolved oxygen category, permanganate category, ammonia nitrogen category, chlorophyll and algal density contain a substantial number of missing values, rendering them unsuitable for modeling. The remaining parameters were retained for analysis, with monitoring time serving as the time index. Supplementary material, Figure S5 shows the raw data trends for pH, the target predictor in this study. Dataset A and Dataset B contain 6,365 and 6,207 raw data entries, respectively.

Analysis of the raw data from the two WWTPs revealed missing values and anomalies. In particular, negative values were found for parameters such as the permanganate index, ammonia nitrogen, total phosphorus, total nitrogen, conductivity, and turbidity, which is physically impossible for these quantities. Feeding such noisy raw data into a predictive model can compromise the accuracy of the data-driven model and lead to significant prediction errors, so these issues must be addressed before modeling (Wang & Ying 2023). Missing values are filled by linear interpolation. Because anomalies are rare, the most effective approach is to remove the outliers directly (Shen et al. 2022). Supplementary material, Table S1 lists the maximum, minimum, mean, and standard deviation of the processed Datasets A and B.
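The two cleaning steps can be sketched with NumPy as follows. The helper `preprocess` is hypothetical and makes three assumptions:

- Records are stored as rows.
- Missing readings are NaN.
- Interpolation is done against the sampling time rather than the row index, so it stays correct after records have been removed.

```python
import numpy as np


def preprocess(times, data, non_negative_cols):
    """Hypothetical cleaning helper following the two steps described above.

    times: 1-D array of sampling times (e.g. hours); data: 2-D array with one
    column per parameter and NaN for missing readings.
    """
    t = np.asarray(times, dtype=float)
    X = np.asarray(data, dtype=float)
    # Step 1: drop records with impossible negative values (NaN < 0 is False).
    bad = np.any(X[:, non_negative_cols] < 0, axis=1)
    t, X = t[~bad], X[~bad].copy()
    # Step 2: fill remaining gaps by linear interpolation in time, column by column.
    for j in range(X.shape[1]):
        col = X[:, j]                                # view into X
        missing = np.isnan(col)
        if missing.any() and (~missing).any():
            col[missing] = np.interp(t[missing], t[~missing], col[~missing])
    return t, X


# Illustrative records at 4-h spacing: columns are pH and a non-negative parameter.
t_clean, X_clean = preprocess(
    [0, 4, 8, 12],
    [[7.1, 0.5], [np.nan, 0.6], [7.3, -0.2], [7.4, 0.8]],
    non_negative_cols=[1],
)
print(t_clean, X_clean)
```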

In practical operations, the quality of effluent from WWTPs serves as a critical indicator of system performance and environmental compliance. The reasons for selecting effluent quality data are as follows:

  • (1) Real-time monitoring and adaptive management: In an operational environment, WWTPs can utilize real-time monitoring data of effluent quality to predict future changes based on historical trends. This predictive approach provides a foundation for dynamically adjusting treatment parameters, enabling early warnings of fluctuations in effluent quality, particularly in the face of uncertain external factors such as variations in environmental temperature or sudden increases in load.

  • (2) Simplifying complexity: Although influent information and treatment variables significantly impact effluent quality, the high complexity and uncertainty of the treatment process (e.g., variations in aeration rates, changes in microbial activity) make it very challenging to construct a comprehensive model that encompasses all variables. In contrast, using historical data of effluent quality for predictions mitigates this complexity and allows for quick and effective short-term warnings, aiding operators in making real-time adjustments.

While effluent quality is heavily influenced by influent information and treatment variables, it also exhibits a certain degree of autocorrelation, particularly in short-term predictions. This autocorrelation may manifest in the lagged effects of treatment facilities, where changes in effluent do not occur instantaneously but rather as cumulative responses to prior operations. The methods employed in this study leverage historical effluent pH data to capture these change trends and provide a reliable basis for short-term predictions. Our model utilizes DL techniques, particularly sequence models like BiGRU, to identify underlying patterns and long-term dependencies in time series data, enhancing the recognition and utilization of autocorrelation.

Experimental results demonstrate that the prediction framework based on effluent pH data performs well in short-term forecasts, especially for pH trends over the next few time steps. This study therefore builds a pH prediction model centered on CNN–BiGRU–MHA–ABKDE using historical effluent data. The method exploits the autocorrelation of the effluent time series through the feature-extraction capabilities of DL models. It can therefore accurately predict short-term changes in effluent pH, giving operators real-time references for adjustment and enhancing the system's adaptive capacity.
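The autocorrelation argument can be checked directly on a series with the sample autocorrelation function. The sketch below is illustrative only: `autocorrelation` is a hypothetical helper, and the synthetic AR(1) series stands in for effluent pH. With 4-h sampling, lag 6 corresponds to one day.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (standard biased estimator)."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    denom = np.sum(z * z)
    return np.array([np.sum(z[k:] * z[:z.size - k]) / denom for k in range(max_lag + 1)])


# Synthetic AR(1) stand-in for an autocorrelated effluent series (not real data).
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
series = np.empty_like(noise)
series[0] = noise[0]
for i in range(1, series.size):
    series[i] = 0.8 * series[i - 1] + noise[i]

r = autocorrelation(series, max_lag=6)   # lag 6 = one day at 4-h sampling
print(np.round(r, 2))
```

A slowly decaying autocorrelation, as in this example, is the kind of structure that sequence models such as BiGRU can exploit.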

Data smoothing

This paper uses WT to smooth the data, as shown in Supplementary material, Figure S6. The wavelet packet decomposition employs the Daubechies 10th order wavelet as its basis function, with a decomposition level of 6.

The WT smoothing results for Datasets A and B show that WT mitigates the influence of noise to a certain degree while preserving the inherent trends in the data.
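The smoothing step can be illustrated with a much simpler wavelet scheme. This sketch substitutes a Haar discrete wavelet transform with soft thresholding at the universal threshold for the paper's Daubechies 10th order wavelet packet decomposition at level 6. It therefore shows the principle, not the exact method. With PyWavelets, `pywt.wavedec` or `pywt.WaveletPacket` with the `'db10'` wavelet would be the closer match. `haar_denoise` is a hypothetical helper.

```python
import numpy as np


def haar_denoise(signal, level=4):
    """Soft-threshold Haar DWT denoising (simplified stand-in for db10 packets)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    block = 2 ** level
    padded = -(-n // block) * block                  # round length up to a multiple of 2**level
    a = np.pad(x, (0, padded - n), mode="edge")
    details = []
    for _ in range(level):                           # forward transform
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        a = (even + odd) / np.sqrt(2.0)
    sigma = np.median(np.abs(details[0])) / 0.6745   # noise scale from the finest level
    thr = sigma * np.sqrt(2.0 * np.log(n))           # universal threshold
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):                      # inverse transform, coarsest first
        rec = np.empty(2 * a.size)
        rec[0::2] = (a + d) / np.sqrt(2.0)
        rec[1::2] = (a - d) / np.sqrt(2.0)
        a = rec
    return a[:n]


rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 1024)
clean = 7.0 + 0.3 * np.sin(t)                        # synthetic pH-like trend
noisy = clean + rng.normal(0.0, 0.05, t.size)
smoothed = haar_denoise(noisy, level=4)
```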

Data correlation analysis

This study combines CC and MI to identify the input variables and the target predictor and to quantify the correlations between the candidate variables and the target predictor. The resulting CC and MI plots for the two WWTPs are shown in Figure 3. Figure 3 shows that pH, total nitrogen, and turbidity have larger CC magnitudes than the other parameters in the WWTP datasets, and that water temperature, pH, and turbidity have larger MI values. After several modeling trials, pH was found to be modeled best and was therefore selected as the target predictor. Further trial and error showed that using water temperature, dissolved oxygen, permanganate index, total phosphorus, total nitrogen, conductivity, and turbidity as input variables yielded the best results.
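The sketch below computes the two screening measures. It assumes Pearson's coefficient for CC and a histogram (plug-in) estimator with 20 bins for MI; the paper does not state which MI estimator was used. `pearson_cc` and `mutual_information` are hypothetical names. The synthetic example shows why both measures are useful: for y = x² on a symmetric range, CC is near zero while MI is large.

```python
import numpy as np


def pearson_cc(x, y):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def mutual_information(x, y, bins=20):
    """Histogram (plug-in) estimate of I(X;Y) in nats; bin count is an assumption."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)             # synthetic variable
print(pearson_cc(x, x ** 2), mutual_information(x, x ** 2))
```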
Figure 3

Correlation analysis of the two datasets: (a) results of the CC method for Dataset A, (b) results of the MI method for Dataset A, (c) results of the CC method for Dataset B, (d) results of the MI method for Dataset B.


Model parameters

It is essential to specify the water quality variables as they can significantly impact the model's predictive performance. The input parameters of the model include water temperature, dissolved oxygen, permanganate index, total phosphorus, total nitrogen, conductivity, and turbidity, with the target predictor being pH. The model operates on a 64-bit computer running Windows 11. Data pre-processing, correlation analysis, and WT operations were conducted in the Jupyter Notebook environment of Anaconda3 software (July 2020 version, Python 3.8), while both PP and interval prediction experiments were carried out in the MATLAB R2022b environment.

As noted above, wavelet packet decomposition uses the Daubechies 10th order wavelet with a decomposition level of 6. Each dataset is split in time order into a training set (the first 70%) and a test set (the remaining 30%). The experimental results discussed in this paper are based on the test set. The key parameters of the CNN and BiGRU models are listed in Supplementary material, Table S2.
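The chronological split can be written in a few lines. This sketch uses a hypothetical helper, `chronological_split`, and floors 70% of the record count, which is an assumption about rounding. Applied to 6,365 records (the raw size of Dataset A), it would give 4,455 training and 1,910 test records. Because the data are not shuffled, every test time stamp falls after every training time stamp.

```python
import numpy as np


def chronological_split(X, y, train_ratio=0.7):
    """Time-ordered split with no shuffling: first 70% train, last 30% test."""
    n_train = int(len(y) * train_ratio)          # floor of 70% of the record count
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


X_demo = np.arange(200).reshape(100, 2)          # illustrative time-ordered features
y_demo = np.arange(100)                          # illustrative targets
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
```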

Results and analysis

PP results and analysis

The CBGRU–MHA model was used to generate PP results for both datasets. In Supplementary material, Figure S7, the black line represents the observed values of the target predictor pH and the red line the predicted values. The close alignment of the two curves indicates minimal deviation between predictions and observations. The evaluation results in Table 1 show that the model's R² exceeds 0.97 on both datasets (0.9943 and 0.9797). This demonstrates that the model explains most of the variability in the data in both the training and test sets. In addition, the low MAE, RMSE, and MAPE values indicate small errors, high precision, and good stability, further confirming the model's PP performance.

Table 1

Results of evaluation indicators

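| Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
| Dataset A | 0.0403 | 0.0521 | 0.46% | 0.9943 | 95% | 0.9454 | 0.0814 | 1.0837 | 0.1233 |
|  |  |  |  |  | 90% | 0.8409 | 0.0540 | 1.0840 | 0.1255 |
|  |  |  |  |  | 85% | 0.7350 | 0.0422 | 1.1419 | 0.1256 |
| Dataset B | 0.0643 | 0.0816 | 0.80% | 0.9797 | 95% | 0.8957 | 0.1230 | 1.1505 | 0.2350 |
|  |  |  |  |  | 90% | 0.7932 | 0.1105 | 1.1654 | 0.2352 |
|  |  |  |  |  | 85% | 0.7142 | 0.0740 | 1.1851 | 0.2353 |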

Comparing the PP results of the CBGRU–MHA model on Dataset A and Dataset B shows that the model performs better on Dataset A. This can be attributed to the greater variability of pH in Dataset B, whose fluctuations make accurate prediction more difficult. Nonetheless, the PP results on both datasets are good and differ only slightly, underscoring the model's generalizability and robustness.

Interval prediction results and analysis

Although the model performs well in PP, point predictions convey no information about their uncertainty, which may be inadequate for decision-making (Wang & Ying 2023). To address this, the study uses ABKDE to generate interval predictions at several confidence levels. The interval prediction results for both datasets are shown in Supplementary material, Figure S8.

The evaluation metrics in Table 1 show that the model's performance varies across confidence levels, revealing differences in its predictive accuracy and reliability. Several conclusions can be drawn from Table 1:

- At the 95% confidence level, PICP is 0.9454 for Dataset A and 0.8957 for Dataset B. The intervals therefore cover most of the observations, with coverage close to the nominal level for Dataset A and somewhat below it for Dataset B.
- PINAW is 0.0814 and 0.1230, respectively, indicating that the intervals are relatively narrow.
- CWC is 1.0837 and 1.1505, respectively, indicating a reasonable balance between coverage and width.
- CRPS is 0.1233 and 0.2350, respectively, indicating that the predicted cumulative distributions agree well with the observations.

As the confidence level decreases, the intervals capture fewer observations, while CWC and CRPS increase gradually but remain within reasonable limits. This may be because, with narrower intervals, the model is more uncertain about future observations and may miss some edge cases or outliers, slightly reducing accuracy. Overall, the model delivers good interval predictions at all confidence levels. In addition, the interval width decreases as the confidence level decreases. Lower confidence levels thus give narrower PIs (i.e., more specific information) at the cost of a smaller proportion of true values falling within them.
Since CWC combines PICP and PINAW, it can be used to compare the confidence levels. On this basis, the 95% confidence level gives the best results on both Datasets A and B, with the smallest CWC and CRPS values of all the confidence levels considered.

In this paper, multiple error metrics were used to evaluate the model's predictive performance, with RMSE and CRPS being the most relevant to the model's ability to predict fluctuations in the effluent pH values. RMSE amplifies prediction errors by squaring them, making it particularly sensitive to sudden and large fluctuations, thus reflecting the model's accuracy in predicting sharp changes. CRPS, on the other hand, measures the accuracy of the predicted probability distribution, capturing the model's performance in forecasting fluctuations in a probabilistic sense. Lower RMSE and CRPS values indicate that the model is better at handling abrupt changes.

Moreover, the low RMSE (0.0521 and 0.0816) and CRPS (0.1233–0.1256 and 0.2350–0.2353) values in Table 1 indicate that the model forecasts fluctuations well. It performs strongly not only in conventional point prediction but also in tracking the fluctuation trends within the datasets.

Ablation experiment

Ablation experiments are used to validate the individual components of the model. Models with and without each component were run on both datasets, and the evaluation metrics of each model were recorded (Table 2 and Figure 4). For consistency, all models were run in the MATLAB R2022b environment with identical parameters.
Table 2

Comparison of point prediction evaluation metrics for models with and without strategy

| Model | MAE (A) | RMSE (A) | MAPE (A) | R² (A) | MAE (B) | RMSE (B) | MAPE (B) | R² (B) |
| CBGRU–MHA | 0.0403 | 0.0521 | 0.46% | 0.9943 | 0.0643 | 0.0816 | 0.80% | 0.9797 |
| CNN–GRU–MHA | 0.0686 | 0.1100 | 0.78% | 0.9746 | 0.0820 | 0.1273 | 1.02% | 0.9506 |
| CNN–BiGRU–AM | 0.0715 | 0.1173 | 0.82% | 0.9711 | 0.0927 | 0.1298 | 1.15% | 0.9486 |
| CNN–GRU–AM | 0.2395 | 0.2058 | 2.73% | 0.9111 | 0.2746 | 0.2191 | 3.42% | 0.8537 |
| CNN–MHA | 0.2112 | 0.2033 | 2.41% | 0.9132 | 0.2442 | 0.1731 | 3.04% | 0.9087 |
| BiGRU–MHA | 0.1955 | 0.1933 | 2.23% | 0.9215 | 0.2497 | 0.1568 | 3.11% | 0.9250 |
| CNN | 0.2310 | 0.2099 | 2.63% | 0.9075 | 0.2923 | 0.1834 | 3.64% | 0.8974 |
| BiGRU | 0.2447 | 0.2180 | 2.79% | 0.9002 | 0.2370 | 0.1732 | 2.95% | 0.9085 |
Figure 4

Comparison results of different models.


The results show that removing BiGRU, MHA or CNN degrades model performance (Liu et al. 2023; Zhou et al. 2023; Bi et al. 2024; Cui et al. 2024; Wang & Cao 2024). This confirms that each component plays an important role in the model and is essential to its predictive performance. The hybrid CNN–GRU–AM model performs even worse than CNN–MHA and BiGRU–MHA, possibly because the latter two better capture the characteristics of the data from the two WWTPs. Adding BiGRU also improves performance more than adding CNN. This may be because sequential information matters more than spatial information in Datasets A and B, making BiGRU's ability to capture dependencies and dynamic changes in sequence data more valuable. Using BiGRU instead of GRU likewise improves performance significantly, because BiGRU accounts for both past and future information in the sequence. Finally, MHA is more effective than AM. A possible reason is that MHA better captures the importance of different aspects of the input sequence, focusing the model on the most important information in the data.

PP results and comparative analysis

To assess the CBGRU–MHA model's efficacy in the PP task, the study conducted a comparative analysis against traditional single models. Specifically, the study compared the CBGRU–MHA model with the following individual models: the ANN model, the Back Propagation Neural Network (BPNN) model, the CNN model, the BiGRU model, the LSTM model, the RNN model, the Temporal Convolutional Network (TCN) model, and the SVM model.

To explore the performance of other hybrid models, the study selected LSTM and BiLSTM models and combined them with CNN, MHA or AM models to create new hybrid models. Specifically, the study compares the performance of the CNN–BiLSTM–AM (Xia et al. 2023), CNN–LSTM–AM (Sun et al. 2024), CNN–BiLSTM–MHA (Li et al. 2024) and CNN–LSTM–MHA (Wu et al. 2023a) models, and the main parameters of these comparison models are shown in Supplementary material, Table S3. The results of the performance evaluation metrics of the different prediction methods are shown in Supplementary material, Table S4 and Figure 5(a) and 5(b).
Figure 5

Histogram of evaluation metrics comparing the point prediction and interval prediction models. (a) Comparison of point predictions – Dataset A, (b) comparison of point predictions – Dataset B, (c) comparison of interval predictions (confidence interval of 95%) – Dataset A, (d) comparison of interval predictions (confidence interval of 90%) – Dataset A, (e) comparison of interval predictions (confidence interval of 95%) – Dataset B, (f) comparison of interval predictions (confidence interval of 90%) – Dataset B.


Among the single models, CNN and BiGRU performed best on Datasets A and B, followed by LSTM, whereas TCN and SVM performed relatively poorly. Even the best single models, however, were outperformed by the CBGRU–MHA model. Its PP performance in predicting pH on the two datasets is therefore better than that of all the single AI models compared.

Among the four added hybrid models, CNN–BiLSTM–MHA gives the best results, but its predictions are still worse than those of the CBGRU–MHA model. The results again show that MHA and bi-directional structures improve the model to a certain extent, as seen by comparing the CNN–BiLSTM–MHA model with the CNN–BiLSTM–AM and CNN–LSTM–MHA models. However, on the datasets used in this paper, some hybrid models are less effective than single models. For example, the BiGRU model outperforms the CNN–LSTM–AM model, possibly because the characteristics of the wastewater datasets are not well suited to the CNN–LSTM–AM hybrid. Hybrid models are therefore not necessarily advantageous on every dataset. Model selection should consider task requirements, data characteristics, model design, and domain knowledge together, so that the selected model best fits the needs of the prediction task.

When comparing the performance of different models, it is essential to consider not only predictive accuracy but also the impact of model size and complexity on prediction performance. Larger models typically have more adjustable parameters, which theoretically give them greater capacity for fitting the data. However, they also run the risk of overfitting, especially when dealing with small or noisy datasets, where their performance may degrade. Additionally, larger models can significantly increase both training and inference time.

Among the models compared in this study, although the CBGRU–MHA model has a relatively large number of parameters, it still outperforms other single and hybrid models in terms of predictive performance. In contrast, some models with fewer parameters, such as SVM and TCN, are limited in their parameter count and representational capacity, which makes them less capable of capturing complex spatiotemporal dependencies, leading to inferior performance.

However, the performance of hybrid models varies. For example, the CNN–BiLSTM–MHA model, despite its higher parameter count and complexity, performs well but still falls short of the CBGRU–MHA model on certain datasets. This suggests that while larger models may have an advantage in terms of representational power, their actual performance depends on the characteristics of the dataset and the specific task at hand.

Interval prediction results and comparative analysis

The interval prediction method used in this paper is ABKDE with a Gaussian kernel. The study compares the interval prediction results of ABKDE and KDE using three different kernel functions. Confidence levels of 95 and 90% are used; the higher level gives more reliable coverage at the cost of wider PIs. The interval prediction metrics under the different strategies are shown in Supplementary material, Table S5 and Figure 5(c)–5(f).

First, the PICP values obtained with ABKDE are generally higher than those obtained with KDE on both datasets. This means that actual observations are more likely to fall within the ABKDE intervals, suggesting that ABKDE captures the true data distribution more accurately. Meanwhile, PINAW, CWC and CRPS are generally lower, to varying degrees, than with KDE, indicating that ABKDE improves both PI width and accuracy.

Among the three kernel functions (Gaussian, Laplace and Cauchy), the Gaussian kernel gives the best PICP for both ABKDE and KDE. This suggests that the Gaussian kernel captures the data distribution better, its bell-shaped curve suiting most real data distributions. The Gaussian kernel also yields a lower CRPS than the Laplace and Cauchy kernels, indicating lower error and higher robustness and further confirming its superiority on the two datasets used.
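The three kernels compared here have standard closed forms. The sketch below plugs each into a fixed-bandwidth KDE. The names `KERNELS` and `kde` are hypothetical. The Cauchy kernel has much heavier tails than the Gaussian and Laplace kernels, so it spreads probability mass far from the observations.

```python
import numpy as np

# Kernel functions K(u); each integrates to 1 over the real line.
KERNELS = {
    "gauss": lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi),
    "laplace": lambda u: 0.5 * np.exp(-np.abs(u)),
    "cauchy": lambda u: 1.0 / (np.pi * (1.0 + u ** 2)),
}


def kde(samples, grid, h, kernel="gauss"):
    """Fixed-bandwidth KDE: f(y) = (1 / (n h)) * sum_i K((y - x_i) / h)."""
    K = KERNELS[kernel]
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(samples, dtype=float)[None, :]) / h
    return K(u).mean(axis=1) / h


rng = np.random.default_rng(0)
pH_samples = rng.normal(7.2, 0.1, 50)       # synthetic pH values
grid = np.linspace(-43.0, 57.0, 40001)      # wide grid so heavy tails are mostly captured
step = grid[1] - grid[0]
for name in KERNELS:
    print(name, kde(pH_samples, grid, 0.1, name).sum() * step)   # approximate total mass
```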

The interval prediction results of ABKDE and KDE with three different kernel functions validate the excellent performance of ABKDE and Gaussian kernel functions. For ABKDE, the combination of the Gaussian kernel function shows good adaptability and robustness in interval prediction, making it a reliable interval prediction method. The results of this study provide an important reference for future interval prediction model design and optimization.

Forecast period

To evaluate the prediction models comprehensively against the diverse decision-making and planning requirements of real-world WWTPs, this study examined the influence of the forecast period on prediction accuracy. Forecast periods are categorized as very short-term, short-term and long-term. Time steps are , , and days for the very short-term, 1, 2, and 3 days for the short-term and 7, 14, and 28 days for the long-term. The results are shown in Supplementary material, Table S6.

In the very short-term, the MAE, RMSE, and MAPE of Dataset A gradually decrease and R² increases as the time step increases, with the best performance achieved at the time step of day. For Dataset B, by contrast, MAE and MAPE fluctuate upward, RMSE gradually increases and R² gradually decreases. Relatively speaking, Dataset B performs best at a time step of days.

In both the short and long term, the MAE, RMSE and MAPE of Dataset A gradually increase and R² decreases. The change is slow in the short term and much faster in the long term. With a time step of 28 days compared with 14 days, MAE increased by 0.1058, RMSE by 0.1205 and MAPE by 1.17%, while R² decreased by 0.0942. The MAE, RMSE, MAPE, and R² of Dataset B show a fluctuating downward trend. Thus, in the short term, Dataset A performs best at a time step of 1 day and Dataset B at 2 days. In the long term, Dataset A performs best at 7 days and Dataset B at 14 days.

Overall, the model performs better in the very short term than in the short term, and better in the short term than in the long term, although its predictive performance remains excellent throughout.
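The time steps above can be mapped onto the 4-h sampling grid as in the sketch below. `horizon_steps` and `make_supervised` are hypothetical helpers, and the number of lagged inputs (`lags`) is an assumption because it is not stated here. With six records per day, horizons of 1, 7, 14 and 28 days correspond to 6, 42, 84 and 168 steps ahead.

```python
import numpy as np

SAMPLES_PER_DAY = 24 // 4   # one record every 4 h -> 6 records per day


def horizon_steps(days):
    """Convert a forecast horizon in days into a number of 4-h steps."""
    steps = days * SAMPLES_PER_DAY
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("horizon must be a multiple of 4 h")
    return int(round(steps))


def make_supervised(series, lags, horizon):
    """Hypothetical windowing: `lags` past values -> value `horizon` steps ahead."""
    s = np.asarray(series, dtype=float)
    n = s.size - lags - horizon + 1
    X = np.stack([s[i:i + lags] for i in range(n)])
    y = s[lags + horizon - 1: lags + horizon - 1 + n]
    return X, y


print([horizon_steps(d) for d in (1, 2, 3, 7, 14, 28)])
X_win, y_win = make_supervised(np.arange(10), lags=3, horizon=2)
```

Longer horizons leave fewer training pairs and a longer gap between the last input and the target, which is consistent with the harder long-term forecasting described above.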

As shown in Figure 6, R² remains above 0.88 for both Dataset A and Dataset B across all forecast periods, and above 0.94 for Dataset B, demonstrating the model's strong predictive performance. For very short-term and short-term predictions, the model performs consistently on the two WWTP datasets, with high R² and low MAE and RMSE. This indicates that it predicts very short-term and short-term changes accurately.
Figure 6

Line graph of evaluation indicators for different forecast periods: (a) Dataset A and (b) Dataset B.


For long-term forecasting, the model becomes less stable as the time step increases, and MAE, RMSE, and MAPE rise, indicating that prediction becomes more challenging. For time steps of up to 14 days, the model performs excellently, with R² above 0.97 on both datasets. Prediction performance fluctuates more markedly at a time step of 28 days. A possible reason is that internal and external influences become more complex over this horizon, and the model has difficulty capturing all the dynamic relationships.

In addition, at small time steps the error of Dataset A is generally lower than that of Dataset B, but it grows faster as the time step increases. In terms of fit, the R² of Dataset A is generally higher than that of Dataset B, indicating a better model fit for Dataset A. This may be due to the more stable operation of the WWTP in Shanghai, whose data are more regular and whose patterns are therefore easier for the model to capture. It is worth noting, however, that Dataset A shows a significant drop in prediction performance at a time step of 28 days. Although Dataset A performs well in the short term, its prediction error rises more rapidly and its fit declines as the time step increases. This may reflect region-specific conditions, such as weather changes and equipment failures, which affect the operational stability of the treatment system and reduce the model's fit. Further development of the model's long-term forecasting will be needed to improve its performance over long horizons.

Special data forecasts

Although the model has shown excellent predictive performance on both datasets, the datasets used (8 November 2020 to 31 December 2023) are relatively complete. To assess the model's performance on data with special characteristics, the study identified several segments within Dataset A containing special values such as peaks, fluctuations, maximums, and minimums. These segments were identified from the observed series of the target predictor pH shown in Supplementary material, Figure S5. Four segments were extracted from Dataset A according to the labeled sections in Supplementary material, Figure S9; their basic information is given in Supplementary material, Table S7.

The proposed model was run on each of these four special datasets in turn; the results are shown in Supplementary material, Figure S10 and Table 3. Even with special values such as peaks, the point prediction (PP) performance of the model remains good: all R² values exceed 0.8, so the model explains most of the variation in the target variable, and the deviation between predicted and observed values is relatively small. For Datasets 1–4, the fitted PP curves broadly follow the trends of the true values. For Datasets 1 and 4, the error metrics MAE, RMSE, and MAPE are all small and R² exceeds 0.85, indicating very good point predictions. For Datasets 2 and 3, the errors are relatively larger, but the minimum R² is still 0.8013, indicating that although the model has some error in predicting certain special values, it remains highly explanatory and can, to some extent, predict data containing peaks with a high degree of fluctuation.

Table 3

Assessment metrics for point and interval prediction for four special datasets

| Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0604 | 0.0711 | 0.83% | 0.9111 | 95% | 0.9019 | 0.3163 | 1.3535 | 0.2623 |
| | | | | | 90% | 0.8529 | 0.2524 | 1.3022 | 0.2555 |
| 2 | 0.0760 | 0.1071 | 0.92% | 0.8013 | 95% | 0.9310 | 0.5756 | 1.5978 | 0.1464 |
| | | | | | 90% | 0.8867 | 0.5473 | 1.5795 | 0.1413 |
| 3 | 0.1685 | 0.2004 | 1.90% | 0.8222 | 95% | 0.9464 | 0.2471 | 1.2615 | 0.3595 |
| | | | | | 90% | 0.8892 | 0.2200 | 1.2508 | 0.3805 |
| 4 | 0.0473 | 0.0949 | 0.56% | 0.8543 | 95% | 0.9465 | 0.3855 | 1.3999 | 0.2441 |
| | | | | | 90% | 0.8983 | 0.2033 | 1.2294 | 0.2274 |

For interval prediction, the 95% and 90% confidence levels were selected, and the model's performance on the special datasets was evaluated using PICP, PINAW, CWC, and CRPS, as shown in Supplementary material, Figure S11 and Table 3.
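For reference, the interval metrics can be sketched as below. PICP, PINAW, and the ensemble form of CRPS follow their standard definitions. CWC has several variants in the literature; the one shown (PINAW inflated exponentially when PICP falls below the nominal level `mu`, with an assumed penalty `eta`) is a common choice but may not be the variant behind Tables 3 to 5, so it should not be expected to reproduce those values. The function names are hypothetical.

```python
import numpy as np


def picp(y, lower, upper):
    """Share of observations that fall inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def pinaw(y, lower, upper):
    """Mean interval width normalized by the range of the observations."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))


def cwc(picp_value, pinaw_value, mu=0.95, eta=50.0):
    """One common CWC form (assumed variant): PINAW, inflated only when coverage < mu."""
    gamma = 1.0 if picp_value < mu else 0.0
    return float(pinaw_value * (1.0 + gamma * np.exp(-eta * (picp_value - mu))))


def crps_ensemble(y, samples):
    """Mean CRPS of an empirical predictive distribution at each time step.

    samples: array (n_times, n_members), e.g. bootstrap draws from the predictive density.
    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with X, X' drawn from the ensemble.
    """
    y = np.asarray(y, dtype=float)
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y[:, None]), axis=1)
    # pairwise member differences: memory grows as n_times * n_members**2
    term2 = 0.5 * np.mean(np.abs(s[:, :, None] - s[:, None, :]), axis=(1, 2))
    return float(np.mean(term1 - term2))
```

For a 90% interval, `cwc` would be called with `mu=0.90`. When every ensemble member equals the point forecast, `crps_ensemble` reduces to the MAE, which is a useful sanity check.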

Datasets 1 and 4 give better results, while Datasets 2 and 3 show deviations in the prediction of some values. After analyzing the results, the study attributes these differences to the following reasons:

  • (1) The data fluctuations of Datasets 2 and 3 are larger than those of Datasets 1 and 4. Their data distributions are more uneven and less regular, and the sharp fluctuations make prediction difficult.

  • (2) The test sets of Datasets 2 and 3 fluctuate strongly. The range (maximum minus minimum) of the target predictor pH reaches 1 on the test set of Dataset 2 and 1.4 on that of Dataset 3. These abrupt, discontinuous changes in the test sets limit the model's predictions.

  • (3) In this experiment, the model parameters were kept the same as those used throughout the rest of this study. Because model parameters affect prediction, the structure and parameters of the model would need to be adjusted for datasets containing large peaks and fluctuations to obtain better interval predictions.

The evaluation indicators also reflect these results. For Datasets 1 and 3, the evaluation indicators are strong, indicating that the model captures the true values very well and accurately reflects the range of the data distribution. For Dataset 4, the model covers the true values well, with low CWC and CRPS, so it performs well in interval prediction at both the 95% and 90% confidence levels. This may stem from the characteristics of the data distribution of Dataset 4 and the model's adaptability to that distribution. In addition, the model's performance at the higher confidence level demonstrates the robustness of its predictions. However, the interval predictions for Dataset 2 are relatively poor compared with Datasets 1, 3, and 4, with the widest intervals (PINAW of 0.5756 at 95%) and the highest CWC values. This may be because the characteristics of Dataset 2 differ from those of the other three, making its distribution harder for the model to handle; the dataset characteristics and possible limitations of the model require further analysis. Nevertheless, the model still produces usable predictions on these datasets, suggesting that it can generalize to and handle some degree of spikes and fluctuations.

In summary, the model used in this paper provides reasonably good interval predictions for data containing peaks and large fluctuations. Differences in interval prediction performance across datasets reflect the model's sensitivity to data characteristics, as well as its strengths and weaknesses. Future studies should further analyze data characteristics from different WWTPs at various stages and investigate model performance on data with different characteristics, which is crucial for improving the model's predictive and generalization ability.

Model suitability

The model used in this paper can forecast multivariate water quality data from WWTPs. In this section, to validate the applicability of the CBGRU–MHA–ABKDE model more thoroughly, dissolved oxygen and turbidity are used as target predictor variables; the results are shown in Supplementary material, Figures S12 and S13, and Table 4.

Table 4

Evaluation metrics when dissolved oxygen and turbidity are used as target predictor variables

| Predictor variable | Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|---|
| Dissolved oxygen | A | 0.0611 | 0.0726 | 0.71% | 0.9914 | 95% | 0.8971 | 0.0833 | 1.1231 | 0.0944 |
| | | | | | | 90% | 0.7968 | 0.0660 | 1.1456 | 0.0945 |
| | | | | | | 85% | 0.7250 | 0.0575 | 1.1627 | 0.0946 |
| | B | 0.3352 | 0.4331 | 3.73% | 0.9697 | 95% | 0.8860 | 0.1154 | 1.1609 | 0.2051 |
| | | | | | | 90% | 0.8379 | 0.0923 | 1.1499 | 0.2050 |
| | | | | | | 85% | 0.7932 | 0.0780 | 1.1460 | 0.2052 |
| Turbidity | A | 0.1739 | 0.2549 | 5.35% | 0.9837 | 95% | 0.9114 | 0.0852 | 1.1175 | 0.1851 |
| | | | | | | 90% | 0.8218 | 0.0592 | 1.1253 | 0.1867 |
| | | | | | | 85% | 0.7430 | 0.0441 | 1.1393 | 0.1883 |
| | B | 2.9184 | 5.159 | 59.89% | 0.8653 | 95% | 0.8804 | 0.0964 | 1.1448 | 0.0955 |
| | | | | | | 90% | 0.7703 | 0.0516 | 1.1456 | 0.0906 |
| | | | | | | 85% | 0.6877 | 0.0402 | 1.1662 | 0.0925 |

The results show that when dissolved oxygen is the target predictor variable, the model gives excellent predictions on both datasets, with R² above 0.96 and close agreement between the predicted and actual curves. In terms of PP, the model predicts Dataset A better than Dataset B. In terms of interval prediction, both PICP and PINAW increase gradually as the confidence level rises, while CRPS changes little. According to the CWC indicator, Dataset A is best predicted at the 95% confidence level, whereas Dataset B is best predicted at the 85% confidence level.

For turbidity as the target predictor variable, the model achieved high accuracy on Dataset A, with an R² of 0.9837, but was less accurate on Dataset B, with an R² of 0.8653. In terms of PP, the model again predicts Dataset A better than Dataset B. In terms of interval prediction, PICP and PINAW increase with the confidence level, whereas CRPS changes little. According to the CWC indicator, both Datasets A and B are best predicted at the 95% confidence level.

Taken together, the CBGRU–MHA–ABKDE model showed good prediction results for both target predictor variables, dissolved oxygen and turbidity, confirming the suitability of the model. However, further research and modeling improvements are needed to improve the accuracy of predictions in specific contexts.

Model robustness

Data acquisition equipment at WWTPs can produce erroneous measurements because of equipment limitations, ageing, and inaccuracies. It is therefore essential to assess the model's robustness by evaluating its predictive performance when the training set contains erroneous data. In this study, 2% of the processed training data were randomly sampled, and errors were introduced into these data points as random outliers with values two to three times larger or smaller than the original values.

Because the model uses multivariate inputs, both the number and the magnitude of the erroneous values among the input variables of each sampled row were determined randomly. For example, a single row might contain one magnified erroneous value, or three magnified and two reduced erroneous values.
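The corruption procedure can be sketched as follows. This is a minimal NumPy version under stated assumptions: "two to three times smaller" is read as division by a factor drawn uniformly from [2, 3], each sampled row receives a random non-empty subset of corrupted variables, magnification and reduction are equally likely, and the name `inject_outliers` is hypothetical. Only the training matrix is corrupted; the test set is left untouched.

```python
import numpy as np


def inject_outliers(X, frac=0.02, low=2.0, high=3.0, seed=None):
    """Return (X_corrupted, mask) with random multiplicative outliers in a fraction of rows.

    In each sampled row a random, non-empty subset of columns is corrupted; each
    corrupted value is multiplied or divided (chosen at random) by a factor drawn
    uniformly from [low, high].
    """
    rng = np.random.default_rng(seed)
    X_bad = np.array(X, dtype=float, copy=True)
    n_rows, n_cols = X_bad.shape
    n_pick = max(1, int(round(frac * n_rows)))
    rows = rng.choice(n_rows, size=n_pick, replace=False)
    mask = np.zeros(X_bad.shape, dtype=bool)
    for r in rows:
        k = int(rng.integers(1, n_cols + 1))        # number of variables to corrupt in this row
        cols = rng.choice(n_cols, size=k, replace=False)
        factors = rng.uniform(low, high, size=k)
        magnify = rng.random(k) < 0.5                # True: enlarge, False: shrink
        vals = X_bad[r, cols]
        X_bad[r, cols] = np.where(magnify, vals * factors, vals / factors)
        mask[r, cols] = True
    return X_bad, mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_train = rng.uniform(6.5, 8.0, size=(1000, 5))  # hypothetical 5-variable training matrix
    X_bad, mask = inject_outliers(X_train, seed=42)
    print("corrupted rows:", int(mask.any(axis=1).sum()), "corrupted cells:", int(mask.sum()))
```

Returning the mask alongside the corrupted matrix makes it easy to check afterwards which cells were altered, for example to compare model residuals on corrupted and clean rows.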

The training sets of processed Datasets A and B, modified with the erroneous data, were combined with the original test sets. Predictions were then made with the proposed model, keeping the model parameters unchanged throughout to assess its robustness. The evaluation metrics after the insertion of erroneous data are summarized in Table 5.

Table 5

Robustness detection (evaluation metrics results after inserting erroneous data)

| Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.0390 | 0.0521 | 0.45% | 0.9943 | 95% | 0.9554 | 0.0887 | 1.0986 | 0.1229 |
| | | | | | 90% | 0.8774 | 0.0599 | 1.0968 | 0.1248 |
| | | | | | 85% | 0.7721 | 0.0464 | 1.1258 | 0.1252 |
| B | 0.0721 | 0.0936 | 0.90% | 0.9733 | 95% | 0.8974 | 0.1409 | 1.1804 | 0.2329 |
| | | | | | 90% | 0.7988 | 0.1042 | 1.1827 | 0.2345 |
| | | | | | 85% | 0.7154 | 0.0847 | 1.1952 | 0.2363 |

This test enables us to evaluate the model's robustness. Robustness refers to the model's ability to maintain good predictive performance even in the presence of noise or anomalies in the data. In this test, we simulated potential data abnormalities by introducing randomly generated erroneous data into the training set and then observed how the model performed under these conditions. If the model is able to maintain high predictive accuracy after the introduction of erroneous data, and even show improvements in certain metrics, this would indicate a high tolerance of the model to noise.

The results show that the model's point and interval prediction performance remains very good after erroneous data are inserted. The evaluation metrics are essentially unchanged from those of the original model and in some cases are even better. Possible reasons include the following:

  • (1) Robustness of the model. The model is highly fault-tolerant, which means it can effectively deal with noise and outliers in the input data. The model is capable of handling data with a 2% error rate.

  • (2) If there is some degree of correlation between several input features, the model can use this redundant information to compensate for the loss of information caused by partially erroneous data.

The model maintains excellent performance after the introduction of erroneous data, reflecting its good robustness; it has been shown to handle training data with an error rate of 2%. Its robustness could be explored and optimized further in future research, for example by introducing more sophisticated anomaly detection mechanisms.

In this section, we provide a detailed discussion on how the proposed water quality prediction model can be integrated into the existing system of a WWTP. Through intelligent and automated management, the integration aims to enhance overall system efficiency, reduce operational costs, and ensure both environmental compliance and long-term sustainability.

Integration of data collection and preprocessing

WWTPs are widely equipped with water quality monitoring devices, such as pH sensors, COD sensors, and dissolved oxygen sensors. These sensors provide real-time, multidimensional data on water quality, forming the foundation for the effective operation of the water quality prediction model.

To ensure data accuracy and consistency, the data collected by the sensors must undergo preprocessing. This includes noise removal, filling missing data, and performing standardization to ensure that the input data meets the model's expected format. This preprocessing module will integrate with the existing monitoring system to establish a real-time, efficient data pipeline.
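A minimal sketch of such a preprocessing step for one sensor channel is shown below. It assumes linear interpolation for missing values, a hand-written multi-level Haar wavelet transform with soft thresholding (universal threshold, noise scale from the median absolute deviation of the finest details) for noise removal, and z-score standardization with training-set statistics. The paper applies a wavelet transform for smoothing, but this section does not state the wavelet, decomposition level, or threshold rule, so those choices and all function names here are assumptions. A deployed pipeline would more likely call a wavelet library such as PyWavelets; plain NumPy keeps the example self-contained.

```python
import numpy as np


def fill_missing(x):
    """Linearly interpolate NaNs; leading/trailing gaps take the nearest valid value."""
    x = np.asarray(x, dtype=float).copy()
    nan = np.isnan(x)
    idx = np.arange(x.size)
    x[nan] = np.interp(idx[nan], idx[~nan], x[~nan])
    return x


def haar_denoise(x, level=3):
    """Multi-level Haar wavelet denoising with soft thresholding of detail coefficients."""
    x = np.asarray(x, dtype=float)
    n = x.size
    pad = (-n) % (2 ** level)                      # pad so the length divides 2**level
    approx = np.pad(x, (0, pad), mode="edge")
    details = []
    for _ in range(level):                         # forward orthonormal Haar transform
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    sigma = np.median(np.abs(details[0])) / 0.6745  # noise scale from the finest level
    thr = sigma * np.sqrt(2.0 * np.log(n))          # universal threshold
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):                    # inverse transform, coarsest level first
        up = np.empty(2 * d.size)
        up[0::2] = (approx + d) / np.sqrt(2.0)
        up[1::2] = (approx - d) / np.sqrt(2.0)
        approx = up
    return approx[:n]


def standardize(train, test):
    """Z-score both arrays with the training mean and standard deviation."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (train - mean) / std, (test - mean) / std


def preprocess_channel(raw, n_train, level=3):
    """Fill gaps, denoise, then standardize one sensor series split at index n_train."""
    smooth = haar_denoise(fill_missing(raw), level=level)
    return standardize(smooth[:n_train], smooth[n_train:])
```

One design caveat: denoising the full series before splitting lets later (test) values influence the smoothed training values. In a real-time pipeline, the wavelet step would be applied only to data available at prediction time.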

Model deployment and computational architecture design

Deploying the model is critical and requires selecting the appropriate architecture. WWTPs can opt for either local deployment or cloud-based deployment, depending on their computational resources and real-time processing requirements. For larger plants, local deployment helps reduce data transmission latency and enhances real-time prediction performance. The model is interfaced with the plant's control system through APIs or other data interfaces, ensuring that prediction results are effectively applied in real-world operations.

The integration of the model enables real-time water quality predictions, allowing for accurate forecasting of future conditions. Based on these predictions, the system can automatically adjust key treatment parameters, such as chemical dosing, aeration intensity, and sedimentation time, optimizing the entire treatment process.

Application of predictions and automated control

The integration of the water quality prediction model significantly improves WWTP management processes. The model can forecast changes in water quality over a specified period, providing the basis for automated control decisions. By incorporating the prediction model, the system can proactively adjust process parameters in advance, preventing non-compliant discharges or insufficient treatment.

In practice, when the model predicts that water quality will exceed regulatory limits or that treatment challenges will increase, the system can issue timely alerts and automatically take corrective measures, such as increasing chemical dosing, extending treatment time, or activating backup treatment facilities. This approach effectively reduces the risk of sudden pollution incidents and enhances the stability and reliability of the treatment process.
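The alerting logic described above can be sketched in a few lines. The sketch assumes that, for each forecast step, the model delivers a point forecast together with interval bounds; it raises a warning when an interval bound crosses a limit and an alarm when the point forecast itself does. The class and function names are hypothetical, and the pH limits of 6.0 and 9.0 used in the example are placeholders for whatever discharge standard applies to a given plant.

```python
from dataclasses import dataclass


@dataclass
class IntervalForecast:
    step: int        # forecast step, e.g. days ahead
    point: float     # point prediction
    lower: float     # lower bound of the prediction interval
    upper: float     # upper bound of the prediction interval


def check_limits(forecasts, low_limit, high_limit):
    """Return (step, level, message) tuples for forecasts that approach or cross a limit.

    WARNING: an interval bound crosses a limit (the limit lies within the plausible range).
    ALARM: the point forecast itself crosses the limit.
    """
    alerts = []
    for f in forecasts:
        if f.upper > high_limit:
            level = "ALARM" if f.point > high_limit else "WARNING"
            alerts.append((f.step, level, f"upper bound {f.upper:.2f} exceeds {high_limit}"))
        if f.lower < low_limit:
            level = "ALARM" if f.point < low_limit else "WARNING"
            alerts.append((f.step, level, f"lower bound {f.lower:.2f} below {low_limit}"))
    return alerts


if __name__ == "__main__":
    forecasts = [
        IntervalForecast(1, 7.2, 6.9, 7.5),
        IntervalForecast(2, 8.7, 8.3, 9.2),
        IntervalForecast(3, 9.3, 8.9, 9.8),
    ]
    for alert in check_limits(forecasts, low_limit=6.0, high_limit=9.0):
        print(alert)  # a control layer could map these to dosing or aeration actions
```

Keying warnings to the interval bound rather than the point forecast lets the system act before a violation becomes the most likely outcome; the confidence level of the interval then controls how conservative the alerts are.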

Long-term adaptation and model optimization

Over time, WWTPs accumulate increasing amounts of real-time water quality data. By periodically retraining on these data, the water quality prediction model can be continuously optimized and become more adaptive, allowing it to handle water quality fluctuations, seasonal variations, and unforeseen events so that prediction accuracy and reliability improve over time.

This adaptive mechanism also supports the plant's long-term operational planning, reducing the need for manual intervention and advancing intelligent management.

Enhancing management efficiency and environmental compliance

The integration of the water quality prediction model is not merely a technical innovation; it represents a comprehensive upgrade to the management model of WWTPs. Traditionally, these plants have relied heavily on experience and manual control. However, with increasingly stringent water quality regulations and more complex water compositions, manual management is no longer efficient. By adopting intelligent and automated methods, human intervention can be significantly reduced, leading to more efficient operations.

  • (1) Reducing operational costs: By optimizing chemical dosing, energy usage, and treatment times, the prediction model can effectively reduce the resource consumption of wastewater treatment processes. This is particularly important for plants that operate over long periods, as it not only lowers production costs but also reduces environmental impact.

  • (2) Improving emergency response capabilities: In the event of sudden pollution incidents, the water quality prediction model can respond rapidly, forecasting the spread of contamination and automatically adjusting treatment processes. This significantly enhances emergency response capabilities and reduces the risk of non-compliant discharges.

  • (3) Ensuring long-term compliance: As water discharge standards become more stringent, WWTPs face increasing regulatory pressure. With the model's early warning system and real-time control, treated water can consistently meet regulatory standards, helping plants avoid fines or shutdowns.

  • (4) The necessity for future development: With rapid urbanization and increased industrial discharges, the demand for wastewater treatment is rising. In the future, treatment plants will face greater pressures, and water quality will become increasingly complex. Traditional treatment technologies alone will no longer suffice to meet these challenges. Therefore, the introduction of water quality prediction models is not only essential for current management needs but also provides a more intelligent solution for the future of wastewater treatment.

Integrating the water quality prediction model is a crucial step toward intelligent and automated management in the wastewater treatment industry. The application of this technology not only significantly improves operational efficiency but also ensures environmental compliance, reduces operational risks, and aligns with current and future environmental policies and sustainability goals. As the industry evolves, intelligent and automated management will become the standard, and this work offers a feasible and practically applicable solution for that transition.

The study presents the CBGRU–MHA model with ABKDE to tackle the challenge of multivariate time series interval prediction of water quality data from WWTPs. First, data from WWTPs in Shanghai and Zhejiang, China, were preprocessed using WT for smoothing and noise reduction. CC and MI methods were then used, together with multiple modeling iterations and trial-and-error, to identify the input variables and target predictors, and the data were normalized to account for the differing ranges of the indicators. Next, a combination of CNN and BiGRU models captured time series correlations, while MHA evaluated potential correlations among the water quality indicators of WWTPs. Interval prediction was performed using ABKDE, with upper and lower bounds derived through the bootstrap method, and model parameters were optimized by stochastic gradient descent. Ablation experiments elucidated the contributions of the individual model components, and tests on special data segments showed that the model can handle data containing peaks and significant fluctuations. The analysis of forecast periods and comparisons with alternative models highlighted the model's superiority in WWTP water quality prediction, and suitability and robustness tests confirmed its strong generalization capability and resilience.
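The ABKDE interval step can be illustrated with a sketch of adaptive-bandwidth kernel density estimation on point-prediction residuals. This is not the authors' implementation. It assumes Gaussian kernels; a pilot bandwidth chosen by golden-section search (GSS) on the leave-one-out log-likelihood (GSS is named for bandwidth selection in this paper, but the objective is not stated in this section); Abramson-style local bandwidths with sensitivity `alpha = 0.5`; a single residual distribution shared by all forecast steps; and a smoothed bootstrap, sampling from the fitted density, to obtain the lower and upper quantiles. All function names are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def kde_eval(points, data, bw):
    """Gaussian KDE with per-sample bandwidths bw, evaluated at points."""
    z = (points[:, None] - data[None, :]) / bw[None, :]
    return np.mean(np.exp(-0.5 * z ** 2) / (SQRT_2PI * bw[None, :]), axis=1)


def loo_log_lik(h, data):
    """Leave-one-out log-likelihood of a fixed-bandwidth Gaussian KDE."""
    z = (data[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / (SQRT_2PI * h)
    np.fill_diagonal(k, 0.0)                       # exclude each point from its own estimate
    dens = k.sum(axis=1) / (data.size - 1)
    return float(np.sum(np.log(dens + 1e-300)))


def golden_section_max(f, a, b, tol=1e-4):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):                            # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                                      # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def abkde_interval(point_forecast, residuals, confidence=0.95, n_boot=5000, alpha=0.5, seed=0):
    """Prediction interval = point forecast + residual quantiles from an adaptive KDE."""
    r = np.asarray(residuals, dtype=float)         # residuals defined as y_true - y_pred
    s = r.std()
    h = golden_section_max(lambda bw: loo_log_lik(bw, r), 0.01 * s, 2.0 * s, tol=1e-3 * s)
    pilot = kde_eval(r, r, np.full(r.size, h))
    geo_mean = np.exp(np.mean(np.log(pilot)))
    local_bw = h * (pilot / geo_mean) ** (-alpha)  # wider kernels where residuals are sparse
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, r.size, size=n_boot)     # smoothed bootstrap: pick a kernel,
    draws = r[idx] + local_bw[idx] * rng.standard_normal(n_boot)  # then sample from it
    q_lo, q_hi = np.quantile(draws, [(1 - confidence) / 2, (1 + confidence) / 2])
    pf = np.asarray(point_forecast, dtype=float)
    return pf + q_lo, pf + q_hi


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    residuals = 0.1 * rng.standard_normal(300)     # hypothetical validation residuals
    lower, upper = abkde_interval(np.array([7.1, 7.3]), residuals)
    print(lower, upper)
```

Because one residual density serves every step, the interval width is constant across the horizon; a horizon-dependent variant would fit a separate density to the residuals of each forecast step.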

Despite these advancements, certain limitations remain, necessitating further enhancements. For instance, the current forecast factors focus solely on WWTP effluent indicators, overlooking other potential influencing factors such as geographical location, climatic conditions, and industrial structure.

Incorporating these considerations into predictive modeling for WWTPs could enhance accuracy and interpretability and contribute to sustainable development goals.

This work was supported by the Humanities and Social Sciences Research Planning Fund Program, Ministry of Education, China (No. 24YJAZH167). It was also supported by the Open Fund of Key Laboratory of Sediment Science and Northern River Training, the Ministry of Water Resources, China Institute of Water Resources and Hydropower Research (Grant No. IWHR-SEDI-2023-10).

S.L. contributed to conceptualization, methodology, software, writing – original draft. Z.W. contributed to methodology, validation, writing – review & editing, supervision. Y.L. did software analysis.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abouzari, M., Pahlavani, P., Izaditame, F. & Bigdeli, B. (2021) Estimating the chemical oxygen demand of petrochemical wastewater treatment plants using linear and nonlinear statistical models–A case study, Chemosphere, 270, 129465. https://doi.org/10.1016/j.chemosphere.2020.129465.
Aghdam, E., Mohandes, S. R., Manu, P., Cheung, C., Yunusa-Kaltungo, A. & Zayed, T. (2023) Predicting quality parameters of wastewater treatment plants using artificial intelligence techniques, Journal of Cleaner Production, 405, 137019. https://doi.org/10.1016/j.jclepro.2023.137019.
Alvi, M., Batstone, D., Mbamba, C. K., Keymer, P., French, T., Ward, A., Dwyer, J. & Cardell-Oliver, R. (2023) Deep learning in wastewater treatment: A critical review, Water Research, 245, 120518. https://doi.org/10.1016/j.watres.2023.120518.
Bagherzadeh, F., Nouri, A. S., Mehrani, M. J. & Thennadil, S. (2021) Prediction of energy consumption and evaluation of affecting factors in a full-scale WWTP using a machine learning approach, Process Safety and Environmental Protection, 154, 458–466. https://doi.org/10.1016/j.psep.2021.08.040.
Bankole, A. O., Moruzzi, R., Negri, R. G., Bressane, A., Reis, A. G., Sharifi, S., Bankole, A. O. & Bankole, A. R. (2024) Machine learning framework for modeling flocculation kinetics using non-intrusive dynamic image analysis, Science of The Total Environment, 908, 168452. https://doi.org/10.1016/j.scitotenv.2023.168452.
Bi, J., Gao, M., Bao, K., Zhang, W., Zhang, X. & Cheng, H. (2024) A CNNGRU-MHA method for ship trajectory prediction based on marine fusion data, Ocean Engineering, 310, 118701. https://doi.org/10.1016/j.oceaneng.2024.118701.
Chen, C., An, J., Zhou, X., Wang, C., Li, H. & Yan, D. (2024a) Deviation entropy-based dynamic multi-model ensemble interval prediction method for quantifying uncertainty of building cooling load, Energy and Buildings, 318, 114419. https://doi.org/10.1016/j.enbuild.2024.114419.
Chen, Z., Zhang, B., Du, C., Meng, W. & Meng, A. (2024b) A novel dynamic spatio-temporal graph convolutional network for wind speed interval prediction, Energy, 294, 130930. https://doi.org/10.1016/j.energy.2024.130930.
Cui, Z., Sun, Y., Li, Z., Liu, B. & Tian, W. (2024) Traceability analysis of wastewater in coal to ethylene glycol process based on dynamic simulation and deep learning, Journal of Cleaner Production, 443, 141133. https://doi.org/10.1016/j.jclepro.2024.141133.
Dhal, P. & Azad, C. (2022) A comprehensive survey on feature selection in the various fields of machine learning, Applied Intelligence, 52(4), 4543–4581. https://doi.org/10.1007/s10489-021-02550-9.
Dong, J., Wang, Z., Wu, J., Cui, X. & Pei, R. (2024) A novel runoff prediction model based on support vector machine and gate recurrent unit with secondary mode decomposition, Water Resources Management, 38(3), 1655–1674. https://doi.org/10.1007/s11269-024-03748-5.
Duan, J., Hu, C., Zhan, X., Zhou, H., Liao, G. & Shi, T. (2022) MS-SSPCANet: A powerful deep learning framework for tool wear prediction, Robotics and Computer-Integrated Manufacturing, 78, 102391. https://doi.org/10.1016/j.rcim.2022.102391.
Farhi, N., Kohen, E., Mamane, H. & Shavitt, Y. (2021) Prediction of wastewater treatment quality using LSTM neural network, Environmental Technology & Innovation, 23, 101632. https://doi.org/10.1016/j.eti.2021.101632.
Gao, Z., Chen, J., Wang, G., Ren, S., Fang, L., Yinglan, A. & Wang, Q. (2023) A novel multivariate time series prediction of crucial water quality parameters with Long Short-Term Memory (LSTM) networks, Journal of Contaminant Hydrology, 259, 104262. https://doi.org/10.1016/j.jconhyd.2023.104262.
Gao, J., Zhao, X., Li, M., Zhao, M., Wu, R., Guo, R., Liu, Y. & Yin, D. (2024) SMLP4Rec: an Efficient all-MLP architecture for sequential recommendations, ACM Transactions on Information Systems, 42(3), 1–23. https://doi.org/10.1145/3637871.
Gong, H., Li, Y., Zhang, J., Zhang, B. & Wang, X. (2024) A new filter feature selection algorithm for classification task by ensembling Pearson correlation coefficient and mutual information, Engineering Applications of Artificial Intelligence, 131, 107865. https://doi.org/10.1016/j.engappai.2024.107865.
Guo, B., Qiao, Z., Dong, H., Wang, Z., Huang, S., Xu, Z., Wu, F., Huang, C. & Ni, Q. (2024) Temporal convolutional approach with residual multi-head attention mechanism for remaining useful life of manufacturing tools, Engineering Applications of Artificial Intelligence, 128, 107538. https://doi.org/10.1016/j.engappai.2023.107538.
Jana, D. K., Bhunia, P., Adhikary, S. D. & Bej, B. (2022) Optimization of effluents using artificial neural network and support vector regression in detergent industrial wastewater treatment, Cleaner Chemical Engineering, 3, 100039. https://doi.org/10.1016/j.clce.2022.100039.
Joseph, L. P., Deo, R. C., Casillas-Pérez, D., Prasad, R., Raj, N. & Salcedo-Sanz, S. (2024) Short-term wind speed forecasting using an optimized three-phase convolutional neural network fused with bidirectional long short-term memory network model, Applied Energy, 359, 122624. https://doi.org/10.1016/j.apenergy.2024.122624.
Kok, Z. H., Shariff, A. R. M., Alfatni, M. S. M. & Khairunniza-Bejo, S. (2021) Support vector machine in precision agriculture: A review, Computers and Electronics in Agriculture, 191, 106546. https://doi.org/10.1016/j.compag.2021.106546.
Kovacs, D. J., Li, Z., Baetz, B. W., Hong, Y., Donnaz, S., Zhao, X., Ding, H. & Dong, Q. (2022) Membrane fouling prediction and uncertainty analysis using machine learning: A wastewater treatment plant case study, Journal of Membrane Science, 660, 120817. https://doi.org/10.1016/j.memsci.2022.120817.
Li, Z., Li, L., Chen, J. & Wang, D. (2024) A multi-head attention mechanism aided hybrid network for identifying batteries’ state of charge, Energy, 286, 129504. https://doi.org/10.1016/j.energy.2023.129504.
Liu, D., Dong, X., Bian, D. & Zhou, W. (2023) Epileptic seizure prediction using attention augmented convolutional network, International Journal of Neural Systems, 33, 11. https://doi.org/10.1142/S0129065723500545.
Macedo, H. E., Lehner, B., Nicell, J., Grill, G., Li, J., Limtong, A. & Shakya, R. (2022) Distribution and characteristics of wastewater treatment plants within the global river network, Earth System Science Data, 14(2), 559–577. https://doi.org/10.5194/essd-14-559-2022.
Malairajan, S. & Namasivayam, V. (2021) Management of phosphate in domestic wastewater treatment plants, Environmental Biotechnology, 68(4), 69–100. https://doi.org/10.1007/978-3-030-77795-1_3.
Mehrani, M. J., Bagherzadeh, F., Zheng, M., Kowal, P., Sobotka, D. & Mąkinia, J. (2022) Application of a hybrid mechanistic/machine learning model for prediction of nitrous oxide (N2O) production in a nitrifying sequencing batch reactor, Process Safety and Environmental Protection, 162, 1015–1024. https://doi.org/10.1016/j.psep.2022.04.058.
Ni, Q., Cao, X., Tan, C., Peng, W. & Kang, X. (2023) An improved graph convolutional network with feature and temporal attention for multivariate water quality prediction, Environmental Science and Pollution Research, 30(5), 11516–11529. https://doi.org/10.1007/s11356-022-22719-0.
Nourani, V., Zonouz, R. S. & Dini, M. (2023) Estimation of prediction intervals for uncertainty assessment of artificial neural network based wastewater treatment plant effluent modeling, Journal of Water Process Engineering, 55, 104145. https://doi.org/10.1016/j.jwpe.2023.104145.
Onu, M. A., Ayeleru, O. O., Oboirien, B. & Olubambi, P. A. (2023) Challenges of wastewater generation and management in sub-Saharan Africa: A review, Environmental Challenges, 11, 100686. https://doi.org/10.1016/j.envc.2023.100686.
Pang, H., Wu, L., Liu, J., Liu, X. & Liu, K. (2023) Physics-informed neural network approach for heat generation rate estimation of lithium-ion battery under various driving conditions, Journal of Energy Chemistry, 78, 1–12. https://doi.org/10.1016/j.jechem.2022.11.036.
Rajaei, M. & Nazif, S. (2022) Improving wastewater treatment plant performance based on effluent quality, operational costs, and reliability using control strategies for water and sludge lines, Process Safety and Environmental Protection, 167, 398–411. https://doi.org/10.1016/j.psep.2022.09.012.
Rathi, B. S., Kumar, P. S. & Vo, D. V. N. (2021) Critical review on hazardous pollutants in water environment: Occurrence, monitoring, fate, removal technologies and risk assessment, Science of The Total Environment, 797, 149134. https://doi.org/10.1016/j.scitotenv.2021.149134.
Roohi, A. M., Nazif, S. & Ramazi, P. (2024) Tackling data challenges in forecasting effluent characteristics of wastewater treatment plants, Journal of Environmental Management, 354, 120324. https://doi.org/10.1016/j.jenvman.2024.120324.
Safeer, S., Pandey, R. P., Rehman, B., Safdar, T., Ahmad, I., Hasan, S. W. & Ullah, A. (2022) A review of artificial intelligence in water purification and wastewater treatment: Recent advancements, Journal of Water Process Engineering, 49, 102974. https://doi.org/10.1016/j.jwpe.2022.102974.
Saravanan, A., Kumar, P. S., Jeevanantham, S., Karishma, S., Tajsabreen, B., Yaashikaa, P. R. & Reshma, B. (2021) Effective water/wastewater treatment methodologies for toxic pollutants removal: Processes and applications towards sustainable development, Chemosphere, 280, 130595. https://doi.org/10.1016/j.chemosphere.2021.130595.
Sharafi, M., Rezaverdinejad, V., Behmanesh, J. & Samadianfard, S. (2024) Development of long short-term memory along with differential optimization and neural networks for coagulant dosage prediction in water treatment plant, Journal of Water Process Engineering, 65, 105784. https://doi.org/10.1016/j.jwpe.2024.105784.
Shen, Z., Fan, X., Zhang, L. & Yu, H. (2022) Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network, Ocean Engineering, 254, 111352. https://doi.org/10.1016/j.oceaneng.2022.111352.
Sun, J., Xu, Y., Nairat, S., Zhou, J. & He, Z. (2023) Prediction of biogas production in anaerobic digestion of a full-scale wastewater treatment plant using ensembled machine learning models, Water Environment Research, 95(6), e10893. https://doi.org/10.1002/wer.10893
Sun, Y., Zhou, Q., Sun, L., Sun, L., Kang, J. & Li, H. (2024) CNN–LSTM–AM: A power prediction model for offshore wind turbines, Ocean Engineering, 301, 117598. https://doi.org/10.1016/j.oceaneng.2024.117598
Tan, R., Wang, Z., Wu, T. & Wu, J. (2023) A data-driven model for water quality prediction in Tai Lake, China, using secondary modal decomposition with multidimensional external features, Journal of Hydrology: Regional Studies, 47, 101435. https://doi.org/10.1016/j.ejrh.2023.101435
Wang, M. & Ying, F. (2023) Point and interval prediction for significant wave height based on LSTM-GRU and KDE, Ocean Engineering, 289, 116247. https://doi.org/10.1016/j.oceaneng.2023.116247
Wang, R., Yu, Y., Chen, Y., Pan, Z., Li, X., Tan, Z. & Zhang, J. (2022) Model construction and application for effluent prediction in wastewater treatment plant: Data processing method optimization and process parameters integration, Journal of Environmental Management, 302, 114020. https://doi.org/10.1016/j.jenvman.2021.114020
Wang, Z., Wang, Q. & Wu, T. (2023) A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM, Frontiers of Environmental Science & Engineering, 17(7), 88. https://doi.org/10.1007/s11783-023-1688-y
Wang, Y., Xie, W., Liu, C., Luo, J., Qiu, Z. & Deconinck, G. (2024a) Forecast of coal consumption in salt lake enterprises based on temporal gated recurrent unit network with squeeze-and-excitation attention, Energy, 299, 131405. https://doi.org/10.1016/j.energy.2024.131405
Wang, Z., Xu, N., Bao, X., Wu, J. & Cui, X. (2024b) Spatio-temporal deep learning model for accurate streamflow prediction with multi-source data fusion, Environmental Modelling & Software, 178, 106091. https://doi.org/10.1016/j.envsoft.2024.106091
Wu, F., He, J., Cai, L., Du, M. & Huang, M. (2023a) Accurate multi-objective prediction of CO2 emission performance indexes and industrial structure optimization using multihead attention-based convolutional neural network, Journal of Environmental Management, 337, 117759. https://doi.org/10.1016/j.jenvman.2023.117759
Wu, J., Wang, Z., Dong, J., Cui, X., Tao, S. & Chen, X. (2023b) Robust runoff prediction with explainable artificial intelligence and meteorological variables from deep learning ensemble model, Water Resources Research, 59(9), e2023WR035676. https://doi.org/10.1029/2023WR035676
Xia, H., Wang, Y., Zhang, J. Z., Zheng, L. J., Kamal, M. M. & Arya, V. (2023) COVID-19 fake news detection: A hybrid CNN-BiLSTM-AM model, Technological Forecasting and Social Change, 195, 122746. https://doi.org/10.1016/j.techfore.2023.122746
Xu, Y., Zeng, X., Bernard, S. & He, Z. (2022) Data-driven prediction of neutralizer pH and valve position towards precise control of chemical dosage in a wastewater treatment plant, Journal of Cleaner Production, 348, 131360. https://doi.org/10.1016/j.jclepro.2022.131360
Xu, Y., Kohtz, S., Boakye, J., Gardoni, P. & Wang, P. (2023) Physics-informed machine learning for reliability and systems safety applications: State of the art and challenges, Reliability Engineering & System Safety, 230, 108900. https://doi.org/10.1016/j.ress.2022.108900
Xu, G., Yin, J., Zhang, S. & Gong, M. (2024) MLP-AIR: An effective MLP-based module for actor interaction relation learning in group activity recognition, Knowledge-Based Systems, 304, 112453. https://doi.org/10.1016/j.knosys.2024.112453
Yang, H., Li, X., Li, X., Wang, X., Ma, H. & Zheng, X. (2024) The fate of antibiotic resistance genes and their correlation with microbial communities and wastewater quality/parameters in a wastewater treatment plant under different seasons, Journal of Water Process Engineering, 60, 105156. https://doi.org/10.1016/j.jwpe.2024.105156
Ye, L., Wang, Z., Liu, Y., Chen, P., Li, H., Zhang, H., Wu, M., He, W., Shen, L., Zhang, Y., Tan, Z., Wang, Y. & Huang, R. (2021) The challenges and emerging technologies for low-power artificial intelligence IoT systems, IEEE Transactions on Circuits and Systems I: Regular Papers, 68(12), 4821–4834. https://doi.org/10.1109/TCSI.2021.3095622
Zhang, Y., Zhou, T., Huang, X., Cao, L. & Zhou, Q. (2021) Fault diagnosis of rotating machinery based on recurrent neural networks, Measurement, 171, 108774. https://doi.org/10.1016/j.measurement.2020.108774
Zhou, G., Guo, Z., Sun, S. & Jin, Q. (2023) A CNN-BiGRU-AM neural network for AI applications in shale oil production prediction, Applied Energy, 344, 121249. https://doi.org/10.1016/j.apenergy.2023.121249
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data