ABSTRACT
This study proposes a novel approach for predicting variations in water quality at wastewater treatment plants (WWTPs), which is crucial for optimizing process management and pollution control. The model combines convolutional bi-directional gated recurrent units (CBGRUs) with adaptive bandwidth kernel function density estimation (ABKDE) to address the challenge of multivariate time series interval prediction of WWTP water quality. Initially, wavelet transform (WT) was employed to smooth the water quality data, reducing noise and fluctuations. Linear correlation coefficient (CC) and non-linear mutual information (MI) techniques were then utilized to select input variables. The CBGRU model was applied to capture temporal correlations in the time series, integrating the multi-head attention (MHA) mechanism to enhance the model's ability to comprehend complex relationships within the data. ABKDE was employed, supplemented by the bootstrap method, to establish upper and lower bounds of the prediction intervals. Ablation experiments and comparative analyses with benchmark models confirmed the superior performance of the model in point prediction, interval prediction, the analysis of forecast period, and fluctuation detection for water quality data. Also, this study verifies the model's broad applicability and robustness to anomalous data. These findings contribute to improved effluent treatment efficiency and water quality control in WWTPs.
HIGHLIGHTS
Proposing a new hybrid interval prediction model for water quality in WWTPs.
Identifying multivariate input variables using linear CC and non-linear MI techniques.
Reducing noise in water quality data and revealing data trends using wavelet transforms.
Using GSS and bootstrap methods to determine the optimal bandwidth and the upper and lower interval bounds.
Validating model applicability through predictions of peaks, mutations, and different forecast periods.
ABBREVIATIONS
- ABKDE
adaptive bandwidth kernel function density estimation
- AI
artificial intelligence
- ANN
artificial neural network
- AM
attention mechanism
- BPNN
back propagation neural network
- BiGRU
Bi-directional Gated Recurrent Unit
- R²
coefficient of determination
- CRPS
continuous ranked probability score
- CBGRU
Convolutional Bi-directional Gated Recurrent Unit
- CNN
convolutional neural network
- CC
correlation coefficient
- CWC
coverage width-based criterion
- DL
deep learning
- GRU
gated recurrent unit
- GSS
Golden Section Search
- KNN
k-nearest neighbor
- LSTM
long short-term memory
- ML
machine learning
- MAE
mean absolute error
- MAPE
mean absolute percentage error
- MHA
Multi-Head Attention
- MI
mutual information
- PICP
PI Coverage Probability
- PINAW
PI Normalized Averaged Width
- PP
point prediction
- PI
Prediction Interval
- RF
random forest
- RMSE
root mean square error
- RNN
recurrent neural network
- SVM
support vector machine
- TCN
temporal convolutional network
- WWTP
wastewater treatment plant
- WT
wavelet transform
INTRODUCTION
As urbanization accelerates and populations surge alongside increased industrial and agricultural activities, wastewater treatment emerges as a pivotal endeavor within the realm of environmental concerns. The effective treatment of wastewater is not only crucial for human health and quality of life, but also directly impacts the maintenance of ecosystems (Onu et al. 2023). The wastewater treated in wastewater treatment plants (WWTPs) contains a significant amount of organic substances, heavy metals, micro-organisms, and other harmful components (Saravanan et al. 2021). If not managed scientifically and effectively, the direct discharge of these pollutants into water bodies can severely jeopardize water quality and ecological balance, potentially endangering human health (Rathi et al. 2021). Therefore, to optimize and manage wastewater treatment, it is essential to research and forecast water quality indicators in WWTPs (Aghdam et al. 2023).
The construction of WWTPs has been a global phenomenon for some time now, but the accurate and scientific forecasting of wastewater still faces several challenges (Roohi et al. 2024). WWTPs involve intricate biological and chemical processes, leading to a high degree of variability and classification of discharges. This complexity stems from the diverse sources of wastewater, including industrial effluents, domestic discharges, as well as commercial and public waste streams. As a result, the operation of WWTPs becomes inherently complex due to the varied compositions, pH levels, and flow rates encountered (Yang et al. 2024). Second, water quality characteristics and treatment needs vary significantly from region to region. Industrial cities typically have a higher proportion of industrial wastewater compared to other cities, while habitable cities may mainly deal with household-generated wastewater (Macedo et al. 2022). Third, climate change and natural disasters introduce uncertainties to wastewater systems. Elevated temperatures can decrease the efficiency of microbial organic matter treatment, thereby affecting wastewater treatment effectiveness negatively. Additionally, water pollution events like eutrophication and hydrophobization near WWTPs can further impact treatment processes and the ability to predict water quality outcomes (Malairajan & Namasivayam 2021).
Historically, mathematical models and linear regression techniques have been the predominant forecasting tools for water quality data in WWTPs. These models, such as statistical and fuzzy models, are used to simulate and understand the parametric relationships among different variables during the operational stages of wastewater treatment (Tan et al. 2023; Wang et al. 2023; Bankole et al. 2024).
Real-time monitoring and control of key treatment parameters can be achieved through computer modeling. By modifying treatment processes, controls, operations, and chemical dosing at each stage of the process, wastewater treatment constraints can be met to achieve optimal treatment with minimal resources and costs. For instance, Wang et al. (2022) developed a prediction model for chemical oxygen demand (COD) in WWTP effluent using statistical correlation methods, while Abouzari et al. (2021) utilized linear and non-linear statistical models to estimate COD in WWTPs. However, relying solely on these traditional methods can be time-consuming, involve lengthy processes, and fail to capture the overall non-linear relationships and complex dynamics present in water treatment processes. Additionally, these methods may struggle to adapt to dynamic treatment requirements (Safeer et al. 2022). Moreover, traditional methodologies often simplify processes based on unrealistic assumptions and idealized conditions (Sharafi et al. 2024), highlighting the need for more flexible and adaptable approaches (Wu et al. 2023b).
Artificial intelligence (AI) is a prominent and rapidly advancing technology in various fields (Ye et al. 2021). Recent studies have delved into the behavior of WWTPs in meeting effluent quality standards by employing AI techniques (Rajaei & Nazif 2022). AI methods, particularly machine learning (ML), include deep learning (DL), which specializes in modeling real-time issues with high complexities (Xu et al. 2023). DL techniques consist of artificial neural networks (ANNs) (Zhang et al. 2021), long short-term memory (LSTM) networks (Joseph et al. 2024), among others. ML approaches encompass support vector machine (SVM), K-nearest neighbor (KNN) (Xu et al. 2022), and random forest (RF) (Kok et al. 2021). Furthermore, combining multiple AI methods and utilizing optimization techniques like Genetic Algorithms and Gradient Descent (Jana et al. 2022; Kovacs et al. 2022) can yield improved outcomes. The modeling capabilities of AI offer significant benefits in predicting water quality and optimizing wastewater treatment processes. They enable autonomous analysis, evaluation, and prediction based on input water quality data, optimizing system variables, and issuing alerts to adjust outputs accordingly. This not only reduces human error but also enhances productivity (Safeer et al. 2022; Gao et al. 2023).
In the realm of AI-based modeling for WWTPs, the predominant focus often centers on point prediction (PP) of samples (Farhi et al. 2021). PP involves a model that maps one or a set of inputs to a singular output value. However, these PP methods frequently overlook the inherent uncertainty associated with modeling various outputs. Among the plethora of techniques available for quantifying uncertainty, prediction interval (PI) stands out as one of the most effective methods for exploring uncertainty (Nourani et al. 2023). The PI consists of upper and lower bounds that define the range in which the uncertainty prediction is situated. This uncertainty prediction is calculated by an independent predictor, which leverages dependent targets within a specified range to determine a certain level of confidence (Chen et al. 2024a). The predominant methods utilized for predicting WWTP intervals predominantly rely on ANNs (Bagherzadeh et al. 2021; Mehrani et al. 2022; Nourani et al. 2023; Sun et al. 2023). Nevertheless, multilayer perceptron (MLP) does exhibit certain limitations (Pang et al. 2023). For instance, the efficacy of MLP in handling data is heavily reliant on data representation and feature engineering (Gao et al. 2024). MLP is susceptible to overfitting and a decline in generalization performance to data (Xu et al. 2024). Furthermore, MLPs are not specifically tailored to handle sequential data and may not perform as adeptly as models like the gated recurrent unit (GRU) on time series data (Yao et al. 2023; Chen et al. 2024b; Cui et al. 2024; Dong et al. 2024; Wang et al. 2024a, b).
In the wastewater treatment process, the quality of effluent water is one of the key indicators for assessing treatment effectiveness and environmental compliance. Particularly, pH is a critical chemical parameter in effluent water that directly impacts the health of ecosystems and the legal discharge standards. Predicting the pH value of effluent water holds significant importance for the following reasons:
(1) Process control and optimization. pH is an important indicator of water stability and allows stability monitoring. In wastewater treatment, maintaining pH within a desirable range helps to ensure the effectiveness of biochemical treatment processes, such as activated sludge, which are highly pH sensitive. By predicting pH trends, the amount of chemicals to be added, the amount of aeration and other operating parameters in the treatment process can be adjusted in advance to optimize the overall treatment process, reduce operating costs and improve treatment efficiency.
(2) Environmental protection. Although the acceptable range is 6–9, extreme fluctuations in pH can have an impact on downstream water bodies, especially if discharges deviate from this range, and can disrupt the ecological balance of the receiving water body. Therefore, accurate prediction of pH can be used as an early warning mechanism for potential environmental risks.
(3) Data-driven decision support. By continuously monitoring and predicting pH, valuable data can be collected that can be used to analyze long-term trends in the treatment process and guide future plant upgrades and technology choices.
Therefore, this study aims to enhance the management level and effluent safety of WWTPs by accurately predicting the pH value of effluent water. This prediction is of significant practical importance for achieving process control and optimization, environmental protection, and data-driven decision support.
In order to address the challenge of inadequate prediction of multivariate water quality data in WWTPs and the presence of correlations among multiple wastewater indicators that traditional prediction models struggle to capture (Ni et al. 2023), this paper introduces a novel approach. The proposed method, a multivariate adaptive bandwidth kernel density interval prediction based on CBGRU, aims to predict water quality data in WWTPs. The model is evaluated using real water quality data and compared against other multivariate time series forecasting models. Results indicate that the model performs exceptionally well in both point and interval predictions of water quality data in WWTPs. This study offers several key contributions:
(1) A feature attention mechanism (AM) based on MHA is developed to enhance the identification of potential correlations among various WWTP indicators, improving the robustness of multivariate time series prediction.
(2) The model incorporates convolutional neural network (CNN) and Bi-directional Gated Recurrent Unit (BiGRU) layers to effectively capture local and global features.
(3) Interval prediction is achieved through ABKDE, utilizing regression models to establish upper and lower bounds, with model parameters updated using stochastic gradient descent.
(4) Experimental results demonstrate the model's capability to predict water quality data with peaks and significant fluctuations, showcasing excellent performance in both point and interval prediction of water quality data in WWTPs. Additionally, the model exhibits remarkable suitability and robustness, further enhancing its utility and reliability in real-world applications.
While the effluent quality of WWTPs is primarily determined by influent water quality and process parameters, in practical operation, historical data of effluent quality also holds significant value. Through real-time monitoring and historical data of effluent quality, we can forecast changes in effluent pH trends in advance, which is crucial for short-term control at WWTPs. Compared to complex multivariate modeling, predicting based on effluent data can simplify the modeling complexity of the treatment process and to some extent capture the lagged response of the treatment system. Therefore, the proposed prediction model based on effluent pH data in this study aims to provide a simplified yet effective early warning mechanism for WWTPs, enhancing flexibility in effluent water quality management.
The subsequent steps in this study are outlined as follows: Section 2 provides a detailed explanation of the main models used in the study, as well as the evaluation metrics employed. Section 3 analyses the data and models used in the study, presenting and interpreting the results. Section 4 includes ablation experiments, as well as experiments comparing PP and interval prediction models. It also covers the analysis of forecast period, special data handling, suitability, and robustness. Finally, Section 5 summarizes the key findings and issues discussed in the study.
METHODS
Feature selection
Modeling inefficiencies in water quality data parameters of WWTPs stem from the presence of multiple candidate inputs and outputs. Using every available variable other than the target output as an input can introduce noise (Alvi et al. 2023). Filtering and optimizing these inputs and outputs is essential to identify the most effective sets and combinations, thereby reducing irrelevance and alleviating high computational demands.
The Pearson CC between two variables $x$ and $y$ is computed as
$$ r = \frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}} $$
In the formula, $n$ represents the number of data points, and $\bar{x}$ and $\bar{y}$ represent the mean values of the x and y samples, respectively. The closer the CC is to 1 or −1, the stronger the linear relationship between the two variables; the closer it is to 0, the weaker the linear relationship. A CC above 0 indicates a positive correlation, while a CC below 0 indicates a negative correlation.
The selection of predictors was based on the evaluation of Pearson CC and MI measures between the candidate predictors and the target variables. The predictor exhibiting the highest MI and CC values was identified as the primary predictor for modeling purposes. While CC is commonly used in selecting data for linear regression algorithms, MI quantifies the information one random variable conveys about another (Gong et al. 2024). Therefore, in the context of WWTP effluent data parameters, the CC and MI methodology was utilized to establish the relationship (Nourani et al. 2023).
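As an illustration of this selection step, a minimal Python sketch is given below. The combined CC + MI ranking rule, the function name, and the use of scikit-learn's mutual_info_regression are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def rank_predictors(X, y, feature_names):
    """Rank candidate predictors by absolute Pearson CC and MI with the target.
    A sketch of the selection step; thresholds and tie-breaking are not
    specified in the text and would need to be chosen by the modeller."""
    cc = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    mi = mutual_info_regression(X, y, random_state=0)
    order = np.argsort(-(np.abs(cc) + mi))   # simple combined ranking (assumption)
    return [(feature_names[j], round(cc[j], 3), round(mi[j], 3)) for j in order]

# Hypothetical usage: X holds the candidate WWTP variables, y the target (pH)
# ranking = rank_predictors(X_train, y_train, candidate_names)
```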
Wavelet transform
The continuous WT of a signal is defined as
$$ W(a,b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t $$
In the WT formula, $W(a,b)$ represents the coefficients obtained through WT, $x(t)$ denotes the original signal, $\psi$ signifies the wavelet function, $\psi^{*}$ denotes the conjugate of the wavelet function, $a$ stands for the scale parameter, and $b$ represents the translation parameter. By experimenting with different wavelet functions and parameters, the optimal combination was determined.
Min–Max normalization is applied as
$$ Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}} $$
In the Min–Max normalization formula, $Y$ represents the processed data, $X$ denotes the original data, and $X_{\min}$ and $X_{\max}$ denote the minimum and maximum values of $X$, respectively.
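A minimal Python sketch of the smoothing and normalization steps is shown below, assuming the PyWavelets library. It uses a plain discrete wavelet decomposition with soft thresholding as a stand-in for the wavelet packet decomposition used later in this paper (Daubechies-10 basis, level 6); the thresholding rule is an assumption.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db10", level=6):
    """Wavelet-based smoothing sketch: DWT + soft thresholding + reconstruction."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # universal threshold estimated from the finest-scale detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def min_max_normalize(x):
    """Min-Max normalization as defined above: Y = (X - X_min) / (X_max - X_min)."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical usage on the raw pH series before model training:
# ph_smooth = min_max_normalize(wavelet_denoise(ph_raw))
```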
CBGRU
The CBGRU model, which combines CNN and BiGRU, is primarily utilized for predicting multivariate time series data. In this study, the CBGRU model employs a CNN to extract features from the time series data of various variables. Subsequently, these features are inputted into a BiGRU for sequence modeling. Finally, the model utilizes a fully connected layer to generate the ultimate prediction.
CNNs are influenced by visual neuroscience and are mainly composed of convolutional and pooling layers. The convolution layer captures local features from the input data while maintaining spatial relationships. In contrast, the pooling layer decreases the dimensionality of hidden layers using methods like maximal or average pooling, which helps reduce computational complexity and introduce rotational invariance. The architecture of a CNN is illustrated in Supplementary material, Figure S1.
A pooling layer acts as a downsampling mechanism, combining the outputs of a cluster of neurons from a previous layer into a single neuron in a lower layer. This pooling process takes place after the non-linear activation function. Its main purposes are to decrease the parameter count to avoid overfitting and to act as a filter to remove unwanted noise (Duan et al. 2022).
BiGRU is a neural network structure used for processing sequential data, serving as a variation of the GRU model. The GRU itself is a type of recurrent neural network (RNN) specifically designed to combat the issue of vanishing or exploding gradients often faced by traditional RNNs when dealing with long sequences of data. While a standard GRU can only process data in one direction, BiGRU is unique in that it processes information simultaneously from both the front to the back and the back to the front. This dual-directional processing capability makes BiGRU well-suited for tasks where the temporal context of the data is crucial. The architectural layout of BiGRU is illustrated in Supplementary material, Figure S2.
The BiGRU computation follows the standard GRU update-gate, reset-gate, and candidate-state equations, applied in both the forward and backward directions, with the two directional hidden states combined at each time step.
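For reference, a standard formulation of the GRU gates and of the bi-directional combination is given below; the notation is assumed here rather than taken from the original equations.
$$\begin{aligned} z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)\\ r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)\\ \tilde{h}_t &= \tanh\left(W_h x_t + U_h\left(r_t \odot h_{t-1}\right) + b_h\right)\\ h_t &= \left(1-z_t\right)\odot h_{t-1} + z_t \odot \tilde{h}_t\\ h_t^{\mathrm{BiGRU}} &= \left[\overrightarrow{h}_t;\ \overleftarrow{h}_t\right] \end{aligned}$$
where $z_t$ and $r_t$ are the update and reset gates, $\tilde{h}_t$ is the candidate hidden state, $\odot$ denotes element-wise multiplication, and $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden states.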
CNNs play a crucial role in reducing the need for manual feature engineering by autonomously learning and extracting features from data, especially beneficial for tasks with intricate data patterns. The multi-layer architecture of CNNs enables them to progressively grasp data features from low to high levels, with each convolutional layer uncovering varying levels of information to effectively identify complex patterns. This characteristic makes CNNs highly efficient for processing spatially or temporally correlated data like images and sequences.
Despite feature selection and normalization efforts, intricate patterns and relationships may still persist within the data. CNNs excel in refining these features through an automated feature learning process, enhancing their utility for predictive tasks. By optimizing hyperparameters such as the number of convolutional layers, convolutional kernel size, and step size, the model configuration can be fine-tuned to enhance prediction accuracy and generalization capabilities.
The utilization of BiGRU models proves particularly valuable when working with time series data. BiGRU models can effectively capture both long-term and short-term dependencies within time series data by integrating past and future information through their bi-directional structure, thereby enhancing the comprehension of the current state and forecasting accuracy. This model is particularly well-suited for forecasting tasks that encounter challenges like instrument delays and irregular sampling rates, as it can unveil intricate patterns and dependencies that traditional statistical methods might overlook.
In conclusion, CNNs and BiGRU models are uniquely positioned to capture and integrate complex relationships within time series data, proving highly effective in understanding and predicting data with intricate dependencies.
Multi-head AM
In order to improve the interactions between the multiple water quality metrics of WWTPs, a feature AM based on an MHA mechanism is used. The MHA mechanism is a powerful sequence processing technique that can simultaneously focus on multiple positions within a sequence, allowing complex relationships among different segments to be examined. This mechanism increases the model's ability to detect important features, especially when dealing with long sequences, leading to a more comprehensive understanding of the dynamics.
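To make the architecture concrete, a minimal PyTorch sketch of a CNN → BiGRU → MHA regressor is given below. This is an illustration only: the layer sizes, kernel size, pooling, number of heads, and the use of the last time step for the output are assumptions, not the configuration reported in Supplementary material, Table S2.

```python
import torch
import torch.nn as nn

class CBGRU_MHA(nn.Module):
    """Illustrative CNN + BiGRU + multi-head attention regressor (a sketch,
    not the authors' exact architecture; hyperparameters are assumed)."""
    def __init__(self, n_features=7, conv_channels=32, hidden=64, n_heads=4):
        super().__init__()
        # 1-D convolution over the time axis extracts local temporal features
        self.conv = nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=2)
        # BiGRU captures forward and backward temporal dependencies
        self.bigru = nn.GRU(conv_channels, hidden, batch_first=True, bidirectional=True)
        # Multi-head attention re-weights the BiGRU outputs across time steps
        self.mha = nn.MultiheadAttention(embed_dim=2 * hidden, num_heads=n_heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, x):            # x: (batch, time, features)
        h = x.transpose(1, 2)        # -> (batch, features, time) for Conv1d
        h = self.pool(self.relu(self.conv(h)))
        h = h.transpose(1, 2)        # -> (batch, time', channels)
        h, _ = self.bigru(h)         # -> (batch, time', 2*hidden)
        h, _ = self.mha(h, h, h)     # self-attention over time steps
        return self.fc(h[:, -1, :])  # predict from the last time step

# Example: a batch of 16 windows, 12 time steps, 7 input variables
model = CBGRU_MHA()
y_hat = model(torch.randn(16, 12, 7))   # -> shape (16, 1)
```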
ABKDE
Water quality data from WWTPs typically consists of time series observations on various parameters related to the wastewater treatment process. This data often reveals complex interrelationships between variables, as well as seasonal or cyclical patterns and potential outliers. The ABKDE method is employed to derive the probability density function from the data, allowing for a better understanding of the distribution's characteristics and uncovering potential relationships between variables.
Two interior points $x_1$ and $x_2$ are placed within the search interval $[a, b]$ according to the golden ratio, and the cost functions for evaluating these two points are $f(x_1)$ and $f(x_2)$. The interval $[a, b]$ is adjusted according to the comparison of $f(x_1)$ and $f(x_2)$: if $f(x_1) < f(x_2)$, update $b = x_2$; otherwise, update $a = x_1$. When $|b - a|$ is less than a certain tolerance, the iteration stops. The optimal bandwidth is determined by the final interval between $a$ and $b$, which contains the minimum value of the objective function.
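A minimal sketch of the GSS step is given below, assuming the objective is a scalar cost function of the bandwidth (for example, a cross-validation or negative log-likelihood score; the exact objective and search range are assumptions).

```python
import numpy as np

def golden_section_search(cost, a, b, tol=1e-4):
    """Minimal golden-section search for a unimodal cost function of the bandwidth."""
    phi = (np.sqrt(5) - 1) / 2          # golden ratio factor ~0.618
    x1 = b - phi * (b - a)
    x2 = a + phi * (b - a)
    f1, f2 = cost(x1), cost(x2)
    while abs(b - a) > tol:
        if f1 < f2:                     # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = cost(x1)
        else:                           # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = cost(x2)
    return (a + b) / 2                  # bandwidth at the midpoint of the final interval

# Hypothetical usage: minimise an assumed KDE cost over a bandwidth range of [0.01, 1.0]
# h_opt = golden_section_search(lambda h: kde_cost(h, residuals), 0.01, 1.0)
```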
The upper and lower bounds of the interval are determined using the bootstrap method. Multiple datasets are generated by randomly sampling with replacement. The number of samples is determined by the Poisson distribution. Each bootstrap sample set undergoes local bandwidth optimization and kernel density estimation to calculate the probability density function. This process is repeated multiple times to obtain probability density functions for various bootstrap sample sets. Quantile values are calculated for each sample point based on a certain confidence level, using all bootstrap sample sets. The resulting upper and lower bounds of the interval provide information on confidence intervals, allowing for statistical inference of the estimated probability density function.
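The following Python sketch illustrates the bootstrap construction of the bounds under simplifying assumptions: a fixed resample size instead of Poisson-distributed sample counts, and a single global Silverman bandwidth per resample instead of the adaptive local bandwidths described above.

```python
import numpy as np

def bootstrap_kde_interval(residuals, y_point, n_boot=200, alpha=0.05, seed=0):
    """Sketch of bootstrap-based prediction-interval bounds around point
    predictions, using a Gaussian KDE of the residuals; parameter values are assumed."""
    rng = np.random.default_rng(seed)
    n = len(residuals)
    lower_q, upper_q = [], []
    for _ in range(n_boot):
        # resample residuals with replacement (one bootstrap sample set)
        sample = rng.choice(residuals, size=n, replace=True)
        # Silverman-style global bandwidth for this bootstrap sample (for brevity)
        h = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
        # draw from the smoothed (KDE) distribution: resample + Gaussian jitter
        smoothed = rng.choice(sample, size=n, replace=True) + rng.normal(0.0, h, n)
        lower_q.append(np.quantile(smoothed, alpha / 2))
        upper_q.append(np.quantile(smoothed, 1 - alpha / 2))
    lo, hi = np.mean(lower_q), np.mean(upper_q)
    return y_point + lo, y_point + hi   # interval bounds added to the point predictions

# Hypothetical usage with model residuals from a validation set:
# lower, upper = bootstrap_kde_interval(val_residuals, y_pred_test, alpha=0.05)
```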
Model performance indicators
The MAE is a non-negative value, with a smaller MAE indicating a better model. RMSE indicates how much error is introduced into the model predictions and is more sensitive to larger error values; again, the smaller the RMSE, the better. MAPE is more useful for assessing percentage error, and a smaller MAPE indicates a better model. R² measures how well the model explains the variance of the target variable, with values closer to 1 indicating that the model explains the variable better. If R² < 0, the model is not as good as the baseline model, and it is likely that there is no linear relationship in the data. The advantage of R² is that it makes the gaps between models easier to see.
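For reference, the standard definitions of these point-prediction metrics, with $y_i$ the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observations, and $n$ the number of samples, are:
$$\begin{aligned} \mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}, \\ \mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|, \qquad R^{2} = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}} \end{aligned}$$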
Commonly used assessment metrics when evaluating interval predictions are PI Coverage Probability (PICP), PI Normalized Averaged Width (PINAW), coverage width-based criterion (CWC), and continuous ranked probability score (CRPS).
η and μ are the parameters used to determine the penalty levels. μ is determined by the confidence level; for example, if the confidence level is 90%, then μ = 0.9. The difference between PICP and μ is amplified, and the magnitude of this amplification is controlled by η, as defined by CWC. The smaller η, the smaller the penalty for PICP not meeting the confidence level and the lower its importance. If PICP reaches the confidence level, then CWC = PINAW.
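A commonly used formulation of these interval metrics is given below; several variants exist in the literature, so the exact form adopted here is an assumption:
$$\mathrm{PICP} = \frac{1}{n}\sum_{i=1}^{n} c_i,\qquad c_i=\begin{cases}1, & L_i \le y_i \le U_i\\ 0, & \text{otherwise}\end{cases}$$
$$\mathrm{PINAW} = \frac{1}{nR}\sum_{i=1}^{n}\left(U_i - L_i\right),\qquad \mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma(\mathrm{PICP})\,e^{-\eta\left(\mathrm{PICP}-\mu\right)}\right)$$
where $L_i$ and $U_i$ are the lower and upper bounds of the PI for sample $i$, $R$ is the range of the observations, and $\gamma(\mathrm{PICP}) = 1$ if $\mathrm{PICP} < \mu$ and 0 otherwise.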
Model framework
(1) Data preparation and pre-processing, which involves dataset preparation from two WWTPs, feature selection, and data pre-processing.
(2) Constructing the network layer, where the network architecture is assembled with components such as CNN, BiGRU, and MHA.
(3) Interval prediction, where the merged representations are fed into a fully connected layer or suitable layer for interval prediction using ABKDE. The optimal bandwidth is determined using the GSS method, and the upper and lower bounds of the interval prediction are obtained through the bootstrap method.
(4) Model Prediction and Evaluation, which includes predicting and evaluating results on two WWTP test datasets, with additional post-processing of results as needed.
The CNN–BiGRU–MHA model offers precise point predictions that allow managers to monitor water quality conditions in real time and make prompt adjustments. On the other hand, the ABKDE model provides interval predictions of water quality metrics, helping decision-makers assess uncertainty and make more robust treatment plans. For example, in day-to-day operations, managers could use the CNN–BiGRU–MHA model's point predictions to quickly adjust the treatment process while also relying on the ABKDE model's interval predictions to prepare for potential future changes in water quality. This approach will help demonstrate the overall value of the models in real-world applications.
DATA AND RESULTS
Data description
Two datasets are utilized in this study, one collected from a WWTP in Shanghai, China and the other from a WWTP in Zhejiang, China. These datasets consist of water quality information related to the effluent process of the WWTPs. The geographical details are presented in Supplementary material, Figure S4. To simplify discussions, these datasets are denoted as Dataset A and Dataset B, covering the time period from 8 November 2020 to 31 December 2023. Measurements were taken at 4-h intervals, with daily sampling times set at 0:00, 4:00, 8:00, 12:00, 16:00, and 20:00, each representing a single data point. However, certain time points have missing data. The raw data collected includes parameters such as monitoring time, water temperature, pH, pH category, dissolved oxygen, dissolved oxygen category, permanganate index, permanganate category, ammonia nitrogen, ammonia nitrogen category, total phosphorus, total nitrogen, conductivity, turbidity, chlorophyll and algal density. Notably, the pH category, dissolved oxygen category, permanganate category, ammonia nitrogen category, chlorophyll and algal density exhibit a substantial amount of missing values, rendering them unsuitable for modeling purposes. The remaining parameters need to be analyzed, with monitoring time serving as a time series variable. Supplementary material, Figure S5 depicts the raw data trends for the target predictor pH in this study. Dataset A and Dataset B contain 6,365 and 6,207 raw data entries, respectively.
Upon analysis of the raw data obtained from the two WWTPs, it was noted that there are missing values and anomalies present. Specifically, negative numbers were found in data related to parameters such as permanganate index, ammonia, total phosphorus, total nitrogen, conductivity, and turbidity, which is clearly illogical. Utilizing such noisy raw data in a predictive model can potentially compromise the accuracy of the data-driven model and lead to significant prediction errors. Therefore, it is crucial to address these issues before proceeding with the modeling process (Wang & Ying 2023). To handle missing values, linear interpolation methods are utilized. Since abnormal data occurrences are limited, the most effective approach is to directly remove outliers (Shen et al. 2022). Supplementary material, Table S1 displays the attributes of the processed Datasets A and B, including maximum, minimum, mean, and standard deviation.
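A minimal pandas sketch of this cleaning step is shown below; the file name and column names are placeholders, not the actual field names of Datasets A and B.

```python
import pandas as pd

# Sketch of the pre-processing described above (file and column names are assumptions)
df = pd.read_csv("dataset_A.csv", parse_dates=["monitoring_time"])
df = df.set_index("monitoring_time").sort_index()

numeric_cols = ["water_temperature", "pH", "dissolved_oxygen", "permanganate_index",
                "ammonia_nitrogen", "total_phosphorus", "total_nitrogen",
                "conductivity", "turbidity"]

# Rows containing physically impossible negative readings are removed outright
mask_negative = (df[numeric_cols] < 0).any(axis=1)
df = df[~mask_negative]

# Remaining missing values are filled by linear interpolation along the time index
df[numeric_cols] = df[numeric_cols].interpolate(method="linear")
```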
In practical operations, the quality of effluent from WWTPs serves as a critical indicator of system performance and environmental compliance. The reasons for selecting effluent quality data are as follows:
(1) Real-time monitoring and adaptive management: In an operational environment, WWTPs can utilize real-time monitoring data of effluent quality to predict future changes based on historical trends. This predictive approach provides a foundation for dynamically adjusting treatment parameters, enabling early warnings of fluctuations in effluent quality, particularly in the face of uncertain external factors such as variations in environmental temperature or sudden increases in load.
(2) Simplifying complexity: Although influent information and treatment variables significantly impact effluent quality, the high complexity and uncertainty of the treatment process (e.g., variations in aeration rates, changes in microbial activity) make it very challenging to construct a comprehensive model that encompasses all variables. In contrast, using historical data of effluent quality for predictions mitigates this complexity and allows for quick and effective short-term warnings, aiding operators in making real-time adjustments.
While effluent quality is heavily influenced by influent information and treatment variables, it also exhibits a certain degree of autocorrelation, particularly in short-term predictions. This autocorrelation may manifest in the lagged effects of treatment facilities, where changes in effluent do not occur instantaneously but rather as cumulative responses to prior operations. The methods employed in this study leverage historical effluent pH data to capture these change trends and provide a reliable basis for short-term predictions. Our model utilizes DL techniques, particularly sequence models like BiGRU, to identify underlying patterns and long-term dependencies in time series data, enhancing the recognition and utilization of autocorrelation.
Experimental results demonstrate that the prediction framework based on effluent pH data performs well in short-term forecasts, especially regarding pH trends over the next few time steps. This study constructs a pH prediction model centered on CNN–BiGRU–MHA–ABKDE using historical effluent data. Although effluent quality is significantly influenced by influent conditions and treatment parameters, the effluent data itself exhibits a degree of autocorrelation in the time series. Leveraging the time series feature extraction capabilities of DL models, this method can accurately predict changes in effluent pH over short durations, providing operators with real-time adjustment references and enhancing the system's adaptive capacity.
Data smoothing
This paper uses WT to smooth the data, as shown in Supplementary material, Figure S6. The wavelet packet decomposition employs the Daubechies 10th order wavelet as its basis function, with a decomposition level of 6.
Based on the WT smoothing outcomes observed in datasets A and B, it is evident that WT mitigates the influence of noisy data to a certain degree while preserving the inherent trends within the datasets.
Data correlation analysis
Model parameters
It is essential to specify the water quality variables as they can significantly impact the model's predictive performance. The input parameters of the model include water temperature, dissolved oxygen, permanganate index, total phosphorus, total nitrogen, conductivity, and turbidity, with the target predictor being pH. The model operates on a 64-bit computer running Windows 11. Data pre-processing, correlation analysis, and WT operations were conducted in the Jupyter Notebook environment of Anaconda3 software (July 2020 version, Python 3.8), while both PP and interval prediction experiments were carried out in the MATLAB R2022b environment.
Wavelet packet decomposition utilizes the Daubechies 10th order wavelet with a decomposition level of 6. The datasets are split in time series order into training and test sets, with the first 70% used as the training set and the remaining 30% as the test set. The model is fitted using only the training set. The key parameters of the CNN and BiGRU models are outlined in Supplementary material, Table S2.
Results and analysis
PP results and analysis
The CBGRU–MHA model was utilized in this study to generate PP results for both datasets. The black line in Supplementary material, Figure S7 represents the real value of the target predictor pH, while the red line represents the forecasted value. The close alignment between the curves of the forecast values and the real values indicates minimal deviation between the model's predictions and the actual values, showcasing excellent predictive performance. The evaluation results in Table 1 reveal that the R² of the model achieved 0.97 or above for both datasets, demonstrating a strong ability to explain the data and capture most of the variability in both the training and test sets. Additionally, the low MAE, RMSE, and MAPE values signify minimal model error, high precision, and good accuracy and stability, further confirming the exceptional PP performance of the model.
|  | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|
| Dataset A | 0.0403 | 0.0521 | 0.46% | 0.9943 | 95% | 0.9454 | 0.0814 | 1.0837 | 0.1233 |
|  |  |  |  |  | 90% | 0.8409 | 0.0540 | 1.0840 | 0.1255 |
|  |  |  |  |  | 85% | 0.7350 | 0.0422 | 1.1419 | 0.1256 |
| Dataset B | 0.0643 | 0.0816 | 0.80% | 0.9797 | 95% | 0.8957 | 0.1230 | 1.1505 | 0.2350 |
|  |  |  |  |  | 90% | 0.7932 | 0.1105 | 1.1654 | 0.2352 |
|  |  |  |  |  | 85% | 0.7142 | 0.0740 | 1.1851 | 0.2353 |
In addition, comparing the PP results of the CBGRU–MHA model on Dataset A and Dataset B revealed that the model exhibits superior performance on Dataset A. This discrepancy can be attributed to the significant variability of the predictor variable pH in Dataset B, which posed challenges for accurate model predictions due to its fluctuating nature. Nonetheless, it is noteworthy that the PP results for both datasets are commendable and exhibit minimal disparities, underscoring the model's strong generalizability and robustness.
Interval prediction results and analysis
Although the model demonstrates outstanding PP performance, individual point estimates carry no quantification of uncertainty, which may not be adequate for decision-making purposes (Wang & Ying 2023). To address this issue, the study employs ABKDE to generate interval prediction results at varying confidence levels. The interval prediction results from this model for both datasets are illustrated in Supplementary material, Figure S8.
Within each confidence interval, the model shows varied performance, as reflected in the trends of the different evaluation metrics (Table 1). This diversity further reveals differences in the predictive accuracy and reliability of the model at different confidence levels. From Table 1, the following conclusions can be drawn. The PICP values at the 95% confidence level are 0.9454 and 0.8957 for Datasets A and B, respectively, indicating that the model captures the actual observations well at this level and that the range of observations is well covered. The PINAW values are 0.0814 and 0.1230, respectively, indicating that the average width of the PIs is relatively small and so is their dispersion. The CWC values are 1.0837 and 1.1505, respectively, indicating moderate coverage of the range of observations, and the CRPS values are 0.1233 and 0.2350, respectively, indicating that the model performs well in terms of cumulative distribution residuals. As the confidence level decreases and the intervals narrow, the range of observations captured by the model narrows, while the CWC and CRPS values gradually increase but remain within reasonable limits. This may be because the model is more uncertain about future observations when the PIs are narrowed and may miss certain edge cases or outliers, resulting in a slight decrease in accuracy. Overall, however, the model delivers good interval prediction performance at all confidence levels tested. In addition, this study found that the width of the intervals decreases as the confidence level decreases, so that the interval predictions provide narrower PIs, i.e. more detailed information, while the proportion of true values falling within the PIs also decreases. On this basis, the predictive effects at different confidence levels are compared; as CWC combines PICP and PINAW, the 95% confidence level is found to be the best for Datasets A and B, with the smallest CWC and CRPS values compared to the other confidence levels.
In this paper, multiple error metrics were used to evaluate the model's predictive performance, with RMSE and CRPS being the most relevant to the model's ability to predict fluctuations in the effluent pH values. RMSE amplifies prediction errors by squaring them, making it particularly sensitive to sudden and large fluctuations, thus reflecting the model's accuracy in predicting sharp changes. CRPS, on the other hand, measures the accuracy of the predicted probability distribution, capturing the model's performance in forecasting fluctuations in a probabilistic sense. Lower RMSE and CRPS values indicate that the model is better at handling abrupt changes.
Moreover, upon meticulous examination of Table 1, it is apparent that the model demonstrates exceptional proficiency in forecasting fluctuation values. This highlights the model's outstanding performance not only in traditional predictive tasks but also in accurately forecasting the trends of fluctuations within the dataset.
EXPERIMENTS AND ANALYSIS
Ablation experiment
| Model | Dataset A: MAE | RMSE | MAPE | R² | Dataset B: MAE | RMSE | MAPE | R² |
|---|---|---|---|---|---|---|---|---|
| CBGRU–MHA | 0.0403 | 0.0521 | 0.46% | 0.9943 | 0.0643 | 0.0816 | 0.80% | 0.9797 |
| CNN–GRU–MHA | 0.0686 | 0.1100 | 0.78% | 0.9746 | 0.0820 | 0.1273 | 1.02% | 0.9506 |
| CNN–BiGRU–AM | 0.0715 | 0.1173 | 0.82% | 0.9711 | 0.0927 | 0.1298 | 1.15% | 0.9486 |
| CNN–GRU–AM | 0.2395 | 0.2058 | 2.73% | 0.9111 | 0.2746 | 0.2191 | 3.42% | 0.8537 |
| CNN–MHA | 0.2112 | 0.2033 | 2.41% | 0.9132 | 0.2442 | 0.1731 | 3.04% | 0.9087 |
| BiGRU–MHA | 0.1955 | 0.1933 | 2.23% | 0.9215 | 0.2497 | 0.1568 | 3.11% | 0.9250 |
| CNN | 0.2310 | 0.2099 | 2.63% | 0.9075 | 0.2923 | 0.1834 | 3.64% | 0.8974 |
| BiGRU | 0.2447 | 0.2180 | 2.79% | 0.9002 | 0.2370 | 0.1732 | 2.95% | 0.9085 |
The results show that removing BiGRU, MHA, or CNN leads to worse model performance (Liu et al. 2023; Zhou et al. 2023; Bi et al. 2024; Cui et al. 2024; Wang & Cao 2024), confirming that these components play an important role in the model studied in this paper and are essential for improving its predictive performance. The hybrid model CNN–GRU–AM performs even worse than CNN–MHA and BiGRU–MHA, possibly because the latter two better capture the characteristics of the data from the two selected WWTPs. Further, the contribution of BiGRU significantly outweighs the optimization effect of CNN, possibly because the sequential structure of the data matters more than its spatial structure in Datasets A and B; the ability of BiGRU to capture dependencies and dynamic changes in sequence data is therefore more important. Meanwhile, adding BiGRU performs better than adding GRU alone, indicating that the introduction of BiGRU significantly improves model performance because it can account for both past and future information in the sequence. Finally, adding MHA is more effective than adding AM alone, possibly because MHA better captures the importance of different aspects of the input sequence, thereby increasing the model's focus on the most important information in the data.
PP results and comparative analysis
To assess the CBGRU–MHA model's efficacy in the PP task, the study conducted a comparative analysis against traditional single models. Specifically, the study compared the CBGRU–MHA model with the following individual models: the ANN model, the Back Propagation Neural Network (BPNN) model, the CNN model, the BiGRU model, the LSTM model, the RNN model, the Temporal Convolutional Network (TCN) model, and the SVM model.
According to the results, among the individual models, the CNN model and the BiGRU model performed best on Dataset A and Dataset B, followed by the LSTM model with relatively better predictions, while the TCN and SVM models performed relatively poorly. Therefore, it is concluded that the PP performance of the CBGRU–MHA model in predicting pH for the two datasets used in this paper is better than that of the other single AI prediction models compared.
Then, comparing and analyzing the results of the hybrid models, the best of the four added hybrid models is CNN–BiLSTM–MHA, but its predictions are still worse than those of the CBGRU–MHA model used in this paper. In addition, the result illustrates once again that MHA and bi-directional models can optimize the model to a certain extent, as seen in the comparison of the CNN–BiLSTM–MHA model with the CNN–BiLSTM–AM and CNN–LSTM–MHA models. However, it should be noted that on the datasets used in this paper, there are cases where hybrid models are less effective than single models; for example, the BiGRU model outperforms the CNN–LSTM–AM model, possibly because the characteristics of the wastewater datasets used are not sufficiently suited to the CNN–LSTM–AM hybrid model. This demonstrates that hybrid models are not necessarily advantageous on certain datasets. Hence, a comprehensive consideration of task requirements, data characteristics, model design, and domain knowledge is essential when choosing models, ensuring that the selected model aligns most effectively with the prediction task's needs.
When comparing the performance of different models, it is essential to consider not only predictive accuracy but also the impact of model size and complexity on prediction performance. Larger models typically have more adjustable parameters, which theoretically give them greater capacity for fitting the data. However, they also run the risk of overfitting, especially when dealing with small or noisy datasets, where their performance may degrade. Additionally, larger models can significantly increase both training and inference time.
Among the models compared in this study, although the CBGRU–MHA model has a relatively large number of parameters, it still outperforms other single and hybrid models in terms of predictive performance. In contrast, some models with fewer parameters, such as SVM and TCN, are limited in their parameter count and representational capacity, which makes them less capable of capturing complex spatiotemporal dependencies, leading to inferior performance.
However, the performance of hybrid models varies. For example, the CNN–BiLSTM–MHA model, despite its higher parameter count and complexity, performs well but still falls short of the CBGRU–MHA model on certain datasets. This suggests that while larger models may have an advantage in terms of representational power, their actual performance depends on the characteristics of the dataset and the specific task at hand.
Interval prediction results and comparative analysis
The interval prediction method used in this paper is ABKDE and the kernel function is Gaussian kernel. Using three different kernel functions, the study compares and evaluates the interval prediction results of ABKDE and KDE. Confidence intervals of 95 and 90% are used, giving more reliable results while increasing the range of PIs. The evaluation metrics of the model's interval prediction under different strategies are shown in Supplementary material, Table S5 and Figure 5(c)–5(f).
Firstly, it is observed that the PICP values measured using the ABKDE method are generally higher than the KDE on the two datasets used. This means that the actual observations of the ABKDE are more likely to fall within the predicted intervals, suggesting that the ABKDE is more accurate in capturing the true data distribution. Meanwhile, indicators such as PINAW, CWC and CRPS are also generally reduced to varying degrees compared to KDE, indicating that ABKDE has improved in terms of PI width and accuracy.
In the following, for the three kernel functions (Gaussian, Laplace, and Cauchy), it is observed that both ABKDE and KDE give the best results with the Gaussian kernel in terms of PICP. This suggests that the Gaussian kernel function is better at capturing the data distributions, and its typical bell-curve shape is suitable for most real data distributions. In addition, the Gaussian kernel function exhibits lower error and higher robustness than the Laplace and Cauchy kernels on the CRPS metric, further confirming the superiority of the Gaussian kernel on the two datasets used.
The interval prediction results of ABKDE and KDE with three different kernel functions validate the excellent performance of ABKDE and Gaussian kernel functions. For ABKDE, the combination of the Gaussian kernel function shows good adaptability and robustness in interval prediction, making it a reliable interval prediction method. The results of this study provide an important reference for future interval prediction model design and optimization.
Forecast period
To offer a comprehensive evaluation of the prediction models utilized to address the diverse decision and planning requirements of real-world WWTPs, this study took into account the influence of the prediction time frame on prediction accuracy. Forecast periods are categorized as very short-term, short-term, and long-term: sub-daily time steps for the very short-term, 1, 2, and 3 days for the short-term, and 7, 14, and 28 days for the long-term. The results are shown in Supplementary material, Table S6. Analyzing the length of the prediction period, in the very short-term the MAE, RMSE, and MAPE values of Dataset A gradually decrease and the R² value increases as the time step increases, with the best performance achieved at the largest of the very short-term time steps. In contrast, the MAE and MAPE values of Dataset B fluctuate upward, the RMSE value gradually increases, and R² gradually decreases, so that, relatively speaking, Dataset B performs best at a smaller very short-term time step. In both the short and long term, the MAE, RMSE, and MAPE values of Dataset A gradually increase and R² decreases; the difference is that the change is slower in the short term and much faster in the long term. With a time step of 28 days compared to 14 days, the MAE increased by 0.1058, the RMSE increased by 0.1205, the MAPE increased by 1.17%, and the R² decreased by 0.0942. The MAE, RMSE, MAPE, and R² of Dataset B show a fluctuating downward trend. In the short term, Dataset A performs best at a time step of 1 day, while Dataset B performs best at a time step of 2 days. In the long term, Dataset A performs best at a time step of 7 days, while Dataset B performs best at a time step of 14 days. Overall, the model performs better in the very short term than in the short term, and in the short term than in the long term, but the predictive performance is still excellent.
In terms of long-term forecasting, the model shows instability as the time step increases, and MAE, RMSE, and MAPE show an increasing trend, indicating that the model's predictive ability is more challenged. According to the results, for predicted time steps of 14 days or less, the model performance is excellent, with R² above 0.97 on both datasets. A more pronounced fluctuation in prediction performance occurs at a time step of 28 days, which may be because the effects of internal and external factors are more complex at this horizon and the model has difficulty capturing all the dynamic relationships.
In addition, the study finds that the error of Dataset A is generally lower than that of Dataset B when the time step is small, but its increase in error is relatively large. In terms of fit, the R² of Dataset A is generally higher than that of Dataset B, indicating a better model fit for Dataset A. It is worth noting that Dataset A shows a significant drop in prediction performance at a time step of 28 days. The generally better fit may be due to the more stable operation of the wastewater treatment system in Shanghai, where the data are more regular and the model captures this regularity more easily. Dataset A performs well in the short term, but as the time step increases, the prediction error increases more rapidly and the fit decreases. This can be affected by region-specific conditions, such as weather changes and equipment failures, which affect the operational stability of the treatment system and reduce the fit of the model. Further development of the model for long-term forecasting will be necessary in the future to improve its forecasting performance.
Special data forecasts
Although the model has shown excellent predictive performance on both datasets, the data is relatively complete as the datasets used by the model run from 8 November 2020 to 31 December 2023. To assess the model's performance on unique datasets, the study identified several segments within Dataset A featuring special values such as peaks, fluctuations, maximums, and minimums. These segments were discerned based on the real data depiction of the predictor pH as illustrated in Supplementary material, Figure S5. According to the labeled sections in Supplementary material, Figure S9, the study takes these four labeled data from Dataset A, respectively, and the groups of data with basic information are shown in Supplementary material, Table S7.
The model of this paper is run for these four particular datasets in turn, as shown in Supplementary material, Figure S10 and Table 3. It can be observed that, under conditions involving special values such as peaks, the PP performance of the model still remains within a relatively excellent range: the R² values are all greater than 0.8, so the model explains the changes in the predicted target variable well, the deviation between the predicted values and the actual observations is relatively small, and the model has a high ability to explain the special data. For Datasets 1–4, the trends of the fitted curves of the PP values and the true values are basically the same, with a good fit. The MAE, RMSE, and MAPE values for Datasets 1 and 4 are all small, while the R² values are all greater than 0.85, suggesting that the model makes point predictions for Datasets 1 and 4 very well. For Datasets 2 and 3, despite the relatively large errors, the minimum R² over the measured datasets is 0.8013, indicating that although the model has some error in predicting certain special values, it is highly interpretable and able to predict data containing peaks with a high degree of fluctuation to some extent.
| Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0604 | 0.0711 | 0.83% | 0.9111 | 95% | 0.9019 | 0.3163 | 1.3535 | 0.2623 |
|  |  |  |  |  | 90% | 0.8529 | 0.2524 | 1.3022 | 0.2555 |
| 2 | 0.0760 | 0.1071 | 0.92% | 0.8013 | 95% | 0.9310 | 0.5756 | 1.5978 | 0.1464 |
|  |  |  |  |  | 90% | 0.8867 | 0.5473 | 1.5795 | 0.1413 |
| 3 | 0.1685 | 0.2004 | 1.90% | 0.8222 | 95% | 0.9464 | 0.2471 | 1.2615 | 0.3595 |
|  |  |  |  |  | 90% | 0.8892 | 0.2200 | 1.2508 | 0.3805 |
| 4 | 0.0473 | 0.0949 | 0.56% | 0.8543 | 95% | 0.9465 | 0.3855 | 1.3999 | 0.2441 |
|  |  |  |  |  | 90% | 0.8983 | 0.2033 | 1.2294 | 0.2274 |
For interval prediction, 95 and 90% confidence intervals are selected, and the performance of the models used in this study for special datasets is evaluated using PICP, PINAW, CWC, and CRPS, as shown in Supplementary material, Figure S11 and Table 3.
Datasets 1 and 4 show better results, while Datasets 2 and 3 show differences in the prediction of some of the values. After analyzing the results, the study attributes this to the following reasons:
(1) The data fluctuations of Datasets 2, 3 are to some extent larger than those of Datasets 1, 4. The data distribution is more uneven and less regular, and the sharp fluctuations of the data cause difficulties in prediction.
(2) The test sets of Datasets 2 and 3 show huge data fluctuations. The extreme deviation of the target predictor pH reaches 1 on the test set of Dataset 2, and even 1.4 on Dataset 3. The disconnected test sets limit the predictions of the model.
(3) In this experiment, the parameters chosen for the model remain the same as those used throughout the study of the model in this paper. It is well known that model parameters affect prediction, so for datasets containing large peaks and fluctuations, the structure and parameters of the model need to be adjusted to obtain better interval predictions.
The prediction results are also reflected in the evaluation indicators. Firstly, for Datasets 1 and 3, the evaluation indicators show excellent results, indicating that the model captures the true values very well. This shows that the model performs extremely well, accurately capturing the range of the data distribution with a high degree of interpretability and adaptability. For Dataset 4, the study observes that the model covers the true values well, with low CWC and CRPS. This shows that the model performs well in predicting the intervals of Dataset 4 at the 95 and 90% confidence levels. This good performance may stem from the characteristics of the data distribution of Dataset 4 and the adaptability of the model in dealing with this distribution. In addition, the performance of the model at higher confidence levels demonstrates the robustness of its predictions, capturing the distributional characteristics well. However, the interval predictions for Dataset 2 were relatively poor and did not perform as well as those for Datasets 1, 3, and 4. This may be because the characteristics of Dataset 2 are different from the previous three, leading to challenges for the model in dealing with this distribution. The characteristics of the datasets and possible limitations of the model need to be further analyzed. However, despite the challenges encountered, the model is still able to make predictions on these datasets, which suggests that it has the ability to generalize and adapt to handle some degree of spikes and fluctuations.
In summary, the model used in this paper provides reasonably good interval predictions for data containing peaks and large fluctuations. Differences in interval prediction performance across datasets reflect the model's sensitivity to data characteristics as well as its own strengths and weaknesses. Future studies should further analyze data from different WWTPs and treatment stages and investigate model performance on data with different characteristics, which is crucial for improving the model's predictive and generalization ability.
Model suitability
The model utilized in this paper is capable of forecasting multivariate water quality data originating from WWTPs. In this section, to thoroughly validate the applicability of the CBGRU–MHA–ABKDE model, dissolved oxygen and turbidity are employed as predictor variables, and the outcomes are depicted in Supplementary material, Figures S12 and S13, and Table 4.
| Predictor variable | Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|---|
| Dissolved oxygen | A | 0.0611 | 0.0726 | 0.71% | 0.9914 | 95% | 0.8971 | 0.0833 | 1.1231 | 0.0944 |
| | | | | | | 90% | 0.7968 | 0.0660 | 1.1456 | 0.0945 |
| | | | | | | 85% | 0.7250 | 0.0575 | 1.1627 | 0.0946 |
| | B | 0.3352 | 0.4331 | 3.73% | 0.9697 | 95% | 0.8860 | 0.1154 | 1.1609 | 0.2051 |
| | | | | | | 90% | 0.8379 | 0.0923 | 1.1499 | 0.2050 |
| | | | | | | 85% | 0.7932 | 0.0780 | 1.1460 | 0.2052 |
| Turbidity | A | 0.1739 | 0.2549 | 5.35% | 0.9837 | 95% | 0.9114 | 0.0852 | 1.1175 | 0.1851 |
| | | | | | | 90% | 0.8218 | 0.0592 | 1.1253 | 0.1867 |
| | | | | | | 85% | 0.7430 | 0.0441 | 1.1393 | 0.1883 |
| | B | 2.9184 | 5.159 | 59.89% | 0.8653 | 95% | 0.8804 | 0.0964 | 1.1448 | 0.0955 |
| | | | | | | 90% | 0.7703 | 0.0516 | 1.1456 | 0.0906 |
| | | | | | | 85% | 0.6877 | 0.0402 | 1.1662 | 0.0925 |
The results show that when the target predictor variable was dissolved oxygen, the developed model gave excellent predictions on both datasets, with R² exceeding 0.96, and the curves of predicted and actual values fitted well. In terms of PP, the model predicts Dataset A better than Dataset B. In terms of interval prediction, both PICP and PINAW increased gradually with the confidence level, while CRPS varied little. According to the CWC indicator, Dataset A is best predicted at the 95% confidence level, while Dataset B is best predicted at the 85% confidence level.
For the target predictor variable turbidity, the model showed high accuracy on Dataset A, with R² of 0.9837, while it was less accurate on Dataset B, with R² of 0.8653. In terms of PP, the model again predicts Dataset A better than Dataset B; in terms of interval prediction, PICP and PINAW increased with the confidence level, whereas CRPS varied little. According to the CWC indicator, both Datasets A and B are best predicted at the 95% confidence level.
Taken together, the CBGRU–MHA–ABKDE model showed good prediction results for both target predictor variables, dissolved oxygen and turbidity, confirming the suitability of the model. However, further research and modeling improvements are needed to improve the accuracy of predictions in specific contexts.
Model robustness
Data acquisition equipment at WWTPs can be affected by factors such as equipment limitations, ageing, and inaccuracies, leading to erroneous measurements. Therefore, it is essential to evaluate the model's predictive performance when erroneous data is present in the training set to assess its robustness. In this study, 2% of the processed training data was randomly sampled, and errors were introduced into these data points. These erroneous data were generated as random outliers, with values either two to three times larger or smaller than the original normal data.
Given the use of multivariate inputs in this study, a randomized approach was adopted to determine the number and magnitude of erroneous data for the input variables in the sampled training set. For example, within a single row of data, there could be one magnified erroneous data point, or there might be three magnified and two minimized erroneous data points, all assigned randomly.
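As an illustration of how such errors might be injected, the sketch below corrupts a random 2% of training rows by scaling a random subset of their input variables up or down by a factor of two to three; the array name `X_train` and the uniform choice of scaling factor are assumptions made for the example, not the study's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

def inject_errors(X_train, frac=0.02, low=2.0, high=3.0):
    """Corrupt a random `frac` of rows with outliers two to three times
    larger or smaller than the original values (illustrative sketch)."""
    X = X_train.copy()
    n_rows, n_cols = X.shape
    rows = rng.choice(n_rows, size=int(frac * n_rows), replace=False)
    for r in rows:
        # Randomly pick how many and which input variables to corrupt in this row
        cols = rng.choice(n_cols, size=rng.integers(1, n_cols + 1), replace=False)
        for c in cols:
            factor = rng.uniform(low, high)
            # Randomly magnify or shrink the original value
            X[r, c] = X[r, c] * factor if rng.random() < 0.5 else X[r, c] / factor
    return X

# Illustrative usage on a dummy multivariate training matrix
X_noisy = inject_errors(rng.random((5000, 6)))
```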
The training sets derived from processed Datasets A and B, after the erroneous data were introduced, were combined with the original test sets. Predictions were then made using the model proposed in this paper, keeping the model parameters consistent throughout to assess robustness. The resulting evaluation metrics after the insertion of erroneous data are summarized in Table 5.
| Dataset | MAE | RMSE | MAPE | R² | Confidence level | PICP | PINAW | CWC | CRPS |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.0390 | 0.0521 | 0.45% | 0.9943 | 95% | 0.9554 | 0.0887 | 1.0986 | 0.1229 |
| | | | | | 90% | 0.8774 | 0.0599 | 1.0968 | 0.1248 |
| | | | | | 85% | 0.7721 | 0.0464 | 1.1258 | 0.1252 |
| B | 0.0721 | 0.0936 | 0.90% | 0.9733 | 95% | 0.8974 | 0.1409 | 1.1804 | 0.2329 |
| | | | | | 90% | 0.7988 | 0.1042 | 1.1827 | 0.2345 |
| | | | | | 85% | 0.7154 | 0.0847 | 1.1952 | 0.2363 |
This test enables us to evaluate the model's robustness, that is, its ability to maintain good predictive performance in the presence of noise or anomalies in the data. Here, potential data abnormalities were simulated by introducing randomly generated erroneous data into the training set and observing how the model performed under these conditions. If the model maintains high predictive accuracy after the introduction of erroneous data, or even improves on certain metrics, this indicates a high tolerance to noise.
Based on the results, the model's point and interval prediction performance remains very good after the erroneous data were introduced. The evaluation metrics are essentially unchanged from those of the original model, and in some cases are even better. Possible reasons are as follows:
(1) Robustness of the model. The model is highly fault-tolerant, meaning it can effectively deal with noise and outliers in the input data and can handle data with a 2% error rate.
(2) Redundancy among input features. If several input features are correlated, the model can use this redundant information to compensate for the information lost to partially erroneous data.
The model maintains excellent performance after the introduction of erroneous data, reflecting its good robustness; it can currently handle datasets with an error rate of at least 2%. Its robustness could be further explored and optimized in future research, for example by introducing more sophisticated anomaly detection mechanisms.
MODEL INTEGRATION AND MANAGEMENT IMPROVEMENT
In this section, we provide a detailed discussion on how the proposed water quality prediction model can be integrated into the existing system of a WWTP. Through intelligent and automated management, the integration aims to enhance overall system efficiency, reduce operational costs, and ensure both environmental compliance and long-term sustainability.
Integration of data collection and preprocessing
WWTPs are widely equipped with water quality monitoring devices, such as pH sensors, COD sensors, and dissolved oxygen sensors. These sensors provide real-time, multidimensional data on water quality, forming the foundation for the effective operation of the water quality prediction model.
To ensure data accuracy and consistency, the data collected by the sensors must undergo preprocessing. This includes noise removal, filling missing data, and performing standardization to ensure that the input data meets the model's expected format. This preprocessing module will integrate with the existing monitoring system to establish a real-time, efficient data pipeline.
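A sketch of what this preprocessing step could look like is given below; the column names, the percentile-based clipping used here as a simple stand-in for noise removal, and the use of scikit-learn's StandardScaler are illustrative assumptions rather than a plant's actual pipeline.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal preprocessing sketch: fill gaps, damp spikes, standardize."""
    df = df.interpolate(method="linear").bfill().ffill()                    # fill missing readings
    df = df.clip(lower=df.quantile(0.01), upper=df.quantile(0.99), axis=1)  # clip extreme spikes
    scaled = StandardScaler().fit_transform(df)                             # zero mean, unit variance
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)

# Hypothetical usage with sensor columns streamed from the monitoring system
# raw = pd.read_csv("sensor_stream.csv", parse_dates=["timestamp"], index_col="timestamp")
# model_input = preprocess(raw[["pH", "COD", "dissolved_oxygen", "turbidity"]])
```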
Model deployment and computational architecture design
Deploying the model is critical and requires selecting the appropriate architecture. WWTPs can opt for either local deployment or cloud-based deployment, depending on their computational resources and real-time processing requirements. For larger plants, local deployment helps reduce data transmission latency and enhances real-time prediction performance. The model is interfaced with the plant's control system through APIs or other data interfaces, ensuring that prediction results are effectively applied in real-world operations.
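As a sketch of how such an interface could look, the following exposes the latest prediction over HTTP so that the plant's control system can query it; the FastAPI framework, the endpoint name, and the hard-coded placeholder values standing in for the trained CBGRU–MHA–ABKDE pipeline are all assumptions for illustration.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SensorWindow(BaseModel):
    # Recent window of multivariate sensor readings, one inner list per time step
    readings: list[list[float]]

@app.post("/predict")
def predict(window: SensorWindow) -> dict:
    # In a real deployment, the trained interval-prediction model would be
    # loaded at startup and called here; fixed values keep the sketch runnable.
    point, lower_95, upper_95 = 7.2, 7.0, 7.4   # hypothetical pH forecast and 95% PI
    return {"point": point, "lower_95": lower_95, "upper_95": upper_95}

# Run locally (assuming this file is saved as app.py and uvicorn is installed):
#   uvicorn app:app --port 8000
```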
The integration of the model enables real-time water quality predictions, allowing for accurate forecasting of future conditions. Based on these predictions, the system can automatically adjust key treatment parameters, such as chemical dosing, aeration intensity, and sedimentation time, optimizing the entire treatment process.
Application of predictions and automated control
The integration of the water quality prediction model significantly improves WWTP management processes. The model can forecast changes in water quality over a specified period, providing the basis for automated control decisions. By incorporating the prediction model, the system can proactively adjust process parameters in advance, preventing non-compliant discharges or insufficient treatment.
In practice, when the model predicts that water quality will exceed regulatory limits or that treatment challenges will increase, the system can issue timely alerts and automatically take corrective measures, such as increasing chemical dosing, extending treatment time, or activating backup treatment facilities. This approach effectively reduces the risk of sudden pollution incidents and enhances the stability and reliability of the treatment process.
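The kind of rule described above could be as simple as a threshold check on the predicted interval; in the sketch below, the regulatory limit, the warning margin, and the mapped actions are hypothetical placeholders.

```python
def control_action(upper_bound_pred: float, regulatory_limit: float,
                   warning_margin: float = 0.9) -> str:
    """Map a predicted upper interval bound to a hypothetical control action."""
    if upper_bound_pred >= regulatory_limit:
        return "ALERT: predicted exceedance - increase dosing / activate backup facilities"
    if upper_bound_pred >= warning_margin * regulatory_limit:
        return "WARNING: approaching limit - extend treatment time"
    return "OK: no adjustment needed"

# Illustrative usage with a hypothetical effluent COD limit of 50 mg/L
print(control_action(upper_bound_pred=47.5, regulatory_limit=50.0))
```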
Long-term adaptation and model optimization
Over time, WWTPs accumulate increasing amounts of real-time water quality data. Using this data, the water quality prediction model can be continuously optimized to become more adaptive. The model's self-learning capability allows it to handle water quality fluctuations, seasonal variations, and unforeseen events, ensuring that prediction accuracy and reliability improve over time.
This adaptive mechanism also supports the plant's long-term operational planning, reducing the need for manual intervention and advancing intelligent management.
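One simple way to realize the periodic adaptation described above is to fine-tune the model on a sliding window of recent observations at fixed intervals; in the sketch below, the `model.fit` call, the buffer layout with the target in the last column, and the window sizes are assumptions, not the paper's actual update scheme.

```python
import numpy as np

def maybe_update(model, buffer: np.ndarray, new_rows: np.ndarray,
                 retrain_every: int = 1000, window: int = 20000):
    """Append new observations and periodically fine-tune on a recent window."""
    buffer = np.vstack([buffer, new_rows])
    if buffer.shape[0] % retrain_every < new_rows.shape[0]:   # crossed a retraining boundary
        recent = buffer[-window:]
        X, y = recent[:, :-1], recent[:, -1]   # assume the last column is the target
        model.fit(X, y)                        # warm-start fine-tuning (assumed interface)
    return model, buffer
```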
Enhancing management efficiency and environmental compliance
The integration of the water quality prediction model is not merely a technical innovation; it represents a comprehensive upgrade to the management model of WWTPs. Traditionally, these plants have relied heavily on experience and manual control. However, with increasingly stringent water quality regulations and more complex water compositions, manual management is no longer efficient. By adopting intelligent and automated methods, human intervention can be significantly reduced, leading to more efficient operations.
(1) Reducing operational costs: By optimizing chemical dosing, energy usage, and treatment times, the prediction model can effectively reduce the resource consumption of wastewater treatment processes. This is particularly important for plants that operate over long periods, as it not only lowers production costs but also reduces environmental impact.
(2) Improving emergency response capabilities: In the event of sudden pollution incidents, the water quality prediction model can respond rapidly, forecasting the spread of contamination and automatically adjusting treatment processes. This significantly enhances emergency response capabilities and reduces the risk of non-compliant discharges.
(3) Ensuring long-term compliance: As water discharge standards become more stringent, WWTPs face increasing regulatory pressure. With the model's early warning system and real-time control, treated water can consistently meet regulatory standards, helping plants avoid fines or shutdowns.
(4) The necessity for future development: With rapid urbanization and increased industrial discharges, the demand for wastewater treatment is rising. In the future, treatment plants will face greater pressures, and water quality will become increasingly complex. Traditional treatment technologies alone will no longer suffice to meet these challenges. Therefore, the introduction of water quality prediction models is not only essential for current management needs but also provides a more intelligent solution for the future of wastewater treatment.
Integrating the water quality prediction model is a crucial step toward intelligent and automated management in the wastewater treatment industry. The application of this technology not only significantly improves operational efficiency but also ensures environmental compliance, reduces operational risks, and aligns with current and future environmental policies and sustainability goals. As the industry evolves, intelligent and automated management will become the standard, and this work offers a feasible and practically applicable solution for that transition.
CONCLUSION
The study presents the CBGRU–MHA model combined with ABKDE to tackle the challenge of multivariate time series interval prediction of water quality data from WWTPs. Initially, data from WWTPs in Shanghai and Zhejiang, China, underwent preprocessing, involving WT for data smoothing and noise reduction. Subsequently, CC and MI methods were employed to identify input variables and target predictors, supported by multiple modeling iterations and trial-and-error. To account for the varied ranges of different indicators, the data were normalized. A combination of CNN and BiGRU models then addressed time series correlation, while MHA evaluated potential correlations among the water quality indicators of WWTPs. Interval prediction was carried out with ABKDE, with upper and lower bounds derived through the bootstrap method. Stochastic gradient descent optimized the model parameters, enhancing effectiveness. Ablation experiments elucidated the impacts of the various model components and showcased the model's capability to handle data containing peaks and significant fluctuations. The model's superiority in water quality prediction for WWTPs was highlighted through the analysis of the forecast period and comparisons with alternative models. Suitability and robustness tests confirmed the model's outstanding generalization capability and resilience.
Despite these advancements, certain limitations remain, necessitating further enhancements. For instance, the current forecast factors focus solely on WWTP effluent indicators, overlooking other potential influencing factors such as geographical location, climatic conditions, and industrial structure.
Incorporating these considerations into predictive modeling for WWTPs can enhance accuracy, interpretability, and contribute to sustainable development goals.
ACKNOWLEDGEMENT
This work was supported by the Humanities and Social Sciences Research Planning Fund Program, Ministry of Education, China (No. 24YJAZH167), and by the Open Fund of the Key Laboratory of Sediment Science and Northern River Training, Ministry of Water Resources, China Institute of Water Resources and Hydropower Research (Grant No. IWHR-SEDI-2023-10).
CRediT AUTHORSHIP CONTRIBUTION STATEMENT
S.L. contributed to conceptualization, methodology, software, and writing – original draft. Z.W. contributed to methodology, validation, writing – review & editing, and supervision. Y.L. contributed to software and analysis.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.