The water quality of drinking water reservoirs directly impacts the water supply safety for urban residents. This study focuses on the Da Jing Shan Reservoir, a crucial drinking water source for Zhuhai City and the Macau Special Administrative Region. The aim is to establish a prediction model for the water quality of drinking water reservoirs, which can serve as a vital reference for water plants when formulating their water supply plans. In this research, after smoothing the data using the Hodrick-Prescott filter, we utilized the long short-term memory (LSTM) network model to create a water quality prediction model for the Da Jing Shan Reservoir. Simulation calculations reveal that the model's fitting degree is consistently above 60%. Specifically, the prediction accuracy for pH, dissolved oxygen (DO), and biochemical oxygen demand (BOD) in the water quality prediction model aligns with actual results by more than 70%, effectively simulating the reservoir's water quality changes. Moreover, for parameters such as pH, DO, BOD, and total phosphorus, the relative forecasting error of the LSTM model is less than 10%, confirming the model's validity. The results of this study offer an essential model reference for predicting water quality for the Da Jing Shan Reservoir.

  • The long short-term memory (LSTM) model was optimized with the Hodrick-Prescott (HP) filter.

  • The HP-LSTM model is used to predict the water quality index.

  • The HP-LSTM model is compared with the LSTM model to prove the effectiveness of the optimized model.

With rapid socio-economic development in recent years, water pollution has become an increasingly pressing issue (Hasanzadeh et al. 2020; Chen et al. 2021; Ouyang et al. 2023). Accurately predicting water quality indicators is crucial for anticipating and promptly responding to sudden water pollution events (Bi et al. 2024; Wang et al. 2024). Numerous studies on temporal and spatial water quality forecasting and on data-driven models have been conducted to limit water pollution and lessen its detrimental effects on human society and aquatic ecosystems (Chen et al. 2020). Reservoir management in particular pays close attention to changes in water quality because of the reservoir's function as a source of drinking water (Lian et al. 2022; Yin et al. 2022). Urban reservoirs are a crucial source of drinking water in urban areas, and their water quality is a critical determinant of water supply safety. Because economic development inevitably brings pollution problems, it is crucial to protect the urban reservoir's water source. One of the most critical tasks in ensuring the safety of the water supply is to track and monitor the water quality of the source and identify pollution problems in real time (Abraham et al. 2022; Zhang et al. 2022a).

Water quality prediction has always been a focal research direction in the water resources field. Forecasting trends in water quality fluctuations is essential for detecting changes in time and for assessing their impact on drinking water safety, human health, and the environment (Bourjila et al. 2023; Chen et al. 2024). As ecological systems, reservoirs are characterized by their complexity, dynamism, vulnerability, and significant ecological value. Predictive alerts for reservoir water quality enable the early detection of anomalies, strengthening ecological and environmental protection (Marcé et al. 2016). Current water quality prediction methods can be categorized into mechanistic and non-mechanistic (Zhang et al. 2022a), and data analysis techniques applied to water quality datasets help identify key factors and characteristics affecting water quality (Li et al. 2023; Shan et al. 2023). However, water quality data are nonlinear and non-stationary because they are influenced by numerous factors, so choosing appropriate computational methods is crucial for developing predictive water quality models. These models are instrumental in foreseeing and alerting about future water quality changes, promptly identifying potential issues, and initiating corrective actions (Shi et al. 2018). In the era of big data, machine learning (ML) and deep learning (DL) models within artificial intelligence (AI) have seen significant growth, offering advanced methods for predicting water quality dynamics (Reichstein et al. 2019; Xiong et al. 2022; Ouyang et al. 2023).

In the field of predicting water quality changes, ML and DL have numerous successful applications. Models such as Extreme Gradient Boosting (XGBoost), support vector regression, K-nearest neighbors, ensemble trees, and random forests have been used to predict the water quality index (WQI), effectively reducing the time and errors associated with calculating the WQI and improving prediction accuracy (Sakaa et al. 2022; Hussein et al. 2024; Yan et al. 2024). In the domain of artificial neural networks, Multilayer Perceptron Neural Networks are widely applied for predicting river/stream water temperature and quality. These models handle complex nonlinear relationships and have demonstrated superior performance compared to traditional statistical models in various case studies (Zhu & Piotrowski 2020; Wong et al. 2021).

These models are data-centric and independent of watershed process mechanisms. They employ algorithms to discern internal relationships between input and output data, effectively capturing the dynamic patterns of the predicted variables (Kasiviswanathan et al. 2016). Among these ML/DL models, neural network models have been broadly adopted (Yang et al. 2017), especially long short-term memory (LSTM) networks, which have been used to predict water quality, air quality, and other environmental variables (Zhang & Li 2022; Zhang et al. 2022a; Wang et al. 2023). For instance, LSTM is employed to enhance weather forecasting models due to its capability to analyze and predict time-series data. Researchers have utilized LSTM to forecast meteorological variables such as temperature, wind speed, and rainfall. Within the medical domain, LSTM finds utility in disease diagnosis, including risk assessment for conditions such as heart attacks and diabetes. Additionally, it aids in predicting patients' hospital stays and associated medical costs (Varadharajan & Nallasamy 2022; Su et al. 2023). In energy demand forecasting, LSTM is utilized to predict electricity demand and energy consumption, which is crucial for the stable operation of power systems and energy supply planning.

As the demand for high-frequency prediction escalates, traditional data-driven ML and DL models require enhanced robustness to capture transient and dramatic dynamic changes effectively (Xiao et al. 2017). Research on LSTM networks shows that a single LSTM structure has clear limitations in forecasting accuracy when facing complex data. This is especially evident for data samples with high volatility and low regularity, which the LSTM cannot fit adequately, resulting in reduced prediction accuracy. It is therefore necessary to combine the LSTM model with other state-of-the-art techniques (Pyo et al. 2023; Wang et al. 2023b; Cai et al. 2023; Zamani et al. 2023), and the prediction accuracy and efficiency of the LSTM model can be significantly improved by using various pre-processing techniques (Haq & Harigovindan 2022; Zhang et al. 2022c).

Data pre-processing techniques are crucial in harnessing the full potential of LSTM models' algorithmic strengths (Fan et al. 2022; Hao et al. 2022). Among these techniques, the Hodrick-Prescott (HP) filter is notable for addressing the non-stationarity issue by smoothing data (Domala & Kim 2023). Therefore, it can effectively counter the limitations of LSTM models in handling the non-stationary dynamics of the original series, significantly enhancing model performance. The primary use of the HP filter lies in signal denoising for stock price forecasting, load forecasting, and energy consumption forecasting (Xu et al. 2015; Ilyas et al. 2022). However, integrating the HP filter with LSTM models for surface water environment modeling remains largely unexplored. We hypothesize that using the HP filter as a pre-processing technique will bolster LSTM's capability to capture water quality dynamics. This combination is expected to enhance the model's overall performance and reduce its prediction error, particularly in high-frequency scenarios for water quality prediction.

This study analyzes the Da Jing Shan Reservoir in Zhuhai, a critical urban water source, using water quality measurements collected from 2010 to 2020. Using these parameters, the study employs LSTM recurrent networks for training and prediction across the dataset. To address the problems caused by high-frequency fluctuations, an HP filter is integrated for data pre-processing, which effectively improves the prediction accuracy of the LSTM model. Additionally, the study compares LSTM configurations to identify the most effective model structure for precise water quality forecasting, thereby providing a benchmark for future advancements in water quality prediction methodologies.

Area description

The Da Jing Shan Reservoir, located in the northern part of the central urban area of Zhuhai City, Guangdong Province (Figure 1), is a medium-sized reservoir on the Feng Huang River. According to the CJ 3020-1993 water quality standard for drinking water sources, the reservoir is a Class II water source. It has a watershed area of 5.95 km² and a total storage capacity of 17.1 million cubic meters, and its average annual water output is 30 million cubic meters. The reservoir is therefore an essential urban water source for Zhuhai and Macau.
Figure 1

The location of the Da Jing Shan Reservoir.

Source and partition of dataset

The research data for this study are sourced from the Environmental Monitoring Yearbook of Zhuhai City, Guangdong Province, spanning 2010 to 2020. Eleven years of water quality monitoring data specific to the Da Jing Shan Reservoir are extracted for analysis, and a predictive model based on the LSTM network is developed. This model aims to simulate and forecast key water quality parameters of the reservoir, including pH, sulfates (SO₄²⁻), chlorides (Cl), the permanganate index (CODMn), dissolved oxygen (DO), total phosphorus (TP), and biochemical oxygen demand (BOD). The historical data are recorded monthly. The predictive model takes the 15 preceding measurements of a water quality parameter as input to predict the next measurement. Training and evaluation used historical data from January 2010 to December 2020, comprising 132 monthly records, split between training and validation at a 4:1 ratio.
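
The windowing and 4:1 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `make_windows` and `chronological_split` are hypothetical, the series is a placeholder rather than reservoir data, and it is an assumption that the split is chronological and applied to the windowed samples.

```python
def make_windows(series, lookback=15):
    """Pair each run of `lookback` consecutive values with the value that follows it."""
    if len(series) <= lookback:
        raise ValueError("series must be longer than the look-back window")
    n_samples = len(series) - lookback
    inputs = [list(series[i:i + lookback]) for i in range(n_samples)]
    targets = [series[i + lookback] for i in range(n_samples)]
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling, so the validation block follows the training block in time."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Placeholder series standing in for 132 monthly readings (not measured values).
monthly = [float(i) for i in range(132)]
X, y = make_windows(monthly, lookback=15)
(train_X, train_y), (val_X, val_y) = chronological_split(X, y)
print(len(X), len(train_X), len(val_X))
```

With 132 records and a look-back of 15, only 117 input/target pairs are available, which is worth keeping in mind when judging how much data the network actually sees.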

Model design

The overarching pattern of water quality variations often manifests a degree of periodicity, which is conducive to enhancing the accuracy of water quality predictions (Syeed et al. 2023). Nevertheless, the actual measurements of water quality parameters are subject to considerable uncertainty and are influenced by various external factors, which poses significant challenges to the predictive modeling of water quality (Razavi 2021). As indicated previously, existing standalone LSTM models do not yield highly accurate predictions (Barzegar et al. 2020). In this study, the HP filter is therefore employed to pre-process the data (Nath et al. 2021; Wang et al. 2024), and the filtered data are then used to construct an LSTM-based water quality prediction model with the aim of improving prediction accuracy.

LSTM network modeling

LSTM networks are a sophisticated variant of recurrent neural networks (RNNs) (Xiang et al. 2020), developed to address the short-term memory limitations of RNNs. Traditional RNNs have difficulty preserving information over extended sequences, which impedes their ability to relate information from early time steps to later ones (Xiang et al. 2020). In contrast, LSTMs excel at recognizing and retaining long-term dependencies, thereby circumventing the vanishing and exploding gradients that often plague the training of standard RNNs. This attribute enables LSTMs to remember inputs over a more extended period, making them particularly effective for tasks involving long or delayed sequences. Despite being proposed in 1997, the LSTM model remains widely utilized due to its robust performance in processing time-series data (Murugesan et al. 2022; Roy et al. 2022; Liang et al. 2023; Patel et al. 2023).

LSTM networks are distinguished by their ability to leverage long-term and efficient short-term memory in propagating states across temporal sequences, a process termed ‘cell state’ transmission. This state conveyance extends beyond immediate temporal succession, ensuring sustained efficacy over extended intervals, thus enhancing the network's capability for multi-period forecasting in time-series analysis. The distinctive structure of LSTMs comprises three integral gates: the forget gate, the input gate, and the output gate. Each gate has a specific role in the data flow through the network. The forget gate decides what information should be removed from the cell state; the input gate selects the new data to be included, and the output gate determines which parts of the cell state should be utilized to compute the output at each time step. These gates work in concert to regulate the retention and disposal of information, much like a filtration system, ensuring that the LSTM retains relevant information throughout the data sequence. The modeling process is shown in Figure 2, and this sophisticated gating mechanism allows LSTMs to perform exceptionally well in various complex tasks involving sequential data, such as language modeling and time-series prediction.
Figure 2

The concept of the LSTM modeling process.
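
To make the roles of the three gates concrete, the following is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. It is illustrative only: the function name `lstm_step`, the weight layout, the layer sizes, and the input values are assumptions and do not describe the configuration used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Advance an LSTM cell by one time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: what to remove from the cell state
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to admit
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: which parts of the cell state to expose
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # hidden state, i.e. the output at this step
    return h_t, c_t


rng = np.random.default_rng(0)
D, H = 1, 4                        # one input feature, four hidden units (illustrative)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for value in [7.9, 8.1, 8.0]:      # made-up readings, not measured data
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
```

The cell state `c_t` is the quantity carried across time steps; the gates only scale what is kept, added, and exposed.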

HP filter

The HP filter, a prominent method in signal separation, is grounded in frequency analysis techniques. It operates on the premise that any time series can be decomposed into a blend of cyclic patterns occurring at diverse frequencies. The primary role of the HP filter is to segregate these cyclical components from the trend component within the time series. This segregation is achieved by smoothing out higher frequency fluctuations, highlighting the lower frequency trend. The extracted trend component is vital for analyzing the time-series data's underlying movements and long-term tendencies. The HP filter clarifies the time series by dividing it into a trend, which indicates the general direction over time, and a cyclical component that reflects short-term variations.

The mathematical basis of the HP filter is an optimization problem that controls the smoothness of the extracted trend, and thereby the separation between the cyclical and trend components. The filter minimizes the sum of squared deviations of the trend component from the actual data, subject to a penalty on the second difference of the trend component (the discrete analogue of the second derivative), which regularizes its smoothness. The selection of the smoothing parameter, commonly denoted lambda (λ), is critical. It determines the filter's responsiveness to short-term fluctuations versus long-term trends, striking a balance between the two for practical analysis (Zhang et al. 2022b).
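
The minimization described above has a closed-form solution. A minimal NumPy sketch follows, assuming the standard HP objective, the sum over t of (y_t − τ_t)² plus λ times the sum of squared second differences of the trend τ, whose minimizer is τ = (I + λDᵀD)⁻¹y with D the second-difference matrix. The function name `hp_filter` and the synthetic series are illustrative, not the study's implementation.

```python
import numpy as np


def hp_filter(y, lam=1.0):
    """Return (trend, cycle) for a 1-D series using the Hodrick-Prescott filter."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # D maps a series to its second differences: (D y)[t] = y[t] - 2*y[t+1] + y[t+2].
    D = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    D[rows, rows] = 1.0
    D[rows, rows + 1] = -2.0
    D[rows, rows + 2] = 1.0
    # Solve the normal equations (I + lam * D^T D) trend = y.
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend, y - trend


# Synthetic monthly series: a seasonal signal plus noise (not reservoir data).
rng = np.random.default_rng(0)
months = np.arange(132)
noisy = 8.0 + 0.3 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=0.1, size=132)
trend, cycle = hp_filter(noisy, lam=1.0)
```

A larger λ weights the smoothness penalty more heavily and yields a flatter trend; a small λ keeps the trend close to the raw series.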

Indicators for evaluating the model predictions

In generating predictive outcomes with a forecasting model, it is essential to assess the model's predictive performance rigorously. This assessment focuses on the accuracy and the practical applicability of the model's predictions. The evaluation typically includes several key performance indicators. One such indicator is the root mean squared error (RMSE), a metric that quantifies the differences between predicted and observed values. The RMSE is computed by squaring these differences, averaging them, and then taking the square root of this average. This calculation results in an assessment of the average magnitude of the model's errors. A lower RMSE value indicates a model that more closely aligns with the observed data, highlighting its accuracy and reliability (Song et al. 2021; Li & Li 2023).

Another critical metric in model evaluation is the coefficient of determination, which is commonly represented as R². This statistic measures the fit of a regression model to the observed data by quantifying the proportion of variance in the dependent variable that is predictable from the model's independent variables. An R² value near 1 implies a high explanatory power of the model and a better fit to the data. In addition to R², the mean absolute error (MAE) and the mean relative error (MRE) are also used. The MAE calculates the average of the absolute differences between predicted and observed values, offering a straightforward interpretation of error magnitude. Conversely, the MRE expresses the average error as a percentage of the actual values, with a value closer to zero indicating higher predictive accuracy (Than et al. 2021; Wan et al. 2022). These metrics provide a comprehensive view of a model's performance, guiding improvements and application in real-world scenarios.
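
The four indicators can be computed as in the sketch below. The definitions are the common textbook ones and are an assumption here; in particular, MRE is taken as the mean of |predicted − observed| / |observed|, and R² as one minus the ratio of the residual to the total sum of squares. The sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mre(observed, predicted):
    """Mean relative error as a fraction; assumes no observed value is zero."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [2.0, 4.0, 6.0, 8.0]        # made-up values
predicted = [2.5, 3.5, 6.5, 7.5]
print(rmse(observed, predicted))        # 0.5
print(mae(observed, predicted))         # 0.5
print(r_squared(observed, predicted))   # 0.95
```

Because every error in this example has magnitude 0.5, RMSE and MAE coincide; they diverge once a few errors are much larger than the rest, since RMSE weights large errors more heavily.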

Working flow of the model

In the current study, the HP filter is applied for pre-processing to refine the water quality data of the Da Jing Shan Reservoir. Following the data smoothing process, a predictive model is constructed utilizing the LSTM framework. This LSTM-based model is designed to forecast future water quality by learning from historical patterns. The efficacy of the HP-LSTM coupled model lies in its foundation of extracting the maximal potential information from the data to enhance accuracy. The methodological framework and the sequential process of the model are illustrated in Figure 3, offering a visual depiction of the workflow that encompasses data preparation through the prediction output. This figure serves as an instructive schematic, guiding the reader through each stage of the modeling process and ensuring a comprehensive understanding of the steps involved in deriving predictive insights.
Figure 3

The methodological framework of the model in this study.

Water quality assessment and trends of water quality parameters

To enhance the management of water quality and the WQI for this source, we have identified seven critical water quality indicators for ongoing environmental monitoring: pH, SO₄²⁻, Cl, CODMn, TP, DO, and BOD. The pH level is crucial as it influences biological activity and chemical processes in water. Sulfates, chlorides, and phosphorus are significant pollutants that pose direct threats to the environment. CODMn serves as an indirect measure of water quality by indicating the oxidation potential of organic substances, while BOD is a direct measure of the organic pollution that can affect aquatic life. Monitoring these indicators provides a thorough evaluation of both the environmental quality and the ecological health of the water body (Chanapathi & Thatikonda 2019; Manna & Biswas 2023).

Figure 4 presents the water quality parameters for the Da Jing Shan Reservoir, which fluctuated significantly over the observed period. The pH values, for instance, ranged from a low of 7.095 to a high of 8.86, with a median of 8.1. The mean being lower than the median suggests periods of relative stability at lower values, particularly between pH 7 and 7.5. The distribution of these pH values, as depicted in the accompanying histogram and fitted curve, appears to align closely with a normal distribution.
Figure 4

Water quality trends and statistical characteristics for the Da Jing Shan Reservoir: (a) pH content time series diagram, (b) pH content distribution map, (c) SO₄²⁻ content time series diagram, (d) SO₄²⁻ content distribution map, (e) Cl content time series diagram, (f) Cl content distribution map, (g) CODMn content time series diagram, (h) CODMn content distribution map, (i) DO content time series diagram, (j) DO content distribution map, (k) TP content time series diagram, (l) TP content distribution map, (m) BOD content time series diagram, (n) BOD content distribution map.

SO₄²⁻ levels in the reservoir also showed considerable fluctuations, primarily oscillating between 10 and 40 mg/L. However, within the 60- to 80-month interval, these levels varied significantly, peaking at about 88 mg/L. The maximum of 88.25 mg/L and the remarkably low minimum of 0.12 mg/L, combined with a median of 19.3 mg/L, indicate sporadic surges in SO₄²⁻ concentrations. This inference is further supported by the mean exceeding the median, and the distribution pattern, as visualized in the histogram and fitted curve, deviates from a normal distribution. Cl concentrations displayed relatively less variability, mainly within the range of 20–40 mg/L. However, a notable transient increase is observed early in the sequence, within the 1- to 30-month interval, with a maximum of 107 mg/L and a minimum of 4.87 mg/L. The median value of 16.4 mg/L implies that a significant number of readings are concentrated between 10 and 30 mg/L. The distribution of Cl does not conform to a normal curve, indicating a skewed distribution.

Additionally, the study notes significant variations in CODMn and DO levels. Initially, CODMn was relatively stable with minor fluctuations, but after the 80th month, the range of variation increased considerably, with differences of up to 3.5 mg/L. DO levels also varied notably, especially in the latter part of the series, with fluctuations of around 2.85 mg/L. Both TP and BOD recorded extreme values early in the study but later became more stable; after the 80th month, BOD levels began to fluctuate more markedly. These patterns, marked by outlier events followed by stability, suggest distributions that deviate from a normal distribution. Typically, the lower the water temperature and the lower the concentration of oxygen-consuming organic pollutants, the higher the DO. Since the water temperature in the reservoir area has not changed significantly over the years, the influence of the oxygen-consuming indicators, which characterize the organic pollution of the water body in the reservoir area, is evident (Li et al. 2020; Jiang et al. 2022). In summary, the oxygen-consuming metrics reflect the impact of organic pollution on the water body in the reservoir, whereas pH, chloride ions, and sulfate ions show only slight variation.

LSTM standalone prediction

Figure 5 shows the performance of the standalone LSTM model in predicting the water quality parameters. When LSTM is used alone, the predictions for several parameters, such as pH, SO₄²⁻, DO, and TP, are not entirely satisfactory, whereas those for Cl and BOD are within an acceptable range. The primary reason is the high variability inherent in the original measurements. The graphical representations of the model predictions highlight the difficulty of forecasting environmental variables precisely: the raw model outputs do not adequately reflect the intricate patterns and trends in the reservoir's water quality parameters, particularly for BOD. Feeding data directly into the LSTM model without pre-processing may leave the model unable to capture the dynamic changes in the series, leading to suboptimal predictive performance. Given the limited data, there is still room to improve the model's predictive performance, especially for TP, where the potential for enhancement is significant. The data therefore need to be pre-processed before being used for prediction.
Figure 5

Training and validation results for water quality parameters using LSTM alone: (a) LSTM model training results of pH, (b) LSTM model test results of pH, (c) LSTM model training results of SO₄²⁻, (d) LSTM model test results of SO₄²⁻, (e) LSTM model training results of Cl, (f) LSTM model test results of Cl, (g) LSTM model training results of CODMn, (h) LSTM model test results of CODMn, (i) LSTM model training results of DO, (j) LSTM model test results of DO, (k) LSTM model training results of TP, (l) LSTM model test results of TP, (m) LSTM model training results of BOD, (n) LSTM model test results of BOD.

Preprocessing by the HP filter and its LSTM prediction

In general, the value of lambda is set according to the periodicity of the time series. Taking the 132 monthly pH data points for the Da Jing Shan Reservoir from 2010 to 2020 as an example, Figure 6 shows the HP-filtered series under different parameter values.
Figure 6

Filtering effects with different lambda (λ) parameters.

Estimation of the lambda value for the HP filter and its preprocessing

In the HP filter, the lambda parameter controls the smoothness of the output. Lambda selection is typically heuristic, often relying on data properties and iterative refinement (Zhang et al. 2022b). Applying the HP filter as a pre-processing step to the water quality parameter data from the Da Jing Shan Reservoir profoundly affected the data characteristics, as evidenced in Table 1. The pre-processing led to a significant decrease in variance and kurtosis across different parameters. The reduction in variance implies a more homogeneous dataset, and the lower kurtosis suggests a distribution with fewer extreme outliers, that is, lighter tails. These changes generally make the data easier for the model to learn from, as extreme fluctuations and anomalies can complicate training and lead to poor model performance.

Table 1

Variance and kurtosis coefficient of the data

                                   pH     SO₄²⁻   Cl      CODMn   DO     TP      BOD
Unfiltered  Variance               0.17   8.52    12.55   0.55    0.43   0.01    0.51
            Kurtosis coefficient   1.68   1.3     4.33    0.29    1.04   2.94    1.14
Filtered    Variance               0.24   10.13   12.98   0.40    0.49   0.02    0.63
            Kurtosis coefficient   5.93   1.02    18.24   −0.87   2.39   63.25   9.40
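
For readers who want to compute statistics of this kind for their own series, a minimal sketch follows. It assumes the population variance and the excess kurtosis (fourth standardized moment minus 3, so that a normal distribution scores 0); the estimator actually used for Table 1 is not stated, and the function name and sample values are illustrative.

```python
def variance_and_kurtosis(values):
    """Return (population variance, excess kurtosis) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    if second_moment == 0:
        raise ValueError("kurtosis is undefined for a constant series")
    fourth_moment = sum((v - mean) ** 4 for v in values) / n
    return second_moment, fourth_moment / second_moment ** 2 - 3.0


even = [1.0, 2.0, 3.0, 4.0, 5.0]     # evenly spread values
spiky = [0.0] * 9 + [10.0]           # one extreme reading among flat values
print(variance_and_kurtosis(even))
print(variance_and_kurtosis(spiky))
```

The single extreme reading in the second series is enough to give it a much higher kurtosis than the evenly spread series, which is the sense in which kurtosis tracks outliers.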

Model parameter adjustment

The parameter adjustment of the HP-LSTM model involves two main aspects. For the HP filter component, the lambda value is set to 1 based on prior experience. For the LSTM component, the parameters include the hidden layer size, the back-step, and the number of iterations. The hidden layer size is set to 64 based on empirical knowledge, whereas the back-step and the number of iterations require more careful adjustment for each dataset. Observing the fluctuations and trends of each dataset helps determine a general range for these parameters. They are then adjusted iteratively based on the LOSS curve and the MSE value until appropriate values are identified.
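
The adjustment described above is manual. As a hedged illustration of the same idea in automated form, the sketch below runs a small grid search that keeps the back-step and iteration count with the lowest validation MSE. It is a swapped-in technique, not the authors' procedure: `select_configuration`, `toy_validation_mse`, and the candidate values are hypothetical, and the toy scoring function merely stands in for training an HP-LSTM model and measuring its validation MSE.

```python
from itertools import product

# Values stated in the text; the dictionary itself is only an illustrative container.
FIXED = {"hp_lambda": 1.0, "hidden_size": 64}


def select_configuration(back_steps, iteration_counts, validation_mse):
    """Return the (back_step, iterations) pair with the lowest validation MSE."""
    best = None
    for back_step, iterations in product(back_steps, iteration_counts):
        score = validation_mse(back_step, iterations)
        if best is None or score < best[0]:
            best = (score, back_step, iterations)
    if best is None:
        raise ValueError("no candidate configurations were supplied")
    return {"back_step": best[1], "iterations": best[2], "mse": best[0]}


def toy_validation_mse(back_step, iterations):
    """Stand-in for 'train the model, return validation MSE'; not a real model."""
    return (back_step - 15) ** 2 / 100.0 + 1.0 / iterations


result = select_configuration([5, 10, 15, 20], [100, 200, 400], toy_validation_mse)
print(result)
```

In practice the scoring callable would train on the training block only and report the error on the validation block, so that the selected values are not tuned on the data used to fit the weights.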

LSTM prediction performance

The utilization of the HP filter as a pre-processing method on the water quality data from the Da Jing Shan Reservoir had a notable impact on the data's characteristics, as demonstrated in Table 1. This pre-processing resulted in a marked reduction in variance and kurtosis across various parameters. A decrease in variance suggests a more homogeneous dataset, implying that the data points are more closely clustered around the mean. Simultaneously, the reduction in kurtosis indicates a distribution with fewer extreme outliers or less pronounced tails. These alterations generally lead to data that is more amenable to modeling. This is because extreme fluctuations and anomalies in the data can complicate the training process of models, often leading to suboptimal performance.

Figure 7 offers a comparative analysis of the LOSS values for the LSTM model both before and after applying the HP filter. LOSS values are a quantitative indicator of the divergence between the model's predictions and the actual data. Initially, the LSTM model, when trained on unfiltered data, exhibited a LOSS curve with minimal improvement as training progressed. This indicated an inability of the model to learn and adapt from the training data effectively. This is further evidenced by a stagnant test LOSS curve, which implies that the model's predictive accuracy did not progress, rendering its outputs unreliable.
Figure 7

Training and validation results for water quality parameters using LSTM integrated with the HP filter: (a) HP-LSTM model training results of pH, (b) HP-LSTM model test results of pH, (c) HP-LSTM model training results of SO₄²⁻, (d) HP-LSTM model test results of SO₄²⁻, (e) HP-LSTM model training results of Cl, (f) HP-LSTM model test results of Cl, (g) HP-LSTM model training results of CODMn, (h) HP-LSTM model test results of CODMn, (i) HP-LSTM model training results of DO, (j) HP-LSTM model test results of DO, (k) HP-LSTM model training results of TP, (l) HP-LSTM model test results of TP, (m) HP-LSTM model training results of BOD, (n) HP-LSTM model test results of BOD.

Conversely, the LSTM model trained on data processed with the HP filter behaved very differently. After pre-processing, the LOSS curve of this model showed a pronounced downward trajectory, signifying improved performance with each additional training batch. The curve eventually plateaued, which indicates that the model had captured the main patterns in the data. Such a trend indicates a successful training process, in which the model learns effectively and reaches a point of diminishing returns on further training.

The efficacy of the LSTM model's training, enhanced by the HP filter pre-processing, is demonstrated in Figure 7, which presents the prediction results for both the training and validation phases. The panels illustrate the close alignment between the model's predictions and the observed values for each water quality index in both periods. The predictive performance observed during the training phase is not isolated; it is consistently replicated in the validation phase. This consistency is a strong indicator that the model has achieved a good level of generalization. It suggests that the LSTM network is not merely memorizing the training data but is learning the underlying patterns in the dataset.

This ability to accurately capture and predict the behavior of the various water quality parameters in the Da Jing Shan Reservoir is a significant testament to the combined approach's effectiveness. The LSTM network, renowned for its capability to process and learn from sequential data, and the HP filter, known for its proficiency in smoothing and highlighting essential trends in time-series data, provide a robust framework for water quality prediction. The combined approach effectively mitigates issues like overfitting, where a model performs well on training data but poorly on unseen data, ensuring that the model remains practical and reliable for water quality applications.

The LOSS plot identifies the model fit

Figure 8 shows the LOSS curves of each indicator model under the final parameters, offering insight into the fitting status of each model.

  • pH: Analysis of Figure 8(a) reveals that the LOSS curve of the unfiltered training set exhibits minimal decrease, suggesting inadequate learning from the training data. Conversely, the filtered LOSS curve for both training and test sets begins to converge after 20 iterations, with a negligible gap between them, indicating an optimal fit.

  • SO₄²⁻: Examination of Figure 8(b) indicates declining unfiltered LOSS curves. While the test set stabilizes after 20 iterations, the LOSS value remains above 0.3, indicating underfitting. However, post-filtering, the LOSS curve steadily decreases and flattens after 20 iterations, with a value below 0.1, demonstrating a good fit.

  • Cl: As seen in Figure 8(c), the unfiltered LOSS curve stabilizes around 0.6 after 10 iterations, suggesting poor fitting. Conversely, the filtered LOSS curve consistently decreases in both training and test sets, approaching 0.05 around the 35th iteration, with minimal disparity between sets, signifying excellent fitting.

  • CODMn, DO, and BOD: Figure 8(d), Figure 8(e) and Figure 8(g) depict similar LOSS graphs for these indicators. Under unfiltered conditions, the training set yields stable, low LOSS values, while the test set's LOSS curve either stagnates or increases significantly, indicating poor generalization to unseen data. However, post-filtering, both training and test set LOSS curves steadily decline, plateauing around the 40th iteration, with significantly reduced values compared to pre-filtering. Although not perfect, these models exhibit strong fitting.

  • TP: Figure 8(f) illustrates a steadily declining LOSS curve for this index both pre- and post-filtering, flattening after 25 iterations. Post-filtering, the value significantly decreases, indicating improved fitting.
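The qualitative reading of the LOSS curves above can be expressed as a simple rule of thumb. The sketch below is illustrative only: the function `describe_fit`, its thresholds (`flat_tol`, `gap_tol`, `high_loss`) and the synthetic curves are assumptions, not values or procedures taken from the study.

```python
import numpy as np

def describe_fit(train_loss, test_loss, tail=5,
                 flat_tol=0.01, gap_tol=0.05, high_loss=0.3):
    """Label a pair of LOSS curves using simple, assumed thresholds."""
    train = np.asarray(train_loss, dtype=float)
    test = np.asarray(test_loss, dtype=float)
    final_train = float(train[-tail:].mean())
    final_test = float(test[-tail:].mean())
    # The test curve is treated as flat when its last `tail` values vary little.
    plateaued = (test[-tail:].max() - test[-tail:].min()) < flat_tol
    gap = final_test - final_train
    if final_train > high_loss and final_test > high_loss:
        label = "underfit: both curves stay high"
    elif gap > gap_tol:
        label = "poor generalization: test LOSS well above training LOSS"
    elif plateaued:
        label = "good fit: curves converge at a low value"
    else:
        label = "still improving: more iterations may help"
    return {"label": label, "final_train": final_train,
            "final_test": final_test, "gap": gap}

epochs = np.arange(50)
# Curves that decline together and flatten at a low value.
converged = describe_fit(0.05 + 0.6 * np.exp(-epochs / 8),
                         0.06 + 0.6 * np.exp(-epochs / 8))
# Curves that never leave a high LOSS level.
stalled = describe_fit(np.full(50, 0.62), np.full(50, 0.65))
# Low training LOSS with a rising test LOSS.
diverging = describe_fit(0.05 + 0.3 * np.exp(-epochs / 5),
                         0.2 + 0.01 * epochs)
for result in (converged, stalled, diverging):
    print(result["label"])
```

Fixed thresholds of this kind are only meaningful when the LOSS values of the compared models are on the same scale, for example after the same normalization of the target variable.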

Model evaluation and error comparison before and after filtering

Figure 7 depicts the impact of data pre-processing on the performance of the LSTM model applied to pH value predictions from January 2010 to December 2020 using 132 data points. Before the implementation of data processing, the LOSS metric during the test period demonstrates a trend of stabilization but remains at a high error magnitude. The lack of a decrease in error with an increasing batch count indicates the model's initial inability to extract meaningful patterns from the dataset. This suggests that, in its unprocessed state, the data presents complexities or anomalies that the LSTM model struggles to interpret effectively, hindering its learning and predictive capabilities.
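Before an LSTM can be trained on a series of 132 data points, the series has to be arranged into input sequences and targets. The sketch below shows one common arrangement, assuming a sliding window and a chronological split. The window length of 12, the 80/20 split and the function names are assumptions for illustration; the configuration used in the study is not stated in this section.

```python
import numpy as np

def make_windows(series, window=12):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(series.size - window)])
    y = series[window:]
    # Trailing axis of size 1: one feature per time step, as LSTM layers expect.
    return X[..., np.newaxis], y

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test period follows the training period."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

series = np.linspace(7.0, 8.0, 132)  # placeholder for 132 observations
X, y = make_windows(series, window=12)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(X.shape, train_X.shape, test_X.shape)
```

Keeping the split chronological avoids leaking information from the test period into training, which matters when the aim is to judge how well the model forecasts unseen periods.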

The scenario changes dramatically after the application of the HP filter. Post-filtering, the LOSS values consistently stay below 0.2, a substantial improvement over the pre-processing phase. This marked reduction in LOSS values signifies a significantly enhanced predictive performance. The HP filter, known for its ability to smooth out noise and highlight underlying trends in time-series data, appears to have made the data more tractable for the LSTM model. By reducing the complexity and irregularities in the data, the HP filter allows the LSTM network to learn from the data more effectively and make accurate predictions. This contrast in performance before and after data processing underscores the importance of appropriate data pre-processing in ML applications, especially when dealing with time-series data. It highlights how pre-processing techniques such as the HP filter can improve the quality and reliability of predictions made by models such as LSTM networks. The improvement also has practical implications, supporting more reliable predictions in real-world applications such as environmental monitoring and management.

This study simulated and forecasted seven water quality parameters of the Da Jing Shan Reservoir. The comparison of LSTM training and testing outcomes before and after HP filtering, as depicted in Figure 8 and Table 2, reveals suboptimal predictive accuracy for various water quality indicators before HP filtering. During the training phase, the coefficient of determination (R²) values ranged between 32 and 66%, and for the testing phase, these values fell below 30%, denoting a markedly inadequate fit. Post-HP filtering, the training R² values uniformly exceeded 90%. In the testing phase, except for total phosphorus, which displayed a moderate 58% fitting degree, other indicators such as sulfate and chloride achieved R² values above 90%. The lower fitting accuracy for total phosphorus may be attributable to minimal fluctuations and smoother curves in the training data, in contrast to the more pronounced variability during the testing phase.
Table 2

Prediction evaluation of water quality by LSTM integrated with HP filtering

| Dataset | Parameter | MSE | MAE | MRE | R² |
|---|---|---|---|---|---|
| Training (without HP filter) | pH | 0.23 | 0.16 | 0.02 | 0.32 |
| | SO₄²⁻ | 8.33 | 6.68 | 1.36 | 0.59 |
| | Cl | 9.73 | 6.37 | 0.44 | 0.49 |
| | CODMn | 0.40 | 0.25 | 0.12 | 0.60 |
| | DO | 0.31 | 0.22 | 0.03 | 0.63 |
| | TP | 0.01 | 0.00 | 0.22 | 0.66 |
| | BOD | 0.33 | 0.22 | 0.16 | 0.52 |
| Test (without HP filter) | pH | 0.28 | 0.22 | 0.28 | 0.00 |
| | SO₄²⁻ | 10.42 | 7.28 | 0.38 | |
| | Cl | 9.21 | 6.46 | 0.45 | 0.27 |
| | CODMn | 0.91 | 0.79 | 0.35 | 0.22 |
| | DO | 0.71 | 0.60 | 0.09 | |
| | TP | 0.01 | 0.01 | 0.51 | 0.23 |
| | BOD | 0.95 | 0.84 | 0.50 | 0.22 |
| Training (with HP filter) | pH | 0.05 | 0.03 | 0.01 | 0.99 |
| | SO₄²⁻ | 2.27 | 1.66 | 0.08 | 0.93 |
| | Cl | 1.71 | 1.38 | 0.08 | 0.97 |
| | CODMn | 0.05 | 0.03 | 0.02 | 0.98 |
| | DO | 0.1 | 0.07 | 0.01 | 0.94 |
| | TP | 0.00 | 0.00 | 0.09 | 0.94 |
| | BOD | 0.09 | 0.07 | 0.04 | 0.93 |
| Test (with HP filter) | pH | 0.07 | 0.06 | 0.01 | 0.73 |
| | SO₄²⁻ | 1.23 | 1.02 | 0.05 | 0.97 |
| | Cl | 2.57 | 2.07 | 0.13 | 0.92 |
| | CODMn | 0.39 | 0.12 | 0.15 | 0.62 |
| | DO | 0.17 | 0.11 | 0.02 | 0.80 |
| | TP | 0.00 | 0.00 | 0.13 | 0.54 |
| | BOD | 0.28 | 0.25 | 0.13 | 0.85 |

Figure 8

LOSS curves of the seven indexes before and after filtering under the same optimal parameters. (a) For pH, the LOSS curve is compared before and after filtering, (b) For SO₄²⁻, the LOSS curve is compared before and after filtering, (c) For Cl, the LOSS curve is compared before and after filtering, (d) For CODMn, the LOSS curve is compared before and after filtering, (e) For DO, the LOSS curve is compared before and after filtering, (f) For TP, the LOSS curve is compared before and after filtering, (h) For BOD, the LOSS curve is compared before and after filtering.


Regarding the precision of pH value predictions prior to processing, the forecasted values displayed a reasonable alignment with the actual values during training, evidenced by an MRE of 2% and an MAE of 0.167. Conversely, during the testing phase, the MRE rose to 28%, and the MAE increased to 0.228. The post-processing improvements are noteworthy, with the MRE decreasing to 0.5% and the MAE to 0.027 during training. The testing phase also showed minimal discrepancies, with an MAE of 0.056 and an MRE of approximately 0.7% (Wun & Wen 1991; Karunasingha 2022; Jiao et al. 2023).
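For reference, the four evaluation metrics discussed here can be computed as in the following sketch. It uses the standard textbook definitions, which may differ in detail from those applied in the study; the function name `evaluate` and the sample values are assumptions for illustration only.

```python
import numpy as np

def evaluate(actual, predicted):
    """Return MSE, MAE, MRE and R2 for paired observations and predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # Mean relative error; requires that no observed value is zero.
    mre = float(np.mean(np.abs(err) / np.abs(actual)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))
    # R2 can be negative when the model is worse than predicting the mean.
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "MRE": mre, "R2": r2}

# Hypothetical pH-like values, not data from the study.
observed = [7.2, 7.4, 7.6, 7.8]
forecast = [7.3, 7.3, 7.7, 7.7]
metrics = evaluate(observed, forecast)
print(metrics)
```

MSE and MAE carry the units of the measured parameter (squared units for MSE), whereas MRE and R² are dimensionless, so only the latter two can be compared directly across parameters with different magnitudes.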

For SO₄²⁻ and Cl predictions before data processing, the training-phase MAEs were over 6, with the MRE soaring to 135% for SO₄²⁻ and 44% for Cl. These MREs stabilized at approximately 40% during the testing phase. Following data processing, both parameters exhibited reduced disparities between predicted and actual values, with MAEs around 1.5 and MREs under 9%. The testing phase retained this trend of precision, with MREs also below 10%. Before processing, the CODMn predictions had MAEs of 0.25 during training and 0.8 during testing, with MREs of 11 and 35%, respectively. These values improved significantly after processing, with MAEs reducing to 0.025 and 0.12, and MREs to 2 and 15%, underscoring enhanced predictive accuracy. Before processing, the MAEs for DO and BOD were above 0.55 during testing, with BOD's MRE reaching 50%. Post-processing, however, revealed a significant improvement during training, with MAEs below 0.2 and MREs of only 2 and 12%, respectively (Willmott & Matsuura 2005).

The impact of data processing on the accuracy metrics for TP is evident when comparing the pre- and post-processing phases. Post-processing, there is a noticeable improvement in the model fitting during training, as indicated by a relatively low error rate of just 8%. This improvement reflects the model's enhanced ability to learn and adapt to the patterns within the training dataset. However, when it comes to testing, the model demonstrates notable lags when its predictions are compared with the actual values. This discrepancy suggests that the model does not fully account for the presence of significant temporal effects. Despite these lags, the overall trend in the model's predictions post-processing remains consistent, with a relative error of 12%. This consistency, even in the presence of temporal variability and a limited dataset, is a positive indicator of the model's capability to capture general trends in the data. However, the discernible delays in model predictions point to areas where further refinement is needed.
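One simple way to quantify the kind of prediction lag described above is to shift the predicted series against the observed series and find the shift with the highest correlation. The sketch below illustrates the idea on synthetic data; the function `estimate_lag`, the `max_lag` bound and the test series are assumptions, and this is not an analysis performed in the study.

```python
import numpy as np

def estimate_lag(actual, predicted, max_lag=6):
    """Return the shift (in time steps) at which predicted best matches actual.

    A positive result means the prediction trails the observation.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # Compare predicted[t + lag] with actual[t] over the overlapping part.
        a = actual[:actual.size - lag]
        p = predicted[lag:]
        corr = np.corrcoef(a, p)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, float(best_corr)

# Synthetic series with a seasonal cycle and a trend.
t = np.arange(60)
observed = np.sin(2 * np.pi * t / 12) + 0.05 * t
# A forecast that reproduces the observations two steps late.
delayed = np.concatenate([np.full(2, observed[0]), observed[:-2]])

lag, corr = estimate_lag(observed, delayed)
print("estimated lag:", lag, "correlation:", corr)
```

A non-zero estimated lag would suggest that the model is largely echoing recent observations, which is one possible explanation for delayed predictions in short time series.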

Given the complex nature of environmental data such as TP levels, which can be influenced by many factors and exhibit considerable temporal variation, it is not uncommon for models to fall short of perfect accuracy. The delays and lags observed suggest that the model may benefit from enhancements, such as incorporating additional variables that account for temporal dynamics or applying more sophisticated data pre-processing techniques that can better handle the variability. The current findings, together with the referenced study (Park & Stefanski 1998), underscore the importance of continuous model evaluation and refinement, especially in fields dealing with dynamic and complex datasets. Improving the model's ability to account for temporal effects and reduce prediction delays will be crucial for increasing its reliability and applicability in areas such as water quality monitoring and management.

The simulation analyses conducted in this study illustrate the enhanced performance of the LSTM model when augmented with HP filtering, compared to an unaided LSTM model. This enhancement is evident in the model's fitting effects and prediction accuracy. A significant achievement of this combined approach is the high degree of congruence observed between the model's predicted values and the actual observations. For most water quality indicators studied, this congruence surpassed the 80% threshold. In particular, the predictions for SO₄²⁻, Cl, and DO are remarkably accurate, closely matching the empirical results with an accuracy of up to 90%. This precision in tracing and simulating the measurement processes supports the model's efficacy.

Additionally, the performance of the model for critical parameters such as SO₄²⁻, Cl, and DO is further validated by the low error metrics. MSE, MAE, and MRE for these parameters all remained below 10%. Such low error rates reinforce the robustness and validity of the proposed model, confirming its reliability in accurately predicting water quality parameters.

LSTM has been applied in various fields, resulting in models such as Long Short-Term Memory with Empirical Mode Decomposition (EMD-LSTM), Long Short-Term Memory with Principal Component Analysis (PCA-LSTM), and Long Short-Term Memory with Wavelet Decomposition and Wavelet Neural Network (WD-WNN-LSTM). EMD-LSTM addresses feature fluctuations, decomposes scale variations, and enhances raw data utilization. PCA-LSTM reduces data dimensions, eliminates redundancy, and decreases prediction errors. WD-WNN-LSTM captures complex spatiotemporal characteristics and non-linear relationships, improving model accuracy (Hao et al. 2022; Li et al. 2022; Wang 2024).

This study significantly impacts water quality forecasting for the Da Jing Shan Reservoir. The successful application of the LSTM model combined with the HP filter in analyzing the reservoir's water quality demonstrates their effectiveness and suitability for similar predictive tasks. For models where short-term data fluctuations impact fitting accuracy, combining HP with LSTM mitigates these fluctuations, enhancing accuracy and achieving a precision of over 85%. Compared to newer composite LSTM models such as EMD-LSTM and Long Short-Term Memory with Kernel Principal Component Analysis (KPCA-LSTM), this approach is more concise and practical. This research is crucial for ensuring water supply safety for both the reservoir and Zhuhai City by providing a reliable model framework.

A real-time water quality monitoring platform based on the model has been developed to predict and warn of future changes. This platform aids in the early detection of potential issues, timely issuance of warnings, and pollution process reviews, enhancing response capabilities to water environmental risks. It improves governance planning, promoting high-quality water conservation and environmental protection.

This work is supported by the Foshan Shunde District Core Technology Breakthrough Project (2230218004273), the Science and Technology Plan Project of Zhuhai in the Field of Social Development (2220004000355), Guangdong Basic and Applied Basic Research Foundation (2023B1515040028), and the National Key Research and Development Program of China (2022YFC3202200).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Barzegar R., Aalami M. T. & Adamowski J. 2020 Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model. Stochastic Environmental Research and Risk Assessment 34 (2), 415–433. doi:10.1007/s00477-020-01776-2.

Bi J., Chen Z. X., Yuan H. T. & Zhang J. 2024 Accurate water quality prediction with attention-based bidirectional LSTM and encoder-decoder. Expert Systems with Applications 238. doi:10.1016/j.eswa.2023.121807.

Bourjila A., Dimane F., Ghalit M., Taher M., Kamari S., El Hammoudani Y., Achoukhi I. & Haboubi K. 2023 Mapping the spatiotemporal evolution of seawater intrusion in the Moroccan coastal aquifer of Ghiss-Nekor using GIS-based modeling. Water Cycle 4, 104–119. doi:10.1016/j.watcyc.2023.05.002.

Chanapathi T. & Thatikonda S. 2019 Fuzzy-based regional water quality index for surface water quality assessment. Journal of Hazardous Toxic and Radioactive Waste 23 (4), 11. doi:10.1061/(asce)hz.2153-5515.0000443.

Chen K. Y., Chen H. X., Zhou C. L., Huang Y. C., Qi X. Y., Shen R. Q., Liu F. R., Zuo M., Zou X. Y., Wang J. F., Zhang Y., Chen D., Chen X. G., Deng Y. F. & Ren H. Q. 2020 Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Research 171. doi:10.1016/j.watres.2019.115454.

Chen Z., Xu H., Jiang P., Yu S. N., Lin G., Bychkov I., Hmelnov A., Ruzhnikov G., Zhu N. & Liu Z. 2021 A transfer learning-based LSTM strategy for imputing large-scale consecutive missing data and its application in a water quality prediction system. Journal of Hydrology 602. doi:10.1016/j.jhydrol.2021.126573.

Chen S. Y., Huang J. L., Wang P., Tang X. & Zhang Z. Y. 2024 A coupled model to improve river water quality prediction towards addressing non-stationarity and data limitation. Water Research 248. doi:10.1016/j.watres.2023.120895.

Fan C. D., Li Y. F., Yi L. Z., Xiao L. Y., Qu X. L. & Ai Z. Y. 2022 Multi-objective LSTM ensemble model for household short-term load forecasting. Memetic Computing 14 (1), 115–132. doi:10.1007/s12293-022-00355-y.

Hao W., Sun X. F., Wang C. Y., Chen H. Y. & Huang L. M. 2022 A hybrid EMD-LSTM model for non-stationary wave prediction in offshore China. Ocean Engineering 246. doi:10.1016/j.oceaneng.2022.110566.

Haq K. & Harigovindan V. P. 2022 Water quality prediction for smart aquaculture using hybrid deep learning models. IEEE Access 10, 60078–60098. doi:10.1109/access.2022.3180482.

Hasanzadeh S. K., Saadatpour M. & Afshar A. 2020 A fuzzy equilibrium strategy for sustainable water quality management in river-reservoir system. Journal of Hydrology 586. doi:10.1016/j.jhydrol.2020.124892.

Hussein E. E., Derdour A., Zerouali B., Almaliki A., Wong Y. J., Los Santos M. B.-D., Ngoc P. M., Hashim M. A. & Elbeltagi A. 2024 Groundwater quality assessment and irrigation water quality index prediction using machine learning algorithms. Water 16, 2. doi:10.3390/w16020264.

Ilyas Q. M., Iqbal K., Ijaz S., Mehmood A. & Bhatia S. 2022 A hybrid model to predict stock closing price using novel features and a fully modified Hodrick-Prescott filter. Electronics 11, 21. doi:10.3390/electronics11213588.

Jiang J. Q., Zhao G. F., Wang D. W., Liu L., Yan X. & Song H. R. 2022 Identifying trends and driving factors of spatio-temporal water quality variation in Guanting Reservoir Basin, North China. Environmental Science and Pollution Research 29 (58), 88347–88358. doi:10.1007/s11356-022-21714-9.

Jiao G. M., Chen S. K., Wang F., Wang Z. Y., Wang F. J., Li H., Zhang F. J., Cai J. L. & Jin J. 2023 Water quality evaluation and prediction based on a combined model. Applied Sciences – Basel 13, 3. doi:10.3390/app13031286.

Karunasingha D. S. K. 2022 Root mean square error or mean absolute error? Use their ratio as well. Information Sciences 585, 609–629. doi:10.1016/j.ins.2021.11.036.

Kasiviswanathan K. S., He J. X., Sudheer K. P. & Tay J. H. 2016 Potential application of wavelet neural network ensemble to forecast streamflow for flood management. Journal of Hydrology 536, 161–173. doi:10.1016/j.jhydrol.2016.02.044.

Li Y. T. & Li R. Y. 2023 Predicting ammonia nitrogen in surface water by a new attention-based deep learning hybrid model. Environmental Research 216. doi:10.1016/j.envres.2022.114723.

Li B., Yang G. S. & Wan R. R. 2020 Multidecadal water quality deterioration in the largest freshwater lake in China (Poyang Lake): Implications on eutrophication management. Environmental Pollution 260. doi:10.1016/j.envpol.2020.114033.

Li J. M., Zhou T. T. & Hu X. P. 2022 Prediction algorithm of stock holdings of Hong Kong-funded institutions based on optimized PCA-LSTM model. International Journal of Innovative Computing Information and Control 18 (3), 999–1008. doi:10.24507/ijicic.18.03.999.

Lian J. J., Yan L. L., Yao Y. & Chen Y. L. 2022 Hydrodynamic and water quality impacts of water transfer project on regulating reservoir, a case study of Dongzhang reservoir. Journal of Hydrology 614. doi:10.1016/j.jhydrol.2022.128494.

Liang B. S., Wang S. Y., Huang Y. Q., Liu Y. L. & Ma L. P. 2023 F-LSTM: FPGA-based heterogeneous computing framework for deploying LSTM-based algorithms. Electronics 12, 5. doi:10.3390/electronics12051139.

Manna A. & Biswas D. 2023 Assessment of drinking water quality using water quality index: A review. Water Conservation Science and Engineering 8 (1), 18. doi:10.1007/s41101-023-00185-0.

Marcé R., George G., Buscarinu P., Deidda M., Dunalska J., De Eyto E., Flaim G., Grossart H. P., Istvanovics V., Lenhardt M., Moreno-Ostos E., Obrador B., Ostrovsky I., Pierson D. C., Potuzák J., Poikane S., Rinke K., Rodríguez-Mozaz S., Staehr P. A., Sumberová K., Waajen G., Weyhenmeyer G. A., Weathers K. C., Zion M., Ibelings B. W. & Jennings E. 2016 Automatic high frequency monitoring for improved lake and reservoir management. Environmental Science & Technology 50 (20), 10780–10794. doi:10.1021/acs.est.6b01604.

Murugesan R., Mishra E. & Krishnan A. H. 2022 Forecasting agricultural commodities prices using deep learning-based models: Basic LSTM, bi-LSTM, stacked LSTM, CNN LSTM, and convolutional LSTM. International Journal of Sustainable Agricultural Management and Informatics 8 (3), 242–277. doi:10.1504/ijsami.2022.125757.

Nath P., Saha P., Middya A. I. & Roy S. 2021 Long-term time-series pollution forecast using statistical and deep learning methods. Neural Computing & Applications 33 (19), 12551–12570. doi:10.1007/s00521-021-05901-2.

Park H. & Stefanski L. A. 1998 Relative-error prediction. Statistics & Probability Letters 40 (3), 227–236. doi:10.1016/s0167-7152(98)00088-1.

Patel N., Vasani N., Jadav N. K., Gupta R., Tanwar S., Polkowski Z., Alqahtani F. & Gafar A. 2023 F-LSTM: Federated learning-based LSTM framework for cryptocurrency price prediction. Electronic Research Archive 31 (10), 6525–6551. doi:10.3934/era.2023330.

Pyo J., Pachepsky Y., Kim S., Abbas A., Kim M., Kwon Y. S., Ligaray M. & Cho K. H. 2023 Long short-term memory models of water quality in inland water environments. Water Research X 21. doi:10.1016/j.wroa.2023.100207.

Razavi S. 2021 Deep learning, explained: Fundamentals, explainability, and bridgeability to process-based modelling. Environmental Modelling & Software 144. doi:10.1016/j.envsoft.2021.105159.

Reichstein M., Camps-Valls G., Stevens B., Jung M., Denzler J., Carvalhais N. & Prabhat 2019 Deep learning and process understanding for data-driven earth system science. Nature 566 (7743), 195–204. doi:10.1038/s41586-019-0912-1.

Roy S. S., Awad A. I., Amare L. A., Erkihun M. T. & Anas M. 2022 Multimodel phishing URL detection using LSTM, bidirectional LSTM, and GRU models. Future Internet 14, 11. doi:10.3390/fi14110340.

Sakaa B., Elbeltagi A., Boudibi S., Chaffai H., Islam A. R. M. T., Kulimushi L. C., Choudhari P., Hani A., Brouziyne Y. & Wong Y. J. 2022 Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environmental Science and Pollution Research 29 (32), 48491–48508. doi:10.1007/s11356-022-18644-x.

Shan X., Li C.-G. & Li F.-M. 2023 Water quality variation of a typical urban landscape river replenished with reclaimed water. Water Cycle 4, 137–144. doi:10.1016/j.watcyc.2023.04.001.

Shi B., Wang P., Jiang J. P. & Liu R. T. 2018 Applying high-frequency surrogate measurements and a wavelet-ANN model to provide early warnings of rapid surface water quality anomalies. Science of the Total Environment 610, 1390–1399. doi:10.1016/j.scitotenv.2017.08.232.

Song C. G., Yao L. H., Hua C. Y. & Ni Q. H. 2021 A novel hybrid model for water quality prediction based on synchrosqueezed wavelet transform technique and improved long short-term memory. Journal of Hydrology 603. doi:10.1016/j.jhydrol.2021.126879.

Su Y. X., Li J. Y., Liu L. L., Guo X., Huang L. K. & Hu M. Y. 2023 Application of CNN-LSTM algorithm for PM2.5 concentration forecasting in the Beijing-Tianjin-Hebei metropolitan area. Atmosphere 14 (9), 20. doi:10.3390/atmos14091392.

Syeed M. M. M., Hossain M. S., Karim M. R., Uddin M. F., Hasan M. & Khan R. H. 2023 Surface water quality profiling using the water quality index, pollution index and statistical methods: A critical review. Environmental and Sustainability Indicators 18. doi:10.1016/j.indic.2023.100247.

Wan H., Xu R., Zhang M., Cai Y. P., Li J. & Shen X. 2022 A novel model for water quality prediction caused by non-point sources pollution based on deep learning and feature extraction methods. Journal of Hydrology 612. doi:10.1016/j.jhydrol.2022.128081.

Wang Y. Z. 2024 PCA-LSTM: An impulsive ground-shaking identification method based on combined deep learning. CMES – Computer Modeling in Engineering & Sciences 139 (3), 3029–3045. doi:10.32604/cmes.2024.046270.

Wang L. X., Dong H. L., Cao Y. Q., Hou D. B. & Zhang G. X. 2023a Real-time water quality detection based on fluctuation feature analysis with the LSTM model. Journal of Hydroinformatics 25 (1), 140–149. doi:10.2166/hydro.2023.127.

Wang Y. H., Zhang C., Fu Y. Y., Suo L. M., Song S. H., Peng T. & Nazir M. S. 2023b Hybrid solar radiation forecasting model with temporal convolutional network using data decomposition and improved artificial ecosystem-based optimization algorithm. Energy 280. doi:10.1016/j.energy.2023.128171.

Wang W.-L., Jing Z.-B., Zhang Y.-L., Wu Q.-Y., Drewes J. E., Lee M.-Y. & Hübner U. 2024 Assessing the chemical-free oxidation of trace organic chemicals by VUV/UV as an alternative to conventional UV/H2O2. Environmental Science & Technology 58 (16), 7113–7123.

Willmott C. J. & Matsuura K. 2005 Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research 30 (1), 79–82. doi:10.3354/cr030079.

Wong Y. J., Shimizu Y., Kamiya A., Maneechot L., Bharambe K. P., Fong C. S. & Nik Sulaiman N. M. 2021 Application of artificial intelligence methods for monsoonal river classification in Selangor river basin, Malaysia. Environmental Monitoring and Assessment 193, 7. doi:10.1007/s10661-021-09202-y.

Wun L. M. & Wen L. P. 1991 Assessing the statistical characteristics of the mean absolute error or forecasting. International Journal of Forecasting 7 (3), 335–337. doi:10.1016/0169-2070(91)90007-i.

Xiang Z. R., Yan J. & Demir I. 2020 A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resources Research 56, 1. doi:10.1029/2019wr025326.

Xiao X., He J. Y., Huang H. M., Miller T. R., Christakos G., Reichwaldt E. S., Ghadouani A., Lin S. P., Xu X. H. & Shi J. Y. 2017 A novel single-parameter approach for forecasting algal blooms. Water Research 108, 222–231. doi:10.1016/j.watres.2016.10.076.

Xiong R., Zheng Y., Chen N. W., Tian Q., Liu W., Han F., Jiang S. J., Lu M. Q. & Zheng Y. 2022 Predicting dynamic riverine nitrogen export in unmonitored watersheds: Leveraging insights of AI from data-rich regions. Environmental Science & Technology 56 (14), 10530–10542. doi:10.1021/acs.est.2c02232.

Xu W. J., Gu R., Liu Y. Z. N. & Dai Y. W. 2015 Forecasting energy consumption using a new GM-ARMA model based on HP filter: The case of Guangdong Province of China. Economic Modelling 45, 127–135. doi:10.1016/j.econmod.2014.11.011.

Yan X., Zhang T., Du W., Meng Q., Xu X. & Zhao X. 2024 A comprehensive review of machine learning for water quality prediction over the past five years. Journal of Marine Science and Engineering 12, 1. doi:10.3390/jmse12010159.

Yang Q., Zhang J., Hou Z., Lei X., Tai W., Chen W. & Chen T. 2017 Shallow groundwater quality assessment: Use of the improved Nemerow pollution index, wavelet transform and neural networks. Journal of Hydroinformatics 19 (5), 784–794. doi:10.2166/hydro.2017.224.

Yin Q., Sun Y., Li B., Feng Z. & Wu G. 2022 The r/K selection theory and its application in biological wastewater treatment processes. Science of the Total Environment 824. doi:10.1016/j.scitotenv.2022.153836.

Zamani M. G., Nikoo M. R., Rastad D. & Nematollahi B. 2023 A comparative study of data-driven models for runoff, sediment, and nitrate forecasting. Journal of Environmental Management 341. doi:10.1016/j.jenvman.2023.118006.

Zhang J. X. & Li S. Y. 2022 Air quality index forecast in Beijing based on CNN-LSTM multi-model. Chemosphere 308. doi:10.1016/j.chemosphere.2022.136180.

Zhang L., Jiang Z. Q., He S. S., Duan J. F., Wang P. F. & Zhou T. 2022a Study on water quality prediction of urban reservoir by coupled CEEMDAN decomposition and LSTM neural network model. Water Resources Management 36 (10), 3715–3735. doi:10.1007/s11269-022-03224-y.

Zhang X. Q., Wu X. L., Xiao Y. M., Shi J. W., Zhao Y. & Zhang M. H. 2022b Application of improved seasonal GM (1,1) model based on HP filter for runoff prediction in Xiangjiang River. Environmental Science and Pollution Research 29 (35), 52806–52817. doi:10.1007/s11356-022-19572-6.

Zhang Y. T., Li C. L., Jiang Y. Q., Sun L., Zhao R. B., Yan K. F. & Wang W. H. 2022c Accurate prediction of water quality in urban drainage network with integrated EMD-LSTM model. Journal of Cleaner Production 354. doi:10.1016/j.jclepro.2022.131724.

Zhu S. & Piotrowski A. P. 2020 River/stream water temperature forecasting using artificial intelligence models: A systematic review. Acta Geophysica 68 (5), 1433–1442. doi:10.1007/s11600-020-00480-7.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).