This work explores the accurate prediction of water quality, a critical aspect of water environment protection. Leveraging advanced algorithms and statistical models, including regression analyses, moving average, and artificial neural networks (ANN), we analyze multivariate time-series datasets from tap water testing sites. Our approach involves model fitting and evaluation based on the Akaike information criterion (AIC) and root mean square error (RMSE). The work introduces the long short-term memory (LSTM) neural network and the seasonal autoregressive integrated moving average (SARIMA) model, detailing their architectures and implementations. Results highlight the effectiveness of a multivariate LSTM model in forecasting total chlorine concentrations, complemented by the SARIMA model. Evaluation metrics reveal comparable predictive accuracy, with a slight advantage observed for the mixed method combining both models. This work underscores the significance of integrating technological and statistical methodologies for enhanced water quality predictions, contributing to proactive water management strategies.

  • Optimization of water quality monitoring process.

  • Predictive analysis to alleviate nitrification concerns.

  • Atmospheric surface temperature is a good predictor of total chlorine concentration.

  • Long short-term memory and ARIMA models were fit to the data with optimal hyperparameters.

  • Both models provide a similar degree of accuracy.

Accurate prediction of water quality is of paramount importance in water environment protection, giving operators the foresight to take proactive measures and mitigate potential issues before they occur. Despite its importance, achieving the desired precision remains a complex challenge. Many well-established algorithms and statistical models have been designed and tested specifically for the analysis of time-series data. Notable among these are regression analyses, support vector regression, Grey systems, and artificial neural networks (ANN) (Zhou et al. 2018). Several other models were tested but are not reported in this study, among them the random forest model and gradient boosting models such as XGBoost. They are excluded because the goal is to compare models that are specifically designed for time-series prediction.

Water quality monitoring data typically takes the form of multivariate time-series datasets, integrating historical and daily updated information from tap water testing sites within the distribution networks of two water treatment plants. Advanced technologies like IoT facilitate real-time water quality monitoring, addressing critical needs in resource conservation and management (Karthikeyan et al. 2023). Exploratory data analysis plays a vital role in understanding the dataset's structure and analysis objectives.

Researchers have extensively studied the influence of factors such as temperature (Oliveira & Von Sperling 2008), pH (Zhang et al. 2019), and other chemical components on water quality. Analytical techniques help uncover relationships among these variables. For instance, principal component analysis effectively explores correlations among physicochemical and microbiological parameters (Platikanov et al. 2019), while correlation analysis provides insights into variable associations (Mitryasova et al. 2016). Cross-correlation analysis reveals temporal dependencies among time-series variables (Lehmann & Rode 2001).

Time-series forecasting methods like autoregressive integrated moving average (ARIMA) have proven useful in water quality predictions (Hernández et al. 2017; Selvaraj & Shalma 2024). More advanced approaches, such as long short-term memory (LSTM) networks, offer sophisticated solutions for modeling and predicting time-series data (Chen et al. 2022). Case studies demonstrate LSTM's superior performance in water quality analysis and prediction (Yoon et al. 2024). Due to time constraints, not every type of machine learning or statistical model could be tested on this dataset. Because the sole goal of this case study is to maximize the accuracy of predicted time-series values, the models tested are those renowned for their predictive ability in a time-series context. Therefore, the LSTM and ARIMA models, representing machine learning and statistical methods, respectively, are considered in this study.

In this work, we focus on key parameters such as weather, chlorine, ammonia, and nitrite to monitor and predict water quality. By bridging water science with predictive analytics, this study contributes to advancing water quality forecasting and to promoting better environmental stewardship and resource management.

The data comprise multivariate time-series datasets, integrating past and current information from tap water testing sites within the distribution system of two water treatment plants. Initial exploration involves selecting target and feature variables using the Pearson correlation coefficient. Subsequently, three models, one being a hybrid of the other two, are fitted to the data. Model evaluation employs the Akaike information criterion (AIC), and predictive performance is assessed through the root mean square error (RMSE).

The LSTM model

The LSTM neural network, a subtype of recurrent neural network (RNN), is widely used to forecast diverse sequential data, ranging from stock prices (Nelson et al. 2017) and traffic flow (Fu et al. 2016) to speech recognition (Graves et al. 2013). Its architecture, shown in Figure 1, retains the advantages of a classic RNN while addressing the vanishing/exploding gradient problem through an additional component known as the 'cell state,' which provides the long-term memory (Van Houdt et al. 2020).
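To make the cell-state mechanism concrete, the following is a minimal sketch of a single LSTM time step, assuming a scalar input and a hidden size of one so that the gate equations stay readable. The gate names and the `params` layout are illustrative choices, not the configuration of the model used in this study.

```python
import math
from typing import Dict, Tuple

# Each gate maps to (input weight, recurrent weight, bias); scalar sizes only.
Params = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    """Logistic function used by the forget, input, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x: float, h_prev: float, c_prev: float, params: Params) -> Tuple[float, float]:
    """Advance a scalar LSTM cell by one time step and return (h, c)."""

    def pre(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # share of the old cell state to keep
    i = sigmoid(pre("input"))        # share of the candidate to write
    o = sigmoid(pre("output"))       # share of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate update in (-1, 1)
    c = f * c_prev + i * g           # cell state: additive long-term memory path
    h = o * math.tanh(c)             # hidden state passed to the next step
    return h, c
```

Running a sequence through the cell means calling `lstm_cell_step` once per time step and carrying `h` and `c` forward; the additive update of `c` is the path that lets information persist across many steps.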
Figure 1

Basic architecture of an LSTM unit.


The LSTM framework also supports multivariate input and output, which is particularly important when forecasting chemical concentrations at tap water testing sites (Im et al. 2022). Typically, multiple chemical components are measured simultaneously at each testing site, resulting in several time series for each location (Marcoux et al. 2017). The criteria for selecting additional feature variables or time series involve considering the correlation coefficient between the variable to be predicted and any variable that may be related.

Model implementation

Model input is of the form $\{X_{i1}(t-d), X_{i2}(t-d), \dots, X_{in}(t-d)\}$, where each $X_{is}(t-d)$, for $s \in \{1, 2, \dots, n\}$, is expanded to the set of lagged observations $\{X_{is}(t-1), X_{is}(t-2), \dots, X_{is}(t-d)\}$. Here, $n$ is the number of feature variables directly related to the time series of interest, and $X_{is}$ denotes feature (predictor) variable $s$ for row (observation) $i$ (Siami-Namini et al. 2018).

The implementation of a multivariate LSTM model can pose challenges due to the requirement of (k − 1) future/predicted values for all included feature variables, based on the number k of future values in the time series to be predicted. A frequently adopted approach involves separately predicting up to (k − 1) future values for all feature variables and subsequently using these predicted values as the required future values.
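The recursive strategy described above can be sketched as follows, assuming a single feature variable (for example, maximum daily temperature) whose future values have already been predicted by some separate method. `forecast_k_steps` and the toy `one_step` model in the usage note are hypothetical stand-ins for a trained LSTM, used only to show why k target predictions consume k − 1 future feature values.

```python
from typing import Callable, List, Sequence

# one_step(target_history, feature_history) -> next target value
OneStep = Callable[[List[float], List[float]], float]


def forecast_k_steps(target_hist: Sequence[float], feature_hist: Sequence[float],
                     feature_future: Sequence[float], one_step: OneStep, k: int) -> List[float]:
    """Recursive k-step forecast of a target that depends on one feature.

    The first prediction uses only observed data; every later step needs one
    more feature value, so k target predictions require k - 1 future feature
    values, supplied here from a separate forecast.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(feature_future) < k - 1:
        raise ValueError("need k - 1 predicted future feature values")
    target = list(target_hist)
    feature = list(feature_hist)
    preds: List[float] = []
    for step in range(k):
        y_next = one_step(target, feature)
        preds.append(y_next)
        target.append(y_next)                     # feed the prediction back in
        if step < k - 1:
            feature.append(feature_future[step])  # predicted, not observed, value
    return preds
```

For instance, with the toy model `lambda tgt, feat: 0.5 * tgt[-1] + 0.1 * feat[-1]`, `forecast_k_steps([2.0], [10.0], [20.0], toy, 2)` returns two predictions while consuming exactly one predicted temperature.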

As mentioned previously, the model is trained on batches of sequential data. Let $n$ denote the number of past observations (timesteps) in the dataset, $d$ the number of past observations used to predict the next value in the sequence of interest, $s$ the total number of feature variables, and $m$ the number of batches extracted from the dataset. These batches can be expressed as matrices of size $[d, s]$, denoted $m_i$ for $i \in \{1, 2, \dots, n-d\}$ (so $m = n - d$), where each $m_i$ holds $d$ consecutive rows of all $s$ variables and each successive batch is shifted down the dataset by one row (Fneish 2019).

Each of $m_1[d, 1], m_2[d, 1], \dots, m_{n-d}[d, 1]$ is exactly equal to the corresponding element of $Y = \{X_{i1}(t-n+d+1), X_{i1}(t-n+d+2), \dots, X_{i1}(t-1)\}$. Therefore, if we let $M = \{m_1, m_2, \dots, m_{n-d}\}$, each element of $M$ corresponds to an element of $Y$. This outlines how the LSTM model is trained: predict $Y_i$ given $M_i$. Conceptually, this can be envisioned as a window of length $d$ that starts at the first row of the dataset (time $t-n$) and slides down row by row until it reaches the second-to-last row (time $t-2$) (Fneish 2019).
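The sliding-window construction can be sketched in a few lines. This is a minimal illustration, assuming the target series sits in the first column and that each window of d rows is paired with the target value in the row immediately after it; the exact index offsets used by Fneish (2019) may differ by one row. `make_batches` is a hypothetical helper, not part of Keras.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def make_batches(rows: Sequence[Row], d: int,
                 target_col: int = 0) -> Tuple[List[List[List[float]]], List[float]]:
    """Slide a d-row window down an [n x s] dataset one row at a time.

    Returns n - d windows (each a d x s matrix) and, for each window, the
    target value in the row that immediately follows it.
    """
    n = len(rows)
    if not 0 < d < n:
        raise ValueError("window length d must satisfy 0 < d < n")
    windows: List[List[List[float]]] = []
    targets: List[float] = []
    for i in range(n - d):
        windows.append([list(r) for r in rows[i:i + d]])  # rows i .. i+d-1
        targets.append(rows[i + d][target_col])            # next value to predict
    return windows, targets
```

With two columns (chlorine and temperature), as in Figure 2, and d = 2, a five-row dataset yields three windows, and the last window ends on the second-to-last row.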

Figure 2 illustrates the process of reshaping a dataset with two columns and n rows into batches. A widely adopted optimizer for time series prediction with an LSTM model is 'Adam', typically paired with mean squared error (MSE) as the loss function (Jiang & Chen 2017).
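To show what the optimizer and loss actually compute, here is a minimal sketch of one Adam update for a single scalar parameter, together with the MSE loss of a toy linear model. The β1 = 0.9, β2 = 0.999, and ε = 1e-8 values are the defaults proposed for Adam, and the learning rate of 0.001 matches the value used later in this study; the toy model `y = w * x` is purely illustrative.

```python
import math
from typing import Sequence, Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int, lr: float = 0.001,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the
    v_hat = v / (1 - beta2 ** t)                # zero-initialised moments
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def mse_and_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """MSE of the toy model y = w * x and its derivative with respect to w."""
    n = len(xs)
    residuals = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
    return loss, grad
```

Calling `mse_and_grad` and then `adam_step` in a loop (incrementing `t` each time) performs gradient descent on the MSE. At the first step, the bias-corrected ratio `m_hat / sqrt(v_hat)` equals the sign of the gradient, so the parameter moves by about `lr`.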
Figure 2

Dataset with dimension [n × 2] is converted into batches.


The SARIMA model

The seasonal autoregressive integrated moving average (SARIMA) model represents a more statistical approach to time series forecasting compared to an LSTM model (Theerthagiri & Ruby 2023). It extends the ARIMA model to incorporate the seasonality inherent in a time series. The model is commonly employed in situations where a discernible pattern of variation occurs at specific intervals of time, indicating a seasonal component. SARIMA is characterized by six hyperparameters (Valipour et al. 2012):

  • p: autoregressive order

  • d: differencing order

  • q: moving average order

  • P: seasonal autoregressive order

  • D: seasonal differencing order

  • Q: seasonal moving average order

The 'trend' hyperparameters are p, d, and q, as they model the immediate trend in the time series, whereas P, D, and Q model its seasonal component. If the autoregressive order (p) is set to n, the model uses n weighted lags of the series, $y_{t-1}, y_{t-2}, \dots, y_{t-n}$, to model $y_t$. Similarly, if the moving average order (q) is set to n, the model uses n weighted lagged errors, $\varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_{t-n}$, to model $y_t$ (Hyndman & Athanasopoulos 2018).

If the seasonal autoregressive and moving average orders (P and Q) are both set to n, 2n additional terms are added to the model, with P and Q interpreted in the same way as p and q, respectively. However, the seasonal lags are $t-m, t-2m, \dots, t-nm$, where m is the length of the 'season', the period over which $y_t$ varies in a predictable pattern (for example, a week or a month) (Hyndman & Athanasopoulos 2018).

The first three (p, d, q) compose the entirety of the ARIMA model, and the last three (P, D, Q) are the same as the first three but apply to the seasonal component. In notation (Pennsylvania State University 2022):

Autoregressive order AR(p = n): $y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_n y_{t-n} + \varepsilon_t$

Differencing order I(d = n): $y'_t = (1 - B)^n y_t$, where $B y_t = y_{t-1}$

Moving average order MA(q = n): $y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_n \varepsilon_{t-n}$

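As an example, a SARIMA(p, d, q)(P, D, Q)_m model in which every hyperparameter except m is equal to n would look like:

$$z_t = (1 - B)^n (1 - B^m)^n y_t$$

$$z_t = \omega + \sum_{i=1}^{n} \beta_i z_{t-i} + \sum_{i=1}^{n} \Phi_i z_{t-im} + \sum_{i=1}^{n} \theta_i \varepsilon_{t-i} + \sum_{i=1}^{n} \Theta_i \varepsilon_{t-im} + \varepsilon_t$$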

In the above model, yt represents the time series of interest, zt denotes the time series after both trend (d) and seasonal (D) differencing have been applied, ω is the intercept term, and n is the order of all six other hyperparameters. The first two summations in the lower part correspond to the p (trend) and P (seasonal) autoregressive components, respectively, while the last two are the q (trend) and Q (seasonal) moving average components, respectively (Hyndman & Athanasopoulos 2018).
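The differencing that turns $y_t$ into $z_t$ can be sketched directly, assuming a plain Python list of daily values and an integer season length m. The helper names are hypothetical. Because (1 − B) and (1 − B^m) commute, the order in which the two kinds of differencing are applied does not change the result.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag) once: return y_t - y_{t-lag} wherever the lag exists."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series: Sequence[float], d: int, D: int, m: int) -> List[float]:
    """Compute z_t = (1 - B)^d (1 - B^m)^D y_t for a season of length m."""
    z = list(series)
    for _ in range(D):
        z = difference(z, m)  # seasonal differencing
    for _ in range(d):
        z = difference(z, 1)  # ordinary (trend) differencing
    return z
```

Each difference shortens the series by its lag, so a series of length N yields N − d − mD values of $z_t$.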

Data

The US Centers for Disease Control and Prevention (CDC) reports that nine out of 10 Americans get their water from one of more than 148,000 public water systems. These treatment systems typically rely on either groundwater or surface water (from rivers and lakes) as their source of raw water (Centers for Disease Control & Prevention n.d.). To uphold water quality standards, operators of these facilities measure specific chemical components in the treated water daily. Measurements are taken as the water flows out of storage tanks, as it passes through booster pump stations (where the treated water is 'boosted' with additional chlorine), and at various locations throughout the distribution system (primary endpoints).

The dataset used in this analysis consists of daily measurements of each chemical of interest, collected at various locations throughout the distribution system of a medium-sized city in the southern US. The city's water supply comes from two main treatment plants, Jefferson and El Pico, which together serve approximately 260,000 residents with a combined capacity of 85 million gallons per day. Data were collected at 36 tap water testing sites from 1 September 2021 to 3 December 2023, spanning approximately 1,100 days/data points. The frequency and method of measurement produce a highly variable time series for each monitored chemical, with generally low autocorrelation (<0.2) across the time series of interest. Some data cleaning was performed prior to predictive analysis: small gaps in the records of sites with missing data were filled by linear interpolation so that no gaps remained. The provided data were otherwise very clean, with no significant outliers that warranted removal.
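The gap-filling step can be sketched as below, assuming each site's daily series is a list in which missing days are marked `None`. Only interior gaps are filled; how the study treated gaps at the very start or end of a series is not stated, so leaving them untouched here is an assumption.

```python
from typing import List, Optional, Sequence


def fill_gaps_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by straight-line interpolation between the
    nearest observed neighbours; leading and trailing gaps stay None."""
    out: List[Optional[float]] = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:  # at least one missing day between two observations
            y0, y1 = out[left], out[right]
            for j in range(1, gap):
                out[left + j] = y0 + (y1 - y0) * j / gap
    return out
```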

The levels of various chemicals in milligrams per liter (mg/L) are continuously monitored at each tap water testing site, including total chlorine, monochloramine, free ammonia, nitrite, and nitrate. However, predictive analysis and forecasting focus on total chlorine and free ammonia, for three reasons. First, free ammonia is predicted because it is the main contributing factor to nitrification. Second, total chlorine concentration is predicted because it is the only parameter subject to manual adjustment, through control of the chlorine dosage at booster pump stations. Finally, the other chemicals are either redundant measurements of total chlorine (such as monochloramine) or were measured with too little precision to allow practically useful prediction. The concentrations of the other monitored chemicals, such as nitrite, nitrate, and monochloramine, result from chemical reactions with chlorine as the water ages in the distribution system and cannot be modified directly. Moreover, treated water may take several days to flow from a booster pump station to an endpoint. Therefore, forecasting total chlorine and free ammonia concentrations at the endpoints of the distribution system gives operators the best chance to act proactively before chemical levels move into their respective critical zones.

The variability of each chemical over time exhibits a moderate degree of seasonality that follows the meteorological seasons. As illustrated in Figure 3, total chlorine tends to peak in mid-winter and gradually declines to its lowest concentrations in mid-summer. Conversely, free ammonia follows an inverse pattern, peaking in summer and reaching its lowest levels in winter. Consequently, atmospheric conditions, particularly temperature, are suspected to be the primary drivers of the long-term trends in the concentration of each chemical component.
Figure 3

Levels of total chlorine and free ammonia in mg/L at a testing site.


Despite the evident inverse relationship between the levels of the two chemicals plotted in Figure 3, their Pearson correlation coefficient is only −0.39. Surprisingly, establishing a statistically significant relationship between any two chemical components, whether at one location or aggregated over several, proves challenging. Among potential predictors of total chlorine (or other component) levels, the variable with the strongest observed relationship to the concentration of any chemical is the maximum daily temperature recorded by the Automated Weather Observing System (AWOS) at the city's international airport. Pearson correlation coefficients between this maximum temperature and the concentrations of the key chemical components noted above, whether at a single site or any combination of sites, range from approximately 0.4 to 0.7 in magnitude, typically around 0.6.
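For reference, the Pearson coefficient used throughout this section can be computed as below. The function is a plain restatement of the textbook formula; in this setting `x` could hold daily maximum temperatures and `y` one site's total chlorine readings, but no data from the study are included here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    syy = sum((b - my) ** 2 for b in y)                   # variation of y
    return sxy / math.sqrt(sxx * syy)
```

A negative value, as expected between summer temperatures and winter-peaking chlorine, indicates an inverse relationship; the magnitude measures its strength.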

To streamline the model and avoid unnecessary feature variables, only variables that show a statistically significant relationship with total chlorine levels at most sites (>50%) are considered. Maximum daily temperature from the airport's AWOS is the only variable meeting this requirement, and it is available for every day with chemical concentration data. Consequently, the final dataset contains daily total chlorine levels for each testing site (typically 36; the number varies with real-time monitoring needs), together with each day's maximum temperature, converted to degrees Fahrenheit and rounded to the nearest whole number.
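The >50%-of-sites screening rule can be sketched as follows. The study does not state which significance test was used, so this sketch assumes the usual t-test for a Pearson coefficient, t = r·√((n − 2)/(1 − r²)), with the large-sample two-sided 5% critical value of 1.96. The feature names in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def is_significant(r: float, n: int, t_crit: float = 1.96) -> bool:
    """Two-sided test of zero correlation via t = r * sqrt((n - 2) / (1 - r^2)).
    t_crit = 1.96 is the normal approximation at alpha = 0.05 (large n)."""
    if abs(r) >= 1.0:
        return True
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t) > t_crit


def select_features(site_correlations: Dict[str, Sequence[float]], n_obs: int,
                    min_share: float = 0.5) -> List[str]:
    """Keep features whose per-site Pearson r is significant at more than
    min_share of the sites."""
    selected = []
    for name, rs in site_correlations.items():
        share = sum(is_significant(r, n_obs) for r in rs) / len(rs)
        if share > min_share:
            selected.append(name)
    return selected
```

For example, `select_features({"max_temp": [-0.6, -0.55, -0.4], "nitrite": [0.01, 0.05, -0.02]}, n_obs=1000)` returns `["max_temp"]`, because only the temperature correlations clear the threshold at most sites.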

A multivariate LSTM model is employed to forecast the next six daily total chlorine concentration measurements (in mg/L) at 10 tap water testing sites in a southern US city. The model's input consists of 16 batches of sequences with two variables that are heavily correlated with total chlorine levels: lagged chlorine levels and daily maximum temperature. The output is a single value representing the predicted total chlorine for the next day at each site. The model is trained on 80% of the available data and tested on the remaining 20%.

Following Liu et al. (2022), the stacked LSTM RNN consists of two hidden layers (between the required input and output layers) containing three and eight neurons, respectively. These layer sizes balance model complexity and performance: preliminary experiments suggested that this configuration achieved good predictive accuracy without overfitting. Liu et al. (2022) used similar layer configurations for time series forecasting, demonstrating that small layers can capture temporal dependencies without excessive computation. The optimizer is 'Adam,' known for its efficiency in training deep learning models, and the loss function is MSE, which is commonly used for regression tasks. The activation function is 'SoftPlus,' as it yielded the lowest average loss during experimentation, possibly because its smooth, non-saturating form for positive inputs is less prone to vanishing gradients than saturating activation functions. The learning rate is set to 0.001, and training runs for at most 50 epochs. Additionally, an 'EarlyStopping' callback stops training if the validation loss has not improved for five consecutive epochs (Keras 2022). Further experiments could explore the impact of varying the number of neurons or epochs to optimize performance.
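The early-stopping rule described above amounts to a small piece of bookkeeping, sketched below as a plain re-implementation of the logic (patience of five epochs, at most 50 epochs, any decrease in validation loss counting as an improvement). It is an illustration, not the Keras source; in Keras itself the corresponding setting would be `keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)` passed to `model.fit` along with `epochs=50`.

```python
import math
from typing import Sequence


def epochs_run_with_early_stopping(val_losses: Sequence[float], patience: int = 5,
                                   max_epochs: int = 50) -> int:
    """Return how many epochs would run before training stops, given the
    validation loss observed at the end of each epoch."""
    best = math.inf
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop: no improvement for `patience` epochs
    return min(len(val_losses), max_epochs)
```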

Given the strong relationship between total chlorine levels and temperature, incorporating temperature in the model proves beneficial, as depicted in Figure 4. Using only the forthcoming maximum daily temperatures, the model sequentially generates next-day predictions of total chlorine concentration over 700 days. In Figure 4, the resulting forecast is displayed in orange over the actual chlorine values for the same period. The model actually implemented to predict future chlorine values relies on the past 10 days of predicted total chlorine levels as well as maximum temperature. However, Figure 4 demonstrates that the trend of chlorine values can be forecast accurately using only maximum surface air temperature as a predictor variable. This implies that temperature strongly influences the overall trend of total chlorine levels, so that this trend can be modeled accurately from daily maximum temperature alone.
Figure 4

Actual vs. forecasted total chlorine concentration (in mg/L) at a testing site for the next 700 days using future maximum daily temperatures.


Among the statistical models considered, including SARIMA and ARIMA types, an exhaustive model selection process using the 'auto.arima' function in R's 'forecast' package led to an ARIMA(5,1,1) model, chosen because it had the minimum AIC. SARIMA models were considered because of the observed seasonal variation in both chemical concentrations; however, the non-seasonal ARIMA model performed best of all the ARIMA-type models tested.

ARIMA-class models are reduced SARIMA-class models that omit the seasonal component, retaining only the p, d, and q terms. Consequently, this model works on the first difference of each site's total chlorine series (differencing order of 1): the day-to-day change in total chlorine is modeled as a function of the previous five daily changes (autoregressive order of 5) and the previous day's forecast error (moving average order of 1).
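A one-step ARIMA(5,1,1) forecast can be written out explicitly as below, which shows how the forecast of the differenced series is turned back into a chlorine level. The coefficients `phi`, `theta`, and `c` would come from the fitted model; none are reported in this study, so any values passed in are placeholders, and `last_error` is assumed to be the previous day's one-step residual.

```python
from typing import Sequence


def arima_p11_forecast(y: Sequence[float], phi: Sequence[float], theta: float,
                       last_error: float, c: float = 0.0) -> float:
    """One-step forecast from an ARIMA(p,1,1) model,
        dy_t = c + sum_i phi_i * dy_{t-i} + theta * e_{t-1} + e_t,
    where dy_t = y_t - y_{t-1}; p = len(phi)."""
    p = len(phi)
    if len(y) < p + 1:
        raise ValueError("need at least p + 1 observations")
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    recent = dy[::-1][:p]  # dy_{t-1}, dy_{t-2}, ..., dy_{t-p}
    dy_hat = c + sum(ph * d for ph, d in zip(phi, recent)) + theta * last_error
    return y[-1] + dy_hat  # undo the first difference
```

With all coefficients set to zero the forecast reduces to yesterday's level, which is the random-walk baseline that the autoregressive and moving average terms adjust.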

Because aggregating forecasts from multiple models has been shown to improve prediction accuracy (Jose & Yasala 2024), a third forecast is computed by averaging the predictions of the two models for each day of the forecast period. The performance of each method is then compared systematically, focusing on a specific testing site.
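The mixed forecast is a simple equal-weight average, sketched below under the assumption that both models produce forecasts for the same sequence of days.

```python
from typing import List, Sequence


def mixed_forecast(lstm_preds: Sequence[float], arima_preds: Sequence[float]) -> List[float]:
    """Equal-weight, day-by-day average of the LSTM and ARIMA forecasts."""
    if len(lstm_preds) != len(arima_preds):
        raise ValueError("forecasts must cover the same days")
    return [(a + b) / 2 for a, b in zip(lstm_preds, arima_preds)]
```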

The model used in this case study has clear strengths: it captures the strong relationship between total chlorine levels and daily maximum temperature, which leads to accurate predictions, and its flexibility in forecasting multiple sites from previous-day values alongside temperature enhances its effectiveness. However, it also has weaknesses, such as relying solely on temperature as an external predictor, which may limit accuracy where factors like manual chemical boosting significantly affect chlorine levels.

As illustrated in Figure 5, the LSTM and ARIMA models fit the data well, given its inherent noise. The fitted values are the model predictions given the input data (past chlorine values and daily maximum surface air temperature); Figure 5 compares these fitted values with the actual values for each model at one site. All three models are fitted to the time series of each testing site, and their performance is assessed by comparing RMSE. RMSE is the square root of the mean squared residual, so it indicates the typical distance between fitted and observed values (Cerqueira et al. 2020):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$
Figure 5

Original vs. Fitted values of each model at a testing site.


In the provided formula, Pi represents the predicted total chlorine concentration for a specific time at a specific site, Oi denotes the observed total chlorine concentration at that same time and site, and n represents the number of recorded measurements for that site. It is important to note that RMSE is calculated on a per-site basis, considering each site individually.
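The per-site RMSE can be sketched as follows, assuming each site's predicted and observed values are stored as two aligned lists; the dictionary layout keyed by site is a hypothetical convenience.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between aligned predictions P_i and observations O_i."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def rmse_by_site(results: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """results maps a site id to (predicted, observed); RMSE is computed per site."""
    return {site: rmse(p, o) for site, (p, o) in results.items()}
```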

Five-number summaries of the RMSE values for each method provide a better sense of how well these models predict relative to each other.
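A five-number summary of the per-site RMSE values can be computed with the standard library. The study does not state which quartile convention was used for Table 1; this sketch assumes linear interpolation between order statistics (`method="inclusive"` in `statistics.quantiles`).

```python
import statistics
from typing import Sequence, Tuple


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (min, 25th percentile, median, 75th percentile, max)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)
```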

Overall, the differences in predictive error among the three methods are not practically important, and any of the three models could be used interchangeably with the same practical utility. This is illustrated in both Figure 6 and Table 1: the five-number summaries of the three models are practically equivalent.
Table 1

Five-number summary of each method's RMSE

| Method | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| ARIMA | 0.23 | 0.29 | 0.33 | 0.41 | 0.74 |
| LSTM | 0.22 | 0.28 | 0.31 | 0.40 | 0.77 |
| Mixed | 0.22 | 0.27 | 0.31 | 0.40 | 0.74 |
Figure 6

Comparison of all three models' RMSE values upon fitting each to all 36 testing sites of the distribution system.


Additionally, the same LSTM model used to predict total chlorine was applied to predict free ammonia concentrations for the next 6 days at each of the 36 testing sites. As for total chlorine, maximum daily temperature is the only other variable included in the model, owing to the strong correlation between free ammonia and near-surface air temperature. Validation results for these predictions are proportionally similar to those shown above for total chlorine.

Finally, a follow-up evaluation of the LSTM model's predictive accuracy several months after implementation indicates that it remains accurate. On average, next-day total chlorine predictions are off by roughly 80% of a testing site's standard deviation in total chlorine levels. Given the high day-to-day variability in total chlorine levels (as seen in Figure 3), we consider an RMSE below the standard deviation to indicate a useful level of predictive accuracy.
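The accuracy criterion above, RMSE relative to a site's standard deviation, can be sketched as a single ratio. This sketch uses the population standard deviation (`statistics.pstdev`), for which a ratio of exactly 1 corresponds to always forecasting the site's mean level; whether the study used the sample or population form is not stated.

```python
import math
import statistics
from typing import Sequence


def rmse_to_sd_ratio(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE of the forecasts divided by the population SD of the observations.
    Values below 1 beat a constant forecast at the observed mean."""
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("need two aligned series with at least two values")
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))
    sd = statistics.pstdev(observed)
    if sd == 0:
        raise ValueError("observed series is constant")
    return rmse / sd
```

Under this definition, the roughly 80% figure reported above corresponds to a ratio of about 0.8.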

Accurate water quality prediction is essential for the efficient operation of water treatment systems, yet noisy data remains a persistent challenge to achieving reliable outcomes. To address this, we proposed a hybrid prediction model that combines the strengths of ARIMA and LSTM, as outlined by Fathi (2019). This composite model leverages the complementary capabilities of its constituent models, with its performance evaluated using RMSE as the benchmark.

The analysis reveals that the hybrid ARIMA–LSTM model offers a marginal improvement in predictive accuracy over the individual ARIMA and LSTM models. While the standalone models perform comparably in this study, the LSTM network shows a slight edge, particularly in capturing long-term trends. The LSTM model is used in preference to the ARIMA or mixed model because its multivariate structure enables it to model extended trends in water quality effectively, driven by the strong correlation between chemical concentrations and near-surface air temperature. Furthermore, model interpretability is not a key consideration here, as the goal is to maximize raw predictive power.

Despite these strengths, daily variations in water quality data, often influenced by human inputs, introduce noise that cannot be solely attributed to temperature fluctuations. These variations may also be affected by external factors, like water treatment interventions, including chemical dosing, which can significantly alter water quality but may not be accounted for in the model. In cases where these variations could significantly impact chemical component levels, modeling this variability becomes crucial. Incorporating additional predictor variables beyond daily maximum temperature is recommended to improve predictive accuracy and better capture these external influences. However, data acquisition depends on the sensor capabilities of municipal water treatment systems, which may have limitations in both coverage and resolution, further complicating model accuracy. Addressing these data constraints is a critical step toward improving daily variability predictions.

The LSTM model takes roughly 25 s to train and generate predictions for each site, or around 15 min in total for all 36 sites. This training time is not a constraint in the overall predictive process, however, as predictions are generated automatically early each morning. Given the available resources, this process could be expanded to a large number of additional municipalities before training time and hardware become limiting.

Further optimization, including exhaustive tuning of LSTM hyperparameters and expanding the ensemble with additional models, holds potential for improving predictive accuracy. Nevertheless, the LSTM model presented here provides sufficient accuracy for most practical applications, demonstrating its utility in advancing water quality management and fostering better resource stewardship.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict of interest.

Centers for Disease Control and Prevention (n.d.) About Drinking Water. Available at: https://www.cdc.gov/drinking-water/about/index.html.
Chen, H., Yang, J., Fu, X., Zheng, Q., Song, X., Fu, Z., Wang, J., Liang, Y., Yin, H., Liu, Z., Jiang, J. & Yang, X. (2022) Water quality prediction based on LSTM and attention mechanism: a case study of the Burnett River, Australia. Sustainability, 14(20), 13231.
Fathi, O. (2019) Time series forecasting using a hybrid ARIMA and LSTM model. Velvet Consulting, 1–7.
Fneish, M. (2019) Keras LSTM Diagram. GitHub. Available at: https://github.com/MohammadFneish7/Keras_LSTM_Diagram.
Fu, R., Zhang, Z. & Li, L. (2016, November) Using LSTM and GRU neural network methods for traffic flow prediction. In: 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, pp. 324–328.
Graves, A., Jaitly, N. & Mohamed, A. R. (2013) Hybrid speech recognition with deep bidirectional LSTM. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Olomouc, Czech Republic: IEEE, pp. 273–278.
Hernández, N., Camargo, J., Moreno, F., Plazas-Nossa, L. & Torres, A. (2017) ARIMA as a forecasting tool for water quality time series measured with UV-vis spectrometers in a constructed wetland. Tecnología Y Ciencias del Agua, 8(5), 127–139.
Hyndman, R. J. & Athanasopoulos, G. (2018) Seasonal ARIMA models. In: Forecasting: Principles and Practice. OTexts.
Jiang, S. & Chen, Y. (2017) Hand gesture recognition by using 3DCNN and LSTM with Adam optimizer. In: Pacific Rim Conference on Multimedia. Cham: Springer International Publishing, pp. 743–753.
Jose, A. & Yasala, S. (2024) Machine learning-based ensemble model for groundwater quality prediction: a case study. Water Practice and Technology, 19(6), 2364–2375.
Karthikeyan, V., Palin Visu, Y. & Raja, E. (2023) Integrated intelligent system for water quality monitoring and theft detection. Water Practice & Technology, 18(12), 3035–3047.
Keras (2022) EarlyStopping Callback. Keras Documentation. Available at: https://keras.io/api/callbacks/early_stopping/.
Liu, C., Zhang, Y., Sun, J., Cui, Z. & Wang, K. (2022) Stacked bidirectional LSTM RNN to evaluate the remaining useful life of supercapacitor. International Journal of Energy Research, 46(3), 3034–3043.
Marcoux, A., Pelletier, G., Legay, C., Bouchard, C. & Rodriguez, M. J. (2017) Behavior of non-regulated disinfection by-products in water following multiple chlorination points during treatment. Science of The Total Environment, 586, 870–878.
Mitryasova, O., Pohrebennyk, V., Cygnar, M. & Sopilnyak, I. (2016) Environmental natural water quality assessment by method of correlation analysis. International Multidisciplinary Scientific GeoConference: SGEM, 2, 317–324.
Nelson, D. M., Pereira, A. C. & De Oliveira, R. A. (2017, May) Stock market's price movement prediction with LSTM neural networks. In: 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1419–1426.
Oliveira, S. C. & Von Sperling, M. (2008) Reliability analysis of wastewater treatment plants. Water Research, 42(4–5), 1182–1194.
Pennsylvania State University (2022) Introduction to Time Series and Forecasting. Lesson 4, Statistics Online Program. Available at: https://online.stat.psu.edu/stat510/lesson/4/4.1.
Platikanov, S., Baquero, D., González, S., Martín-Alonso, J., Paraira, M., Cortina, J. L. & Tauler, R. (2019) Chemometric analysis for river water quality assessment at the intake of drinking water treatment plants. Science of The Total Environment, 667, 552–562.
Selvaraj, P. & Shalma, H. (2024) ARIMA modeling for reliable potable water identification and quality prediction. In: 2024 5th International Conference on Smart Electronics and Communication (ICOSEC), pp. 1878–1883. doi:10.1109/ICOSEC61587.2024.10722505.
Siami-Namini, S., Tavakoli, N. & Namin, A. S. (2018) A comparison of ARIMA and LSTM in forecasting time series. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp. 1394–1401.
Theerthagiri, P. & Ruby, A. U. (2023) Seasonal learning based ARIMA algorithm for prediction of Brent oil price trends. Multimedia Tools and Applications, 82(16), 24485–24504. https://doi.org/10.1007/s11042-023-14819-x.
Valipour, M., Banihabib, M. E. & Behbahani, S. M. R. (2012) Parameters estimate of autoregressive moving average and autoregressive integrated moving average models and compare their ability for inflow forecasting. Journal of Mathematics and Statistics, 8(3), 330–338.
Van Houdt, G., Mosquera, C. & Nápoles, G. (2020) A review on the long short-term memory model. Artificial Intelligence Review, 53, 5929–5955.
Yoon, S., Shin, J., Park, N.-S., Kweon, M. & Kim, Y. (2024) A study on a hybrid water quality prediction model using sequence to sequence learning based LSTM and machine learning. Desalination and Water Treatment, 320, 100895. doi:10.1016/j.dwt.2024.100895.
Zhou, J., Wang, Y., Xiao, F., Wang, Y. & Sun, L. (2018) Water quality prediction method based on IGRA and LSTM. Water, 10(9), 1148.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).