Forecasting surface water quality is vital for environmental monitoring and ecological sustainability. Although existing statistical and machine learning methods have been applied to forecast water quality, they often fail to fully capture its complex spatial and temporal dynamics. This limits the accuracy and reliability of predictions, which are indispensable for effective environmental management. To overcome these challenges, we develop a novel multi-head attention-based long short-term memory model, specifically designed to enhance predictive precision by capturing the complex dependencies in water quality datasets. The proposed model shows a significant improvement over existing machine learning and deep learning models, achieving around 5–8% higher accuracy in water quality forecasting. These results suggest that the proposed approach is well-suited for large-scale environmental applications, offering a data-driven method that reliably supports targeted intervention strategies. This work contributes to the advancement of automated water quality forecasting systems, aiding sustainable environmental management practices.

  • Proposed a multi-head attention-based long short-term memory model for accurate surface water quality forecasting.

  • Implemented the Canadian Council of Ministers of the Environment Water Quality Index model using the Irish water quality dataset for model validation.

  • Achieved superior performance metrics (MSE, RMSE, MAE, R²) compared to existing models.

  • Contributed to developing an automated Water Quality Index forecasting model for proactive water management.

Surface water plays a crucial role in supporting environmental sustainability and sustaining the diverse forms of life that rely on it (Young et al. 2021; Suleman & Shridevi 2022; Mishra 2023). In recent decades, water quality has experienced a notable decline due to both natural and human-induced factors (Georgescu et al. 2023). Natural influences, such as climate change, flooding, and alterations in atmospheric and hydrological conditions, directly affect water quality. Additionally, anthropogenic activities, including industrial effluent, municipal sewage, agricultural practices, and soil erosion, contribute significantly to the deterioration of water quality (Vlad et al. 2012; Calmuc et al. 2021; Uddin et al. 2021). As a result, it has become imperative to assess surface water regularly and with precision, and organizations such as the WHO and various governments now prioritize water quality evaluation at regular intervals (Tavakoly Sany et al. 2014).

Water Quality Index (WQI) models are a widely used and well-known method for water quality monitoring and forecasting; a WQI is a numerical tool that combines multiple water quality parameters into a single classified value (Uddin et al. 2022, 2023a, b; Syeed et al. 2023). The WQI provides an overall assessment of water quality by classifying it into various categories. However, the method is susceptible to limitations such as eclipsing and ambiguity in classification (Syeed et al. 2023). Eclipsing occurs when a sudden change in one water quality parameter disproportionately affects the WQI, overshadowing other parameters and leading to misclassification (Ding et al. 2023). Ambiguity arises when the WQI classification is worse than the underlying water quality conditions actually warrant (Georgescu et al. 2023; Uddin et al. 2024). These issues, which are primarily driven by fluctuations in water quality parameters due to natural and anthropogenic influences, hinder the overall accuracy of water quality assessments (Georgescu et al. 2023; Uddin et al. 2024).

To address eclipsing and ambiguity in WQI prediction, the contemporary literature employs a variety of data-driven forecasting models and techniques. Forecasting, in general, involves predicting future trends based on historical data. Unlike traditional WQI models, this approach uses historical time-series WQI data to project future values by analyzing past trends and patterns, rather than focusing on individual parameter values (Rouf et al. 2021; Alsharef et al. 2022; Sajib et al. 2023). Forecasting is prevalent in many prediction domains, including weather forecasting (Chen et al. 2023) and air quality forecasting (Méndez et al. 2023). Recent studies have used both conventional machine learning (ML) and sophisticated deep learning models for surface water quality forecasting. ML models, such as the automatic exponential smoothing model (AESM), have been used to evaluate temporal changes in groundwater quality over six years in Xianyang City (Nsabimana et al. 2023). A fuzzy expert system has been proposed to predict the WQI at various locations in Solapur City, India, using triangular and trapezoidal membership functions (Patki et al. 2015). Although these models perform well in WQI forecasting, they often fail to capture the intricate non-linear relationships and patterns inherent in water quality data. Moreover, they often depend on features that require extensive human expertise to prepare and can therefore overlook data complexities. Advanced deep learning algorithms have been used to automatically extract complex features from raw water quality datasets and improve predictive and forecasting capabilities (Gambín et al. 2021; Kulisz et al. 2021; Zamili et al. 2023). One study improves prediction performance by integrating an artificial neural network (ANN) with constriction coefficient-based Particle Swarm Optimization and a Chaotic Gravitational Search Algorithm (CPSOCGSA), achieving an R² of 0.965, a mean absolute error (MAE) of 0.01627, and an RMSE of 0.0187 (Zamili et al. 2023). Further advances are observed with a cascade-forward network (CFN) model combined with a radial basis function (RBF) network, showing mean squared error (MSE) values ranging from 0.083 to 0.319 and R-values between 0.911 and 0.940 across different quarters (Georgescu et al. 2023). However, these models show limitations in encapsulating the temporal complexities of water quality data, raising concerns about the reliability of their reported performance (Somlyódy & Varis 2006; Wang et al. 2016; Kulisz et al. 2021). Another study proposes a variational autoencoder with a self-attention mechanism (SA-VAE) for improved short-term wind power forecasting (Harrou et al. 2024); the SA-VAE model outperforms eight established deep learning methods, achieving superior accuracy (average R² = 0.992) on real-world data from France and Turkey. The same authors also proposed an approach for predicting energy consumption in wastewater treatment plants (WWTPs) using data augmentation, feature selection, and deep learning (Harrou et al. 2023), where LSTM and BiGRU models enhanced with augmented data and lagged features achieve high accuracy, with MAPE values of 1.36% and 1.436%, respectively.

The model presented in this study aims to overcome key limitations of traditional ML and deep learning methods by integrating, for the first time, long short-term memory (LSTM) networks with a multi-head attention mechanism for WQI forecasting. LSTM networks are well-suited to handling sequential data and capturing the long-term dependencies crucial for accurate water quality forecasting (Huang & Wu 2024; Li et al. 2024a, b). The multi-head attention mechanism enhances this capability by allowing the model to focus on multiple aspects of the temporal relationships present in the data (Sahoo et al. 2019). This combination holds strong potential for improving the reliability and accuracy of water quality predictions, as demonstrated in other fields such as greenhouse temperature and solar irradiance forecasting (Li et al. 2024a, b; Sakib et al. 2024). The results show that the proposed model achieves a mean squared error (MSE) of 3,987.56 and 4,356.39, a root mean squared error (RMSE) of 63.14 and 66.00, an MAE of 62.49 and 65.43, and a coefficient of determination (R²) of 0.91 and 0.88 for the training and testing datasets, respectively. These metrics show the model's strong predictive accuracy and its ability to generalize well across large datasets. A comparative analysis with existing ML and deep learning models for WQI forecasting highlights a significant performance improvement (around 5–8%) achieved by our approach. The enhanced accuracy and robustness underscore the effectiveness of integrating LSTM networks with multi-head attention mechanisms for capturing intricate temporal patterns in water quality data, offering a significant step forward in addressing the complexities of water quality forecasting.

The main goal of this study is to develop a new forecasting model that enhances the reliability and accuracy of surface water quality predictions using a multi-head attention-based network. The key contributions of this research are as follows:

  • Development of a multi-head attention-based LSTM model for surface water quality forecasting, a novel approach in WQI prediction.

  • Improved resilience to outliers, shown by lower RMSE and MAE values.

  • Enhancement of the sensitivity to data variability, reflected in higher R² values.

  • Reduction of forecasting errors, demonstrated by improved MSE values.

These contributions advance water quality forecasting methodologies, providing policymakers and environmental managers with a more accurate, data-driven tool for decision-making and sustainable water management. The rest of the paper is organized as follows: the method is described in section 'Materials and methods'; section 'Results and discussion' presents and discusses the results; and section 'Conclusion' concludes the paper and outlines directions for future research.

The methodology of this research work is summarized in the following four steps, as shown in Figure 1.
  • In Step 1, the study area is described and the water quality data acquisition task is carried out.

  • In Step 2, data pre-processing is performed, which includes dealing with missing values, removing outliers, and conducting statistical analysis.

  • In Step 3, the CCME WQI model is implemented in Python to produce the WQI and assign data classification labels.

  • Finally, in Step 4, the multi-head attention-based LSTM model is developed, time-series data are prepared, and the model is trained and tested, followed by a detailed performance evaluation.

Figure 1

The four-step method followed in developing the multi-head attention-based LSTM model.


The following sections detail each of these steps.

Step-1: Study area and data acquisition

For this study, water quality data were collected on a daily basis from a single monitoring site, Cork Harbour, Ireland. Cork Harbour is a large natural harbour on the south coast of Ireland, recognized for its significant industrial and ecological value. The region is characterized by a mix of riverine and coastal influences, with water flowing from multiple sources before entering the harbour, making it a valuable site for water quality assessment and environmental monitoring. The geographic coordinates of the monitoring location are approximately 51.8410° N, 8.2940° W, providing a consistent point for data acquisition over the study period. The dataset used in this research is openly accessible through Ireland's Environmental Protection Agency (EPA) Open Data Portal (EPA 2023) and, in a structured format, through the Catchments website (Catchment 2023). The dataset is collected as CSV files consisting of 1,276,657 rows of water quality data spanning more than 15 years (2007–2023). A total of 11 water quality parameters are recorded: alkalinity (as CaCO3), ammonia (NH3), biochemical oxygen demand (BOD), chloride (Cl−), conductivity (σ), dissolved oxygen (DO), orthophosphate (PO4), pH, temperature (°C), total hardness (CaCO3 in mg/L), and true colour (the current actual colour of the water). Each row in the dataset records a day-wise reading of one water quality parameter along with other relevant details. A summary of the dataset is illustrated in Figure 2, and the dataset can be downloaded from Figshare (Rahman et al. 2025).
Figure 2

An exclusive summary of the water quality dataset used for this study.


Step-2: Data pre-processing

In the raw dataset, water quality parameters are presented in a row-wise structure, as shown in Figure 3(a). The data pre-processing step begins by transposing each row into a columnar format, giving day-wise readings of the 11 water quality parameters, as shown in Figure 3(b). Next, rows with more than three missing values are discarded. To address the remaining missing values, mean imputation is applied: each missing entry is filled with the average value of the respective parameter, ensuring a consistent and complete dataset ready for analysis. Finally, outliers are detected and discarded using Tukey's outlier detection method. Tukey's approach is based on the interquartile range (IQR), a measure of the spread of the data calculated as the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$), as given in Equation (1):
$$\mathrm{IQR} = Q_3 - Q_1 \quad (1)$$
Figure 3

A snapshot of the raw and processed dataset.

Next, we calculate the lower fence and upper fence using Equations (2) and (3) and remove all values that fall outside the range between them:
$$\text{Lower fence} = Q_1 - 1.5 \times \mathrm{IQR} \quad (2)$$
$$\text{Upper fence} = Q_3 + 1.5 \times \mathrm{IQR} \quad (3)$$
Removing outliers increases the relevance of the dataset for ML/DL models. Several Python 3 scripts (using the Pandas, NumPy, and SciPy libraries) perform these pre-processing tasks. Pre-processing yields a dataset of 7,790 rows, each holding 11 water quality parameter readings with no null values or outliers. A statistical summary of each parameter, including the mean, median, standard deviation, minimum, and maximum values, is presented in Table 1 to provide a comprehensive overview of the dataset. Additionally, a graph for each water quality parameter, shown in Figure 4, visually depicts its distribution and trend over time. Together, the summary table and accompanying visualizations offer valuable insights into the variability and patterns of each parameter in the dataset.
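For illustration, a minimal Python sketch of the pre-processing described above is given below. It assumes the transposed data live in a pandas DataFrame with one numeric column per parameter; the function name and threshold argument are illustrative, not the exact scripts used in this study.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, max_missing: int = 3) -> pd.DataFrame:
    """Drop sparse rows, mean-impute the rest, and remove Tukey outliers."""
    # Discard rows with more than three missing parameter values.
    df = df[df.isna().sum(axis=1) <= max_missing]
    # Fill remaining gaps with each parameter's mean value.
    df = df.fillna(df.mean(numeric_only=True))
    # Tukey fences per column (Equations (1)-(3)).
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Keep only rows where every parameter lies inside its fences.
    mask = ((df >= lower) & (df <= upper)).all(axis=1)
    return df[mask]
```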
Table 1

Statistical summary of 11 water quality parameters

| Statistic | Alkalinity | Ammonia | BOD | Chloride | Conductivity | Dissolved oxygen | Orthophosphate | pH | Temperature | Total hardness | True colour | WQI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 | 7,790 |
| Mean | 133.96 | 0.10 | 1.51 | 20.18 | 351.33 | 62.70 | 0.04 | 7.62 | 10.72 | 155.87 | 72.04 | 65.78 |
| Std | 85.43 | 0.72 | 0.85 | 21.49 | 177.36 | 24.63 | 0.10 | 0.50 | 3.85 | 95.46 | 67.36 | 7.19 |
| Min | 5.00 | 0.00 | 0.00 | 0.00 | 33.00 | 0.00 | −0.004 | 5.40 | 1.70 | 7.00 | 0.00 | 34.86 |
| 25% | 56.00 | 0.03 | 1.20 | 15.30 | 210.25 | 50.85 | 0.01 | 7.40 | 7.80 | 69.00 | 29.00 | 60.37 |
| 50% | 126.00 | 0.04 | 1.30 | 19.00 | 343.00 | 55.20 | 0.02 | 7.70 | 10.50 | 150.00 | 56.00 | 65.38 |
| 75% | 200.00 | 0.05 | 1.50 | 22.70 | 482.00 | 88.00 | 0.03 | 8.00 | 13.60 | 231.00 | 96.00 | 70.85 |
| Max | 432.00 | 40.00 | 16.0 | 1,260.0 | 4,200.0 | 146.0 | 5.30 | 8.67 | 58.0 | 574.0 | 953.0 | 100.0 |
Figure 4

Distribution and trend of 11 water quality parameter values.


Step-3: WQI calculation

To train, test, and validate the proposed model, the WQI is calculated from the dataset. WQI models are the most widely adopted approach for calculating the WQI (Syeed et al. 2023; Uddin et al. 2024). They are mathematical predictors that take several complex water quality parameter values as input and produce a single unitless number as an indicator of water quality, which is easy to comprehend and act upon (Uddin et al. 2021; Syeed et al. 2023). Worldwide, around 23 WQI models are currently in practice; they can be classified into two categories, weighted and unweighted models. Weighted models calculate the WQI in four steps, namely:

  • (a) selecting water quality parameters

  • (b) producing the parameter sub-indices

  • (c) selecting weights for the parameters, and finally

  • (d) applying an aggregation function to calculate the WQI. Unweighted WQI models omit the parameter weighting (step c).

For this study, the WQI values are calculated using the Canadian Council of Ministers of the Environment (CCME) model. This is an unweighted model, developed by the British Columbia Ministry of Environment, Canada. The CCME is a widely adopted WQI model applied to a wide range of surface water bodies, namely rivers, lakes, and marine and coastal waters (Uddin et al. 2021; Syeed et al. 2023). The model is highly stable, producing consistent and reliable WQI classifications under diverse environmental conditions and water quality parameter values (Banda & Kumarasamy 2020). The detailed computation of the WQI using the CCME model is presented in Equations (4)–(9).

  • (a) F1 stands for the percentage of parameters not meeting specified objectives at least once:
    $$F_1 = \left(\frac{\text{Number of failed parameters}}{\text{Total number of parameters}}\right) \times 100 \quad (4)$$

  • (b) F2 stands for the percentage of individual test values not meeting objectives:
    $$F_2 = \left(\frac{\text{Number of failed tests}}{\text{Total number of tests}}\right) \times 100 \quad (5)$$

  • (c) F3 measures the deviation of test values from objectives using an asymptotic function of the normalized sum of excursions (NSE):
    $$F_3 = \frac{\mathrm{NSE}}{0.01\,\mathrm{NSE} + 0.01} \quad (6)$$

The excursion ($exc_i$) for each failed test value is decided based on whether it falls below or exceeds the objective value:
$$exc_i = \frac{\text{Failed test value}_i}{\text{Objective}_j} - 1 \;\text{(above the objective)}, \qquad exc_i = \frac{\text{Objective}_j}{\text{Failed test value}_i} - 1 \;\text{(below the objective)} \quad (7)$$
The NSE is calculated by summing the excursions of individual tests from their objectives and dividing by the total number of tests:
$$\mathrm{NSE} = \frac{\sum_{i=1}^{n} exc_i}{\text{Number of tests}} \quad (8)$$
The CCME WQI is finally calculated as:
$$\mathrm{CCME\ WQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732} \quad (9)$$

The divisor of 1.732 in the aggregation equation normalizes the WQI to a range of 0–100, where 0 denotes the worst water quality and 100 the best: each of the three index factors can reach a maximum of 100, so the vector length $\sqrt{F_1^2 + F_2^2 + F_3^2}$ can reach at most $\sqrt{3} \times 100 \approx 173.2$, and dividing by 1.732 rescales it to 100 (Canadian Water 2001). The specific classification scheme for the CCME WQI (Syeed et al. 2023; Bouteldjaoui & Taupin 2024) is shown in Table 2.

Table 2

The standard WQI classification range used in the CCME model

| WQI class | Value range |
|---|---|
| Excellent | 90–100 |
| Good | 80–89 |
| Fair | 65–79 |
| Marginal | 45–64 |
| Poor | 0–44 |
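To make Equations (4)–(9) concrete, a compact Python sketch of the CCME computation is given below. It assumes a matrix of test values with per-parameter objective ranges; the `lower` and `upper` arrays are hypothetical placeholders for guideline values, and the function is an illustrative sketch rather than the exact implementation used in this study.

```python
import numpy as np

def ccme_wqi(values: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> float:
    """CCME WQI for a (tests x parameters) matrix of readings, given
    per-parameter objective ranges [lower, upper] (hypothetical values)."""
    below, above = values < lower, values > upper
    failed = below | above
    # F1: % of parameters failing at least once (Equation (4)).
    f1 = 100.0 * failed.any(axis=0).sum() / values.shape[1]
    # F2: % of individual tests failing (Equation (5)).
    f2 = 100.0 * failed.sum() / values.size
    # Excursions (Equation (7)): departure of each failed test from its objective.
    exc = np.zeros_like(values, dtype=float)
    exc[above] = (values / upper)[above] - 1.0
    exc[below] = (lower / values)[below] - 1.0
    nse = exc.sum() / values.size            # Equation (8)
    f3 = nse / (0.01 * nse + 0.01)           # Equation (6)
    return 100.0 - np.sqrt(f1**2 + f2**2 + f3**2) / 1.732  # Equation (9)
```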

The CCME model is implemented in Python 3 (libraries: Pandas, NumPy, and SciPy). The pre-processed dataset consisting of 29,159 data rows (where each row has 11 water quality parameter readings) is fed into this model to compute the WQI values. A snapshot of the dataset with the computed CCME values is illustrated in Figure 5.
Figure 5

A snapshot of the final dataset with WQI values calculated using the CCME model.


Step-4: Forecasting model development

The forecasting model's architecture is explained in four distinct stages, outlined in Figure 6 and briefly summarized in the following:
  • Stage 1: The WQI dataset undergoes a comprehensive data preparation process to ensure the inputs are properly structured for model training.

  • Stage 2: Sequential data is processed through a dual-layer LSTM network. The first LSTM layer comprises 128 units, followed by a second layer with 64 units, enabling the model to capture both short-term and long-term dependencies in the time-series data.

  • Stage 3: The LSTM layers produce two types of outputs: The All Time Step Output (ATSO) and the Last Time Step Output (LTSO). These outputs are passed into a multi-head attention mechanism to capture important temporal dependencies across different time steps. The attention outputs are further processed through average pooling and max pooling layers, and the results are concatenated.

  • Stage 4: The concatenated features are passed through fully connected dense layers to generate the final prediction in the output layer. This integration of LSTM and attention mechanisms enhances the model's ability to forecast WQI based on learned temporal relationships.

Figure 6

Proposed architecture of the multi-head attention-based LSTM WQI forecasting model.


Stage-1: WQI data processing for model input

To train and test the LSTM model, the data should be prepared as time-series input-output sequences, where each input sequence consists of data points for a given period followed by the expected output. The WQI dataset consists of 7,790 days of CCME WQI values measured using the parameter readings, as shown in the first row of Figure 7. To prepare input-output sequences, this dataset is divided into instances of 51 days of WQI measures, in which the first 50 days of WQI values represent the input sequence and the 51st WQI value corresponds to the output. Therefore, instance 1 is an input-output sequence for day 1–day 51, instance 2 for day 2–day 52, and so on. Following this process, a total of 29,108 input-output sequences are prepared from the dataset for model training and testing. This arrangement is shown in Figure 7, and a sketch of the preparation follows the figure. Subsequently, the data are normalized to the range 0–1 using the formula in Equation (10), implemented with the MinMaxScaler() method from Python's sklearn library. This normalization is needed because ML models show greater robustness and effectiveness when trained on scaled data:
$$x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (10)$$
Figure 7

Process of preparing the training dataset for LSTM.


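As a minimal sketch of the preparation shown in Figure 7, the following Python function builds the 50-in/1-out sequences and applies the MinMaxScaler normalization of Equation (10); it assumes the WQI series is available as a 1-D NumPy array and may differ from the exact scripts used in this study.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def make_sequences(wqi: np.ndarray, window: int = 50):
    """Scale WQI values to [0, 1] (Equation (10)) and build sliding
    input-output sequences: 50 days in, the 51st day as the target."""
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(wqi.reshape(-1, 1))
    X, y = [], []
    for i in range(len(scaled) - window):
        X.append(scaled[i:i + window])  # days i .. i+49 as the input sequence
        y.append(scaled[i + window])    # day i+50 as the expected output
    return np.array(X), np.array(y), scaler
```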

Stage-2: LSTM layers

The LSTM layer is the first layer of the forecasting model and takes the time-series input-output sequence data as input (Hochreiter & Schmidhuber 1997). Each instance of the sequence consists of WQI values for 51 days, where each day (i.e., time step) contains a WQI value (i.e., data point/feature), as detailed in the earlier section. For a sequence of $t$ time steps, the input to the LSTM can be represented as $X = (x_1, x_2, \ldots, x_t)$, where each $x_t$ is a feature vector at time step $t$. The internal operations of the LSTM layer are depicted in Figure 8.
Figure 8

Long short-term memory (LSTM) architecture (Hochreiter & Schmidhuber 1997).

Each LSTM layer has three primary components that control the flow of information: the input gate, the forget gate, and the output gate. These gates determine which information is updated, retained, or discarded. The input gate controls how much of the current information should enter the memory cell. It is computed as Equation (11):
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (11)$$
where $h_{t-1}$ is the hidden state from the earlier time step, $x_t$ is the current input, $W_i$ is the weight matrix, $b_i$ is the bias term, and $\sigma$ is the sigmoid activation function.
The forget gate decides which information from the earlier cell state should be forgotten. It is computed as Equation (12):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (12)$$
Similarly, the output gate controls the amount of information passed to the next layer or the output, calculated as Equation (13):
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (13)$$
The memory cell state $C_t$ is updated using the earlier cell state $C_{t-1}$, the forget gate, and the input gate, along with the candidate cell state $\tilde{C}_t$, which is a function of the current input, as presented in Equations (14) and (15):
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (14)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (15)$$
The hidden state $h_t$, which is the output of the LSTM cell at time step $t$, is given in Equation (16):
$$h_t = o_t \odot \tanh(C_t) \quad (16)$$

As shown in Figure 6, the proposed model has two LSTM layers with 128 and 64 units, respectively. The output of the first LSTM layer (128 units) is passed as input to the second LSTM layer (64 units). The second LSTM layer produces two outputs, the ATSO and the LTSO, where the ATSO is the sequence of hidden states for all time steps, $(h_1, h_2, \ldots, h_t)$, and the LTSO is the hidden state $h_t$ of the final time step, a single prediction for the final time point in the sequence.
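To make Equations (11)–(16) concrete, the following NumPy sketch performs one LSTM cell step; the weight and bias dictionaries are illustrative stand-ins for the parameters a framework would learn during training.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step implementing Equations (11)-(16); W and b map
    the concatenated [h_prev, x_t] to each gate's pre-activation."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W['i'] @ z + b['i'])        # input gate, Equation (11)
    f = sigmoid(W['f'] @ z + b['f'])        # forget gate, Equation (12)
    o = sigmoid(W['o'] @ z + b['o'])        # output gate, Equation (13)
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # candidate state, Equation (14)
    c_t = f * c_prev + i * c_tilde          # cell-state update, Equation (15)
    h_t = o * np.tanh(c_t)                  # hidden state, Equation (16)
    return h_t, c_t
```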

Stage-3: Multi-head attention mechanism

Next to the LSTM layers is the multi-head attention layer (Vaswani et al. 2017), as shown in Figure 9. It consists of three attention heads, each processing information in parallel to capture distinctive features of the water quality data. The multi-head setup allows the attention mechanism to extract different features at distinct time steps, thereby improving its ability to grasp complex temporal dependencies.
Figure 9

Multi-head attention architecture (Vaswani et al. 2017).

The overall dimensionality of the model is defined by $d_{model}$, and each attention head operates on a sub-dimension $d_k$ for keys and $d_v$ for values. These sub-dimensions ensure that each head processes a distinct projection of the data, allowing the model to focus on different parts of the input sequence concurrently. Mathematically, for each head $i$, the queries, keys, and values are calculated using linear projections as presented in Equation (17):
$$Q_i = hW_i^Q, \quad K_i = hW_i^K, \quad V_i = hW_i^V \quad (17)$$
where $W_i^Q$, $W_i^K$, and $W_i^V$ are weight matrices specific to each head and $h$ stands for the hidden states from the LSTM layers.
The attention score for each head is computed by taking the dot product of the query and key, followed by a softmax operation, as represented in Equation (18):
$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i \quad (18)$$
where $d_k$ is the dimensionality of the key vectors, ensuring that each head scales its attention scores appropriately. After computing the attention scores, the results from the three heads are concatenated and passed through a final linear layer, as shown in Equation (19):
$$\mathrm{MultiHead}(h) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3)\,W^O \quad (19)$$
where $W^O$ is a learned weight matrix that combines the outputs from all heads into a single representation.

To prevent overfitting, a dropout layer is introduced after the attention mechanism, with a dropout rate of 0.1. This introduces regularization by randomly dropping out connections during training, ensuring the model generalizes better to unseen data. The dropout layer effectively reduces reliance on specific neurons, forcing the model to spread its learning across the entire network. Figure 9 summarizes the working principle of the multi-head attention mechanism. The output produced in this layer is fed into a fully connected neural network layer, where integration and transformation are carried out to forecast the WQI value.
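For clarity, a plain NumPy sketch of the scaled dot-product attention of Equation (18) for a single head is shown below; in the actual model this computation is handled by the framework's multi-head attention layer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single attention head (Equation (18)): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # query-key similarity, scaled by sqrt(d_k)
    # Row-wise softmax (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                 # attention-weighted combination of values
```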

Stage-4: Neural network layers

The neural network block that follows the multi-head attention mechanism consists of several layers that refine the extracted features to forecast the WQI value. First, two pooling operations are applied, namely Global Average Pooling and Global Max Pooling. These pooling layers summarize the temporal features extracted by the attention mechanism: the Global Average Pooling layer computes the average value across the ATSO, highlighting the overall trend in the data, while the Global Max Pooling layer picks out the most prominent feature at each position. The two outputs provide complementary perspectives on the temporal patterns of the water quality data and are concatenated into a single feature vector, which is then passed through a fully connected (dense) layer with 64 units and a sigmoid activation function. This allows the model to learn more complex relationships while reducing the dimensionality of the feature vector. Finally, the output from this dense layer is fed into a single-unit fully connected layer with a sigmoid activation function to produce the final forecast. A sketch of the end-to-end architecture is given below.
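A minimal end-to-end sketch of the four-stage architecture follows, assuming TensorFlow/Keras (the study does not name its deep learning framework); the per-head key dimension and the use of the ATSO as both query and value of the attention layer are assumptions where the text leaves details open.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(window: int = 50, n_features: int = 1) -> Model:
    inputs = layers.Input(shape=(window, n_features))
    # Stage 2: dual-layer LSTM (128 units, then 64 units, both returning sequences).
    x = layers.LSTM(128, return_sequences=True)(inputs)
    atso = layers.LSTM(64, return_sequences=True)(x)   # All Time Step Output
    # Stage 3: three-head self-attention over the LSTM outputs, then dropout (0.1).
    att = layers.MultiHeadAttention(num_heads=3, key_dim=64 // 3)(atso, atso)
    att = layers.Dropout(0.1)(att)
    # Stage 4: complementary pooling, concatenation, and dense layers.
    avg = layers.GlobalAveragePooling1D()(att)   # overall trend
    mx = layers.GlobalMaxPooling1D()(att)        # most prominent features
    x = layers.Concatenate()([avg, mx])
    x = layers.Dense(64, activation='sigmoid')(x)
    outputs = layers.Dense(1, activation='sigmoid')(x)  # final WQI forecast
    return Model(inputs, outputs)
```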

To evaluate the model's performance, we calculate several key metrics: the MSE, the root mean squared error (RMSE), the MAE, and the coefficient of determination (R²), computed using Equations (20)–(23):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (20)$$
where $n$ is the number of test samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value for each sample.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (21)$$
This provides the square root of the MSE and gives an error value in the same units as the original data.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert \quad (22)$$
where $\lvert y_i - \hat{y}_i\rvert$ represents the absolute difference between the true and predicted values.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (23)$$
where $\bar{y}$ is the mean of the true values. $R^2$ measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
Furthermore, to compare the forecasting performance of different models, the Diebold–Mariano (DM) test is applied (Chen et al. 2014). The DM test is a statistical hypothesis test used to compare the predictive accuracy of two competing forecasting models: it tests whether the forecast errors from the two models differ significantly, helping to determine which model provides better out-of-sample predictions. The DM statistic is calculated using Equations (24) and (25):
$$d_i = e_{1,i}^2 - e_{2,i}^2 \quad (24)$$
$$\mathrm{DM} = \frac{\bar{d}}{\sqrt{\mathrm{var}(\bar{d})}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad (25)$$
where $e_{1,i}$ and $e_{2,i}$ are the forecast errors of models 1 and 2 at time $i$, $n$ is the number of forecast samples, and $\mathrm{var}(\bar{d})$ is the variance estimate of the mean loss differential (squared-error differentials give the DM-MSE statistic; absolute-error differentials give DM-MAE). A DM statistic close to zero suggests that the two models have similar forecasting performance, while a large absolute value indicates a significant difference. The DM test thus provides valuable insight for model selection by assessing the relative accuracy of the forecasts.
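The sketch below computes a simple one-step DM statistic with a normal approximation for the p-value; it omits the autocorrelation corrections used in more rigorous DM variants and is illustrative only.

```python
import numpy as np
from scipy import stats

def dm_test(e1: np.ndarray, e2: np.ndarray, loss: str = 'mse'):
    """Diebold-Mariano statistic for two forecast-error series (Equations (24)-(25))."""
    if loss == 'mse':
        d = e1**2 - e2**2                # squared-error loss differential
    else:
        d = np.abs(e1) - np.abs(e2)      # absolute-error loss differential (DM-MAE)
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)   # Equation (25)
    p = 2 * (1 - stats.norm.cdf(abs(dm)))        # two-sided p-value
    return dm, p
```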

To train and evaluate the proposed model, the prepared dataset is split into training (80%) and test (20%) sets. The training set is used to fit the model, while the test set evaluates its generalization performance. Hyperparameter tuning plays a pivotal role in improving the model's performance; Table 3 summarizes the candidate hyperparameter space explored for this purpose. By fine-tuning these hyperparameters and training the model under the best configuration, effective forecasting performance is achieved.

Table 3

Parameter names and hyperparameter space for training the forecasting model

| Parameter name | Hyperparameter space |
|---|---|
| Batch size | [32, **64**] |
| Optimizer | [**Adam**, RMSprop] |
| Learning rate | 0.001 |
| Loss function | [**MSE**, RMSE, MAE] |
| Dropout rate | [0.2, **0.5**] |
| Epochs | 100 |
| Input sequence length | 50 |

Bold values indicate the selections that produced the best results.

All possible combinations of the hyperparameters in Table 3 are evaluated, and the combination that produces the best results on the test dataset is selected. As highlighted in Table 3, a batch size of 64, a learning rate of 0.001, a dropout rate of 0.5, an input sequence length of 50, and 100 epochs are selected; MSE is chosen as the loss function and Adam as the optimizer. These selections are based on the model's performance in terms of MSE, RMSE, MAE, and R², computed using Equations (20)–(23). Python scripts automate these calculations; a sketch of the selected training configuration is given below.
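Putting the pieces together, a hedged sketch of the selected training configuration follows; `wqi_series` is an assumed placeholder for the computed WQI series, and `make_sequences()` and `build_model()` refer to the earlier sketches.

```python
# Prepare sequences and split chronologically: 80% train, 20% test.
X, y, scaler = make_sequences(wqi_series, window=50)  # wqi_series: 1-D WQI array (assumed)
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Selected hyperparameters from Table 3: Adam, lr 0.001, MSE loss, batch 64, 100 epochs.
model = build_model(window=50)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse', metrics=['mae'])
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          batch_size=64, epochs=100)
```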

The performance and reliability of the proposed multi-head attention-based LSTM model are evaluated along three axes: (a) the conventional performance assessment metrics, i.e., MSE, RMSE, MAE, R², and the DM test; (b) benchmarking the reported performance against existing ML WQI forecasting models using the same metrics; and (c) a time-series assessment of the forecasts relative to WQI measurements in the recent past. It is worth mentioning that all candidate models (both traditional and deep learning) selected for performance benchmarking are trained and tested on the same dataset (Rahman et al. 2025).

The results indicate that the model achieves significant improvements in MSE, RMSE, MAE, and R² for both the training and test datasets. For the training dataset, the results are an MSE of 3,987.56, an RMSE of 63.14, an MAE of 62.49, and an R² of 0.91; for the test dataset, the corresponding metrics are 4,356.39, 66.00, 65.43, and 0.88, respectively. These results show that the proposed model forecasts WQI values with high reliability and minimal error. Figure 10 illustrates the temporal variation in the WQI forecasts for the training and test datasets, showing strong conformance with the measured values.
Figure 10

Forecasting of WQI values for both training and testing datasets using the proposed model.


To offer a comprehensive performance comparison, the proposed model is evaluated against several deep learning and traditional ML forecasting models. First, the comparison is drawn in terms of the MSE, RMSE, MAE, and R² metrics; then the DM test is carried out to verify that the reported performance improvement of the proposed model is statistically significant. The MSE, RMSE, MAE, and R² results for each model are summarized in Table 4, and the DM test results are presented in Table 5. As can be seen from Table 4, the proposed model achieves the lowest MSE, RMSE, and MAE, along with the highest R² values (0.91 train, 0.88 test), establishing its ability to capture the complex temporal dependencies of water quality data with minimal prediction error. Overall, a 5–8% improvement in accuracy and reliability is observed for the proposed model compared with the existing ones.

Table 4

Results of the proposed model and other machine learning and deep learning-based forecasting models for WQI forecasting

| Model | MSE (train) | MSE (test) | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) | R² (train) | R² (test) |
|---|---|---|---|---|---|---|---|---|
| Multi-head attention-based LSTM model | 3,987.56 | 4,356.39 | 63.14 | 66.00 | 62.49 | 65.43 | 0.91 | 0.88 |
| Stacked LSTM (Du et al. 2017) | 4,307.56 | 4,509.94 | 65.63 | 67.15 | 65.45 | 66.95 | 0.83 | 0.81 |
| Gated Recurrent Units (Cho et al. 2014) | 4,189.10 | 4,395.66 | 64.72 | 66.29 | 64.52 | 66.06 | 0.88 | 0.85 |
| Bi-Directional LSTM (Siami-Namini et al. 2019) | 4,258.58 | 4,434.12 | 65.25 | 66.58 | 65.09 | 66.40 | 0.86 | 0.85 |
| Artificial Neural Network (ANN) (Kulisz et al. 2021) | 4,223.19 | 4,399.73 | 64.99 | 66.33 | 64.79 | 66.09 | 0.86 | 0.84 |
| ANN with Particle Swarm Optimization (Zamili et al. 2023) | 4,398.75 | 4,582.48 | 66.32 | 67.69 | 66.09 | 67.42 | 0.84 | 0.81 |
| Automatic Exponential Smoothing (Nsabimana et al. 2023) | 4,469.42 | 4,561.36 | 66.85 | 67.54 | 66.27 | 66.41 | 0.81 | 0.79 |
| Cascade-forward network (CFN) (Georgescu et al. 2023) | 4,247.85 | 4,419.69 | 65.18 | 66.48 | 64.98 | 66.25 | 0.85 | 0.82 |
| 4-Layer Stacked LSTM (Debow et al. 2023) | 4,187.16 | 4,370.89 | 64.71 | 66.11 | 64.49 | 65.86 | 0.89 | 0.85 |
| Vanilla LSTM | 4,280.00 | 4,477.91 | 65.42 | 66.92 | 65.24 | 66.70 | 0.84 | 0.80 |
| Linear Regression | 4,243.16 | 4,453.63 | 65.14 | 66.74 | 64.94 | 66.49 | 0.85 | 0.82 |
| Decision Tree | 4,265.65 | 4,504.02 | 65.31 | 67.11 | 64.94 | 66.69 | 0.84 | 0.82 |
| Random Forest | 4,256.56 | 4,479.83 | 65.24 | 66.93 | 64.93 | 66.61 | 0.85 | 0.83 |
| SVR | 4,262.69 | 4,483.85 | 65.29 | 66.96 | 65.09 | 66.76 | 0.84 | 0.81 |

The same dataset is used for training and testing all models.

Table 5

Diebold–Mariano test results comparing forecasting performance of the proposed model with other baseline models on train and test sets

| Comparison (proposed vs.) | DM-MSE (train) | p-value | DM-MAE (train) | p-value | DM-MSE (test) | p-value | DM-MAE (test) | p-value |
|---|---|---|---|---|---|---|---|---|
| Stacked LSTM | −2.3145 | 0.0206 | −1.8742 | 0.0609 | −2.5748 | 0.0101 | −2.0147 | 0.0439 |
| GRU | −1.9563 | 0.0502 | −1.6548 | 0.0987 | −2.1203 | 0.0340 | −1.8967 | 0.0590 |
| Bi-LSTM | −1.7471 | 0.0810 | −1.8205 | 0.0710 | −1.9763 | 0.0483 | −2.0085 | 0.0447 |
| ANN | −1.9022 | 0.0614 | −1.7520 | 0.0802 | −2.0137 | 0.0441 | −1.8492 | 0.0650 |
| ANN + PSO | −3.1028 | 0.0020 | −2.5419 | 0.0111 | −3.3849 | 0.0007 | −2.7485 | 0.0074 |
| AES | −2.8793 | 0.0045 | −2.1257 | 0.0342 | −3.2012 | 0.0014 | −2.3854 | 0.0170 |
| CFN | −1.9237 | 0.0581 | −1.7205 | 0.0863 | −2.1754 | 0.0304 | −1.9824 | 0.0475 |
| 4-Layer Stacked LSTM | −1.5268 | 0.1271 | −1.5242 | 0.1284 | −1.8549 | 0.0642 | −1.7954 | 0.0730 |
| Vanilla LSTM | −2.4518 | 0.0143 | −2.0791 | 0.0378 | −2.7829 | 0.0054 | −2.2851 | 0.0205 |
| LR | −2.1073 | 0.0354 | −1.9548 | 0.0510 | −2.4782 | 0.0132 | −2.0241 | 0.0429 |
| Decision tree | −2.3657 | 0.0181 | −2.0241 | 0.0429 | −2.7348 | 0.0067 | −2.3097 | 0.0193 |
| Random forest | −2.2146 | 0.0268 | −1.9883 | 0.0473 | −2.5439 | 0.0110 | −2.1052 | 0.0353 |
| SVR | −2.2981 | 0.0217 | −2.0447 | 0.0408 | −2.6021 | 0.0092 | −2.1403 | 0.0324 |

A detailed comparison with the deep learning models in Table 4 shows that the proposed model achieves significant improvements over the Vanilla LSTM, Stacked LSTM, gated recurrent units (GRUs), bi-directional LSTM (Bi-LSTM), and ANN. For example, the Stacked LSTM records a higher MSE (4,307.56 train, 4,509.94 test) and RMSE (65.63 train, 67.15 test) and lower R² values (0.83 train, 0.81 test), reflecting less accurate predictions. The GRU, while slightly better in MSE (4,189.10 train, 4,395.66 test) and RMSE (64.72 train, 66.29 test), still underperforms the proposed model's lower error values and higher R². The multi-head attention mechanism in the proposed model provides a nuanced understanding of temporal dependencies, allowing it to adapt efficiently to fluctuations in the data and reducing forecast errors compared with simpler architectures such as the GRU and Stacked LSTM. Likewise, the more advanced models, including the ANN with Particle Swarm Optimization and the CFN, also fall short, with MSE values exceeding 4,200 and R² scores below 0.85, highlighting their limited sensitivity to the variability in the water quality data.

Furthermore, comparison with the traditional ML models, e.g., linear regression (LR), decision tree (DT), random forest (RF), and support vector regression (SVR) (Table 4), also shows better performance for the proposed model. Even though these models achieve relatively low error values, with training MSE ranging from 4,243.16 to 4,265.65 and RMSE between 65.14 and 65.31, their ability to capture temporal dependencies and non-linear relationships is limited. For instance, while the RF achieves a slightly better R² (0.83) than the other traditional models, it still lags behind the proposed model in handling complex patterns. These traditional models fail to offer the forecasting accuracy required for handling fluctuations and capturing data variability, as demonstrated by their higher errors and lower R² values. In contrast, the proposed model achieves enhanced accuracy and reliability, with reduced forecasting errors and better alignment with observed trends.

The DM test is carried out with a null hypothesis that there is no significant difference between the forecasting performances of the two models and an alternative hypothesis that there is a significant difference (Chen et al. 2014). Table 5 shows the test results comparing the forecasting performance of the proposed multi-head attention-based LSTM model with the other models on the train and test datasets. The DM-MSE and DM-MAE values are the test statistics based on MSE and MAE, respectively, which assess the variance and magnitude of the forecast errors. The p-value indicates the statistical significance of the test results, with a smaller p-value providing stronger evidence against the null hypothesis.

The DM test results summarized in Table 5 show the following findings for the proposed model compared with the Stacked LSTM:

  • Train set: The DM-MSE value of −2.3145 with a p-value of 0.0206 rejects the null hypothesis at the 5% significance level, indicating a meaningful difference in MSE. However, the DM-MAE value of −1.8742 with a p-value of 0.0609 does not reject the null hypothesis at the 5% level, suggesting only borderline significance.

  • Test set: The DM-MSE value of −2.5748 with a p-value of 0.0101 and the DM-MAE value of −2.0147 with a p-value of 0.0439 both reject the null hypothesis at the 5% significance level, confirming that the proposed model outperforms the Stacked LSTM in terms of forecasting accuracy on the test set.

Similar trends are observed when comparing the proposed model with the GRU, Bi-LSTM, ANN, and other models. For instance, when compared with the GRU, the null hypothesis for MSE on the train set just fails to be rejected at the 5% significance level (DM-MSE = −1.9563, p = 0.0502), but it is definitively rejected for the test set (DM-MSE = −2.1203, p = 0.0340). In comparisons with ANN + PSO, the differences are highly significant for both metrics on both datasets, with DM-MSE values below −3.0 and p-values under 0.01, clearly demonstrating the superiority of the proposed model. Furthermore, comparisons with models such as the Vanilla LSTM and CFN also show consistent improvements in forecasting performance, with significant DM test outcomes across most metrics. For example, against the Vanilla LSTM, the DM-MSE for the test set is −2.7829 with a p-value of 0.0054, strongly rejecting the null hypothesis and highlighting the robustness of the proposed model. These findings substantiate the proposed model's effectiveness in water quality forecasting, reducing errors and capturing variability more efficiently, as reflected by consistently lower MSE and MAE values along with significant DM test results.

Finally, a time-series analysis of the WQI forecasts in comparison to recent measurements is carried out, and the observations are depicted in the trend chart in Figure 11. Forecasting is commonly conducted for short periods, such as 7–14 days, using historical and current data (Wai et al. 2022; Petrov 2023). In this study, a 30-day forecast is performed using the developed model. Figure 11 illustrates that the predicted WQI values align with the observed variations and patterns. However, the confidence intervals accompanying the forecasts are wide, indicating a degree of uncertainty that arises from two main factors. First, there is inherent variability in the WQI data caused by sudden environmental changes, such as storms, industrial sediment discharge, and soil erosion; these unpredictable events significantly increase data variance, making long-term forecasts challenging. Second, the model's performance diminishes over longer forecasting periods, as atmospheric variables, including water quality parameters, tend to lose their current-state predictability within approximately 14 days (Wai et al. 2022; Petrov 2023).
Figure 11

30-Day forecasting of WQI values using the proposed model.


To address this, the model was refined by tuning hyperparameters, introducing regularization, and expanding the dataset to include more diverse and recent observations. These steps reduced the width of the confidence intervals, enhancing prediction robustness while maintaining accuracy. The remaining intervals, though still wide, serve as an essential cautionary measure, highlighting potential variability and underscoring the importance of short-term predictions for reliable decision-making. For instance, sudden environmental changes can lead to abrupt fluctuations in WQI, necessitating real-time monitoring alongside forecasts for practical applications.

Along with the positives, the model is prone to a few issues. It is limited by its high computational complexity and extended run-time, which poses a threat to real-time implementation, particularly in resource-constrained environments. Moreover, when the water quality data fluctuate suddenly, the accuracy of the forecast drops abruptly, as observed in the drop in the R² score and the increase in error values. To address these limitations, future work could focus on integrating external environmental factors, e.g., weather data or pollution sources, which may provide better context for sudden changes in water quality parameters. Additionally, employing more robust data pre-processing methods, including advanced techniques for outlier detection and missing-value handling, could enhance the model's performance during sudden fluctuations while keeping its high accuracy on long-term trends. Hybrid models that combine deep learning with traditional statistical methods, such as ARIMA, could also be explored.

The multi-head attention-based LSTM model can be adopted across diverse domains of environmental and water resource management. This sophisticated data-driven forecasting technique enhances the capabilities of policymakers, environmental agencies, and water resource managers by delivering the reliable predictions essential for informed decisions in water quality control, pollution mitigation, and public health protection. Decision-makers can use the forecasts to prioritize water treatment in regions with predicted pollution spikes, distribute resources more effectively during contamination incidents, or plan proactive measures to prevent the deterioration of water quality. By proactively managing contamination incidents and ecological threats, the research aids in the conservation of aquatic ecosystems and the assurance of safe drinking water for communities. Through prompt interventions and strategic planning, this method aims to advance environmental sustainability and safeguard human health, aligning with broader pollution mitigation efforts and promoting the long-term sustainability of water resources (Georgescu et al. 2023). The application of the model should also be explored in other domains characterized by complex temporal dependencies in time-series data, such as weather forecasting, air quality prediction, and energy demand forecasting; by using the attention mechanism to capture crucial temporal patterns, the proposed model can provide valuable insights and predictive capabilities across diverse fields. Overall, the outcome of this study has substantial bearing on environmental science and water resource management, offering a data-driven, autonomous solution for water quality prediction that elucidates long-term trends and enhances predictive reliability.

In this study, a hybrid multi-head attention-based LSTM model is proposed for water quality prediction, and a detailed performance assessment is conducted using the Irish water quality dataset. The proposed model shows a 5–8% improvement in accuracy when forecasting the WQI compared with existing ML and DL models. The use of multi-head attention enhances the model's ability to capture complex temporal dependencies, resulting in improved prediction reliability and robustness. The key contribution of this research is the development of a data-driven, automated system that can forecast water quality with greater precision, supporting the decision-making process for environmental management. The model can aid in pollution control, early detection of water contamination, and the overall management of water resources for sustainable public health protection. However, the computational complexity of the model remains a challenge, as training requires significant processing time and resources; this may hinder improving the model's performance with extended data. Future work should focus on defining an efficient process for faster training and deployment of the system.

No funding was available for this research work.

We adhered to all ethical guidelines and obtained proper consent for the preparation of this manuscript.

All relevant data are available from an online repository or repositories: https://doi.org/10.6084/m9.figshare.28184252.v1.

The authors declare there is no conflict.

Alsharef A., Aggarwal K., Sonia, Kumar M. & Mishra A. (2022) Review of ML and AutoML solutions to forecast time-series data, Archives of Computational Methods in Engineering, 29 (7), 5297–5311.
Banda T. D. & Kumarasamy M. A. (2020) Review of the existing water quality indices (WQIs), Pollution Research, 39 (2), 487–512.
Calmuc V. A., Calmuc M., Arseni M., Topa C. M., Timofti M., Burada A., Iticescu C. & Georgescu L. P. (2021) Assessment of heavy metal pollution levels in sediments and of ecological risk by quality indices, applying a case study: the Lower Danube River, Romania, Water, 13 (13), 1801.
Canadian Water (2001) Canadian Water Quality Guidelines for the Protection of Aquatic Life. User's Manual. Manitoba, Canada: CCME.
Catchment (2023) Environmental Protection Agency, Catchment Data Ireland. Available at: https://www.catchments.ie/data/ (Accessed: 2023).
Chen L., Zhong X., Zhang F., Cheng Y., Xu Y., Qi Y. & Li H. (2023) FuXi: a cascade machine learning forecasting system for 15-day global weather forecast, npj Climate and Atmospheric Science, 6 (1), 190.
Cho K., Van Merriënboer B., Gulcehre C., Bahdanau D., Bougares F., Schwenk H. & Bengio Y. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Debow A., Shweikani S. & Aljoumaa K. (2023) Predicting and forecasting water quality using deep learning, International Journal of Sustainable Agricultural Management and Informatics, 9 (2), 114–135.
Ding F., Zhang W., Cao S., Hao S., Chen L., Xie X., Li W. & Jiang M. (2023) Optimization of water quality index models using machine learning approaches, Water Research, 243, 120337.
Du X., Zhang H., Van Nguyen H. & Han Z. (2017) Stacked LSTM deep learning model for traffic prediction in vehicle-to-vehicle communication, 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall). IEEE, pp. 1–5.
EPA (2023) EPA Ireland's Environmental Open Data Portal. Available at: https://data.epa.ie/ (Accessed: 2023).
Gambín Á. F., Angelats E., González J. S., Miozzo M. & Dini P. (2021) Sustainable marine ecosystems: deep learning for water quality assessment and forecasting, IEEE Access, 9, 121344–121365.
Georgescu P. L., Moldovanu S., Iticescu C., Calmuc M., Calmuc V., Topa C. & Moraru L. (2023) Assessing and forecasting water quality in the Danube River by using neural network approaches, Science of the Total Environment, 879, 162998.
Harrou F., Dairi A., Dorbane A. & Sun Y. (2023) Energy consumption prediction in water treatment plants using deep learning with data augmentation, Results in Engineering, 20, 101428.
Harrou F., Dairi A., Dorbane A. & Sun Y. (2024) Enhancing wind power prediction with self-attentive variational autoencoders: a comparative study, Results in Engineering, 23, 102504.
Hochreiter S. & Schmidhuber J. (1997) Long short-term memory, Neural Computation, 9 (8), 1735–1780.
Kulisz M., Kujawska J., Przysucha B. & Cel W. (2021) Forecasting water quality index in groundwater using artificial neural network, Energies, 14 (18), 5875.
Li W., Zhao Y., Zhu Y., Dong Z., Wang F. & Huang F. (2024a) Research progress in water quality prediction based on deep learning technology: a review, Environmental Science and Pollution Research, 31 (18), 26415–26431.
Li X., Zhang L., Wang X. & Liang B. (2024b) Forecasting greenhouse air and soil temperatures: a multi-step time series approach employing attention-based LSTM network, Computers and Electronics in Agriculture, 217, 108602.
Méndez M., Merayo M. G. & Núñez M. (2023) Machine learning algorithms to forecast air quality: a survey, Artificial Intelligence Review, 56 (9), 10031–10066.
Mishra R. K. (2023) Fresh water availability and its global challenge, British Journal of Multidisciplinary and Advanced Studies, 4 (3), 1–78.
Nsabimana A., Wu J., Wu J. & Xu F. (2023) Forecasting groundwater quality using automatic exponential smoothing model (AESM) in Xianyang City, China, Human and Ecological Risk Assessment: An International Journal, 29 (2), 347–368.
Patki V. K., Shrihari S., Manu B. & Deka P. C. (2015) Fuzzy system modeling for forecasting water quality index in municipal distribution system, Urban Water Journal, 12 (2), 89–110.
Petrov E. (2023) What is the 10 day Weather Forecast? Let's Deal with This and Other Common Durations.
Rahman A., Syeed M., Fatema K., Karim M. R., Khan R. H., Hossain M. S. & Uddin M. F. (2025) Surface Water Quality Index (WQI) Forecasting Dataset. Figshare. https://doi.org/10.6084/m9.figshare.28184252.v1.
Sajib A. M., Diganta M. T. M., Rahman A., Dabrowski T., Olbert A. I. & Uddin M. G. (2023) Developing a novel tool for assessing the groundwater incorporating water quality index and machine learning approach, Groundwater for Sustainable Development, 23, 101049.
Sakib S., Mahadi M. K., Abir S. R., Moon A. M., Shafiullah A., Ali S., Faisal F. & Nishat M. M. (2024) Attention-based models for multivariate time series forecasting: multi-step solar irradiation prediction, Heliyon, 10 (6), 1–14.
Siami-Namini S., Tavakoli N. & Namin A. S. (2019) The performance of LSTM and BiLSTM in forecasting time series, 2019 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3285–3292.
Somlyódy L. & Varis O. (2006) Freshwater under pressure, International Review for Environmental Strategies, 6 (2), 181–204.
Suleman M. A. R. & Shridevi S. (2022) Short-term weather forecasting using spatial feature attention based LSTM model, IEEE Access, 10, 82456–82468.
Syeed M. M., Hossain M. S., Karim M. R., Uddin M. F., Hasan M. & Khan R. H. (2023) Surface water quality profiling using the water quality index, pollution index and statistical methods: a critical review, Environmental and Sustainability Indicators, 18, 100247.
Tavakoly Sany S. B., Hashim R., Rezayi M., Salleh A. & Safari O. (2014) A review of strategies to monitor water and sediment quality for a sustainability assessment of marine environment, Environmental Science and Pollution Research, 21, 813–833.
Uddin M. G., Nash S. & Olbert A. I. (2021) A review of water quality index models and their use for assessing surface water quality, Ecological Indicators, 122, 107218.
Uddin M. G., Nash S., Rahman A. & Olbert A. I. (2023b) Performance analysis of the water quality index model for predicting water state using machine learning techniques, Process Safety and Environmental Protection, 169, 808–828.
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser Ł. & Polosukhin I. (2017) Attention is all you need. In: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010.
Vlad C., Sbarciog M. I., Barbu M. & Wouwer A. V. (2012) Indirect control of substrate concentration for a wastewater treatment process by dissolved oxygen tracking, Journal of Control Engineering and Applied Informatics, 14 (1), 38–47.
Wai K. P., Chia M. Y., Koo C. H., Huang Y. F. & Chong W. C. (2022) Applications of deep learning in water quality management: a state-of-the-art review, Journal of Hydrology, 613, 128332.
Wang G., Mang S., Cai H., Liu S., Zhang Z., Wang L. & Innes J. L. (2016) Integrated watershed management: evolution, development and emerging trends, Journal of Forestry Research, 27, 967–994.
Young S. L., Frongillo E. A., Jamaluddine Z., Melgar-Quiñonez H., Pérez-Escamilla R., Ringler C. & Rosinger A. Y. (2021) Perspective: the importance of water security for ensuring food security, good nutrition, and well-being, Advances in Nutrition, 12 (4), 1058–1073.
Zamili H., Bakan G., Zubaidi S. L. & Alawsi M. A. (2023) Water quality index forecast using artificial neural network techniques optimized with different metaheuristic algorithms, Modeling Earth Systems and Environment, 9 (4), 4323–4333.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).