Accurate, stable, and long-term water quality predictions are essential for water pollution warning and efficient water environment management. In this study, a hierarchical water quality prediction (HWQP) model was developed following a ‘data decomposition–predictor screening–efficient prediction’ framework, implemented via wavelet decomposition, Spearman correlation analysis, and a long short-term memory network, respectively. Observed data from 14 stations in the Huaihe River–Hongze Lake system, including ammonia nitrogen (AN) and chemical oxygen demand (COD), were used to make long-term water quality predictions. The results suggest that, compared with existing water quality prediction models, the HWQP model has higher accuracy, with root mean square errors of 6% and 17% for simulating AN and COD, respectively. The predicted AN and COD concentrations range from 0 to 1 mg/l and from 3 to 5 mg/l, respectively, at 12 stations, whereas the COD concentrations are predicted to exceed the water quality target at Stations 4 and 5. The established model has great potential to address the challenges associated with the water environment.

  • The original water quality sequences are decomposed by wavelet transformation.

  • Driving predictors for water quality predictions are identified using the Spearman coefficient.

  • A novel hybrid model is developed for long-term water quality prediction.

Pollution discharge to sensitive ecosystems has profoundly disrupted their health and functions (Mahdian et al. 2024). Against the background of population growth and industrialization, increasing pollution discharge has led to water quality degradation and a series of related problems, such as nutrient loading and emerging pollutants (Tian et al. 2024a, b). To address the growing problem of water pollution, a large number of studies have been conducted on water quality monitoring, simulation, and prediction (Behmel et al. 2016; Singh & Ahmed 2021). Water quality prediction is necessary to improve water pollution control and the efficient management of the water environment.

Two categories of models are generally used in water quality prediction, i.e., conventional models and artificial intelligence (AI) models (Dong et al. 2023). Conventional models are based on physical mechanism simulation (Tang et al. 2014; Ding et al. 2019), regression analysis methods (Avila et al. 2017; Wu et al. 2023), and time series decomposition (Yu et al. 2020). Physical mechanism models can analyze contaminant transport processes in detail, but they often involve complex modeling and massive data requirements (Fomarelli et al. 2013). The prediction accuracy of regression analysis and data decomposition methods depends on the complexity of the observed water quality sequence, and the stability and universality of these models are difficult to guarantee. With the development of AI technology, AI models, which have strong predictive ability for highly non-linear and non-stationary sequences, have also been applied to predict water quality. However, the uncertainty of AI methods remains a challenge, and it is difficult to handle complex water quality sequences with traditional algorithms such as random forest, artificial neural networks, and support vector machines (Habib et al. 2024). Many deep neural network models have been successfully applied to predict the concentrations of various pollutants, such as ammonia nitrogen (AN), chemical oxygen demand (COD), and biological oxygen demand (Najah et al. 2011; Emamgholizadeh et al. 2014; Imani et al. 2021).

The long short-term memory network (LSTM) is a type of deep learning model that can effectively capture long-term dependencies and mitigate the gradient explosion problem of traditional neural networks during model training (Peng et al. 2022). For LSTM, it is essential to determine the predictors and the form of the inputs. However, if the original water quality sequences are simulated directly by LSTM, the results contain significant errors for sequences with seasonal and long-term trends (Komornikova et al. 2008). Furthermore, it is difficult for LSTM to provide inter-annual predictions because of missing inputs and error accumulation during simulation.

To improve the performance of AI models in water quality predictions, data decomposition methods have been used to process the original water quality datasets (Wang & Wu 2016). Compared with directly using the original data, water quality prediction models based on data decomposition can extract multi-scale features and simplify the datasets (Eze & Ajmal 2020). Wavelet decomposition (WTD) (Song et al. 2021) has been found to be effective in the analysis of water quality series and can facilitate the analysis of multi-temporal scale structures, the identification of principal components, and the removal of noise. WTD can decompose the original water quality sequence into several relatively stable subsequences with different frequencies, which has been proven to have advantages in revealing subtle changes and hidden information in water quality series (Yuan et al. 2022; Han et al. 2023). Thus, one of the main motivations of this study is to incorporate WTD into the proposed water quality prediction model.

Water quality parameters often fluctuate due to various natural and human influences, exhibiting instability and seasonality. To predict water quality using deep learning models effectively, it is crucial to identify the appropriate influencing factors, which is essential for ensuring the rationality of the model and improving its accuracy (Li et al. 2022). The Spearman correlation analysis (SCA), which is a non-parametric statistical method, possesses strong capabilities for data interpretation and spatiotemporal pattern recognition (Karthikeyan et al. 2017). Therefore, the SCA can establish a relationship with LSTM by extracting key predictors from multiple influencing factors. This process helps in identifying the most significant variables that impact water quality, thereby enhancing the predictive performance and accuracy of LSTM.

Based on previous research models, the structure of a water quality prediction model can be optimized by considering the distribution patterns of pollutants and the related influencing factors of the study area. Numerous studies have explored water quality prediction models, focusing on input pre-processing (data decomposition), structure adjustment (predictor screening), sequence prediction, and output post-processing, all of which have been proven to enhance model performance. However, previous studies improved only one part of the model without considering the overall optimization, so the model's performance was not fully optimized. For example, when only the optimization of WTD for a deep learning model was considered and the impact of streamflow on water quality prediction was ignored, there were significant relative errors in simulating rivers with seasonal variation of streamflow (Zhou et al. 2022).

This article proposes a hierarchical water quality prediction model with ‘decomposition–inputs–prediction’ hierarchical optimization. Distinct from previous related work, the developed model optimizes the design of water quality prediction by combining data decomposition (WTD) and predictor screening (SCA). Based on the stationary characteristics of the decomposed subsequences, long-term predictions were carried out on the decomposed inputs.

The aims of this study were to (1) develop a novel water quality prediction model with the ‘decomposition–inputs–prediction’ hierarchical optimization framework, (2) apply this model to water quality predictions (including AN and COD) of the Huaihe River–Hongze Lake (HR–HL) system, and (3) make long-term water quality predictions for the HR–HL system.

In this study, a prediction model named WTD-SCA-LSTM, which couples wavelet decomposition, SCA, and LSTM modules, was developed for precise prediction of water quality parameters. A structural flowchart of the hierarchical optimization framework is shown in Figure 1. Sections 2.1–2.3 introduce the detailed optimization process of the water quality prediction model, including data pre-processing, input selection, and series prediction. Section 2.4 introduces three indicators to evaluate the accuracy, reliability, and stability of the model.
Figure 1

Structural flow chart of the water quality prediction model.


Data decomposition based on wavelet decomposition

The observed water quality data form a highly non-linear and non-stationary complex time series that is influenced by multiple factors. The wavelet transform is a multi-resolution analysis in the time and frequency domains. Wavelet transform decomposition (WTD) converts a time series into a set (typically three to five) of constituent series based on the local properties of the datasets (Seo et al. 2015). It decomposes the original time series into one low-frequency sequence and multiple high-frequency sequences, where the high-frequency sequences describe short-term fluctuations of the series and the low-frequency sequence indicates the overall trend. High-frequency sequences can be considered stationary (the mean does not change over time and the sequence has no trend) (Conejo et al. 2005). The monitored water quality datasets are discrete values, and the decomposition method used in this study is the discrete wavelet transform (DWT), which can be represented as follows:
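$$x_t = L_n(t) + \sum_{k=1}^{n} H_k(t) = \sum_{i=1}^{N} c_{n,i}\,\varphi_{n,i}(t) + \sum_{k=1}^{n}\sum_{i=1}^{N} d_{k,i}\,\psi_{k,i}(t) \tag{1}$$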
where xt is the water quality series at time t, n is the number of DWT levels, Ln(t) is the low-frequency sequence at the nth level at time t, Hk(t) is the high-frequency sequence at the kth level at time t, N is the total number of samples of the water quality indicator; cn,i and dk,i are the decomposition coefficients corresponding to the low-frequency sequence and the high-frequency sequence at the ith sample, respectively, and φn,i and ψk,i are the low-pass and high-pass filters in WTD, respectively.
The specific steps of wavelet decomposition are shown in Figure 2 (Parmar & Bhardwaj 2015).
  • Step 1. Select a set of continuous finite orthogonal wavelet basis functions and align it with the starting point of the water quality series.

  • Step 2. Calculate the decomposition coefficients c and d. The larger the coefficient, the more similar the waveform of the current water quality sequence is to the wavelet basis function.

  • Step 3. Move the wavelet basis function along the time axis and calculate the decomposition coefficient at each time until it covers the entire water quality sequence.

  • Step 4. Scale the selected wavelet function by one unit and then repeat Steps 1–3.

In this study, Daubechies 5 (db5) was selected as the basis function using the waveform matching algorithm (Farajpanah et al. 2024). By matching the shape of the observed data with the candidate wavelets, db5 was found to be more suitable for decomposing the water quality data in this study.
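
The additive structure described above (one low-frequency sequence plus several high-frequency sequences that sum back to the original series) can be illustrated with a minimal sketch. It is not the procedure used in this study: it swaps in the Haar wavelet for db5 because Haar approximations reduce to block means, which keeps the arithmetic transparent. The function names and the sample values are hypothetical, and the series length is assumed to be a multiple of 2**levels.

```python
def block_means(x, size):
    """Replace every block of `size` samples by the block mean (Haar approximation)."""
    out = []
    for start in range(0, len(x), size):
        block = x[start:start + size]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def haar_additive_decomposition(x, levels):
    """Return (low, highs) such that x[t] == low[t] + sum(h[t] for h in highs).

    highs[0] is the finest (highest-frequency) component, playing the role of d1;
    low plays the role of the low-frequency sequence at the last level.
    """
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    approx_prev = [float(v) for v in x]  # level-0 approximation is the series itself
    highs = []
    for k in range(1, levels + 1):
        approx_k = block_means(x, 2 ** k)
        # Detail at level k = what the coarser approximation no longer explains
        highs.append([a - b for a, b in zip(approx_prev, approx_k)])
        approx_prev = approx_k
    return approx_prev, highs


# Illustrative values only (not data from the study)
series = [0.8, 0.6, 0.9, 0.5, 0.4, 0.7, 0.3, 0.6]
low, highs = haar_additive_decomposition(series, levels=3)
rebuilt = [low[t] + sum(h[t] for h in highs) for t in range(len(series))]
```

In this sketch every high-frequency component sums to zero over the record, which mirrors the zero-mean behaviour expected of the detail subsequences, while the low-frequency component carries the level of the series.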

Figure 2

Schematic diagram of the wavelet decomposition of water quality sequence.

The maximum decomposition levels of WTD can be evaluated as follows (Wu & Wang 2022):
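$$n_{\max} = \left\lfloor \log_2\left(\frac{n_d}{l_w - 1}\right) \right\rfloor \tag{2}$$
where nmax is the maximum decomposition level, nd is the length of samples for water quality indicators, and lw is the length of the wavelet decomposition low-pass filter.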
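
As a quick check of how such a bound behaves, the sketch below assumes that Equation (2) follows the commonly used rule floor(log2(nd / (lw − 1))); this form is an assumption, since it is not spelled out here. The function name and the series lengths are hypothetical. The filter length of 10 corresponds to db5, because a Daubechies-N filter has 2N coefficients.

```python
import math


def max_decomposition_level(n_d, l_w):
    """Upper bound on the number of DWT levels (assumed rule, see lead-in).

    n_d: number of samples in the series
    l_w: length of the low-pass decomposition filter
    """
    if l_w < 2 or n_d < l_w - 1:
        return 0  # series too short for even one level
    return int(math.floor(math.log2(n_d / (l_w - 1))))


DB5_FILTER_LENGTH = 10  # Daubechies-N filters have 2N coefficients

# Hypothetical record lengths, for illustration only
level_short = max_decomposition_level(132, DB5_FILTER_LENGTH)
level_long = max_decomposition_level(264, DB5_FILTER_LENGTH)
```

The value is an upper bound, so a smaller number of levels can still be chosen when the coarsest subsequences carry little additional information.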

Predictor screening by SCA

Water quality indicators are usually correlated with multiple factors, and the decomposed water quality subsequences are also interrelated with each other. SCA was used to describe the trend direction and correlation strength of two random variables in this study. SCA is good at identifying and revealing the degree of association between one dependent variable and one or more variables in a complex water quality dataset (Xiao et al. 2016). This information can evaluate the contribution of driving predictors for more accurate water quality prediction.

The Spearman correlation coefficient is represented as follows (Gauthier 2001):
$$\rho = 1 - \frac{6\sum_{i=1}^{N} d_i^{2}}{N\left(N^{2}-1\right)} \tag{3}$$
where di is the difference between the ranks of each pair of variables and N is the total number of paired samples of the water quality indicators. The Spearman correlation coefficient ranges from −1 to 1, with 0 indicating no monotonic correlation between the two variables. When the two variables are completely monotonically correlated, the absolute value of the coefficient is 1.0.
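
The rank-difference form of the Spearman coefficient can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names and the sample values are hypothetical, and the closed-form expression is exact only when there are no tied values (ties are given average ranks here, which makes the result approximate).

```python
def average_ranks(values):
    """Rank values from 1..N, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are zero-based, ranks start at 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient from squared rank differences (exact without ties)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Illustrative values only
x = [1, 2, 3, 4, 5]
rho_increasing = spearman_rho(x, [2, 4, 5, 9, 20])   # monotonically increasing
rho_decreasing = spearman_rho(x, [20, 9, 5, 4, 2])   # monotonically decreasing
rho_partial = spearman_rho(x, [1, 3, 2, 5, 4])       # partly shuffled
```

Because only ranks enter the calculation, any monotonic transformation of either variable leaves the coefficient unchanged, which is why the method suits non-linear relationships between water quality factors.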

Combining more water quality factors can enhance the performance of the water quality prediction model. However, too many predictors may lead to longer computation time and greater prediction uncertainty, while too few might reduce accuracy. Moreover, when the prediction range exceeds the set value of the time delay, recursive prediction is required, and the measured values of the predictors must be replaced by predicted values. Thus, the reliability of the predictors is crucial in water quality prediction.
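
The recursive prediction mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: `one_step_model` is a hypothetical callable standing in for a trained predictor, `lag` is the number of past values it consumes, and the averaging model used in the example is deliberately trivial.

```python
def recursive_forecast(one_step_model, history, lag, steps):
    """Forecast `steps` values ahead by feeding predictions back as inputs.

    one_step_model: callable taking the last `lag` values and returning the next one
    history: measured values available at the start of the forecast
    """
    if len(history) < lag:
        raise ValueError("history must contain at least `lag` values")
    window = list(history[-lag:])
    predictions = []
    for _ in range(steps):
        next_value = one_step_model(window)
        predictions.append(next_value)
        # The oldest measured value drops out and the prediction takes its place,
        # so errors in early steps propagate into later ones.
        window = window[1:] + [next_value]
    return predictions


def mean_model(window):
    """Hypothetical stand-in for a trained network: predicts the window mean."""
    return sum(window) / len(window)


forecast = recursive_forecast(mean_model, [1.0, 2.0, 3.0], lag=3, steps=2)
```

After `lag` steps the window contains no measured values at all, which is one way to see why unreliable predictors degrade long-range forecasts.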

Long short-term memory models for water quality parameters

The variation of water quality is controlled by the coupled impacts of numerous hydrological and anthropogenic factors, and it can be generalized as a mapping of one output to multiple inputs, which is similar to neural network models (Kratzert et al. 2018). LSTM, as one of the recurrent neural networks (RNNs), is especially good at predicting time series. LSTM overcomes the limitations of conventional RNNs by addressing the problems of vanishing and exploding gradients in long-term time series prediction (Ahmadi et al. 2024). Other improved models (such as Bi-LSTM and stacked LSTM) have similar accuracy to LSTM in predicting non-stationary sequences such as water quality but incur higher training costs because of their more complex structures (Adib et al. 2024). Hence, LSTM was applied in this study to achieve accurate and stable prediction of complex water quality sequences.

The schematic diagram of the LSTM approach applied in this study is shown in Figure 3. A common LSTM network contains an input layer, several hidden layers, an output layer, and a cell state (Gers et al. 1999). The input predictors of the input layer are the subsequences obtained from WTD and their influencing factors. Each hidden layer consists of three gates, namely, the forget gate, input gate, and output gate, which jointly control the stored information. The output layer obtains future sequence values through the time delay parameter k. The cell states include the short-term memory and the long-term memory.
Figure 3

Schematic diagram of the LSTM approach.

The forget gate is used to filter the historical information and remove unnecessary components stored in the cell state Ct−1 under control of the forget factor ft.
$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{4}$$
where Ct−1 is the cell state at time t − 1; σ is the sigmoid activation function, which transforms each input into a number between 0 and 1; wf is the weight matrix of the forget gate and bf is its bias term; ht−1 is the hidden layer state at time t − 1; and xt is the input at time t.
The input gate is used to evaluate and retain important inputs for the current time under control of the input factor it.
$$i_t = \sigma\left(w_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{5}$$
$$\tilde{C}_t = \tanh\left(w_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{6}$$
where C̃t is the update factor; wi and wc are the input and update weight matrices of the input gate, respectively, and bi and bc are their bias terms; tanh is a function that transforms each input into a number between –1 and 1. This gate can select inputs, avoiding perturbation in water quality prediction while supplementing more important information for prediction to enhance the accuracy of the LSTM network. Therefore, the combination of the forget gate and the input gate provides the update of the cell state Ct as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$
The output gate is used to control the output of cell state values and updates the hidden layer state ht at the current time as follows:
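$$o_t = \sigma\left(w_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{8}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{9}$$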
where wo is the weight matrix of the output gate and bo is its bias term, and ot is the output value.
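
To make the gate mechanism concrete, the following sketch runs one time step of a single-unit LSTM cell, assuming the standard LSTM formulation for Equations (4)–(9). It is only an illustration: the scalar weights, the `params` layout, and the function names are hypothetical, and a real network would use weight matrices and a deep learning library.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name ("f", "i", "c", "o") to a tuple
    (weight on h_prev, weight on x_t, bias); this layout is hypothetical.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(affine("f"))            # forget factor
    i_t = sigmoid(affine("i"))            # input factor
    c_tilde = math.tanh(affine("c"))      # update factor (candidate cell state)
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state
    o_t = sigmoid(affine("o"))            # output value of the output gate
    h_t = o_t * math.tanh(c_t)            # new hidden state
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=0.7, h_prev=0.1, c_prev=2.0, params=zero_params)
```

The zero-weight case shows the role of the forget factor directly: the previous cell state is carried forward scaled by ft, independent of the current input.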

From the above description, the LSTM network can better capture long-term dependence in sequences through the cell state and gate mechanism, and the temporal correlation does not weaken even as the time series becomes longer.

Evaluation metrics of long-term water quality prediction

Three indicators are calculated to evaluate the accuracy, reliability, and stability of water quality prediction models in this study, which are often used to evaluate simulation errors in water quality prediction.

Accuracy is evaluated by the root mean square error (RMSE), which is a typical statistical indicator to measure the overall error between the observed and predicted values. The specific expression is as follows:
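$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{obs},i} - y_{\mathrm{pre},i}\right)^{2}} \tag{10}$$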
where yobs,i is the observed value, ypre,i is the predicted value, and n is the length of the time series.
Reliability is evaluated by the mean absolute percentage error (MAPE), which can measure the robustness of the model. MAPE is the percentage form of the mean absolute error (MAE); because MAE corresponds to errors following a Laplacian distribution, this measure is less affected by outliers. The specific expression for MAPE is as follows:
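$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_{\mathrm{obs},i} - y_{\mathrm{pre},i}}{y_{\mathrm{obs},i}}\right| \tag{11}$$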
A reasonable prediction should have the ability to make long-term predictions and have strong stability (Huang et al. 2022). The stability of a prediction model is reflected in the small probability of the error exceeding the allowable value during the long-term simulation. The absolute error ε of each time step was calculated, and the augmented Dickey–Fuller (ADF) test (Dong et al. 2023) was applied to test the prediction errors for stationarity and rationality.
$$\varepsilon_i = \left|y_{\mathrm{obs},i} - y_{\mathrm{pre},i}\right| \tag{12}$$
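
The error measures above can be computed with a few lines of code. The sketch assumes the standard definitions of RMSE and MAPE and takes the absolute error as the magnitude of the difference between observed and predicted values. The function names and sample values are hypothetical, and the ADF stationarity test is not reproduced here because it is normally taken from a statistics library.

```python
import math


def rmse(obs, pre):
    """Root mean square error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pre)) / n)


def mape(obs, pre):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    if any(o == 0 for o in obs):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(obs)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pre))


def absolute_errors(obs, pre):
    """Absolute error at each time step, the series later tested for stationarity."""
    return [abs(o - p) for o, p in zip(obs, pre)]


# Illustrative values only (mg/l)
observed = [1.0, 2.0, 4.0]
predicted = [1.0, 1.0, 5.0]
rmse_value = rmse(observed, predicted)
mape_value = mape(observed, predicted)
errors = absolute_errors(observed, predicted)
```

In this example the two non-zero errors have the same magnitude, so they weigh equally in RMSE, whereas MAPE penalizes the error on the smaller observed value more heavily.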

The metrics of RMSE, MAPE, and ADF are selected to evaluate the accuracy, reliability, and stability, respectively, with the objective of addressing the interference of outliers and calibrating the performance in long-term water quality prediction.

The HR–HL system (Eastern China) is the study region for this research with data from 14 monitoring stations (S1–S14) obtained during 1998–2018.

Study region

The HR is the third largest river in China, and the HR Basin is located in eastern China, covering an area of approximately 270,000 km² (Xu et al. 2022). The HR Basin has a dense population and fertile soil and is an important grain production base in China, providing about a quarter of China's commodity grain. The mainstream of the HR originates from the Tongbai Mountain and flows from west to east through four provinces before reaching the outlet at the HL. As shown in Figure 4, the HL is located on the HR which is the main water source for approximately 20 million people and has six main tributaries. The water environments of the HR and the HL are closely related, and the regimes of contaminant transport and transformation are complex. This study focused on the 300-km-long sub-reach of the HR and the HL. The elevation in this region is between 2 and 255 m, and the mean elevation is about 53 m (1985 Chinese National Elevation Datum).
Figure 4

Schematic view of the study area.


Data

There are 14 monitoring stations in this study area. S1–S6 are located in the mainstream of the HR, and S7–S14 are located in the HL. In addition, S6 is a monitoring station at the confluence of the HR and the HL, and S8, S10, S12, and S14 are at the confluences of the HL and other tributaries. In this study, contaminant concentrations and water temperature were sampled weekly or monthly, and the flow was measured daily. The measurement period of the available AN and COD concentration datasets is from 1998 to 2018 for S1–S4, from 2003 to 2018 for S5 and S6, and from 2004 to 2018 for the other stations around the HL (Table 1). The contaminant concentration and temperature data were measured using the national standard water quality detection method and were provided by the HR Water Resources Protection Bulletin and the HR Water Environment Monitoring Center. The streamflow data were obtained from the hydrographic office of the HR Commission of the Ministry of Water Resources. The streamflow and temperature data series are sufficiently long to match the water quality data, and the daily streamflow datasets are converted into weekly or monthly mean values to drive the prediction model. Datasets before January 2018 are used for calibration, and data from 2018 are used for validation of the water quality model in this study.

Table 1

Fourteen monitoring stations in this study

Station ID | Station name | Location | Monitoring period | Elevation (m)
S1 | Lu Taizi | HR | 1998.1–2018.12 | 24.2
S2 | Beng Bu | HR | 1998.1–2018.12 | 21.3
S3 | Wu Jiadu | HR | 1998.1–2018.12 | 20.7
S4 | Lin Huaiguan | HR | 1998.1–2018.12 | 18.5
S5 | Xiao Liuxiang | HR | 2003.1–2018.12 | 16.5
S6 | Lao Zishan | HR | 2003.1–2018.12 | 13.5
S7 | Lin Huai | HL | 2004.1–2018.12 | 12–14
S8 | Jiang Ba | HL | 2004.1–2018.12 |
S9 | Gao Liangjian | HL | 2004.1–2018.12 |
S10 | Er Hezha | HL | 2004.1–2018.12 |
S11 | Cheng Zihu | HL | 2004.1–2018.12 |
S12 | Xu Hong | HL | 2004.1–2018.12 |
S13 | Cheng He | HL | 2004.1–2018.12 |
S14 | Li Hewa | HL | 2004.1–2018.12 |

The distribution of AN and COD concentrations from S1 to S14 is shown in Figure 5. The AN concentration monitored in the HR was generally higher than that in the HL, and the highest concentration occurred in the midstream. In addition, the range of AN concentrations in the river was wider, which means that AN accumulation events occur more frequently. With respect to COD, the spatial variability of the data monitored in the HL was more significant than that in the HR.
Figure 5

Spatial patterns of water quality indicators in the HR–HL system from 2009 to 2019. The box plots in dark blue represent the upper and lower bounds, quartiles, and median (orange horizontal line) of pollutant concentrations in the HR, while those in light blue represent the same statistics in the HL (significance level = 0.05).


The proposed WTD-SCA-LSTM method was applied to this case, and the driving predictors for the HR and the HL were screened separately. On the basis of short-term accurate and reliable prediction, the long-term prediction and assessment of water quality in the study area will also be discussed in this section.

Original water quality series decomposition

The water quality indicators of a river–lake system are highly non-linear and complex due to the influence of climate change and human interference. This section aims to obtain subsequences with simple features for prediction input by using discrete wavelet decomposition to decompose the original AN and COD concentrations.

Before 2008, water pollution in the HR Basin was severe, and high pollutant concentrations were often recorded during water pollution incidents (Han et al. 2023). After 2008, pollutant concentrations were controlled and remained stable owing to the implementation of water environment protection policies. Therefore, for the stability and accuracy of the model, the datasets used for the prediction model in this article uniformly start from 2008. The water quality samples have a half-monthly frequency and db5 is chosen as the wavelet basis function; according to Equation (2), the number of wavelet decomposition levels was rounded to three after calculation.

As shown in Figure 6, the original AN concentration at S1 was decomposed into three high-frequency subsequences (Figure 6(b)–6(d)) and one low-frequency subsequence (Figure 6(e)). From d1 to d3, the oscillation frequency decreases and the mean value tends to 0, which are clear features of stationary series. The low-frequency term in Figure 6(e) reflects the overall trend of AN concentration at S1, showing a long-term downward trend with a rebound around 2018, which is consistent with the original series in Figure 6(a). A comparison with the low-frequency subsequences of S8 and S9 shows that c3 may follow a significant increasing or decreasing trend or may oscillate, but it remains within a threshold (the upper and lower bounds of the original series). In short, the low-frequency term can serve as an evaluation basis for short-term trend prediction in water quality.
Figure 6

The results of wavelet decomposition: (a) the original AN concentration sequence at S1; (b–d) the high-frequency sequences decomposed from (a); (e) the low-frequency sequence obtained via (a); (f) and (g) low-frequency sequences obtained from the observed AN concentrations of S8 and S9, respectively.


Driving predictor screening and predictor set construction

Separately determining the inputs of the prediction model for the water quality indicators of each station is a prerequisite for accurate prediction. SCA was used in this section to quantitatively analyze the contribution of influencing factors to water quality parameters.

According to the physical processes of the water environment in the study area, the main hydrological factors affecting water quality are streamflow, water temperature, and wind speed (Lutz et al. 2016; Zlatanovic et al. 2017). Previous studies have utilized these factors to establish prediction inputs for predicting the original water quality sequence. When the data decomposition methods were applied to optimize the prediction model, the predictors should also consider the potential impact of the hydrological factors. From the perspective of the transformation mechanism of environmental factors, complex impacts may exist between pollutants in water bodies, such as the inverse relationship between total nitrogen and dissolved oxygen. In addition, the transport of pollutants can lead to spatial variability in water quality distribution. Therefore, the spatial connection of water quality for stations was evaluated in the inputs of the prediction model. For stations along the HR, the streamflow is the main factor affecting pollutant transport; the contribution of pollutant concentration at upstream station to local station was calculated in SCA. For stations around the HL, the spatial correlation was calculated with the pollutant concentration of nearby stations. The correlation is significant for stations around the confluences of the river-lake system (S6, S8, S10 and S12) due to highly dynamic contaminant activities.

The results of SCA provide a snapshot of the combination of main predictors. As shown in Figure 7, the correlation coefficients between various factors and the AN concentration series of S3 and its subsequences were calculated, with dark boxes indicating the critical factors. Thus, the driving predictor set for d1 comprised the AN and COD series of S3, while the AN concentration at the upstream station S2 was an input for the prediction of d2 and d3. As the final decomposed low-frequency subsequence, c3 is subject to the least interference, so its driving predictors are the original sequence, streamflow, and c3 itself.
Figure 7

Results of SCA for factors of the subsequences of AN concentration at S3. AN (S2) and AN (S3) represent the AN concentration at S2 and S3, respectively. The variable Q represents the streamflow and T represents the temperature at S3.


The driving predictor sets for stations in the HR (S3) and the HL (S8) were obtained by maximizing the correlation coefficient of SCA (Tables 2 and 3, respectively). For the monitoring stations in this study area, the inputs of the high-frequency subsequences were repeatedly influenced by upstream pollutant transport, especially during wet years when pollutant transport was intense. Comparing AN and COD, the inputs for COD prediction usually include temperature, which takes priority over other factors such as streamflow and wind speed. In contrast, the inputs for AN prediction were mainly driven by combinations of the original sequence and its decomposed subsequences.
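
One way to read "maximizing the correlation coefficient" is to scan candidate lead times for each factor and keep the one with the largest absolute Spearman coefficient. The sketch below illustrates that idea only. The selection rule, the function names, and the sample values are assumptions rather than the study's procedure, and the simple ranking used here assumes there are no tied values.

```python
def ranks(values):
    """Ranks from 1..N; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman coefficient from squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def best_lag(target, candidate, max_lag):
    """Return (lag, rho) maximizing |rho| between candidate(t - lag) and target(t)."""
    best = None
    for lag in range(1, max_lag + 1):
        # Pair candidate values with target values observed `lag` steps later
        rho = spearman_rho(candidate[:-lag], target[lag:])
        if best is None or abs(rho) > abs(best[1]):
            best = (lag, rho)
    return best


# Illustrative series: the target repeats the candidate two steps later (scaled by 2)
candidate = [3, 1, 4, 1.5, 9, 2, 6, 5, 3.5, 8]
target = [0.1, 0.2] + [2 * v for v in candidate[:-2]]
lag, rho = best_lag(target, candidate, max_lag=3)
```

In practice the scan would be repeated for every candidate factor and every subsequence, and only factors whose best coefficient is large enough would be kept as driving predictors.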

Table 2

Driving predictor set for S3's water quality prediction

Contaminant | Subsequence | Driving predictors (with the lead time)
AN | d1 | d1 (t − 3), AN (S3) (t − 3), COD (t − 3)
AN | d2 | d2 (t − 3), AN (S2) (t − 3), AN (S3) (t − 3)
AN | d3 | d3 (t − 2), AN (S2) (t − 2), AN (S3) (t − 2)
AN | c3 | c3 (t − 1), AN (S3) (t − 1), Q (t − 2)
COD | d1 | d1 (t − 3), COD (S3) (t − 3), AN (t − 3)
COD | d2 | d2 (t − 3), COD (S2) (t − 3), COD (S3) (t − 3), T (t − 1)
COD | d3 | d3 (t − 2), COD (S3) (t − 2), T (t − 1)
COD | c3 | c3 (t − 1), COD (S3) (t − 1)

Table 3

Driving predictor set for S8's water quality prediction

Contaminant | Subsequence | Driving predictors (with the lead time)
AN | d1 | d1 (t − 4), AN (S7) (t − 4), AN (S8) (t − 4)
AN | d2 | d2 (t − 4), AN (S7) (t − 4), AN (S8) (t − 4)
AN | d3 | d3 (t − 3), AN (S8) (t − 3), COD (S8) (t − 3)
AN | c3 | c3 (t − 1), AN (S8) (t − 1)
COD | d1 | d1 (t − 4), COD (S8) (t − 4), COD (S7) (t − 4)
COD | d2 | d2 (t − 4), COD (S8) (t − 4), T (t − 3)
COD | d3 | d3 (t − 3), COD (S8) (t − 3), T (t − 2)
COD | c3 | c3 (t − 1), COD (S8) (t − 2), T (t − 1)

Water quality prediction based on LSTM

The LSTM was used to predict the subsequences decomposed from the original water quality series. The dimension of the input layer depends on the number of input predictors; the maximum input dimension in this study was 4, while the output dimension was 1 (the target subsequence). The dimension of the hidden layer was set to 2 when simulating the original water quality sequence and d1, and to 1 for the other subsequences. The initial learning rate of LSTM was set to 0.001, and it decreased continuously during training. The training period was from 2009 to 2018, with a prediction period from 2018 to 2019. Before the LSTM simulation, the input datasets were normalized (zero mean and small variance) to improve efficiency.
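
The input preparation described above can be sketched as two small steps: standardizing a series with statistics from the training period, and assembling lagged predictor vectors such as c3 (t − 1), AN (S3) (t − 1), Q (t − 2) from Table 2. The sketch is illustrative only. The function names, the dictionary layout, the use of the population standard deviation, and all numbers are assumptions.

```python
import statistics


def fit_standardizer(train_values):
    """Mean and population standard deviation of the training period."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Shift to zero mean and scale by the training standard deviation."""
    return [(v - mean) / std for v in values]


def make_lagged_samples(series_by_name, lags_by_name, target_name):
    """Build (inputs, targets): the input at time t holds series[name][t - lag]."""
    max_lag = max(lags_by_name.values())
    n = len(series_by_name[target_name])
    inputs, targets = [], []
    for t in range(max_lag, n):
        inputs.append([series_by_name[name][t - lag]
                       for name, lag in lags_by_name.items()])
        targets.append(series_by_name[target_name][t])
    return inputs, targets


# Made-up series standing in for c3, AN at S3, and streamflow Q
series = {
    "c3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "AN": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Q": [100.0, 200.0, 300.0, 400.0, 500.0],
}
lags = {"c3": 1, "AN": 1, "Q": 2}  # lead times as in the c3 row for AN in Table 2
X, y = make_lagged_samples(series, lags, target_name="c3")

mean, std = fit_standardizer([1.0, 2.0, 3.0])
scaled = standardize([1.0, 2.0, 3.0], mean, std)
```

Fitting the scaling statistics on the training period only, and reusing them for the prediction period, avoids leaking information from the period being predicted.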

Using the decomposed data and the selected input predictors, the LSTM was applied to predict d1–d3 and c3 for AN and COD at each station. As shown in Figure 8, the prediction accuracy increases from Figure 8(a) to 8(d), which suggests that the LSTM predicts simpler sequences with higher accuracy and stability. The predicted sequence of AN concentration at S3 was obtained by summing the results of Figure 8(a)–8(d). Compared with the original sequence, the prediction effectively reproduced the monthly variation of AN concentration at S3. The final error was mainly caused by the prediction error at the extreme points of the high-frequency subsequences, indicating that reducing the relative error at these moments is crucial for improving the accuracy of the hybrid model.
Figure 8

Results of prediction for subsequences of AN concentration of S3 based on the LSTM network: (a–c) comparison of the observed and predicted high-frequency subsequences (d1–d3); (d) comparison of the observed and predicted low-frequency subsequence (c3); (e) the final prediction result of AN concentration, which was obtained by summing up the predicted values of subsequences.
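
The reconstruction described in the caption (summing the predicted d1–d3 and c3) and the search for the months that dominate the final error can be sketched as follows. This is an illustrative example with hypothetical function names and toy numbers, not the study's code or data.

```python
from typing import Dict, List, Sequence


def reconstruct(subsequences: Dict[str, Sequence[float]]) -> List[float]:
    """Sum the predicted subsequences month by month."""
    lengths = {len(values) for values in subsequences.values()}
    if len(lengths) != 1:
        raise ValueError("all subsequences must cover the same months")
    return [sum(month_values) for month_values in zip(*subsequences.values())]


def relative_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute relative error per month; observed values must be non-zero."""
    return [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]


# Toy predictions for three months (hypothetical values in mg/l).
predicted_parts = {
    "d1": [0.02, -0.03, 0.01],
    "d2": [0.01, 0.02, -0.02],
    "d3": [-0.01, 0.00, 0.01],
    "c3": [0.30, 0.31, 0.32],
}
observed = [0.30, 0.30, 0.40]

predicted = reconstruct(predicted_parts)
errors = relative_errors(observed, predicted)
worst_month = max(range(len(errors)), key=lambda i: errors[i])
```

In this toy case the largest relative error falls on the month with the observed peak, which mirrors the observation that extreme points of the high-frequency subsequences drive the final error.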

Figure 9 compares the water quality prediction results of a single LSTM and the ‘WTD-SCA-LSTM’ hybrid model at S3 (HR), S6 (confluence between the HR and the HL), and S8 (HL). Compared with directly applying the LSTM model, the hybrid model was significantly more accurate and efficient, and the training time was reduced by about 60%. Relative errors of the hybrid model during the recorded period were kept within 25 and 20% for AN and COD concentrations, respectively, indicating a stable prediction.
Figure 9

Comparison of the prediction results obtained by the LSTM and the ‘WTD-SCA-LSTM’ hybrid model for S3 (a and b), S6 (c and d), and S8 (e and f). S3 is located in the HR, S6 is located at the confluence of the HR and the HL, and S8 is located in the HL. Panels (a), (c), and (e) show the variation of AN concentration, while panels (b), (d), and (f) compare the simulated COD concentration.


The simulation errors of all stations were analyzed using Equations (10) and (11). Overall, the mean RMSE values in the recorded period for predicting the AN and COD concentration series at all stations are 0.06 and 0.17 mg/l, respectively, and the mean MAPE values are 24.8 and 5.2%, respectively. The ADF test was used to examine the stationarity of the relative error series, and the errors of AN and COD concentrations were stationary (at the 95% confidence level) at 79.8 and 88.5% of all stations, respectively. In conclusion, the hierarchical optimization prediction model is characterized by high accuracy, reliability, and stability, and it provides efficient and predictable support information for the relevant departments.
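
For reference, the two error metrics can be computed as in the sketch below. It uses the standard definitions of RMSE and MAPE, which are assumed here to match Equations (10) and (11); the toy values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (mg/l here)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(observed, predicted)) / n


# Toy monthly concentrations in mg/l (hypothetical).
obs = [0.50, 0.40, 0.25]
pred = [0.45, 0.44, 0.25]
toy_rmse = rmse(obs, pred)
toy_mape = mape(obs, pred)
```

Because MAPE divides by the observed value, it grows quickly for low concentrations, which is one plausible reason why the MAPE of AN (values mostly below 1 mg/l) is larger than that of COD even though its RMSE is smaller.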

Performance comparison of water quality prediction models

To compare the performance of the novel model with other water quality prediction models, the effects of predictor screening and data decomposition on the LSTM were evaluated in this section. Selecting predictors with higher correlation by SCA helps eliminate unnecessary information from the inputs, especially when the water quality is strongly affected by a single factor (e.g., the response of AN concentration to streamflow during the flood season). The performance of the different hybrid models in simulating AN and COD concentrations at all stations is shown in Table 4. After applying SCA to the LSTM prediction model, the mean RMSE for AN and COD concentrations in the testing period decreased by 11 and 21%, respectively. In addition, the maximum RMSE decreased markedly, which indicates that SCA is an essential part of water quality prediction for some stations.
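
The screening idea can be sketched as a search for the lead time at which a candidate predictor has the strongest Spearman correlation with the target. The example below is a minimal pure-Python illustration, not the authors' procedure; the helper names, the maximum lag, and the toy series are hypothetical, and the selection threshold used in the study is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / math.sqrt(var_x * var_y)


def best_lag(target: Sequence[float], candidate: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return (lag, rho) with the largest |rho| between target(t) and candidate(t - lag)."""
    best = (0, 0.0)
    for lag in range(1, max_lag + 1):
        rho = spearman(target[lag:], candidate[:-lag])
        if abs(rho) > abs(best[1]):
            best = (lag, rho)
    return best


# Toy series: the target responds monotonically (but non-linearly) to the
# candidate two months earlier.
candidate = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
target = [0, 0] + [c ** 3 for c in candidate[:-2]]
lag, rho = best_lag(target, candidate, max_lag=3)
```

Because the coefficient is rank-based, the cubic response in the toy data is still detected as a perfect monotonic relationship at a lead time of two months.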

Table 4

Comparison of different models' performance for simulating pollutant concentrations (training period: 2009–2018; testing period: 2018–2019)

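Evaluation | Contaminant | Period | LSTM (Max / Min / Mean) | SCA + LSTM (Max / Min / Mean) | EMD + SCA + LSTM (Max / Min / Mean) | WTD + SCA + LSTM (Max / Min / Mean)
RMSE (mg/l) | AN | Training | 0.39 / 0.05 / 0.15 | 0.31 / 0.04 / 0.11 | 0.15 / 0.03 / 0.07 | 0.09 / 0.03 / 0.04
RMSE (mg/l) | AN | Testing | 0.46 / 0.05 / 0.18 | 0.37 / 0.05 / 0.16 | 0.15 / 0.03 / 0.08 | 0.12 / 0.03 / 0.06
RMSE (mg/l) | COD | Training | 0.73 / 0.06 / 0.31 | 0.55 / 0.06 / 0.24 | 0.28 / 0.06 / 0.17 | 0.27 / 0.05 / 0.14
RMSE (mg/l) | COD | Testing | 0.91 / 0.11 / 0.39 | 0.69 / 0.11 / 0.31 | 0.34 / 0.07 / 0.19 | 0.31 / 0.06 / 0.17
MAPE (%) | AN | Training | 43.9 / 19.1 / 24.9 | 35.4 / 18.3 / 23.6 | 29.9 / 14.2 / 20.7 | 29.1 / 11.7 / 16.3
MAPE (%) | AN | Testing | 46.1 / 19.6 / 31.3 | 40.5 / 18.9 / 29.7 | 38.6 / 18.6 / 28.9 | 35.3 / 17.5 / 24.8
MAPE (%) | COD | Training | 11.2 / 4.1 / 6.8 | 9.9 / 4.7 / 6.7 | 8.4 / 3.1 / 5.3 | 6.6 / 2.4 / 4.1
MAPE (%) | COD | Testing | 11.4 / 4.7 / 7.4 | 10.1 / 4.7 / 7.2 | 9.2 / 3.6 / 6.1 | 7.2 / 3.1 / 5.2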

Empirical Mode Decomposition (EMD) is a classic data decomposition method (Liang et al. 2021), and its performance is compared with WTD in Table 4. In general, the application of data decomposition methods significantly improves the accuracy and reliability of the prediction model. The mean value of RMSE for AN and COD concentrations decreased by more than 50% in some scenarios, and a significant reduction in MAPE values helps reduce the uncertainty in predictions. From the results of all stations, WTD performs better in both maximum and mean errors, indicating that WTD is a more robust choice in hybrid water quality prediction models.
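
The relative improvements can be checked directly against Table 4. The short sketch below recomputes the percentage reduction in mean testing-period RMSE relative to the single LSTM; the numbers are copied from Table 4, while the dictionary layout and function name are only illustrative.

```python
# Mean testing-period RMSE (mg/l) copied from Table 4.
MEAN_TEST_RMSE = {
    "LSTM": {"AN": 0.18, "COD": 0.39},
    "SCA + LSTM": {"AN": 0.16, "COD": 0.31},
    "EMD + SCA + LSTM": {"AN": 0.08, "COD": 0.19},
    "WTD + SCA + LSTM": {"AN": 0.06, "COD": 0.17},
}


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative decrease of an error metric with respect to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Reduction of each hybrid model relative to the single LSTM.
reductions = {
    model: {
        contaminant: percent_reduction(MEAN_TEST_RMSE["LSTM"][contaminant], value)
        for contaminant, value in scores.items()
    }
    for model, scores in MEAN_TEST_RMSE.items()
    if model != "LSTM"
}

for model, by_contaminant in reductions.items():
    print(model, {c: round(r, 1) for c, r in by_contaminant.items()})
```

With these values, adding SCA alone reduces the mean testing RMSE by roughly 11% (AN) and 21% (COD), and both decomposition-based models exceed a 50% reduction, with WTD giving the larger decrease for both contaminants.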

Long-term water quality prediction

Long-term water quality prediction is important for foreseeing the future water quality status and evaluating the water environment of the watershed in water environment management (Dong et al. 2023). When applying LSTM to long-term water quality prediction, the most difficult task is obtaining the input datasets. If the pollutant sequence needs to be predicted over a long period, the output of each step must be fed back as the input of the next step, which incurs high computational costs and accumulates simulation errors. In this study, the original pollutant sequence was decomposed by WTD into a series of high-frequency subsequences, which showed significant stationarity features. Based on WTD, to predict the long-term high-frequency subsequences (d1–d3) from 2023 to 2026, similar sequences in the historical period were selected as LSTM inputs. As shown in Figure 10, the water quality parameters after 2019 were not recorded; the same value as at the breakpoint (2019.1) was found in 2015, and the sequence from that point was extracted to serve as the model inputs after 2019. Three years were chosen as a cycle for long-term prediction, and the simulation results of the high-frequency subsequences from 2023 to 2026 were obtained. In addition, a linear regression model was applied for long-term prediction of the low-frequency subsequences because of their simple trends.
Figure 10

Schematic diagram of selecting inputs for long-term prediction of high-frequency sequences.
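
The input-selection idea in Figure 10 and the linear extrapolation of the low-frequency subsequence can be sketched as follows. This is a simplified illustration under stated assumptions: the match is taken as the historical point closest in value to the breakpoint, the surrogate segment is the sequence that follows it, and the trend is an ordinary least-squares line over the time index. All function names and toy values are hypothetical.

```python
from typing import List, Sequence


def find_analog_start(history: Sequence[float], breakpoint_value: float, min_tail: int) -> int:
    """Index of the historical value closest to the breakpoint value that is
    still followed by at least min_tail recorded values."""
    last_start = len(history) - min_tail - 1
    if last_start < 0:
        raise ValueError("history is too short for the requested segment")
    return min(range(last_start + 1), key=lambda i: abs(history[i] - breakpoint_value))


def analog_segment(history: Sequence[float], breakpoint_value: float, length: int) -> List[float]:
    """Values following the matched point, used as surrogate inputs after the breakpoint."""
    start = find_analog_start(history, breakpoint_value, length)
    return list(history[start + 1 : start + 1 + length])


def linear_trend_forecast(values: Sequence[float], steps: int) -> List[float]:
    """Fit a least-squares line to (index, value) pairs and extrapolate it."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to fit a line")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + steps)]


# Toy high-frequency subsequence ending at the breakpoint (hypothetical values).
d1_history = [0.0, 0.2, -0.1, 0.3, -0.2, 0.1, 0.05, 0.2]
surrogate = analog_segment(d1_history, breakpoint_value=d1_history[-1], length=3)

# Toy low-frequency subsequence with a simple linear trend (hypothetical values).
c3_history = [1.0, 1.1, 1.2, 1.3]
c3_future = linear_trend_forecast(c3_history, steps=2)
```

In practice the segment length would correspond to the three-year cycle mentioned above (36 monthly values), and the breakpoint itself is excluded from the candidates because no recorded values follow it.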

The above results were summed to obtain the future water quality, as shown in Figure 11. The environmental quality standards for surface water in China define five water quality classes based on concentration (Zhai et al. 2007). The AN concentration will be below Class III (1.0 mg/l) at all monitoring stations except S4 and S5 from 2023 to 2026, which is consistent with the higher AN concentrations at S4 and S5 in Figure 6. The mean AN concentration in the prediction period will be below Class II (0.5 mg/l) at all stations. The COD concentration will fall between Class II (3 mg/l) and Class III (5 mg/l) for almost the entire prediction period, with occasional high-concentration episodes. In general, the water quality will meet the Class III requirements in most periods from 2023 to 2026, and COD will be classified worse than AN. In addition, it is worth noting that S4 and S5 in the HR will fall into Class IV in a few periods, so prevention and control policies can be considered in advance at these stations.
Figure 11

The long-term prediction of AN and COD concentrations: (a) and (b) AN prediction results for stations in the HR (S1–S6) and the HL (S7–S14), respectively; (c) and (d) COD prediction results for stations in the HR (S1–S6) and the HL (S7–S14), respectively. Data that were partially missing at S9, S12, and S13 from 2011 to 2012 were filled by linear interpolation.
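
The comparison of predicted concentrations with the class limits can be sketched as below. Only the Class II and Class III limits quoted in the text are used, and Class III is assumed to be the water quality target; the function names and the toy series are hypothetical.

```python
from typing import List, Sequence

# Upper limits (mg/l) for Class II and Class III as quoted in the text.
LIMITS = {
    "AN": {"II": 0.5, "III": 1.0},
    "COD": {"II": 3.0, "III": 5.0},
}


def classify(contaminant: str, concentration: float) -> str:
    """Place a concentration relative to the Class II and Class III limits."""
    limits = LIMITS[contaminant]
    if concentration <= limits["II"]:
        return "Class II or better"
    if concentration <= limits["III"]:
        return "Class III"
    return "worse than Class III"


def exceedance_months(contaminant: str, series: Sequence[float]) -> List[int]:
    """Indices of the months whose value exceeds the Class III limit (assumed target)."""
    limit = LIMITS[contaminant]["III"]
    return [i for i, value in enumerate(series) if value > limit]


# Toy predicted COD series in mg/l (hypothetical, not the study's output).
cod_prediction = [3.8, 4.6, 5.2, 4.9]
labels = [classify("COD", value) for value in cod_prediction]
flagged = exceedance_months("COD", cod_prediction)
```

A list of flagged months per station is one simple way to identify where control measures may need to be considered in advance.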


Advantages of the developed prediction model

Previous studies mainly focused on the accuracy and reliability of water quality simulation and prediction, and the LSTM models were widely applied due to their powerful performance in time series prediction (Song et al. 2021). While several pre-processing methods have been proposed to enhance the LSTM models, few studies addressed the selection of factors for the preprocessed subsequences, limiting the potential for fully improving model performance. A noteworthy aspect of this study is the application of WTD and SCA methods in the hierarchical optimization of the LSTM-based water quality prediction model. WTD was used to extract multi-dimensional features from the original water quality sequences, reducing uncertainties arising from observation. Other data pre-processing methods (such as Locally Weighted Scatterplot Smoothing (LOWESS), Seasonal and Trend decomposition using Loess (STL), EMD, and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)) were also considered for data denoising (Yu et al. 2022; Zhou et al. 2022; Dodig et al. 2024). However, we chose WTD for its ability to highlight local details of sequence data through an appropriate wavelet function. The combination of WTD and LSTM has been validated as robust in predicting non-linear time series in uncertain environments (Chen & Xue 2023). This study also demonstrated the effectiveness of WTD in pre-processing observed water quality data. In addition, traditional prediction models used only water quality data as input variables (Song et al. 2021; Yu et al. 2022). In contrast, this study considered multiple factors, including hydrology, water quality, and spatial connections among stations, with the SCA method employed to select relevant predictors. This hierarchical optimization model significantly improves the performance of LSTM, as shown in Table 4, which highlights the model's advantages in accuracy and reliability. In addition, the reconstruction of input features using WTD reduces the simulation time compared to a single LSTM model (Wu & Wang 2022).

In the HR Basin, it is difficult to obtain the input data for water quality prediction. Compared with the few existing studies of water quality prediction in this region (Zhang & Wu 2021; Chen et al. 2024), our predictions have a smaller RMSE. Moreover, this study found that the subsequences obtained through WTD are stationary, except for the low-frequency subsequences. These stationary subsequences help maintain the accuracy and improve the stability of the LSTM in long-term prediction, while the low-frequency subsequences can be extrapolated by linear fitting because of their simple trends. Accordingly, we achieved long-term water quality predictions for the HR Basin for the next few years. According to the National Water Quality Reports published by the China National Environmental Monitoring Centre (https://www.mee.gov.cn/hjzl/shj/dbsszyb/), the water quality of the HR Basin in 2023 was consistent with our results, which supports the reliability of the developed model.

Limitations of the developed prediction model

Due to the limited observation frequency of water quality in the HR Basin, only monthly AN and COD data were used for model training and prediction in this study. In the future, higher-frequency data are expected to be used to evaluate how data resolution affects model accuracy. In addition, factors such as land use and climate change also affect water quality, but they are not key factors for predicting water quality at the timescale of this study. Over longer observation periods, these factors could become important, and future development and application of the proposed model should take them into account.

Water environment protection and management

Due to the transport of pollutants from the HR to the HL, the water quality in the HR (S1–S6) and the HL (S7–S14) is correlated. For example, in April 2014, a water pollution incident in the HR (Figure 11(a)) directly led to a significant increase in AN concentration in the HL (Figure 11(b)). Therefore, controlling the pollution load in the HR is crucial for ensuring the health and sustainability of the water environment in the HL.

As shown in Figure 11, the predicted AN and COD concentrations will remain stable, but COD will sometimes fail to meet water quality targets at S4 and S5. Therefore, the water environment management department can adjust pollution discharge standards in different sub-basins based on the long-term water quality predictions. More control policies should be implemented at stations that are predicted to exceed water quality targets, while pollution discharge limits could be eased at other stations to support economic development.

In this study, a novel river water quality prediction model was developed and applied to simulate the water quality in the HR–HL system. This model used ‘decomposition–inputs–prediction’ hierarchical optimization to preprocess the original datasets, refine the predictors, and improve the prediction accuracy. The results lead to the following conclusions:

  • (1) A hierarchical optimization model based on WTD and SCA was developed for water quality prediction at 14 stations. In comparison with other models, the proposed model improves the accuracy (RMSE: 0.06 and 0.17 mg/l for AN and COD, respectively), reliability (MAPE: 24.8 and 5.2% for AN and COD, respectively), and simulation efficiency in the study area.

  • (2) AN and COD concentrations will remain stable in this region, and the mean COD concentration will be more likely to exceed the water quality targets than the mean AN concentration. The water quality at S4 and S5 in the HR will be worse than at other stations, with a higher risk of exceeding the targets.

  • (3) The observation frequency of water quality data affects the accuracy of water quality prediction. Higher-frequency data (weekly or daily) should be used to drive the developed model in future work to assess the impact of data resolution on its performance.

  • (4) In this article, the new model was applied only to water quality prediction in the HR Basin, but it could be applied to long-term water quality prediction in other basins. The LSTM hyper-parameters used here suit the stations within the study region and should be re-tuned when the model is applied in other regions.

This research was funded by the National Key R&D Program of China (2022YFC3202600), the National Natural Science Foundation of China (52479062 and 52309086), the Jiangsu Provincial Science and Technology Basic Research Program Youth Fund Project (BK20241516), and the Water Conservancy Science and Technology Project in Jiangsu Province (2023013, 2022061 and 2024009).

The authors declare there is no conflict.

Data cannot be made publicly available; readers should contact the corresponding author for details.

Adib A., Pourghasemzadeh M. & Lotfirad M. (2024) RNN-based monthly inflow prediction for Dez Dam in Iran considering the effect of wavelet pre-processing and uncertainty analysis, Hydrology, 11, 155. https://doi.org/10.3390/hydrology11090155
Ahmadi S. M., Balahang S. & Abolfathi S. (2024) Predicting the hydraulic response of critical transport infrastructures during extreme flood events, Engineering Applications of Artificial Intelligence, 133, 108573. https://doi.org/10.1016/j.engappai.2024.108573
Avila R., Horn B., Moriarty E., Hodson R. & Moltchanova E. (2017) Evaluating statistical model performance in water quality prediction, Journal of Environmental Management, 206, 910–919.
Behmel S., Damour M., Ludwig R. & Rodriguez M. J. (2016) Water quality monitoring strategies – A review and future perspectives, Science of The Total Environment, 571, 1312–1329. https://doi.org/10.1016/j.scitotenv.2016.06.235
Chen C. & Xue X. (2023) A novel coupling preprocessing approach for handling missing data in water quality prediction, Journal of Hydrology, 617, 128901. https://doi.org/10.1016/j.jhydrol.2022.128901
Chen J., Li H., Felix M., Chen Y. & Zheng K. (2024) Water quality prediction of artificial intelligence model: A case of Huaihe river basin, China, Environmental Science and Pollution Research, 31 (10), 14610–14640. https://doi.org/10.1007/s11356-024-32061-2
Conejo A. J., Plazas M. A., Espinola R. & Molina A. B. (2005) Day-ahead electricity price forecasting using the wavelet transform and ARIMA models, IEEE Transactions on Power Systems, 20, 1035–1042.
Ding X., Zhu Q., Zhai A. & Liu L. (2019) Water quality safety prediction model for drinking water source areas in Three Gorges Reservoir and its application, Ecological Indicators, 101, 734–741. https://doi.org/10.1016/j.ecolind.2019.01.068
Dodig A., Ricci E., Kvascev G. & Stojkovic M. (2024) A novel machine learning-based framework for the water quality parameters prediction using hybrid long short-term memory and locally weighted scatterplot smoothing methods, Journal of Hydroinformatics, 26, 1059–1079. https://doi.org/10.2166/hydro.2024.273
Dong W., Zhang Y., Zhang L., Ma W. & Luo L. (2023) What will the water quality of the Yangtze river be in the future?, Science of The Total Environment, 857, 159714. https://doi.org/10.1016/j.scitotenv.2022.159714
Emamgholizadeh S., Kashi H., Marofpoor I. & Zalaghi E. (2014) Prediction of water quality parameters of Karoon river (Iran) by artificial intelligence-based models, International Journal of Environmental Science and Technology, 11, 645–656. https://doi.org/10.1007/s13762-013-0378-x
Eze E. & Ajmal T. (2020) Dissolved oxygen forecasting in aquaculture: A hybrid model approach, Applied Sciences, 10, 7079. https://doi.org/10.3390/app10207079
Farajpanah H., Adib A., Lotfirad M., Esmaeili-Gisavandani H., Riyahi M. & Zaerpour A. (2024) A novel application of waveform matching algorithm for improving monthly runoff forecasting using wavelet-ML models, Journal of Hydroinformatics, 26, 1771–1789. https://doi.org/10.2166/hydro.2024.128
Fomarelli R., Galelli S., Castelletti A., Antenucci J. P. & Marti C. L. (2013) An empirical modeling approach to predict and understand phytoplankton dynamics in a reservoir affected by interbasin water transfers, Water Resources Research, 49, 3626–3641.
Gauthier T. (2001) Detecting trends using Spearman's rank correlation coefficient, Environmental Forensics, 2, 359–362. https://doi.org/10.1006/enfo.2001.0061
Gers F., Schmidhuber J. & Cummins F. (1999) ‘Learning to forget: Continual prediction with LSTM’, Ninth International Conference on Artificial Neural Networks (ICANN99), Vols 1 and 2, IEEE Conference Publications, pp. 850–855.
Habib M. A., Abolfathi S., O'Sullivan J. J. & Salauddin M. (2024) Efficient data-driven machine learning models for scour depth predictions at sloping sea defences, Frontiers in Built Environment, 10, 1343398. https://doi.org/10.3389/fbuil.2024.1343398
Huang X., Xu B., Zhong P. A., Yao H., Yue H., Zhu F., Lu Q., Sun Y., Mo R. & Li Z. (2022) Robust multiobjective reservoir operation and risk decision-making model for real-time flood control coping with forecast uncertainty, Journal of Hydrology, 605, 127334.
Imani M., Hasan M. M., Bittencourt L. F., McClymont K. & Kapelan Z. (2021) A novel machine learning application: Water quality resilience prediction model, Science of The Total Environment, 768, 144459. https://doi.org/10.1016/j.scitotenv.2020.144459
Karthikeyan P., Venkatachalapathy R. & Vennila G. (2017) Multivariate analysis for river water quality assessment of the Cauvery river, Tamil Nadu, India, Indian Journal of Geo-Marine Sciences, 46, 785–790.
Komornikova M., Szolgay J., Svetlikova D., Szoekeova D. & Jurcak S. (2008) A hybrid modelling framework for forecasting monthly reservoir inflows, Journal of Hydrology and Hydromechanics, 56, 145–162.
Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. (2018) Rainfall–runoff modelling using long short-term memory (LSTM) networks, Hydrology and Earth System Sciences, 22, 6005–6022.
Li D., Sun Y., Sun J., Wang X. & Zhang X. (2022) An advanced approach for the precise prediction of water quality using a discrete hidden Markov model, Journal of Hydrology, 60, 127659. https://doi.org/10.1016/j.jhydrol.2022.127659
Liang B.-X., Hu J.-P., Liu C. & Hong B. (2021) Data pre-processing and artificial neural networks for tidal level prediction at the Pearl River Estuary, Journal of Hydroinformatics, 23, 368–382. https://doi.org/10.2166/hydro.2020.055
Lutz S. R., Mallucci S., Diamantini E., Majone B., Bellin A. & Merz R. (2016) Hydroclimatic and water quality trends across three Mediterranean river basins, Science of The Total Environment, 571, 1392–1406. https://doi.org/10.1016/j.scitotenv.2016.07.102
Mahdian M., Noori R., Salamattalab M., Heggy E., Bateni S., Nohegar A., Hosseinzadeh M., Siadatmousavi S., Fadaei M. & Abolfathi S. (2024) Anzali wetland crisis: Unraveling the decline of Iran's ecological gem, Journal of Geophysical Research: Atmospheres, 129, e2023JD039538. https://doi.org/10.1029/2023JD039538
Najah A., El-Shafie A., Karim O. A. & Jaafar O. (2011) Integrated versus isolated scenario for prediction dissolved oxygen at progression of water quality monitoring stations, Hydrology and Earth System Sciences, 15, 2693–2708. https://doi.org/10.5194/hess-15-2693-2011
Parmar K. S. & Bhardwaj R. (2015) River water prediction modeling using neural networks, fuzzy and wavelet coupled model, Water Resources Management, 29, 17–33.
Peng L., Wu H., Gao M., Yi H., Xiong Q., Yang L. & Cheng S. (2022) TLT: Recurrent fine-tuning transfer learning for water quality long-term prediction, Water Research, 225, 119171. https://doi.org/10.1016/j.watres.2022.119171
Seo Y., Kim S., Kisi O. & Singh V. P. (2015) Daily water level forecasting using wavelet decomposition and artificial intelligence techniques, Journal of Hydrology, 520, 224–243.
Singh M. & Ahmed S. (2021) IoT based smart water management systems: A systematic review, Materials Today: Proceedings, 46, 5211–5218. https://doi.org/10.1016/j.matpr.2020.08.588
Song C., Yao L., Hua C. & Ni Q. (2021) A novel hybrid model for water quality prediction based on synchrosqueezed wavelet transform technique and improved long short-term memory, Journal of Hydrology, 603, 126879. https://doi.org/10.1016/j.jhydrol.2021.126879
Tang C., Yi Y., Yang Z. & Cheng X. (2014) Water pollution risk simulation and prediction in the main canal of the south-to-north water transfer project, Journal of Hydrology, 519, 2111–2120. https://doi.org/10.1016/j.jhydrol.2014.10.010
Tian H., Du Y., Luo X., Dong J., Chen S., Hu X., Zhang M., Liu Z. & Abolfathi S. (2024a) Understanding visible light and microbe-driven degradation mechanisms of polyurethane plastics: Pathways, property changes, and product analysis, Water Research, 259, 121856. https://doi.org/10.1016/j.watres.2024.121856
Tian H., Wang L., Zhu X., Zhang M., Li L., Liu Z. & Abolfathi S. (2024b) Biodegradation of microplastics derived from controlled release fertilizer coating: Selective microbial colonization and metabolism in plastisphere, Science of The Total Environment, 920, 170978. https://doi.org/10.1016/j.scitotenv.2024.170978
Wang Y. & Wu L. (2016) On practical challenges of decomposition-based hybrid forecasting algorithms for wind speed and solar irradiation, Energy, 112, 208–220. https://doi.org/10.1016/j.energy.2016.06.075
Wu J., Zhang J., Tan W., Lan H., Zhang S., Xiao K., Wang L., Lin H., Sun G. & Guo P. (2023) Application of time serial model in water quality predicting, Computers Materials & Continua, 74, 67–82. https://doi.org/10.32604/cmc.2023.030703
Xiao C., Ye J., Esteves R. M. & Rong C. (2016) Using Spearman's correlation coefficients for exploratory data analysis on big dataset, Concurrency and Computation: Practice and Experience, 28, 3866–3878.
Xu J., Zhang C., Wang L., Zhu H., Tang H. & Avital E. J. (2022) Variation of dominant discharge along the riverbed based on numerical and deep-learning models: A case study in the middle Huaihe river, China, Journal of Hydrology, 612, 128285. https://doi.org/10.1016/j.jhydrol.2022.128285
Yu Z., Yang K., Luo Y. & Shang C. (2020) Spatial-temporal process simulation and prediction of chlorophyll-a concentration in Dianchi lake based on wavelet analysis and long-short term memory network, Journal of Hydrology, 582, 124488. https://doi.org/10.1016/j.jhydrol.2019.124488
Yu J.-W., Kim J.-S., Li X., Jong Y.-C., Kim K.-H. & Ryang G.-I. (2022) Water quality forecasting based on data decomposition, fuzzy clustering and deep learning neural network, Environmental Pollution, 303, 119136. https://doi.org/10.1016/j.envpol.2022.119136
Yuan M., Wei S., Sun M. & Zhao J. (2022) Wavelet decomposition and Seq2Seq hybrid models for water quality prediction, Water Resources, 49, 743–752. https://doi.org/10.1134/S0097807822040212
Zhai J., He Q. & Xiao H. W. (2007) GIS-based fuzzy comprehensive water quality assessment model, Journal of Chongqing University(Natural Science Edition), 8, 49–53 (In Chinese).
Zhang K. & Wu L. (2021) Using a fractional order grey seasonal model to predict the dissolved oxygen and pH in the Huaihe river, Water Science & Technology, 83, 475–486. https://doi.org/10.2166/wst.2020.596
Zhou S., Song C., Zhang J., Chang W., Hou W. & Yang L. (2022) A hybrid prediction framework for water quality with integrated W-ARIMA-GRU and LightGBM methods, Water, 14, 1322. https://doi.org/10.3390/w14091322
Zlatanovic L. J., van der Hoek J. P. & Vreeburg J. H. G. (2017) An experimental study on the influence of water stagnation and temperature change on water quality in a full-scale domestic drinking water system, Water Research, 123, 761–772. https://doi.org/10.1016/j.watres.2017.07.019
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).