River algal blooms pose a significant environmental threat, necessitating accurate forecasts and timely warnings for effective prevention. This study proposes a novel hybrid model, combining an external recursive long short-term memory neural network based on encoder–decoder (RLSTM-ED) with a backpropagation (BP) neural network, denoted as RLSTM-ED-BP. A dataset comprising 34,992 hydrological, climatic, and water quality (4-hourly) observations from the Hanjiang River Basin in China was divided for model training and testing. Comparative analysis with an RLSTM baseline demonstrated that the RLSTM-ED-BP model enhanced the Nash–Sutcliffe coefficient (NSE) by more than 5% and reduced the root mean square error by over 10% during the 24-h forecast horizon. The RLSTM-ED-BP model yielded NSE and threat score values exceeding 0.95 and efficiently provided early warnings for algal bloom events. The model's enhanced performance contributes to the generalizability of deep learning approaches in addressing the critical environmental challenge of algal blooms.

  • Deep learning captures the complex nonlinear relationship of impact factors and algal density.

  • Recursive long short-term memory neural network based on encoder–decoder (RLSTM-ED) conquers overfitting and alleviates error propagation of multi-output forecasts.

  • RLSTM-ED-BP improves the accuracy and reliability of intraday algal bloom early warning.

The eutrophication of water bodies, driven by climate change and human activities, has intensified algal blooms, a significant environmental challenge globally (Raulino et al. 2021; Zheng et al. 2021; Qian et al. 2024; Xiao et al. 2024). While traditionally occurring in static water bodies like lakes, algal blooms have also been observed in large rivers (Park et al. 2018; Pickering & Ford 2021). In China, the Hanjiang River has experienced recurrent algal blooms since 1992, characterized by a brownish tint, foul odor, and ecological damage, posing risks to freshwater quality and aquatic ecosystems (Xin et al. 2020; Li et al. 2021; Shen et al. 2021). Most studies focus on algal blooms in static shallow lakes (Schmale et al. 2019), while large river ecosystems, with their broader impact and complexity, remain underexplored, especially under limited data conditions (Tian et al. 2022). Accurate forecasting and early warning for river algal blooms are thus urgent and critical.

Algal bloom occurrences in large rivers are shaped by complex interactions among hydrological, climatic, and nutrient conditions (Whitehead et al. 2009; Zhou et al. 2021; Tian et al. 2022; Yin et al. 2022). Accurate forecasts of meteorological, hydrological, and water quality parameters underpin algal bloom early warning. Physical models are effective but computationally inefficient, which highlights the advantages of machine learning methods, such as neural networks and support vector machines, in modeling nonlinear systems (Alavi et al. 2022). Deep learning, particularly long short-term memory (LSTM) networks (Hochreiter & Schmidhuber 1997), enhances forecast accuracy for water environment indicators (Hu et al. 2019). Recursive strategies improve multi-step-ahead forecasts by incorporating previous predictions (Zhou et al. 2019), but challenges like high-dimensional data and overfitting persist. Encoder–decoder (ED) architectures, originally developed for machine translation, effectively handle these issues and have been applied to water environment forecasting (Jahangir et al. 2023). Sheng et al. (2023) proposed a multi-output temporal convolutional network-based ED model to forecast ammonia nitrogen. Zhang & Li (2023) combined the ED structure with deep learning models to perform water quality forecasting. However, gaps remain in integrating recursive strategies with ED models to address error propagation and overfitting challenges.

Proactive forecasting of riverine algal densities helps prevent algal blooms and mitigate their impacts. Constructing a predictive model that quantifies the multiple regression relationship between water environment factors (climatic, hydrological, and water quality factors) and algal density is a viable option. However, forecasting algal blooms remains challenging due to complex environmental influences and the nonlinear nature of algal dynamics (Xiao et al. 2017; Liu et al. 2022b). Traditional models like multiple linear regression struggle with such complexity, while machine learning methods, including neural networks, gradient boosting, and random forests, better capture nonlinear relationships and are commonly applied in algal bloom prediction (Xia et al. 2020; Deng et al. 2021; Liao et al. 2021). Most studies focus on forecasting and early warning of algal blooms on medium- to long-term scales, such as monthly or seasonal forecasting. Lin et al. (2023) researched medium-term algal bloom forecasting in mesotrophic lakes using machine learning models. Marzidovšek et al. (2024) utilized explainable machine learning models for seasonal harmful algal bloom predictions in the Adriatic Sea. However, hydrological changes from large-scale water diversion projects in the Hanjiang River have increased the frequency and risk of algal blooms (Maavara et al. 2015; Shen et al. 2021; Tan et al. 2023).

Algal bloom events develop rapidly, making medium- and long-term forecasts insufficient for timely preventive measures. Existing frameworks often fail to address the dynamic and sudden nature of blooms, highlighting the need for short-term, especially intraday, forecasting. Algal density, a key indicator of bloom severity, is traditionally monitored through labor-intensive field surveys, causing delays that hinder early warning models (Liu et al. 2020, 2023). While chlorophyll-a has been explored as an alternative indicator (Chen et al. 2015; Tian et al. 2022), algal density remains a more direct and accurate measure. Achieving accurate short-term forecasts under limited data conditions remains a significant challenge.

This study introduces a novel RLSTM-ED-BP model, integrating a recursive long short-term memory-based encoder–decoder model (RLSTM-ED) and a backpropagation (BP) neural network, for short-term algal bloom forecasting and early warning. The RLSTM-ED model addresses error propagation and overfitting, ensuring accurate multi-step water environment forecasts, while the BP model captures nonlinear relationships between water factors and algal density. Their fusion enables effective algal bloom forecasting and early warning. The model's applicability is demonstrated through a case study on algal bloom events in the Hanjiang River, China.

Study area

The Hanjiang River, originating at the southern foothills of the Qinling Mountains, is the largest tributary of the Yangtze River, traversing approximately 1,577 km before converging into the Yangtze River in Wuhan. The expansive Hanjiang River Basin (106–114°E, 30–34°N) covers an area of approximately 159,000 km2 and is demarcated into upper, middle, and lower reaches by the Danjiangkou Reservoir and Zhongxiang City. The river serves as a crucial source of drinking water for Wuhan, and the Danjiangkou Reservoir, located upstream, plays a pivotal role in the South-to-North Water Diversion Project, aimed at alleviating water stress in northern regions. However, recent hydraulic projects have brought adverse consequences to the Hanjiang River. The diversion of water from the Danjiangkou Reservoir for the South-to-North Water Diversion Project significantly diminishes the downstream runoff volume (Kuo et al. 2019). Concurrently, industrial and agricultural wastewater discharges have elevated nutrient levels, contributing to a surge in algal blooms (Xin et al. 2020). The Hanjiang River has witnessed an increased frequency of algal bloom outbreaks in the past three decades, exerting substantial impacts on the ecosystem and regional water supply. Algal blooms primarily occur from January to March, predominantly downstream of Huangzhuang (Tian et al. 2022). Consequently, Xiantao, a pivotal monitoring station for algal bloom incidents (Figure 1), is selected for detailed analysis in this study.
Figure 1

Locations of the Hanjiang River catchment and monitoring station.


Materials

This study collected data at a 4-h time step from the 2017–2021 dry seasons (December to May of the following year) at the Xiantao monitoring station in the lower reaches of the Hanjiang River. We selected three representative categories of environmental variables (Xia et al. 2020): (1) hydrological indicator: streamflow discharge (Q); (2) climate indicator: water temperature (WT); and (3) water quality indicators: total phosphorus (TP), total nitrogen (TN), ammonia nitrogen, dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD5), and potential of hydrogen (pH). The data used in this study were provided by the Changjiang Water Resources Commission of the Ministry of Water Resources in China (http://www.cjw.gov.cn/english/).

A total of 34,992 time series values (= (31 days × 4 months × 4 years + 30 days × 4 months + 28 days × 3 months + 29 days) × 6 time steps per day × 8 variables) were partitioned into three datasets for model training (18,960 values, 2017.12–2019.12), validation (8,784 values, 2020.1–2020.12), and testing (7,248 values, 2021.1–2021.5). The statistics of the data collected at the Xiantao monitoring station are shown in Table S1. Furthermore, this study also collected emergency monitoring data at a 4-h time step from mid-February to late March 2018 at the Xiantao Station. The Pearson correlation coefficients of the four water environment factors (WT, pH, DO, and Q) with algal density and the data statistics are shown in Table S2.
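
The record counts above can be checked with a few lines of arithmetic. The sketch below is only an illustration of that bookkeeping: the month lengths follow the calendar, and the assignment of months to each split is inferred from the date ranges stated in the text rather than taken from the authors' code.

```python
# Dry-season months (December-May) per split, inferred from the stated date ranges.
DAYS = {"Dec": 31, "Jan": 31, "Mar": 31, "Apr": 30, "May": 31}

def jan_to_may(leap: bool) -> int:
    """Number of days from 1 January to 31 May."""
    february = 29 if leap else 28
    return DAYS["Jan"] + february + DAYS["Mar"] + DAYS["Apr"] + DAYS["May"]

STEPS_PER_DAY = 24 // 4   # 4-h time step -> 6 records per day
N_VARIABLES = 8           # Q, WT, TP, TN, ammonia nitrogen, DO, BOD5, pH

split_days = {
    # Dec 2017, Jan-May 2018, Dec 2018, Jan-May 2019, Dec 2019
    "training": 3 * DAYS["Dec"] + 2 * jan_to_may(leap=False),
    # Jan-May 2020 (leap year) and Dec 2020
    "validation": jan_to_may(leap=True) + DAYS["Dec"],
    # Jan-May 2021
    "testing": jan_to_may(leap=False),
}
split_values = {k: d * STEPS_PER_DAY * N_VARIABLES for k, d in split_days.items()}
total_values = sum(split_values.values())

print(split_values, total_values)
```

Under these assumptions the three splits come to 18,960, 8,784, and 7,248 values, which sum to the 34,992 values reported.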

This study proposes an RLSTM-ED-BP model to enhance the reliability and accuracy of algal bloom forecasts and early warning. For comparison, an RLSTM model is also constructed. Figure 2 illustrates the architecture of the RLSTM, RLSTM-ED, and RLSTM-ED-BP models, where Figure 2(a) presents the fusion of the recursive strategy and LSTM to make multi-step-ahead forecasts of water environment indicators for comparison, Figure 2(b) presents the integration of the RLSTM and ED framework to make multi-step-ahead forecasts of water environment indicators, and Figure 2(c) presents the hybrid of RLSTM-ED and BP for early warning of algal blooms, respectively.
Figure 2

Architecture of the (a) RLSTM, (b) RLSTM-ED, and (c) RLSTM-ED-BP models. Both RLSTM-ED and RLSTM models are constructed to make multi-step-ahead forecasts of water environment indicators. The RLSTM-ED-BP model is constructed to implement algal bloom early warning.


Recursive long short-term memory neural network

The LSTM unit carries two transmission states, namely the cell state $C_t$ and the hidden state $h_t$ (Equation (6)). LSTM removes or adds information through its gate structure. The LSTM unit has three gates, namely the forget gate (Equation (1)), the input gate (Equation (2)), and the output gate (Equation (3)). The forget gate determines the information that needs to be forgotten from the cell state, and the input gate determines the new information that needs to be stored in the cell state. The output vectors corresponding to the three gates are described as follows (Hochreiter & Schmidhuber 1997):
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{3}$$
where the input vector $[h_{t-1}, x_t]$ contains the input at the current moment, $x_t$, as well as the hidden state at the past moment, $h_{t-1}$. $W_f$, $W_i$, $W_o$ and $b_f$, $b_i$, $b_o$ are the weight parameters and biases of the forget gate, input gate, and output gate, respectively. $\sigma$ denotes the sigmoid activation function with an output value between 0 and 1. The update of the cell state (Equation (5)) is calculated as follows:
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{4}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$
where $\tilde{C}_t$ (Equation (4)) is the output vector of the core unit, $W_C$ and $b_C$ are the corresponding weights and biases, tanh is the activation function, and $\odot$ denotes element-wise multiplication. The new cell state $C_t$ updates the information through the input gate, the forget gate, the core unit, and the previous moment cell state $C_{t-1}$. The hidden state (Equation (6)) and the output vector (Equation (7)) of the LSTM unit can be expressed as follows:
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{6}$$
(7)
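
To make the gate mechanics concrete, the following is a minimal pure-Python sketch of a single LSTM time step using the standard gate formulation (forget, input, and output gates, candidate cell state, cell-state update, hidden state). The parameter names (`W_f`, `b_f`, and so on) and the list-based layout are assumptions chosen for illustration; the study itself reports using PyTorch, and this is not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W @ v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation, i.e. the vector [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate state
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]   # cell-state update
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                     # hidden state
    return h, c

# Toy check with all-zero parameters: every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
hidden, n_inputs = 2, 3
zero_W = [[0.0] * (hidden + n_inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {f"W_{k}": zero_W for k in "fioc"} | {f"b_{k}": zero_b for k in "fioc"}
h_new, c_new = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
print(h_new, c_new)
```

The zero-parameter case is deliberately trivial: it isolates the role of the forget gate, which scales the previous cell state before any new information is added.
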
The LSTM model is composed of the above LSTM units. In this study, the LSTM model comprises an input layer, an LSTM layer, a fully connected layer, and an output layer. The output of the LSTM layer is fed to the fully connected layer, which reduces the computational dimensionality and ultimately yields the predicted values of pH, Q, WT, and DO at time t + 1. The recursive strategy based on multivariate inputs and multiple outputs is inspired by Zhou et al. (2019). The inputs to the model are the impact factors at moments t − 5, …, t, and the outputs are the predicted values at moment t + 1. The model then uses the predicted value at moment t + 1 to update the input information and predict the output value at moment t + 2. This procedure is repeated until the predicted values for the next six horizons are obtained. By fusing LSTM and a recursive strategy, we construct the RLSTM model (Figure 2(a)). The pattern of the RLSTM model to achieve multi-step-ahead forecasting can be described as follows:
$$X_t = \left[x_{t-5}, x_{t-4}, \ldots, x_t\right] \tag{8}$$
$$Y_t = \left[y_{t-5}, y_{t-4}, \ldots, y_t\right] \tag{9}$$
Horizon t + 1:
$$\hat{y}_{t+1} = f\left(X_t, Y_t\right) \tag{10}$$
Horizon t + 2:
$$\hat{y}_{t+2} = f\left(X_t, \left[y_{t-4}, \ldots, y_t, \hat{y}_{t+1}\right]\right) \tag{11}$$
Horizon t + 6:
$$\hat{y}_{t+6} = f\left(X_t, \left[y_t, \hat{y}_{t+1}, \ldots, \hat{y}_{t+5}\right]\right) \tag{12}$$
where the function $f$ represents the pattern between the input and output variables. $\hat{y}_{t+1}, \ldots, \hat{y}_{t+6}$ (Equations (10)–(12)) represent the vectors of the n predicted variables at horizons t + 1, …, t + 6, respectively. $X_t$ (Equation (8)) represents the matrix of six antecedent observations of the p external input variables. $Y_t$ (Equation (9)) represents the matrix of six antecedent observations of the n autoregressive input variables.
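
The recursive strategy described above can be sketched as a sliding window that is updated with each new prediction. The code below is a simplified illustration: `one_step_model` is a hypothetical stand-in for the trained one-step forecaster, and only the autoregressive variables are tracked (the handling of external inputs is omitted because the text does not specify it).

```python
from typing import Callable, List, Sequence

Vector = List[float]

def recursive_forecast(
    one_step_model: Callable[[Sequence[Vector]], Vector],
    history: Sequence[Vector],
    window: int = 6,
    horizons: int = 6,
) -> List[Vector]:
    """Roll a one-step-ahead model forward by feeding predictions back as inputs."""
    buffer = [list(v) for v in history[-window:]]  # the six antecedent time steps
    forecasts: List[Vector] = []
    for _ in range(horizons):
        y_next = one_step_model(buffer)   # forecast the next time step
        forecasts.append(y_next)
        buffer = buffer[1:] + [y_next]    # drop the oldest step, append the forecast
    return forecasts

def toy_model(win: Sequence[Vector]) -> Vector:
    """Hypothetical one-step model: linear extrapolation of the last two steps."""
    last, prev = win[-1], win[-2]
    return [2 * a - b for a, b in zip(last, prev)]

history = [[float(t)] for t in range(6)]   # one variable observed at t-5, ..., t
preds = recursive_forecast(toy_model, history)
print(preds)
```

Because each forecast becomes an input to the next step, any one-step error is carried forward, which is the error-propagation issue that motivates the ED framework in the next section.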

RLSTM-based encoder–decoder forecasting model

The LSTM-based encoder–decoder (LSTM-ED) module can be constructed by embedding the LSTM model into the ED structure. In this study, the ED structure consists of two LSTM neural networks, with the first LSTM acting as an encoder and the second LSTM acting as a decoder. The function of the encoder is to transform the input sequence into a fixed-length vector, and the decoder transforms it into the target value. The utilization of the ED structure aids in mitigating the overfitting problem and enhancing the accuracy and reliability of the model.

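Assuming that $X_t$ denotes the observation at time t, sequence-to-sequence forecasting can be represented as the forecasting of the most likely sequence of length k (Equation (13)) using the previous j observations, $X_{t-j+1}, \ldots, X_t$, which can be represented as follows (Shi et al. 2015):
$$\hat{X}_{t+1}, \ldots, \hat{X}_{t+k} = \underset{X_{t+1}, \ldots, X_{t+k}}{\arg\max}\; p\left(X_{t+1}, \ldots, X_{t+k} \mid X_{t-j+1}, \ldots, X_t\right) \tag{13}$$
As shown in Equation (14), the encoder converts the information from the input sequence into a fixed-length context vector $c$, which can be represented as follows:
$$c = \mathrm{LSTM}_e\left(X_{t-j+1}, \ldots, X_t\right) \tag{14}$$
As shown in Equation (15), the decoder can decode the context vector and map it to the final predicted values, as demonstrated in the following equation:
$$\hat{X}_{t+1}, \ldots, \hat{X}_{t+k} = \mathrm{LSTM}_d\left(c\right) \tag{15}$$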

In this study, the input sequence comprises eight variables, each with a time step length of 6. Therefore, the long short-term memory encoder (LSTMe) is reused six times (Kao et al. 2020). This repeated encoding transforms every input vector into its corresponding encoding vector. The encoding of the last time step in Figure 2(b) is sent to the decoder as the context vector. A fully connected layer is used to reduce the dimensionality of the high-dimensional output of the LSTM decoder. The LSTM-ED module outputs pH, Q, WT, and DO at moment t + 1. The RLSTM-ED model is constructed by fusing the LSTM-ED module and the recursive strategy (Figure 2(b)). According to the recursive strategy (Equations (8)–(12)), the output values of the LSTM-ED module at moment t + 1 are used to update the input information and predict pH, Q, WT, and DO at moment t + 2. This process is repeated until pH, Q, WT, and DO values are predicted for the full 24-h forecast horizon.

BP neural network model

The BP neural network was introduced by Rumelhart et al. (1986). It is a multilayer feedforward neural network trained using the error BP algorithm. The BP neural network consists of three parts: the input layer, the hidden layer, and the output layer. The input information propagates forward through the three layers in turn to produce the output value. The difference between the output value and the desired one is then propagated backward to update the network weights, which in turn updates the output value. This study uses the BP neural network model to construct a nonlinear mapping relationship between the explanatory variables (WT, Q, pH, and DO) and the predicted variable (algal density) based on the 2018 emergency monitoring data.
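
As a rough illustration of the forward pass and error backpropagation described above, the sketch below trains a three-layer network (four inputs, one tanh hidden layer, one linear output) with plain stochastic gradient descent on a squared-error loss. Everything specific here is an assumption made for the example: the hidden size, learning rate, loss, synthetic data, and the `train_bp` helper are not the authors' settings (those are listed in Table S3), and no real algal density data are used.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Input layer -> tanh hidden layer -> linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return h, sum(w * hj for w, hj in zip(W2, h)) + b2

def train_bp(samples, targets, n_hidden=4, lr=0.05, epochs=300, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, targets):
            h, y_hat = forward(x, W1, b1, W2, b2)       # forward pass
            err = y_hat - y                              # output error
            total += err * err
            for j in range(n_hidden):                    # backward pass
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)  # error at hidden unit j
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h * x[k]
            b2 -= lr * err
        losses.append(total / len(samples))              # mean squared error per epoch
    return (W1, b1, W2, b2), losses

# Synthetic stand-in for four normalized indicators and a nonlinear target.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(30)]
y = [x[0] * x[1] + 0.5 * x[2] - 0.3 * x[3] for x in X]
params, losses = train_bp(X, y)
print(f"MSE first epoch: {losses[0]:.4f}, last epoch: {losses[-1]:.4f}")
```

In practice a framework optimizer would replace the hand-written update loop; the explicit version is shown only to make the forward and backward passes visible.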

Fusing RLSTM-ED and BP (RLSTM-ED-BP) for early warning

Through fusing RLSTM-ED and BP, we introduce a hybrid RLSTM-ED-BP model (Figure 2(c)) for algal bloom early warning. The RLSTM-ED-BP model consists of two components. (1) Based on the predicted values of water environment indicators from t + 1 to t + 6 horizons of the RLSTM-ED model, algal density is predicted using the BP model developed in Section 2.5. (2) Algal density is an effective indicator for early warning of algal blooms and can determine the class of the algal bloom. Based on the predicted values of algal density, an early warning analysis of algal bloom events at Xiantao Station on the lower Hanjiang River is conducted.

The lack of algal monitoring data is a significant limiting factor in the development of forecasting and early warning models. In this study, a novel hybrid RLSTM-ED-BP model is proposed to provide a feasible solution for algal bloom forecasting and early warning in the lower reaches of the Hanjiang River under limited data conditions. The computation processes of forecasting and early warning models are described as follows.

Step 1: Divide the dataset into training, validation, and testing sets, and process the data with min–max normalization. The training set is used to fit the model and train the model parameters. The validation set is used to tune the model's hyperparameters. The test set is used to evaluate the model's generalization capabilities. The RLSTM-ED model is used to simulate and predict the values of WT, Q, pH, and DO from t + 1 to t + 6 horizons.

Step 2: Based on the 2018 emergency monitoring data, a BP model is used to capture a nonlinear mapping relationship among four water environment indicators (WT, Q, pH, and DO) and algal density.

Step 3: Based on the well-established BP model, four water environment indicators (WT, Q, pH, and DO) predicted by the RLSTM-ED model are used as inputs to predict algal densities. In this study, the hybrid model that integrates RLSTM-ED and BP models is referred to as the RLSTM-ED-BP model.

Step 4: Early warning analysis of algal bloom events at Xiantao Station on the lower Hanjiang River based on algal density predicted by the RLSTM-ED-BP model.
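
The min–max normalization in Step 1 can be sketched as follows. The helper names are hypothetical, and fitting the scaling bounds on the training set only (then reusing them for validation and testing data) is an assumption made here to avoid information leakage; the text does not state how the bounds were obtained.

```python
def min_max_fit(column):
    """Return the (min, max) bounds of a training column."""
    return min(column), max(column)

def min_max_transform(column, lo, hi):
    """Scale values to [0, 1] relative to the fitted bounds."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to physical units."""
    return [s * (hi - lo) + lo for s in scaled]

# Toy water-temperature values (degrees C), split chronologically.
train_wt = [10.0, 12.0, 14.0]
test_wt = [13.0, 16.0]

lo, hi = min_max_fit(train_wt)
train_scaled = min_max_transform(train_wt, lo, hi)
test_scaled = min_max_transform(test_wt, lo, hi)   # may exceed 1 for unseen extremes
restored = min_max_inverse(test_scaled, lo, hi)
print(train_scaled, test_scaled, restored)
```

The inverse transform matters in this workflow because the indicators predicted in normalized units must be returned to physical units before they are interpreted or reported.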

The RLSTM-ED model can be regarded as a ‘predictor’ that forecasts future water environment indicators (WT, Q, pH, and DO) based on existing monitoring data. In contrast, the BP model functions as an ‘interpreter’, establishing a nonlinear mapping relationship between limited algal density data and water environment indicators. By utilizing the RLSTM-ED model's predicted water environment indicators as inputs, the BP model converts these predictions into algal density dynamics. This two-step approach of the RLSTM-ED-BP model achieves algal bloom forecasting and early warning indirectly, effectively addressing the limitations posed by insufficient algal monitoring data.

The hyperparameters and the input and output variable settings of the models are shown in Table S3, where the hyperparameters are determined by trial and error. For a fair comparison, the number of neurons in the hidden layer of the benchmark model (RLSTM) was likewise set to 32 (Sheng et al. 2023). The learning rate and the number of epochs are set to 0.001 and 500, respectively. The model parameters are optimized using the Adam optimizer, and the mean absolute error is chosen as the loss function. The models are implemented in Python using the PyTorch library. Each model is run for 20 rounds of experiments to reduce the impact of random weight initialization on the forecasting performance.

The differences between the RLSTM and RLSTM-ED models constructed in this paper are as follows: (1) to construct complex nonlinear mapping relationships between multiple input and multiple output variables, the former is based on an LSTM layer while the latter is based on an ED framework; and (2) since the latter consists of an LSTM encoder and a decoder, the number of parameters (32*32 + 32) is more than that of the former. Additionally, the RLSTM-ED model needs to be further integrated with the BP neural network for forecasting and early warning. The purpose of comparing the RLSTM-BP and RLSTM-ED-BP models in this study is to assess the effect of water environmental indicators predicted by the RLSTM and RLSTM-ED models on the accuracy of algal density forecasting.

For the water environment indicator forecasting results, the Nash–Sutcliffe efficiency (NSE) coefficient and root mean square error (RMSE) are used to evaluate the forecasting accuracy of the model (Nash & Sutcliffe 1970; Jamro et al. 2023). The formulas for NSE and RMSE are presented as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{n}\left(O_t - P_t\right)^2}{\sum_{t=1}^{n}\left(O_t - \bar{O}\right)^2} \tag{16}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(O_t - P_t\right)^2} \tag{17}$$
where $O_t$ and $P_t$ denote the observed and predicted values at moment t, respectively, for a total of n moments, and $\bar{O}$ is the mean of the observed values. The value of NSE (Equation (16)) ranges from −∞ to 1, with 1 indicating a perfect match between forecasts and observations. Larger values of NSE and smaller values of RMSE (Equation (17)) indicate higher forecast accuracy. However, in actual forecasting, decision-makers may be more interested in whether an algal bloom occurs or not, rather than the specific algal densities when an algal bloom occurs. Therefore, a threat score (TS) is also used for accuracy assessment. The TS (Equation (18)) evaluates the proportion of correct forecasts out of the total number of forecasts and takes values in the range [0, 1]. A higher TS indicates a better forecast. The TS can be described as follows:
(18)
where the meaning of each symbol to the right of the equal sign is shown in Table S4.
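
A minimal sketch of the three metrics is given below. NSE and RMSE follow their standard definitions. For the TS, the code assumes the common contingency-table form, hits / (hits + misses + false alarms); the exact symbols used by the authors are defined in Table S4, which is not reproduced here, and the bloom `threshold` is a hypothetical value used only for the toy example.

```python
import math

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def threat_score(obs, pred, threshold):
    """Assumed TS form: hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for o, p in zip(obs, pred):
        observed_event = o >= threshold
        forecast_event = p >= threshold
        if observed_event and forecast_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
print(nse(obs, pred), rmse(obs, pred))

# One false alarm, one hit, one miss relative to a hypothetical threshold of 3.0.
ts = threat_score(obs, [3.5, 2.0, 3.0, 2.0], threshold=3.0)
print(ts)
```

Note that correct forecasts of "no event" do not enter the TS under this form, so the score is not inflated when blooms are rare.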

This study aims to assess the accuracy and effectiveness of the proposed algal bloom forecasting and early warning model. The experimental results and comprehensive analysis are presented below.

Multi-step-ahead forecasts of water environment indicators using RLSTM-ED models

This study integrates a recursive strategy with an LSTM neural network model based on an ED (called RLSTM-ED) for multi-step-ahead forecasting. To evaluate the impact of the ED structure on the forecasting results, a recursive strategy-based long short-term memory neural network (RLSTM) was used as a benchmark model. In addition, to decrease the impact of the inherent randomness of the neural network models on the forecasting results, 20 rounds of experiments were conducted for each model in this study. The mean values from the 20 rounds of experiments were adopted as the predictive values of the model. The model inputs are the observations of the variables listed in Table S1 for the preceding 24 h with a time step of 4 h. The outputs are the forecasts of DO, pH, WT, and flow (Q) for the 24-h forecast horizon. The forecasting results of the RLSTM-ED and RLSTM models are shown in Table 1. Overall, the forecasting accuracy of both models decreased as the forecast horizon increased.

Table 1

Performance of the multi-step-ahead forecasting models for water environment indicators at Xiantao Station

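| Period | Model | Indicator | Evaluation indicator | t + 1 | t + 2 | t + 3 | t + 4 | t + 5 | t + 6 |
|---|---|---|---|---|---|---|---|---|---|
| Training | RLSTM-ED | DO | NSE | 0.990 | 0.988 | 0.984 | 0.981 | 0.976 | 0.972 |
| | | | RMSE (mg/L) | 0.131 | 0.148 | 0.167 | 0.185 | 0.204 | 0.223 |
| | | pH | NSE | 0.984 | 0.978 | 0.972 | 0.966 | 0.960 | 0.954 |
| | | | RMSE | 0.017 | 0.020 | 0.022 | 0.025 | 0.027 | 0.029 |
| | | WT | NSE | 0.991 | 0.986 | 0.982 | 0.978 | 0.973 | 0.969 |
| | | | RMSE (°C) | 0.451 | 0.543 | 0.624 | 0.697 | 0.763 | 0.824 |
| | | Q | NSE | 0.985 | 0.976 | 0.967 | 0.956 | 0.945 | 0.932 |
| | | | RMSE (m3/s) | 40 | 50 | 59 | 68 | 76 | 84 |
| | RLSTM | DO | NSE | 0.980 | 0.974 | 0.968 | 0.961 | 0.954 | 0.946 |
| | | | RMSE (mg/L) | 0.190 | 0.213 | 0.237 | 0.261 | 0.285 | 0.308 |
| | | pH | NSE | 0.983 | 0.978 | 0.972 | 0.966 | 0.959 | 0.952 |
| | | | RMSE | 0.017 | 0.020 | 0.023 | 0.025 | 0.027 | 0.029 |
| | | WT | NSE | 0.981 | 0.978 | 0.975 | 0.970 | 0.962 | 0.953 |
| | | | RMSE (°C) | 0.639 | 0.696 | 0.744 | 0.815 | 0.908 | 1.016 |
| | | Q | NSE | 0.948 | 0.943 | 0.935 | 0.926 | 0.917 | 0.907 |
| | | | RMSE (m3/s) | 74 | 77 | 83 | 88 | 94 | 99 |
| Validation | RLSTM-ED | DO | NSE | 0.983 | 0.977 | 0.970 | 0.963 | 0.954 | 0.945 |
| | | | RMSE (mg/L) | 0.085 | 0.098 | 0.112 | 0.125 | 0.137 | 0.150 |
| | | pH | NSE | 0.995 | 0.995 | 0.994 | 0.993 | 0.992 | 0.991 |
| | | | RMSE | 0.005 | 0.005 | 0.006 | 0.006 | 0.006 | 0.007 |
| | | WT | NSE | 0.978 | 0.969 | 0.959 | 0.949 | 0.940 | 0.931 |
| | | | RMSE (°C) | 0.494 | 0.590 | 0.675 | 0.750 | 0.818 | 0.878 |
| | | Q | NSE | 0.996 | 0.993 | 0.989 | 0.985 | 0.980 | 0.974 |
| | | | RMSE (m3/s) | 12 | 16 | 20 | 24 | 27 | 31 |
| | RLSTM | DO | NSE | 0.980 | 0.972 | 0.964 | 0.955 | 0.945 | 0.934 |
| | | | RMSE (mg/L) | 0.092 | 0.108 | 0.123 | 0.137 | 0.151 | 0.165 |
| | | pH | NSE | 0.989 | 0.988 | 0.988 | 0.987 | 0.987 | 0.985 |
| | | | RMSE | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.009 |
| | | WT | NSE | 0.964 | 0.959 | 0.953 | 0.947 | 0.939 | 0.930 |
| | | | RMSE (°C) | 0.632 | 0.676 | 0.721 | 0.769 | 0.822 | 0.881 |
| | | Q | NSE | 0.956 | 0.953 | 0.948 | 0.942 | 0.936 | 0.929 |
| | | | RMSE (m3/s) | 40 | 41 | 43 | 46 | 48 | 51 |
| Testing | RLSTM-ED | DO | NSE | 0.981 | 0.973 | 0.963 | 0.951 | 0.937 | 0.921 |
| | | | RMSE (mg/L) | 0.102 | 0.123 | 0.144 | 0.166 | 0.188 | 0.210 |
| | | pH | NSE | 0.926 | 0.904 | 0.884 | 0.863 | 0.843 | 0.823 |
| | | | RMSE | 0.014 | 0.015 | 0.017 | 0.019 | 0.020 | 0.021 |
| | | WT | NSE | 0.997 | 0.994 | 0.990 | 0.986 | 0.982 | 0.978 |
| | | | RMSE (°C) | 0.211 | 0.320 | 0.403 | 0.474 | 0.537 | 0.594 |
| | | Q | NSE | 0.996 | 0.988 | 0.980 | 0.971 | 0.961 | 0.951 |
| | | | RMSE (m3/s) | 21 | 36 | 47 | 56 | 65 | 73 |
| | RLSTM | DO | NSE | 0.918 | 0.894 | 0.863 | 0.826 | 0.782 | 0.733 |
| | | | RMSE (mg/L) | 0.213 | 0.243 | 0.276 | 0.312 | 0.348 | 0.385 |
| | | pH | NSE | 0.874 | 0.861 | 0.827 | 0.807 | 0.794 | 0.780 |
| | | | RMSE | 0.018 | 0.019 | 0.021 | 0.022 | 0.023 | 0.024 |
| | | WT | NSE | 0.964 | 0.962 | 0.960 | 0.916 | 0.912 | 0.907 |
| | | | RMSE (°C) | 0.763 | 0.780 | 0.807 | 1.167 | 1.196 | 1.233 |
| | | Q | NSE | 0.947 | 0.937 | 0.928 | 0.920 | 0.912 | 0.885 |
| | | | RMSE (m3/s) | 75 | 82 | 88 | 93 | 97 | 116 |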

The NSE values yielded by the RLSTM-ED model during the training phase range from 0.932 to 0.991, while the corresponding values yielded by the RLSTM model are between 0.907 and 0.983. The RLSTM-ED model delivers superior forecasting performance at horizons t + 1 to t + 6. This is consistent with the conclusions drawn from the lower RMSE yielded by the RLSTM-ED model. Moreover, the NSE values yielded by the RLSTM-ED model are higher than 0.95, except for the discharge flow at horizons t + 5 and t + 6. Similarly, the RLSTM-ED model outperforms the RLSTM model in all forecast horizons during the validation phase. This indicates that the RLSTM-ED model has good fitting ability and can simulate the internal nonlinear characteristics of the multi-input and multi-output factors at the Xiantao Station.

The forecasting accuracy of the RLSTM-ED model in all horizons is also more favorable in the testing phase. The NSE values yielded by the water environment indicators are higher than 0.9 in all horizons except for pH from horizon t + 3 to t + 6. In contrast, the RLSTM model has a lower forecasting accuracy. The NSE values of pH are less than 0.9 in all six horizons. From horizon t + 5 to t + 6, the NSE values of DO and pH are even less than 0.8. The forecasting accuracy of the RLSTM model decreases relatively fast as the horizon increases. The NSE value of DO at horizon t + 6 is only 0.733, which is significantly lower than the corresponding value of DO at horizon t + 1. The forecasting accuracy of the RLSTM-ED model is superior to that of the RLSTM model in the testing phase, which is consistent with the conclusions drawn in the training phase.

More attention is usually paid to the predictive ability of the model in the testing phase. Therefore, Figure 3 visualizes the improvement rate of the RLSTM-ED model over the RLSTM model in the testing phase. It can be seen from Figure 3 that the RLSTM-ED model increases the NSE value by approximately 5% or more. The improvement in DO is as high as 25.6% at horizon t + 6. The improvement in RMSE is even more pronounced, exceeding 10% from horizon t + 1 to t + 6.
Figure 3

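Improvement rates in terms of (a) NSE and (b) RMSE of the RLSTM-ED model for the multi-step-ahead forecast at the Xiantao Station during the testing phase, compared with the RLSTM model. Improvement rate of each indicator = $\left|M_{\text{RLSTM-ED}} - M_{\text{RLSTM}}\right| / M_{\text{RLSTM}} \times 100\%$, where $M$ denotes the NSE or RMSE value.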
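
The improvement rates can be reproduced from Table 1. The sketch below assumes the rate is the relative change with respect to the RLSTM benchmark, with the sign arranged so that a positive value always means RLSTM-ED is better; this form reproduces the 25.6% NSE improvement quoted for DO at horizon t + 6, while its use for RMSE is an assumption. The function name is hypothetical.

```python
def improvement_rate(metric_ed, metric_base, higher_is_better=True):
    """Relative improvement (%) of RLSTM-ED over the RLSTM benchmark."""
    diff = metric_ed - metric_base if higher_is_better else metric_base - metric_ed
    return 100.0 * diff / metric_base

# Testing-phase DO values at horizon t + 6, taken from Table 1.
nse_gain = improvement_rate(0.921, 0.733)                             # NSE: higher is better
rmse_gain = improvement_rate(0.210, 0.385, higher_is_better=False)    # RMSE: lower is better

print(round(nse_gain, 1), round(rmse_gain, 1))  # prints 25.6 45.5
```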

The Taylor diagram comprehensively incorporates centered RMSE (CRMSE), correlation coefficient (CC), and standard deviation (STD), visually presenting them on a single chart (Taylor 2001). The Taylor diagram provides a platform for intuitively comparing the predictive performance of various models and has been utilized in the field of hydrological forecasting (Wen et al. 2019; Kim et al. 2023). Figure 4 shows the normalized Taylor diagrams of the predicted means of the RLSTM-ED and RLSTM models for 20 rounds in the t + 1, t + 3, and t + 6 horizons during the testing stages.
Figure 4

Normalized Taylor diagrams of the predicted means of the models for 20 rounds in the t + 1, t + 3, and t + 6 horizons during the testing stages.


The normalized STD of the point corresponding to the observations is 1 (as shown in Figure 4). The closer a model's point is to the observation point, the better the model reproduces the observations. For the four water environment indicators (Q, WT, DO, and pH), the normalized standard deviations yielded by the RLSTM-ED model range from 0.807 to 1.006, which are closer to the ideal value of 1 than the corresponding values yielded by the RLSTM model. This finding is consistent with the high CC and low normalized CRMSE obtained by the RLSTM-ED model. However, the points corresponding to both models deviate to varying degrees from the observation point as the forecast horizon increases. Overall, the points corresponding to the RLSTM-ED model at the three horizons are closer to the observation point than those of the RLSTM model. The ED framework improves the forecast accuracy of the model and reduces the discrepancy between the forecasts and the observations.
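
The three statistics shown on a normalized Taylor diagram are linked by a law-of-cosines relation (Taylor 2001): the squared normalized CRMSE equals 1 plus the squared normalized STD minus twice the product of the normalized STD and the CC. The sketch below computes the three quantities and checks this relation on toy numbers; the function name and the sample values are illustrative assumptions, not data from this study.

```python
import math

def taylor_stats(obs, pred):
    """Return (normalized STD, correlation coefficient, normalized CRMSE)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    cc = cov / (std_o * std_p)
    # Centered RMSE: RMSE after removing the mean of each series.
    crmse = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return std_p / std_o, cc, crmse / std_o

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.3, 3.8, 4.6]
norm_std, cc, norm_crmse = taylor_stats(obs, pred)

# Geometric identity behind the diagram.
identity_gap = norm_crmse ** 2 - (1.0 + norm_std ** 2 - 2.0 * norm_std * cc)
print(norm_std, cc, norm_crmse, identity_gap)
```

This relation is why a single point on the diagram conveys all three statistics at once: its radial distance is the normalized STD, its angle encodes the CC, and its distance to the observation point is the normalized CRMSE.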

The results in Figures 3 and 4 confirm the superior forecasting performance of the RLSTM-ED model. Taking horizon t + 6 as an example, Figure 5 shows the forecasts and kernel density curves of the RLSTM-ED and RLSTM models. Compared with the RLSTM model, the RLSTM-ED predictions lie closer to the observed values, and their kernel density curves overlap more closely with those of the observations. In summary, the RLSTM-ED model performs better in both fitting and forecasting.
Figure 5

The line plots and kernel density curves of the predicted mean values of the models over 20 rounds at horizon t + 6.


Simulation and multi-step-ahead forecasts of algal density using BP models

Table S2 shows that algal density is closely related to the four water environment factors (WT, pH, DO, and Q). A BP neural network model was therefore fitted to the 2018 emergency monitoring data (Table S2) collected during the algal bloom. The model inputs are the measured values of the four water environment indicators (WT, pH, DO, and Q), and the output is the algal density. Figure 6 shows a scatter plot of the fitting results, which makes the agreement between observed and predicted values easy to inspect. The coefficient of determination (R²), also known as the goodness of fit, typically takes values from 0 to 1; the closer R² is to 1, the more tightly the points cluster around the regression line. The BP model exhibits high fitting accuracy and reproduces the observations well, so it can capture the nonlinear mapping between the four water environment indicators and algal density.
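The regression step described above can be illustrated with a small backpropagation network. This is a sketch under stated assumptions, not the authors' implementation: the network size, learning rate, and epoch count are arbitrary choices, the names `train_bp` and `r_squared` are hypothetical, and the four input columns are synthetic stand-ins for standardized WT, pH, DO, and Q.

```python
import numpy as np

def r_squared(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def train_bp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.5 * rng.standard_normal((d, hidden))
    b1 = np.zeros(hidden)
    W2 = 0.5 * rng.standard_normal(hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)            # hidden activations, shape (n, hidden)
        pred = h @ W2 + b2                  # network output, shape (n,)
        err = pred - y
        losses.append(0.5 * np.mean(err ** 2))
        # Backward pass: gradients of the half mean squared error
        grad_W2 = h.T @ err / n
        grad_b2 = err.mean()
        delta = np.outer(err, W2) * (1.0 - h ** 2)   # error signal at the hidden layer
        grad_W1 = X.T @ delta / n
        grad_b1 = delta.mean(axis=0)
        # Gradient descent update
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic, standardized stand-ins for the four indicators (not study data).
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 4))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2] * X[:, 3] + 0.05 * rng.standard_normal(200)

predict, losses = train_bp(X, y)
r2 = r_squared(y, predict(X))
print(f"training loss {losses[0]:.4f} -> {losses[-1]:.4f}, R^2 = {r2:.3f}")
```

Note that R² computed this way can become negative when a model fits worse than the mean of the observations, which is why the 0 to 1 range applies only to reasonably fitted models.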
Figure 6

Scatter plots of simulated and observed algal density.


Section 4.1 shows that the RLSTM-ED model forecasts the four water environment indicators (Q, WT, DO, and pH) with high accuracy from horizon t + 1 to t + 6. The algal density forecasting model is therefore built by feeding the RLSTM-ED forecasts into the BP neural network; this hybrid is abbreviated as RLSTM-ED-BP. A BP neural network driven by the predictions of the RLSTM model (referred to as RLSTM-BP) serves as the benchmark.

Figure 7 shows violin plots for the RLSTM-ED-BP and RLSTM-BP models during the testing phase. A violin plot combines a box plot with a kernel density plot. The inner part shows the median, the upper and lower quartiles, and the maximum and minimum values, while the outer curve represents the density of the data distribution. The wider the violin at a given value, the more likely the results are to fall near that value.
Figure 7

Performance of the RLSTM-BP and RLSTM-ED-BP models (each model performed 20 rounds) for the 24-h forecast horizon during the testing phase.


As shown in Figure 7, the violin plots of the two models differ markedly in shape. Those of the RLSTM-ED-BP model are much 'shorter and wider' than those of the RLSTM-BP model: across the 20 rounds of experiments, the RLSTM-ED-BP model has smaller interquartile ranges and smaller min–max ranges. The 20 rounds therefore indicate that the RLSTM-ED-BP model is more stable. In addition, the RLSTM-ED-BP model yields higher NSE values and lower RMSE values than the RLSTM-BP model at all horizons, indicating superior forecasting performance.
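The stability comparison above rests on per-round scores and their spread. A minimal sketch of that bookkeeping follows, assuming the standard definitions of NSE and RMSE; the helper `spread` and the two sets of 20 synthetic "rounds" are hypothetical and only mimic one stable and one less stable model.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / observed variance sum."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root mean square error."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def spread(scores):
    """Summaries that a violin or box plot displays for a set of per-round scores."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    return {
        "median": float(np.median(scores)),
        "iqr": float(q3 - q1),
        "range": float(scores.max() - scores.min()),
    }

# Synthetic observations and 20 synthetic rounds per model (not study data).
rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 500.0, 300)
rounds_a = [obs + rng.normal(0.0, 80.0, obs.size) for _ in range(20)]
rounds_b = [obs + rng.normal(0.0, rng.uniform(120.0, 260.0), obs.size) for _ in range(20)]

nse_a = [nse(obs, sim) for sim in rounds_a]
nse_b = [nse(obs, sim) for sim in rounds_b]
print("model A:", spread(nse_a))
print("model B:", spread(nse_b))
```

A smaller interquartile range and min–max range across rounds corresponds to the 'shorter' violin, and a higher median NSE to its higher position on the axis.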

In practice, decision-makers may care most about whether an algal bloom occurs. The threat score (TS) evaluates the proportion of correctly predicted occurrences of algal blooms out of the total number of forecasts. It ranges between 0 and 1, with a higher score indicating higher accuracy in predicting algal bloom events. The TS yielded by the RLSTM-ED-BP model is higher than that yielded by the RLSTM-BP model (Figure 7), so the RLSTM-ED-BP model is better at correctly predicting algal bloom events.
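As an illustration of how such a score can be computed, the sketch below uses the definition of the threat score that is common in forecast verification (hits divided by hits plus misses plus false alarms). This is an assumption: the exact formula used in the study is not restated in this section. The density threshold and the sample values are likewise hypothetical.

```python
def to_events(densities, threshold):
    """Convert algal densities to bloom / no-bloom flags (threshold is hypothetical)."""
    return [d >= threshold for d in densities]

def threat_score(observed, forecast):
    """Threat score = hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for o, f in zip(observed, forecast):
        if o and f:
            hits += 1
        elif o and not f:
            misses += 1
        elif f and not o:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # With no observed or forecast events the score is undefined.
    return hits / denom if denom else float("nan")

# Made-up densities in arbitrary units, for illustration only.
obs_density = [1.2, 1.5, 0.4, 0.3, 1.1]
fc_density = [1.3, 0.8, 1.0, 0.2, 1.4]
ts = threat_score(to_events(obs_density, 1.0), to_events(fc_density, 1.0))
print(ts)  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

Correct forecasts of "no bloom" do not enter the score, so it is not inflated by long bloom-free periods.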

Table 2 lists the mean algal density forecasting results from 20 rounds of experiments with the two hybrid models during the testing phase. Overall, the forecasting accuracy of both models decreases as the forecast horizon increases. The RLSTM-ED-BP model yields NSE and TS values above 0.95 at horizons t + 1 to t + 6, clearly better than the corresponding values of the RLSTM-BP model (NSE of 0.925–0.935 and TS of 0.901–0.909). This is consistent with the lower RMSE values generated by the RLSTM-ED-BP model.

Table 2

The mean values of algal density forecasting from 20 rounds of experiments with the two hybrid models during the testing phase

| Station | Model | Evaluation indicator | t + 1 | t + 2 | t + 3 | t + 4 | t + 5 | t + 6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Xiantao | RLSTM-ED-BP | TS | 0.962 | 0.961 | 0.959 | 0.959 | 0.959 | 0.958 |
| Xiantao | RLSTM-ED-BP | NSE | 0.989 | 0.984 | 0.978 | 0.973 | 0.967 | 0.961 |
| Xiantao | RLSTM-ED-BP | RMSE | 74.429 | 91.202 | 105.590 | 117.922 | 129.145 | 141.015 |
| Xiantao | RLSTM-BP | TS | 0.909 | 0.909 | 0.907 | 0.905 | 0.904 | 0.901 |
| Xiantao | RLSTM-BP | NSE | 0.935 | 0.934 | 0.933 | 0.932 | 0.929 | 0.925 |
| Xiantao | RLSTM-BP | RMSE | 183.356 | 183.594 | 185.762 | 185.871 | 189.514 | 194.724 |

In summary, compared to the RLSTM-BP model, the RLSTM-ED-BP model achieves superior forecasting accuracy and stability.
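The size of the gap between the two hybrid models can be read off Table 2 with simple arithmetic. The sketch below copies the NSE and RMSE rows from Table 2 and computes the relative difference at each horizon; the variable names are illustrative only.

```python
horizons = ["t+1", "t+2", "t+3", "t+4", "t+5", "t+6"]

# Values copied from Table 2 (Xiantao Station, testing phase).
table2 = {
    "RLSTM-ED-BP": {
        "NSE": [0.989, 0.984, 0.978, 0.973, 0.967, 0.961],
        "RMSE": [74.429, 91.202, 105.590, 117.922, 129.145, 141.015],
    },
    "RLSTM-BP": {
        "NSE": [0.935, 0.934, 0.933, 0.932, 0.929, 0.925],
        "RMSE": [183.356, 183.594, 185.762, 185.871, 189.514, 194.724],
    },
}

ed, bp = table2["RLSTM-ED-BP"], table2["RLSTM-BP"]
# Relative NSE gain and relative RMSE reduction of RLSTM-ED-BP over RLSTM-BP.
nse_gain = [(a - b) / b for a, b in zip(ed["NSE"], bp["NSE"])]
rmse_cut = [(b - a) / b for a, b in zip(ed["RMSE"], bp["RMSE"])]

for h, g, c in zip(horizons, nse_gain, rmse_cut):
    print(f"{h}: NSE +{g:.1%}, RMSE -{c:.1%}")
```

For these algal density forecasts, the relative NSE gain shrinks from about 5.8% at t + 1 to about 3.9% at t + 6, and the relative RMSE reduction from about 59% to about 28%, so the advantage of the ED structure is largest at the shortest horizons.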

Algal bloom early warning analysis using the RLSTM-ED-BP model

As shown in Section 4.2, the RLSTM-ED-BP model reproduces the algal density accurately. Because algal bloom classes are defined by algal density, algal density is an effective indicator for early warning, and accurate early warning of bloom events is of particular practical interest. This study therefore conducted an early warning analysis of the algal bloom events in 2018 and 2021 based on the algal density predicted by the RLSTM-ED-BP model (Table 3). The predicted algal densities fall within the range corresponding to the observed algal bloom class. The proposed forecasting and warning framework correctly warns of the moderate bloom event in late January 2021, which provides valuable decision-making time (a 24-h forecast horizon) for scheduling water projects for algal bloom prevention and control.

Table 3

Early warning analysis of algal bloom events in the Xiantao Station

| Item | 2018 (training phase) | 2021 (testing phase) |
| --- | --- | --- |
| Classification of algal blooms | Ⅳ | Ⅳ |
| Observations of algal density (cells/L) | 1.0–3.5 | 1.0–2.0 |
| Simulation and forecasting of algal densities (cells/L) | 1.0–2.8 | 1.0–2.6 |
| Early warning or not | Yes | Yes |

No algal bloom (I): <algal density<
No significant algal bloom (Ⅱ): <algal density<
Mild algal bloom (Ⅲ): <algal density<
Moderate algal bloom (Ⅳ): <algal density<
Severe algal bloom (Ⅴ): Algal density>
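The warning logic behind Table 3 amounts to mapping a forecast density onto one of five classes and checking whether the worst class in the forecast window reaches a warning level. The sketch below shows one way to do this. The class boundaries in `PLACEHOLDER_THRESHOLDS` are made-up numbers in arbitrary units, not the study's thresholds, and the choice to warn from class III upward is also an assumption.

```python
from bisect import bisect_right

CLASSES = ["I", "II", "III", "IV", "V"]

# Placeholder boundaries between classes I/II, II/III, III/IV, IV/V.
# These are NOT the thresholds used in the study.
PLACEHOLDER_THRESHOLDS = [1.0, 2.0, 3.0, 4.0]

def bloom_class_index(density, thresholds):
    """Index into CLASSES: the number of boundaries that the density has reached."""
    if len(thresholds) != len(CLASSES) - 1 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected four ascending class boundaries")
    return bisect_right(list(thresholds), density)

def early_warning(forecast_densities, thresholds, warn_from="III"):
    """Return (worst forecast class, whether a warning is issued)."""
    worst = max(bloom_class_index(d, thresholds) for d in forecast_densities)
    return CLASSES[worst], worst >= CLASSES.index(warn_from)

# Six hypothetical 4-hourly forecasts covering a 24-h window.
forecasts = [0.5, 1.4, 2.4, 3.2, 2.9, 2.1]
print(early_warning(forecasts, PLACEHOLDER_THRESHOLDS))  # ('IV', True)
```

Passing the thresholds as an argument keeps the classification scheme separate from the forecasting model, so the same forecasts can be re-evaluated if the class boundaries are revised.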

The phytoplankton community in the Hanjiang River is diverse. Identifying the dominant species can, to some extent, reflect the pollution level and water environment conditions of a water body. Pan et al. (2014) studied the structure of the phytoplankton community during spring algal blooms in the Hanjiang River. At the Xiantao Station, Bacillariophyta accounted for the largest proportion of phytoplankton species (51.29%), followed by Chlorophyta (28.21%); Cyanophyta and Cryptophyta accounted for a relatively small share (7.69%); and Pyrrophyta and Euglenophyta, represented by only 1–2 species, accounted for the smallest share. Mai et al. (2020) presented a similar analysis of the phytoplankton community in the Hanjiang River and likewise concluded that Bacillariophyta dominated in number of species. Moreover, the dominant species was identified as the diatom Stephanodiscus hantzschii by 18S rRNA gene sequence analysis (Zheng et al. 2009).

Algal bloom occurrences in the Hanjiang River result from the interplay of various environmental factors. Nutrients such as nitrogen and phosphorus are fundamental prerequisites, climatic conditions act as inducers, and hydrodynamic conditions serve as primary drivers of algal blooms (Imteaz & Asaeda 2000; Neal et al. 2006). The accumulation of algal biomass in rivers requires high nutrient concentrations, suitable water temperatures, and relatively low flow (Hilton et al. 2006; Li et al. 2021; Xiao et al. 2024). Diatom blooms occurred predominantly in January–March, during the low-temperature season (Liu et al. 2022a), which suggests that late winter and early spring conditions are conducive to diatom growth. In 2022, however, algal blooms occurred during the flood season in the lower Hanjiang River, with cyanobacteria as the dominant species. Unlike diatoms, cyanobacteria are adapted to higher temperatures and thrive in the range of 30–35°C (Cheng et al. 2019). This shift underscores the dynamic nature of algal bloom occurrences and the influence of varying environmental conditions on phytoplankton community composition.

Eutrophication of water bodies, driven by excessive nutrients such as nitrogen and phosphorus, is a critical environmental concern. Commonly used eutrophication thresholds are TN = 0.2 mg/L and TP = 0.02 mg/L (Xia et al. 2020). At the Xiantao Station during the dry period from 2017 to 2021 (Table S1), TN and TP concentrations ranged from 1.279 to 2.256 mg/L and from 0.05 to 0.157 mg/L, respectively, that is, roughly 6–11 times and 2.5–8 times the respective thresholds. This aligns with earlier findings that nutrient levels in the Hanjiang River satisfy the essential conditions for diatom growth (Xin et al. 2020).

The median flow at the Xiantao gaging station during early spring (January–March) from 2014 to 2018 was significantly lower than that recorded from 1992 to 2003 (Xin et al. 2020). Lower flow rates are known to favor algal growth (Kim et al. 2022), which is consistent with the observed increase in algal bloom frequency: nine bloom events occurred in the 23 years before 2014 (about 0.4 events per year), whereas five occurred in the 9 years after 2014 (about 0.6 events per year). This increase is attributed to the full operation of the Xinglong Reservoir and the impoundment of the South-to-North Water Diversion Project around 2014 (Xin et al. 2020). Human activities, including dam construction and water extraction, have significantly altered the hydrological environment of the lower Hanjiang River (Zhou et al. 2013; Mei et al. 2016; Zhang et al. 2022; Zhu et al. 2023). Streamflow strongly affects nutrient and algal accumulation: low streamflow promotes algal aggregation and reproduction and prolongs nutrient presence, all of which are critical for algal growth. In 2022, reduced streamflow during the flood season in the lower Hanjiang River was a key factor in algal bloom occurrences, fostering stagnant water and elevated nutrient concentrations conducive to algal proliferation. Understanding these dynamics is vital for effective environmental management and mitigation strategies.

A confluence of environmental factors contributes to the occurrence of algal blooms in rivers. Algal density, directly reflecting the quantity of algae in a unit of water, stands as a pivotal indicator for assessing the severity of algal blooms. Consequently, the construction of a predictive model capable of accurately forecasting algal density values assumes paramount importance for early warning systems. However, the scarcity of available data on algal monitoring poses a significant hurdle in developing robust forecasting models. This dearth of monitoring data also imposes limitations on the application of process-based physical models, emphasizing the challenge of achieving accurate algal bloom forecasts and warnings under limited data conditions. Our proposed machine learning-based model for algal bloom forecasting and early warning maximizes the utilization of existing data, harnessing the strengths of machine learning modeling. The predictive model, RLSTM-ED, tailored for water environment indicators, adeptly addresses the intricate challenges posed by multiple input and multiple output factors. The ED structure enhances the model's predictive and generalization capabilities while excelling in capturing intricate nonlinear connections between input and output factors. Employing a recursive strategy, the model incorporates the predicted value at time ‘t’ as additional information for predicting the value at time ‘t + 1’. This real-time updating of input variables facilitates a dynamic mapping relationship between inputs and outputs. The recursive strategy, by swiftly adapting model parameters to dynamically changing water environmental factors, significantly enhances forecasting accuracy. This valuable strategy allows simultaneous multi-step ahead forecasting for four crucial water environment metrics – pH, DO, streamflow (Q), and WT.
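The recursive strategy described above, in which each prediction is fed back as input for the next step, can be sketched independently of the network itself. In the sketch below, `one_step_model` is a hypothetical stand-in for a trained one-step forecaster, and the toy trend-extrapolation model and input window are illustrative assumptions rather than the study's RLSTM-ED.

```python
import numpy as np

def recursive_forecast(one_step_model, window, steps=6):
    """Roll a one-step model forward, feeding each prediction back as input.

    window: array of shape (window_length, n_indicators) holding the latest
            values of the indicators (here four columns: pH, DO, Q, WT).
    steps:  number of horizons; six 4-hourly steps span 24 h.
    """
    window = np.array(window, dtype=float)  # copy so the caller's data is untouched
    forecasts = []
    for _ in range(steps):
        nxt = np.asarray(one_step_model(window), dtype=float)
        forecasts.append(nxt)
        # Drop the oldest row and append the new prediction as the latest input.
        window = np.vstack([window[1:], nxt])
    return np.stack(forecasts)

def toy_model(window):
    """Stand-in forecaster: linear extrapolation of the last two rows."""
    return window[-1] + (window[-1] - window[-2])

# Hypothetical two-step input window with four indicator columns.
start = [[1.0, 10.0, 100.0, 1000.0],
         [2.0, 20.0, 200.0, 2000.0]]
out = recursive_forecast(toy_model, start, steps=6)
print(out.shape)  # (6, 4): six horizons for four indicators
```

Because later horizons depend on earlier predictions, any one-step error is carried forward, which is consistent with the gradual loss of accuracy from t + 1 to t + 6 reported in the results.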

Given the constraints of available data, not all factors influencing algal blooms were incorporated as explanatory variables. Nevertheless, the BP model effectively established a nonlinear mapping relationship between the input variables (pH, DO, Q, and WT) and the output variable (algal density). The water environment indicators (pH, DO, Q, and WT) exhibit notable correlations with algal densities (Table S2), contributing significantly to the model's performance enhancement. In comparison to the RLSTM-BP model, the RLSTM-ED-BP model demonstrates superior accuracy in predicting algal density. The inclusion of the ED structure plays a pivotal role in refining algal density forecasting accuracy by enhancing the precision of water environment indicators prediction. The RLSTM-ED-BP hybrid model marks a significant milestone in intraday algal density forecasting and algal bloom warning, achieving a short-term (24 h) forecasting and warning capability for the first time. Through intraday algal bloom forecasting and early warning, management authorities can promptly regulate reservoir outflow to utilize hydrodynamic forces for flushing and diluting algal concentrations. This approach effectively mitigates the risk of algal bloom. Such a dynamic regulation mechanism offers a more precise and scientifically grounded strategy for managing sudden algal bloom events, thereby enhancing the ability to safeguard aquatic ecosystem stability. This study is limited by the lack of monitoring data across the entire river basin. Focusing solely on the Xiantao section may not fully capture the algal bloom dynamics of the entire Hanjiang River. However, the outcomes of this study validate the robustness and feasibility of the proposed algal bloom forecasting and early warning model under conditions of limited data. This advancement holds promise for addressing the challenges posed by data constraints in accurately predicting and proactively managing algal blooms in water bodies.

Accurate forecasting and early warning systems for river algal blooms play a pivotal role in aiding decision-makers to devise effective strategies for algal bloom prevention and control. This study proposes a novel hybrid RLSTM-ED-BP model designed for short-term (intraday) algal bloom forecasting and early warning. The case study focuses on the Hanjiang River Basin in China. The proposed ED structure demonstrates superior performance in capturing intricate nonlinear mapping connections of water environment indicators and impact factors. Leveraging a recursive strategy, this model proves invaluable for achieving multi-step-ahead forecasting of four water environment indicators simultaneously. In comparison to the benchmark RLSTM-BP model, the RLSTM-ED-BP model exhibits a notable enhancement in the reliability and stability of algal density forecast results. The key findings are summarized as follows:

  • a. The RLSTM-ED model, in the testing period, surpasses the RLSTM model by enhancing the NSE by more than 5% and reducing the RMSE by over 10% from t + 1 to t + 6 horizons. It overcomes the limitations of error propagation and overfitting in traditional artificial neural network models for multi-step ahead forecasting.

  • b. The BP model effectively establishes a nonlinear mapping relationship between water environment indicators (WT, pH, DO, and Q) and algal density. The RLSTM-ED-BP model outperforms the RLSTM-BP model, yielding NSE and TS scores higher than 0.95, with improved predictive accuracy and enhanced stability in algal density forecasting. The ED framework contributes to the precision of algal density forecasting by refining predicted values of water environmental indicators.

  • c. The RLSTM-ED-BP model accurately predicts algal density, enabling effective early warning of a moderate algal bloom event in late January 2021 at the Xiantao Station of the downstream Hanjiang River. The 24-h forecast horizon provides valuable decision-making time for implementing water project scheduling for algal bloom prevention and control.

By providing timely warnings, the hybrid RLSTM-ED-BP model proposed in this study can help reduce the risk posed by algal blooms and offers technical support for decision-makers in crafting prevention and control strategies. Future research could explore year-round algal bloom forecasting and early warning by integrating water quality sampling data collected during the flood season. In addition, coupling algal bloom forecasting and early warning with the scheduling of water engineering projects could lead to a more integrated approach to prevention and control.

This work was supported by the National Key Research and Development Program of China (No. 2021YFC3200303). The authors would like to thank the Editors and anonymous Reviewers for their constructive comments that greatly contributed to improving the manuscript.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alavi J., Ewees A. A., Ansari S., Shahid S. & Yaseen Z. M. (2022) A new insight for real-time wastewater quality prediction using hybridized kernel-based extreme learning machines with advanced optimization algorithms, Environ. Sci. Pollut. Res., 29, 20496–20516. https://doi.org/10.1007/s11356-021-17190-2.
Chen Q., Guan T., Yun L., Li R. & Recknagel F. (2015) Online forecasting chlorophyll a concentrations by an auto-regressive integrated moving average model: feasibilities and potentials, Harmful Algae, 43, 58–65. https://doi.org/10.1016/j.hal.2015.01.002.
Cheng B., Xia R., Zhang Y., Yang Z., Hu S., Guo F. & Ma S. (2019) Characterization and causes analysis for algae blooms in large river system, Sustain. Cities Soc., 51, 101707. https://doi.org/10.1016/j.scs.2019.101707.
Deng T., Chau K.-W. & Duan H.-F. (2021) Machine learning based marine water quality prediction for coastal hydro-environment management, J. Environ. Manage., 284, 112051. https://doi.org/10.1016/j.jenvman.2021.112051.
Hilton J., O'Hare M., Bowes M. J. & Jones J. I. (2006) How green is my river? A new paradigm of eutrophication in rivers, Sci. Total Environ., 365, 66–83. https://doi.org/10.1016/j.scitotenv.2006.02.055.
Hochreiter S. & Schmidhuber J. (1997) Long short-term memory, Neural Comput., 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Hu Z., Zhang Y., Zhao Y., Xie M., Zhong J., Tu Z. & Liu J. (2019) A water quality prediction method based on the deep LSTM network considering correlation in smart mariculture, Sensors, 19, 1420. https://doi.org/10.3390/s19061420.
Imteaz M. A. & Asaeda T. (2000) Artificial mixing of lake water by bubble plume and effects of bubbling operations on algal bloom, Water Res., 34, 1919–1929. https://doi.org/10.1016/S0043-1354(99)00341-3.
Jahangir M. S., You J. & Quilty J. (2023) A quantile-based encoder–decoder framework for multi-step ahead runoff forecasting, J. Hydrol., 619, 129269. https://doi.org/10.1016/j.jhydrol.2023.129269.
Jamro I. A., Raheem A., Khoso S., Baloch H. A., Kumar A., Chen G., Bhagat W. A., Wenga T. & Ma W. (2023) Investigation of enhanced H2 production from municipal solid waste gasification via artificial neural network with data on tar compounds, J. Environ. Manage., 328, 117014. https://doi.org/10.1016/j.jenvman.2022.117014.
Kao I.-F., Zhou Y., Chang L.-C. & Chang F.-J. (2020) Exploring a long short-term memory based encoder-decoder framework for multi-step-ahead flood forecasting, J. Hydrol., 583, 124631. https://doi.org/10.1016/j.jhydrol.2020.124631.
Kim T., Shin J., Lee D., Kim Y., Na E., Park J., Lim C. & Cha Y. (2022) Simultaneous feature engineering and interpretation: forecasting harmful algal blooms using a deep learning approach, Water Res., 215, 118289. https://doi.org/10.1016/j.watres.2022.118289.
Kim S., Seo Y., Malik A., Kim S., Heddam S., Yaseen Z. M., Kisi O. & Singh V. P. (2023) Quantification of river total phosphorus using integrative artificial intelligence models, Ecol. Indic., 153, 110437. https://doi.org/10.1016/j.ecolind.2023.110437.
Kuo Y.-M., Liu W., Zhao E., Li R. & Muñoz-Carpena R. (2019) Water quality variability in the middle and down streams of Han River under the influence of the Middle Route of South-North Water Diversion Project, China, J. Hydrol., 569, 218–229. https://doi.org/10.1016/j.jhydrol.2018.12.001.
Li J., Yin W., Jia H. & Xin X. (2021) Hydrological management strategies for the control of algal blooms in regulated lowland rivers, Hydrol. Process., 35, e14171. https://doi.org/10.1002/hyp.14171.
Liao A., Han D., Song X. & Yang S. (2021) Impacts of storm events on chlorophyll-a variations and controlling factors for algal bloom in a river receiving reclaimed water, J. Environ. Manage., 297, 113376. https://doi.org/10.1016/j.jenvman.2021.113376.
Lin S., Pierson D. C. & Mesman J. P. (2023) Prediction of algal blooms via data-driven machine learning models: an evaluation using data from a well-monitored mesotrophic lake, Geosci. Model Dev., 16, 35–46. https://doi.org/10.5194/gmd-16-35-2023.
Liu J.-Y., Zeng L.-H., Ren Z.-H., Du T.-M. & Liu X. (2020) Rapid in situ measurements of algal cell concentrations using an artificial neural network and single-excitation fluorescence spectrometry, Algal Res., 45, 101739. https://doi.org/10.1016/j.algal.2019.101739.
Liu C., Chen Y., Zou L., Cheng B. & Huang T. (2022a) Time-lag effect: river algal blooms on multiple driving factors, Front. Earth Sci., 9, 813287. https://doi.org/10.3389/feart.2021.813287.
Liu M., He J., Huang Y., Tang T., Hu J. & Xiao X. (2022b) Algal bloom forecasting with time-frequency analysis: a hybrid deep learning approach, Water Res., 219, 118591. https://doi.org/10.1016/j.watres.2022.118591.
Liu M., Hu J., Huang Y., He J., Effiong K., Tang T., Huang S., Perianen Y. D., Wang F., Li M. & Xiao X. (2023) Probabilistic prediction of algal blooms from basic water quality parameters by Bayesian scale-mixture of skew-normal model, Environ. Res. Lett., 18, 014034. https://doi.org/10.1088/1748-9326/acaf11.
Maavara T., Parsons C. T., Ridenour C., Stojanovic S., Dürr H. H., Powley H. R. & Van Cappellen P. (2015) Global phosphorus retention by river damming, Proc. Natl. Acad. Sci., 112, 15603–15608. https://doi.org/10.1073/pnas.1511797112.
Mai Z., Li S., Guo C., Li W. & Yin Z. (2020) Phytoplankton community structure and water quality evaluation in the middle and lower reaches of the Hanjiang River (in Chinese), Bio. Res., 42 (3), 271–278. https://doi.org/10.14188/j.ajsh.2020.03.002.
Marzidovšek M., Francé J., Podpečan V., Vadnjal S., Dolenc J. & Mozetič P. (2024) Explainable machine learning for predicting diarrhetic shellfish poisoning events in the Adriatic Sea using long-term monitoring data, Harmful Algae, 139, 102728. https://doi.org/10.1016/j.hal.2024.102728.
Mei X., Dai Z., Wei W. & Gao J. (2016) Dams induced stage–discharge relationship variations in the upper Yangtze River basin, Hydrol. Res., 47, 157–170. https://doi.org/10.2166/nh.2015.010.
Nash J. E. & Sutcliffe J. V. (1970) River flow forecasting through conceptual models part I – a discussion of principles, J. Hydrol., 10, 282–290. https://doi.org/10.1016/0022-1694(70)90255-6.
Neal C., Hilton J., Wade A. J., Neal M. & Wickham H. (2006) Chlorophyll-a in the rivers of eastern England, Sci. Total Environ., 365, 84–104. https://doi.org/10.1016/j.scitotenv.2006.02.039.
Pan X., Zhu A., Zheng Z., Qiao Y., Zou Q., Zhou L. & Zou X. (2014) Structural characteristics and influencing factors of phytoplankton community in the middle and lower reaches of Hanjiang River during spring season (in Chinese), Chin. J. Ecol., 33 (01), 33–40. https://doi.org/10.13292/j.1000-4890.20131220.0012.
Park H.-K., Kwon M.-A., Lee H.-J., Oh J., Lee S.-H. & Kim I.-S. (2018) Molecular verification of bloom-forming Aphanizomenon flos-aquae and their secondary metabolites in the Nakdong River, Int. J. Environ. Res. Public Health, 15, 1739. https://doi.org/10.3390/ijerph15081739.
Qian J., Qian L., Pu N., Bi Y., Wilhelms A. & Norra S. (2024) An intelligent early warning system for harmful algal blooms: harnessing the power of big data and deep learning, Environ. Sci. Technol., 58 (35), 15607–15618. https://doi.org/10.1021/acs.est.3c03906.
Raulino J. B. S., Silveira C. S. & Lima Neto I. E. (2021) Assessment of climate change impacts on hydrology and water quality of large semi-arid reservoirs in Brazil, Hydrol. Sci. J., 66 (8), 1321–1336. https://doi.org/10.1080/02626667.2021.1933491.
Rumelhart D. E., Hinton G. E. & Williams R. J. (1986) Learning representations by back-propagating errors, Nature, 323, 533–536. https://doi.org/10.1038/323533a0.
Schmale D. G., Ault A. P., Saad W., Scott D. T. & Westrick J. A. (2019) Perspectives on harmful algal blooms (HABs) and the cyberbiosecurity of freshwater systems, Front. Bioeng. Biotechnol., 7, 128. https://doi.org/10.3389/fbioe.2019.00128.
Shen L., Dou M., Xia R., Li G. & Yang B. (2021) Effects of hydrological change on the risk of riverine algal blooms: case study in the mid-downstream of the Han River in China, Environ. Sci. Pollut. Res., 28, 19851–19865. https://doi.org/10.1007/s11356-020-11756-2.
Sheng S., Lin K., Zhou Y., Chen H., Luo Y., Guo S. & Xu C.-Y. (2023) Exploring a multi-output temporal convolutional network driven encoder-decoder framework for ammonia nitrogen forecasting, J. Environ. Manage., 342, 118232. https://doi.org/10.1016/j.jenvman.2023.118232.
Shi X. J., Chen Z. R., Wang H., Yeung D. Y., Wong W. K. & Woo W. C. (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting, Adv. Neural Inf. Process. Syst., 28, 802–810. https://doi.org/10.1007/978-3-319-21233-3_6.
Tan L., Wang Z., Bai Y. & Huang X. (2023) Short-term responses of nutrients and algal biomass in a eutrophic shallow lake to different scales of water transfer, Sci. Total Environ., 880, 163321. https://doi.org/10.1016/j.scitotenv.2023.163321.
Taylor K. E. (2001) Summarizing multiple aspects of model performance in a single diagram, J. Geophys. Res. Atmos., 106, 7183–7192. https://doi.org/10.1029/2000JD900719.
Tian J., Guo S., Wang J., Wang H. & Pan Z. (2022) Preemptive warning and control strategies for algal blooms in the downstream of Han River, China, Ecol. Indic., 142, 109190. https://doi.org/10.1016/j.ecolind.2022.109190.
Wen X., Feng Q., Deo R. C., Wu M., Yin Z., Yang L. & Singh V. P. (2019) Two-phase extreme learning machines integrated with the complete ensemble empirical mode decomposition with adaptive noise algorithm for multi-scale runoff prediction problems, J. Hydrol., 570, 167–184. https://doi.org/10.1016/j.jhydrol.2018.12.060.
Whitehead P. G., Wilby R. L., Battarbee R. W., Kernan M. & Wade A. J. (2009) A review of the potential impacts of climate change on surface water quality, Hydrol. Sci. J., 54 (1), 101–123. https://doi.org/10.1623/hysj.54.1.101.
Xia R., Wang G., Zhang Y., Yang P., Yang Z., Ding S., Jia X., Yang C., Liu C., Ma S., Lin J., Wang X., Hou X., Zhang K., Gao X., Duan P. & Qian C. (2020) River algal blooms are well predicted by antecedent environmental conditions, Water Res., 185, 116221. https://doi.org/10.1016/j.watres.2020.116221.
Xiao X., He J., Huang H., Miller T. R., Christakos G., Reichwaldt E. S., Ghadouani A., Lin S., Xu X. & Shi J. (2017) A novel single-parameter approach for forecasting algal blooms, Water Res., 108, 222–231. https://doi.org/10.1016/j.watres.2016.10.076.
Xiao X., Peng Y., Zhang W., Yang X., Zhang Z., Ren B., Zhu G. & Zhou S. (2024) Current status and prospects of algal bloom early warning technologies: a review, J. Environ. Manage., 349, 119510. https://doi.org/10.1016/j.jenvman.2023.119510.
Xin X., Zhang H., Lei P., Tang W., Yin W., Li J., Zhong H. & Li K. (2020) Algal blooms in the middle and lower Han River: characteristics, early warning and prevention, Sci. Total Environ., 706, 135293. https://doi.org/10.1016/j.scitotenv.2019.135293.
Yin D., Xu T., Li K., Leng L., Jia H. & Sun Z. (2022) Comprehensive modelling and cost-benefit optimization for joint regulation of algae in urban water system, Environ. Pollut., 296, 118743. https://doi.org/10.1016/j.envpol.2021.118743.
Zhang X. & Li D. (2023) Multi-input multi-output temporal convolutional network for predicting the long-term water quality of ocean ranches, Environ. Sci. Pollut. Res., 30, 7914–7929. https://doi.org/10.1007/s11356-022-22588-7.
Zhang S., Zeng Y., Zha W., Huo S., Niu L. & Zhang X. (2022) Spatiotemporal variation of cascade reservoirs phosphorus in the Three Gorges Reservoir: impact of upstream, Environ. Sci. Pollut. Res., 29, 56739–56749. https://doi.org/10.1007/s11356-022-19787-7.
Zheng L., Song L., Wu X. & Zhuang H. (2009) Analysis of morphology and 18S rDNA gene from the causative specie related diatom bloom in Hanjiang River (in Chinese), Acta Hydrobiol. Sin., 33 (5), 562–564. https://doi.org/10.3724/SP.J.0000.2009.30562.
Zheng L., Wang H., Liu C., Zhang S., Ding A., Xie E., Li J. & Wang S. (2021) Prediction of harmful algal blooms in large water bodies using the combined EFDC and LSTM models, J. Environ. Manage., 295, 113060. https://doi.org/10.1016/j.jenvman.2021.113060.
Zhou J., Zhang M. & Lu P. (2013) The effect of dams on phosphorus in the middle and lower Yangtze River: dam effect on lower river, Water Resour. Res., 49, 3659–3669. https://doi.org/10.1002/wrcr.20283.
Zhou Y., Guo S. & Chang F.-J. (2019) Explore an evolutionary recurrent ANFIS for modelling multi-step-ahead flood forecasts, J. Hydrol., 570, 343–355. https://doi.org/10.1016/j.jhydrol.2018.12.04.
Zhu D., Zhou Y., Guo S., Chang F.-J., Lin K. & Deng Z. (2023) Exploring a multi-objective optimization operation model of water projects for boosting synergies and water quality improvement in big river systems, J. Environ. Manage., 345, 118673. https://doi.org/10.1016/j.jenvman.2023.118673.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-ND 4.0), which permits copying and redistribution with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nd/4.0/).

Supplementary data