ABSTRACT
River algal blooms pose a significant environmental threat, necessitating accurate forecasts and timely warnings for effective prevention. This study proposes a novel hybrid model, combining an external recursive long short-term memory neural network based on encoder–decoder (RLSTM-ED) with a backpropagation (BP) neural network, denoted as RLSTM-ED-BP. A dataset comprising 34,992 hydrological, climatic, and water quality (4-hourly) observations from the Hanjiang River Basin in China was divided for model training and testing. Comparative analysis with an RLSTM baseline demonstrated that the RLSTM-ED-BP model enhanced the Nash–Sutcliffe coefficient (NSE) by more than 5% and reduced the root mean square error by over 10% during the 24-h forecast horizon. The RLSTM-ED-BP model yielded NSE and threat score values exceeding 0.95 and efficiently provided early warnings for algal bloom events. The model's enhanced performance contributes to the generalizability of deep learning approaches in addressing the critical environmental challenge of algal blooms.
HIGHLIGHTS
Deep learning captures the complex nonlinear relationship of impact factors and algal density.
Recursive long short-term memory neural network based on encoder–decoder (RLSTM-ED) mitigates overfitting and alleviates error propagation in multi-output forecasts.
RLSTM-ED-BP improves the accuracy and reliability of intraday algal bloom early warning.
INTRODUCTION
The eutrophication of water bodies, driven by climate change and human activities, has intensified algal blooms, a significant environmental challenge globally (Raulino et al. 2021; Zheng et al. 2021; Qian et al. 2024; Xiao et al. 2024). While traditionally occurring in static water bodies like lakes, algal blooms have also been observed in large rivers (Park et al. 2018; Pickering & Ford 2021). In China, the Hanjiang River has experienced recurrent algal blooms since 1992, characterized by a brownish tint, foul odor, and ecological damage, posing risks to freshwater quality and aquatic ecosystems (Xin et al. 2020; Li et al. 2021; Shen et al. 2021). Most studies focus on algal blooms in static shallow lakes (Schmale et al. 2019), while large river ecosystems, with their broader impact and complexity, remain underexplored, especially under limited data conditions (Tian et al. 2022). Accurate forecasting and early warning for river algal blooms are thus urgent and critical.
Algal bloom occurrences in large rivers are shaped by complex interactions among hydrological, climatic, and nutrient conditions (Whitehead et al. 2009; Zhou et al. 2021; Tian et al. 2022; Yin et al. 2022). Accurate forecasts of meteorological, hydrological, and water quality parameters underpin algal bloom early warning. Physical models are effective but computationally inefficient, which highlights the advantages of machine learning methods, such as neural networks and support vector machines, in modeling nonlinear systems (Alavi et al. 2022). Deep learning, particularly long short-term memory (LSTM) networks (Hochreiter & Schmidhuber 1997), enhances forecast accuracy for water environment indicators (Hu et al. 2019). Recursive strategies improve multi-step-ahead forecasts by incorporating previous predictions (Zhou et al. 2019), but challenges such as high-dimensional data and overfitting persist. Encoder–decoder (ED) architectures, originally developed for machine translation, effectively handle these issues and have been applied to water environment forecasting (Jahangir et al. 2023). Sheng et al. (2023) proposed a multi-output temporal convolutional network-based ED model to forecast ammonia nitrogen. Zhang & Li (2023) combined the ED structure with deep learning models to perform water quality forecasting. However, gaps remain in integrating recursive strategies with ED models to address error propagation and overfitting.
Proactive forecasting of riverine algal densities helps prevent algal blooms and mitigate their impacts. Constructing a predictive model that quantifies the multiple regression relationship between water environment factors (climatic, hydrological, and water quality factors) and algal density is a viable option. However, forecasting algal blooms remains challenging due to complex environmental influences and the nonlinear nature of algal dynamics (Xiao et al. 2017; Liu et al. 2022b). Traditional models such as multiple linear regression struggle with this complexity, whereas machine learning methods, including neural networks, gradient boosting, and random forests, better capture nonlinear relationships and are commonly applied in algal bloom prediction (Xia et al. 2020; Deng et al. 2021; Liao et al. 2021). Most studies focus on forecasting and early warning of algal blooms at medium- to long-term scales, such as monthly or seasonal forecasting. Lin et al. (2023) studied medium-term algal bloom forecasting in mesotrophic lakes using machine learning models. Marzidovšek et al. (2024) used explainable machine learning models for seasonal harmful algal bloom prediction in the Adriatic Sea. However, hydrological changes caused by large-scale water diversion projects in the Hanjiang River have increased the frequency and risk of algal blooms (Maavara et al. 2015; Shen et al. 2021; Tan et al. 2023).
Algal bloom events develop rapidly, making medium- and long-term forecasts insufficient for timely preventive measures. Existing frameworks often fail to address the dynamic and sudden nature of blooms, highlighting the need for short-term, especially intraday, forecasting. Algal density, a key indicator of bloom severity, is traditionally monitored through labor-intensive field surveys, causing delays that hinder early warning models (Liu et al. 2020, 2023). While chlorophyll-a has been explored as an alternative indicator (Chen et al. 2015; Tian et al. 2022), algal density remains a more direct and accurate measure. Achieving accurate short-term forecasts under limited data conditions remains a significant challenge.
This study introduces a novel RLSTM-ED-BP model, integrating a recursive long short-term memory-based encoder–decoder model (RLSTM-ED) and a backpropagation (BP) neural network, for short-term algal bloom forecasting and early warning. The RLSTM-ED model addresses error propagation and overfitting, ensuring accurate multi-step water environment forecasts, while the BP model captures nonlinear relationships between water factors and algal density. Their fusion enables effective algal bloom forecasting and early warning. The model's applicability is demonstrated through a case study on algal bloom events in the Hanjiang River, China.
STUDY AREA AND MATERIALS
Study area
Materials
This study collected data at a 4-h time step during the 2017–2021 dry seasons (December to May of the following year) at the Xiantao monitoring station on the lower reaches of the Hanjiang River. We selected three representative categories of environmental variables (Xia et al. 2020): (1) hydrological indicator: streamflow discharge (Q); (2) climate indicator: water temperature (WT); and (3) water quality indicators: total phosphorus (TP), total nitrogen (TN), ammonia nitrogen, dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD5), and potential of hydrogen (pH). The data used in this study were provided by the Changjiang Water Resources Commission of the Ministry of Water Resources in China (http://www.cjw.gov.cn/english/).
The time step of the datasets is 4 h. A total of 34,992 time series values (= (31 days × 4 months × 4 years + 30 days × 4 years + 28 days × 3 years + 29 days) × 6 time steps per day × 8 variables) were partitioned into three datasets: training (18,960 values; 2017.12–2019.12), validation (8,784 values; 2020.1–2020.12), and testing (7,248 values; 2021.1–2021.5). The statistics of the data collected at the Xiantao monitoring station are shown in Table S1. Furthermore, this study also collected emergency monitoring data at a 4-h time step from mid-February to late March 2018 at Xiantao Station. The Pearson correlation coefficients between algal density and the four water environment factors (WT, pH, DO, and Q), together with the data statistics, are shown in Table S2.
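The record counts above can be reproduced with a short calculation. This sketch assumes the dry season runs from 1 December to 31 May and that each split covers only the dry-season days between its stated month boundaries; it checks the arithmetic only and does not touch the data.

```python
from datetime import date, timedelta

STEPS_PER_DAY = 24 // 4           # 4-h time step -> 6 records per day
N_VARIABLES = 8                   # Q, WT, TP, TN, ammonia nitrogen, DO, BOD5, pH
DRY_SEASON = {12, 1, 2, 3, 4, 5}  # December to May

def dry_season_days(start: date, end: date) -> int:
    """Count the days in [start, end) that fall in the dry season."""
    days, d = 0, start
    while d < end:
        if d.month in DRY_SEASON:
            days += 1
        d += timedelta(days=1)
    return days

# Split boundaries as stated in the text; end dates are exclusive.
SPLITS = {
    "training": (date(2017, 12, 1), date(2020, 1, 1)),
    "validation": (date(2020, 1, 1), date(2021, 1, 1)),
    "testing": (date(2021, 1, 1), date(2021, 6, 1)),
}

counts = {name: dry_season_days(start, end) * STEPS_PER_DAY * N_VARIABLES
          for name, (start, end) in SPLITS.items()}
print(counts)                # {'training': 18960, 'validation': 8784, 'testing': 7248}
print(sum(counts.values()))  # 34992
```

The leap-year February of 2020 is what contributes the single "29 days" term; the calendar walk handles it without special-casing.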
METHODS
Architecture of the (a) RLSTM, (b) RLSTM-ED, and (c) RLSTM-ED-BP models. Both RLSTM-ED and RLSTM models are constructed to make multi-step-ahead forecasts of water environment indicators. The RLSTM-ED-BP model is constructed to implement algal bloom early warning.
Recursive long short-term memory neural network
RLSTM-based encoder decoder forecasting model
The LSTM-based encoder–decoder (LSTM-ED) module is constructed by embedding the LSTM model into the ED structure. In this study, the ED structure consists of two LSTM neural networks: the first acts as the encoder and the second as the decoder. The encoder transforms the input sequence into a fixed-length vector, and the decoder transforms this vector into the target values. The ED structure helps mitigate overfitting and improves the accuracy and reliability of the model.
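To make the data flow concrete, here is a minimal, framework-free sketch of an LSTM encoder–decoder forward pass using the sizes described in this section and in the hyperparameter settings (8 input variables, a 6-step window, 32 hidden units, 4 outputs). The weights are random and untrained, and the decoder wiring (context vector as decoder input, encoder state as its initial state) is an assumption; a PyTorch implementation would use `torch.nn.LSTM` and `torch.nn.Linear` in place of the hand-written cell.

```python
import math
import random
from typing import List

Vector = List[float]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """One LSTM cell; weights are random and untrained (illustration only)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n_hidden = n_hidden
        # One weight matrix and bias per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                      for _ in range(n_hidden)] for k in "ifgo"}
        self.b = {k: [0.0] * n_hidden for k in "ifgo"}

    def step(self, x: Vector, h: Vector, c: Vector):
        z = x + h  # concatenate current input and previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

class LSTMED:
    """Encoder LSTM -> context vector -> decoder LSTM -> fully connected layer."""

    def __init__(self, n_in: int = 8, n_hidden: int = 32, n_out: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.encoder = LSTMCell(n_in, n_hidden, rng)
        self.decoder = LSTMCell(n_hidden, n_hidden, rng)
        self.fc_W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.fc_b = [0.0] * n_out

    def forward(self, window: List[Vector]) -> Vector:
        """window: 6 time steps x 8 normalized variables -> 4 outputs (pH, Q, WT, DO)."""
        n = self.encoder.n_hidden
        h, c = [0.0] * n, [0.0] * n
        for x in window:              # the same encoder cell is reused at every time step
            h, c = self.encoder.step(x, h, c)
        context = h                   # encoder hidden state at the last time step
        # Assumption: the decoder takes the context as input and starts from the encoder state.
        hd, _ = self.decoder.step(context, h, c)
        # Fully connected layer reduces the 32-dim decoder output to the 4 targets.
        return [a + b for a, b in zip(matvec(self.fc_W, hd), self.fc_b)]

model = LSTMED()
window = [[0.5] * 8 for _ in range(6)]  # placeholder normalized inputs
y = model.forward(window)
print(len(y))  # 4
```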
In this study, the input sequence comprises eight variables, each with a window of 6 time steps. Therefore, the long short-term memory encoder (LSTMe) is applied 6 times (Kao et al. 2020). This repeated encoding transforms every input vector into its corresponding encoding vector, and the encoder output at the last time step in Figure 2(b) is passed to the decoder as the context vector. A fully connected layer then reduces the high-dimensional output of the LSTM decoder to the target variables, so that the LSTM-ED module outputs pH, Q, WT, and DO at time t. The RLSTM-ED model is constructed by combining the LSTM-ED module with the recursive strategy (Figure 2(b)). Following the recursive strategy (Equations (8)–(12)), the outputs of the LSTM-ED module at time t are used to update the input information for predicting pH, Q, WT, and DO at time t + 1. This process is repeated until pH, Q, WT, and DO have been predicted over the 24-h forecast horizon (six 4-h steps).
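The recursive strategy can be sketched as a loop around any one-step model. The column order of the eight inputs below is hypothetical, and because the module predicts only four of the eight inputs, the sketch holds the remaining variables (TP, TN, ammonia nitrogen, BOD5) at their last known values; that persistence rule is our assumption, not a detail given in the text.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical column order of the eight input variables.
VARIABLES = ["Q", "WT", "TP", "TN", "NH3N", "DO", "BOD5", "pH"]
TARGETS = ["pH", "Q", "WT", "DO"]  # outputs of the LSTM-ED module

def recursive_forecast(step_model: Callable[[List[Vector]], Vector],
                       window: Sequence[Vector], horizon: int = 6) -> List[Vector]:
    """Roll a one-step model forward `horizon` times (6 steps x 4 h = 24 h)."""
    target_idx = [VARIABLES.index(name) for name in TARGETS]
    window = [list(record) for record in window]  # work on a copy
    forecasts: List[Vector] = []
    for _ in range(horizon):
        y = step_model(window)                    # one-step forecast of pH, Q, WT, DO
        forecasts.append(list(y))
        new_record = list(window[-1])             # assumption: other inputs persist
        for j, value in zip(target_idx, y):
            new_record[j] = value                 # feed the forecast back as input
        window = window[1:] + [new_record]        # drop oldest step, append newest
    return forecasts

# Usage with a stand-in model that adds 0.1 to the latest value of each target.
IDX = [VARIABLES.index(name) for name in TARGETS]

def toy_model(w: List[Vector]) -> Vector:
    return [w[-1][j] + 0.1 for j in IDX]

out = recursive_forecast(toy_model, [[0.0] * 8 for _ in range(6)])
print(len(out), [round(v, 2) for v in out[-1]])  # 6 [0.6, 0.6, 0.6, 0.6]
```

Because each step consumes the previous step's output, any error at t + 1 propagates into t + 2 and beyond, which is the error-propagation problem the ED structure is meant to dampen.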
BP neural network model
The BP neural network was introduced by Rumelhart et al. (1986). It is a multilayer feedforward neural network trained with the error backpropagation algorithm and consists of three parts: the input layer, the hidden layer, and the output layer. Input information flows forward through the three layers in turn to produce the output value. The difference between the output value and the desired value is then propagated backward, and the network weights are updated to reduce this error. This study uses the BP neural network model to construct a nonlinear mapping between the explanatory variables (WT, Q, pH, and DO) and the response variable (algal density) based on the 2018 emergency monitoring data.
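A minimal sketch of such a BP network, written without a deep-learning framework so that each backpropagation step is visible. The hidden-layer size, sigmoid activation, full-batch gradient descent, learning rate, and the synthetic data are all illustrative assumptions, not the study's settings or its 2018 monitoring data.

```python
import math
import random

class BPNetwork:
    """4 inputs (e.g. WT, Q, pH, DO) -> sigmoid hidden layer -> 1 linear output."""

    def __init__(self, n_in: int = 4, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
                for row, b in zip(self.W1, self.b1)]

    def predict(self, x) -> float:
        # Forward pass: input layer -> hidden layer -> output layer.
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.W2, h)) + self.b2

    def fit_epoch(self, data, lr: float = 0.2) -> None:
        """One full-batch gradient-descent step on 0.5 * mean squared error."""
        n_h, n_in = len(self.W1), len(self.W1[0])
        gW1 = [[0.0] * n_in for _ in range(n_h)]
        gb1, gW2, gb2 = [0.0] * n_h, [0.0] * n_h, 0.0
        for x, y in data:
            h = self._hidden(x)
            err = sum(w * hj for w, hj in zip(self.W2, h)) + self.b2 - y  # output error
            gb2 += err
            for j in range(n_h):
                gW2[j] += err * h[j]
                delta = err * self.W2[j] * h[j] * (1.0 - h[j])  # error propagated back to unit j
                gb1[j] += delta
                for k in range(n_in):
                    gW1[j][k] += delta * x[k]
        n = len(data)
        self.b2 -= lr * gb2 / n
        for j in range(n_h):
            self.W2[j] -= lr * gW2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for k in range(n_in):
                self.W1[j][k] -= lr * gW1[j][k] / n

def mse(net, data) -> float:
    return sum((net.predict(x) - y) ** 2 for x, y in data) / len(data)

# Synthetic stand-in for normalized (WT, Q, pH, DO) -> algal density pairs.
rng = random.Random(1)
data = [(x, 0.5 * x[0] + 0.3 * x[1] * x[2] - 0.2 * x[3])
        for x in ([rng.random() for _ in range(4)] for _ in range(50))]
net = BPNetwork()
before = mse(net, data)
for _ in range(300):
    net.fit_epoch(data)
after = mse(net, data)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```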
Fusing RLSTM-ED and BP (RLSTM-ED-BP) for early warning
Through fusing RLSTM-ED and BP, we introduce a hybrid RLSTM-ED-BP model (Figure 2(c)) for algal bloom early warning. The RLSTM-ED-BP model consists of two components. (1) Based on the predicted values of water environment indicators from t + 1 to t + 6 horizons of the RLSTM-ED model, algal density is predicted using the BP model developed in Section 2.5. (2) Algal density is an effective indicator for early warning of algal blooms and can determine the class of the algal bloom. Based on the predicted values of algal density, an early warning analysis of algal bloom events at Xiantao Station on the lower Hanjiang River is conducted.
The lack of algal monitoring data is a significant limiting factor in the development of forecasting and early warning models. In this study, a novel hybrid RLSTM-ED-BP model is proposed to provide a feasible solution for algal bloom forecasting and early warning in the lower reaches of the Hanjiang River under limited data conditions. The computation processes of forecasting and early warning models are described as follows.
Step 1: Divide the dataset into training, validation, and testing sets, and apply min–max normalization. The training set is used to fit the model parameters, the validation set to tune the model's hyperparameters, and the testing set to evaluate the model's generalization capability. The RLSTM-ED model is then used to simulate and predict WT, Q, pH, and DO from horizon t + 1 to t + 6.
Step 2: Based on the 2018 emergency monitoring data, a BP model is used to capture a nonlinear mapping relationship among four water environment indicators (WT, Q, pH, and DO) and algal density.
Step 3: Based on the well-established BP model, four water environment indicators (WT, Q, pH, and DO) predicted by the RLSTM-ED model are used as inputs to predict algal densities. In this study, the hybrid model that integrates RLSTM-ED and BP models is referred to as the RLSTM-ED-BP model.
Step 4: Early warning analysis of algal bloom events at Xiantao Station on the lower Hanjiang River based on algal density predicted by the RLSTM-ED-BP model.
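The min–max normalization in Step 1 is sketched below. Estimating the minimum and maximum of each variable from the training set only, and applying them unchanged to the validation and testing sets, is an assumption (a common way to avoid leaking test information); scaled validation or test values may therefore fall slightly outside [0, 1].

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]

def fit_min_max(train_rows: Sequence[Sequence[float]]) -> Bounds:
    """Per-variable (min, max), estimated from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def transform(rows, bounds: Bounds) -> List[List[float]]:
    """Scale each variable with x' = (x - min) / (max - min)."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def inverse_transform(rows, bounds: Bounds) -> List[List[float]]:
    """Map scaled values (e.g. model forecasts) back to physical units."""
    return [[v * (hi - lo) + lo for v, (lo, hi) in zip(row, bounds)] for row in rows]

train = [[10.0, 7.8], [20.0, 8.2], [15.0, 8.0]]  # toy WT (°C) and pH records
bounds = fit_min_max(train)
scaled = transform(train, bounds)
restored = inverse_transform(scaled, bounds)
print(scaled[0], scaled[1][0])  # [0.0, 0.0] 1.0
```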
The RLSTM-ED model can be regarded as a ‘predictor’ that forecasts future water environment indicators (WT, Q, pH, and DO) based on existing monitoring data. In contrast, the BP model functions as an ‘interpreter’, establishing a nonlinear mapping relationship between limited algal density data and water environment indicators. By utilizing the RLSTM-ED model's predicted water environment indicators as inputs, the BP model converts these predictions into algal density dynamics. This two-step approach of the RLSTM-ED-BP model achieves algal bloom forecasting and early warning indirectly, effectively addressing the limitations posed by insufficient algal monitoring data.
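The predictor/interpreter split can be expressed as a function composition. In this sketch both stages are passed in as callables (hypothetical placeholders for the trained models), and the variable orders are assumptions taken from the order in which the text lists the RLSTM-ED outputs (pH, Q, WT, DO) and the BP inputs (WT, Q, pH, DO); the point is that the two layouts must be reconciled before the BP model is applied.

```python
from typing import Callable, List, Sequence

Vector = List[float]

RLSTM_ED_ORDER = ("pH", "Q", "WT", "DO")  # assumed output order of the predictor
BP_ORDER = ("WT", "Q", "pH", "DO")        # assumed input order of the interpreter

def rlstm_ed_bp(forecast_env: Callable[[Sequence[Vector]], List[Vector]],
                bp_density: Callable[[Vector], float],
                window: Sequence[Vector]) -> List[float]:
    """Predictor forecasts the environment; interpreter maps it to algal density."""
    perm = [RLSTM_ED_ORDER.index(name) for name in BP_ORDER]
    densities = []
    for env in forecast_env(window):        # one vector per horizon t + 1 ... t + 6
        bp_input = [env[i] for i in perm]   # reorder to the BP model's input layout
        densities.append(bp_density(bp_input))
    return densities

# Usage with stand-ins: a constant environment forecast and a "BP model" returning WT.
def toy_forecast(window: Sequence[Vector]) -> List[Vector]:
    return [[7.9, 500.0, 12.0, 10.5] for _ in range(6)]  # pH, Q, WT, DO

result = rlstm_ed_bp(toy_forecast, lambda v: v[0], window=[])
print(result)  # [12.0, 12.0, 12.0, 12.0, 12.0, 12.0]
```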
The hyperparameters, as well as the input and output variable settings of the model, are shown in Table S3; the parameters were determined by trial and error. For comparison, the number of hidden-layer neurons in the benchmark model (RLSTM) was likewise set to 32 (Sheng et al. 2023). The learning rate and number of epochs are set to 0.001 and 500, respectively. The model parameters are optimized with the Adam optimizer, and the mean absolute error is used as the loss function. The models are implemented with the Python library PyTorch. Each model is run for 20 rounds to reduce the influence of random weight initialization on forecasting performance.
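The 20-round averaging can be sketched as follows. `toy_run` is a hypothetical stand-in for one complete training run (in the study, Adam with learning rate 0.001, 500 epochs, and a mean-absolute-error loss); only the seeding and averaging logic is shown.

```python
import random
from typing import Callable, List

def ensemble_mean(train_and_predict: Callable[[int], List[float]],
                  n_rounds: int = 20) -> List[float]:
    """Run n_rounds independently seeded experiments and average their forecasts."""
    runs = [train_and_predict(seed) for seed in range(n_rounds)]
    return [sum(values) / len(values) for values in zip(*runs)]

def toy_run(seed: int) -> List[float]:
    """Hypothetical stand-in for one training run; returns forecasts for 6 horizons."""
    rng = random.Random(seed)  # the seed controls the random weight initialization
    return [1.0 + rng.gauss(0.0, 0.1) for _ in range(6)]

mean_forecast = ensemble_mean(toy_run)
print([round(v, 3) for v in mean_forecast])
```

Averaging over seeds reduces the variance caused by initialization; the spread of the individual runs is also what the violin plots of the 20 rounds summarize.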
The RLSTM and RLSTM-ED models constructed in this paper differ as follows: (1) to construct complex nonlinear mappings between multiple input and multiple output variables, the former is based on an LSTM layer, whereas the latter is based on an ED framework; and (2) because the latter consists of an LSTM encoder and an LSTM decoder, it has more parameters (32 × 32 + 32) than the former. In addition, the RLSTM-ED model must be further integrated with the BP neural network for forecasting and early warning. The RLSTM-BP and RLSTM-ED-BP models are compared to assess how the water environment indicators predicted by the RLSTM and RLSTM-ED models, respectively, affect the accuracy of algal density forecasting.
RESULTS
This study aims to assess the accuracy and effectiveness of the proposed algal bloom forecasting and early warning model. The experimental results and comprehensive analysis are presented below.
Multi-step-ahead forecasts of water environment indicators using RLSTM-ED models
This study integrates a recursive strategy with an LSTM neural network model based on an ED structure (RLSTM-ED) for multi-step-ahead forecasting. To evaluate the impact of the ED structure on the forecasting results, a recursive strategy-based long short-term memory neural network (RLSTM) was used as the benchmark model. To reduce the impact of the inherent randomness of neural network models on the results, 20 rounds of experiments were conducted for each model, and the mean values across the 20 rounds were adopted as the model's predicted values. The model inputs are the observations listed in Table S1 for the preceding 24 h at a 4-h time step, and the outputs are forecasts of DO, pH, WT, and discharge (Q) over the 24-h forecast horizon. The forecasting results of the RLSTM-ED and RLSTM models are shown in Table 1. Overall, the forecasting accuracy of both models decreased as the forecast horizon increased.
Performance of the multi-step-ahead forecasting models for water environment indicators at Xiantao Station
Period | Model | Indicator | Metric | t + 1 | t + 2 | t + 3 | t + 4 | t + 5 | t + 6
---|---|---|---|---|---|---|---|---|---
Training | RLSTM-ED | DO | NSE | 0.990 | 0.988 | 0.984 | 0.981 | 0.976 | 0.972
Training | RLSTM-ED | DO | RMSE (mg/L) | 0.131 | 0.148 | 0.167 | 0.185 | 0.204 | 0.223
Training | RLSTM-ED | pH | NSE | 0.984 | 0.978 | 0.972 | 0.966 | 0.960 | 0.954
Training | RLSTM-ED | pH | RMSE | 0.017 | 0.020 | 0.022 | 0.025 | 0.027 | 0.029
Training | RLSTM-ED | WT | NSE | 0.991 | 0.986 | 0.982 | 0.978 | 0.973 | 0.969
Training | RLSTM-ED | WT | RMSE (°C) | 0.451 | 0.543 | 0.624 | 0.697 | 0.763 | 0.824
Training | RLSTM-ED | Q | NSE | 0.985 | 0.976 | 0.967 | 0.956 | 0.945 | 0.932
Training | RLSTM-ED | Q | RMSE (m³/s) | 40 | 50 | 59 | 68 | 76 | 84
Training | RLSTM | DO | NSE | 0.980 | 0.974 | 0.968 | 0.961 | 0.954 | 0.946
Training | RLSTM | DO | RMSE (mg/L) | 0.190 | 0.213 | 0.237 | 0.261 | 0.285 | 0.308
Training | RLSTM | pH | NSE | 0.983 | 0.978 | 0.972 | 0.966 | 0.959 | 0.952
Training | RLSTM | pH | RMSE | 0.017 | 0.020 | 0.023 | 0.025 | 0.027 | 0.029
Training | RLSTM | WT | NSE | 0.981 | 0.978 | 0.975 | 0.970 | 0.962 | 0.953
Training | RLSTM | WT | RMSE (°C) | 0.639 | 0.696 | 0.744 | 0.815 | 0.908 | 1.016
Training | RLSTM | Q | NSE | 0.948 | 0.943 | 0.935 | 0.926 | 0.917 | 0.907
Training | RLSTM | Q | RMSE (m³/s) | 74 | 77 | 83 | 88 | 94 | 99
Validation | RLSTM-ED | DO | NSE | 0.983 | 0.977 | 0.970 | 0.963 | 0.954 | 0.945
Validation | RLSTM-ED | DO | RMSE (mg/L) | 0.085 | 0.098 | 0.112 | 0.125 | 0.137 | 0.150
Validation | RLSTM-ED | pH | NSE | 0.995 | 0.995 | 0.994 | 0.993 | 0.992 | 0.991
Validation | RLSTM-ED | pH | RMSE | 0.005 | 0.005 | 0.006 | 0.006 | 0.006 | 0.007
Validation | RLSTM-ED | WT | NSE | 0.978 | 0.969 | 0.959 | 0.949 | 0.940 | 0.931
Validation | RLSTM-ED | WT | RMSE (°C) | 0.494 | 0.590 | 0.675 | 0.750 | 0.818 | 0.878
Validation | RLSTM-ED | Q | NSE | 0.996 | 0.993 | 0.989 | 0.985 | 0.980 | 0.974
Validation | RLSTM-ED | Q | RMSE (m³/s) | 12 | 16 | 20 | 24 | 27 | 31
Validation | RLSTM | DO | NSE | 0.980 | 0.972 | 0.964 | 0.955 | 0.945 | 0.934
Validation | RLSTM | DO | RMSE (mg/L) | 0.092 | 0.108 | 0.123 | 0.137 | 0.151 | 0.165
Validation | RLSTM | pH | NSE | 0.989 | 0.988 | 0.988 | 0.987 | 0.987 | 0.985
Validation | RLSTM | pH | RMSE | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.009
Validation | RLSTM | WT | NSE | 0.964 | 0.959 | 0.953 | 0.947 | 0.939 | 0.930
Validation | RLSTM | WT | RMSE (°C) | 0.632 | 0.676 | 0.721 | 0.769 | 0.822 | 0.881
Validation | RLSTM | Q | NSE | 0.956 | 0.953 | 0.948 | 0.942 | 0.936 | 0.929
Validation | RLSTM | Q | RMSE (m³/s) | 40 | 41 | 43 | 46 | 48 | 51
Testing | RLSTM-ED | DO | NSE | 0.981 | 0.973 | 0.963 | 0.951 | 0.937 | 0.921
Testing | RLSTM-ED | DO | RMSE (mg/L) | 0.102 | 0.123 | 0.144 | 0.166 | 0.188 | 0.210
Testing | RLSTM-ED | pH | NSE | 0.926 | 0.904 | 0.884 | 0.863 | 0.843 | 0.823
Testing | RLSTM-ED | pH | RMSE | 0.014 | 0.015 | 0.017 | 0.019 | 0.020 | 0.021
Testing | RLSTM-ED | WT | NSE | 0.997 | 0.994 | 0.990 | 0.986 | 0.982 | 0.978
Testing | RLSTM-ED | WT | RMSE (°C) | 0.211 | 0.320 | 0.403 | 0.474 | 0.537 | 0.594
Testing | RLSTM-ED | Q | NSE | 0.996 | 0.988 | 0.980 | 0.971 | 0.961 | 0.951
Testing | RLSTM-ED | Q | RMSE (m³/s) | 21 | 36 | 47 | 56 | 65 | 73
Testing | RLSTM | DO | NSE | 0.918 | 0.894 | 0.863 | 0.826 | 0.782 | 0.733
Testing | RLSTM | DO | RMSE (mg/L) | 0.213 | 0.243 | 0.276 | 0.312 | 0.348 | 0.385
Testing | RLSTM | pH | NSE | 0.874 | 0.861 | 0.827 | 0.807 | 0.794 | 0.780
Testing | RLSTM | pH | RMSE | 0.018 | 0.019 | 0.021 | 0.022 | 0.023 | 0.024
Testing | RLSTM | WT | NSE | 0.964 | 0.962 | 0.960 | 0.916 | 0.912 | 0.907
Testing | RLSTM | WT | RMSE (°C) | 0.763 | 0.780 | 0.807 | 1.167 | 1.196 | 1.233
Testing | RLSTM | Q | NSE | 0.947 | 0.937 | 0.928 | 0.920 | 0.912 | 0.885
Testing | RLSTM | Q | RMSE (m³/s) | 75 | 82 | 88 | 93 | 97 | 116
The NSE values yielded by the RLSTM-ED model during the training phase range from 0.932 to 0.991, while those of the RLSTM model range from 0.907 to 0.983. The RLSTM-ED model matches or exceeds the RLSTM model at every horizon from t + 1 to t + 6 (the two models perform almost identically for pH), which is consistent with its lower or equal RMSE values. Moreover, the NSE values of the RLSTM-ED model exceed 0.95 except for discharge at horizons t + 5 and t + 6. Similarly, the RLSTM-ED model outperforms the RLSTM model at all forecast horizons during the validation phase. This indicates that the RLSTM-ED model fits the data well and can capture the nonlinear relationships among the multiple input and output factors at Xiantao Station.
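For reference, the two evaluation metrics in Table 1 can be computed as below, using their standard definitions. The improvement-rate function is an assumption: it takes the relative change with respect to the RLSTM baseline, counting an increase as an improvement for NSE and a decrease as an improvement for RMSE.

```python
import math
from typing import Sequence

def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (sum of squared deviations of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the variable."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def improvement_rate(model: float, baseline: float, higher_is_better: bool = True) -> float:
    """Assumed relative improvement (%) of `model` over `baseline`."""
    gain = (model - baseline) if higher_is_better else (baseline - model)
    return 100.0 * gain / abs(baseline)

# Testing phase, DO at horizon t + 6 (Table 1): RLSTM-ED vs RLSTM.
print(round(improvement_rate(0.921, 0.733), 1))         # NSE gain: 25.6
print(round(improvement_rate(0.210, 0.385, False), 1))  # RMSE reduction: 45.5
```

An NSE of 1 means a perfect fit, and an NSE of 0 means the forecast is no better than predicting the observed mean at every step.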
The RLSTM-ED model also achieves more favorable forecasting accuracy at all horizons in the testing phase. The NSE values of all four water environment indicators exceed 0.9 at every horizon except for pH from horizon t + 3 to t + 6. In contrast, the RLSTM model is less accurate: its NSE values for pH are below 0.9 at all six horizons, and at horizons t + 5 and t + 6 the NSE values of DO and pH fall below 0.8. The forecasting accuracy of the RLSTM model also declines relatively quickly as the horizon increases; the NSE value of DO at horizon t + 6 is only 0.733, compared with 0.918 at horizon t + 1. Thus, the RLSTM-ED model outperforms the RLSTM model in the testing phase, consistent with the conclusions drawn in the training phase.
Improvement rates in terms of (a) NSE and (b) RMSE of the RLSTM-ED model for the multi-step-ahead forecast at the Xiantao Station during the testing phase, compared with the RLSTM model. Improvement rate of each indicator = .
Normalized Taylor diagrams of the predicted means of the models for 20 rounds in the t + 1, t + 3, and t + 6 horizons during the testing stages.
The normalized standard deviation (STD) of the point representing the observations is 1 (as shown in Figure 4). The closer a model's point is to the observation point, the better the model reproduces the observations. For the four water environment indicators (Q, WT, DO, and pH), the normalized STDs yielded by the RLSTM-ED model range from 0.807 to 1.006, closer to the ideal value of 1 than those yielded by the RLSTM model. This finding is consistent with the higher correlation coefficient (CC) and lower normalized centered root mean square error (CRMSE) obtained by the RLSTM-ED model. However, the points for both models deviate to varying degrees from the observation point as the forecast horizon increases. Overall, the RLSTM-ED model's points at the three horizons lie closer to the observation point than those of the RLSTM model, indicating that the ED framework improves forecast accuracy and reduces the discrepancy between forecasts and observations.
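The three quantities read off a normalized Taylor diagram can be computed as follows. This sketch uses population (1/n) statistics and normalizes by the standard deviation of the observations, so the observation point sits at STD = 1, CC = 1, CRMSE = 0; the choice of 1/n rather than 1/(n - 1) is an assumption and does not affect the geometry.

```python
import math
from typing import Sequence, Tuple

def taylor_stats(obs: Sequence[float], sim: Sequence[float]) -> Tuple[float, float, float]:
    """Return (normalized STD, correlation coefficient CC, normalized centered RMSE)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cc = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * so * ss)
    # Centered RMSE: RMSE after removing each series' mean (bias excluded).
    crmse = math.sqrt(sum(((s - ms) - (o - mo)) ** 2 for o, s in zip(obs, sim)) / n)
    return ss / so, cc, crmse / so

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.2, 1.9, 3.3, 3.8, 5.4]
sd, cc, cr = taylor_stats(obs, sim)
print(round(sd, 3), round(cc, 3), round(cr, 3))
```

The diagram works because the three statistics obey a law-of-cosines relation, CRMSE'² = STD'² + 1 − 2 · STD' · CC, so a single point encodes all three.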
The line plots and kernel density curves of the predicted mean values of the models over 20 rounds at horizon t + 6.
Simulation and multi-step-ahead forecasts of algal density using BP models
As shown in Section 4.1, the RLSTM-ED model forecasts the four water environment indicators (Q, WT, DO, and pH) accurately from horizon t + 1 to t + 6. Therefore, an algal density forecasting model can be built by feeding the RLSTM-ED forecasts into the BP neural network; this model is abbreviated as RLSTM-ED-BP. In addition, a BP neural network model driven by the predictions of the RLSTM model (referred to as RLSTM-BP) is used as a benchmark for comparison.
Performance of the RLSTM-BP and RLSTM-ED-BP models (each model performed 20 rounds) for the 24-h forecast horizon during the testing phase.
As shown in Figure 7, the violin plots of the two models' forecasts differ markedly in shape. The violin plots for the RLSTM-ED-BP model are much 'shorter and wider' than those for the RLSTM-BP model, reflecting smaller interquartile and min–max ranges across the 20 rounds of experiments. Hence, the outcomes of the 20 rounds provide evidence that the RLSTM-ED-BP model is more stable. In addition, compared with the RLSTM-BP model, the RLSTM-ED-BP model yields higher NSE values and lower RMSE values at all horizons, demonstrating superior forecasting performance.
In practice, decision-makers may pay more attention to whether or not an algal bloom occurs. The TS score is used to evaluate the proportion of correctly predicted occurrences of algal blooms out of the total number of forecasts. It ranges between 0 and 1, with a higher score indicating a higher accuracy in predicting algal bloom events. The TS score yielded by the RLSTM-ED-BP model is higher than the corresponding values yielded by the RLSTM-BP model (as shown in Figure 7). The RLSTM-ED-BP model demonstrates superior capability in correctly predicting algal bloom events.
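A sketch of the threat score on binary bloom/no-bloom series. It assumes the conventional forecast-verification definition, TS = hits / (hits + misses + false alarms); the bloom/no-bloom labels would come from comparing algal density with a class threshold, which is not modeled here.

```python
from typing import Sequence

def threat_score(obs_bloom: Sequence[bool], pred_bloom: Sequence[bool]) -> float:
    """Threat score (critical success index): hits / (hits + misses + false alarms)."""
    pairs = list(zip(obs_bloom, pred_bloom))
    hits = sum(o and p for o, p in pairs)
    misses = sum(o and not p for o, p in pairs)
    false_alarms = sum(p and not o for o, p in pairs)
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

obs = [True, True, False, False, True]
pred = [True, False, False, True, True]
print(threat_score(obs, pred))  # 0.5
```

Correct negatives (no bloom forecast, no bloom observed) do not enter the formula, so the score is not inflated by the many time steps without a bloom.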
The mean values of the algal density forecasts from 20 rounds of experiments with the two hybrid models during the testing phase are shown in Table 2. Overall, the forecasting accuracy of both the RLSTM-ED-BP and RLSTM-BP models decreased as the forecast horizon increased. The RLSTM-ED-BP model yields NSE and TS values above 0.95 at horizons t + 1 to t + 6, clearly higher than the corresponding values of the RLSTM-BP model. This finding is consistent with the lower RMSE values of the RLSTM-ED-BP model.
The mean values of algal density forecasting from 20 rounds of experiments with the two hybrid models during the testing phase
Station | Model | Metric | t + 1 | t + 2 | t + 3 | t + 4 | t + 5 | t + 6
---|---|---|---|---|---|---|---|---
Xiantao | RLSTM-ED-BP | TS | 0.962 | 0.961 | 0.959 | 0.959 | 0.959 | 0.958
Xiantao | RLSTM-ED-BP | NSE | 0.989 | 0.984 | 0.978 | 0.973 | 0.967 | 0.961
Xiantao | RLSTM-ED-BP | RMSE | 74.429 | 91.202 | 105.590 | 117.922 | 129.145 | 141.015
Xiantao | RLSTM-BP | TS | 0.909 | 0.909 | 0.907 | 0.905 | 0.904 | 0.901
Xiantao | RLSTM-BP | NSE | 0.935 | 0.934 | 0.933 | 0.932 | 0.929 | 0.925
Xiantao | RLSTM-BP | RMSE | 183.356 | 183.594 | 185.762 | 185.871 | 189.514 | 194.724
In summary, compared to the RLSTM-BP model, the RLSTM-ED-BP model achieves superior forecasting accuracy and stability.
Algal bloom early warning analysis using the RLSTM-ED-BP model
As shown in Section 4.2, the RLSTM-ED-BP model reproduces algal density more accurately. Because algal bloom classes are defined by algal density, the predicted densities can be used directly for early warning. Therefore, this study conducted an early warning analysis of algal bloom events in 2018 and 2021 based on the algal densities predicted by the RLSTM-ED-BP model (Table 3). In both years, the predicted algal densities fall within the range corresponding to the observed bloom class (Ⅳ). The proposed forecasting and early warning framework correctly warns of the moderate bloom event in late January 2021, providing a 24-h lead time for scheduling water projects to prevent and control algal blooms.
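Mapping a forecast density to a bloom class and a warning decision can be sketched with a sorted-threshold lookup. The four class boundaries, the rule that a value equal to a boundary goes to the higher class, and the class at which a warning is triggered are all placeholders for illustration, not the study's criteria.

```python
import bisect
from typing import Iterable, Sequence

CLASSES = ("I", "II", "III", "IV", "V")

def bloom_class(density: float, thresholds: Sequence[float]) -> str:
    """Return the bloom class for one algal density value.

    thresholds: four ascending class boundaries (I/II, II/III, III/IV, IV/V).
    A value equal to a boundary is assigned to the higher class (an assumption).
    """
    if len(thresholds) != 4 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected four ascending thresholds")
    return CLASSES[bisect.bisect_right(thresholds, density)]

def early_warning(densities: Iterable[float], thresholds: Sequence[float],
                  warn_from: str) -> bool:
    """True if any forecast horizon reaches class `warn_from` or higher."""
    level = CLASSES.index(warn_from)
    return any(CLASSES.index(bloom_class(d, thresholds)) >= level for d in densities)

PLACEHOLDER_THRESHOLDS = (1.0, 2.0, 3.0, 4.0)  # hypothetical values, not the study's
print(early_warning([0.5, 1.5, 3.2], PLACEHOLDER_THRESHOLDS, warn_from="IV"))  # True
```

Checking every horizon from t + 1 to t + 6 means a warning is raised as soon as any point in the next 24 h reaches the trigger class.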
Early warning analysis of algal bloom events at the Xiantao Station
Item | Training phase (2018) | Testing phase (2021) |
---|---|---|
Algal bloom class | Ⅳ | Ⅳ |
Observed algal density | 1.0–3.5 | 1.0–2.0 |
Simulated (2018) or forecast (2021) algal density | 1.0–2.8 | 1.0–2.6 |
Early warning issued | Yes | Yes |

Algal bloom classes, in order of increasing algal density: no algal bloom (I), no significant algal bloom (Ⅱ), mild algal bloom (Ⅲ), moderate algal bloom (Ⅳ), and severe algal bloom (Ⅴ).
DISCUSSION
The phytoplankton community in the Hanjiang River is diverse. Identifying the dominant species can, to some extent, reveal the pollution level and water environment conditions of a water body. Pan et al. (2014) studied the phytoplankton community structure during spring algal blooms in the Hanjiang River and found that, at the Xiantao Station, Bacillariophyta accounted for the largest proportion of species (51.29%), followed by Chlorophyta (28.21%); Cyanophyta and Cryptophyta accounted for relatively few species (7.69% each); and Pyrrophyta and Euglenophyta, with only one or two species, accounted for the smallest proportion. Mai et al. (2020) reached the same conclusion in a similar analysis of the Hanjiang River phytoplankton, finding Bacillariophyta dominant in number of species. Moreover, 18S rRNA gene sequence analysis identified the diatom Stephanodiscus hantzschii as the dominant phytoplankton species (Zheng et al. 2009).
Algal bloom occurrences in the Hanjiang River result from the interplay of various environmental factors: nutrients such as nitrogen and phosphorus are fundamental prerequisites, climatic conditions act as inducers, and hydrodynamic conditions are the primary drivers (Imteaz & Asaeda 2000; Neal et al. 2006). The accumulation of algal biomass in rivers requires high nutrient concentrations, suitable water temperatures, and relatively low flow (Hilton et al. 2006; Li et al. 2021; Xiao et al. 2024). Diatom blooms have predominantly occurred in January–March, coinciding with the low-temperature season (Liu et al. 2022a), which suggests that late winter and early spring conditions favor diatom growth. In 2022, however, algal blooms occurred during the flood season in the lower Hanjiang River, with cyanobacteria as the dominant species. Unlike diatoms, cyanobacteria are adapted to higher temperatures and thrive at 30–35°C (Cheng et al. 2019). This shift underscores the dynamic nature of algal bloom occurrences and the influence of changing environmental conditions on phytoplankton community composition.
Eutrophication, driven by excess nutrients such as nitrogen and phosphorus, is a critical environmental concern. Commonly used eutrophication thresholds are TN = 0.2 mg/L and TP = 0.02 mg/L (Xia et al. 2020). At the Xiantao Station, TN and TP concentrations during the dry periods of 2017–2021 (Table S1) ranged from 1.279 to 2.256 mg/L and from 0.05 to 0.157 mg/L, respectively, exceeding both thresholds. This agrees with earlier findings that nutrient levels in the Hanjiang River satisfy the essential conditions for diatom growth (Xin et al. 2020).
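Dividing the observed concentration ranges by the thresholds shows the size of the exceedance:

$$
\frac{1.279}{0.2} \approx 6.4,\quad \frac{2.256}{0.2} \approx 11.3 \;\;(\mathrm{TN}); \qquad
\frac{0.05}{0.02} = 2.5,\quad \frac{0.157}{0.02} \approx 7.9 \;\;(\mathrm{TP})
$$

That is, TN was roughly 6 to 11 times its threshold and TP roughly 2.5 to 8 times its threshold.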
The median early-spring (January–March) flow at the Xiantao gaging station in 2014–2018 was significantly lower than in 1992–2003 (Xin et al. 2020). Because lower flow favors algal growth (Kim et al. 2022), this is consistent with the observed increase in algal bloom frequency: nine algal bloom events occurred in the 23 years before 2014, compared with five in the 9 years after 2014. This increase is attributed to the full operation of the Xinglong Reservoir and the impoundment of the South-to-North Water Diversion Project around 2014 (Xin et al. 2020). Human activities, including dam construction and water extraction, have significantly altered the hydrological environment of the lower Hanjiang River (Zhou et al. 2013; Mei et al. 2016; Zhang et al. 2022; Zhu et al. 2023). Streamflow strongly affects nutrient and algal accumulation: low streamflow promotes algal aggregation and reproduction and prolongs the presence of nutrients, both of which are critical for algal growth. In 2022, reduced streamflow during the flood season was a key factor in the algal blooms in the lower Hanjiang River, creating stagnant water and elevated nutrient concentrations conducive to algal proliferation. Understanding these dynamics is vital for effective environmental management and mitigation.
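Expressed as annual frequencies, the event counts before and after 2014 compare as follows:

$$
f_{\text{before}} = \frac{9}{23} \approx 0.39\ \text{events/yr}, \qquad
f_{\text{after}} = \frac{5}{9} \approx 0.56\ \text{events/yr}, \qquad
\frac{f_{\text{after}}}{f_{\text{before}}} = \frac{5 \cdot 23}{9 \cdot 9} = \frac{115}{81} \approx 1.42
$$

so the annual frequency of bloom events after 2014 was about 1.4 times that before 2014.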
A confluence of environmental factors contributes to the occurrence of algal blooms in rivers. Algal density, which directly reflects the quantity of algae per unit volume of water, is a key indicator of bloom severity, so a model that accurately forecasts algal density is central to early warning. However, algal monitoring data are scarce, which hinders the development of robust forecasting models and limits the application of process-based physical models, making accurate algal bloom forecasts and warnings under limited data conditions challenging. The proposed machine learning-based forecasting and early warning model makes the most of the existing data. Its RLSTM-ED component, tailored to water environment indicators, handles multiple inputs and multiple outputs. The ED structure improves the model's predictive and generalization capability and captures complex nonlinear relationships between input and output factors. With a recursive strategy, the model feeds the value predicted at time t back as additional input for predicting the value at time t + 1. Updating the inputs in this way creates a dynamic mapping between inputs and outputs, allowing the model to follow changing water environment conditions and improving forecasting accuracy. The strategy enables simultaneous multi-step-ahead forecasting of four key water environment indicators: pH, dissolved oxygen (DO), streamflow (Q), and water temperature (WT).
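The recursive strategy can be illustrated with a generic iterated multi-step forecaster. In this minimal sketch, `one_step` stands for any trained one-step model that maps a window of past [pH, DO, Q, WT] vectors to the next vector; `toy_one_step` is a hypothetical linear-extrapolation stand-in, not the RLSTM-ED network, and the history values are toy numbers. With 4-h steps, `horizon=6` covers t + 1 to t + 6 (24 h).

```python
from collections import deque
from typing import Callable, List, Sequence

Vector = List[float]  # one 4-h time step: [pH, DO, Q, WT]


def recursive_forecast(
    history: Sequence[Vector],
    one_step: Callable[[List[Vector]], Vector],
    window: int,
    horizon: int = 6,
) -> List[Vector]:
    """Iterated multi-step forecast: each prediction is appended to the input
    window and used as input for the next step (t+1, t+2, ..., t+horizon)."""
    if len(history) < window:
        raise ValueError("history is shorter than the input window")
    # Fixed-length window of the most recent steps; appending drops the oldest.
    buf = deque((list(v) for v in history[-window:]), maxlen=window)
    forecasts: List[Vector] = []
    for _ in range(horizon):
        y_hat = one_step(list(buf))  # predict the next step from the current window
        forecasts.append(y_hat)
        buf.append(list(y_hat))      # feed the prediction back as an input
    return forecasts


def toy_one_step(win: List[Vector]) -> Vector:
    """Hypothetical stand-in for a trained one-step model:
    linear extrapolation from the last two steps."""
    last, prev = win[-1], win[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]


history = [[8.0, 9.0, 500.0, 10.0], [8.25, 9.5, 490.0, 10.5]]  # toy values
for k, step in enumerate(recursive_forecast(history, toy_one_step, window=2, horizon=3), 1):
    print(f"t+{k}:", step)
# t+1: [8.5, 10.0, 480.0, 11.0]
# t+2: [8.75, 10.5, 470.0, 11.5]
# t+3: [9.0, 11.0, 460.0, 12.0]
```

Because every forecast becomes an input for the next step, errors at t + 1 propagate to later steps; this is the error propagation that the ED structure is intended to alleviate.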
Because of data constraints, not all factors influencing algal blooms could be included as explanatory variables. Nevertheless, the BP model effectively established a nonlinear mapping between the input variables (pH, DO, Q, and WT) and the output variable (algal density). These water environment indicators are notably correlated with algal density (Table S2), which contributes substantially to model performance. The RLSTM-ED-BP model predicts algal density more accurately than the RLSTM-BP model; the ED structure improves algal density forecasts by making the predicted water environment indicators more precise. The RLSTM-ED-BP hybrid model achieves, for the first time, short-term (24-h) intraday algal density forecasting and algal bloom warning. With intraday forecasts and warnings, management authorities can promptly adjust reservoir outflow to flush and dilute algae hydrodynamically, mitigating the risk of algal blooms. Such dynamic regulation offers a more precise, scientifically grounded strategy for managing sudden algal bloom events and thereby helps safeguard aquatic ecosystem stability. This study is limited by the lack of monitoring data across the entire river basin: focusing solely on the Xiantao section may not fully capture the algal bloom dynamics of the whole Hanjiang River. Even so, the results confirm the robustness and feasibility of the proposed forecasting and early warning model under limited data conditions, which is promising for predicting and proactively managing algal blooms where data are scarce.
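A BP network of the kind described above can be sketched as a one-hidden-layer perceptron trained by error backpropagation. The architecture and training settings here (eight tanh hidden units, a linear output, per-sample gradient descent with learning rate 0.05) are assumptions for illustration, not the configuration used in this study, and `TinyBP` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinyBP:
    """One-hidden-layer back-propagation network: tanh hidden units, linear output,
    trained by per-sample gradient descent on 0.5 * squared error."""

    def __init__(self, n_in: int = 4, n_hidden: int = 8, lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def train_step(self, x: Sequence[float], target: float) -> float:
        """Forward pass, back-propagate the error, update weights; returns squared error."""
        h, y = self._forward(x)
        err = y - target  # dL/dy for L = 0.5 * err**2
        for j, hj in enumerate(h):
            # Gradient at hidden unit j's pre-activation, using w2[j] before its update.
            delta = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            self.b1[j] -= self.lr * delta
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * delta * xi
        self.b2 -= self.lr * err
        return err * err
```

In the hybrid pipeline, the input `x` would be a standardized [pH, DO, Q, WT] vector (observed during training, forecast by RLSTM-ED at each horizon during forecasting) and the target a scaled algal density, so the BP stage itself needs no recursion.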
CONCLUSION
Accurate forecasting and early warning systems for river algal blooms help decision-makers devise effective strategies for algal bloom prevention and control. This study proposes a novel hybrid RLSTM-ED-BP model for short-term (intraday) algal bloom forecasting and early warning, with a case study of the Hanjiang River Basin in China. The proposed ED structure captures complex nonlinear relationships between water environment indicators and their impact factors more effectively, and the recursive strategy enables simultaneous multi-step-ahead forecasting of four water environment indicators. Compared with the benchmark RLSTM-BP model, the RLSTM-ED-BP model notably improves the reliability and stability of algal density forecasts. The key findings are as follows:
a. In the testing period, the RLSTM-ED model outperforms the RLSTM model, improving NSE by more than 5% and reducing RMSE by more than 10% at horizons t + 1 to t + 6 (4–24 h). It alleviates the error propagation and overfitting that limit traditional artificial neural network models in multi-step-ahead forecasting.
b. The BP model effectively establishes a nonlinear mapping between the water environment indicators (WT, pH, DO, and Q) and algal density. The RLSTM-ED-BP model outperforms the RLSTM-BP model, with NSE and TS values above 0.95 and more accurate and stable algal density forecasts. The ED framework improves the precision of algal density forecasts by refining the predicted values of the water environment indicators.
c. The RLSTM-ED-BP model accurately predicts algal density, enabling an effective early warning of the moderate algal bloom event in late January 2021 at the Xiantao Station on the lower Hanjiang River. The 24-h forecast horizon provides valuable decision-making time for scheduling water projects to prevent and control algal blooms.
The novel hybrid RLSTM-ED-BP model proposed in this study can help reduce the risk of algal blooms and offers essential technical support for decision-makers crafting prevention and control strategies. Future research could explore year-round algal bloom forecasting and early warning by integrating water quality sampling data collected during the flood season. Integrating algal bloom forecasting and early warning with water project scheduling for comprehensive prevention and control is another promising direction.
ACKNOWLEDGEMENTS
This work was supported by the National Key Research and Development Program of China (No. 2021YFC3200303). The authors would like to thank the Editors and anonymous Reviewers for their constructive comments that greatly contributed to improving the manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.