ABSTRACT
Runoff prediction is a cornerstone of the effective management, allocation, and utilization of water resources and plays a key role in hydrological research. This study employs a recently reported deep learning model, Mamba, to forecast daily river runoff and compares it with various benchmark methods, including statistical models, machine learning methods, recurrent neural networks, and attention-based models. The models are applied at three hydrological stations situated along the middle and lower reaches of the Mississippi River. Daily runoff from 1983 to 2023 was used to build the models for 7-day prediction. The findings demonstrate the superiority of the Mamba model over its counterparts, showcasing its potential as a backbone model. In response to the need for a more lightweight approach, a refined variant of the Mamba model, called LightMamba, is proposed. LightMamba incorporates partial normalization and Multi-Path-Mamba (MPM) to better discern nonlinear trends and capture long-term dependencies in streamflow data. Notably, LightMamba achieves commendable performance, with average NSE values of 0.904, 0.907, and 0.900 at the three stations. This study introduces an innovative backbone model for time series forecasting and offers a novel direction for hybrid modeling in future daily runoff prediction.
HIGHLIGHTS
The effectiveness of a Mamba model for predicting long-term daily runoff is explored.
LightMamba is proposed as a lightweight model with partial normalization and an MPM module.
The proposed model was compared with various benchmark methods and the results were analyzed.
Results show that LightMamba generally performs well.
INTRODUCTION
Runoff forecasting plays a critical role in water resources planning and management (Liu et al. 2024), and high-precision runoff forecasts are essential for water supply, flood control, and power generation (Williams et al. 2021). Influenced by natural meteorological conditions, watershed characteristics, and human activities, the runoff sequence is characterized by nonlinearity, stochasticity, and periodicity (Sharafati et al. 2020). The search for more accurate runoff forecasting methods has been a major concern of scholars.
Existing runoff prediction methods generally fall into two broad categories: physically driven models and data-driven models (Nourani et al. 2021). Physically driven models help us understand the underlying physics of hydrologic processes. They construct complex mathematical models, such as high-dimensional partial differential equations, from data on hydrology, meteorology, topography, soil properties, and vegetation cover index to simulate or predict river runoff (Farmer et al. 2003; Fenicia et al. 2008; Bai et al. 2009). However, such models are difficult to build, and their parameter estimation is complex.
Unlike physically driven models, data-driven models focus only on the relationship between inputs and outputs (Kumar et al. 2022) and are easier to implement. As data-driven models continue to evolve, they can be further categorized into statistical models, dynamic and stochastic models, machine learning methods, and deep learning models. Statistical models use statistical correlation to predict future runoff from historical runoff observations; a typical example is the Autoregressive Integrated Moving Average (ARIMA) model (Farajzadeh et al. 2014). Statistical models can accurately capture the linear relationship between historical and future data, but they cannot account for the effects of external factors on the series under consideration, and their ability to capture nonlinear relationships between time series is weak. Stochastic methods model time series as stochastic processes (Dimitriadis et al. 2021) and can describe complex nonlinear relationships in a system; however, estimating the parameters of such processes is difficult.
Machine learning models can fit nonlinear relationships between inputs and outputs (Mosavi et al. 2018) and can therefore be used for short-term runoff prediction. A variety of machine learning methods have been proposed for runoff prediction, including k-nearest neighbor (KNN) regression (Yang et al. 2020), support vector regression (SVR) (Bigdeli et al. 2023), and eXtreme Gradient Boosting (XGBoost) (Szczepanek 2022), among others. However, these methods have limited ability to extract deep information and cannot obtain prior information about the sequence order, so temporal information must be added to the model through feature engineering (Dwivedi et al. 2022). Moreover, their predictive performance decreases rapidly as the dimensionality of the input features and the number of prediction steps increase.
With the rapid development of artificial intelligence and computing capability (e.g., GPUs), deep learning models are widely used in various tasks because they can process high-dimensional inputs. Owing to their inherently sequential structure, recurrent neural networks (RNNs) excel at sequence tasks (Nosouhian et al. 2021). Long Short-Term Memory (LSTM) alleviates the vanishing and exploding gradient problems of RNNs, and the Gated Recurrent Unit (GRU) further improves training efficiency; these two variants are therefore often used as base models for runoff prediction (Xiang et al. 2020; Man et al. 2023; Yao et al. 2023). However, because of their recurrent structure, these models can only be trained recursively, and training becomes inefficient when the input sequence is long.
The Transformer (Vaswani et al. 2017) is widely used in natural language processing and computer vision, and it has also been applied to runoff prediction (Yin et al. 2022; Fang et al. 2024). Its core is the self-attention mechanism, which removes the sequential dependency of traditional recurrent neural networks. The Informer model (Zhou et al. 2021), which improves the efficiency of the Transformer for time series prediction, has been applied to runoff prediction with good results (Du et al. 2022). However, the attention mechanism suffers from quadratic complexity in the input sequence length and from low inductive bias (Neyshabur 2020).
Recently, state space models (SSMs) have attracted a lot of attention in the NLP and CV communities due to their efficacy in long-range sequence modeling. Mamba introduces a selection mechanism into SSMs, giving them the ability to perform context-aware selection and capture long-range correlations (Gu & Dao 2023). It employs scanning operations for parallel computation during training while still maintaining linear complexity, allowing Mamba-based models to outperform transformers. As a sequence model, Mamba also has great potential for time series prediction tasks, which triggers our interest in exploring Mamba for runoff prediction.
The main contributions of this study can be categorized into three areas:
The Mamba model is used to predict runoff data, exploring its potential application as a backbone model.
A new deep learning prediction model, LightMamba, is proposed. The Multi-Path-Mamba (MPM) module is used to reduce the number of parameters of the Mamba model, and the input sequences are partially normalized to improve prediction accuracy for non-stationary time series such as runoff data.
We tested our model against several benchmark models using four evaluation metrics, and the results show that our model performs better.
MATERIALS AND METHODS
Problem statement
In the time series forecasting task, the input to the model is $X = \{p_1, p_2, \dots, p_L\} \in \mathbb{R}^{L \times D}$, where $p_i \in \mathbb{R}^D$. The model predicts the future sequence $Y = \{p_{L+1}, \dots, p_{L+T}\} \in \mathbb{R}^{T \times D}$ from the input $X$. Here $L$ and $T$ are the lengths of the time windows for the past input and the future prediction, referred to as the retrospective window and the predictive horizon, respectively; $p_i$ is the variable vector at time step $i$ and $D$ is its dimension.
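As a concrete illustration of this setup, the following NumPy sketch builds (retrospective window, predictive horizon) samples from a daily runoff series with $L = 30$ and $T = 7$; the windowing itself is standard, and the function name is ours:

```python
import numpy as np

def make_windows(series, L=30, T=7):
    """Slice a series into inputs of the past L steps and targets of the next T steps."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:                       # univariate case: D = 1
        series = series[:, None]
    X, Y = [], []
    for i in range(len(series) - L - T + 1):
        X.append(series[i:i + L])              # retrospective window
        Y.append(series[i + L:i + L + T])      # predictive horizon
    return np.stack(X), np.stack(Y)            # shapes (n, L, D) and (n, T, D)
```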
State space model
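For reference, the state space model underlying Mamba-style blocks follows the standard linear formulation and its zero-order-hold discretization; the symbols below follow the common SSM convention rather than necessarily this paper's exact notation:

$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t),$$
$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B,$$
$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t.$$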
Mamba
Algorithm 1 Mamba Block with Selective SSM
Input: input sequence X; Output: output sequence Y. The steps follow the standard Mamba block (Gu & Dao 2023): linear input projection into a main branch and a gating branch; causal 1-D convolution with SiLU activation; input-dependent generation of the step size Δ and the SSM parameters B and C; discretization of (A, B) with Δ; the selective scan over the sequence; SiLU gating; linear output projection; return Y.
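As a concrete illustration of the block above, the following is a minimal PyTorch sketch of a Mamba-style selective SSM block. It uses a plain sequential scan for readability rather than the hardware-efficient parallel scan of the reference implementation, and the hyperparameters (`d_state`, `d_conv`, `expand`) are illustrative defaults, not the settings used in this study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaBlockSketch(nn.Module):
    """Minimal Mamba-style block with a selective SSM (illustrative sketch)."""

    def __init__(self, d_model, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.d_inner = expand * d_model
        self.in_proj = nn.Linear(d_model, 2 * self.d_inner)          # main branch and gate z
        self.conv1d = nn.Conv1d(self.d_inner, self.d_inner, d_conv,
                                groups=self.d_inner, padding=d_conv - 1)
        self.x_proj = nn.Linear(self.d_inner, 1 + 2 * d_state)       # -> (step size, B, C)
        self.dt_proj = nn.Linear(1, self.d_inner)                    # broadcast step size
        self.A_log = nn.Parameter(
            torch.log(torch.arange(1, d_state + 1).float()).repeat(self.d_inner, 1))
        self.D = nn.Parameter(torch.ones(self.d_inner))              # skip connection
        self.out_proj = nn.Linear(self.d_inner, d_model)

    def forward(self, x):                                            # x: (batch, length, d_model)
        b, length, _ = x.shape
        u, z = self.in_proj(x).chunk(2, dim=-1)                      # (b, L, d_inner) each
        u = self.conv1d(u.transpose(1, 2))[:, :, :length].transpose(1, 2)  # causal conv
        u = F.silu(u)
        n = self.A_log.shape[1]
        dt, B, C = torch.split(self.x_proj(u), [1, n, n], dim=-1)    # input-dependent params
        A = -torch.exp(self.A_log)                                   # (d_inner, N), stable
        h = u.new_zeros(b, self.d_inner, n)                          # SSM hidden state
        ys = []
        for t in range(length):                                      # selective scan (sequential)
            step = F.softplus(self.dt_proj(dt[:, t]))                # (b, d_inner)
            dA = torch.exp(step.unsqueeze(-1) * A)                   # discretized A
            dB = step.unsqueeze(-1) * B[:, t].unsqueeze(1)           # discretized B
            h = dA * h + dB * u[:, t].unsqueeze(-1)                  # state update
            ys.append((h * C[:, t].unsqueeze(1)).sum(-1) + self.D * u[:, t])
        y = torch.stack(ys, dim=1) * F.silu(z)                       # gated output
        return self.out_proj(y)                                      # (b, L, d_model)

if __name__ == "__main__":
    block = MambaBlockSketch(d_model=16)
    print(block(torch.randn(2, 30, 16)).shape)                       # torch.Size([2, 30, 16])
```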
LightMamba
The number of parameters in Mamba is primarily determined by its internal linear projection layers. When the input and output dimensions are both $d$, a linear layer contains $d^2$ weight parameters, i.e., the parameter count grows quadratically with the variable dimension. Thus, reducing the dimension of the variables fed to each linear layer yields a quadratically smaller number of parameters in that layer.
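As an illustrative check of this quadratic argument (the dimension d = 64 below is an arbitrary example, not a setting from this paper):

```python
import torch.nn as nn

d = 64
full = nn.Linear(d, d)                                   # d*d weights + d biases
split = [nn.Linear(d // 2, d // 2) for _ in range(2)]    # two half-dimensional layers

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full))                      # 64*64 + 64 = 4160
print(sum(count(m) for m in split))     # 2 * (32*32 + 32) = 2112, roughly half
```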
The MPM module (Algorithm 2) is proposed based on this idea. First, the input sequence $X$ is split along the variable dimension into different subspaces, yielding $X_1, X_2, \dots, X_k$. Then a smaller Mamba model is used to extract the sequence features within each subspace. Finally, the features from all subspaces are merged to produce the final output. MPM helps the model learn time series feature representations in different subspaces while using fewer parameters than the original model.
Algorithm 2 Multi-Path-Mamba (MPM)
Input: input sequence X; Output: output sequence Y.
1: split X along the variable dimension into subspaces X_1, ..., X_k
2: for each subspace X_i do
3:   Y_i ← Mamba_i(X_i)  (a smaller Mamba block extracts features within the subspace)
4: end for
5: Y ← merge(Y_1, ..., Y_k)
6: return Y
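A minimal sketch of this idea, reusing the MambaBlockSketch class from the earlier sketch: the variable dimension is split into subspaces, a smaller Mamba-style block processes each path, and the per-path features are concatenated and merged. The merging linear layer and the number of paths are illustrative assumptions, not necessarily the paper's exact design.

```python
class MultiPathMambaSketch(nn.Module):
    """Illustrative Multi-Path-Mamba (MPM): split the variable dimension into
    subspaces, run a smaller Mamba-style block on each path, then merge."""

    def __init__(self, d_model, n_paths=2):
        super().__init__()
        assert d_model % n_paths == 0, "d_model must be divisible by n_paths"
        self.n_paths = n_paths
        d_sub = d_model // n_paths
        self.paths = nn.ModuleList([MambaBlockSketch(d_sub) for _ in range(n_paths)])
        self.merge = nn.Linear(d_model, d_model)       # assumed feature-merging layer

    def forward(self, x):                              # x: (batch, length, d_model)
        chunks = x.chunk(self.n_paths, dim=-1)         # split along the variable dimension
        feats = [path(c) for path, c in zip(self.paths, chunks)]
        return self.merge(torch.cat(feats, dim=-1))    # merge subspace features
```

Because each path's projections act on only d_model / n_paths dimensions, the per-path parameter count scales with (d_model / n_paths)^2, which is the source of the parameter savings discussed above.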
Model evaluation indicators
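Four indicators are used to evaluate the models, as reported in the result tables: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the Nash–Sutcliffe efficiency (NSE). Their standard definitions, with $y_i$ the observed runoff, $\hat{y}_i$ the predicted runoff, $\bar{y}$ the mean of the observations, and $n$ the number of samples, are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad
\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.$$

MAPE is written here as a fraction, consistent with the values reported in Table 6.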
STUDY AREA AND DATA
Study area
Data
Table 1. Statistics of the daily runoff data (m³/s) at the three stations

| Station | Min | Max | Mean | Std | Date range |
|---|---|---|---|---|---|
| St. Louis | 1464.0 | 29166.3 | 7076.7 | 4329.1 | 1983/10/1–2023/09/30 |
| Chester | 1407.3 | 28316.8 | 6809.0 | 4198.5 | 1983/10/1–2023/09/30 |
| Thebes | 1166.7 | 29732.6 | 6509.5 | 4033.8 | 1983/10/1–2023/09/30 |
EXPERIMENTS
Prediction models
The models utilized in this paper are as follows:
Benchmark: BM.
Statistical and ML methods: ARIMA, SVR, KNN, XGBoost.
Recurrent neural network: RNN, LSTM, BiLSTM, GRU.
Attention-based models: Transformer, Informer.
Mamba-based models: Mamba, LightMamba.
Experiment settings
The data are divided into training, validation, and testing sets in a 7:1:2 ratio along the time axis. Normalization statistics are computed on the training set, and the validation and test sets are normalized using the scale parameters of the training set. After a model makes a prediction, the results are denormalized with the same scale to avoid information leakage (Li et al. 2023). The selected model is the best performer on the validation set, which helps ensure generalization ability. All models predict the daily runoff for the next 7 days from the runoff of the past 30 days, that is, $L = 30$ and $T = 7$ in this study. The parameter settings for each model are shown in Table 2, in which the deep learning models share some hyperparameters. Each model was trained three times, and the average of the evaluation metrics was taken as the final result.
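A minimal sketch of this split-and-scale procedure is shown below; the file name is hypothetical, and StandardScaler stands in for whichever normalization was actually used:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

runoff = np.loadtxt("st_louis_daily_runoff.txt")                   # hypothetical daily series (m^3/s)
n = len(runoff)
train, val, test = np.split(runoff, [int(0.7 * n), int(0.8 * n)])  # 7:1:2 along the time axis

scaler = StandardScaler().fit(train.reshape(-1, 1))                # statistics from training set only
train_s, val_s, test_s = (scaler.transform(s.reshape(-1, 1)) for s in (train, val, test))

# After prediction, map outputs back to the original scale before computing metrics:
# pred = scaler.inverse_transform(pred_scaled.reshape(-1, 1))
```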
Table 2. Parameter settings of each model (the deep learning models share the hyperparameters listed in the last row)

| Model | Parameter |
|---|---|
| ARIMA | Order (p, d, q) |
| KNN | Number of neighbors |
| SVR | Kernel type: linear |
| XGBoost | Number of estimators 100, max depth 6 |
| RNN, LSTM, BiLSTM, GRU, Transformer, Informer, Mamba, LightMamba | Hidden layer size 64, RNN layers 1 (BiLSTM 2), attention heads of Transformers 4, dropout rate 0.1, loss MSE, optimizer AdamW, learning rate 0.001, learning rate decay factor 0.9, maximum training epochs 30, early-stopping patience 5 |
Result and analysis
Table 3. MAE (m³/s) of each model for 1- to 7-day-ahead predictions at St. Louis station

| Model | Avg | 1-day | 2-day | 3-day | 4-day | 5-day | 6-day | 7-day |
|---|---|---|---|---|---|---|---|---|
| BM | 989.48 | 309.81 | 588.83 | 831.63 | 1042.27 | 1227.30 | 1390.89 | 1535.60 |
| ARIMA | 904.27 | 205.73 | 465.42 | 716.13 | 944.48 | 1149.06 | 1338.48 | 1510.59 |
| KNN | 1588.89 | 830.06 | 1154.43 | 1442.37 | 1689.35 | 1869.59 | 2013.46 | 2122.97 |
| SVR | 1432.19 | 524.75 | 946.36 | 1286.92 | 1556.96 | 1761.53 | 1914.18 | 2034.62 |
| XGBoost | 1331.62 | 252.50 | 752.27 | 1160.16 | 1488.43 | 1725.83 | 1903.24 | 2038.93 |
| RNN | 902.94 | 257.29 | 494.71 | 729.26 | 950.20 | 1140.52 | 1302.69 | 1445.94 |
| LSTM | 912.93 | 248.57 | 502.46 | 744.87 | 956.86 | 1154.15 | 1315.92 | 1467.72 |
| BiLSTM | 909.29 | 243.53 | 499.24 | 743.94 | 958.58 | 1150.75 | 1314.11 | 1454.90 |
| GRU | 889.00 | 253.46 | 484.74 | 725.03 | 930.91 | 1117.99 | 1282.23 | 1428.62 |
| Transformer | 970.09 | 430.52 | 625.63 | 819.04 | 1000.91 | 1165.90 | 1314.67 | 1433.97 |
| Informer | 959.05 | 437.69 | 602.01 | 803.45 | 989.61 | 1166.47 | 1291.52 | 1422.62 |
| Mamba | 856.00 | 235.21 | 455.78 | 681.93 | 892.59 | 1081.13 | 1248.17 | 1397.16 |
| LightMamba | 833.75 | 191.18 | 422.63 | 661.40 | 877.14 | 1068.54 | 1234.47 | 1380.87 |
Table 4. RMSE (m³/s) of each model for 1- to 7-day-ahead predictions at St. Louis station

| Model | Avg | 1-day | 2-day | 3-day | 4-day | 5-day | 6-day | 7-day |
|---|---|---|---|---|---|---|---|---|
| BM | 1503.74 | 493.46 | 925.91 | 1290.98 | 1600.19 | 1862.63 | 2083.94 | 2269.04 |
| ARIMA | 1425.46 | 345.57 | 757.95 | 1145.84 | 1496.41 | 1811.72 | 2090.19 | 2330.54 |
| KNN | 2318.50 | 1250.13 | 1726.09 | 2146.32 | 2476.32 | 2712.56 | 2888.03 | 3030.04 |
| SVR | 2125.61 | 867.01 | 1507.39 | 1982.88 | 2321.32 | 2560.62 | 2741.91 | 2898.16 |
| XGBoost | 1942.27 | 441.84 | 1171.33 | 1758.76 | 2187.99 | 2481.73 | 2692.83 | 2861.39 |
| RNN | 1384.58 | 452.50 | 803.42 | 1143.27 | 1455.57 | 1733.11 | 1957.58 | 2146.62 |
| LSTM | 1387.60 | 434.77 | 801.26 | 1149.31 | 1458.54 | 1739.39 | 1969.05 | 2160.88 |
| BiLSTM | 1380.70 | 406.99 | 790.73 | 1148.12 | 1465.74 | 1739.40 | 1965.51 | 2148.44 |
| GRU | 1371.60 | 429.09 | 778.21 | 1137.82 | 1443.73 | 1719.66 | 1949.90 | 2142.78 |
| Transformer | 1515.09 | 682.19 | 995.55 | 1309.02 | 1589.43 | 1824.72 | 2019.52 | 2185.17 |
| Informer | 1497.22 | 696.35 | 944.21 | 1287.12 | 1563.05 | 1814.74 | 1998.11 | 2177.00 |
| Mamba | 1320.39 | 374.28 | 727.13 | 1076.42 | 1394.06 | 1669.89 | 1904.70 | 2096.26 |
| LightMamba | 1286.01 | 308.04 | 680.41 | 1040.57 | 1363.11 | 1640.26 | 1883.45 | 2086.22 |
Table 5. NSE of each model for 1- to 7-day-ahead predictions at St. Louis station

| Model | Avg | 1-day | 2-day | 3-day | 4-day | 5-day | 6-day | 7-day |
|---|---|---|---|---|---|---|---|---|
| BM | 0.875 | 0.988 | 0.959 | 0.920 | 0.878 | 0.834 | 0.793 | 0.754 |
| ARIMA | 0.882 | 0.994 | 0.973 | 0.937 | 0.893 | 0.843 | 0.791 | 0.740 |
| KNN | 0.726 | 0.925 | 0.858 | 0.780 | 0.707 | 0.649 | 0.602 | 0.562 |
| SVR | 0.762 | 0.964 | 0.891 | 0.812 | 0.743 | 0.687 | 0.641 | 0.599 |
| XGBoost | 0.788 | 0.991 | 0.934 | 0.852 | 0.771 | 0.706 | 0.654 | 0.609 |
| RNN | 0.893 | 0.990 | 0.969 | 0.937 | 0.899 | 0.856 | 0.817 | 0.780 |
| LSTM | 0.892 | 0.991 | 0.969 | 0.937 | 0.898 | 0.855 | 0.815 | 0.777 |
| BiLSTM | 0.892 | 0.992 | 0.970 | 0.937 | 0.897 | 0.855 | 0.815 | 0.780 |
| GRU | 0.894 | 0.991 | 0.971 | 0.938 | 0.900 | 0.859 | 0.818 | 0.781 |
| Transformer | 0.878 | 0.978 | 0.953 | 0.918 | 0.879 | 0.841 | 0.805 | 0.772 |
| Informer | 0.881 | 0.977 | 0.957 | 0.921 | 0.883 | 0.843 | 0.809 | 0.774 |
| Mamba | 0.900 | 0.993 | 0.975 | 0.945 | 0.907 | 0.867 | 0.827 | 0.790 |
| LightMamba | 0.904 | 0.995 | 0.978 | 0.948 | 0.911 | 0.871 | 0.830 | 0.792 |
From Tables 3–5, the ARIMA model shows excellent performance for 1-day-ahead forecasting and also performs better over the whole forecasting horizon than the listed ML methods. This indicates that ARIMA can capture short-term linear relationships by considering the historical behavior and trend of the series, and its predictions of the long-term trend are more robust (Fard & Akbari-Zadeh 2014). However, ARIMA performs worse at relatively long lead times. It is not well suited to runoff prediction, as it cannot handle time series with complex patterns or external factors influencing the series. In addition, selecting an appropriate ARIMA model requires complex data preprocessing, making it difficult to apply to multivariate forecasting and practical applications.
The predictive power of the machine learning methods in this study is inadequate. These models lack a sequential structure and only learn the relationship between input features and outputs, without the ability to infer temporal order. Although XGBoost achieved an impressive NSE of 0.991 on the first day, its prediction error increased significantly with each subsequent step, and its overall NSE over the entire prediction horizon was only 0.788. This suggests that machine learning methods struggle to capture the temporal evolution of runoff sequences.
Recurrent neural networks contain prior information on sequence order and therefore predict time series better. Among the four recurrent neural networks, the GRU performs best in terms of MAE, RMSE, and NSE, with values of 889.00, 1371.60, and 0.894, respectively, at St. Louis station, and it has 25.5K parameters, fewer than the other recurrent models. This verifies the effectiveness of GRU for runoff prediction; owing to its predictive capacity and small parameter count, GRU is widely used in runoff prediction work. However, the recursive structure of RNNs prevents parallel training, and this inefficiency becomes more pronounced for long input sequences, requiring additional computational resources.
However, our results show that the Transformer-based models did not achieve the expected accuracy. We attribute this to their low inductive bias and their encoder–decoder structure, which results in a large number of parameters: in this experiment, the Transformer and Informer have as many as 93.5K and 106.0K parameters, respectively. Although more parameters can raise a model's learning capacity, realizing that capacity requires training on a substantial amount of data, so this type of model is better suited to large-scale datasets (Wen et al. 2022). The runoff dataset here is comparatively small, making it difficult to realize the potential of Transformer models.
The Mamba-based models demonstrate superior performance compared to the other models, particularly at longer lead times and in overall performance. The MAE, RMSE, and NSE of Mamba are 856.00, 1320.39, and 0.900 at St. Louis station, respectively, with only 16.9K parameters; it thus outperforms GRU while using fewer parameters. These results demonstrate that Mamba-based models perform well in the runoff prediction task. The performance of LightMamba is particularly noteworthy, with MAE, RMSE, and NSE values of 833.75, 1286.01, and 0.904, respectively, indicating greater accuracy than Mamba. This is due to the incorporation of partial normalization and MPM into LightMamba. Partial normalization keeps the model output in the same range as the inputs, which improves prediction accuracy to a certain extent for a non-stationary time series such as runoff. Meanwhile, the MPM module, analogous to the multi-head attention mechanism, enables Mamba to model sequence relationships in different subspaces, enhancing the model's representation capability, and it also effectively reduces the model parameters: LightMamba has only 10.2K parameters, about 60% of Mamba's.
The average performance of each model across the three stations is summarized in Table 6, in which LightMamba achieves the best values for nearly every metric and station, with Mamba generally second best. The results indicate that the Transformer-based models perform worse overall than the RNNs on this limited dataset, whereas the Mamba-based models achieve the best combined performance across the three stations, with higher predictive accuracy while maintaining a lower number of parameters. In particular, in terms of the RMSE across the three stations, our proposed LightMamba achieves reductions of 14.71, 49.14, 7.67, and 16.16% compared with BM, XGBoost, GRU, and Informer, respectively.
Table 6. Average performance of each model at the three stations (MAE and RMSE in m³/s)

| Model | MAE (St. Louis) | RMSE (St. Louis) | MAPE (St. Louis) | NSE (St. Louis) | MAE (Chester) | RMSE (Chester) | MAPE (Chester) | NSE (Chester) | MAE (Thebes) | RMSE (Thebes) | MAPE (Thebes) | NSE (Thebes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM | 989.48 | 1503.74 | 0.121 | 0.875 | 956.70 | 1445.57 | 0.123 | 0.876 | 893.05 | 1355.40 | 0.121 | 0.883 |
| ARIMA | 904.27 | 1425.46 | 0.113 | 0.882 | 882.35 | 1376.12 | 0.115 | 0.882 | 846.52 | 1325.55 | 0.116 | 0.884 |
| KNN | 1588.89 | 2318.50 | 0.196 | 0.726 | 1568.84 | 2278.42 | 0.201 | 0.717 | 1476.23 | 2144.91 | 0.197 | 0.729 |
| SVR | 1432.19 | 2125.61 | 0.174 | 0.762 | 1373.94 | 2023.02 | 0.175 | 0.768 | 1265.08 | 1859.82 | 0.169 | 0.788 |
| XGBoost | 1331.62 | 1942.27 | 0.165 | 0.788 | 1287.96 | 1875.96 | 0.167 | 0.788 | 1209.52 | 1777.83 | 0.164 | 0.798 |
| RNN | 902.94 | 1384.58 | 0.118 | 0.893 | 877.93 | 1343.99 | 0.120 | 0.891 | 854.76 | 1324.41 | 0.121 | 0.888 |
| LSTM | 912.93 | 1387.60 | 0.118 | 0.892 | 885.94 | 1355.29 | 0.119 | 0.889 | 846.58 | 1284.60 | 0.122 | 0.893 |
| BiLSTM | 909.29 | 1380.70 | 0.118 | 0.892 | 893.07 | 1364.69 | 0.121 | 0.888 | 849.86 | 1304.55 | 0.120 | 0.890 |
| GRU | 889.00 | 1371.60 | 0.114 | 0.894 | 875.30 | 1365.61 | 0.115 | 0.889 | 841.58 | 1301.76 | 0.117 | 0.891 |
| Transformer | 970.09 | 1515.09 | 0.124 | 0.878 | 947.02 | 1458.48 | 0.126 | 0.878 | 904.76 | 1401.88 | 0.127 | 0.879 |
| Informer | 959.05 | 1497.22 | 0.121 | 0.881 | 945.28 | 1459.25 | 0.125 | 0.878 | 903.28 | 1401.45 | 0.125 | 0.879 |
| Mamba | 856.00 | 1320.39 | 0.111 | 0.900 | 828.69 | 1274.02 | 0.114 | 0.902 | 818.82 | 1241.25 | 0.119 | 0.900 |
| LightMamba | 833.75 | 1286.01 | 0.106 | 0.904 | 799.07 | 1229.48 | 0.107 | 0.907 | 809.62 | 1236.29 | 0.113 | 0.900 |
CONCLUSIONS
This paper examines the potential of the Mamba model for runoff prediction. Compared with the statistical model, machine learning methods, recurrent neural networks, and attention-based models, the Mamba model demonstrates superior overall performance. Mamba is adept at extracting nonlinear and long-term dependencies in runoff data and achieves higher prediction accuracy with fewer parameters. The scanning algorithm in the Mamba block allows the model to be trained in parallel, resulting in faster training and lower GPU memory usage than the other models. This demonstrates the great potential of Mamba as a backbone model for runoff prediction tasks.
In this paper, we also propose a deep learning model for runoff prediction: LightMamba. It utilizes partial normalization and an MPM module, which improves the accuracy of Mamba on runoff prediction tasks and reduces the number of parameters. We validate LightMamba on runoff data from three stations, and the results demonstrate that LightMamba can effectively capture the temporal dependence in runoff data and outperforms previous methods.
However, in this study, only historical runoff data were utilized to predict future runoff, which is a univariate forecasting problem. Mamba is also adept at extracting synergistic relationships between multiple variables. Therefore, other variables related to runoff (e.g., rainfall, temperature, potential evapotranspiration, vegetation cover) can be subsequently collected and utilized. In addition, various decomposition methods are frequently employed in runoff prediction tasks, as they can significantly improve the prediction accuracy. Mamba may be considered as a backbone model, combined with data decomposition methods to improve the precision of runoff prediction, which is also our future research direction.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (Grant No. 42130113). The numerical calculations in this paper were supported by the Supercomputing Center of Lanzhou University.
AUTHOR CONTRIBUTIONS
J.D. conceived and designed the study; J.D., H.D., and C.S. conducted the workshops; J.D. and H.D. conducted the survey; J.D., H.D., and C.S. performed the inspections; J.D. and H.D. analyzed the data; J.D., H.D., and L.W. wrote the draft paper. All authors have read and agreed to the published version of the manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository: https://grdc.bafg.de/.
CONFLICT OF INTEREST
The authors declare there is no conflict.