ABSTRACT
Flood forecasting plays an important role in water resources management and flood prevention and has attracted considerable research interest. Given the large data volumes and computational complexity involved, many scholars have treated this problem as a time-series forecasting task and made substantial contributions in a data-driven manner. Building on this work, this study proposes a novel flood forecasting method based on a modified denoising diffusion probabilistic model (DDPM). In addition, a novel attention mechanism, Graph Kolmogorov–Arnold Attention (GKAT), is designed. Since the Kolmogorov–Arnold Network (KAN) utilizes tunable activation functions, it increases the interpretability of complex hydrological models. Meanwhile, spatio-temporal attention (SA) is also adopted, considering the time-variant characteristic of the time-series data. The method is therefore termed GKASA–DDPM. Furthermore, a Savitzky-Golay smoothing mechanism is deployed in the post-processing procedure to adjust the predicted results according to practical observation. Multiple experiments are carried out to evaluate the proposed method, involving seven models and observed hydrological data collected from the Xiaoqing River basin above the Huangtaiqiao Hydrological Station. Comprehensive results show that GKASA–DDPM achieves the best overall prediction accuracy across the forecast horizons tested, with a Nash–Sutcliffe Efficiency (NSE) of 0.90 or higher.
HIGHLIGHTS
A novel flood forecasting model based on Graph Kolmogorov–Arnold Attention and spatio-temporal attention under smoothing DDPM is proposed.
The proposed model achieves the highest forecast accuracy under all experimental conditions.
The proposed model also accurately captures peak flows and time-to-peak.
INTRODUCTION
Flood forecasting, a crucial concern in hydrology, is an essential component of the basin flood early warning system. Accurate flood forecasting can provide scientific and effective support for water resources management and flood prevention (Xu et al. 2024). Extensive work has been done on flood forecasting in hydrology, which can be broadly classified into two main categories. The first one is the process-based hydrologic models, considering the physical mechanisms of hydrological processes. Although these models can offer strong interpretability, the modeling process heavily depends on accurate hydrological, topographic, and vegetation data (Wan et al. 2023). The second category encompasses the data-driven models, which aim to identify and extract latent patterns from hydrological data and apply them to future data. These models can avoid complex hydrological processes, including evapotranspiration, runoff generation, and concentration in basins, by establishing a direct mapping relationship between rainfall and runoff (Nourani et al. 2011; Shortridge et al. 2016; Van et al. 2020).
Therefore, many scholars have focused on data-driven models, including traditional time-series statistical models, machine learning (ML) models, and deep learning models (Khatun et al. 2023). Autoregressive Integrated Moving Average (ARIMA) is a commonly used example of traditional time-series statistical models in flood forecasting (He et al. 2019). However, the high nonlinearity and stochasticity inherent in hydrological time-series sequence may present a significant challenge for traditional statistical models to achieve accurate forecasting. Time-series modeling studies based on ML have demonstrated effective processing performance for large-scale, nonlinear time-series sequences (Shen 2018), and have been extensively employed in flood forecasting, including extreme learning machine (ELM) (Yaseen et al. 2016), support vector machine (SVM) (Bafitlhile & Li 2019), Adaptive Neuro-Fuzzy Inference System (ANFIS) (Zhou et al. 2019), etc. Dharmarathne et al. (2024) reviewed commonly used ML methods and compared the performance of urban flood forecasting models under climate change conditions. The study concluded that ML-driven warning systems can remain effective under constantly changing climate conditions. Madhushani et al. (2024) employed four ML methods – histogram gradient boosting (HGB), extreme gradient boosting (XGB), deep neural network (DNN), and convolutional neural network (CNN) – to predict streamflow in ungauged basins. The experiments revealed that XGB outperforms other models and holds particular significance for managing flood risk factors in urban areas. However, due to the limitations of architectures and the uncertainties of parameters, ML models are prone to underestimating peak flows and exhibiting delays in rainfall–runoff simulations (Li et al. 2024).
Recent studies increasingly apply deep learning to flood time-series forecasting. The long short-term memory (LSTM) network, with its gating mechanism, excels at capturing sequential dependencies, making it suitable for hydrological data with high autocorrelation and cross-correlation. Kratzert et al. (2018) demonstrated LSTM's effectiveness in rainfall–runoff modeling for the first time. Cui et al. (2022) enhanced LSTM with an encoder–decoder architecture, improving multi-step flood forecasting accuracy. Variants such as the gated recurrent unit (GRU), bidirectional long short-term memory (BiLSTM), and bidirectional gated recurrent unit (BiGRU) have also been explored, with GRU simplifying LSTM's structure, and BiLSTM/BiGRU capturing bidirectional features to retain flood peak information and reduce time-delay problems. Miau & Hung (2020) combined a convolutional neural network with GRU for effective abnormal water level detection. Cao et al. (2022) developed a BiLSTM based on the sequence-to-sequence method for accurate streamflow forecasting. Bao et al. (2023) proposed a water-level forecasting model based on BiGRU to mitigate accuracy degradation over extended time spans. However, LSTM-based models have several limitations, including insufficient knowledge of the sample distribution and accumulation of forecast bias over time (Feng et al. 2021). In particular, the recursive process of LSTM and GRU readily transmits forecast bias from preceding forecast horizons to subsequent ones, causing a rapid accumulation of forecast bias that seriously degrades forecast accuracy (Young et al. 2017; Zhou et al. 2019; Kurian et al. 2020).
Due to its flexible structure and attention mechanism, the Transformer excels at complex time-series forecasting tasks and shows promise in addressing temporal dependencies, outperforming LSTM in long-term flood forecasting. The double-encoder Transformer of Liu et al. (2022) demonstrated superior performance over LSTM-based models. However, properties of the Transformer's self-attention mechanism, such as permutation invariance, can lead to temporal information loss, limiting its effectiveness in long-term forecasting (Li et al. 2024).
Generative models offer an effective way to address the problem of long-term correlation in flood forecasting (Yang et al. 2020). Their main benefit is that, with sufficient training, the model can capture the rainfall–runoff relationship over the whole period and converge to the actual model relatively quickly, thus enabling accurate long-term flood forecasts (Shao et al. 2024). One of the most widely used generative models is the generative adversarial network (GAN), first developed by Goodfellow et al. (2014). Cheng et al. (2021) established a deep convolutional generative adversarial network (DCGAN) and found that the model could capture the complex streamflow features over the urban area and improve forecast accuracy. However, due to their adversarial training, GANs may encounter problems such as backward collapse and mode collapse (Shao et al. 2024).
The denoising diffusion probabilistic model (DDPM) can effectively address the above challenges. Its potential has been explored in various fields, including image synthesis (Dhariwal & Nichol 2021; Rombach et al. 2021), protein sequence analysis (Anand & Achim 2022), video synthesis (Harvey et al. 2022; Ho et al. 2022), and threat detection (Blau et al. 2022). Since the introduction of diffusion models, significant progress has been made in time-series prediction. Diffusion models excel at generating high-quality, complex sequences, including time-series and spatio-temporal data, by gradually removing noise to achieve detailed coherence (Yang et al. 2024). This capability has been demonstrated in studies such as TimeGrad and D3VAE (Chang et al. 2024). TimeGrad utilizes LSTM or GRU networks to condition the diffusion process on hidden states extracted from historical data. It achieved strong performance on six commonly used time-series benchmarks built from real-world datasets with thousands of correlated dimensions. D3VAE introduces a bidirectional variational auto-encoder that incorporates both diffusion and denoising processes. Extensive experiments on synthetic and real-world data showed that it achieved state-of-the-art performance compared to existing competitive generative models (Li et al. 2023). Since hydrological flood forecasting can be viewed as a subset of time-series forecasting, this study aims to model and simulate flow using DDPM. For most models, successful training relies on accurate and sufficient training data. In the context of flood forecasting in this study, it is crucial to accurately identify the features of hydrological data and effectively utilize them to train the diffusion model. The existing studies on DDPM in time-series forecasting are primarily concerned with capturing temporal dependency. Yan et al. (2021) proposed a multivariate probabilistic time-series forecasting framework based on continuous energy-based generative models (ScoreGrad). This model introduced a converter-based attention mechanism to build a time-series feature extraction module, which improves the ability to extract temporal features. Chang et al. (2024) proposed a Transformer-based diffusion probabilistic model for sparse time-series forecasting (TDSTF). This model merged a Transformer-based residual network with diffusion models to characterize complex temporal relationships and enhance computational efficiency. However, these models cannot simultaneously describe spatial and temporal dependency, which limits further gains in hydrological data utilization and flood forecasting accuracy. In addition, DDPM models are particularly sensitive to changes in the noise scale and the number of noise steps (Shao et al. 2024), which leads to poor performance in forecasting the recession limb and to significant jagged fluctuations at the end of the flood hydrograph.
To solve the problems mentioned above, this paper proposes GKASA–DDPM, a novel flood forecasting model based on Graph Kolmogorov–Arnold Attention (GKAT) and spatio-temporal attention under smoothing DDPM. First, the model extends the basic concepts of the Kolmogorov–Arnold Network (KAN) to the graph attention network (GAT), yielding a new attention mechanism, GKAT. It utilizes the characteristics of KAN to merge learnable functions directly into the edges of the graph, reducing the distribution differences among time periods that GAT easily introduces. At the same time, a spatio-temporal two-dimensional (2D) attention is proposed under the original DDPM framework to extract effective information on flood time-series data from rainfall–runoff characteristics and different time steps at hydrological stations. This addresses the inability of the original DDPM to describe spatial and temporal dependency simultaneously, which limits the utilization of hydrological data, and it also benefits model interpretability. In addition, a post-processing Savitzky-Golay smoothing mechanism is introduced to handle the jagged fluctuations that might emerge in the recession limb.
In order to improve the forecasting performance of the original DDPM, three algorithms are included in GKASA–DDPM:
Introduction of GKAT: The proposed GKASA–DDPM model introduces GKAT to enhance the model's understanding of the spatial structure of time-series data. This innovation improves the model's capability to represent hydrological data and addresses the problem of insufficient feature information extraction.
Spatio-temporal 2D attention mechanism: Inspired by Conditional Score-based Diffusion models for Imputation (CSDI) (Tashiro et al. 2021), GKASA–DDPM incorporates a 2D attention mechanism in its model structure. This approach enables the model to focus globally on input sequence information, significantly enhancing its ability to capture global dependencies and improving long-term flood forecasting performance.
Savitzky-Golay smoothing mechanism: This mechanism reduces jagged fluctuations introduced by the DDPM network, preserves peak flow features, and improves prediction accuracy, enhancing the model's performance over longer forecast horizons.
The rest of the paper is organized as follows. Section 2 introduces the structure of the proposed GKASA–DDPM. Section 3 describes the study basin and data. Section 4 evaluates the forecast performance in the Xiaoqing River basin above the Huangtaiqiao hydrological station in Jinan City and compares it with six other commonly used representative forecasting models. Section 5 discusses the advantages of the proposed model. Section 6 summarizes the research.
METHODOLOGY
Denoising diffusion probabilistic model
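As a rough illustration of the forward (noising) half of a DDPM, the sketch below uses the standard closed form in which a noisy sample at step t is a weighted mix of the clean series and Gaussian noise. It is a minimal sketch, not the configuration used in this study: the linear schedule, its end points, the 50-step length, the function names, and the discharge values are all assumptions made for the example.

```python
import numpy as np

def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.5):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, noise=None, rng=None):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    x0 = np.asarray(x0, dtype=float)
    if noise is None:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

alpha_bar = make_alpha_bar()
flow = np.array([12.0, 35.0, 80.0, 54.0, 30.0])  # made-up discharge values
x_early = q_sample(flow, t=0, alpha_bar=alpha_bar, rng=np.random.default_rng(0))
x_late = q_sample(flow, t=49, alpha_bar=alpha_bar, rng=np.random.default_rng(0))
```

At an early step the sample stays close to the original series, while at the last step the signal weight has shrunk and noise dominates; the reverse process is trained to undo this corruption step by step.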














GKASA–DDPM
Although the classical DDPM framework can model the dynamic changes of flood time-series data, its effectiveness may be limited in the context of hydrological spatio-temporal information (Shao et al. 2024). Moreover, DDPM models are particularly sensitive to changes in the noise scale and the number of noise steps, which leads to poor performance in forecasting the recession limb and to significant jagged fluctuations at the end of the flood hydrograph. To address these specific characteristics of flood time-series data, this paper proposes an enhancement to the DDPM model, as detailed below. The ablations are provided in Supplementary material, Appendix C.
This study proposes a novel attention network, GKAT, which extends the principles of KAN to GAT. GAT is a deep learning model based on the graph convolutional network (GCN) (Bao et al. 2023). It incorporates the attention mechanism into the nodes of the network to extract the overall information without requiring knowledge of the entire graph structure. This allows the model to focus on the essential differences between nodes, leading to enhanced performance and efficacy in handling large-scale datasets. The adaptive learning of GAT enables it to identify and utilize the relationships between nodes.
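For orientation, a single-head layer of the standard GAT formulation can be sketched as follows. This is the generic GAT that GKAT builds on, not the GKAT mechanism itself; the function name, the tiny chain graph, and the random weights are assumptions made for the example.

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """One single-head graph attention layer (standard GAT, illustrative only).

    H: (N, F) node features, A: (N, N) adjacency (nonzero = edge),
    W: (F, F_out) shared linear map, a: (2 * F_out,) attention vector.
    """
    N = H.shape[0]
    Z = H @ W                                   # projected node features
    f_out = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) splits into a source and a target term.
    e = (Z @ a[:f_out])[:, None] + (Z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    mask = (A + np.eye(N)) > 0                  # neighbours plus self-loops
    e = np.where(mask, e, -np.inf)              # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha                     # linear combination of neighbours

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))                 # 4 hypothetical stations, 3 features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])                    # chain graph 0-1-2-3
out, alpha = gat_layer(H, A, rng.standard_normal((3, 2)), rng.standard_normal(4))
```

Each output row is a weighted average of the projected features of a node and its neighbours, with weights that sum to one per node.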











GAT primarily focuses on the connectivity of edges during feature learning, that is, whether a connection exists between two features. However, edge features also include weights, directions, and other attributes, so GAT cannot capture complex feature relationships well. Additionally, GAT computes a linear combination of features for each node as the final output. As a result, the outputs of neighboring nodes are strongly linearly correlated with the output of the node itself, which limits the ability of GAT to learn complex nonlinear spatial correlations effectively.
















After going through all layers, the final learned features are mapped as the output of the encoder. This mechanism extracts effective information on flood time-series data from rainfall–runoff characteristics and different time steps at hydrological stations and employs temporal and feature weights in order to determine the comprehensive weights of input variables, thereby identifying and emphasizing the relatively important forecasting information and consequently improving the forecast accuracy.
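The idea of attending along the time axis and then along the feature axis can be sketched as below. This is a simplified illustration in the spirit of the 2D attention used in CSDI, not the exact layer of GKASA–DDPM: learned query, key, and value projections are omitted (identity projections are assumed), and the tensor sizes are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over the second-to-last axis.

    X has shape (..., n_tokens, d). Identity projections are assumed for brevity.
    """
    d = X.shape[-1]
    scores = X @ np.swapaxes(X, -1, -2) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ X, weights

def two_d_attention(X):
    """X: (K features, L time steps, d channels)."""
    # Temporal attention: each feature attends across its L time steps.
    X_t, w_time = self_attention(X)                       # w_time: (K, L, L)
    # Feature attention: each time step attends across the K features.
    X_f, w_feat = self_attention(np.swapaxes(X_t, 0, 1))  # w_feat: (L, K, K)
    return np.swapaxes(X_f, 0, 1), w_time, w_feat

X = np.random.default_rng(0).standard_normal((6, 8, 4))  # 6 features, 8 steps, 4 channels
out, w_time, w_feat = two_d_attention(X)
```

The two weight tensors play the role of the temporal and feature weights mentioned above: one says which time steps matter for a given variable, the other which variables matter at a given time step.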
Due to the iterative nature of its structure, the DDPM model is particularly sensitive to changes in the noise scale and the number of noise steps (Shao et al. 2024), which leads to poor performance in forecasting the recession limb and to significant jagged fluctuations at the end of the flood hydrograph. Although these fluctuations do not affect the overall trend of the time-series forecast, they reduce the forecast accuracy. In this experiment, as shown in Figure 1, the streamflow sequence output by the improved DDPM network is smoothed at the end of the model using the Savitzky-Golay smoothing mechanism. This reduces the jagged fluctuations introduced by the DDPM network, preserves the peak flow characteristics well, and improves the forecast accuracy.
The Savitzky-Golay smoothing mechanism is a method that smooths the curve based on the average trend of the time series. It uses a local polynomial least squares fitting in the time domain, which can simulate the long-term trend of the entire time-series data. This method filters out the fluctuations while preserving the shape of the signal.
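The local polynomial least-squares fit described above can be sketched in a few lines. This is an illustrative implementation, not the one used in the study: the window length of 7, the polynomial order of 2, the edge handling, and the synthetic recession curve are assumptions. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead.

```python
import numpy as np

def savgol_smooth(y, window=7, order=2):
    """Smooth y by fitting a degree-`order` polynomial to each sliding window."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if window % 2 == 0 or window <= order or window > n:
        raise ValueError("window must be odd, larger than order, and at most len(y)")
    half = window // 2
    out = np.empty(n)
    for i in range(n):
        # Clamp the window so it stays inside the series near the edges.
        start = min(max(i - half, 0), n - window)
        x = np.arange(start, start + window) - i
        coeffs = np.polyfit(x, y[start:start + window], order)
        # Evaluate the fitted polynomial at the current sample (offset 0).
        out[i] = np.polyval(coeffs, 0.0)
    return out

t = np.arange(40)
recession = 100.0 * np.exp(-t / 10.0)        # synthetic recession limb
jagged = recession + 3.0 * (-1.0) ** t       # deterministic saw-tooth disturbance
smoothed = savgol_smooth(jagged)
```

Because each point is replaced by the value of a low-order polynomial fitted to its neighbours, saw-tooth fluctuations are damped while any locally quadratic shape, such as the rounded top of a flood peak, passes through unchanged.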







STUDY AREA AND MATERIALS
Study area
Location and stations of the study area. (red circles represent rainfall stations; red triangle represents hydrological station).
Dataset and input-output selection
This research collects observed hydrological data series from 1998 to 2021, including precipitation data from five rainfall stations (Liujiazhuang, Wujiapu, Donghongmiao, Xinglong, Yanzishan) and runoff data from the Huangtaiqiao hydrological station, and selects discharge as the forecast target. Forty-one typical flood events, including single-peak and multipeak events, are extracted from the collected hydrological data. Among these flood events, approximately 75% (30 events) are randomly selected for training, and the remaining 25% (11 events) are used as the testing set for evaluating model performance, also using the sliding window method. The 30 training events are processed by the sliding window method to generate input-output training samples. From these samples, 10% are randomly selected as the validation set, and the remaining 90% form the training set. The training set is used to train the forecasting model, while the validation set is used to optimize hyperparameters. Finally, according to the flow confluence time of the basin, the precipitation and streamflow data for the preceding time steps are adopted as the inputs of the neural networks. The target outputs are the streamflow at forecast horizons of up to 3 h. The rationale for the input-output prediction horizon is provided in Supplementary material, Appendix A.
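The sliding-window construction of input-output samples can be sketched as follows. The function name and the `lookback` length are hypothetical (the input window length is not restated here), the 3-step horizon follows the text, and the data are random placeholders rather than observations.

```python
import numpy as np

def make_windows(rain, flow, lookback, horizon=3):
    """Build (input, target) pairs from one flood event.

    rain: (T, n_stations) precipitation, flow: (T,) streamflow.
    Inputs hold `lookback` past steps of rainfall and flow;
    targets hold the next `horizon` steps of flow.
    """
    rain = np.asarray(rain, dtype=float)
    flow = np.asarray(flow, dtype=float)
    feats = np.column_stack([rain, flow])          # (T, n_stations + 1)
    X, Y = [], []
    for t in range(lookback, len(flow) - horizon + 1):
        X.append(feats[t - lookback:t])            # past window
        Y.append(flow[t:t + horizon])              # future streamflow
    return np.array(X), np.array(Y)

rng = np.random.default_rng(0)
rain = rng.random((20, 5))                         # 5 rainfall stations, 20 hourly steps
flow = rng.random(20)
X, Y = make_windows(rain, flow, lookback=6, horizon=3)
```

Windows are built within each event separately, so that no sample spans two different floods; the per-event arrays would then be concatenated before the 90/10 split into training and validation samples.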
Evaluation metrics
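The three standard metrics reported later (RMSE, MAE, and NSE) can be computed as in the sketch below. The two peak-related functions are assumptions: they use common definitions of peak flow relative error and time-to-peak error, which may differ from the exact formulas used in this study, and the sample values are made up.

```python
import numpy as np

def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mae(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(obs - sim)))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def peak_flow_relative_error(obs, sim):
    """Assumed definition: absolute relative error of the peak, in percent."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(abs(sim.max() - obs.max()) / obs.max() * 100.0)

def time_to_peak_error(obs, sim):
    """Assumed definition: absolute shift of the peak position, in time steps."""
    return int(abs(int(np.argmax(sim)) - int(np.argmax(obs))))

obs = [10.0, 20.0, 40.0, 30.0, 20.0]   # made-up observed flows
sim = [12.0, 18.0, 36.0, 34.0, 20.0]   # made-up forecast flows
scores = (rmse(obs, sim), mae(obs, sim), nse(obs, sim))
```

Lower RMSE, MAE, and peak errors are better, whereas NSE is better the closer it is to 1.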











EXPERIMENTS AND RESULTS
To validate the effectiveness of the proposed model for flood forecasting, the following experiments are carried out. The forecasting results of the proposed GKASA–DDPM model are compared with those of other representative forecasting models, including LSTM, GRU, BiLSTM, BiGRU, Transformer, and Conditional Score-based Diffusion models for Imputation (CSDI) (Tashiro et al. 2021). LSTM, GRU, BiLSTM, and BiGRU are commonly used time-series forecasting models based on recurrent neural networks. In these models, the number of hidden layer units is set to 64, the ReLU activation function is used, and the batch size is set to 32. Transformer is a neural network model that has recently been frequently used for flood forecasting. In this model, the ReLU activation function is used, and the batch size is set to 32. The original DDPM cannot be directly used for time-series forecasting. CSDI is a commonly used DDPM-based time-series forecasting model that utilizes score-based diffusion models conditioned on observed data. In this model, the number of diffusion steps is set to 50, and the embedding dimension is set to 128. The reasons for the choice of hyperparameters are given in Supplementary material, Appendix B.
Forecasting performance by evaluation metrics
Tables 1–3 show the evaluation metrics at different forecast horizons for the seven models. They demonstrate that as the forecast step increases, the overall forecasting effectiveness of the models gradually declines. Compared with the classic networks, the DDPM-based models (CSDI and GKASA–DDPM) show the least degradation in forecast performance. In particular, the NSEs of LSTM, BiLSTM, and Transformer decrease from 0.885, 0.898, and 0.897 at the 1-h forecast horizon to 0.698, 0.768, and 0.786 at the 3-h forecast horizon, respectively, while the NSEs of the DDPM-based models remain at or above 0.883. At the 2-h forecast horizon, although BiGRU has the highest NSE (0.941), the proposed GKASA–DDPM is close to it (0.936). The results suggest that the diffusion models significantly enhance forecasting capacity over longer forecast horizons in this test case, indicating that DDPM may increase the stability of the forecasting model and is well suited to flood forecasting.
Table 1. The evaluation indices of the different forecasting models (1-h forecast horizon)
Groups | Models | RMSE | MAE | NSE | Peak flow error | Time-to-peak error
---|---|---|---|---|---|---
Conventional | GRU | 15.62 | 11.97 | 0.887 | 10.80% | 0.89
Conventional | LSTM | 16.21 | 12.64 | 0.885 | 12.18% | 1.22
Conventional | BiGRU | 15.29 | 11.39 | 0.965 | 10.19% | 0.78
Conventional | BiLSTM | 15.70 | 12.27 | 0.898 | 9.68% | 1.11
Advanced | Transformer | 8.92 | 6.02 | 0.897 | 7.54% | 0.67
Advanced | CSDI | 4.86 | 4.47 | 0.938 | 7.39% | 0.30
Advanced | GKASA–DDPM | **3.25** | **2.53** | **0.973** | **6.17%** | **0.20**
The bolded values represent the best performance.
Table 2. The evaluation indices of the different forecasting models (2-h forecast horizon)
Groups | Models | RMSE | MAE | NSE | Peak flow error | Time-to-peak error
---|---|---|---|---|---|---
Conventional | GRU | 16.06 | 12.10 | 0.859 | 14.32% | 1.67
Conventional | LSTM | 17.74 | 12.55 | 0.860 | 15.14% | 2.00
Conventional | BiGRU | 16.47 | 12.12 | **0.941** | 12.03% | 1.33
Conventional | BiLSTM | 15.71 | 12.57 | 0.880 | 8.45% | 1.22
Advanced | Transformer | 9.77 | 6.06 | 0.819 | 12.05% | 1.00
Advanced | CSDI | 4.95 | 5.59 | 0.905 | 8.42% | **0.30**
Advanced | GKASA–DDPM | **3.84** | **2.91** | 0.936 | **7.99%** | **0.30**
The bolded values represent the best performance.
Table 3. The evaluation indices of the different forecasting models (3-h forecast horizon)
Groups | Models | RMSE | MAE | NSE | Peak flow error | Time-to-peak error
---|---|---|---|---|---|---
Conventional | GRU | 17.19 | 12.62 | 0.791 | 15.79% | 1.78
Conventional | LSTM | 18.15 | 13.18 | 0.698 | 25.49% | 2.00
Conventional | BiGRU | 16.75 | 12.14 | 0.874 | 12.95% | 1.67
Conventional | BiLSTM | 17.58 | 13.20 | 0.768 | 14.30% | 1.78
Advanced | Transformer | 14.66 | 10.48 | 0.786 | 12.84% | 1.67
Advanced | CSDI | 6.83 | 5.66 | 0.883 | 8.39% | 0.60
Advanced | GKASA–DDPM | **3.98** | **3.20** | **0.900** | **8.17%** | **0.50**
The bolded values represent the best performance.
Compared with BiGRU, the best-performing recurrent neural network, Transformer reduces the RMSE by 41.66% and the MAE by 47.15% at the 1-h forecast horizon. This finding suggests that Transformer is a more suitable choice for short-term flood forecasting. Its multi-head attention structure and hyperparameter optimization enable it to deliver robust forecasts at short lead times. However, at the 3-h forecast horizon, the forecast accuracy of Transformer decreases significantly, with RMSE, MAE, and NSE of 14.66, 10.48, and 0.786, respectively, which indicates that the long-term forecasting stability of Transformer in flood forecasting needs to be improved.
Notably, the GKASA–DDPM model achieves at least a 22% lower RMSE and a 43% lower MAE than the CSDI model at every forecast horizon. Meanwhile, the NSE of the GKASA–DDPM model is roughly 2–4% higher than that of the CSDI model. These results indicate that the GKASA–DDPM model improves the precision of flood forecasting relative to the CSDI model, with a particularly notable advantage in the MAE metric. MAE measures the absolute error between the observed and predicted streamflow and is less sensitive to outliers than RMSE, so it offers a useful view of the robustness of a forecasting network. It can be concluded that the GKASA–DDPM model outperforms the original DDPM-based time-series forecasting model in flood forecasting.
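The relative improvements over CSDI can be recomputed directly from the values in Tables 1–3. The short script below is only an arithmetic check; the dictionary layout and function names are chosen for illustration.

```python
# (RMSE, MAE, NSE) per forecast horizon in hours, copied from Tables 1-3.
csdi = {1: (4.86, 4.47, 0.938), 2: (4.95, 5.59, 0.905), 3: (6.83, 5.66, 0.883)}
gkasa = {1: (3.25, 2.53, 0.973), 2: (3.84, 2.91, 0.936), 3: (3.98, 3.20, 0.900)}

def reduction_pct(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100.0

def gain_pct(baseline, new):
    """Percentage by which `new` is higher than `baseline`."""
    return (new - baseline) / baseline * 100.0

rmse_red = {h: reduction_pct(csdi[h][0], gkasa[h][0]) for h in csdi}
mae_red = {h: reduction_pct(csdi[h][1], gkasa[h][1]) for h in csdi}
nse_gain = {h: gain_pct(csdi[h][2], gkasa[h][2]) for h in csdi}

for h in sorted(csdi):
    print(f"{h}-h: RMSE -{rmse_red[h]:.1f}%, MAE -{mae_red[h]:.1f}%, NSE +{nse_gain[h]:.1f}%")
```

The smallest RMSE reduction occurs at the 2-h horizon and the smallest MAE reduction at the 1-h horizon, which is where the "at least" figures in the text come from.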
The last two columns of Tables 1–3 show the evaluation results for peak flows and time-to-peak for the different models. For time-to-peak, the proposed GKASA–DDPM substantially reduces the time-to-peak error and accurately captures the time-to-peak. In contrast, the time-to-peak error of the LSTM-based models gradually increases with the forecast hour. This suggests that phase bias is unavoidable in flood forecasting using LSTM-based models, particularly for longer forecast periods. While Transformer shows relatively robust performance in forecasting short-term time-to-peak, its efficacy decreases significantly at the 3-h forecast horizon, resulting in a substantial time-to-peak error. As the forecast step increases, the time-to-peak of the CSDI model lags slightly. For peak flows, the evaluation results reveal that the proposed GKASA–DDPM yields a more concentrated error distribution and a reduction in the peak flow error. As the forecast step increases, although the error distribution gradually disperses, the peak flow error of GKASA–DDPM remains the smallest among all models. In contrast, LSTM has the highest peak flow error, while the proposed GKASA–DDPM model has the lowest. The LSTM-based models severely underestimate peak flows, and the decay in forecast accuracy increases with the forecast hour. Transformer also tends to underestimate peak flows. In summary, the minimal peak flow error and time-to-peak error highlight the superior flood peak performance of the proposed GKASA–DDPM.
Scatter plots of the different forecasting models: (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.
Forecasting performance of representative flood events
The forecasting process of the flood event (in 1998): (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.
The forecasting process of the flood event (in 2018): (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.
For the flood event (in 1998), the proposed GKASA–DDPM model performs best, followed by the CSDI model. These two models have similar errors for the second and third forecast flood peaks, with GKASA–DDPM slightly better than CSDI. However, GKASA–DDPM is closer to the observed streamflow than CSDI for the first forecast flood peak, indicating that the GKASA–DDPM model exhibits superior performance in terms of time-to-peak, peak flow, and flood processes. Among the other models, the LSTM model exhibits significant fluctuations, produces unreasonable flood processes, and has the worst forecast performance. The BiLSTM model significantly overestimates peak flows, potentially leading to suboptimal flood control decisions. Compared to the LSTM-based models, Transformer's forecasting results are relatively good, but it still underestimates peak flow. In addition, owing to the Savitzky-Golay smoothing mechanism, GKASA–DDPM mitigates the jagged fluctuations in the recession limb and fits the observed streamflow better than CSDI.
For the flood event (in 2018), the observed flood hydrograph rises rapidly, which can be attributed to the relative concentration of precipitation. The GKASA–DDPM model demonstrates strong performance in peak flow and time-to-peak forecasting. Although the CSDI model accurately captures time-to-peak, it underestimates the peak flow, performs poorly in forecasting the recession limb, and exhibits significant jagged fluctuations at the end of the flood hydrograph. Among the other models, LSTM severely underestimates the peak flow and produces irregular fluctuations around the peak, so it cannot forecast a reasonable flood shape. The BiLSTM and BiGRU models can estimate the peak flow at the 1-h forecast horizon, but the accuracy of the forecast peak flow declines notably with the forecast hour, resulting in relatively low forecast performance. The time-to-peak of the Transformer model lags slightly. In summary, the proposed GKASA–DDPM model forecasts both the rising and receding limbs of the flood well, providing more accurate forecast results and making it well suited to flood forecasting.
DISCUSSION
This experiment compares the forecasting performance of LSTM, GRU, BiLSTM, BiGRU, Transformer, CSDI, and GKASA–DDPM in the Xiaoqing River basin above the Huangtaiqiao hydrological station.
Traditional neural networks use LSTM-based gating mechanisms to mitigate vanishing gradients and learn time-series patterns, achieving strong performance. However, LSTM and GRU rely on retaining input information at each time step, leading to error accumulation over longer forecast horizons. Transformer, while effective at capturing long-term dependencies through self-attention, suffers from temporal information loss due to properties like permutation invariance, resulting in poor long-term performance. Although CSDI prevents error accumulation, its performance is limited by boundary inconsistencies and sensitivity to noise scale and steps.
Compared to the LSTM, GRU, BiLSTM, BiGRU, Transformer, and CSDI, the advantages of GKASA–DDPM in flood forecasting mainly include the following three points:
Flood peak performance: GKASA–DDPM significantly enhances the ability to capture global dependencies and extract features in the flood time series. It avoids the selective discarding of noisy data through forget gates that occurs in LSTM-based models, so the information at the flood peaks is retained and more fully exploited. This not only improves the accuracy of the flood peak flow but also overcomes the time-delay problem to some extent. Tables 1–3 reveal that the GKASA–DDPM model has the lowest peak flow error and time-to-peak error. It can be concluded that the GKASA–DDPM model is more suitable for flood forecasting.
Long-term performance: GKASA–DDPM combines GKAT with DDPM for flood forecasting. It enhances the understanding of spatial features in high-dimensional rainfall–runoff data and addresses the problem of insufficient feature information extraction. Additionally, GKASA–DDPM incorporates a spatio-temporal 2D attention mechanism in its model structure. This approach enables the model to focus globally on input sequence information, significantly enhancing its ability to capture global dependencies and improving long-term flood forecasting performance. Finally, a key challenge in long-term prediction is noise. GKASA–DDPM employs the Savitzky-Golay smoothing mechanism to reduce the impact of noise on the model. As shown in Figure 7, the scattered points of the other models gradually deviate from the 1:1 ideal line as the forecast hour increases, especially at the 3-h forecast horizon. It can be concluded that the GKASA–DDPM model has the best forecast performance and can effectively minimize the forecast error over longer forecast periods.
Smoothing performance: With the Savitzky-Golay smoothing mechanism, the proposed GKASA–DDPM model mitigates the jagged fluctuations that may emerge at the end of the flood hydrograph, thereby enhancing the forecasting precision. As shown in Figures 8 and 9, the GKASA–DDPM forecast is noticeably smoother in the recession limb and fits the observed streamflow more closely than the other models.
It is acknowledged that GKASA–DDPM has high computational complexity. GKASA–DDPM consumes over 0.1 billion multiply-accumulate operations (MACs) per batch during inference and has 0.28 million parameters. Given that the model's predictions are primarily executed on high-performance computing systems and that computational capabilities continue to advance, the computational complexity of GKASA–DDPM is not expected to be a significant long-term challenge.
CONCLUSION
This paper proposes GKASA–DDPM, a novel flood forecasting model based on GKAT and spatio-temporal attention under smoothing DDPM for improving flood forecasting accuracy. To evaluate the efficacy of the proposed model GKASA–DDPM, the paper takes the Xiaoqing River basin above the Huangtaiqiao hydrological station in Jinan City as the study area and conducts a series of comparative experiments with various advanced forecasting models, including LSTM, GRU, BiLSTM, BiGRU, Transformer, and CSDI. These experiments evaluate the forecasting performance of the models from different perspectives, such as forecast accuracy, flood peaks, and flood processes. Comparative experimental results show that GKASA–DDPM can significantly reduce forecast error and improve accuracy. Furthermore, the results indicate that GKASA–DDPM accurately captures peak flows and peak times.
It should be noted that this study has certain limitations. The comparative experiments only consider the Xiaoqing River basin above the Huangtaiqiao hydrological station as the study area. Future research can explore the efficacy of the GKASA–DDPM model in river basins worldwide, including regions with high streamflow variability and arid areas. Additionally, this study only uses precipitation and runoff data from hydrological stations. Future work can incorporate meteorological data from satellite observations and evaporation data to further enhance the model's performance.
FUNDING
This study was supported by the National Key Research and Development Program of China (2022YFC3005501), the MWR Major Science & Technology Program (SKS-2022007) and the IWHR Research & Development Support Program (WH0145B022021, JZ110145B0062024).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.