Flood forecasting plays an important role in water resources management and flood prevention and has therefore attracted enormous research interest. Given the considerable data volume and computational complexity involved, many scholars have approached this problem from the perspective of time-series forecasting and made substantial contributions in a data-driven manner. On this basis, a novel method, a modified denoising diffusion probabilistic model (DDPM), is proposed in this study to handle flood forecasting. In addition, a novel attention mechanism, Graph Kolmogorov–Arnold Attention (GKAT), is designed. Since the Kolmogorov–Arnold Network (KAN) utilizes tunable activation functions, it increases the interpretability of complex hydrological models. Meanwhile, spatio-temporal attention (SA) is also adopted, considering the time-variant characteristics of the time-series data. The method is therefore termed GKASA–DDPM. Furthermore, a Savitzky-Golay smoothing mechanism is deployed in the post-processing procedure to adjust the predicted results toward practical observations. Multiple experiments involving seven models and observed hydrological data collected from the Xiaoqing River basin above Huangtaiqiao Hydrological Station are executed to demonstrate the superior performance of the proposed method. Comprehensive results show that GKASA–DDPM achieves the highest prediction accuracy under all experimental conditions, with Nash–Sutcliffe Efficiency (NSE) exceeding 0.9.

  • A novel flood forecasting model based on Graph Kolmogorov–Arnold Attention and spatio-temporal attention under smoothing DDPM is proposed.

  • The proposed model achieves the highest forecast accuracy under all experimental conditions.

  • The proposed model also accurately captures peak flows and time-to-peak.

Flood forecasting, a crucial concern in hydrology, is an essential component of basin flood early warning systems. Accurate flood forecasting can provide scientific and effective support for water resources management and flood prevention (Xu et al. 2024). Extensive work on flood forecasting can be broadly classified into two main categories. The first comprises process-based hydrologic models, which consider the physical mechanisms of hydrological processes. Although these models offer strong interpretability, the modeling process heavily depends on accurate hydrological, topographic, and vegetation data (Wan et al. 2023). The second category encompasses data-driven models, which aim to identify and extract latent patterns from hydrological data and apply them to future data. These models avoid representing complex hydrological processes, including evapotranspiration, runoff generation, and concentration in basins, by establishing a direct mapping relationship between rainfall and runoff (Nourani et al. 2011; Shortridge et al. 2016; Van et al. 2020).

Therefore, many scholars have focused on data-driven models, including traditional time-series statistical models, machine learning (ML) models, and deep learning models (Khatun et al. 2023). The Autoregressive Integrated Moving Average (ARIMA) model is a commonly used example of traditional time-series statistical models in flood forecasting (He et al. 2019). However, the high nonlinearity and stochasticity inherent in hydrological time-series sequences present a significant challenge for traditional statistical models to achieve accurate forecasting. Time-series modeling studies based on ML have demonstrated effective processing performance for large-scale, nonlinear time-series sequences (Shen 2018) and have been extensively employed in flood forecasting, including the extreme learning machine (ELM) (Yaseen et al. 2016), support vector machine (SVM) (Bafitlhile & Li 2019), and Adaptive Neuro-Fuzzy Inference System (ANFIS) (Zhou et al. 2019). Dharmarathne et al. (2024) reviewed commonly used ML methods and compared the performance of urban flood forecasting models under climate change conditions, concluding that ML-driven warning systems can remain effective under constantly changing climate conditions. Madhushani et al. (2024) employed four ML methods – histogram gradient boosting (HGB), extreme gradient boosting (XGB), deep neural network (DNN), and convolutional neural network (CNN) – to predict streamflow in ungauged basins. The experiments revealed that XGB outperforms the other models and holds particular significance for managing flood risk factors in urban areas. However, due to the limitations of their architectures and the uncertainties of their parameters, ML models are prone to underestimating peak flows and exhibiting delays in rainfall–runoff simulations (Li et al. 2024).

Recent studies increasingly apply deep learning to flood time-series forecasting. The long short-term memory (LSTM) network, with its gating mechanism, excels at capturing sequential dependencies, making it suitable for hydrological data with high autocorrelation and cross-correlation. Kratzert et al. (2018) demonstrated LSTM's effectiveness in rainfall–runoff modeling for the first time. Cui et al. (2022) enhanced LSTM with an encoder–decoder architecture, improving multi-step flood forecasting accuracy. Variants such as the gate recurrent unit (GRU), bidirectional long short-term memory (BiLSTM), and bidirectional gate recurrent unit (BiGRU) have also been explored, with GRU simplifying LSTM's structure and BiLSTM/BiGRU capturing bidirectional features to retain flood peak information and reduce time-delay problems. Miau & Hung (2020) combined a convolutional neural network with GRU for effective abnormal water level detection. Cao et al. (2022) developed a BiLSTM based on the sequence-to-sequence method for accurate streamflow forecasting. Bao et al. (2023) proposed a water-level-forecast model based on BiGRU to mitigate accuracy degradation over extended time spans. However, LSTM-based models have notable limitations, including insufficient knowledge of the sample distribution and accumulation of forecast bias over time (Feng et al. 2021). In particular, LSTM and GRU readily transmit the forecast bias from preceding forecast horizons to subsequent ones via their recursive processes, leading to rapid accumulation of forecast bias that seriously affects forecast accuracy (Young et al. 2017; Zhou et al. 2019; Kurian et al. 2020).

Due to its flexible structure and attention mechanism, the Transformer excels at complex time-series forecasting tasks and shows promise in addressing temporal dependencies, outperforming LSTM in long-term flood forecasting. The double-encoder Transformer by Liu et al. (2022) demonstrated superior performance over LSTM-based models. However, properties of the Transformer's self-attention mechanism, such as permutation invariance, can lead to temporal information loss, limiting its effectiveness in long-term forecasting (Li et al. 2024).

Employing generative models to address the problem of long-term correlation in flood forecasting is a highly effective solution (Yang et al. 2020). The main benefit of using generative models is that, with sufficient training, the model can capture the rainfall–runoff relationship over the whole period and converges to the actual model relatively quickly, thus enabling long-term accurate flood forecasts (Shao et al. 2024). One of the most widely used generative models is the generative adversarial network (GAN), first developed by Goodfellow (Goodfellow et al. 2014). Cheng et al. (2021) established a deep convolutional generative adversarial network (DCGAN) and found that the model could grasp the complex streamflow features over an urban area and improve forecast accuracy. However, GANs, due to their adversarial training nature, may encounter problems such as backward collapse and mode collapse (Shao et al. 2024).

The denoising diffusion probabilistic model (DDPM) can effectively address the above challenges. Its potential has been explored in various fields, including image recognition (Dhariwal & Nichol 2021; Rombach et al. 2022), protein sequence analysis (Anand & Achim 2022), video synthesis (Harvey et al. 2022; Ho et al. 2022), and adversarial defense (Blau et al. 2022). Since the introduction of diffusion models, significant progress has been made in the field of time-series prediction. Diffusion models excel at generating high-quality, complex sequences, including time-series and spatio-temporal data, by gradually removing noise to achieve detailed coherence (Yang et al. 2024). This capability has been demonstrated in studies such as TimeGrad and D3VAE (Chang et al. 2024). TimeGrad utilizes LSTM or GRU networks to constrain the diffusion process to hidden states extracted from historical data; it achieved exceptional performance on six commonly used time-series benchmarks built from real-world datasets with thousands of correlated dimensions. D3VAE introduces a bidirectional variational auto-encoder that incorporates both diffusion and denoising processes; extensive experiments on synthetic and real-world data validated that this generative model achieves state-of-the-art performance compared to existing competitive generative models (Li et al. 2022). Since hydrological flood forecasting can be viewed as a subset of time-series forecasting, this study aims to model and simulate flow using DDPM models. For most models, successful training usually relies on accurate and sufficient training data. In the context of flood forecasting in this study, it is crucial to accurately identify the features of hydrological data and effectively utilize them to train the diffusion model. Existing studies on DDPM in time-series forecasting are primarily concerned with capturing temporal dependency. Yan et al. (2021) proposed a multivariate probabilistic time-series forecasting framework based on continuous energy-based generative models (ScoreGrad). This model introduced a Transformer-based attention mechanism to build a time-series feature extraction module, which improves the ability to extract temporal features. Chang et al. (2024) proposed a Transformer-based diffusion probabilistic model for sparse time-series forecasting (TDSTF). This model merged a Transformer-based residual network with diffusion models to characterize complex temporal relationships and enhance computational efficiency. However, these models cannot simultaneously describe spatial and temporal dependency, thus failing to further improve hydrological data utilization and flood forecasting accuracy. In addition, DDPM models are particularly susceptible to alterations in the noise scale and noise steps (Shao et al. 2024), which leads to poor performance in forecasting the recession limb and significant jagged fluctuations at the end of the flood hydrograph.

To solve the problems mentioned above, this paper proposes GKASA–DDPM, a novel flood forecasting model based on Graph Kolmogorov–Arnold Attention (GKAT) and spatio-temporal attention under a smoothing DDPM. First, the model extends the basic concepts of the Kolmogorov–Arnold Network (KAN) to graph attention (GAT) and incorporates an innovative attention mechanism, GKAT. It utilizes the characteristics of KAN to merge learnable functions directly into the edges of the graph, reducing the distribution differences among time periods that GAT easily introduces. At the same time, a spatio-temporal two-dimensional (2D) attention is proposed under the original DDPM framework to extract effective information from rainfall–runoff characteristics and different time steps at hydrological stations. This addresses the inability of the original DDPM to simultaneously describe spatial and temporal dependency, which previously limited further improvement in the utilization of hydrological data, and is also beneficial for increasing model interpretability. In addition, a post-processing Savitzky-Golay smoothing mechanism is introduced, considering the possibility of jagged fluctuations that might emerge in the recession limb.

In order to improve the forecasting performance of the original DDPM, three algorithms are included in GKASA–DDPM:

Introduction of GKAT: The proposed GKASA–DDPM model introduces GKAT to enhance the model's understanding of the spatial structure of time-series data. This innovation improves the model's capability for modeling hydrological data and addresses the problem of insufficient feature information extraction.

Spatio-temporal 2D attention mechanism: Inspired by CSDI, GKASA–DDPM incorporates a 2D attention mechanism in its model structure. This approach enables the model to focus globally on input sequence information, significantly enhancing its ability to capture global dependencies and improving long-term flood forecasting performance.

Savitzky-Golay smoothing mechanism: This mechanism reduces jagged fluctuations introduced by the DDPM network, preserves peak flow features, and improves prediction accuracy, enhancing the model's performance over longer forecast horizons.

The rest of the paper is organized as follows. Section 2 introduces the structure of the proposed GKASA–DDPM. Section 3 describes the study basin and data. Section 4 evaluates the forecast performance in the Xiaoqing River basin above the Huangtaiqiao hydrological station in Jinan City and compares it with six other commonly used representative forecasting models. Section 5 discusses the advantages of the proposed model. Section 6 provides a summary of the research.

To more comprehensively capture and analyze the spatio-temporal relationships in flood time-series data and achieve higher forecast accuracy, this paper proposes a novel flood forecasting model based on GKAT and spatio-temporal attention under a smoothing DDPM (abbreviated as GKASA–DDPM). The construction of GKASA–DDPM is illustrated in Figure 1. The basic framework of the proposed model is the DDPM model, introduced in Section 2.1. Figure 1(a)–1(c) illustrates the modeling process of GKAT, the spatio-temporal 2D attention mechanism, and the Savitzky-Golay post-processing, respectively. The improved parts of the proposed model are organized according to the process described in Figure 1 and introduced in Section 2.2.
Figure 1. The framework of the proposed GKASA–DDPM.

Denoising diffusion probabilistic model

A DDPM consists of a forward diffusion process and a reverse diffusion process (Huang et al. 2024). The former defines a Markov chain that transforms the original hydrological data series into Gaussian noise through the addition of random noise. The latter denoises the Gaussian noisy sequence and recovers the hydrological data series step by step. Finally, the DDPM network compares the loss between the estimated noise generated by its U-Net model and the original random noise to gradually denoise the data, thus learning the characteristics of the original hydrological data and forecasting floods. Figure 2 shows the structure of the DDPM. The solid arrows in Figure 2 indicate the forward process from the original data to the latent variable by continuously adding random noise, and the dashed arrows represent the backward process of recovering the original data from Gaussian noise.
Figure 2. The structure of the DDPM model.
The forward diffusion process is formulated as Equation (1):

q(x_k \mid x_{k-1}) = \mathcal{N}\big(x_k; \sqrt{1-\beta_k}\, x_{k-1}, \beta_k \mathbf{I}\big)    (1)

where \beta_k denotes the variance following a predefined schedule, x_0 denotes the original hydrological data series, K denotes the diffusion steps count, and x_k for k = 1, \ldots, K denotes the latent variable. On this basis, the forward diffusion process uses a reparameterization trick to sample x_k, as defined in Equations (2) and (3):

q(x_k \mid x_0) = \mathcal{N}\big(x_k; \sqrt{\bar{\alpha}_k}\, x_0, (1-\bar{\alpha}_k) \mathbf{I}\big)    (2)

where \alpha_k = 1 - \beta_k and \bar{\alpha}_k = \prod_{i=1}^{k} \alpha_i. Thus, x_k can be directly defined as Equation (3):

x_k = \sqrt{\bar{\alpha}_k}\, x_0 + \sqrt{1-\bar{\alpha}_k}\, \epsilon    (3)

where the noise vector \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).

The reverse diffusion process is formulated as Equation (4):

p_\theta(x_{k-1} \mid x_k) = \mathcal{N}\big(x_{k-1}; \mu_\theta(x_k, k), \sigma_k^2 \mathbf{I}\big)    (4)

where x_K \sim \mathcal{N}(\mathbf{0}, \mathbf{I}). Specifically, \mu_\theta(x_k, k) denotes the mean value estimated by the neural network. The model can be trained with a Variational Lower Bound (VLB) by minimizing the loss function defined in Equation (5):

L = \mathbb{E}_{x_0, \epsilon, k}\big[ \lVert \epsilon - \epsilon_\theta(x_k, k) \rVert^2 \big]    (5)

where \epsilon_\theta denotes the denoising function, which estimates the noise \epsilon from the corresponding noisy input.
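As a minimal illustration of Equations (1)–(5), the following sketch (assuming a PyTorch-style setting; the placeholder denoiser, the linear noise schedule, and all shapes are illustrative rather than those of GKASA–DDPM) shows how a noisy sample x_k is drawn with the reparameterization trick and how the simplified denoising loss is evaluated.

```python
import torch

K = 50                                     # diffusion steps (illustrative)
betas = torch.linspace(1e-4, 0.02, K)      # predefined variance schedule beta_k
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product alpha_bar_k

def q_sample(x0, k, noise):
    """Equation (3): draw x_k directly from x_0 with the reparameterization trick."""
    a_bar = alpha_bars[k].view(-1, 1)
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise

def diffusion_loss(eps_model, x0):
    """Equation (5): simplified denoising objective ||eps - eps_theta(x_k, k)||^2."""
    k = torch.randint(0, K, (x0.shape[0],))        # random diffusion step per sample
    noise = torch.randn_like(x0)                   # eps ~ N(0, I)
    x_k = q_sample(x0, k, noise)
    eps_hat = eps_model(x_k, k)                    # denoising network estimate
    return torch.mean((noise - eps_hat) ** 2)

# Toy usage: a placeholder "network" on a batch of 8 streamflow windows of length 24.
eps_model = lambda x_k, k: torch.zeros_like(x_k)   # stands in for the U-Net denoiser
x0 = torch.randn(8, 24)                            # standardized hydrological series
print(diffusion_loss(eps_model, x0).item())
```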

GKASA–DDPM

Although the classical DDPM framework can model the dynamic changes of flood time-series data, its effectiveness may be limited in the context of hydrological spatio-temporal information (Shao et al. 2024). Moreover, DDPM models are particularly susceptible to alterations in the noise scale and noise steps, which leads to poor performance in forecasting the recession limb and significant jagged fluctuations at the end of the flood hydrograph. Addressing these specific characteristics of flood time-series data, this paper proposes an enhancement to the DDPM model, as detailed below. We provide the ablations in Supplementary material, Appendix C.

This study proposes a novel attention network, GKAT, which extends the principles of KAN to GAT. GAT is a deep learning model based on the graph convolutional network (GCN) (Bao et al. 2023). It incorporates the attention mechanism into the nodes of neural networks to extract overall information without requiring knowledge of the entire network structure. This allows the model to focus on the essential differences between nodes, leading to enhanced performance and efficacy in handling large-scale datasets. The adaptive learning of GAT enables it to identify and utilize the relationships between nodes.

Considering the direct and indirect relationships between the hydrological data series at each station, we assume that there are a total of N stations, and the input feature set of these stations can be expressed as H = \{h_1, h_2, \ldots, h_N\}, h_i \in \mathbb{R}^F, where h_i denotes the feature vector of each station node and F denotes the feature dimension of each station node. The attention coefficients between station node i and station node j are defined in Equation (6):

\alpha_{ij} = \dfrac{\exp\big(\mathrm{LeakyReLU}\big(a^{\mathrm{T}} [W h_i \,\Vert\, W h_j]\big)\big)}{\sum_{k \in N_i} \exp\big(\mathrm{LeakyReLU}\big(a^{\mathrm{T}} [W h_i \,\Vert\, W h_k]\big)\big)}    (6)

where W denotes the matrix of feature weights shared by each layer, N_i denotes the neighboring station nodes of station node i, a denotes the weight vector which parametrizes a single-layer feedforward neural network, LeakyReLU is the activation function (Veličković et al. 2017), \Vert denotes the concatenation operation, and T denotes transposition. For each station node i, the feature vectors of its neighboring station nodes and the attention coefficients are weighted and summed to obtain a new feature vector h_i', as defined in Equation (7):

h_i' = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j\Big)    (7)

where \sigma denotes the sigmoid activation function.
The multi-head attention mechanism improves the structural stability and expressiveness of the model (Wang et al. 2023). Multiple GAT networks with the same structure exist in a single network layer, and each attention head has an independent attention weight matrix. When using the multi-head attention mechanism, the attention of a node inside the graph relative to each neighboring node is computed using a similar approach to self-attention inside the transformer. Then, all the multi-head results are fused by concatenating or averaging to get the final node representation vector, as defined in Equation (8). The structure of the multi-head attention mechanism of the GAT network is shown in Figure 3.
h_i' = \big\Vert_{l=1}^{L}\, \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{l} W^{l} h_j\Big)    (8)

where W^l denotes the weight matrix of the lth head, L denotes the count of heads of the attention mechanism, and \alpha_{ij}^{l} denotes the attention weight of the lth head.
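A compact sketch of Equations (6)–(8) is given below, assuming a PyTorch implementation with randomly initialized weights and a fully connected neighbourhood mask; the tensor sizes, the LeakyReLU slope, and the number of heads are illustrative rather than the settings used in GKASA–DDPM.

```python
import torch
import torch.nn.functional as F

N, F_in, F_out, L = 6, 4, 8, 3   # stations, input/output feature dims, attention heads

h = torch.randn(N, F_in)                           # station node features
adj = torch.ones(N, N)                             # neighbourhood mask (fully connected here)
W = [torch.randn(F_in, F_out) for _ in range(L)]   # per-head shared weight matrices
a = [torch.randn(2 * F_out) for _ in range(L)]     # per-head attention vectors

def gat_head(h, W_l, a_l):
    """Equations (6)-(7): attention coefficients and weighted neighbour aggregation."""
    Wh = h @ W_l                                             # (N, F_out)
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for every node pair, via a = [a1; a2]
    e = F.leaky_relu(
        Wh @ a_l[:F_out].unsqueeze(1) + (Wh @ a_l[F_out:].unsqueeze(1)).T, 0.2
    )                                                        # (N, N)
    e = e.masked_fill(adj == 0, float("-inf"))               # keep only neighbours
    alpha = torch.softmax(e, dim=1)                          # attention coefficients
    return torch.sigmoid(alpha @ Wh)                         # new node features h_i'

# Equation (8): multi-head outputs fused by concatenation.
h_new = torch.cat([gat_head(h, W[l], a[l]) for l in range(L)], dim=1)
print(h_new.shape)    # torch.Size([6, 24]) = (N, L * F_out)
```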
Figure 3. The multi-head attention mechanism of GAT at node 1 (with L heads).

GAT primarily focuses on the connectivity of edges during feature learning, that is, whether a connection relationship exists between two features. However, edge features also include weights, directions, and other attributes, which GAT cannot capture well. Additionally, GAT calculates a linear combination of features for each node as the final output. When GAT operations are performed on neighboring nodes, the outputs of the neighboring nodes exhibit strong linear correlations with the output of the node itself, which limits the ability to learn complex nonlinear spatial correlations effectively.

KAN represents a new alternative to the multi-layer perceptron (MLP), inspired by the Kolmogorov–Arnold representation theorem (Liu et al. 2024). In contrast to conventional MLP networks, which utilize fixed activation functions on neurons, each weight parameter in KAN is substituted with a learnable univariate activation function (De Carlo et al. 2024). The Kolmogorov–Arnold representation theorem states that a smooth multivariate function can be expressed as Equation (9):

f(x) = f(x_1, \ldots, x_s) = \sum_{q=1}^{2s+1} \Phi_q\Big(\sum_{p=1}^{s} \phi_{q,p}(x_p)\Big)    (9)

where each \phi_{q,p} is a mapping from [0, 1] to \mathbb{R}, each \Phi_q is a real-valued function, and s denotes the input dimensionality. On this basis, a deeper KAN architecture is defined by composing layers, each of which is a matrix of univariate functions, as defined in Equation (10):

\mathrm{KAN}(x) = (\Phi_{D-1} \circ \Phi_{D-2} \circ \cdots \circ \Phi_0)(x)    (10)

where each layer \Phi_d is a matrix of univariate functions \phi_{d,j,i} with trainable parameters. This methodology allows intricate functional relationships to be represented through a series of transformations and summations.
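As a toy numerical illustration of the representation in Equation (9), the product f(x1, x2) = x1·x2 can be written with hand-chosen univariate functions (a degenerate one-term case; in KAN these univariate functions are instead learnable splines):

```python
import numpy as np

# f(x1, x2) = x1 * x2 rewritten as Phi(phi_1(x1) + phi_2(x2)) with
# phi_p = log and Phi = exp (a degenerate one-term instance of Equation (9)).
phi = np.log
Phi = np.exp

x1, x2 = 2.5, 4.0
print(Phi(phi(x1) + phi(x2)), x1 * x2)   # both evaluate to 10.0
```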
GKAT extends the principles of KAN to GAT. In this model, before obtaining the new feature vector h_i', the node embedding is generated by passing the node embeddings of the previous layer through KAN layers and summing them with the attention coefficients, which can be expressed as Equation (11):

h_i^{(l)} = \sum_{j \in N_i} \alpha_{ij}\, \mathrm{KANLayer}\big(h_j^{(l-1)}\big)    (11)
The structure of GKAT is shown in Figure 4. This innovative GKAT structure, parameterized by KAN's spline function, enables localized weight adjustments, making the network more adaptable to different data patterns. GKAT enhances the modeling capability of GKASA–DDPM for hydrological data, reduces the distribution mismatch in flood time-series data caused by GAT, and identifies the strength of hydrological data correlations between different stations. In addition, the spline function used by GKAT enhances model interpretability. It provides clear insights into how inputs affect results, making hydrological models more transparent and offering valuable guidance for water resource management (Granata et al. 2024).
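A condensed sketch of one GKAT layer in the spirit of Equation (11) is given below. It is an interpretation under stated assumptions rather than the exact GKASA–DDPM implementation: the learnable univariate functions are approximated by weighted sums of Gaussian radial basis functions (the original KAN uses B-splines), the attention scores are computed directly on the raw node features for brevity, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLayer(nn.Module):
    """Learnable univariate function per (input, output) pair, approximated here by a
    weighted sum of Gaussian radial basis functions (B-splines in the original KAN)."""
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis), requires_grad=False)
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, n_basis) * 0.1)

    def forward(self, x):                        # x: (N, in_dim)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)   # (N, in_dim, n_basis)
        return torch.einsum("nib,iob->no", basis, self.coef)        # sum over inputs/basis

class GKATLayer(nn.Module):
    """GAT-style attention over KAN-transformed neighbour embeddings (Equation (11))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.kan = KANLayer(in_dim, out_dim)
        self.att_src = nn.Linear(in_dim, 1, bias=False)
        self.att_dst = nn.Linear(in_dim, 1, bias=False)

    def forward(self, h, adj):                   # h: (N, in_dim), adj: (N, N) mask
        e = F.leaky_relu(self.att_src(h) + self.att_dst(h).T, 0.2)  # pairwise scores
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                              # attention weights
        return alpha @ self.kan(h)               # attention-weighted sum of KAN outputs

# Toy usage on 6 station nodes with 4 features each.
h = torch.randn(6, 4)
adj = torch.ones(6, 6)
print(GKATLayer(4, 8)(h, adj).shape)             # torch.Size([6, 8])
```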
Figure 4. The structure of GKAT.
Meanwhile, to better capture the spatio-temporal relationships in the flood time-series sequence, referring to previous research (Tashiro et al. 2021; Dai et al. 2024), a spatio-temporal 2D attention under DDPM is applied. GKASA–DDPM uses the 2D attention mechanism embedded by a temporal layer and a feature layer in each residual layer of the classical DDPM, as shown in Figure 5. It can be a plug-and-play module within DDPM. Different Transformers are adopted in different layers to characterize the spatio-temporal relationship of flood time-series sequence (Tashiro et al. 2021). Specifically, the temporal Transformer layer is configured to accept tensors representing each feature as inputs to learn temporal dependency. The feature Transformer layer is designed to accept tensors representing each time step to learn spatial (feature) dependency.
Figure 5. The structure of the 2D attention mechanism.
Each transformer layer used in Figure 5 is a 1-layer Transformer-Encoder, including multi-head attention modules, residual connection modules, normalization modules, and feedforward modules. The input feature passes through multi-head attention modules to enhance the model's ability to capture complex relationships in the data, as Equation (12).
h_l = \mathrm{softmax}\Big(\dfrac{Q_l K_l^{\mathrm{T}}}{\sqrt{d}}\Big) V_l, \qquad Z = \mathrm{LN}\big(X + \mathrm{Concat}(h_1, \ldots, h_L)\, W^{O}\big)    (12)

where h_l and Z denote the extracted features in the subspace and the final space, respectively, Q_l and (K_l, V_l) denote the queries and key-value pairs in the subspace, d denotes the subspace dimension, W^O denotes the output projection matrix, and LN denotes the normalization module. On this basis, a feedforward network is used to further extract features, as in Equation (13):

Z' = \mathrm{LN}\big(Z + \mathrm{FFN}(Z)\big)    (13)

where FFN denotes two linear layers.

After going through all layers, the final learned features are mapped as the output of the encoder. This mechanism extracts effective information on flood time-series data from rainfall–runoff characteristics and different time steps at hydrological stations and employs temporal and feature weights in order to determine the comprehensive weights of input variables, thereby identifying and emphasizing the relatively important forecasting information and consequently improving the forecast accuracy.
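The 2D attention described above can be sketched as follows, assuming CSDI-style tensors shaped (batch, features, time, channels); the layer sizes and the use of PyTorch's built-in Transformer encoder are illustrative choices rather than the exact residual layer of GKASA–DDPM.

```python
import torch
import torch.nn as nn

class SpatioTemporal2DAttention(nn.Module):
    """Temporal Transformer over the time axis, feature Transformer over the feature axis."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=128, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(layer(), num_layers=1)
        self.feature = nn.TransformerEncoder(layer(), num_layers=1)

    def forward(self, x):                        # x: (B, F, T, C)
        B, F, T, C = x.shape
        # Temporal layer: each feature's series of length T attends over time steps.
        x = self.temporal(x.reshape(B * F, T, C)).reshape(B, F, T, C)
        # Feature layer: each time step's F station/feature tokens attend over features.
        x = x.permute(0, 2, 1, 3).reshape(B * T, F, C)
        x = self.feature(x).reshape(B, T, F, C).permute(0, 2, 1, 3)
        return x

# Toy usage: batch of 2, 6 rainfall/runoff features, 24 time steps, 64 channels.
x = torch.randn(2, 6, 24, 64)
print(SpatioTemporal2DAttention()(x).shape)      # torch.Size([2, 6, 24, 64])
```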

Due to the iterative nature of its structure, the DDPM model is particularly susceptible to alterations in the noise scale and noise steps (Shao et al. 2024), which leads to poor performance in forecasting the recession limb and exhibits significant jagged fluctuations at the end of the flood hydrograph. Although these fluctuations do not affect the overall trend of the time-series forecast, they impact the forecast accuracy. In this experiment, as shown in Figure 1, at the end of the model, the output streamflow sequence data from the improved DDPM network is smoothed using the Savitzky-Golay smoothing mechanism. This reduces the jagged fluctuations introduced by the DDPM network, preserves the peak flow characteristics well, and improves the forecast accuracy.

The Savitzky-Golay smoothing mechanism is a method that smooths the curve based on the average trend of the time series. It uses a local polynomial least squares fitting in the time domain, which can simulate the long-term trend of the entire time-series data. This method filters out the fluctuations while preserving the shape of the signal.

The polynomial function is formulated as Equation (14):

p(n) = \sum_{r=0}^{R} a_r n^r    (14)

where R denotes the polynomial order, a_r denotes the polynomial coefficient, and n denotes the discrete sequence value. The least-squares fit between the polynomial results and the original signal x(n) over a window of 2m + 1 points is formulated as Equation (15):

\varepsilon = \sum_{n=-m}^{m} \big(p(n) - x(n)\big)^2    (15)

where m denotes the Savitzky-Golay smoothing mechanism parameter (the half-width of the smoothing window). The fitting result using convolutional smoothing is formulated as Equation (16):

y(n) = \sum_{i=-m}^{m} h(i)\, x(n-i)    (16)

where h(i) denotes the filter impulse response.
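In practice, this post-processing step can be reproduced with an off-the-shelf Savitzky-Golay filter; the window length and polynomial order below are illustrative choices, not the values tuned for GKASA–DDPM.

```python
import numpy as np
from scipy.signal import savgol_filter

# Forecasted streamflow with jagged fluctuations superimposed on a flood hydrograph.
t = np.arange(200)
forecast = 50 + 150 * np.exp(-((t - 80) / 30.0) ** 2) + np.random.normal(0, 6, t.size)

# Local least-squares polynomial fit (Equations (14)-(16)): window of 2m + 1 = 11 points,
# polynomial order R = 3; the peak shape is preserved while jagged fluctuations are damped.
smoothed = savgol_filter(forecast, window_length=11, polyorder=3)
print(forecast[75:80].round(1), smoothed[75:80].round(1))
```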

Study area

The catchment area upstream of the Huangtaiqiao hydrological station (117°04′E, 36°44′N) is 321 km2, as shown in Figure 6. The basin has a typical warm temperate continental monsoon climate. The mean annual precipitation is 580–750 mm. The rainy season is generally concentrated from June to September, with July and August accounting for about 50% of the annual rainfall. Rainfall in this basin is often characterized by high intensity, short duration, and dramatic spatial and temporal variability. There are five rainfall stations (Liujiazhuang, Wujiapu, Donghongmiao, Xinglong, Yanzishan) and the Huangtaiqiao hydrological station in the basin. The basin above the Huangtaiqiao station has a special shallow disk-shaped terrain, which allows mountain floods to drain rapidly into the urban area during heavy rainfall and further aggravate urban flooding. Floods here exhibit characteristics of both mountain and urban floods. This high complexity poses additional challenges for hydrological simulation and flood forecasting.
Figure 6. Location and stations of the study area (red circles represent rainfall stations; the red triangle represents the hydrological station).

Dataset and input-output selection

This research collects observed hydrological data series from 1998 to 2021, including precipitation data from five rainfall stations (Liujiazhuang, Wujiapu, Donghongmiao, Xinglong, Yanzishan) and runoff data from the Huangtaiqiao hydrological station, and selects discharge as the forecast target. A total of 41 typical flood events, including single-peak and multipeak events, are extracted from the collected hydrological data. Among these flood events, 75% (30 events) are randomly selected as the training set, and the remaining 25% (11 events) are used as the testing set for evaluating model performance using the sliding window method. The 30 training events are processed by the sliding window method to generate input-output training samples. From these samples, 10% are randomly selected as the validation set, and the remaining 90% are employed for training. The training set is utilized to train the forecasting model, while the validation set is used to optimize hyperparameters. Finally, according to the flow confluence time of the basin, the precipitation and streamflow data over the preceding period were adopted as the inputs of the neural networks. The target outputs were the streamflow at forecast horizons of up to 3 h. We provide the rationale for the input-output prediction horizon in Supplementary material, Appendix A.
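The sliding-window sample construction can be sketched as follows; the lookback length (set here to an illustrative 12 h rather than the value derived from the basin's confluence time) and the array layout are assumptions for demonstration only.

```python
import numpy as np

def make_samples(precip, flow, lookback, horizon):
    """Slide a window over an event series: inputs are the past `lookback` hours of
    rainfall (5 stations) and streamflow; the target is streamflow over the next `horizon` hours."""
    X, y = [], []
    for t in range(lookback, len(flow) - horizon):
        features = np.column_stack([precip[t - lookback:t], flow[t - lookback:t, None]])
        X.append(features)                       # (lookback, 6): 5 rain gauges + discharge
        y.append(flow[t:t + horizon])            # next `horizon` hourly discharges
    return np.stack(X), np.stack(y)

# Toy event: 120 hourly steps, 5 rainfall stations, one discharge series.
precip = np.random.rand(120, 5)
flow = np.random.rand(120)
X, y = make_samples(precip, flow, lookback=12, horizon=3)   # lookback is illustrative
print(X.shape, y.shape)                          # (105, 12, 6) (105, 3)
```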

Evaluation metrics

The forecasting model is quantitatively assessed by widely used indices, including the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the Nash–Sutcliffe Efficiency (NSE), as defined in Equations (17)–(19). Together with the peak-oriented indices introduced below, these five evaluation metrics provide a comprehensive assessment of flood forecasting performance. Firstly, this paper evaluates the performance of flood forecasting models on the entire dataset from a neural network perspective; therefore, two commonly used evaluation metrics for neural network prediction, RMSE and MAE, are included. RMSE and MAE are commonly used to evaluate the bias between predicted and observed values (Zou et al. 2023). The flood process forecast can reflect the characteristics and trends of the flood process over time. NSE, an important hydrological index, is the most widely used metric to evaluate the fit between predicted and observed streamflow (Weng et al. 2023). The smaller the RMSE and MAE and the closer the NSE is to 1, the better the forecast performance. In addition to these indices, the paper also focuses on peak flow and time-to-peak, which are particularly important in the engineering practice of flood forecasting. For small and medium-sized rivers, where floods rise and fall quickly, an accurate forecast of the flood rise and peak flow is critical. Therefore, the Relative Peak Error (RPE) and Peak Time Error (PTE) are adopted, as defined in Equations (20) and (21). The smaller the RPE and PTE, the better the forecast performance in capturing the flood rise and peak flows.
\mathrm{RMSE} = \sqrt{\dfrac{1}{M}\sum_{i=1}^{M}\big(Q_i^{\mathrm{obs}} - Q_i^{\mathrm{fore}}\big)^2}    (17)

\mathrm{MAE} = \dfrac{1}{M}\sum_{i=1}^{M}\big|Q_i^{\mathrm{obs}} - Q_i^{\mathrm{fore}}\big|    (18)

\mathrm{NSE} = 1 - \dfrac{\sum_{i=1}^{M}\big(Q_i^{\mathrm{obs}} - Q_i^{\mathrm{fore}}\big)^2}{\sum_{i=1}^{M}\big(Q_i^{\mathrm{obs}} - \bar{Q}^{\mathrm{obs}}\big)^2}    (19)

\mathrm{RPE} = \dfrac{1}{E}\sum_{e=1}^{E}\dfrac{\big|Q_{p,e}^{\mathrm{fore}} - Q_{p,e}^{\mathrm{obs}}\big|}{Q_{p,e}^{\mathrm{obs}}} \times 100\%    (20)

\mathrm{PTE} = \dfrac{1}{E}\sum_{e=1}^{E}\big|T_{p,e}^{\mathrm{fore}} - T_{p,e}^{\mathrm{obs}}\big|    (21)

where Q_i^{\mathrm{obs}} and Q_i^{\mathrm{fore}} denote the observed and forecasted flows, \bar{Q}^{\mathrm{obs}} denotes the mean observed flow, M represents the sample count, E represents the testing flood events count, Q_{p,e}^{\mathrm{obs}} and Q_{p,e}^{\mathrm{fore}} denote the observed and forecasted peak flows, and T_{p,e}^{\mathrm{obs}} and T_{p,e}^{\mathrm{fore}} denote the time-to-peak of observations and forecasts.
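Equations (17)–(21) can be computed directly as below; the toy observed and forecasted series are for demonstration only.

```python
import numpy as np

def rmse(obs, fore):  return float(np.sqrt(np.mean((obs - fore) ** 2)))
def mae(obs, fore):   return float(np.mean(np.abs(obs - fore)))
def nse(obs, fore):   return float(1 - np.sum((obs - fore) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rpe(obs_events, fore_events):
    """Relative peak error, averaged over the testing flood events, in percent."""
    errs = [abs(f.max() - o.max()) / o.max() for o, f in zip(obs_events, fore_events)]
    return 100 * float(np.mean(errs))

def pte(obs_events, fore_events, dt_hours=1.0):
    """Peak-time error, averaged over the testing flood events, in hours."""
    errs = [abs(int(f.argmax()) - int(o.argmax())) * dt_hours
            for o, f in zip(obs_events, fore_events)]
    return float(np.mean(errs))

obs = np.array([10.0, 40.0, 120.0, 60.0, 20.0])
fore = np.array([12.0, 38.0, 110.0, 65.0, 22.0])
print(rmse(obs, fore), mae(obs, fore), round(nse(obs, fore), 3))
print(rpe([obs], [fore]), pte([obs], [fore]))
```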

To validate the effectiveness of the proposed model for flood forecasting, the following experiments are carried out. The forecasting results of the proposed GKASA–DDPM model are compared with those of other representative forecasting models, including LSTM, GRU, BiLSTM, BiGRU, Transformer, and Conditional Score-based Diffusion models for Imputation (CSDI) (Tashiro et al. 2021). LSTM, GRU, BiLSTM, and BiGRU are commonly used neural network time-series forecasting models based on recurrent neural networks. In these models, the number of hidden layer units is set to 64, the ReLU activation function is used, and the batch size is set to 32. Transformer is a neural network model that has recently been frequently used for flood forecasting; in this model, the ReLU activation function is used and the batch size is set to 32. The original DDPM cannot be directly used for time-series forecasting. CSDI is a commonly used DDPM-based time-series forecasting model that utilizes score-based diffusion models conditioned on observed data. In this model, the diffusion step count is set to 50 and the embedding dimension is set to 128. We provide the reasons for the choice of hyperparameters in Supplementary material, Appendix B.
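The baseline settings stated above can be collected into a single configuration sketch; only the values explicitly given in the text are included, and any unstated training details (optimizer, learning rate, epochs) are deliberately omitted.

```python
# Baseline settings from the comparison (values stated in the text); everything
# not listed here is unspecified in this section and therefore left out.
baselines = {
    "LSTM":        {"hidden_units": 64, "activation": "ReLU", "batch_size": 32},
    "GRU":         {"hidden_units": 64, "activation": "ReLU", "batch_size": 32},
    "BiLSTM":      {"hidden_units": 64, "activation": "ReLU", "batch_size": 32},
    "BiGRU":       {"hidden_units": 64, "activation": "ReLU", "batch_size": 32},
    "Transformer": {"activation": "ReLU", "batch_size": 32},
    "CSDI":        {"diffusion_steps": 50, "embedding_dim": 128},
}
print(baselines["CSDI"])
```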

Forecasting performance by evaluation metrics

Tables 1–3 show the evaluation metrics at different output stages for the seven models. They demonstrate that as the forecast step increases, the overall forecasting effectiveness of the models gradually declines. Compared with the classic networks, the DDPM-based models (CSDI and GKASA–DDPM) show the least degradation in forecast performance. In particular, the NSEs of LSTM, BiLSTM, and Transformer decrease from 0.885, 0.898, and 0.897 at the 1-h forecast horizon to 0.698, 0.768, and 0.786 at the 3-h forecast horizon, respectively, while the NSEs of the DDPM-based models remain at or above 0.883. At the 2-h forecast horizon, although BiGRU has the highest NSE, the proposed GKASA–DDPM is close to it. The results suggest that the diffusion models significantly enhance forecasting capacity over longer forecast horizons in this test case, thereby indicating that DDPM may have the potential to increase the stability of the forecasting model and is more suitable for flood forecasting.

Table 1. The evaluation indices of the different forecasting models (1-h forecast horizon)

| Groups | Models | RMSE | MAE | NSE | RPE | PTE |
| Conventional | GRU | 15.62 | 11.97 | 0.887 | 10.80% | 0.89 |
| Conventional | LSTM | 16.21 | 12.64 | 0.885 | 12.18% | 1.22 |
| Conventional | BiGRU | 15.29 | 11.39 | 0.965 | 10.19% | 0.78 |
| Conventional | BiLSTM | 15.70 | 12.27 | 0.898 | 9.68% | 1.11 |
| Advanced | Transformer | 8.92 | 6.02 | 0.897 | 7.54% | 0.67 |
| Advanced | CSDI | 4.86 | 4.47 | 0.938 | 7.39% | 0.30 |
| Advanced | GKASA–DDPM | 3.25 | 2.53 | 0.973 | 6.17% | 0.20 |

The bolded values represent the best performance.

Table 2. The evaluation indices of the different forecasting models (2-h forecast horizon)

| Groups | Models | RMSE | MAE | NSE | RPE | PTE |
| Conventional | GRU | 16.06 | 12.10 | 0.859 | 14.32% | 1.67 |
| Conventional | LSTM | 17.74 | 12.55 | 0.860 | 15.14% | 2.00 |
| Conventional | BiGRU | 16.47 | 12.12 | 0.941 | 12.03% | 1.33 |
| Conventional | BiLSTM | 15.71 | 12.57 | 0.880 | 8.45% | 1.22 |
| Advanced | Transformer | 9.77 | 6.06 | 0.819 | 12.05% | 1.00 |
| Advanced | CSDI | 4.95 | 5.59 | 0.905 | 8.42% | 0.30 |
| Advanced | GKASA–DDPM | 3.84 | 2.91 | 0.936 | 7.99% | 0.30 |

The bolded values represent the best performance.

Table 3. The evaluation indices of the different forecasting models (3-h forecast horizon)

| Groups | Models | RMSE | MAE | NSE | RPE | PTE |
| Conventional | GRU | 17.19 | 12.62 | 0.791 | 15.79% | 1.78 |
| Conventional | LSTM | 18.15 | 13.18 | 0.698 | 25.49% | 2.00 |
| Conventional | BiGRU | 16.75 | 12.14 | 0.874 | 12.95% | 1.67 |
| Conventional | BiLSTM | 17.58 | 13.20 | 0.768 | 14.30% | 1.78 |
| Advanced | Transformer | 14.66 | 10.48 | 0.786 | 12.84% | 1.67 |
| Advanced | CSDI | 6.83 | 5.66 | 0.883 | 8.39% | 0.60 |
| Advanced | GKASA–DDPM | 3.98 | 3.20 | 0.900 | 8.17% | 0.50 |

The bolded values represent the best performance.

Compared with BiGRU, the best-performing recurrent neural network, Transformer reduces the RMSE by 41.66% and the MAE by 47.15% at the 1-h forecast horizon. This finding reveals that Transformer is a more suitable choice for short-term flood forecasting. Its multi-head attention structure and hyper-parameter optimization enable robust forecasting capability at short forecast horizons. However, at the 3-h forecast horizon, the forecast accuracy of Transformer decreases significantly, with RMSE, MAE, and NSE of 14.66, 10.48, and 0.786, respectively, which indicates that the long-term forecasting stability of Transformer in flood forecasting still needs to be improved.

Notably, the GKASA–DDPM model achieves at least a 22% lower RMSE and a 43% lower MAE compared to the CSDI model, and its NSE is up to approximately 4% higher than that of the CSDI model. These results indicate that the GKASA–DDPM model demonstrates superior performance in enhancing the precision of flood forecasting in comparison to the CSDI model, particularly in terms of the MAE metric, which exhibits a notable advantage. MAE considers the absolute error between the observed and predicted streamflow and is less affected by outliers, so it can provide an accurate evaluation of forecasting network robustness. It can be concluded that the performance of the GKASA–DDPM model is better than that of the original DDPM-based time-series forecasting model in the field of flood forecasting.

The last two columns of the above tables show the evaluation results for peak flows and time-to-peak for the different models. For time-to-peak, the proposed GKASA–DDPM substantially reduces the PTE and accurately captures the time-to-peak. In contrast, the PTE of the LSTM-based models gradually increases with the forecast hour. This suggests that phase bias is unavoidable in flood forecasting using LSTM-based models, particularly over longer forecast periods. While Transformer shows relatively robust performance in forecasting short-term time-to-peak, its efficacy decreases significantly at the 3-h forecast horizon, resulting in a substantial PTE. As the forecast step increases, the time-to-peak of the CSDI model slightly lags. For peak flows, the evaluation results reveal that the proposed GKASA–DDPM produces a more centralized error distribution and a reduction in RPE. As the forecast step increases, although the error distribution gradually disperses, the RPE remains the smallest among all models. In contrast, LSTM has the highest RPE, while the proposed GKASA–DDPM model has the lowest. It can be seen that the LSTM-based models severely underestimate peak flows, and the decay in forecast accuracy increases with the forecast hour. Transformer also demonstrates a tendency to underestimate peak flows. In summary, the minimal RPE and PTE highlight the superior flood peak performance of the proposed GKASA–DDPM.

The scatter plots of the seven forecasting models' results are shown in Figure 7. The scattered points of the commonly used forecasting models under high-flow conditions are notably below the 1:1 ideal line, while those of the CSDI and GKASA–DDPM models lie above this line. The divergence in the scattered points among the seven models becomes increasingly evident with the forecast hour. Among them, the LSTM model produces the most dispersed points, exhibits a severe underestimation of the streamflow, and has the worst forecasting performance, especially at the 3-h forecast horizon. Transformer performs well at the 1-h forecast horizon, demonstrating its suitability for short-term flood forecasting. However, its scattered points at the 3-h forecast horizon deviate significantly from the 1:1 ideal line, indicating that Transformer still needs improvement in medium- and long-term flood forecasting. The forecasting points of the CSDI and GKASA–DDPM models are close to the 1:1 ideal line at the 1-h forecast horizon, with the points of the CSDI model gradually deviating from this line with the forecast hour. It can be concluded that the GKASA–DDPM model has the best forecast performance and can effectively minimize the forecast error over longer forecast periods, followed by CSDI and Transformer, while LSTM has the worst forecasting accuracy.
Figure 7. Scatter plots of the different forecasting models: (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.

Forecasting performance of representative flood events

To further characterize the applicability of these seven models in flood forecasting, two representative flood events are selected from the test set: the flood event (in 1998), which includes three consecutive flood peaks and recession processes with consecutive simulation forecast times of more than 300 h, and the flood event (in 2018), characterized by a short-duration single-peak flood. The forecasting processes of the flood events are shown in Figures 8 and 9.
Figure 8. The forecasting process of the flood event (in 1998): (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.
Figure 9. The forecasting process of the flood event (in 2018): (a) 1-h forecast horizon; (b) 2-h forecast horizon; (c) 3-h forecast horizon.

For the flood event (in 1998), the proposed GKASA–DDPM model performs best, followed by the CSDI model. These two models have similar errors for the second and third forecast flood peaks, with GKASA–DDPM slightly better than CSDI. However, GKASA–DDPM is closer to the observed streamflow than CSDI for the first forecast flood peak, indicating that the GKASA–DDPM model exhibits superior performance in terms of time-to-peak, peak flow, and flood processes. Among the other models, the LSTM model exhibits significant fluctuations, produces unreasonable flood processes, and has the worst forecast performance. The BiLSTM model significantly overestimates peak flows, potentially leading to suboptimal flood control decisions. Compared to the LSTM-based models, Transformer's forecasting results are relatively good, but it still underestimates the peak flow. In addition, owing to the Savitzky-Golay smoothing mechanism, GKASA–DDPM mitigates the jagged fluctuations in the recession limb and fits the observed streamflow better than CSDI.

For the flood event (in 2018), the observed flood hydrograph shows a rapid rise, which can be attributed to the relative concentration of precipitation. The GKASA–DDPM model demonstrates strong performance in forecasting peak flow and time-to-peak. Although the CSDI model accurately captures the time-to-peak, it underestimates the peak flow, performs poorly in forecasting the recession limb, and exhibits significant jagged fluctuations at the end of the flood hydrograph. Among the other models, LSTM severely underestimates the peak flow and produces irregular fluctuations around the peak, which prevents it from forecasting a reasonable flood shape. The BiLSTM and BiGRU models can estimate the peak flow at the 1-h forecast horizon, but the accuracy of the forecast peak flow declines notably with the forecast hour, resulting in relatively low forecast performance. The time-to-peak of the Transformer model slightly lags. In summary, the proposed GKASA–DDPM model is proficient in forecasting the flood rise and recession processes, thereby providing more accurate forecast results and making it well suited for flood forecasting.

This experiment compares the forecasting performance of LSTM, GRU, BiLSTM, BiGRU, Transformer, CSDI, and GKASA–DDPM in the Xiaoqing River basin above the Huangtaiqiao hydrological station.

Traditional neural networks use LSTM-based gating mechanisms to mitigate vanishing gradients and learn time-series patterns, achieving strong performance. However, LSTM and GRU rely on retaining input information at each time step, leading to error accumulation over longer forecast horizons. Transformer, while effective at capturing long-term dependencies through self-attention, suffers from temporal information loss due to properties like permutation invariance, resulting in poor long-term performance. Although CSDI prevents error accumulation, its performance is limited by boundary inconsistencies and sensitivity to noise scale and steps.

Compared to the LSTM, GRU, BiLSTM, BiGRU, Transformer, and CSDI, the advantages of GKASA–DDPM in flood forecasting mainly include the following three points:

Flood peak performance: GKASA–DDPM significantly enhances the ability to capture global dependencies and extract features from the flood time-series sequence. Unlike LSTM-based models, which selectively discard information through forget gates, it allows the information at the flood peaks to be retained and more fully exploited. This not only improves the accuracy of the flood peak flow but also overcomes the time-delay problem to some extent. Tables 1–3 reveal that the GKASA–DDPM model has the lowest RPE and PTE. It can be concluded that the GKASA–DDPM model is more suitable for flood forecasting.

Long-term performance: GKASA–DDPM combines GKAT with DDPM for flood forecasting. It enhances the understanding of spatial features in high-dimensional rainfall–runoff data and addresses the problem of insufficient feature information extraction. Additionally, GKASA–DDPM incorporates a spatio-temporal 2D attention mechanism in its model structure. This approach enables the model to focus globally on input sequence information, significantly enhancing its ability to capture global dependencies and improving long-term flood forecasting performance. Finally, a key challenge in long-term prediction is noise. GKASA–DDPM employs the Savitzky-Golay smoothing mechanism to reduce the impact of noise on the model. As shown in Figure 7, the scattered points of the other models gradually deviate from the 1:1 ideal line with the forecast hour, especially at the 3-h forecast horizon. It can be concluded that the GKASA–DDPM model has the best forecast performance and can effectively minimize the forecast error over longer forecast periods.

Smoothing performance: With the introduction of the Savitzky-Golay smoothing mechanism, the proposed GKASA–DDPM model mitigates the jagged fluctuations that may emerge at the end of the flood hydrograph, thereby enhancing forecasting precision. As shown in Figures 8 and 9, GKASA–DDPM is significantly smoother in the recession limb and fits the observed streamflow more closely than the other models.

It is acknowledged that GKASA–DDPM has high computational complexity. GKASA–DDPM consumes over 0.1 billion multiply–accumulate operations (MACs) per batch during inference and has 0.28 million parameters. Given that the model's predictions are primarily executed on high-performance computing systems and computational capabilities continue to advance, the computational complexity of GKASA–DDPM is not a significant long-term challenge.

This paper proposes GKASA–DDPM, a novel flood forecasting model based on GKAT and spatio-temporal attention under smoothing DDPM for improving flood forecasting accuracy. To evaluate the efficacy of the proposed model GKASA–DDPM, the paper takes the Xiaoqing River basin above the Huangtaiqiao hydrological station in Jinan City as the study area and conducts a series of comparative experiments with various advanced forecasting models, including LSTM, GRU, BiLSTM, BiGRU, Transformer, and CSDI. These experiments evaluate the forecasting performance of the models from different perspectives, such as forecast accuracy, flood peaks, and flood processes. Comparative experimental results show that GKASA–DDPM can significantly reduce forecast error and improve accuracy. Furthermore, the results indicate that GKASA–DDPM accurately captures peak flows and peak times.

It should be noted that certain limitations are associated with this study. The comparative experiments only consider the Xiaoqing River basin above the Huangtaiqiao hydrological station as the study area. Future research can explore the efficacy of the GKASA–DDPM model in river basins worldwide, including regions with high streamflow variability and arid areas. Additionally, this study only uses precipitation and runoff data from hydrological stations; future work can incorporate meteorological data from satellite observations and evaporation data to further enhance the model's performance.

This study was supported by the National Key Research and Development Program of China (2022YFC3005501), the MWR Major Science & Technology Program (SKS-2022007) and the IWHR Research & Development Support Program (WH0145B022021, JZ110145B0062024).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Anand N. & Achim T. (2022) Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. arXiv preprint arXiv:2205.15019.
Bao K., Bi J., Ma R., Sun Y., Zhang W. & Wang Y. (2023) A spatial-reduction attention-based BiGRU network for water level prediction. Water, 15(7), 1306. https://doi.org/10.3390/w15071306.
Blau T., Ganz R., Kawar B., Bronstein A. & Elad M. (2022) Threat model-agnostic adversarial defense using diffusion models. arXiv preprint arXiv:2207.08089.
Cao Q., Zhang H., Zhu F., Hao Z. & Yuan F. (2022) Multi-step-ahead flood forecasting using an improved BiLSTM-S2S model. Journal of Flood Risk Management, 15(4), e12827. https://doi.org/10.1111/jfr3.12827.
Chang P., Li H., Quan S. F., Lu S., Wung S. F., Roveda J. & Li A. (2024) A transformer-based diffusion probabilistic model for heart rate and blood pressure forecasting in intensive care unit. Computer Methods and Programs in Biomedicine, 246, 108060. https://doi.org/10.1016/j.cmpb.2024.108060.
Cheng M., Fang F., Navon I. M. & Pain C. C. (2021) A real-time flow forecasting with deep convolutional generative adversarial network: Application to flooding event in Denmark. Physics of Fluids, 33(5). https://doi.org/10.1063/5.0051213.
Cui Z., Zhou Y., Guo S., Wang J. & Xu C. Y. (2022) Effective improvement of multi-step-ahead flood forecasting accuracy through encoder-decoder with an exogenous input structure. Journal of Hydrology, 609, 127764. https://doi.org/10.1016/j.jhydrol.2022.127764.
Dai Z., Getzen E. & Long Q. (2024) SADI: Similarity-aware diffusion model-based imputation for incomplete temporal EHR data. In: International Conference on Artificial Intelligence and Statistics. PMLR, pp. 4195–4203.
De Carlo G., Mastropietro A. & Anagnostopoulos A. (2024) Kolmogorov-Arnold graph neural networks. arXiv preprint arXiv:2406.18354.
Dhariwal P. & Nichol A. (2021) Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780–8794.
Dharmarathne G., Waduge A. O., Bogahawaththa M., Rathnayake U. & Meddage D. P. P. (2024) Adapting cities to the surge: A comprehensive review of climate-induced urban flooding. Results in Engineering, 102123. https://doi.org/10.1016/j.rineng.2024.102123.
Feng J., Wang Z., Wu Y. & Xi Y. (2021) Spatial and temporal aware graph convolutional network for flood forecasting. In: 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1–8. https://doi.org/10.1109/IJCNN52387.2021.9533694.
Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A. & Bengio Y. (2014) Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
Granata F., Zhu S. & Di Nunno F. (2024) Advanced streamflow forecasting for Central European Rivers: the cutting-edge Kolmogorov-Arnold networks compared to Transformers. Journal of Hydrology, 645, 132175. https://doi.org/10.1016/j.jhydrol.2024.132175.
Harvey W., Naderiparizi S., Masrani V., Weilbach C. & Wood F. (2022) Flexible diffusion modeling of long videos. Advances in Neural Information Processing Systems, 35, 27953–27965.
He X., Luo J., Zuo G. & Xie J. (2019) Daily runoff forecasting using a hybrid model based on variational mode decomposition and deep neural networks. Water Resources Management, 33, 1571–1590. https://doi.org/10.1007/s11269-019-2183-x.
Ho J., Salimans T., Gritsenko A., Chan W., Norouzi M. & Fleet D. J. (2022) Video diffusion models. Advances in Neural Information Processing Systems, 35, 8633–8646.
Huang L., Li P., Gao Q., Liu G., Luo Z. & Li T. (2024) Diffusion probabilistic model for bike-sharing demand recovery with factual knowledge fusion. Neural Networks, 179, 106538. https://doi.org/10.1016/j.neunet.2024.106538.
Khatun A., Chatterjee C., Sahu G. & Sahoo B. (2023) A novel smoothing-based long short-term memory framework for short-to medium-range flood forecasting. Hydrological Sciences Journal, 68(3), 488–506. https://doi.org/10.1080/02626667.2023.2173012.
Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. (2018) Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences, 22(11), 6005–6022. https://doi.org/10.5194/hess-22-6005-2018.
Kurian C., Sudheer K. P., Vema V. K. & Sahoo D. (2020) Effective flood forecasting at higher lead times through hybrid modelling framework. Journal of Hydrology, 587, 124945. https://doi.org/10.1016/j.jhydrol.2020.124945.
Li Y., Lu X., Wang Y. & Dou D. (2022) Generative time series forecasting with diffusion, denoise, and disentanglement. Advances in Neural Information Processing Systems, 35, 23009–23022.
Li W., Liu C., Xu Y., Niu C., Li R., Li M., Hu C. & Tian L. (2024) An interpretable hybrid deep learning model for flood forecasting based on Transformer and LSTM. Journal of Hydrology: Regional Studies, 54, 101873. https://doi.org/10.1016/j.ejrh.2024.101873.
Liu C., Liu D. & Mu L. (2022) Improved transformer model for enhanced monthly streamflow predictions of the Yangtze River. IEEE Access, 10, 58240–58253. https://doi.org/10.1109/ACCESS.2022.3178521.
Liu Z., Wang Y., Vaidya S., Ruehle F., Halverson J., Soljačić M., Hou T. & Tegmark M. (2024) KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756.
Madhushani C., Dananjaya K., Ekanayake I. U., Meddage D. P. P., Kantamaneni K. & Rathnayake U. (2024) Modeling streamflow in non-gauged watersheds with sparse data considering physiographic, dynamic climate, and anthropogenic factors using explainable soft computing techniques. Journal of Hydrology, 631, 130846. https://doi.org/10.1016/j.jhydrol.2024.130846.
Miau S. & Hung W. H. (2020) River flooding forecasting and anomaly detection based on deep learning. IEEE Access, 8, 198384–198402. https://doi.org/10.1109/ACCESS.2020.3034875.
Nourani V., Kisi Ö. & Komasi M. (2011) Two hybrid artificial intelligence approaches for modeling rainfall–runoff process. Journal of Hydrology, 402(1-2), 41–59. https://doi.org/10.1016/j.jhydrol.2011.03.002.
Rombach R., Blattmann A., Lorenz D., Esser P. & Ommer B. (2022) High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695.
Shao P., Feng J., Lu J., Zhang P. & Zou C. (2024) Data-driven and knowledge-guided denoising diffusion model for flood forecasting. Expert Systems with Applications, 244, 122908. https://doi.org/10.1016/j.eswa.2023.122908.
Shen C. (2018) A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources Research, 54(11), 8558–8593. https://doi.org/10.1029/2018WR022643.
Shortridge J. E., Guikema S. D. & Zaitchik B. F. (2016) Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7), 2611–2628. https://doi.org/10.5194/hess-20-2611-2016.
Tashiro Y., Song J., Song Y. & Ermon S. (2021) CSDI: Conditional score-based diffusion models for probabilistic time series imputation. Advances in Neural Information Processing Systems, 34, 24804–24816.
Van S. P., Le H. M., Thanh D. V., Dang T. D., Loc H. H. & Anh D. T. (2020) Deep learning convolutional neural network in rainfall–runoff modelling. Journal of Hydroinformatics, 22(3), 541–561. https://doi.org/10.2166/hydro.2020.095.
Veličković P., Cucurull G., Casanova A., Romero A., Lio P. & Bengio Y. (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
Wan X., Wu Q., Cao Z. & Wu Y. (2023) Real-time flood forecasting based on a general dynamic neural network framework. Stochastic Environmental Research and Risk Assessment, 37(1), 133–151. https://doi.org/10.1007/s00477-022-02271-6.
Wang Y. Y., Wang W., Chau K. W., Xu D. M., Zang H. F., Liu C. J. & Ma Q. (2023) A new stable and interpretable flood forecasting model combining multi-head attention mechanism and multiple linear regression. Journal of Hydroinformatics, 25(6), 2561–2588. https://doi.org/10.2166/hydro.2023.160.
Weng P., Tian Y., Liu Y. & Zheng Y. (2023) Time-series generative adversarial networks for flood forecasting. Journal of Hydrology, 622, 129702. https://doi.org/10.1016/j.jhydrol.2023.129702.
Xu Y., Hu C., Wu Q., Li Z., Jian S. & Chen Y. (2021) Application of temporal convolutional network for flood forecasting. Hydrology Research, 52(6), 1455–1468. https://doi.org/10.2166/nh.2021.021.
Yan T., Zhang H., Zhou T., Zhan Y. & Xia Y. (2021) ScoreGrad: Multivariate probabilistic time series forecasting with continuous energy-based generative models. arXiv preprint arXiv:2106.10121.
Yang L., Zhang D. & Karniadakis G. E. (2020) Physics-informed generative adversarial networks for stochastic differential equations. SIAM Journal on Scientific Computing, 42(1), A292–A317. https://doi.org/10.1137/18M1225409.
Yang Y., Jin M., Wen H., Zhang C., Liang Y., Ma L., Wang Y., Liu C., Yang B., Xu Z., Bian J., Pan S. & Wen Q. (2024) A survey on diffusion models for time series and spatio-temporal data. arXiv preprint arXiv:2404.18886.
Yaseen Z. M., Jaafar O., Deo R. C., Kisi O., Adamowski J., Quilty J. & El-Shafie A. (2016) Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq. Journal of Hydrology, 542, 603–614. https://doi.org/10.1016/j.jhydrol.2016.09.035.
Zhou Y., Guo S. & Chang F. J. (2019) Explore an evolutionary recurrent ANFIS for modelling multi-step-ahead flood forecasts. Journal of Hydrology, 570, 343–355. https://doi.org/10.1016/j.jhydrol.2018.12.040.
Zou Y., Wang J., Lei P. & Li Y. (2023) A novel multi-step ahead forecasting model for flood based on time residual LSTM. Journal of Hydrology, 620, 129521. https://doi.org/10.1016/j.jhydrol.2023.129521.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data