Accurate rainfall–runoff prediction is critical for issuing a warning of potential damage early enough to allow an appropriate response to the disaster. The long short-term memory (LSTM)-based rainfall–runoff model has been proven effective in runoff prediction. Previous research has typically utilized multiple information sources as LSTM training data. However, when there are many sequences of input data, the LSTM cannot extract valid nonlinear information between consecutive data points. In this paper, a novel Informer neural network using the empirical wavelet transform (EWT) was first proposed to predict runoff based only on single-source rainfall data. The use of EWT reduced the non-linearity and non-stationarity of the runoff data, which increased the accuracy of the prediction results. In addition, the model introduced the Fractal theory to divide the rainfall and runoff into three parts, by which the interference caused by excessive data fluctuations could be eliminated. The model performance was tested using 15 years of precipitation from the GPM satellite and runoff from the USGS. The results show that the EWT_Informer model outperforms the LSTM-based models for runoff prediction. The PCC, R², and training time in EWT_Informer were 0.937, 0.868, and 1 min 3.56 s, respectively, while those provided by the LSTM-based model were 0.854, 0.731, and 4 min 25.9 s, respectively.

  • The Informer network was innovatively introduced into rainfall–runoff prediction, which reduced time and spatial complexity.

  • The empirical wavelet transform was utilized to enhance the treatment of non-linearity and non-stationarity.

  • Fractal theory was applied to eliminate the interference caused by excessive data fluctuations.

Hydrological research has long emphasized the development and utilization of watershed models that depict the transformation of precipitation into streamflow and channel runoff (Jakeman & Hornberger 1993). The precision and promptness of rainfall observations are critical for constructing these watershed models in near real-time. Unfortunately, in most countries, catchment basins are sparsely gauged, lacking accurate and adequate rainfall and runoff measurements from ground-based observations (e.g., rain gauges, radar, or other devices) (Jeong & Kim 2005; Brocca et al. 2020).

Satellites offer an ideal platform for global precipitation measurement (GPM) due to their unique attributes. Satellite rainfall data are useful for predicting flood events and conducting runoff analysis. These ‘data from the sky’ are (i) easily available on a global basis from the internet; (ii) a virtually uninterrupted supply of rainfall data for maintaining the functionality of operational land-based systems during catastrophic situations that can temporarily shut down ground networks (e.g., the overland effects of hurricanes, earthquakes, or tsunamis); and (iii) a source of basin-wide rainfall data in transboundary river basins where riparian nations lack treaties for real-time rainfall data sharing across political boundaries (Harris et al. 2007).

Nerini et al. (2015) demonstrated the successful use of satellite data in conceptual rainfall–runoff (CRR) models for flood prediction (Najmaddin et al. 2017). CRR models typically require a substantial amount of input, such as hydrological data, meteorological data, topographic data, soil characteristics, and vegetation coverage, in addition to sophisticated mathematical tools and significant user expertise, which limits their application to the prediction of sequence data with unknown quasi-periodic dynamics (Wood et al. 2011; Sang 2013). Most recently, owing to their capability to process long sequence data (Bengio et al. 1994), long short-term memory (LSTM) networks (a form of deep learning) have been successfully applied to rainfall–runoff prediction tasks (Fang et al. 2017; Kratzert et al. 2018; Gao et al. 2020; Xiang et al. 2020; Yokoo et al. 2022); they establish a direct mapping between input rainfall and output runoff without requiring knowledge of the physical processes.

However, the LSTM model, despite taking long time series as input, has been shown to struggle to capture long-term dependencies, which can reduce prediction performance (Zhao et al. 2020). Additionally, since runoff data form a non-stationary, nonlinear time series, accuracy is often low without proper data preprocessing. To address these issues, many studies incorporate data such as wind speed and soil moisture into the LSTM network to improve prediction accuracy (Gauch et al. 2021; Lees et al. 2021; Mao et al. 2021; Xie et al. 2021; Katipoğlu 2023). Unfortunately, the LSTM network cannot extract valid nonlinear information between consecutive data points when handling large amounts of input data (Wu et al. 2022). This approach not only leaves the non-stationarity of runoff data unresolved but also introduces more non-stationary data, increasing the time needed for data collection and preprocessing. Time series forecasting methods based on an encoder–decoder structure, such as the transformer, use the attention mechanism to reduce the maximum length of network signal paths and avoid a recurrent structure, resulting in better performance than the LSTM model at capturing long-term dependencies (Vaswani et al. 2017). However, training time for these models, especially for rainfall–runoff forecasting, can be lengthy.

To tackle the above challenges, Zhou et al. (2021) proposed the Informer model based on the transformer architecture, which incorporates a multi-head ProbSparse self-attention mechanism to reduce complexity and memory usage. The Informer model has been applied to wind power prediction (Wang et al. 2022), power load prediction (Liu et al. 2022), and so on. In this study, we innovatively applied the Informer model to rainfall–runoff prediction tasks and used the empirical wavelet transform (EWT) method to decompose the runoff time series into several more predictable components with lower non-stationarity. Then, we evaluated the new model using rainfall extracted from GPM imagery and runoff data downloaded from the United States Geological Survey (USGS) website. Moreover, to mitigate the impact of input data fluctuations on prediction accuracy, we divided the runoff data using the Fractal theory.

In this paper, we address several challenges in the time series forecasting of rainfall–runoff, including the problems of long-term dependencies, time and space complexity, as well as non-stationary and non-linearity of data. The main contributions of this paper are summarized as follows:

  • (1)

    The Informer network was innovatively introduced into rainfall–runoff prediction, which reduced time and spatial complexity.

  • (2)

    The EWT Informer model was proposed, combining EWT with the Informer network, to improve the non-linearity and non-stationarity of runoff data for more accurate and efficient predictions.

  • (3)

    To eliminate the interference caused by excessive data fluctuations, the Fractal theory was utilized to divide the data of each year into several parts.

Study area

The Trinity River spans 715 miles and flows entirely within the confines of Texas. The river and its accompanying basin support over one-fifth of the state's populace and encompass a geographical area that surpasses that of nine states within the United States. The Trinity River has four primary tributaries, namely, the Clear Fork, the Elm Fork, the West Fork, and the East Fork.

The East Fork of the Trinity River originates in central Grayson County, Texas, at 33°32′ N, 96°41′ W, and meanders for a distance of 78 miles toward the southwest, crossing central Collin, western Rockwall, eastern Dallas, and western Kaufman counties before merging with the West Fork in the southwestern part of Kaufman County, located at 32°29′ N, 96°30′ W. The river's flow is reliant on dam releases from Lake Ray Hubbard or precipitation from recent heavy local rainfall. While the section of the East Fork above Lake Ray Hubbard is typically not navigable, the remaining 40 miles of the river, extending from IH 20 to SH 34, present excellent potential for recreational activities, with the final 20–25 miles being the most favorable stretch for paddling. Although the East Fork does not feature significant rapids, high flows from runoff, log jams, low-hanging vegetation, a narrow channel, or dam releases can create hazardous conditions for those engaged in water activities, such as paddlers and boat fishermen. For the purposes of our study, we focus specifically on the segment of the river situated in the Duck Creek-East Fork Trinity River watershed, as depicted in Figure 1, with the study river shown by the blue circle.
Figure 1

Study area.

Data

We used satellite imagery from the Final Run (Level 3) GPM product, acquired between 2006 and 2020, to obtain rainfall data. The rainfall data are in NetCDF format with a spatial resolution of 0.1° × 0.1° and a temporal resolution of 1 day. Initially, we converted the downloaded data into TIFF format, then constructed the watershed shapefile of the study area in ArcMap, and used the ArcMap toolbox to clip the daily rainfall of the target area from the original satellite image. Finally, we averaged the rainfall values over each grid cell of the watershed and converted the data to CSV format to obtain the original rainfall dataset.
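As a sketch of the final averaging and export steps of this chain (the ArcMap steps are interactive, so a synthetic array stands in for a clipped 0.1° GPM tile stack, and the 2×2 watershed footprint is hypothetical):

```python
import numpy as np
import pandas as pd

def basin_average_rainfall(grids, mask, dates):
    """Average each day's rainfall grid over the watershed cells.

    grids: (n_days, ny, nx) array of daily rainfall (mm/day)
    mask:  (ny, nx) boolean array, True inside the watershed
    dates: sequence of n_days date labels
    """
    daily_mean = grids[:, mask].mean(axis=1)  # mean over watershed cells only
    return pd.DataFrame({"date": dates, "rainfall_mm": daily_mean})

# Synthetic stand-in for a clipped GPM tile stack (5 days, 4x4 cells)
rng = np.random.default_rng(0)
grids = rng.gamma(shape=2.0, scale=3.0, size=(5, 4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # hypothetical 2x2 watershed footprint
dates = pd.date_range("2006-01-01", periods=5, freq="D")

df = basin_average_rainfall(grids, mask, dates)
df.to_csv("rainfall.csv", index=False)  # the CSV step of the pipeline
```

In practice the mask would come from rasterizing the watershed shapefile rather than being hand-set as here.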

The stream site, maintained by the USGS Texas Water Science Center (identifier USGS-TX), is named ‘E Fk Trinity Rv nr Forney, TX’ and has the identifier USGS-08061750. It is located in Kaufman County, Texas, at 32.7742947° latitude and −96.5035991° longitude (datum NAD83). The runoff data were downloaded from the United States Geological Survey (USGS) website (USGS Current Conditions for USGS 08061750 E Fk Trinity Rv nr Forney, TX). The variability of the mean daily satellite-based rainfall data and measured runoff data between 2006 and 2020 is presented in Figure 2. It is evident that the rainfall and runoff data are non-stationary and nonlinear, and that the data in each period of the year differ considerably.
Figure 2

Average daily runoff and rainfall data from 2006 to 2020. (a) runoff data (b) rainfall data.


Empirical wavelet transform

Considering the non-linearity and non-stationarity of runoff data, using them directly for prediction can negatively impact accuracy (Liu et al. 2019). Most articles on rainfall–runoff prediction incorporate meteorological parameters such as temperature, humidity, and wind speed into the prediction model to improve accuracy (Gauch et al. 2021; Lees et al. 2021; Mao et al. 2021; Xie et al. 2021; Katipoğlu 2023). However, this introduces challenges in data acquisition and processing. The novelty of our study lies in enhancing the prediction accuracy without introducing auxiliary parameters; instead, we focus solely on processing the rainfall–runoff data itself. If the non-stationary runoff time series can be decomposed into several more predictable components with less non-stationary characteristics, it is possible to improve prediction accuracy without additional auxiliary hydrological parameters. The wavelet transform (WT) is a commonly employed technique for analyzing nonlinear and non-stationary signals. Nonetheless, its application is hindered by its fixed basis functions, which cannot fully capture the characteristics of all real signals (Liu & Chen 2019). Empirical mode decomposition (EMD) is a powerful technique that decomposes a signal into a collection of oscillatory components, which capture inherent properties embedded within the input signal. However, its lack of a rigorous mathematical foundation remains an inherent defect (Huang et al. 1998). Inspired by WT and EMD, Gilles (2013) proposed the EWT. The core concept of EWT is to identify the Fourier segments and construct a series of wavelet filters that can effectively extract the various modes from the processed signal. EWT achieves adaptability through automatic determination of the Fourier segments.

In this paper, we employed the empirical wavelet transform to decompose the original runoff data. For a given signal $f(t)$, we first obtained its Fourier transform and normalized the Fourier spectrum to the range $[0, 2\pi]$. According to Shannon's criterion, only the signal characteristics on $[0, \pi]$ need to be discussed in the analysis process; therefore, the support interval of the Fourier spectrum is defined as $[0, \pi]$. Assuming that the signal is segmented into $N$ contiguous components, the support interval $[0, \pi]$ is divided into $N$ consecutive segments ($\omega_n$ represents the boundary between sections), giving $N + 1$ boundaries, where $\omega_0 = 0$ and $\omega_N = \pi$ (see Figure 3). Each segment is denoted $\Lambda_n = [\omega_{n-1}, \omega_n]$; then it can be observed that $\cup_{n=1}^{N} \Lambda_n = [0, \pi]$. Around each $\omega_n$, we define a transition phase $T_n$ (indicated by the gray hatched areas in Figure 3) of width $2\tau_n$.
Figure 3

Partitioning of the Fourier axis.

An empirical wavelet is a band-pass filter bank defined on each segment $\Lambda_n$, constructed using the idea of the Littlewood–Paley and Meyer wavelets. For $n > 0$, the empirical wavelet function $\hat{\psi}_n(\omega)$ and the empirical scaling function $\hat{\phi}_1(\omega)$ are expressed by (1) and (2), respectively, with the transition polynomial $\beta(x)$ given by (3) and $\tau_n = \gamma\omega_n$.

$$\hat{\psi}_n(\omega) = \begin{cases} 1, & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1-\gamma)\omega_{n+1}\right)\right)\right], & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$\hat{\phi}_1(\omega) = \begin{cases} 1, & |\omega| \le (1-\gamma)\omega_1 \\ \cos\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_1}\left(|\omega| - (1-\gamma)\omega_1\right)\right)\right], & (1-\gamma)\omega_1 \le |\omega| \le (1+\gamma)\omega_1 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$\beta(x) = x^4\left(35 - 84x + 70x^2 - 20x^3\right) \tag{3}$$
Drawing on the idea of the classic WT, the empirical wavelet coefficients constructed by Gilles are generated by the inner product. The detail coefficients are given by the inner product of the signal $f(t)$ and the empirical wavelet function $\psi_n$, which can be written as follows:

$$W_f^{\varepsilon}(n, t) = \langle f, \psi_n \rangle = \int f(\tau)\,\overline{\psi_n(\tau - t)}\,d\tau \tag{4}$$
The approximation coefficients are given by the inner product with the scaling function $\phi_1$:

$$W_f^{\varepsilon}(0, t) = \langle f, \phi_1 \rangle = \int f(\tau)\,\overline{\phi_1(\tau - t)}\,d\tau \tag{5}$$
Therefore, the reconstruction expression of the signal f(t) is:
$$f(t) = W_f^{\varepsilon}(0, t) \star \phi_1(t) + \sum_{n=1}^{N} W_f^{\varepsilon}(n, t) \star \psi_n(t) \tag{6}$$

$$f(t) = \left(\hat{W}_f^{\varepsilon}(0, \omega)\,\hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{W}_f^{\varepsilon}(n, \omega)\,\hat{\psi}_n(\omega)\right)^{\vee} \tag{7}$$
Following this formalism, the empirical mode is given by:
$$f_0(t) = W_f^{\varepsilon}(0, t) \star \phi_1(t) \tag{8}$$

$$f_k(t) = W_f^{\varepsilon}(k, t) \star \psi_k(t), \quad k = 1, \ldots, N \tag{9}$$
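As an illustration of the decomposition in equations (4)–(9), the following sketch replaces the Meyer-type filters with ideal (brick-wall) band-pass filters. This keeps the code short while preserving the key property that the modes partition the spectrum and sum back to the signal; the single boundary value is arbitrary here, whereas the EWT detects its boundaries adaptively from the spectrum:

```python
import numpy as np

def simple_ewt(signal, boundaries):
    """Split a real signal into modes by ideal band-pass filtering.

    A simplification of EWT: the Fourier support [0, pi] is cut at the
    given boundaries (radians), and each band is inverted separately,
    with brick-wall filters standing in for the Meyer-type wavelets.
    """
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.linspace(0.0, np.pi, len(spectrum))
    edges = [0.0] + list(boundaries) + [np.pi + 1e-9]
    modes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.where((freqs >= lo) & (freqs < hi), spectrum, 0.0)
        modes.append(np.fft.irfft(band, n))
    return np.array(modes)

t = np.linspace(0, 1, 512, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
modes = simple_ewt(x, boundaries=[0.3])  # one cut -> two modes (cf. mra1, mra2)
```

Because the ideal filters partition the spectrum, the modes reconstruct the signal exactly, mirroring equations (6)–(9).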
The runoff data from 2006 to 2020 after EWT are shown in Figure 4. In the following sections, we input the two decomposed signals (mra1 and mra2) together with the original data into the model for prediction.
Figure 4

Runoff data after empirical wavelet transform.


Fractal theory

The term fractal is typically defined as ‘a rough or fragmented geometric shape’ that can be divided into several parts, each of which is (at least approximately) a reduced-size copy of the whole (Mandelbrot & Mandelbrot 1982). In recent years, the Fractal theory has been widely employed in hydrology-related research fields (Oñate Rubalcaba 1997; De Lima & De Lima 2009) to explore the intrinsic self-similarity of time series. In the Fractal theory, self-similarity refers to the similarity between a part and the whole of the research object, and in most cases the similarity is statistical (Dong et al. 2007). As illustrated in Figure 2(a), the runoff data vary greatly across different time periods. Such fluctuations can have a significant impact on the accuracy of neural networks. Therefore, to ensure the scientific validity and accuracy of the model, this study employs the fractal characteristics of the runoff data to divide the annual rainfall and runoff data into multiple stages.

  • (1)

    Preliminary fractal judgement of runoff series

To evaluate the presence of self-similarity in the sequence, we conducted the Kolmogorov–Smirnov test by computing various statistical parameter values of the sequence, as presented in Table 1.

Table 1

Computation of statistical parameters

  Average   Std.       Coefficient of variation   Skewness   Kurtosis
  599.98    1,865.53   3.11                       8.34       115.61

At the same time, we drew the normal Q–Q plot of the runoff data, as shown in Figure 5(a).
Figure 5

Preliminary fractal judgement of runoff series. (a) The normal Q-Q plots of runoff data (b) DFA analysis result of runoff data.


The curve trend in the Q–Q plot indicates that the sample data do not follow a normal distribution; the sequence exhibits a ‘peaked’ and ‘heavy-tailed’ distribution. Combined with the statistical parameter values in Table 1, we can conclude that this time series conforms to the characteristics of the ‘Pareto’ distribution, indicating that the runoff series possesses self-similarity (Gordon 1995). To further verify its fractal characteristics, this study introduced the Detrended Fluctuation Analysis (DFA) method (Peng et al. 1994). The DFA method was applied over a selected window interval (‘N’ represents the length of the studied sequence) (Hou et al. 2010), and the analysis results are shown in Figure 5(b). The scaling exponent of the runoff series is found to be 0.8558, which is greater than 0.5, indicating that the runoff series possesses long-term correlation and that its future trend is positively correlated with its historical trend (Mandelbrot 1967). In conclusion, the runoff series under investigation demonstrates self-similarity and scale invariance, thereby classifying it as a fractal time series.
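A minimal DFA implementation in the spirit of Peng et al. (1994) can be sketched as follows; the window sizes and the white-noise test signal are illustrative choices, not the paper's settings:

```python
import numpy as np

def dfa_exponent(x, scales):
    """Detrended Fluctuation Analysis: slope of log F(s) vs log s.

    An exponent near 0.5 indicates an uncorrelated series; above 0.5,
    long-term positive correlation (as found for the runoff series).
    """
    profile = np.cumsum(x - np.mean(x))        # integrated, mean-removed series
    logs, logF = [], []
    for s in scales:
        n_win = len(profile) // s
        f2 = []
        for w in range(n_win):
            seg = profile[w * s:(w + 1) * s]
            t = np.arange(s)
            coeffs = np.polyfit(t, seg, 1)      # local linear trend
            f2.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        logs.append(np.log(s))
        logF.append(0.5 * np.log(np.mean(f2)))  # fluctuation function F(s)
    return np.polyfit(logs, logF, 1)[0]

rng = np.random.default_rng(42)
white = rng.standard_normal(4096)               # uncorrelated control signal
alpha = dfa_exponent(white, scales=[16, 32, 64, 128, 256])
```

For white noise the estimated exponent should sit near 0.5, well below the 0.8558 reported for the runoff series.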

  • (2)

    Fractal dimension computation

The Fractal theory has found extensive applications in the calculation of flood stages (Ai et al. 2009). The basic approach is to assume a starting calculation time, define different time intervals, calculate the fractal dimension for each interval, and then compare the fractal dimension values across intervals to identify the periodicity of the series. As seen in Figure 2(a), there are three significant increases in runoff data throughout the year: the first in mid-to-late February, the second in late May and early June, and the third in late November. Taking 2/10 as the starting date, the fractal dimensions of each stage are calculated and presented in Tables 2–4 (Yu et al. 1999; Lei-hua & Yi 2021).

Table 2

Fractal dimension computation in stage 1

  Date        Days   Dimension   Difference
  2/10–3/22    40    1.81        –
  2/10–5/2     81    1.848       0.038
  2/10–6/12   122    1.856       0.012
  2/10–7/23   163    1.844       0.012
  2/10–9/2    204    1.769       0.075
Table 3

Fractal dimension computation in stage 2

  Date         Days   Dimension   Difference
  7/24–9/2      40    1.489       –
  7/24–10/12    80    1.811       0.032
  7/24–11/18   117    1.761       0.05
Table 4

Fractal dimension computation in stage 3

  Date        Days   Dimension   Difference
  9/3–10/13    40    1.802       –
  9/3–11/18    76    1.862       0.06
  9/3–12/28   116    1.868       0.006
  9/3–2/9     159    1.846       0.012

Building on the application of the Fractal theory in flood staging, the runoff data in the study area can be partitioned into three stages. Specifically, the first stage spans from 2/10 to 7/23, the second stage is observed between 7/24 and 9/2, while the third stage extends from 9/3 to 2/9.
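The staging computation rests on a fractal (box-counting) dimension estimate for each candidate interval. The paper's exact procedure (Yu et al. 1999) may differ in detail; the following is a generic box-counting sketch over the graph of a normalized series, with a seeded random-walk trace standing in for a runoff segment:

```python
import numpy as np

def box_counting_dimension(series, box_sizes):
    """Estimate the fractal (box-counting) dimension of a series' graph.

    The series is normalized onto the unit square; for each box size e
    we count occupied boxes N(e) and fit the slope of log N(e) vs log(1/e).
    """
    y = (series - series.min()) / (series.max() - series.min() + 1e-12)
    x = np.linspace(0.0, 1.0, len(series))
    log_inv_e, log_n = [], []
    for e in box_sizes:
        cols = np.floor(x / e).astype(int)
        rows = np.floor(y / e).astype(int)
        n_boxes = len(set(zip(cols, rows)))   # distinct occupied boxes
        log_inv_e.append(np.log(1.0 / e))
        log_n.append(np.log(n_boxes))
    return np.polyfit(log_inv_e, log_n, 1)[0]

rng = np.random.default_rng(7)
noisy = np.cumsum(rng.standard_normal(2000))   # a rough, runoff-like trace
dim = box_counting_dimension(noisy, box_sizes=[1/8, 1/16, 1/32, 1/64])
```

For the graph of a rough series the estimate falls between 1 (a smooth curve) and 2 (a plane-filling trace), the range seen in Tables 2–4.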

Long short-term memory network

The LSTM is a type of recurrent neural network (RNN) that was first introduced in 1997 (Graves & Graves 2012). One of the distinctive aspects that sets the LSTM apart from traditional RNNs is its unique ‘gate’ structure. This structure enables the network to effectively tackle the challenges of gradient vanishing and gradient explosion during training (Bengio et al. 1994). The LSTM consists of several components, including a cell state, input gate, output gate, and forget gate, as illustrated in Figure 6(a). These components work together to enable the network to selectively retain or discard information over time, allowing it to learn long-term dependencies in sequential data more effectively.
Figure 6

Long short-term memory (LSTM) network. (a) The structure of the LSTM unit (b) The training process of LSTM.


In Figure 6(a), the plus sign represents the element-wise addition of the input and the previous output, while the Hadamard product denotes element-wise multiplication. Here $f$, $i$, $z$, and $o$ represent the forget gate, the input gate, the input node, and the output gate, respectively. The LSTM cell captures the dependencies among the data in the input sequence. The input gate regulates the extent to which values flow into the cell. The forget gate selectively discards information from the previous cell state $c_{t-1}$. The values within the cell state are used to calculate the output activation of the LSTM, which is controlled by the output gate. The schematic diagram of LSTM network training is shown in Figure 6(b).
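The gate interactions described above can be written out directly. This is a generic single-step sketch in NumPy with random weights, not a trained rainfall–runoff model:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with forget (f), input (i), node (z), output (o) gates.

    W, U, b each stack the four gates' parameters: W[k] maps the input x,
    U[k] maps the previous hidden state h_prev, for k in (f, i, z, o).
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f = sigmoid(W[0] @ x + U[0] @ h_prev + b[0])   # what to discard from c_prev
    i = sigmoid(W[1] @ x + U[1] @ h_prev + b[1])   # how much new input to admit
    z = np.tanh(W[2] @ x + U[2] @ h_prev + b[2])   # candidate cell values
    o = sigmoid(W[3] @ x + U[3] @ h_prev + b[3])   # how much of the cell to emit
    c = f * c_prev + i * z                         # element-wise (Hadamard) update
    h = o * np.tanh(c)                             # output activation
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = rng.standard_normal((4, n_hid, n_in))
U = rng.standard_normal((4, n_hid, n_hid))
b = np.zeros((4, n_hid))
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```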

Informer network

The Informer model is a network structure based on the attention mechanism. It is designed to enhance the computational efficiency of the self-attention mechanism, multi-layer network stacking, and step-by-step decoding methods used in the transformer network (Zhou et al. 2021). The Informer network exhibits better performance than the LSTM in capturing long-term correlations. Therefore, it has been successfully applied in fields such as long-sequence electrical line trip fault prediction and wind power prediction (Guo et al. 2021; Tian et al. 2023). Based on its success in long-term time series prediction, this study introduces the Informer model for the first time in rainfall–runoff prediction.

The model comprises two main components: an encoder and a decoder, as illustrated in Figure 7. The encoder is designed to process long sequences of input data, while the decoder is responsible for generating output sequences of the same length as the input, with zero-padding added as necessary. This architecture leverages the benefits of the multi-head ProbSparse self-attention mechanism while reducing the computational complexity and memory usage from $O(L^2)$ to $O(L \log L)$. The distilling operation applied after each attention block is calculated as follows:
$$X_{j+1}^{t} = \operatorname{MaxPool}\left(\operatorname{ELU}\left(\operatorname{Conv1d}\left([X_j^{t}]_{AB}\right)\right)\right) \tag{10}$$
where $[\cdot]_{AB}$ represents the attention block, Conv1d denotes 1-D convolutional filtering, and ELU is the activation function. The ‘distilling’ operation adds a max-pooling layer with stride 2 and down-samples $X$ into its half slice, which reduces the memory usage to $O((2 - \varepsilon)L \log L)$, where $\varepsilon$ is a small number.
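A toy NumPy sketch of the ProbSparse idea follows. The max-mean sparsity measure and the fallback of ‘lazy’ queries to the mean of V follow the description in Zhou et al. (2021), but this is an illustration, not the released implementation:

```python
import numpy as np

def probsparse_attention(Q, K, V, u):
    """Sketch of ProbSparse self-attention: score every query by the
    max-mean sparsity measure M(q, K), run full attention only for the
    top-u queries, and let the remaining queries fall back to mean(V).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (L_Q, L_K) scaled dot products
    M = scores.max(axis=1) - scores.mean(axis=1)   # sparsity measure per query
    top = np.argsort(M)[-u:]                       # the u "active" queries
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1)) # lazy queries -> mean of V
    s = scores[top]
    attn = np.exp(s - s.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over keys
    out[top] = attn @ V                            # full attention for top-u only
    return out

rng = np.random.default_rng(3)
L, d = 16, 8
out = probsparse_attention(rng.standard_normal((L, d)),
                           rng.standard_normal((L, d)),
                           rng.standard_normal((L, d)), u=4)
```

Only u of the L queries incur the full O(L) key scan, which is the source of the reduced complexity quoted above.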
Figure 7

The structure of the Fractal EWT Informer model.

The decoder is composed of a stack of two identical multi-head attention layers, and its input vector is constructed as follows. Generative inference is employed to alleviate the slowdown of long prediction.
$$X_{de}^{t} = \operatorname{Concat}\left(X_{token}^{t}, X_{0}^{t}\right) \in \mathbb{R}^{(L_{token} + L_y) \times d_{model}} \tag{11}$$
where $X_{de}^{t}$ is the input sequence of the decoder, $X_{token}^{t}$ is the start token, $X_{0}^{t}$ is a placeholder for the target sequence (set to zero), and the superscript $t$ denotes the $t$-th sequence. The model adopts the method of generative inference for decoding (Wu et al. 2022). The overall architecture of the model used in this paper is shown in Figure 7.

Experimental setup

To evaluate the performance of the Fractal_EWT_Informer proposed in this paper, we compared it with five benchmark algorithms: LSTM, EWT_LSTM, Fractal_EWT_LSTM, Informer, and EWT_Informer. These methods were classified into two groups: Group 1, consisting of LSTM, EWT_LSTM, and Fractal_EWT_LSTM; and Group 2, consisting of Informer, EWT_Informer, and Fractal_EWT_Informer. All the forecasting algorithms in Group 1 are based on the LSTM, while the members of Group 2 are based on the Informer. It is important to note that after introducing EWT, the model's output still represents the true runoff values without the need for further transformation.
  • (1)
    Evaluation metrics: We used the Pearson correlation coefficient (PCC) as the primary evaluation index. The PCC measures the linear correlation between two variables X and Y, with values between −1 and 1. The formula is as follows:
    $$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right] \tag{12}$$
Here, $(X - \mu_X)/\sigma_X$ and $(Y - \mu_Y)/\sigma_Y$ are the standard scores, $\mu_X$, $\mu_Y$ the average values, and $\sigma_X$, $\sigma_Y$ the standard deviations of the samples. A result closer to 1 is better. Usually, the PCC metric is used to assess the correlation between variables when the data follow a normal distribution. However, with large sample sizes, the normality test will reject normality with high probability even in the presence of small and acceptable deviations from normality. Tsagris & Pandis (2021) suggest that we should ignore testing and focus more on visual assessment. Figure 8 shows the histogram distribution of the data sequences. It can be observed that the histograms of each time series are roughly bell-shaped, indicating that they approximately follow a normal distribution. Therefore, we can use the PCC to measure the correlation between two variables.
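Equivalently, the PCC is the mean product of the standard scores; the toy vectors below are illustrative, with a quick check against NumPy's built-in:

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation: covariance over the product of standard
    deviations, computed here as the mean product of the standard
    scores (x - mean)/std."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return np.mean(zx * zy)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
r = pcc(x, y)  # nearly linear data, so r is close to 1
```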
Figure 8

The histogram distribution of the each data sequence (the abbreviation ‘P’ in the diagram stands for ‘Prediction’, while ‘T’ represents ‘True’). (a) LSTM Prediction (b) LSTM True (c) EWT LSTM P (d) EWT LSTM Fractal P (e) EWT LSTM Fractal T (f) Informer P (g) Informer T (h) EWT Informer P (i) EWT Informer T (j) EWT Informer Fractal P (k) EWT Informer Fractal T.

Additionally, the root mean square error (RMSE), mean absolute error (MAE), and R² score were also reported. These metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} \tag{13}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{14}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2} \tag{15}$$
where $m$ equals the length of the sample, $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the observations. A smaller value of RMSE or MAE indicates a better model; ideally, the $R^2$ score should be close to one.
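The three metrics of equations (13)–(15) reduce to a few lines of NumPy; the sample vectors below are illustrative only:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, equation (13)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, equation (14)."""
    return np.mean(np.abs(y_true - y_pred))

def r2_score(y_true, y_pred):
    """Coefficient of determination, equation (15)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
```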
  • (2)

    Model settings: Our models were implemented in Python 3.7.6 using the PyTorch 1.6.0 framework. The libraries used for data processing were NumPy, Pandas, scikit-learn, and Matplotlib. All experiments were conducted on a single NVIDIA GeForce GTX 1650 Ti (11.9 GB memory).

In this paper, our goal was to demonstrate that the EWT and the Fractal theory can reduce the non-linearity and non-stationarity of runoff data. Thus, we evaluated the performance of Group 1 in the study area and compared the results using the four evaluation metrics and the computation time. Furthermore, we aimed to highlight the advantage of the Informer model, which showed higher accuracy and faster computation. Therefore, we compared the results of Group 2 with those of Group 1.

The runoff prediction results of models based on LSTM

For the first group, the following steps were taken. First, we divided the dataset into a training set (70%) and a test set (30%). Then, we employed the training set to update the learning parameters using backpropagation, while the test set was used to determine the hyper-parameters. The experimental results confirmed that the EWT and fractal characteristics are suitable for reducing the non-linearity and non-stationarity of runoff data. Table 5 shows the forecasting results. Figure 9(a)–9(c) clearly demonstrates that the prediction accuracy after EWT and fractal processing is better than that of the plain LSTM model. To show the difference between the predicted and observed values more clearly, this paper displays only 600 data points with relatively large variation in the predicted results. Note that the prediction result of the Fractal_EWT_LSTM model is spliced together after training, so its x-axis index is inconsistent with those of LSTM and EWT_LSTM.
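The 70/30 time-ordered split, together with a sliding-window construction of the LSTM inputs, can be sketched as follows (the 7-day lag and the stand-in series are hypothetical choices, not the paper's settings):

```python
import numpy as np

def make_windows(series, n_lag):
    """Turn a 1-D series into (samples, n_lag) inputs and next-step targets."""
    X = np.stack([series[i:i + n_lag] for i in range(len(series) - n_lag)])
    y = series[n_lag:]
    return X, y

series = np.arange(100, dtype=float)       # stand-in for one runoff component
X, y = make_windows(series, n_lag=7)
split = int(0.7 * len(X))                  # 70% train / 30% test, in time order
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Splitting in time order (rather than shuffling) keeps the test period strictly after the training period, which matters for non-stationary hydrological series.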
Table 5

Forecasting result of models based on LSTM

  Metric          LSTM        EWT_LSTM   EWT_LSTM_Fractal
  RMSE            1,149.889   837.467    722.490
  MAE             398.874     324.134    369.277
  R²              0.731       0.807      0.810
  PCC             0.854       0.900      0.905
  Cost time (s)   265.900     429.100    356.8
Figure 9

Prediction results based on the LSTM model. (a) Prediction result of LSTM model (b) Prediction result of EWT_LSTM model (c) Prediction result of EWT_LSTM_Fractal model.

Figure 10

Changes in the loss of the training set over epochs.


Among them, the Fractal_EWT_LSTM yields the best results for runoff prediction. The values shown in Table 5 for the Fractal_EWT_LSTM model are obtained by averaging the results of the three parts after fractal processing. Compared with the LSTM model, each index improved to varying degrees after applying EWT and the Fractal theory. Moreover, compared to the EWT_LSTM model, the RMSE, R², and PCC are improved, and the total prediction time is reduced to 5 min 56.8 s after dividing the data into three parts. While the prediction accuracy is slightly improved, the prediction time is shortened by 1 min 12.3 s. These results indicate that combining EWT and the fractal transform with the LSTM model can increase prediction efficiency when dealing with large prediction datasets. The changes in the training-set loss over epochs are shown in Figure 10.

The runoff prediction results of models based on informer

Similar to group 1, the original runoff data, runoff data processed by EWT, and data processed by EWT and fractal transform were input into the Informer model to compare their prediction results. The hyper-parameters we used in the Informer model are presented in Table 6.

Table 6

The values of hyper-parameters in the Informer model

  Hyper-parameter                        Value
  Batch size                             32
  Model dimension                        512
  Dropout rate                           0.05
  Learning rate                          0.001
  Sequence length
  Label length
  Number of encoder and decoder layers
  Feature                                M (multivariate)

For the selection of hyper-parameters in the Informer models, manual tuning was employed in this study. By conducting multiple experiments with different combinations of hyper-parameters, a relatively well-performing combination was chosen. Table 7 and Figure 11 show the forecasting result of each model. To show the difference between the predicted and observed values more clearly, this paper displays only 600 data points with relatively large variation in the predicted results. Note that the predicted features of Group 2 are multivariate; therefore, the results shown in the figure may appear somewhat inconsistent. The changes in the training-set loss over epochs are shown in Figure 10.
Table 7

Forecasting result of models based on Informer

  Metric          Informer   EWT_Informer   EWT_Informer_Fractal
  RMSE            759.195    599.714        493.230
  MAE             224.805    182.717        202.205
  R²              0.783      0.868          0.678
  PCC             0.900      0.937          0.826
  Cost time (s)   57.477     63.560         41.55
Figure 11

Prediction results based on the Informer model. (a) Prediction result of Informer (b) Prediction result of EWT Informer (c) Prediction result of Fractal EWT Informer.

Figure 12

The box-plot of each evaluation metric.


Meanwhile, Figure 12 shows the box-plot of each evaluation metric. Among the models, EWT_Informer achieved the best prediction results. Compared with Fractal_EWT_LSTM, its RMSE decreased by 112.776, its MAE by 186.56, while its R² increased by 0.088 and its PCC by 0.032. Moreover, the running time of EWT_Informer was only 1 min 3.56 s, so it achieved better prediction results in almost the same time as the plain Informer.

It can be observed in Figure 13 that the models based on the Informer architecture exhibit significant decreases in RMSE, MAE, and cost time. Additionally, except for the EWT_Informer_Fractal model, both the R² and PCC values of the Informer-based models show improvements, providing evidence of the feasibility of applying the Informer to runoff prediction. Moreover, for both the LSTM-based and the Informer-based models, introducing EWT for data processing leads to varying degrees of improvement in predictive performance. This suggests that using EWT to decompose the sequences into smoother, more predictable components before feeding them to the prediction models is effective.
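Conceptually, EWT partitions the Fourier spectrum of the signal into bands and reconstructs one component per band. The following NumPy illustration is a simplified stand-in: it uses fixed, evenly spaced bands rather than the adaptive boundaries of the actual EWT (Gilles 2013), but it shows how band-limited components are obtained and why they sum back to the original series.

```python
import numpy as np

def band_decompose(signal, n_bands=3):
    """Split a real signal into components by partitioning its FFT spectrum.

    Simplified stand-in for EWT: the true method picks band boundaries
    adaptively from spectrum maxima; here the bands are evenly spaced.
    """
    spec = np.fft.rfft(signal)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spec)
        band[lo:hi] = spec[lo:hi]          # keep only this frequency band
        comps.append(np.fft.irfft(band, n=len(signal)))
    return comps  # the components sum exactly to the original signal
```

Each low-frequency component is smoother than the raw runoff series, which is what makes the decomposed inputs easier to predict.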
Figure 13

Comparison of evaluation metric across each model.


A clear advantage of encoder-decoder structures such as the Informer is that the network can use attention layers to model global relations between tokens. However, this increased representation capacity comes at a price, as the Informer model requires a large amount of data for training (Liu et al. 2021). Although EWT_Informer_Fractal shows reduced prediction accuracy in part 2 because of its small data volume, it performs well in parts 1 and 3, where the data length is sufficient. This suggests that the Fractal theory can still contribute to improving prediction accuracy if the dataset is sufficiently long. Furthermore, dividing the data into parts saves training time.
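The fractal-based division described above amounts to cutting the series at stage boundaries and modelling each stage separately. A hedged sketch (the boundary indices and the downstream model call are hypothetical; the paper derives the boundaries from fractal analysis of the rainfall-runoff series):

```python
def split_by_stages(series, boundaries):
    """Split a time series at the given stage-boundary indices.

    `boundaries` are index positions separating consecutive stages,
    e.g. the flood-season limits found by fractal analysis.
    """
    parts, start = [], 0
    for b in boundaries:
        parts.append(series[start:b])
        start = b
    parts.append(series[start:])
    return parts

# Each part would then be trained and predicted independently, e.g.
#   models = [fit_informer(part) for part in split_by_stages(runoff, [b1, b2])]
# where fit_informer, b1, and b2 are placeholders for the paper's setup.
```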

When the predicted and observed flow values in Figure 14 are compared, the data points for EWT_Informer exhibit the smallest deviation from the regression line. This finding aligns with the previous conclusions, indicating that this model has the highest predictive accuracy.
Figure 14

Scatter plots of predicted values versus actual values for each model.


This study presented an EWT_Informer method to predict runoff data of the East Fork of the Trinity River in Texas, USA. The EWT was employed to decompose original runoff data into sub-signals to extract information. In addition, the Fractal theory was introduced to divide rainfall and runoff data into three parts, which helped eliminate the interference caused by excessive data fluctuations on the prediction accuracy.

With the proposed prediction method, we obtained significant accuracy improvements (the PCC increased from 0.854 to 0.937) and lower time complexity in the EWT_Informer model compared with the LSTM model (the cost time was reduced from 4 min 52.90 s to 1 min 3.56 s). Furthermore, we showed that the Fractal theory can suppress the interference caused by excessive data fluctuations, leading to improved prediction accuracy when the amount of data is sufficient.

This study introduces the Informer network for the first time in rainfall–runoff prediction, demonstrating its ability to improve prediction accuracy while reducing time complexity. Additionally, EWT and the Fractal theory have been shown to enhance prediction accuracy on large datasets. Engineers can utilize the proposed network to respond more quickly to floods and droughts. Moreover, since the network does not require meteorological parameters as inputs, and given the data-driven nature of deep learning networks, operators need no detailed understanding of the internal mechanisms of rainfall–runoff, which lowers the barrier to making predictions.

In this study, the model's hyper-parameters were chosen by trial and error. To improve efficiency and scientific rigor, it is suggested that systematic hyper-parameter optimization techniques, such as grid search and Bayesian optimization, be employed to determine the parameters.
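The suggested systematic tuning can be sketched as an exhaustive grid search over candidate hyper-parameters. The candidate values and the toy objective below are illustrative only, standing in for a real validation-loss evaluation of the Informer:

```python
from itertools import product

def grid_search(objective, grid):
    """Evaluate every hyper-parameter combination in `grid` and return
    the best-scoring configuration (lower objective is better)."""
    best_cfg, best_score = None, float("inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Illustrative use with a toy objective standing in for validation loss:
grid = {"learning_rate": [1e-3, 1e-4], "dropout": [0.05, 0.1]}
toy_loss = lambda cfg: cfg["learning_rate"] + cfg["dropout"]
best, loss = grid_search(toy_loss, grid)
```

Bayesian optimization would replace the exhaustive loop with a surrogate model that proposes promising configurations, which matters once each evaluation requires a full Informer training run.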

Meanwhile, in this paper the data were divided into three segments using the Fractal theory. The second segment, however, contains too few samples to be suitable for prediction with the Informer network, while the first and third segments still perform well. It is recommended that future researchers apply the EWT_Informer_Fractal model to larger datasets for further investigation.

The authors declare there is no conflict of interest.

Ai X.-s., Dong Q.-j., Wang X.-j. & Zhang Y.-m. 2009 Application of wavelet fractal dimension estimation in dividing flood stages for Three Gorges Reservoir. Systems Engineering-Theory & Practice 29 (1), 145–151.

Bengio Y., Simard P. & Frasconi P. 1994 Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5 (2), 157–166.

Brocca L., Massari C., Pellarin T., Filippucci P., Ciabatta L., Camici S., Kerr Y. H. & Fernández-Prieto D. 2020 River flow prediction in data scarce regions: Soil moisture integrated satellite rainfall products outperform rain gauge observations in West Africa. Scientific Reports 10 (1), 12517.

De Lima M. I. P. & De Lima J. L. M. P. 2009 Investigating the multifractality of point precipitation in the Madeira archipelago. Nonlinear Processes in Geophysics 16 (2), 299–311.

Dong Q., Wang X., Wang J. & Fu C. 2007 Application of fractal theory in the stage analysis of flood seasons in Three Gorges Reservoir. Resources and Environment in the Yangtze Basin 16, 400–404.

Gao S., Huang Y., Zhang S., Han J., Wang G., Zhang M. & Lin Q. 2020 Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. Journal of Hydrology 589, 125188.

Gauch M., Kratzert F., Klotz D., Nearing G., Lin J. & Hochreiter S. 2021 Rainfall–runoff prediction at multiple timescales with a single long short-term memory network. Hydrology and Earth System Sciences 25 (4), 2045–2062.

Gilles J. 2013 Empirical wavelet transform. IEEE Transactions on Signal Processing 61 (16), 3999–4010.

Gordon J. 1995 Pareto process as a model of self-similar packet traffic. In Supervised Sequence Labelling (Graves, A. & Graves, A., eds.). Springer, Cham.

Graves A. 2012 Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45.

Harris A., Rahman S., Hossain F., Yarborough L., Bagtzoglou A. C. & Easson G. 2007 Satellite-based flood modeling using TRMM-based rainfall products. Sensors 7 (12), 3416–3427.

Hou W., Zhang D.-Q., Yang P. & Yang J. 2010 A valid method to compute the segment size in detrended fluctuation analysis. Acta Physica Sinica 59 (12), 8986.

Huang N. E., Shen Z., Long S. R., Wu M. C., Shih H. H., Zheng Q., Yen N.-C., Tung C. C. & Liu H. H. 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454 (1971), 903–995.

Jakeman A. J. & Hornberger G. M. 1993 How much complexity is warranted in a rainfall–runoff model? Water Resources Research 29 (8), 2637–2649.

Jeong D.-I. & Kim Y.-O. 2005 Rainfall–runoff models using artificial neural networks for ensemble streamflow prediction. Hydrological Processes: An International Journal 19 (19), 3819–3835.

Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. 2018 Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences 22 (11), 6005–6022.

Lees T., Buechel M., Anderson B., Slater L., Reece S., Coxon G. & Dadson S. J. 2021 Benchmarking data-driven rainfall–runoff models in Great Britain: A comparison of long short-term memory (LSTM)-based models with four lumped conceptual models. Hydrology and Earth System Sciences 25 (10), 5517–5534.

Lei-hua Z. A.-r. D. & Yi J. 2021 Utilization of flood resources in the Sanhekou Reservoir. China Rural Water and Hydropower 02, 62–65 + 77.

Liu Y., Guan L., Hou C., Han H., Liu Z., Sun Y. & Zheng M. 2019 Wind power short-term prediction based on LSTM and discrete wavelet transform. Applied Sciences 9 (6), 1108.

Liu Y., Sangineto E., Bi W., Sebe N., Lepri B. & Nadai M. 2021 Efficient training of visual transformers with small datasets. Advances in Neural Information Processing Systems 34, 23818–23830.

Liu F., Dong T. & Liu Y. 2022 An improved Informer model for short-term load forecasting by considering periodic property of load profiles. Frontiers in Energy Research 10, 950912.

Mandelbrot B. B. 1982 The Fractal Geometry of Nature, Vol. 1. WH Freeman, New York.

Mao G., Wang M., Liu J., Wang Z., Wang K., Meng Y., Zhong R., Wang H. & Li Y. 2021 Comprehensive comparison of artificial neural networks and long short-term memory networks for rainfall–runoff simulation. Physics and Chemistry of the Earth, Parts A/B/C 123, 103026.

Nerini D., Zulkafli Z., Wang L.-P., Onof C., Buytaert W., Lavado-Casimiro W. & Guyot J.-L. 2015 A comparative analysis of TRMM–rain gauge data merging techniques at the daily time scale for distributed rainfall–runoff modeling applications. Journal of Hydrometeorology 16 (5), 2153–2168.

Oñate Rubalcaba J. J. 1997 Fractal analysis of climatic data: Annual precipitation records in Spain. Theoretical and Applied Climatology 56, 83–87.

Peng C.-K., Buldyrev S. V., Havlin S., Simons M., Eugene Stanley H. & Goldberger A. L. 1994 Mosaic organization of DNA nucleotides. Physical Review E 49 (2), 1685.

Tian Y., Wang D., Zhou G., Wang J., Zhao S. & Ni Y. 2023 An adaptive hybrid model for wind power prediction based on the IVMD-FE-Ad-Informer. Entropy 25 (4), 647.

Tsagris M. & Pandis N. 2021 Normality test: Is it really necessary? American Journal of Orthodontics and Dentofacial Orthopedics 159 (4), 548–549.

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser Ł. & Polosukhin I. 2017 Attention is all you need. Advances in Neural Information Processing Systems 30, 1–2.

Wang H.-K., Song K. & Cheng Y. 2022 A hybrid forecasting model based on CNN and Informer for short-term wind power. Frontiers in Energy Research 9, 1041.

Wood E. F., Roundy J. K., Troy T. J., Van Beek L. P. H., Bierkens M. F., Blyth E., de Roo A., Döll P., Ek M., Famiglietti J., Gochis D., van de Giesen N., Houser P., Jaffé P. R., Kollet S., Lehner B., Lettenmaier D. P., Peters-Lidard C., Sivapalan M., Sheffield J., Wade A. & Whitehead P. 2011 Hyperresolution global land surface modeling: Meeting a grand challenge for monitoring Earth's terrestrial water. Water Resources Research 47 (5), 1–10.

Wu Z., Pan F., Li D., He H., Zhang T. & Yang S. 2022 Prediction of photovoltaic power by the Informer model based on convolutional neural network. Sustainability 14 (20), 13022.

Xiang Z., Yan J. & Demir I. 2020 A rainfall–runoff model with LSTM-based sequence-to-sequence learning. Water Resources Research 56 (1), e2019WR025326.

Yokoo K., Ishida K., Ercan A., Tu T., Nagasato T., Kiyama M. & Amagasaki M. 2022 Capabilities of deep learning models on learning physical relationships: Case of rainfall–runoff modeling with LSTM. Science of The Total Environment 802, 149876.

Yu H., Boxian W. & Guoquan Z. 1999 Preliminary study on the seasonal periods classification of floods by using fractal theory. Advances in Water Science 10 (2), 140–143.

Zhao J., Huang F., Lv J., Duan Y., Qin Z., Li G. & Tian G. 2020 Do RNN and LSTM have long memory? In International Conference on Machine Learning. PMLR, pp. 11365–11375.

Zhou H., Zhang S., Peng J., Zhang S., Li J., Xiong H. & Zhang W. 2021 Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 11106–11115.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).