## Abstract

Accurate rainfall–runoff observation is critical for issuing warnings of potential damage early enough to allow an appropriate response to a disaster. Rainfall–runoff models based on long short-term memory (LSTM) networks have proven effective for runoff prediction. Previous studies have typically used multiple information sources as LSTM training data. However, when there are many input sequences, the LSTM struggles to extract valid nonlinear information between consecutive data points. In this paper, a novel Informer neural network combined with the empirical wavelet transform (EWT) is proposed to predict runoff based only on rainfall data. The EWT reduced the non-linearity and non-stationarity of the runoff data, which increased prediction accuracy. In addition, the model used Fractal theory to divide the rainfall and runoff data into three parts, which eliminated the interference caused by excessive data fluctuations. Model performance was tested using 15 years of precipitation data from the GPM satellite and runoff data from the USGS. The results show that the EWT_Informer model outperforms the LSTM-based models for runoff prediction. The PCC, R², and training time of EWT_Informer were 0.937, 0.868, and 1 min 3.56 s, respectively, while those of the LSTM-based model were 0.854, 0.731, and 4 min 25.9 s, respectively.

## HIGHLIGHTS

The Informer network was introduced into rainfall–runoff prediction, reducing time and space complexity.

The empirical wavelet transform was utilized to enhance the treatment of non-linearity and non-stationarity.

Fractal theory was applied to eliminate the interference caused by excessive data fluctuations.

## INTRODUCTION

Hydrological research has long emphasized the development and use of watershed models that describe the transformation of precipitation into streamflow and channel runoff (Jakeman & Hornberger 1993). Accurate and timely rainfall observations are critical for running these watershed models in near real time. Unfortunately, in most countries catchments are sparsely gauged, and ground-based instruments (e.g. rain gauges, radar, or other devices) do not provide accurate and adequate rainfall and runoff measurements (Jeong & Kim 2005; Brocca *et al.* 2020).

Satellites offer an ideal platform for global precipitation measurement (GPM) because of their unique attributes. Satellite rainfall data are useful for predicting flood events and conducting runoff analysis. These ‘data from the sky’ (i) are easily available on a global basis from the internet; (ii) provide a virtually uninterrupted supply of rainfall data that keeps operational land-based systems functioning during catastrophic events that can temporarily shut down ground networks (e.g. the overland effects of hurricanes, earthquakes, or tsunamis); and (iii) provide basin-wide rainfall data in transboundary river basins where riparian nations lack treaties for sharing real-time rainfall data across political boundaries (Harris *et al.* 2007).

Nerini *et al.* (2015) demonstrated the successful use of satellite data in conceptual rainfall–runoff (CRR) models for flood prediction (Najmaddin *et al.* 2017). CRR models typically require a substantial amount of input, such as hydrological, meteorological, and topographic data, soil characteristics, and vegetation coverage, as well as sophisticated mathematical tools and significant user expertise, which limits their application to predicting sequence data with unknown and limited quasi-periodic dynamics (Wood *et al.* 2011; Sang 2013). More recently, owing to their capability to process long sequence data (Bengio *et al.* 1994), long short-term memory (LSTM) networks, a form of deep learning, have been successfully applied to rainfall–runoff prediction (Fang *et al.* 2017; Kratzert *et al.* 2018; Gao *et al.* 2020; Xiang *et al.* 2020; Yokoo *et al.* 2022). These networks establish a direct mapping from input rainfall to output runoff without requiring knowledge of the physical processes.

However, although the LSTM model takes long time series as input, it has been shown to struggle to capture long-term dependencies, which can reduce prediction performance (Zhao *et al.* 2020). Additionally, because runoff data are non-stationary and nonlinear time series, accuracy is often low without proper data preprocessing. To address these issues, many studies feed additional data such as wind speed and soil moisture into the LSTM network to improve prediction accuracy (Gauch *et al.* 2021; Lees *et al.* 2021; Mao *et al.* 2021; Xie *et al.* 2021; Katipoğlu 2023). Unfortunately, the LSTM network cannot extract valid nonlinear information between consecutive data points when handling large amounts of input data (Wu *et al.* 2022). This approach not only leaves the non-stationarity of the runoff data unresolved, but also introduces more non-stationary inputs and increases the time needed for data collection and preprocessing. Time series forecasting methods based on the encoder–decoder structure, such as the transformer, use the attention mechanism to shorten the maximum path length of network signals and avoid recurrent structures, which allows them to capture long-term dependencies better than the LSTM model (Vaswani *et al.* 2017). However, these models can take a long time to train, especially for rainfall–runoff forecasting.

To tackle these challenges, Zhou *et al.* (2021) proposed the Informer model, which is based on the transformer architecture and uses a multi-head ProbSparse self-attention mechanism to reduce computational complexity and memory usage. The Informer model has been applied to wind power prediction (Wang *et al.* 2022), power load prediction (Liu *et al.* 2022), and other tasks. In this study, we applied the Informer model to rainfall–runoff prediction and used the empirical wavelet transform (EWT) to decompose the runoff time series into several more predictable components with lower non-stationarity. We then evaluated the new model using rainfall extracted from GPM imagery and runoff data downloaded from the United States Geological Survey (USGS) website. Moreover, to mitigate the impact of fluctuations in the input data on prediction accuracy, we divided the runoff data into stages using Fractal theory.

In this paper, we address several challenges in rainfall–runoff time series forecasting, including long-term dependencies, time and space complexity, and the non-stationarity and non-linearity of the data. The main contributions of this paper are summarized as follows:

- (1) The Informer network was introduced into rainfall–runoff prediction, reducing time and space complexity.

- (2) The EWT_Informer model, which combines EWT with the Informer network, was proposed to mitigate the non-linearity and non-stationarity of runoff data for more accurate and efficient predictions.

- (3) To eliminate the interference caused by excessive data fluctuations, Fractal theory was used to divide the data of each year into several parts.

## STUDY REGION AND DATA

### Study area

The Trinity River, which is 715 miles long, flows entirely within Texas. The river and its basin support more than one-fifth of the state's population, and the basin covers an area larger than that of nine U.S. states. The Trinity River has four primary tributaries: the Clear Fork, the Elm Fork, the West Fork, and the East Fork.

### Data

We obtained rainfall data from the Level 3 Final Run GPM satellite product for 2006–2020. The rainfall data are in NetCDF format with a spatial resolution of 0.1° × 0.1° and a temporal resolution of 1 day. We first converted the downloaded data to TIFF format, then constructed the watershed shapefile of the study area in ArcMap and used the ArcMap toolbox to clip the daily rainfall of the target area from the original satellite images. Finally, we averaged the rainfall values over the grid cells in the watershed and converted the data to CSV format to obtain the original rainfall dataset.
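The final averaging step can be reproduced outside ArcMap. The NumPy sketch below assumes that the daily grids have already been stacked into an array of shape (days, lat, lon) and that the watershed shapefile has been rasterized to a boolean mask on the same 0.1° grid; both inputs and the function name `watershed_mean_rainfall` are hypothetical, and the mean is unweighted, as the text describes (no cell-area weighting).

```python
import numpy as np


def watershed_mean_rainfall(rain, mask):
    """Return the daily basin-average rainfall.

    rain: array (days, n_lat, n_lon) of daily precipitation on the 0.1-degree grid;
          NaN marks missing cells.
    mask: boolean array (n_lat, n_lon), True for cells inside the watershed
          (assumed to be rasterized from the watershed shapefile beforehand).
    """
    rain = np.asarray(rain, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if rain.shape[1:] != mask.shape:
        raise ValueError("grid shape of rain and mask must match")
    inside = rain[:, mask]              # (days, n_cells_in_basin)
    return np.nanmean(inside, axis=1)   # unweighted mean over basin cells, NaNs ignored


# Hypothetical 2-day, 2 x 3 grid with three cells inside the basin.
rain = np.zeros((2, 2, 3))
rain[0] = [[1, 2, 3], [4, 5, 6]]
rain[1] = 10
mask = np.array([[True, False, True], [False, True, False]])
daily = watershed_mean_rainfall(rain, mask)   # -> 3.0 and 10.0
# np.savetxt("rainfall.csv", daily, delimiter=",")  # optional CSV export
```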

## METHOD

### Empirical wavelet transform

Because runoff data are nonlinear and non-stationary, using them directly for prediction can reduce accuracy (Liu *et al.* 2019). Most rainfall–runoff prediction studies incorporate meteorological parameters such as temperature, humidity, and wind speed into the prediction model to improve accuracy (Gauch *et al.* 2021; Lees *et al.* 2021; Mao *et al.* 2021; Xie *et al.* 2021; Katipoğlu 2023). However, this makes data acquisition and processing more difficult. The novelty of our study is that we improve prediction accuracy without introducing auxiliary parameters; instead, we process only the rainfall–runoff data themselves. If the non-stationary runoff time series can be decomposed into several more predictable components with weaker non-stationarity, prediction accuracy can be improved without additional hydrological parameters. The wavelet transform (WT) is a commonly used technique for analyzing nonlinear and non-stationary signals. However, its application is limited by its fixed basis functions, which cannot fully capture the characteristics of all real signals (Liu & Chen 2019). Empirical mode decomposition (EMD) is a powerful technique that decomposes a signal into a set of oscillatory components that capture the inherent properties of the input signal. However, its lack of a mathematical theory remains an inherent defect (Huang *et al.* 1998). Inspired by WT and EMD, Gilles (2013) proposed the EWT. The core idea of EWT is to identify segments of the Fourier spectrum and construct a bank of wavelet filters that extracts the different modes of the processed signal. EWT is adaptive because the Fourier segments are determined automatically from the signal.

For a given signal $f(t)$, we first computed its Fourier transform and normalized the frequency axis to the range $[0, 2\pi]$. According to Shannon's criterion, only the signal characteristics on $[0, \pi]$ are considered in the analysis, so the support interval of the Fourier spectrum is defined as $[0, \pi]$. Assuming that the signal is decomposed into $N$ contiguous components, the support interval $[0, \pi]$ is divided into $N$ consecutive segments separated by boundaries $\omega_n$. There are $N + 1$ boundaries, where $\omega_0 = 0$ and $\omega_N = \pi$ (see Figure 3). Each segment is denoted $\Lambda_n = [\omega_{n-1}, \omega_n]$, so that $\bigcup_{n=1}^{N} \Lambda_n = [0, \pi]$. Around each $\omega_n$, we define a transition phase (indicated by the gray hatched areas in Figure 3) of width $T_n = 2\tau_n$.
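The segmentation of the spectrum into $N$ bands with transition phases can be illustrated with a minimal NumPy sketch of Gilles' (2013) EWT. It rests on our own simplifying assumptions, not necessarily the authors' settings: the boundaries $\omega_n$ are placed midway between the $N$ largest local maxima of the magnitude spectrum, and Meyer-type filters with a single ratio $\gamma$ are used, so that each transition phase spans $[(1-\gamma)\omega_n, (1+\gamma)\omega_n]$, i.e. $\tau_n = \gamma\omega_n$. All function names are hypothetical; this is not the implementation used in the study.

```python
import numpy as np


def beta(x):
    """Meyer auxiliary polynomial: beta(0) = 0, beta(1) = 1, beta(x) + beta(1 - x) = 1."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)


def ewt_boundaries(x, n_modes):
    """Boundaries omega_1..omega_{N-1}: midpoints between the N largest spectral maxima."""
    spec = np.abs(np.fft.rfft(x))                        # magnitude spectrum on [0, pi]
    omega = 2 * np.pi * np.arange(len(spec)) / len(x)    # bin index -> radians
    is_max = (spec[1:-1] > spec[:-2]) & (spec[1:-1] >= spec[2:])
    peaks = np.where(is_max)[0] + 1
    if len(peaks) < n_modes:
        raise ValueError("fewer spectral maxima than requested modes")
    top = np.sort(peaks[np.argsort(spec[peaks])[::-1][:n_modes]])
    return (omega[top[:-1]] + omega[top[1:]]) / 2


def ewt_filters(n, bounds):
    """Scaling function and wavelets (Meyer type) sampled on the length-n FFT grid."""
    w = np.abs(2 * np.pi * np.fft.fftfreq(n))            # |omega| in [0, pi]
    edges = np.concatenate(([0.0], bounds, [np.pi]))     # omega_0 = 0, ..., omega_N = pi
    # gamma < min (w_{n+1} - w_n) / (w_{n+1} + w_n) keeps transition phases disjoint
    gamma = 0.9 * np.min((edges[2:] - edges[1:-1]) / (edges[2:] + edges[1:-1]))

    def rise(wn):  # 0 below the transition phase around wn, 1 above it
        return np.sin(np.pi / 2 * beta((w - (1 - gamma) * wn) / (2 * gamma * wn)))

    def fall(wn):  # 1 below the transition phase around wn, 0 above it
        return np.cos(np.pi / 2 * beta((w - (1 - gamma) * wn) / (2 * gamma * wn)))

    filters = [fall(edges[1])]                           # scaling function on [0, omega_1]
    for a, b in zip(edges[1:-2], edges[2:-1]):
        filters.append(rise(a) * fall(b))                # wavelet on [omega_n, omega_{n+1}]
    filters.append(rise(edges[-2]))                      # last wavelet on [omega_{N-1}, pi]
    return filters


def ewt_decompose(x, n_modes):
    """Split x into n_modes >= 2 components that sum back to x."""
    if n_modes < 2:
        raise ValueError("n_modes must be at least 2")
    x = np.asarray(x, dtype=float)
    X = np.fft.fft(x)
    filters = ewt_filters(len(x), ewt_boundaries(x, n_modes))
    # analysis and synthesis with the same real filter multiply X by filter**2;
    # the squared filters sum to 1 at every frequency, so the modes add up to x
    return np.array([np.real(np.fft.ifft(X * f**2)) for f in filters])
```

In a toy check with a sum of three well-separated sinusoids, `ewt_decompose(x, 3)` returns one sinusoid per mode. Because the squared filters form a partition of unity, the modes always sum back to the input, which is what allows each mode to be predicted separately and the forecasts to be added.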

### Fractal theory

The term fractal is typically defined as ‘a rough or fragmented geometric shape’ that can be divided into parts, each of which is (at least approximately) a reduced-size copy of the whole (Mandelbrot & Mandelbrot 1982). In recent years, Fractal theory has been widely used in hydrology-related research (Oñate Rubalcaba 1997; De Lima & De Lima 2009) to explore the intrinsic self-similarity of time series. In Fractal theory, self-similarity refers to the similarity between a part and the whole of the object under study; in most cases, this similarity is statistical (Dong *et al.* 2007). As illustrated in Figure 2(a), the runoff data vary greatly across different time periods, and such fluctuations can significantly degrade the accuracy of neural networks. Therefore, to ensure the scientific validity and accuracy of the model, this study uses the fractal characteristics of the runoff data to divide the annual rainfall and runoff data into multiple stages.

- (1) Preliminary fractal judgement of runoff series

To evaluate whether the sequence is self-similar, we conducted the Kolmogorov–Smirnov test and computed several statistical parameters of the sequence, as presented in Table 1.

| Average | Std. | Coefficient of variation | Skewness | Kurtosis |
|---|---|---|---|---|
| 599.98 | 1,865.53 | 3.11 | 8.34 | 115.61 |

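Statistics of the kind listed in Table 1 can be computed with SciPy. The sketch below is illustrative only: the function name is hypothetical, and because the text does not state whether the reported kurtosis is excess kurtosis, the sketch reports Pearson kurtosis (3 for a normal distribution; pass `fisher=True` for the excess form). The K–S test compares the series with a normal distribution whose mean and standard deviation are estimated from the same sample, which makes the standard p-value conservative.

```python
import numpy as np
from scipy import stats


def describe_runoff(q):
    """Statistics like those in Table 1, plus a K-S test against a fitted normal."""
    q = np.asarray(q, dtype=float)
    mean, std = q.mean(), q.std(ddof=1)
    ks = stats.kstest(q, "norm", args=(mean, std))   # H0: q ~ Normal(mean, std)
    return {
        "average": mean,
        "std": std,
        "cv": std / mean,                               # coefficient of variation
        "skewness": stats.skew(q),
        "kurtosis": stats.kurtosis(q, fisher=False),    # Pearson kurtosis (normal = 3)
        "ks_statistic": ks.statistic,
        "ks_pvalue": ks.pvalue,
    }
```

A large positive skewness, a kurtosis far above 3, and a very small K–S p-value together point to the peaked, heavy-tailed behaviour discussed next.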

The *Q*–*Q* normal probability diagram of the runoff data is shown in Figure 5(a).

The trend of the curve in the *Q*–*Q* diagram indicates that the sample data do not follow a normal distribution and that the sequence has a ‘peaked’, ‘heavy-tailed’ distribution. Combined with the statistical parameters in Table 1, we conclude that this time series has the characteristic features of a ‘Pareto’ distribution, indicating that the runoff series is self-similar (Gordon 1995). To further verify its fractal characteristics, this study used the Detrended Fluctuation Analysis (DFA) method (Peng *et al.* 1994). The scale interval for DFA was selected as in Hou *et al.* (2010) (‘*N*’ represents the length of the studied sequence), and the analysis results are shown in Figure 5(b). The scaling exponent of the runoff series is 0.8558, which is greater than 0.5, indicating that the runoff series has long-term correlation and that its future trend is positively correlated with its historical trend (Mandelbrot 1967). In conclusion, the runoff series under investigation shows self-similarity and scale invariance and can therefore be classified as a fractal time series.
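A first-order DFA can be sketched in a few lines of NumPy: integrate the mean-removed series, remove a linear trend in non-overlapping windows of size $s$, and fit the slope of $\log F(s)$ against $\log s$. The exact scale interval used in the study is not recoverable from the text, so the scales in the usage below are illustrative assumptions, as is the function name. An exponent of about 0.5 corresponds to uncorrelated noise; values above 0.5, such as the reported 0.8558, indicate persistent long-range correlation.

```python
import numpy as np


def dfa_exponent(x, scales):
    """First-order DFA: slope of log F(s) against log s."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                # integrated (profile) series
    fluctuations = []
    for s in scales:
        n_win = len(profile) // s
        segments = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        slope, intercept = np.polyfit(t, segments.T, 1)   # linear trend per window
        trend = np.outer(slope, t) + intercept[:, None]
        fluctuations.append(np.sqrt(np.mean((segments - trend) ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return alpha


# Illustrative scales only; the study's interval follows Hou et al. (2010).
scales = [16, 32, 64, 128, 256, 512]
```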

- (2) Fractal dimension computation

Fractal theory has been widely applied to dividing flood seasons into stages (Ai *et al.* 2009). The basic idea is to fix a starting time, define time intervals of increasing length, calculate the fractal dimension for each interval, and then compare the fractal dimensions across intervals to identify the periodicity of the series. Figure 2(a) shows three significant increases in runoff during the year: the first in mid-to-late February, the second in late May and early June, and the third in late November. Taking 2/10 as the starting date, the fractal dimensions of each stage were calculated and are presented in Tables 2–4 (Yu *et al.* 1999; Lei-hua & Yi 2021).
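The growing-window procedure can be sketched as follows. The text does not name the fractal-dimension estimator used by Yu *et al.* (1999) or Lei-hua & Yi (2021), so this sketch assumes Higuchi's method, which returns values between 1 (a smooth line) and 2 (white noise) for a time series; it is not expected to reproduce the numbers in Tables 2–4. The function names, `k_max`, and the window lengths in the comment are hypothetical choices.

```python
import numpy as np


def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension of a 1-D series (typically between 1 and 2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    lengths = []
    for k in ks:
        lm = []
        for m in range(k):                      # start offsets 0..k-1
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            n_steps = len(idx) - 1
            # normalised curve length of the sub-series sampled every k steps
            lm.append(np.abs(np.diff(x[idx])).sum() * (n - 1) / (n_steps * k) / k)
        lengths.append(np.mean(lm))
    slope, _ = np.polyfit(np.log(ks), np.log(lengths), 1)
    return -slope                               # L(k) ~ k^(-D)


def growing_window_dimensions(x, start, lengths, k_max=8):
    """Dimension of x[start:start + L] for each L, and the change between windows."""
    dims = np.array([higuchi_fd(x[start:start + L], k_max) for L in lengths])
    return dims, np.abs(np.diff(dims))


# e.g. windows of 40, 81, 122, 163 and 204 days starting on 2/10, as in Table 2
```

Under this reading, a window length at which the dimension changes abruptly suggests that the window has extended into a different runoff stage.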

| Date | Days | Dimension | Difference |
|---|---|---|---|
| 2/10–3/22 | 40 | 1.81 | / |
| 2/10–5/2 | 81 | 1.848 | 0.038 |
| 2/10–6/12 | 122 | 1.856 | 0.012 |
| 2/10–7/23 | 163 | 1.844 | 0.012 |
| 2/10–9/2 | 204 | 1.769 | 0.075 |

| Date | Days | Dimension | Difference |
|---|---|---|---|
| 7/24–9/2 | 40 | 1.489 | / |
| 7/24–10/12 | 80 | 1.811 | 0.032 |
| 7/24–11/18 | 117 | 1.761 | 0.05 |

| Date | Days | Dimension | Difference |
|---|---|---|---|
| 9/3–10/13 | 40 | 1.802 | / |
| 9/3–11/18 | 76 | 1.862 | 0.06 |
| 9/3–12/28 | 116 | 1.868 | 0.006 |
| 9/3–2/9 | 159 | 1.846 | 0.012 |

Based on the application of Fractal theory to flood staging, the runoff data in the study area can be divided into three stages: the first from 2/10 to 7/23, the second from 7/24 to 9/2, and the third from 9/3 to 2/9 of the following year.
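Assigning each daily record to one of the three stages only requires comparing its (month, day) with the stage start dates. The sketch below uses the standard library; the function names are hypothetical, and keeping each stage's records in chronological order (so that a stage-3 season runs contiguously from 9/3 into the following February) is our assumption about how the parts are assembled.

```python
from datetime import date, timedelta

# Start (month, day) of each runoff stage identified by the fractal analysis.
STAGE_STARTS = ((2, 10), (7, 24), (9, 3))


def fractal_stage(d):
    """Return 1, 2 or 3 for the stage that contains calendar day d."""
    md = (d.month, d.day)
    if STAGE_STARTS[0] <= md < STAGE_STARTS[1]:
        return 1      # 2/10 to 7/23
    if STAGE_STARTS[1] <= md < STAGE_STARTS[2]:
        return 2      # 7/24 to 9/2
    return 3          # 9/3 to 2/9, wrapping over the new year


def split_by_stage(dates, values):
    """Group a daily series into the three stages, keeping chronological order."""
    parts = {1: [], 2: [], 3: []}
    for d, v in zip(dates, values):
        parts[fractal_stage(d)].append(v)
    return parts


# Usage: every day of a non-leap year.
days = [date(2011, 1, 1) + timedelta(days=n) for n in range(365)]
parts = split_by_stage(days, range(365))   # stage sizes 164, 41 and 160 days
```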

### Long short-term memory network

LSTM addresses the difficulty that recurrent networks have in learning long-term dependencies (Bengio *et al.* 1994). It consists of several components, including a cell state, an input gate, an output gate, and a forget gate, as illustrated in Figure 6(a). These components work together to let the network selectively retain or discard information over time, allowing it to learn long-term dependencies in sequential data more effectively.

In Figure 6(a), the plus sign represents element-wise addition, and the Hadamard product denotes element-wise multiplication. $f_t$, $i_t$, $z_t$, and $o_t$ represent the forget gate, the input gate, the input node, and the output gate, respectively. The LSTM cell captures the dependencies among the data in the input sequence. The input gate regulates how much of the new values flows into the cell. The forget gate selectively discards information from the previous cell state $c_{t-1}$. The values in the cell state $c_t$ are used to calculate the output activation of the LSTM, which is controlled by the output gate. A schematic diagram of LSTM network training is shown in Figure 6(b).
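The gate interactions described above can be written out as a single forward step. This NumPy sketch assumes the common formulation $f_t = \sigma(\cdot)$, $i_t = \sigma(\cdot)$, $z_t = \tanh(\cdot)$, $o_t = \sigma(\cdot)$, $c_t = f_t \odot c_{t-1} + i_t \odot z_t$, $h_t = o_t \odot \tanh(c_t)$. The weight stacking order (f, i, z, o) and the function name are our own choices; PyTorch's `nn.LSTM`, for instance, stacks the input, forget, cell, and output gates in that order instead. It illustrates the cell, not the trained model used in the study.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked (by assumption) as forget gate f, input gate i, input node z, output gate o.
    """
    H = h_prev.shape[0]
    a = W @ x_t + U @ h_prev + b
    f = sigmoid(a[0:H])           # forget gate: how much of c_prev to keep
    i = sigmoid(a[H:2 * H])       # input gate: how much of the candidate to write
    z = np.tanh(a[2 * H:3 * H])   # input node: candidate values
    o = sigmoid(a[3 * H:])        # output gate: how much of the cell to expose
    c_t = f * c_prev + i * z      # Hadamard products, then element-wise addition
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the input node is 0, so the step simply halves the previous cell state; this is a quick sanity check of the wiring.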

### Informer network

The Informer model is a network structure based on the attention mechanism. It improves three aspects of the transformer that limit its efficiency on long sequences: the cost of the self-attention mechanism, the memory used by stacking multiple layers, and step-by-step decoding (Zhou *et al.* 2021). The Informer network captures long-term correlations better than LSTM and has therefore been successfully applied to long time series tasks such as electrical line trip fault prediction and wind power prediction (Guo *et al.* 2021; Tian *et al.* 2023). Based on its success in long-term time series prediction, this study introduces the Informer model to rainfall–runoff prediction for the first time.

The model adopts generative inference for decoding (Wu *et al.* 2022). The overall architecture of the model used in this paper is shown in Figure 7.

## EXPERIMENTAL RESULTS AND ANALYSIS

### Experimental setup

- (1) Tsagris & Pandis (2021) suggest ignoring formal normality tests and relying more on visual assessment. Figure 8 shows histograms of the data sequences. The histograms of all time series are roughly bell-shaped, indicating that they approximately follow a normal distribution. Therefore, PCC can be used to measure the correlation between two variables.

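The root mean square error (RMSE), the mean absolute error (MAE), and the *R*^{2} score were also reported. These metrics are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{PCC} = \frac{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}\sqrt{\sum_{i=1}^{m}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted runoff, $\bar{y}$ and $\bar{\hat{y}}$ are their means, and $m$ equals the length of the sample. A smaller RMSE or MAE indicates a better model; ideally, the $R^2$ score should be close to one.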
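The four evaluation metrics can be computed directly with NumPy. This sketch assumes the standard definitions of RMSE, MAE, $R^2$ (one minus the ratio of the residual sum of squares to the total sum of squares), and the Pearson correlation coefficient; the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_obs, y_pred):
    """RMSE, MAE, R^2 and PCC between observed and predicted runoff."""
    y, p = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    err = y - p
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    pcc = np.sum((y - y.mean()) * (p - p.mean())) / np.sqrt(
        np.sum((y - y.mean()) ** 2) * np.sum((p - p.mean()) ** 2))
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "PCC": pcc}


m = evaluate([1, 2, 3, 4], [1, 2, 3, 5])   # RMSE 0.5, MAE 0.25, R2 0.8
```

$R^2$ and PCC measure different things: PCC is unaffected by a constant bias or scaling of the predictions, whereas $R^2$ penalizes them, which is why the two are reported side by side.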

- (2) *Model settings*: Our models were implemented in Python 3.7.6 with the PyTorch 1.6.0 framework. The data-processing libraries used were NumPy, Pandas, scikit-learn, and Matplotlib. All experiments were conducted on a single NVIDIA GeForce GTX 1650 Ti (11.9 GB memory).

In this paper, our goal was to demonstrate that EWT and Fractal theory can mitigate the non-linearity and non-stationarity of runoff data. We therefore evaluated the performance of the Group 1 models in the study area and compared their results using the four evaluation metrics and the computation time. Furthermore, to highlight the advantages of the Informer model, namely higher accuracy and faster computation, we compared the results of Group 2 with those of Group 1.

### The runoff prediction results of models based on LSTM


| Metric | LSTM | EWT_LSTM | Fractal_EWT_LSTM |
|---|---|---|---|
| RMSE | 1,149.889 | 837.467 | 722.490 |
| MAE | 398.874 | 324.134 | 369.277 |
| R² | 0.731 | 0.807 | 0.810 |
| PCC | 0.854 | 0.900 | 0.905 |
| Cost time (s) | 265.900 | 429.100 | 356.800 |

Among these models, Fractal_EWT_LSTM yields the best runoff predictions. The Fractal_EWT_LSTM values in Table 5 are the averages of the results for the three parts obtained by fractal division. Compared with the LSTM model, every accuracy metric improves to some degree after EWT and Fractal theory are applied. Moreover, compared with the EWT_LSTM model, RMSE, R², and PCC improve, and the total prediction time falls to 5 min 56.8 s after the data are divided into three parts. Thus, while prediction accuracy improves slightly, the prediction time is shortened by 1 min 12.3 s. These results indicate that combining EWT and Fractal theory with the LSTM model can increase prediction efficiency for large datasets. The training loss over epochs is shown in Figure 10.

### The runoff prediction results of models based on informer

As in Group 1, the original runoff data, the runoff data processed by EWT, and the data processed by both EWT and Fractal theory were input into the Informer model, and their prediction results were compared. The hyper-parameters of the Informer model are listed in Table 6.

| Hyper-parameter | Value |
|---|---|
| Batch size | 32 |
| Model dimension | 512 |
| Dropout rate | 0.05 |
| Learning rate | 0.001 |
| Sequence length | 7 |
| Label length | 7 |
| Number of encoder and decoder layers | 3 |
| Feature | M (multivariate) |
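Table 6 sets both the sequence length and the label length to 7. In the Informer's generative-style decoder (Zhou *et al.* 2021), the decoder input is the last `label_len` known steps (the start token) followed by zero placeholders for the `pred_len` steps to be forecast, all of which are predicted in one forward pass. The NumPy sketch below builds such windows; the prediction length is not given in the text, so `pred_len=1` (one-day-ahead) is a hypothetical default, as is the function name.

```python
import numpy as np


def informer_windows(series, seq_len=7, label_len=7, pred_len=1):
    """Build (encoder input, decoder input, target) arrays for a generative decoder.

    series: array (T, n_features). Each decoder input is the last label_len
    encoder steps (start token) followed by pred_len zero placeholders.
    """
    series = np.asarray(series, dtype=float)
    if label_len > seq_len:
        raise ValueError("label_len must not exceed seq_len")
    enc, dec, tgt = [], [], []
    for s in range(len(series) - seq_len - pred_len + 1):
        x_enc = series[s:s + seq_len]
        y = series[s + seq_len - label_len:s + seq_len + pred_len]
        x_dec = np.concatenate([y[:label_len], np.zeros((pred_len, series.shape[1]))])
        enc.append(x_enc)
        dec.append(x_dec)
        tgt.append(y[label_len:])                 # the steps the decoder must fill in
    return np.stack(enc), np.stack(dec), np.stack(tgt)
```

With `label_len` equal to `seq_len`, as in Table 6, the start token is the entire encoder window, so the decoder sees the same seven days as the encoder before producing the forecast.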


| Metric | Informer | EWT_Informer | Fractal_EWT_Informer |
|---|---|---|---|
| RMSE | 759.195 | 599.714 | 493.230 |
| MAE | 224.805 | 182.717 | 202.205 |
| R² | 0.783 | 0.868 | 0.678 |
| PCC | 0.900 | 0.937 | 0.826 |
| Cost time (s) | 57.477 | 63.560 | 41.550 |

Figure 12 shows box plots of each evaluation metric. Among the models, EWT_Informer achieved the best predictions. Compared with Fractal_EWT_LSTM, its RMSE decreases by 122.776, its MAE decreases by 186.56, its R² increases by 0.058, and its PCC increases by 0.032. Moreover, the running time of EWT_Informer is only 1 min 3.56 s, so it achieves better predictions than Informer in nearly the same running time.

A clear advantage of encoder–decoder structures such as the Informer is that the network can use attention layers to model global relations between tokens. However, this greater representational capacity comes at a price: the Informer model requires a large amount of training data (Liu *et al.* 2021). Although Fractal_EWT_Informer shows reduced prediction accuracy in part 2 because of its small data volume, it performs well in parts 1 and 3, where the data series are sufficiently long. This suggests that Fractal theory can still help improve prediction accuracy when the dataset is sufficiently long. Furthermore, dividing the data into parts saves training time.

## CONCLUSIONS

This study presented an EWT_Informer method to predict the runoff of the East Fork of the Trinity River in Texas, USA. EWT was used to decompose the original runoff data into sub-signals from which information can be extracted more easily. In addition, Fractal theory was introduced to divide the rainfall and runoff data into three parts, which helped eliminate the interference of excessive data fluctuations with prediction accuracy.

With the proposed prediction method, the EWT_Informer model achieved significantly higher accuracy (PCC increased from 0.854 to 0.937) and lower time cost (from 4 min 25.9 s to 1 min 3.56 s) than the LSTM model. Furthermore, we showed that Fractal theory can reduce the non-linearity and non-stationarity of the data, improving prediction accuracy when the amount of data is sufficient.

This study introduces the Informer network to rainfall–runoff prediction for the first time, demonstrating its ability to improve prediction accuracy while reducing time complexity. Additionally, EWT and Fractal theory were shown to enhance prediction accuracy on large datasets. Engineers can use the proposed network to respond more quickly to floods and droughts. Moreover, because the network does not require meteorological parameters and, as a deep learning model, does not require operators to understand the internal mechanisms of rainfall–runoff in detail, it lowers the barrier to prediction.

In this study, the model's hyper-parameters were chosen by trial and error. To improve efficiency and scientific rigor, we suggest determining these parameters with systematic optimization techniques, such as grid search, Bayesian optimization, or nature-inspired optimization algorithms.

In this paper, the data were divided into three segments using Fractal theory. However, the second segment is small and therefore not well suited to prediction with the Informer network, whereas the first and third segments still perform well. Future researchers are encouraged to apply the Fractal_EWT_Informer model to larger datasets for further investigation.

## DATA AVAILABILITY STATEMENT

All relevant data are available from the following online repositories: (https://waterdata.usgs.gov/nwis/dv?cb_00060=on&format=html&site_no=08061750&referred_module=sw&period=&begin_date=2006-01-01&end_date=2020-12-31); (https://gpm.nasa.gov/missions/GPM).

## CONFLICT OF INTEREST

The authors declare there is no conflict.