Hydrological runoff prediction is vital for water resource management. The non-linear, non-stationary nature of runoff series and the complex hydrological features of large-scale basins make runoff difficult to predict. Long short-term memory (LSTM) is effective for runoff prediction but unstable for large-scale basins. This study develops three hybrid models that combine two-stage decomposition with LSTM to predict the daily runoff of the Pearl River in China. In each model, a first-stage decomposition by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), variational mode decomposition (VMD), or local mean decomposition (LMD) is followed by wavelet transformation (WT). The results indicate that CEEMDAN is more broadly applicable as a signal decomposition method for preprocessing runoff series, while VMD extracts high-runoff characteristics more easily. VMD–WT–LSTM is appropriate for predicting high and median runoff, whereas CEEMDAN–WT–LSTM is better for low runoff and for high and median runoff with less violent fluctuations. These hybrid models provide satisfactory predictions in terms of the NSE and R2 indicators, and 97.2% of indicators fall within the acceptable range for high-runoff predictions. The hybrid models outperform traditional and standalone models for high runoff, but none of the decomposition methods in this research can identify the low-runoff sub-sequence. This study provides runoff prediction methods that require less data and processing time, and these methods are promising alternatives for daily runoff prediction in large-scale basins.

  • Data-driven models are the current trend in runoff prediction.

  • Signal decomposition technology can decompose the runoff features.

  • Wavelet transformation can decompose highly fluctuating runoffs.

  • The LSTM-based hybrid model is suitable for river runoff prediction.

  • Signal decomposition-LSTM model outperforms the standalone model.

Runoff plays a critical role in efficient water resource management and effective flood warning (Yaseen et al. 2019), and is also a vital driving force for the transport of hazardous materials in river basins (Young & Liu 2015). Accurate runoff information enables us to better understand the hydrological environment, provide early warning of flood events, plan water resources, control sediment, and protect ecosystems (Min et al. 2023). Precipitation and local hydrological environments are the main factors affecting the runoff and seasonal changes of rivers (Sun et al. 2022). As a result, a combination of hydrological parameters and mathematical models can predict river runoff effectively. Runoff prediction has become one of the best ways to obtain valuable runoff data alongside on-site monitoring, and plays a crucial role in the scheduling and management of water resources (Chen et al. 2021).

Two types of runoff modeling approaches have been developed and applied in recent years: physical-based and data-driven models (Fidal & Kjeldsen 2020; Yin et al. 2022). Physical-based models, such as the Soil and Water Assessment Tool (SWAT) and the MIKE Système Hydrologique Européen model (MIKE SHE) (Devi et al. 2015; Xiang et al. 2018), cover the major processes in the hydrologic cycle and include process models for evapotranspiration, overland flow, unsaturated flow, groundwater flow, and channel flow and their interactions. Their explicit internal processes help in understanding the underlying rules governing runoff. However, these models are subject to many simplifying assumptions (Duan et al. 1992), and they need wide-ranging and abundant parameters, making the establishment and calibration process extremely time-consuming (Bajirao et al. 2021). In addition, physical-based models have a high demand for various meteorological, underlying surface and hydrological data, and obtaining these data is also a major challenge. By comparison, data-driven models, mainly based on machine learning, can establish a functional relationship between input parameters and output results for accurate forecasting without a clear physical mechanism (Cao et al. 2019), and without the need for complex parameters or lengthy establishment and calibration. Because runoff data combine non-linearity, high uncertainty, and spatiotemporal variability, data-driven models are considered an optimal approach for minimizing or overcoming these issues in runoff prediction (He et al. 2014; Hao et al. 2023). Time series neural networks, represented by the long short-term memory (LSTM) model, are the most widely used data-driven machine learning models, and they have obvious advantages in processing large data (Ren et al. 2022).
The LSTM architecture is an improved recurrent neural network (RNN) model that overcomes the problem of vanishing or exploding gradients (Gao et al. 2020), and it can be trained for sequence generation by analyzing real data sequences one step at a time and predicting what will occur next. Due to its superior ability to capture the correlation of time series, it has been increasingly used by researchers in runoff prediction applications. The LSTM-based methods have been successfully used in the runoff prediction in various time series worldwide (Xiang et al. 2020) and have indicated their capability to simulate low-runoff conditions and their outflow curve for the peak operation period.

Runoff prediction in large-scale basins is now being developed to satisfy water management plans (Goudarzi et al. 2020). The main problem faced by this research is that the data fluctuate over a large range in time; for example, in large basins, daily runoff can differ by dozens of times between months (Ren et al. 2022). However, a single LSTM model has limited capacity to accurately distinguish and identify these features. Recent studies indicated that deep learning models with data preprocessing perform better when predicting time series (Tang et al. 2018). The characteristic factors of runoff data with long or short time series can be extracted and expressed using appropriate time–frequency decomposition technology to preprocess data (Mosavi et al. 2018; Xie et al. 2019). Treating the runoff data as a signal and applying signal decomposition helps to handle the large amount of data and its fluctuation (Jamei et al. 2020). Several comparisons between standalone models, namely models that predict directly without data preprocessing, and hybrid models revealed that data preprocessing can enhance the performance of standalone models (Xie et al. 2019). Wavelet transform (WT) is frequently applied in the field of hydrology, mainly for processing and analyzing hydrological data (Gharbia et al. 2022). WT can be used to analyze rainfall trends and stream and river sediments (Gao et al. 2021). Hydrological sequence data processed through wavelet decomposition usually yield better results than the original data (Ahmadi et al. 2021). Currently, WT is one of the most popular preprocessing methods for short-term runoff prediction models (Kaveh et al. 2021). There is substantial evidence that the performance of forecast models can be enhanced by utilizing signal decomposition techniques to produce cleaner signals as model inputs (La Rosa Lama & Sánchez 2020). However, it should be noted that some papers report that using wavelet analysis as a preprocessing method improves modeling performance, while others indicate that it deteriorates modeling results (Sachindra et al. 2019). The performance of different wavelet types depends on the original input signal, and the behavior of input signals for different hydrological processes depends on the climate and hydrological characteristics of the watershed region (Bajirao et al. 2021). Moreover, hydrological processes are highly dynamic and stochastic (Liu et al. 2022). Especially for large-scale basins with high runoff, runoff dynamics are more intense and difficult to predict. In addition, the current mainstream signal decomposition methods are primarily employed for runoff simulation in middle and small river basins, and there is still a lack of application cases of hybrid models combined with signal decomposition in large-scale basins. Considering the difficulty in selecting wavelet bases for wavelet decomposition and the complexity of high runoff fluctuations, adaptive signal decompositions such as empirical mode decomposition (EMD) (Feng et al. 2022), variational mode decomposition (VMD) (Seo et al. 2018) and local mean decomposition (LMD) (Peng et al. 2021) are first used to preliminarily decompose the runoff series and obtain components with uniform high-frequency features.
Then, components with unified high-frequency features are decomposed using WT for secondary decomposition, which may avoid the problem of wavelet base mismatch and solve the problem of the difficulty in capturing runoff features in large-scale basins.

This study aims to build hybrid models that combine multiple time–frequency decomposition technologies with LSTM to predict the runoff in a large-scale basin. Several two-stage signal decomposition schemes (CEEMDAN/VMD/LMD-WT) are set out, the advantages and disadvantages of each hybrid model are explored, and the models are compared to identify the best performer. In addition, other published runoff prediction results for this basin were collected for comparison to assess the performance of the hybrid runoff prediction models in large-scale basins.

Study area

The Pearl River is the second largest and third longest river in China, and the largest river in southern China (Zhai et al. 2010). It flows through four Chinese provinces and forms a basin area of 453,690 km². The Pearl River Basin is located in mild and rainy southern China and has a mean annual temperature between 14 and 22 °C and mean annual precipitation of about 1,200–2,200 mm. The precipitation distribution is irregular within a year, and there are large regional and inter-annual variations (Li & Yang 2016). Based on the Hydrological Yearbook of China, 38 of 39 national hydrological stations in the Pearl River Basin were selected for runoff prediction; one hydrological station was excluded due to its near-zero runoff and missing data in most cases. The observed runoff series data are from the Water Year Book of the Pearl River Basin, and the China meteorological forcing dataset (He et al. 2020) is provided by the National Tibetan Plateau Data Center (http://data.tpdc.ac.cn). The runoff in the Pearl River Basin is affected by the weather, underlying surface characteristics and tides. The runoff in the whole basin fluctuates greatly, and it is difficult to calibrate using conventional methods. To date, there are few runoff calibration and prediction publications for the whole basin. An overview of the Pearl River Basin and the locations of the hydrological stations used in this study are provided in Figure 1.
Figure 1

Map of the study area and locations of hydrological stations.


Methods building

Various methods are commonly used for data decomposition, including WT (Hadi & Tombul 2018), variational mode decomposition (VMD) (Zounemat-Kermani et al. 2019), EMD (Liu et al. 2016), LMD, and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Among them, WT has been applied successfully in various cases and is especially suitable for preprocessing daily runoff data before they are fed into deep learning models (Alizadeh et al. 2018). As for the other data decomposition methods, EMD has been shown to suffer from end-point effects and over-enveloping (Sankaran & Reddy 2016). CEEMDAN, on the other hand, requires fewer calculations and gives more accurate reconstruction results than EMD (Ren et al. 2015). In addition, VMD largely alleviates the mode mixing issue that often appears in EMD, in which one sub-component contains more than one sub-signal with clear differences, or several sub-components share similar characteristics (Naik et al. 2018). Thus, EMD is excluded from the comparison in this study.

The combination of wavelet decomposition and neural networks has been considered one of the best methods for traffic prediction. However, in large basins, the fluctuation of temporal data is more severe. Therefore, a layer of data decomposition is added before WT to better analyze the fluctuation of high-frequency data. CEEMDAN, VMD, and LMD are adaptive, while the parameters of WT are set manually. Different stations have different runoff fluctuation characteristics, so the manually set parameters differ greatly between stations. Therefore, CEEMDAN, VMD, and LMD are selected for the first adaptive decomposition to obtain components with relatively fixed frequency, and WT is then used to decompose the high-frequency components. Finally, the two-stage preprocessing is combined with LSTM to give the CEEMDAN/VMD/LMD-WT-LSTM hybrid models for runoff prediction.

Data signals decomposition

CEEMDAN: CEEMDAN is an algorithm developed for analyzing and processing non-linear and non-stationary signals, and it is suitable for signals with higher noise levels and for situations that require more stable decomposition results. This study uses PyEMD to build the CEEMDAN method (https://github.com/vrcarva/ewtpy). CEEMDAN is self-adaptive; hence, the original runoff time series is automatically divided into several intrinsic mode functions (IMFs) representing different frequency and variation characteristics (Torres et al. 2011). CEEMDAN automatically decomposes the time series to a certain extent and then stops. The high-frequency IMFs, which are difficult to predict, are further decomposed by stationary WT (SWT), and the decomposition results are then predicted using the LSTM. The low-frequency IMFs, which are relatively predictable, are predicted directly using the LSTM. The details are provided in Part S1, Supplementary information.

VMD: VMD was developed to extract non-linear trends and harmonics from complex signals (He et al. 2019a). It is suitable for the decomposition of non-linear and non-stationary signals and can effectively capture their time–frequency characteristics. Recently, VMD has been applied in multiple fields, such as forecasting economic and financial time series (Lahmiri 2016), stock price indices (Niu et al. 2020), and sunspot time series (Li et al. 2018). Thus, VMD is a potential method for runoff time series processing. Since VMD is not self-adaptive, its parameters must be defined manually. In this study, the tolerance of the convergence criterion is set to 0.000006 after repeated testing. Regarding the decomposition number, the number of CEEMDAN components is taken as a benchmark. Starting from the benchmark value, the decomposition number for VMD is adjusted until the center frequency changes little compared with that in the next decomposition. The original runoff time series is divided into several variational mode (VM) components representing different frequency and variation characteristics. Like the IMFs from CEEMDAN, these VMs are divided into two parts, for WT preprocessing or direct prediction. The details are provided in Part S2, Supplementary information.

LMD: LMD was proposed as a frequency analysis method (Smith 2005). It is suitable for decomposing signals with local frequency changes, as it can capture the local characteristics of the signal. Similar to CEEMDAN, LMD is a self-adaptive signal decomposition method. The original runoff time series is automatically divided into several product functions (PFs) representing different frequency and variation characteristics. With PyLMD, the LMD parameters are set automatically, and the decomposition adapts to the characteristics of the signal. These PFs are also divided into two parts and handled in the same way as for CEEMDAN and VMD. The details are provided in Part S3, Supplementary information.

WT: In this study, the stationary wavelet transform (SWT) is used to further process the high-frequency sub-components produced by the first-stage signal decomposition. After testing many cases, coif3 is selected as the wavelet basis, and the high-frequency sub-components are decomposed using the three-scale stationary wavelet packet decomposition method. PyWavelets is open-source WT software in Python that combines a simple high-level interface with low-level C and Cython performance, and it decomposes the time series through its built-in SWT method. A single waveform is first decomposed into high-frequency and low-frequency waveforms. Thereafter, the high- and low-frequency waveforms are decomposed again. With three scales, each waveform is split into a total of eight bands. Finally, each high-frequency sub-component is transformed into four high-frequency wavelet components (digital components, i.e. the detail components in standard wavelet terminology) and four low-frequency wavelet components (approximate components), and those are predicted in the next stage. The details are provided in Part S4, Supplementary information.
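The three-scale packet split can be illustrated with a minimal sketch. It does not use PyWavelets or the coif3 basis from the study; as a labeled substitution, it uses an undecimated Haar filter pair with periodic boundaries, chosen because it needs no external library and its eight bands sum back to the input exactly. The function names `haar_swt_step` and `stationary_packet` and the sample signal are hypothetical.

```python
def haar_swt_step(x, step):
    """One undecimated Haar split with periodic boundary handling.

    Returns (approx, detail); approx[i] + detail[i] == x[i] for every i.
    """
    n = len(x)
    approx = [(x[i] + x[(i - step) % n]) / 2.0 for i in range(n)]
    detail = [(x[i] - x[(i - step) % n]) / 2.0 for i in range(n)]
    return approx, detail


def stationary_packet(x, levels=3):
    """Split x into 2**levels bands; both branches are split at every level."""
    bands = [list(x)]
    for level in range(levels):
        step = 2 ** level  # dilate the filter instead of downsampling the signal
        next_bands = []
        for band in bands:
            next_bands.extend(haar_swt_step(band, step))
        bands = next_bands
    return bands


# Synthetic stand-in for one high-frequency sub-component.
signal = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0,
          5.0, 0.0, 7.0, 3.0, 8.0, 1.0, 6.0, 2.0]
bands = stationary_packet(signal, levels=3)

# Every band keeps the full length, and the bands add up to the input.
rebuilt = [sum(band[i] for band in bands) for i in range(len(signal))]
```

Because no downsampling takes place, each of the eight bands has the same length as the input, which is what allows every band to be fed to a predictor on the same daily time axis.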

LSTM method

For the runoff prediction, the major input variables are rainfall and runoff. Hence, an LSTM model with two input parameters is used to predict the runoff time series. As the input has only two dimensions, the model does not need to be very complex. Through the trial-and-error method (Liang et al. 2018), a double-layer LSTM with 36 neurons is found to be the most accurate structure. A difficult question, however, is how many days of past data should be used to predict the next step of runoff. After many experiments in this study, a 30-day look-back period is found to be relatively accurate, and the runoff of the 31st day is predicted, as shown in Figure S2, Supplementary information. Note that 30 days is specific to the Pearl River Basin in this study, and a more suitable look-back period needs to be explored for other basins. Runoff sub-sequences and rainfall series are normalized before being used as input variables. There are 4,382 days in 12 years (2006–2017) and ideally 4,322 instances for one hydrological station. Accordingly, the model is trained using 75% of the data (3,258 days, 2006–2014) and validated using 25% of the data (1,064 days, 2015–2017). Before running the LSTM, the learning rate needs to be set. Since the decomposition sub-sequences fluctuate more than the wavelet components, a learning rate of 0.0000001 is used for decomposition sub-sequences, and a learning rate of 0.00000001 is used for wavelet components. The mean squared error (MSE) between prediction and observation is used as the loss function for training assessment and parameter calibration. During each training of the model, the MSE is calculated between the predicted result for a single input component sequence and the corresponding component decomposed from the original data. An MSE of 0 is optimal, and the gradient of each parameter is calculated by backpropagation to continuously optimize the model. The training is finished once the MSE is relatively stable. After many tests, the number of epochs is set to 60. In this study, the LSTM model is developed and executed using PyTorch. A global random seed is set so that parameter initialization generates the same random numbers in different runs. Detailed survey and data processing methods are summarized in Part S5 and Figure S1, Supplementary information.
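The 30-day look-back and the chronological train/validation split can be sketched as follows. This is an illustration under simplifying assumptions, not the study's code: it uses a single synthetic series rather than paired rainfall and runoff inputs, it splits by fraction rather than by calendar year, and the names `make_windows` and `chronological_split` are hypothetical.

```python
def make_windows(series, lookback=30):
    """Pair each run of `lookback` consecutive values with the value that follows."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.75):
    """Keep time order: the earliest samples train, the latest validate."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Synthetic stand-in for one sub-sequence at one station (100 days).
daily_values = [float(day) for day in range(100)]
X, y = make_windows(daily_values, lookback=30)
(train_X, train_y), (valid_X, valid_y) = chronological_split(X, y)
```

A series of n days yields n − 30 samples with this construction, and no shuffling is applied, so the validation samples always lie later in time than the training samples.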

Hybrid method for runoff time series prediction

The hybrid model combining multiple time–frequency decomposition technology and LSTM is illustrated in Figure 2. The proposed approach is composed of four stages:
Figure 2

Sketch map of the proposed hybrid model.


Step 1: Data decomposition. This step has two parts. First, the signal decomposition methods (CEEMDAN, VMD, and LMD) are used to decompose the original runoff time series into several sub-components (IMFs, VMs, PFs) representing different vibration frequencies. Second, the SWT is applied to further decompose the high-frequency sub-components from the first part into wavelet components. Hereafter, the bands obtained from data decomposition are collectively referred to as sub-sequences, with sub-components denoting the bands decomposed in the first part and wavelet components denoting the bands decomposed in the second part.

Step 2: Data preprocessing. All data are subdivided into training and validation sets, and each 30-day segment is then taken as an input sequence to predict the 31st day. These data are normalized into [0, 1] using min-max normalization. Detailed survey and data processing methods are summarized in Part S6, Supplementary information.

Step 3: Sub-sequence prediction. LSTM models are applied to predict each sequence, including the remaining CEEMDAN/VMD/LMD sub-components (IMFs, VMs, or PFs) and the stationary WT frequency wavelet components. In the prediction, the historic value time series [t − 29, t] is considered an input, while the data at time t + 1 are considered as an output ranging from 0 to 1. Thereafter, the true runoff data are obtained based on inverse normalization as shown in Part S6, Supplementary information.

Step 4: Data reconstruction. First, the inverse stationary WT is used to reconstruct the predicted high- and low-frequency wavelet components into the high-frequency sub-components (IMFs, VMs, PFs) of CEEMDAN/VMD/LMD. Second, by summing the sub-components (IMFs, VMs, or PFs) reconstructed from wavelet components and the sub-components predicted directly by the LSTM, the new runoff time series, including the predicted time step, is finally obtained.
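The flow of Steps 2 to 4 can be sketched in a few lines. The sketch makes several assumptions that are not taken from the study: a persistence rule stands in for the trained LSTM, the minimum and maximum are taken over the whole component, the wavelet stage and its inverse are omitted, and all function names and the two toy components are hypothetical. It only shows how each sub-sequence is scaled, predicted, rescaled, and summed.

```python
def normalize(values, lo, hi):
    """Min-max scale values into [0, 1]; a constant series maps to 0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def denormalize(value, lo, hi):
    """Invert the min-max scaling for a single value."""
    return value * (hi - lo) + lo


def persistence_model(window):
    """Stand-in for a trained LSTM: forecast the most recent value."""
    return window[-1]


def forecast_component(component, lookback=30, model=persistence_model):
    """Scale one sub-sequence, predict the next step, and undo the scaling."""
    lo, hi = min(component), max(component)
    scaled = normalize(component, lo, hi)
    return denormalize(model(scaled[-lookback:]), lo, hi)


def hybrid_forecast(components, lookback=30):
    """Sum the one-step forecasts of all additive sub-sequences."""
    return sum(forecast_component(c, lookback) for c in components)


# Two toy sub-sequences standing in for a low- and a high-frequency band.
trend = [0.5 * day for day in range(40)]
oscillation = [2.0 if day % 2 == 0 else -2.0 for day in range(40)]
next_day = hybrid_forecast([trend, oscillation])
```

Replacing `persistence_model` with a trained network would leave the rest of the flow unchanged, which is the practical appeal of predicting each sub-sequence independently.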

In this study, the daily runoff at 38 hydrological stations of the Pearl River is predicted using the hybrid models. To show the prediction results more directly, these hydrological stations are divided into three groups based on mean runoff volume: group 1 – low runoff, group 2 – medium runoff, group 3 – high runoff. The geometric interval method, a statistical classification method based on the numerical distribution of the data that minimizes the sum of squares within groups, is used for runoff grouping. The runoff values and their division results are shown in Table 1.

Table 1

Runoff grouping of hydrological stations and the performance of three hybrid models in validation period (2015–2017)

| Station ID | Mean flow (m3) | NSE (CEEMDAN) | NSE (VMD) | NSE (LMD) | R2 (CEEMDAN) | R2 (VMD) | R2 (LMD) | Group |
|---|---|---|---|---|---|---|---|---|
| 1 | 6,748.35 | 0.614 | 0.923 | 0.608 | 0.721 | 0.925 | 0.609 | |
| 2 | 6,204.72 | 0.889 | 0.985 | 0.613 | 0.889 | 0.989 | 0.640 | |
| 3 | 5,245.53 | 0.891 | 0.965 | 0.661 | 0.891 | 0.966 | 0.686 | |
| 4 | 3,786.70 | 0.760 | 0.977 | 0.503 | 0.782 | 0.979 | 0.559 | |
| 5 | 1,791.81 | 0.636 | 0.973 | 0.807 | 0.789 | 0.975 | 0.808 | |
| 6 | 1,618.92 | 0.910 | 0.969 | 0.815 | 0.911 | 0.969 | 0.815 | |
| 7 | 1,340.77 | 0.916 | 0.973 | 0.664 | 0.919 | 0.975 | 0.666 | |
| 8 | 1,333.73 | 0.824 | 0.939 | 0.702 | 0.827 | 0.942 | 0.703 | |
| 9 | 1,300.33 | 0.929 | 0.976 | 0.778 | 0.930 | 0.979 | 0.780 | |
| 10 | 1,259.45 | 0.744 | 0.958 | 0.460 | 0.748 | 0.961 | 0.474 | |
| 11 | 1,136.71 | 0.830 | 0.952 | 0.681 | 0.832 | 0.955 | 0.686 | |
| 12 | 1,071.47 | 0.929 | 0.969 | 0.315 | 0.930 | 0.972 | 0.580 | |
| 13 | 773.80 | 0.813 | 0.937 | 0.638 | 0.815 | 0.940 | 0.644 | |
| 14 | 671.44 | 0.679 | 0.919 | −0.064 | 0.682 | 0.925 | 0.347 | |
| 15 | 649.43 | 0.850 | 0.955 | 0.713 | 0.853 | 0.958 | 0.725 | |
| 16 | 503.64 | 0.874 | 0.966 | 0.729 | 0.885 | 0.969 | 0.731 | |
| 17 | 495.97 | 0.896 | 0.958 | 0.822 | 0.904 | 0.962 | 0.825 | |
| 18 | 466.86 | 0.791 | 0.927 | 0.756 | 0.798 | 0.928 | 0.757 | |
| 19 | 410.57 | 0.887 | 0.914 | 0.792 | 0.890 | 0.916 | 0.793 | |
| 20 | 406.13 | 0.749 | 0.904 | 0.506 | 0.749 | 0.918 | 0.545 | |
| 21 | 362.92 | 0.713 | 0.901 | 0.433 | 0.760 | 0.914 | 0.461 | |
| 22 | 293.80 | 0.677 | 0.889 | 0.299 | 0.679 | 0.897 | 0.451 | |
| 23 | 228.79 | 0.772 | 0.789 | 0.357 | 0.788 | 0.801 | 0.554 | |
| 24 | 216.93 | 0.811 | 0.782 | 0.665 | 0.824 | 0.785 | 0.682 | |
| 25 | 214.26 | 0.712 | 0.815 | 0.555 | 0.715 | 0.816 | 0.557 | |
| 26 | 189.10 | 0.772 | 0.866 | 0.682 | 0.777 | 0.870 | 0.683 | |
| 27 | 146.69 | 0.703 | 0.701 | 0.707 | 0.703 | 0.751 | 0.719 | |
| 28 | 140.21 | 0.658 | 0.797 | −0.169 | 0.659 | 0.803 | 0.300 | |
| 29 | 106.59 | 0.682 | 0.740 | −0.775 | 0.683 | 0.743 | 0.175 | |
| 30 | 106.08 | 0.618 | 0.598 | 0.621 | 0.647 | 0.665 | 0.632 | |
| 31 | 98.03 | 0.748 | 0.740 | 0.669 | 0.749 | 0.741 | 0.692 | |
| 32 | 82.30 | 0.524 | 0.623 | 0.512 | 0.555 | 0.631 | 0.526 | |
| 33 | 67.67 | 0.674 | 0.447 | 0.507 | 0.678 | 0.458 | 0.526 | |
| 34 | 45.36 | 0.350 | 0.050 | 0.353 | 0.363 | 0.292 | 0.386 | |
| 35 | 45.12 | 0.394 | 0.393 | 0.410 | 0.407 | 0.411 | 0.430 | |
| 36 | 37.27 | 0.499 | 0.113 | −0.320 | 0.512 | 0.160 | 0.191 | |
| 37 | 24.79 | 0.466 | 0.093 | 0.431 | 0.496 | 0.334 | 0.451 | |
| 38 | 13.17 | 0.379 | −0.296 | 0.278 | 0.435 | 0.095 | 0.330 | |

Model performance evaluation

The performance of the three hybrid models in this study is evaluated using the Nash–Sutcliffe efficiency coefficient (NSE) and the coefficient of determination (R2). The NSE is one of the most regularly used criteria for assessing runoff prediction, and the closer the value is to 1, the more accurate the prediction (Kumar et al. 2016). An NSE value greater than 0.6 is regarded as an outstanding prediction, and a value less than 0.4 as an unacceptable prediction (Nash & Sutcliffe 1970). Detailed survey and data processing methods are summarized in Part S7, Supplementary information.
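For reference, the two indicators can be computed as in the sketch below. The NSE follows its standard definition; for R2 the sketch assumes the squared Pearson correlation between observation and prediction, which is a common convention in hydrology but is not confirmed here because Part S7 is not reproduced. The function names and the five-value example are hypothetical.

```python
def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (var_obs * var_pred)


# A forecast with the right shape but a constant bias of +2.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 5.0, 6.0, 7.0]
nse_value = nse(observed, predicted)
r2_value = r_squared(observed, predicted)
```

In this example the biased forecast is perfectly correlated with the observations, so the assumed R2 equals 1 while the NSE is negative. Under this definition R2 ignores bias and scale errors, whereas the NSE penalizes them.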

Sequence decomposition of runoff

Three signal decomposition methods (CEEMDAN, VMD, and LMD) are used to decompose the original runoff series data, as shown in Figure 3. The numbers of IMFs, VMs, and PFs derived from CEEMDAN, VMD, and LMD decomposition range from 10 to 13, 7 to 14, and 7 to 9, respectively. Each band contains information of a specific frequency. For the same data, the more bands a method can decompose, the greater its ability to identify the frequencies in the data and the simpler the prediction. The minimum number of bands decomposed by CEEMDAN is 10, larger than those of VMD (7) and LMD (7). Hence, CEEMDAN has a strong capacity for identifying frequencies at all hydrological stations, with both high and low flow. VMD has the best decomposition ability in some cases in terms of the maximum number of bands (14), but it is not as effective as CEEMDAN in others because of its minimum band number (7). The maximum and minimum numbers of bands decomposed by LMD are the lowest (9 and 7), indicating that it performs worst in identifying various frequencies.
Figure 3

The first-stage decomposition results of (a) CEEMDAN; (b) VMD; and (c) LMD.


VMD's decomposition process differs from those of CEEMDAN and LMD. CEEMDAN and LMD separate the high-frequency band from the original runoff time series first, and the amplitudes of IMFs and PFs decrease as decomposition proceeds. VMD separates the low-frequency band from the original runoff time series first, and the amplitudes of VMs also decrease as decomposition proceeds. VMs fluctuate more slowly than IMFs and PFs when the amplitude is large. Thus, at the same frequency, the VMs have a smaller amplitude and more stable vibration than IMFs and PFs. In addition, the last separated IMF and PF are monotone, with only a single extreme point. However, the first VM has many extreme points, and the lowest-frequency VM still fluctuates regularly. Thus, VMD does not completely decompose the band and is inferior to CEEMDAN and LMD in decomposition depth.

For a more detailed elaboration, stations 7, 20, and 33 were selected as representatives of the three station groups since they are the medians of the corresponding groups; the results are illustrated in Figure 3. Figure 3(a) shows that, for CEEMDAN, the original runoff time series is decomposed into 11–12 IMFs, which indicate more significant variation characteristics. The low-frequency IMFs are more helpful for models to estimate future values. The station 7 series is decomposed into fewer IMFs than those of stations 20 and 33, demonstrating that large runoff is helpful for CEEMDAN to capture the variation characteristics. As shown in Figure 3(b), VM frequency increases from the first to the last sub-component. The decomposition number is 8–10, less than for CEEMDAN; therefore, VMD performs more efficiently for single decomposition. In Figure 3(c), multiple extreme vibrations exist in high-frequency PFs; the maximum and minimum values of PF1 are considerably greater than those of IMF1 and VM1. It is difficult to capture the features of band changes with such violent fluctuations. In addition, the last PF (PF9) is irregular overall, reducing model prediction accuracy.

In general, CEEMDAN is suitable for capturing the data features of different amounts of runoff, while the VMD method makes it easier to extract high-runoff data. Although LMD has a balanced ability to extract various high and low runoff data, it is inferior to CEEMDAN in the same amount of runoff. In the simulation of runoff in large-scale watersheds, the CEEMDAN method has more general applicability in the preprocessing of runoff.

WT of runoff

The high-frequency band data obtained after signal decomposition remain challenging to predict due to their violent fluctuations. Therefore, WT was applied to decompose the two bands with the highest frequency (the first two bands of IMFs and PFs and the last two bands of VMs, following the band ordering described for the first-stage decomposition). Stations 7, 20, and 33 are again used as the representative stations for describing the method's features, as shown in Figure S3, Supplementary information. Each station has two band groups after processing by each method, and each band group has eight bands. The first and second band groups are decomposed from the highest-frequency and second-highest-frequency sequences of the first decomposition stage, respectively. The first four bands represent the decomposed digital components in a single decomposition result, whereas the latter four bands indicate the decomposed approximate components. The approximate components represent the lower-frequency content of the IMFs/VMs/PFs, and the digital components represent the higher-frequency content.
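Selecting the two highest-frequency bands for the wavelet stage can be illustrated with a short sketch. The study identifies these bands by their position in the decomposition output; the zero-crossing count used below is only an assumed, position-independent proxy for frequency, and the names `zero_crossings` and `route_bands` and the sinusoidal test bands are hypothetical.

```python
import math


def zero_crossings(band):
    """Count sign changes around the band mean (a crude frequency proxy)."""
    mean = sum(band) / len(band)
    centred = [value - mean for value in band]
    return sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)


def route_bands(bands, n_high=2):
    """Return (indices sent to the wavelet stage, indices predicted directly)."""
    ranked = sorted(range(len(bands)),
                    key=lambda i: zero_crossings(bands[i]),
                    reverse=True)
    return sorted(ranked[:n_high]), sorted(ranked[n_high:])


# Four synthetic bands with 1, 4, 15 and 40 cycles over 200 samples.
n = 200
bands = [
    [math.sin(2 * math.pi * cycles * t / n + 0.3) for t in range(n)]
    for cycles in (1, 4, 15, 40)
]
to_wavelet, to_lstm = route_bands(bands)
```

With these synthetic bands, the two fastest oscillations (indices 2 and 3) are routed to the wavelet stage and the two slowest are left for direct prediction.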

In terms of amplitude, the bands obtained by decomposing VMs have the smallest amplitudes at station 7. The maximum and minimum amplitudes for the VM decomposition are 600 and 12.5, respectively, lower than those for the IMF (5,000 and 300) and PF (50,000 and 750) decompositions. At station 22, the maximum and minimum amplitudes for the VM decomposition are 1,500 and 15, respectively, again lower than those for IMFs (10,000 and 400) and PFs (50,000 and 1,000). A band with excessively high amplitude is difficult to predict. Thus, the wavelet components decomposed from VMs are more predictable than those from IMFs and PFs at stations with high and median runoff. However, the average runoff at station 22 is lower than at station 7. It is reasonable to speculate that a decrease in runoff reduces the performance of WT, causing sudden peaks in the bands and making them less predictable. The VM decomposition also yields the smallest band amplitudes at station 33. Only one VM-derived band carries substantial information, with a value of 1,000. The other bands are mostly in single digits. These single-digit bands can be predicted with high accuracy, but they carry little information, so their contribution to the final result is limited.

Regarding fluctuation, the two approximation components from IMFs are not prominent. At stations 7 and 22, the values around the 200th day change sharply, whereas those around the 500th, 1,000th, 1,500th, and 2,000th days do not change. This shows that the first and second IMFs include few low-frequency components and that most low-frequency information is stored in other IMFs. This makes the wavelet components easy to predict but raises the complexity of predicting the other IMFs. The amplitude of the VM components is largest at station 7, but it is only 1,000; hence, VMs are simple to estimate because of their small amplitude. The detail components of the PFs fluctuate most violently, with values up to 5,000, 3,000, and 5,000 at stations 7, 22, and 33, respectively. These fluctuations are effectively unpredictable. In general, at stations with high runoff, WT is best suited to processing VMs. The LMD-WT method is inferior to VMD-WT and CEEMDAN-WT for forecasting runoff under high-variability conditions.

Prediction performance for hybrid models

Before running the models, a single random seed was set for the initialization of the three models to ensure consistent initial parameters of the LSTM model. Training data (2006 to 2014) were used to calibrate the parameters. Figures S4 and S5 and Table S1, Supplementary information, illustrate the performance of the three hybrid models during the training period. The average NSE and R2 are 0.690 and 0.705 for the CEEMDAN-based model, 0.764 and 0.787 for the VMD-based model, and 0.375 and 0.529 for the LMD-based model. For the CEEMDAN-based method, 94.7% of all predictions are satisfactory, an outstanding performance. The VMD-based hybrid model follows, with 86.8% of all NSE and R2 values acceptable. The LMD-based model has the worst performance, although 63.2% of NSE and 71.1% of R2 values are still within the acceptable range.
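As an illustration of how such indicators and percentages can be obtained, the sketch below computes NSE in its standard form (Nash & Sutcliffe 1970), R2 as the squared Pearson correlation, and the share of stations above a threshold. Treating R2 as the squared correlation and using 0.5 as the acceptability threshold are assumptions made for this example; the function names are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / variance of observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R2 for this sketch)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


def share_acceptable(scores, threshold=0.5):
    """Fraction of scores above the threshold (0.5 is an assumed cut-off)."""
    return sum(1 for value in scores if value > threshold) / len(scores)


# Toy daily runoff values in m3/s (illustrative only, not data from this study).
observed = [120.0, 150.0, 310.0, 280.0, 190.0]
simulated = [110.0, 160.0, 290.0, 300.0, 185.0]
print(round(nse(observed, simulated), 3), round(r_squared(observed, simulated), 3))
```

Applying `share_acceptable` to the per-station NSE or R2 values of one model gives a percentage comparable in form to those reported above.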

Figure 4 illustrates the hybrid model performance at all hydrological stations during the validation period (2015 to 2017), and Table 1 details the evaluation indicators. Group 3 has the best prediction results (average NSE = 0.807 and R2 = 0.948), followed by group 2 (average NSE = 0.745 and R2 = 0.925), and group 1 has the worst (average NSE = 0.388 and R2 = 0.704). In general, the performance of the hybrid models improves as runoff increases. However, there are exceptions. Hydrological stations with very low runoff are difficult to predict, and the performance in group 1 is mostly unacceptable. Based on the signal decomposition results, this can be attributed to the very low runoff values at these stations, which are mostly close to 0. The decomposition therefore cannot identify sub-sequences with sufficient runoff variation features, and LSTM cannot learn the inherent patterns of runoff change. As a result, band decomposition at stations with limited runoff yields poor prediction results. In addition, once runoff exceeds a certain range, further increases have little impact on performance. Figure 5 shows the performance of the hybrid models and the runoff statistics at each station. Taking CEEMDAN–WT–LSTM as an example, station 3 (mean runoff = 5,245.53 m3/s, NSE = 0.891, R2 = 0.956) has about ten times the runoff of station 17 (mean runoff = 495.97 m3/s, NSE = 0.896, R2 = 0.958) but similar model performance. Moreover, runoff time series with acute fluctuations and sufficient temporal characteristics, such as that of station 17, are also easy to predict with the hybrid models even when runoff is relatively small.
Figure 4

The boxplot (NSE and R2) of model performances in different runoff groups.

Figure 5

The statistical results of performance of hybrid models for all stations of the basin in terms of runoff flow and their NSEs. The line chart represents the runoff flow, and the bar chart represents the NSEs.

The simulation performance of the three hybrid models varies significantly with runoff level, as shown in Figure 4. In group 3, 97.2% of all NSE and R2 values fall within the acceptable range. VMD–WT–LSTM performs best, followed by CEEMDAN–WT–LSTM and LMD-WT-LSTM. The lower limit of VMD–WT–LSTM (NSE = 0.923, R2 = 0.929) was nearly identical to the upper limit of CEEMDAN–WT–LSTM (NSE = 0.929, R2 = 0.930), which was significantly superior to LMD-WT-LSTM (NSE = 0.815, R2 = 0.815). In group 2, the performance of VMD–WT–LSTM and CEEMDAN–WT–LSTM declines slightly, but the CEEMDAN–WT–LSTM indicators are more stable. Both methods extract the features of medium and high flow well, but their simulation performance differs with runoff fluctuation. As shown in Figure 5 and Table 1, VMD–WT–LSTM performance is almost directly proportional to the amount of runoff at a station, whereas the performance of CEEMDAN–WT–LSTM is irregular. For example, station 1 has the highest runoff and the largest difference between minimum, medium, and average runoff, yet its NSE and R2 under CEEMDAN–WT–LSTM are much lower than those of stations 2 and 3. Several other stations with large runoff changes, such as stations 4, 5, 10, and 14, have comparable simulation results. As stated previously, LMD-WT-LSTM is poor at extracting high-frequency bands, resulting in poor predictions at large-runoff stations; for example, the predictions for the 900th day at stations 7 and 20 diverge significantly from the observations (Figure S6, Supplementary information). Thus, LMD-WT-LSTM is inadequate for processing runoff with excessively erratic fluctuations compared with the two other approaches, which can handle extreme values. In group 1, only CEEMDAN–WT–LSTM has an average NSE greater than 0.5. In general, stations with average runoff above a specific scale (about 140 m3/s) are well suited to simulation with hybrid models.
VMD–WT–LSTM is excellent at identifying and predicting fluctuations, so it is suitable for data with high (mean runoff greater than 1,071 m3/s) and median (mean runoff greater than 140 m3/s and lower than 1,071 m3/s) runoff. Its performance improves as runoff volume increases. CEEMDAN–WT–LSTM is more appropriate for predicting low runoff (mean runoff less than 140 m3/s) and high and median runoff with relatively mild fluctuations. LMD-WT-LSTM is incapable of handling outliers. It is suitable only for medium and high runoff bands with stable fluctuations and is generally inferior to CEEMDAN–WT–LSTM and VMD–WT–LSTM for the same runoff batch.
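The selection guidance above can be written as a small rule, sketched below under stated assumptions. The thresholds of 140 and 1,071 m3/s come from the text; the handling of values exactly on a threshold, the Boolean `violent_fluctuations` flag (the text gives no numerical criterion for it), and the function names are hypothetical.

```python
LOW_LIMIT = 140.0    # m3/s, upper bound of "low" mean runoff in the text
HIGH_LIMIT = 1071.0  # m3/s, lower bound of "high" mean runoff in the text


def runoff_class(mean_runoff):
    """Classify a station by mean runoff; boundary handling is an assumption."""
    if mean_runoff < LOW_LIMIT:
        return "low"
    if mean_runoff <= HIGH_LIMIT:
        return "median"
    return "high"


def suggested_model(mean_runoff, violent_fluctuations):
    """Return the hybrid model suggested by the guidance above.

    violent_fluctuations is a hypothetical flag set by the analyst.
    """
    if runoff_class(mean_runoff) == "low":
        return "CEEMDAN-WT-LSTM"
    # High and median runoff: VMD-based when fluctuations are violent,
    # CEEMDAN-based when they are relatively mild.
    return "VMD-WT-LSTM" if violent_fluctuations else "CEEMDAN-WT-LSTM"


# Mean runoff values for stations 3 and 17 are those quoted in the text.
for station, mean_runoff in [("station 3", 5245.53), ("station 17", 495.97)]:
    print(station, runoff_class(mean_runoff), suggested_model(mean_runoff, True))
```

The rule is only a summary device; the fluctuation flag would have to be judged from the series itself.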

The performance of the standalone LSTM (Table S2 and Figure S7, Supplementary information) in predicting large and medium runoff is often worse than that of the CEEMDAN–WT–LSTM and VMD–WT–LSTM models and superior to that of the LMD-WT-LSTM model. This result is consistent with Sun et al. (2019) (Table S3, Supplementary information), who applied single ANN, LSTM, and autoregressive (AR) models and decomposition hybrid models to estimate runoff at two stations in the Pearl River and found the overall prediction performance of the hybrid models to be superior. Previous studies indicated that decomposed sub-sequences are easier to predict, as shown in Table S3, Supplementary information. Wang et al. (2015) reported that ensemble empirical mode decomposition (EEMD) can effectively increase forecasting accuracy for the Dahuofang reservoir, with improvements of 437.78% and 135.01% in the validation stage for the coefficient of correlation (R) and NSEC. Zhang et al. (2021) indicated that the nonlinear autoregressive (NAR) model with CEEMDAN decomposition has better prediction ability than the single model. Daily runoff prediction is often seen as a more complex process because of its significant variability. He et al. (2019a, 2019b) demonstrated that the VMD-DNN model is a promising new method for daily runoff forecasting. Sun et al. (2019) also showed that the hybrid model based on single decomposition (WT-ANN) at Longchuan station (R = 0.843) performs better than the single model (R = 0.834). For comparable hydrological stations, the CEEMDAN–WT–LSTM and VMD–WT–LSTM models in this study have R values of 0.907 and 0.886, showing superior performance. Similar LSTM or single decomposition-based hybrid runoff prediction models, as listed in Table S3, Supplementary information, provided good prediction results but lower performance than the hybrid models in this study.
Although first-stage decomposition can successfully identify sub-sequences with fluctuation characteristics, the high-frequency sub-sequences remain challenging to predict. Hence, the two-stage decomposition method helps extract the characteristics hidden in the high-frequency series (Chen et al. 2021). Overall, incorporating signal decomposition into deep learning models can produce accurate runoff predictions that can serve as a valuable reference for global hydrological runoff prediction research.

Traditional physically based hydrological models, such as SWAT, have been the most important tools for runoff prediction and calibration. Table S3, Supplementary information, summarizes the performance of such models for various rivers with basin areas or runoff levels similar to those in this study and compares it with the hybrid models. For the Marol, Roodan, Fox, and Luohe rivers, whose basin areas are comparable to that of Longchuan (7,691 km2), the NSEs of the SWAT model for runoff prediction are 0.740, 0.680, 0.460–0.670, and 0.540 (Bekele & Knapp 2010; Suliman et al. 2015; Himanshu et al. 2018; Zhang et al. 2021), significantly lower than those of the CEEMDAN–WT–LSTM method (Table 1 and Table S3, Supplementary information). Similarly, the reported NSE and R for daily runoff prediction in the Mara River (13,750 km2), Xiaohong River (4,417 km2), and Yass River (1,597 km2) are inferior to those in this study (Dessu & Melesse 2013; Saha et al. 2014; Li et al. 2020). Thus, the hybrid models outperform traditional ones in both large and small basins, and the proposed hybrid model performs better at similar stations. In addition, SWAT requires several meteorological and surface parameters, including daily precipitation, temperature, wind speed, solar radiation, and relative humidity, and simulations for large basins need a long running time (Zhang et al. 2016). In contrast, the hybrid models in this study require only two inputs, daily precipitation and runoff, and their running time for all basin stations is within 10 minutes. In terms of accuracy, efficiency, and simplicity, the LSTM-based two-stage decomposition approach is better than traditional methods.

The effectiveness and practicability of hybrid models were verified in this study and in various other attempts, but a few shortfalls remain. Firstly, the prediction of time series with low runoff is poor because none of the decomposition methods used in this research can identify sub-sequences with sufficient change characteristics when the runoff is extremely low. It is assumed that small runoff, which cannot exhibit usable time-scale features, is more susceptible to other influences, such as solar radiation. Hence, obtaining additional data types and increasing the number of possible impact factors within the hybrid model is a feasible optimization method for predicting such runoff with a small degree of error. Secondly, higher data accuracy can lead to better prediction performance. This study uses daily runoff data for prediction. The prediction accuracy would be enhanced with data of higher temporal resolution, such as 6- or 12-h data; however, high temporal resolution is challenging to achieve. The relationship between data quality and model accuracy must be investigated further. Despite some remaining uncertainty, the LSTM model based on two-stage signal decomposition is recommended for large basin runoff forecasting.

This study proposes three hybrid models for predicting daily runoff time series using two-stage decomposition. Based on daily runoff and rainfall data from the Pearl River Basin in China, two statistical performance indicators (R2 and NSEC) were applied to evaluate model performance. The results demonstrate that signal decomposition methods can effectively reduce the difficulty of prediction, and that two-stage decomposition can separate high- and low-frequency components more finely, resulting in more accurate predictions. VMD–WT–LSTM is appropriate for predicting high and median runoff, whereas CEEMDAN–WT–LSTM is more suitable for low runoff and for high and median runoff with less violent fluctuations. The results predicted by LMD-WT-LSTM are generally not as accurate as those of the first two models. The hybrid models require less data and processing time and are simpler to apply than traditional models. However, when the runoff is extremely low, none of the decomposition methods used in this research can identify sub-sequences with sufficient change characteristics. Overall, the presented hybrid methods, based on two-stage signal decomposition and deep learning, are promising alternatives for daily runoff prediction in large-scale basins, especially for high runoff.

The authors would like to acknowledge the financial support from the Guangdong Natural Science Funds for Distinguished Young Scholars (2022B1515020088), National Natural Science Foundation of China (42077429 and U1701242), Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety (2019B030301008), Guangdong Basic and Applied Basic Research Foundation (2021A1515011641), and Guangzhou Basic and Applied Basic Research Foundation (202102021233).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmadi F., Mehdizadeh S. & Mohammadi B. 2021 Development of bio-inspired- and wavelet-based hybrid models for reconnaissance drought index modeling. Water Resources Management 35 (12), 4127–4147. https://doi.org/10.1007/s11269-021-02934-z.
Alizadeh M., Nourani V., Mousavimehr M. & Kavianpour M. 2018 Wavelet-IANN model for predicting flow discharge up to several days and months ahead. J Hydroinformatics. 20 (1), 134–148. https://doi.org/10.2166/hydro.2017.142.
Bajirao T., Kumar P., Kumar M., Elbeltagi A. & Kuriqi A. 2021 Potential of hybrid wavelet-coupled data-driven-based algorithms for daily runoff prediction in complex river basins. Theor. Appl. Climatol. 145, 1207–1231.
Bekele E. G. & Knapp H. V. 2010 Watershed modeling to assessing impacts of potential climate change on water supply availability. Water Resour Manag. 24 (13), 3299–3320. https://doi.org/10.1007/s11269-010-9607-y.
Cao Y. Y., Cao Y. T., Wen S., Huang T. & Zeng Z. 2019 Passivity analysis of delayed reaction-diffusion memristor-based neural networks. Neural Netw. 109, 159–167. https://doi.org/10.1016/j.neunet.2018.10.004.
Chen S., Ren M. & Sun W. 2021 Combining two-stage decomposition based machine learning methods for annual runoff forecasting. J Hydrol. 603, 126945. https://doi.org/10.1016/j.jhydrol.2021.126945.
Dessu S. & Melesse A. 2013 Impact and uncertainties of climate change on the hydrology of the Mara River basin, Kenya/Tanzania. Hydrol Process. 27 (20), 2973–2986. https://doi.org/10.1002/hyp.9434.
Devi G., Ganasri B. & Dwarakish G. 2015 A review on hydrological models. Aquatic Procedia 4, 1001–1007.
Duan Q., Sorooshian S. & Gupta V. 1992 Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resour Res. 28 (4), 1015–1031. https://doi.org/10.1029/91WR02985.
Feng Z., Niu W., Wan X., Xu B., Zhu F. & Chen J. 2022 Hydrological time series forecasting via signal decomposition and twin support vector machine using cooperation search algorithm for parameter identification. Journal of Hydrology 612, 128213. https://doi.org/10.1016/j.jhydrol.2022.128213.
Fidal J. & Kjeldsen T. 2020 Accounting for soil moisture in rainfall-runoff modelling of urban areas. J Hydrol. 589, 125122.
Gao S., Huang Y., Zhang S., Han J., Wang G., Zhang M. & Lin Q. 2020 Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. J Hydrol. 589, 125188. https://doi.org/10.1016/j.jhydrol.2020.125188.
Gao S., Huang Y., Zhang S., Han J., Wang G., Zhang M. & Lin Q. 2021 Prediction of long-term inter-seasonal variations of streamflow and sediment load by state-space model in the Loess Plateau of China. Journal of Hydrology 600, 126534. https://doi.org/10.1016/j.jhydrol.2021.126534.
Gharbia S., Riaz K., Anton I., Makrai G., Gill L., Creedon L., McAfee M., Johnston P. & Pilla F. 2022 Hybrid data-driven models for hydrological simulation and projection on the catchment scale. Sustainability 14 (7), 4037. https://doi.org/10.3390/su14074037.
Goudarzi F., Sarraf A. & Ahmadi H. 2020 Prediction of runoff within Maharlu basin for future 60 years using RCP scenarios. Arab J Geosci. 13, 605. https://doi.org/10.1007/s12517-020-05634-x.
Hadi S. & Tombul M. 2018 Streamflow forecasting using four wavelet transformation combinations approaches with data-driven models: a comparative study. Water Resour Manag. 32 (14), 4661–4679. https://doi.org/10.1007/s11269-018-2077-3.
Hao S., Wörman A., Riml J. & Bottacin-Busolin A. 2023 A model for assessing the importance of runoff forecasts in periodic climate on hydropower production. Water 15 (8), 1559. https://doi.org/10.3390/w15081559.
He X., Luo J., Zuo G. & Xie J. 2019b Daily runoff forecasting using a hybrid model based on variational mode decomposition and deep neural networks. Water Resour Manag. 33 (4), 1571–1590. https://doi.org/10.1007/s11269-019-2183-x.
He J., Yang K., Tang W., Lu H., Qin J., Chen Y. & Li X. 2020 The first high-resolution meteorological forcing dataset for land process studies over China. Sci Data. 7, 25. https://doi.org/10.1038/s41597-020-0369-y.
Himanshu S., Pandey A. & Patil A. 2018 Hydrologic evaluation of the TMPA-3b42v7 precipitation data set over an agricultural watershed using the SWAT model. J Hydrol Eng. 23 (4). https://doi.org/10.1061/(ASCE)HE.1943-5584.0001629.
Jamei M., Ahmadianfar I., Chu X. & Yaseen Z. 2020 Prediction of surface water total dissolved solids using hybridized wavelet-multigene genetic programming: New approach. J Hydrol. 589, 125335. https://doi.org/10.1016/j.jhydrol.2020.125335.
Kaveh K., Kaveh H., Bui M. & Rutschmann P. 2021 Long short-term memory for predicting daily suspended sediment concentration. Engineering With Computers 37 (3), 2013–2027. https://doi.org/10.1007/s00366-019-00921-y.
Kumar P. S., Praveen T. V. & Prasad M. A. 2016 Artificial neural network model for rainfall-runoff-A case study. Int J Hybrid Inf Technol. 9 (3), 263–272.
Lahmiri S. 2016 A variational mode decomposition approach for analysis and forecasting of economic and financial time series. Expert Syst Appl. 55, 268–273. https://doi.org/10.1016/j.eswa.2016.02.025.
La Rosa Lama G. & Sánchez I. 2020 Hybrid models based on mode decomposition and recurrent neural networks for streamflow forecasting in the Chira river in Peru. In: 2020 IEEE Eng Int Res Conf EIRCON, pp. 1–4. IEEE, New York. https://doi.org/10.1109/EIRCON51178.2020.9254035.
Li X. & Yang T. 2016 Spatio-temporal Changes Prediction of Precipitation and Temperature in Pearl River Basin From 2015 to 2100. https://www.webofscience.com/wos/alldb/full-record/INSPEC:15927958.
Li X., Huang S., He R., Wang G., Tan M., Yang X. & Zheng Z. 2020 Impact of temporal rainfall resolution on daily streamflow simulations in a large-sized river basin. Hydrol Sci J-J Sci Hydrol. 65 (15), 2630–2645. https://doi.org/10.1080/02626667.2020.1836374.
Liu Y., Yang G., Li M. & Yin H. 2016 Variational mode decomposition denoising combined the detrended fluctuation analysis. Signal Process. 125, 349–364. https://doi.org/10.1016/j.sigpro.2016.02.011.
Liu Q., Dai H., Gui D., Hu B., Ye M., Wei G., Qin J. & Zheng J. 2022 Evaluation and optimization of the water diversion system of ecohydrological restoration megaproject of Tarim River, China, through wavelet analysis and a neural network. J Hydrol. 608, 127586.
Min X., Hao B., Sheng Y. & Qin J. 2023 Transfer performance of gated recurrent unit model for runoff prediction based on the comprehensive spatiotemporal similarity of catchments. J. Environ. Manage. 330, 117182.
Mosavi A., Ozturk P. & Chau K. 2018 Flood prediction using machine learning models: literature review. Water 10 (11), 1536. https://doi.org/10.3390/w10111536.
Naik J., Dash S., Dash P. & Bisoi R. 2018 Short term wind power forecasting using hybrid variational mode decomposition and multi-kernel regularized pseudo inverse neural network. Renew Energy. 118, 180–212. https://doi.org/10.1016/j.renene.2017.10.111.
Nash J. E. & Sutcliffe J. V. 1970 River flow forecasting through conceptual models part I – A discussion of principles. J Hydrol. 10 (3), 282–290. https://doi.org/10.1016/0022-1694(70)90255-6.
Niu H., Xu K. & Wang W. 2020 A hybrid stock price index forecasting model based on variational mode decomposition and LSTM network. Appl Intell. 50 (12), 4296–4309. https://doi.org/10.1007/s10489-020-01814-0.
Peng S., Chen R., Yu B., Xiang M., Lin X. & Liu E. 2021 Daily natural gas load forecasting based on the combination of long short term memory, local mean decomposition, and wavelet threshold denoising algorithm. J Nat Gas Sci Eng. 95, 104175.
Ren Y., Suganthan P. & Srikanth N. 2015 A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods. IEEE Trans Sustain Energy. 6 (1), 236–244. https://doi.org/10.1109/TSTE.2014.2365580.
Ren Y., Zeng S., Liu J., Tang Z., Hua X., Li Z., Song J. & Xia J. 2022 Mid- to long-term runoff prediction based on deep learning at different time scales in the Upper Yangtze River Basin. Water 14 (11), 1692. https://doi.org/10.3390/w14111692.
Sachindra D., Ahmed K., Rashid M., Sehgal V., Shahid S. & Perera B. 2019 Pros and cons of using wavelets in conjunction with genetic programming and generalised linear models in statistical downscaling of precipitation. Theor. Appl. Climatol. 138, 617–638.
Saha P., Zeleke K. & Hafeez M. 2014 Streamflow modeling in a fluctuant climate using SWAT: Yass River catchment in south eastern Australia. Environ Earth Sci. 71 (12), 5241–5254. https://doi.org/10.1007/s12665-013-2926-6.
Smith J. 2005 The local mean decomposition and its application to EEG perception data. J R Soc Interface. 2 (5), 443–454. https://doi.org/10.1098/rsif.2005.0058.
Suliman A. H. A., Jajarmizadeh M., Harun S. & Darus I. Z. M. 2015 Comparison of semi-distributed, GIS-based hydrological models for the prediction of streamflow in a large catchment. Water Resour Manag. 29 (9), 3095–3110. https://doi.org/10.1007/s11269-015-0984-0.
Sun Y., Niu J. & Sivakumar B. 2019 A comparative study of models for short-term streamflow forecasting with emphasis on wavelet-based approach. Stoch Environ Res Risk Assess. 33 (10), 1875–1891. https://doi.org/10.1007/s00477-019-01734-7.
Sun L., Zhou X. & Gu A. 2022 Effects of climate change on hydropower generation in China based on a WEAP model. Sustainability 14 (9), 5467. https://doi.org/10.3390/su14095467.
Tang L., Wu Y. & Yu L. 2018 A randomized-algorithm-based decomposition-ensemble learning methodology for energy price forecasting. Energy 157, 526–538. https://doi.org/10.1016/j.energy.2018.05.146.
Torres M., Colominas M., Schlotthauer G. & Flandrin P. 2011 A complete ensemble empirical mode decomposition with adaptive noise. IEEE, New York, pp. 4144–4147.
Wang W., Chau K., Qiu L. & Chen Y. 2015 Improving forecasting accuracy of medium and long-term runoff using artificial neural network based on EEMD decomposition. Environ Res. 139, 46–54. https://doi.org/10.1016/j.envres.2015.02.002.
Xiang Z., Montas H. J., Shirmohammadi A., Leisnham P. T. & Brubaker K. 2018 Impact of Climate Change on Critical Source Areas in A Chesapeake Bay Watershed. ASABE, St. Joseph, MI, p. 1. https://doi.org/10.13031/aim.201801831.
Xiang Z., Yan J. & Demir I. 2020 A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resour Res. 56 (1), 1–17. https://doi.org/10.1029/2019WR025326.
Xie T., Zhang G., Hou J., Xie J., Lv M. & Liu F. 2019 Hybrid forecasting model for non-stationary daily runoff series: A case study in the Han River Basin, China. J Hydrol. 577, 123915. https://doi.org/10.1016/j.jhydrol.2019.123915.
Yin H., Wang F., Zhang X., Zhang Y., Chen J., Xia R. & Jin J. 2022 Rainfall-runoff modeling using long short-term memory based step-sequence framework. J. Hydrol. 610, 127901.
Young C. & Liu W. 2015 Prediction and modelling of rainfall-runoff during typhoon events using a physically-based and artificial neural network hybrid model. Hydrol Sci J-J Sci Hydrol. 60 (12), 2102–2116. https://doi.org/10.1080/02626667.2014.959446.
Zhai J., Su B., Krysanova V., Vetter T., Gao C. & Jiang T. 2010 Spatial variation and trends in PDSI and SPI indices and their relation to streamflow in 10 large regions of China. J Clim. 23 (3), 649–663. https://doi.org/10.1175/2009JCLI2968.1.
Zhang L., Jin X., He C., Zhang B., Zhang X., Li J., Zhao C., Tian J. & DeMarchi C. 2016 Comparison of SWAT and DLBRM for hydrological modeling of a mountainous watershed in arid northwest China. J Hydrol Eng. 21 (5). https://doi.org/10.1061/(ASCE)HE.1943-5584.0001313.
Zhang X., Zheng Z. & Wang K. 2021 Prediction of runoff in the upper Yangtze River Based on CEEMDAN-NAR model. Water Supply. 21 (7), 3307–3318. https://doi.org/10.2166/ws.2021.121.
Zounemat-Kermani M., Seo Y., Kim S., Ghorbani M., Samadianfard S., Naghshara S., Kim N. & Singh V. 2019 Can decomposition approaches always enhance soft computing models? predicting the dissolved oxygen concentration in the St. Johns River, Florida. Appl Sci-Basel. 9 (12), 2534. https://doi.org/10.3390/app9122534.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
