## Abstract

Runoff is affected by natural and nonnatural factors in the process of formation, and the runoff series is generally nonstationary time series. How to improve the accuracy of runoff prediction has always been a difficult problem for hydrologists. The key to solve this problem is to reduce the complexity of runoff series and improve the accuracy of runoff prediction model. Based on the aforementioned ideas, this article uses the complementary set empirical mode decomposition to decompose the runoff series into multiple intrinsic components that retain time–frequency information, thus reducing the complexity of the runoff series. The particle swarm optimization (PSO) adaptive neuro-fuzzy system is used to predict each intrinsic component to improve the accuracy of runoff prediction. After that, the trained intrinsic components of the model are reconstructed into the original runoff series. The example shows that the absolute relative error of the runoff forecasting model constructed in this article is 0.039, and the determination coefficient is 0.973. This model can be applied to the annual runoff series forecasting. Comparing the prediction results of this model with empirical mode decomposition algorithm-ANFIS model and ANFIS model, complementary set empirical mode decomposition algorithm-PSO-ANFIS model shows obvious advantages.

## HIGHLIGHTS

Complementary set empirical mode decomposition algorithm (CEEMD) is a good time series decomposition tool, which can reduce the nonstationary of time series.

Particle swarm optimization (PSO)-ANFIS has good simulation and prediction results.

CEEMD-PSO-ANFIS model can be applied to runoff series prediction.

## INTRODUCTION

Runoff forecasting is an important research direction in the hydrological field. Runoff forecasting can not only provide basic reference for the use of water resources but also provide important data support for urban flood control scheduling. When the accuracy of runoff forecast is higher, the water resource management of the basin will be more reasonable, and the urban flood control and drainage scheduling will also be more scientific. However, in the process of runoff formation, it is not only affected by natural factors such as rainfall, temperature, and evaporation but also affected by many nonnatural factors such as human activities and urbanization process. The impact of these factors makes the runoff time series have nonstationary and nonlinear characteristics, and the impact of many factors also makes the runoff time series more complex. The complex runoff series, nonstationary, and nonlinear characteristics increase the difficulty of improving the accuracy of runoff forecast.

At present, the research in the field of runoff prediction at home and abroad can be roughly divided into two types (Hao *et al.* 2015): first, the quasi-physical model that uses mathematical methods to simulate the runoff process, such as the Xin'anjiang model; second, the formation process of runoff is simulated by the black box model. The black box model uses the computer's fast computing ability to establish the nonlinear relationship between the impact factors of runoff and runoff, so as to simulate and predict. There are much research on the use of black box models at home and abroad, such as artificial neural networks, support vector machines, random forest methods, etc. For example, Yudong & Linsheng (1995) proposed that artificial neural network can be widely used in long-term runoff prediction. Xinmin *et al.* (1999) used conjugate gradient optimization and error back-propagation training algorithm to optimize the artificial neural network and applied it to the real-time prediction of river runoff. This method provides a way for the black box model of runoff prediction, that is, the runoff prediction model can be optimized by improving the model parameters. Jianyi & Chuntian (2006) applied the support vector machine model to the runoff forecast, and the results show that the accuracy of runoff forecast has been improved. Tongtie Gang *et al.* (2012) applied the stochastic forest method to the runoff forecast, and the forecast accuracy was improved. Han & Morrison (2022) used the deep learning method to improve the prediction effectiveness of the earlier hydrological model. The black box model is one of the widely used methods for runoff forecasting.

Due to the complexity and nonstationarity of runoff time series, it is difficult to improve the accuracy of runoff forecast. At present, the methods to improve the accuracy of runoff prediction include two key directions (Dongmei *et al.* 2022). The first is to reduce the complexity of the runoff series, that is, to decompose the complex runoff series into multiple simple time series. The second is to establish a black box model for runoff prediction with strong nonlinear approximation ability. The methods to reduce the complexity of runoff series include empirical mode decomposition (EMD), wavelet analysis, etc. It is to reduce the complexity of runoff series by decomposing the original nonstationary series into multiple series. For example, Shahabi *et al.* (2016) applied wavelet transform to decompose significant wave height (SWH) data, and the results show that this method can improve the prediction accuracy. Tian (2020) applied wavelet transform to decompose crop water demand data, extracted approximate signals, and decomposed signals of water demand, thus reducing the complexity of the model. Vousoughi (2023) applied wavelet analysis technology to decompose groundwater data, and the results showed that this method improved the accuracy of prediction. Zhang *et al.* (2017) applied the EMD model to extract the periodic and local characteristics of runoff. Ghasempour *et al*. (2021) used empirical mode decomposition algorithm (EEMD) and variational mode decomposition (VMD) to decompose the river flow, and the results show that the prediction accuracy of the model can be improved by using this method. After reducing the complexity of the model, the accuracy of runoff prediction can be effectively improved by using the model with strong nonlinear approximation ability. Ruiming (2019) applied wavelet transform and support vector machine to monthly runoff forecast, and the results showed that the method was effective. Rui *et al.* (2021) applied the linear network long short term memory (LN-LSTM)-particle swarm optimization (PSO) model to runoff forecasting, and the results showed that the accuracy of runoff forecasting was significantly improved. Xin *et al.* (2022) applied EMD and artificial neural network (ANN) to runoff prediction and obtain good results.

Based on the aforementioned idea of improving the accuracy of the model, this article introduces the complementary set EMD to decompose the runoff time series. In the process of runoff decomposition, this method can not only decompose the runoff series into simpler series but also give the correlation coefficient between the decomposed series and the original runoff series. The accuracy of runoff prediction can be increased by characterizing the intrinsic components of the original runoff series. Zhang *et al.* (2022a, 2022b) applied the CEEMD-bidirectional long short term memory (BISTM) model to monthly runoff forecast, and believed that this method could improve the accuracy of runoff forecast.

## RESEARCH METHODS

### Complementary ensemble empirical mode decomposition

Complementary set empirical mode decomposition algorithm (CEEMD), proposed by Yeh *et al.* (2010) in 2010, is improved on the basis of set EEMD. This method can effectively improve the decomposition efficiency by adding positive and negative white noise on the basis of traditional EMD. The traditional processing method of modal decomposition assumes that there is no noise in the original data, and the existence of noise is not considered in the process of decomposition, so modal aliasing will occur in the process of decomposition. In view of the aforementioned defects, EEMD solves the modal aliasing problem by adding uniformly distributed white noise several times to mask the noise of the original data. However, the white noise added by EEMD cannot be completely offset after the lumped average, and there is residual. To solve this problem, CEEMD adds positive and negative white noise in pairs. The positive and negative white noise will cancel each other when the set is averaged, thus improving the decomposition effect. The process of using CEEMD to decompose runoff time series is as follows:

- (1)
Add different positive and negative paired white noise to the original runoff series , the number of times of adding is

*k*, the positive noise signal is , and the negative noise signal is . - (2)
EMD decomposition of and .

① The size and position of local maximum and minimum of and are analyzed.

② Apply the cubic spline difference curve to form the upper envelope lines and and the lower envelope lines and of the maximum and minimum values of and .

③ Calculate the mean value of the upper envelope lines and of and and the lower envelope lines and at each time point and . Calculate the difference between the mean and and the difference between the mean and to determine whether the requirements are met. If satisfied, record the current mean value , set it as the intrinsic modulus function (IMF)

_{a}, and set the mean value as the intrinsic modulus function . If the assumed conditions are not met, replace the current data with the difference value and repeat the aforementioned steps until the assumed conditions are met.④ When the set number of iterations

*n*is reached:, , if the set judgment conditions are met, the first intrinsic modulus of positive noise , and the first intrinsic modulus of negative noise .

⑤ Calculate residual sequence

, , judge whether it meets the EMD decomposition conditions. If it meets the conditions, the decomposition will be ended. If not, then another , , return to step ①.

⑥ Through the aforementioned steps, we can gradually calculate the positive noise sequence , the first to nth intrinsic modulus and a residual , as well as the negative noise sequence , the first to nth intrinsic modulus and a residual .

- (3)
Calculate the first to nth intrinsic modulus of the original runoff series , and the residual .

- (4)
The sum of intrinsic modulus and residual is the original data.

Due to the addition of white noise with positive and negative opposites, the two noises can cancel each other on average. This method not only improves the decomposition efficiency but also improves the shortcomings of large reconstruction error and poor decomposition completeness of EEMD. CEEMD has two important parameters during decomposition, including the number of noise additions and the standard deviation of noise additions. After consulting relevant literature (Zhang *et al.* 2022a, 2022b), the standard deviation is generally set as 0.10–0.2, and the number of noise additions is generally 50–100. The process of CEEMD decomposing runoff sequences is shown in Figure 1.

### ANFIS model

Adaptive Network-based Fuzzy Inference System was proposed by J-S.R. Jang in the early 1990s. The model generally includes five layers: fuzziness, rule usage, normalized usage, Takagi-Sugeno-Kang (TSK) output layer, and total output layer (Daneshfaraz *et al.* 2022). The model is essentially a neural network based on fuzzy reasoning. It not only has the advantages of self-learning, adaptive, and nonlinear approximation of neural network (Talei *et al.* 2010) but also has fuzzy reasoning. This method has been proved to be effective in runoff forecasting (Nourani & Komasi 2013; Shabnam & Morteza 2022). It has two key parameters to learn: the former and the latter. Suppose a multi-input and single-output hierarchical fuzzy neural network, the output of the *i* nodes of its *K* layer is , the expected output is , and the actual output is .

#### Learning of front part parameters

#### Learning of subsequent parameters

*I*is the unit matrix, is a large positive number, and a positive number of 103 can be taken. When , the recursive least square method is close to the real value.

### PSO optimized ANFIS model

The learning of the antecedent parameters and the consequent parameters is the most important content of the ANFIS model (Tao *et al.* 2021). In the calculation process of the ANFIS model, the antecedent parameters are calculated based on the gradient descent algorithm, which will lead to a large amount of calculation of the antecedent parameters and easy to fall into local minimization. This will slow down the calculation speed of the model and make it difficult to reach the best-fitting parameters. In order to improve this problem, the global optimization algorithm can be used.

*i*particle is expressed as . By substituting it into the fitness function, the fitness value of the particle at the position can be calculated. The size of the value represents the degree of the position. The best solution found by the particle is recorded as , which is called the individual extreme point pbest. The best position found by the group is recorded as , which is called the global extreme point gbest. The velocity and position of any particle

*i*in its

*d*-dimensional space (1 ≤

*d*≤

*D*) are updated as follows:where

*w*is the inertia weight coefficient, and are the acceleration factors, is the random number in the range [0,1], is the position of particle I in the

*D-*dimensional space, is the velocity of particle

*i*in the

*D-*dimensional space, , are the individual extreme points, and is the global extreme points.

In order to prevent particles from exceeding the set variable range during flight, the maximum speed of flight is generally set according to the variable range of the problem to be solved. If the current speed is or , take or .

The PSO-ANFIS model uses the global optimization ability of PSO to optimize the design of the network structure parameters of ANFIS. The purpose is to improve the accuracy of the parameter identification of ANFIS, thus improving the accuracy of the runoff forecast model. Its calculation process is as follows:

- (1)
Determine the scale, inertia weight and acceleration factor of PSO, and replace the antecedent parameters of ANFIS with initialized particles.

- (2)
Estimate the post-part parameters from the calculated pre-part parameters, calculate the model structure error at this time, and take the error as the fitness value of PSO.

- (3)
Compare the current fitness value with the best value to determine whether to replace the best value.

- (4)
Update the speed and position of PSO particles. Iteration is repeated until the optimal ANFIS parameter is found. The calculation process of PSO-ANFIS is shown in Figure 2.

### CEEMD-PSO-ANFIS model

The design idea of the model is:

- (1)
By analyzing the frequency characteristics of the original runoff series, the frequency and frequency of adding noise to CEEMD are set.

- (2)
The runoff series is decomposed into several IMF components by CEEMD.

- (3)
The PSO-ANFIS model is used to predict each IMF component separately. According to the characteristics of each IMF component, the number of PSO particle swarm, inertia weight coefficient, acceleration factor, and membership function family number of ANFIS model are set.

- (4)
The simulation prediction results of each IMF component are reconstructed to obtain the simulation sequence results of the original runoff series.

### Model evaluation criteria

*et al.*2021; Wu & Wu 2022), this article uses the average absolute value percentage error (), root mean square error (), average absolute pair error (), and determination coefficient () to evaluate the simulation and prediction results of the model. These parameters are calculated as follows:where is the original runoff value, is the predicted runoff value,

*N*is the number of series, and is the average value of runoff.

## CASE APPLICATION

### Overview of the study area

^{2}, and an average gradient of 6.0‰. The main stream is a mountain stream channel above Shatou, 61 km above Xikou, with a river gradient of 10.46‰, and 48 km from Xikou to Shatou, with a river gradient of 1.12‰. The river below Shatou is affected by the tide of Oujiang River estuary, which is a tidal river section. The data are selected from Shizhu Hydrological Station, which is a national hydrological station. The station was set up by the Zhejiang Provincial Hydrological Bureau on 1 June 1955, with a catchment area of 1,273 km

^{2}, to observe water level, water surface gradient, and precipitation. The data selected this time are the annual runoff data of Shizhu Station from 1957 to 2019, with a total of 63 data. From the figure, we can see that the annual runoff series of Shizhu Station on the Nanxi River is nonstationary.

### CEEMD decomposition

. | Original sequence . | IMF1 . | IMF2 . | IMF3 . | IMF4 . |
---|---|---|---|---|---|

Correlation coefficient | 1.000 | 0.695 | 0.639 | 0.057 | 0.224 |

Information entropy | 0.130 | 0.142 | 0.093 | 0.071 | 0.065 |

. | Original sequence . | IMF1 . | IMF2 . | IMF3 . | IMF4 . |
---|---|---|---|---|---|

Correlation coefficient | 1.000 | 0.695 | 0.639 | 0.057 | 0.224 |

Information entropy | 0.130 | 0.142 | 0.093 | 0.071 | 0.065 |

From Table 1, we can see that the correlation between IMF1, IMF2, and the original sequence is the best, the correlation coefficient between IMF3 and the original sequence is general, and the correlation between IMF4 and the original sequence is weak.

### Simulation and prediction of runoff series

From Figure 6, we can see that the simulation and prediction effects of the trend item IMF4 are very good, which shows that the nonstationary series can be better predicted after being separated from the stationary series. In particular, we can see that the more the number of decomposition layers of the series, the better the simulation and prediction effect. This also verifies that the runoff series can improve the runoff prediction results after decomposition.

From Table 2, through statistics of the percentage error (), root mean square error (), average absolute pair error (), and determination coefficient () of the original runoff series, IMF1–IMF4, we can see that the calculation error is gradually decreasing. Among them, the simulation and prediction effect of IMF4 is the best. IMF1 has a strong volatility and stronger nonstationary. Its decision coefficient in the simulation stage is 0.967, but its decision coefficient in the prediction stage is relatively low, with a value of 0.191.

## DISCUSSION

Sequence . | Stage . | . | . | . | . |
---|---|---|---|---|---|

Original | Simulation | 0.039 | 19.780 | 29.668 | 0.973 |

Forecast | 0.223 | 119.971 | 129.655 | 0.446 | |

IMF1 | Simulation | 1.217 | 10.723 | 22.513 | 0.967 |

Forecast | 4.581 | 114.317 | 123.477 | 0.191 | |

IMF2 | Simulation | 0.255 | 11.372 | 15.886 | 0.983 |

Forecast | 1.011 | 19.825 | 24.823 | 0.888 | |

IMF3 | Simulation | 0.038 | 0.687 | 0.930 | 1.000 |

Forecast | 0.413 | 2.751 | 3.056 | 0.961 | |

IMF4 | Simulation | 0.000 | 0.080 | 0.147 | 1.000 |

Forecast | 0.001 | 0.577 | 0.692 | 0.991 |

Sequence . | Stage . | . | . | . | . |
---|---|---|---|---|---|

Original | Simulation | 0.039 | 19.780 | 29.668 | 0.973 |

Forecast | 0.223 | 119.971 | 129.655 | 0.446 | |

IMF1 | Simulation | 1.217 | 10.723 | 22.513 | 0.967 |

Forecast | 4.581 | 114.317 | 123.477 | 0.191 | |

IMF2 | Simulation | 0.255 | 11.372 | 15.886 | 0.983 |

Forecast | 1.011 | 19.825 | 24.823 | 0.888 | |

IMF3 | Simulation | 0.038 | 0.687 | 0.930 | 1.000 |

Forecast | 0.413 | 2.751 | 3.056 | 0.961 | |

IMF4 | Simulation | 0.000 | 0.080 | 0.147 | 1.000 |

Forecast | 0.001 | 0.577 | 0.692 | 0.991 |

The evaluation results of the simulation and prediction stages of each model are shown in Table 3.

Model . | Stage . | . | . | . | . |
---|---|---|---|---|---|

CEEMD-PSO-ANFIS | Simulation | 0.039 | 19.780 | 29.668 | 0.973 |

Forecast | 0.223 | 119.971 | 129.655 | 0.446 | |

EEMD-ANFIS | Simulation | 0.090 | 41.899 | 57.586 | 0.886 |

Forecast | 0.182 | 96.026 | 111.635 | −0.138 | |

ANFIS | Simulation | 0.149 | 68.715 | 91.970 | 0.609 |

Forecast | 0.399 | 219.062 | 264.449 | 0.013 |

Model . | Stage . | . | . | . | . |
---|---|---|---|---|---|

CEEMD-PSO-ANFIS | Simulation | 0.039 | 19.780 | 29.668 | 0.973 |

Forecast | 0.223 | 119.971 | 129.655 | 0.446 | |

EEMD-ANFIS | Simulation | 0.090 | 41.899 | 57.586 | 0.886 |

Forecast | 0.182 | 96.026 | 111.635 | −0.138 | |

ANFIS | Simulation | 0.149 | 68.715 | 91.970 | 0.609 |

Forecast | 0.399 | 219.062 | 264.449 | 0.013 |

## CONCLUSION

- (1)
The CEEMD model is decomposed by adding positive and negative white noise to reduce the deviation in the process of reconstructing the original sequence. First, the original runoff series is decomposed into multiple intrinsic components, then the PSO-ANFIS model is used to predict each component, and finally, the predicted components are reconstructed into the original runoff series. Through decomposition, the runoff series with strong nonstationarity is changed into more stable multiple intrinsic components, thus increasing the prediction accuracy.

- (2)
The average error of training results of CEEMD-PSO-ANFIS model is 0.039, and the determination coefficient is 0.973. Its model evaluation results are better than the ANFIS model and the EEMD-ANFIS model, which indicates that the model is feasible for annual runoff prediction.

- (3)
However, in the process of calculation, it is found that for IMF1 with strong nonstationary, if you want to obtain better prediction accuracy, you must increase the number of membership function families of the ANFIS model, which will make the calculation speed of the model slower than IMF2-IMF4 to a certain extent. In the process of calculation, the number of PSO population and the number of families of ANFIS membership function mainly depend on experience. In the future, it is necessary to increase the research of relevant aspects to determine the parameters of the model more accurately and scientifically.

## AUTHOR CONTRIBUTION

All authors contributed to the study conception and design. Huifang Guo and Yuan Fang contributed to writing and editing; Shixia Zhang contributed to chart editing; and Lihui Chen contributed to preliminary data collection. All authors read and approved the final manuscript.

## FUNDING

This research was supported by a Provincial cultivation project of the Natural Science Foundation of Zhejiang Province (Grant No. LZJWY22E090005), Zhejiang Tongji Vocational College of Science and Technology (Grant No. FRF21PY001), Science and Technology Project of Zhejiang Water Resources Department in 2021 (Grant No. RC2110), and Science and Technology Project of Zhejiang Water Resources Department in 2020 (Grant No. RC2031).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.