## Abstract

Accurate water level prediction is of great importance for water infrastructures such as dams and embankments, and for agriculture. However, the water level has nonlinear characteristics that make accurate prediction very challenging. This study proposes a combined model (VMD–GA–ELMAN–VMD–ARIMA) based on variational mode decomposition (VMD), a genetic algorithm (GA), the ELMAN neural network, and the autoregressive integrated moving average (ARIMA) model. Firstly, VMD preprocesses the original water level series, and each subsequence is predicted with the GA–ELMAN model. Then the error sequence is decomposed by VMD and predicted by the ARIMA model. Finally, the predicted water level is corrected with the predicted error. Using three groups of data from different sites, 10 comparison models are established to assess the performance of the proposed model. The results show that the combination of the VMD algorithm and the GA–ELMAN model improves the prediction performance on the datasets. In addition, the VMD double processing greatly improves the prediction accuracy.

## HIGHLIGHTS

The variational mode decomposition (VMD) double processing is used for water level prediction.

A genetic algorithm is used to optimize the hyperparameters of the ELMAN neural network.

The error correction method is used to improve the prediction accuracy.

### Graphical Abstract

## INTRODUCTION

### Background

Hydrological system modeling has become common in the last few decades due to its overwhelming importance for understanding the earth system. Habeeb & Talib (2021) combined the Geographic Information System (GIS) with remote sensing, Internet of Things, and Web technologies to manage and monitor water quality. Ekwueme & Agunwamba (2021) used the Mann–Kendall technique to effectively analyze the air temperature and rainfall trends in regional basins. Nazarnia *et al.* (2020) proposed that sea level rise (SLR) has an impact on the world's coastal areas and coastal infrastructures such as water, transportation, and energy supply systems. Moreover, the water level is an important hydrological variable that can reflect the reserve capacity of the hydrological system and determine the load capacity and self-regulation limit of the hydrological system. The water level has a great impact on the ecological environment and on human life for navigation, flood control, and agricultural irrigation.

The prediction models of the water level are mainly divided into two categories: the process-driven model and the data-driven model (Abrahart *et al.* 2012; Blöschl *et al.* 2019). The process-driven prediction model simulates the change process of a river's water level on the basis of hydrology and establishes a mathematical model of water level prediction, but this method requires a large amount of water level data. In contrast, the data-driven prediction model needs less data and can simulate the nonlinear and non-stationary characteristics of hydrological processes with minimal observational data. There is no need to consider the hydrological background of water level change; it only depends on the historical water level data. Prediction can be achieved by learning the relationship between historical data and future water level data (Li *et al*. 2020). The predictive performance of the data-driven model is better than that of the process-driven model (Kalteh 2016).

Data-driven models in hydrological systems mainly include statistical methods, such as the autoregressive moving average (ARMA) and multiple linear regression (MLR), as well as artificial intelligence methods, such as the ELMAN neural network (ENN) (Lei & Wang 2019), the artificial neural network (ANN) (Rao 2000), the classification and regression tree (CART) (Yang *et al.* 2016), gene expression programming (GEP) (Kiafar *et al.* 2016), genetic programming (GP) (Khu *et al.* 2010), the extreme learning machine (ELM) (Shiri *et al.* 2016), the gated recurrent unit (GRU), the convolutional neural network (CNN) (Pan *et al.* 2020), the support vector machine (SVM) (Behzad *et al.* 2009), the adaptive neuro-fuzzy inference system (ANFIS) (Sun & Trevor 2018), the stacking algorithm (Tyralis *et al.* 2019), the boosting algorithm (Li *et al.* 2016), the random forest (RF) (Fathian *et al.* 2019), etc. Fu *et al.* (2020) selected the Kelantan River in the northeast of the Malaysian Peninsula to test the impact of the size of the training set, the time interval between the training set and the test set, and the time span of the predicted data on the performance of the developed long short-term memory (LSTM) model. The experimental results show that the model can handle both the stable streamflow data in the dry season and the rapidly fluctuating streamflow data in the rainy season. In the last few years, the ENN has been applied in many fields. Brunelli *et al.* (2007) used the ENN to predict the daily maximum concentrations of pollutants such as sulfur dioxide (SO_{2}), ozone (O_{3}), inhalable particles (PM_{10}), nitrogen dioxide (NO_{2}), and carbon monoxide (CO) in Palermo, and the predicted values were in agreement with the actual values. Wu *et al.* (2011) used an improved ENN to predict high PM_{10} air pollution index (API) events caused by sandstorm activities with better accuracy than the standard ELMAN model.
Recently, Wang *et al.* (2021) used an ENN to predict stock problems. They also showed the reliability of the ENN model prediction.

However, a traditional single method is not enough to meet the demand for accuracy. For example, Mosavi *et al.* (2018) proposed that hybridization, data decomposition, algorithm ensembles, and model optimization are the most effective strategies for improving machine learning (ML) methods, which can help select the appropriate ML method based on prediction needs in hydrology and climate. Ebtehaj *et al.* (2021) established two hybrid models, the Generalized Structure Group Method of Data Handling (GS-GMDH) and ANFIS with Fuzzy C-Means (ANFIS-FCM), based on the data of two water level stations on the Perak River, Malaysia, which have good accuracy in river water level prediction. A hybrid model generally combines a prediction model with a preprocessing method or with an optimization algorithm that tunes the parameters of the neural network. Preprocessing decomposes the original data, eliminates anomalies, and denoises the series, which makes the sequence more stationary and reduces its complexity. Fotovatikhah *et al.* (2018) believe that the hybrid method is the best choice for flood management through computational intelligence (CI), which has the potential to improve the accuracy and lead time of flood and debris-flow prediction, and that wavelet-based methods have strong integration ability. In recent years, the combination of the wavelet transform and artificial intelligence techniques has been successfully applied in hydrology: the wavelet transform (Khan *et al.* 2020), singular spectrum analysis (SSA) (Wu & Chau 2013; Wang *et al.* 2020), and other methods have been widely used in hydrological preprocessing. Altunkaynak & Kartal (2019) used joint discrete wavelet transform-fuzzy (DWT-fuzzy) and joint continuous wavelet transform-fuzzy (CWT-fuzzy) models; the prediction performance of the CWT-fuzzy model was better than that of the DWT-fuzzy and single fuzzy models.
Seo *et al.* (2015) established two hybrid models, ANNs based on wavelets (WANNs) and adaptive neuro-fuzzy inference systems based on wavelets (WANFIS), to predict daily water levels. Although these techniques can improve the prediction accuracy to a large extent, Du *et al.* (2017) pointed out that hybrid models constructed with the SSA and DWT have a problem in practical use: because the SSA and DWT are computed from 'future' values, the subseries generated by SSA reconstruction or DWT decomposition contain information about 'future' values. Thus, such mixed models show spuriously 'high' prediction performance, which leads to large errors in practice.

To avoid such potential problems, empirical mode decomposition (EMD) (Zhao *et al.* 2017), ensemble EMD (EEMD) (Wang *et al.* 2015), complete ensemble EMD with adaptive noise (CEEMDAN) (Wen *et al.* 2019), improved complete ensemble EMD with adaptive noise (ICEEMDAN) (Wang *et al.* 2019), and variational mode decomposition (VMD) (Niu *et al.* 2020) are also used for hydrological preprocessing. Xi *et al.* (2017) used the EMD-Elman model to predict a monthly runoff; the experimental results showed that the combined model had higher prediction accuracy than the Elman model and could be suitable for complex hydrological sequences. Wen *et al.* (2019) used a data-driven method to design a two-phase hybrid model (CVEE-ELM), adopted CEEMDAN combined with VMD, and used the ELM algorithm to predict the multiscale runoff problem; the CVEE-ELM model had notable advantages, especially in the extensive analysis of predicted and observed datasets. Niu *et al.* (2020) used a hybrid model of VMD and an ELM, with the gravitational search algorithm (GSA) finding the optimal hyperparameters of the ELM, to predict the annual runoff of a reservoir. The experimental results showed that the proposed model outperformed the autoregressive integrated moving average (ARIMA) and ELM models on all indicators, indicating that the proposed model is an effective tool for runoff prediction.

Although data preprocessing can reduce the error to a large extent, researchers found that adding an optimization algorithm can further improve the prediction accuracy. Wang *et al.* (2020) denoised the monthly runoff data of the Zhengyi Gorge of the Heihe River through the SSA, and the grey wolf optimization (GWO) algorithm was used to jointly optimize the penalty factor *c* and the kernel function parameter of the support vector regression (SVR) model, which enhanced the generalizability. The results showed that the proposed model had higher prediction accuracy than the persistence model (PM), ARIMA, cross-validation SVR (CV-SVR), and GWO-SVR models, especially for tracking and forecasting the peak runoff during the flood season. Cong & Meesad (2013) proposed a firefly algorithm (FA), particle swarm optimization (PSO), and a genetic algorithm (GA) to optimize type-1 and type-2 TSK fuzzy logic systems to predict the hourly sea level of the Nha Trang Sea in Vietnam. Taormina & Chau (2015) used the LUBE method to test the applicability of prediction-interval (PI) estimates under different confidence levels (CLs) for river flow in the first 6 h of the Susquehanna and Inner Harlem rivers in the United States; the results show that the neural networks trained by MOFIPS are superior to those developed by single-objective swarm optimization. Kisi *et al.* (2015) used a multi-step-ahead prediction model based on the SVM and an FA (SVM–FA) to predict the daily water level of Lake Urmia in northwestern Iran, and it was proven that this model was superior to the GP and ANN models. Yao *et al.* (2018) combined the GA to optimize the weights and thresholds of the ENN to predict the historical water level of the Yongding River monitoring site; the experimental results showed that the GA-optimized Elman network was more effective and more accurate than the single Elman and back propagation (BP) networks.
This study aims to design the VMD double processing and optimize Elman parameters with a GA. Therefore, the proposed method may be helpful for applications involving multiple domains such as runoffs, precipitation, and stocks.

### Contribution of the paper

Water level series are complex and nonlinear time series influenced by environmental factors. In the past, most water level studies were based on the original water level series. The prediction of a single model cannot accurately describe the fluctuation of the data, which leads to low prediction accuracy. To overcome these problems and improve the water level prediction performance, this paper proposes a new hybrid model strategy based on the VMD double processing, GA–ELMAN, and ARIMA prediction. The specific contributions are as follows:

- (1)
Based on the VMD double processing strategy, the original water level data and error sequence are decomposed, and the non-stationary water level is decomposed into multiple band-limited intrinsic mode functions, which reduces the complexity and non-stationarity of the data.

- (2)
The learning rate and the momentum coefficient of ELMAN are optimized by the GA, and the optimal values are found by searching on the validation set. This strategy can improve the prediction accuracy of the model.

- (3)
The error correction model is used to improve the overall prediction accuracy of the model, preventing the error sequence from degrading the prediction accuracy.

- (4)
The accuracy and effectiveness of the proposed VMD–GA–ELMAN–VMD–ARIMA hybrid model in water level prediction are verified by using the water level datasets of three different stations and 10 comparison models.

### The structure of the paper

The rest of this paper is organized as follows: Section 2 introduces the basic theory, method, and framework of the hybrid model in detail. Section 3 describes the simulation experiments used to validate the model. Using three sets of hydrological data, the VMD–GA–ELMAN–VMD–ARIMA model is compared with the other 10 models to verify the prediction results and the performance of the model. Section 4 discusses the advantages of the prediction model. Section 5 summarizes the model.

## METHOD

### The framework of the hybrid model

The frame diagram of the hybrid model is shown in Figure 1. The key parts of the model are the GA optimization of the ELMAN parameters, the VMD double processing, and the error correction method, which together improve the prediction accuracy.

The framework is divided into the following steps: (1) VMD is used to decompose the original water level series into several subseries. (2) ELMAN is used to predict each subsequence; the GA is first used to optimize the hyperparameters (the learning rate and the momentum coefficient) of ELMAN to avoid the shortcomings of manual parameter tuning, and the prediction sequence and error sequence are then obtained. (3) The error sequence is decomposed by VMD a second time and predicted by the ARIMA model, yielding the predicted error sequence. (4) The prediction sequence from the second step is corrected with the predicted error sequence to obtain the final predicted water level.
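The four steps above can be sketched in a few lines. The following is an illustrative skeleton only: `naive_decompose` is a hypothetical stand-in for VMD (moving-average components that sum exactly back to the series), and `persist` stands in for the GA–ELMAN and ARIMA predictors, so only the decompose–predict–correct structure is faithful to the framework.

```python
import numpy as np

def naive_decompose(x, widths=(31, 11)):
    """Stand-in for VMD: split x into smoothed components plus a residual
    so that the components sum exactly back to x."""
    comps, residual = [], x.astype(float)
    for w in widths:
        trend = np.convolve(residual, np.ones(w) / w, mode="same")
        comps.append(trend)
        residual = residual - trend
    comps.append(residual)
    return comps

def persist(c):
    """Stand-in one-step predictor: predict each value by the previous one."""
    return np.concatenate([[c[0]], c[:-1]])

def hybrid_forecast(x):
    comps = naive_decompose(x)                    # step (1): decompose the series
    pred = sum(persist(c) for c in comps)         # step (2): predict each subseries
    err = x - pred                                # prediction error sequence
    err_pred = sum(persist(c) for c in naive_decompose(err))  # step (3)
    return pred, pred + err_pred                  # step (4): corrected forecast
```

Even with these crude stand-ins, the corrected forecast beats the uncorrected one on a smooth series, which is the point of the error-correction step.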

This paper mainly introduces the VMD, ICEEMDAN, ELMAN, LSTM, and GRU methods applied in the hybrid model. Since the MLP has a simple structure and does not perform well in this experiment, and the ARIMA model is already well known, they are not introduced here.

### Variational mode decomposition

VMD is an adaptive and quasi-orthogonal signal decomposition method newly developed by Dragomiretskiy and Zosso (Dragomiretskiy & Zosso 2014). VMD decomposes the signal *x*(*t*) into *k* variational modes (Liu *et al.* 2018a).

The center frequency and bandwidth of each mode $u_k$ are estimated using Gaussian smoothness and the squared-gradient criterion (Hu *et al.* 2021). Therefore, the constrained variational problem can be defined as follows:

$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}\quad \text{s.t.}\quad \sum_{k=1}^{K}u_k(t)=x(t)$$

where $u_k$ and $\omega_k$ are the $k$th decomposed mode and its center frequency, $K$ is the total number of modes to be decomposed, $\partial_t$ is the partial derivative of the function with respect to $t$, $\delta(t)$ is the Dirac distribution function, and $*$ denotes convolution. A quadratic penalty term $\alpha$ and a Lagrange multiplier $\lambda(t)$ can be introduced to turn the constrained problem into the augmented Lagrange equation:

$$L\left(\{u_k\},\{\omega_k\},\lambda\right)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|x(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,x(t)-\sum_{k=1}^{K}u_k(t)\right\rangle$$

where $\alpha$ represents the balance parameter of the data-fidelity constraint. The equation is solved by the alternating direction method of multipliers (ADMM) (Hestenes 1969). The saddle point of the Lagrange function is obtained by iteratively updating $\hat{u}_k^{n+1}$, $\omega_k^{n+1}$, and $\hat{\lambda}^{n+1}$ from the ADMM (Bai *et al.* 2021):

- (1)
Initialize each mode component and center frequency: set the initial values of $\hat{u}_k^{1}$, $\omega_k^{1}$, $\hat{\lambda}^{1}$, and $n$ to 0, and set the number of modes $K$ to a positive integer.

- (2)
Update the modes and center frequencies in the frequency domain:

$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{x}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}^{n}(\omega)/2}{1+2\alpha\left(\omega-\omega_k^{n}\right)^2},\qquad \omega_k^{n+1}=\frac{\int_0^{\infty}\omega\,\left|\hat{u}_k^{n+1}(\omega)\right|^2\,d\omega}{\int_0^{\infty}\left|\hat{u}_k^{n+1}(\omega)\right|^2\,d\omega}$$

- (3)
Update the Lagrange multiplier, $\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\left(\hat{x}(\omega)-\sum_{k}\hat{u}_k^{n+1}(\omega)\right)$, and repeat steps (2)–(3) until the convergence condition $\sum_{k}\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\|_2^2/\|\hat{u}_k^{n}\|_2^2<\varepsilon$ is satisfied. The $K$ modal components are then obtained, and the variational mode decomposition is completed.
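As a concrete illustration of the ADMM updates above, the following is a minimal, self-contained VMD sketch in NumPy (not the authors' implementation): the mode update is the Wiener-filter expression, the center frequency is the power-weighted mean frequency of the mode, and the multiplier follows the dual-ascent step. The mirroring, initialization, and stopping details are simplified assumptions.

```python
import numpy as np

def vmd(x, K=2, alpha=2000.0, tau=0.0, tol=1e-7, n_iter=200):
    """Minimal VMD sketch: ADMM updates carried out in the Fourier domain."""
    T = len(x)
    # Mirror the signal at both ends to reduce boundary effects.
    f = np.concatenate([x[::-1], x, x[::-1]])
    N = len(f)
    freqs = np.fft.fftfreq(N)                      # normalized frequencies
    f_hat = np.fft.fft(f)
    f_hat_plus = np.where(freqs >= 0, f_hat, 0)    # one-sided (analytic) spectrum
    u_hat = np.zeros((K, N), dtype=complex)        # mode spectra
    omega = np.linspace(0, 0.5, K, endpoint=False) + 0.25 / K  # initial centers
    lam = np.zeros(N, dtype=complex)               # Lagrange multiplier
    pos = freqs >= 0
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter-like mode update around the current center frequency
            u_hat[k] = (f_hat_plus - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            p = np.abs(u_hat[k, pos]) ** 2
            # Center frequency: power-weighted mean frequency of the mode
            omega[k] = np.sum(freqs[pos] * p) / (np.sum(p) + 1e-14)
        lam = lam + tau * (f_hat_plus - u_hat.sum(axis=0))   # dual ascent
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-14)
        if change < tol:
            break
    # Back to the time domain (factor 2: one-sided spectrum); drop the mirrors.
    u = 2 * np.real(np.fft.ifft(u_hat, axis=1))[:, T:2 * T]
    order = np.argsort(omega)
    return u[order], omega[order]
```

On a two-tone test signal, the recovered center frequencies land near the true tones and the modes sum back to the input, which is the behavior the constrained problem above demands.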

### Improved complete ensemble EMD with adaptive noise

In the last few years, Colominas *et al.* proposed a new technique called the improved complete ensemble EMD with adaptive noise (ICEEMDAN). ICEEMDAN is a decomposition technique based on EMD. Compared with EMD, it alleviates the frequency-aliasing and pseudo-mode problems encountered in previous studies: adding white noise gives the frequencies between adjacent scales continuity and weakens the influence of frequency mixing. The main steps of ICEEMDAN are as follows (Colominas *et al.* 2014):

- (1)
Construct the noisy realizations and determine the first residue by EMD as follows:

$$x^{(i)}=x+\beta_0 E_1\!\left(w^{(i)}\right),\qquad r_1=\left\langle M\!\left(x^{(i)}\right)\right\rangle$$

where $i$, $x$, $E_k(\cdot)$, $w^{(i)}$, and $\mathrm{std}(\cdot)$ are the index of the added white noise, the original signal, the operator extracting the $k$th EMD mode, the white noise, and the standard deviation calculator, respectively; $M(\cdot)$ is the local mean operator and $\langle\cdot\rangle$ is the average over realizations. $E_1(w^{(i)})$ represents the first EMD component of the white noise, $\beta_0=\varepsilon_0\,\mathrm{std}(x)/\mathrm{std}\!\left(E_1(w^{(i)})\right)$, and $\varepsilon_0$ is the reciprocal of the desired signal-to-noise ratio between the first added noise and the analyzed signal.

- (2)
Compute the first mode: $d_1 = x - r_1$.

- (3)
Compute the second residue and the second mode: $r_2=\left\langle M\!\left(r_1+\beta_1 E_2(w^{(i)})\right)\right\rangle$, $d_2 = r_1 - r_2$.

- (4)
For $k = 3,\ldots,K$, compute the $k$th residue: $r_k=\left\langle M\!\left(r_{k-1}+\beta_{k-1} E_k(w^{(i)})\right)\right\rangle$.

- (5)
Compute the $k$th mode: $d_k = r_{k-1} - r_k$.

- (6)
Repeat step (4) for the next $k$ stages.

### ELMAN neural network

The ELMAN neural network is a typical dynamic recurrent neural network whose context layer feeds the previous hidden state back into the hidden layer (Liu *et al.* 2018b), and it can be used to solve fast optimization problems. This paper selects an optimization algorithm (the GA) to find the best learning rate and momentum coefficient of ELMAN. The calculation is as follows:

$$h_t=\sigma_h\left(W x_t+W_c h_{t-1}+a_h\right)$$

$$y_t=\sigma_y\left(V h_t+a_y\right)$$

where $x_t$ represents the input vector, $h_t$ represents the hidden layer vector, $y_t$ represents the output vector, $W$ and $V$ represent the weight matrices ($W_c$ being the context-layer weight matrix), $a_h$ and $a_y$ represent the deviation vectors, and $\sigma_h$ and $\sigma_y$ represent the activation functions. The structure of ELMAN is shown in Figure 2.
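A minimal forward pass of an Elman network with the layer sizes used later in this study (8 input, 15 hidden, and 1 output node) can be sketched as follows; the weights here are illustrative random values, not trained parameters.

```python
import numpy as np

def elman_step(x_t, h_prev, W, U, V, b_h, b_y):
    """One Elman step: the context layer feeds the previous hidden
    state h_prev back into the hidden layer."""
    h_t = np.tanh(W @ x_t + U @ h_prev + b_h)   # hidden layer with context feedback
    y_t = V @ h_t + b_y                         # linear output layer
    return h_t, y_t

def elman_forward(xs, W, U, V, b_h, b_y):
    """Run the network over a sequence of input vectors."""
    h = np.zeros(W.shape[0])
    ys = []
    for x_t in xs:
        h, y = elman_step(x_t, h, W, U, V, b_h, b_y)
        ys.append(y)
    return np.array(ys)
```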

### Long short-term memory network

LSTM is a variant of the recurrent neural network (RNN) that can solve the vanishing or exploding gradient problem often encountered by traditional RNNs during training (Hochreiter & Schmidhuber 1997). LSTM consists of three gates: the input gate, the output gate, and the forgetting gate. The 'gate' of a long short-term memory network is a special network structure whose input is a vector and whose output range is 0–1. When the output value is 0, no information is allowed to pass; when the output value is 1, all information is allowed to pass through (Duan *et al.* 2021).

The calculation is as follows (Ma *et al.* 2017):

$$i_t=\sigma\left(W_i x_t+U_i k_{t-1}+b_i\right)$$

$$f_t=\sigma\left(W_f x_t+U_f k_{t-1}+b_f\right)$$

$$o_t=\sigma\left(W_o x_t+U_o k_{t-1}+b_o\right)$$

$$\tilde{c}_t=\tanh\left(W_c x_t+U_c k_{t-1}+b_c\right)$$

$$c_t=f_t\odot c_{t-1}+i_t\odot \tilde{c}_t$$

$$k_t=o_t\odot\tanh\left(c_t\right)$$

where $i_t$, $f_t$, and $o_t$ are the input gate, the forgetting gate, and the output gate, respectively, and $\sigma$ is the sigmoid function; it can be seen from the equations that $i_t$, $f_t$, and $o_t$ are determined by the sigmoid function. $\odot$ represents the element-wise multiplication between vectors, $\tanh$ is the hyperbolic tangent function, $c_t$ is the memory cell at the current moment, the $b$ terms represent the deviation vectors, and $k_t$ is the hidden state. All weight matrices are updated through the error back propagation (BP) algorithm from the difference between the output value and the actual value (Yan *et al.* 2018). The structure of the LSTM is shown in Figure 3.

### Gated recurrent unit

The GRU was proposed by Cho *et al.* in 2014 to solve the long-term dependence problem of recurrent neural networks (Cho *et al.* 2014). The GRU neural network evolved from the LSTM and contains two gates, a reset gate and an update gate, replacing the forgetting gate, input gate, and output gate of the LSTM neural network; the GRU is therefore easier to calculate and implement than the LSTM. The update gate controls how much previous information is passed to the current layer, and the reset gate decides how much information to forget. The principle of GRU prediction is to use the gate units to control the historical and current information for prediction at the current step (Xu *et al.* 2020). The LSTM and GRU neural networks carry similar information in the hidden layer, but there is no separate memory cell in the GRU neural network, so the sample-training efficiency is higher. The calculation equations are as follows:

$$r_t=\sigma\left(W_r x_t+U_r k_{t-1}\right)$$

$$u_t=\sigma\left(W_u x_t+U_u k_{t-1}\right)$$

$$y_t=\tanh\left(W_y x_t+U_y\left(r_t\odot k_{t-1}\right)\right)$$

$$k_t=u_t\odot k_{t-1}+\left(1-u_t\right)\odot y_t$$

where $k_{t-1}$ is the hidden layer information of the previous moment and $k_t$ is the hidden layer information of the current moment; $r_t$ and $u_t$ are the reset gate and the update gate of the GRU, respectively. The candidate hidden layer $y_t$ measures how much hidden layer information is retained from the previous time through $r_t$; the amount of the candidate hidden layer $y_t$ to be added is controlled by $u_t$, and finally the current output $k_t$ is obtained. The structure of the GRU is shown in Figure 4.

In the figure, '1' means subtracting each element of the vector from 1.
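Under the symbols used above (reset gate $r_t$, update gate $u_t$, candidate $y_t$, hidden state $k_t$), one GRU step can be sketched as follows; bias terms are omitted for brevity, and the convex-combination form of $k_t$ is one common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, k_prev, P):
    """One GRU step; P holds the input (W*) and recurrent (U*) weight matrices."""
    r_t = sigmoid(P["Wr"] @ x_t + P["Ur"] @ k_prev)        # reset gate
    u_t = sigmoid(P["Wu"] @ x_t + P["Uu"] @ k_prev)        # update gate
    y_t = np.tanh(P["Wy"] @ x_t + P["Uy"] @ (r_t * k_prev))  # candidate state
    k_t = u_t * k_prev + (1.0 - u_t) * y_t                 # blend old state and candidate
    return k_t
```

Because $k_t$ is an element-wise convex combination of the previous state and a tanh candidate, the hidden state stays bounded in (−1, 1).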

### Genetic algorithms

The GA was proposed by John Holland *et al.* in the late 1960s. The GA is a metaheuristic optimization algorithm supporting global search, which is used to solve complex optimization and high-dimensional problems with or without constraints (Prado *et al.* 2020). In this paper, the GA is used to optimize the learning rate and momentum coefficient of the ELMAN model, with the optimization interval set to [0, 1]. The code string formed by these two optimized parameters undergoes a simulated biological evolution process, generating the next generation through selection, crossover, mutation, and other operations; the fitness of the individuals in the population is continuously improved until certain termination conditions are met (Liu *et al.* 2014, 2015). The optimization steps are shown in Figure 1. The ratio of the training set, the validation set, and the test set is 8:1:1, and the data of the validation set are used to find the optimal parameters. The steps of the optimization algorithm are as follows:

- (1)
*Initialize the population*: The network learning rate and momentum coefficient are initialized and real-number coded. The population size is 40 and the number of generations is set to 10.

- (2)
*Calculate the fitness value*: The fitness of each chromosome is taken as the reciprocal of the validation RMSE:

$$f(j)=\frac{1}{E(j)},\qquad E(j)=\sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(n_i-\hat{n}_i\right)^2}$$

where *f*(*j*) is the fitness value of chromosome *j*, $E(j)$ is the RMSE between the actual output $n_i$ and the predicted output $\hat{n}_i$ under the learning rate and momentum coefficient determined by chromosome *j*, and *L* is the number of input samples of the training set in the network.

- (3)
*Perform a genetic operation*: Calculate the fitness value of each chromosome; if the optimal individual satisfies the termination condition, complete the operation. Otherwise, return to step (2) until the most satisfactory individual is found.
- (4)
*Obtain the learning rate and momentum coefficient of the ENN*: After GA optimization, the learning rate and momentum coefficient with the smallest ELMAN model error are obtained. In the output layer, the actual output $n_i$ is compared with the predicted output $\hat{n}_i$, and the RMSE of the predicted and actual values is calculated as $\mathrm{RMSE}=\sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(n_i-\hat{n}_i\right)^2}$. The evaluation criterion of ELMAN is that the smaller the RMSE, the better; thus, when the fitness function *f*(*j*) is at its maximum, the learning rate and momentum coefficient of ELMAN are the optimal values.
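The optimization loop above can be sketched as follows. The objective here is a hypothetical stand-in for "train ELMAN with a candidate (learning rate, momentum coefficient) pair and return the validation RMSE"; in the real procedure, each fitness evaluation would train the network.

```python
import numpy as np

rng = np.random.default_rng(42)

def surrogate_rmse(params):
    """Hypothetical stand-in for the validation RMSE of an ELMAN model
    trained with (learning rate, momentum); minimum at (0.3, 0.7)."""
    lr, mom = params
    return (lr - 0.3) ** 2 + (mom - 0.7) ** 2 + 0.05

def ga_optimize(obj, pop_size=40, generations=10, p_mut=0.2):
    pop = rng.random((pop_size, 2))                 # real-coded chromosomes in [0, 1]
    for _ in range(generations):
        fit = np.array([1.0 / obj(ind) for ind in pop])   # fitness f(j) = 1 / RMSE
        elite = pop[np.argmax(fit)].copy()                # keep the best (elitism)
        probs = fit / fit.sum()                           # roulette-wheel selection
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        a = rng.random((pop_size, 1))                     # arithmetic crossover
        children = a * parents + (1 - a) * parents[::-1]
        mask = rng.random(children.shape) < p_mut         # uniform mutation
        children[mask] = rng.random(mask.sum())
        children[0] = elite                               # elitism: reinsert the best
        pop = np.clip(children, 0.0, 1.0)
    fit = np.array([1.0 / obj(ind) for ind in pop])
    return pop[np.argmax(fit)]
```

With elitism, the best objective value never worsens across generations, so the returned pair should land close to the surrogate's optimum.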

## EXPERIMENTAL RESULTS AND ANALYSIS

### Data description

The data are the river water levels at points 4, 6, and 7 of the intensive runoff observation in the middle reaches of the Heihe River. The observation points are located in Zhangye City, Gansu Province (National Qinghai-Tibet Plateau Scientific Data Center, tpdc.ac.cn). The riverbed is gravel, and the river widths are 58, 50, and 130 m, respectively. The sampling interval of the data is 30 min. As shown in Figure 5, each of the three groups of water level data is divided into a training set, a validation set, and a test set. The data statistics are shown in Table 1, where *T*, *T_a*, *T_b*, and *T_c* are the total number of data points and the sizes of the training, validation, and test sets, respectively. The first 80% of the data are used as the training set; the next 10% are used as the validation set for the GA optimization of the learning rate and the momentum coefficient; and the remaining 10% are used as the test set. The time, place, length, and complexity of the three datasets are different; if the proposed model achieves the best prediction on all three datasets, the design of the model can be considered successful.
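The chronological 8:1:1 split described above can be sketched as follows (the exact counts reported in Table 1 were fixed by the authors and may differ slightly from a strict 80/10/10 rounding):

```python
import numpy as np

def split_811(x):
    """Split a series chronologically into training (80%), validation (10%),
    and test (10%) sets, preserving temporal order."""
    n = len(x)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]
```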

| Data | *T*/period | *T_a* | *T_b* | *T_c* | Maximum (cm/30 min) | Minimum (cm/30 min) | Mean (cm/30 min) | Std |
|---|---|---|---|---|---|---|---|---|
| Data 1 | 1,024 | 813 | 102 | 101 | 99 | 51 | 73.97 | 14.01 |
| Data 2 | 1,072 | 851 | 107 | 106 | 240.531 | 74.89 | 136.27 | 35.41 |
| Data 3 | 1,648 | 1,312 | 164 | 164 | 76.992 | 20.34 | 41.13 | 16.86 |


### Evaluation metrics

The four metrics are defined as follows:

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad \mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}$$

where *n* is the total number of sampling points, $y_i$ and $\hat{y}_i$ are the actual value and the predicted value, respectively, and $\bar{y}$ is the mean of the actual values.

MAE, MAPE, $R^2$, and RMSE are usually used to analyze errors in a time series. MAE is the average absolute error between the predicted value and the actual value, which reflects the actual error magnitude; MAPE is an unbiased index that reflects the relative predictability of the model by dividing the absolute error by its corresponding actual value; $R^2$ reflects the degree of fit between the actual value and the predicted value; and RMSE reflects the average error of the predicted value compared with the actual value. The smaller the values of MAE, MAPE, and RMSE, the better the prediction accuracy of the model; additionally, the closer the value of $R^2$ is to 1, the more accurate the prediction.
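The four metrics can be computed directly from their definitions:

```python
import numpy as np

def evaluate(y, y_hat):
    """Compute MAE, MAPE, R^2, and RMSE between actual and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mae = np.mean(np.abs(err))                      # mean absolute error
    mape = np.mean(np.abs(err / y))                 # mean absolute percentage error
    rmse = np.sqrt(np.mean(err ** 2))               # root mean square error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)  # determination coefficient
    return {"MAE": mae, "MAPE": mape, "R2": r2, "RMSE": rmse}
```

For example, `evaluate([1, 2, 3, 4], [1, 2, 3, 5])` gives MAE = 0.25 and RMSE = 0.5.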

Except for the error correction model and the first decomposition prediction model, all single-model and optimized-parameter model predictions were run 10 times, and the final prediction value was the average of the 10 runs.

Table 2 shows some basic parameter information of the ELMAN, LSTM, and GRU models.

| Model | Parameter | Value |
|---|---|---|
| ELMAN | Number of input layer nodes | 8 |
| | Number of hidden layer nodes | 15 |
| | Number of output layer nodes | 1 |
| | Epochs of training | 2,000 |
| | Learning rate | 0–1 |
| | Momentum coefficient | 0–1 |
| | Layer delays | 1:2 |
| LSTM | Number of input layer nodes | 6 |
| | Number of hidden layer 1 nodes | 64 |
| | Number of hidden layer 2 nodes | 16 |
| | Number of output layer nodes | 1 |
| | Epochs of training | 200 |
| | Batch_size | 100 |
| GRU | Number of input layer nodes | 6 |
| | Number of hidden layer 1 nodes | 128 |
| | Number of hidden layer 2 nodes | 16 |
| | Number of output layer nodes | 1 |
| | Epochs of training | 200 |
| | Batch_size | 100 |


In the programming of this experiment, ELMAN is implemented in MATLAB 2019, and the LSTM and GRU single models are implemented in the Python environment.

### Comparison of the experimental results

- (1)
Forecast results of data 1

Data 1 has the fewest data points, and the prediction results are shown in Table 3. Figure 6 shows a histogram of the four prediction performance indicators and a comparison of the prediction results of each model.

| Data | Model | MAE | MAPE | R² | RMSE |
|---|---|---|---|---|---|
| Data 1 | LSTM | 0.88215 | 0.01263 | 0.99663 | 1.14522 |
| | GRU | 0.71005 | 0.01013 | 0.99741 | 0.91914 |
| | MLP | 0.90700 | 0.01345 | 0.99647 | 1.13766 |
| | ELMAN | 0.74666 | 0.01106 | 0.99712 | 0.93369 |
| | VMD–ELMAN | 0.47322 | 0.00715 | 0.99955 | 0.54591 |
| | ICEEMDAN–ELMAN | 0.58661 | 0.00982 | 0.99953 | 0.72816 |
| | VMD–GRU | 0.33959 | 0.00491 | 0.99971 | 0.40567 |
| | GA–ELMAN | 0.73590 | 0.01350 | 0.91050 | 0.87150 |
| | VMD–GA–ELMAN | 0.27725 | 0.00501 | 0.98450 | 0.36807 |
| | VMD–GRU–VMD–ARIMA | 0.14793 | 0.00212 | 0.99987 | 0.20134 |
| | VMD–GA–ELMAN–VMD–ARIMA | 0.01578 | 0.00029 | 0.99993 | 0.01930 |


| Data | Model | MAE | MAPE | R² | RMSE |
|---|---|---|---|---|---|
| Data 2 | LSTM | 1.57745 | 0.01151 | 0.95183 | 3.25710 |
| | GRU | 0.84443 | 0.00617 | 0.98327 | 1.90292 |
| | MLP | 1.37157 | 0.00989 | 0.96413 | 2.68813 |
| | ELMAN | 0.80781 | 0.00594 | 0.98539 | 1.77024 |
| | VMD–ELMAN | 0.45906 | 0.00331 | 0.99968 | 0.50775 |
| | ICEEMDAN–ELMAN | 0.47917 | 0.00340 | 0.99929 | 0.60761 |
| | VMD–GRU | 0.42670 | 0.00330 | 0.99958 | 0.52523 |
| | GA–ELMAN | 0.66633 | 0.00490 | 0.99541 | 1.17100 |
| | VMD–GA–ELMAN | 0.24775 | 0.00175 | 0.99974 | 0.32971 |
| | VMD–GRU–VMD–ARIMA | 0.18195 | 0.00136 | 0.99971 | 0.26206 |
| | VMD–GA–ELMAN–VMD–ARIMA | 0.10710 | 0.00076 | 0.99904 | 0.14040 |


The experimental results show that among the single models, the prediction result of GRU is the best, the prediction result of ELMAN is second only to GRU, and the prediction effect of MLP is the worst. After data decomposition with the VMD and ICEEMDAN models, the ELMAN model's prediction is superior to that of the single model, and the VMD-based model is also superior to the ICEEMDAN-based one. After optimizing the parameters of ELMAN with the GA, the MAE and RMSE values are better than those of ELMAN, improving by 0.01076 and 0.06219, while the MAPE and R² values are worse by 0.00244 and 0.08662, respectively. The prediction accuracy of the VMD–GA–ELMAN model is higher than that of the VMD–ELMAN and GA–ELMAN models. Although the VMD–GRU model is superior to the VMD–ELMAN model, the VMD–GA–ELMAN–VMD–ARIMA model has the best prediction effect. Compared with the single-model ELMAN prediction, its MAE, MAPE, R², and RMSE values improve by 0.73088, 0.01077, 0.00281, and 0.91439, respectively.

- (2)
Forecast results of data 2

The prediction results of data 2 are shown in Table 4. Figure 7 shows the histograms of the four prediction performance indicators and a comparison of the prediction results of each model. Among the four single models, ELMAN has the best prediction effect, followed by GRU, and the VMD–ELMAN model is again better than the ICEEMDAN–ELMAN model. Comparing the prediction results of the ELMAN, GA–ELMAN, and VMD–GA–ELMAN models shows that both data preprocessing and GA parameter optimization improve the prediction accuracy of ELMAN. The VMD–GA–ELMAN–VMD–ARIMA model outperforms the VMD–GRU–VMD–ARIMA model on three of the four indexes, and the error correction further reduces the error.

- (3)
Forecast results of data 3

Data 3 has the largest number of data points, and the prediction results are shown in Table 5. Figure 8 shows the histograms of the four prediction performance indicators and a comparison of the prediction results of each model. Among the single models, ELMAN performs best, followed by GRU. The VMD–ELMAN model is still better than the ICEEMDAN–ELMAN model in MAE, MAPE, and RMSE, but the opposite holds for R², with a difference of 0.00004. The effect of optimization is similar to that on data 2: the GA–ELMAN and VMD–GA–ELMAN models still achieve higher prediction accuracy than ELMAN. The VMD–GA–ELMAN–VMD–ARIMA model is worse than the VMD–GRU–VMD–ARIMA model but still maintains a good prediction effect, far better than that of ELMAN.

- (4)
Scatter plot of experimental data

| Data | Model | MAE | MAPE | R² | RMSE |
|---|---|---|---|---|---|
| Data 3 | LSTM | 0.60941 | 0.01688 | 0.99132 | 0.88727 |
| | GRU | 0.51465 | 0.01439 | 0.99438 | 0.70720 |
| | MLP | 0.61394 | 0.01692 | 0.99088 | 0.87695 |
| | ELMAN | 0.49631 | 0.01386 | 0.99449 | 0.69381 |
| | VMD–ELMAN | 0.20425 | 0.00592 | 0.99946 | 0.25184 |
| | ICEEMDAN–ELMAN | 0.22961 | 0.00642 | 0.99950 | 0.27769 |
| | VMD–GRU | 0.17334 | 0.00503 | 0.99949 | 0.20777 |
| | GA–ELMAN | 0.47408 | 0.01324 | 0.99502 | 0.65681 |
| | VMD–GA–ELMAN | 0.17776 | 0.00390 | 0.99938 | 0.23451 |
| | VMD–GRU–VMD–ARIMA | 0.07790 | 0.00213 | 0.99988 | 0.10026 |
| | VMD–GA–ELMAN–VMD–ARIMA | 0.14167 | 0.00284 | 0.99957 | 0.19911 |


| Data | Indexes | VMD–ELMAN (%) | ICEEMDAN–ELMAN (%) | GA–ELMAN (%) | VMD–GA–ELMAN (%) | VMD–GA–ELMAN–VMD–ARIMA (%) |
|---|---|---|---|---|---|---|
| Data 1 | MAE | 36.6218 | 21.4355 | 1.4411 | 62.8680 | 97.8866 |
| | MAPE | 35.3526 | 11.2116 | −22.0615 | 54.7016 | 97.3779 |
| | R² | 0.2437 | 0.2417 | −8.6870 | −1.2656 | 0.28418 |
| | RMSE | 41.5320 | 22.0127 | 6.6607 | 60.5790 | 97.9329 |
| Data 2 | MAE | 43.1723 | 40.6828 | 17.5140 | 69.3307 | 86.7419 |
| | MAPE | 44.2761 | 42.7609 | 17.5084 | 70.5387 | 87.2054 |
| | R² | 1.4502 | 1.4106 | 1.0169 | 1.4563 | 1.3852 |
| | RMSE | 71.3174 | 65.6764 | 33.8508 | 81.3748 | 92.0689 |
| Data 3 | MAE | 58.8463 | 53.7366 | 4.4791 | 64.1837 | 71.4553 |
| | MAPE | 57.2872 | 53.6797 | 4.4733 | 71.8615 | 79.5094 |
| | R² | 0.5000 | 0.5038 | 0.5040 | 0.4917 | 0.5108 |
| | RMSE | 63.7019 | 59.9761 | 5.3329 | 66.1997 | 71.3019 |


Figure 9 shows the scatter plots of the prediction results for the three groups of data. The scatter points of the VMD–GA–ELMAN–VMD–ARIMA model are the closest to the regression line and are relatively uniformly distributed, which indicates that the prediction model is reasonable.

## DISCUSSIONS

### Feasibility of parameter optimization

The learning rate controls the speed at which the parameters of the neural network are updated during training; a small learning rate maintains a steady convergence speed and keeps training stable (Yi 2015), whereas a learning rate that is too large makes the training unstable. An appropriate momentum coefficient allows the network weights to be updated quickly and prevents the network from falling into a local minimum (Masood *et al.* 2016; Narayanan *et al.* 2016; Wang *et al.* 2016). In this experiment, GA is used to find optimal values of the learning rate and the momentum coefficient, which avoids both problems.

The ELMAN parameters are optimized both when the original data are not preprocessed and for each subsequence predicted after VMD decomposition, so that the effect of parameter optimization can be compared and the rationality of the proposed optimization method illustrated. The experimental results show that the learning rate and the momentum coefficient of ELMAN optimized by GA give better predictions than the unoptimized values.
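As an illustration of this tuning step, a minimal GA of the kind described above can be sketched as follows. The search ranges, population settings, and the toy bowl-shaped fitness surface are assumptions made for the example; in the actual experiment the fitness would be the validation error of an ELMAN network trained with the candidate (learning rate, momentum) pair.

```python
import random

# Hypothetical search ranges for the two ELMAN hyperparameters tuned by the GA
# (the actual ranges used in the experiment are not reported here).
LR_RANGE = (0.001, 0.5)   # learning rate
MOM_RANGE = (0.1, 0.9)    # momentum coefficient

def fitness(lr, mom):
    # Stand-in objective: in the real model this would be the validation error
    # of an ELMAN network trained with (lr, mom); here a toy quadratic bowl.
    return (lr - 0.05) ** 2 + (mom - 0.7) ** 2

def ga_optimize(pop_size=20, generations=50, mut_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Initial population: random (learning rate, momentum) pairs.
    pop = [(rng.uniform(*LR_RANGE), rng.uniform(*MOM_RANGE))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind))   # lower error = fitter
        parents = pop[: pop_size // 2]            # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()                      # arithmetic crossover
            child = [w * a[i] + (1.0 - w) * b[i] for i in range(2)]
            if rng.random() < mut_rate:           # Gaussian mutation
                child[0] += rng.gauss(0.0, 0.02)
                child[1] += rng.gauss(0.0, 0.05)
            # Clip the offspring back into the admissible ranges.
            child[0] = min(max(child[0], LR_RANGE[0]), LR_RANGE[1])
            child[1] = min(max(child[1], MOM_RANGE[0]), MOM_RANGE[1])
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda ind: fitness(*ind))

best_lr, best_mom = ga_optimize()
```

Because the top half of each generation is carried over unchanged, the best candidate never degrades, and crossover between good parents contracts the population toward the optimum.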

### Model prediction performance improvement rate

Through the above experimental comparison, it is concluded that the prediction performance of the VMD–GA–ELMAN–VMD–ARIMA model is the best among all the compared models. Therefore, the prediction results of the basic ELMAN model are used directly as the baseline: the MAE, MAPE, R², and RMSE values of the ELMAN model and of each comparison model are used to compute the improvement rates reported above.
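The improvement rates tabulated above appear to be relative changes against the ELMAN baseline; the following sketch reproduces the data 2 ELMAN vs. VMD–ELMAN entries. The formula is inferred from the reported numbers rather than stated explicitly in the text.

```python
def improvement(elman, model, higher_is_better=False):
    # Improvement rate of a comparison model over the ELMAN baseline.
    # For the error indexes (MAE, MAPE, RMSE) a decrease is an improvement;
    # for R^2 an increase is an improvement.
    if higher_is_better:
        return (model - elman) / elman * 100.0
    return (elman - model) / elman * 100.0

# Spot check against the data 2 ELMAN vs. VMD–ELMAN table entries:
print(improvement(0.80781, 0.45906))                          # MAE,  ≈ 43.1723
print(improvement(0.00594, 0.00331))                          # MAPE, ≈ 44.2761
print(improvement(0.98539, 0.99968, higher_is_better=True))   # R²,   ≈ 1.4502
print(improvement(1.77024, 0.50775))                          # RMSE, ≈ 71.3174
```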

## CONCLUSIONS

The accurate prediction of the water level is very important for dams, embankments, agriculture, and navigation. At the same time, it can reduce the adverse impact of water level fluctuations to some extent and improve the utilization of water resources. This paper proposes a combination model of VMD double processing, a genetic optimization algorithm, and ELMAN prediction, using the VMD–ARIMA model to correct the error. In the proposed VMD–GA–ELMAN–VMD–ARIMA model, VMD is used to decompose the original water level series, and the GA–ELMAN model is used to predict each subseries, yielding a prediction series and an error series. Then, the VMD–ARIMA model is used to predict the error sequence. Finally, the final prediction result is obtained by adding the predicted error sequence to the prediction sequence. Three groups of water level data are used to verify the model, and the performance improvement rate is compared with that of the other five models. The following conclusions are drawn:

- (1)
The prediction performance of the combined models is better than that of the single models. Data preprocessing by VMD can reduce the prediction error caused by noise fluctuation in the water level sequence and thus effectively improve the prediction accuracy. On all three groups of experimental data, the prediction accuracy obtained with VMD decomposition is higher than that obtained with the ICEEMDAN model, which indicates that the subsequences produced by VMD decomposition are more stable.

- (2)
The proposed GA optimizes the learning rate and the momentum coefficient of ELMAN, which makes the prediction more accurate than that of the single ELMAN model. On the first set of data, the MAE and RMSE values obtained after GA optimization are lower than those of ELMAN, although the MAPE and R² indexes are slightly worse. In the second and third groups of experiments, the GA–ELMAN model is more stable than ELMAN, and the VMD–GA–ELMAN model is more stable than VMD–ELMAN, which shows that GA parameter optimization can also improve the prediction accuracy.

- (3)
The VMD–ARIMA model is used to correct the error sequences predicted by the VMD–GA–ELMAN and VMD–GRU models. Three groups of experimental data show that the error correction model can effectively improve the prediction performance; the VMD–GA–ELMAN–VMD–ARIMA model achieves the highest prediction accuracy on two of the three datasets and remains competitive on the third, while the VMD double processing considerably improves the prediction accuracy, which verifies that the proposed model is reasonable and feasible.

Although the prediction performance of the model is satisfactory, other factors affecting the water level, such as precipitation and flow, are not considered. In future studies, we will consider other factors that affect water levels.
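The four steps summarized above can be sketched structurally as follows. The smoothing-based "decomposition" and the one-step persistence "forecasts" below are illustrative placeholders only, not the actual VMD, GA–ELMAN, or ARIMA components, and the noisy sine wave stands in for a water level series.

```python
import numpy as np

def decompose(series, widths=(11, 5)):
    # Placeholder for VMD: peel off smoothed "modes"; the residual carries the
    # rest, so the modes sum exactly back to the input series.
    modes, residual = [], series.astype(float)
    for w in widths:
        mode = np.convolve(residual, np.ones(w) / w, mode="same")
        modes.append(mode)
        residual = residual - mode
    modes.append(residual)
    return modes

def predict_subseries(mode):
    # Placeholder for the GA–ELMAN (or ARIMA) forecaster: one-step persistence.
    return np.concatenate(([mode[0]], mode[:-1]))

# A noisy periodic signal stands in for the water level series.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.01 * rng.normal(size=400)

# Steps 1-2: decompose the series and predict each subsequence.
forecast = sum(predict_subseries(m) for m in decompose(series))

# Step 3: decompose the resulting error series and predict it as well.
error = series - forecast
error_forecast = sum(predict_subseries(m) for m in decompose(error))

# Step 4: error correction - add the predicted error to the first forecast.
corrected = forecast + error_forecast
```

On this toy signal the corrected forecast has a lower RMSE than the uncorrected one, mirroring the role of the error-correction step in the proposed model.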

## AUTHOR CONTRIBUTIONS STATEMENT

W.-Y.X. conceptualized the whole article, developed the methodology, wrote the original draft, developed the software, and conducted the investigation. Y.-L.B. conceptualized the whole article, supervised the work, acquired funding, and reviewed and edited the article. L.-D. conducted the investigation and developed the software. Q.-H.Y. validated the article and conducted data curation. W.-S. visualized the article. All authors read and agreed to the published version of the manuscript.

## ACKNOWLEDGEMENTS

This research was funded by the NSFC (National Natural Science Foundation of China) projects (grant nos. 41861047, 62066041, and 41461078). We thank the reviewers, whose constructive comments helped significantly to improve this work.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST STATEMENT

The authors declare there is no conflict.