The applicability of the ASCS_LSTM_ATT model for water level prediction in small- and medium-sized basins in China

Water level prediction for small- and medium-sized rivers plays an important role in water resource management and flood control. Such prediction is concentrated in the flood season because flood disasters occur frequently in plain areas. Moreover, floods in mountainous areas rise and fall suddenly, and the slopes are steep. Thus, establishing a high-accuracy hydrological prediction model for small- and medium-sized rivers with different topographic features, that is, plains and mountains, is an urgent problem. A prediction method based on ASCS_LSTM_ATT is proposed to solve this problem. First, the important parameters are optimized by an improved cuckoo search algorithm. Second, different methods are used to determine the forecast factors according to the topographic features. Finally, the model is combined with the self-attention mechanism to extract significant information. Experiments demonstrate that the proposed model effectively improves water level prediction accuracy and parameter optimization efficiency.

Therefore, attention should be paid to the water level simulation of small- and medium-sized rivers with different topographies to improve hydrological forecast accuracy and support flood control decision-making.
At present, two approaches to water level prediction are available: conceptual or physically based models and data-driven models. Conceptual or physically based models use multiple related variables, such as evaporation, infiltration rate, and soil moisture content, to obtain the physical parameters for the prediction task. Data-driven models directly find the relationship between precipitation and water level from the available hydrological data and thus improve the usability of those data (Liu et al. ). Conceptual or physically based models have been widely used in many countries because of their high efficiency. However, they face problems in small- and medium-sized rivers, where hydrological data are lacking. They also have difficulty capturing highly nonlinear relationships, given the large number and wide distribution of small- and medium-sized rivers.
With the development of information technology, neural networks have become a classic family of data-driven methods. They can largely overcome the difficulty of building models when characteristic hydrological parameters are lacking, and they can simulate the temporal and spatial nonlinear changes in hydrological systems (Chen et al. ). For example, neural networks have been used in reservoir operation (Anvari et al. ), water level prediction (Piasecki et al. ), water quality simulation (Chang et al. ), and precipitation forecasting (Akbari Asanjan et al. ). The recurrent neural network (RNN) performs efficiently in time series prediction and has been successfully applied to hydrology research (Bai & Shen ). However, the RNN cannot learn 'long-term dependency' tasks on its own: when errors are back-propagated through many time steps, the gradient tends to vanish or explode, which degrades network performance and reduces prediction accuracy. Therefore, Hochreiter & Schmidhuber () proposed long short-term memory (LSTM) to address long-term dependencies and the vanishing and exploding gradient problems.
LSTM is used extensively in fields such as speech recognition (Graves et al. ), machine translation (Cho et al. ), and stock forecasting (Nelson et al. ). Le et al. () established an LSTM flood forecast model that used daily flow and precipitation as input data and proved effective for runoff forecasting in a river basin in Vietnam. Zhang et al. () used LSTM to predict the daily water level, and the pro- when faced with complex multi-node networks. These conditions can lead to locally optimal solutions during water level prediction. Therefore, an improved parameter optimization algorithm is of considerable importance for the model's water level prediction.
Forecast factors also have an important influence on model performance. For example, the prior judgment method relies heavily on empirical judgment, and the neural network selection method is computationally expensive and inefficient. Such forecast factor selection methods slow down model computation during water level prediction (Zhao & Yang ).
Hence, different forecast factor selection methods are studied for small- and medium-sized rivers with various topographic features.
The attention mechanism enables a neural network to focus on a subset of its inputs (or features), thus assigning different weights to the forecast factors. Rush et al. () applied an attention model to text summarization to extract keywords from long sentences or paragraphs. Bahdanau et al. () applied the attention model to machine translation. However, the information captured by the early attention mechanism is limited; thus, Google proposed the self-attention mechanism (Vaswani et al. ). The self-attention mechanism can effectively learn the internal structure of a sequence and extract different aspects of information from it. Given the limited hydrological data of small- and medium-sized rivers, this mechanism is used to allocate reasonable weights to the forecast factors and extract key features. Furthermore, applications of the self-attention mechanism are still at an early stage, with few studies on hydrological prediction in China, so studying its applicability to small- and medium-sized rivers in China is important. Therefore, this study makes full use of hydrological data and deep learning technology to model the water level process of small- and medium-sized rivers with different topographic features. It proposes an adaptive step size cuckoo search (CS) algorithm (ASCS) and combines it with LSTM and the self-attention mechanism to build a hydrological model, namely ASCS_LSTM_ATT. The contributions of this study are as follows:
1. The self-attention mechanism is combined with LSTM to effectively capture the dependence between time series and extract key hydrological features, addressing the problems of low accuracy and limited hydrological data.
2. Different methods are proposed to determine the forecast factors according to different topographic features to improve the accuracy of water level prediction for small- and medium-sized rivers.
3. An improved CS algorithm is proposed to optimize the parameters of ASCS_LSTM_ATT and improve its global and local search capabilities, addressing premature convergence to local optima and limited optimization ability.

RELATED WORK
LSTM
LSTM uses three gating mechanisms to regulate the flow of information. Different functions are used to compute the hidden layer state, allowing the network to learn the time-dependence relationships in the signal. Figure 1 shows the structure of the 'memory cell'.
The gate control mechanism is the key to realizing LSTM.
Variable i_t is the input gate, which controls the input value; f_t is the forget gate, which controls how much of the cell's historical state is preserved; and o_t is the output gate, which controls the output value.
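For reference, the following is a minimal sketch of one forward step of a standard LSTM cell with a forget gate, written in plain Python. The weight layout (one matrix per gate acting on the concatenation [h_{t-1}, x_t]), the candidate-state name g_t, and the helper names are assumptions for illustration, not the paper's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM forward step; params maps gate name -> (W, b).

    Each W has shape (hidden, hidden + input) and acts on [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    i_t = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    f_t = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    o_t = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    g_t = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate cell state
    # New cell state: keep f_t of the old state, add i_t of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: output gate applied to the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights at zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved; this is a quick way to see the forget gate at work.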

Self-attention mechanism
The self-attention mechanism can quickly locate the key points of the target within massive information and reduce the computational burden of processing high-dimensional input data. Figure 2 shows the structure of the attention mechanism.
In the attention mechanism, the Encoder learns the semantic code C from the input sequence Source, and the Decoder then generates each output Target based on the semantic code C.
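In self-attention, the queries, keys, and values are all projections of the same sequence, so each time step is re-expressed as a weighted mix of every step. A minimal pure-Python sketch of scaled dot-product self-attention (the form used by Vaswani et al.) follows; the projection matrices W_q, W_k, and W_v are hypothetical inputs, since how the paper parameterizes them is not stated here.

```python
import math

def matmul(A, B):
    """Multiply two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X (T x d_in).

    Returns (output, weights); weights[t][s] is how much step t attends to step s.
    """
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[v / math.sqrt(d_k) for v in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights
```

Each row of `weights` is a probability distribution over the time steps, which is what lets the mechanism assign different importance to different forecast factors.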
The self-attention mechanism is a special case of the attention mechanism. In the Encoder-Decoder framework of general tasks, the Source and Target are different; self-attention instead computes attention among the internal elements of the Source or of the Target itself, which captures their internal correlations and improves data validity.  (1) shows.
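According to the step factor in Yang & Deb (), the position of each nest is updated by a Levy flight:

$$x_i^{(h+1)} = x_i^{(h)} + \alpha \oplus L(\beta) \qquad (1)$$

$$\alpha = \alpha_0 \left( x_i^{(h)} - x_{best} \right) \qquad (2)$$

where $x_i^{(h+1)}$ and $x_i^{(h)}$ are the positions of the $i$-th nest ($i = 1, 2, 3, \ldots, n$) in generations $h+1$ and $h$, respectively; $\oplus$ denotes entry-wise multiplication; $\alpha$ is the step control quantity, with $\alpha_0 = 0.01$; $x_{best}$ is the current optimal solution position; and $L(\beta)$ is the Levy random search path, whose step length with respect to time $t$ obeys the Levy distribution:

$$L(\beta) \sim u = t^{-1-\beta}, \quad 0 < \beta \le 2 \qquad (3)$$

where $\beta$ is the heavy-tail index of the distribution.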
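Levy-distributed steps are commonly drawn with Mantegna's algorithm, as in Yang & Deb's reference implementation of CS. The sketch below assumes that algorithm and β = 1.5 (a common default, not stated in this section) and applies the fixed-step CS update x_i + α_0 (x_i - x_best) ⊕ L(β) to one nest; the function names are hypothetical.

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """Draw one Levy-flight step length L(beta) with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)  # numerator ~ N(0, sigma_u^2)
    v = rng.gauss(0.0, 1.0)      # denominator ~ N(0, 1)
    return u / abs(v) ** (1 / beta)

def cs_update(x_i, x_best, alpha0=0.01, beta=1.5, rng=random):
    """Fixed-step CS move: x_i + alpha0 * (x_i - x_best), entry-wise times L(beta)."""
    return [x + alpha0 * (x - xb) * levy_step(beta, rng)
            for x, xb in zip(x_i, x_best)]
```

Because the step is scaled by (x_i - x_best), the current best nest does not move under this update, and nests far from the best take proportionally larger, occasionally very long, jumps.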

Evaluation index
The evaluation indexes are an important means of assessing model prediction quality.  Root mean square error (RMSE) measures the overall error across the entire dataset; because errors are squared before averaging, it penalizes large deviations more heavily.
R-square (R²) judges the goodness of fit of the algorithm, and its value range is [0, 1]. The closer the result is to 1, the better the fit.
Mean absolute error (MAE) directly reflects the average deviation between the predicted and actual values.
Mean squared error (MSE) is the average of the squared deviations between the predicted and actual values; in ANOVA, the analogous quantity is the within-group variation divided by its error degrees of freedom.
Nash-Sutcliffe efficiency (NSE) compares the model's squared error with the variance of the observations: NSE = 1 indicates a perfect fit, and NSE ≤ 0 indicates that the model is no better than predicting the observed mean.
where y_i, ỹ_i, and ȳ represent the real, predicted, and average values, respectively.
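The five indexes can be computed together as sketched below. This section does not state which R² definition is used; the sketch assumes R² is the squared Pearson correlation between observed and predicted values, which matches the stated [0, 1] range (the coefficient-of-determination form would coincide with NSE and can be negative). The function name `evaluate` is hypothetical.

```python
import math

def evaluate(y, y_hat):
    """Return RMSE, MAE, MSE, NSE and R^2 for observed y and predicted y_hat."""
    n = len(y)
    errors = [p - o for o, p in zip(y, y_hat)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    y_bar = sum(y) / n
    ss_tot = sum((o - y_bar) ** 2 for o in y)
    nse = 1.0 - sse / ss_tot  # 1 is perfect; <= 0 is no better than the mean
    # Assumed R^2 definition: squared Pearson correlation of y and y_hat.
    p_bar = sum(y_hat) / n
    cov = sum((o - y_bar) * (p - p_bar) for o, p in zip(y, y_hat))
    var_p = sum((p - p_bar) ** 2 for p in y_hat)
    r2 = cov * cov / (ss_tot * var_p)
    return {"RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse, "NSE": nse, "R2": r2}
```

Under this reading, a prediction that is offset from the observations by a constant keeps R² = 1 while its NSE drops, which is one reason to report both indexes.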

PROPOSED METHODS
As previously mentioned, CS can optimize the parameters of the prediction model, but the algorithm still has room for improvement. The self-attention mechanism can strengthen the correlation between the input and output data, and the selection of forecast factors in different basins is also crucial to prediction accuracy. Therefore, ASCS_LSTM_ATT is proposed to predict the water level of small- and medium-sized rivers.
The proposed model has an advantage in mimicking highly nonlinear, complex systems and in building models without a priori information. Moreover, ASCS_LSTM_ATT learns the relationship between input and output directly from existing hydrological data, rather than, as a physically based model does, using a full set of mathematical equations for each part of the hydrological cycle (i.e., interception, infiltration, and evaporation). Physically based models of small- and medium-sized rivers are difficult to complete because of the lack of hydrological data. ASCS_LSTM_ATT can find hidden hydrological patterns in the existing data of small- and medium-sized rivers, which avoids expensive computation and reduces the data requirements and the number of parameters to be estimated.

ASCS algorithm
CS randomly generates a step factor through the Levy flight.
However, the step factor α_0 is fixed at 0.01. Consequently, the algorithm cannot adjust its step size, which slows down convergence. Therefore, α_0 is made adaptive.
In the middle stage of optimization, α_0 is reduced to improve the local search capability; that is, the α_0 in Equation (2) is replaced with Equation (9).
Accordingly, the improved Levy flight position update is

$$x_i^{(h+1)} = x_i^{(h)} + \alpha_0(h_i)\left(x_i^{(h)} - x_{best}\right) \oplus L(\beta)$$

where α_0(h_i) is the adaptive step factor given by Equation (9), h_i represents the current number of iterations, and h_max represents the total number of iterations.
Equation (1) is used to calculate the positions of the hatched nests, and a random number r is generated and compared with the probability p_a that the host discovers the alien cuckoo eggs. If r > p_a, the new nest location is obtained by a random walk search strategy whose step is ξ times the difference between x_j^(h) and x_i^(h), the position vectors of two randomly selected hatched nests in generation h; ξ is a scaling factor drawn uniformly from [0, 1].
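Putting the Levy flight, the adaptive step factor, and the discovery step together gives the loop below. The exact form of Equation (9) is not given in this section, so the sketch assumes a HYPOTHETICAL linear decay of α_0 from 0.5 to 0.01 over the iterations; the random walk assumes the common form x + ξ(x_j - x_k) with two randomly chosen nests, p_a = 0.25, β = 1.5, and greedy replacement. In the paper's setting, `f` would be the validation RMSE of the model trained with a candidate parameter set; here it is any function to minimize.

```python
import math
import random

def levy_step(beta, rng):
    """Levy-flight step length via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0.0, sigma_u) / abs(rng.gauss(0.0, 1.0)) ** (1 / beta)

def adaptive_alpha0(h, h_max, alpha_max=0.5, alpha_min=0.01):
    """HYPOTHETICAL stand-in for Equation (9): linear decay from alpha_max to alpha_min."""
    return alpha_max - (alpha_max - alpha_min) * h / h_max

def ascs(f, bounds, n=15, h_max=200, pa=0.25, beta=1.5, seed=0):
    """Minimize f over box bounds [(lo, hi), ...]; returns (best_x, best_f)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def try_replace(i, cand):
        # Greedy selection: keep the candidate only if it improves nest i.
        fc = f(cand)
        if fc < fit[i]:
            nests[i], fit[i] = cand, fc

    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fit = [f(x) for x in nests]
    b = min(range(n), key=fit.__getitem__)
    best, best_f = nests[b][:], fit[b]
    for h in range(1, h_max + 1):
        a0 = adaptive_alpha0(h, h_max)
        # Levy flight with the adaptive step factor: x + a0 * (x - x_best) * L(beta).
        for i in range(n):
            try_replace(i, clip([x + a0 * (x - xb) * levy_step(beta, rng)
                                 for x, xb in zip(nests[i], best)]))
        # Discovery: components with r > pa take a random-walk step xi * (x_j - x_k).
        for i in range(n):
            j, k = rng.sample(range(n), 2)
            xi = rng.random()  # uniform scaling factor in [0, 1)
            try_replace(i, clip([x + xi * (xj - xk) if rng.random() > pa else x
                                 for x, xj, xk in zip(nests[i], nests[j], nests[k])]))
        b = min(range(n), key=fit.__getitem__)
        if fit[b] < best_f:
            best, best_f = nests[b][:], fit[b]
    return best, best_f
```

Large early steps favour global exploration, and the shrinking α_0 then concentrates the Levy moves around the current best, which is the behaviour the adaptive step factor is meant to provide.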
The steps of the ASCS algorithm are as follows:
1. Initialize parameters, including the number of bird nests n,
2. Factors that can be obtained before the flood occurs include precipitation and water level information a few hours before the current time.
Thus, the correlation between the input and output data of the model is calculated by the following equation:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (12)$$

where n represents the total number of time series, x_i represents the i-th series, y_i represents the value to be predicted corresponding to the i-th series, x̄ represents the average value of all series, and ȳ represents the average value of the values to be predicted corresponding to all series.
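In practice, Equation (12) is evaluated for a series of time lags: the water level at the forecast time t + lead is correlated with a gauge's precipitation at t - k for each candidate k, and the resulting coefficients drive the Tcorr selection described next. A minimal sketch, assuming hourly series aligned on the same time index; `lagged_correlations`, `lead`, and `max_lag` are hypothetical names.

```python
import math

def pearson(x, y):
    """Equation (12): Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = math.sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlations(rain, level, lead=1, max_lag=6):
    """Correlation of level(t + lead) with rain(t - k) for k = 0..max_lag.

    Returns {k: r}; both series share the same (hourly) time index.
    """
    out = {}
    for k in range(max_lag + 1):
        ts = range(k, len(level) - lead)  # t values where both ends exist
        out[k] = pearson([rain[t - k] for t in ts], [level[t + lead] for t in ts])
    return out
```

For example, if the water level simply repeats the rainfall three hours later, the coefficient for a 1 h forecast peaks at k = 2.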

In plain areas
The small- and medium-sized rivers in the plain area have gentle slopes, and the terrain in the basin is mainly plain and hilly with low average elevation. The plain polder area is also low lying, with few water storage projects and weak flood detention capacity. The river terrain is flat, and the confluence speed is slow. In general, the farther a rain gauge is from the outlet station of the basin, the longer the confluence time. Accordingly, the precipitation of rain gauges at different geographical locations has different impacts on the target water level. A forecast factor selection method is proposed according to the precipitation and geographical characteristics of small- and medium-sized rivers in plain areas: the correlation between the water level of the target station in the specified forecast periods and the precipitation of the rain gauges in the basin before the forecast period is calculated using Equation (12), and the distance between each rain gauge and the target station is calculated from their latitudes and longitudes.
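The text does not say which formula converts latitude and longitude into distance. A common choice is the haversine great-circle distance, sketched below under that assumption with a mean Earth radius of 6371 km; at basin scale the difference from other geodesic formulas is negligible for ranking gauges by distance. `haversine_km` is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```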
The rain gauges are arranged in ascending order of distance, and their correlation coefficients are also arranged in ascending order. A variable called Tcorr, which refers to the time corresponding to a correlation coefficient, is introduced. The steps to determine Tcorr are as follows.
1. Tcorr is determined first for the rain gauges closest to the target station.
2. Rain gauges whose distances from the target station differ by no more than an allowable range may share the same Tcorr.
3. For each rain gauge, Tcorr is the time with the largest correlation coefficient. If that Tcorr has already been selected by other rain gauges, the time with the largest remaining correlation coefficient is selected instead.
All values from Tcorr to the current time T are selected as the forecast factors of the model. Figure 4 shows the flow chart of determining forecast factors for small- and medium-sized rivers in plain areas.
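The three plain-area rules admit more than one reading. The sketch below encodes one of them: closer gauges choose first, each gauge takes its best-correlated lag, and a lag already held by another gauge can be shared only if the distance difference is within a tolerance. The tolerance value `tol_km`, the data layout, and both function names are hypothetical.

```python
def select_tcorr_plain(gauges, tol_km=2.0):
    """Assign a lag Tcorr to each rain gauge (one reading of the plain-area rules).

    gauges: list of (name, distance_km, {lag: correlation}).
    Returns {name: lag}; a gauge with no admissible lag is left out.
    """
    chosen = {}   # name -> lag
    holders = {}  # lag -> distances of the gauges already holding it
    for name, dist, corr in sorted(gauges, key=lambda g: g[1]):  # rule 1: nearest first
        for lag in sorted(corr, key=corr.get, reverse=True):      # rule 3: best lag first
            taken_by = holders.get(lag, [])
            # Rule 2: share a lag only with gauges at a similar distance.
            if all(abs(dist - d) <= tol_km for d in taken_by):
                chosen[name] = lag
                holders.setdefault(lag, []).append(dist)
                break
    return chosen

def forecast_factor_lags(tcorr):
    """All lags from Tcorr down to the current time T (lag 0) for each gauge."""
    return {name: list(range(lag, -1, -1)) for name, lag in tcorr.items()}
```

With gauges A (3 km) and B (4 km) both peaking at lag 2 and a distant gauge C (20 km) also peaking there, A and B share lag 2 and C falls back to its next-best lag.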

In mountainous areas
The small- and medium-sized rivers in mountainous areas experience floods with extremely fast onset, large volume, and violent fluctuation. Moreover, the basin terrain is mainly mountainous, with high average elevation, large relative height differences, steep and narrow riverbeds, large slopes, complex confluence paths, and short runoff generation and concentration processes. Hence, the precipitation of rain gauges at different geographical locations has only a slight impact on the target water level station.
A forecast factor selection method is proposed according to the precipitation and topographic characteristics of small- and medium-sized rivers in mountainous areas. The correlation between the water level of the target station in the specified forecast periods and the precipitation of the rain gauges in the basin before the forecast period is calculated using Equation (12). The steps to determine Tcorr are as follows.
1. Arrange the correlation coefficients of different rain gauges in descending order.
2. Select the time corresponding to the largest correlation coefficient as Tcorr.
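Read literally, the mountain-area rule ranks every gauge's coefficients together and keeps the time of the single largest one, since gauge location matters little in steep basins. A minimal sketch under that reading; `select_tcorr_mountain` and the dictionary layout are hypothetical.

```python
def select_tcorr_mountain(corr_by_gauge):
    """Mountain-area rule: Tcorr is the lag of the single largest coefficient.

    corr_by_gauge: {gauge: {lag: correlation}}. Returns (Tcorr, gauge, r).
    """
    # Step 1: arrange all coefficients in descending order.
    ranked = sorted(((r, g, lag) for g, corr in corr_by_gauge.items()
                     for lag, r in corr.items()), reverse=True)
    # Step 2: the time of the largest coefficient becomes Tcorr.
    r, g, lag = ranked[0]
    return lag, g, r
```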

ASCS_LSTM_ATT prediction model
The input layer of the ASCS_LSTM_ATT water level prediction model takes the water level of the hydrological station and the precipitation of the rain gauges as independent variables and performs data filling, normalization, and training set division. Data feature extraction and parameter transfer are performed in the hidden layer. In the output layer, the water level data are inverse-normalized, and the water level prediction results are output.
The self-attention mechanism is used to assign different weights to the forecast factors to strengthen the correlation between the input and output data. The key features extracted by the self-attention mechanism are regularized and then used as the input of the first LSTM unit; the regularization structure helps avoid overfitting. Then, the key implicit features of the
The steps of establishing the ASCS_LSTM_ATT model are as follows.
1. Process hydrological time series sample data.
2. Use the appropriate forecast factor method to select the appropriate forecast factors.
3. Assign different weights to the input data through the self-attention mechanism.
4. Set the model's structure and initialize parameters, which mainly include hidden_size, lr, and the parameters in ASCS.
5. Train the ASCS_LSTM_ATT and use the ASCS to optimize the parameters.
6. If the stopping criterion is met, use the optimal solution to establish ASCS_LSTM_ATT and output the predicted water level.
Otherwise, return to step (5) and continue optimizing the parameters.
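Steps 4 to 6 leave open how a continuous nest position becomes a pair of hyperparameters. A minimal sketch, assuming each nest lives in [0, 1]^2 and is mapped linearly onto the hidden_size range [40, 150] and the lr range [0.001, 0.01] used in the optimization experiments, with hidden_size rounded to an integer; `decode` is a hypothetical helper. The objective handed to ASCS would then be the validation RMSE of a model trained with `decode(position)`.

```python
HIDDEN_RANGE = (40, 150)   # hidden_size search range
LR_RANGE = (0.001, 0.01)   # learning-rate search range

def decode(position):
    """Map a continuous nest position in [0, 1]^2 to (hidden_size, lr)."""
    # Clip first, so Levy jumps outside the box still give valid settings.
    u_h = min(max(position[0], 0.0), 1.0)
    u_lr = min(max(position[1], 0.0), 1.0)
    hidden_size = round(HIDDEN_RANGE[0] + u_h * (HIDDEN_RANGE[1] - HIDDEN_RANGE[0]))
    lr = LR_RANGE[0] + u_lr * (LR_RANGE[1] - LR_RANGE[0])
    return hidden_size, lr
```

Searching in a normalized box keeps the step factor meaningful for both parameters, even though hidden_size and lr differ by several orders of magnitude.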

EXPERIMENTAL RESULTS AND ANALYSIS
Data description and preprocessing

Data description
The Qinhuai basin is mainly located in the middle and lower are test sets.

Data preprocessing
The following preprocessing steps are applied to reduce the negative effects of missing values, noise, and random values in the original hydrological data on prediction accuracy.

Data Interpolation
Interpolating the precipitation is necessary because the precipitation information in the two basins is incomplete. The methods commonly used in hydrology include mean and linear interpolations.

Mean interpolation
Mean interpolation computes the mean of the responding units of each target variable and fills every missing item with that mean:

$$\bar{y} = \frac{1}{n} \sum_{i} a_i y_i$$

where a_i is an indicator variable (a_i = 0 means no answer, and a_i = 1 means an answer is provided), y_i is the value of the i-th unit, and n is the total number of units with answers.
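A minimal sketch of mean interpolation, assuming missing values are marked as `None` (so a_i = 1 exactly when the entry is not `None`); `mean_impute` is a hypothetical name.

```python
def mean_impute(values):
    """Fill None entries with the mean of the observed entries (a_i = 1)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)  # n = number of observed units
    return [mean if v is None else v for v in values]
```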

Linear interpolation
Linear interpolation estimates an unknown value between two known values by connecting the two known points with a straight line. Given the coordinates (x_0, y_0) and (x_1, y_1), the value y at a point x in the interval [x_0, x_1] is

$$y = y_0 + \frac{(y_1 - y_0)(x - x_0)}{x_1 - x_0}$$
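Applied to a time series with gaps, the formula above fills each run of missing entries from its nearest observed neighbours. A minimal sketch, again assuming missing values are `None`; leading and trailing gaps are left unfilled because they lack a neighbour on one side. `linear_impute` is a hypothetical name.

```python
def linear_impute(values):
    """Fill interior None gaps by straight lines between the nearest observed points."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for x0, x1 in zip(known, known[1:]):
        y0, y1 = values[x0], values[x1]
        for x in range(x0 + 1, x1):  # only indices strictly between x0 and x1
            out[x] = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return out
```

To compare methods as in Table 1, one can hide known values, fill them with each method, and compute the RMSE at the hidden positions.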
Precipitation data from Tunxi with a sample size of 50 and a missing rate of 20% were used; the missing values were placed at positions 4, 10, 23, 24, 27, 32, 36, 38, 40, and 43. Table 1 shows the comparison of the two interpolation methods. Table 1 demonstrates that the RMSE of linear interpolation is smaller than that of mean interpolation, which indicates that the completed values are more accurate. Therefore, linear interpolation is adopted to fill in the missing precipitation values.

Min-Max
In sequence X, X_max and X_min are the maximum and minimum values, respectively. Each element X_i is transformed by the following equation, so that every element of the resulting sequence X' lies within [0, 1]:

$$X'_i = \frac{X_i - X_{min}}{X_{max} - X_{min}}$$
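A minimal sketch of Min-Max scaling and the inverse transform applied at the output layer; the function names are hypothetical. Taking X_min and X_max from the training data only is an assumed design choice that keeps test-set information out of training, at the cost that test values may fall slightly outside [0, 1].

```python
def minmax_fit(series):
    """Return (x_min, x_max) from the (training) series."""
    return min(series), max(series)

def minmax_scale(series, x_min, x_max):
    """X'_i = (X_i - X_min) / (X_max - X_min)."""
    span = x_max - x_min
    return [(x - x_min) / span for x in series]

def minmax_inverse(scaled, x_min, x_max):
    """Reverse normalization: X_i = X'_i * (X_max - X_min) + X_min."""
    return [s * (x_max - x_min) + x_min for s in scaled]
```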

Comparison of LSTM_ATT and LSTM
The self-attention mechanism assigns different weights to the input to improve the prediction accuracy. Hence, the LSTM is combined with the self-attention mechanism to predict the water level.  Table 3 shows the evaluation indexes under different models. The forecast period is 1 h.  LSTM_ATT. This result demonstrates that the combination of LSTM and self-attention mechanism can improve the prediction accuracy to a certain extent.
The two above-mentioned situations show that combining LSTM with the self-attention mechanism can improve the accuracy of water level prediction to a certain extent: compared with LSTM, LSTM_ATT reduces the prediction error and the influence of noisy input data.
and RMSE is chosen as the evaluation criterion. The lr range is generally set as [0.001, 0.01], and the hidden_size range as [40, 150]. Table 4 shows the details of PSO, GA, CS, and ASCS.

Comparison of ASCS and other optimization algorithms
Figure 11 shows that ASCS performs better than the other algorithms in optimizing the

Qinhuai basin
In Qinhuai, Dongshan station is selected as the target forecast station. The early-stage water level of Dongshan and the early-stage precipitation of the rain gauges in the basin are selected as the forecast factors. The forecast period is set from 1 to 6 h, and the forecast period of 1 h (t + 1) is taken as an example. Table 5 shows the correlation coefficients and distances for Dongshan station.
In Table 5, Dongshan z(t + 1) represents the water level of Dongshan station at the future time t + 1, and Dongshan q(t − 1) indicates the precipitation in the first
The flood lasted for a long time because the Qinhuai basin is located in a plain area. The terrain is relatively flat, and the precipitation of rain gauges at different geographical locations has varying effects on the water level of the target station. Thus, the forecast factors should be selected by combining the correlation between the target station's water level and the rain gauges' precipitation with the distance of each gauge from the target station.

Tunxi basin
In Tunxi, Tunxi station is selected as the target forecast station. The early-stage water level of Tunxi station and the early-stage precipitation of the rain gauges in the basin are selected as the forecast factors. The forecast period is set from 1 to 6 h, and the forecast period of 1 h (t + 1) is taken as an example. Table 7 shows the correlation coefficients and distances for Tunxi station.
From Table 8  The target hydrological station, including water level and precipitation.
Bold signifies the Tcorr finally selected by different rain-gauges according to the selection principle of the forecast factors in the 'Forecast factor selection method' section.

Comparison of different models
ASCS_LSTM_ATT is compared with BP, CNN, SVM, and WaveNet. The five models are assessed using the evaluation indexes and the fitted curves. Table 9 shows the parameters of ASCS_LSTM_ATT in different basins.
PSO is used to tune the hyperparameters of SVM, with RBF selected as the kernel function. Two hyperparameters need to be tuned in SVM: the penalty coefficient C and the kernel parameter gamma. The parameter values (ranges) are shown in Table 10. The parameters of PSO are given in the 'Comparison of ASCS and other optimization algorithms' section.
Bayesian optimization is used to tune the hyperparameters of WaveNet, with ReLU as the activation function and Adam as the optimizer. Four hyperparameters need to be tuned in WaveNet: the learning rate (lr), the number of hidden layers (l), the dropout value (d), and the number of nodes in the fully connected layers (fn). The parameter values (ranges) are listed in Table 11.
GA is used to tune the hyperparameters of BP, with ReLU as the activation function and Adam as the optimizer. Three hyperparameters need to be tuned in BP: the learning rate (lr), the number of hidden layers (l), and the number of nodes in the hidden layers (n). The parameter values (ranges) are followed in Table  are followed in Table 13.
What can be clearly seen in Table 14 is that the NSE of LSTM is higher than that of the four other models, indicating that LSTM is relatively stable in the water level prediction of these two catchments and that its prediction accuracy is relatively high. The NSE of LSTM generally ranges from 0.82 to 0.97 and decreases gradually as the forecast period increases from 1 to 6 h, which means that the simulation accuracy of the LSTM water level forecast declines gradually. Furthermore, the RMSE of LSTM falls within a narrow range between 0.012 and 0.042 relative to the four other models, which implies that the water levels simulated by LSTM are close to the observed values and the error is small.
These results suggest that the LSTM has higher accuracy, a lower error of water level prediction, and a better simulation effect of water level prediction than the other four models.
Although SVM has been widely used in hydrological forecasting, and its RMSE for water level prediction is similar to that of LSTM in the two catchments, LSTM tends to perform better than SVM on NSE, which shows that the water level simulation effect and prediction accuracy of SVM in the two catchments are weaker than those of LSTM.
uncertainty in the parameters of the CNN and WaveNet models, which may also be part of the reason for the aforementioned uncertainty.
The water level curves in Figure 12 show that SVM overestimated the maximum level even though it could dynamically and accurately predict the change in water level. In comparison, ASCS_LSTM_ATT performs better than SVM.
The SVM prediction curve shows a certain degree of jitter as the prediction time increases, and its error is large. During frequent precipitation, the SVM prediction curve gradually deviates from the trend of the real value curve, which reduces the prediction accuracy.
By contrast, the jitter of ASCS_LSTM_ATT is small, and its overall trend fits the real value curve. Figure 13 shows that SVM performance gradually worsens as the forecast time increases. When the prediction period is 4 h, SVM performs poorly in predicting the troughs and peaks of the water level, and the
The water level prediction methods in other references are compared to further evaluate the performance of the proposed model, with R² as the evaluation standard.
The forecast period of 1 h is taken as an example, as shown in Table 15. Table 15 demonstrates that the R² of the ASCS_LSTM_ATT model is larger than that of all other models, which indicates the high prediction accuracy of the model.
Therefore, the correlation coefficient between the real and predicted water levels decreased as the prediction period increased, and the model performance also gradually decreased.
In the comparative analysis of the above-mentioned models, the performance is as follows: ASCS_LSTM_ATT >
Although some achievements have been made, many problems remain to be solved. One problem is that the forecast factors include only water level and precipitation information, whereas the model input should also include soil moisture, temperature, meteorological, and hydrological information. Applying the model to predict other hydrological variables, such as streamflow and precipitation, is also a focus of future research.