## Abstract

Because river flow sequences are non-stationary and random, extreme river flow is often predicted with insufficient accuracy. In this study, the sparse factor of the loss function in a sparse autoencoder was enhanced using the inverse method of simulated annealing (ESA), and the river flow of the Kenswat Station in the Manas River Basin in northern Xinjiang, China, at 9:00, 15:00, and 20:00 daily during June, July, and August in 1998–2000 was considered as the study sequence. For initial values of the sparse factor *β*_{0} of 5, 10, 15, 20, and 25, experiments were designed with 60, 70, 80, 90, and 100 neurons in the hidden layer to explore the relationship between the output characteristics of the hidden layer and the original river flow sequence after the network is trained with various sparse factors and different numbers of neurons in the hidden layer. Meanwhile, the orthogonal experimental groups ESA1, ESA2, ESA3, ESA4, and ESA5 were designed to predict the daily average river flow in September 2000, and their results were compared with the predictions of the support vector machine (SVM) and the feedforward neural network (FFNN). The results indicate that after the ESA training, the output of the hidden layer consists of a large number of features of the original river flow sequence, and the boundaries of these features can reflect the parts of the river flow series with large changes. The upper bound of the features can reflect the characteristics of the river flow during the flood. The prediction results of the orthogonal experiment groups indicate that when the number of neurons in the hidden layer is 90 and *β*_{0} = 15, the ESA has the best prediction effect on the sequence. In particular, the fitting effect on the days of ‘swelling up’ of the river flow is more satisfactory than that of SVM and FFNN. The results are significant, as they provide a guide for exploring the evolution of the river flow under drought and flood as well as for optimally dispatching and managing water resources.

## HIGHLIGHTS

An enhanced sparse autoencoder for river flow prediction is proposed.

The optimal sparse factor and the number of hidden neurons of ESA are discussed and obtained.

The output value of the sparse layer implies extreme flow characteristics.

## INTRODUCTION

Under the impact of frequent human activities, river flow sequences exhibit significant non-stationarity and randomness (Ye *et al.* 2013). Frequent human activities change the local atmospheric circulation, underlying surfaces, and surface and underground river flow, which makes river flow increasingly difficult to predict (Han *et al.* 2009). A large gap often exists between the predicted and actual river flow at various times of the day. Effectively predicting river flow at various times is important for the optimal management and profitable dispatch of water resources.

After carefully considering the meteorological conditions, underlying surface, and other factors related to the study area, several models have been applied for daily average river flow prediction. In short- to medium-term prediction, the Soil and Water Assessment Tool (SWAT; Easton *et al.* 2010; Jimeno-Saez *et al.* 2018; Li *et al.* 2018; Fereidoon *et al.* 2019), Hydrological Simulation Program – Fortran (HSPF; Lee *et al.* 2020), Système Hydrologique Européen (SHE; Abbott *et al.* 1986), Physically based Distributed Tank (PDTank; Lee & Singh 1999), Stanford Watershed Model (Crawford & Linsley 1966), and other conceptual and distributed models can achieve good prediction results in different basins. In mid- to long-term forecasting, machine learning models generally show a good prediction effect in the complex environments of different basins; examples include the support vector machine (SVM; Sahoo *et al.* 2019), extreme learning machine (Zhu *et al.* 2019), artificial neural network (Jimeno-Saez *et al.* 2018; Tsakiri *et al.* 2018; Zhang *et al.* 2019b), wavelet neural network (Shafaei & Kisi 2017; Alizadeh *et al.* 2018; Rakhshandehroo *et al.* 2018; Sharghi *et al.* 2018; Nourani *et al.* 2019; Santos *et al.* 2019; Sharghi *et al.* 2019; Sun *et al.* 2019), and fuzzy neural network (Badrzadeh *et al.* 2018; Bou-Fakhreddine *et al.* 2018). However, because many factors affect the occurrence of floods, it is a challenge to sufficiently consider all the natural and human conditions that cause floods on specific days. Using water–heat exchange and river water quality models combined with GIS and other technologies, it is possible to predict the extreme flow of a river relatively accurately in the short term, but the long-term prediction effect is generally insufficient (Ye 2010).
On the one hand, these prediction models have significant limitations in fitting practical engineering problems, and the fitting capability of different models is different. On the other hand, researchers have been unable to comprehensively consider all the factors that affect river flow evolution such as various complex human activities, local meteorological conditions, and underlying surface changes since the 20th century (Engeland *et al.* 2017; Gangrade *et al.* 2018; Kisakye *et al.* 2018; Leta *et al.* 2018; Yang *et al.* 2018; Schreiner-McGraw *et al.* 2019).

Several factors limit the prediction accuracy of extreme river flow in the sequence. Therefore, several researchers have studied the evolution of individual constituent sequences during the flood occurrence period (Mirzaei *et al.* 2015; Requena *et al.* 2016; Ajadi *et al.* 2017; Konrad & Dettinger 2017; Yan *et al.* 2017; Pandey *et al.* 2018; Sanchez-Garcia *et al.* 2019). Simultaneously, various models have been used to explore the laws of extreme river flow, including distributed models, such as the Kalinin–Milyukov model (Dooge 1959) and ISBA-TOPMODEL (Bouilloud *et al.* 2010), and machine learning models, such as the particle swarm optimization extreme learning machine (Niu *et al.* 2018), particle swarm optimization SVM (Zaini *et al.* 2018), and long short-term memory (Widiasari *et al.* 2018; Wang & Lou 2019). Numerous studies have reported that the main bottleneck affecting the accuracy of river flow prediction is the limited prediction capability of many models in extreme cases at certain moments in the basin. The sparse autoencoder was proposed to perform unsupervised feature learning on unlabeled data (Chen & Li 2017). Its advantage is that it can compress the original input sequence while retaining the important information in the original sequence. In the prediction of river flow series, the output value in the hidden layer can often reflect the important characteristics of the original series.

In practical engineering, when river flow is predicted during a non-stationary period, the sequence often includes a flood period. In some research areas, floods are infrequent and occur only in certain years, seasons, and months. Therefore, the forecast of river flow in the corresponding time period usually includes a period of occasional flood. To explore the periodicity of floods, a period of several years of flood occurrence is typically selected for research (Toonen 2015; Bhat *et al.* 2019; Sanchez-Garcia *et al.* 2019; Zhang *et al.* 2019a). This type of exploration can often formulate the law of flood evolution of a specific study area using the macroscopic and general flood evolution periodicity. However, in actual short- to mid-term river flow prediction, the prediction results of extreme flow have a significant impact on the general accuracy, and the macroscopically calculated flood evolution periodicity is difficult to apply to short- to mid-term decisions. Research on short- to mid-term river flow prediction must thoroughly mine the internal laws of the original river flow sequence. Therefore, after the network is trained, exploring the relationship between the output of the hidden layer of the sparse autoencoder and the original sequence can often help identify some characteristics of the input sequence, particularly the flow characteristics during the flood period.

In this study, the inverse method of simulated annealing is used to improve the sparse autoencoder, and the proposed enhanced simulated annealing (ESA) algorithm is applied to study the river flow sequence. To the best of our knowledge, this is the first study to explore the characteristic relationship between the output of the ESA hidden layer and the original sequence for improving the prediction accuracy of extreme flow. The designed orthogonal experiment is used to verify that the output of the hidden layer of the ESA contains a large amount of boundary information from the original sequence and to identify the ESA parameters that give the best prediction result among the experimental groups.

## MATERIALS AND METHODS

### River flow sequence data at the Kenswat Station

Manas River Basin is located in the hinterland of Eurasia, between N43°27′–N45°21′ and E85°01′–E86°32′. The digital elevation ranges from 170 to 5,242.5 m, and the whole drainage basin is fan-shaped with a total length of 420 km. The annual average precipitation in this area is 100–200 mm, and the spatio-temporal distribution is uneven. The annual average evaporation is 1,500–2,100 mm, the annual average temperature is 4.7–5.7 °C, and the total area is approximately 3.099 × 10^{4} km^{2}.

The Kenswat Station, illustrated in Figure 1, is located in the middle reaches of the Manas River and can control 92% of the water volume of the Manas River, with a total length of approximately 8 km, a maximum dam height of 126.8 m, and a total storage capacity of 1.91 billion m^{3}. The installed capacity of the power station is 100 MW, and the designed annual power generation capacity is 2.76 billion kW·h. The daily river flow data at 9:00, 15:00, and 20:00 in June, July, and August of 1998–2000 used in this study were provided by the Kenswat Station of Manas River, which has an elevation of 900 m and a control area of 5,156 km^{2}. The general trend of annual river flow and the distribution trend of river flow at each time point are presented in Figure 2.

### Strategies for an enhanced sparse autoencoder

#### Regular initialization of weights

The sparse autoencoder is composed of simple neurons connected by a series of weights. The output of the upper layer is used as the input for the next layer. Figure 3 illustrates a simple three-layer structure consisting of the input, hidden, and output layers. In the figure, ‘+1’ is the threshold unit, *a*_{i}^{(k)} represents the output of the *i*th neuron in the *k*th layer, *x*_{i} represents the *i*th input vector, *h*_{W,b}(*x*) represents the output value of the input sample *x* under the connection of *W* and *b*, and *L*_{i} represents the *i*th layer. The network is trained by gradient back propagation; refer to Chen & Li (2017) for the specific training process.

In the weight initialization given by Equation (1), *n*_{j} is the number of neurons in the *j*th layer.
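The three-layer structure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the sigmoid activation and the Glorot-style uniform bound built from *n*_{j} and *n*_{j+1} are assumptions chosen for the example, and the paper's Equation (1) may differ. The names `init_layer` and `forward` are hypothetical.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    # ASSUMPTION: Glorot-style uniform bound sqrt(6 / (n_j + n_{j+1}));
    # the initialization actually used in the paper may differ.
    bound = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(-bound, bound, size=(n_out, n_in))
    b = np.zeros(n_out)  # thresholds (the '+1' units) start at zero here
    return W, b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Return the hidden output a^(2) and the reconstruction h_{W,b}(x)."""
    (W1, b1), (W2, b2) = params
    a2 = sigmoid(W1 @ x + b1)  # output of the hidden layer
    h = sigmoid(W2 @ a2 + b2)  # output layer, same size as the input
    return a2, h

rng = np.random.default_rng(0)
n_input, n_hidden = 100, 90  # 100-day sample, 90 hidden neurons
params = [init_layer(n_input, n_hidden, rng),
          init_layer(n_hidden, n_input, rng)]
x = rng.random(n_input)      # placeholder input scaled to [0, 1)
a2, h = forward(x, params)
```

The hidden output `a2` is the quantity whose distribution is examined later in the box plots, while `h` is compared with the input sequence.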

#### Simulated annealing theory

The simulated annealing algorithm is an effective method for escaping local energy minima in stochastic networks. The basic idea is to simulate the process of metal annealing. Let *X* represent the microscopic state of a system and *E*(*X*) represent the internal energy in that state. At a given temperature *T*, when the system is in thermal equilibrium during the cooling (annealing) process, the probability *P*(*E*) of the system being in a state with energy *E* obeys the Boltzmann distribution (Bengio *et al.* 2013). When the temperature is constant, the higher the energy of a state, the lower the probability of the system being in that state, so the internal energy of the material system tends to evolve in the direction of decreasing energy. The higher the temperature *T*, the more easily the system state changes. To make the material system finally converge to the equilibrium state at a low temperature, a high temperature is set at the beginning of the annealing and then gradually lowered. As a result, there is a high probability that the entire system will converge to the lowest energy state. The temperature *T* is commonly set as a decreasing function of the number of iterations *t*.
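The following sketch illustrates the two ingredients of this description: Boltzmann probabilities over a set of energy levels and a temperature that falls with the iteration count. The logarithmic cooling rule and the value `T0 = 100` are assumptions made for the example, since several cooling schedules are in common use; the energies are arbitrary illustrative numbers.

```python
import numpy as np

def boltzmann_probabilities(energies, T):
    """Normalized Boltzmann weights exp(-E / T) over discrete states."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / T)  # subtract the minimum for stability
    return w / w.sum()

def temperature(t, T0=100.0):
    # ASSUMPTION: logarithmic cooling, T = T0 / log(1 + t), for t >= 1.
    return T0 / np.log(1.0 + t)

energies = [1.0, 2.0, 3.0]  # illustrative energy levels
p_hot = boltzmann_probabilities(energies, T=10.0)
p_cold = boltzmann_probabilities(energies, T=1.0)
schedule = [temperature(t) for t in range(1, 101)]
```

At the lower temperature the lowest-energy state receives a larger share of the probability, which is why gradual cooling tends to settle the system into a low-energy state.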

#### Implementation of sparse coefficient selection based on the simulated annealing theory

When the sparse autoencoder is optimized using simulated annealing, the sparse factor is defined to play the role of the annealing temperature of the metal, but it changes in the opposite direction to the temperature in simulated annealing. The initial value *β*_{0} in the loss function of the sparse autoencoder is set to various values, analogous to various initial temperatures, and each initial value is combined with various numbers of neurons in the hidden layer. The aim of exploring the initial value of the sparse factor is to make the output sequence more consistent with the original sequence.

*β* affects the sparsity of the network. The higher the value of *β*, the heavier the sparse penalty, and the more the outputs of the hidden layer tend to 0. The lower the value of *β*, the lighter the sparse penalty, and the lower the sparsity of the whole network. To facilitate the observation of effective feature extraction in hydrological sequences, the inverse method of simulated annealing is used to dynamically select *β*: the sparsity penalty should go from low to high, so the value of *β* is increased from small to large during training. The value of *β* is determined by its initial value *β*_{0} and the number of network iterations *t*. Figure 4 depicts the values of *β* when *β*_{0} = 5, 10, 15, 20, and 25.

The sparse autoencoder is trained based on Equation (4). After training, the output value of the hidden layer is extracted to explore the internal characteristics of the input samples.
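A minimal sketch of how an increasing sparse factor could enter the loss is given below. Two things are assumptions rather than the paper's definitions: the schedule `beta0 * log(1 + t)`, chosen only because it mirrors a logarithmic cooling rule and rises with *t*, and the loss itself, which follows the widely used sparse autoencoder form (reconstruction error plus *β* times a Kullback–Leibler sparsity penalty with target activation 0.05). The function names are hypothetical.

```python
import numpy as np

RHO = 0.05  # target average activation of a hidden neuron

def beta_schedule(t, beta0):
    # ASSUMPTION: an increasing schedule that mirrors logarithmic cooling;
    # the exact expression used in the paper may differ.
    return beta0 * np.log(1.0 + t)

def kl_sparsity(rho_hat, rho=RHO):
    """Sum over hidden units of KL(rho || rho_hat_j)."""
    rho_hat = np.clip(rho_hat, 1e-8, 1.0 - 1e-8)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sparse_loss(x_batch, h_batch, a_hidden, beta):
    """Reconstruction error plus beta-weighted sparsity penalty."""
    recon = 0.5 * np.mean(np.sum((h_batch - x_batch) ** 2, axis=1))
    rho_hat = a_hidden.mean(axis=0)  # average activation per hidden unit
    return recon + beta * kl_sparsity(rho_hat)

betas = [beta_schedule(t, beta0=15) for t in range(1, 101)]
```

With a schedule of this kind, early iterations are dominated by the reconstruction error, and the pressure toward sparse hidden outputs grows as training proceeds.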

#### ESA for studying the river flow sequence

In this study, we examine the characteristic relationship between the output value in the hidden layer and the original input sequence after the ESA training and provide new ideas for the prediction of future river flow sequence evolution. The original input river flow sequence value can be inferred from the features of the river flow sequence extracted from the hidden layer. The training process of the network is illustrated in Figure 5. The edge features extracted from the hidden layer can reflect some apparent changes in the original sequence. The edge feature of the sequence represents the ‘swell’ or ‘slump’ flow in the sequence, which is the boundary constraining the sequence and can reflect extreme flow. In Figure 5, the sequence is divided into *n* sub-sequence samples, each of which contains 100 days of river flow in the corresponding period and serves as the input of the ESA. The weights and thresholds of the ESA are initialized by the regular initialization described above. Training ends when the number of iterations reaches *t* = 100, and after 30 training runs, the trained model is verified using the verification set.

### Parameter calibration and orthogonal experimental design

The following parameters were set for the studied sequence. The collected experimental data were divided into six groups of 100 samples each, with 70 samples for the training set and 30 for the verification set. The river flow data from September 2000 were reserved as the test set. Training was stopped when the verification set error began to rise, to avoid overfitting. The initial values of the sparse factor, *β*_{0}, were 5, 10, 15, 20, and 25, and the numbers of neurons in the hidden layer, *L*, were 60, 70, 80, 90, and 100. In the experiment exploring the relationship between the output value of the hidden layer and the original sequence, *β*_{0} and *L* were combined orthogonally to form 25 groups of values, namely, (*L*, *β*_{0}) = {(60, 5), (60, 10),…,(60, 25), (70, 5),…, (70, 25),…,(100, 5),…,(100, 25)}. The results of the output layer were compared with the original input, and the feature compression rate and extraction rate were used as the comparison indexes of each experimental group (*L*, *β*_{0}). Subsequently, the optimal number of hidden neurons *L* and initial value of the sparse factor *β*_{0} were explored for river flow feature mining.
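The experimental layout described above can be written down compactly. The sketch below enumerates the 25 (*L*, *β*_{0}) combinations, splits one 100-sample group into 70 training and 30 verification samples, and expresses the early-stopping rule. The chronological split and the helper names `split_group` and `should_stop` are assumptions for illustration; the paper does not state how samples were assigned within a group.

```python
from itertools import product

HIDDEN_SIZES = [60, 70, 80, 90, 100]  # values of L
BETA0_VALUES = [5, 10, 15, 20, 25]    # initial sparse factors

# All 25 (L, beta0) combinations, in the order listed in the text.
grid = list(product(HIDDEN_SIZES, BETA0_VALUES))

def split_group(group, n_train=70):
    # ASSUMPTION: the first 70 samples train, the remaining 30 verify.
    return group[:n_train], group[n_train:]

def should_stop(val_errors):
    """Stop as soon as the latest verification error exceeds the previous one."""
    return len(val_errors) >= 2 and val_errors[-1] > val_errors[-2]

group = list(range(100))  # placeholder indices for one 100-sample group
train, verify = split_group(group)
```

Here `grid[0]` is `(60, 5)` and `grid[-1]` is `(100, 25)`, matching the first and last pairs of the enumeration in the text.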

The daily average temperature (*X*1), daily average precipitation (*X*2), daily average evapotranspiration (*X*3), and average humidity (*X*4) were selected as the factors that affect the evolution of river flow (*Y*) at the Kenswat Station, and the orthogonal experiment was designed. In the experimental groups ESA1, ESA2, ESA3, ESA4, and ESA5, the daily average flow in September 2000, when the great flood occurred, was predicted. The prediction results were compared with those obtained from SVM (Chang *et al.* 2018; Bafitlhile & Li 2019) and the feedforward neural network (FFNN; Yilmaz & Muttil 2014; Yaseen *et al.* 2016). The comparison indexes used were the correlation coefficient (*R*), root mean square error (RMSE), mean absolute error (MAE), 85% anomaly coincidence rate, and 80% pass rate. The daily average flow in June, July, and August during 1998–2000 was selected as the sample of the prediction series. A Gaussian radial basis kernel function is used in the SVM (Fasshauer & McCourt 2012), and the penalty term (*c*) and the kernel parameter (*γ*) take the values 3.5 and 0.8, respectively. For the FFNN, the number of neurons in the hidden layer was selected by trial and error combined with empirical rules; a large number of experimental tests showed that 5 hidden neurons achieve good results. Therefore, the network structure of the FFNN is 4-5-1, and its weights and thresholds are initialized randomly and obey the normal distribution *N*(0, 1). In the different orthogonal experimental groups, the weights and thresholds are generated using Equation (1), and *β* is generated using the simulated annealing inverse method represented by Equation (2). After 30 predictions, the average value is taken as the final prediction result of the experiment. All experiments were carried out in MATLAB R2014a.
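Three of the comparison indexes have standard definitions and can be computed as sketched below. The pass rate is implemented here as the fraction of predictions whose relative error is within 20%, which is an assumed reading of the "80% pass rate"; the 85% anomaly coincidence rate is omitted because its exact definition is not given in the text. The observed and predicted values are made-up numbers used only to exercise the function.

```python
import numpy as np

def evaluate(obs, pred, tolerance=0.20):
    """Return R, RMSE, MAE, and the fraction of predictions within tolerance."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    return {
        "R": np.corrcoef(obs, pred)[0, 1],
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        # ASSUMPTION: a prediction "passes" if |error| <= 20% of the observation.
        "pass_rate": np.mean(np.abs(err) <= tolerance * np.abs(obs)),
    }

# Illustrative values only, not data from the study.
scores = evaluate(obs=[100.0, 200.0, 300.0], pred=[110.0, 190.0, 300.0])
```

For this toy input the errors are 10, -10, and 0, so the MAE is 20/3 and every prediction falls inside the 20% band.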

## RESULTS

The outputs of the hidden and output layers of the network were obtained for each combination of *L* and the sparse factor *β*. In Figure 6, parts (e1), (e2), and (e3) show the logarithmic distribution of the hidden-layer output values after training with *L* = 60 for different values of the sparse factor *β*_{0} at 9:00, 15:00, and 20:00, respectively. It can be seen from Figure 6 that when *β*_{0} = 5 and 10, the difference between the values on both sides of the median is significantly smaller than when *β*_{0} = 15, 20, and 25. When *β*_{0} = 20 and 25, the difference between the two sides of the median increases evidently, and there are fewer scattered outliers, which indicates that the output values of the hidden layer exceed *ρ* = 0.05 when the sparse factor is very small. This illustrates that with the increase in *β*, the ability of the ESA to compress the input sequence improves, and the extracted river flow features move more and more toward the edge. Similarly, Figures 7(e), 8(e), 9(e), and 10(e) depict the box plots of the output values in the hidden layer under different values of *β* when the values of *L* are 70, 80, 90, and 100, respectively. Parts (e1)–(e3) are the distribution diagrams at 9:00, 15:00, and 20:00, respectively.

It can be seen from the (e1) diagrams that at 9:00 the distributions differ only slightly for the same value of *β*_{0} and different values of *L*; the general distribution is relatively stable. When the value of *β* is small, the ability of the network to compress the sequence is weak, so the feature output values are larger. With the increase in *β*, the difference between the two sides of the box plot increases evidently, and the compression capacity is enhanced. In (e2) and (e3), there are some ‘abnormal values’. When *β*_{0} = 5, 10, and 15, the distribution values are relatively scattered. When *β*_{0} = 20 and 25, there is a significant difference between the two sides of the median. The primary reason is that the selected datasets – June, July, and August – were months with large changes in flow sequence. At 15:00 and 20:00, there was sufficient sunshine, and the rising temperature accelerated the melting of the snow in the upper reaches of the Kenswat Station. In addition, in 2000, there were large floods and extreme flow evolution events in the station area. From June to August 2000, the minimum daily river flow was 20 m^{3}/s, whereas during the flood period, the maximum flow reached 1,100 m^{3}/s. This led to the non-stationarity and randomness of the research sequence. The dispersion of the feature values in (e2) and (e3) is also caused by the dates on which the river flow gap is evident. It can be seen that the characteristic output value in the hidden layer can reflect the change in the river flow sequence to a certain extent.

Figures 6–10 present the flow output diagrams when the values of *L* are 60, 70, 80, 90, and 100, respectively. In parts (a), (b), and (c), which represent 9:00, 15:00, and 20:00, respectively, the output values of the output layer of the ESA for different values of *β*_{0} are compared with the measured values. Parts (a1) and (a2), (b1) and (b2), and (c1) and (c2) in each figure present the output results of the divided training and validation sets. To make the curves easier to distinguish, the following offsets are applied in (a1), (b1), and (c1): when *β*_{0} = 5, 20 is added to the output value; when *β*_{0} = 10, 10 is added; when *β*_{0} = 20, 10 is subtracted; and when *β*_{0} = 25, 20 is subtracted. The embedded tables in each figure present the correlation coefficients *R* between the output values of the training and validation sets and the corresponding measured values for different values of *β*_{0}.

As can be seen from Figures 6–10(a), when *L* = 60, the accuracy of the output of the verification set evidently decreases with an increase in *β*. In each of Figures 6–10(a2), the output values at *β*_{0} = 20 and 25 are significantly worse than those at *β*_{0} = 5 and 10. However, the simulation results are relatively good on days with large variations in river flow. In Figures 6–10(b) and 6–10(c), there are more time periods when extreme values occur at 15:00 and 20:00, and the general sequence is more non-stationary at these two time points. The output results indicate that the output values at *β*_{0} = 5 and 10 are evidently better than those at *β*_{0} = 20 and 25. When *β*_{0} = 15, *R* is closer to that obtained at *β*_{0} = 10, indicating that *β*_{0} = 15 also gives acceptable verification results. As *L* increases, the output results improve at the different time points for the different values of *β*. For example, at 9:00 with *β*_{0} = 25, *R* = 91.6% when *L* = 60, whereas *R* = 99.65% when *L* = 100. In addition, it can be seen from Figures 6–10(a), 6–10(b), and 6–10(c) that an increase in *L* has a positive effect on *R* between the output value of the output layer in the verification set and the measured value. However, from the results presented in Figures 6(e), 7(e), 8(e), 9(e), and 10(e), it can be seen that with the increase in *L*, the number of hidden-layer output values that are significantly greater than *ρ* = 0.05 approaches the number of input values, which is evidently not good for studying the relationship between the output values of the hidden layer and the extreme evolution values in the river flow series. An analysis and discussion of the appropriate number of hidden neurons and the value of the sparse factor *β* as ESA parameters can help improve the prediction accuracy of the sequences.

Figure 11 presents a comparison of the prediction results of the ESA with different parameters, SVM, and FFNN for the daily average flow in September 2000 under the orthogonal experiment. In September 2000, there was a large flood event at the Kenswat Station. It can be seen from Figure 11 that the prediction result of ESA4 is the best, particularly on 11 September 2000, 18 September 2000, and other days with a large flow change; ESA4 can fit the part with a large flow change relatively well. FFNN and SVM are the most widely used models for mid- to long-term prediction. In the September forecast, SVM and FFNN demonstrate acceptable forecast accuracy for stable and relatively small changes in river flow. However, the forecast effect is relatively poor on some days with large changes in river flow. In addition, the orthogonal experiment group ESA5 also demonstrates a relatively good prediction capacity for high flow. Therefore, Figure 11 illustrates that the orthogonal experimental groups ESA4 and ESA5 can achieve relatively good sequence prediction results on days with large flow evolution.

## ANALYSIS AND DISCUSSION

Figure 12(a)–12(c) presents the compression and extraction rates calculated at 9:00, 15:00, and 20:00, respectively, for various values of *β*_{0} and *L*. Figure 13(a)–13(c) presents the values of *R* calculated at 9:00, 15:00, and 20:00, respectively, for various values of *β*_{0} and *L*. Figure 12(a) illustrates that the compression rate is generally the highest when *β*_{0} = 25 under various values of *L*. When *β*_{0} = 15 and 20, the compression rate is concentrated at approximately 55%. When the number of hidden neurons is 70, the values of *R* of the verification set results are 99% and 98.8%. This indicates that when *L* = 70 and *β*_{0} = 15, a better verification effect can be obtained. When *L* = 60, the result of the verification set is generally poor; however, when *L* ≥ 70 and *β*_{0} = 5, 10, and 15, the value of *R* for the verification set is generally higher than 99%. This indicates that with *β*_{0} = 15 or 20 and *L* ≥ 70, the ESA not only extracts the features of the original sequence more completely but also restores the original sequence with satisfactory accuracy. Overall, Figures 12 and 13 show that, except for the case at 9:00 with *L* = 60 and *β*_{0} ≥ 15, the value of *R* between the verification results and the original sequence is close to 1, which indicates that the ESA can extract most features of the sequence. These features can reflect the law of the whole research sequence, particularly the part with large variation in river flow.

It can be seen from the prediction results of the orthogonal experiment in Figure 11 that on some special flow dates, such as 11 September 2000 and 18 September 2000, ESA4 and ESA5 can fit the actual situation well. SVM and FFNN also demonstrate good prediction results during some time periods when the river flow is relatively stable; however, they cannot fit well during the time periods with large evolution. In ESA1, although the predicted results are quite different from the real values, the general trend of the curve is similar to that of the measured values. The comparison indexes presented in Table 1 demonstrate the prediction capability of the ESA, particularly during the special flow time period. Among the prediction performance indicators, the values of *R* for ESA4, ESA5, and SVM are higher than those for the other models; the highest value, 91%, is obtained by SVM. However, the curves depicted in Figure 11 illustrate that ESA4 and ESA5 outperform the SVM in the prediction of some special flows. The prediction results of the SVM and FFNN are smoother and generally more stable but weaker during the period of flood occurrence than those of ESA4 and ESA5. The RMSE and MAE are the lowest for ESA4; compared with those for ESA1, ESA2, ESA3, ESA5, SVM, and FFNN, the RMSE is lower by 19.20, 6.59, 4.34, 2.46, 8.71, and 8.83 and the MAE is lower by 18.84, 5.03, 3.57, 1.75, 8.40, and 8.45, respectively. In the general prediction error comparison, it is evident that the orthogonal experimental groups ESA4 and ESA5 perform better. The 85% anomaly coincidence rate can reflect the gap in the capacity of the different models to reproduce extreme flow under certain circumstances. The coincidence rates of both ESA4 and ESA5 reached 0.90, indicating that the errors between most predicted and measured values are small. Although the values of *R* for SVM and FFNN reached 91 and 83%, respectively, the anomaly coincidence rates reached only 0.43 and 0.47, respectively. This clearly indicates that in the high-flow prediction part, the prediction results of SVM and FFNN are worse than those of ESA4 and ESA5. In addition, with an error interval of 20%, the pass rate of ESA4 reaches 100% and those of ESA3 and ESA5 reach 97%, so the prediction results in the stable-flow part are also satisfactory.

| Model | *R* (%) | RMSE | MAE | 85% anomaly coincidence rate | 80% pass rate |
|---|---|---|---|---|---|
| ESA1 | 71 | 23.30 | 22.23 | 0.13 | 0.27 |
| ESA2 | 82 | 10.69 | 8.42 | 0.63 | 0.87 |
| ESA3 | 83 | 8.44 | 6.96 | 0.83 | 0.97 |
| ESA4 | 87 | **4.10** | **3.39** | **0.90** | **1.00** |
| ESA5 | 88 | 6.56 | 5.14 | **0.90** | 0.97 |
| SVM | **91** | 12.81 | 11.79 | 0.43 | 0.80 |
| FFNN | 83 | 12.93 | 11.84 | 0.47 | 0.77 |

The bold means the optimal value.
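The error reductions of ESA4 relative to the other models follow directly from the RMSE and MAE columns of Table 1, and a short script makes the arithmetic easy to recheck. The helper name `reduction_vs` is hypothetical; the numbers are copied from the table.

```python
# RMSE and MAE values copied from Table 1.
rmse = {"ESA1": 23.30, "ESA2": 10.69, "ESA3": 8.44, "ESA4": 4.10,
        "ESA5": 6.56, "SVM": 12.81, "FFNN": 12.93}
mae = {"ESA1": 22.23, "ESA2": 8.42, "ESA3": 6.96, "ESA4": 3.39,
       "ESA5": 5.14, "SVM": 11.79, "FFNN": 11.84}

def reduction_vs(best, table):
    """Difference between each model's error and that of the best model."""
    return {model: round(value - table[best], 2)
            for model, value in table.items() if model != best}

rmse_gain = reduction_vs("ESA4", rmse)
mae_gain = reduction_vs("ESA4", mae)
best_rmse = min(rmse, key=rmse.get)  # model with the lowest RMSE
best_mae = min(mae, key=mae.get)     # model with the lowest MAE
```

Both `best_rmse` and `best_mae` evaluate to `"ESA4"`, in line with the bold entries of the table.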

In the special river flow part of the prediction, the models in the ESA orthogonal experiment groups generally perform better. The FFNN and SVM yield good prediction results at some time points, but their predictions generally tend to be stable. Table 2, a comparison of the prediction results of the various models on high-flow days, presents the predictions during the period with large changes in September. As can be seen from Table 2, the prediction results of ESA4 are generally the best. This indicates that when *L* = 90 and *β*_{0} = 15, satisfactory results can be obtained in the prediction of the sequence.

| Date | Observation | ESA1 | ESA2 | ESA3 | ESA4 | ESA5 | SVM | FFNN |
|---|---|---|---|---|---|---|---|---|
| 10 September 2000 | 84 | 67 | 73 | 76 | 81 | 80 | 72 | 70 |
| 11 September 2000 | 94 | 76 | 78 | 85 | 92 | 100 | 78 | 81 |
| 16 September 2000 | 96.5 | 77 | 90 | 87 | 100 | 99 | 85 | 79 |
| 17 September 2000 | 101 | 80 | 103 | 90 | 96 | 89 | 95 | 92 |
| 18 September 2000 | 136 | 95 | 105 | 117 | 124 | 135 | 110 | 114 |
| 19 September 2000 | 87.8 | 66 | 71 | 81 | 90 | 90 | 78 | 78 |
| 27 September 2000 | 82.8 | 63 | 73 | 67 | 77 | 85 | 69 | 76 |
| 28 September 2000 | 96.5 | 78 | 96 | 85 | 93 | 82 | 85 | 83 |
| 30 September 2000 | 113 | 89 | 94 | 100 | 116 | 101 | 99 | 108 |
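One way to summarize the high-flow comparison is to compute the mean absolute error of each model over the nine days listed in Table 2. This summary statistic is an addition for illustration and is not reported in the paper; the values are copied from the table.

```python
# Observed flow and model predictions on the nine high-flow days of Table 2.
observed = [84, 94, 96.5, 101, 136, 87.8, 82.8, 96.5, 113]
predicted = {
    "ESA1": [67, 76, 77, 80, 95, 66, 63, 78, 89],
    "ESA2": [73, 78, 90, 103, 105, 71, 73, 96, 94],
    "ESA3": [76, 85, 87, 90, 117, 81, 67, 85, 100],
    "ESA4": [81, 92, 100, 96, 124, 90, 77, 93, 116],
    "ESA5": [80, 100, 99, 89, 135, 90, 85, 82, 101],
    "SVM":  [72, 78, 85, 95, 110, 78, 69, 85, 99],
    "FFNN": [70, 81, 79, 92, 114, 78, 76, 83, 108],
}

def mean_abs_error(obs, pred):
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

high_flow_mae = {model: mean_abs_error(observed, values)
                 for model, values in predicted.items()}
ranking = sorted(high_flow_mae, key=high_flow_mae.get)  # best model first
```

On these nine days ESA4 has the smallest mean absolute error and ESA5 the second smallest, which is consistent with the reading of the table given in the text.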

When the sequence was processed by the ESA, most output values of the hidden neurons tended to *ρ* = 0.05. Therefore, the ESA can be considered to compress the original input sequence to a certain extent. Most information in the original sequence is held by the neurons whose output values in the hidden layer are far higher than *ρ* = 0.05, which can reflect the internal rules and characteristics of the original sequence. This edge feature reflects the upper and lower boundaries of the original sequence. The upper boundary of the feature reflects the value of extreme flow in the sequence. The results provide new ideas for exploring the evolution of the regional flood.
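The separation between hidden outputs that stay near 0.05 and those far above it can be illustrated with a simple threshold. The cut-off of twice the target activation and the helper name `split_hidden_outputs` are assumptions for the example, and the returned fraction is only a rough proxy that should not be read as the paper's compression rate. The activations are made-up values.

```python
import numpy as np

RHO = 0.05  # target activation of the sparsity penalty

def split_hidden_outputs(a_hidden, rho=RHO, factor=2.0):
    """Return indices of clearly active units and the fraction near rho."""
    a = np.asarray(a_hidden, dtype=float)
    # ASSUMPTION: "far higher than rho" is taken as more than factor * rho.
    active = a > factor * rho
    return np.flatnonzero(active), 1.0 - active.mean()

# Illustrative hidden-layer outputs, not values from the study.
hidden = [0.05, 0.06, 0.04, 0.80, 0.05, 0.30]
active_idx, fraction_near_rho = split_hidden_outputs(hidden)
```

In this toy vector only the fourth and sixth units are clearly active, so four of the six outputs count as being near the target value.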

## CONCLUSIONS

In this study, the traditional sparse autoencoder was enhanced using the inverse method of simulated annealing. Considering the river flow data observed at the Kenswat Station as a research example, this study explored the feature extraction of river flow sequences and the prediction performance for future river flow sequences under various values of *L* when the sparse factor *β* takes different initial values. Orthogonal experiments were designed, and the forecast results for the flood month obtained by the orthogonal experimental groups were compared with those of SVM and FFNN. By analyzing the output characteristics of the hidden layer under the various combination models and the prediction results of the orthogonal experiments, the following conclusions are reached:

- (1) The output value of the hidden layer in the ESA contains the evolution characteristics of the original sequence and can reflect the edge of non-stationary sequences.
- (2) When *L* = 90 and *β*_{0} = 15, the prediction result of the river flow sequence at the Kenswat Station is the best. At 9:00, 15:00, and 20:00, the compression ratio of the original sequence is 55, 54, and 54%, respectively.
- (3) In the output feature of the ESA hidden layer, the upper bound of the features can reflect the eigenvalues of river flow during the flood in the input sequence, thereby providing some new ideas for exploring the law of evolution of regional floods.

In the future, we will continue to explore the relationship between the output value of the hidden layer of the ESA and the law of evolution of non-stationary river flow sequences and investigate the law of evolution of river flow under drought and flood, which can facilitate better management and planning of water resources.

## CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

## ACKNOWLEDGEMENTS

Financial support from the National Natural Science Foundation of China (U1803244), the National Key R&D Program of China (2017YFC0404304), the Key Science and Technology Project in special issues of Bingtuan (2019AB035), and the Talent initiate scientific research projects of the Shihezi University (RCZK2018C23) is gratefully acknowledged.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.