## ABSTRACT

This study proposes a hybrid model based on the combination of Sand Cat Swarm Optimization (SCSO), Echo State Network (ESN), Gated Recurrent Unit (GRU), Least Squares Method (LSM), and Markov Chain (MC) to improve the accuracy of annual runoff prediction. Firstly, conduct correlation analysis on multi-factor data related to runoff to determine the input of the model. Secondly, the SCSO algorithm is used to optimize the parameters of the ESN and GRU models, and the SCSO-ESN and SCSO-GRU models are established. Next, the prediction results of these two models are coupled using LSM to obtain the preliminary prediction results of the SCSO-ESN-GRU model. Finally, the initial prediction results are corrected for errors using MC to get the final prediction results. Choose Changshui Station and Lanxi Station for experiments, and evaluate the predictive performance of the model through five evaluation indicators. The results show that the combined prediction model corrected by the MC achieved the optimal prediction performance at both experimental stations. This study emphasizes that using a combination prediction model based on Markov chain correction can significantly improve the accuracy of annual runoff prediction, providing a reliable basis for predicting annual runoff in complex watersheds.

## HIGHLIGHTS

A new hybrid sand cat swarm optimization-echo state network-gated recurrent unit-Markov chain (SCSO-ESN-GRU-MC) model is proposed for annual runoff prediction.

A new SCSO algorithm is used to optimize the parameters of the prediction model.

Least squares method is used to couple the ESN and GRU models for annual runoff prediction.

MC is used to correct errors in annual runoff prediction results.

The proposed model achieves the best prediction results at both experimental stations.

## INTRODUCTION

The sustainable development of water resources has become a global focus, and how to plan and manage the uneven distribution and shortage of water resources has emerged as a pressing issue that requires attention (Wei & Xun 2019). In this severe context, runoff, as an essential component of the hydrological cycle on Earth, has attracted much attention. For the best possible management of water resources, flood prevention, disaster relief, and water environment conservation, accurate and trustworthy runoff prediction is essential (Wang *et al.* 2009; Xu *et al.* 2023). Medium- to long-term forecasting focuses on the scale of the coming months and years, which is crucial for long-term water resource planning, management, and development (Bian *et al.* 2023). Through in-depth research on medium- to long-term forecasting, we can have a more comprehensive understanding and prediction of the trend of watershed runoff changes, providing strong technical support for achieving sustainable utilization and protection of water resources. Therefore, precise and dependable mid- to long-term runoff forecast models are essential.

The current models for medium- to long-term runoff prediction are mostly process-driven based on physical models and data-driven based on historical statistics (Wang *et al.* 2013; Han & Morrison 2022a). The process-driven model is founded on a deep understanding of hydrological processes, using mathematical equations to describe various physical processes of the hydrological system, such as rainfall, snow melting, evaporation, and soil moisture movement (Partington *et al.* 2012). Process-driven models typically require a huge quantity of data and computational resources and are complex in parameter determination and model calibration. Therefore, long-term runoff prediction has many limitations, such as annual runoff (Yoon *et al.* 2011; He *et al.* 2023). Data-driven models are trained and learned using huge amounts of data, and their design and prediction capabilities mainly depend on input data characteristics (Min *et al.* 2023). The reason why data-driven models are valued in the field of runoff prediction is that they can effectively predict by analyzing and learning the relationship between input and output data, without relying on a deep understanding of the complex physical processes of hydrological systems. The ease of use and accuracy of this method make it particularly popular in predicting runoff (Parisouj *et al.* 2020).

Time series, machine learning, neural networks, and deep learning models are examples of data-driven models. The Autoregressive Integrated Moving Average (ARIMA) model is one of the representative time series prediction models (Liu *et al.* 2012). Adnan *et al.* (2018) used the Least Squares Support Vector Machine (LSSVM) model to predict daily and monthly runoff at three sites, resulting in better results than the M5 model tree. Although time series and machine learning models have shown outcomes in runoff prediction, neural network models have more advantages in nonlinear modeling, parameter adaptation, and flexibility than the above two models. Neural network models such as the Elman neural network (ENN) (Song *et al.* 2021), backpropagation (BP) neural network (Chen *et al.* 2023), and echo state network (ESN) (Li & Li 2023) have been well developed in predicting future trends. Qin *et al.* (2019) used a BP model based on preprocessing technology to predict runoff at different scales and found that the prediction results were improved. The evapotranspiration (Ev) of green peppers was indicated by Liu *et al.* (2020) using the ENN. When compared with the BP network, which involves iterative optimization of BP, and the ENN, which deals with long-term dependencies facing gradient vanishing or exploding, the ESN model's training procedure just involves linear regression of the output layer without the need to train the internal weights of the network, making the training process more efficient (Ribeiro *et al.* 2020). Meanwhile, the network structure of ESN is relatively simple, requiring only the initialization of a set of weight matrices and state vectors. This makes ESN easy to implement and understand and can be more easily deployed and applied in resource-limited situations (Wen *et al.* 2023). Due to its significant advantages, the ESN model has been widely used to handle time series prediction problems. Coulibaly (2010) predicted lake water levels using the ESN model, and results from four locations showed that the ESN model outperformed recurrent neural network (RNN) and Bayesian neural network. The ESN model was used by Li *et al.* (2023) to increase the prediction accuracy of photovoltaic power. The efficiency of neural network models steadily weakens with increasing dimensions and quantity of data; hence, deep learning models have garnered significant interest and have been continuously developed. Currently, deep learning models such as Long and Short-Term Memory (LSTM) (Tennant *et al.* 2020), RNN (Bhattacharjee & Tollner 2016), and gated recurrent unit (GRU) (Wei *et al.* 2023) have been widely used for prediction. Meng *et al.* (2023) used optimized RNN to predict sea ice's bending and compressive strength, and the results showed higher prediction accuracy than BP. Xu *et al.* (2022) used LSTM for flood forecasting in two watersheds. Using hourly flow and rainfall as inputs, Gao *et al.* (2020) predicted watershed runoff with high prediction accuracy using LSTM and GRU. Although both RNN and LSTM have their advantages, GRU effectively solves classic RNN's gradient vanishing and gradient explosion concerns in modeling long sequence dependencies through gating mechanisms (Gao *et al.* 2022; Liu *et al.* 2023). It can better capture long-term dependencies in sequences by selectively updating and forgetting information. At the same time, GRU has a lower computational complexity than LSTM, and its structure is simpler with fewer parameters. Therefore, in the case of limited computing resources, using GRU can more efficiently train and predict (Zhou *et al.* 2023; Fang *et al.* 2024). In summary, this paper selects ESN model and GRU model with significant advantages as the annual runoff prediction model with multiple factor inputs.

However, due to the significant impact of hyperparameters in ESN and GRU models, on the effectiveness of runoff prediction, traditional parameter determination methods are gradually phased out due to their manpower, material resources, and intense subjectivity (Wang *et al.* 2023). Swarm intelligence optimization algorithms, which feature benefits including great adaptability to nonlinear problems and global search ability, have been utilized by several researchers for parameter optimization as technology has advanced (Niu *et al.* 2019). The artificial neural network (ANN) parameters were optimized by Jahandideh-Tehrani *et al.* (2021) using the genetic algorithm (GA) method and particle swarm optimization (PSO). Through daily rainfall and runoff prediction, they were able to confirm that the ANN-PSO model outperforms the ANN-GA model by a large margin. In comparison to independent Extreme Learning Machine (ELM), a single ELM-PSO, and ELM-GWO, Adnan *et al.* (2021) achieved better accurate findings when using ELM based on the Grey Wolf Optimizer (GWO) and PSO to estimate monthly runoff in the Mangla Basin. Qiao *et al.* (2023) used an improved Aquila optimizer to optimize a time convolutional network for rainfall–runoff simulation and multistep runoff prediction and achieved good results. Inspired by the hunting behavior of sand cats in the desert, the sand cat swarm optimization algorithm (SCSO) is a new optimization algorithm (Seyyedabbasi & Kiani 2023). The optimization algorithm has strong global search ability, fast convergence speed, a simple structure, and can adaptively adjust the parameters in the algorithm according to the characteristics of the problem. The computation process of the simultaneous SCSO algorithm can handle multiple solutions simultaneously. This makes it feasible to use parallel computing to accelerate the algorithm's execution in the large-scale predictive model's parameter optimization challenge (Jia *et al.* 2022; Tavakol Aghaei *et al.* 2023). To solve the critical parameter optimization problem of ESN and GRU models, this paper adopts the SCSO algorithm for processing, aiming to enhance the predictive performance of ESN and GRU models.

Although a single optimized prediction model generated outstanding prediction results, the prediction accuracy of numerous complex prediction data is not ideal due to the structural limitations of each prediction model. Therefore, some scholars attempt integrating models with complementary performance to achieve better prediction results. Yao *et al.* (2023) optimized the parameters of the Convolutional Neural Networks (CNN)-LSTM and GRU models using an improved Sparrow Search Algorithm (SSA) technique. Using an adaptive weighting module, they combined the two models' predictions. Results indicate that the combination model with several prediction models outperforms other single prediction models. The observed displacement of the Three Gorges slope was projected by Duan (2023) using the Grey–Markov, GA-BP, and ARIMA models. The anticipated outcomes of the three models were then blended using the reciprocal variance weighting approach. The outcomes demonstrated that the weighted combination prediction model produced more accurate predictions. It is evident from the aforementioned researchers' studies that high prediction accuracy may be attained by integrating prediction models utilizing weighted combination approaches. Thus, to enhance prediction accuracy by balancing the benefits of both models, this work integrates the prediction outcomes of the GRU and ESN models based on SCSO using the least squares method (LSM).

There are errors between the weighted combination model's predicted outcomes and the original data, but these errors frequently hold a wealth of information that is overlooked. This ignores the potential critical features in the error, affecting the prediction results' accuracy (Liu *et al.* 2021; Yang *et al.* 2023). Many research studies have demonstrated that developing error correction models can improve the prediction accuracy. Wang *et al.* (2024) used statistical methods to extract residual linear features from wind power prediction error sequences for adequate correction. Wu *et al.* (2023) predicted and corrected the error between the forecasted and actual values of the LSTM model linked with preprocessing technologies using an improved coupling model of CEEMDAN (ICEEMDAN) and GRU. The research showed that the crude oil future prices converted by the error correction model were closer to the real values. At three hydrological stations, Xu *et al.* (2024) employed local error correction to rectify the prediction error of the ENN optimized by the sparrow search algorithm. The outcomes demonstrated the benefits of the error correction model. Markov chains (MCs) only rely on the current state for future states and are independent of past conditions. This ‘memorylessness’ makes them suitable for problems with high-frequency random fluctuations such as error sequences (Gao *et al.* 2021; Wei *et al.* 2022). At the same time, the finite space state of MCs and the modeling of the evolution of system states as probability transitions between states simplify the design and adjustment process of error correction models and reduce the computational burden of error correction (Zhang *et al.* 2021). Given the advantages of error correction, this work makes full use of the MC model to accurately correct the error sequence in runoff prediction, hence enhancing the precision and dependability of the model's forecast.

Therefore, this study aims to develop an SCSO-ESN-GRU-MC hybrid model based on a new optimization algorithm, multiple prediction model combinations, and error correction methods. In addition, this model is applied to the Changshui Station in the Luohe River Basin and Lanxi Station in the Hulan River Basin's annual runoff forecast.

## METHODOLOGY AND THEORETICAL FOUNDATIONS

The second section's goal is to give a thorough overview of each model's underlying theories and coupling techniques, as well as the metrics used to assess the models' performance and the precise steps involved in building the suggested model.

### Echo state network

*K*input units,

*N*hidden layer neurons, and

*L*output units, then their values at time

*k*are

In the equation, *u*, *x,* and *v* are the input variables, output variables, and state variables of the ESN, respectively; are the input weight matrix, internal weight matrix, and output weight matrix, respectively, while is the interior to output connection weight matrix; *f* represents the internal activation function, usually using the Sigmoid function or tanh function; is the output layer activation function, generally using a linear process.

The performance of an ESN is determined by the spectrum radius (SR), input scale (IS), reserve pool size (*N*), and sparse degree (SD) parameters of the reserve pool. The specific definitions of these parameters can be found in the papers of Ribeiro *et al.* (2020) and Wen *et al.* (2023).

### Gated recurrent unit

Among them, represents the sigmoid function with output values limited to [0,1]; and represent the reset gate weight matrix and update gate weight matrix for linear transformation of the input vector in the *t*th time step, respectively; and represent the reset gate weight matrix and update gate weight matrix for linear change of the hidden state in the previous time step, respectively. The reset gate controls the degree of neglect of the prior sequence of state information. The smaller the value, the more information is ignored. The update gate holds the degree to which the previous line of state information is transmitted to the current state, and the larger its value, the more information is input.

*W*and

*U*represent the update gate weight matrices for the input vector and hidden state , respectively, and represents the Hadamard product.

### Sand cat swarm optimization

Seyyedabbasi & Kiani (2023) proposed the SCSO algorithm, a population optimization algorithm that uses a unique, straightforward, and effective methodology. The hunting technique of a pack of sand cats, which are incredibly skilled at mining prey and can detect low frequencies below 2,000 Hz, is modeled by this algorithm. The SCSO algorithm splits the feeding behavior of sand cats into two phases based on these two characteristics: global search (exploration) and attack prey (development). It maintains a balance between exploration and development through adaptive mechanisms. The following describes the mathematical description of SCSO:

- (1) Global search phase. Each sand cat has a workable solution
*X*, and we set the number of sand cat populations to*N*. Next, we choose the best individual by assessing each sand cat's fitness for the goal function. Based on where the ideal individual is located, other people move. The equation for updating the position of the*k*th sand cat group in the*i*+ 1 generation is as follows:

Among them is the *i*-generation optimal individual; *r* is the sensitivity range of each sand cat , and rand is a random number between 0 and 1; linearly decreases from 2 to 0 during the iteration process to ensure that the feasible solution gradually approaches the prey, , represents the auditory characteristics of the sand cat (), and *T* represents the maximum number of iterations for the sand cat population.

*R*for the change from exploration to development, SCSO keeps a balance between the two. The mathematical description is as follows:where is the exploration phase and is the development phase.

- (2) Attack the prey stage. At this stage, to determine a random angle at which each sand cat should update its position to approach its prey, SCSO employs a roulette selection algorithm. The updated equation is as follows:where
*d*is the random distance and*θ*is a random angle to approach the hunting position from any angle.

Global search and attack of prey are determined by and *R*. SCSO can seamlessly switch between two stages. When , the sand cat herd's next position can be anywhere between its current position and the hunting position, allowing for a global search to be done. When , the sand cat group enters the attack stage and conducts local searches, and the algorithm has global and local search capabilities.

### Least squares method

A popular optimization method in mathematics and statistics is LSM, which reduces model errors and finds the best-fitting model by minimizing the sum of squares of residuals. The process of using LSM to determine the weights of the composite model in this paper is as follows:

In the equation, *i* represents the type of prediction model, *k* represents the number of prediction models, and represents the measured value.

*Y*represents the predicted value of the combination model; and represent the weights of the SCSO-ESN model and the SCSO-GRU model, respectively; and describe the predicted values of the SCSO-ESN model and the SCSO-GRU model, respectively.

### MC process

MC is a stochastic dynamic process with the characteristic that future states do not depend on past conditions. It can compute the future form at a specific time based on the initial known state through the state transition probability matrix. Large random fluctuation issues can be effectively handled by it.

Finding the research object's state partition and transition probability is essential to understanding the MC correction model. The following are the precise steps for correction:

(1) Judgment status interval.

Based on the absolute value of the relative error between the measured and predicted values of the training set samples, divide the states of *k* sample space.

(2) The state transition probability matrix needs to be solved.

Among them, is the probability of state transitioning to in one step, is the number of samples in the state transition table, and is the frequency at which transfers to in one step.

(3) Solve residual correction.

is the correction error of state ; and are the upper and lower bounds of the interval of state , respectively.

represents the prediction results after MC correction, and represents the initial prediction results of the model.

### Cross-validation

Cross-validation is a technique for evaluating the generalization ability of machine learning models, which reduces overfitting and assists in effective model selection and parameter tuning by dividing the training and validation sets (Jiang & Chen 2016; Rohani *et al.* 2018). There are various types of cross-validation, and in this study, we used the fivefold cross-validation to divide the training and validation sets and select model parameters.

*α*with fivefold cross-validation is as follows:

Step 1: Divide the data into five equal parts.

*h*represents

*y*numerical values.

_{i}### Model performance evaluation indicators

*R*), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). The NSEC value shows the quality of the model simulation results, and higher NSEC values indicate the higher simulation quality of the model (Han & Morrison 2022b). Higher predictive accuracy is indicated by a model with an

*R-*value closer to 1, which measures the linear correlation between the model's actual value and its predicted value. The similarity between the predicted results of the model and actual values is displayed using MAE, with smaller MAE values indicating that the model is closer to the actual data. The relative size of the error is indicated by the MAPE, which is the ratio of error to actual value. A smaller MAPE value indicates the model's prediction is relatively accurate (Lee 2022). RMSE measures the degree of deviation between a model's predicted and actual values. The model's prediction error decreases, and the fitting effect improves as the closer the value is to 0. The closer the NSEC and

*R-*values are to 1, the higher the model's prediction accuracy; conversely, the closer the RMSE, MAPE, and MAE values are to 0, the smaller the model's prediction error and the better the fitting effect. The mathematical equations for the five evaluation indicators are shown below:

In the equations, is the measured value; is the predicted value, *f*_{avg} is the average of the measured values, and is the average of the predicted values.

### The modeling process of the proposed model

Step 1: Perform Pearson correlation analysis on the collected multifactor data and select factors with high correlation with runoff factors as input for the prediction model.

Step 2: Normalize the screening input factors and divide each into training and prediction sets in a 9:1 ratio.

Step 3: Establish parameters like the number of SCSO populations and maximum iteration times, and choose the four critical parameters of ESN and the number of GRU hidden layer nodes as the optimization objects for SCSO for population initialization.

Step 4: Using the fivefold cross-validation method, the training set is divided into five parts, with each part taking turns as the validation set and the remaining four parts as the training set. The average target value is obtained by cycling five times. Based on the average target value, adjust the position of the leader in the SCSO algorithm and update the position parameters for continuous optimization. Repeat the above process until the SCSO algorithm reaches the optimal target value, and the ESN and GRU models also determine the optimal parameters based on the results of the validation set.

Step 5: Next, use SCSO-ESN and SCSO-GRU to model and predict the multifactor data after assigning the determined optimal parameter values to ESN and GRU.

Step 6: Fit the predicted results of SCSO-ESN and SCSO-GRU using the LSM to obtain weights

*λ*_{SCSO-ESN}and*λ*_{SCSO-GRU}. The initial prediction results are obtained by weighting and summing the SCSO-ESN and SCSO-GRU prediction results based on their weights.Step 7: The relative error sequence can be obtained by computing the error sequence, which can be obtained by subtracting the original runoff sequence from the initial prediction sequence.

Step 8: Based on the relative error sequence, perform MC correction on the initial prediction result to obtain the final prediction result.

## CASE STUDIES

The pertinent details regarding Changshui and Lanxi stations are introduced in this section, along with the steps involved in applying the SCSO-ESN-GRU-MC model to the two experimental stations.

### Overview of the research area

^{2}, and it empties into the Yellow River north of Luokou in Gongyi City. The Luohe River Basin belongs to a warm temperate monsoon climate, with an average annual temperature of 12.1–14.5°C and an annual precipitation of 500–900 m. The Changshui station is at 111°26′E and 34°19′N in Liupo Village, Changshui Town, Luoning County, Henan Province. With an average annual rainfall of 530 mm and an average annual runoff of 8.17 billion m

^{3}, its watershed area is 874 km

^{2}. Changshui station has long been responsible for providing real-time water information and collecting hydrological data about the Luohe River to flood control command departments. It is a basic hydrological station in China. Located in the center of Heilongjiang Province, the Hulan River is a first-class tributary on the left bank of the Songhua River. It originates from the Taiping Mountains of the Xiaoxing'an Mountains in Tieli City. It is formed by the confluence of the Kayin and Numin rivers, ultimately flowing into the Songhua River. Its drainage area is 35,683 km

^{2}, and its total length is 523 km. With an average annual temperature of 4.0 °C, the Hulan River basin is located in the northern temperate zone's continental monsoon climate. The average annual runoff of the entire basin is 4.129 billion m

^{3}, with an average annual runoff depth of 133.3 mm and an average annual estuary flow of 93.75 m

^{3}/s. The precipitation season is mainly concentrated in July, August, and September, with 3 months accounting for 70% of the annual precipitation. Lanxi station is located 103 km downstream of the Hulan River in Suihua City, Heilongjiang Province, with a geographical location of 126°20′E and 46°15′N. It controls a drainage area of 2,770 km

^{2}. The geographical locations of Changshui station and Lanxi station are shown in Figure 5. Lanxi station is located in a semi-arid region, whereas Changshui station is located in a semi-humid region. The two hydrological stations have significant differences in terrain and landforms, and the areas of Changshui station and Lanxi station are affected by the monsoon climate, with significant differences in precipitation and drastic changes in runoff. They are typical representative hydrological stations. To confirm the effectiveness and generalizability of the suggested model, the runoff and associated meteorological data from the two hydrological stations mentioned above are chosen.

This study collected annual runoff data and 19 related factors affecting runoff at the Changshui station from 1960 to 2016 and the Lanxi station from 1956 to 2014. The 19 collected factors are extreme wind speed (EWS), minimum atmospheric pressure (APmin), minimum air temperature (ATmin), maximum atmospheric pressure (APmax), maximum air temperature (ATmax), average atmospheric pressure (AAP), average 2-min wind speed evapotranspiration (Ev), (A2WS), average air temperature (AAT), average water vapor pressure (AWVP), average relative humidity (ARH), average minimum air temperature (AATmin), average maximum air temperature (AATmax), daily precision ≥0.1 mm days (DP0.1), sunshine duration (SD), maximum wind speed (MWS), maximum daily precision (MDP), precision (P), and runoff (R). Table 1 displays pertinent data regarding Changshui station's and Lanxi station's annual runoff data. To confirm the effectiveness of the suggested model, the annual runoff data was chosen from Changshui station for 57 years and Lanxi station for 56 years. The maximum annual runoff of Changshui Station and Lanxi Station are 37.6160 × 10^{8} and 274.3226 mm, respectively.

Hydrologic station . | Max . | Min . | Mean . | Std . | Training sample . | Testing sample . |
---|---|---|---|---|---|---|

Changshui station | 37.6160 × 10^{8} m^{3} | 1.3200 × 10^{8} m^{3} | 8.2172 × 10^{8} m^{3} | 6.2005 | 46 | 11 |

Lanxi station | 274.3226 mm | 17.4719 mm | 120.4795 mm | 66.9498 | 45 | 11 |

Hydrologic station . | Max . | Min . | Mean . | Std . | Training sample . | Testing sample . |
---|---|---|---|---|---|---|

Changshui station | 37.6160 × 10^{8} m^{3} | 1.3200 × 10^{8} m^{3} | 8.2172 × 10^{8} m^{3} | 6.2005 | 46 | 11 |

Lanxi station | 274.3226 mm | 17.4719 mm | 120.4795 mm | 66.9498 | 45 | 11 |

### Correlation analysis of model inputs

*R*> 0) and a negative correlation (

*R*< 0). To determine the strength of the correlation, one typically uses the correlation coefficient

*R*'s absolute value. The larger the |

*R*|, the closer it is to 1, indicating that the corresponding potential impact factors correlate better with the predicted object. On the contrary, the smaller the |

*R*|, the closer it is to 0, indicating that the correlation between the corresponding potential impact factors and the predicted object is poorer. This paper selects factors with and believes that these factors correlate significantly with runoff factors. The correlation analysis results between the factors of Changshui station and Lanxi station are shown in Figures 6 and 7.

From Figure 6, it can be seen that among the 19 factors at the Changshui station, the factors include APmin, AAP, A2WS, AAT, ARH, AATmin, AATmax, DP0.1, MWS, and P, with 10 factors. As a result, the Changshui station's model validation used these 10 factors as inputs. The content shown in Figure 7 shows that among the 19 factors at the Lanxi station, the factors include ARH, AATmax, DP0.1, MSP, SD, P, and Ev. Due to the significant impact of AAP on precipitation and evaporation processes and patterns, which in turn have a significant impact on runoff depth, and the |*R*| value of AAP is 0.29, which is very close to 0.3, we have listed AAP as one of the factors affecting runoff at the Lanxi station. Therefore, the input factors selected for model validation at the Lanxi station are the above eight factors. Detailed information on the 10 factors inputted at the Changshui station and the eight factors inputted at the Lanxi station is shown in Appendix Table A1.

### Error correction process

The annual runoff series is very volatile, so the predictions made by a single prediction model are not accurate enough. As a result, to increase prediction accuracy and reduce uncertainty in the annual runoff prediction model, the MC is employed for error correction.

The above matrices and represent the state transition probability matrices of the SCSO-ESN-GRU model at the Changshui and Lanxi stations. The sum of probabilities of the predicted model's states in each sample space can be computed using the and matrices. The state with the maximum sum of state probabilities is where the error of the model's runoff prediction value lies. According to and matrices, the error state intervals of the SCSO-ESN-GRU model at Changshui and Lanxi stations are [10%, 50%) and [5%, 35%), respectively. Ultimately, the final prediction results following MC correction are obtained by applying Equations (16) and (17) to the initial annual runoff prediction results of the SCSO-ESN-GRU model.

### Model development

The experimental process of all models in this paper is conducted on Windows 10 (64-bit), Intel (R) Core (TM) i7-8700 CPU@3.20 on GHz devices using MATLAB 2022b.

This paper selects a total of 14 models, including SCSO-ESN-MC (M 2), SCSO-GRU-MC (M 3), SCSO-ESN-GRU (M 4), SCSO-GRU (M 5), GWO-GRU (M 6), whale optimization algorithm (WOA)-GRU (M 7), SCSO-ESN (M 8), GWO-ESN (M 9), WOA-ESN (M 10), GRU (M 11), ESN (M 12), BP (M 13), LSSVM (M 14), and ELM (M 15), to use annual runoff data from Changshui and Lanxi stations with the SCSO-ESN-GRU-MC (M 1) model. Comparative experiments are carried out to illustrate the suggested model's superior performance. The model and 14 comparative models are referred to as M 1–15 to make representation in the text easier because of the abundance of benchmarking models used in this paper.

The comparative model's prediction procedure looks like this:

(1) Single prediction model

The original annual runoff serves as the input object for the ESN model, and the model's input variables are chosen by experimental findings from two experimental stations in Section 3.2. To get the final prediction results, the training set and prediction set are split. GRU, BP, LSSVM, and ELM's prediction process is similar to ESN's. To demonstrate the benefits of combining multiple prediction models and the drawbacks of a single prediction model in annual runoff prediction, five single prediction models, ESN, GRU, BP, LSSVM, and ELM, are chosen to compare with this model.

(2) Optimized prediction model

For the SCSO-ESN and SCSO-GRU models, the input object is the original annual runoff. The model's input variables refer to Section 3.2, and the training and prediction sets are divided. The optimal parameter values are searched using the SCSO optimization algorithm and assigned to the ESN network and GRU. Subsequently, the ESN network and GRU output the prediction results, respectively. The prediction processes of GWO-GRU, WOA-GRU, GWO-ESN, and WOA-ESN are similar to those of SCSO-optimized ESN and GRU, except for replacing algorithms with different performances. By using the SCSO-ESN, SCSO-GRU, GWO-GRU, WOA-GRU, GWO-ESN, and WOA-ESN models, the optimization performance benefits of the SCSO algorithm are highlighted, along with the speed and accuracy with which optimization algorithms screen prediction model parameters. Except for the absence of an MC model for error correction, the steps of the SCSO-ESN-GRU model are comparable to the model put forth in this paper. This model is selected to highlight the benefits of combining least squares prediction models with other prediction models, as well as the improvement effect of this coupled on prediction results accuracy.

(3) Error correction model

The following are the precise steps of the SCSO-ESN-MC and SCSO-GRU-MC models: To get the first prediction results, feed the initial annual runoff into the SCSO-ESN and SCSO-GRU models. Divide the relative error absolute value of the initial prediction result into state intervals, obtain the state interval according to Figures 8(b), 8(c), 9(b), and 9(c), and then obtain the state transition probability matrix. Finally, obtain the final prediction result using Equations (16) and (17). The state intervals and probability transition matrix results of the SCSO-ESN-MC and SCSO-GRU-MC models for MC correction at two stations are shown in Appendix Table A2. The above two models are selected to demonstrate the effectiveness of MC correction methods in different models and datasets. To fairly compare the prediction processes of 15 models at two stations, the parameter settings of all models are presented in Appendix Table A2.

## RESULTS ANALYSIS AND DISCUSSION

This section includes the evaluation results of five evaluation indicators, the discussion and analysis of all models, and the prediction results of all models at two experimental stations.

### Analysis of prediction results

At the same time, this article uses the Wilcoxon signed rank test to test the significance of the model's prediction results with the original runoff to more clearly demonstrate the predictive performance of the model. Table 2 presents the Wilcoxon test results for different models and original runoff, all of which are based on a significance level of 95% (*α* = 0.05). The H column displays the result of rejecting the null hypothesis, where 1 represents rejection (i.e., believing there is a significant difference between the two sets of data) and 0 represents inability to reject (i.e., there is no significant difference between the two sets of data). From Table 2, it can be seen that the SCSO-ESN-GRU-MC model is 0 in the H column of both Lanxi Station and Changshui Station, indicating that there is no significant difference between the predicted results and the original runoff. This also indicates that the SCSO-ESN-GRU-MC model has a good fitting effect and strong generalization ability.

Model . | Lanxi station . | Changshui station . | ||||||
---|---|---|---|---|---|---|---|---|

Train period . | Test period . | Train period . | Test period . | |||||

p-value
. | H . | p-value
. | H . | p-value
. | H . | p-value
. | H . | |

ELM | 0.8083 | 0 | 0.3203 | 0 | 0.3732 | 0 | 0.3652 | 0 |

LSSVM | 0.7053 | 0 | 0.2061 | 0 | 0.3674 | 0 | 0.0137 | 1 |

BP | 0.0008 | 1 | 0.8984 | 0 | 0.0013 | 1 | 0.1230 | 0 |

ESN | 0.1005 | 0 | 0.5771 | 0 | 0.0049 | 1 | 0.0537 | 0 |

GRU | 0.9865 | 0 | 0.3203 | 0 | 0.4542 | 0 | 0.0137 | 1 |

WOA-ESN | 0.1128 | 0 | 0.6377 | 0 | 0.0040 | 1 | 0.0420 | 1 |

GWO-ESN | 0.0852 | 0 | 0.5771 | 0 | 0.0044 | 1 | 0.1016 | 0 |

SCSO-ESN | 0.0515 | 0 | 0.4648 | 0 | 0.0000 | 1 | 0.3203 | 0 |

WOA-GRU | 0.8611 | 0 | 0.1230 | 0 | 0.8398 | 0 | 0.7646 | 0 |

GWO-GRU | 0.9865 | 0 | 0.1475 | 0 | 0.7722 | 0 | 0.7002 | 0 |

SCSO- GRU | 0.9415 | 0 | 0.2402 | 0 | 0.8313 | 0 | 1.0000 | 0 |

SCSO-ESN-GRU | 0.3233 | 0 | 0.3203 | 0 | 0.0384 | 1 | 0.7646 | 0 |

SCSO- GRU-MC | 0.3695 | 0 | 0.7002 | 0 | 0.3502 | 0 | 0.6377 | 0 |

SCSO-ESN-MC | 0.0081 | 1 | 0.3203 | 0 | 0.0007 | 1 | 0.3203 | 0 |

SCSO-ESN-GRU-MC | 0.0736 | 0 | 0.6377 | 0 | 0.1603 | 0 | 0.2402 | 0 |

Model . | Lanxi station . | Changshui station . | ||||||
---|---|---|---|---|---|---|---|---|

Train period . | Test period . | Train period . | Test period . | |||||

p-value
. | H . | p-value
. | H . | p-value
. | H . | p-value
. | H . | |

ELM | 0.8083 | 0 | 0.3203 | 0 | 0.3732 | 0 | 0.3652 | 0 |

LSSVM | 0.7053 | 0 | 0.2061 | 0 | 0.3674 | 0 | 0.0137 | 1 |

BP | 0.0008 | 1 | 0.8984 | 0 | 0.0013 | 1 | 0.1230 | 0 |

ESN | 0.1005 | 0 | 0.5771 | 0 | 0.0049 | 1 | 0.0537 | 0 |

GRU | 0.9865 | 0 | 0.3203 | 0 | 0.4542 | 0 | 0.0137 | 1 |

WOA-ESN | 0.1128 | 0 | 0.6377 | 0 | 0.0040 | 1 | 0.0420 | 1 |

GWO-ESN | 0.0852 | 0 | 0.5771 | 0 | 0.0044 | 1 | 0.1016 | 0 |

SCSO-ESN | 0.0515 | 0 | 0.4648 | 0 | 0.0000 | 1 | 0.3203 | 0 |

WOA-GRU | 0.8611 | 0 | 0.1230 | 0 | 0.8398 | 0 | 0.7646 | 0 |

GWO-GRU | 0.9865 | 0 | 0.1475 | 0 | 0.7722 | 0 | 0.7002 | 0 |

SCSO- GRU | 0.9415 | 0 | 0.2402 | 0 | 0.8313 | 0 | 1.0000 | 0 |

SCSO-ESN-GRU | 0.3233 | 0 | 0.3203 | 0 | 0.0384 | 1 | 0.7646 | 0 |

SCSO- GRU-MC | 0.3695 | 0 | 0.7002 | 0 | 0.3502 | 0 | 0.6377 | 0 |

SCSO-ESN-MC | 0.0081 | 1 | 0.3203 | 0 | 0.0007 | 1 | 0.3203 | 0 |

SCSO-ESN-GRU-MC | 0.0736 | 0 | 0.6377 | 0 | 0.1603 | 0 | 0.2402 | 0 |

### Analysis of evaluation index results

This paper evaluates the prediction accuracy of 15 models using five evaluation indicators: MAE, RMSE, MAPE, NSEC, and *R*. Appendix Table A3 displays the specific values. The predictive performance of the model can be better explained and illustrated by these five evaluation indicators. Appendix Table A3 analysis reveals the following:

(1) Compared to other coupled models, the prediction accuracy of a single model is lower. The test set of the Lanxi station in Appendix Table A3 shows that the NSEC and

*R-*values of the five single prediction models are lower than those of other coupled models, while the MAE, RMSE, and MAPE values are relatively high. This suggests that in time series with high volatility, the prediction outcomes of a single prediction model have large errors and are challenging to evaluate. The five single prediction models' NSEC values are all negative, with*R-*values not going above 0.5, according to the Changshui station test set. This suggests that the five individual prediction models' prediction results have a low prediction effect and a weak correlation with the original data. Simultaneously, it illustrates how flawed the single prediction model is in capturing the volatile annual runoff data information. The ESN and GRU models have the potential to become more integrated models, as evidenced by the fact that their prediction results at two experimental stations outperformed those of the BP, LSSVM, and ELM models.(2) Compared to a single prediction model, the optimized prediction model has a higher prediction accuracy. For instance, the NSEC values of the SCSO-GRU, GWO-GRU, and WOA-GRU models all exceeded 0, and the

*R-*values all exceeded 0.5 during the Changshui station testing phase. Compared with the GRU model, the RMSE value decreases by 32.95, 27.40, and 23.61%, respectively. Compared with the ESN model, the MAPE value reduces by 53.30, 42.49, and 36.02%, respectively. Compared with the BP model, the MAE value decreases by 43.59, 34.58, and 29.93%, respectively. Compared with the LSSVM model, the NSEC value increases by 148.10, 132.61, and 121.29%, respectively. Compared with the ELM model, the*R-*value increase by 24.32, 9.66, and 10.89%, respectively. At the Lanxi station, for the GRU, ESN, BP, LSSVM, and ELM models, the RMSE values of the SCSO-ESN model decrease by 13.28, 14.09, 16.30, 18.06, and 25.07%, respectively. The*R-*values of the GWO-ESN model increase by 4.66, 2.81, 6.80, 4.31, and 3.19%, respectively. The NSEC values of the WOA-ESN model increase by 2.58, 3.44, 5.99, 8.26, and 20.40%, respectively. It is evident that applying optimization algorithms to parameter optimization speeds up the process of determining parameters and greatly increases prediction accuracy.(3) There are notable differences between the prediction outcomes that arise from utilizing distinct optimization algorithms to optimize the parameters of the prediction model. For the GWO-ESN model at the Changshui station, the NSEC value of the SCSO-ESN model increases by 124.42%, while the RMSE and MAE values decrease by 15.31 and 15.35%, respectively. The SCSO-ESN model's RMSE and MAPE values at the Lanxi station drop by 4.01 and 1.92%, respectively. For the WOA-ESN model, at the Changshui station, the

*R-*value of the SCSO-ESN model increases by 7.39%, while the MAE and MAPE values decrease by 17.70 and 20.19%, respectively. The SCSO-ESN model's*R*and NSEC values increase by 2.12 and 8.00%, respectively, at the Lanxi station. In contrast to the WOA-GRU model, the SCSO-GRU model at the Changshui and Lanxi stations have RMSE values that are 12.23 and 7.53% lower, an*R-*value that is 12.14% higher and 1.11% lower, an MAE value that is 19.50% lower and 9.67% lower, and an NSEC value that is 125.94% higher and 4.99% lower, respectively. This paper presents the fitness curves of the three algorithms in Figure 12 to compare the optimization performance of the three algorithms in various prediction models and experimental stations more effectively. Figure 12 shows that the SCSO algorithm's fitness value is lowest in the parameter optimization of both GRU and ESN, as well as in the experimental results at the Changshui and Lanxi stations. Combining Appendix Table A3 and Figure 12, it is shown that the SCSO algorithm performs a parameter search by simulating the hunting behavior of felines. Compared with the WOA algorithm, simulating the feeding behavior of whales, the GWO algorithm, simulating the collaborative predation behavior of wolf packs, has a faster convergence speed, stronger global search ability, and strong adaptability. This also enables the SCSO algorithm to achieve better prediction results in ESN and GRU parameter optimization.(4) Compared to a single optimized prediction model, the coupled prediction model achieves a higher accuracy. For the Changshui station, the

*R-*value of the SCSO-ESN-GRU-MC model is 0.9097 and the MAE value is 0.6907. Compared with the SCSO-ESN-MC and SCSO-GRU-MC models, the*R-*value increases by 10.11 and 3.81%, respectively, while the MAE value decreases by 19.32 and 25.99%, respectively; The RMSE and MAPE of the SCSO-ESN-GRU model decrease by 20.16, and 33.03%, and*R*increases by 17.82% compared to the SCSO-ESN model, respectively. Compared to the SCSO-GRU model, NSEC and*R*increase by 17.19 and 5.41%, respectively, while RMSE decreases by 4.70%. At the Lanxi station, compared with the SCSO-ESN-MC model and the SCSO-GRU-MC model, the*R*and NSEC values of the SCSO-ESN-GRU-MC model increase, while the MAE, RMSE, and MAPE values decrease. Compared with the SCSO-ESN model and the SCSO-GRU model, the RMSE value of the SCSO-ESN-GRU model decreases by 4.05 and 5.91%, respectively, while the NSEC value increases by 2.23 and 3.38%, respectively. As can be seen from the above, the data evaluation index values of the MC corrected LSM and the SCSO-ESN-GRU model after weighting are significantly higher than those of the MC corrected single optimized prediction model and the single optimized prediction model (SCSO-ESN, SCSO-GRU). This suggests that the advantages and disadvantages of the two separate prediction models are balanced by the prediction model combined with the least squares approach. In addition, it successfully reduces the volatility of a single optimized prediction model's prediction results, improving the MC correction's data foundation.(5) The SCSO-ESN-GRU-MC model demonstrates encouraging results in all evaluation indicators, and the prediction results of the coupled model based on MC error correction are superior to other comparative models. For example, at the Changshui station, the

*R*and NSEC values of the SCSO-ESN-GRU-MC model are 0.9097 and 0.8102, respectively, which are much higher than the 0.6454 and 0.4083 values of the SCSO-ESN-GRU model. Compared with the SCSO-ESN model, the NSEC value of the SCSO-ESN-MC model increases from 0.0718 to 0.6750 and the MAE value decreases from 1.8914 to 0.8561. At the Lanxi station, the MAPE and RMSE values of the SCSO-ESN-GRU-MC model decrease by 49.51 and 54.04%, respectively. Compared to the SCSO-ESN-GRU model, the NSEC value increases by 7.01%. The*R-*value of the SCSO-ESN-MC model is 0.9645, which is much higher than the 0.9039 of the SCSO-ESN model. The RMSE value of the SCSO-GRU-MC model is 23.1424, which is lower than the 32.6657 of the SCSO-GRU model. From this, it can be concluded that the MC can effectively reduce error fluctuations and obtain higher accuracy prediction results due to its adaptability and error correction based on error states.

To further analyze the predictive performance of all models, this paper presents scatter density plots of measured and predicted annual runoff during the testing period at two experimental stations, as shown in Appendix Figures A1 and A2. Among them, *R*^{2} in Appendix Figures A1 and A2 is the coefficient of certainty, representing the proportion of variation in the dependent variable *y* that the independent variable *x* can explain. It can be intuitively seen that as the parameters of the ESN and GRU models are optimized using the SCSO method, and the prediction model is coupled with the LSM and error corrected using the MC, the contour deviation gradually decreases, *R*^{2} gradually increases, and the fitting effect gradually improves. The measured and predicted annual runoff values of the SCSO-ESN-GRU-MC model gradually approach. This outcome is in line with Appendix Table A3's model performance findings. It is evident from the comparison above that enhancing prediction accuracy requires the optimization parameters of the SCSO algorithm, the LSM coupled prediction model, and the MC error correction.

### Error analysis

This paper plots the prediction error sequences of two experimental stations into cloud rain maps and stacked bar charts, shown in Appendix Figures A3 and A4, to more clearly demonstrate the correction effect of the MC. Appendix Figure A3 shows that (1) a single prediction model's kernel density curve contains several peaks, with peak positions greatly deviating from the 0-scale line, and the dispersed distribution of error data is not concentrated, with many extreme deviation values. This suggests that it is not ideal for a single prediction model to be able to reduce the volatility of error sequences. (2) The peak shapes of the kernel density curves of the SCSO-ESN-GRU-MC, SCSO-ESN-MC, and SCSO-ESN-MC models are steep and narrow. The distribution of scattered points is highly concentrated, and the box plot's median line is substantially near the 0-scale line. These results suggest that the prediction error when employing the MC for error correction is quite low. However, the SCSO-ESN-GRU-MC model has fewer error outliers than the SCSO-ESN-MC and SCSO-GRU-MC models, indicating that coupling the prediction model with the LSM can, to some extent, reduce the impact of errors. Each model's annual prediction error is displayed in Appendix Figure A4 with clarity. Appendix Figure A4 shows that the model suggested in this paper has the lowest prediction error in the majority of years, and the error values fall within an acceptable range. This suggests that the MC is a useful tool for lessening the impact of errors and raising the accuracy of coupled model predictions.

### Discussion

To validate the effectiveness of a novel coupled prediction model based on MC correction, this study applies it to two experimental stations located in various watersheds. Although the prediction results of a single prediction model are much lower than those of a coupled model, the ESN and GRU models stand out among the prediction results of other single prediction models due to their stability and efficiency. They also provide a solid basis for subsequent coupled models, as can be seen from the experimental results displayed in Figures 10–14, Appendix Figures A1–A4, and Appendix Table A3. However, due to the crucial impact of the parameters of ESN and GRU models on their prediction results, the time-consuming and laborious traditional parameter determination methods, and unsatisfactory results, this paper adopts the SCSO algorithm to optimize the prediction model parameters. In the SCSO algorithm, the strong exploration ability of each cat is utilized to search for new solution spaces, and the balance between exploration and utilization is appropriately adjusted to find a good balance between global search and local optimization, thereby achieving the optimal prediction results of the prediction model. Due to the long period and multiple influencing factors, the annual runoff changes sharply, which greatly tests the prediction model's performance. Although GRU and ESN models can predict annual runoff, the predicted results have different characteristics and deviations. Therefore, coupling the two models using the LSM can obtain diverse model outputs. Such diversity can lower the uncertainty and error of a single model and offer more thorough information and dependable prediction results. The problem of prediction accuracy not reaching the ideal state due to errors caused by internal structural issues in the model has not yet been resolved despite the combined prediction model producing good prediction results. Therefore, this paper adopts an MC model with a state interval structure for error correction. It is evident from the outcomes of the two experimental stations that the MC model's post-correction prediction accuracy has greatly increased. At the same time, to objectively and rationally analyze the predictive performance of the model proposed in this article, we also selected the research results of hydrological stations using the same research area in previous literature. Compared with the ESMD-SE-CEEMDAN-LSTM model in Wang's article (Wang *et al.* 2021), the MAE, RMSE, and MAPE of the proposed model for Changshui Station are 0.691, 0.955, and 16.184, respectively, which are significantly better than the ESMD-SE-CEEMDAN-LSTM model's 1.703, 1.827, and 61.345. The model proposed in this article has better predictive performance and more accurate prediction results, proving the superiority of the method. As can be seen from the discussion above, the SCSO-optimized coupled prediction model plus MC error correction combine to produce prediction results that are closer to the original values and with satisfactory prediction accuracy.

## CONCLUSION

This study proposes a coupling model, named SCSO-ESN-GRU-MC, based on SCSO algorithm, ESN and GRU models, the least squares coupling method, and MC error correction to address the issue of great difficulty and low prediction accuracy in annual runoff prediction. First, a correlation analysis is conducted on the data of multiple factors related to runoff to determine the input of the prediction model. Second, the SCSO-ESN and SCSO-GRU models are built by optimizing the parameters of ESN and GRU using the SCSO optimization algorithm. Then, the selected factors, such as average relative humidity, daily precision ≥0.1 mm days, and precision, are used as inputs for the SCSO-ESN and SCSO-GRU models for annual runoff prediction. The initial prediction results are then obtained by coupling the SCSO-ESN and SCSO-GRU model prediction results using the LSM. The final result is then obtained by applying MC error correction to the original prediction results. The following important conclusions are drawn through experiments on the annual runoff data of Changshui station and Lanxi station:

(1) The prediction model in this study produced good prediction results by utilizing an echo state structure with fixed weights, an ESN network with high adaptability, and a GRU network with an effective gating mechanism and capacity to capture long-term dependencies.

(2) The SCSO algorithm can adapt to uncertain factors in data, solve the problem of parameter determination in ESN and GRU models by dynamically searching and adjusting parameters, and improve the robustness and generalization ability of prediction models.

(3) By using the LSM to complement the advantages of ESN and GRU models in predicting the original annual runoff series, it is possible to more comprehensively capture different time scales and patterns in the time series and improve the modeling ability.

(4) Experimental results have shown that using a MC with a unique state interval structure for error correction achieves the best results in all evaluation indicators, indicating that it can effectively reduce errors and significantly improve the prediction accuracy of the coupled model.

This study highlights that the accuracy of predicting annual runoff can be greatly increased by employing a coupled neural network method based on MC correction. The research results demonstrate that this coupled model fully utilizes the searchability of optimization algorithms, the generalization characteristics of neural network models, and the correction ability of MC correction errors. This comprehensive strategy has practicality and efficiency and provides a new approach for multifactor input annual runoff prediction models.

## ACKNOWLEDGEMENTS

The authors are grateful for the support of the special project for collaborative innovation of science and technology in 2021 (No: 202121206).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.