Abstract
The prediction of river runoff is crucial for flood forecasting, agricultural irrigation and hydroelectric power generation. To address the non-linear and seasonal features of runoff data, a coupled runoff prediction model based on the Gravitational Search Algorithm (GSA) and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is proposed. GSA has a strong global search capability, while the SARIMA model can be adjusted in real time as historical data accumulate and is suited to time series with seasonal variation. The resulting GSA-SARIMA model was applied to runoff prediction for the Xianyang section of the Wei River. The GSA-SARIMA model achieves a linear correlation coefficient of 0.9351, a Nash efficiency coefficient of 0.91, a mean relative error of 6.57% and a root mean square error of 0.21. It outperforms the other models developed on all evaluation indicators, indicating that it is feasible for practical runoff prediction and offers a new approach to the problem.
HIGHLIGHTS
The Mann-Kendall trend test is used to locate the split point between the training and prediction datasets. This avoids an overly small test set while improving the generalisation of the model.
The SARIMA model is an improvement on the ARIMA model and allows for convenient real-time adjustment of the model.
GSA is well suited to optimising model parameters and has a strong global search capability.
INTRODUCTION
River runoff is a critical component of the hydrological system. It is influenced by many factors, such as geographical location, topography and climate, and is highly non-linear and seasonal (Wu et al. 2020). Runoff prediction plays an increasingly important role in flood forecasting, agricultural irrigation and hydroelectric power generation, and has attracted researchers in China and abroad. Wang et al. (2015) coupled an autoregressive integrated moving average (ARIMA) model with ensemble empirical mode decomposition (EEMD) and applied it to runoff series from three reservoirs in China (Biliuhe, Dahuofang and Mapanshan Reservoirs), showing that the coupled model significantly improved prediction accuracy. Zhang et al. (2019) developed a model combining modified ensemble empirical mode decomposition (MEEMD) with ARIMA to predict runoff in the lower Yellow River and found that the MEEMD-ARIMA model was more accurate than the other models tested. Rizeei et al. (2018) combined the Land Transformation Model (LTM) with an ARIMA model to predict rainfall in 2020; they found a decreasing trend from 2000 to 2015 that was projected to continue. Hao et al. (2017) analysed runoff from the Three Gorges Project statistically and then built an ARIMA model to predict it; the analysis revealed large abrupt changes in 1991 and 2001 and a variation period of about 10 years. Wang et al. (2018) predicted the daily runoff time series at Tang Naihai station, China, using empirical mode decomposition (EMD) and ARIMA models, and showed that the approach was effective and suitable for hydrological prediction of long series. To predict daily influent flows, Boyd et al. (2019) validated ARIMA predictions against measured data from five North American wastewater treatment plants; the ARIMA model achieved satisfactory predictions, broadening its range of applications. To investigate the influence of climate on runoff, Banihabib et al. (2018) predicted monthly runoff with an ARIMA model that took climate-related indices as input signals; this coupled model yielded higher accuracy than the conventional ARIMA model. To compare the predictive power of the ARIMA model with that of the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, Valipour (2015) predicted runoff for 2011 from 1901 to 2010 data for each US state; the results indicated a runoff cycle of about 20 years in the US. Razak et al. (2018) analysed rainfall and runoff trends in the Segamat River (Malaysia) with the Mann-Kendall (MK) trend test and developed an ARIMA model for flood prediction; the trend analysis showed increasing rainfall, and the ARIMA(0,1,2) model gave the best forecasts. Nury et al. (2017) developed a temperature prediction model combining wavelet techniques with ARIMA, based on MK tests of temperature time series in northeastern Bangladesh, and showed that the wavelet-ARIMA model outperformed a wavelet-artificial neural network (ANN) model. Niu et al. (2020) proposed a hybrid model to forecast the annual runoff time series of three large hydropower reservoirs in China, using the gravitational search algorithm (GSA) to select the optimal model parameters; the optimised model proved more competitive. Niu et al. (2019) optimised an extreme learning machine (ELM) using an improved gravitational search algorithm (IGSA) with selection and mutation operators, and the coupled model predicted monthly runoff time series significantly better than the other models considered.
Previous studies have mainly focused on enhancing and optimising models. Some researchers couple a prediction model with a decomposition model or an optimisation model, while others improve the original model itself. This study does both: it introduces seasonal factors to improve the ARIMA model itself, and it applies GSA to optimise the parameters of the resulting SARIMA model. Coupling SARIMA with GSA yields the GSA-SARIMA model, which can be adjusted in real time and has a powerful global search capability. The objective of this study is to improve the predictive simulation of river runoff.
RESEARCH METHODOLOGY
Mann-Kendall trend test
For monthly runoff prediction, the time series is commonly split into training and prediction sets by a fixed ratio (Zhang et al. 2021). Because monthly runoff records are small samples, a ratio-based split can leave the training and prediction sets with different trends. A statistical test is therefore used to examine the trend of the monthly runoff data and to determine whether it contains abrupt change points. Using an abrupt change point as the cut-off between the training and prediction datasets avoids an overly small test set and improves the generalisation ability of the model.

The trend in $x$ is tested by comparing $|UF_n|$ with the critical value $U_{1-\alpha/2}$ of the standard normal distribution at significance level $\alpha$, where $\alpha$ is the probability, tolerated by the MK test, of wrongly rejecting the null hypothesis; $\alpha$ lies in $(0, 0.5)$ and is usually taken as $\alpha = 0.05$ (Alifujiang et al. 2020). If $|UF_n| > U_{1-\alpha/2}$, the null hypothesis is rejected; otherwise it is accepted. When the null hypothesis is rejected, the series has an upward trend if $UF_n > 0$ and a downward trend if $UF_n < 0$. To detect abrupt change points, the time series is taken in reverse order and the procedure is repeated to obtain the statistic series $UB_k$. An intersection of the $UF_k$ and $UB_k$ curves that lies within the confidence interval $\pm U_{1-\alpha/2}$ ($\pm 1.96$ for $\alpha = 0.05$) marks an abrupt change point in the time series.
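The sequential MK procedure described above can be sketched in Python as follows. The rank statistic $s_k$ (the running count of earlier values exceeded by each new value), its mean $k(k-1)/4$ and its variance $k(k-1)(2k+5)/72$ follow the commonly used definitions and are assumptions here, as are the function names. Ties are simply not counted.

```python
import numpy as np


def sequential_uf(x):
    """Forward sequential Mann-Kendall statistic UF_k, with UF_1 = 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    uf = np.zeros(n)
    s = 0.0
    for k in range(1, n):
        s += np.sum(x[k] > x[:k])            # earlier values exceeded by x_k
        m = k + 1                            # number of values used so far
        mean = m * (m - 1) / 4.0
        var = m * (m - 1) * (2 * m + 5) / 72.0
        uf[k] = (s - mean) / np.sqrt(var)
    return uf


def sequential_mk(x):
    """Return UF_k and UB_k; UB_k uses the reversed series, sign-flipped and realigned."""
    x = np.asarray(x, dtype=float)
    uf = sequential_uf(x)
    ub = -sequential_uf(x[::-1])[::-1]
    return uf, ub


def change_points(uf, ub, z_crit=1.96):
    """0-based indices k where UF and UB cross (between k and k+1) inside +/- z_crit."""
    diff = np.asarray(uf) - np.asarray(ub)
    points = []
    for k in range(len(diff) - 1):
        crosses = diff[k] == 0 or diff[k] * diff[k + 1] < 0
        if crosses and abs(uf[k]) < z_crit:
            points.append(k)
    return points
```

Applied to a 120-month runoff series, `change_points(*sequential_mk(runoff))` would list candidate split points; indices are 0-based, so index k corresponds to month k + 1.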
Autoregressive integrated moving average (ARIMA) model




Seasonal autoregressive integrated moving average (SARIMA) model
The seasonal autoregressive integrated moving average (SARIMA) model extends the ARIMA model. It is mainly used to analyse time series with seasonal variation or with cyclical variation caused by other factors (Moeeni et al. 2017). The model can easily be re-estimated in real time as more historical data become available, and the residuals of the fit are incorporated into the analysis. Its main advantage is high accuracy in short-term prediction (Fashae et al. 2018). SARIMA models are widely used in energy, climate, medicine and economics, and especially for hydrological time series with seasonal variation.
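For reference, a SARIMA$(p,d,q)(P,D,Q)_s$ model is commonly written in the multiplicative form below. The notation is the standard one and is assumed here: $B$ is the backshift operator ($Bx_t = x_{t-1}$), $s$ is the seasonal period (12 for monthly runoff), $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials, $\Phi_P$ and $\Theta_Q$ their seasonal counterparts, and $\varepsilon_t$ is white noise. The sign convention for the MA polynomials varies between texts. Setting $P = D = Q = 0$ recovers ARIMA$(p,d,q)$.

$$
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,x_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
$$

$$
\phi_p(B) = 1 - \sum_{i=1}^{p}\phi_i B^{i},\qquad \Phi_P(B^{s}) = 1 - \sum_{i=1}^{P}\Phi_i B^{is},\qquad \theta_q(B) = 1 + \sum_{j=1}^{q}\theta_j B^{j},\qquad \Theta_Q(B^{s}) = 1 + \sum_{j=1}^{Q}\Theta_j B^{js}
$$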




Gravitational search algorithm (GSA)
GSA is based on Newton's law of gravitation (Wang et al. 2020). It treats the search for an optimal solution as a set of particles (agents) moving in the search space, each of which is simultaneously subject to the gravitational attraction of the others. The larger a particle's mass, the better its position in the search space; heavier particles therefore pull the others towards better regions, and in this way the optimal solution to the problem is found. GSA is well suited to searching for optimal model parameters and has a good global search capability (Pelusi et al. 2019). The detailed process can be described as follows:
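A minimal Python sketch of a basic GSA that minimises a function over a box is given below. It assumes the widely used formulation: masses from min-max normalised fitness, a gravitational constant $G(t) = G_0 e^{-\alpha t/T}$, a set of Kbest attracting agents that shrinks linearly to one, the velocity update $v \leftarrow \text{rand}\cdot v + a$, and random re-initialisation of coordinates that leave the box. The parameter values ($G_0 = 100$, $\alpha = 20$) and the function name are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def gsa_minimize(f, bounds, n_agents=20, n_iter=200, g0=100.0, alpha=20.0, seed=0):
    """Minimise f over a box with a basic gravitational search algorithm (sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T        # bounds: [(low, high), ...] per dimension
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_agents, dim))     # agent positions
    v = np.zeros_like(x)                              # agent velocities
    best_x, best_f = x[0].copy(), np.inf
    eps = 1e-12
    for t in range(n_iter):
        fit = np.array([f(xi) for xi in x])
        if fit.min() < best_f:                        # keep the best solution seen so far
            best_f, best_x = float(fit.min()), x[fit.argmin()].copy()
        best, worst = fit.min(), fit.max()
        m = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)
        mass = m / m.sum()                            # better fitness -> heavier agent
        g = g0 * np.exp(-alpha * t / n_iter)          # gravitational constant decays over time
        k = max(1, round(n_agents - (n_agents - 1) * t / max(n_iter - 1, 1)))
        kbest = np.argsort(fit)[:k]                   # only the k heaviest agents attract others
        acc = np.zeros_like(x)
        for i in range(n_agents):
            for j in kbest:
                if j == i:
                    continue
                diff = x[j] - x[i]
                dist = np.linalg.norm(diff)
                # acceleration = force / own mass, so only the attracting mass M_j remains
                acc[i] += rng.random(dim) * g * mass[j] * diff / (dist + eps)
        v = rng.random((n_agents, dim)) * v + acc
        x = x + v
        out = (x < lo) | (x > hi)                     # re-sample coordinates that leave the box
        x = np.where(out, rng.uniform(lo, hi, size=x.shape), x)
    return best_x, best_f
```

For example, `gsa_minimize(lambda p: float(np.sum(p ** 2)), [(-5, 5), (-5, 5)])` should return a point close to the origin. To search model orders, as in this study, each continuous position would have to be mapped to integer orders before the model is fitted and scored.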







Modelling process
The overall modeling process of this study is depicted in Figure 1.
Evaluation criteria
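The indicators used in this study to evaluate the models are the linear correlation coefficient (R), the Nash efficiency coefficient (NSE), the mean relative error (MRE) and the root mean square error (RMSE). Their standard definitions are sketched below. Treating MRE as the mean of absolute relative errors in percent, with relative error defined as (predicted − observed)/observed × 100, is an assumption, as is the function name `evaluate`.

```python
import numpy as np


def evaluate(obs, pred):
    """Standard goodness-of-fit indicators for observed vs predicted runoff."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    re = err / obs * 100.0                                   # relative error in percent
    mre = np.mean(np.abs(re))                                # mean absolute relative error (assumed)
    rmse = np.sqrt(np.mean(err ** 2))                        # root mean square error
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)  # Nash-Sutcliffe efficiency
    r = np.corrcoef(obs, pred)[0, 1]                         # Pearson correlation coefficient
    return {"R": r, "NSE": nse, "MRE": mre, "RMSE": rmse}
```

A perfect prediction gives R = 1, NSE = 1 and MRE = RMSE = 0; NSE falls below 0 when the model predicts worse than the mean of the observations.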


CASE STUDY
Study area
Mann-Kendall (MK) test
The statistical parameters of the MK trend test for the runoff series at the Xianyang hydrological station were obtained with the method described in section 2.1 and are listed in Table 1. The runoff volume over time and its first-order linear trend line are plotted in Figure 3. At the 0.05 significance level, the null hypothesis is rejected when $|UF_k| > 1.96$. The UFk value of −0.116 has an absolute value below 1.96, so the trend does not pass the significance test at the 95% confidence level: the runoff series shows a decreasing trend, but the trend is not significant.
Statistical parameters of MK trend test for runoff series at Xianyang hydrological station
Correlation coefficient r | Median β | S | Var(S) | UFk | UBk |
---|---|---|---|---|---|
0.091 | 9.750 | −50 | 194,366.667 | −0.116 | −0.111 |
As seen in Figure 3, the monthly average runoff at Xianyang station from 2010 to 2020 is highly irregular, with two large peaks in months 29 and 89. The series is also strongly seasonal, with runoff changing sharply from one season to the next. The correlation coefficient r of the linear trend line is only 0.091, far below 1, which indicates that a linear trend explains very little of the variation and is consistent with the non-linear behaviour of the runoff series. Such non-linear, seasonal series are difficult to predict.
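As a cross-check on Table 1, the classic (non-sequential) MK statistic can be computed as sketched below, assuming no tie correction and the usual continuity correction in $Z$; the function names are illustrative. With $n = 120$ monthly values, $\mathrm{Var}(S) = n(n-1)(2n+5)/18 = 120 \cdot 119 \cdot 245 / 18 \approx 194{,}366.667$, which matches the Var column, and $S = -50$ gives $|Z| \approx 0.11$, far below 1.96.

```python
import math


def mk_s(x):
    """MK statistic S: sum of sign(x_j - x_i) over all pairs i < j."""
    n = len(x)
    return sum((x[j] > x[i]) - (x[j] < x[i])
               for i in range(n - 1) for j in range(i + 1, n))


def mk_variance(n):
    """Var(S) under the null hypothesis, without tie correction."""
    return n * (n - 1) * (2 * n + 5) / 18.0


def mk_z(s, var_s):
    """Standardised statistic with continuity correction; |Z| > 1.96 is significant at 0.05."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0
```

Usage: `mk_z(-50, mk_variance(120))` returns about −0.111.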
ARIMA model
Stationarity test
The ARIMA model assumes that the data are stationary. As Figure 3 shows, the raw runoff series is clearly non-stationary and random, so it was first-order differenced before modelling; the differenced series is shown in Figure 5. The stationarity of the first-order differenced series was then checked with the augmented Dickey-Fuller (ADF) test. Three specifications were tested in turn (with intercept and trend, with intercept only, and with neither), and the results are given in Table 2.
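The ADF statistic is the t-statistic of $\gamma$ in the regression $\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p}\delta_i \Delta y_{t-i} \,[+\,c]\,[+\,\beta t] + e_t$. A plain least-squares sketch is given below; the lag count and the names are illustrative assumptions. The statistic must be compared with Dickey-Fuller (MacKinnon) critical values, not normal ones; the critical values in Table 2 (−2.58, −1.94, −1.61) are close to the tabulated values for the specification without intercept or trend.

```python
import numpy as np


def adf_t_stat(y, lags=1, regression="n"):
    """t-statistic of gamma in the ADF regression; regression is 'n', 'c' or 'ct'."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    p = lags
    target = dy[p:]                                            # dy_t
    cols = [y[p:-1]]                                           # lagged level y_{t-1}
    cols += [dy[p - i:len(dy) - i] for i in range(1, p + 1)]   # lagged differences dy_{t-i}
    if regression in ("c", "ct"):
        cols.append(np.ones_like(target))                      # intercept
    if regression == "ct":
        cols.append(np.arange(len(target), dtype=float))       # linear time trend
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof                               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                      # OLS coefficient covariance
    return beta[0] / np.sqrt(cov[0, 0])
```

A stationary series gives a strongly negative statistic; a statistic below the 1% critical value, as for −9.38 in Table 2, rejects the unit-root hypothesis. In practice, `statsmodels.tsa.stattools.adfuller` performs this test with automatic lag selection and MacKinnon p-values.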
ADF stationarity test for the first-order differenced series
 | | t-Statistic | Prob.* |
---|---|---|---|
Augmented Dickey-Fuller test statistic | | −9.37919 | 0.00012 |
Test critical values: | 1% level | −2.58471 | |
 | 5% level | −1.94356 | |
 | 10% level | −1.61493 | |
ARIMA identification and order selection
Figure 6 shows clear tailing-off behaviour: a preliminary reading suggests that the autocorrelation function (ACF) tails off after lag 3 and the partial autocorrelation function (PACF) after lag 5. To determine the autoregressive and moving average orders more precisely, the ARIMA(3,1,3), ARIMA(3,1,5), ARIMA(5,1,3) and ARIMA(5,1,5) models were fitted and compared using the Akaike (AIC), Schwarz (SC) and Hannan-Quinn (HQ) information criteria (Table 3). ARIMA(3,1,3) has the smallest AIC, SC and HQ values, so under the minimum-criterion principle it was selected.
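The minimum-criterion selection can be sketched as follows. The per-observation scaling (dividing by the sample size $n$) is assumed because the tabulated values are of that magnitude; some software reports unscaled criteria, which rank models identically for a fixed $n$. The dictionary transcribes the values in Table 3, and the function names are illustrative.

```python
import math


def info_criteria(loglik, k, n):
    """Per-observation AIC, Schwarz (SC) and Hannan-Quinn (HQ) criteria; k = estimated parameters."""
    aic = (-2.0 * loglik + 2.0 * k) / n
    sc = (-2.0 * loglik + k * math.log(n)) / n
    hq = (-2.0 * loglik + 2.0 * k * math.log(math.log(n))) / n
    return {"AIC": aic, "SC": sc, "HQ": hq}


def select_by_minimum(scores, criterion):
    """Name of the model with the smallest value of the given criterion."""
    return min(scores, key=lambda name: scores[name][criterion])


# Values transcribed from Table 3
table3 = {
    "ARIMA(3,1,3)": {"AIC": 5.6639, "SC": 5.7336, "HQ": 5.6922},
    "ARIMA(3,1,5)": {"AIC": 5.7626, "SC": 5.8323, "HQ": 5.7909},
    "ARIMA(5,1,3)": {"AIC": 5.7157, "SC": 5.7854, "HQ": 5.7440},
    "ARIMA(5,1,5)": {"AIC": 5.8297, "SC": 5.8994, "HQ": 5.8580},
}
```

All three criteria point to the same model here; when they disagree, SC penalises extra parameters most heavily for sample sizes above about eight observations.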
Comparison of information criteria for the candidate ARIMA models
Criterion | ARIMA(3,1,3) | ARIMA(3,1,5) | ARIMA(5,1,3) | ARIMA(5,1,5) |
---|---|---|---|---|
AIC | 5.6639 | 5.7626 | 5.7157 | 5.8297 |
SC | 5.7336 | 5.8323 | 5.7854 | 5.8994 |
HQ | 5.6922 | 5.7909 | 5.7440 | 5.8580 |
Significance test
Once the model is established, the runoff time series of Xianyang hydrological station can be predicted. To judge the validity of the model, the ARIMA(3,1,3) model is also tested for significance. The model test results are reported in Table 4.
Model test results
Variable | Coefficient | Std. error | t-Statistic | Prob. |
---|---|---|---|---|
AR(3) | 0.357094 | 0.156827 | 2.276988 | 0.0246 |
MA(3) | −0.781715 | 0.122941 | −6.358476 | 0.0000 |
The AR(3) and MA(3) coefficients in Table 4 are significantly different from zero, with p-values below 0.05, so the ARIMA(3,1,3) model passes the significance test.
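The t-Statistic column in Table 4 is the coefficient divided by its standard error, and Prob. is the corresponding two-sided p-value. The sketch below reproduces that relationship. It uses a standard normal approximation instead of the Student t distribution (an assumption), so its p-values are close to, but not identical with, the tabulated ones (about 0.023 versus 0.0246 for AR(3)).

```python
import math


def t_and_p(coef, std_err):
    """t-statistic and two-sided p-value under a standard normal approximation."""
    t = coef / std_err
    p = math.erfc(abs(t) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|t|))
    return t, p


t_ar3, p_ar3 = t_and_p(0.357094, 0.156827)    # AR(3) row of Table 4
t_ma3, p_ma3 = t_and_p(-0.781715, 0.122941)   # MA(3) row of Table 4
```

A coefficient is judged significant at the 0.05 level when its p-value is below 0.05, which for large samples corresponds to |t| > 1.96.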
SARIMA model
Stationarity test
ADF stationarity test for the seasonally differenced series
 | | t-Statistic | Prob.* |
---|---|---|---|
Augmented Dickey-Fuller test statistic | | −11.48168 | 0.0000 |
Test critical values: | 1% level | −2.589273 | |
 | 5% level | −1.944211 | |
 | 10% level | −1.614532 | |
A SARIMA model requires the time series to be generated by a stationary stochastic process with zero mean, that is, a process whose statistical properties do not change over time. Figure 7 shows that the seasonally differenced data fluctuate around 0 within a roughly symmetrical range, and the ADF statistic of −11.48 lies well below the 1% critical value. The seasonally differenced runoff series therefore satisfies these prerequisites, and a SARIMA model can be built for prediction.
SARIMA identification and order selection
Comparison of information criteria for the candidate SARIMA models
Criterion | SARIMA(4,1,4)(1,1,1)12 | SARIMA(4,1,5)(1,1,1)12 | SARIMA(5,1,4)(1,1,1)12 | SARIMA(5,1,5)(1,1,1)12 |
---|---|---|---|---|
AIC | 5.3984 | 5.3975 | 5.4033 | 5.4308 |
SC | 5.5466 | 5.5456 | 5.5515 | 5.5789 |
HQ | 5.4585 | 5.4576 | 5.4634 | 5.4908 |
Because the runoff series was first differenced once and then seasonally differenced once with a period of 12, d = 1 and D = 1. The ACF tails off towards zero after lag 4 and the PACF after lag 5, while at lag k = 12 both the ACF and PACF coefficients are significantly non-zero, giving P = 1 and Q = 1. The candidate models SARIMA(4,1,4)(1,1,1)12, SARIMA(4,1,5)(1,1,1)12, SARIMA(5,1,4)(1,1,1)12 and SARIMA(5,1,5)(1,1,1)12 were therefore fitted. SARIMA(4,1,5)(1,1,1)12 has the smallest AIC, SC and HQ values, so under the minimum-criterion principle it was selected.
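The two differencing operations behind d = 1 and D = 1, that is, applying $(1-B)(1-B^{12})$ to the series, can be sketched with NumPy as below. The function name is illustrative. Applied to a synthetic monthly series with a linear trend and a fixed 12-month cycle (an assumption for illustration, not the Xianyang data), the two operations remove both components completely.

```python
import numpy as np


def difference(x, d=1, D=1, s=12):
    """Apply (1 - B)^d (1 - B^s)^D; the result is d + D*s values shorter than x."""
    y = np.asarray(x, dtype=float)
    for _ in range(d):
        y = y[1:] - y[:-1]        # ordinary first difference
    for _ in range(D):
        y = y[s:] - y[:-s]        # seasonal difference at lag s
    return y


t = np.arange(120)
x = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)   # trend plus 12-month cycle
z = difference(x)                                   # 107 values, all (numerically) zero
```

With 120 months of runoff, 13 observations are lost to differencing, which is one reason the choice of d and D matters for short monthly records.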
Significance test
After the model was established, it was used to predict the runoff time series of the Xianyang hydrological station. To judge its validity, the SARIMA(4,1,5)(1,1,1)12 model was also tested for significance; the results are given in Table 7.
Model test results
Variable | Coefficient | Std. error | t-Statistic | Prob. |
---|---|---|---|---|
C | −0.192271 | 0.876339 | −0.219402 | 0.5268 |
AR(4) | 0.209565 | 0.125484 | 1.670055 | 0.0382 |
SAR(12) | 0.283781 | 0.167126 | 1.698005 | 0.0126 |
MA(5) | 0.012912 | 0.128379 | 0.10058 | 0.0201 |
SMA(12) | 0.420178 | 0.163885 | 2.563859 | 0.0118 |
According to the Prob. values reported in Table 7, the AR(4), SAR(12), MA(5) and SMA(12) coefficients have p-values below 0.05, while the constant C (p = 0.5268) does not; on this basis, the SARIMA(4,1,5)(1,1,1)12 model passes the significance test.
GSA-ARIMA model
GSA has a strong global search capability, and the modelling steps of the GSA-ARIMA model are essentially the same as those of the ARIMA model. The main difference is that the model order is determined by GSA: instead of being chosen from a few hand-picked candidates, the optimal order is found by the gravitational search. The optimal order found by GSA is p = 2 and q = 2, giving the ARIMA(2,1,2) model.
GSA-SARIMA model
The GSA-SARIMA model is built in the same way as the GSA-ARIMA model, but the resulting orders differ because the SARIMA model adds the seasonal orders P and Q and the seasonal differencing order D. The optimal orders found by GSA are p = 2, q = 3, P = 1 and Q = 1, giving the SARIMA(2,1,3)(1,1,1)12 model.
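How a continuous GSA position is turned into integer model orders, and which criterion serves as the fitness, are not specified here. One common approach, sketched below under that assumption, rounds each coordinate of an agent's position to the nearest integer within bounds and scores the decoded orders with a user-supplied function. `score_fn` is hypothetical (for example, a routine that fits SARIMA(p,1,q)(P,1,Q)12 and returns its AIC); the cache avoids refitting when several agents decode to the same orders.

```python
import numpy as np


def decode_orders(position, upper):
    """Round a continuous agent position to integer orders in [0, upper]."""
    pos = np.clip(np.rint(np.asarray(position, dtype=float)), 0, upper)
    return tuple(int(v) for v in pos)


def make_order_fitness(score_fn, upper):
    """Wrap a hypothetical score_fn(orders) -> criterion value for a continuous optimiser."""
    cache = {}

    def fitness(position):
        orders = decode_orders(position, upper)
        if orders not in cache:              # many agents decode to the same orders
            cache[orders] = score_fn(orders)
        return cache[orders]

    return fitness


# Toy score with its minimum at (p, q, P, Q) = (2, 3, 1, 1), for illustration only
toy = lambda o: (o[0] - 2) ** 2 + (o[1] - 3) ** 2 + (o[2] - 1) ** 2 + (o[3] - 1) ** 2
fit = make_order_fitness(toy, upper=[5, 5, 2, 2])
```

The resulting `fit` can be passed to any continuous minimiser, including a GSA implementation, in place of a smooth objective.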
RESULTS AND DISCUSSION
Results
Each of the ARIMA, SARIMA, GSA-ARIMA and GSA-SARIMA models constructed above was used to predict runoff for months 99 to 120. Comparing the predicted values with the observed values gives the relative error of each prediction and the mean relative error of each model, as listed in Table 8.
Prediction results of each model
Month | Observed | GSA-SARIMA predicted | GSA-SARIMA RE/% | GSA-ARIMA predicted | GSA-ARIMA RE/% | SARIMA predicted | SARIMA RE/% | ARIMA predicted | ARIMA RE/% |
---|---|---|---|---|---|---|---|---|---|
99 | 7.38 | 7.14 | −3.33 | 6.46 | −12.45 | 6.57 | −11.07 | 7.24 | −1.95 |
100 | 7.30 | 6.25 | −14.49 | 5.34 | −26.92 | 5.83 | −20.17 | 5.85 | −19.97 |
101 | 5.21 | 5.05 | −3.01 | 4.87 | −6.56 | 6.33 | 21.50 | 4.89 | −6.25 |
102 | 3.45 | 3.99 | 15.68 | 3.70 | 7.31 | 5.20 | 50.94 | 3.94 | 14.36 |
103 | 3.46 | 3.70 | 6.95 | 3.82 | 10.62 | 4.05 | 17.17 | 3.31 | −4.22 |
104 | 3.73 | 3.89 | 4.48 | 3.81 | 2.21 | 3.87 | 3.73 | 3.72 | −0.08 |
105 | 2.69 | 2.29 | −15.06 | 2.83 | 5.15 | 3.71 | 37.82 | 4.54 | 68.84 |
106 | 5.00 | 4.90 | −2.13 | 5.20 | 3.83 | 3.67 | −26.65 | 5.75 | 14.98 |
107 | 6.82 | 5.74 | −15.83 | 5.73 | −15.97 | 5.60 | −17.84 | 7.04 | 3.27 |
108 | 6.50 | 6.34 | −2.41 | 5.80 | −10.80 | 5.89 | −9.33 | 7.33 | 12.71 |
109 | 4.99 | 5.77 | 15.63 | 5.80 | 16.20 | 6.03 | 20.78 | 7.25 | 45.09 |
110 | 5.75 | 5.40 | −6.17 | 6.61 | 14.88 | 6.07 | 5.62 | 7.09 | 23.33 |
111 | 7.67 | 7.95 | 3.66 | 7.45 | −2.87 | 9.37 | 22.14 | 6.84 | −10.80 |
112 | 11.54 | 11.06 | −4.17 | 11.36 | −1.56 | 9.87 | −14.46 | 6.76 | −41.41 |
113 | 12.17 | 10.48 | −13.91 | 12.18 | 0.09 | 10.16 | −16.53 | 6.69 | −45.00 |
114 | 8.28 | 7.97 | −3.72 | 7.69 | −7.15 | 10.24 | 23.65 | 6.61 | −20.16 |
115 | 4.42 | 4.32 | −2.24 | 4.55 | 2.94 | 5.49 | 24.09 | 6.49 | 46.76 |
116 | 4.75 | 5.07 | 6.79 | 5.38 | 13.16 | 8.77 | 84.69 | 6.21 | 30.68 |
117 | 5.54 | 5.32 | −4.02 | 5.74 | 3.46 | 4.43 | −20.16 | 5.86 | 5.75 |
118 | 4.18 | 4.23 | 1.30 | 4.86 | 16.37 | 4.32 | 3.48 | 5.45 | 30.55 |
119 | 3.41 | 3.56 | 4.45 | 3.65 | 7.17 | 3.76 | 10.35 | 5.00 | 46.68 |
120 | 3.36 | 3.85 | 14.64 | 3.89 | 15.83 | 3.67 | 9.11 | 4.79 | 42.58 |
Mean relative error (%) | | | 6.57 | | 9.25 | | 20.49 | | 23.28 |
Note: RE stands for Relative Error.
Discussion
The GSA-SARIMA model levels off earlier than the GSA-ARIMA model: its training loss decreases rapidly within the first 50 steps, indicating faster convergence. In addition, the loss at which the GSA-SARIMA model stabilises is below 0.1, indicating a better training result. The evaluation parameters of each model, calculated with the evaluation criteria described in the methodology, are given in Table 9.
Comparative analysis of prediction accuracy among models
Predictive model | Mean relative error (%) | Root mean square error (m³) | Nash efficiency coefficient |
---|---|---|---|
ARIMA | 23.28 | 3.78 | 0.34 |
SARIMA | 20.49 | 2.08 | 0.58 |
GSA-ARIMA | 9.25 | 0.50 | 0.86 |
GSA-SARIMA | 6.57 | 0.21 | 0.91 |
As Table 9 shows, the mean relative error and root mean square error decrease steadily from ARIMA to SARIMA, GSA-ARIMA and GSA-SARIMA, while the Nash efficiency coefficient approaches 1, indicating a progressive improvement in prediction accuracy. This study uses seasonal differencing to remove the effects of seasonal variation; an alternative way to achieve the same goal is filter decomposition (Zhao et al. 2021; Zhang et al. 2022). Future work could apply both approaches in different situations and compare the resulting models to assess the adaptability, strengths and weaknesses of each (Wang & Zhou 2020).
CONCLUSION
- (1)
A novel runoff prediction model, the GSA-SARIMA model, is developed in this study. The MK test identifies month 99 as the abrupt change point in the runoff data, so month 99 is used as the dividing point between the training and prediction sets. The SARIMA model is then constructed by adding seasonal differencing to the ARIMA model. To improve the stability of the SARIMA model, its orders are optimised by GSA at the order-selection stage, which yields the GSA-SARIMA model.
- (2)
Applying the model to the runoff series of the Xianyang section of the Weihe River from 2010 to 2019 shows that GSA has a strong global search capability and that seasonal differencing tends to improve prediction performance. Among the models evaluated, GSA-SARIMA has the highest linear correlation coefficient (0.9351) and Nash efficiency coefficient (0.91), and the lowest mean relative error (6.57%) and root mean square error (0.21). It outperforms the other models developed on every metric, which makes it a promising tool for runoff prediction.
- (3)
Although the GSA-SARIMA model performs well on all metrics, this study covers a single location. Future work could apply the method to regions with different underlying surface conditions and climates. This study also provides only short-term runoff predictions and does not consider the physical mechanisms behind changes in runoff; future research could build predictions on these physical mechanisms.
ETHICS APPROVAL
Not applicable.
CONSENT TO PARTICIPATE
Not applicable.
CONSENT FOR PUBLICATION
Not applicable.
COMPETING INTERESTS
The authors declare no competing interests.
AUTHORS CONTRIBUTIONS
Zhang XQ: Methodology, Investigation, Formal analysis. Wu XL: Conceptualization, Writing-Original draft preparation. Zhu GY: Methodology, project administration. Lu XB: Conceptualization, Resources. Wang Kai: Methodology, Formal analysis.
FUNDING
The authors wish to thank the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004] for the collection, analysis and interpretation of data.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.