The prediction of river runoff is crucial for flood forecasting, agricultural irrigation and hydroelectric power generation. A coupled runoff prediction model based on the Gravitational Search Algorithm (GSA) and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is proposed to address the non-linear and seasonal features of runoff data. The GSA has a significant local optimisation capability, while the SARIMA model allows for real-time adjustment of the model using historical data and is suitable for analysing time series with seasonal variations. Consequently, the GSA-SARIMA model was developed and applied to the runoff prediction of the Xianyang section of the Wei River. The results suggest that the GSA-SARIMA model achieves a linear correlation coefficient of 0.9351, a Nash efficiency coefficient of 0.91, a mean relative error of 6.57 and a root mean square error of 0.21. All of the evaluation indicators of this model outperform the other models developed, and its application to actual runoff prediction is feasible, which creates a new path for runoff prediction.

HIGHLIGHTS

  • The Mann-Kendall trend test is applied to ascertain the separation point between the training and prediction datasets. It avoids too little data in the test set, while effectively improving the generalisation of the model.

  • The SARIMA model is an improvement on the ARIMA model and allows for convenient real-time adjustment of the model.

  • The GSA algorithm is applicable to parameter search optimization of the model and has great global search capability.

Graphical Abstract

Graphical Abstract

River runoff is a critical factor in the hydrological system and is influenced by a variety of factors such as geographical location, topography and climate, with a highly non-linear and seasonal nature (Wu et al. 2020). Runoff prediction plays an increasingly significant role in flood forecasting, agricultural irrigation and hydroelectric power generation, and has been attracting research from scholars both at home and abroad. Wang et al. (2015) coupled an autoregressive integrated moving average (ARIMA) model with an ensemble empirical model decomposition (EEMD) and applied it to runoff series from three reservoirs in China (Biliuhe Reservoir, Dahuofang Reservoir and Mapanshan Reservoir), which demonstrated that the proposed coupled model could significantly improve the prediction accuracy. Zhang et al. (2019) developed a novel model based on a modified ensemble empirical modal decomposition (MEEMD) with ARIMA model in order to predict the runoff volume of the lower Yellow River. It was found that the accuracy of the MEEMD-ARIMA model was higher than that of the other models established. Rizeei et al. (2018) combined the Land Transformation Model (LTM) with an ARIMA model to predict rainfall in 2020, which proved to be a decreasing trend from 2000 to 2015, with this trend continuing. Hao et al. (2017) used statistical methods to analyse and then built an ARIMA model to predict the runoff from the Three Gorges Project. The predictions revealed that the runoff had large abrupt changes in 1991 and 2001, with a variation period of about 10 years. Wang et al. (2018) predicted the daily runoff time series at Tang Naihai station, China, based on empirical modal decomposition (EMD) and ARIMA models. It was shown that the proposed model was effective and suitable for hydrological prediction of long series. In order to predict daily influent flows, Boyd et al. (2019) validated the ARIMA model predictions against actual measured data from five North American wastewater treatment plants. It turned out that the ARIMA model achieved satisfactory prediction values, broadening the application area of the ARIMA model. To discover the influence of climate on runoff, Banihabib et al. (2018) predicted monthly runoff data employing an ARIMA model by taking climate related indices as the input signal to the model. The coupled model was observed to yield higher prediction accuracy than the conventional ARIMA model. For the purpose of testing the predictive power of the ARIMA model against the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, Valipour (2015) predicted runoff volumes for 2011 based on runoff data from 1901 to 2010 for each US state. The results indicated that a cycle of runoff in the US occurs every 20 years. Razak et al. (2018) analysed the variation of rainfall and runoff trends in the Segamat River (Malaysia) with the use of the Mann-Kendall (MK) trend test to develop an ARIMA model for flood prediction. The trend analysis illustrated an increasing trend in rainfall, with forecasting results indicating that the ARIMA (0,1,2) model predictions performed optimally. Nury et al. (2017) invented a temperature prediction model combining wavelet techniques with ARIMA on the basis of the results of MK tests on temperature time series in the northeast of Bangladesh. It was demonstrated that the wavelet-ARIMA model outperformed the wavelet-artificial neural network (ANN) model. Niu et al. (2020) proposes a hybrid prediction model to forecast the annual runoff time series of three large hydropower reservoirs in China. The gravitational search algorithm (GSA) was used to optimally adjust the model parameters during the selection process, and the results suggested that the optimised model is more competitive. Niu et al. (2019) optimized the solution of the extreme learning machine (ELM) selection and variational operators using the Improved Gravity Search Algorithm (IGSA), in which results suggested that the coupled model significantly enhanced the prediction of the monthly runoff time series over other models. Scholars at domestic and abroad have mainly focused on the enhancement and optimisation of models. Some scholars adopt the idea of coupled models, coupling the prediction model with the decomposition model and the optimization solution model, while others concentrated on the improvement of the original model. This study introduces seasonal factors to improve the ARIMA model itself on the one hand, and on the other hand the GSA algorithm is applied to optimise the parameters of the SARIMA model. The innovative coupling of the SARIMA model with the GSA algorithm constructs the GSA-SARIMA model, which can adjust the model in real time and features a powerful global search capability. The objective of this study is to provide improved predictive simulation of river runoff.

Mann-Kendall trend test

For monthly runoff prediction, it is common to divide the time series into a training set and a prediction set according to the proportion (Zhang et al. 2021). Since monthly runoff data are small sample data, there may be different trends in the training and prediction sets after the proportional allocation. Thus, a statistical test is introduced to examine the trend of the monthly runoff data and to determine whether there are abrupt change points in it. By using the mutation points as the cut-off point between the training and prediction datasets, we can avoid too little data in the test set and also effectively improve the generalisation ability of the model.

The MK trend test is a non-parametric test recommended by the World Meteorological Organization (WMO) and has been widely adopted (Yagbasan et al. 2020). This method allows access to trends and data mutation points in the hydrological time series, where the series to be examined are not required to follow a certain distribution. For any sequence to be tested, where n is the total length of the time series. The statistical variable S can be defined as:
(1)
Among them:
(2)
Assume that there are independent observations of x. If no trend exists, then these observations are equally distributed. Therefore, the mean and variance of S are:
(3)
(4)
Define the test statistic for x as:
(5)

The trend in x is tested by comparing the magnitude of UFn with the standard normal variable at significance level , where is the probability of tolerance for the MK test to reject the original hypothesis ranging from (0,0.5), usually taking = 0.05 (Alifujiang et al. 2020). When , reject the original hypothesis; otherwise, accept the original hypothesis. Conditional on rejection of the original hypothesis, the series has an upward trend if , and a downward trend if . To detect burst points in the data, the time series data are sorted in reverse order and the process is repeated to obtain the test statistical series . An intersection of and is a mutation point in a time series if it is within confidence interval .

Autoregressive integrated moving average (ARIMA) model

Box et al. (1976) first proposed an ARIMA model based on stochastic theory in the 1970s. The basic idea and principle of the ARIMA model is to treat the data series of the forecast object over time as a random series, differencing its series to make it smooth, and then fitting this series with a mathematical model. The model draws conclusions relating to past behaviour and makes predictions and inferences about its future behaviour from the time series itself. The ARIMA model has the benefits of in-depth theoretical analysis and efficient application, however, it has some limitations for time series with seasonality (Garg et al. 2015). The ARIMA (p,d,q) model contains two components: an autoregressive (AR) model and a sliding average (MA) model; d refers to the number of differences required to transform a non-stationary time series into a stationary series, p is the autoregressive order and q is the moving average order. The specific expressions are as follows:
(6)
where is a smooth time series and is a white noise series. p and q can be determined by plotting autocorrelation versus partial autocorrelation, and after determination the model fit parameters and are available.

Seasonal autoregressive integrated moving average (SARIMA) model

The seasonal autoregressive integrated moving average (SARIMA) model is an improvement on the ARIMA model. It is mainly adopted for the analysis of time series with seasonal variations or cyclical variations due to some other factors (Moeeni et al. 2017). It enables easy real-time adjustment of the model using the availability of more historical data, as well as incorporating the residuals generated by the fit as an element of analysis. Its outstanding advantage is the advanced accuracy of the short time prediction results (Fashae et al. 2018). SARIMA models are currently widely used in the fields of energy, climate, medicine and economics, especially for hydrological time series containing seasonal variations, and are admired by scholars at home and abroad.

The SARIMA model is suitable for modelling time series data with a cyclical nature, which is a cycle-based seasonal differencing of the ARIMA model. Consider the period of variation of the seasonal series as s. Define the difference operator:
(7)
If yt denotes a seasonal time series, the primary seasonal difference is divided into:
(8)
The non-stationary seasonal time series is transformed into a stationary series after performing D times seasonal differencing, and a P-order autoregressive, Q-order moving average seasonal time series model is built about period s:.
(9)
It can be described as:
(10)
where L denotes the lag operator. , represent the non-seasonal and seasonal autoregressive polynomials; , stand for the non-seasonal and seasonal moving average polynomials; P, Q mean the maximum lag order of the seasonal autoregressive and moving average operators; p, q indicate the maximum lag order of the non-seasonal autoregressive, moving average operator; d, D refer to the number of non-seasonal and seasonal differentials.

Gravitational search algorithm (GSA)

The GSA is based on Newton's theories of gravity (Wang et al. 2020) which consider the search for a series of particles moving in space as a multi-objective optimal solution problem, with each particle being subject to the force of gravity at the same time. The larger the mass of the particle, the better the position it occupies in the global search space, and thereby the optimal solution to the problem is found. The GSA algorithm is suitable for the optimal search of the parameters of the model and has a good capability of global searches (Pelusi et al. 2019). The detailed process can be described as follows:

Suppose there are s particles in n-dimensional space, i.e:.
(11)
The location of the ith particle in the d-dimension is given by . The positions of all particles are randomly distributed before the search is performed. In d-dimensional space at time t, the gravitational force of particle j on i is:
(12)
denotes the inertial mass of particle j and denotes the inertial mass of particle i. is the Euclidean distance between i and j. The expression is shown in Equation (13). is a tiny constant that prevents the Euclidean distance between i and j from being zero. is the gravitational constant at moment t (usually taken to be 100 for , 10 for ) and is calculated as in Equation (14):.
(13)
(14)
At this point, the sum of the gravitational forces of particle i on all other particles can be derived that:
(15)
of which is a random number between [0,1].
The acceleration of particle i in d-dimensional space is:
(16)
is the inertial mass of particle i.
The position and velocity of the particle are updated with each iteration because of the acceleration, and the update equation is:
(17)
(18)
(19)
where is the fitness value of particle i at moment t. When solving the minimax problem, and can be defined as:
(20)
(21)
While solving the maximum problem, and can be defined as:
(22)
(23)
The present velocity and position of the particle are obtained from the acceleration equation:
(24)
(25)
After each iteration, the position and velocity of the particles are updated simultaneously until the maximum number of iterations reached or the required accuracy achieved.
Figure 1

Flow chart of the coupled GSA-SARIMA model modelling.

Figure 1

Flow chart of the coupled GSA-SARIMA model modelling.

Close modal

Modelling process

The overall modeling process of this study is depicted in Figure 1.

Evaluation criteria

In order to explore the feasibility of the above prediction model, three metrics, mean relative error (MAPE), root mean square error (RMSE), and Nash efficiency coefficient (NSE), were selected to assess the accuracy of the model in this paper (Song et al. 2020; Tuo et al. 2020). Assume that N is the total length of the runoff series. is the actual value of runoff at time i. is the predicted value of runoff at time i. Then:
(26)
(27)
(28)

Study area

The Wei River, also known as Weishui, is the largest tributary of the Yellow River. It originates in the Bird Rat Mountain in Weiyuan County of Dingxi City in Gansu Province, and flows mainly through Xianyang City and Xi'an City in Shaanxi Province, finally merging into the Yellow River at Weinan City. The Wei River has a total length of 818 km and a basin area of 134,766 km2. The Weihe River has an average annual precipitation of 537–650 mm, a mean temperature of 9.0–13.2 °C, a multi-year average runoff of 75.7×108 m3 and a high runoff modulus of about 2.5–3.7 L/s·km2. Runoff is unevenly distributed regionally, with a general trend of decreasing from south to north. There is more runoff in the west than the east, and more midstream runoff than downstream. The Wei River basin is located in the warm temperate zone and has a continental monsoon climate. The four seasons differ from each other in terms of heat, cold, dryness and humidity, and the climate is mild, which is conducive to the development of agriculture, forestry, animal husbandry and fishing industries. The Xianyang section of the Wei River has a total length of 21.2 km and a basin area of 46,827 km2. The Xianyang hydrological station is equipped to observe the hydrological elements of the river. Xianyang Hydrographic Station is a national significant hydrographic station in China, located in Xianyang City of Shaanxi Province, 108 °42′ East longitude and 34 °19′ North latitude. The runoff data are obtained from the Resource and Environmental Science and Data Centre of the Chinese Academy of Sciences (https://www.resdc.cn/), and the interpolation and extension of adjacent stations is applied to fill in the years with missing data. The specific schematic diagram of the study area is presented in Figure 2, and the runoff sequence of the Xianyang section of the Wei River from 2010 to 2019 is shown in Figure 3.
Figure 2

Location of the study area.

Figure 2

Location of the study area.

Close modal
Figure 3

Monthly runoff data from Xianyang hydrological station (2011–2020).

Figure 3

Monthly runoff data from Xianyang hydrological station (2011–2020).

Close modal

Mann-Kendall (MK) test

The statistical parameters of the MK trend test for the runoff series from the Xianyang hydrological station can be acquired by the method described in section 2.1 and are listed in Table 1. The change in runoff volume over time and its first order linear trend line are plotted in Figure 3. The rejection of the original hypothesis is statistically significant at the 0.05 level of significance when . The UFk trend test value of −0.116, with an absolute value of less than 1.96, indicates that the trend test of significance with a 95% confidence level was not passed. This indicates that the runoff series shows a decreasing trend, however the trend is not significant.

Table 1

Statistical parameters of MK trend test for runoff series at Xianyang hydrological station

Correlation coefficient rMedian βSVarUFkUBk
0.091 9.750 −50 194,366.667 −0.116 −0.111 
Correlation coefficient rMedian βSVarUFkUBk
0.091 9.750 −50 194,366.667 −0.116 −0.111 

As seen in Figure 3, the monthly average runoff at Xianyang station from 2010 to 2020 is highly stochastic, with two large peaks in the 29th and 89th months. In addition, the runoff sequences exhibit a considerable degree of seasonality, with dramatic changes in runoff occurring with each change of season. The trend line reveals that the correlation coefficient r is 0.091, which is much less than 1, which verifies the non-linear pattern of the runoff time series. For such non-linear and seasonal time series, prediction may become quite challenging.

Figure 4 illustrates the results of the abrupt variability test for the runoff series at Xianyang hydrological station. The UF and UB curves have an intersection point between 98 and 99 months, with the corresponding UF value less than 0, indicating that this intersection point is the abrupt variation point for the decrease in monthly mean runoff. The mutation point is between the critical lines, suggesting that the mutation is not significant. A trend towards a decrease in mean monthly runoff is evident after the 99th month when UF values begin to be greater than 1.96. As a result, runoff data from 1 to 98 months prior to the mutation point is used as the training set and runoff data from 99 to 120 months is considered as the prediction set.
Figure 4

MK test for runoff sequences from Xianyang hydrological station.

Figure 4

MK test for runoff sequences from Xianyang hydrological station.

Close modal

ARIMA model

Smoothness test

The ARIMA model is built on the premise that the data is smooth. As can be seen from Figure 3, the runoff data is significantly non-smooth and random, so the original data needs to be differenced to the first order before modelling. The time-series diagram after differencing is shown in Figure 5. To examine the smoothness of the time series after the first order difference, the ADF test is chosen to perform the determination. In the ADF test process, three cases are selected in turn, including the intercept term and trend term, the intercept term, and the original series for stepwise testing and rejection, and the test results are displayed in Table 2.

The t-value of −9.37919 can be seen in Table , which is significantly smaller than the three level cut-off values. The original hypothesis of the existence of a unit root is not satisfied at the 1% significance level. So the series is considered smooth after the first order difference treatment and can be modelled for the seasonal ARIMA analysis.
Table 2

ADF smoothness test for first order difference series

t-StatisticProb.*
Augmented Dickey-Fuller test statistic  −9.37919 0.00012 
Test critical values: 1% level −2.58471  
 5% level −1.94356  
 10% level −1.61493  
t-StatisticProb.*
Augmented Dickey-Fuller test statistic  −9.37919 0.00012 
Test critical values: 1% level −2.58471  
 5% level −1.94356  
 10% level −1.61493  
Figure 5

First-order differential timing diagram.

Figure 5

First-order differential timing diagram.

Close modal

ARIMA identification and ordering

In order to meet the basic requirements for ARIMA model building, the model also needs to be identified and ordered. The common ARIMA identification and ordering method is to determine the optimal ARIMA model by using the partial autocorrelation coefficient, autocorrelation coefficient and the AIC, SC and HQ criteria to determine the fit effect (Fouli et al. 2018). The autocorrelation and partial autocorrelation coefficients of the first-order difference series and their correlation plots are obtained using the MATLAB toolbox, as depicted in Figure 6.
Figure 6

ACF and PACF diagram.

Figure 6

ACF and PACF diagram.

Close modal

It can be observed that there is a more pronounced trailing in Figure 6, and the initial judgement is that there is an autocorrelation 3rd order trailing with a partial autocorrelation 5th order trailing. For further determination of the autoregressive order and moving average order of the model, the models ARIMA(3,1,3), ARIMA(3,1,5), ARIMA(5,1,3) and ARIMA(5,1,5) are established respectively and combined with the AIC, SC and HQ values to determine the model parameters (as in Table 3). It is found that the AIC, SC and HQ values of ARIMA(3,1,3) are the smallest. Based on the minimisation principle, the ARIMA(3,1,3) model is chosen to be developed.

Table 3

Comparison of model indicators

ARIMA(3,1,3)ARIMA(3,1,5)ARIMA(5,1,3)ARIMA(5,1,5)
AIC 5.6639 5.7626 5.7157 5.8297 
SC 5.7336 5.8323 5.7854 5.8994 
HQ 5.6922 5.7909 5.7440 5.8580 
ARIMA(3,1,3)ARIMA(3,1,5)ARIMA(5,1,3)ARIMA(5,1,5)
AIC 5.6639 5.7626 5.7157 5.8297 
SC 5.7336 5.8323 5.7854 5.8994 
HQ 5.6922 5.7909 5.7440 5.8580 

Significance test

Once the model is established, the runoff time series of Xianyang hydrological station can be predicted. To judge the validity of the model, the ARIMA(3,1,3) model is also tested for significance. The model test results are reported in Table 4.

Table 4

Model test results

VariableCoefficientStd. errort-StatisticProb.
AR(3) 0.357094 0.156827 2.276988 0.0246 
MA(3) −0.781715 0.122941 −6.358476 0.0000 
VariableCoefficientStd. errort-StatisticProb.
AR(3) 0.357094 0.156827 2.276988 0.0246 
MA(3) −0.781715 0.122941 −6.358476 0.0000 

The coefficients of AR (3) and MA(3) in Table 4 are significantly non-zero and the values of the regression coefficients P are less than 0.05, which suggests that the developed ARIMA(3,1,3) model passed the significance test.

SARIMA model

Smoothness test

The SARIMA model requires the series to be generated by a zero-mean smooth stochastic process, which is reflected graphically as all sample points fluctuating randomly up and down around a certain level, therefore the smoothness of the data needs to be ascertained before using the SARIMA model (Farsi et al. 2020). It is clear from Figure 3 that runoff changes are seasonal, and the original series needs to be differenced seasonally. Runoff variations are commonly annualized, so the original series is differenced 12 times, and the differencing results are provided in Figure 7. The presence of unit roots in the series is tested by ADF, and the series is non-stationary if it exists. The time series of runoff from Xianyang hydrological station are analysed and the ADF test results for the seasonally differenced series are given in Table 5.
Table 5

ADF smoothness test

t-StatisticProb.*
Augmented Dickey-Fuller test statistic  −11.48168 0.0000 
Test critical values: 1% level −2.589273  
 5% level −1.944211  
 10% level −1.614532  
t-StatisticProb.*
Augmented Dickey-Fuller test statistic  −11.48168 0.0000 
Test critical values: 1% level −2.589273  
 5% level −1.944211  
 10% level −1.614532  
Figure 7

Time series after seasonal differencing process.

Figure 7

Time series after seasonal differencing process.

Close modal

The prerequisite for the construction of a SARIMA model is that the time series should be generated by a smooth stochastic process with zero mean, which means the stochastic nature of the process is time-invariant. Figure 7 reveals that the data basically fluctuates around the value of 0 and the fluctuation range is basically symmetrical. This indicates that the seasonally differenced runoff series satisfy the prerequisites and that a SARIMA model can be constructed for modelling predictions.

SARIMA identification and ordering

The SARIMA model is built using the processed time series and the identification and order fixing methods can be determined by combining the autocorrelation function plots of the runoff time series with the non-autocorrelation function plots. The autocorrelation and partial autocorrelation coefficients of the seasonally differenced time series and their correlation plots are obtained from the MATLAB toolbox, which is presented in Figure 8. The models are established in order to further define the autoregressive order and moving average order of the models, and the contrasting values of each model parameter are shown in Table 6.
Table 6

Comparison of indicators

SARIMA(4,1,4) (1,1,1)12SARIMA(4,1,5) (1,1,1)12SARIMA(5,1,4) (1,1,1)12SARIMA(5,1,5) (1,1,1)12
AIC 5.3984 5.3975 5.4033 5.4308 
SC 5.5466 5.5456 5.5515 5.5789 
HQ 5.4585 5.4576 5.4634 5.4908 
SARIMA(4,1,4) (1,1,1)12SARIMA(4,1,5) (1,1,1)12SARIMA(5,1,4) (1,1,1)12SARIMA(5,1,5) (1,1,1)12
AIC 5.3984 5.3975 5.4033 5.4308 
SC 5.5466 5.5456 5.5515 5.5789 
HQ 5.4585 5.4576 5.4634 5.4908 
Figure 8

ACF and PACF diagram.

Figure 8

ACF and PACF diagram.

Close modal

Since the runoff series is differenced initially and then seasonally with a first order step of 12. As a consequence, d = 1 and D = 1. The ACF plot tapers to zero after the next four orders and the PACF plot trails off after the next five orders. Meanwhile, at lag order k = 12, both ACF and PACF coefficients are significantly nonzero, so that P = 1 and Q = 1. Thus an attempt can be made to build models SARIMA(4,1,4)(1,1,1)12, SARIMA(4,1,5)(1,1,1)12, SARIMA(5,1,4)(1,1,1)12, SARIMA(5,1,5)(1,1,1)12. It is observed that the AIC, SC, and HQ values of SARIMA(4,1,5)(1,1,1)12 are all minimum, and according to the minimization principle, the SARIMA(4,1,5)(1,1,1)12 model is recommended.

Significance test

After the establishment of the model, the runoff time series of Xianyang hydrological station can be predicted. For judging the validity of the model, the SARIMA(4,1,5)(1,1,1)12 model is also required to be tested for significance, and the model test results are described in Table 7.

Table 7

Model test results

VariableCoefficientStd. errort-StatisticProb.
−0.192271 0.876339 −0.219402 0.5268 
AR(4) 0.209565 0.125484 1.670055 0.0382 
SAR(12) 0.283781 0.167126 1.698005 0.0126 
MA(5) 0.012912 0.128379 0.10058 0.0201 
SMA(12) 0.420178 0.163885 2.563859 0.0118 
VariableCoefficientStd. errort-StatisticProb.
−0.192271 0.876339 −0.219402 0.5268 
AR(4) 0.209565 0.125484 1.670055 0.0382 
SAR(12) 0.283781 0.167126 1.698005 0.0126 
MA(5) 0.012912 0.128379 0.10058 0.0201 
SMA(12) 0.420178 0.163885 2.563859 0.0118 

As evidenced by the above results, the AR, SAR, and MA coefficients are significantly non-zero, and the values of the regression coefficients P are less than 0.05, which suggests that the developed SARIMA(4,1,5)(1,1,1)12 model passes the significance test.

GSA-ARIMA model

GSA has a strong global search capability, and the modeling steps of the GSA-ARIMA model are basically the same as those of the ARIMA model. The main difference is that the gravitational search algorithm (GSA) is introduced when conducting the model to determine the order, and the optimal solution is sought for the model order by the gravitational search algorithm. The optimal model order sought by the GSA algorithm is p = 2 and q = 2, which is the ARIMA(2,1,2) model.

GSA-SARIMA model

The modeling steps of the GSA-SARIMA model are exactly identical to those of the GSA-ARIMA model, but the results of the model fixed order are not the same. This is because the SARIMA model introduces seasonal parameters P and Q as well as seasonal differential counts D. The optimal model order searched by the GSA algorithm is p = 2, q = 3, P = 1, Q = 1, namely the SARIMA(2,1,3) (1,1,1)12 model.

Results

Based on the ARIMA, SARIMA, GSA-SARIMA and GSA-ARIMA models constructed in section 2, the prediction results of each model can be deduced. Comparing the obtained predicted values with the true values, the relative errors as well as the average relative errors of each model are acquired, which are detailed in Table 8.

Table 8

Prediction results of each model

GSA-SARIMA
GSA-ARIMA
SARIMA
ARIMA
MonthObservedPredictedRE/%PredictedRE/%PredictedRE/%PredictedRE/%
99 7.38 7.14 −3.33 6.46 −12.45 6.57 −11.07 7.24 −1.95 
100 7.30 6.25 −14.49 5.34 −26.92 5.83 −20.17 5.85 −19.97 
101 5.21 5.05 −3.01 4.87 −6.56 6.33 21.50 4.89 −6.25 
102 3.45 3.99 15.68 3.70 7.31 5.20 50.94 3.94 14.36 
103 3.46 3.70 6.95 3.82 10.62 4.05 17.17 3.31 −4.22 
104 3.73 3.89 4.48 3.81 2.21 3.87 3.73 3.72 −0.08 
105 2.69 2.29 −15.06 2.83 5.15 3.71 37.82 4.54 68.84 
106 5.00 4.90 −2.13 5.20 3.83 3.67 −26.65 5.75 14.98 
107 6.82 5.74 −15.83 5.73 −15.97 5.60 −17.84 7.04 3.27 
108 6.50 6.34 −2.41 5.80 −10.80 5.89 −9.33 7.33 12.71 
109 4.99 5.77 15.63 5.80 16.20 6.03 20.78 7.25 45.09 
110 5.75 5.40 −6.17 6.61 14.88 6.07 5.62 7.09 23.33 
111 7.67 7.95 3.66 7.45 −2.87 9.37 22.14 6.84 −10.80 
112 11.54 11.06 −4.17 11.36 −1.56 9.87 −14.46 6.76 −41.41 
113 12.17 10.48 −13.91 12.18 0.09 10.16 −16.53 6.69 −45.00 
114 8.28 7.97 −3.72 7.69 −7.15 10.24 23.65 6.61 −20.16 
115 4.42 4.32 −2.24 4.55 2.94 5.49 24.09 6.49 46.76 
116 4.75 5.07 6.79 5.38 13.16 8.77 84.69 6.21 30.68 
117 5.54 5.32 −4.02 5.74 3.46 4.43 −20.16 5.86 5.75 
118 4.18 4.23 1.30 4.86 16.37 4.32 3.48 5.45 30.55 
119 3.41 3.56 4.45 3.65 7.17 3.76 10.35 5.00 46.68 
120 3.36 3.85 14.64 3.89 15.83 3.67 9.11 4.79 42.58 
Mean relative error() 6.57 9.25 20.49 23.28 
GSA-SARIMA
GSA-ARIMA
SARIMA
ARIMA
MonthObservedPredictedRE/%PredictedRE/%PredictedRE/%PredictedRE/%
99 7.38 7.14 −3.33 6.46 −12.45 6.57 −11.07 7.24 −1.95 
100 7.30 6.25 −14.49 5.34 −26.92 5.83 −20.17 5.85 −19.97 
101 5.21 5.05 −3.01 4.87 −6.56 6.33 21.50 4.89 −6.25 
102 3.45 3.99 15.68 3.70 7.31 5.20 50.94 3.94 14.36 
103 3.46 3.70 6.95 3.82 10.62 4.05 17.17 3.31 −4.22 
104 3.73 3.89 4.48 3.81 2.21 3.87 3.73 3.72 −0.08 
105 2.69 2.29 −15.06 2.83 5.15 3.71 37.82 4.54 68.84 
106 5.00 4.90 −2.13 5.20 3.83 3.67 −26.65 5.75 14.98 
107 6.82 5.74 −15.83 5.73 −15.97 5.60 −17.84 7.04 3.27 
108 6.50 6.34 −2.41 5.80 −10.80 5.89 −9.33 7.33 12.71 
109 4.99 5.77 15.63 5.80 16.20 6.03 20.78 7.25 45.09 
110 5.75 5.40 −6.17 6.61 14.88 6.07 5.62 7.09 23.33 
111 7.67 7.95 3.66 7.45 −2.87 9.37 22.14 6.84 −10.80 
112 11.54 11.06 −4.17 11.36 −1.56 9.87 −14.46 6.76 −41.41 
113 12.17 10.48 −13.91 12.18 0.09 10.16 −16.53 6.69 −45.00 
114 8.28 7.97 −3.72 7.69 −7.15 10.24 23.65 6.61 −20.16 
115 4.42 4.32 −2.24 4.55 2.94 5.49 24.09 6.49 46.76 
116 4.75 5.07 6.79 5.38 13.16 8.77 84.69 6.21 30.68 
117 5.54 5.32 −4.02 5.74 3.46 4.43 −20.16 5.86 5.75 
118 4.18 4.23 1.30 4.86 16.37 4.32 3.48 5.45 30.55 
119 3.41 3.56 4.45 3.65 7.17 3.76 10.35 5.00 46.68 
120 3.36 3.85 14.64 3.89 15.83 3.67 9.11 4.79 42.58 
Mean relative error() 6.57 9.25 20.49 23.28 

Note: RE stands for Relative Error.

It can be seen that the GSA-SARIMA model performs most consistently, within ±20% error for each monthly runoff prediction. The average relative error of this model is 6.57%, which is the smallest among the models. Furthermore, after the optimisation of the GSA algorithm, the average relative errors of the GSA-ARIMA and GSA-SARIMA models are both less than 10%, meeting the engineering forecasting requirements. To provide a clearer view of the predictions of each model, the predicted values of each model are plotted against the true values of runoff in Figure 9.
Figure 9

Comparison of prediction results between models.

Figure 9

Comparison of prediction results between models.

Close modal

Discussion

The predictions of each model are plotted as a scatter plot with the observed values and a linear correlation coefficient can be calculated by a first order fit. The fit is illustrated in Figure 10.
Figure 10

Scatter plot of predicted results for each model.

Figure 10

Scatter plot of predicted results for each model.

Close modal
The scatter plot demonstrates that the ARIMA model is the worst fit, owing to its inability to accommodate seasonal variations in runoff. Both the SARIMA model and the GSA-ARIMA model perform better than the original ARIMA model, suggesting that the model accuracy can be significantly improved by both seasonal differencing and the introduction of an intelligent optimisation algorithm. Moreover, the GSA-SARIMA model provides the best fit, with a first-order linear correlation coefficient of 0.9351 and a first-order linear equation close to y=x, which suggests that the predicted values exhibit a significant positive correlation with the observed values. In terms of model training, we have compared the training process of the GSA-ARIMA model with that of the GSA-SARIMA model, as demonstrated in Figure 11.
Figure 11

GSA optimization of the covariance training process.

Figure 11

GSA optimization of the covariance training process.

Close modal

The GSA-SARIMA model plateaus earlier than the GSA-ARIMA model, and the loss value of training decreases rapidly within 50 steps, which demonstrates that the GSA-SARIMA model has a faster learning speed. Additionally, the loss value for the GSA-SARIMA model to reach a steady state is less than 0.1, showing a superior training effect. According to the evaluation indicators in section 1.6, the evaluation parameters for each model are calculated as follows in Table 9.

Table 9

Comparative analysis of prediction accuracy among models

Predictive modelMean relative error/%Root mean square error/m3Nash efficiency coefficient
ARIMA 23.28 3.78 0.34 
SARIMA 20.49 2.08 0.58 
GSA-ARIMA 9.25 0.50 0.86 
GSA-SARIMA 6.57 0.21 0.91 
Predictive modelMean relative error/%Root mean square error/m3Nash efficiency coefficient
ARIMA 23.28 3.78 0.34 
SARIMA 20.49 2.08 0.58 
GSA-ARIMA 9.25 0.50 0.86 
GSA-SARIMA 6.57 0.21 0.91 

As can be found in Table 9, the average relative error and root mean square error of the model are both decreasing in order from top to bottom, while the Nash efficiency coefficient tends increasingly to 1, which shows a positive trend towards model prediction accuracy. The purpose of this study is to perform seasonal differencing to remove the effects of seasonal variation, while another method to achieve the same goal is to carry out filter decomposition (Zhao et al. 2021; Zhang et al. 2022). It is possible in the future to make various approaches in different situations, and also to compare and analyse the models constructed by the two approaches to study the adaptability, strengths and weaknesses of respective approaches (Wang & Zhou 2020).

  • (1)

    A novel runoff prediction model, the GSA-SARIMA model, is developed in this study. The modelling process is as follows: the 99th month is analysed as the abrupt change point in the runoff data by the MK test, so the 99th month is regarded as the dividing point between the training set and the prediction set. Subsequently, we have constructed the SARIMA model by performing seasonal differencing based on the ARIMA model. In order to enhance the stability of the SARIMA model, the model fixed parameters are optimised by the GSA gravity search algorithm at the early stage of modelling. Ultimately, the GSA-SARIMA model is constructed.

  • (2)

    By applying it to the prediction of the runoff series of the Xianyang section of the Weihe River from 2010 to 2019, the results show that the GSA gravitational search algorithm has a strong local search capability, and the model with seasonal differencing tends to have improved prediction performance. The analysis of the evaluation metrics revealed that the GSA-SARIMA model features the highest linear correlation coefficient and Nash efficiency coefficient, which are 0.9351 and 0.91. The GSA-SARIMA model is also accompanied by lower mean relative errors and root mean square errors, with 6.57 and 0.21 respectively. The model outperforms the other models developed in all respects, making its application to the prediction of runoff data extremely promising.

  • (3)

    Although the developed GSA-SARIMA model is outstanding in all aspects, the scope of this study is localized. In the future, this method can be applied to regions with different substrates and varying climates for a deeper level of exploration. This study only provides short-term predictions of river runoff, and the physical mechanisms behind the changes in runoff are not considered. Future research may develop predictive studies from the physical mechanisms.

Not applicable.

Not applicable.

Not applicable.

The authors declare no competing interests.

Zhang XQ: Methodology, Investigation, Formal analysis. Wu XL: Conceptualization, Writing-Original draft preparation. Zhu GY: Methodology, project administration. Lu XB: Conceptualization, Resources. Wang Kai: Methodology, Formal analysis.

The authors wish to thank the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant numbers 17A570004] for the collection, analysis and interpretation of data.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Alifujiang
Y.
,
Abuduwaili
J.
,
Maihemuti
B.
,
Emin
B.
&
Groll
M.
2020
Innovative trend analysis of precipitation in the lake Issyk-Kul Basin. Kyrgyzstan
.
Atmosphere
11
(
4
),
332
.
https://doi.org/ 10.3390/atmos11040332
.
Banihabib
M. E.
,
Ahmadian
A.
&
Valipour
M.
2018
Hybrid MARMA-NARX model for flow forecasting based on the large-scale climate signals, sea-surface temperatures, and rainfall
.
Nordic Hydrology
49
(
5–6
),
1788
1803
.
https://doi.org/ 10.2166/nh.2018.145
.
Box
G.
,
Jenkins
G. M.
&
Reinsel
G. C.
1976
Time series analysis forecasting and control – Rev. ed
.
Journal of Time
31
(
2
),
238
242
.
https://doi.org/ 10.1111/j.1467-9892.2009.00643.x
.
Boyd
G.
,
Na
D.
,
Li
Z.
,
Snowling
S.
,
Zhang
Q.
&
Zhou
P.
2019
Influent forecasting for wastewater treatment plants in North America
.
Sustainability
11
(
6
),
1764
.
https://doi.org/ 10.3390/su11061764
.
Farsi
M.
,
Hosahalli
D.
,
Manjunatha
B. R.
,
Gad
I. M.
&
Ghoneim
O. A.
2020
Parallel genetic algorithms for optimizing the SARIMA model for better forecasting of the NCDC weather data
.
AEJ – Alexandria Engineering Journal
60
(
1
),
1299
1316
.
https://doi.org/ 10.1016/j.aej.2020.10.052
.
Fashae
O. A.
,
Olusola
A. O.
,
Ndubuisi
I.
&
Udomboso
C. G.
2018
Comparing ANN and ARIMA model in predicting the discharge of River Opeki from 2010 to 2020
.
River Research and Applications
35
,
169
177
.
https://doi.org/ 10.1002/rra.3391
.
Fouli
H.
,
Fouli
R.
,
Bashir
B.
&
Loni
O. A.
2018
Seasonal forecasting of rainfall and runoff volumes in Riyadh Region, KSA
.
KSCE Journal of Civil Engineering
22
(
7
),
2637
2647
.
https://doi.org/ 10.1007/s12205-017-0726-3
.
Garg
N.
,
Soni
K.
,
Saxena
T. K.
&
Maji
S.
2015
Applications of AutoRegressive Integrated Moving Average (ARIMA) approach in time-series prediction of traffic noise pollution
.
Noise Control Engineering Journal
63
(
2
),
182
194
.
https:// info:doi/10.3397/1/376317
.
Hao
C. F.
,
Qiu
J.
&
Li
F. F.
2017
Methodology for analyzing and predicting the runoff and sediment into a reservoir
.
Water
9
(
6
),
440
440
.
https://doi.org/ 10.3390/w9060440
.
Moeeni
H.
,
Bonakdari
H.
&
Ebtehaj
I.
2017
Monthly reservoir inflow forecasting using a new hybrid SARIMA genetic programming approach
.
Journal of Earth System Science
126
(
2
),
18
.
https://doi.org/ 10.1007/s12040-017-0798-y
.
Niu
W. J.
,
Feng
Z. K.
,
Zeng
M.
,
Feng
B. F.
,
Min
Y. W.
,
Cheng
C. T.
&
Zhou
J. Z.
2019
Forecasting reservoir monthly runoff via ensemble empirical mode decomposition and extreme learning machine optimized by an improved gravitational search algorithm
.
Applied Soft Computing
82
,
105589
105589
.
https://doi.org/ 10.1016/j.asoc.2019.105589
.
Niu
W. J.
,
Feng
Z. K.
,
Chen
Y. B.
,
Zhang
H. R.
&
Cheng
C. T.
2020
Annual streamflow time series prediction using extreme learning machine based on gravitational search algorithm and variational mode decomposition
.
Journal of Hydrologic Engineering
25
(
5
),
04020008
.
https://doi.org/ 10.1061/(ASCE)HE.1943-5584.0001902
.
Nury
A. H.
,
Hasan
K.
&
Alam
M.
2017
Comparative study of wavelet-ARIMA and wavelet-ANN models for temperature time series data in northeastern Bangladesh
.
Journal of King Saud University – Science
29
(
1
),
47
61
.
https://doi.org/ 10.1016/j.jksus.2015.12.002
.
Pelusi
D.
,
Mascella
R.
,
Tallini
L.
,
Nayak
J.
&
Deng
Y.
2019
Improving exploration and exploitation via a hyperbolic gravitational search algorithm
.
Knowledge-Based Systems
193
,
105404
.
https://doi.org/ 10.1016/j.knosys.2019.105404
.
Razak
N. A.
,
Aris
A. Z.
,
Ramli
M. F.
,
Looi
L. J.
&
Juahir
H.
2018
Temporal flood incidence forecasting for Segamat River (Malaysia) using autoregressive integrated moving average modelling
.
Journal of Flood Risk Management
11
,
794
804
.
https://doi.org/ 10.1111/jfr3.12258
.
Rizeei
H. M.
,
Pradhan
B.
&
Saharkhiz
M. A.
2018
Surface runoff prediction regarding LULC and climate dynamics using coupled LTM, optimized ARIMA, and GIS-based SCS-CN models in tropical region
.
Arabian Journal of Geosciences
11
(
3
),
53
.
https://doi.org/10.1007/s12517-018-3397-6
.
Song
P.
,
Liu
W.
,
Sun
J.
,
Wang
C.
&
Wang
H.
2020
Annual runoff forecasting based on multi-model information fusion and residual error correction in the Ganjiang River Basin
.
Water
12
(
8
),
2086
.
https://doi.org/ 10.3390/w12082086
.
Tuo
W.
,
Zhang
X. Q.
,
Song
C.
,
Hu
D. K.
&
Liang
T. Y.
2020
Annual precipitation analysis and forecasting – taking Zhengzhou as an example
.
Water Supply
20
(
5
),
1604
1616
.
https://doi.org/ 10.2166/ws.2020.067
.
Valipour
M.
2015
Long-term runoff study using SARIMA and ARIMA models in the United States
.
Meteorological Applications
22
(
3
),
592
598
.
https://doi.org/ 10.1002/met.1491
.
Wang
T.
&
Zhou
Y.
2020
Forecasting the Yellow River runoff based on functional data analysis methods
.
Environmental and Ecological Statistics
28
(
1
),
1
20
.
https://doi.org/ 10.1007/s10651-020-00469-x
.
Wang
W. C.
,
Chau
K. W.
,
Xu
D. M.
&
Chen
X. Y.
2015
Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition
.
Water Resources Management
29
(
8
),
2655
2675
.
https://doi.org/ 10.1007/s11269-015-0962-6
.
Wang
Z. Y.
,
Qiu
J.
&
Li
F. F.
2018
Hybrid models combining EMD/EEMD and ARIMA for long-term streamflow forecasting
.
Water
10
(
7
),
853
.
https://doi.org/ 10.3390/w10070853
.
Wang
Y.
,
Gao
S.
,
Yu
Y.
,
Wang
Z. Q.
,
Cheng
J. J.
&
Yuki
T.
2020
A gravitational search algorithm with chaotic neural oscillators
.
IEEE Access
8
(
99
),
25938
25948
.
https://doi.org/ 10.1109/ACCESS.2020.2971505
.
Wu
P.
,
Liang
S.
,
Wang
X. S.
,
Mckenzie
J. M.
,
Jeffrey
M.
&
Feng
Y. Q.
2020
Climate change impacts on cold season runoff in the headwaters of the Yellow River considering frozen ground degradation
.
Water
12
(
2
),
602
.
https://doi.org/ 10.3390/w12020602
.
Yagbasan
O.
,
Demir
V.
&
Yazicigil
H.
2020
Trend analyses of meteorological variables and lake levels for two shallow lakes in central Turkey
.
Water
12
(
2
),
414
.
https://doi.org/ 10.3390/w12020414
.
Zhang
X.
,
Wei
T.
&
Song
C.
2019
Application of MEEMD-ARIMA combining model for annual runoff prediction in the Lower Yellow River
.
Journal of Water and Climate Change
11
(
3
),
865
876
.
https://doi.org/ 10.2166/wcc.2019.271
.
Zhang
X.
,
Wu
X.
,
He
S.
&
Zhao
D.
2021
Precipitation forecast based on CEEMD-LSTM coupled model
.
Water Science & Technology Water Supply
21
(
8
),
4641
4657
.
https://doi.org/ 10.2166/ws.2021.237
.
Zhang
X.
,
Zhao
D.
&
Duan
B.
2022
A novel groundwater burial depth prediction model – based on the combined VMD-WSD-ELMAN model
.
Environmental Science and Pollution Research
.
https://doi.org/10.1007/s11356-022-21209-7
.
Zhao
L. T.
,
Miao
J.
,
Qu
S.
&
Chen
X. H.
2021
A multi-factor integrated model for carbon price forecasting: market interaction promoting carbon emission reduction
.
Science of the Total Environment
796
,
149110
.
https://doi.org/ 10.1016/j.scitotenv.2021.149110
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).