Abstract
Precise and reliable runoff forecasting is vital to the various activities of water resources deployment and implementation. The main contribution of this article is a hybrid model (ANFIS-GPR), based on the adaptive neuro-fuzzy inference system (ANFIS) and Gaussian process regression (GPR), for monthly runoff forecasting in the Beiru River of China; the optimal input schemes of the models are also discussed in detail. Firstly, variables related to runoff are selected from precipitation, soil moisture content, and evaporation by correlation analysis (CA) to form the first set of input schemes. Secondly, principal component analysis (PCA) is used to eliminate redundant information among the original input variables, forming the second set of input schemes. Finally, runoff is predicted with the different input schemes and models, and the prediction performance is compared comprehensively. The results show that the input schemes jointly established by CA and PCA (CA-PCA) can greatly improve the prediction accuracy. ANFIS-GPR displays the best forecasting performance among all the peer models. Of the single models, GPR performs better than ANFIS.
HIGHLIGHTS
Principal component analysis is used to simplify the prediction factors and improve the prediction accuracy.
The novel contribution of the article is to develop a new hybrid model, i.e., ANFIS-GPR.
The new model (ANFIS-GPR) is more accurate and reliable in predicting the peak discharge.
The research results are more reliable because there is no large water conservancy project in the target basin.
INTRODUCTION
Improving the accuracy and reliability of runoff prediction plays a key role in various activities of water resources deployment and implementation, such as allocation of water, drought control, flood mitigation, dam planning, and designing for navigation (Allawi & El-Shafie 2016; Yaseen et al. 2016; Yuan et al. 2018; Liu et al. 2021). In particular, runoff prediction at monthly time scales is very significant for reservoir operation, water planning, and irrigation management (Chen et al. 2014; Yuan et al. 2015; Buizer et al. 2016; Liang et al. 2017; Hadi & Tombul 2018). Moreover, runoff prediction is an important non-engineering measure to realize the efficient utilization of water resources, so it has always been a hot and core issue in hydrology.
A novel hybrid ANFIS-GPR model is developed by coupling ANFIS and GPR models based on the Lagrange multiplier method.
The influence of different inputs on the prediction accuracy of the model is evaluated.
Performance of ANFIS-GPR in forecasting runoff is assessed using RMSE, NSC, and CORR.
METHODOLOGY
Selection method of predictors
Correlation analysis
Principal component analysis
Principal component analysis (PCA) is a multivariate statistical analysis tool that selects a small number of important variables by a linear transformation of multiple variables. This method can produce several independent principal components (PCs), which can approximately replace the original variables to express the main information (Wold et al. 1987).
Construction of coupled prediction model
Adaptive neural-fuzzy inference system
Gaussian process regression model
Gaussian process regression (GPR) is an ML tool based on Bayesian theory (Quiñonero-Candela & Rasmussen 2005). The main principle of GPR can be summarized as follows. Assume that any finite set of random variables in the sample space follows a joint Gaussian distribution, so that the random process has a Gaussian process prior. GPR then applies Bayesian theory to estimate the posterior distribution of the random process. In this paper, the maximum likelihood method is used to estimate the optimal hyperparameters.
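As an illustration of the posterior calculation described above, the following minimal Python sketch computes the posterior mean and variance of a zero-mean Gaussian process with a rational quadratic covariance function. The function names, the synthetic data, and the fixed hyperparameter values (sigma, length, alpha, noise) are assumptions made for illustration. The study itself is implemented in Matlab and estimates the hyperparameters by maximum likelihood, which this sketch does not do.

```python
import numpy as np

def rational_quadratic(xa, xb, sigma=1.0, length=1.0, alpha=1.0):
    """Rational quadratic covariance between two sets of row vectors."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    return sigma**2 * (1.0 + sq_dist / (2.0 * alpha * length**2)) ** (-alpha)

def gp_posterior(x_train, y_train, x_new, noise=0.01, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the points x_new."""
    k_tt = rational_quadratic(x_train, x_train, **kernel_args)
    k_tn = rational_quadratic(x_train, x_new, **kernel_args)
    k_nn = rational_quadratic(x_new, x_new, **kernel_args)
    # Cholesky factor of the noisy training covariance (numerically stable solve).
    chol = np.linalg.cholesky(k_tt + noise**2 * np.eye(len(x_train)))
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tn.T @ weights
    v = np.linalg.solve(chol, k_tn)
    var = np.diag(k_nn) - (v**2).sum(axis=0)
    return mean, var

# Synthetic one-dimensional data (hypothetical, not the runoff series).
x_train = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
y_train = np.sin(x_train).ravel()
mean, var = gp_posterior(x_train, y_train, x_train, noise=0.01)
print(np.max(np.abs(mean - y_train)))
```

In practice the hyperparameters would be tuned rather than fixed, for example by maximizing the log marginal likelihood over sigma, length, alpha, and noise.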
ANFIS-GPR model
The ANFIS-GPR model is constructed by minimizing the prediction error through optimization. The approach can be briefly described as follows. Firstly, ANFIS and GPR are each used for runoff prediction to obtain their predicted values. Secondly, the weight parameters (w1 and w2) of the predicted values of the two models are calculated. Finally, the two sets of predicted values are combined with these weights to give the final predicted values.
The key to improving the prediction accuracy of ANFIS-GPR lies in finding appropriate weights w1 and w2 that minimize the error from Equation (24), which is a typical optimization problem. The Lagrange multiplier method is used to solve it (Bertsekas & Dimitri 1982), and the steps are as follows.
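The derivation steps are not reproduced in this text. As a hedged illustration only, the sketch below assumes one common formulation: minimize the sum of squared errors of the combined forecast w1·(ANFIS prediction) + w2·(GPR prediction) subject to w1 + w2 = 1. Setting the gradient of the Lagrangian to zero then gives closed-form weights. The objective, the constraint, the toy numbers, and all function and variable names are assumptions and may differ from Equation (24).

```python
def combination_weights(observed, pred_a, pred_b):
    """Weights (w_a, w_b), with w_a + w_b = 1, minimizing the squared combined error.

    Assumed Lagrangian: L = sum((w_a*e_a + w_b*e_b)**2) + lam*(w_a + w_b - 1),
    where e_a and e_b are the error series of the two models.
    """
    err_a = [o - p for o, p in zip(observed, pred_a)]
    err_b = [o - p for o, p in zip(observed, pred_b)]
    e_aa = sum(e * e for e in err_a)
    e_bb = sum(e * e for e in err_b)
    e_ab = sum(x * y for x, y in zip(err_a, err_b))
    denom = e_aa + e_bb - 2.0 * e_ab  # equals sum((e_a - e_b)**2)
    if denom == 0.0:
        return 0.5, 0.5  # identical error series: every split gives the same result
    w_a = (e_bb - e_ab) / denom
    return w_a, 1.0 - w_a

# Toy numbers (hypothetical): the two models err in opposite directions.
observed = [10.0, 20.0, 30.0, 40.0]
pred_a = [11.0, 19.0, 31.0, 39.0]
pred_b = [8.0, 22.0, 28.0, 42.0]
w_a, w_b = combination_weights(observed, pred_a, pred_b)
combined = [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]
print(w_a, w_b, combined)
```

Under this formulation the weights are not forced to lie between 0 and 1. If that is required, an additional non-negativity constraint would be needed.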
The ANFIS-GPR model is thus constructed differently from IATNN and MMCS: the model coupling problem is transformed into an optimization problem, and a classical mathematical method, the Lagrange multiplier method, is introduced to solve it, organically combining mathematical theory with the neural network model.
STUDY AREA AND DATA
Case study site
Data collection and preparation
Hydro-meteorological variables from 1/1985 to 12/2016, including monthly precipitation (P), soil moisture content (SMC), evaporation (E), and runoff (R), were gathered from Ruzhou hydrological station. According to runoff generation and confluence theory, P, SMC, and E are the key factors affecting R. Potential runoff-driving factors are established with 12 months as the maximum lead time, giving 36 (12 × 3) variables. Specifically, the runoff-driving factors of P, SMC, and E are expressed as P(t − i), SMC(t − i), and E(t − i), respectively, where the index i = 1, 2, ···, 12 indicates the lead time. For example, P(t − 1) indicates P 1 month ahead of R, and P(t − 12) indicates P 12 months ahead of R, so 12 candidate predictors can be formed from P. Counting SMC and E as well, there are 36 candidate predictors.
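The construction of the 36 candidate predictors can be sketched as follows. This is a minimal Python illustration with synthetic series; the function name, the column-naming convention, and the data values are assumptions for demonstration, not the study's code or data.

```python
def build_lagged_predictors(series_by_name, max_lag=12):
    """Build lagged predictors X(t-1) ... X(t-max_lag) for every input series.

    Returns (rows, names): one row per target month t, one column per predictor.
    """
    lags = range(1, max_lag + 1)
    names = [f"{name}(t-{lag})" for name in series_by_name for lag in lags]
    n_months = len(next(iter(series_by_name.values())))
    rows = []
    # The first max_lag months have no complete set of lagged values.
    for t in range(max_lag, n_months):
        rows.append([series[t - lag]
                     for series in series_by_name.values()
                     for lag in lags])
    return rows, names

# Synthetic 24-month series (hypothetical values).
series = {
    "P": [float(v) for v in range(100, 124)],
    "SMC": [float(v) for v in range(200, 224)],
    "E": [float(v) for v in range(300, 324)],
}
rows, names = build_lagged_predictors(series)
print(len(names), len(rows), names[:3])
```

Each row can then be paired with R(t) for correlation analysis, so that only the lags most strongly related to runoff are retained.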
DEVELOPMENT OF THE MODELS FOR RUNOFF PREDICTION
Selection of forecast factors
Construction of the input schemes
The structure of the prediction models is determined by the input schemes, because the input variables affect the tuning of the model parameters. Therefore, the best input schemes are identified by trial and error in the following three steps. Firstly, as shown in Table 1, seven input schemes are constructed based on the conclusions in section 4.1, and ANFIS or GPR is used to evaluate the prediction performance of the different input schemes so that the better schemes can be selected. Secondly, PCA is applied to the better schemes for linear dimensionality reduction and recombination (with the cumulative contribution rate set to 0.9), forming the new input schemes. Finally, all the input schemes are compared to determine the final model input schemes.
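The PCA step with a cumulative contribution rate of 0.9 can be sketched as below. This is a minimal Python illustration on synthetic data; standardizing the inputs before the decomposition, the function name, and the test matrix are assumptions, since the scaling used in the study is not stated here.

```python
import numpy as np

def pca_reduce(x, cumulative_threshold=0.9):
    """Return principal component scores and their cumulative contribution rates."""
    # Standardize each column (an assumption about the preprocessing).
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Eigen-decomposition of the covariance of the standardized data.
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative rate reaches the threshold.
    n_keep = min(int(np.searchsorted(cumulative, cumulative_threshold)) + 1,
                 len(eigvals))
    return z @ eigvecs[:, :n_keep], cumulative[:n_keep]

# Synthetic inputs: three independent columns plus three near-linear combinations.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]])
x = np.hstack([base, base @ mixing + 0.05 * rng.normal(size=(200, 3))])
scores, cumulative = pca_reduce(x)
print(scores.shape, cumulative)
```

The retained score columns are mutually uncorrelated and would replace the original variables as model inputs.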
Input scheme | Input structure | Number of variables |
---|---|---|
M1 | P(t − 1), SMC(t − 1) | 2 |
M2 | P(t − 1), SMC(t − 1), P(t − 2) | 3 |
M3 | P(t − 1), SMC(t − 1), P(t − 2), P(t − 12) | 4 |
M4 | P(t − 1), SMC(t − 1), P(t − 2), P(t − 12), E(t − 7) | 5 |
M5 | P(t − 1), SMC(t − 1), P(t − 2), P(t − 12), E(t − 7), E(t − 8) | 6 |
M6 | P(t − 1), SMC(t − 1), P(t − 2), P(t − 12), E(t − 7), E(t − 8), E(t − 9) | 7 |
M7 | P(t − 1), SMC(t − 1), P(t − 2), P(t − 12), E(t − 7), E(t − 8), E(t − 9), E(t − 6) | 8 |
Data used and normalization
As mentioned in section 3.2, the data series from 1985 to 2016 is divided into two parts: the training dataset (1985–2008) and the testing dataset (2009–2016). The training dataset is used to train the models, and the testing dataset serves to verify their performance.
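The chronological split can be sketched as follows. The record layout (year, month, value), the synthetic values, and the min-max scaling fitted on the training period only are assumptions for illustration; the normalization formula used in the study is not given in this text.

```python
def split_by_year(records, last_training_year=2008):
    """Split (year, month, value) records into training and testing parts."""
    train = [r for r in records if r[0] <= last_training_year]
    test = [r for r in records if r[0] > last_training_year]
    return train, test

def min_max_scale(values, lo, hi):
    """Scale values with bounds taken from the training period (assumed method)."""
    return [(v - lo) / (hi - lo) for v in values]

# Synthetic monthly records for 1985-2016 (hypothetical values).
records = [(year, month, float(year - 1985 + month))
           for year in range(1985, 2017) for month in range(1, 13)]
train, test = split_by_year(records)

lo = min(r[2] for r in train)
hi = max(r[2] for r in train)
train_scaled = min_max_scale([r[2] for r in train], lo, hi)
test_scaled = min_max_scale([r[2] for r in test], lo, hi)
print(len(train), len(test))
```

Taking the scaling bounds from the training period alone keeps information from the testing period out of model fitting, at the cost that scaled test values may fall outside the range 0 to 1.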
Performance indices
Parameter settings
The ANFIS, GPR, and ANFIS-GPR models are developed in a MATLAB environment. All computations are run on a computer with an Intel Core i5 2.3-GHz CPU and 16 GB of RAM. The parameters of ANFIS in the first and fourth layers are optimized by the backpropagation algorithm. Gaussian functions are used as the membership functions (MFs). The fuzzy C-means (FCM) clustering algorithm is used to generate the initial FIS, which avoids the curse of dimensionality caused by excessive fuzzy rules in ANFIS. The training and testing procedures of the ANFIS model are carried out from scratch for the mentioned datasets (Fattahi 2016). The maximum likelihood estimate is obtained by directly maximizing the posterior distribution of the hyperparameters (σ and l) of GPR. The remaining key parameter settings are shown in Tables 2–4.
Parameter | Description/value |
---|---|
Number of clusters | 10 |
Partition matrix exponent | 2 |
Maximum number of iterations | 100 |
Minimum improvement | 0.00001 |
Parameter | Description/value |
---|---|
Maximum number of epochs | 300 |
Error goal | 0 |
Initial step size | 1.1 |
Step size decrease rate | 0.9 |
Step size increase rate | 1.1 |
Parameter | Description/value |
---|---|
Kernel function | Mean excess function |
Covariance function | Rational quadratic covariance function |
Likelihood function | Gaussian likelihood function |
RESULTS AND DISCUSSION
Determination of the best input schemes
ANFIS and GPR are applied to evaluate the suitability of the seven input schemes: M1, M2, M3, M4, M5, M6, and M7 are used in turn as the inputs of ANFIS and GPR to predict R, and the performance indices (RMSE, NSC, and CORR) are obtained for comparative analysis. Because ANFIS predictions are unstable, the ANFIS statistics are averaged over 10 runs; GPR predictions are stable, so the GPR statistics are collected from a single run. The RMSE, NSC, and CORR of ANFIS and GPR using M1 to M7 in the testing period are shown in Table 5, with the best results marked in bold. There is a clear trend between model performance and the input schemes: prediction accuracy generally increases with the number of variables in the input scheme. In other words, models with more input variables, such as M5, M6, and M7, usually show higher performance. However, the results also show that the most satisfactory prediction accuracy cannot be obtained by simply increasing the number of predictors. Models with M6 generally perform better than models with M7, indicating that too many predictors do not improve the prediction accuracy but instead reduce it because of the large amount of redundant information.
Input scheme | ANFIS RMSE | ANFIS NSC | ANFIS CORR | GPR RMSE | GPR NSC | GPR CORR |
---|---|---|---|---|---|---|
M1 | 8.4766 | 0.4605 | 0.4578 | 8.1814 | 0.5954 | 0.5329 |
M2 | 8.4247 | 0.3217 | 0.3959 | 7.6445 | 0.5651 | 0.4744 |
M3 | 5.7224 | 0.5972 | 0.4665 | 7.1064 | 0.6006 | 0.4994 |
M4 | 6.8289 | 0.6474 | 0.5946 | 6.7069 | 0.7082 | 0.5468 |
M5 | 7.1524 | 0.5266 | 0.5529 | 6.6052 | 0.7284 | 0.5858 |
M6 | 7.4154 | 0.6784 | 0.5979 | 6.5725 | 0.7113 | 0.5918 |
M7 | 7.8122 | 0.4177 | 0.5285 | 6.4937 | 0.6705 | 0.5886 |
The bold values represent where the optimal performance indicator values occur.
Input scheme | Types of variables | Number of variables |
---|---|---|
M5-PCA | Recombinational data | 5 |
M6-PCA | Recombinational data | 6 |
M7-PCA | Recombinational data | 7 |
Runoff prediction based on ANFIS, GPR, and ANFIS-GPR
Since the input schemes M5-PCA, M6-PCA, and M7-PCA have been proven effective in improving prediction accuracy, they serve as the preferred input schemes for the models in this part. Because of the instability of ANFIS, the ANFIS-GPR model is also run 10 times in the same way. The average performances of ANFIS, GPR, and ANFIS-GPR for runoff prediction in the testing phase are shown in Table 7. ANFIS-GPR outperforms ANFIS and GPR in terms of RMSE and NSC; only CORR shows ANFIS-GPR performing slightly below GPR. These results indicate that the proposed prediction technique is superior to the other two methods. Besides, GPR predicts better than ANFIS, indicating that the probabilistic prediction model is more suitable than the neuro-fuzzy model for runoff prediction. Meanwhile, Table 8 shows the statistical performance of the ANFIS and GPR models during the training period. GPR outperforms ANFIS in terms of RMSE, NSC, and CORR, suggesting that GPR is more robust than ANFIS.
Model | M5-PCA RMSE | M5-PCA NSC | M5-PCA CORR | M6-PCA RMSE | M6-PCA NSC | M6-PCA CORR | M7-PCA RMSE | M7-PCA NSC | M7-PCA CORR |
---|---|---|---|---|---|---|---|---|---|
ANFIS | 1.0336 | 0.9966 | 0.9985 | 1.5985 | 0.9907 | 0.9964 | 1.2116 | 0.9952 | 0.9979 |
GPR | 0.8286 | 0.9978 | 0.9990 | 0.8843 | 0.9976 | 0.9989 | 0.8130 | 0.9979 | 0.9991 |
ANFIS-GPR | 0.7637 | 0.9991 | 0.9982 | 0.8453 | 0.9989 | 0.9978 | 0.7776 | 0.9991 | 0.9981 |
The bold values represent where the optimal performance indicator values occur.
Model | M5-PCA RMSE | M5-PCA NSC | M5-PCA CORR | M6-PCA RMSE | M6-PCA NSC | M6-PCA CORR | M7-PCA RMSE | M7-PCA NSC | M7-PCA CORR |
---|---|---|---|---|---|---|---|---|---|
ANFIS | 0.3593 | 0.9994 | 0.9997 | 0.4423 | 0.9991 | 0.9995 | 0.3829 | 0.9993 | 0.9997 |
GPR | 0.2861 | 0.9996 | 0.9997 | 0.3087 | 0.9993 | 0.9997 | 0.2788 | 0.9996 | 0.9998 |
The bold values represent where the optimal performance indicator values occur.
CONCLUSIONS
Improving runoff prediction accuracy is critical to water resources management and to the planning and design of water conservancy projects. In this article, a novel hybrid model, ANFIS-GPR, is proposed for runoff prediction by merging ANFIS with GPR. Firstly, the runoff forecasts of the ANFIS and GPR models are collected separately. Secondly, weights w1 and w2 are assigned to the outputs of ANFIS and GPR. Finally, the problem of solving for the weights is transformed into an optimization problem. The Lagrange multiplier method is used to calculate the weights, and the predicted values of ANFIS-GPR are obtained.
Meanwhile, the influence of different inputs on prediction accuracy is also studied. The results demonstrate that prediction accuracy is sensitive to the model inputs. More input variables do not necessarily give higher prediction accuracy; rather, the accuracy is closely related to the effective information provided by the input variables. The input schemes M5-PCA, M6-PCA, and M7-PCA, based on the principal component extraction strategy, are proven to improve the prediction accuracy effectively. Thus, ANFIS, GPR, and ANFIS-GPR are used for runoff prediction with these schemes. The results show that ANFIS-GPR outperformed ANFIS and GPR for runoff prediction in terms of RMSE and NSC, while GPR outperformed ANFIS-GPR in terms of CORR. Of the single models, GPR performs better than ANFIS.
Comparing the measures indicates that the strategy of coupling ANFIS and GPR has more advantages in improving the accuracy of runoff prediction. Significantly, ANFIS-GPR is more accurate than the other models in predicting the runoff peak values. Therefore, the runoff prediction scheme combining CA-PCA and ANFIS-GPR is the first choice in the study area.
AVAILABILITY OF DATA AND MATERIALS
The data that support the findings of this study are available from the corresponding author upon reasonable request.
FUNDING
This work is supported by the National Natural Science Foundation of China (No. 52069005 and 62163008), Guizhou Provincial Science and Technology Projects (Guizhou Science Foundation-ZK[2021] General 295 and [2020]1Y248), the Science and Technology Special Funds of Guizhou Water Resources Department (KT202232), and the Startup Project for High-level Talents of Guizhou Institute of Technology (XJGC20210425). Special thanks are given to the anonymous reviewers and editors for their constructive comments.
AUTHOR CONTRIBUTIONS
Zhennan Liu conceptualized the whole article and developed the methodology. Jingnan Zhou wrote the original draft. Xianzhong Zeng conducted data curation. Xiaoyu Wang wrote the program. Weiguo Jiao wrote the review. Min Xu edited the article. Anjie Wu visualized the data.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.