Many nonlinear models have been proposed to forecast groundwater level. However, the evidence of chaos in groundwater levels in landslides has not been explored. In addition, linear correlation analyses are commonly used to determine the input and output variables for the nonlinear models, although they are unable to capture the nonlinear relationships between these variables. This paper proposes to use chaos theory to select the input and output variables for nonlinear models. The nonlinear model is constructed based on support vector machine (SVM). The parameters of SVM are obtained by particle swarm optimization (PSO). The proposed PSO-SVM model based on chaos theory (chaotic PSO-SVM) is applied to predict the daily groundwater levels in Huayuan landslide and the weekly and monthly groundwater levels in Baijiabao landslide in the Three Gorges Reservoir Area in China. The results show that there are chaos characteristics in the groundwater levels. The linear correlation analysis based PSO-SVM (linear PSO-SVM) and chaos theory-based back-propagation neural network (chaotic BPNN) are also applied for the purpose of comparison. The results show that the chaotic PSO-SVM model has higher prediction accuracy than the linear PSO-SVM and chaotic BPNN models for the test data considered.
INTRODUCTION
There are many landslides in the Three Gorges Reservoir Area especially after the impoundment of the Three Gorges Reservoir. The instability of many reservoir landslides has been related to the dramatic changes in the groundwater seepage field (Asch et al. 2009; Zhang et al. 2012). Therefore, the prediction of groundwater levels is of critical importance for landslide prevention (Keqiang et al. 2015).
Generally speaking, groundwater level prediction models include physically based and data-based models (Adamowski & Chan 2011). Some physically based models such as the ARX model (Knotters & Bierkens 2000), process-based spatio-temporal model (Schmidt & Dikau 2004) or the water-table fluctuation method (Park & Parker 2008) have been used to simulate and forecast the processes of groundwater level fluctuation. However, physically based models have practical limitations (Knotters & Bierkens 2000, 2002; Nourani et al. 2007). For example, spatial variations and the uncertainty of hydrological investigation also have negative effects on the accuracy of physically based models. Knotters & Bierkens (2001) provided a regionalized ARX model to determine parameters for physically based models considering spatial variability.
It is easy to build data-based models using only an input-output variable approach (Trichakis et al. 2011; Nourani & Komasi 2013). In recent years, many data-based models have been adopted to predict groundwater levels, such as regression models (Adamowski & Feluch 1991; Sahoo & Jha 2013; Raghavendra & Deka 2016), artificial neural networks (ANN) (Nourani et al. 2008; Tsanis et al. 2008; Dash et al. 2010; Mohanty et al. 2010, 2015; Adamowski & Chan 2011; Jalalkamali et al. 2011; Sreekanth et al. 2011; Wang et al. 2012; Maheswaran & Khosa 2013; Atiquzzaman & Kandasamy 2015; Behnia & Rezaeian 2015; Chang et al. 2015), and genetic programming (Fallah-Mehdipour et al. 2013). Moreover, a literature review indicates that linear correlation analysis methods are widely used to select the input and output variables of these data-based models (Daliakopoulos et al. 2005; Nayak et al. 2006; Wong et al. 2007; Chen et al. 2009a; Yang et al. 2009; Chen et al. 2010; Sahoo & Jha 2013; Maiti & Tiwari 2014). The principal component analysis method can be used to reduce the redundant information in the input variables (Jolliffe 2002). Since groundwater level processes are non-linear in unconfined aquifers, a nonlinear model that can capture not only the overall appearance but also the underlying dynamic behavior of all of the nonlinear processes is required.
Both chaos theory (Sivakumar et al. 2001) and fractal theory (Zhang & Yang 2010) can be used to explore the nonlinear dynamic behavior of groundwater level time series. Both of these are sensitive to the initial conditions of the dynamic system, and there are self-similar characteristics in the chaos attractor and the fractal structure (Baas 2002). However, these methods usually explore the nonlinear system from different perspectives. The fractal theory mainly explores the structure of the attractor in geometrical space while the chaos theory mainly explores the evolution characteristic of the nonlinear system from the perspectives of time series (Peitgen et al. 2006).
Recently, chaos theory has been widely used in the nonlinear analysis of hydrological time series (Gutiérrez et al. 2006). Evidence of chaos has been demonstrated in many hydrological phenomena such as water level (Liong et al. 2005), precipitation (Jayawardena & Lai 1994), stream flows (Salas et al. 2005) and rainfall-runoff processes (Sivakumar et al. 2001). However, research on the evidence of chaos in groundwater levels in landslides has been very limited. In this study, based on the finding of evidence of chaos, embedding theory and phase space reconstruction (PSR) method are used to build the chaotic time series model. According to embedding theory (Takens 1981), in the long-term evolution of a chaotic groundwater level time series, information about the hidden states of the whole dynamic system can be preserved through a univariable groundwater level output. It is significant to effectively predict a nonlinear time series using a univariable model because sometimes it is difficult to obtain other correlated variables. The chaotic model is able to do nonlinear prediction using a univariable time series. In the PSR method from the chaotic model (King & Stewart 1992), a univariable groundwater level can be constructed into a multi-dimensional phase-space. As a result, the inputs and output of the nonlinear model can be obtained from the reconstructed multi-dimensional phase spaces.
It is necessary to choose a nonlinear model for chaotic groundwater level model building. As noted previously, ANN models (without the incorporation of chaos theory) have been extensively applied to forecast groundwater levels. ANN has limitations, however, including locally optimal values and the requirement of extensive data. Recently, support vector machines (SVM) were developed for time series prediction (Cortes & Vapnik 1995). SVM models have many advantages, including excellent generalization performance and global optimum. They have gained special attention in many areas such as electronic power prediction (Niu et al. 2010), traffic flow forecasting (Chen et al. 2009b), rainfall and runoff prediction (Tripathi et al. 2006), and landslide prediction (Feng et al. 2004; Yao et al. 2008). Meanwhile, the SVM model without chaos theory has also been used for groundwater level prediction (Behzad et al. 2009; Guzman et al. 2015; Gong et al. 2016). The main problem with SVM is the determination of its parameters. The particle swarm optimization (PSO) algorithm is widely used for SVM parameter selection because of its excellent global search ability (Lin et al. 2008; Fei et al. 2009). A novel PSO-SVM model based on chaos theory (chaotic PSO-SVM) is proposed in this study. It is then used to forecast the daily groundwater levels of the Huayuan landslide area, as well as the weekly and monthly groundwater levels of the Baijiabao landslide area.
To assess the generalization capability of the PSO-SVM model, the back-propagation neural network model based on chaos theory (chaotic BPNN) is also used to predict the groundwater levels. The results show that the chaotic PSO-SVM model has higher prediction accuracy than the chaotic BPNN model for the test data. To evaluate which approach better determines the input and output variables, the PSO-SVM model based on linear correlation analysis (linear PSO-SVM) is also used. The results show that the PSR method is more appropriate for selecting the input and output variables.
METHOD
The proposed method includes data pre-processing, PSR, evidence of chaos identification, the PSO-SVM model and accuracy assessment. All the programs are implemented in MATLAB R2015b. The PSR and the false nearest neighbor (FNN) algorithms in ChaosToolbox2p9_trial are used. The SVM model is tested and trained using the program libsvm-3.1-[FarutoUltimate 3.111code].
Data pre-processing
Phase space reconstruction
Determination of delay time
Typically, τ is determined using either the correlation analysis method (Aguirre 1995) or the mutual information method (Fraser & Swinney 1986). Both methods are generally suitable for noiseless long chaotic time series, and there are no commonly accepted guidelines for selection of parameters. Meanwhile, if a large value is selected for τ, the difficulty of computing the nonlinear model will increase greatly (Wolf et al. 1985). Therefore, a relatively small τ is usually chosen for the evidence of chaos identification and nonlinear prediction of the finite hydrological and environment time series (Sivakumar et al. 2002; Han & Wang 2009; Huang et al. 2016). In this study, the τ of daily, weekly and monthly groundwater levels are all set to 1, because the measurements of groundwater levels are finite and noisy. Huang et al. (2015) show that reasonably good groundwater level forecasts were obtained when τ was set to 1.
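The reconstruction step can be illustrated with a minimal sketch, assuming a delay time τ and an embedding dimension m have already been chosen. The function name `reconstruct_phase_space`, the one-step-ahead output and the sample values are illustrative assumptions, not the study's MATLAB implementation or data.

```python
from typing import List, Sequence, Tuple


def reconstruct_phase_space(
    series: Sequence[float], m: int, tau: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build delay vectors (inputs) and one-step-ahead targets (outputs).

    Input i is [x(i), x(i + tau), ..., x(i + (m - 1) * tau)] and the
    matching output is the next observation, x(i + (m - 1) * tau + 1).
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    span = (m - 1) * tau  # index offset of the last coordinate in a vector
    inputs, outputs = [], []
    for i in range(len(series) - span - 1):
        inputs.append([series[i + k * tau] for k in range(m)])
        outputs.append(series[i + span + 1])
    return inputs, outputs


# Hypothetical groundwater levels (m); not data from the study.
levels = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5]
X, y = reconstruct_phase_space(levels, m=3, tau=1)
# X[0] is [10.0, 10.2, 10.1] and y[0] is 10.4
```

With τ = 1, each input vector is simply the m most recent consecutive observations, which keeps the number of usable training samples as large as possible for a short series.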
Determination of embedding dimension
The embedding dimension m is the minimum number of state variables required to describe the chaotic system. If m is too large, a longer groundwater level time series and more complex computations are needed, and the efficiency of the nonlinear model is reduced because of data redundancy. If m is too small, the strange attractor cannot be reconstructed (Cao 1997). The FNN method (Kennel et al. 1992) is one of the most popular methods for estimating the optimal m because it is insensitive to finite and noisy data. It is used in this study.
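The idea behind the FNN method can be sketched as follows. This is a simplified illustration that applies only a distance-ratio test with an assumed tolerance `r_tol` of 10; it is not the ChaosToolbox routine used in the study, and the Hénon map stands in for a groundwater level series purely as a well-known chaotic test signal.

```python
import math
from typing import List, Sequence


def false_neighbor_fraction(
    series: Sequence[float], m: int, tau: int = 1, r_tol: float = 10.0
) -> float:
    """Fraction of nearest neighbours in dimension m that are false.

    A neighbour is counted as false when adding the (m + 1)-th delay
    coordinate stretches the pair's separation by more than r_tol times
    its distance in dimension m.
    """
    n = len(series) - m * tau  # vectors that still have an (m + 1)-th coordinate
    vectors = [[series[i + k * tau] for k in range(m)] for i in range(n)]
    false_count, valid = 0, 0
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(vectors[i], vectors[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j < 0 or best_d == 0.0:
            continue  # skip exact duplicates to avoid dividing by zero
        valid += 1
        extra = abs(series[i + m * tau] - series[best_j + m * tau])
        if extra / best_d > r_tol:
            false_count += 1
    return false_count / valid if valid else 0.0


def henon_series(n: int, a: float = 1.4, b: float = 0.3) -> List[float]:
    """x-coordinate of the Henon map, a standard chaotic test signal."""
    x_prev, x = 0.0, 0.0
    out = []
    for _ in range(n + 100):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        out.append(x)
    return out[100:]  # drop the initial transient


signal = henon_series(400)
fractions = [false_neighbor_fraction(signal, m) for m in (1, 2, 3)]
# The smallest m at which the fraction falls close to zero is the estimate.
```

For the Hénon signal the fraction should drop sharply between m = 1 and m = 2, which is the behaviour one looks for when reading off the embedding dimension.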
Evidence of chaos identification
If there are chaos characteristics in the groundwater levels, chaotic models can be used to forecast the groundwater levels. Otherwise, if the groundwater levels are random or periodic time series, chaotic models cannot be used. The chaos characteristics of nonlinear time series are primarily identified by qualitative or quantitative methods (Welch 1967). Qualitative methods identify the chaotic time series through revealing the special spatial structures or frequency features shown in the spatial or frequency domains (Yeragani et al. 1993). Qualitative methods are less specific and have lower accuracy. Thus, it is preferred to identify the chaos characteristics through quantitative methods.
Quantitative methods include the largest Lyapunov exponent (LLE) method (Wolf et al. 1985), correlation dimension method (Broock et al. 1996) and Kolmogorov entropy method (Kosloff & Rice 1981). It is difficult to calculate the Kolmogorov entropy values of groundwater levels because the groundwater level time series are finite and noisy. Meanwhile, the LLE and correlation dimension methods are less demanding than the Kolmogorov entropy method with respect to the length of the groundwater level time series. Hence, the LLE and correlation dimension methods are applied to characterize the chaotic system of groundwater levels in this study. The LLE describes the divergence rate of trajectories that start close together but diverge over time. In addition, the correlation dimension is one of the infinitely many dimensions in the dimension spectrum which characterizes the multi-fractal structure of the strange attractor (Procaccia et al. 1983). If evidence of chaos in the groundwater levels is revealed by both the LLE and the correlation dimension methods, the groundwater levels are considered to have chaos characteristics. The chaotic models can then be used to predict the groundwater levels.
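The notion of a divergence rate can be made concrete with a rough sketch. The code below is not the Wolf et al. (1985) algorithm cited above; it swaps in a simpler nearest-neighbour divergence estimate, in which the slope of the mean logarithmic separation against the time step approximates the LLE. The function name, the parameters `min_sep` and `steps`, and the logistic-map test signal are all assumptions made for illustration.

```python
import math
from typing import Sequence


def largest_lyapunov_estimate(
    series: Sequence[float], m: int, tau: int = 1,
    min_sep: int = 10, steps: int = 5,
) -> float:
    """Slope of the mean log separation of nearest neighbours per time step."""
    n = len(series) - (m - 1) * tau - steps
    vectors = [[series[i + k * tau] for k in range(m)]
               for i in range(n + steps)]
    log_sums = [0.0] * (steps + 1)
    counts = [0] * (steps + 1)
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if abs(i - j) < min_sep:
                continue  # ignore neighbours that are merely close in time
            d = math.dist(vectors[i], vectors[j])
            if 0.0 < d < best_d:
                best_j, best_d = j, d
        if best_j < 0:
            continue
        for k in range(steps + 1):
            d_k = math.dist(vectors[i + k], vectors[best_j + k])
            if d_k > 0.0:
                log_sums[k] += math.log(d_k)
                counts[k] += 1
    if min(counts) == 0:
        raise ValueError("series too short for the requested settings")
    mean_log = [s / c for s, c in zip(log_sums, counts)]
    # Least-squares slope of mean_log against k = 0..steps.
    k_mean = steps / 2.0
    y_mean = sum(mean_log) / len(mean_log)
    num = sum((k - k_mean) * (v - y_mean) for k, v in enumerate(mean_log))
    den = sum((k - k_mean) ** 2 for k in range(steps + 1))
    return num / den


# Logistic map with r = 4, a textbook chaotic signal (theoretical LLE = ln 2).
x, logistic = 0.3, []
for _ in range(600):
    x = 4.0 * x * (1.0 - x)
    logistic.append(x)
lle = largest_lyapunov_estimate(logistic, m=1)
# A clearly positive estimate is read as evidence of chaos.
```

The estimate is expressed per sampling step, so for daily, weekly or monthly series the value refers to one day, one week or one month, respectively.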
Calculation of LLE
Calculation of correlation dimension
Correlation dimension is one of the most efficient methods to identify the evidence of chaos. The method uses a fractal dimension, which is non-integer for chaotic systems. The Grassberger-Procaccia (G-P) approach (Grassberger & Procaccia 1983) is suitable for finite time series and easy to implement. It is used in this study to calculate the correlation dimension of groundwater levels.
A series of correlation dimension values can be obtained by increasing m. For a chaotic time series, the correlation dimension increases and then converges to a constant as m increases. For a random or periodic time series, the correlation dimension increases without converging as m increases (Lai & Lerner 1998).
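A minimal sketch of the G-P calculation is given below, assuming the standard definition of the correlation integral C(r) as the fraction of point pairs closer than r, with the correlation dimension taken as the slope of ln C(r) against ln r. The function names and the choice of radii are illustrative assumptions; in practice the radii must lie in the scaling region and be large enough that C(r) is non-zero.

```python
import math
from typing import Sequence


def correlation_integral(vectors: Sequence[Sequence[float]], r: float) -> float:
    """C(r): fraction of distinct point pairs separated by less than r."""
    n = len(vectors)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(vectors[i], vectors[j]) < r
    )
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(
    vectors: Sequence[Sequence[float]], radii: Sequence[float]
) -> float:
    """Least-squares slope of ln C(r) against ln r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(vectors, r)) for r in radii]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Sanity check on a set with known dimension: points on a straight line in 3-D.
line = [[t / 299.0, 2.0 * t / 299.0, 0.0] for t in range(300)]
dim_line = correlation_dimension(line, radii=[0.05, 0.1, 0.2, 0.4])
# dim_line should be close to 1, the dimension of a line.
```

To apply the convergence test described above, the same calculation would be repeated on phase spaces reconstructed with m = 1, 2, 3, and so on, and the resulting values inspected for saturation.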
PSO-SVM model
Support vector machine
PSO-SVM
Accuracy assessment
Development of the proposed model
CASE STUDIES
Research area and materials
Huayuan landslide
Baijiabao landslide
Reconstruct the phase spaces of groundwater levels
Groundwater levels | Reconstructed phase spaces (input variables) | Output variables |
---|---|---|
Daily | | |
Weekly | | |
Monthly | | |
Evidence of chaos in groundwater levels
Parameters of PSO-SVM model
Based on the reconstructed phase spaces and evidence of chaos, groundwater levels are predicted by PSO-SVM model. The position vector of a particle represents a parameter combination of the SVM. The final position vector is regarded as the optimal parameter combination of SVM, as shown in Table 2. The final predictive values are shown later in Figure 14.
Model | Groundwater levels | Optimal SVM parameter combination |
---|---|---|
Chaotic PSO-SVM | Daily | (6,562.5, 0.011, 0.344) |
Chaotic PSO-SVM | Weekly | (67.4, 0.031, 0.287) |
Chaotic PSO-SVM | Monthly | (30.5, 0.162, 0.498) |
In this study, the daily groundwater levels form a relatively long time series with small fluctuations. Therefore, the penalty parameter of the SVM for the daily groundwater levels should be high enough to fit the training data well, so as to predict the test data well. However, the weekly and monthly groundwater level series are very short and fluctuate strongly, so a high penalty parameter would result in over-fitting. Therefore, the penalty parameter for the weekly and monthly groundwater levels is kept low to allow appropriate errors in the training process.
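The search in which each particle's position vector encodes one SVM parameter combination can be sketched with a basic PSO loop. The code is a generic illustration: the inertia weight, the acceleration coefficients, the search bounds and the function `placeholder_error` are all assumptions. In the actual workflow the objective would be the SVM's training or validation error for the given parameter combination, which is replaced here by a simple quadratic with a known minimum.

```python
import random
from typing import Callable, List, Sequence, Tuple


def particle_swarm_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20, n_iter: int = 100,
    inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5, seed: int = 0,
) -> Tuple[List[float], float]:
    """Return the best position found and its objective value."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Clamp to the search box so parameters stay admissible.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = pos[i][:], val
                if val < g_val:
                    g_best, g_val = pos[i][:], val
    return g_best, g_val


def placeholder_error(params: Sequence[float]) -> float:
    """Stand-in for an SVM error measure; minimum at (2.0, 0.1, 0.3)."""
    target = (2.0, 0.1, 0.3)
    return sum((p - t) ** 2 for p, t in zip(params, target))


search_box = [(0.0, 10.0), (0.0, 1.0), (0.0, 1.0)]  # hypothetical bounds
best_params, best_error = particle_swarm_minimize(placeholder_error, search_box)
```

The final global best position plays the role of the optimal parameter combination reported in Table 2; evaluating the objective on held-out data rather than on the training data is one way to limit the over-fitting risk discussed above.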
Comparisons with other models
Linear PSO-SVM model
The input and output variables for the linear PSO-SVM model are selected by linear correlation analysis. The input variables are regarded as the number of groundwater levels that have remarkable relationships with the output variable. Because groundwater levels are finite and belong to a univariable time series, the linear autoregressive method is commonly used (Daliakopoulos et al. 2005; Wong et al. 2007; Yang et al. 2009).
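One simple way to carry out such a linear selection is to compute the sample autocorrelation of the series and keep the lags at which it is significant. The sketch below assumes the common approximate 95% bound of 1.96/√n as the significance threshold; the study's exact criterion is not stated in this text, so both the threshold and the function names are illustrative.

```python
import math
from typing import List, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def significant_lags(series: Sequence[float], max_lag: int) -> List[int]:
    """Lags whose autocorrelation exceeds the approximate bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > bound]


# Hypothetical alternating series; every lag is strongly correlated.
toy = [1.0, -1.0] * 20
lags = significant_lags(toy, max_lag=2)
```

The selected lags would then define which past groundwater levels serve as inputs, in contrast to the PSR method, where the number of inputs follows from the embedding dimension.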
Groundwater levels | Input variables | Output variables |
---|---|---|
Daily | | |
Weekly | | |
Monthly | | |
Model | Groundwater levels | Optimal SVM parameter combination |
---|---|---|
Linear PSO-SVM | Daily | (3,562.5, 0.014, 0.352) |
Linear PSO-SVM | Weekly | (72.8, 0.063, 0.291) |
Linear PSO-SVM | Monthly | (179.2, 0.097, 0.878) |
Chaotic BPNN model
In order to compare the PSO-SVM model with the BPNN model, the same input and output variables that were used in the chaotic PSO-SVM model are used again in the chaotic BPNN model. BPNN (Zhang & Wu 2009) is a commonly used ANN method. Recent research has demonstrated that the BPNN model with three-layer networks can fit nonlinear mapping relationships accurately (Jiang et al. 2008). In this study, a three-layered BPNN model is used. The input variables of the chaotic BPNN model are shown in Table 1. The input layers have three nodes for the daily, two nodes for the weekly and two nodes for the monthly groundwater levels, and the output layer has one node in all cases. The number of hidden layer nodes is determined using the trial-and-error method (Basheer & Hajmeer 2000; Yang et al. 2009). The final selected numbers of hidden layer nodes for the daily, weekly and monthly groundwater levels are 7, 5 and 5, respectively.
The built-in BPNN model in MATLAB R2015b is used. The initial weights and thresholds for all connection links are set randomly within the range from 0 to 1. The transfer functions of the hidden layer and the output layer are the tansig and purelin functions, respectively. The weight values of connection links are trained by the gradient descent algorithm. The maximum iteration number is set to 1,000. The learning rate is set to 0.01. The target training error is 0.001. The predictive results of the chaotic BPNN model are shown later in Figure 14.
Comparison of the three models
Training accuracies of the three models
The training accuracies of the three models for the daily, weekly and monthly groundwater levels are shown in Table 5. It can be seen from Table 5 that the daily, weekly and monthly groundwater levels are trained well by the three models. It can also be seen from Table 6 that the testing accuracies of the three models are reasonably good. Hence, there are no overtraining signs in the three models.
Groundwater levels | Chaotic PSO-SVM RMSE (m) | Chaotic PSO-SVM R2 | Chaotic PSO-SVM NSE | Chaotic BPNN RMSE (m) | Chaotic BPNN R2 | Chaotic BPNN NSE | Linear PSO-SVM RMSE (m) | Linear PSO-SVM R2 | Linear PSO-SVM NSE |
---|---|---|---|---|---|---|---|---|---|
Daily | 0.054 | 0.948 | 0.952 | 0.034 | 0.958 | 0.964 | 0.057 | 0.942 | 0.943 |
Weekly | 0.097 | 0.904 | 0.884 | 0.072 | 0.923 | 0.921 | 0.106 | 0.896 | 0.878 |
Monthly | 0.142 | 0.923 | 0.875 | 0.103 | 0.941 | 0.906 | 0.151 | 0.914 | 0.869 |
Groundwater levels | Chaotic PSO-SVM RMSE (m) | Chaotic PSO-SVM R2 | Chaotic PSO-SVM NSE | Chaotic BPNN RMSE (m) | Chaotic BPNN R2 | Chaotic BPNN NSE | Linear PSO-SVM RMSE (m) | Linear PSO-SVM R2 | Linear PSO-SVM NSE |
---|---|---|---|---|---|---|---|---|---|
Daily | 0.075 | 0.917 | 0.938 | 0.098 | 0.906 | 0.912 | 0.105 | 0.895 | 0.904 |
Weekly | 0.127 | 0.893 | 0.867 | 0.148 | 0.863 | 0.836 | 0.157 | 0.852 | 0.825 |
Monthly | 0.189 | 0.896 | 0.849 | 0.223 | 0.857 | 0.826 | 0.207 | 0.831 | 0.817 |
Comparison of the prediction results
The final prediction results of the one day, one week and one month ahead groundwater levels are compared in Table 6 and Figure 14. It can be seen from Figure 14 that, although some predicted values deviate from the monitoring data, the chaotic PSO-SVM model predicts the groundwater levels well, especially the daily and weekly groundwater levels. However, as highlighted by the green ellipse in Figure 14, the fluctuations of the daily, weekly and monthly groundwater levels are not well predicted by the chaotic BPNN and linear PSO-SVM models.
The prediction performances of the three models are also compared in Table 6, using the three indices RMSE, R2 and NSE. It can be seen from Table 6 that the daily data have the highest prediction accuracy and the monthly data have the lowest prediction accuracy. It can also be seen from Table 6 that the NSE metric indicates that the chaotic PSO-SVM model is superior to the chaotic BPNN model and the linear PSO-SVM model for the test data. The other two indices also show that the chaotic PSO-SVM model performs better than the chaotic BPNN model and the linear PSO-SVM model for the test data of the daily, weekly and monthly groundwater levels.
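For reference, the three indices can be computed as in the sketch below. The definitions used here are the common textbook ones and are an assumption, since the study's own formulas are not reproduced in this text: RMSE is the root of the mean squared error, NSE is one minus the ratio of the squared-error sum to the variance sum of the observations, and R2 is taken as the squared Pearson correlation. The sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (m for levels)."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observations and predictions."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical observed and predicted levels (m); not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(obs, pred), r_squared(obs, pred), nse(obs, pred))
```

A lower RMSE and values of R2 and NSE closer to 1 indicate better agreement between the predictions and the monitoring data.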
Discussion
The forecast results indicate that the chaotic PSO-SVM model is more accurate and credible than the linear PSO-SVM model, and the PSO-SVM model is more appropriate for finite nonlinear data than the BPNN model. The comparisons indicate that the PSR method of chaos theory provides the input and output variables more appropriately than the linear autoregressive model. It can be seen that there are similarities between the PSR method and SVM model. The PSR method can determine the number of input variables by extending one-dimensional groundwater level to high dimensions referred to as ‘embedding dimensions’ (Sivakumar et al. 2002). Similarly, the SVM method provides a nonlinear kernel function to map the input variables into high dimensional feature space. Both methods map the groundwater level from the low-dimensional space to the high-dimensional space (Wang & Shi 2013). Hence, the PSR method is suitable to determine the input and output variables for the SVM model. The nonlinear dynamical characteristics of groundwater level can be reflected in the embedding space.
However, all three models have relatively low prediction precision for the extreme monitoring values. One important reason is that the groundwater level time series are not long enough, so the reconstructed strange attractors are not fully unfolded to reflect the original strange attractors. One year is a very short time period for data-based model training. Another reason is that only groundwater level is used as input. For future studies, a nonlinear model based on multivariable chaos theory (Garcia & Almeida 2005) can be used, which can consider other input variables such as rainfall, reservoir water level and temperature. Furthermore, the prediction windows of this study are one day, one week and one month; the prediction window can be enlarged by feeding the predicted values back into the model, as previously done by Trichakis et al. (2009). In addition, all the predictions are conducted on a server with an Intel Xeon CPU and 256 GB of RAM. The computational times of the chaotic PSO-SVM, chaotic BPNN and linear PSO-SVM models for the daily groundwater level prediction are 184.5 s, 24.6 s and 179.3 s, respectively.
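Enlarging the prediction window by feeding predicted values back into the model can be sketched as a simple recursive loop. The sketch assumes a delay time of 1 and treats the trained model as any callable that maps the m most recent levels to the next one; `persistence_plus_trend` is a toy stand-in for such a model, not the PSO-SVM itself.

```python
from typing import Callable, List, Sequence


def iterated_forecast(
    one_step_model: Callable[[Sequence[float]], float],
    history: Sequence[float], m: int, horizon: int,
) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back as an input."""
    window = list(history[-m:])  # the m most recent levels (delay time of 1)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return forecasts


def persistence_plus_trend(window: Sequence[float]) -> float:
    """Toy stand-in for a trained model: extend the last observed change."""
    return window[-1] + (window[-1] - window[-2])


# Hypothetical history; three steps ahead from the last two observations.
future = iterated_forecast(persistence_plus_trend, [1.0, 2.0, 3.0], m=2, horizon=3)
```

Because each step reuses earlier predictions as inputs, prediction errors can accumulate as the horizon grows, so the accuracy reported for one-step-ahead forecasts should not be assumed to carry over to longer windows.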
CONCLUSION
Based on chaos theory, this study proposes the chaotic PSO-SVM model for groundwater level predictions. Two criteria are proposed to ensure that there is evidence of chaos in the groundwater levels. The first one is that the LLE should be greater than zero. The second criterion is that the values of correlation dimension should converge to a constant when the embedding dimension increases.
The proposed model is used to predict the daily groundwater levels in Huayuan landslide, and the weekly and monthly groundwater levels in Baijiabao landslide. Both chaotic BPNN and linear PSO-SVM models are used for comparison. The results show that the chaotic PSO-SVM model provides the best predictions among the three methods for the test data considered. However, the three models still have some limitations. For the chaotic PSO-SVM model, only the groundwater level is used as an input variable, without considering external factors such as rainfall and reservoir water level. Further research is needed to develop models based on multivariable chaos theory. In addition, a relatively long groundwater level time series is needed to train and test the chaotic PSO-SVM model well. For the chaotic BPNN model, the prediction accuracy is not ideal although the prediction efficiency is high. For the linear PSO-SVM model, the input variables are selected based on a simple linear correlation coefficient, which sometimes cannot properly identify the input variables.
In summary, the chaotic PSO-SVM model has the advantages of determining the input and output variables through a nonlinear method and of obtaining more accurate predictions of groundwater levels than the linear PSO-SVM and chaotic BPNN models. The proposed chaotic PSO-SVM model can be used to predict real-world groundwater levels.
ACKNOWLEDGEMENTS
The first author is currently supported by the China Scholarship Council to visit the ARC Centre of Excellence for Geotechnical Science and Engineering, University of Newcastle, NSW, Australia. This work was also supported by the National Natural Science Foundation of China (Project No. 51679117, 51509125).