Many nonlinear models have been proposed to forecast groundwater level. However, the evidence of chaos in groundwater levels in landslides has not been explored. In addition, linear correlation analyses are commonly used to determine the input and output variables for the nonlinear models, although they are unable to capture the nonlinear relationships between those variables. This paper proposes to use chaos theory to select the input and output variables for nonlinear models. The nonlinear model is constructed based on support vector machine (SVM). The parameters of SVM are obtained by particle swarm optimization (PSO). The proposed PSO-SVM model based on chaos theory (chaotic PSO-SVM) is applied to predict the daily groundwater levels in Huayuan landslide and the weekly and monthly groundwater levels in Baijiabao landslide in the Three Gorges Reservoir Area in China. The results show that there are chaos characteristics in the groundwater levels. The linear correlation analysis-based PSO-SVM (linear PSO-SVM) and the chaos theory-based back-propagation neural network (chaotic BPNN) are also applied for the purpose of comparison. The results show that the chaotic PSO-SVM model has higher prediction accuracy than the linear PSO-SVM and chaotic BPNN models for the test data considered.

## INTRODUCTION

There are many landslides in the Three Gorges Reservoir Area especially after the impoundment of the Three Gorges Reservoir. The instability of many reservoir landslides has been related to the dramatic changes in the groundwater seepage field (Asch *et al.* 2009; Zhang *et al.* 2012). Therefore, the prediction of groundwater levels is of critical importance for landslide prevention (Keqiang *et al.* 2015).

Generally speaking, groundwater level prediction models include physically based and data-based models (Adamowski & Chan 2011). Some physically based models such as the ARX model (Knotters & Bierkens 2000), process-based spatio-temporal model (Schmidt & Dikau 2004) or the water-table fluctuation method (Park & Parker 2008) have been used to simulate and forecast the processes of groundwater level fluctuation. However, physically based models have practical limitations (Knotters & Bierkens 2000, 2002; Nourani *et al.* 2007). For example, spatial variations and the uncertainty of hydrological investigation also have negative effects on the accuracy of physically based models. Knotters & Bierkens (2001) provided a regionalized ARX model to determine parameters for physically based models considering spatial variability.

It is easy to build data-based models using only an input-output variable approach (Trichakis *et al.* 2011; Nourani & Komasi 2013). In recent years, many data-based models have been adopted to predict groundwater levels, such as regression models (Adamowski & Feluch 1991; Sahoo & Jha 2013; Raghavendra & Deka 2016), artificial neural networks (ANN) (Nourani *et al.* 2008; Tsanis *et al.* 2008; Dash *et al.* 2010; Mohanty *et al.* 2010, 2015; Adamowski & Chan 2011; Jalalkamali *et al.* 2011; Sreekanth *et al.* 2011; Wang *et al.* 2012; Maheswaran & Khosa 2013; Atiquzzaman & Kandasamy 2015; Behnia & Rezaeian 2015; Chang *et al.* 2015), and genetic programming (Fallah-Mehdipour *et al.* 2013). Moreover, a literature review indicates that linear correlation analysis methods are widely used to select the input and output variables of these data-based models (Daliakopoulos *et al.* 2005; Nayak *et al.* 2006; Wong *et al.* 2007; Chen *et al.* 2009a; Yang *et al.* 2009; Chen *et al.* 2010; Sahoo & Jha 2013; Maiti & Tiwari 2014). The principal component analysis method can be used to reduce the redundant information in the input variables (Jolliffe 2002). Since groundwater level processes are non-linear in unconfined aquifers, a nonlinear model that can capture not only the overall appearance but also the underlying dynamic behavior of all of the nonlinear processes is required.

Both chaos theory (Sivakumar *et al.* 2001) and fractal theory (Zhang & Yang 2010) can be used to explore the nonlinear dynamic behavior of groundwater level time series. Both of these are sensitive to the initial conditions of the dynamic system, and there are self-similar characteristics in the chaos attractor and the fractal structure (Baas 2002). However, these methods usually explore the nonlinear system from different perspectives. The fractal theory mainly explores the structure of the attractor in geometrical space while the chaos theory mainly explores the evolution characteristic of the nonlinear system from the perspectives of time series (Peitgen *et al.* 2006).

Recently, chaos theory has been widely used in the nonlinear analysis of hydrological time series (Gutiérrez *et al.* 2006). Evidence of chaos has been demonstrated in many hydrological phenomena such as water level (Liong *et al.* 2005), precipitation (Jayawardena & Lai 1994), stream flows (Salas *et al.* 2005) and rainfall-runoff processes (Sivakumar *et al.* 2001). However, research on the evidence of chaos in groundwater levels in landslides has been very limited. In this study, based on the finding of evidence of chaos, embedding theory and phase space reconstruction (PSR) method are used to build the chaotic time series model. According to embedding theory (Takens 1981), in the long-term evolution of a chaotic groundwater level time series, information about the hidden states of the whole dynamic system can be preserved through a univariable groundwater level output. It is significant to effectively predict a nonlinear time series using a univariable model because sometimes it is difficult to obtain other correlated variables. The chaotic model is able to do nonlinear prediction using a univariable time series. In the PSR method from the chaotic model (King & Stewart 1992), a univariable groundwater level can be constructed into a multi-dimensional phase-space. As a result, the inputs and output of the nonlinear model can be obtained from the reconstructed multi-dimensional phase spaces.

It is necessary to choose a nonlinear model for chaotic groundwater level model building. As noted previously, ANN models (without the incorporation of chaos theory) have been extensively applied to forecast groundwater levels. ANN has limitations, however, including locally optimal values and the requirement of extensive data. Recently, support vector machines (SVM) were developed for time series prediction (Cortes & Vapnik 1995). SVM models have many advantages, including excellent generalization performance and global optimum. They have gained special attention in many areas such as electronic power prediction (Niu *et al.* 2010), traffic flow forecasting (Chen *et al.* 2009b), rainfall and runoff prediction (Tripathi *et al.* 2006), and landslide prediction (Feng *et al.* 2004; Yao *et al.* 2008). Meanwhile, the SVM model without chaos theory has also been used for groundwater level prediction (Behzad *et al.* 2009; Guzman *et al.* 2015; Gong *et al.* 2016). The main problem with SVM is the determination of its parameters. The particle swarm optimization (PSO) algorithm is widely used for SVM parameter selection because of its excellent global search ability (Lin *et al.* 2008; Fei *et al.* 2009). A novel PSO-SVM model based on chaos theory (chaotic PSO-SVM) is proposed in this study. It is then used to forecast the daily groundwater levels of the Huayuan landslide area, as well as the weekly and monthly groundwater levels of the Baijiabao landslide area.

To assess the generalization capability of the PSO-SVM model, the back-propagation neural network model based on chaos theory (chaotic BPNN) is also used to predict the groundwater levels. The results show that the chaotic PSO-SVM model has higher prediction accuracy than the chaotic BPNN model for the test data. To assess which approach gives the better guideline for determining the input and output variables, the PSO-SVM model based on linear correlation analysis (linear PSO-SVM) is also used. The results show that the PSR method is more appropriate for selecting the optimal input and output variables.

## METHOD

The proposed method includes data pre-processing, PSR, evidence of chaos identification, the PSO-SVM model and accuracy assessment. All the programs are implemented in MATLAB R2015b. The PSR and the false nearest neighbor (FNN) algorithms in ChaosToolbox2p9_trial are used. The SVM model is tested and trained using the program libsvm-3.1-[FarutoUltimate 3.111code].

### Data pre-processing

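The original groundwater levels are normalized before modeling:

$$x_i = \frac{x_i^{o} - x_{\min}^{o}}{x_{\max}^{o} - x_{\min}^{o}}, \quad i = 1, 2, \ldots, N \qquad (1)$$

where $x_i^{o}$ is the original groundwater level, $x_i$ is the normalized groundwater level, *N* is the number of groundwater levels, and $x_{\min}^{o}$ and $x_{\max}^{o}$ are the lower and upper bounds of the original groundwater levels. The normalized values $x_i$ are used to predict groundwater level and the results are back-transformed to obtain the final predicted groundwater levels:

$$\hat{x}_i^{o} = \hat{x}_i \left( x_{\max}^{o} - x_{\min}^{o} \right) + x_{\min}^{o} \qquad (2)$$

where $\hat{x}_i$ is the predicted normalized value and $\hat{x}_i^{o}$ is the predicted groundwater level.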
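
The scaling and back-transformation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming min-max scaling to the range [0, 1]; the target range, the function names `normalize` and `denormalize`, and the sample levels are assumptions made for the example and are not taken from the study.

```python
def normalize(levels):
    """Min-max scale groundwater levels to [0, 1] (assumed target range)."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    scaled = [(x - lo) / (hi - lo) for x in levels]
    return scaled, lo, hi


def denormalize(scaled, lo, hi):
    """Back-transform scaled values to the original units (metres)."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical groundwater levels in metres, for illustration only.
levels = [175.2, 175.9, 176.4, 175.6]
scaled, lo, hi = normalize(levels)
restored = denormalize(scaled, lo, hi)
```

The bounds `lo` and `hi` are returned together with the scaled series so that model predictions can later be mapped back to metres with the same bounds.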

### Phase space reconstruction

The PSR method ([…] *et al.* 1992) is used to model the deterministic regularity of the strange attractors of groundwater level. The PSR method provides a simplified, multi-dimensional representation of a univariable nonlinear time series. In this approach, the nonlinear dynamics of a groundwater level can be fully embedded in a multi-dimensional phase space as:

$$X_i = \left( x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau} \right) \qquad (3)$$

where $X_i$ is the reconstructed phase space, $x_i$ is the normalized groundwater level, *m* is the embedding dimension ($m \geq 2d + 1$), *d* is the dimension of the strange attractor of phase space, and *τ* is the delay time. *τ* represents the average length of memory of the system. The phase space of a univariable chaotic time series can be reconstructed by selecting appropriate *τ* and *m* (Sauer *et al.* 1991).
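
A minimal Python sketch of how delay vectors can be turned into input-output pairs for a one-step-ahead model is given here. The function name `reconstruct_phase_space` is hypothetical, and pairing each delay vector with the value one step after its last coordinate is an assumption about how the outputs are formed.

```python
def reconstruct_phase_space(series, m, tau=1):
    """Build (inputs, outputs) from a univariable series.

    Each input is the delay vector (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    The paired output is the value one step after the last coordinate
    (an assumed one-step-ahead target).
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    span = (m - 1) * tau
    inputs, outputs = [], []
    for i in range(len(series) - span - 1):
        inputs.append([series[i + k * tau] for k in range(m)])
        outputs.append(series[i + span + 1])
    return inputs, outputs


# Toy normalized series, for illustration only.
series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X, y = reconstruct_phase_space(series, m=3, tau=1)
```

With *m* = 3 and *τ* = 1, the six-point toy series yields three input vectors, the first being `[0.1, 0.2, 0.3]` with target `0.4`.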

#### Determination of delay time

Typically, *τ* is determined using either the correlation analysis method (Aguirre 1995) or the mutual information method (Fraser & Swinney 1986). Both methods are generally suitable for long, noiseless chaotic time series, and there are no commonly accepted guidelines for selecting their parameters. Meanwhile, if a large value is selected for *τ*, the difficulty of computing the nonlinear model will increase greatly (Wolf *et al.* 1985). Therefore, a relatively small *τ* is usually chosen for the evidence of chaos identification and nonlinear prediction of finite hydrological and environmental time series (Sivakumar *et al.* 2002; Han & Wang 2009; Huang *et al.* 2016). In this study, the *τ* of the daily, weekly and monthly groundwater levels are all set to 1, because the measurements of groundwater levels are finite and noisy. Huang *et al.* (2015) show that reasonably good groundwater level forecasts were obtained when *τ* was set to 1.

#### Determination of embedding dimension

The *m* is the minimum number of state variables required to describe the chaotic system. If *m* is too large, longer groundwater level time series and more complex computations are needed. As a result, the efficiency of the nonlinear model will be reduced because of data redundancy. If *m* is too small, the strange attractors cannot be reconstructed (Cao 1997). The FNN method (Kennel *et al.* 1992) is one of the most popular methods for estimating optimal *m* because it is insensitive to the finite and noisy data points. It is used in this study.

Estimating *m* using the FNN method starts with an embedding space named $R_m$. Suppose $X_i$ is a data point in $R_m$ and $X_i^{NN}$ is its nearest neighbor. The Euclidean distance between these two elements is calculated as:

$$D_m(i) = \left\| X_i - X_i^{NN} \right\| = \sqrt{\sum_{k=0}^{m-1} \left( x_{i+k\tau} - x_{i+k\tau}^{NN} \right)^2} \qquad (4)$$

The Euclidean distance between the projections of these two points into $R_{m+1}$ is given by:

$$D_{m+1}(i) = \sqrt{D_m^2(i) + \left( x_{i+m\tau} - x_{i+m\tau}^{NN} \right)^2} \qquad (5)$$

The parameter *S* is defined as a measure of the distance between $X_i$ and $X_i^{NN}$ in $R_{m+1}$, normalized against their distance in $R_m$ as:

$$S = \frac{D_{m+1}(i)}{D_m(i)} \qquad (6)$$

The FNN method aims to find all the data points which are neighbors in a particular embedding dimension *m* and which do not remain so when *m* is increased to *m* + 1. In this study, the ratio *S* of the distances between a particular data point and its nearest neighbor in the (*m* + 1)th and *m*th dimensions is computed. If *S* is larger than a particular threshold, the neighbor is false. The threshold is determined according to the number of reconstructed phase spaces. It can be set to 10 when the distribution of the reconstructed phase spaces in the dynamic system is sparse, and to a greater value when the distribution is dense (Kennel & Abarbanel 2002; Han 2007). There are just tens of phase spaces in the weekly and monthly groundwater levels, and hundreds of phase spaces in the daily groundwater levels in this study. Hence, the threshold is set to 10. A greater threshold can be used if there are thousands of phase spaces in the groundwater levels. When the percentage of FNN falls to 5%, the corresponding embedding dimension is considered high enough to represent the dynamics of the groundwater level.
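
The FNN test can be illustrated with a short pure-Python sketch. It is not the ChaosToolbox implementation used in the study; the function names, the brute-force neighbor search and the two test signals (a logistic map and uniform noise) are assumptions chosen for the illustration. The ratio of distances in dimensions *m* + 1 and *m* is compared against a threshold of 10, as described above.

```python
import math
import random


def delay_vectors(series, m, tau=1):
    """Delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    n = len(series) - (m - 1) * tau
    return [[series[i + k * tau] for k in range(m)] for i in range(n)]


def fnn_percentage(series, m, tau=1, threshold=10.0):
    """Percentage of nearest neighbours in dimension m that are false in m + 1."""
    vecs_next = delay_vectors(series, m + 1, tau)
    n = len(vecs_next)                       # points available in both spaces
    vecs = delay_vectors(series, m, tau)[:n]
    false_count, counted = 0, 0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):                   # brute-force nearest neighbour search
            if j == i:
                continue
            d = math.dist(vecs[i], vecs[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None or best_d == 0.0:
            continue                         # ratio undefined for identical points
        counted += 1
        if math.dist(vecs_next[i], vecs_next[best_j]) / best_d > threshold:
            false_count += 1
    return 100.0 * false_count / counted if counted else 0.0


# Deterministic test signal: logistic map x -> 4x(1 - x).
x, logistic = 0.3, []
for _ in range(200):
    x = 4.0 * x * (1.0 - x)
    logistic.append(x)

# Random test signal: uniform noise with a fixed seed.
rng = random.Random(1)
noise = [rng.random() for _ in range(200)]

fnn_logistic = fnn_percentage(logistic, m=1)
fnn_noise = fnn_percentage(noise, m=1)
```

For the one-dimensional logistic map, the next coordinate is a smooth function of the current one, so neighbors at *m* = 1 stay neighbors at *m* = 2. For noise, most neighbors separate strongly, which is the behavior the 5% criterion is meant to detect.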

### Evidence of chaos identification

If there are chaos characteristics in the groundwater levels, chaotic models can be used to forecast the groundwater levels. Otherwise, if the groundwater levels are random or periodic time series, chaotic models cannot be used. The chaos characteristics of nonlinear time series are primarily identified by qualitative or quantitative methods (Welch 1967). Qualitative methods identify the chaotic time series through revealing the special spatial structures or frequency features shown in the spatial or frequency domains (Yeragani *et al.* 1993). Qualitative methods are less specific and have lower accuracy. Thus, it is preferred to identify the chaos characteristics through quantitative methods.

Quantitative methods include the largest Lyapunov exponent (LLE) method (Wolf *et al.* 1985), correlation dimension method (Broock *et al.* 1996) and Kolmogorov entropy method (Kosloff & Rice 1981). It is difficult to calculate the Kolmogorov entropy values of groundwater levels because the groundwater level time series are finite and noisy. Meanwhile, the LLE and correlation dimension methods have less stringent requirements than the Kolmogorov entropy method regarding the length of the groundwater level time series. Hence, the LLE and correlation dimension methods are applied to characterize the chaotic system of groundwater levels in this study. The LLE describes the divergence rate of trajectories that start close but diverge over time. In addition, the correlation dimension is one of the infinite number of dimensions in the dimension spectrum which characterizes the multi-fractal structure of the strange attractor (Procaccia *et al.* 1983). If the evidence of chaos in the groundwater levels is revealed by both the LLE and the correlation dimension methods, there are chaos characteristics in the groundwater levels, and the chaotic models can be used to predict the groundwater levels.

#### Calculation of LLE

The method of […] *et al.* (1993) is both simple and appropriate for a finite time series. It is used to calculate the LLE in this study. If the LLE is denoted as *L*, the average divergence at time *t* can be defined as:

$$d(t) = k e^{Lt} \qquad (7)$$

where *k* is a constant that normalizes the initial separation. The first step is to reconstruct the attractor dynamics from a univariable time series. The reconstructed trajectory, *X*, can be expressed as a matrix where each row is a phase-space vector. That is:

$$X = \left[ X_1, X_2, \ldots, X_M \right]^{T} \qquad (8)$$

where $X_i$ is the *i*th data point of the dynamic system and *M* is the number of data points on the reconstructed attractor. For an *N*-point groundwater level time series, $\{x_1, x_2, \ldots, x_N\}$, each $X_i$ is given by:

$$X_i = \left( x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau} \right) \qquad (9)$$

Thus *X* is an *M* × *m* matrix. The constants *m*, *M*, *τ* and *N* are related as:

$$M = N - (m-1)\tau \qquad (10)$$

After reconstructing the dynamics, the nearest neighbor of each point can be located on the trajectory. The nearest neighbor $X_{\hat{j}}$ is found by searching for the point that minimizes the distance to the particular reference point $X_j$ as:

$$d_j(0) = \min_{X_{\hat{j}}} \left\| X_j - X_{\hat{j}} \right\| \qquad (11)$$

where $d_j(0)$ is the initial distance from the *j*th point to its nearest neighbor, and $\| \cdot \|$ denotes the Euclidean norm. The *L* is then estimated as the mean rate of separation of the nearest neighbors. Based on the definition of *L* given in Equation (7), the *j*th pair of nearest neighbors diverge approximately at a rate given by:

$$d_j(i) \approx k_j e^{L (i \Delta t)} \qquad (12)$$

where $t = i \Delta t$, $\Delta t$ is the sampling period of the time series, and $k_j$ is the initial separation of the *j*th pair of nearest neighbors. Taking the logarithm of both sides of Equation (12),

$$\ln d_j(i) \approx \ln k_j + L (i \Delta t) \qquad (13)$$

Equation (13) represents a set of approximately parallel lines (for $j = 1, 2, \ldots, M$), each with a slope roughly proportional to *L*. The *L* can be calculated using a least-squares fit to the ‘average’ line defined by

$$y(i) = \frac{1}{\Delta t} \left\langle \ln d_j(i) \right\rangle \qquad (14)$$

where $\langle \cdot \rangle$ denotes the average over all *j*. This process of averaging is the key step in calculating an accurate *L* from finite and noisy time series. If the dynamic system of groundwater levels contains chaos characteristics, the LLE must be greater than zero. This is because two trajectories with nearby initial conditions on the dynamic system cannot diverge exponentially as time increases if the LLE is smaller than zero (Eckmann & Ruelle 1985).
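
The averaging and line-fitting procedure described above can be sketched in pure Python as follows. This is an illustrative sketch rather than the code used in the study: the function names, the brute-force neighbor search, the exclusion of temporally adjacent points (`min_sep`) and the short fitting window (`max_steps`) are assumptions. A logistic map, whose largest Lyapunov exponent is known to be ln 2, serves as the test signal.

```python
import math


def logistic_series(n, x0=0.3, discard=100):
    """Chaotic test signal from the logistic map x -> 4x(1 - x)."""
    x, out = x0, []
    for k in range(n + discard):
        x = 4.0 * x * (1.0 - x)
        if k >= discard:                     # drop transient points
            out.append(x)
    return out


def largest_lyapunov(series, m, tau=1, max_steps=4, min_sep=1, dt=1.0):
    """Slope of the average log-divergence curve y(i) against i."""
    m_points = len(series) - (m - 1) * tau   # M = N - (m - 1) * tau
    vecs = [[series[i + k * tau] for k in range(m)] for i in range(m_points)]
    usable = m_points - max_steps            # points that can be followed forward
    neighbours = []
    for j in range(usable):
        best, best_d = None, float("inf")
        for k in range(usable):
            if abs(j - k) <= min_sep:
                continue                     # skip temporally adjacent points
            d = math.dist(vecs[j], vecs[k])
            if 0.0 < d < best_d:
                best, best_d = k, d
        neighbours.append(best)
    y = []
    for i in range(max_steps + 1):
        logs = []
        for j, k in enumerate(neighbours):
            if k is None:
                continue
            d = math.dist(vecs[j + i], vecs[k + i])
            if d > 0.0:
                logs.append(math.log(d))
        y.append(sum(logs) / len(logs) / dt)  # y(i) = <ln d_j(i)> / dt
    steps = range(max_steps + 1)
    mean_i = sum(steps) / len(steps)
    mean_y = sum(y) / len(y)
    num = sum((i - mean_i) * (v - mean_y) for i, v in zip(steps, y))
    den = sum((i - mean_i) ** 2 for i in steps)
    return num / den                         # least-squares slope


lle = largest_lyapunov(logistic_series(400), m=2, tau=1)
```

The fit is restricted to the first few steps because the separation of neighbors saturates once it reaches the size of the attractor, after which the curve flattens and no longer reflects exponential divergence.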

#### Calculation of correlation dimension

Correlation dimension is one of the most efficient methods to identify the evidence of chaos. The method uses a fractal dimension, which is non-integer for chaotic systems. The Grassberger-Procaccia (G-P) approach (Grassberger & Procaccia 1983) is suitable for finite time series and easy to implement. It is used in this study to calculate the correlation dimension of groundwater levels.

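For the reconstructed phase space, the correlation function $C(r)$ is given by:

$$C(r) = \lim_{M \to \infty} \frac{2}{M(M-1)} \sum_{1 \leq i < j \leq M} H\left( r - \left\| X_i - X_j \right\| \right) \qquad (15)$$

where *H* is the Heaviside step function, with $H(u) = 1$ for $u > 0$, and $H(u) = 0$ for $u \leq 0$, where $u = r - \| X_i - X_j \|$; *r* is the radius of the sphere centered on $X_i$ or $X_j$. If the time series is characterized by an attractor, $C(r)$ can be related to radius *r* as:

$$C(r) \propto r^{D_2} \qquad (16)$$

where $D_2$ is the correlation dimension. Take the logarithm of Equation (16) and rearrange it as:

$$D_2 = \lim_{r \to 0} \frac{\ln C(r)}{\ln r} \qquad (17)$$

A series of $D_2$ can be obtained by increasing *m*. For a chaotic time series, $D_2$ continuously increases and then converges to a constant when *m* increases. For a random or periodic time series, $D_2$ increases without converging when *m* increases (Lai & Lerner 1998).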
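
A small Python sketch of the G-P calculation follows. It estimates the correlation dimension as the least-squares slope of ln C(r) against ln r over a chosen set of radii. The function names, the radii and the test data (points on a straight line, whose dimension should be close to 1) are assumptions made for the illustration.

```python
import math


def correlation_sum(vectors, r):
    """C(r): fraction of distinct point pairs separated by less than r."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(vectors, radii):
    """Least-squares slope of ln C(r) versus ln r over the given radii."""
    pts = []
    for r in radii:
        c = correlation_sum(vectors, r)
        if c > 0.0:                          # ln C(r) is undefined for empty spheres
            pts.append((math.log(r), math.log(c)))
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    num = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    den = sum((px - mean_x) ** 2 for px, _ in pts)
    return num / den


# Points on a straight line in a two-dimensional space: expected dimension near 1.
line_points = [[i / 300.0, i / 300.0] for i in range(300)]
d2_line = correlation_dimension(line_points, radii=[0.02, 0.04, 0.08, 0.16])
```

In practice the slope is computed for each embedding dimension in turn, and the resulting values are inspected for convergence as *m* increases.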

### PSO-SVM model

#### Support vector machine

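The SVM regression used in this study is the *ɛ*-SVM, which locates the hyperplane with an *ɛ*-insensitive loss value. The *ɛ*-SVM is formulated as:

$$f(x) = w \cdot \varphi(x) + b \qquad (18)$$

where $\varphi(x)$ is a nonlinear mapping from the input space to the feature space, *w* is a vector of weight coefficients and *b* is a bias constant. *w* and *b* are estimated by the following optimization problem:

$$\min_{w, b} \ \frac{1}{2} \| w \|^2 \quad \text{subject to} \quad \left| y_i - w \cdot \varphi(x_i) - b \right| \leq \varepsilon, \quad i = 1, \ldots, n \qquad (19)$$

To cope with feasibility issues and to make the method more robust, points from the *ɛ*-insensitive band are not eliminated. Instead, these points are penalized by introducing slack variables $\xi_i$ and $\xi_i^{*}$:

$$\min_{w, b, \xi, \xi^{*}} \ \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \quad \text{subject to} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \leq \varepsilon + \xi_i \\ w \cdot \varphi(x_i) + b - y_i \leq \varepsilon + \xi_i^{*} \\ \xi_i, \ \xi_i^{*} \geq 0 \end{cases} \qquad (20)$$

where the cost constant *C* determines the trade-off between model complexity and the tolerance of deviations larger than *ɛ*. After taking the Lagrangian and conditions for optimality, the solution in dual form is

$$f(x) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b \qquad (21)$$

where $\alpha_i$ and $\alpha_i^{*}$ are non-zero Lagrangian multipliers and the solution for the dual problem. $K(x_i, x_j)$ is the kernel function which represents the inner product $\langle \varphi(x_i), \varphi(x_j) \rangle$. In this study, the radial basis function (RBF) is used:

$$K(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right) \qquad (22)$$

where *γ* is the width parameter of the RBF kernel. In this study, the cost constant *C*, the insensitive loss *ɛ* and the kernel function parameter *γ* are determined by the PSO algorithm.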
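
The three ingredients named above (the RBF kernel, the *ɛ*-insensitive loss and the dual-form prediction) can be written out directly. The sketch below is illustrative only and does not train an SVM; the function names are hypothetical, and the support vectors, dual coefficients and bias passed to `svr_predict` are made-up numbers standing in for the output of a trained model.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-band, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Dual-form prediction: sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Made-up support vectors and coefficients, for illustration only.
svs = [[0.2, 0.3, 0.4], [0.6, 0.5, 0.4]]
coefs = [1.5, -0.5]              # each entry plays the role of (alpha_i - alpha_i*)
prediction = svr_predict([0.2, 0.3, 0.4], svs, coefs, bias=0.1, gamma=0.3)
```

A larger *γ* makes the kernel more local, while a larger *ɛ* widens the band within which errors are ignored; both interact with *C*, which is why the three parameters are tuned jointly.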

#### PSO-SVM

The PSO algorithm (Eberhart & Kennedy 1995) is an adaptive algorithm for parameter selection. The general procedure of a PSO-SVM model (Wang *et al.* 2013) is shown in Figure 1. In the first step, the initial parameters such as the number of particles, learning factors and maximum iterations of the PSO algorithm are determined. In the second step, the SVM is trained and tested. In the third step, the root mean square error (*RMSE*) of the SVM predicted values is selected as the fitness function, which is regarded as the objective function for the PSO algorithm. In the fourth and fifth steps, the position and velocity of all particles are updated by comparing the particle fitness value with the local and global best fitness values. Finally, the process is repeated until the maximum number of iterations is reached.
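
The update loop described above can be sketched as a generic PSO minimizer in Python. This is a minimal sketch, not the configuration used in the study: the inertia weight `w`, the learning factors `c1` and `c2`, the particle count and the search bounds are assumed values, and `stand_in_fitness` is a hypothetical smooth function that replaces the RMSE of a trained SVM so that the example runs on its own.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep in bounds
            val = fitness(pos[i])
            if val < pbest_val[i]:           # update the particle's own best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:          # update the swarm's best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def stand_in_fitness(params):
    """Hypothetical stand-in for the RMSE of an SVM trained with (C, eps, gamma)."""
    c, eps, gamma = params
    return (c - 3.0) ** 2 + (eps - 0.2) ** 2 + (gamma - 0.5) ** 2


best_params, best_value = pso_minimize(
    stand_in_fitness, bounds=[(0.0, 10.0), (0.0, 1.0), (0.0, 1.0)]
)
```

In a PSO-SVM setting, each fitness evaluation would train an SVM with the candidate parameters and return its prediction error, which makes the fitness call the dominant cost of the search.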

### Accuracy assessment

The *RMSE* is calculated as:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i^{o} - \hat{x}_i^{o} \right)^2}$$

where $x_i^{o}$ is the original groundwater level, $\hat{x}_i^{o}$ is the final predicted value, and $n$ is the length of predicted data. The *RMSE* indicates the discrepancy between the monitoring and predicted values. The lower the *RMSE* is, the more accurate the prediction. In addition, the goodness of fit (*R*^{2}) is also used to assess the accuracy in this study. The *R*^{2} represents the percentage of the initial uncertainty that is explained by the prediction models.

The Nash-Sutcliffe efficiency coefficient (*NSE*) (Pulido-Calvo & Gutierrez-Estrada 2009) is also used to assess the performance of the three models as:

$$NSE = 1 - \frac{\sum_{i=1}^{n} \left( x_i^{o} - \hat{x}_i^{o} \right)^2}{\sum_{i=1}^{n} \left( x_i^{o} - \bar{x}^{o} \right)^2}$$

where $\bar{x}^{o}$ is the mean of the original groundwater levels. The *NSE* ranges from −∞ to 1. An *NSE* of 1 corresponds to a perfect match of the predictive values to the monitored data. The closer the *NSE* is to 1, the more accurate the model is.
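
The two error measures with unambiguous definitions, *RMSE* and *NSE*, can be computed as in the following Python sketch. The function names and the sample values are assumptions made for the illustration; the *NSE* is written in its standard Nash-Sutcliffe form.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between monitored and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


# Made-up monitored and predicted levels, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(observed, predicted)
efficiency = nse(observed, predicted)
```

A prediction equal to the monitored data gives an *RMSE* of 0 and an *NSE* of 1, while a prediction that always returns the observed mean gives an *NSE* of 0.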

### Development of the proposed model

## CASE STUDIES

### Research area and materials

#### Huayuan landslide

[…]^{4} m^{2} with a maximum longitudinal length of 380 m and a width of 360 m. The elevation of its leading edge is approximately 125 m and the elevation of the trailing edge is approximately 270 m, while the left and right boundaries are both defined by seasonal gullies. The topographic map of the landslide area is shown in Figure 4.

#### Baijiabao landslide

[…]^{4} m^{2} with a maximum longitudinal length of 550 m and a width of 400 m. The landslide extends between 135 m and 275 m in elevation and the mean depth of the sliding surface is approximately 45 m. The main sliding direction of Baijiabao landslide is oriented at N85°E, and the left and right boundaries of the landslide are defined by bedrock and a gully, respectively. The topographic map is shown in Figure 7.

### Reconstruct the phase spaces of groundwater levels

The *τ* of all groundwater levels is set to 1. The optimal *m* of the daily, weekly and monthly groundwater levels are 3, 3 and 2, respectively, as shown in Figure 10. The reconstructed phase spaces are shown in Table 1.

Groundwater levels | Reconstructed phase spaces (input variables) | Output variables |
---|---|---|
Daily | | |
Weekly | | |
Monthly | | |


### Evidence of chaos in groundwater levels

[…] *L* of groundwater levels. For the groundwater level series, the initial point is chosen near the attractor and the transient points are discarded. In Figure 11, the *y* axis is the average distance between all nearest neighbors after $i$ discrete time steps. The red line is the fitted average line of the coordinate points $(i, y(i))$. The slope of the red line is taken as *L*. The final calculated *L* of daily, weekly and monthly groundwater levels are 0.2291, 0.1533 and 0.0591, respectively. Although the *L* of monthly groundwater levels is close to zero, it just means that the orbits in the state space of monthly groundwater levels are very close (An *et al.* 2011; Zhou & Yin 2014). It is clear that there are chaos characteristics in all the groundwater levels because *L* is greater than zero.

[…] *m* is increased from 1 to 20, 15 and 20, respectively. The relationships of $\ln C(r)$ and $\ln r$ in daily, weekly and monthly groundwater levels are shown in Figure 12. It can be seen that when *m* is increased to 11, 10 and 15, the slopes of the lines converge to a constant. Therefore, there are chaos characteristics in the daily, weekly and monthly groundwater levels.

### Parameters of PSO-SVM model

Based on the reconstructed phase spaces and evidence of chaos, groundwater levels are predicted by PSO-SVM model. The position vector of a particle represents a parameter combination of the SVM. The final position vector is regarded as the optimal parameter combination of SVM, as shown in Table 2. The final predictive values are shown later in Figure 14.

Model | Groundwater levels | Optimal SVM parameter combination |
---|---|---|
Chaotic PSO-SVM | Daily | (6562.5, 0.011, 0.344) |
Chaotic PSO-SVM | Weekly | (67.4, 0.031, 0.287) |
Chaotic PSO-SVM | Monthly | (30.5, 0.162, 0.498) |


In this study, the daily groundwater levels are relatively long time series with small fluctuations. Therefore, the *C* of SVM for the daily groundwater levels should be high enough to fit the training data well, so as to predict the test data well. However, the weekly and monthly groundwater levels are very limited time series with large fluctuations. A high *C* will result in the problem of over-fitting. Therefore, the *C* of weekly and monthly groundwater levels is low to allow appropriate errors in the training process.

### Comparisons with other models

#### Linear PSO-SVM model

The input and output variables for the linear PSO-SVM model are selected by linear correlation analysis. The input variables are regarded as the number of groundwater levels that have remarkable relationships with the output variable. Because groundwater levels are finite and belong to a univariable time series, the linear autoregressive method is commonly used (Daliakopoulos *et al.* 2005; Wong *et al.* 2007; Yang *et al.* 2009).

Groundwater levels | Input variables | Output variables |
---|---|---|
Daily | | |
Weekly | | |
Monthly | | |


Model | Groundwater levels | Optimal SVM parameter combination |
---|---|---|
Linear PSO-SVM | Daily | (3562.5, 0.014, 0.352) |
Linear PSO-SVM | Weekly | (72.8, 0.063, 0.291) |
Linear PSO-SVM | Monthly | (179.2, 0.097, 0.878) |


#### Chaotic BPNN model

In order to compare the PSO-SVM model with the BPNN model, the same input and output variables that were used in the chaotic PSO-SVM model are used again in the chaotic BPNN model. BPNN (Zhang & Wu 2009) is a commonly used ANN method. Recent research has demonstrated that the BPNN model with three-layer networks can fit nonlinear mapping relationships accurately (Jiang *et al.* 2008). In this study, a three-layered BPNN model is used. The input variables of the chaotic BPNN model are shown in Table 1. Three nodes of daily, two nodes of weekly, and two nodes of monthly groundwater levels are defined as input layers, and one node of all of the groundwater levels is defined as the output layer. The number of hidden layer nodes is determined using the trial-and-error method (Basheer & Hajmeer 2000; Yang *et al.* 2009). The final selected numbers of hidden layer nodes for the daily, weekly and monthly groundwater levels are 7, 5 and 5, respectively.

The built-in BPNN model in MATLAB R2015b is used. The initial weights and thresholds for all connection links are set randomly within the range from 0 to 1. The transfer functions of the hidden layer and the output layer are the *tansig* and *purelin* functions, respectively. The weight values of connection links are trained by the gradient descent algorithm. The maximum iteration number is set to 1,000. The learning rate is set to 0.01. The training error is 0.001. The predictive results of the chaotic BPNN model are shown later in Figure 14.

### Comparison of the three models

#### Training accuracies of the three models

The training accuracies of the three models for the daily, weekly and monthly groundwater levels are shown in Table 5. It can be seen from Table 5 that the daily, weekly and monthly groundwater levels are trained well by the three models. It can also be seen from Table 6 that the testing accuracies of the three models are reasonably good. Hence, there are no overtraining signs in the three models.

Groundwater levels | Chaotic PSO-SVM RMSE (m) | Chaotic PSO-SVM R^{2} | Chaotic PSO-SVM NSE | Chaotic BPNN RMSE (m) | Chaotic BPNN R^{2} | Chaotic BPNN NSE | Linear PSO-SVM RMSE (m) | Linear PSO-SVM R^{2} | Linear PSO-SVM NSE |
---|---|---|---|---|---|---|---|---|---|
Daily | 0.054 | 0.948 | 0.952 | 0.034 | 0.958 | 0.964 | 0.057 | 0.942 | 0.943 |
Weekly | 0.097 | 0.904 | 0.884 | 0.072 | 0.923 | 0.921 | 0.106 | 0.896 | 0.878 |
Monthly | 0.142 | 0.923 | 0.875 | 0.103 | 0.941 | 0.906 | 0.151 | 0.914 | 0.869 |


Groundwater levels | Chaotic PSO-SVM RMSE (m) | Chaotic PSO-SVM R^{2} | Chaotic PSO-SVM NSE | Chaotic BPNN RMSE (m) | Chaotic BPNN R^{2} | Chaotic BPNN NSE | Linear PSO-SVM RMSE (m) | Linear PSO-SVM R^{2} | Linear PSO-SVM NSE |
---|---|---|---|---|---|---|---|---|---|
Daily | 0.075 | 0.917 | 0.938 | 0.098 | 0.906 | 0.912 | 0.105 | 0.895 | 0.904 |
Weekly | 0.127 | 0.893 | 0.867 | 0.148 | 0.863 | 0.836 | 0.157 | 0.852 | 0.825 |
Monthly | 0.189 | 0.896 | 0.849 | 0.223 | 0.857 | 0.826 | 0.207 | 0.831 | 0.817 |


#### Comparison of the prediction results

The final prediction results of the one-day, one-week and one-month ahead groundwater levels are compared in Table 6 and Figure 14. It can be seen from Figure 14 that, although some prediction values deviate from the monitoring data, the chaotic PSO-SVM model predicts the groundwater level well, especially for the daily and weekly groundwater levels. However, as highlighted by the green ellipse in Figure 14, the fluctuations of the daily, weekly and monthly groundwater levels are not well predicted by the chaotic BPNN and linear PSO-SVM models.

The prediction performances of the three models are also compared in Table 6, with the three indices *RMSE*, *R*^{2} and *NSE*. It can be seen from Table 6 that the daily data has the highest prediction accuracy, and the monthly data has the lowest prediction accuracy. It can also be seen from Table 6 that the *NSE* metric indicates that the chaotic PSO-SVM model is superior to the chaotic BPNN model and the linear PSO-SVM model for the test data. The other two indices also show that the performance of the chaotic PSO-SVM model is better than that of the chaotic BPNN model and the linear PSO-SVM model for the test data of all daily, weekly and monthly groundwater levels.

#### Discussion

The forecast results indicate that the chaotic PSO-SVM model is more accurate and credible than the linear PSO-SVM model, and the PSO-SVM model is more appropriate for finite nonlinear data than the BPNN model. The comparisons indicate that the PSR method of chaos theory provides the input and output variables more appropriately than the linear autoregressive model. It can be seen that there are similarities between the PSR method and SVM model. The PSR method can determine the number of input variables by extending one-dimensional groundwater level to high dimensions referred to as ‘embedding dimensions’ (Sivakumar *et al.* 2002). Similarly, the SVM method provides a nonlinear kernel function to map the input variables into high dimensional feature space. Both methods map the groundwater level from the low-dimensional space to the high-dimensional space (Wang & Shi 2013). Hence, the PSR method is suitable to determine the input and output variables for the SVM model. The nonlinear dynamical characteristics of groundwater level can be reflected in the embedding space.

However, all three models have relatively low prediction precision for the extreme monitoring values. One important reason is that the groundwater level time series is not long enough, so the reconstructed strange attractors are not fully unfolded to reflect the original strange attractors. One year is a very short time period for data-based model training. Another reason is that only groundwater level is used as input. For future studies, a nonlinear model based on multivariable chaos theory (Garcia & Almeida 2005) can be used, which can consider some other input variables such as rainfall, reservoir water level and temperature. Furthermore, the prediction windows of this study are one day, one week and one month; the prediction window can be enlarged by feeding the predicted values back into the model, as previously done by Trichakis *et al.* (2009). In addition, all the predictions are conducted on a server with an Intel Xeon X5675 CPU @ 3.07 GHz and 256 GB RAM. The computational time of the chaotic PSO-SVM, chaotic BPNN and linear PSO-SVM models for the daily groundwater level prediction is 184.5 s, 24.6 s and 179.3 s, respectively.
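
The idea of enlarging the prediction window by feeding predictions back into the model can be sketched as follows. This is a hypothetical illustration, assuming *τ* = 1 and a one-step-ahead predictor that takes the last *m* values as input; `iterated_forecast` and the stand-in predictor `linear_extrapolation` are names invented for the example and do not represent the trained models of this study.

```python
def iterated_forecast(predict_one, history, m, steps):
    """Forecast `steps` values ahead by feeding each prediction back as input."""
    window = list(history[-m:])          # the most recent m values (tau = 1 assumed)
    forecasts = []
    for _ in range(steps):
        nxt = predict_one(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]      # slide the window over the new prediction
    return forecasts


def linear_extrapolation(window):
    """Stand-in one-step predictor: continue the last observed increment."""
    return 2.0 * window[-1] - window[-2]


forecasts = iterated_forecast(linear_extrapolation, [1.0, 2.0, 3.0], m=2, steps=3)
```

Because each step reuses earlier predictions as inputs, prediction errors accumulate with the horizon, so accuracy would be expected to degrade as the window is enlarged.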

## CONCLUSION

Based on chaos theory, this study proposes the chaotic PSO-SVM model for groundwater level predictions. Two criteria are proposed to ensure that there is evidence of chaos in the groundwater levels. The first one is that the LLE should be greater than zero. The second criterion is that the values of correlation dimension should converge to a constant when the embedding dimension increases.

The proposed model is used to predict the daily groundwater levels in Huayuan landslide, and the weekly and monthly groundwater levels in Baijiabao landslide. Both chaotic BPNN and linear PSO-SVM models are used for comparison. The results show that the chaotic PSO-SVM model provides the best predictions among the three methods for the test data considered. However, the three models still have some limitations. For the chaotic PSO-SVM model, only the groundwater level is used as an input variable, without considering external factors such as rainfall and reservoir water level. Further research is needed to develop models based on multivariable chaos theory. In addition, a relatively long groundwater level time series is needed to train and test the chaotic PSO-SVM model well. For the chaotic BPNN model, the prediction accuracy is not ideal although the prediction efficiency is high. For the linear PSO-SVM model, the input variables are selected based on a simple linear correlation coefficient, which sometimes cannot properly identify the input variables.

In summary, the chaotic PSO-SVM model has advantages in determining the input-output variables through a nonlinear method, and in obtaining more accurate predictive values of groundwater levels than the linear PSO-SVM and chaotic BPNN models. The proposed chaotic PSO-SVM model can be used to predict real-world groundwater levels.

## ACKNOWLEDGEMENTS

The first author is currently supported by the China Scholarship Council to visit the ARC Centre of Excellence for Geotechnical Science and Engineering, University of Newcastle, NSW, Australia. This work was also supported by the National Natural Science Foundation of China (Project No. 51679117, 51509125).