Abstract
Dynamic monitoring data of groundwater level are an important basis for understanding the current state of groundwater development and for planning its sustainable exploitation and use. These data are typical spatio-temporal sequence data, characterized by non-linearity and strong spatio-temporal correlation. The dynamic trend of groundwater level is a key factor in the optimal allocation of groundwater resources. However, most existing groundwater level prediction models give insufficient consideration to temporal and spatial factors and their spatio-temporal correlation. Therefore, constructing a space–time prediction model of groundwater level that accounts for spatio-temporal factors, and thereby improving the accuracy of groundwater level prediction, is of considerable theoretical and practical importance for the sustainable use of groundwater resources. Based on an analysis of the spatio-temporal characteristics of groundwater level in the pore confined aquifer II of the Changwu area in the Yangtze River Delta region of China, the wavelet transform was used to remove noise from the original data, and the K-nearest neighbor (KNN) method was used to screen the spatial correlation of the monitoring wells in the study area. The screened wells were used to reconstruct the spatio-temporal dataset and the input of a long short-term memory (LSTM) network, yielding a spatio-temporal KNN-LSTM prediction model for groundwater level that considers spatio-temporal factors. The reliability and accuracy of the KNN-LSTM, LSTM, support vector regression (SVR), and autoregressive integrated moving average (ARIMA) models were evaluated by cross-validation. Results showed that, in terms of mean root mean squared error (RMSE), the prediction accuracy of KNN-LSTM is 20.68%, 46.54%, and 55.34% higher than that of the single prediction models LSTM, SVR, and ARIMA, respectively.
HIGHLIGHTS
A KNN-LSTM spatio-temporal prediction model for groundwater level is proposed.
It is vital to use wavelet transform to denoise the original data before prediction.
KNN-LSTM has better applicability and accuracy than traditional single stochastic models.
INTRODUCTION
The dynamic change trend of groundwater level, an important basis for groundwater resources management, is of considerable importance for the sustainable utilization planning of regional groundwater resources (Stasik et al. 2016; Ziolkowska & Reyes 2017). Owing to the combined effects of natural conditions and human activities, groundwater level dynamics often show a high degree of randomness and hysteresis over time. The dynamic trend of groundwater level can be predicted to a certain extent with an appropriate prediction model (Scibek & Allen 2006; Sun 2013). At present, two kinds of models, deterministic and stochastic, are mainly used to predict groundwater level. A deterministic model requires long sequences of monitoring data (Attard et al. 2016; Moghaddam et al. 2018). Moreover, this type of model has high requirements for data quantity and precision, and its heavy computational load complicates model identification and parameter calibration. A stochastic model describes hydrological processes with stochastic methods, such as regression analysis (Evangelides et al. 2017), time series analysis (Craciun et al. 2018), fuzzy recognition (Pathak & Hiratsuka 2011), gray models (Wu et al. 2001; Kang & Maeng 2016), and wavelet neural network analysis (Wang et al. 2020). Each of these prediction models has its advantages. However, a single stochastic model still has inherent limitations in simulating the complex dynamic characteristics of groundwater level.
Therefore, in the past two decades, researchers have explored alternative methods such as data-driven (statistical and machine learning (ML)) techniques. Compared with traditional statistical analysis models, ML methods support effective real-time prediction and can handle multiple independent and dependent variables (Yoon et al. 2016). Moreover, ML methods avoid modeling complex physical hydrological processes and rely only on the statistical relationship between explanatory and response variables (Knoll et al. 2019). ML methods have been successfully applied to groundwater level prediction.
Some studies (Park et al. 2016) have also shown that groundwater level prediction models combining different ML methods are more accurate than single ML methods, because the different methods effectively extract specific patterns in the data (such as trends, cycles, and level changes). For example, the wavelet transform (WT) (Wang et al. 2020) is a time-frequency localization method that can extract time-varying (trend and periodicity) or multi-scale behaviors from a time series. WT is often combined with ML to predict groundwater level, and such combinations considerably improve prediction accuracy compared with ML alone (Bhardwaj et al. 2020; Rahman et al. 2020).
Multivariate linear regression (Mogaji et al. 2015), artificial neural networks (Trichakis et al. 2011), extreme learning machines (Liu et al. 2017), fuzzy logic (Theodoridou et al. 2017), adaptive neuro-fuzzy inference systems (Jebastina & Arulraj 2018), support vector regression (SVR) (Zhou et al. 2017; Yoon et al. 2019), genetic programming (Gorgij et al. 2017), gradient-based machines, and statistical methods (such as the ARIMA model) (Kumar & Rathnam 2019) are widely used in groundwater level prediction. In recent years, with the rapid development of deep learning, deep learning models have gradually been applied to time series data. A recurrent neural network (Bowes et al. 2019; Jeong et al. 2020) introduces temporal order into the neural network, which suits the model to time series analysis. The long short-term memory (LSTM) neural network model (Chung 2020; Supreetha et al. 2020) has good adaptive and self-learning capabilities. LSTM performs well in mining the regular patterns of time series but does not account for the spatial correlation of spatio-temporal sequences. The K-nearest neighbor (KNN) algorithm is a distance-based weighted classification algorithm (Sakizadeh & Mirzaei 2016; Motevalli et al. 2019). KNN can be used to screen the monitoring wells spatially correlated with a target monitoring well and thereby reconstruct the space–time data set used as input to the LSTM neural network. In addition, given the actual conditions of the study area, the wavelet transform is used to remove the influence of noise in the data (Rahman et al. 2020; Wang et al. 2020).
Thus, introducing appropriate deep learning algorithms and developing high-precision stochastic prediction models are important directions for future research (Park et al. 2016; Yoon et al. 2016). Traditional methods for predicting groundwater level regimes rely only on time series data; they are relatively simple and seldom consider spatial factors (Sena & Nagwani 2016; Zhang et al. 2017). Prediction of a groundwater level regime should consider both the time and space dimensions, because the regime at a target monitoring well is affected not only by that well's historical data but also by the data of other monitoring wells related to it. It is therefore necessary to explore the patterns in groundwater level monitoring sequences, propose an effective spatio-temporal data mining method, and build a nonlinear, high-precision hybrid model for groundwater level prediction that considers spatio-temporal factors.
STUDY AREA AND DATA COLLECTION
Overview of study area
The study area is Changzhou City, Jiangsu Province, located in the hinterland of the Yangtze River Delta. The study area has flat terrain and lies on the northwest edge of the Taihu Plain. In the Taihu Lake area, the terrain is slightly higher in the north than in the south, with an elevation difference of approximately 1–2 m between north and south (Lu et al. 2019). The study area covers approximately 1,793 km² and includes four municipal districts, Xinbei, Tianning, Zhonglou, and Wujin, collectively called the ‘Changwu area.’ Its geographical coordinates are 31°20′–32°03′N and 119°40′–120°12′E. The geographical location is shown in Figure 1.
Map of the study area and its geographical location. (a) Jiangsu Province, China. (b) Changzhou City. (c) Spatial location of monitoring wells in the study area.
The pore groundwater in the Changwu area mainly occurs in Quaternary unconsolidated sediment layers. The pore phreatic aquifer and the pore confined aquifers I to III are distributed from top to bottom (Chen et al. 2018). The pore phreatic water and the pore confined aquifer I water are recharged mainly by atmospheric precipitation, surface water, and agricultural irrigation water, and are discharged mainly through exploitation and evaporation. The water of the pore confined aquifers II and III is mainly used for development and utilization, and the pore confined aquifer II is the main exploited layer of groundwater.
The pore confined aquifer II group belongs to the Middle Pleistocene aquifer group of the Quaternary, is distributed in the Qidong Formation, and is divided into the upper (II1) and lower (II2) aquifers. The II2 aquifer is the main aquifer in the area because of its large thickness and high water yield. From the mid-1980s to 2000, the pore confined aquifer II water was in a state of long-term over-exploitation (Chen et al. 2018). Its water level dropped markedly and formed a large-scale depression cone, which seriously unbalanced the groundwater seepage field and caused a series of environmental geological problems, such as land subsidence, ground fissures, and ground collapse. These problems seriously threaten the lives and property of residents, damage the ecological environment, and affect the sustainable development of the local economy (Lu et al. 2019). Therefore, since 2000, local governments at all levels have formulated a series of regulations restricting and prohibiting groundwater mining. These regulations have curbed the continuous decline of the groundwater level. The rate of land subsidence has since slowed, and the geological environment has shown a benign development trend.
Data collection
Groundwater level monitoring produces spatio-temporal sequence data (Khetwani & Singh 2018; Khorrami 2019). The spatio-temporal characteristics of the monitored values determine the structure and algorithm chosen for a spatio-temporal groundwater level prediction model. The pore confined aquifer II in the study area has 33 monitoring wells, and the groundwater level is monitored once a month, within the first 10 days of the month. Fifteen years of water level monitoring data, from 2004 to 2018, were selected for this study.
Outliers in groundwater level monitoring datasets are often caused by monitoring equipment errors and non-standard monitoring methods (Jeong et al. 2017; Peterson et al. 2018), and they often lie somewhere between fact and error: an outlier may be a genuine observed value, or it may be an error arising from irregular monitoring practice. Such errors are often harder to detect than missing data and are difficult to handle. In this study, the outliers in the water level time series of each monitoring point are analyzed with boxplots, as shown in Figure 2.
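The boxplot screening described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the conventional Tukey rule (whiskers at 1.5 times the interquartile range), since the paper does not state which whisker length was used, and the water level values in the usage line are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(series, whisker=1.5):
    """Return the indices of values lying outside the boxplot whiskers.

    Assumption: Tukey fences at Q1 - whisker*IQR and Q3 + whisker*IQR.
    """
    q1, _, q3 = quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(series) if v < lower or v > upper]


# Hypothetical monthly water levels for one well; index 5 is a suspicious drop
levels = [-12.1, -12.0, -11.8, -11.9, -12.2, -18.5, -11.7, -11.6]
print(boxplot_outliers(levels))  # [5]
```

A flagged value is only a candidate: as noted above, it still has to be checked against the monitoring records to decide whether it is a fact or an error.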
Figure 2 shows that some sequences of the spatio-temporal groundwater level dataset fluctuate considerably and contain slight outliers. However, because the time series vary widely, anomalies in individual periods cannot be ruled out. Verification of the original data indicated that the abnormal values are caused by errors in the actual monitoring records of the wells and by the influence of excessive exploitation at adjacent water intake wells. These anomalies reduce the accuracy of groundwater level regime prediction. Therefore, the original monitoring well data must be preprocessed (for example, denoised) before the groundwater level prediction model is constructed, in order to improve its accuracy.
In statistics, a spatially stationary process is one whose statistical properties do not change with location (Nakagawa et al. 2016). Studying the spatial stationarity of the spatio-temporal groundwater level sequence involves analyzing the spatial distribution trend of the monitoring data. A mathematical surface is fitted to the spatial data to reflect the characteristics of groundwater level across the spatial region (Fang et al. 2019). Ignoring local anomalies, this surface reveals the overall variation trend of groundwater level in the spatial dimension. Figure 3 shows the second-order spatial trend map of the average water level at each monitoring point in the study area.
Figure 3 shows that the groundwater level is generally high in the west and low in the east, and that along the north–south direction it first decreases and then increases. Therefore, the groundwater data of the study area are non-stationary in the spatial dimension.
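A second-order trend surface of this kind can be fitted by ordinary least squares on the well coordinates. The sketch below is an assumption about how such a surface is typically computed (the paper does not give its fitting procedure); it uses only the standard library and solves the 6×6 normal equations by Gaussian elimination. The function names are illustrative.

```python
def fit_quadratic_trend(xs, ys, zs):
    """Least-squares fit of z = b0 + b1*x + b2*y + b3*x^2 + b4*x*y + b5*y^2.

    xs, ys: well coordinates (ideally centred); zs: mean water level per well.
    Returns the six coefficients [b0, ..., b5].
    """
    rows = [[1.0, x, y, x * x, x * y, y * y] for x, y in zip(xs, ys)]
    n = 6
    # Augmented normal-equation matrix [A^T A | A^T z]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * z for r, z in zip(rows, zs))]
           for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b


def trend_at(b, x, y):
    """Evaluate the fitted second-order trend surface at (x, y)."""
    return b[0] + b[1] * x + b[2] * y + b[3] * x * x + b[4] * x * y + b[5] * y * y
```

Kilometer-grid coordinates can be large numbers, so subtracting their means before fitting keeps the normal equations well conditioned. Residuals z − trend_at(b, x, y) then isolate the local anomalies that the trend surface deliberately ignores.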
The temporal stationarity of the groundwater level monitoring sequence was tested by analyzing the change in groundwater level of each monitoring well over time. Figure 4 shows the water level changes of four selected representative monitoring wells over the past 10 years. The average water level of the monitoring wells in the study area has generally risen over the past 15 years; that is, the overall water level shows an upward trend. At the same time, the average water level varies with location and time, with small fluctuations within the same time interval. Therefore, the groundwater level monitoring sequence in the study area is non-stationary in time.
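The upward trend described above can be quantified with annual means and a simple least-squares slope. This is an illustrative check, not the procedure used in the paper (which inspects the time series directly): a clearly non-zero slope indicates that the mean level changes over time, one symptom of temporal non-stationarity. Formal stationarity tests (for example, the augmented Dickey–Fuller test) would be a more rigorous alternative. The series in the usage line is hypothetical.

```python
from statistics import mean


def annual_means(monthly):
    """Average a monthly series year by year (length must be a multiple of 12)."""
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly data")
    return [mean(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]


def ols_slope(values):
    """Ordinary least-squares slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    t_bar = (n - 1) / 2
    v_bar = mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in enumerate(values))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


# Usage: three hypothetical years of monthly levels, rising 0.2 per year
# with a +/-0.1 seasonal swing
monthly = [5.0 + 0.2 * (t // 12) + (0.1 if t % 12 < 6 else -0.1) for t in range(36)]
print([round(v, 2) for v in annual_means(monthly)])  # [5.0, 5.2, 5.4]
```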
Sequence diagram of the average groundwater level in the study area from 2014 to 2018.
CONSTRUCTION OF THE SPATIO-TEMPORAL DATA PREDICTION COUPLING MODEL
Groundwater level monitoring values are non-stationary and non-linear geographic space–time data (Khorrami 2019). The analysis of their spatio-temporal characteristics reveals the regularity and differences of the groundwater level distribution in time and space. The sequences show strong spatio-temporal correlation and are affected by noise. Thus, groundwater level prediction should consider not only the historical data of the target monitoring well and the impact of noise but also the influence of data from other monitoring wells associated with the target well. It is therefore necessary to construct a spatio-temporal hybrid model for groundwater level prediction that considers both temporal and spatial factors.
Brief introduction of spatio-temporal data prediction coupling model
The KNN-LSTM model proposed in this paper comprises WT for denoising, the KNN algorithm for spatial correlation screening, LSTM for time series prediction, and cross-validation to verify model accuracy. The main steps are as follows.
Step 1: Data preparation phase. The spatio-temporal groundwater level monitoring data were sorted and compiled into a spatio-temporal sequence data set, and outliers were eliminated. The denoised sequence matrix XR of groundwater level over D consecutive years was obtained by applying the WT to the original data set.
Step 2: Spatial correlation screening phase. Based on the spatio-temporal sequence data set, the distance-weighted KNN algorithm is used to screen the spatial correlation in the denoised sequence matrix XR. The K monitoring wells with the strongest spatial correlation with the target monitoring well are obtained, and the sequence matrix XCR of these screened monitoring wells over D consecutive years is constructed.
Step 3: Model training phase. The data of the K monitoring wells in XCR over the D consecutive years are divided into a training set and a test set for model training and validation, respectively. The data of the first D−1 years form the training set, which is input into the LSTM model to predict the groundwater level of the K monitoring wells in year D.
Step 4: Groundwater level prediction phase. The predicted groundwater levels of the K monitoring wells in the prediction year are fused by weighting, according to the weight of each monitoring well. This gives the predicted groundwater level of the target monitoring well in the prediction year, which is taken as the initial prediction result.
Step 5: Model parameter adjustment phase. Cross-validation is used to verify the prediction accuracy of the model, and the optimal KNN-LSTM model is constructed. The trained model then gives the final prediction of groundwater level for the coming year.
Wavelet threshold denoising method
In wavelet threshold denoising, the time-domain signal is decomposed by the wavelet transform into coefficients in different frequency bands, each carrying part of the signal's spectral energy (Bhardwaj et al. 2020). Setting a reasonable threshold on these coefficients separates useful high-frequency information from noise and suppresses the interference of high-frequency noise.
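The threshold-denoising idea can be illustrated with a minimal sketch. It deliberately swaps in simpler choices than the paper uses: the Haar wavelet instead of db4, and Donoho's universal threshold with soft thresholding instead of the paper's threshold coefficient P, whose exact formula is not given. Only the detail coefficients are thresholded; the coarsest approximation is kept intact. The synthetic series in the usage lines is hypothetical.

```python
import math
import random
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(len(signal) // 2)
    approx = [(signal[2 * i] + signal[2 * i + 1]) * s for i in pairs]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * s for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (values within t become 0)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def wavelet_denoise(signal, levels=3):
    """Multi-level Haar decomposition, soft-threshold the details, reconstruct."""
    n = len(signal)
    if n % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale (D1)
    # Noise level estimated from the finest details (median absolute deviation)
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    t = sigma * math.sqrt(2 * math.log(n))  # universal threshold
    for d in reversed(details):  # coarsest level first
        approx = haar_inverse_step(approx, soft_threshold(d, t))
    return approx


# Usage on a synthetic series: a slow rise plus Gaussian noise
rng = random.Random(0)
clean = [10 + 0.05 * t for t in range(64)]
noisy = [v + rng.gauss(0, 0.5) for v in clean]
denoised = wavelet_denoise(noisy, levels=3)
```

With PyWavelets installed, `pywt.wavedec(signal, 'db4', level=5)`, `pywt.threshold(coeffs, value, mode='soft')` and `pywt.waverec(coeffs, 'db4')` would express the paper's db4, five-level configuration; how the paper maps P = 0.5 onto a threshold value would still have to be assumed.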
KNN-LSTM based on distance weighting
The spatial correlation of the denoised water level sequences XR of the groundwater monitoring wells is screened. The distance-weighted KNN algorithm selects the K monitoring wells with the strongest spatial correlation with the target monitoring well. The groundwater level sequences of the K selected wells are passed to the LSTM prediction model as the input data set, spatio-temporal prediction is conducted, and the prediction error is calculated. By adjusting the K value iteratively, the result with the smallest error is taken as the final result of spatial correlation screening. The main steps of the spatial correlation screening and prediction algorithm are as follows:
- (1) The denoised data set XR is divided into sample and object sequences.
- (2) The Euclidean distance l between each monitoring well and the target monitoring well is calculated. For any monitoring well i and the target monitoring well o, the distance between the two wells is the square root of the sum of the squared differences of their kilometer-grid coordinates:
  $$l_{io} = \sqrt{(e_i - e_o)^2 + (n_i - n_o)^2}$$
  where $(e_i, n_i)$ and $(e_o, n_o)$ are the easting and northing kilometer-grid coordinates of monitoring wells i and o.
- (3) The monitoring wells are sorted in increasing order of the calculated Euclidean distance l, and the error is denoted by M. The initial setting is K = 1 and M = 1.
- (4) The K monitoring wells with the smallest distances are selected, and their denoised groundwater level sequences are used to construct the sequence matrix $X_{CR}$ of the screened monitoring wells over D consecutive years:
  $$X_{CR} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{bmatrix}, \qquad x_i = \left(x_{i,1}, x_{i,2}, \ldots, x_{i,D}\right)$$
  where $x_i$ is the denoised water level sequence of monitoring well i over the D consecutive years after spatial correlation screening, and $x_{i,d}$ is the denoised water level sequence of monitoring well i in year d.
- (5) The data of the K screened monitoring wells for the first D−1 years are used as the training set and input into the LSTM prediction model for learning and training, which outputs the predicted water levels of the K monitoring wells in year D. The predicted values of the individual wells are then fused according to their weights, giving the predicted groundwater level of the target monitoring well o in year D:
  $$\left(\hat{x}_{1,D}, \ldots, \hat{x}_{K,D}\right) = \Phi\left(x_{1,1:D-1}, \ldots, x_{K,1:D-1}\right), \qquad \hat{x}_{o,D} = \sum_{i=1}^{K} W_i\,\hat{x}_{i,D}, \qquad W_i = \frac{(K - r_i + 1)^2}{\sum_{j=1}^{K} j^2}$$
  where Φ is the prediction model (the LSTM model in this paper); $x_{i,1:D-1}$ denotes the denoised water levels of monitoring well i in years 1 to D−1; $\hat{x}_{i,D}$ is the predicted groundwater level of monitoring well i in year D; $W_i$ is the weighting coefficient of monitoring well i in the weighted fusion; $r_i$ is the rank of monitoring well i by distance to the target well, $r_i \in [1, K]$; and $\hat{x}_{o,D}$ is the predicted groundwater level of the target monitoring well in year D obtained by weighted fusion.
- (6) Set K = K + 1 and repeat steps (4) and (5) until K > M.
- (7) The prediction error of the target monitoring well is calculated for each K value, and the prediction result with the smallest error is selected as the prediction before model optimization. The corresponding K value defines the set of monitoring wells with the strongest spatial correlation, which is the final result of spatial screening.
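Steps (2) to (7) can be sketched end to end as below. Two points are assumptions rather than details given in the text: the rank-based weight W_i = (K − r_i + 1)² / Σ_{j=1}^{K} j² is inferred from the coefficients in Table 2, which it reproduces to three decimals apart from one rounding difference at K = 8; and the hypothetical `predict` function stands in for the LSTM model Φ, applied well by well for simplicity, so any forecaster with the same signature can be plugged in. `k_max` plays the role of the upper limit on K in step (6), and the three-well data set in the usage lines is made up.

```python
import math


def rank_weights(k):
    """Weights for ranks r = 1..k: W_r = (k - r + 1)^2 / sum_{j=1..k} j^2.

    Assumption: this form is inferred from Table 2 of the paper.
    """
    denom = sum(j * j for j in range(1, k + 1))
    return [(k - r + 1) ** 2 / denom for r in range(1, k + 1)]


def rank_by_distance(coords, target):
    """Well IDs sorted by Euclidean distance to the target well (target first)."""
    tx, ty = coords[target]
    return sorted(coords, key=lambda w: math.hypot(coords[w][0] - tx, coords[w][1] - ty))


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def screen_and_predict(coords, series, target, predict, k_max, horizon=12):
    """Try K = 1..k_max and keep the K whose fused forecast has the lowest RMSE.

    coords:  well ID -> (easting_km, northing_km)
    series:  well ID -> denoised monthly levels; the last `horizon` values
             (year D) are held out, the rest (years 1..D-1) are training data
    predict: stand-in for the LSTM model Phi, predict(history, horizon) -> list
    Returns (best_k, best_rmse, fused_forecast).
    """
    order = rank_by_distance(coords, target)
    observed = series[target][-horizon:]
    best = None
    for k in range(1, min(k_max, len(order)) + 1):
        wells, weights = order[:k], rank_weights(k)
        forecasts = [predict(series[w][:-horizon], horizon) for w in wells]
        fused = [sum(wt * f[t] for wt, f in zip(weights, forecasts))
                 for t in range(horizon)]
        err = rmse(fused, observed)
        if best is None or err < best[1]:
            best = (k, err, fused)
    return best


def repeat_last_year(history, horizon):
    """Hypothetical naive forecaster: repeat the last `horizon` observed values."""
    return history[-horizon:]


# Usage: three made-up wells, two years of monthly data each
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
series = {"A": [0.0] * 12 + [1.0] * 12,
          "B": [1.0] * 12 + [1.0] * 12,
          "C": [-5.0] * 12 + [0.0] * 12}
k, err, fused = screen_and_predict(coords, series, "A", repeat_last_year, k_max=3)
print(k, round(err, 3))  # 2 0.8
```

Replacing `repeat_last_year` with a trained LSTM (or any other model) leaves the screening loop unchanged, which is the point of separating Φ from the K search.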
MODEL EVALUATION AND COMPARATIVE TEST
Contrast model experimental design
To compare prediction results and further verify the accuracy of the proposed KNN-LSTM coupling model, the single prediction model LSTM (Supreetha et al. 2020), the commonly used nonlinear prediction model SVR (Yoon et al. 2019), and the linear time series model ARIMA (Kumar & Rathnam 2019) were selected as comparison models. The original data sequence is divided into a training set and a validation set: the former is input into each model to learn the data features, and the latter is used for model validation. The water level data of each monitoring well in the study area from 2004 to 2017 (168 months) were selected as the training set, and the water level data for 2018 (12 months) were selected as the validation set. The groundwater level in the study area was predicted, and the error values were calculated by cross-validation (Lacorte et al. 2002).
The input and output variables of each model must be defined before forecasting. The two training relationships used are shown in Figure 5.
Model training relationship diagram. (a) Training relationship f1 of multiperiod prediction at a single site. (b) Training relationship f2 for multiperiod prediction at multiple sites.
Relationships f1 and f2 denote the training relations for single-sequence and multi-sequence multi-step prediction, respectively. The KNN-LSTM model uses training relation f2 in this paper. Given the number of monitoring wells in the study area, the numbers of input and output nodes of the model are both set to 33, and the water levels for the 12 months of 2018 are predicted. The other comparison models use training relation f1. Experiments showed that prediction was best with an input lag of 3 in the f1 training relation; therefore, the numbers of input and output nodes are 3 and 1, respectively. Table 1 shows the configuration details of the comparative model experiments.
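Training relation f1 turns each single-site series into supervised samples of three lagged inputs and one output. The sketch below shows that windowing under the assumption that the 12 monthly values of 2018 are then produced recursively, feeding each one-step prediction back as an input; the paper does not say how its multi-period f1 forecasts were generated. `linear_step` is a hypothetical toy one-step model standing in for a fitted SVR or LSTM.

```python
def make_windows(series, lag=3):
    """Supervised pairs for relation f1: `lag` past values -> the next value."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


def recursive_forecast(model, history, lag=3, steps=12):
    """Multi-period forecast: each prediction is appended to the input window."""
    window = list(history[-lag:])
    forecasts = []
    for _ in range(steps):
        y = model(window)
        forecasts.append(y)
        window = window[1:] + [y]
    return forecasts


def linear_step(window):
    """Toy one-step model: extrapolate the last change linearly."""
    return window[-1] + (window[-1] - window[-2])


print(make_windows([1, 2, 3, 4, 5]))                        # [([1, 2, 3], 4), ([2, 3, 4], 5)]
print(recursive_forecast(linear_step, [1, 2, 3], steps=3))  # [4, 5, 6]
```

Relation f2 differs only in the shape of each sample: every time step contributes a vector of the 33 wells' levels, and the model outputs all 33 wells at once.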
Comparison of the configuration information of model experiments
Item | KNN-LSTM | LSTM | SVR | ARIMA |
---|---|---|---|---|
Training set | 2004.01–2017.12 | 2004.01–2017.12 | 2004.01–2017.12 | 2004.01–2017.12 |
Test set | 2018.01–2018.12 | 2018.01–2018.12 | 2018.01–2018.12 | 2018.01–2018.12 |
Input nodes | 33 | 3 | 3 | 3 |
Output nodes | 33 | 1 | 1 | 1 |
Training relationship | f2 | f1 | f1 | f1 |
Wavelet function | db4 | – | – | – |
Prediction site | Multiple sites | Single site | Single site | Single site |
Prediction periods (month) | 12 | 12 | 12 | 12 |
Analysis of experimental results
Taking the six monitoring wells X13 to X18 as examples, the different models were used to predict the groundwater level for the 12 months of 2018, and the evaluation indexes were obtained through cross-validation. The comprehensive evaluation results are shown in Figure 6.
The comparison in Figure 6 shows that the KNN-LSTM model has the smallest mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) in predicting the groundwater level of target monitoring wells X13 to X18. This indicates that the KNN-LSTM groundwater level prediction model performs better than the LSTM, SVR, and ARIMA models.
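For reference, the three error metrics follow their standard definitions; a minimal standard-library version is shown below. The observed and predicted values in the usage line are made up.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error, in the same units as the water level."""
    return math.sqrt(mse(obs, pred))


obs, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 6.0]
print(mae(obs, pred), mse(obs, pred))  # 1.0 1.5
```

MSE and RMSE weight large errors more heavily than MAE, which is why a single badly predicted month can separate models on RMSE while their MAE values stay close.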
ARIMA performed the worst, and KNN-LSTM performed the best overall. The error metrics can likewise be calculated for all 33 monitoring wells in the study area; RMSE is the most commonly used of them (Mentaschi et al. 2013). The mean RMSE obtained from the cross-validation experiments allows the prediction accuracy of the different models to be compared directly. The mean RMSEs of the KNN-LSTM, LSTM, SVR, and ARIMA models are 0.418, 0.527, 0.782, and 0.936, respectively. The KNN-LSTM model proposed in this paper thus has the smallest mean RMSE of the four models, and its prediction accuracy is 20.68%, 46.54%, and 55.34% higher than that of the three other models, which further verifies the reliability of the model.
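The percentage improvements correspond to the relative reduction in mean RMSE, (RMSE_baseline − RMSE_KNN-LSTM) / RMSE_baseline. Recomputing them from the rounded RMSEs reported above reproduces 20.68% and 55.34%, and gives about 46.55% for SVR, which agrees with the reported 46.54% to within the rounding of the RMSE values.

```python
def rmse_improvement(rmse_model, rmse_baseline):
    """Percentage reduction in RMSE of a model relative to a baseline."""
    return 100 * (rmse_baseline - rmse_model) / rmse_baseline


mean_rmse = {"KNN-LSTM": 0.418, "LSTM": 0.527, "SVR": 0.782, "ARIMA": 0.936}
for name in ("LSTM", "SVR", "ARIMA"):
    gain = rmse_improvement(mean_rmse["KNN-LSTM"], mean_rmse[name])
    print(f"{name}: {gain:.2f}%")
# Prints LSTM: 20.68%, SVR: 46.55%, ARIMA: 55.34% on separate lines
```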
Figure 7 compares the predicted and true groundwater levels of monitoring wells X13 to X18 for the KNN-LSTM model and the three comparison models (LSTM, SVR, and ARIMA).
The curves in Figure 7 show that the typical linear model ARIMA has the lowest prediction accuracy and fails to predict the nonlinear, non-stationary groundwater level monitoring data accurately. The SVR model improves on ARIMA to some extent, and the LSTM model improves markedly further. The KNN-LSTM model performs best: its curve is close to the true values, each error index is the smallest, and its prediction accuracy is high. These results also indicate that denoising the original data is necessary.
MODEL APPLICATION
Data noise reduction processing
To prevent noise from affecting model prediction, the groundwater level monitoring data should be denoised with the wavelet threshold denoising method. The main factors that determine the effect of wavelet denoising are the wavelet function (Behnia & Rezaeian 2015; Yu & Lin 2015), the threshold coefficient, and the decomposition level. Wavelet coefficients are used to evaluate the choice of threshold coefficient, and decomposition levels that are too large or too small hinder signal recovery. The parameters of the KNN-LSTM wavelet denoising of groundwater level data are set as follows: wavelet function db4, threshold coefficient P = 0.5, and decomposition level 5.
Figure 8 takes monitoring well X09 as an example and decomposes the original data into a five-layer structure, where D1–D5 are the noise (detail) signals and A1–A5 are the real (approximation) signals after noise reduction. A reasonable threshold is set to eliminate the noise in the high-frequency components, and the signal is then reconstructed from the wavelet coefficients. Figure 9 shows the noise signal in the raw data. Comparing the noise signal with the actual monitoring records shows that large noise values occur when the monitoring well behaves abnormally (for example, during instrument failure). Wavelet threshold denoising removes these large noise fluctuations and yields the real signal after noise reduction, whose overall trend is stable.
Wavelet decomposition five-layer structure diagram of monitoring well X09.
Original, noise, and real signals after noise reduction of monitoring well X09.
Screening and prediction of KNN-LSTM spatial correlation based on distance weighting
The distance-weighted KNN algorithm is used to screen the existing monitoring wells in the study area for spatial correlation. The wells with the highest spatial correlation with the target monitoring well are obtained, and the space–time correlation matrix is constructed to predict the groundwater level. According to the screening principle of the KNN algorithm, the K value determines the number of wells treated as spatially related to the target well, which directly affects the prediction accuracy of the model (Motevalli et al. 2019).
Monitoring wells X13 and X16 in the study area are taken as examples to demonstrate the spatial correlation screening and water level prediction of KNN-LSTM. Table 2 lists, for different values of K, the monitoring wells most relevant to target wells X13 and X16, together with the weighting coefficients Wi used in the weighted fusion of their predicted values. As K increases, each additional well screened by the KNN algorithm is less spatially correlated with the target monitoring well, and it therefore receives a smaller weight.
Monitoring well numbers and their weighting coefficients selected for different K values
K value | Screened wells, target X13 | Screened wells, target X16 | Weighting coefficients Wi |
---|---|---|---|
K = 1 | {X13} | {X16} | {1} |
K = 2 | {X13, X19} | {X16, X24} | {0.800, 0.200} |
K = 3 | {X13, X19, X02} | {X16, X24, X21} | {0.643, 0.286, 0.071} |
K = 4 | {X13, X19, X02, X18} | {X16, X24, X21, X32} | {0.533, 0.300, 0.133, 0.033} |
K = 5 | {X13, X19, X02, X18, X33} | {X16, X24, X21, X32, X29} | {0.455, 0.291, 0.164, 0.073, 0.018} |
K = 6 | {X13, X19, X02, X18, X33, X14} | {X16, X24, X21, X32, X29, X04} | {0.396, 0.275, 0.176, 0.099, 0.044, 0.011} |
K = 7 | {X13, X19, X02, X18, X33, X14, X22} | {X16, X24, X21, X32, X29, X04, X28} | {0.350, 0.257, 0.179, 0.114, 0.064, 0.029, 0.007} |
K = 8 | {X13, X19, X02, X18, X33, X14, X22, X03} | {X16, X24, X21, X32, X29, X04, X28, X20} | {0.314, 0.240, 0.177, 0.123, 0.078, 0.044, 0.020, 0.005} |
K = 9 | {X13, X19, X02, X18, X33, X14, X22, X03, X04} | {X16, X24, X21, X32, X29, X04, X28, X20, X25} | {0.284, 0.225, 0.172, 0.126, 0.088, 0.056, 0.032, 0.014, 0.004} |
K = 10 | {X13, X19, X02, X18, X33, X14, X22, X03, X04, X12} | {X16, X24, X21, X32, X29, X04, X28, X20, X25, X18} | {0.260, 0.210, 0.166, 0.127, 0.094, 0.065, 0.042, 0.023, 0.010, 0.003} |
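Because Table 2 gives one set of coefficients per K for both target wells, the coefficients appear to depend only on each well's rank. They are reproduced to within about 0.001 by squared-rank weights, W_i = (K − i + 1)² / Σ_{j=1}^{K} j², with i = 1 for the target well itself (for K = 3: 9/14, 4/14, 1/14). The paper does not state this formula, so the sketch below treats it as an assumption. It also assumes, following the distance-weighted KNN described in the conclusion, that wells are ranked by Euclidean distance from planar coordinates, and that the fused prediction is the W_i-weighted sum of the per-well predicted series. The function names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_weights(k: int) -> List[float]:
    """Assumed squared-rank weights: W_i = (k - i + 1)^2 / sum_{j=1..k} j^2.

    Index 0 is the target well itself; weights decrease with rank and sum to 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    total = sum(j * j for j in range(1, k + 1))
    return [(k - i) ** 2 / total for i in range(k)]


def screen_neighbours(target: str, coords: Dict[str, Tuple[float, float]], k: int) -> List[str]:
    """Return the target well followed by its k - 1 nearest wells.

    Assumption: ranking by planar Euclidean distance stands in for the
    spatial-correlation screening.
    """
    tx, ty = coords[target]
    others = sorted(
        (w for w in coords if w != target),
        key=lambda w: math.hypot(coords[w][0] - tx, coords[w][1] - ty),
    )
    return [target] + others[: k - 1]


def fuse_predictions(preds: Dict[str, Sequence[float]], wells: List[str]) -> List[float]:
    """Weighted fusion of per-well predicted series using rank_weights(len(wells))."""
    weights = rank_weights(len(wells))
    n_steps = len(preds[wells[0]])
    return [sum(w * preds[well][t] for w, well in zip(weights, wells)) for t in range(n_steps)]


# Hypothetical coordinates (not the study-area layout).
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 3.0), "D": (5.0, 5.0)}
wells = screen_neighbours("A", coords, 3)  # ['A', 'B', 'C']
print([round(w, 3) for w in rank_weights(3)])  # [0.643, 0.286, 0.071]
```

Since the weights sum to one, fusing identical series returns that series unchanged; a distance-based weighting (e.g. inverse distance) could be substituted in `rank_weights` if the coefficients are meant to depend on the actual well spacing.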
The LSTM model is then applied to the water level data of the selected monitoring wells. Figure 10 shows the root mean square error (RMSE) of the groundwater level predictions for the target wells X13 and X16 under different K values.
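Choosing K from Figure 10 amounts to computing the RMSE between the observed and predicted series for each candidate K and keeping the K with the smallest error. A minimal sketch, assuming the predicted series for every K are already available; the function names are hypothetical and the numbers are placeholders, not data from the study area.

```python
import math
from typing import Dict, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long series."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def select_k(observed: Sequence[float], predictions_by_k: Dict[int, Sequence[float]]) -> int:
    """Return the K whose predicted series has the smallest RMSE."""
    scores = {k: rmse(observed, pred) for k, pred in predictions_by_k.items()}
    return min(scores, key=scores.__getitem__)


# Placeholder values for illustration only (not measurements from the study area).
observed = [-40.0, -39.5, -39.2]
predictions_by_k = {1: [-41.0, -40.0, -39.0], 2: [-40.1, -39.4, -39.3]}
print(select_k(observed, predictions_by_k))  # 2
```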
Figure 10 shows that the prediction error of monitoring well X13 is smallest at K = 10, whereas that of X16 is smallest at K = 7. Comparing the locations of the monitoring wells with the spatial correlation screening results shows that, at K = 10 for X13 and K = 7 for X16, most of the wells most strongly correlated with the target wells lie close to them, within the cone of depression formed by the water level decline; only a few are relatively far away.
The above analysis shows that the spatio-temporal prediction of groundwater level is markedly influenced by water level changes in nearby monitoring wells. The prediction is also affected, though only slightly, by correlated monitoring wells outside the radius of the cone of depression. Therefore, in addition to the historical time series, the spatial correlation of the monitoring wells in the study area must be screened when predicting groundwater levels.
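Taking both the historical sequence and the spatially correlated wells into account means that each LSTM input sample contains past water levels of the target well and of its screened neighbours (Table 2). The sketch below shows one way to assemble such samples; the look-back window, the one-step-ahead target, and the function name are assumptions, since this section does not give the LSTM input configuration, and the network itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def build_samples(
    series: Dict[str, Sequence[float]], wells: List[str], lookback: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (input window, next value) pairs from time-aligned well series.

    wells[0] is the target well; the others are its screened neighbours.
    X[s][t][j] is the level of wells[j] at time s + t (t < lookback), and
    y[s] is the target level at time s + lookback, giving the
    (samples, time steps, features) layout commonly fed to an LSTM.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    n = len(series[wells[0]])
    if any(len(series[w]) != n for w in wells):
        raise ValueError("all series must cover the same time steps")
    X: List[List[List[float]]] = []
    y: List[float] = []
    for s in range(n - lookback):
        X.append([[series[w][s + t] for w in wells] for t in range(lookback)])
        y.append(series[wells[0]][s + lookback])
    return X, y


# Toy series for illustration: target "T" and one neighbour "N".
X, y = build_samples({"T": [1.0, 2.0, 3.0, 4.0], "N": [10.0, 20.0, 30.0, 40.0]}, ["T", "N"], lookback=2)
```

Each feature column corresponds to one screened well, so increasing K widens the input rather than lengthening it.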
Prediction results of groundwater level and its space–time expression
Results of groundwater level prediction
On the basis of the water level predictions for all 33 monitoring wells, nine representative monitoring wells distributed across the cones of depression were selected. Figure 11 compares their observed water level sequences from January to December 2018 with the predicted sequences.
Figure 11 shows that the predicted water level sequence of each monitoring well follows an overall recovering trend with some random fluctuation, consistent with the actual water level changes in the study area, and that the predicted values are generally close to the observed values. Monitoring wells X07, X09, and X17, however, fluctuate more markedly than the others. Wells X07 and X17 lie along the Yangtze River in the northern part of the study area, where the water level of the pore confined aquifer II is influenced by the infiltration of atmospheric precipitation and by the dynamic variation of the Yangtze River water level, which produces strong seasonal fluctuation. Well X09 lies at the edge of a lake, but the lake is only approximately 2.0 m deep and its bottom forms a good impermeable layer; because no recharge from the lake to the pore confined aquifer II is observed, the aquifer is unaffected by changes in the lake level. Instead, the exploitation intensity of the pore confined aquifer II around well X09 varies considerably over time, which accounts for its evident water level fluctuation.
Figure 11 also shows visible differences between the observed and predicted sequences of wells X07, X09, and X17. These gaps appear magnified in the plots because the fluctuations are small compared with the water level values; the calculated prediction errors for these wells are small, and their predictions remain good. By contrast, the predicted sequences of wells X04 and X33 differ noticeably from the observed sequences, and their prediction accuracy is low. The original records show that a considerable amount of data is missing from these sequences, which reduces the accuracy of the predictions.
Spatial and temporal expressions of groundwater level data
To display the dynamic trend of groundwater level in the study area intuitively, the Kriging interpolation method (Kumar 2007) is used to draw a water level contour map for each month of 2018 from the KNN-LSTM predictions. In Figure 12, the lowest water levels are shown in red, the highest in green, and intermediate levels grade from red through yellow to green.
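Ordinary kriging estimates each grid node as a weighted sum of the predicted well levels, with weights derived from a variogram model and constrained to sum to one. The pure-Python sketch below assumes an exponential variogram with no nugget; the sill and range defaults are placeholders, because this section does not report the variogram used. With no nugget, the estimator reproduces the input value exactly at a well location.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def exp_variogram(h: float, sill: float, rng: float) -> float:
    """Assumed exponential variogram without nugget; rng is the practical range."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points: Sequence[Point], values: Sequence[float], target: Point,
                     sill: float = 1.0, rng: float = 10.0) -> Tuple[float, List[float]]:
    """Estimate the value at target; returns (estimate, kriging weights)."""
    n = len(points)

    def gamma(p: Point, q: Point) -> float:
        return exp_variogram(math.dist(p, q), sill, rng)

    # Variogram matrix bordered by the unbiasedness constraint (weights sum to 1).
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights
```

Evaluating `ordinary_kriging` at every node of a regular grid and contouring the results gives maps of the kind shown in Figure 12.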
Isogram of predicted water level in the study area from January to December 2018. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/ws.2021.140
Because consecutive maps are only one month apart, the overall change in groundwater level between them is small. Nevertheless, the red low-water-level area in the central part of the study area gradually turns yellow, indicating that the water level continues to rise. From the mid-1980s to 2000, groundwater in the Changwu area was overexploited for a long period, and the water level at the centre of the regional cone of depression fell below −80 m. Since 2004, the cone of depression has eased markedly year by year, especially in the central part of the study area, and by 2018 the overall groundwater level in the central Changwu area had risen gradually and steadily.
Effective protection of groundwater resources requires strict control of exploitation and accurate prediction of future groundwater level dynamics, on the basis of which a reasonable exploitation scheme can be formulated. Over-exploitation should be comprehensively prevented and curbed until the cone of depression in the study area disappears. In this way, the geological environment can develop favourably, and groundwater resources can be managed effectively and developed rationally.
CONCLUSION
Unlike traditional single stochastic prediction models, the KNN-LSTM prediction model proposed in this paper takes the spatio-temporal characteristics of the original dataset into account. Wavelet threshold denoising is used to remove the influence of noise in the data. The KNN algorithm screens the spatially correlated monitoring wells, and the spatio-temporal dataset is reconstructed accordingly for the LSTM model. The self-organizing and self-learning capabilities of deep learning are then used to mine the patterns of variation in the spatio-temporal groundwater level series and the correlations between series. In addition, geographical location data are incorporated into the calculation through geostatistical methods. Cross-validation shows that the KNN-LSTM prediction model has better applicability and accuracy than traditional single prediction models.
However, the mechanisms of groundwater formation, storage, and migration are complex. In this study, a distance-weighted KNN algorithm is used to screen spatially correlated wells, with the weighting coefficients determined from the distances between the target monitoring well and the other monitoring wells. In reality, aquifers may ‘thin out’ locally, which would affect the determination of the weighting coefficients; this situation was not considered here. In addition, the KNN-LSTM model was validated only on groundwater level series from the Changwu area, and its prediction accuracy will differ when it is applied to other regions. Future work will further explore the spatio-temporal prediction of groundwater level, with emphasis on the generality of the model.
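The wavelet basis, decomposition depth, and threshold rule of the wavelet threshold denoising step are not specified in this section. Purely as an illustration, the sketch below applies a single-level Haar transform and soft thresholding with the Donoho–Johnstone universal threshold σ√(2 ln n), estimating σ as the median absolute detail coefficient divided by 0.6745; every one of these choices is an assumption rather than the authors' setting, and the function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def haar_denoise(signal: Sequence[float]) -> List[float]:
    """Single-level Haar soft-threshold denoising (illustrative assumptions only)."""
    n = len(signal)
    if n < 2 or n % 2:
        raise ValueError("this sketch needs an even-length series of at least 2 samples")
    s = 1.0 / math.sqrt(2.0)
    # Forward Haar transform: pairwise approximation and detail coefficients.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    # Noise level from the median absolute detail; universal threshold.
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))
    # Soft thresholding shrinks every detail coefficient toward zero by thr.
    detail = [math.copysign(max(abs(d) - thr, 0.0), d) for d in detail]
    # Inverse Haar transform.
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out
```

A multi-level decomposition with a smoother wavelet would normally be preferred for monthly water level records; the single Haar level keeps the sketch short and dependency-free.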
ACKNOWLEDGEMENTS
We are grateful for the funding support from the National Natural Science Foundation of China (Grant No. 41571386), the Priority Academic Program Development of Jiangsu Higher Education Institutions (Grant No. 1612206002), and the Postgraduate research and innovation plan project in Jiangsu Province (Grant No. KYCX21_1334).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.