Improving the accuracy of groundwater depth prediction has significant implications for regional water resources management, ecological environment protection, and economic and social development. Employing empirical wavelet transform (EWT) for nonlinear processing, Savitzky–Golay (S–G) filtering to reduce high-frequency noise, a gated recurrent unit (GRU) neural network for linear feature signal processing, and a least squares support vector machine (LSSVM) for nonlinear signal handling, we established a comprehensive model combining EWT-S–G-GRU and LSSVM. The results demonstrate that the proposed model exhibits superior accuracy in groundwater depth prediction, with an average relative error of only 2.14% and a Nash–Sutcliffe efficiency (NSE) of 0.93. It outperforms the other four models, which have average relative errors of 12.88%, 11.90%, 7.07%, and 11.10%, and NSE values of 0.58, 0.56, 0.71, and 0.73, respectively. The superiority of the model established in this study is attributed to its effective handling of both nonlinear and linear features of groundwater, thereby enhancing predictive accuracy. The EWT-S–G-GRU and LSSVM model proves to be more reliable in revealing the spatial distribution and dynamic changes of future groundwater, providing a robust reference for the rational development and utilization of groundwater in the urban area of Xinxiang City.

  • Empirical wavelet transform decomposition followed by Savitzky–Golay filtering reduces noise and signal fluctuation.

  • The high-frequency and low-frequency components are separated.

  • The spatial distribution of groundwater in the future and its dynamic change with time are revealed.

  • Compared with other models, the advantages of this model are clarified.

  • The coupling of multiple models improves the prediction accuracy.

Groundwater depth is an important indicator of changes in groundwater resources. Due to factors such as exploitation, recharge and evaporation, the groundwater depth sequence is random, uncertain and non-stationary, which makes it difficult to predict groundwater depth accurately (Jeong & Kim 2009; Chen et al. 2018; He et al. 2021). When groundwater is overexploited, a groundwater funnel and land subsidence can form. When recharge exceeds extraction, the groundwater depth becomes shallower. Therefore, accurate prediction of changes in groundwater depth can provide a theoretical basis for groundwater protection, adjustment of planting structure and patterns, rational utilization of water and soil resources, and ecological environment protection (Wang et al. 2005; Chai et al. 2006; Tian 2020).

Research on groundwater depth prediction models, both domestic and international, is active and has achieved fruitful results. International research mainly focuses on neural networks and related techniques. Khan et al. (2023) conducted a comprehensive assessment of prevalent numerical, machine learning and deep learning models employed in forecasting groundwater levels. Their findings show the efficacy of machine learning and deep learning approaches in the modeling of groundwater depth. Takafuji et al. (2019) compared the performance of two methods, autoregressive integrated moving average (ARIMA) and sequential Gaussian simulation (SGS), in predicting groundwater depth. The results showed that the ARIMA model is more suitable for monitoring aquifers, as it achieved the same level of accuracy as SGS in a 2-month forecast. Gharehbaghi et al. (2022) used a gated recurrent unit (GRU) neural network to predict the regional average monthly time series of groundwater depth within a range of 4.82 m in the Urmia plain. Bowes et al. (2019) explored two machine learning models, long short-term memory (LSTM) and recurrent neural network, to simulate and predict the response of groundwater levels to storm events in Norfolk, Virginia, a coastal city prone to flooding. These results demonstrated the first use of LSTM networks to create hourly forecasts of groundwater levels in a coastal city and showed that they are well suited for creating real-time operational forecasts. Aghel et al. (2018, 2023) employed data-driven models, including adaptive neural fuzzy inference systems and hybrid structures such as particle swarm optimization–adaptive neural fuzzy inference systems, to accurately predict water quality parameters. Agoubi & Kharroubi (2019) evaluated and predicted short-term groundwater depths under different influencing factors using stochastic time series and artificial neural network (ANN).

Domestic scholars (e.g. Zhou et al. 2013) proposed a data-driven model for groundwater depth prediction using discrete wavelet transformation preprocessing and support vector machines (SVM) and compared it with conventional ANN, conventional SVM and weight-agnostic neural network models with wavelet preprocessing. Zhang et al. (2017a, 2017b) used the grey self-memory model, radial basis function network and adaptive neuro fuzzy inference system models to predict the buried depth of unconfined aquifers in Jilin City. Yang & Zhang (2022) devised a novel hybrid model, convolutional neural network-long short-term memory-meta-learning (CNN-LSTM-ML), that integrates CNN and LSTM network structures. The model leverages a meta-learning (ML) algorithm framework to guarantee network performance under sample conditions when forecasting the groundwater level for 66 observation wells in the middle and lower reaches of the Heihe River in arid zones.
Yin & Zhang (2021) used a multi-stage cumulative grey model to fit and predict different periodic indicators and long-term trends separately, and combined the prediction results of both to obtain the final prediction result. They applied this model to groundwater prediction in the Yinchuan Plain. Zhou et al. (2013) successfully applied the least squares support vector machine (LSSVM)-Markov chain combined model to annual runoff prediction. Zhang et al. (2010) constructed a runoff prediction model based on multi-factor quantification indicators using the least squares support vector method.

From a comprehensive analysis of domestic and foreign research, numerical prediction methods such as grey model and neural network are mainly used in the prediction of groundwater depth. However, it is rare to construct a prediction model that takes into account the characteristics of the groundwater depth sequence itself by reducing the non-stationarity of the sequence and separating high-frequency components from low-frequency components. This article applies the empirical wavelet transform (EWT) decomposition technique to preprocess the groundwater depth data in Xinxiang City's urban region. The technique decomposes the data into a series of components that contain single-frequency information, filters and denoises the high-frequency components, and separates them from the low-frequency components for forecasting purposes. The study establishes a prediction model using EWT-Savitzky–Golay (S–G)-GRU and LSSVM, and assesses the model error to confirm the efficacy of the approach.

The objective of this study is to establish a comprehensive groundwater depth prediction model by integrating EWT-S–G-GRU and LSSVM. The uniqueness of this model lies in its effective handling of both nonlinear and linear features of groundwater, leading to a significant improvement in predictive accuracy. The novelty of this research lies in the integration of various advanced techniques, providing a reliable prediction framework for groundwater resource management and rational utilization. This study contributes new insights to the relevant field of research and practice.

EWT

Gilles (2013) proposed a new method for constructing adaptive wavelets generated directly from the characteristics of the signal itself, called the EWT (Cohen 1955; Zheng et al. 2016). This method inherits the adaptive properties of the empirical mode decomposition (EMD) algorithm, rests on a richer mathematical theory and can solve the problem of mode mixing in the EMD method, extracting more stable components. The implementation process of this method is as follows:

  • (1)

    Perform a Fourier transform on the original signal to obtain the signal's frequency spectrum.

  • (2)

    Perform adaptive segmentation on the signal's frequency spectrum. Search for local minima in the spectrum to determine the boundaries for segmentation.

  • (3)

    Based on the segmentation boundaries determined from the frequency spectrum, the principles of the classical wavelet transform are used to construct an EWT filter bank, which decomposes the original signal into its individual frequency components.

This method can effectively separate the low-frequency and high-frequency components, with almost no mode mixing.
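The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the filter bank used in the study: the function names (`dft`, `idft`, `spectrum_boundaries`, `split_bands`) are hypothetical, the boundaries are placed at the lowest-magnitude bin between the largest spectral peaks, and step (3) is idealised with brick-wall band masks rather than the Meyer-type wavelet filters of a full EWT.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform; adequate for a short series."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectrum_boundaries(signal, n_bands):
    """Steps (1)-(2): keep the n_bands largest spectral peaks and place a
    boundary at the lowest-magnitude bin between each pair of neighbours."""
    half = len(signal) // 2
    mag = [abs(c) for c in dft(signal)[: half + 1]]
    peaks = [k for k in range(1, half) if mag[k - 1] < mag[k] >= mag[k + 1]]
    top = sorted(sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_bands])
    return [min(range(lo, hi + 1), key=lambda k: mag[k]) for lo, hi in zip(top, top[1:])]


def split_bands(signal, bounds):
    """Idealised step (3): brick-wall band masks (an assumption, not EWT filters)."""
    n = len(signal)
    spec = dft(signal)
    edges = [0] + [b + 1 for b in bounds] + [n // 2 + 1]
    components = []
    for lo, hi in zip(edges, edges[1:]):
        masked = [0j] * n
        for k in range(lo, hi):
            masked[k] = spec[k]
            if k:  # copy the mirrored negative-frequency bin as well
                masked[n - k] = spec[n - k]
        components.append(idft(masked))
    return components
```

Because the masks partition the spectrum, the components returned by `split_bands` sum back to the original series, which mirrors the reconstruction step used later when component forecasts are added together.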

S–G

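This study uses the S–G filtering method based on the filtering function smoothing method (Djurhuus et al. 2019). Assume that a set of data points x[n] is being processed, and a continuous set of 2M + 1 data points is selected with n = 0 as the center. The data are then fitted with a polynomial of order N:
$$p(n) = \sum_{k=0}^{N} a_k n^k \qquad (1)$$
Therefore, the sum of squared residuals of the least squares fit is:
$$\varepsilon_N = \sum_{n=-M}^{M} \left( p(n) - x[n] \right)^2 = \sum_{n=-M}^{M} \left( \sum_{k=0}^{N} a_k n^k - x[n] \right)^2 \qquad (2)$$
When $\varepsilon_N$ is minimal, the partial derivatives of $\varepsilon_N$ with respect to each coefficient $a_k$ are 0. At this point, an auxiliary matrix A with (2M + 1) rows and (N + 1) columns is introduced:
$$A = \{\alpha_{n,i}\}, \quad \alpha_{n,i} = n^i, \quad -M \le n \le M, \quad i = 0, 1, \ldots, N \qquad (3)$$
Introducing a further auxiliary matrix B:
$$B = A^{\mathrm{T}} A \qquad (4)$$
Set:
$$a = [a_0, a_1, \ldots, a_N]^{\mathrm{T}}, \quad x = [x[-M], \ldots, x[0], \ldots, x[M]]^{\mathrm{T}} \qquad (5)$$
Then:
$$B a = A^{\mathrm{T}} A a = A^{\mathrm{T}} x \qquad (6)$$
$$a = (A^{\mathrm{T}} A)^{-1} A^{\mathrm{T}} x \qquad (7)$$

From this, the values of the coefficients a0, a1, … , aN of the polynomial are obtained. After that, the window of width 2M + 1 is shifted along the series to obtain the filtered series y[n]. The above derivation shows that the performance of S–G filtering is mainly determined by two filtering parameters: the window half-width M (window width 2M + 1) and the fitting order N. Therefore, it is only necessary to make reasonable adjustments to these two filtering parameters to deal with different data situations.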

To summarize, smoothing filtering is a widely used preprocessing technique in spectral analysis. The S–G method of smoothing filtering can enhance the smoothness of the spectrum and mitigate noise interference. The impact of S–G smoothing filtering is reliant on the window width selected and can cater to diverse requirements in various settings.
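As a worked illustration of the least squares derivation, the sketch below computes the S–G smoothing weights from the normal equations and applies them to a series. It is a minimal example under assumptions: the helper names (`solve`, `savgol_weights`, `smooth`) are hypothetical, exact fractions are used so the weights can be checked by hand, and the first and last M samples are simply left unsmoothed, which is one of several possible edge treatments.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def savgol_weights(M, N):
    """Weights h[-M..M] such that the smoothed centre value is sum(h[n] * x[n])."""
    A = [[Fraction(n) ** i for i in range(N + 1)] for n in range(-M, M + 1)]
    B = [[sum(row[i] * row[j] for row in A) for j in range(N + 1)] for i in range(N + 1)]
    # The smoothed value is a_0, the first entry of (A^T A)^{-1} A^T x,
    # so only the first row of B^{-1} is needed (B is symmetric).
    first_row = solve(B, [Fraction(1)] + [Fraction(0)] * N)
    return [sum(c * v for c, v in zip(first_row, row)) for row in A]


def smooth(x, M, N):
    """Slide the 2M + 1 window over x; edge samples are left unchanged (assumption)."""
    h = [float(w) for w in savgol_weights(M, N)]
    y = list(x)
    for t in range(M, len(x) - M):
        y[t] = sum(w * x[t + n] for w, n in zip(h, range(-M, M + 1)))
    return y
```

For M = 2 and N = 2, `savgol_weights` returns the classic five-point quadratic weights (−3, 12, 17, 12, −3)/35. The same functions accept any window with 2M + 1 > N, for example a window width of 35 (M = 17) with fitting order 4.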

GRU

In contrast to LSTM, the GRU exhibits a simpler structure. LSTM comprises three gates – input gate, forget gate and output gate, whereas GRU minimizes and combines the gate structure to only include reset gate and update gate (Yi & Yan 2021). This approach accelerates network training without compromising precision. Figure 1 illustrates the GRU architecture.
Figure 1

GRU structure diagram.


The components of Figure 1 are labeled as follows: $x_t$ refers to the input at time t, $h_t$ signifies the output of the hidden layer cell at time t, C denotes the concatenation of two vectors, × denotes the element-wise product of its two inputs, and 1− denotes subtracting the module's input from 1. The update gate, $z_t$, regulates how much of the state information from the preceding moment is carried into the current state. A larger value implies greater integration of information from the prior state into the current state. The reset gate, $r_t$, manages the proportion of information from the previous state that acts on the current state's candidate hidden state, $\tilde{h}_t$. The hyperbolic tangent activation function, tanh, outputs values in (−1, 1), while the sigmoid activation function, σ, outputs values in (0, 1). After the information enters the GRU, processing consists of the following steps:

  • (1)

    Equation (8) concatenates the input data at time t and the output of the hidden layer at time t − 1 to derive the reset gate output signal, $r_t$.

  • (2)

    $z_t$ is the output signal of the update gate, obtained from Equation (9).

  • (3)

    $\tilde{h}_t$ is the candidate set of current state hidden units, obtained from Equation (10), which mainly integrates the input data $x_t$ and the hidden layer state from the previous time step t − 1 selected by the reset gate.

  • (4)

    The output of the hidden layer at time t, $h_t$, is then computed using Equation (11), which selects essential information from the candidate hidden layer at time t while discarding part of the previously transmitted hidden layer information, $h_{t-1}$, at time t − 1.

$z_t \odot h_{t-1}$ represents the selective 'forgetting' of the hidden state at time t − 1, and $(1 - z_t) \odot \tilde{h}_t$ represents the selective 'forgetting' of the information in the candidate set of hidden units $\tilde{h}_t$, retaining only the important information in $\tilde{h}_t$. In this way, the accumulation and memory of historical information are repeatedly implemented, which can effectively capture the long-term correlations that exist in economic forecasting or time series models:
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \qquad (8)$$
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \qquad (9)$$
$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t]\right) \qquad (10)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \qquad (11)$$
where $W_r$, $W_z$ and $W_h$ are the weight matrices of the reset gate, the update gate and the candidate state, respectively, and ⊙ denotes the element-wise product.
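A single GRU time step can be sketched directly from the four steps described above. The sketch is illustrative only and makes several assumptions: the names `gru_step`, `W_r`, `W_z` and `W_h` are hypothetical, bias terms are omitted, and the update gate follows the convention stated in the text, in which a larger update gate value carries more of the previous state into the current state.

```python
import math


def sigmoid(v):
    """Logistic function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, vec):
    """Matrix-vector product for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in W]


def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU update; each weight row has len(h_prev) + len(x_t) columns."""
    concat = h_prev + x_t                              # [h_{t-1}, x_t]
    r = [sigmoid(v) for v in affine(W_r, concat)]      # reset gate
    z = [sigmoid(v) for v in affine(W_z, concat)]      # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(v) for v in affine(W_h, gated)]  # candidate state
    # Larger z keeps more of the previous state (assumed convention).
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so the new hidden state is half the previous one. This makes the blending role of the update gate easy to check by hand.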

This study employed a gated recurrent unit model with two hidden layers, each followed by a Dropout layer to mitigate overfitting. The output layer utilized a sigmoid activation function, suitable for binary classification tasks. During the model compilation, the Adam optimizer was chosen, along with binary cross-entropy loss function, and accuracy as the evaluation metric.

LSSVM

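LSSVM is an extension of SVM, which not only greatly reduces the complexity of calculation, but also has strong anti-interference ability and is widely used in the field of machinery and equipment fault diagnosis. Set the training dataset as $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, l$, where $x_i \in \mathbb{R}^n$ is the n-dimensional training data input; $y_i$ is the output value of the model and l is the number of training samples. The equation of the LSSVM can be expressed as (Gestel et al. 2004):
$$y(x) = \omega^{\mathrm{T}} \theta(x) + b \qquad (12)$$
where θ denotes the feature mapping that converts the complex nonlinear relationship between the output y and the input x into a linear relationship between y and θ(x), b is the deviation (bias) value and ω is the weight vector. The corresponding optimization problem is:
$$\min_{\omega, b, e} J(\omega, e) = \frac{1}{2} \omega^{\mathrm{T}} \omega + \frac{\gamma}{2} \sum_{i=1}^{l} e_i^2 \qquad (13)$$
$$\text{s.t.} \quad y_i = \omega^{\mathrm{T}} \theta(x_i) + b + e_i, \quad i = 1, 2, \ldots, l \qquad (14)$$
where γ is the regularization parameter, $e_i$ is the error term, θ(·) is the nonlinear mapping function that maps from the original space to the high-dimensional feature space and the Lagrange function is defined as
$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{l} \alpha_i \left[ \omega^{\mathrm{T}} \theta(x_i) + b + e_i - y_i \right] \qquad (15)$$
where $\alpha_i$ are the Lagrange multipliers. By setting the partial derivatives of L in Equation (15) to zero and eliminating ω and e, the following matrix equation is obtained:
$$\begin{bmatrix} 0 & \mathbf{1}^{\mathrm{T}} \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (16)$$
where $K_{ij} = \theta(x_i)^{\mathrm{T}} \theta(x_j) = K(x_i, x_j)$ is the kernel matrix, $\mathbf{1}$ is a vector of ones and I is the identity matrix. The optimal classification function can be defined as follows:
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \right) \qquad (17)$$
where N is the number of support vectors, b is the classification threshold, and sgn(·) is the sign function.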
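To make the LSSVM training step concrete, the sketch below solves the dual linear system for the bias b and the multipliers α and then evaluates the fitted function. It is a hedged illustration, not the study's implementation: the names `lssvm_fit` and `lssvm_predict` are hypothetical, a Gaussian (RBF) kernel with width `sigma` is assumed, and because a continuous depth value is being forecast the sketch returns the real-valued output rather than applying the sign function used for classification.

```python
import math


def rbf(a, b, sigma):
    """Gaussian kernel (an assumed kernel choice)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting and back substitution."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    l = len(X)
    system = [[0.0] + [1.0] * l]
    for i in range(l):
        row = [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0) for j in range(l)]
        system.append([1.0] + row)
    solution = solve(system, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x, X, b, alpha, sigma):
    """Real-valued output b + sum(alpha_i * K(x, x_i)), without the sign function."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))
```

The first row of the system enforces that the multipliers sum to zero, and a larger `gamma` penalises the error terms more strongly, so the fitted curve follows the training points more closely.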

Model evaluation index

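To verify whether the EWT-S–G-GRU and LSSVM model fits better than the other models, this paper introduces two indicators, relative error (RE) and NSE, to quantitatively assess the accuracy of the model. The specific formulas are as follows:
$$\mathrm{RE} = \frac{\left| \hat{y} - y \right|}{y} \times 100\% \qquad (18)$$
where y is the true value and $\hat{y}$ is the forecast value.
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \qquad (19)$$
where $\hat{y}_i$ is the predicted value at time i; $y_i$ is the measured value at time i, $\bar{y}$ is the mean of the measured values and n is the number of samples.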
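The two indicators are straightforward to compute. The sketch below uses the standard definitions of relative error and Nash–Sutcliffe efficiency; the function names are hypothetical and the inputs are plain Python lists of measured and forecast depths.

```python
def relative_error(observed, forecast):
    """Relative error of a single forecast, in percent."""
    return abs(forecast - observed) / abs(observed) * 100.0


def nse(observed, forecast):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def average_relative_error(observed, forecast):
    """Mean of the per-sample relative errors, in percent."""
    errors = [relative_error(o, f) for o, f in zip(observed, forecast)]
    return sum(errors) / len(errors)
```

An NSE of 1 means the forecasts reproduce the measurements exactly, while an NSE of 0 means the model is no better than always predicting the mean of the measured series. Negative values indicate a worse fit than that baseline.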

Model coupling

This section describes the combined prediction model, EWT-S–G-GRU and LSSVM, established to improve the accuracy of groundwater depth prediction in the urban area of Xinxiang City. The model framework is shown in Figure 2. The specific process of the model is as follows:
  • (1)

    The monthly groundwater depth time series of each of the two professional observation wells in Xinxiang City contains 204 values. First, EWT is used to decompose each time series into high-frequency and low-frequency components, labeled Intrinsic Mode Function (IMF)1, IMF2, … , IMFn in the figure.

  • (2)

    The IMF components of the first 180 months of the two professional observation wells are used as the training set for GRU and LSSVM, and the IMF components of the last 24 months are used as the prediction set for GRU and LSSVM.

  • (3)

    For the high-frequency component, S–G filtering is used to smooth the signal, which is then passed to the LSSVM to obtain the corresponding predicted value, Y1 in the figure.

  • (4)

    The low-frequency components are input into GRU separately to obtain the corresponding predicted values, which are Y2, Y3, … ,Yn in Figure 2.

  • (5)

    The predicted results of the two parts are summed to reconstruct the predicted groundwater depth.
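The coupling logic in steps (1)–(5) can be outlined as a small routing function. This is a structural sketch only: `coupled_forecast` and its arguments are hypothetical names, the forecasting models are passed in as callables, and the `persistence` and `identity` functions are placeholders that stand in for the trained LSSVM, the trained GRU and the S–G filter so that the example runs on its own.

```python
def split_series(component, n_train=180):
    """First n_train months for training, the remainder for prediction."""
    return component[:n_train], component[n_train:]


def combine_forecasts(component_forecasts):
    """Step (5): sum the component forecasts month by month."""
    return [sum(values) for values in zip(*component_forecasts)]


def coupled_forecast(components, high_freq_model, low_freq_model, smoother, n_train=180):
    """Route the first (high-frequency) component through smoothing and one model,
    and every other (low-frequency) component through the second model."""
    forecasts = []
    for index, component in enumerate(components):
        if index == 0:
            component = smoother(component)   # step (3): smooth the first component
            model = high_freq_model           # LSSVM in the text
        else:
            model = low_freq_model            # GRU in the text, step (4)
        train, test = split_series(component, n_train)
        forecasts.append(model(train, len(test)))
    return combine_forecasts(forecasts)


def persistence(train, horizon):
    """Placeholder model (assumption): repeat the last training value."""
    return [train[-1]] * horizon


def identity(component):
    """Placeholder smoother (assumption): return the component unchanged."""
    return list(component)
```

For example, `coupled_forecast(components, persistence, persistence, identity)` on a list of 204-month components returns a 24-month series. Replacing the placeholders with fitted models leaves the routing and the final summation unchanged.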

Figure 2

Model flow chart.

Figure 3

Location map of observation wells.


Overview of the study area and data sources

The city of Xinxiang is located in the northern part of Henan province, with a total area of 8,629 km2 and a population of 5.65 million. It belongs to the warm temperate zone and spans the Haihe and Yellow River water systems, with 78% of its area being plains. Currently, the city's reservoir capacity is 199 million cubic meters, the average annual rainfall is 621 mm and the average water resources are 1.697 billion cubic meters per year (including 1.12 billion cubic meters of groundwater resources). Henan province is a region with a shortage of water resources, and the shortage is even more severe in the northern part of Henan where Xinxiang is located. The per capita water resources in the area are only 315 cubic meters, which is less than one-seventh of the national average and far below the internationally recognized standard of 1,000 cubic meters per capita. Due to uneven spatial and temporal distribution of rainfall, more than 70% of annual rainfall is concentrated in the flood season, and the supply-demand contradiction of water resources is particularly prominent in some local areas, especially in the Haihe River Basin. With the growth of population and economic development, the demand for water in the city is increasing, and if a drought or a severe drought occurs, the water shortage problem will become more serious. Even after the completion of the South-to-North Water Diversion Project, although it can alleviate the local water shortage situation, there is still a large gap to meet the long-term needs of water resources for economic and social sustainable development and environmental ecology. It can be said that the scarcity of water resources, low per capita water resource ownership, uneven spatial and temporal distribution of water resources, water pollution and deterioration of water ecological environment have become bottlenecks that restrict the sustainable development of Xinxiang's economy and society. 
Coupled with unreasonable development, blind exploitation, aggravation of water pollution, serious waste of water resources, the originally scarce water resources become even more scarce. Based on this, it is very important to reasonably predict the buried depth of groundwater, analyze the changes in the buried depth of groundwater, carry out scientific research on groundwater and establish a stable groundwater monitoring system to provide effective guarantees for the utilization and development of water resources. The two selected observation wells for groundwater depth are located in Wuling Village, Luwangfen Township, Fengquan District, Xinxiang City (Well No. 1) and southeast of Xiheidui Village, Muye Township, Muye District, Xinxiang City (Well No. 2). Both wells are located in the urban area of Xinxiang City. The location of the research area is shown in Figure 3.

The data in this paper are the monthly average groundwater depth monitoring data, from 2005 to 2021, of two professional observation wells located in Wuling Village, Luwangfen Township, Fengquan District, Xinxiang City and southeast of Xiheidui Village, Muye Township, Muye District, Xinxiang City. The two observation wells fully reflect the real dynamic changes of groundwater, and the monitoring data are representative and meet the technical requirements of groundwater monitoring specifications. The research data are shown in Figure 4.
Figure 4

Monthly average depth of water table data from 2005 to 2021.


From Figure 4, it can be seen that the groundwater depth is highly volatile, which indicates that the groundwater depth sequence combines nonlinear and linear characteristics and makes the groundwater depth difficult to predict. In this paper, based on the nonlinear and linear characteristics of the groundwater depth sequence, a hybrid model combining linear and nonlinear components is proposed to improve the prediction accuracy.

Figure 5 shows the box plot identification results of the monthly average groundwater depth data of the two observation wells for all years. From the figure, most of the groundwater depth data of observation Well No. 1 are above 8 m, but there are abnormal values of different degrees from August to October; most of the groundwater depth data of observation Well No. 2 are below 8 m and there are abnormal values in August.
Figure 5

Box plot of groundwater depth observation data of Well No. 1 and Well No. 2.

There are many reasons for the anomalous values of the two observation wells, such as the amount of precipitation in that year and the magnitude of domestic and industrial water demand. Although both observation wells are located in the urban area of Xinxiang City, the data from the two observation wells are very different because the degree of water utilization and groundwater extraction differs between the two areas. Therefore, it is very difficult to predict the groundwater depth accurately. It is also clear that the groundwater depth data of both observation wells show a trend of decreasing groundwater depth year by year, so accurate groundwater depth prediction is urgently needed. The decomposed monthly average groundwater depth data are normalized to reduce the volatility of the original groundwater depth data and improve the stability of the model prediction. The normalized calculation formula is as follows:
(20)
where is the original data for the training period, and is the normalized processing result.
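One common way to normalize training data is min–max scaling, sketched below. The exact form of Equation (20) is not recoverable from the surrounding text, so the choice of min–max scaling, the function names and the decision to fit the scaling on the training period only are all assumptions made for illustration.

```python
def fit_min_max(train):
    """Scaling parameters taken from the training period only (assumption)."""
    return min(train), max(train)


def normalize(values, lo, hi):
    """Map values so that the training range becomes [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(values, lo, hi):
    """Invert the scaling to return forecasts to metres."""
    return [v * (hi - lo) + lo for v in values]
```

Fitting the parameters on the training period and reusing them for the prediction period avoids leaking information from the months being forecast. The forecasts are mapped back to depths with `denormalize` before errors are computed.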

Data preprocessing

Following the steps of EWT decomposition above, EWT decomposition is performed on the monthly average groundwater data of two specialized observation wells from 2005 to 2021 and the decomposition results are shown in Figure 6.
Figure 6

EWT of groundwater depth sequence.

From Figure 6, the groundwater depth data of two observation wells have been divided into five components. The IMF1 component of the two observation wells has a higher frequency, shorter wavelength and weaker periodicity than the other components. The amplitude of other IMF components gradually decreases, the frequency gradually decreases and the wavelength gradually becomes longer, representing the components of the original groundwater depth at different time scales. The non-stationarity of the groundwater depth sequence is decreased after undergoing the EWT decomposition. Due to the nonlinearity and non-stationarity of the groundwater depth sequence, the waveform of the IMF1 component of the two wells fluctuates more violently. Therefore, the IMF1 component is further processed by S–G filtering to improve the accuracy of the model. The IMF1 component consists of 204 data points. First, a fixed coefficient S–G filter is applied to the IMF1 component. After experimentation, it is found that when the window width is 35 and the fitting order is 4, the filtering effect is the best. Therefore, these parameters are used to perform S–G filtering on the IMF1 component, resulting in the filtered signal of the IMF1 component shown in Figure 7.
Figure 7

S–G map.

The red and black colors in the figure represent the original component and the component after S–G filtering, respectively. After smoothing the high-frequency component IMF1 through S–G filtering, the fluctuation of IMF1 relative to the original data has partially decreased, and its non-stationarity has partially decreased as well. When the fitting order is 4, the smoothed signal shows the characteristics of the original signal more completely, and it also accurately captures some of the fine fluctuations, which can effectively preserve the signal's variation information. This demonstrates the effectiveness of S–G filtering.

Groundwater depth forecast

When using GRU and LSSVM to predict the two observation wells in the urban area of Xinxiang, the data must be divided into training and prediction sets. The groundwater depth data from 2005 to 2019 are selected as the training set, and the data from 2020 to 2021 are selected as the prediction set. After multiple calibrations, the optimal parameters for the GRU model are selected as follows: the learning rate is , the maximum number of iterations is 421, the gradient threshold is 1, the number of hidden nodes is 616 and the initial input and output are both set to 0. The penalty parameter and kernel parameter in the LSSVM prediction model have a significant impact on the prediction accuracy, and selecting appropriate parameters is crucial for the performance of the prediction model. In this paper, the extended memory particle swarm (Duan et al. 2011) is used to optimize the penalty parameter and the kernel parameter for LSSVM. The optimal penalty parameter is determined to be 25 and the optimal kernel parameter is 0.1.

The groundwater depths of the two observation wells are predicted according to the steps described above, and the prediction of each IMF component is shown in Figure 8.
Figure 8

Forecast map of each well component. (a) Well No. 1. and (b) Well No. 2.

From Figure 8, the groundwater depth time series of the two observation wells underwent EWT decomposition and the S–G filtering improved the stationarity significantly, resulting in a significant reduction in fluctuations. Although there are some individual deviations in the high-frequency component of the two wells under LSSVM prediction, this part of the high-frequency component accounted for a relatively small proportion in the entire groundwater depth time series and had a negligible impact on the overall prediction effect. The low-frequency component showed good prediction results through GRU prediction. The predicted results of IMF1–IMF5 are now reconstructed and compared with the original groundwater depth data. The results are shown in Figure 9, Tables 1 and 2.
Table 1

Comparison of the true and forecast values of Well No. 1, with relative errors

| Year | Month | Original data (m) | Forecast data (m) | Relative error (%) |
|---|---|---|---|---|
| 2020 | 1 | 8.51 | 8.76 | 2.83 |
| | 2 | 8.36 | 8.58 | 2.60 |
| | 3 | 8.84 | 8.79 | 0.56 |
| | 4 | 8.88 | 8.67 | 2.36 |
| | 5 | 8.81 | 8.88 | 0.81 |
| | 6 | 9.24 | 9.23 | 0.06 |
| | 7 | 8.25 | 8.35 | 1.19 |
| | 8 | 7.27 | 7.41 | 1.89 |
| | 9 | 7.53 | 7.47 | 0.84 |
| | 10 | 7.78 | 7.76 | 0.29 |
| | 11 | 8.19 | 8.33 | 1.66 |
| | 12 | 8.05 | 8.39 | 4.04 |
| 2021 | 1 | 6.82 | 7.18 | 4.95 |
| | 2 | 7.27 | 7.12 | 2.11 |
| | 3 | 6.75 | 6.79 | 0.69 |
| | 4 | 6.86 | 7.00 | 2.00 |
| | 5 | 7.69 | 7.28 | 5.51 |
| | 6 | 8.63 | 8.60 | 0.38 |
| | 7 | 6.68 | 6.63 | 0.73 |
| | 8 | 1.31 | 1.40 | 7.02 |
| | 9 | 1.23 | 1.25 | 1.31 |
| | 10 | 1.04 | 1.10 | 5.45 |
| | 11 | 2.07 | 2.10 | 1.39 |
| | 12 | 2.31 | 2.30 | 0.70 |

Maximum relative error: 7.02%. Minimum relative error: 0.06%. Average relative error: 2.14%.

Table 2

Comparison of the true and forecast values of Well No. 2, with relative errors

| Year | Month | Original data (m) | Forecast data (m) | Relative error (%) |
|---|---|---|---|---|
| 2020 | 1 | 6.53 | 6.29 | 3.80 |
| | 2 | 6.02 | 6.22 | 3.20 |
| | 3 | 5.84 | 5.90 | 1.01 |
| | 4 | 6.17 | 6.48 | 4.84 |
| | 5 | 6.28 | 6.30 | 0.37 |
| | 6 | 6.29 | 6.02 | 4.48 |
| | 7 | 5.97 | 5.61 | 6.46 |
| | 8 | 4.17 | 4.23 | 1.61 |
| | 9 | 4.41 | 4.29 | 2.77 |
| | 10 | 4.82 | 4.77 | 1.20 |
| | 11 | 4.93 | 5.00 | 1.27 |
| | 12 | 5.41 | 5.45 | 0.63 |
| 2021 | 1 | 5.28 | 5.36 | 1.54 |
| | 2 | 5.49 | 5.57 | 1.39 |
| | 3 | 5.01 | 5.09 | 1.52 |
| | 4 | 5.29 | 5.35 | 1.24 |
| | 5 | 5.59 | 5.68 | 1.53 |
| | 6 | 6.16 | 6.26 | 1.68 |
| | 7 | 5.46 | 5.35 | 2.11 |
| | 8 | 0.96 | 1.03 | 7.20 |
| | 9 | 0.56 | 0.55 | 0.96 |
| | 10 | 1.00 | 0.93 | 6.97 |
| | 11 | 1.27 | 1.28 | 0.76 |
| | 12 | 1.77 | 1.70 | 4.46 |

Maximum relative error: 7.20%. Minimum relative error: 0.37%. Average relative error: 2.63%.
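The summary statistics quoted under Tables 1 and 2 can be rechecked from the relative error columns. The sketch below copies those columns as listed in the tables and assumes the 24 rows run from January 2020 to December 2021 in order; the function name `summarize` and the 5% threshold argument are illustrative choices.

```python
# Relative errors (%) copied from the last column of Tables 1 and 2,
# assumed to be ordered from January 2020 to December 2021.
WELL_1 = [2.83, 2.60, 0.56, 2.36, 0.81, 0.06, 1.19, 1.89, 0.84, 0.29, 1.66, 4.04,
          4.95, 2.11, 0.69, 2.00, 5.51, 0.38, 0.73, 7.02, 1.31, 5.45, 1.39, 0.70]
WELL_2 = [3.80, 3.20, 1.01, 4.84, 0.37, 4.48, 6.46, 1.61, 2.77, 1.20, 1.27, 0.63,
          1.54, 1.39, 1.52, 1.24, 1.53, 1.68, 2.11, 7.20, 0.96, 6.97, 0.76, 4.46]


def summarize(errors, start_year=2020, threshold=5.0):
    """Return max, min and mean error plus the (year, month) pairs above threshold."""
    exceed = [
        (start_year + index // 12, index % 12 + 1)
        for index, value in enumerate(errors)
        if value > threshold
    ]
    return {
        "max": max(errors),
        "min": min(errors),
        "mean": sum(errors) / len(errors),
        "exceed": exceed,
    }


for name, errors in (("Well No. 1", WELL_1), ("Well No. 2", WELL_2)):
    stats = summarize(errors)
    print(name, stats["max"], stats["min"], round(stats["mean"], 3), stats["exceed"])
```

The column means are about 2.140% for Well No. 1 and 2.625% for Well No. 2, in line with the reported average relative errors of 2.14% and 2.63%. The months flagged above 5% are May, August and October 2021 for Well No. 1 and July 2020, August 2021 and October 2021 for Well No. 2.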

Figure 9

Comparison of model forecast results.


From the above figures and tables, the EWT-S–G-GRU and LSSVM model tracks the groundwater depth and its fluctuations well, with the predicted trend being basically consistent with the original data. For Well No. 1, the months with a relative error greater than 5% are May, August and October 2021, with a maximum relative error of 7.02%; for Well No. 2, the months with a relative error greater than 5% are July 2020, August 2021 and October 2021, with a maximum relative error of 7.20%. The average relative error (ARE) of each of the two observation wells is below 5%, at 2.14 and 2.63%, respectively. This indicates that this hybrid model combining linear and nonlinear models is feasible and verifies that the prediction model based on EWT-S–G-GRU and LSSVM constructed in this study has high accuracy and stationarity.

Model comparison

The EWT-S–G-GRU and LSSVM models demonstrated good prediction performance in the testing of the groundwater depth observation wells in the urban area of Xinxiang City. To demonstrate the superiority of combining linear and nonlinear neural network models, the LSSVM model, GRU model, EWT-GRU model, EWT-LSSVM model, and the prediction results of this paper were compared (using only Well No. 1 as an example).

The prediction results and relative errors of each model are shown in Figures 10 and 11 and Table 3.
Table 3

Comparison of the NSE of each model

| Model | NSE |
|---|---|
| LSSVM | 0.58 |
| GRU | 0.56 |
| EWT-GRU | 0.71 |
| EWT-LSSVM | 0.73 |
| EWT-S–G-GRU and LSSVM | 0.93 |
Figure 10

Comparison chart of model forecast results.

Figure 11

Comparison of relative errors of model forecast results.


According to Figures 10 and 11 and Table 3, the single models, namely LSSVM and GRU, performed unsatisfactorily in predicting the groundwater depth sequence. The NSE values for these models were 0.58 and 0.56, respectively, with mean relative errors of 12.88 and 11.90%, and the data were mostly overpredicted by 10–20% compared to actual values. This overprediction was particularly noticeable at peaks. On the other hand, the combined EWT-GRU and EWT-LSSVM models demonstrated better predictive performance, with NSE values of 0.71 and 0.73, respectively, and mean relative errors of 7.07 and 8.09%. However, due to the distinct characteristics of high- and low-frequency components, a single model struggled to provide accurate predictions, resulting in an overall large error and some deviation from actual values. The best prediction performance was observed for the EWT-S–G-GRU and LSSVM model, which decomposed the groundwater depth time series into multiple IMF components using EWT, filtered the high-frequency component and applied different models to the high- and low-frequency components separately. This approach effectively improved the prediction accuracy compared to the original models, while remaining consistent with the trend of the original sequence.

This study contributes to groundwater depth prediction both methodologically and empirically. The approach integrates EWT, S–G filtering, a GRU neural network, and LSSVM, so that both the nonlinear and linear features of the groundwater depth series are handled, which markedly improves prediction accuracy. Comparative analysis shows that it outperforms the four comparison models, with an average relative error (ARE) of only 2.14% and an NSE of 0.93. The model's effectiveness lies in its separate treatment of high- and low-frequency components, which allows it to capture the dynamic changes in groundwater. By decomposing the series with EWT, reducing high-frequency noise, and applying distinct models to the separated components, the model can more reliably reveal the spatial distribution and dynamic changes of future groundwater. It therefore serves as a reference for the rational development and utilization of groundwater in urban areas, and for urban water resource management and ecological environmental protection.

In addressing the critical aspect of data quality, our methodology incorporated robust data cleaning techniques, including the removal of duplicate entries, handling missing data through imputation methods and implementing outlier detection algorithms. These steps were pivotal in ensuring the consistency and reliability of our dataset. The sample selection criteria were carefully defined to minimize bias, particularly in the context of a smaller dataset, and great attention was given to the integrity of data sources. Our approach to data quality control aimed not only to enhance the accuracy of our predictive models but also to mitigate the potential impact of data inconsistencies on the overall study outcomes.
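The three cleaning steps named above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify which imputation or outlier-detection methods were used, so keeping the first duplicate, linear interpolation, and a modified z-score with a threshold of 3.5 are choices made for this example, as is the function name `clean_depth_series`.

```python
from statistics import median


def clean_depth_series(records, outlier_threshold=3.5):
    """Clean (timestamp, depth_or_None) pairs.

    Returns (timestamps, depths, outlier_flags), sorted by timestamp.
    All three methods used here are assumptions for illustration.
    """
    # 1. Remove duplicate timestamps, keeping the first reading seen.
    unique = {}
    for stamp, depth in records:
        unique.setdefault(stamp, depth)
    stamps = sorted(unique)
    depths = [unique[s] for s in stamps]

    # 2. Fill missing readings by linear interpolation between the nearest
    #    known readings (by position, assuming evenly spaced sampling).
    known = [i for i, d in enumerate(depths) if d is not None]
    if not known:
        raise ValueError("series contains no valid readings")
    for i, d in enumerate(depths):
        if d is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            depths[i] = depths[right]  # leading gap: carry backward
        elif right is None:
            depths[i] = depths[left]   # trailing gap: carry forward
        else:
            weight = (i - left) / (right - left)
            depths[i] = depths[left] + weight * (depths[right] - depths[left])

    # 3. Flag outliers with the modified z-score based on the median
    #    absolute deviation (MAD); flagged points are reported, not removed.
    center = median(depths)
    mad = median(abs(d - center) for d in depths)
    if mad == 0:
        flags = [False] * len(depths)
    else:
        flags = [0.6745 * abs(d - center) / mad > outlier_threshold for d in depths]
    return stamps, depths, flags
```

Flagging rather than deleting suspected outliers leaves the decision to the analyst, which matters for groundwater records where an abrupt change may reflect real pumping or recharge events.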

In our validation framework, we relied on two performance metrics chosen for their direct relevance to our research goals: ARE and the NSE coefficient. The ARE quantifies the average percentage difference between predicted and actual groundwater depths. The NSE coefficient evaluates the model's ability to capture the observed variations and trends in the data, with a higher NSE indicating better predictive performance. Although this validation approach is streamlined, the simplicity of the two metrics keeps the evaluation clear, transparent, and easy to interpret. Throughout our discussion of validation results, we emphasize the interpretation of ARE and NSE values to give insight into the model's strengths and limitations.
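For reference, the two metrics can be computed as in the sketch below. It assumes the commonly used definitions, namely ARE as the mean of |predicted − observed| / |observed| expressed as a percentage, and NSE as one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean. The function names are illustrative.

```python
from typing import Sequence


def average_relative_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - observed| / |observed|, as a percentage."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def nash_sutcliffe_efficiency(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / spread
```

Under these definitions, an NSE of 1 indicates a perfect fit, and an NSE of 0 means the model predicts no better than the mean of the observations, which puts the values in Table 3 into context.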

Compared to traditional methods, our processing phase demonstrates superior accuracy and adaptability, because the machine learning techniques it incorporates capture complex patterns within the data more effectively. However, this approach demands more computational resources and training data, and its computational cost can be high for large-scale datasets. For small datasets or specific data types, our model may also not perform as well as methods designed specifically for them. In the future, we plan to refine the methodology to reduce computational costs and enhance adaptability, making it applicable to a wider range of data types and research scenarios.

  • (1) The prediction model based on EWT-S–G-GRU and LSSVM proposed in this paper shows good results in predicting groundwater depth in the urban area of Xinxiang City. The average relative errors for the two observation wells are 2.14% and 2.63%, respectively, and the NSE of Well No. 1 reaches 0.93, indicating that the model is feasible for predicting groundwater depth in the study area. It reveals the spatial distribution of future groundwater and its dynamic changes over time, providing a reference for the rational development and utilization of groundwater in the urban area of Xinxiang City.

  • (2) For the prediction of groundwater depth in Xinxiang City's urban area, a single model, or a single model applied after EWT decomposition, proved unreliable. Given the fluctuating nature of the time series, EWT decomposition is necessary, followed by S–G filtering to reduce noise. The series is then separated into high-frequency and low-frequency components, which are predicted by distinct models. This procedure significantly improved prediction accuracy in this study.

  • (3) Although the prediction accuracy of the EWT-S–G-GRU and LSSVM model is relatively high, applying signal decomposition to the original sequence uses information from the entire time period, which to some extent leaks information from the test set; this needs to be improved. This article considers only the accuracy and stability of the predictions and does not compare the time required by each model. Future research should consider practical engineering applications and include a comparison of the time efficiency of the models. The model's performance may also rely heavily on the quality and quantity of available data, which potentially limits its generalizability. Additionally, the integration of multiple techniques introduces computational challenges, and the model's effectiveness in other geographical regions remains uncertain, so thorough validation and clear communication of its limitations are needed.
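One way to address the test-set leakage noted in point (3) is walk-forward evaluation, in which every preprocessing step is recomputed from past observations only. The sketch below is a hypothetical harness, not the procedure used in this study: `walk_forward_forecast` and `forecast_next` are names introduced for illustration, and the decomposition, filtering, and model fitting would all take place inside `forecast_next`.

```python
from typing import Callable, List, Sequence, Tuple


def walk_forward_forecast(
    series: Sequence[float],
    n_test: int,
    forecast_next: Callable[[Sequence[float]], float],
) -> List[Tuple[float, float]]:
    """Return (observed, predicted) pairs for the last n_test points.

    forecast_next receives only the values observed before the target
    step, so any decomposition or filtering performed inside it cannot
    use information from the test period.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    pairs = []
    first_test_index = len(series) - n_test
    for t in range(first_test_index, len(series)):
        history = list(series[:t])  # copy of strictly past observations
        pairs.append((series[t], forecast_next(history)))
    return pairs


def naive_last_value(history: Sequence[float]) -> float:
    """Placeholder forecaster: repeat the most recent observation."""
    return history[-1]


if __name__ == "__main__":
    # Synthetic values only; not groundwater observations.
    demo = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(walk_forward_forecast(demo, 2, naive_last_value))
```

The cost of this design is that decomposition and training are repeated at every step, which ties in with the computational concerns raised in the same point.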

All authors contributed to the study conception and design. Haiyang Chen contributed in writing and editing the original manuscript. Shuqi Luo contributed in chart editing and preliminary data collection. All authors read and approved the final manuscript.

Data and materials are available from the corresponding author upon request.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Aghel, B., Rezaei, A. & Mohadesi, M. 2018 Modeling and prediction of water quality parameters using a hybrid particle swarm optimization–neural fuzzy approach. International Journal of Environmental Science and Technology 16, 4823–4832.
Aghel, B., Yahya, S. I., Rezaei, A. & Alobaid, F. 2023 A dynamic recurrent neural network for predicting higher heating value of biomass. International Journal of Molecular Sciences 24, 5780.
Chai, C. W., Jiang, Z. R., Xu, X. Y., Tang, W. D., Chai, W. V., Li, L. J. & Li, C. X. 2006 Determination of land desertification type in the transition zone of desert oasis in Minqin County. Journal of Northwest Forestry College 1 (06), 12–16.
Chen, X., Wang, F. X., Qi, W. Y. & Zhou, T. 2018 Application of BP neural network model based on genetic algorithm in groundwater burial depth prediction – Mengcheng County as an example. Water Resources and Hydropower Technology 49 (4), 1–7.
Cohen, L. 1955 Time-Frequency Analysis: Theory and Application. Englewood Cliffs, NJ: Prentice Hall, pp. 25–27.
Djurhuus, M. S. E., Werzinger, S., Schmauss, B., Clausen, A. T. & Zibar, D. 2019 Machine learning assisted fiber Bragg grating-based temperature sensing. IEEE Photonics Technology Letters 31 (12), 939–942.
Duan, Q. C., Huang, D. W., Huang, L. & Duan, P. 2011 Simulation analysis of particle swarm optimization algorithm with extended memory. Control and Decision 26 (7), 1087–1090 + 1100.
Gestel, T. V., Suykens, J. A., Baesens, B., Viaene, S., Vanthienen, J., Dedene, G., De Moor, B. & Vandewalle, J. 2004 Benchmarking least squares support vector machine classifiers. Machine Learning 54 (1), 5–32.
Gharehbaghi, A., Ghasemlounia, R., Ahmadi, F. & Albaji, M. 2022 Groundwater level prediction with meteorologically sensitive gated recurrent unit (GRU) neural networks. Journal of Hydrology 612 (Part C), 128262.
Gilles, J. 2013 Empirical wavelet transform. IEEE Transactions on Signal Processing 61 (16), 3999–4010.
Jeong, D. I. & Kim, Y. O. 2009 Combining single-value streamflow forecasts – A review and guidelines for selecting techniques. Journal of Hydrology 377 (3–4), 284–299.
Wang, G. X., Yang, L. Y., Chen, L. & Wada, T. 2005 Impact of land use change on groundwater resources in the Heihe River Basin. Journal of Geography 2 (03), 456–466.
Yi, J. T. & Yan, H. 2021 Research on foreign trade risk prediction and early warning based on wavelet decomposition and ARIMA-GRU hybrid model. Chinese Journal of Management Science 31, 1–11.
Yin, Z. Q. & Zhang, K. 2021 A grey seasonal index model for forecasting groundwater depth of Ningxia Plain. Discrete Dynamics in Nature and Society 2021, Article ID 6872538, p. 13.
Zhang, N., Xia, Z. Q. & Jiang, H. 2010 Support vector machine runoff prediction based on multi-factor quantitative metrics. Journal of Water Resources 41 (11), 1318–1324.
Zhang, N., Xiao, C. L., Liu, B. & Liang, X. J. 2017a Groundwater depth predictions by GSM, RBF, and ANFIS models: A comparative assessment. Arabian Journal of Geosciences 10, 189.
Zhang, W. Y., Qu, Z. X., Zhang, K. Q., Mao, W. Q., Ma, Y. N. & Fan, X. 2017b A combined model based on CEEMDAN and modified flower pollination algorithm for wind speed forecasting. Energy Conversion and Management 136 (MAR), 439–451.
Zheng, J. D., Pan, H. Y., Pan, Z. W. & Luo, J. S. 2016 Adaptive parameterless empirical wavelet transform and its application in rotor fault diagnosis. China Mechanical Engineering 27 (16), 2218–2224.
Zhou, X. P., Li, T. X. & Wang, W. S. 2013 A combined least squares support vector machine – Markov chain model for annual runoff prediction. Journal of Hydroelectricity 32 (4), 16–19.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).