## ABSTRACT

Improving the accuracy of groundwater depth prediction has significant implications for regional water resources management, ecological environment protection and economic and social development. We established a combined EWT-S–G-GRU and LSSVM model in which empirical wavelet transform (EWT) decomposes the nonlinear groundwater depth series, Savitzky–Golay (S–G) filtering reduces high-frequency noise, a gated recurrent unit (GRU) neural network processes the linear feature signals and a least squares support vector machine (LSSVM) handles the nonlinear signals. The results demonstrate that the proposed model predicts groundwater depth with superior accuracy, with an average relative error of only 2.14% and a Nash–Sutcliffe efficiency (NSE) of 0.93. It outperforms the LSSVM, GRU, EWT-GRU and EWT-LSSVM models, whose average relative errors are 12.88%, 11.90%, 7.07% and 11.10% and whose NSE values are 0.58, 0.56, 0.71 and 0.73, respectively. The model's advantage stems from handling both the nonlinear and linear features of groundwater effectively, which improves predictive accuracy. The EWT-S–G-GRU and LSSVM model is therefore more reliable in revealing the spatial distribution and dynamic changes of future groundwater, providing a robust reference for the rational development and utilization of groundwater in the urban area of Xinxiang City.

## HIGHLIGHTS

Empirical wavelet transform decomposition followed by Savitzky–Golay filtering reduces noise and signal fluctuation.

High-frequency and low-frequency components are separated and predicted by different models.

The future spatial distribution of groundwater and its dynamic changes over time are revealed.

Comparison with other models clarifies the advantages of the proposed model.

The coupling of multiple models improves the prediction accuracy.

## INTRODUCTION

Groundwater depth is an important indicator of changes in groundwater resources. Because of factors such as exploitation, recharge and evaporation, the groundwater depth sequence is random, uncertain and non-stationary, which makes groundwater depth difficult to predict accurately (Jeong & Kim 2009; Chen *et al.* 2018; He *et al.* 2021). Overexploitation of groundwater can produce a cone of depression and land subsidence, whereas when recharge exceeds extraction the groundwater depth becomes shallower. Accurate prediction of changes in groundwater depth can therefore provide a theoretical basis for groundwater protection, adjustment of planting structures and patterns, rational utilization of water and soil resources and ecological environment protection (Wang *et al.* 2005; Chai *et al.* 2006; Tian 2020). Research on groundwater depth prediction models in China and abroad has produced fruitful results. Research outside China has focused mainly on neural networks and related techniques. Khan *et al.* (2023) comprehensively assessed the numerical, machine learning and deep learning models commonly used to forecast groundwater levels, and their findings show that machine learning and deep learning approaches are effective for modeling groundwater depth. Takafuji *et al.* (2019) compared autoregressive integrated moving average (ARIMA) and sequential Gaussian simulation (SGS) for predicting groundwater depth and concluded that ARIMA is more suitable for monitoring aquifers, as it matched the accuracy of SGS in a 2-month forecast. Gharehbaghi *et al.* (2022) used a gated recurrent unit (GRU) neural network to predict the regional average monthly time series of groundwater depth within a range of 4.82 m in the Urmia plain. 
Bowes *et al.* (2019) used two machine learning models, long short-term memory (LSTM) and a recurrent neural network, to simulate and predict the response of groundwater levels to storm events in Norfolk, Virginia, a flood-prone coastal city. This was the first use of LSTM networks to create hourly groundwater level forecasts in a coastal city, and it showed that they are well suited to real-time operational forecasting. Aghel *et al.* (2018, 2023) employed data-driven models, including adaptive neural fuzzy inference systems and hybrid structures such as particle swarm optimization–adaptive neural fuzzy inference systems, to predict water quality parameters accurately. Agoubi & Kharroubi (2019) evaluated and predicted short-term groundwater depths under different influencing factors using stochastic time series and an artificial neural network (ANN). In China, Zhou *et al.* (2013) proposed a data-driven groundwater depth prediction model that combines discrete wavelet transform preprocessing with support vector machines (SVM) and compared it with conventional ANN, conventional SVM and weight agnostic neural network models with wavelet preprocessing. Zhang *et al.* (2017a, 2017b) used grey self-memory, radial basis function network and adaptive neuro-fuzzy inference system models to predict the groundwater depth of unconfined aquifers in Jilin City. Yang & Zhang (2022) devised a hybrid convolutional neural network-long short-term memory-meta-learning (CNN-LSTM-ML) model that integrates CNN and LSTM network structures and uses a meta-learning framework to ensure network performance given the available samples when forecasting groundwater levels at 66 observation wells in the middle and lower reaches of the Heihe River in an arid zone. 
Yin & Zhang (2021) used a multi-stage cumulative grey model to fit and predict periodic indicators and long-term trends separately and combined the two sets of predictions to obtain the final result; they applied this model to groundwater prediction in the Yinchuan Plain. Zhou *et al.* (2013) successfully applied a combined least squares support vector machine (LSSVM)–Markov chain model to annual runoff prediction. Zhang *et al.* (2010) constructed a runoff prediction model based on multi-factor quantitative indicators using the least squares support vector machine.

Studies in China and abroad thus mainly use numerical prediction methods such as grey models and neural networks to predict groundwater depth. Few prediction models, however, account for the characteristics of the groundwater depth sequence itself by reducing its non-stationarity and separating its high-frequency components from its low-frequency components. This article applies the empirical wavelet transform (EWT) to preprocess the groundwater depth data of the urban area of Xinxiang City. EWT decomposes the data into a series of components, each containing single-frequency information; the high-frequency components are then filtered to remove noise and forecast separately from the low-frequency components. The study establishes a prediction model using EWT-Savitzky–Golay (S–G)-GRU and LSSVM and evaluates its error to confirm the efficacy of the approach.

The objective of this study is to establish a comprehensive groundwater depth prediction model by integrating EWT-S–G-GRU and LSSVM. The model handles both the nonlinear and linear features of groundwater, which significantly improves predictive accuracy. The novelty of this research lies in integrating these techniques into a reliable prediction framework for groundwater resource management and rational utilization, contributing new insights to research and practice in this field.

## METHODS

### EWT

Gilles (2013) proposed the EWT, a method that constructs adaptive wavelets directly from the characteristics of the signal itself (Cohen 1955; Zheng *et al.* 2016). The method inherits the adaptivity of the empirical mode decomposition (EMD) algorithm while resting on a firmer mathematical theory, and it mitigates the mode mixing of EMD, extracting more stable components. It is implemented as follows:

- (1)
Perform a Fourier transform on the original signal to obtain the signal's frequency spectrum.

- (2)
Perform adaptive segmentation on the signal's frequency spectrum. Search for local minima in the spectrum to determine the boundaries for segmentation.

- (3)
Based on the segmentation boundaries determined from the frequency spectrum, the principles of the classical wavelet transform are used to construct an EWT filter bank, which decomposes the original signal into its individual frequency components.

This method can effectively separate the low-frequency and high-frequency components, with almost no mode mixing.
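The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a naive DFT, keeps the `n_modes` largest spectral peaks, places each boundary at the lowest spectral value between neighbouring peaks, and replaces the smooth Meyer-type transition bands of Gilles's EWT with ideal (brick-wall) band-pass filters. The names `dft`, `idft` and `ewt_like_decompose` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for short monthly series."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spec):
    """Inverse of dft(); keeps the real part because every band used here is Hermitian."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]


def ewt_like_decompose(x, n_modes):
    """Split x into up to n_modes components, ordered from low to high frequency.

    Simplification (assumption): ideal brick-wall bands stand in for the
    Meyer-type transition bands of Gilles's EWT.
    """
    n = len(x)
    spec = dft(x)                                    # step (1): Fourier spectrum
    half = n // 2
    mag = [abs(spec[f]) for f in range(half + 1)]    # one-sided magnitude spectrum
    # step (2): keep the n_modes largest local maxima and put each boundary at
    # the lowest spectral value between two neighbouring retained maxima
    peaks = [i for i in range(1, half) if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    peaks = sorted(sorted(peaks, key=lambda i: mag[i], reverse=True)[:n_modes])
    bounds = [min(range(a + 1, b), key=lambda i: mag[i]) for a, b in zip(peaks, peaks[1:])]
    edges = [0] + bounds + [half + 1]
    # step (3): one band-pass filter per segment; bin f and its mirror n - f share a band
    components = []
    for lo, hi in zip(edges, edges[1:]):
        band = [spec[f] if lo <= min(f, n - f) < hi else 0j for f in range(n)]
        components.append(idft(band))
    return components


# Usage: two tones at frequency bins 3 and 20 of a 64-sample series
n = 64
signal = [2 * math.sin(2 * math.pi * 3 * k / n) + 0.5 * math.sin(2 * math.pi * 20 * k / n)
          for k in range(n)]
low, high = ewt_like_decompose(signal, n_modes=2)
```

Because the bands partition the spectrum, the components add back up to the original series, which is what lets the forecasts of the individual components be summed into a depth forecast. The components come out from low to high frequency, so the list would need reversing if IMF1 is to denote the highest-frequency component, as in the data preprocessing section.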

### S–G

S–G filtering smooths a series by fitting a low-order polynomial to successive windows of data by least squares (… *et al.* 2019). Assume that a set of data points $x[n]$ is being processed, and that a window of $2M + 1$ consecutive points centred at $n = 0$ is selected. The data in the window are fitted with the polynomial

$$
f(n) = \sum_{k=0}^{N} a_k n^{k} = a_0 + a_1 n + \cdots + a_N n^{N}, \qquad n = -M, \ldots, M,
$$

whose coefficients minimize the squared fitting error

$$
E = \sum_{n=-M}^{M} \left( f(n) - x[n] \right)^{2}.
$$

This yields the polynomial coefficients $a_0, a_1, \ldots, a_N$. The filtered value at the window centre is $f(0) = a_0$, and shifting the window of width $2M + 1$ one point at a time gives the filtered series $y[n]$. The derivation shows that the performance of S–G filtering is determined mainly by two parameters: the window half-width $M$ (window width $2M + 1$) and the fitting order $N$. These two parameters therefore only need to be adjusted appropriately to suit different data.

In summary, smoothing filtering is a widely used preprocessing technique in spectral analysis. S–G smoothing improves the smoothness of the spectrum and mitigates noise interference. Its effect depends on the chosen window width and fitting order, so it can be adapted to the requirements of different settings.
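A minimal S–G smoother can be written directly from the least-squares definition of the filter: for each point, fit a polynomial of order $N$ to the $2M + 1$ points of the window and keep the fitted value. The sketch below is pure Python and assumes, since the text does not say, that the first and last $M$ points are handled by evaluating the fit of the nearest full window; `savgol_smooth` and `polyfit_ls` are hypothetical names.

```python
def polyfit_ls(ts, ys, order):
    """Least-squares polynomial coefficients a_0..a_order via the normal equations."""
    m = order + 1
    ata = [[sum(t ** (r + c) for t in ts) for c in range(m)] for r in range(m)]
    aty = [sum(y * t ** r for t, y in zip(ts, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):          # back substitution
        a[r] = (aty[r] - sum(ata[r][c] * a[c] for c in range(r + 1, m))) / ata[r][r]
    return a


def savgol_smooth(x, half_width, order):
    """S-G smoothing with window 2*half_width + 1 (= 2M + 1) and fitting order N."""
    width = 2 * half_width + 1
    if width > len(x) or order >= width:
        raise ValueError("need len(x) >= 2M + 1 > N")
    offsets = list(range(-half_width, half_width + 1))
    y = []
    for i in range(len(x)):
        # centre of the window used for point i; near the ends the window is
        # clamped inside the series (edge handling is an assumption)
        c = min(max(i, half_width), len(x) - 1 - half_width)
        a = polyfit_ls(offsets, x[c - half_width:c + half_width + 1], order)
        t = i - c                            # position of point i inside the window
        y.append(sum(a[k] * t ** k for k in range(order + 1)))
    return y
```

With `half_width=2` and `order=2`, this reproduces the classic five-point smoothing coefficients (−3, 12, 17, 12, −3)/35, and any polynomial of degree at most `order` passes through unchanged. The data preprocessing section uses a fitting order of 4; the window width used there is not reported, so any `half_width` chosen for that case would be an assumption.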

### GRU

The components of Figure 1 are labeled as follows: $x_t$ is the input at time *t*; $h_t$ is the output of the hidden-layer cell at time *t*; *C* denotes the concatenation of two vectors; *X* denotes the element-wise product of two inputs; and 1− outputs 1 minus the input to the module. The update gate $z_t$ regulates how much state information from the preceding moment flows into the current state; a larger value means that more information from the prior state is carried into the current state. The reset gate $r_t$ controls the proportion of information from the previous state that acts on the candidate hidden state of the current moment, $\tilde{h}_t$. The hyperbolic tangent activation function, tanh, outputs values in (−1, 1), while the sigmoid activation function, $\sigma$, outputs values in (0, 1). Once information enters the GRU, it is processed in the following steps:

- (1)
Equation (8) concatenates the input $x_t$ at time *t* with the hidden-layer output $h_{t-1}$ at time *t* − 1 to obtain the reset gate output $r_t$.

- (2)
The update gate output $z_t$ is obtained from Equation (9).

- (3)
The candidate hidden state $\tilde{h}_t$ is obtained from Equation (10); it combines the input data with the hidden state from time *t* − 1 as selected by the reset gate.

- (4)
The hidden-layer output $h_t$ at time *t* is then computed using Equation (11), which keeps the essential information from the candidate hidden state at time *t* while discarding part of the hidden-state information $h_{t-1}$ passed on from time *t* − 1.

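In Equation (11), $z_t \odot h_{t-1}$ represents the selective retention of the hidden state from time *t* − 1, and $(1 - z_t) \odot \tilde{h}_t$ represents the selective 'forgetting' of information in the candidate hidden state $\tilde{h}_t$, retaining only its important information. By repeating this process, historical information is accumulated and memorized, which allows the GRU to capture the long-term correlations present in time series:

$$
\begin{aligned}
r_t &= \sigma\left(W_r [h_{t-1}, x_t] + b_r\right) \qquad (8)\\
z_t &= \sigma\left(W_z [h_{t-1}, x_t] + b_z\right) \qquad (9)\\
\tilde{h}_t &= \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right) \qquad (10)\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \qquad (11)
\end{aligned}
$$

where $W_r$, $W_z$, $W_h$ and $b_r$, $b_z$, $b_h$ are the weight matrices and bias vectors of the reset gate, update gate and candidate state, $[\cdot, \cdot]$ denotes vector concatenation and $\odot$ denotes the element-wise product.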
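A single GRU time step, covering the reset gate, update gate, candidate state and output described in steps (1) to (4), can be written out in plain Python for readability rather than speed. The sketch assumes the convention $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$, in which a larger update gate value carries more of the previous state forward (consistent with the description of the update gate above); the keys of `params` and the helper names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def gru_step(x_t, h_prev, params):
    """One GRU time step; returns the new hidden state h_t."""
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    r_t = [sigmoid(a + b) for a, b in zip(matvec(params["W_r"], concat), params["b_r"])]  # reset gate
    z_t = [sigmoid(a + b) for a, b in zip(matvec(params["W_z"], concat), params["b_z"])]  # update gate
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t      # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(params["W_h"], gated), params["b_h"])]
    # output: z_t weights the previous state, (1 - z_t) weights the candidate state
    return [z * hp + (1.0 - z) * hc for z, hp, hc in zip(z_t, h_prev, h_cand)]
```

Calling `gru_step` month after month, feeding each returned state back in as `h_prev`, is what lets the hidden state accumulate historical information. A practical model would use a deep learning library with learned weights rather than this hand-written step.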

This study employed a gated recurrent unit model with two hidden layers, each followed by a Dropout layer to mitigate overfitting. The output layer utilized a sigmoid activation function, suitable for binary classification tasks. During the model compilation, the Adam optimizer was chosen, along with binary cross-entropy loss function, and accuracy as the evaluation metric.

### LSSVM

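Consider a training set $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^{n}$ is the *n*-dimensional training input, $y_i$ is the output value of the model and *l* is the number of training samples. The LSSVM model can be expressed as (Gestel *et al.* 2004):

$$
y(x) = \omega^{T} \theta(x) + b
$$

where *b* is the bias, $\omega$ is the weight vector and $\theta(\cdot)$ denotes the feature mapping that converts the complex nonlinear relationship between the output *y* and the input *x* into a linear relationship between *y* and $\theta(x)$. The parameters are obtained by solving the optimization problem

$$
\min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{l} e_i^{2}
\quad \text{s.t.} \quad y_i = \omega^{T}\theta(x_i) + b + e_i, \quad i = 1, \ldots, l
$$

where $\gamma$ is the regularization parameter, $e_i$ is the error term and $\theta$ maps the original space to a high-dimensional feature space. The Lagrange function is defined as

$$
L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{l} \alpha_i \left( \omega^{T}\theta(x_i) + b + e_i - y_i \right)
$$

where $\alpha_i$ are the Lagrange multipliers.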
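Setting the partial derivatives of the Lagrange function with respect to $\omega$, $b$, $e_i$ and $\alpha_i$ to zero and eliminating $\omega$ and $e$ gives the standard LSSVM linear system

$$
\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad K_{ij} = \theta(x_i)^{T}\theta(x_j) = k(x_i, x_j),
$$

with predictions $y(x) = \sum_i \alpha_i k(x, x_i) + b$. The sketch below solves this system in pure Python. The RBF kernel $k(x, x') = \exp(-\lVert x - x' \rVert^2 / (2\sigma^2))$ is an assumption, since the kernel type is not stated; `lssvm_fit`, `lssvm_predict` and `sigma2` are hypothetical names.

```python
import math


def rbf(a, b, sigma2):
    """Assumed RBF kernel exp(-||a - b||^2 / (2 * sigma2))."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2.0 * sigma2))


def solve(a, rhs):
    """Solve a @ s = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]           # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def lssvm_fit(xs, ys, gamma, sigma2):
    """Return (b, alpha) from the LSSVM dual linear system."""
    n_samples = len(xs)
    a = [[0.0] + [1.0] * n_samples]                         # first row: sum(alpha) = 0
    for i in range(n_samples):
        a.append([1.0] + [rbf(xs[i], xs[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n_samples)])
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]


def lssvm_predict(x, xs, b, alpha, sigma2):
    """y(x) = sum_i alpha_i k(x, x_i) + b."""
    return b + sum(ai * rbf(x, xi, sigma2) for ai, xi in zip(alpha, xs))
```

The fitted multipliers satisfy $\sum_i \alpha_i = 0$ and $y_i - y(x_i) = \alpha_i / \gamma$, which gives a quick sanity check. Gaussian elimination costs $O(l^3)$, which is negligible for a 180-month training set.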

### Model evaluation index

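The relative error (RE) of each forecast, its average (ARE) and the Nash–Sutcliffe efficiency (NSE) are used to evaluate the models:

$$
RE_i = \frac{\lvert y_i - \hat{y}_i \rvert}{y_i} \times 100\%, \qquad ARE = \frac{1}{n}\sum_{i=1}^{n} RE_i
$$

where *y* is the true value and $\hat{y}$ is the forecast value, and

$$
NSE = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2}}
$$

where $\hat{y}_i$ is the predicted value at time *i*, $y_i$ is the measured value at time *i* and $\bar{y}$ is the mean of the measured values.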
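Both indices are straightforward to compute. The sketch below assumes, as in the relative error formula, that the measured value is the denominator; the function names are hypothetical.

```python
def relative_errors(observed, predicted):
    """Relative error of each forecast in percent: |y_i - yhat_i| / |y_i| * 100."""
    return [abs(o - p) / abs(o) * 100.0 for o, p in zip(observed, predicted)]


def average_relative_error(observed, predicted):
    """ARE: mean of the per-step relative errors (percent)."""
    errors = relative_errors(observed, predicted)
    return sum(errors) / len(errors)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

An NSE of 1 means a perfect fit, while an NSE of 0 means the forecasts are no better than simply using the mean of the observations.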

### Model coupling

- (1)
The monthly groundwater depth series of each of the two professional observation wells in Xinxiang City contains 204 values. First, EWT decomposes each series into high-frequency and low-frequency components, denoted Intrinsic Mode Function (IMF)1, IMF2, …, IMF*n* in Figure 2.

- (2)
The IMF components of the first 180 months for the two wells form the training set for GRU and LSSVM, and those of the last 24 months form the prediction set.

- (3)
The high-frequency components are smoothed with S–G filtering and then fed to the LSSVM model to obtain the corresponding predicted value, $Y_1$ in Figure 2.

- (4)
The low-frequency components are fed separately into GRU to obtain the corresponding predicted values, $Y_2, Y_3, \ldots, Y_n$ in Figure 2.

- (5)
The predictions of the two parts are summed to reconstruct the predicted groundwater depth.
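The five coupling steps can be expressed as a small driver that accepts the individual pieces as functions. This is a structural sketch only: `coupled_forecast` and its arguments are hypothetical, the decomposition, smoother and forecasters are placeholders for EWT, S–G, LSSVM and GRU, and, following step (1), the whole 204-month record is decomposed before the 180/24 split.

```python
from typing import Callable, List, Sequence

Series = List[float]
Forecaster = Callable[[Series, int], Series]   # (training series, horizon) -> forecast


def coupled_forecast(series: Sequence[float],
                     decompose: Callable[[Sequence[float]], List[Series]],
                     smooth: Callable[[Series], Series],
                     high_model: Forecaster,
                     low_model: Forecaster,
                     n_train: int = 180) -> Series:
    """Decompose, split, forecast each part, then sum the parts."""
    horizon = len(series) - n_train                 # e.g. 204 - 180 = 24 months
    high, *lows = decompose(series)                 # step (1): high-frequency IMF first
    # step (2) is the slicing below: the first n_train months train each model
    y1 = high_model(smooth(high)[:n_train], horizon)           # step (3): S-G + LSSVM -> Y1
    y_low = [low_model(c[:n_train], horizon) for c in lows]    # step (4): GRU per IMF
    # step (5): add the forecasts of all components month by month
    return [y1[k] + sum(y[k] for y in y_low) for k in range(horizon)]


# Usage with placeholder pieces (hypothetical): a "decomposition" into a zero
# high-frequency part plus the series itself, and last-value forecasters
series = [float(v) for v in range(204)]
forecast = coupled_forecast(
    series,
    decompose=lambda s: [[0.0] * len(s), list(s)],
    smooth=lambda c: c,
    high_model=lambda train, h: [0.0] * h,
    low_model=lambda train, h: [train[-1]] * h,
)
```

Decomposing the full record before splitting mirrors step (1); as the conclusion notes, this lets some test-period information into the training components.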

## EXAMPLE APPLICATIONS

### Overview of the study area and data sources

The city of Xinxiang is located in the northern part of Henan province, with a total area of 8,629 km² and a population of 5.65 million. It lies in the warm temperate zone and spans the Haihe and Yellow River water systems; 78% of its area is plains. The city's reservoir capacity is currently 199 million m³, the average annual rainfall is 621 mm and the average water resources amount to 1.697 billion m³ per year (including 1.12 billion m³ of groundwater). Henan province is short of water resources, and the shortage is even more severe in northern Henan, where Xinxiang is located. Per capita water resources in the area are only 315 m³, less than one-seventh of the national average and far below the internationally recognized threshold of 1,000 m³ per capita. Rainfall is unevenly distributed in space and time, with more than 70% of annual rainfall concentrated in the flood season, so the imbalance between water supply and demand is particularly acute in some areas, especially in the Haihe River Basin. As the population and economy grow, the city's water demand is increasing, and a drought or severe drought would make water shortages more serious. Although the completed South-to-North Water Diversion Project alleviates local water shortages, a large gap remains between supply and the long-term water needs of sustainable economic and social development and of the ecological environment. Water scarcity, low per capita water resources, uneven spatial and temporal distribution of water resources, water pollution and deterioration of the aquatic ecological environment have become bottlenecks restricting the sustainable economic and social development of Xinxiang. 
Unreasonable development, blind exploitation, worsening water pollution and serious waste make the already scarce water resources even scarcer. It is therefore important to predict groundwater depth reasonably, analyze its changes, carry out scientific research on groundwater and establish a stable groundwater monitoring system to safeguard the utilization and development of water resources. The two selected groundwater depth observation wells are located in Wuling Village, Luwangfen Township, Fengquan District, Xinxiang City (Well No. 1) and southeast of Xiheidui Village, Muye Township, Muye District, Xinxiang City (Well No. 2). Both wells are in the urban area of Xinxiang City. The location of the study area is shown in Figure 3.

Figure 4 shows that groundwater depth fluctuates considerably, indicating that the groundwater depth sequence combines nonlinear and linear characteristics, which makes groundwater depth difficult to predict. Based on these characteristics, this paper proposes a hybrid model that combines linear and nonlinear models to improve prediction accuracy.

### Data preprocessing

In the figure, red and black represent the original component and the component after S–G filtering, respectively. After S–G smoothing, the high-frequency component IMF1 fluctuates less than the original data, and its non-stationarity is partly reduced. With a fitting order of 4, the smoothed signal reproduces the characteristics of the original signal more completely and still captures some of its fine fluctuations, preserving the signal's variation information. This demonstrates the effectiveness of S–G filtering.

### Groundwater depth forecast

When GRU and LSSVM are used to predict groundwater depth at the two observation wells in the urban area of Xinxiang, the data must be divided into training and prediction sets. Groundwater depth data from 2005 to 2019 are used as the training set, and data from 2020 to 2021 as the prediction set. After repeated calibration, the optimal GRU parameters are as follows: the learning rate is …, the maximum number of iterations is 421, the gradient threshold is 1, the number of hidden nodes is 616, and the initial input and output are both set to 0. The penalty parameter and kernel parameter of the LSSVM model strongly affect prediction accuracy, so selecting appropriate values is crucial to the model's performance. In this paper, the extended memory particle swarm optimization algorithm (Duan *et al.* 2011) is used to optimize the penalty parameter γ and the kernel parameter of LSSVM. The optimal γ is 25 and the optimal kernel parameter is 0.1.
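The extended memory PSO of Duan *et al.* (2011) is not reproduced here. As an illustration of the general idea only, the sketch below is a standard global-best particle swarm with commonly used inertia and acceleration coefficients; `pso_minimize` is a hypothetical name, and in this setting `objective` would be some validation error of LSSVM as a function of the penalty and kernel parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Standard global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm's best position
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Usage on a toy objective whose minimum is at (1, -2)
best, best_val = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                              bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

For LSSVM tuning, searching over the logarithms of the two parameters is a common choice because they can span orders of magnitude; the search bounds would have to be chosen by the modeller.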

| Year | Month | Original data (m) | Forecast data (m) | Relative error (%) |
|---|---|---|---|---|
| 2020 | 1 | 8.51 | 8.76 | 2.83 |
| | 2 | 8.36 | 8.58 | 2.60 |
| | 3 | 8.84 | 8.79 | 0.56 |
| | 4 | 8.88 | 8.67 | 2.36 |
| | 5 | 8.81 | 8.88 | 0.81 |
| | 6 | 9.24 | 9.23 | 0.06 |
| | 7 | 8.25 | 8.35 | 1.19 |
| | 8 | 7.27 | 7.41 | 1.89 |
| | 9 | 7.53 | 7.47 | 0.84 |
| | 10 | 7.78 | 7.76 | 0.29 |
| | 11 | 8.19 | 8.33 | 1.66 |
| | 12 | 8.05 | 8.39 | 4.04 |
| 2021 | 1 | 6.82 | 7.18 | 4.95 |
| | 2 | 7.27 | 7.12 | 2.11 |
| | 3 | 6.75 | 6.79 | 0.69 |
| | 4 | 6.86 | 7.00 | 2.00 |
| | 5 | 7.69 | 7.28 | 5.51 |
| | 6 | 8.63 | 8.60 | 0.38 |
| | 7 | 6.68 | 6.63 | 0.73 |
| | 8 | 1.31 | 1.40 | 7.02 |
| | 9 | 1.23 | 1.25 | 1.31 |
| | 10 | 1.04 | 1.10 | 5.45 |
| | 11 | 2.07 | 2.10 | 1.39 |
| | 12 | 2.31 | 2.30 | 0.70 |


Maximum relative error: 7.02%. Minimum relative error: 0.06%. Average relative error: 2.14%.
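The summary statistics under the table can be re-derived from its relative-error column. The short check below copies the 24 tabulated values for Well No. 1 and recomputes the maximum, minimum and average.

```python
# Relative errors (%) for Well No. 1, January 2020 to December 2021, from the table above
well1_errors = [2.83, 2.60, 0.56, 2.36, 0.81, 0.06, 1.19, 1.89, 0.84, 0.29, 1.66, 4.04,
                4.95, 2.11, 0.69, 2.00, 5.51, 0.38, 0.73, 7.02, 1.31, 5.45, 1.39, 0.70]
mean_error = sum(well1_errors) / len(well1_errors)
print(f"max {max(well1_errors):.2f}%, min {min(well1_errors):.2f}%, mean {mean_error:.2f}%")
# prints: max 7.02%, min 0.06%, mean 2.14%
```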

| Year | Month | Original data (m) | Forecast data (m) | Relative error (%) |
|---|---|---|---|---|
| 2020 | 1 | 6.53 | 6.29 | 3.80 |
| | 2 | 6.02 | 6.22 | 3.20 |
| | 3 | 5.84 | 5.90 | 1.01 |
| | 4 | 6.17 | 6.48 | 4.84 |
| | 5 | 6.28 | 6.30 | 0.37 |
| | 6 | 6.29 | 6.02 | 4.48 |
| | 7 | 5.97 | 5.61 | 6.46 |
| | 8 | 4.17 | 4.23 | 1.61 |
| | 9 | 4.41 | 4.29 | 2.77 |
| | 10 | 4.82 | 4.77 | 1.20 |
| | 11 | 4.93 | 5.00 | 1.27 |
| | 12 | 5.41 | 5.45 | 0.63 |
| 2021 | 1 | 5.28 | 5.36 | 1.54 |
| | 2 | 5.49 | 5.57 | 1.39 |
| | 3 | 5.01 | 5.09 | 1.52 |
| | 4 | 5.29 | 5.35 | 1.24 |
| | 5 | 5.59 | 5.68 | 1.53 |
| | 6 | 6.16 | 6.26 | 1.68 |
| | 7 | 5.46 | 5.35 | 2.11 |
| | 8 | 0.96 | 1.03 | 7.20 |
| | 9 | 0.56 | 0.55 | 0.96 |
| | 10 | 1.00 | 0.93 | 6.97 |
| | 11 | 1.27 | 1.28 | 0.76 |
| | 12 | 1.77 | 1.70 | 4.46 |


Maximum relative error: 7.20%. Minimum relative error: 0.37%. Average relative error: 2.63%.

The figures and tables above show that the EWT-S–G-GRU and LSSVM model tracks groundwater depth and its fluctuations well, with predicted trends basically consistent with the original data. For Well No. 1, the relative error exceeds 5% in May, August and October 2021, with a maximum of 7.02%; for Well No. 2, it exceeds 5% in July 2020, August 2021 and October 2021, with a maximum of 7.20%. The average relative errors (AREs) of the two observation wells, 2.14% and 2.63%, are both below 5%. This indicates that the hybrid model combining linear and nonlinear models is feasible and verifies that the prediction model based on EWT-S–G-GRU and LSSVM constructed in this study is accurate and stable.

### Model comparison

The EWT-S–G-GRU and LSSVM model performed well in predicting groundwater depth at the observation wells in the urban area of Xinxiang City. To demonstrate the advantage of combining linear and nonlinear models, its predictions were compared with those of the LSSVM, GRU, EWT-GRU and EWT-LSSVM models, using Well No. 1 as an example.

| Model | NSE |
|---|---|
| LSSVM | 0.58 |
| GRU | 0.56 |
| EWT-GRU | 0.71 |
| EWT-LSSVM | 0.73 |
| EWT-S–G-GRU and LSSVM | 0.93 |


Figures 10 and 11 and Table 3 show that the single models, LSSVM and GRU, predicted the groundwater depth sequence unsatisfactorily. Their NSE values were 0.58 and 0.56, respectively, their mean relative errors were 12.88 and 11.90%, and their predictions mostly exceeded the actual values by 10–20%, particularly at peaks. The combined EWT-GRU and EWT-LSSVM models performed better, with NSE values of 0.71 and 0.73, respectively, and mean relative errors of 7.07 and 8.09%. However, because the high- and low-frequency components have distinct characteristics, a single network applied to all components could not predict them all accurately, so the overall error remained large and the predictions deviated somewhat from the actual values. The EWT-S–G-GRU and LSSVM model performed best: it decomposed the groundwater depth time series into multiple IMF components with EWT, smoothed the high-frequency component with S–G filtering and predicted the high- and low-frequency components with different models. This approach clearly improved prediction accuracy over the other models while remaining consistent with the trend of the original sequence.

## DISCUSSION

This study contributes to groundwater depth prediction both methodologically and empirically. The approach integrates EWT, S–G filtering, a GRU neural network and LSSVM, and it handles both the nonlinear and linear features of groundwater, which markedly improves prediction accuracy. In the comparison for Well No. 1, it outperforms the LSSVM, GRU, EWT-GRU and EWT-LSSVM models, with an ARE of only 2.14% and an NSE of 0.93. Its effectiveness stems from treating the high- and low-frequency components separately, which captures the dynamic changes in groundwater. EWT decomposition, S–G smoothing of the high-frequency component and separate models for the high- and low-frequency components together enhance predictive accuracy. This makes the model a reliable tool for revealing the spatial distribution and dynamic changes of future groundwater and a robust reference for the rational development and utilization of groundwater in urban areas, offering an effective approach to practical challenges in urban water resource management and ecological environmental protection.

In addressing the critical aspect of data quality, our methodology incorporated robust data cleaning techniques, including the removal of duplicate entries, handling missing data through imputation methods and implementing outlier detection algorithms. These steps were pivotal in ensuring the consistency and reliability of our dataset. The sample selection criteria were carefully defined to minimize bias, particularly in the context of a smaller dataset, and great attention was given to the integrity of data sources. Our approach to data quality control aimed not only to enhance the accuracy of our predictive models but also to mitigate the potential impact of data inconsistencies on the overall study outcomes.

Our validation relies on two metrics, ARE and NSE, chosen for their direct relevance to the research goals. ARE measures prediction accuracy as the average percentage difference between predicted and actual groundwater depth, while NSE measures how well the model reproduces the observed variations and trends, with higher values indicating better performance. Although this validation framework is streamlined, the two metrics give a clear, transparent and easily interpreted assessment of the model's accuracy and of its ability to reproduce observed patterns, and our discussion of the validation results focuses on what the ARE and NSE values reveal about the model's strengths and limitations in groundwater depth prediction.

Compared with traditional methods, our processing phase is more accurate and adaptable, because the machine learning techniques it incorporates capture complex patterns in the data better. However, it demands more computational resources and training data, which may lead to high computational costs for large-scale datasets, and for small datasets or specific data types it may not perform as well as purpose-built methods. In future work, we plan to refine the methodology to reduce computational costs and improve adaptability, so that it can be applied to a wider range of data types and research scenarios.

## CONCLUSION

- (1)
The prediction model based on EWT-S–G-GRU and LSSVM proposed in this paper performs well in predicting groundwater depth in the urban area of Xinxiang City. The average relative errors of the two observation wells are 2.14 and 2.63%, respectively, and the NSE of Well No. 1 reaches 0.93, indicating that the model is feasible for predicting groundwater depth in the study area. It reveals the spatial distribution of future groundwater and its dynamic changes over time, providing a reference for the rational development and utilization of groundwater in the urban area of Xinxiang City.

- (2)
For groundwater depth prediction in the urban area of Xinxiang City, single neural network models and EWT-decomposed models proved considerably less accurate. Because the time series fluctuates strongly, EWT decomposition is needed, followed by S–G filtering to reduce noise; the high-frequency and low-frequency components are then predicted separately by different models, which markedly improved prediction accuracy in this study.

- (3)
Although the prediction accuracy of the EWT-S–G-GRU and LSSVM models is relatively high, the use of signal decomposition on the original sequence involves information from the entire time period, which to some extent leaks information from the test set and needs to be improved. This article only considers the accuracy and stability of the model prediction, without taking into account the time difference in model prediction. In future research, it is necessary to consider the practical engineering application and incorporate a comparison of the time precision of model prediction. The model's performance may rely heavily on the quality and quantity of available data, potentially limiting its generalizability. Additionally, the integration of multiple techniques introduces computational challenges, and the model's effectiveness in diverse geographical regions remains uncertain, necessitating thorough validation and communication of limitations.

## AUTHOR CONTRIBUTION

All authors contributed to the study conception and design. Haiyang Chen contributed in writing and editing the original manuscript. Shuqi Luo contributed in chart editing and preliminary data collection. All authors read and approved the final manuscript.

## AVAILABILITY OF DATA AND MATERIALS

Data and materials are available from the corresponding author upon request.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.