ABSTRACT
The precise prediction of groundwater levels is challenging because of the complex relationships between hydrological parameters and the lack of in situ climate data. The present research proposes an integrated machine learning model for groundwater level prediction that combines long short-term memory (LSTM) with principal component analysis (PCA) and the discrete wavelet transform (DWT), i.e. the PCA–DWT–LSTM model. The model was developed using 23 years (2000–2022) of seasonal groundwater level data and climatic variables for nine wells in Kangra district, Himachal Pradesh, India. The proposed model attains higher R² (0.8253–0.8828) and lower root mean square error (RMSE) (0.1011–2.0025) than the alternative PCA–LSTM model, whose R² and RMSE values range from 0.7019 to 0.8005 and from 0.2662 to 2.9565, respectively. In other words, adding the DWT step to the PCA–LSTM hybrid substantially improves accuracy. The developed PCA–DWT–LSTM model improves the accuracy and interpretability of groundwater level prediction and has the potential to provide accurate groundwater level estimates, particularly in regions where hydrogeological data are difficult to obtain.
HIGHLIGHTS
A hybrid principal component analysis (PCA)–discrete wavelet transform (DWT)–long short-term memory (LSTM) model is proposed for enhanced prediction accuracy.
Integration of hydrological and remote sensing data with machine learning is adopted.
Comprehensive performance evaluation using R² and root mean square error (RMSE) is presented.
Benefits for groundwater management and policy-making are presented.
INTRODUCTION
The depletion of groundwater is a worldwide issue that needs immediate action by water management bodies, the general public, and other stakeholders to guarantee the sustainability of groundwater resources (Konikow & Kendy 2005). Sustainable development of groundwater resources requires efficient management techniques and a thorough understanding of the mechanisms underlying groundwater depletion. In India, groundwater loss can reduce agricultural intensity by 20–68% in areas where groundwater is limited (Bhattarai et al. 2021). This effect is mostly attributable to the long-term imbalance between groundwater abstraction and recharge (Wendt et al. 2020).
To anticipate possible groundwater conditions and understand the dynamics of groundwater systems, advanced modelling techniques can be used to monitor aquifer properties, groundwater levels, and water quality. Moreover, records of long-term groundwater level change help guide future research and decision-making (Hadidi et al. 2019).
For environmental scientists, water resource planners, and policymakers around the world, managing and predicting groundwater levels has become a critical undertaking in recent decades. Saqr et al. (2024) developed solar energy-based groundwater exploitation suitability maps that integrate region-specific characteristics and the three pillars of sustainability in assessing groundwater. Conventional hydrological models such as MODFLOW and HYDRUS have been employed to provide informative assessments of groundwater dynamics (Twarakavi et al. 2008; Zhu et al. 2012; Sundar et al. 2022), and a hybrid simulation–optimisation (S–O) framework based on MODFLOW-UnStructured Grids (USG) has been used effectively for groundwater prediction (Saqr et al. 2022). These are physically based models: when the groundwater problem is solved numerically, a set of physical aquifer characteristics is used implicitly. However, such models cannot readily simulate complex geological features (Kumar 2019), and numerical models are hindered by long runtimes, whereas machine learning models offer an efficient alternative. Machine learning approaches can overcome several constraints of numerical modelling and remote sensing approaches (Tao et al. 2022), maintaining a high level of precision in groundwater level prediction while reducing the time needed for model development and calibration (Di Salvo 2022).
In recent years, groundwater modelling and forecasting have changed significantly with the introduction of machine learning (ML). Machine learning techniques can identify even weak relationships between hydrological variables by learning them through a trained network (Banerjee et al. 2011), and artificial intelligence models are increasingly preferred over complicated numerical approaches for predicting groundwater levels (Derbela & Nouiri 2020). Several studies have applied artificial intelligence to groundwater resource simulation (Lallahem et al. 2005; Tsanis et al. 2008; Suryanarayana et al. 2014; Hussein et al. 2020; Khedri et al. 2020; Samani et al. 2023; Singh et al. 2024). Abd-Elmaboud et al. (2024) used an artificial neural network (ANN) to map groundwater potential zones. However, standard feedforward artificial intelligence models cannot retain past information, so they learn time-series data poorly and have limited predictive ability for long time series (Wiese & Omlin 2009). Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks can handle both long- and short-term dependencies (Wunsch et al. 2021), and LSTM is more efficient than regular neural networks for data with long-term dependencies (Lechner & Hasani 2020).
For noisy and sparse data, many standalone modelling techniques reported in the literature cannot predict groundwater level precisely (Nguyen et al. 2019). Therefore, when data are limited, hybrid models that combine several individual techniques can be used to improve the predictive efficacy of machine learning algorithms (Maxhuni et al. 2016). Principal component analysis (PCA) is a multivariate statistical method that reduces the dimensionality of data by transforming the variables into orthogonal components, which are linear combinations of the original variables; it is typically used to minimise a dataset's dimensionality while preserving as much of the original variance as possible (Singh et al. 2011). PCA is also widely used to analyse the parameters affecting groundwater quality (Usman et al. 2014). The discrete wavelet transform (DWT) is a data processing technique widely used to improve model predictions; it separates the dynamic and multi-scale features of the groundwater level series (Nourani et al. 2015), and combining the DWT with a machine learning model improves the model's prediction metrics (Wei et al. 2023).
Motivated by the many effective applications of hybrid machine learning models, the present study proposes a novel hybrid PCA–DWT–LSTM model for groundwater level prediction. PCA is used to identify the variables that most influence groundwater level and to generate new variables that capture the highest variance of the input dataset. The DWT is used to remove outliers from the groundwater time-series and to obtain the trend and residual series. The seasonal groundwater level is then predicted using LSTM. The study uses 23 years (2000–2022) of groundwater level data from nine wells in the study area; the first 19 years (2000–2018) were used for training and the remaining 4 years (2019–2022) for validation. The objectives of the study are: (1) to identify the dominant variables affecting groundwater level and reduce the dimensionality of the dataset using PCA; (2) to obtain the trend series, i.e. the approximation component, of the groundwater level series using the DWT; and (3) to develop an LSTM model for predicting groundwater level and evaluate its efficacy using statistical indicators.
STUDY AREA AND DATA DESCRIPTION
Study area
The Kangra district lies in the western part of Himachal Pradesh, India, between 75°47′55″ and 77°45′ E longitude and 31°21′ and 32°59′ N latitude. The study uses groundwater level data from nine wells located at Bandh, Bod, Panjpir, Paprola, Dehra Gopipur, Raja ka Talab, Bharoli, Hardogi, and Jawalaji in the Kangra district. The groundwater level data are recorded seasonally by the Central Ground Water Board (CGWB). Owing to its distinctive topography, the study area experiences significant climatic variation, with temperatures ranging from 0 to 40 °C over the year and annual precipitation between 1,200 and 3,000 mm. About 70% of the annual rainfall occurs during the monsoon season.
Data description
Observed groundwater level data
Satellite data
CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) was developed by the Climate Hazards Group at the University of California, Santa Barbara, and the U.S. Geological Survey, with support from U.S. agencies including NASA and NOAA. The dataset, with a spatial resolution of 0.05° × 0.05°, blends satellite observations with rain gauge measurements (Funk et al. 2015). CHIRPS data, which span 1981 to the present, were acquired from https://www.chc.ucsb.edu/data/chirps. Several studies using gauge-based observations have validated the reliability and accuracy of CHIRPS in capturing precipitation patterns across India (Prakash 2019), and Moudgil et al. (2024) successfully used CHIRPS rainfall data to analyse the spatiotemporal variation of terrestrial water storage over various Indian river basins.
Because the observed CGWB data are seasonal, CHIRPS monthly precipitation for January, May, August, and November was used in the study. The other hydrological variables considered, i.e. evapotranspiration, surface soil moisture, and root zone soil moisture, were taken from the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS). For each year of the study period, the data for January, May, August, and November were taken from the FLDAS Noah 3.6.1 land surface model simulations, which have a spatial resolution of 0.1° × 0.1°.
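To pair the gridded products with the seasonal well records, each well has to be matched to a grid cell and each monthly series reduced to the four CGWB months. The source does not describe how this matching was done, so the following Python sketch is only one possible approach: it assumes nearest-grid-cell extraction, and the helper names `nearest_cell` and `seasonal_subset` are hypothetical.

```python
# Hypothetical helpers: the source does not state how grid values were matched to wells.
CGWB_MONTHS = (1, 5, 8, 11)  # January, May, August, November


def nearest_cell(lat, lon, lat0, lon0, res):
    """Row/column index of the grid cell whose centre is nearest to (lat, lon).

    lat0/lon0 are the centre coordinates of the first grid cell and res is the
    grid spacing in degrees (0.05 for CHIRPS, 0.1 for FLDAS, 1/24 for TerraClimate).
    """
    return round((lat - lat0) / res), round((lon - lon0) / res)


def seasonal_subset(monthly):
    """Keep only the months that match the seasonal CGWB observations.

    `monthly` maps (year, month) -> value for a single grid cell.
    """
    return {key: value for key, value in sorted(monthly.items()) if key[1] in CGWB_MONTHS}
```

For example, `seasonal_subset` applied to twelve monthly values for 2000 keeps only the January, May, August, and November entries, so every climatic variable ends up on the same four-per-year time axis as the groundwater levels.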
TerraClimate is a high-resolution global gridded dataset of monthly climate and climatic water balance variables covering 1958 to the present. The monthly maximum and minimum temperatures and runoff considered in the present study were taken from TerraClimate at a spatial resolution of about 0.04° × 0.04° (1/24°). Table 1 summarises the datasets used in the present study to predict groundwater levels.
Sr. no. | Variable | Source | Resolution | Time span |
---|---|---|---|---|
1 | Precipitation | CHIRPS | 0.05° × 0.05° | 2000–2022 |
2 | Evapotranspiration | FLDAS | 0.1° × 0.1° | 2000–2022 |
3 | Surface soil moisture | FLDAS | 0.1° × 0.1° | 2000–2022 |
4 | Root zone soil moisture | FLDAS | 0.1° × 0.1° | 2000–2022 |
5 | Groundwater level | CGWB | In situ data | 2000–2022 |
6 | Minimum temperature | TerraClimate | 0.04° × 0.04° | 2000–2022 |
7 | Maximum temperature | TerraClimate | 0.04° × 0.04° | 2000–2022 |
8 | Runoff | TerraClimate | 0.04° × 0.04° | 2000–2022 |
METHODOLOGY
Identification and processing of outliers
Errors in data management (such as field recording and database entry) and failures of observation bores (such as collapsed or flooded bores) result in inaccurate groundwater level measurements. Such errors and anomalies should be identified and treated before the data are used in the analysis (Peterson et al. 2018). The ‘3σ’ criterion has been widely applied to detect outliers in groundwater level data (Tran et al. 2016; Azimi et al. 2018). The ‘3σ’ rule states that about 99.73% of the values of a normally distributed parameter lie within the range (μ − 3σ, μ + 3σ), where μ is the parameter's mean and σ is its standard deviation; values outside this range are treated as outliers.
Detected outliers are then smoothed using Equation (1), in which E_t denotes the smoothed value, the weights applied to the input values vary with time, the historical values used are those observed near the outlier at time t, and k is a positive integer.
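The screening-and-smoothing step can be sketched in Python as below. Equation (1) itself, including its weights, is not reproduced in this text, so the inverse-distance weighting over up to k non-outlier neighbours on each side of the outlier is a hypothetical stand-in, not the study's exact formula. The mean and standard deviation are computed over the whole series, which is also an assumption.

```python
from statistics import mean, pstdev


def detect_outliers_3sigma(series):
    """Indices of values that fall outside (mu - 3*sigma, mu + 3*sigma)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []  # constant series: nothing to flag
    low, high = mu - 3 * sigma, mu + 3 * sigma
    return [i for i, x in enumerate(series) if not low < x < high]


def smooth_outliers(series, k=2):
    """Replace each outlier with a weighted mean of up to k non-outlier
    neighbours on each side (inverse-distance weights: a hypothetical choice
    standing in for the weights of Equation (1))."""
    outliers = set(detect_outliers_3sigma(series))
    cleaned = list(series)
    for t in outliers:
        num = den = 0.0
        for offset in range(1, k + 1):
            for j in (t - offset, t + offset):
                if 0 <= j < len(series) and j not in outliers:
                    weight = 1.0 / offset  # nearer observations count more
                    num += weight * series[j]
                    den += weight
        if den > 0:
            cleaned[t] = num / den
    return cleaned
```

With a single spike in an otherwise flat 20-value record, the spike is the only index flagged and it is replaced by the value of its neighbours. Note that with very short records a single spike cannot exceed 3σ at all (its largest possible z-score is (n − 1)/√n), so the rule is only meaningful for reasonably long series.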
Principal component analysis
Wavelet transform
Reducing noise in the model inputs limits its influence on hydrologic simulations and can yield a more efficient and more precise model. The Fourier transform (FT) and the wavelet transform (WT) are two of the many transforms used to improve model efficiency (Jeihouni et al. 2019). Wavelet transforms are of two types: the discrete wavelet transform (DWT) and the continuous wavelet transform (CWT). The DWT is more practical than the CWT for analysing time-series data because the CWT requires much more processing time and produces a larger volume of data (Wu et al. 2021).
The DWT decomposes the groundwater time-series into approximation and detail components. The detail component represents the high-frequency, rapidly varying part of the signal, while the approximation component represents the low-frequency, slowly varying part. Because it captures the low-frequency changes that are usually of most interest in groundwater level analysis, the approximation component carries the key information about underlying trends and patterns. The detail component, by contrast, contains high-frequency noise and fluctuations that are less useful for predicting overall groundwater level trends. Retaining the approximation component as the denoised series and separating out the detail component therefore acts as a filter that keeps only the essential information in the groundwater level time-series (Deepmala & Piscoran 2016).
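The source does not state the mother wavelet or decomposition level used. The following dependency-free Python sketch assumes a single-level Haar wavelet purely for illustration: zeroing the detail coefficients and inverting the transform yields the approximation (trend) series described above. In practice a library such as PyWavelets (`pywt.wavedec` and `pywt.waverec`) would normally be used, with the wavelet and level chosen for the data.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last value
    approx = [SQRT_HALF * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [SQRT_HALF * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([SQRT_HALF * (a + d), SQRT_HALF * (a - d)])
    return out


def approximation_series(signal):
    """Low-frequency (trend) series: invert the DWT with the detail set to zero."""
    approx, detail = haar_dwt(signal)
    trend = haar_idwt(approx, [0.0] * len(detail))
    return trend[: len(signal)]
```

For the series 1, 3, 5, 7 the trend is 2, 2, 6, 6: each pair is replaced by its mean, and the removed pairwise differences form the high-frequency part.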
LSTM
Model evaluation
Proposed hybrid PCA–DWT–LSTM model
The first LSTM model predicts the main features of the series, i.e. the approximation component. Its inputs are therefore the approximation component (the denoised series) and the three PCs obtained from PCA. The second LSTM model was created to capture the peak variations in groundwater level and thereby improve the prediction; its inputs are the residual series and the same PCs (PC1, PC2, and PC3). The final groundwater level prediction is obtained by combining the outputs of both models.
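The data flow of the two-model design can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: it assumes that the residual is the observed series minus the approximation, that each LSTM is fed sliding windows of `time_steps` past rows of [component, PC1, PC2, PC3], and that "combining" the two outputs means summing them. The LSTMs themselves (which would need a deep learning library such as Keras) are not shown.

```python
def decompose(series, trend):
    """Split the observed series into the DWT trend and the residual
    (assumed: residual = observed - approximation)."""
    residual = [x - a for x, a in zip(series, trend)]
    return list(trend), residual


def make_windows(component, pcs, time_steps):
    """Supervised samples for one LSTM: each input holds `time_steps` past rows
    of [component, PC1, PC2, PC3]; the label is the next component value."""
    rows = [[c, *pc] for c, pc in zip(component, pcs)]
    inputs, labels = [], []
    for t in range(time_steps, len(component)):
        inputs.append(rows[t - time_steps:t])
        labels.append(component[t])
    return inputs, labels


def combine(trend_pred, residual_pred):
    """Final groundwater level = trend-model output + residual-model output
    (assumed additive recombination)."""
    return [a + r for a, r in zip(trend_pred, residual_pred)]
```

Because the decomposition is additive, perfect predictions of both components recombine exactly to the observed series; in practice the residual model mainly corrects the peaks that the smoother trend model misses.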
RESULTS AND DISCUSSION
DWT analysis
Determination of the PCs of variables using PCA
Table 2 shows the PCA results for all nine wells; for every well, the cumulative variance contribution rate (CVCR) of the first three PCs exceeds 95%. The high-dimensional dataset is thus reduced to three PCs, i.e. PC1, PC2, and PC3, each of which is a linear combination of the original variables weighted by its loadings (Table 2). The loadings are obtained by normalising the eigenvectors associated with the corresponding eigenvalues, and they indicate how much each original variable contributes to a PC: a high absolute loading means that the variable contributes significantly to that PC. In Table 2, runoff (RUNOFF), evapotranspiration (EVAPO), and minimum temperature (MINI_TEMP) have the highest loadings and contribute most to PC1, PC2, and PC3, respectively.
Well | CVCR of three PCs | PC1 loading (RUNOFF) | PC2 loading (EVAPO) | PC3 loading (MINI_TEMP) |
---|---|---|---|---|
X1 | 97.88% | 0.460041 | −0.687797 | 0.833887 |
X2 | 97.81% | 0.446283 | −0.698117 | 0.847849 |
X3 | 97.88% | 0.452446 | −0.700963 | 0.872350 |
X4 | 97.84% | 0.454899 | −0.699614 | 0.834079 |
X5 | 98.00% | 0.448317 | −0.701701 | 0.856566 |
X6 | 97.90% | 0.457426 | −0.695376 | 0.843056 |
X7 | 98.06% | 0.441410 | −0.661033 | 0.801298 |
X8 | 97.93% | 0.464323 | −0.682250 | 0.819262 |
X9 | 96.00% | 0.448317 | −0.701701 | 0.856566 |
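The PCA formulation itself does not survive in this text, so the following is only a dependency-free Python sketch of how loadings and the CVCR can be computed. It assumes PCA on standardised variables (i.e. on the correlation matrix), which the source does not state, and it uses power iteration with deflation instead of a library eigen-solver to stay self-contained; in practice `numpy.linalg.eigh` or scikit-learn's `PCA` (`components_`, `explained_variance_ratio_`) would normally be used. Eigenvector signs are arbitrary, so a loading such as the negative EVAPO value in Table 2 could be flipped together with its PC without changing the result.

```python
import math
from statistics import mean, pstdev


def standardise(columns):
    """Z-score each variable (assumed: PCA on standardised data)."""
    out = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        out.append([(x - mu) / sd for x in col])
    return out


def correlation_matrix(z):
    n, p = len(z[0]), len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)] for i in range(p)]


def leading_eigenpair(m, iters=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(m)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca(columns, n_components=3):
    """Return [(eigenvalue, unit-norm loading vector)] and the CVCR after each PC."""
    m = correlation_matrix(standardise(columns))
    total = sum(m[i][i] for i in range(len(m)))  # trace = total variance
    components, cvcr, running = [], [], 0.0
    for _ in range(n_components):
        lam, v = leading_eigenpair(m)
        components.append((lam, v))
        running += lam
        cvcr.append(running / total)
        # Deflation: remove the found component before searching for the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(m))] for i in range(len(m))]
    return components, cvcr
```

For two perfectly correlated variables plus one uncorrelated variable, the first PC carries two-thirds of the variance (CVCR 2/3) with equal loadings of about 0.707 on the correlated pair, and the first two PCs together reach a CVCR of 1.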
Results of the PCA–DWT–LSTM model
As mentioned above, two LSTM models were trained, one on the denoised (approximation) series and the other on the residual series, both using the PCA results as additional inputs. The models were trained separately for each well on the first 19 years of data (2000–2018) and then validated on the data for 2019–2022. A trial-and-error approach was used to tune the LSTM hyperparameters, such as the number of hidden layers, the number of neurons in the hidden layers, and the number of time steps of the input variables.
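Because the validation years (2019–2022) follow the training years, the split must be chronological rather than random, so that no future observations leak into training. A minimal Python sketch, assuming the seasonal records are held as hypothetical `(year, month, value)` tuples:

```python
def chronological_split(records, last_train_year=2018):
    """Split (year, month, value) records by year without shuffling.

    With data for 2000-2022 this gives 19 training years (2000-2018)
    and 4 validation years (2019-2022).
    """
    train = [r for r in records if r[0] <= last_train_year]
    valid = [r for r in records if r[0] > last_train_year]
    return train, valid
```

With four seasonal observations per year, this yields 76 training and 16 validation records per well.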
Early stopping is a form of regularisation applied during iterative model training (such as gradient descent): early stopping rules determine how many iterations can be run before the model starts to overfit. A patience parameter of 35 was used for early stopping to prevent overfitting. If the batch size is too small, each batch contains substantial variance and noise, because a small sample is unlikely to represent the entire dataset accurately. Conversely, a very large batch size increases the memory requirements of training and can lead to poorer generalisation. To find settings that work well for the complete set of monitoring wells, batch sizes of 1, 2, 5, and 10 were tested; a batch size of 10 gave the lowest mean and standard deviation of the RMSE. Dropout was also used as a regularisation strategy to prevent the model from overfitting.
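The paper does not name its deep learning framework. In Keras, for instance, the early stopping rule corresponds to `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=35, restore_best_weights=True)`. The framework-free Python sketch below only illustrates the patience logic and the batch-size selection criterion described above; the class and function names are hypothetical, and any RMSE values passed to `select_batch_size` in a usage example would be illustrative, not the study's.

```python
from statistics import mean, pstdev


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=35, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # new best: reset the patience counter
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def select_batch_size(rmse_by_batch):
    """Pick the batch size with the lowest mean RMSE across wells,
    breaking ties with the lowest standard deviation."""
    return min(rmse_by_batch,
               key=lambda b: (mean(rmse_by_batch[b]), pstdev(rmse_by_batch[b])))
```

If the validation loss last improves at epoch 9, training stops at epoch 44, i.e. after 35 epochs without improvement; restoring the weights from the best epoch is then the usual practice.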
Following Muhammad et al. (2021), who obtained the best results with a dropout probability of 0.2, a dropout rate of 0.2 was adopted together with the batch size of 10 to prevent overfitting. To establish the optimal number of hidden layers, five configurations with 1 to 5 hidden layers were evaluated. The one- and five-layer configurations produced a wide range of RMSE values, whereas the two-, three-, and four-layer configurations produced lower average RMSE values within a narrower range. Because the three-layer configuration gave the lowest average RMSE, it was chosen as the model's design. Too many neurons cause overfitting, whereas too few limit the network's ability to learn; consequently, 40 neurons were selected for each of the LSTM models, each with three hidden layers. To compare the developed hybrid model with the PCA–LSTM model, PCA–LSTM models were trained for each well using the same hyperparameters.
The wells with the highest prediction accuracy were X1, X2, and X9, which have more stable groundwater dynamics and a consistent recharge pattern, particularly during the monsoon season; their low RMSE values (0.1011–0.1447) indicate the best performance of the hybrid model. For wells X5, X7, and X8, the hybrid model shows moderate prediction performance, with RMSE values from 0.1935 to 0.2794. Wells X3, X4, and X6, situated in hydrogeological settings with frequent fluctuations in monsoon and recharge patterns, show the poorest performance, with RMSE values from 0.3315 to 2.0025. The wells in the present study are located in an urban ecosystem, where groundwater levels vary owing to unequal recharge and extraction rates. The results for all nine wells are presented in Table 3.
For the PCA–DWT–LSTM model, R² and RMSE ranged from 0.8253 to 0.8828 and from 0.1011 to 2.0025 m, respectively. The PCA–LSTM model's R² values varied between 0.7019 and 0.8005, so the PCA–DWT–LSTM model achieved a markedly higher R². Likewise, the RMSE of the PCA–DWT–LSTM model was substantially lower than that of the PCA–LSTM model, which ranged from 0.2662 to 2.9565 m. After evaluating several models for groundwater prediction, Sun et al. (2022) concluded that the LSTM model performed best, with RMSE values from 0.60 to 1.98 for wells in different zones. The PCA–DWT–LSTM model developed in the present study yielded RMSE values of 0.10–2.00, below 0.60 at eight of the nine wells, which suggests improved precision and reliability, although the two studies concern different aquifers. These findings suggest that the PCA–DWT–LSTM model is better suited to forecasting seasonal groundwater level depth. Its R² values indicate satisfactory prediction performance, with the model capturing a large share of the variance in groundwater levels, and its RMSE values indicate low to moderate prediction errors, with the most precise predictions in areas with stable groundwater patterns. These outcomes indicate that the model is dependable for practical groundwater management and can support the planning of water resources, especially in areas where groundwater conditions vary.
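The formulas of the model evaluation section do not survive in this text. Assuming the usual definitions, RMSE and R² (taken here as the coefficient of determination, 1 − SS_res/SS_tot) can be computed as in the Python sketch below. Some hydrological studies instead report R² as the squared Pearson correlation, which gives different values for biased predictions, so which definition the study used is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (m for groundwater level)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For observations 1, 2, 3, 4 and predictions 1, 2, 3, 5, a single 1 m error gives RMSE = √(1/4) = 0.5 m and R² = 1 − 1/5 = 0.8.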
Well | PCA–LSTM RMSE (m) | PCA–LSTM R² | PCA–DWT–LSTM RMSE (m) | PCA–DWT–LSTM R² |
---|---|---|---|---|
X1 | 0.3538 | 0.7074 | 0.1011 | 0.8253 |
X2 | 0.2956 | 0.7628 | 0.1447 | 0.8828 |
X3 | 0.5247 | 0.7444 | 0.3315 | 0.8643 |
X4 | 0.6124 | 0.7019 | 0.4246 | 0.8650 |
X5 | 0.2951 | 0.7303 | 0.1935 | 0.8459 |
X6 | 2.9565 | 0.8005 | 2.0025 | 0.8598 |
X7 | 0.3118 | 0.7163 | 0.2794 | 0.8635 |
X8 | 0.5992 | 0.7215 | 0.2073 | 0.8271 |
X9 | 0.2662 | 0.7358 | 0.1312 | 0.8733 |
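The ranges quoted in the text can be checked directly against Table 3. The short Python sketch below transcribes the table and also computes the relative RMSE reduction of the PCA–DWT–LSTM model over the PCA–LSTM model for each well; these percentages are simple arithmetic on the table values, not figures reported by the study.

```python
# Values transcribed from Table 3:
# (well, PCA-LSTM RMSE, PCA-LSTM R2, PCA-DWT-LSTM RMSE, PCA-DWT-LSTM R2)
TABLE3 = [
    ("X1", 0.3538, 0.7074, 0.1011, 0.8253),
    ("X2", 0.2956, 0.7628, 0.1447, 0.8828),
    ("X3", 0.5247, 0.7444, 0.3315, 0.8643),
    ("X4", 0.6124, 0.7019, 0.4246, 0.8650),
    ("X5", 0.2951, 0.7303, 0.1935, 0.8459),
    ("X6", 2.9565, 0.8005, 2.0025, 0.8598),
    ("X7", 0.3118, 0.7163, 0.2794, 0.8635),
    ("X8", 0.5992, 0.7215, 0.2073, 0.8271),
    ("X9", 0.2662, 0.7358, 0.1312, 0.8733),
]


def column_range(rows, idx):
    """(min, max) of one numeric column of the table."""
    values = [row[idx] for row in rows]
    return min(values), max(values)


def rmse_reduction_pct(rows):
    """Relative RMSE reduction of PCA-DWT-LSTM over PCA-LSTM for each well (%)."""
    return {row[0]: round(100 * (row[1] - row[3]) / row[1], 1) for row in rows}
```

`column_range(TABLE3, 4)` returns (0.8253, 0.8828) for the hybrid model's R², and the RMSE reduction ranges from about 10.4% at X7 to 71.4% at X1, with the hybrid model's R² higher than the PCA–LSTM R² at every well.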
To further visualise the agreement between observed and predicted values, Figure 7 shows the results of the models during the validation phase for the nine wells. Both the statistical measures and the visual agreement show that the PCA–DWT–LSTM model predicts more accurately than the PCA–LSTM model. The PCA–LSTM predictions lag the observed values, making them less reliable at notable peaks than those of the PCA–DWT–LSTM model. Because it filters noise using the DWT, the hybrid model handles complicated temporal patterns better and therefore predicts groundwater levels more accurately. The hybrid model is more time-intensive but performs well at eliminating noise and smoothing outliers, whereas the PCA–LSTM model is simpler and faster but more susceptible to noise and less accurate in capturing complex groundwater dynamics. Overall, the PCA–DWT–LSTM model predicts seasonal groundwater level with satisfactory accuracy.
The hybrid model could also be modified to account for projected changes in temperature, evapotranspiration, and precipitation by integrating general circulation model (GCM) data. Incorporating climate projections would allow the model to simulate future groundwater levels under various climate scenarios and so improve its ability to reflect long-term climatic shifts. The model could likewise be applied to regions with different hydrogeological characteristics by adapting the input variables to include relevant local factors. Adjusting the wavelet transform to local fluctuations and incorporating region-specific data could enhance its reliability in predicting groundwater level, and recalibrating the LSTM component to reflect local recharge and discharge cycles could help keep predictions precise, making the model adaptable to various environments.
Sensitivity analysis
The sensitivity analysis of the PCA–DWT–LSTM model was performed for well X2, which has the highest R². The hybrid model uses three inputs, PC1, PC2, and PC3, and its predictive accuracy differs for different combinations of these inputs. Table 4 shows that the model is most sensitive to PC3, in which minimum temperature has the dominant loading: when PC3 is excluded and only PC1 and PC2 are used, R² drops sharply to 0.3686 and RMSE rises to 1.6727. Excluding PC1 or PC2 individually also degrades performance (R² of 0.4354 and 0.6392, respectively), but less severely, indicating that the model is less sensitive to these variables, particularly PC2.
Input variables | R² | RMSE (m) |
---|---|---|
PC1, PC2 | 0.3686 | 1.6727 |
PC2, PC3 | 0.4354 | 0.7978 |
PC1, PC3 | 0.6392 | 0.5485 |
PC1, PC2, PC3 | 0.8828 | 0.1447 |
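Table 4 amounts to a leave-one-out ablation: each row drops one PC from the full input set. The Python sketch below ranks the PCs by the loss in R² when each one is removed, using the values transcribed from Table 4; the ranking is derived arithmetic, not an additional result of the study.

```python
FULL = ("PC1", "PC2", "PC3")

# Values transcribed from Table 4 (well X2): inputs -> (R2, RMSE)
TABLE4 = {
    ("PC1", "PC2"): (0.3686, 1.6727),
    ("PC2", "PC3"): (0.4354, 0.7978),
    ("PC1", "PC3"): (0.6392, 0.5485),
    FULL: (0.8828, 0.1447),
}


def rank_sensitivity(table, full=FULL):
    """Rank inputs by the drop in R2 when each one is left out (largest first)."""
    base_r2 = table[full][0]
    drops = {}
    for pc in full:
        subset = tuple(p for p in full if p != pc)  # inputs without this PC
        drops[pc] = round(base_r2 - table[subset][0], 4)
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```

The resulting order is PC3 (R² loss 0.5142), PC1 (0.4474), and PC2 (0.2436), consistent with PC3 being the most sensitive input.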
CONCLUSIONS
The study developed a hybrid PCA–DWT–LSTM model that combines the PCA and DWT techniques with LSTM for seasonal groundwater level prediction. The model was validated using the seasonal groundwater level dataset for nine wells in the Kangra district of Himachal Pradesh, India. PCA reduced the dimensionality of the inputs and provided high-quality input variables that capture most of the variance, while the DWT extracted the trend series of the groundwater level. Furthermore, combining two LSTM models improved the prediction accuracy at peak points while lowering the average error. The sensitivity analysis indicates that the model is most sensitive to PC3, in which minimum temperature is the main component. The proposed PCA–DWT–LSTM model outperformed the PCA–LSTM model in terms of R² and RMSE, achieving R² of 0.8253–0.8828 and RMSE of 0.1011–2.0025. These results establish the efficacy of the PCA–DWT–LSTM approach for predicting seasonal groundwater levels. The study recommends incorporating different climate projections and adapting the model for real-time monitoring to enhance its prediction accuracy and versatility.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.