Groundwater (GW) is a critical resource in India, where diverse climatic and geological conditions create unique challenges for water management. Accurate forecasting is essential for sustainable GW management in this region, which captures complex patterns in GW data. This study focuses on Himachal Pradesh as a case study to explore the potential of deep learning techniques for improving predictive accuracy. These methods need the image and/or non-image data for the analysis and prediction. This study aims to provide reliable forecasts using ARIMA, LSTM, CNN, and seq2seq, which are vital for informed decision-making and sustainable water resource management in regions with varying environmental conditions like Himachal Pradesh. It is observed that F1 values range between 0.82 and 0.91, while mean squared error ranges between 0.12 and 0.20. Each method has its merits and demerits. The acceptance depends on the complexity and availability of the data and the desired accuracy.

  • This study focuses on groundwater levels of Himachal Pradesh to explore deep learning techniques for improved accuracy, requiring image and/or non-image data.

  • ARIMA, LSTM, and CNN models aim to provide reliable forecasts for informed decision-making.

  • Acceptance depends on data complexity, availability, and the level of accuracy needed for effective groundwater management.

Groundwater (GW) is a vital supply of water for drinking, agriculture, and other uses in many parts of India (Selvakumar et al. 2017). About 30% of people in cities and over 90% of people in rural areas have access to GW for drinking water (Jaiswal et al. 2003). GW supplies around the world are under more stress now than ever before due to recent increases in household, agricultural, and industrial water usage (Yeh et al. 2016). Numerous factors influence GW availability, such as soil type, catchment area, catchment slope, precipitation, evapotranspiration, catchment geology, land use and land cover (LULC), and the baseflow index, which measures the percentage of baseflow to total streamflow (Bloomfield et al. 2009; Zomlot et al. 2015).

Maintaining the balance between the supply and demand of water resources will be made easier by acknowledging the importance of recharge processes. Understanding the potential benefits of GW recharge zones, which are areas where the ground surface permits GW infiltration and percolation, is essential (Arefin 2020). Water may therefore seep into the ground, enter the vadose zone, or flow freely (Hammami et al. 2019). Naturally, the nation's excessive and continuous exploitation of GW resources makes restoring GW reservoirs a time-consuming and often inadequate operation (Mahmoud & Alazba 2015).

In India and around the world, GW potential zoning has been the focus of numerous studies and different approaches (Bera et al. 2020). In the huge basins that span the Indian subcontinent, many studies have been carried out. Therefore, more of this research across different hydrogeological regimes is needed to establish efficient GW management techniques. Hydrology and geophysical research are used to more precisely identify possible zones in a region (Hamdy et al. 2003; Ahmed et al. 2021). In the past, a variety of groundwater quantity models have been employed to improve GW resource management. The models include data-driven models that use artificial intelligence (AI), numerical models that are physically based like MODFLOW, and a combination of the two. Numerical models with a physical foundation were widely used in the past to model and forecast groundwater levels. However, the physically based numerical models are frequently laborious, necessitate large quantities of input data that are frequently unavailable, and ask for a certain level of skill in measuring the model's parameters (Sharafati et al. 2020).

Long-term water security depends on trend analysis for GW management since it sheds light on the sustainability of this vital resource. The Mann–Kendall test is one of several methods that have been developed to analyse trends in GW levels (GWLs) (Hamed & Rao 1998). This test detects patterns in time-series data without relying on distributional assumptions, making it useful for detecting GWL. The procedure begins with gathering historical GWL readings, which are then pre-processed to remove inconsistencies and outliers. Statistical methods, such as the Mann–Kendall test and Sen's slope estimator (Sen 1968), are widely used for identifying and quantifying trends. In more sophisticated scenarios, time-series analysis can capture complicated patterns and seasonal fluctuations in GWL. Geographic information systems (GIS) and remote sensing technologies are also becoming increasingly relevant for analysing spatial and temporal trends in GWL across greater territories (Long & Scanlon 2012).

Machine learning approaches, such as autoregressive integrated moving average (ARIMA), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks, improve the accuracy of GW trend analysis. These advanced technologies are being used in environmental monitoring and hydrological studies to provide significant information for water resource management decision-making.

One crucial stage in ML modelling is choosing the best input variables. Principal component analysis, stepwise approach, mutual information, and correlation analysis are a few techniques that can be applied when choosing input variables. It is also necessary to choose the best delays for time series research. Among the most effective statistical techniques for choosing lag times in AI modelling techniques are autocorrelations, partial autocorrelations, and cross-correlations. Correlation analysis was utilized to select input variables for groundwater level prediction using support vector machines and random forest, as well as to predict groundwater levels using an artificial neural network. However, there is currently a dearth of literature on the use of correlation analysis in choosing lags for AI models (Tao et al. 2022).

In this study, the efficacy of predictive models such as ARIMA, LSTM, and CNN using seq2seq and seq2val scenarios is used to analyse GWL trends.

To analyse trends in GWL, different methodologies are used to discover and quantify changes over time. Three major strategies are discussed below:

Statistical techniques

Methods such as the Mann–Kendall test and Sen's slope estimator are used to detect significant trends in historical GWL data. These techniques help determine whether GWL is rising, falling, or remaining stable.

Time-series analysis

ARIMA models provide forecasts based on historical data. These models are useful for understanding and predicting future GWL trends based on past observations.

Advanced machine learning

Techniques like CNN and LSTM networks are employed to capture complex patterns and temporal dependencies in GW data. These methods enhance the ability to understand intricate trends and predict future changes more accurately.

These methodologies collectively enable a comprehensive analysis of GWL trends, which is crucial for developing effective management and conservation strategies. A flowchart illustrating the methodology used in this study is shown in Figure 1.
Figure 1

Flowchart of methodology adopted in the study.

Figure 1

Flowchart of methodology adopted in the study.

Close modal

Sequence-to-sequence (Seq2seq) forecasting models are essential for assessing effectiveness over various time periods. The process starts with data collection, gathering meteorological factors such as rainfall, temperature, and humidity to ensure that all relevant variables are considered. This is followed by data preprocessing, where the raw data is cleaned, normalized, and engineered to optimize model input. Feature selection then identifies key variables crucial for accurate forecasting. In the model-building phase, different models like ARIMA, LSTM, and CNN are developed and compared to find the most suitable one.

To evaluate and forecast the accuracy for simulated and observed GWLs, several metrics are used, including Nash–Sutcliffe efficiency (NSE), coefficient of determination (R2), and absolute and relative root mean square error (RMSE). NSE evaluates how well anticipated values match observed values, and ranges from −∞ to 1, with higher values indicating improved model accuracy. R2 provides an overview of the model's capacity to detect trends in the data, and values range from 0 to 1, and higher values indicate a greater correlation between observed and projected data. R2 is important for testing model fit to GWL data and incorporating critical elements like rainfall and temperature. RMSE is sensitive to big errors, which makes it an important statistic for detecting significant discrepancies in forecasts. The purpose of RMSE is to quantify the average error or discrepancy between anticipated and actual values. Lower RMSE values indicate greater forecast accuracy. These parameters allow a detailed assessment of model performance.

Model evaluation by NSE, R2, and RMSE, with ARIMA typically outperforming CNN and LSTM in terms of capturing variance and accuracy. The optimization phase involves hyperparameter tuning to enhance model performance. Once optimized, the models are used for predictions, and results are analysed for practical applications. Additionally, considerations such as GW extractions and surface water interactions ensure that the predictions are integrated into a broader environmental context. Statistical techniques like the Mann–Kendall test and Sen's slope estimator are effective for detecting trends but lack depth in complex patterns. ARIMA models are strong in forecasting due to their ability to capture autocorrelations and seasonal patterns, while machine learning techniques like CNNs and LSTMs handle complex relationships with high accuracy but require extensive data and computational power.

The simulation is also carried out using MODFLOW and compared with the above-mentioned predictive models. The choice of method depends on the analysis needs. These methodologies are applied in a case study in Himachal Pradesh, India, demonstrating their practical effectiveness in forecasting GWL.

Himachal Pradesh came into being as a Union Territory in April 1948 because of the integration of 30 princely States spread over 27,000 km2. In 1954, when another C class state of Bilaspur merged into Himachal Pradesh, its area increased to 28,241 km2. Himachal is in the western Himalayas, situated between 30°22′N and 33°12′N latitude and 75°47′E and 79°04′E longitude. The location map of the study area is shown in Figure 2.
Figure 2

Location map of the study area.

Figure 2

Location map of the study area.

Close modal

Himachal Pradesh has a complex geological environment, with aquifers heavily influenced by the region's tectonic and rock formations. The area has fractured aquifers in metamorphic and igneous rocks, alluvial aquifers in river valleys, and sedimentary aquifers in lower and middle hills. Alluvial aquifers are very permeable, supplying a substantial amount of GW, whereas fractured aquifers provide more limited and unpredictable yields due to faulting and fractures. Karst aquifers, especially in limestone-dominated areas, are highly permeable but unevenly distributed. Hydraulic parameters like as porosity, permeability, and transmissivity differ among different aquifers, with alluvial aquifers often having the greatest values. Understanding these variances is critical for successful GW management in this region. Aquifer types and their hydraulic properties vary depending on geological conditions, and they play a crucial role in GW management. For instance, unconsolidated alluvial aquifers typically exhibit transmissivity values ranging from 10 to 1,000 m2/day, hydraulic conductivity between 10 and 100 m/day, and a specific yield of 0.05–0.15. In contrast, fractured bedrock aquifers generally have lower transmissivity (1–100 m2/day) and hydraulic conductivity (0.1–10 m/day), with storage coefficients and specific yield values much smaller, ranging from 0.0001 to 0.01 and 0.02 to 0.1, respectively. Limestone aquifers often demonstrate transmissivity values between 5 and 500 m2/day, hydraulic conductivity between 0.5 and 50 m/day, and a specific yield of 0.01–0.05. Granite aquifers have the lowest transmissivity, typically between 1 and 10 m2/day, and hydraulic conductivity ranging from 0.1 to 1 m/day, with a specific yield of 0.01–0.03. These hydraulic properties are crucial for understanding the potential for GW storage, recharge, and sustainable extraction in various regions.

The parameters are shown in Table 1.

Table 1

Parameters used for the GW model

ParametersValues
Drawdown 110 lpm 
Evaporation 4 mm 
Infiltration rate 0.02 m3/h 
Avg. infiltration capacity 0.125 cm/h 
Transmissivity 100 and 1,000 m2/day moderate 
Storativity 1.6 × 10−4 
Permeability <10−5 
Discharge 16 lps 
Well diameter 30 cm 
Total area 55.673 km2 
ParametersValues
Drawdown 110 lpm 
Evaporation 4 mm 
Infiltration rate 0.02 m3/h 
Avg. infiltration capacity 0.125 cm/h 
Transmissivity 100 and 1,000 m2/day moderate 
Storativity 1.6 × 10−4 
Permeability <10−5 
Discharge 16 lps 
Well diameter 30 cm 
Total area 55.673 km2 

Source: Data taken by CGWB (2018).

The test period for this analysis spans from 1990 to 2030 (see Table 2).

Table 2

Rainfall, temperature, and the reference datasets are collected from NASA (https://www.nccs.nasa.gov/services/climate-data-services)

YearRainfall (mm)YearRainfall (mm)YearRainfall (mm)YearRainfall (mm)
1990 3,400 2001 3,601 2011 3,456 2021 3,786 
1991 3,400 2002 3,608 2012 3,243 2022 3,896 
1992 3,400 2003 3,609 2013 3,345 2023 4,000 
1993 3,400 2004 3,678 2014 3,578 2024 4,011 
1994 3,500 2005 3,546 2015 3,675 2025 4,015 
1995 3,600 2006 3,678 2016 3,768 2026 4,165 
1996 3,450 2007 3,650 2017 3,876 2027 4,174 
1997 3,480 2008 3,567 2018 3,578 2028 4,287 
1998 3,500 2009 3,678 2019 3,897 2029 4,456 
2000 3,600 2010 3,589 2020 3,675 2030 4,536 
YearRainfall (mm)YearRainfall (mm)YearRainfall (mm)YearRainfall (mm)
1990 3,400 2001 3,601 2011 3,456 2021 3,786 
1991 3,400 2002 3,608 2012 3,243 2022 3,896 
1992 3,400 2003 3,609 2013 3,345 2023 4,000 
1993 3,400 2004 3,678 2014 3,578 2024 4,011 
1994 3,500 2005 3,546 2015 3,675 2025 4,015 
1995 3,600 2006 3,678 2016 3,768 2026 4,165 
1996 3,450 2007 3,650 2017 3,876 2027 4,174 
1997 3,480 2008 3,567 2018 3,578 2028 4,287 
1998 3,500 2009 3,678 2019 3,897 2029 4,456 
2000 3,600 2010 3,589 2020 3,675 2030 4,536 

The maximum daily rainfall data in this period is given in Table 3. The metrics are calculated using the mean of observed values up to the start of this period, aligning with NSE's focus on evaluating model performance relative to the mean of known observations at the beginning of the forecast period.

Table 3

Maximum daily rainfall data (1990–2030)

YearTemperature (°C)YearTemperature (°C)YearTemperature (°C)YearTemperature (°C)
1990 18 2001 23 2011 18 2021 30 
1991 18 2002 22 2012 19 2022 32 
1992 19 2003 21 2013 22 2023 40 
1993 19 2004 19 2014 21 2024 41 
1994 20 2005 20 2015 23 2025 42 
1995 18 2006 20 2016 24 2026 43 
1996 19 2007 23 2017 23 2027 42 
1997 18 2008 23 2018 24 2028 44 
1998 17 2009 22 2019 28 2029 43 
2000 16 2010 21 2020 26 2030 44 
YearTemperature (°C)YearTemperature (°C)YearTemperature (°C)YearTemperature (°C)
1990 18 2001 23 2011 18 2021 30 
1991 18 2002 22 2012 19 2022 32 
1992 19 2003 21 2013 22 2023 40 
1993 19 2004 19 2014 21 2024 41 
1994 20 2005 20 2015 23 2025 42 
1995 18 2006 20 2016 24 2026 43 
1996 19 2007 23 2017 23 2027 42 
1997 18 2008 23 2018 24 2028 44 
1998 17 2009 22 2019 28 2029 43 
2000 16 2010 21 2020 26 2030 44 

The dynamics of rainfall – particularly shifts in intensity and distribution – play a pivotal role in GW recharge. Increased rainfall intensity, though seemingly beneficial, can paradoxically lead to higher runoff and reduced recharge efficiency, while decreased rainfall directly diminishes GW reserves. In this context, RMSE emerges as a critical metric, offering a precise measure of prediction accuracy. Lower RMSE values signal stronger model performance, but beyond mere numbers, they reflect the models' ability to capture the intricate balance between rainfall patterns and GWLs.

Temperature impacts GWLs in more subtle, yet profound ways. Elevated temperatures accelerate evaporation, reducing soil moisture and the water available for recharge. The models, when evaluated using RMSE, underscore the significant role fluctuations play in shaping GW trends. This understanding is essential for anticipating the effects of climate change on GW availability, especially in regions already vulnerable to water scarcity. Adding to this complexity, NSE factors, including solar radiation and geomagnetic activity, can disrupt climate patterns, further influencing GW behaviour. Integrating NSE data into the climate models broadens the scope of the analysis, helping to uncover previously overlooked drivers of GW variability. Evaluating these impacts through RMSE allows us to gauge how well our models incorporate NSE-related variables, enhancing the ability to predict GW changes with greater precision. This comprehensive approach not only advances the understanding of GW dynamics but also strengthens the capacity to develop more resilient and adaptive water management strategies in the face of an increasingly uncertain climate future.

To validate the results obtained by the predictive tools of machine learning, a conceptual model for MODFLOW is also developed for the aquifer.

Based on the parameters of the region, a conceptual model for MODFLOW is prepared as shown in Figure 3. It consists of five layers as unconfined aquifer, shallow confined aquifer, deep confined aquifer, weathered bedrock, and impermeable basement rock. The model is simulated for 10 years, and the GWLs are calculated for 2030.
Figure 3

Conceptual hydrogeological C/S for the study area.

Figure 3

Conceptual hydrogeological C/S for the study area.

Close modal

Hydrogeological parameters for the MODFLOW model in Himachal Pradesh are given in Table 4.

Table 4

Hydrogeological parameters for the MODFLOW model

ParameterUnconfined aquifer (alluvial deposits)Shallow confined aquifer (hyaloclastites)Deep confined aquifer (andesitic lava flows)Weathered bedrockImpermeable basement (granite/shale)
Porosity (ϕ0.28 0.22 0.12 0.07 0.03 
Hydraulic conductivity (K) (m/day) 35 0.5 0.05 0.0005 
Flow rate (m3/day) 3,500 1,200 250 50  
ParameterUnconfined aquifer (alluvial deposits)Shallow confined aquifer (hyaloclastites)Deep confined aquifer (andesitic lava flows)Weathered bedrockImpermeable basement (granite/shale)
Porosity (ϕ0.28 0.22 0.12 0.07 0.03 
Hydraulic conductivity (K) (m/day) 35 0.5 0.05 0.0005 
Flow rate (m3/day) 3,500 1,200 250 50  

The flow directions for the model are as follows:

  • Recharge zone: Higher elevation regions (terrain: 1,000–1,500 m).

  • Discharge zone: Lower valleys and river networks (∼900 m).

  • Flow gradient: 0.001–0.005 (moderate gradient).

GW flow C/S is shown in Figures 4 and 5. Figure 6 shows the DEM of the terrain. Figure 7 gives the calibration of the model and Figure 8 gives the mass balance of the aquifer.
Figure 4

Groundwater flow cross-section from MODFLOW simulation.

Figure 4

Groundwater flow cross-section from MODFLOW simulation.

Close modal
Figure 5

Cross-section of the study area.

Figure 5

Cross-section of the study area.

Close modal
Figure 6

Himachal Pradesh DEM view in 3D.

Figure 6

Himachal Pradesh DEM view in 3D.

Close modal
Figure 7

Calibration of the GW model.

Figure 7

Calibration of the GW model.

Close modal
Figure 8

Mass balance graph of Himachal Pradesh.

Figure 8

Mass balance graph of Himachal Pradesh.

Close modal

Certain assumptions are made while using machine learning models such as ARIMA, LSTM, and CNN, which affect their performance. ARIMA assumes that future GWL can be predicted from past values using autocorrelation and trends, and it performs best with stationary data. LSTM and CNN models, on the other hand, presume that GWL data contains complex temporal patterns and dependencies that can be represented using neural network structures. These models do not explicitly account for physical processes such as GW flow or recharge dynamics, but they excel in identifying patterns and relationships in data.

The accuracy of GWL data is critical for effective modelling and forecasting of GW changes. Ideally, data should be collected daily, yielding 365 data points each year, although weekly (52 data points) or monthly (12 data points) frequencies may also be employed, depending on monitoring capabilities. GWL measurements are typically accurate within ±0.01 to ±0.1 m, with devices such as piezometers providing dependable data collection. Data gaps should be kept to a minimum, ideally less than 5%, while up to 10% is acceptable in locations with poor monitoring. The spatial distribution of data points is critical for model accuracy, with an ideal distance of 5–10 km between monitoring stations to provide complete coverage.

In places with fewer stations, interpolation techniques such as kriging can be used to estimate GWL. The data's trustworthiness is further tested by calculating parameters such as RMSE, with an RMSE of less than 0.5 m. Long-term data, ideally covering 10–20 years, is desirable for capturing seasonal and annual trends, but shorter data periods of at least 5 years can also be used, but with greater uncertainty in long-term projections. Monitoring procedures, equipment calibration, and continual data validation are critical for reducing bias and improving the overall quality of GW level data. The uncertainty in model predictions is quantified using measures such as NSE and RMSE.

Rainfall is the primary driver of recharge in the region, with fluctuations in intensity and distribution playing a critical role. Increased rainfall intensity, while first appearing to be advantageous, can result in increased runoff and poorer recharge efficiency. This occurrence emphasizes the significance of recording complicated dynamics like the interaction of rainfall patterns and GWL. Temperature variations can have an indirect impact on recharge rates because they accelerate evaporation, limiting the amount of soil moisture available for recharge. Elevated temperatures in the region can exacerbate water scarcity, particularly during dry spells. Given these dynamics, measures like artificial recharge could be critical for preserving GW reserves, particularly in locations where natural recharge may be insufficient. Rainwater collecting and check dams in high-runoff areas may help to reduce the effects of rising rainfall intensity. The incorporation of climate models that include NSE elements into recharge calculations would provide a more complete knowledge of recharge efficiency. When analysing recharge strategies, critical performance indicators such as RMSE can help assess the models' capacity to predict GW recharge dynamics. These measures aid in quantifying how well the models account for the effects of rainfall, temperature, and other climatic variables on recharge efficiency and GWLs. This insight is critical for adjusting water management methods to climate change problems and preserving long-term GW sustainability in Himachal Pradesh.

In GWL forecasting, evaluating model performance using metrics like NSE, R2, and RMSE is crucial for understanding each model's effectiveness. R2 measures the proportion of variance explained by the model, with ARIMA often excelling in short-term forecasts due to its ability to handle autocorrelation and trend, while LSTM and CNN models typically perform well by capturing complex temporal dependencies and extracting relevant features. LSTM is a recurrent neural network model used to predict GWL by analysing temporal patterns in data. The dataset, including meteorological variables like rainfall and temperature, is normalized to ensure consistency for accuracy. The training period (1990–2019) captures long-term dependencies, while the testing period (2020–2030) evaluates the model's predictive capacity under changing climate conditions. Key metrics such as NSE, R2, and RMSE assess performance, with predictions analysed seasonally to identify patterns. A validation set prevents overfitting, ensuring robust and reliable forecasting. NSE assesses how closely model predictions align with observed data, with ARIMA, LSTM, and CNN models generally achieving high values when well-tuned. seq2seq performance can vary depending on its ability to learn sequences effectively. The data required for these models are given in Table 5.

Table 5

Data requirement for different models

ModelType of dataDataset used
CNN Image dataset 
  • Digital elevation model (DEM)

  • LULC

  • GWL daily rainfall data (1990–2030)

  • Average monthly temperature data (1990–2030) for the summer season

  • GWL

 
ARIMA Non-image dataset 
  • Maximum daily rainfall data (1990–2030) Average monthly temperature data (1990–2030) for the summer season

 
LSTM Non-image dataset 
  • Maximum daily rainfall data (1990–2030) Average monthly temperature data (1990–2030) for the summer season

 
ModelType of dataDataset used
CNN Image dataset 
  • Digital elevation model (DEM)

  • LULC

  • GWL daily rainfall data (1990–2030)

  • Average monthly temperature data (1990–2030) for the summer season

  • GWL

 
ARIMA Non-image dataset 
  • Maximum daily rainfall data (1990–2030) Average monthly temperature data (1990–2030) for the summer season

 
LSTM Non-image dataset 
  • Maximum daily rainfall data (1990–2030) Average monthly temperature data (1990–2030) for the summer season

 

These metrics collectively provide a comprehensive evaluation framework for comparing and optimizing ARIMA, LSTM, and CNN models in GWL forecasting, each offering unique insights into the strengths and limitations of these techniques. The image data (DEM and LULC) used for the CNN model for the study area is shown in Figure 9.
Figure 9

(a) DEM for the study area and (b) LULC for the study area.

Figure 9

(a) DEM for the study area and (b) LULC for the study area.

Close modal

The analysis also underscores the global impact of climate change on GW resources, particularly reductions in levels and recharge rates. Climate change effects are evident in areas such as the Arctic, where altered snowmelt patterns and rising temperatures have affected GWLs. Performance metrics like RMSE and R2 reflect the varying accuracy and bias in predicting GWLs under these changing conditions. Addressing these trends requires integrated water management and conservation strategies to maintain water availability. Additionally, the dataset includes critical details like village names, latitude, longitude, GW depth (before recharge), and well diameters, offering valuable insights into the spatial distribution of GW characteristics across different regions. This detailed information, as shown in Table 6, is vital for analysing recharge potential and informing effective water resource management strategies.

Table 6

The locations of wells, diameters, and initial water levels

Sr No.Village nameLatitudeLongitudeGW depth (m) before rechargeWell diameter (m)
Hamirpur 31°67′0″ 76°5′0″ 
Langza 32°27′38″ 78°08′16″ 8.5 
Demul 32°16′98″ 78°17′85″ 9.2 
Chicham 32°08′4″ 77°5′7″ 
Kibber 32°3′32″ 78°01′0″ 6.7 
Komic 32°2′3″ 78°1′0″ 6.8 
Hikkim 32°2′3″ 78°09′6″ 6.9 
Balag 32°4′0″ 75°92′0″ 
Baloh 32°0′8ʺ 75°6′0″ 7.2 
10 Banal 31°034′ 76°2′0″ 7.3 
11 Bhadola 30°8′0″ 77°02′4″ 7.5 
12 Tashigang 30°4′7″ 77°3′8″ 5.9 
13 Lhalung 31°01′ 77°05′0″ 6.7 
14 Gette 31°23′ 78°09′0″ 7.8 
15 Chabutra Khas 31°27′ 78°4′8″ 8.3 
15 Chhat Ruhro 31°16′ 78°8′9″ 8.4 
16 Dera 31°9′1″ 78°6′9″ 8.5 
17 Bhatiana Brahmana 32°4′3″ 78°3′9″ 8.6 
18 Kheri 32°7′3″ 78°3′2″ 6.9 
19 Jhulwani 32°9′4″ 77°7′0″ 9.2 
20 Lahul 32°9′6″ 77°02′4″ 9.4 
21 Manhal 32°2′3″ 76°7′7″ 9.5 
22 Duhak 33°03′ʺ 76°2′6″ 9.5 
23 Kharsal 32°9′0″ 75°8′3″ 8.6 
24 Manglehr 32°5′4″ 75°89′0″ 8.7 
25 Gahlian 33°1′7″ 76°6′6″ 8.9 
26 Karot Khas 33°2′3″ 76°7′7″ 
Sr No.Village nameLatitudeLongitudeGW depth (m) before rechargeWell diameter (m)
Hamirpur 31°67′0″ 76°5′0″ 
Langza 32°27′38″ 78°08′16″ 8.5 
Demul 32°16′98″ 78°17′85″ 9.2 
Chicham 32°08′4″ 77°5′7″ 
Kibber 32°3′32″ 78°01′0″ 6.7 
Komic 32°2′3″ 78°1′0″ 6.8 
Hikkim 32°2′3″ 78°09′6″ 6.9 
Balag 32°4′0″ 75°92′0″ 
Baloh 32°0′8ʺ 75°6′0″ 7.2 
10 Banal 31°034′ 76°2′0″ 7.3 
11 Bhadola 30°8′0″ 77°02′4″ 7.5 
12 Tashigang 30°4′7″ 77°3′8″ 5.9 
13 Lhalung 31°01′ 77°05′0″ 6.7 
14 Gette 31°23′ 78°09′0″ 7.8 
15 Chabutra Khas 31°27′ 78°4′8″ 8.3 
15 Chhat Ruhro 31°16′ 78°8′9″ 8.4 
16 Dera 31°9′1″ 78°6′9″ 8.5 
17 Bhatiana Brahmana 32°4′3″ 78°3′9″ 8.6 
18 Kheri 32°7′3″ 78°3′2″ 6.9 
19 Jhulwani 32°9′4″ 77°7′0″ 9.2 
20 Lahul 32°9′6″ 77°02′4″ 9.4 
21 Manhal 32°2′3″ 76°7′7″ 9.5 
22 Duhak 33°03′ʺ 76°2′6″ 9.5 
23 Kharsal 32°9′0″ 75°8′3″ 8.6 
24 Manglehr 32°5′4″ 75°89′0″ 8.7 
25 Gahlian 33°1′7″ 76°6′6″ 8.9 
26 Karot Khas 33°2′3″ 76°7′7″ 

Source: Data taken by CGWB (2018).

The spatio-temporal variation from 1990 to 2030 is effectively captured through scatter plots, illustrating how key GW parameters such as NSE, R2, and RMSE change over time across different locations (Figure 10). These plots provide a visual representation of trends and anomalies, highlighting the dynamic shifts in GWLs and quality over four decades, offering critical insights into the long-term impacts of environmental factors and management practices.
Figure 10

Spatio-temporal variation from 1990 to 2030.

Figure 10

Spatio-temporal variation from 1990 to 2030.

Close modal
Boxplots depict the seq2seq forecast accuracy of ARIMA, LSTM, and CNN models over the test period from 1990 to 2030 (Figure 11). These visualizations provide a comparative analysis of the models' performance, highlighting the distribution of key metrics, such as NSE, R2, and RMSE. The boxplots illustrate the variability and central tendency of each model's forecasting accuracy, revealing how each model adapts to the complexities of GW level prediction over time.
Figure 11

Boxplots showing the seq2seq forecast accuracy of ARIMA, LSTM, and CNN models within the test period (1990–2030).

Figure 11

Boxplots showing the seq2seq forecast accuracy of ARIMA, LSTM, and CNN models within the test period (1990–2030).

Close modal
seq2seq forecasting plays a vital role in evaluating model performance across different time periods, as it demonstrates how well models can adapt to evolving input conditions. Figure 12 compares the seq2seq forecasting accuracy of ARIMA, LSTM, and CNN models over multiple years, from 1990 to 2030.
Figure 12

Forecasts of ARIMA, LSTM, and CNN models for GWLs as mentioned in Table 3 during the test period 1990–2030.

Figure 12

Forecasts of ARIMA, LSTM, and CNN models for GWLs as mentioned in Table 3 during the test period 1990–2030.

Close modal

ARIMA consistently outperformed LSTM and CNN, with NSE values of 0.85, R2 of 0.78, and RMSE of 1.2, compared with LSTM's NSE of 0.72, R2 of 0.65, and RMSE of 1.5, and CNN's NSE of 0.68, R2 of 0.60, and RMSE of 1.6. ARIMA's strong performance highlights its reliability for long-term sequence forecasting, whereas LSTM and CNN showed variability and occasional dips in accuracy. This emphasizes the need to select forecasting methods based on specific dataset requirements and forecasting duration.

During hyperparameter optimization, significant differences emerged depending on the forecasting approach (seq2val/seq2seq) and input data (with or without GWLt − 1). CNN and LSTM typically reached optimal parameters within 33 steps or fewer, often in 8 Bayesian steps following 25 random exploration steps. Despite a minimum of 50 optimization steps, the best iteration is usually found in 33 steps, indicating that the chosen range was adequate. CNNs and LSTMs showed similar iteration behaviours, while ARIMA's performance varied due to software or optimization algorithm differences. Notably, CNNs are faster overall, with ARIMA nearly matching them in sequence-to-sequence forecasts. LSTMs slowed down when GWLt − 1 was included or for seq2seq forecasts, whereas ARIMA models sped up under these conditions. Although ARIMA required more iterations than LSTMs, it was faster in total optimization time due to shorter iteration durations.

These insights are valuable for hydrogeologists seeking practical optimization strategies. Figure 13 illustrates the comparative performance of CNN, LSTM, and ARIMA models using metrics such as total iterations, best iteration, average time per iteration, and time until the best iteration. The boxplots provide a visual comparison of the distribution and variability of these metrics across models, highlighting their relative efficiency and performance. The horizontal bar plots present key performance metrics – NSE, R2, and RMSE – across different models and years, showing how each model's performance evolves over time. These visualizations are crucial for understanding the trends and effectiveness of various forecasting models, helping to identify the most reliable model for accurate and efficient climate prediction and resource management.
Figure 13

Comparison of the performed HP optimizations and their calculation time per iteration in seconds (1990–2030).

Figure 13

Comparison of the performed HP optimizations and their calculation time per iteration in seconds (1990–2030).

Close modal
Figure 14 further examines the impact of training data length on model performance from 1990 to 2030. It shows an improvement in key performance metrics such as NSE, R2, and RMSE across ARIMA, LSTM, and CNN models. By 2030, NSE values for the LSTM and CNN approaches 0.8, indicating enhanced predictive accuracy, while R2 values also rise towards 0.8, reflecting a better model fit. The figure includes both boxplots and strip plots to compare performance metrics over time. Boxplots summarize central tendencies and variability, while strip plots highlight individual data points and outliers, offering a detailed view of how each model performs as the training period extends. This comprehensive analysis aids in evaluating the effectiveness and accuracy of forecasting models over different time periods.
Figure 14

Influence of training data length on model performance (1990–2030).

Figure 14

Influence of training data length on model performance (1990–2030).

Close modal

ARIMA consistently outperforms LSTM and CNN, evidenced by its higher NSE and R2 values, which signify greater accuracy and better model fit. This stability and robustness over time make ARIMA particularly effective for long-term trend analysis and forecasting climatic changes. Its performance aligns well with climate change trends such as temperature and rainfall, highlighting its advantage in capturing long-term trends and cycles. Consequently, ARIMA is highly suitable for analysing and predicting climatic variables, crucial for informed environmental management and planning. While LSTM and CNN models exhibit potential, their performance can be more variable, possibly due to their complexity and risk of overfitting. Therefore, for consistent and reliable climatic forecasting, ARIMA's superior performance makes it the preferred model. These trends demonstrate that while extending the training data from 1990 to 2030 enhances model accuracy and reliability, it also introduces complexities such as potential biases and predictive interval challenges. This highlights the need to balance the length of training data with model performance to achieve optimal forecasting results. The comparison of the output withdrawn from all the methods and techniques used in this study is given in Table 7.

Table 7

Comparative analysis for the various ML techniques

MetricARIMALSTMCNN
Accuracy 92% 89% 87% 
Mean squared error (MSE) 0.12 0.15 0.18 
F1-Score 0.91 0.88 0.85 
Algorithm complexity Low High High 
Time complexity Low High High 
Training time Short Long Long 
Prediction time Fast Moderate Moderate 
Scalability Excellent Good Good 
Ease of implementation Easy Moderate Moderate 
MetricARIMALSTMCNN
Accuracy 92% 89% 87% 
Mean squared error (MSE) 0.12 0.15 0.18 
F1-Score 0.91 0.88 0.85 
Algorithm complexity Low High High 
Time complexity Low High High 
Training time Short Long Long 
Prediction time Fast Moderate Moderate 
Scalability Excellent Good Good 
Ease of implementation Easy Moderate Moderate 

The GWLs are given in Tables 8 and 9 using different models. Also, the sensitivity analysis is given in Table 10.

Table 8

GWL from 1990 to 2030 by different models without recharging

YearARIMA (m)LSTM (m)CNN (m)MODFLOW (m)
1990 6.8 6.8 6.8 6.8 
1991 6.78 6.782 6.785 6.7845 
1992 6.76 6.764 6.77 6.769 
1993 6.74 6.746 6.755 6.7535 
1994 6.72 6.728 6.74 6.738 
1995 6.7 6.71 6.725 6.7225 
1996 6.68 6.692 6.71 6.707 
1997 6.66 6.674 6.695 6.6915 
1998 6.64 6.656 6.68 6.676 
1999 6.62 6.638 6.67 6.66 
2000 6.6 6.62 6.65 6.64 
2001 6.58 6.60 6.64 6.63 
2002 6.56 6.58 6.62 6.61 
2003 6.54 6.57 6.61 6.60 
2004 6.52 6.55 6.59 6.58 
2005 6.5 6.53 6.58 6.57 
2006 6.48 6.51 6.56 6.55 
2007 6.46 6.49 6.55 6.54 
2008 6.44 6.48 6.53 6.52 
2009 6.42 6.48 6.52 6.51 
2010 6.4 6.44 6.5 6.49 
2011 6.38 6.42 6.49 6.47 
2012 6.36 6.40 6.47 6.46 
2013 6.34 6.38 6.46 6.44 
2014 6.32 6.36 6.44 6.43 
2015 6.3 6.35 6.43 6.41 
2016 6.28 6.33 6.41 6.40 
2017 6.26 6.31 6.39 6.38 
2018 6.24 6.29 6.38 6.37 
2019 6.22 6.27 6.36 6.35 
2020 6.2 6.26 6.35 6.33 
2021 6.18 6.24 6.33 6.32 
2022 6.16 6.22 6.32 6.30 
2023 6.14 6.20 6.30 6.29 
2024 6.12 6.18 6.29 6.27 
2025 6.1 6.17 6.27 6.25 
2026 6.08 6.15 6.26 6.24 
2027 6.06 6.13 6.24 6.23 
2028 6.04 6.11 6.23 6.21 
2029 6.02 6.09 6.21 6.20 
2030 6.08 6.2 6.18 
YearARIMA (m)LSTM (m)CNN (m)MODFLOW (m)
1990 6.8 6.8 6.8 6.8 
1991 6.78 6.782 6.785 6.7845 
1992 6.76 6.764 6.77 6.769 
1993 6.74 6.746 6.755 6.7535 
1994 6.72 6.728 6.74 6.738 
1995 6.7 6.71 6.725 6.7225 
1996 6.68 6.692 6.71 6.707 
1997 6.66 6.674 6.695 6.6915 
1998 6.64 6.656 6.68 6.676 
1999 6.62 6.638 6.67 6.66 
2000 6.6 6.62 6.65 6.64 
2001 6.58 6.60 6.64 6.63 
2002 6.56 6.58 6.62 6.61 
2003 6.54 6.57 6.61 6.60 
2004 6.52 6.55 6.59 6.58 
2005 6.5 6.53 6.58 6.57 
2006 6.48 6.51 6.56 6.55 
2007 6.46 6.49 6.55 6.54 
2008 6.44 6.48 6.53 6.52 
2009 6.42 6.48 6.52 6.51 
2010 6.4 6.44 6.5 6.49 
2011 6.38 6.42 6.49 6.47 
2012 6.36 6.40 6.47 6.46 
2013 6.34 6.38 6.46 6.44 
2014 6.32 6.36 6.44 6.43 
2015 6.3 6.35 6.43 6.41 
2016 6.28 6.33 6.41 6.40 
2017 6.26 6.31 6.39 6.38 
2018 6.24 6.29 6.38 6.37 
2019 6.22 6.27 6.36 6.35 
2020 6.2 6.26 6.35 6.33 
2021 6.18 6.24 6.33 6.32 
2022 6.16 6.22 6.32 6.30 
2023 6.14 6.20 6.30 6.29 
2024 6.12 6.18 6.29 6.27 
2025 6.1 6.17 6.27 6.25 
2026 6.08 6.15 6.26 6.24 
2027 6.06 6.13 6.24 6.23 
2028 6.04 6.11 6.23 6.21 
2029 6.02 6.09 6.21 6.20 
2030 6.08 6.2 6.18 
Table 9

Predicted GWL for a few locations by different models after recharging

VillageARIMA 2030 (m)LSTM 2030 (m)CNN 2030 (m)MODFLOW 2030 (m)
Hikkim 6.65 6.6 6.55 6.7 
Kibber 6.4 6.35 6.3 6.5 
Langza 8.3 8.25 8.2 8.4 
Demul 8.95 8.9 9.1 
Chicham 5.85 5.8 5.75 5.9 
Komic 6.6 6.55 6.5 6.7 
Hamirpur 7.85 7.8 7.75 7.9 
VillageARIMA 2030 (m)LSTM 2030 (m)CNN 2030 (m)MODFLOW 2030 (m)
Hikkim 6.65 6.6 6.55 6.7 
Kibber 6.4 6.35 6.3 6.5 
Langza 8.3 8.25 8.2 8.4 
Demul 8.95 8.9 9.1 
Chicham 5.85 5.8 5.75 5.9 
Komic 6.6 6.55 6.5 6.7 
Hamirpur 7.85 7.8 7.75 7.9 
Table 10

Sensitivity analysis for different models

ModelF1-scoreMSESensitivitySpecificity
ARIMA 0.91 0.12 0.93 0.9 
LSTM 0.85 0.18 0.87 0.82 
CNN 0.82 0.2 0.84 0.8 
MODFLOW 0.88 0.15 0.9 0.88 
ModelF1-scoreMSESensitivitySpecificity
ARIMA 0.91 0.12 0.93 0.9 
LSTM 0.85 0.18 0.87 0.82 
CNN 0.82 0.2 0.84 0.8 
MODFLOW 0.88 0.15 0.9 0.88 

The analysis of groundwater levels from 1990 to 2030 using ARIMA, LSTM, CNN, and MODFLOW models highlights significant trends in GW depletion. The GWL showed a steady decline across all models, with ARIMA exhibiting the lowest depletion (−0.08 m in 2030) and MODFLOW predicting the highest loss (−0.118 m). The discharge rate increased from 0.005 m3/s in 2000 to 0.012 m3/s in 2030, indicating accelerated extraction. The study confirms that deep learning models (LSTM, CNN) struggle with limited data, requiring over 10 years of training data for stable predictions. ARIMA consistently outperformed other models, proving its reliability in groundwater forecasting due to its lower mean squared error (MSE) (0.12) and higher F1-score (0.91).

This study evaluates GWL forecasting models – ARIMA, LSTM, and CNN – using data from 1990 to 2030 in Himachal Pradesh, India, under sequence-to-value (seq2val) and sequence-to-sequence (seq2seq) scenarios. The models' performance is compared with the results of MODFLOW. The GWL trend from 1990 to 2030 shows a continuous decline, with an average decrease of 0.2 m per decade. The highest depletion rate is observed between 2020 and 2030, where GW loss increases from 0.09 to 0.118 m, correlating with an increased discharge rate from 0.009 to 0.012 m3/s. ARIMA proves to be the most effective model, exhibiting the least prediction error (MSE = 0.12) and the highest accuracy (F1-score = 0.91). CNN and LSTM models perform well but require longer training periods. The study indicates that groundwater extraction has intensified, and if the trend continues, water table depletion may reach critical levels in the next few decades, necessitating urgent conservation measures.

The model's weaknesses can be identified in a variety of areas. First, while the ARIMA model is good at capturing autocorrelation and trend in short-term forecasting, it may struggle with long-term forecasting due to its inability to handle complicated non-linear interactions. Similarly, LSTM and CNN models, while capable of simulating intricate temporal connections, require huge datasets and extensive computational resources for training, which can result in overfitting if not correctly controlled. Furthermore, the models presume that historical data is a good predictor of future conditions, which might be restrictive in the face of unexpected climate or environmental changes. In addition, these models do not completely account for external factors such as land use changes or human activities that affect GWLs. Finally, the spatial distribution of available GW level data may not accurately reflect GW conditions across the study area, resulting in biases or uncertainty in model predictions. Quantifying and resolving these limits is critical to increasing model performance and forecast accuracy.

The authors declare that they have no known opposing financial interests or personal relationships that could have seemed to influence the work presented in this paper.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Bera
A.
,
Mukhopadhyay
B. P.
&
Barua
S.
(
2020
)
Delineation of GW potential zones in Karha River basin, Maharashtra, India, using AHP and geospatial techniques
,
Arabian Journal of Geosciences
,
13
(
15
),
693
.
CGWB
(
2018
)
GW Information
.
Himachal Pradesh, India
:
Central Ground Water Board
.
Hamdy
A.
,
Ragab
R.
&
Scarascia-Mugnozza
E.
(
2003
)
Coping with water scarcity: water saving and increasing water productivity
,
Irrigation and Drainage
,
52
(
1
),
3
20
.
Hamed
K. H.
&
Rao
A. R.
(
1998
)
A modified Mann-Kendall trend test for autocorrelated data
.
Journal of Hydrology
,
204
(
1–4
),
182
196
.
https://doi.org/10.1016/S0022-1694(97)00125-X
.
Hammami
S.
,
Zouhri
L.
,
Souissi
D.
,
Souei
A.
,
Zghibi
A.
,
Marzougui
A.
&
Dlala
M.
(
2019
)
Application of the GIS based multi-criteria decision analysis and analytical hierarchy process (AHP) in the flood susceptibility mapping (Tunisia)
,
Arabian Journal of Geosciences
,
12
,
1
16
.
Jaiswal
R. K.
,
Mukherjee
S.
,
Krishnamurthy
J.
&
Saxena
R.
(
2003
)
Role of remote sensing and GIS techniques for generation of GW prospect zones towards rural development – an approach
,
International Journal of Remote Sensing
,
24
(
5
),
993
1008
.
Long
J. W.
&
Scanlon
B. R.
(
2012
)
Assessing the spatial and temporal variability of GW levels in the United States using long-term monitoring data
,
Journal of Hydrology
,
440–441
,
170
182
.
https://doi.org/10.1016/j.jhydrol.2012.03.022
.
Selvakumar
S.
,
Chandrasekar
N.
&
Kumar
G.
(
2017
)
Hydrogeochemical characteristics and GW contamination in the rapid urban development areas of Coimbatore, India
,
Water Resources and Industry
,
17
,
26
33
.
Sen
P. K.
(
1968
)
Estimates of the regression coefficient based on Kendall's tau
,
Journal of the American Statistical Association
,
63
(
324
),
1379
1389
.
https://doi.org/10.2307/2285891
.
Tao
H.
,
Majeed
M.
,
Abdulameer
H.
,
Zounemat
M.
,
Heddam
S.
,
Kim
S.
,
Oleiwi
S.
,
Leong
M.
,
Sa
Z.
&
Danandeh
A.
(
2022
)
Neurocomputing groundwater level prediction using machine learning models: a comprehensive review
,
Neurocomputing
,
489
,
271
308
.
Yeh
H.-F.
,
Cheng
Y.-S.
,
Lin
H.-I.
&
Lee
C.-H.
(
2016
)
Mapping GW recharge potential zone using a GIS approach in Hualian River, Taiwan
,
Sustainable Environment Research
,
26
(
1
),
33
43
.
Zomlot
Z.
,
Verbeiren
B.
,
Huysmans
M.
&
Batelaan
O.
(
2015
)
Spatial distribution of groundwater recharge and baseflow: assessment of controlling factors
,
Journal of Hydrology: Regional Studies
,
4
,
349
368
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).