Evaluation of local groundwater levels (GWL) is crucial for sustainable water resource management. This study introduces an approach that uses field gravity data to optimize machine learning (ML) models for local GWL evaluation, an approach with the potential to advance groundwater management. The performance of two ML-based models, random forest (RF) and random tree (RT), is evaluated using four trials with distinct training and testing datasets. Model efficiency was assessed using the coefficient of determination (R2), root mean squared error, mean absolute error, and Nash–Sutcliffe efficiency (NSE). Results indicated high model performance during training across all trials, with notable variability during testing. Trial 4 was the most successful, with RF achieving R2 and NSE of 0.998 during training and 0.965 during testing. Similarly, RT maintained high performance, with R2 and NSE of 0.994 during training and an R2 of 0.951 and an NSE of 0.917 during testing. The primary aim of the study was to assess the efficiency of decision tree-based ML techniques in capturing the characteristics of local GWL fluctuations. Addressing the limitations identified here, notably overfitting and sensitivity to dataset selection, will enhance the predictive accuracy and reliability of ML-based models for local groundwater storage evaluation, contributing to better water resource management and informed decision-making.

  • New direction of local groundwater level evaluation through field gravimetry and machine learning (ML).

  • Gravity and Global Positioning System (GPS) time data have been used as input parameters.

  • The role of data selection in ML techniques has been tested.

  • Brings significant breakthroughs in the field of groundwater resource management.

Groundwater is the fundamental source of fresh drinking water, yet groundwater storage is declining steadily because extraction is not balanced by recharge. India faces a significant water crisis that affects economic growth, livelihoods, and human well-being. According to the NITI Aayog report (Bhattacharya 2019), Delhi, Haryana, Rajasthan, Gujarat, Himachal Pradesh, Jammu & Kashmir, the northern part of Ladakh, and the western part of Uttar Pradesh lie in extremely high water-stress areas. The report also states that approximately 600 million people experience significant to extreme water stress, that 75% of households lack drinking water on their premises, and that 84% of rural households lack piped water. Moreover, about 70% of the country's water is contaminated, and inadequate freshwater availability causes approximately 200,000 (two lakh) deaths annually. India currently ranks 120th out of 122 countries on the water quality index.

There is an increasing necessity to assess changes in groundwater levels (GWLs) to facilitate the effective and sustainable management of groundwater resources. However, as Beven (2005) states, ‘We do not have the investigative measurement techniques necessary to be secure about what form these (storage-output; note from the author) relationships should take except by seeing which functions might be appropriate in reproducing the discharges at the catchment outlet (where we can take a measurement).’

In hydrology, quantifying the storage terms of the water balance equation (Sivapalan et al. 2005) remains a challenge across scales. Conventional techniques for evaluating GWL are often time-consuming and of limited accuracy. Groundwater storage can be evaluated through in situ well monitoring (Eltahir & Yeh 1999; Rodell & Famiglietti 2001). At the field level, determining water storage and its fluctuations typically relies on point measurements, and the substantial spatial and temporal variability makes accurate estimation of storage difficult. Various methods have been devised to address these challenges, such as collecting numerous soil moisture measurements and interpolating or extrapolating them with geostatistics, or using ground-penetrating radar (Western et al. 2002; Huisman et al. 2003). Other common methods include spatial time domain reflectometry for soil moisture measurement (Zehe et al. 2010), high-precision lysimeters (Von & Fank 2008), and cosmic ray neutron probes (Zreda et al. 2008). These techniques are generally limited to estimating water storage near the surface. Neutron probes, electromagnetic sensors in access tubes, electrical resistivity tomography (ERT), and (cross-)borehole geophysics allow water storage to be assessed in deeper zones, but their temporal and spatial resolution (in both depth and area) is limited. Additional constraints, such as the notable inaccuracies of electromagnetic sensors in access tubes (Evett et al. 2009), make it difficult to estimate subsurface water storage at the field scale, particularly in deeper zones. Gyeltshen et al. (2020) combined geospatial, geophysical, and statistical models with satellite data to identify areas with high groundwater potential; they primarily applied the weighted index overlay method and two-dimensional electrical resistivity tomography (2D-ERT) to map these potential zones.

Several remote sensing datasets, describing factors such as hydrology, forest type, soil composition, and land use, also support the development of GWL prediction models. The methods above are the customary approaches for evaluating groundwater storage changes at the point scale. However, in situ well monitoring must be carried out continuously, is expensive, and is limited in spatial and temporal data quality (Rodell et al. 2007). At the global scale, GRACE (Tapley et al. 2004) offers a unique opportunity to estimate water storage changes (Ramillien et al. 2008) and to improve macro-scale hydrological models (Güntner 2008; Zaitchik et al. 2008; Werth et al. 2009), and regional groundwater storage changes can now be derived through satellite gravimetry. The GRACE mission continuously observes temporal changes in Earth's gravity field, which reflect changes in mass, including subsurface water. However, GRACE has not yielded meaningful results in local-level studies because of its coarse spatial resolution of about 400 km. Groundwater storage changes therefore need to be evaluated at the local level to identify areas where groundwater extraction must be controlled.

The application of machine learning (ML) in hydrology has shown significant promise for complex, nonlinear problems such as GWL prediction and water resource management. Recent studies have demonstrated the robustness of ML techniques, including random forest (RF), support vector machine (SVM), and ensemble methods, in modeling hydrological systems. The scalability and adaptability of ML approaches are also discussed in various studies (Meddage et al. 2022; Tao et al. 2022; Fuladipanah et al. 2024; Madhushani et al. 2024; Mishra et al. 2024; Perera et al. 2024), which outline their potential in hydrology, particularly for improving the performance of data-driven models. For instance, Puri et al. (2024) evaluated data-splitting strategies in streamflow prediction using RF, emphasizing that the handling of training and testing data affects model accuracy and reliability in hydrological applications. Related techniques have also been applied to groundwater potential zone identification, as in the integrated remote sensing, geographic information system (GIS), and analytic hierarchy process approach of Sathiyamoorthy et al. (2023), which proved effective for sustainable groundwater resource assessment in coastal regions. Furthermore, Vijayakumar et al. (2022) highlighted the role of ML in assessing water quality and associated human health risks, particularly for analyzing heavy metal contamination in groundwater. RF, random tree (RT), and SVM are among the most popular and widely applied ML techniques in water resource engineering (Majumdar et al. 2022); in that study, slope, elevation, a water balance product, and other hydrometric parameters were used to predict groundwater. Hybrid or ensemble ML algorithms are now increasingly used to improve model accuracy.

Given the growing importance of integrating ML tools in groundwater resource engineering, this study aims to assess the effectiveness of ML-based techniques in predicting local GWLs using temporal field gravity data. Recent advancements underscore the relevance of employing ML-based models for GWL prediction, especially in conjunction with high-precision temporal gravity data. While it is well established that field gravity varies with time, temporal gravity data offers a unique opportunity for evaluating local GWLs. This study focuses on advancing the application of ML by leveraging high-precision gravity measurements, novel feature engineering techniques, and a hybrid methodology to improve prediction accuracy and scalability. By integrating these innovative approaches, the study aims to contribute to the growing field of hydrological ML applications while addressing the critical challenges of sustainable groundwater management.

Study area

This study was conducted at an installed observation well at the Department of Hydrology, Indian Institute of Technology (IIT) Roorkee campus (Figure 1). IIT Roorkee is in the Roorkee block of Haridwar district in Uttarakhand, India. The geographical coordinates of the observation well are 29° 52′ 6.384″ N and 77° 53′ 42.5976″ E, at an elevation of 259.3 m above mean sea level (MSL). The IIT Roorkee campus covers a geographical area of 145.08 hectares, with a total built-up area of 50.93 hectares. The observation well is 10 cm in radius and 205 m deep. The depth to the water table at the observation well ranges from 5.30 to 9.00 m below ground level (BGL), and the seasonal fluctuation of the water table ranges from 0.47 to 3.65 m. The subsurface formations consist of repeating layers of gray micaceous sand, silt, clay, brownish-gray clay, sand, gravel, and occasional pebbles and boulders. These formations are typical of the Quaternary terrace and channel alluvium (Sarkar et al. 2024).
Figure 1

Study area located in IIT Roorkee campus, Uttarakhand, India.


In the Roorkee block, a highly unpredictable continental climate prevails, primarily because of the block's proximity to the Himalayan range. The city experiences four distinct seasons. Summer, from March to July, has temperatures averaging around 28 °C. After July, the monsoon sets in with heavy rainfall; the Himalayas obstruct the monsoon clouds, prolonging the season until October. After the monsoon, temperatures range between 15 and 21 °C. The city receives an average annual rainfall of approximately 2,600 mm. The primary water source for Roorkee is the River Ganga, complemented by a canal system that serves the city's residents and supports efficient irrigation in nearby villages. This irrigation system contributes substantially to agriculture and thus plays a pivotal role in the economic development of Roorkee and the surrounding districts and states (Pathak et al. 2021).

Data collection and reduction

A relative gravimeter with a precision of 0.001 mGal was used to take gravity observations near the observation well. Because a relative gravimeter measures gravity only relative to another point, the readings were converted to absolute gravity values by tying them to the reference absolute gravity station in the Earth Science department on campus. The depth to the water table was measured simultaneously using a water level indicator with a precision of 5 mm. A dual-frequency GPS receiver was used to determine the geographical location (latitude, longitude, and orthometric height) of the gravity observation point, which is required for gravity observations.
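The tie of relative readings to the reference absolute station can be illustrated with a minimal sketch. It assumes (none of this is stated in the study) that readings are already scaled to mGal, that the survey loop starts and ends at the reference station, and that instrument drift is linear in time between those two occupations. Earth-tide and other corrections are omitted for brevity, and the function name and record layout are hypothetical.

```python
def absolute_gravity(readings, g_ref_abs):
    """Reduce relative gravimeter readings (mGal) to absolute gravity (mGal).

    readings: list of (time_hours, reading_mGal, is_reference) ordered in time;
              the first and last entries must be occupations of the reference station.
    g_ref_abs: known absolute gravity at the reference station (mGal).
    """
    (t0, r0, ref_start), (t1, r1, ref_end) = readings[0], readings[-1]
    if not (ref_start and ref_end):
        raise ValueError("the loop must start and end at the reference station")
    # Assumed linear drift rate (mGal/h) from the repeated reference readings.
    drift_rate = (r1 - r0) / (t1 - t0) if t1 != t0 else 0.0
    result = []
    for t, r, _ in readings:
        corrected = r - drift_rate * (t - t0)      # remove drift accumulated since t0
        result.append(g_ref_abs + (corrected - r0))  # difference to reference + its absolute value
    return result


# Hypothetical loop: reference, well site, reference again.
res = absolute_gravity([(0.0, 3000.000, True), (1.0, 3000.250, False), (2.0, 3000.020, True)], 979000.0)
print([round(v, 3) for v in res])  # → [979000.0, 979000.24, 979000.0]
```

The second reference occupation absorbs the 0.02 mGal drift, so the reference station returns to its known value and the well site receives a drift-corrected offset.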

To implement the models, gravity, GPS time, and groundwater depth data were collected at the observation well in the Hydrology Department of IIT Roorkee. The observed gravity readings were then reduced to absolute gravity using the reference absolute gravity station, and the groundwater depths were converted to GWL with respect to MSL. Gravity and GPS time were used as input parameters, and the groundwater depth data, expressed as GWL above MSL, were used as the output parameter. A total of 250 samples were collected and used for developing, training, and testing the models. The three datasets (gravity, GPS time, and groundwater depth) were recorded independently of each other.
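Assembling the two inputs and the output can be sketched as follows. The study does not state how GPS time was encoded or which reference point the depths were measured from, so this sketch assumes GPS time in seconds since the GPS epoch (1980-01-06 00:00:00 UTC) with an 18 s leap-second offset (the GPS–UTC offset since 1 January 2017), and takes the reference elevation as a caller-supplied, hypothetical parameter.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_seconds(utc_time, leap_seconds=18):
    """Seconds since the GPS epoch; GPS time runs ahead of UTC by the leap seconds."""
    return (utc_time - GPS_EPOCH).total_seconds() + leap_seconds


def build_dataset(records, reference_elevation_m):
    """Build inputs X = [GPS time, absolute gravity] and output y = GWL above MSL.

    records: iterable of (utc_time, absolute_gravity_mGal, depth_to_water_m)
    reference_elevation_m: hypothetical elevation (m above MSL) of the depth reference point
    """
    X, y = [], []
    for utc_time, g_abs, depth in records:
        X.append([gps_seconds(utc_time), g_abs])
        y.append(reference_elevation_m - depth)  # depth below reference -> level above MSL
    return X, y


# Hypothetical single record.
X, y = build_dataset([(datetime(2020, 1, 1, tzinfo=timezone.utc), 979000.5, 10.0)], 258.0)
print(X, y)
```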

ML models

This section describes the two ML techniques chosen for GWL prediction. Both models use two input parameters: GPS time and absolute field gravity.

Random forest

The RF algorithm, which builds on Breiman's bagging approach (Breiman 1996), has gained widespread popularity. It is an ensemble of tree-based models, each generated from a random sample of the data. At its core, RF grows multiple decision trees from the input dataset and combines them, by majority vote for classification or by averaging the tree outputs for regression. Each tree is trained on a bootstrap sample of the training data (Breiman 1996), and only a random subset of the input parameters is considered at each split, so the trees differ from one another. Node impurity is measured with the Gini index in classification, whereas regression trees typically choose splits that minimize the variance (mean squared error) of the output within each node. In the RF model, decision trees act as the base learners. RF regression requires two predetermined operational variables: the number of input parameters (m) randomly considered at each node when growing a tree and the number of trees generated (k) (Breiman 1999). RF offers numerous benefits, such as high predictive accuracy, simplicity, and a non-parametric formulation applicable to diverse types of datasets.
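The two RF operational variables map directly onto scikit-learn's `RandomForestRegressor`: k is `n_estimators` and m is `max_features`. The sketch below is an illustration on synthetic two-input data, not the authors' implementation or settings; k = 300 and m = 1 are placeholder values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_rf(X_train, y_train, k=300, m=1, seed=0):
    """Fit an RF regressor with k trees, each split considering m randomly chosen inputs."""
    model = RandomForestRegressor(
        n_estimators=k,   # k: number of trees in the forest
        max_features=m,   # m: inputs sampled as split candidates at each node
        bootstrap=True,   # each tree is grown on a bootstrap sample of the training data
        random_state=seed,
    )
    return model.fit(X_train, y_train)  # fit() returns the fitted estimator


# Synthetic two-input data (illustration only, not the study's measurements).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 250)                                         # scaled GPS time
gwl = 246.5 + 1.5 * np.sin(2 * np.pi * t)                              # GWL in m above MSL
g = 979100.0 + 0.04 * (gwl - 246.5) + rng.normal(0.0, 0.002, t.size)   # gravity in mGal
X = np.column_stack([t, g])

rf = fit_rf(X, gwl)
# predict() averages the trees' outputs; score() returns 1 - SS_res/SS_tot,
# the same expression as NSE in Eq. (4).
print(round(rf.score(X, gwl), 3))
```

With only two inputs, m can be 1 or 2; m = 2 removes the per-split feature sampling and leaves bagging as the only source of randomness.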

Random tree (RT)

RT is a regression model (Breiman et al. 1984) that combines a random process with the decision tree method: at each node, it examines only a randomly chosen subset of k attributes when selecting the split. The random tree algorithm thus follows the same approach as a decision tree, the key distinction being that a random assortment of attributes is provided for each split. In an ensemble, 'random' means that each tree has an equal chance of being selected for the sample. A precise model can be constructed by combining multiple RTs. Over the past five years, this model has been used in diverse engineering applications for accurate prediction.
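The per-split attribute sampling that distinguishes RT from an ordinary decision tree can be sketched in pure Python. This is a minimal, unpruned regression tree under stated assumptions: splits minimize the summed squared error of the two children, k features are drawn at random at each node, nodes with fewer than 2·min_leaf samples become leaves, and a node also becomes a leaf when none of the sampled features admits a valid split. It is similar in spirit to Weka's RandomTree, which considers K randomly chosen attributes at each node, but the authors' software and settings are not stated, so it is not a reproduction of their model.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean: node impurity for regression."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_random_tree(X, y, k=1, min_leaf=2, rng=None):
    """Grow an unpruned regression tree that searches only k random features per node.

    Returns a leaf value (mean target) or a tuple (feature, threshold, left, right).
    """
    rng = rng or random.Random(0)
    if len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: too few samples or a pure node
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(k, n_features))  # random attribute subset
    best = None  # (impurity, feature, threshold)
    for f in candidates:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)  # no valid split among the sampled features
    _, f, t = best
    left_ids = [i for i, row in enumerate(X) if row[f] <= t]
    right_ids = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_random_tree([X[i] for i in left_ids], [y[i] for i in left_ids], k, min_leaf, rng),
        build_random_tree([X[i] for i in right_ids], [y[i] for i in right_ids], k, min_leaf, rng),
    )


def predict(tree, row):
    """Follow the splits from the root to a leaf and return the leaf's mean value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical toy data: [scaled time, gravity residual] -> GWL (m above MSL).
X_demo = [[i / 11, 0.01 * (i % 4)] for i in range(12)]
y_demo = [245.0 + 0.25 * i for i in range(12)]
tree = build_random_tree(X_demo, y_demo, k=1, rng=random.Random(42))
print(predict(tree, [0.5, 0.02]))
```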

Performance evaluation

The two ML models above were evaluated for their effectiveness in predicting GWL using four statistical parameters on both the training and testing datasets: the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), and Nash–Sutcliffe efficiency (NSE), as recommended by McCuen et al. (2006) and Mishra & Ojha (2023). These parameters estimate the correctness and reliability of the RF and RT predictions and are defined as follows:
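$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(y_{obs,i}-\bar{y}_{obs}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_{obs,i}-\bar{y}_{obs}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}}\right]^2 \tag{1}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{obs,i}-y_i\right)^2} \tag{2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{obs,i}-y_i\right| \tag{3}$$

$$NSE = 1-\frac{\sum_{i=1}^{n}\left(y_{obs,i}-y_i\right)^2}{\sum_{i=1}^{n}\left(y_{obs,i}-\bar{y}_{obs}\right)^2} \tag{4}$$

where n is the number of samples, $y_{obs,i}$ and $y_i$ are the observed and simulated GWL for sample i, and $\bar{y}_{obs}$ and $\bar{y}$ are their respective means.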
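Equations (1)–(4) can be checked with a minimal pure-Python sketch. It assumes that R2 is the squared Pearson correlation between observed and simulated values; the source does not state the exact form, but this reading would explain why NSE can be negative while R2 stays positive in Table 2. The function names are illustrative.

```python
import math


def r2(obs, sim):
    """Eq. (1): squared Pearson correlation between observed and simulated GWL."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def rmse(obs, sim):
    """Eq. (2): root mean squared error (same units as GWL, m)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Eq. (3): mean absolute error (m)."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Eq. (4): Nash-Sutcliffe efficiency; 1 is perfect, below 0 is worse than the mean."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


obs, sim = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(round(r2(obs, sim), 4), rmse(obs, sim), mae(obs, sim), round(nse(obs, sim), 4))
# → 0.9657 0.5 0.25 0.8
```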

Improving the accuracy of the ML models for GWL prediction involves several key steps. Data collection and reduction are the fundamental first step. The data are then split into training and testing datasets, the two models are applied, and the optimal model parameters are determined by trial and error. Once the models are tuned, the observed GWL (yobs) is compared with the simulated GWL (y), and predictive performance is assessed with R2, NSE, MAE, and RMSE. Values of R2 and NSE closer to 1, and values of MAE and RMSE closer to zero, indicate better accuracy.
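The study determined the model parameters by trial and error without listing the candidates tried. One systematic alternative, sketched here with scikit-learn's `GridSearchCV`, scores each (k, m) pair by cross-validated RMSE within the training split only, so the test split is never used for tuning. The grid values and synthetic data are hypothetical, and this is not the authors' procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold


def tune_rf(X_train, y_train, seed=0):
    """Pick k (n_estimators) and m (max_features) by 5-fold cross-validated RMSE."""
    grid = {"n_estimators": [50, 150], "max_features": [1, 2]}  # hypothetical candidates
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid=grid,
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)
    return search.best_params_, -search.best_score_


# Synthetic two-input training data (illustration only, not the study's data).
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = 246.0 + 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.05, 200)
best_params, cv_rmse = tune_rf(X, y)
print(best_params, round(cv_rmse, 3))
```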

Figure 2 presents a flow diagram of the entire process, detailing the steps of data organization, model selection, and selection of the best-fitting model based on statistical criteria, together with the four data-splitting trials adopted to improve GWL prediction accuracy. The aim is to train the ML models to learn the relationship between the input and output datasets and to optimize the predicted results. The output GWL values range from 245 to 248.5 m above MSL, and four trials were defined within these limits to split the datasets. In the first trial, GWL values from 245.5 to 246 m and from 246.4 to 247 m above MSL were used for testing, and the rest of the data were used for training. In the second trial, GWL values from 247.5 to 248.5 m above MSL were used for testing, and the remaining data were used for training. In the third trial, GWL values from 246 to 248.5 m above MSL were used for training, and the remaining data were used for testing. In the fourth trial, training and testing data were selected at a regular interval: after every three samples, the fourth sample was assigned to testing and the remaining samples were used for training, as shown in Figure 3. Overall, the datasets in all trials were divided into approximately 70% for training and 30% for testing.
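The four splitting rules can be written as index selections. This sketch assumes that the samples are in collection order and that the stated range limits are inclusive; neither detail is given in the text.

```python
# Range-based rules: return True if a GWL value (m above MSL) goes to testing.
RANGE_RULES = {
    1: lambda v: 245.5 <= v <= 246.0 or 246.4 <= v <= 247.0,
    2: lambda v: 247.5 <= v <= 248.5,
    3: lambda v: not (246.0 <= v <= 248.5),  # 246-248.5 m is the training range
}


def split_trial(gwl, trial):
    """Return (train_idx, test_idx) for trials 1-4 described above."""
    if trial == 4:
        # Regular interval: after every three samples, the fourth goes to testing.
        test = [i for i in range(len(gwl)) if i % 4 == 3]
        train = [i for i in range(len(gwl)) if i % 4 != 3]
        return train, test
    if trial not in RANGE_RULES:
        raise ValueError("trial must be 1, 2, 3 or 4")
    is_test = RANGE_RULES[trial]
    test = [i for i, v in enumerate(gwl) if is_test(v)]
    train = [i for i, v in enumerate(gwl) if not is_test(v)]
    return train, test


# Hypothetical GWL samples.
gwl = [245.2, 245.7, 246.2, 246.5, 247.2, 247.8, 248.0, 248.4]
for trial in (1, 2, 3, 4):
    print(trial, split_trial(gwl, trial))
```

Note that a strict every-fourth rule yields a 75/25 split, whereas the text reports approximately 70/30 overall, so the authors' exact selection may differ slightly.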
Figure 2

Flow chart diagram of GWL prediction process.

Figure 3

(a) Plot of total datasets, (b) trial-1 training datasets, (c) trial-1 testing datasets, (d) trial-2 training datasets, (e) trial-2 testing datasets, (f) trial-3 training datasets, (g) trial-3 testing datasets, (h) trial-4 training datasets, (i) trial-4 testing datasets.

To predict local GWLs, two ML-based models, RF and RT, were analyzed. Four statistical parameters (R2, RMSE, MAE, and NSE) were chosen to gauge the efficiency of these models, and each training and testing dataset was evaluated separately. The model performances across the four trials are detailed in Tables 1–4. Notably, in trial 4, both the RF and RT models achieved high accuracy, with elevated R2 and NSE values, as shown in Table 4. This implies that, in this trial, both models provided precise and reliable predictions of local GWLs; the high R2 and NSE values highlight their effectiveness in capturing the patterns governing groundwater behavior. The observed and predicted datasets for all trials are compared in the agreement diagrams in Figure 4(a)–4(h).
Table 1

Performance of models in trial-1

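| Statistical metric | RF training | RF testing | RT training | RT testing |
| --- | --- | --- | --- | --- |
| R2 | 0.994 | 0.581 | 0.979 | 0.786 |
| MAE (m) | 0.045 | 0.296 | 0.007 | 0.285 |
| RMSE (m) | 0.058 | 0.345 | 0.011 | 0.336 |
| NSE | 0.994 | 0.416 | 0.991 | 0.445 |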
Table 2

Performance of models in trial-2

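| Statistical metric | RF training | RF testing | RT training | RT testing |
| --- | --- | --- | --- | --- |
| R2 | 0.991 | 0.767 | 0.981 | 0.743 |
| MAE (m) | 0.051 | 0.187 | 0.007 | 0.217 |
| RMSE (m) | 0.067 | 0.254 | 0.011 | 0.309 |
| NSE | 0.989 | −0.397 | 0.979 | −1.069 |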
Table 3

Performance of models in trial-3

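| Statistical metric | RF training | RF testing | RT training | RT testing |
| --- | --- | --- | --- | --- |
| R2 | 0.993 | 0.578 | 0.999 | 0.786 |
| MAE (m) | 0.046 | 0.302 | 0.007 | 0.285 |
| RMSE (m) | 0.061 | 0.349 | 0.011 | 0.336 |
| NSE | 0.993 | 0.397 | 0.999 | 0.445 |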
Table 4

Performance of models in trial-4

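| Statistical metric | RF training | RF testing | RT training | RT testing |
| --- | --- | --- | --- | --- |
| R2 | 0.994 | 0.965 | 0.998 | 0.965 |
| MAE (m) | 0.048 | 0.126 | 0.009 | 0.126 |
| RMSE (m) | 0.064 | 0.155 | 0.013 | 0.155 |
| NSE | 0.994 | 0.965 | 0.998 | 0.947 |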
Figure 4

Agreement diagrams of RF and RT models. (a) Trial-1 training datasets, (b) trial-1 testing datasets, (c) trial-2 training datasets, (d) trial-2 testing datasets, (e) trial-3 training datasets, (f) trial-3 testing datasets, (g) trial-4 training datasets, (h) trial-4 testing datasets.


For all four trials, the training datasets yielded strong statistical performance, with high R2 and NSE values and low RMSE and MAE values, as shown in Tables 1–4. During testing in trials 1–3, however, the models could not reproduce their training accuracy. In trial 4, both models performed well in predicting GWL in both the training and testing phases. Overall, the RF model gave good results in terms of the statistical parameters.

The training and testing performance of the models can also be represented with Taylor diagrams. A Taylor diagram summarizes, in a single plot, the correlation coefficient (CC) between predicted and observed values, the standard deviations of both, and the centred root-mean-square difference between them. The Taylor diagrams in Figure 5 compare both models on the training and testing datasets of all four trials. They reveal the same patterns and outcomes as the statistical metrics, reinforcing the consistency of the findings across visualization methods, and they help assess how well the models capture the variability of the observed data.
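The quantities placed on a Taylor diagram can be computed directly. This minimal sketch uses population standard deviations (an assumption; plotting tools differ) and checks the law-of-cosines relation that lets a single point encode all three statistics: the centred RMS difference E′ satisfies E′² = σo² + σs² − 2σoσs·CC.

```python
import math


def taylor_statistics(obs, sim):
    """Return the statistics shown on a Taylor diagram for one model/dataset pair."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cc = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * sd_o * sd_s)
    # Centred RMS difference: RMSE after removing each series' own mean.
    crmsd = math.sqrt(sum(((s - ms) - (o - mo)) ** 2 for o, s in zip(obs, sim)) / n)
    return {"sd_obs": sd_o, "sd_sim": sd_s, "cc": cc, "crmsd": crmsd}


# Hypothetical observed and predicted values.
stats = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.4, 4.1])
print({key: round(value, 3) for key, value in stats.items()})
```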
Figure 5

Taylor diagrams of RF and RT models. (a) Trial-1 training datasets, (b) trial-1 testing datasets, (c) trial-2 training datasets, (d) trial-2 testing datasets, (e) trial-3 training datasets, (f) trial-3 testing datasets, (g) trial-4 training datasets, (h) trial-4 testing datasets.

The absolute error between observed and predicted values was also assessed. Figure 6 presents violin plots of the absolute errors of all predictions from both models, with training and testing datasets treated separately so that their respective absolute errors can be evaluated.
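A violin plot of absolute errors, split by model and dataset, can be produced with matplotlib's `Axes.violinplot`. The predictions below are hypothetical, and a single observed series is shared across groups only for brevity; in practice the training and testing groups have their own observations.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def abs_errors(obs, sim):
    """Absolute error |observed - predicted| for each sample (m)."""
    return [abs(o - s) for o, s in zip(obs, sim)]


def plot_error_violins(groups, path="abs_error_violins.png"):
    """groups: dict mapping a label (e.g. 'RF test') to a list of absolute errors."""
    fig, ax = plt.subplots(figsize=(6, 4))
    parts = ax.violinplot(list(groups.values()), showmedians=True)
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(list(groups.keys()))
    ax.set_ylabel("Absolute error (m)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return parts


obs = [246.1, 246.8, 247.3, 247.9, 248.2]
hypothetical = {
    "RF train": [246.12, 246.78, 247.33, 247.86, 248.21],
    "RF test": [246.30, 246.55, 247.60, 247.70, 248.00],
    "RT train": [246.10, 246.81, 247.29, 247.91, 248.19],
    "RT test": [246.35, 246.50, 247.65, 247.60, 247.95],
}
groups = {label: abs_errors(obs, pred) for label, pred in hypothetical.items()}
parts = plot_error_violins(groups)
```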
Figure 6

Violin plots of absolute error: (a) training and testing errors of both models for trial 1; (b) training and testing errors of both models for trial 2; (c) training and testing errors of both models for trial 3; (d) training and testing errors of both models for trial 4.


Across the four trials used to select training and testing data, trial 4 consistently produced superior results. Taken together, the four approaches showed that the ML tools encountered challenges in achieving precise predictions, and the large errors in trials 1–3 highlight the pivotal role of dataset selection in accurately predicting GWL and its trends. The trial 4 results nevertheless indicate that the ML tools can capture the variation in the input parameters and the GWL fluctuation during both training and testing when the training data are representative. This finding highlights the need for a comprehensive research effort to refine and enhance the prediction capabilities of these ML-based models.

Model performance and dataset influence

The performance of the ML models, RF and RT, varied notably across the four trials, emphasizing the significant impact of dataset selection on predictive accuracy. The models achieved high accuracy during the training phase of all trials, with elevated R2 and NSE values and low MAE and RMSE values. During testing, however, performance declined drastically, particularly in trials 1–3. This discrepancy indicates that the models may have overfitted the training data, capturing noise along with the underlying patterns, which hindered their ability to generalize to unseen data. A further likely contributor is that trials 1–3 held out contiguous GWL ranges: tree-based models predict averages of training targets and therefore cannot produce values outside the range seen during training, so a test range at or beyond the edge of the training range (as in trials 2 and 3) requires extrapolation that these models cannot perform.
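The extrapolation limit of tree ensembles can be demonstrated on synthetic data with scikit-learn's `RandomForestRegressor`. The monotonic input–GWL relation and the held-out upper band are hypothetical, chosen only to mimic a range-based split like trial 2; the sketch illustrates a general property of tree ensembles and does not reproduce the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, monotonic relation between one input and GWL (illustration only).
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
gwl = 245.0 + 3.5 * x.ravel()  # spans 245-248.5 m above MSL

# Hold out the upper band, loosely mimicking a range-based test split.
test = gwl >= 247.5
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(x[~test], gwl[~test])
pred = rf.predict(x[test])

# Every prediction is an average of training targets, so it cannot exceed
# the largest GWL seen in training, however far the test values extend.
print(round(gwl[~test].max(), 3), round(pred.max(), 3), round(gwl[test].max(), 3))
```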

Statistical metrics: understanding the nuances

The chosen statistical metrics (R2, RMSE, MAE, and NSE) provided a comprehensive evaluation of the models' performance. R2 values close to 1 indicate that the predictions account for a high proportion of the variance in the observed data; the RF model's testing R2 of 0.965 in trial 4 reflects strong explanatory power. Similarly, NSE values close to 1, as obtained by both models in trial 4, indicate that the predictions closely match the observed values. MAE is the mean absolute difference between observed and predicted values, and RMSE is the square root of the mean squared difference, so RMSE weights large errors more heavily. The low values of these metrics in trial 4 highlight the models' accuracy in predicting GWL with minimal error. In the first three trials, however, testing MAE and RMSE values were higher, indicating larger prediction errors and reinforcing the overfitting concern. Overfitting, evidenced by training accuracy far exceeding testing accuracy, reflects the models' inability to generalize beyond the training data: it occurs when a model learns not only the underlying patterns in the data but also the noise and details specific to the training set (Meddage et al. 2024).
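The gap between R2 and NSE in some trials (for example, an RF testing R2 of 0.767 with an NSE of −0.397 in Table 2) can arise when predictions follow the observed pattern but are systematically offset. The sketch below assumes R2 is the squared Pearson correlation and uses hypothetical values; it shows one mechanism that produces such a gap, not a diagnosis of trial 2.

```python
def r2(obs, sim):
    """Squared Pearson correlation (assumed form of R2): insensitive to a constant bias."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return cov ** 2 / (sum((o - mo) ** 2 for o in obs) * sum((s - ms) ** 2 for s in sim))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: penalizes any departure, including a constant bias."""
    mo = sum(obs) / len(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)


obs = [246.0, 246.5, 247.0, 247.5]   # hypothetical observed GWL (m above MSL)
biased = [o + 0.6 for o in obs]      # same pattern, constant 0.6 m offset
print(round(r2(obs, biased), 3), round(nse(obs, biased), 3))  # → 1.0 -0.152
```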

Beyond the statistical evaluation of ML models, it is crucial to contextualize these findings within the broader framework of hydrological dynamics. GWLs are influenced by a complex interplay of factors, including recharge from precipitation, groundwater abstraction, land use changes, and subsurface geology. The performance of the models, particularly in trial 4, underscores their capability to capture these intricate patterns when trained on representative datasets. However, the variability in performance across trials highlights the importance of integrating domain-specific hydrological insights into the modeling process. For instance, understanding seasonal recharge cycles, aquifer characteristics, and anthropogenic influences can aid in refining model inputs, enhancing predictive accuracy, and improving the interpretability of results. Furthermore, adopting a hydrology-driven approach to dataset selection and feature engineering – such as incorporating parameters like soil permeability, surface runoff, or evapotranspiration – can bridge the gap between data-driven predictions and physical processes. By aligning ML-based predictions with hydrological principles, the study can offer practical value for groundwater resource management, enabling stakeholders to make informed decisions for sustainable water use and conservation.

Implications for GWL prediction

The findings from this study highlight the potential and challenges of using ML models for GWL prediction. While the models demonstrated the capability to capture complex groundwater dynamics, their performance was highly sensitive to the quality and representativeness of the training datasets. This underscores the necessity for careful dataset selection and preprocessing to ensure robust and reliable model predictions. Several critical areas require further investigation and development to enhance the predictive capabilities and reliability of ML techniques for local GWL evaluation. Addressing these areas will help overcome the challenges identified in this study and contribute to more accurate and robust predictions.

This study evaluated the performance of RF and RT models in predicting local GWLs using field gravity data. The analysis revealed that while both models demonstrated strong predictive capabilities during training, their performance during testing was influenced significantly by the quality and representativeness of the datasets. Notably, the fourth trial showed superior predictive accuracy across both training and testing phases, with high R2 and NSE values and low MAE and RMSE values, emphasizing the critical role of dataset selection in achieving reliable predictions.

The findings highlight the potential of ML models in capturing the complex dynamics of groundwater systems, provided they are supported by robust dataset preprocessing and hydrological contextualization. However, challenges such as overfitting and sensitivity to dataset variability remain, underscoring the need for further refinement of ML approaches.

Future research can enhance the predictive accuracy and reliability of ML models in evaluating local GWL by expanding input parameters and incorporating additional hydrologically relevant variables such as recharge rates, aquifer properties, and land-use changes to improve the model's explanatory power. Additionally, enhancing model robustness through advanced regularization techniques, hybrid modeling approaches, and ensemble methods may help address overfitting and improve generalization. Incorporating uncertainty quantification methods is also essential to provide more reliable and actionable insights. Furthermore, leveraging new data and real-time inputs for dynamic model updates can refine predictions and adapt to evolving groundwater conditions. By addressing these research directions, ML models can offer more effective tools for groundwater resource management, enabling policymakers to make sustainable decisions and allocate resources more efficiently.

We express our deep gratitude to the Civil Engineering Department (Geomatics Lab), Indian Institute of Technology (IIT), Roorkee, India, for allowing us to use their excellent geospatial lab facilities. We express our special thanks to the late Prof. Jayanta Kumar Ghosh, who gave us the concept of field gravimetry in the evaluation of local groundwater levels.

This research received no external funding.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict of interest.

Beven, K. (2005) On the concept of model structural error, Water Science and Technology, 52 (6), 167–175. https://doi.org/10.2166/wst.2005.0165
Bhattacharya, K. (2019) Composite Water Resources Management: Performance of States. New York: Safe Water Network. Retrieved from: https://coilink.org/20.500.12592/qw2wfj on 03 May 2025.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984) Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks.
Breiman, L. (1996) Bagging predictors, Machine Learning, 24, 123–140. https://doi.org/10.1007/BF00058655
Breiman, L. (1999) Using adaptive bagging to debias regressions (p. 16). Technical Report 547, Berkeley, CA: Statistics Dept., University of California at Berkeley.
Eltahir, E. A. & Yeh, P. J. F. (1999) On the asymmetric response of aquifer water level to floods and droughts in Illinois, Water Resources Research, 35 (4), 1199–1217. https://doi.org/10.1029/1998WR900071
Evett, S. R., Schwartz, R. C., Tolk, J. A. & Howell, T. A. (2009) Soil profile water content determination: spatiotemporal variability of electromagnetic and neutron probe sensors in access tubes, Vadose Zone Journal, 8 (4), 926–941. https://doi.org/10.2136/vzj2008.0146
Fuladipanah, M., Shahhosseini, A., Rathnayake, N., Azamathulla, H. M., Rathnayake, U., Meddage, D. P. P. & Tota-Maharaj, K. (2024) In-depth simulation of rainfall–runoff relationships using machine learning methods, Water Practice & Technology, 19 (6), 2442–2459. https://doi.org/10.2166/wpt.2024.147
Güntner, A. (2008) Improvement of global hydrological models using GRACE data, Surveys in Geophysics, 29, 375–397. https://doi.org/10.1007/s10712-008-9038-y
Gyeltshen, S., Tran, T. V., Teja Gunda, G. K., Kannaujiya, S., Chatterjee, R. S. & Champatiray, P. K. (2020) Groundwater potential zones using a combination of geospatial technology and geophysical approach: case study in Dehradun, India, Hydrological Sciences Journal, 65 (2), 169–182. https://doi.org/10.1080/02626667.2019.1688334
Huisman, J. A., Hubbard, S. S., Redman, J. D. & Annan, A. P. (2003) Measuring soil water content with ground penetrating radar: a review, Vadose Zone Journal, 2 (4), 476–491. https://doi.org/10.2113/2.4.476
Madhushani, C., Dananjaya, K., Ekanayake, I. U., Meddage, D. P. P., Kantamaneni, K. & Rathnayake, U. (2024) Modeling streamflow in non-gauged watersheds with sparse data considering physiographic, dynamic climate, and anthropogenic factors using explainable soft computing techniques, Journal of Hydrology, 631, 130846. https://doi.org/10.1016/j.jhydrol.2024.130846
Majumdar, S., Smith, R., Conway, B. D. & Lakshmi, V. (2022) Advancing remote sensing and machine learning-driven frameworks for groundwater withdrawal estimation in Arizona: linking land subsidence to groundwater withdrawals, Hydrological Processes, 36 (11), e14757. https://doi.org/10.1002/hyp.14757
McCuen, R. H., Knight, Z. & Cutter, A. G. (2006) Evaluation of the Nash–Sutcliffe efficiency index, Journal of Hydrologic Engineering, 11 (6), 597–602. https://doi.org/10.1061/(ASCE)1084-0699(2006)11:6(597)
Meddage, D. P. P., Ekanayake, I. U., Herath, S., Gobirahavan, R., Muttil, N. & Rathnayake, U. (2022) Predicting bulk average velocity with rigid vegetation in open channels using tree-based machine learning: a novel approach using explainable artificial intelligence, Sensors, 22 (12), 4398. https://doi.org/10.3390/s22124398
Meddage, D. P. P., Mohotti, D. & Wijesooriya, K. (2024) Predicting transient wind loads on tall buildings in three-dimensional spatial coordinates using machine learning, Journal of Building Engineering, 85, 108725. https://doi.org/10.1016/j.jobe.2024.108725
Mishra, R. & Ojha, C. S. P. (2023) Application of AI-based techniques on Moody's diagram for predicting friction factor in pipe flow, J, 6 (4), 544–563. https://doi.org/10.3390/j6040036
Mishra, R., Kumar, S., Sarkar, H. & Ojha, C. S. P. (2024) Utility of certain AI models in climate-induced disasters, World, 5 (4), 865. https://doi.org/10.3390/world5040045
Pathak, S., Gupta, S. & Ojha, C. S. P. (2021) Assessment of groundwater vulnerability to contamination with ASSIGN index: a case study in Haridwar, Uttarakhand, India, Journal of Hazardous, Toxic, and Radioactive Waste, 25 (2), 04020081.
Perera, U. A. K. K., Coralage, D. T. S., Ekanayake, I. U., Alawatugoda, J. & Meddage, D. P. P. (2024) A new frontier in streamflow modeling in ungauged basins with sparse data: a modified generative adversarial network with explainable AI, Results in Engineering, 21, 101920. https://doi.org/10.1016/j.rineng.2024.101920
Puri, D., Sihag, P., Thakur, M. S., Jameel, M., Chadee, A. A. & Hazi, M. A. (2024) Analysis of data splitting on streamflow prediction using random forest, AIMS Environmental Science, 11 (4), 593–609. https://doi.org/10.3934/environsci.2024029
Ramillien, G., Famiglietti, J. S. & Wahr, J. (2008) Detection of continental hydrology and glaciology signals from GRACE: a review, Surveys in Geophysics, 29, 361–374. https://doi.org/10.1007/s10712-008-9048-9
Rodell, M. & Famiglietti, J. S. (2001) An analysis of terrestrial water storage variations in Illinois with implications for the Gravity Recovery and Climate Experiment (GRACE), Water Resources Research, 37 (5), 1327–1339. https://doi.org/10.1029/2000WR900306
Rodell, M., Chen, J., Kato, H., Famiglietti, J. S., Nigro, J. & Wilson, C. R. (2007) Estimating groundwater storage changes in the Mississippi River basin (USA) using GRACE, Hydrogeology Journal, 15, 159–166. https://doi.org/10.1007/s10040-006-0103-7
Sarkar, H., Goriwale, S. S., Ghosh, J. K., Ojha, C. S. P. & Ghosh, S. K. (2024) Potential of machine learning algorithms in groundwater level prediction using temporal gravity data, Groundwater for Sustainable Development, 25, 101114.
Sathiyamoorthy, M., Masilamani, U. S., Chadee, A. A., Golla, S. D., Aldagheiri, M., Sihag, P., Rathnayake, U., Patidar, J., Shukla, S., Singh, A. K. & Kumar, B. (2023) Sustainability of groundwater potential zones in coastal areas of Cuddalore District, Tamil Nadu, South India using integrated approach of remote sensing, GIS and AHP Techniques, Sustainability, 15 (6), 5339. https://doi.org/10.3390/su15065339
Sivapalan, M., Blöschl, G., Merz, R. & Gutknecht, D. (2005) Linking flood frequency to long-term water balance: incorporating effects of seasonality, Water Resources Research, 41 (6), W06012. https://doi.org/10.1029/2004WR003439
Tao, H., Hameed, M. M., Marhoon, H. A., Zounemat-Kermani, M., Heddam, S., Kim, S., Sulaiman, S. O., Tan, M. L., Sa'adi, Z., Mehr, A. D. & Allawi, M. F. (2022) Groundwater level prediction using machine learning models: a comprehensive review, Neurocomputing, 489, 271–308. https://doi.org/10.1016/j.neucom.2022.03.014
Tapley, B. D., Bettadpur, S., Watkins, M. & Reigber, C. (2004) The gravity recovery and climate experiment: mission overview and early results, Geophysical Research Letters, 31 (9), L09607. https://doi.org/10.1029/2004GL019920
Vijayakumar, C. R., Balasubramani, D. P. & Azamathulla, H. M. (2022) Assessment of groundwater quality and human health risk associated with chromium exposure in the industrial area of Ranipet, Tamil Nadu, India, Journal of Water, Sanitation and Hygiene for Development, 12 (1), 58–67. https://doi.org/10.2166/washdev.2021.260
Von Unold, G. & Fank, J. (2008) Modular design of field lysimeters for specific application needs, Water, Air, & Soil Pollution: Focus, 8, 233–242. https://doi.org/10.1007/s11267-007-9172-4
Werth, S., Güntner, A., Petrovic, S. & Schmidt, R. (2009) Integration of GRACE mass variations into a global hydrological model, Earth and Planetary Science Letters, 277 (1–2), 166–173. https://doi.org/10.1016/j.epsl.2008.10.021
Western, A. W., Grayson, R. B. & Blöschl, G. (2002) Scaling of soil moisture: a hydrologic perspective, Annual Review of Earth and Planetary Sciences, 30 (1), 149–180. https://doi.org/10.1146/annurev.earth.30.091201.140434
Zaitchik, B. F., Rodell, M. & Reichle, R. H. (2008) Assimilation of GRACE terrestrial water storage data into a land surface model: results for the Mississippi River basin, Journal of Hydrometeorology, 9 (3), 535–548. https://doi.org/10.1175/2007JHM951.1
Zehe, E., Graeff, T., Morgner, M., Bauer, A. & Bronstert, A. (2010) Plot and field scale soil moisture dynamics and subsurface wetness control on runoff generation in a headwater in the Ore Mountains, Hydrology and Earth System Sciences, 14 (6), 873–889. https://doi.org/10.5194/hess-14-873-2010
Zreda, M., Desilets, D., Ferré, T. P. A. & Scott, R. L. (2008) Measuring soil moisture content non-invasively at intermediate spatial scale using cosmic-ray neutrons, Geophysical Research Letters, 35 (21), L21402. https://doi.org/10.1029/2008GL035655
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).