ABSTRACT
Evaluation of local groundwater levels (GWL) is crucial for sustainable water resource management. This study introduces an approach that uses field gravity data to optimize machine learning (ML) models for local GWL evaluation. The approach has the potential to improve groundwater management. The performance of two ML-based models, random forest (RF) and random tree (RT), is evaluated using four trials with distinct training and testing datasets. Model efficiency was assessed using four statistical metrics: R2, root mean squared error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE). Results indicated high model performance during training across all trials, with notable variability during testing. Trial 4 emerged as the most successful, with RF achieving R2 and NSE of 0.998 during training and 0.965 during testing. Similarly, RT maintained high performance with R2 and NSE of 0.994 during training and R2 of 0.951 and an NSE of 0.917 during testing. The primary aim of the study was to assess the efficiency of decision tree-based ML techniques in capturing the characteristics of local GWL fluctuations. Addressing the sensitivity to dataset selection observed during testing will enhance the predictive accuracy and reliability of ML-based models for local groundwater storage evaluation, contributing to better water resource management and informed decision-making.
HIGHLIGHTS
New direction of local groundwater level evaluation through field gravimetry and machine learning (ML).
Gravity and Global Positioning System (GPS) time data have been used as input parameters.
The role of data selection in ML techniques has been tested.
Brings significant breakthroughs in the field of groundwater resource management.
INTRODUCTION
Groundwater is the fundamental source of pure, fresh drinking water. The volume of groundwater in storage is declining day by day because groundwater extraction is not balanced by groundwater recharge. India faces a significant water crisis that threatens economic growth, livelihoods, and human well-being. As per the report of NITI Aayog (Bhattacharya 2019), states like Delhi, Haryana, Rajasthan, Gujarat, Himachal Pradesh, Jammu & Kashmir, the northern part of Ladakh, and the western part of Uttar Pradesh lie in areas of extremely high water stress. The report also notes that approximately 600 million individuals are experiencing significant to extreme water stress, with 75% of households lacking access to on-premises drinking water. Additionally, 84% of rural households do not have piped water access. Alarming statistics reveal that 70% of the country's water is contaminated, leading to approximately two lakh (200,000) deaths annually due to inadequate freshwater availability. India currently ranks 120th out of 122 countries in the water quality index.
There is an increasing necessity to assess changes in groundwater levels (GWLs) to facilitate the effective and sustainable management of groundwater resources. However, as Beven (2005) states, ‘We do not have the investigative measurement techniques necessary to be secure about what form these (storage-output; note from the author) relationships should take except by seeing which functions might be appropriate in reproducing the discharges at the catchment outlet (where we can take a measurement).’
In the field of hydrology, computing the water storage factors using a hydrological equation (Sivapalan et al. 2005), known as the water balance equation, continues to pose a challenge across various scales. It is well known that the techniques used for evaluating the GWL are time-consuming and not accurate. Groundwater storage evaluation can be done through in situ checking of wells (Eltahir & Yeh 1999; Rodell & Famiglietti 2001). At the field level, determining water storage and its fluctuations typically relies on point measurements. However, the substantial spatial and temporal variability poses challenges in accurately gauging water storage. Various methods and approaches have been devised to address these challenges, such as the collection of numerous soil moisture measurements and their interpolation/extrapolation through geostatistics or the utilization of ground-penetrating radar measurements (Western et al. 2002; Huisman et al. 2003). The utilization of spatial time domain reflectometry for soil moisture measurements (Zehe et al. 2010), high-precision lysimeters (Von & Fank 2008), and the advancement of cosmic ray neutron probes (Zreda et al. 2008) are common methods. Generally, these techniques are constrained to estimating water storage near the surface. While neutron probes, electromagnetic sensors in access tubes, electrical resistivity tomography (ERT), or (cross-)borehole geophysics allow for the assessment of water storage in deeper zones, their temporal and spatial resolution (both depth and area) is limited. Additional constraints, such as the notable inaccuracies of electromagnetic sensors in access tubes (Evett et al. 2009), pose challenges in estimating subsurface water storage capacity at the field scale, particularly in deeper zones. Gyeltshen et al. (2020) used a combination of geospatial, geophysical, and statistical models, along with satellite data, to identify areas with high groundwater potential. 
They primarily applied the weighted index overlay method and two-dimensional electrical resistivity tomography (2D-ERT) to generate a map highlighting these potential zones.
Several remote sensing datasets, describing hydrological factors, forest type, soil composition, and land use, also help in the development of GWL prediction models. These are the customary methodologies for evaluating groundwater storage changes at the point scale. However, in situ well checking is a continuous process that incurs large expenses and is subject to limitations in spatial and temporal data quality (Rodell et al. 2007). At the global scale, the Gravity Recovery and Climate Experiment (GRACE) (Tapley et al. 2004) offers a unique opportunity to estimate water storage changes (Ramillien et al. 2008) and to improve macro-scale hydrological models (Güntner 2008; Zaitchik et al. 2008; Werth et al. 2009). Regional groundwater storage changes can now be derived through satellite gravimetry. The GRACE mission observes gravity changes arising from mass variations in the Earth's subsurface and provides the data continuously. However, GRACE has not given meaningful results in local-level studies because of its coarse spatial resolution of about 400 km. There is therefore a need to evaluate groundwater storage changes at the local level, so that areas where groundwater extraction must be controlled can be identified.
The application of machine learning (ML) in hydrology has shown significant promise in addressing complex and nonlinear problems, such as GWL prediction and water resource management. Recent studies have demonstrated the robustness of ML techniques, including random forest (RF), support vector machine (SVM), and ensemble methods, in modeling hydrological systems. The scalability and adaptability of ML approaches are also discussed in various studies (Meddage et al. 2022; Tao et al. 2022; Fuladipanah et al. 2024; Madhushani et al. 2024; Mishra et al. 2024; Perera et al. 2024), which outline their potential in hydrology, particularly for improving the performance of data-driven models. For instance, Puri et al. (2024) evaluated data splitting strategies in streamflow prediction using RF, emphasizing the importance of training-test data handling for model accuracy and reliability in hydrological applications. Additionally, emerging techniques have been applied for groundwater potential zone identification, as seen in the integrated approach of remote sensing, geographic information system (GIS), and the analytic hierarchy process by Sathiyamoorthy et al. (2023). These methods have proven effective in coastal regions for sustainable groundwater resource assessment. Furthermore, the role of ML in assessing water quality and associated human health risks has been highlighted by Vijayakumar et al. (2022), particularly for analyzing heavy metal contamination in groundwater. RF, random tree (RT), and SVM are among the most widely applied ML techniques in water resource engineering (Majumdar et al. 2022). In such studies, slope, elevation, water balance products, and other hydrometric parameters have been used to predict groundwater. Hybrid or ensemble ML algorithms are now also used to increase model accuracy.
Given the growing importance of integrating ML tools in groundwater resource engineering, this study aims to assess the effectiveness of ML-based techniques in predicting local GWLs using temporal field gravity data. Recent advancements underscore the relevance of employing ML-based models for GWL prediction, especially in conjunction with high-precision temporal gravity data. While it is well established that field gravity varies with time, temporal gravity data offers a unique opportunity for evaluating local GWLs. This study focuses on advancing the application of ML by leveraging high-precision gravity measurements, novel feature engineering techniques, and a hybrid methodology to improve prediction accuracy and scalability. By integrating these innovative approaches, the study aims to contribute to the growing field of hydrological ML applications while addressing the critical challenges of sustainable groundwater management.
DATA AND METHODS
Study area
In the Roorkee block, an extremely unpredictable continental climate prevails primarily due to its proximity to the immense Himalayan range. The city undergoes four distinct seasons throughout the year. Summer, prevailing from March to July, witnesses temperatures averaging around 28 °C. A significant transformation occurs following July as the monsoon season sets in, marked by heavy rainfall. The presence of the Himalayas obstructs monsoon clouds, prolonging this season until October. After the monsoon, temperatures range between 15 and 21 °C. The city receives an annual average rainfall of approximately 2,600 mm. The primary water source for Roorkee is the River Ganga, complemented by a canal system that not only benefits the city's residents but also plays a crucial role in supporting efficient irrigation for nearby villages. This irrigation system significantly contributes to the agricultural sector, thereby playing a pivotal role in the economic development of Roorkee and its surrounding districts and states (Pathak et al. 2021).
Data collection and reduction
A relative gravimeter with a precision of 0.001 mGal was used to take gravity observations near an observation well. Because a relative gravimeter measures only relative gravity values, the readings were converted into absolute gravity values with respect to the reference absolute gravity station available at the Earth Science department on the campus. The depth of the water table was measured simultaneously using a water level indicator with a precision of 5 mm. A dual-frequency GPS receiver was used to determine the geographical location (latitude, longitude, and orthometric height), which is essential for gravity observations.
For model implementation, gravity data, GPS time, and groundwater depth data were collected at the observation well located in the Hydrology Department of IIT Roorkee. With the help of the reference absolute gravity station, the observed gravity readings were reduced to absolute gravity, and the GWL data were referenced to mean sea level (MSL). Gravity and GPS time data were used as input parameters, and groundwater depth data were used as the output parameter. A total of 250 samples were collected and used for developing, training, and testing the models. The three datasets were observed to be independent of each other.
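The reduction described above can be illustrated with a short sketch. It assumes the usual relative-gravimetry tie (absolute gravity at the well equals the absolute value at the reference station plus the difference between the two relative readings) and that GWL above MSL equals the orthometric height of the measuring point minus the measured depth to water. Neither formula is stated in the text, corrections such as instrument drift and Earth tides are ignored, and the function names and all numbers are hypothetical.

```python
def reduce_to_absolute(rel_at_well_mgal, rel_at_ref_mgal, abs_at_ref_mgal):
    """Tie a relative gravimeter reading to a reference absolute station.

    Assumption: drift and tidal corrections have already been applied
    (or are negligible), so only the reading difference matters.
    """
    return abs_at_ref_mgal + (rel_at_well_mgal - rel_at_ref_mgal)


def gwl_above_msl(orthometric_height_m, depth_to_water_m):
    """Convert a measured depth to water into a level above mean sea level."""
    return orthometric_height_m - depth_to_water_m


# Hypothetical example values, not taken from the study.
g_abs = reduce_to_absolute(
    rel_at_well_mgal=3512.345,   # relative reading at the observation well
    rel_at_ref_mgal=3510.120,    # relative reading at the reference station
    abs_at_ref_mgal=979050.000,  # known absolute gravity at the reference
)
gwl = gwl_above_msl(orthometric_height_m=268.50, depth_to_water_m=7.35)

print(f"absolute gravity at well: {g_abs:.3f} mGal")
print(f"GWL above MSL: {gwl:.2f} m")
```

Each (GPS time, absolute gravity) pair would then form one input sample, with the corresponding groundwater value as the target.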
ML models
This section describes the two ML-based techniques chosen for the prediction of GWL. Both models rely on two input parameters: GPS time and absolute field gravity data.
Random forest
The RF algorithm, introduced by Breiman (1996), has gained widespread popularity. It is a structured ensemble of tree-based models generated from random vector samples. At its core, RF creates multiple decision trees from the input datasets and combines their outputs, by majority vote for classification and by averaging for regression. The Gini index is utilized to evaluate the impurity of output-related parameters, and training datasets are formed by randomly selecting parameters to create distinct trees (Breiman 1996). In the RF model, decision trees play the role of the base learner. RF regression requires two user-defined operational variables: the number of input parameters (m) considered at each node during tree generation and the number of generated trees (k) (Breiman 1999). The RF method offers numerous benefits, such as high predictive accuracy, simplicity, and a non-parametric nature, and it is applicable across diverse types of datasets.
Random tree (RT)
RT, a regression model (Breiman et al. 1984) rooted in both a random process and the decision tree method, operates by examining a specified number (k) of randomly chosen attributes at each node. In this model, ‘random’ denotes that each tree in the set of possible trees holds an equal chance of being sampled, so that the trees follow a uniform distribution. A precise model can be constructed by combining multiple RTs. Over the past five years, this model has found utility in diverse engineering applications for prediction. The RT algorithm follows an approach similar to that of a decision tree, with the key distinction that a random subset of attributes is offered for each split.
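The two models described above can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the implementation used in the study: splits minimize the sum of squared errors, each node tries only a random subset of the input features (the random tree idea), and the forest averages trees grown on bootstrap resamples. All function names, parameter names, and the synthetic data are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    mu = _mean(values)
    return sum((v - mu) ** 2 for v in values)


def build_random_tree(rows, n_split_features, min_leaf, rng):
    """Grow one regression tree; rows are (features, target) pairs."""
    best = None  # (sse, feature index, threshold)
    # Random tree step: only a random subset of features is tried at this node.
    for j in rng.sample(range(len(rows[0][0])), n_split_features):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    if best is None:  # no admissible split: leaf holding the mean target
        return _mean([y for _, y in rows])
    _, j, threshold = best
    left_rows = [r for r in rows if r[0][j] <= threshold]
    right_rows = [r for r in rows if r[0][j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_random_tree(left_rows, n_split_features, min_leaf, rng),
        "right": build_random_tree(right_rows, n_split_features, min_leaf, rng),
    }


def predict_tree(node, x):
    while isinstance(node, dict):
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node


def fit_random_forest(rows, n_trees, n_split_features, min_leaf=2, seed=0):
    """Forest = trees grown on bootstrap resamples of the training rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        forest.append(build_random_tree(sample, n_split_features, min_leaf, rng))
    return forest


def predict_forest(forest, x):
    # Regression forests average the individual tree predictions.
    return _mean([predict_tree(tree, x) for tree in forest])


# Synthetic, made-up samples: features are (GPS time index, gravity in mGal),
# target is GWL in metres. These numbers only exercise the code.
rows = [((t, 979052.0 + 0.002 * t), 260.0 + 0.05 * t) for t in range(40)]
forest = fit_random_forest(rows, n_trees=25, n_split_features=1)
low = predict_forest(forest, rows[1][0])
high = predict_forest(forest, rows[38][0])
print(f"early sample: {low:.2f} m, late sample: {high:.2f} m")
```

Here `n_trees` plays the role of the number of generated trees and `n_split_features` the number of input parameters tried at each node. Because every leaf stores a mean of training targets, predictions cannot fall outside the range of GWL values seen in training.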
Performance evaluation
Building and evaluating the ML models for GWL prediction involves several key steps. Data collection and reduction are the fundamental first step. The data are then split into training and testing datasets. Subsequently, the two models are applied, and the optimal model parameters are determined through a trial-and-error process. Once the models are tuned, observed GWL data (yobs) are compared with simulated GWL data (y), and predictive performance is assessed using R2, NSE, MAE, and RMSE. Values of R2 and NSE closer to 1 are desirable, while values of MAE and RMSE closer to zero indicate better accuracy.
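The four metrics can be written out as a minimal sketch. The text does not give the formulas, so the standard definitions are assumed here, with R2 taken as the squared Pearson correlation between observed and simulated values (one common convention; that choice is an assumption). The function names and sample values are hypothetical.

```python
import math


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Made-up GWL values (m): the simulation follows the trend but is biased by 1 m.
observed = [260.0, 260.5, 261.0, 261.5]
simulated = [o + 1.0 for o in observed]

print(f"R2={r2(observed, simulated):.3f}  NSE={nse(observed, simulated):.3f}")
print(f"MAE={mae(observed, simulated):.3f} m  RMSE={rmse(observed, simulated):.3f} m")
```

Under these definitions a constant bias leaves R2 at 1 while NSE drops below zero, which shows how a model can report a fairly high R2 together with a negative NSE on a testing dataset.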
(a) Plot of total datasets, (b) trial-1 training datasets, (c) trial-1 testing datasets, (d) trial-2 training datasets, (e) trial-2 testing datasets, (f) trial-3 training datasets, (g) trial-3 testing datasets, (h) trial-4 training datasets, (i) trial-4 testing datasets.
RESULTS
Performance of models in trial-1
Statistical metric | RF training | RF testing | RT training | RT testing
---|---|---|---|---
R2 | 0.994 | 0.581 | 0.979 | 0.786
MAE (m) | 0.045 | 0.296 | 0.007 | 0.285
RMSE (m) | 0.058 | 0.345 | 0.011 | 0.336
NSE | 0.994 | 0.416 | 0.991 | 0.445
Performance of models in trial-2
Statistical metric | RF training | RF testing | RT training | RT testing
---|---|---|---|---
R2 | 0.991 | 0.767 | 0.981 | 0.743
MAE (m) | 0.051 | 0.187 | 0.007 | 0.217
RMSE (m) | 0.067 | 0.254 | 0.011 | 0.309
NSE | 0.989 | −0.397 | 0.979 | −1.069
Performance of models in trial-3
Statistical metric | RF training | RF testing | RT training | RT testing
---|---|---|---|---
R2 | 0.993 | 0.578 | 0.999 | 0.786
MAE (m) | 0.046 | 0.302 | 0.007 | 0.285
RMSE (m) | 0.061 | 0.349 | 0.011 | 0.336
NSE | 0.993 | 0.397 | 0.999 | 0.445
Performance of models in trial-4
Statistical metric | RF training | RF testing | RT training | RT testing
---|---|---|---|---
R2 | 0.994 | 0.965 | 0.998 | 0.965
MAE (m) | 0.048 | 0.126 | 0.009 | 0.126
RMSE (m) | 0.064 | 0.155 | 0.013 | 0.155
NSE | 0.994 | 0.965 | 0.998 | 0.947
Agreement diagrams of RF and RT models. (a) Trial-1 training datasets, (b) trial-1 testing datasets, (c) trial-2 training datasets, (d) trial-2 testing datasets, (e) trial-3 training datasets, (f) trial-3 testing datasets, (g) trial-4 training datasets, (h) trial-4 testing datasets.
For all four trials, the training datasets yielded strong statistical performance, with high R2 and NSE values and low RMSE and MAE values, as shown in Tables 1–4. During testing in trials 1–3, the models could not predict GWL as accurately as they did during training. In trial 4, however, both models performed well in predicting GWL during training as well as testing. Overall, the RF model shows good results in terms of the statistical parameters.
Taylor diagrams of RF and RT models. (a) Trial-1 training datasets, (b) trial-1 testing datasets, (c) trial-2 training datasets, (d) trial-2 testing datasets, (e) trial-3 training datasets, (f) trial-3 testing datasets, (g) trial-4 training datasets, (h) trial-4 testing datasets.
Violin plots of absolute error: (a) training and testing error of both models for trial-1; (b) training and testing error of both models for trial-2; (c) training and testing error of both models for trial-3; (d) training and testing error of both models for trial-4.
Across the four trials used to select training and testing data, trial 4 consistently produced superior results. Taken together, the four approaches show that the ML tools had difficulty achieving precise predictions in most trials. The substantial percentage of error highlights the pivotal role of dataset selection in obtaining accurate predictions of GWL and its trends. These observations indicate that the ML tools can capture the variation in the input parameters associated with GWL fluctuation, provided that the training and testing datasets are selected appropriately. The findings therefore highlight the need for further research to refine and enhance the prediction capabilities of these ML-based models.
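The text does not state how the training and testing datasets of each trial were chosen. Purely as an illustration of how different selection rules partition the same 250 samples, the sketch below contrasts three common strategies; the strategies, the 70% training fraction, and the function names are all assumptions rather than the study's trial definitions.

```python
import random


def chronological_split(n_samples, train_fraction=0.7):
    """Earliest samples for training, latest samples for testing."""
    cut = round(n_samples * train_fraction)
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]


def random_split(n_samples, train_fraction=0.7, seed=42):
    """Shuffle the sample indices, then cut."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


def interleaved_split(n_samples, every=4):
    """Hold out every fourth sample so both sets span the whole record."""
    test = [i for i in range(n_samples) if i % every == every - 1]
    train = [i for i in range(n_samples) if i % every != every - 1]
    return train, test


n = 250  # number of samples reported in the study
for name, splitter in [
    ("chronological", chronological_split),
    ("random", random_split),
    ("interleaved", interleaved_split),
]:
    train_idx, test_idx = splitter(n)
    print(f"{name}: {len(train_idx)} training, {len(test_idx)} testing, "
          f"test indices span {test_idx[0]} to {test_idx[-1]}")
```

A chronological split places all testing samples after the training period. Tree-based models predict values within the range of their training targets, so a testing period with GWL outside that range may be one reason for poor testing scores. Whether this applies to trials 1–3 cannot be determined from the text.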
DISCUSSION
Model performance and dataset influence
The performance of the two ML models, RF and RT, varied notably across the four trials, emphasizing the significant impact of dataset selection on predictive accuracy. The models showed high accuracy during the training phases of all trials, evidenced by elevated R2 and NSE values and low MAE and RMSE values. However, a stark contrast was observed during testing, particularly in trials 1–3, where the models' performance declined drastically. This discrepancy indicates that the models might have overfitted the training data, capturing noise along with the underlying patterns, which hindered their ability to generalize to unseen data.
Statistical metrics: understanding the nuances
The chosen statistical metrics (R2, RMSE, MAE, and NSE) provided a comprehensive evaluation of the models' performance. R2 values close to 1 indicated a high proportion of variance in the observed data explained by the models. The RF model's R2 value of 0.9651 during testing in trial 4 reflected its strong explanatory power. Similarly, NSE values close to 1, as seen in both models during trial 4, indicated that the model's predictions closely matched the observed values. MAE and RMSE are crucial for understanding the absolute and squared differences between observed and predicted values, respectively. Lower values of these metrics, particularly in trial 4, highlighted the models' accuracy in predicting GWL with minimal error. The models' performance in the first three trials, however, showed higher MAE and RMSE values during testing, indicating larger prediction errors and reinforcing the overfitting issue. The observation of overfitting in some trials, evidenced by significantly higher training accuracy compared to testing accuracy, arises from the models' inability to generalize beyond the training data. Overfitting occurs when a model learns not only the underlying patterns in the data but also the noise and specific details unique to the training set (Meddage et al. 2024).
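The gap between training and testing scores gives a simple numerical view of the overfitting discussed above. The sketch below uses the RF NSE values reported in the trial tables; the gap measure itself (training score minus testing score) and the names used are illustrative choices, not part of the study.

```python
# RF NSE values from the trial tables: (training, testing).
rf_nse = {
    "trial-1": (0.994, 0.416),
    "trial-2": (0.989, -0.397),
    "trial-3": (0.993, 0.397),
    "trial-4": (0.994, 0.965),
}


def generalization_gap(train_score, test_score):
    """Larger gaps suggest the model fits training data better than unseen data."""
    return train_score - test_score


gaps = {trial: generalization_gap(*scores) for trial, scores in rf_nse.items()}
best_trial = min(gaps, key=gaps.get)

for trial, gap in gaps.items():
    print(f"{trial}: NSE gap = {gap:.3f}")
print(f"smallest gap: {best_trial}")
```

The gap is smallest for trial 4 (about 0.03) and largest for trial 2 (about 1.39), in line with the ranking described in the text.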
Beyond the statistical evaluation of ML models, it is crucial to contextualize these findings within the broader framework of hydrological dynamics. GWLs are influenced by a complex interplay of factors, including recharge from precipitation, groundwater abstraction, land use changes, and subsurface geology. The performance of the models, particularly in trial 4, underscores their capability to capture these intricate patterns when trained on representative datasets. However, the variability in performance across trials highlights the importance of integrating domain-specific hydrological insights into the modeling process. For instance, understanding seasonal recharge cycles, aquifer characteristics, and anthropogenic influences can aid in refining model inputs, enhancing predictive accuracy, and improving the interpretability of results. Furthermore, adopting a hydrology-driven approach to dataset selection and feature engineering – such as incorporating parameters like soil permeability, surface runoff, or evapotranspiration – can bridge the gap between data-driven predictions and physical processes. By aligning ML-based predictions with hydrological principles, the study can offer practical value for groundwater resource management, enabling stakeholders to make informed decisions for sustainable water use and conservation.
Implications for GWL prediction
The findings from this study highlight the potential and challenges of using ML models for GWL prediction. While the models demonstrated the capability to capture complex groundwater dynamics, their performance was highly sensitive to the quality and representativeness of the training datasets. This underscores the necessity for careful dataset selection and preprocessing to ensure robust and reliable model predictions. Several critical areas require further investigation and development to enhance the predictive capabilities and reliability of ML techniques for local GWL evaluation. Addressing these areas will help overcome the challenges identified in this study and contribute to more accurate and robust predictions.
CONCLUSION
This study evaluated the performance of RF and RT models in predicting local GWLs using field gravity data. The analysis revealed that while both models demonstrated strong predictive capabilities during training, their performance during testing was influenced significantly by the quality and representativeness of the datasets. Notably, the fourth trial showed superior predictive accuracy across both training and testing phases, with high R2 and NSE values and low MAE and RMSE values, emphasizing the critical role of dataset selection in achieving reliable predictions.
The findings highlight the potential of ML models in capturing the complex dynamics of groundwater systems, provided they are supported by robust dataset preprocessing and hydrological contextualization. However, challenges such as overfitting and sensitivity to dataset variability remain, underscoring the need for further refinement of ML approaches.
Future research can enhance the predictive accuracy and reliability of ML models in evaluating local GWL by expanding input parameters and incorporating additional hydrologically relevant variables such as recharge rates, aquifer properties, and land-use changes to improve the model's explanatory power. Additionally, enhancing model robustness through advanced regularization techniques, hybrid modeling approaches, and ensemble methods may help address overfitting and improve generalization. Incorporating uncertainty quantification methods is also essential to provide more reliable and actionable insights. Furthermore, leveraging new data and real-time inputs for dynamic model updates can refine predictions and adapt to evolving groundwater conditions. By addressing these research directions, ML models can offer more effective tools for groundwater resource management, enabling policymakers to make sustainable decisions and allocate resources more efficiently.
ACKNOWLEDGEMENTS
We express our deep gratitude to the Civil Engineering Department (Geomatics Lab), Indian Institute of Technology (IIT), Roorkee, India, for allowing us to use their excellent geospatial lab facilities. We express our special thanks to the late Prof. Jayanta Kumar Ghosh, who gave us the concept of field gravimetry in the evaluation of local groundwater levels.
FUNDING
This research received no external funding.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.