ABSTRACT
This study presents the first attempt to develop interpretable machine learning (ML) models for simulating groundwater fluctuations in urbanized aquifers in rainfall-scarce regions. The ML-based modeling approach was designed to provide urban water managers with a reliable tool for controlling the development of shallow water tables resulting from artificial recharge. Support vector machine, Gaussian process regression, and regression tree models were constructed to simulate historical groundwater levels (GWLs) in four wells in Kuwait City. Groundwater data preprocessing was conducted to isolate the effects of artificial recharge activities and improve the performance of the ML models. Autocorrelation analysis of the detrended GWLs was used to determine the input delays for the ML models. The Local Interpretable Model-agnostic Explanation (LIME) technique and SHapley Additive exPlanations (SHAP) were utilized to interpret the models' outcomes. The R2 values for the wells examined in this study ranged from 0.75 to 0.98 during validation. The results revealed that the ML-based approach was superior to other frameworks, with a 50% decrease in the mean absolute error compared to statistical models. The findings of this study provide urban planners in arid regions with a useful strategy for managing shallow water tables.
HIGHLIGHTS
The study presents a novel interpretable machine learning model for groundwater dynamics in urban arid regions.
The proposed explainable groundwater modeling framework aids urban water managers and planners.
Machine learning models outperformed traditional methods in simulating groundwater levels.
The presented methodology provides a user-friendly and effective tool for managing shallow water tables in arid regions.
INTRODUCTION
Groundwater is the primary source for meeting the daily water needs of 2.5 billion individuals globally (Klein Goldewijk et al. 2010). Over 50% of the global population depends on groundwater for potable use (Bierkens & Wada 2019). Additionally, a significant portion of the world's irrigation water is derived from groundwater (Winpenny et al. 2010; Vyas et al. 2024). Combined with issues such as population growth, deteriorating groundwater quality, and climate change, these facts underscore the importance of improving the use, management, and sharing of water (Connor 2015; Jodhani et al. 2025). In many regions, groundwater acts as a buffer during periods of surface water scarcity, playing a vital role in maintaining water supply resilience amid increasing climate variability. Therefore, research efforts on accurate and reliable groundwater level (GWL) forecasts are essential in this context. This research direction can serve as a foundation for management decisions and plans by providing valuable quantitative data on groundwater availability.
Deep learning (DL) approaches have recently shown great potential and are increasingly being integrated into various scientific fields, including the water sciences (Shen 2018). Within this context, machine learning (ML) techniques are used in various water science applications to address the correlation between applicable input and influential system forcings, such as runoff (Rathnayake et al. 2023) and water table depth (Alsumaiei 2020), without constructing mathematical models or explicitly defining physical relationships. Classical groundwater models often fall short, either by being overly simplistic or, in the case of numerical models, by requiring vast amounts of data, presenting significant challenges, incurring high setup and maintenance costs, or requiring substantial effort. In contrast, data-driven approaches have been successful in various research areas, including surface water studies (Maier et al. 2010; Alsumaiei 2024a) and GWL applications (Rajaee et al. 2019; Alsumaiei 2020). Although DL was initially adopted gradually in water science (Shen 2018), it is now poised for significant growth, as evidenced by the steadily increasing number of publications related to DL and water resources (Abed et al. 2022; Barrera-Animas et al. 2022; Alsumaiei 2024b; 2025a, 2025b; Jiang et al. 2024).
To meet the growing water needs of the global population, governments and urban planners are seeking methods to precisely forecast GWLs, which are critical for developing new urban areas. Representing groundwater systems and their responses to climatic variables is difficult owing to their nonlinear characteristics. However, modern modeling approaches using ML algorithms provide a more feasible solution for predicting GWLs because they bypass the need to understand the system's physical properties. ML algorithms use mathematical principles to identify optimal functions from given data and learn new patterns from incoming data. Over the last decade, numerous studies have employed ML algorithms, such as artificial neural networks (ANNs), support vector machines (SVMs), and genetic programming, to model groundwater and predict variations in its levels. These studies demonstrated the potential of ML models compared to physical modeling techniques. However, standalone ML models exhibit significant performance variations, particularly regarding lead times. To enhance performance, researchers in both academia and industry have created innovative models by integrating various algorithms with ML models. For example, a wavelet transform can break down a data series into its essential components and capture most of the information (He et al. 2014; Mohammed et al. 2025). These components are then fed into a neural network model to predict the outcomes more accurately. This approach can significantly improve prediction accuracy by identifying critical information and excluding noisy data. Despite the potential of hybrid models to outperform standalone models, a trade-off is expected between the time required to train the model and its overall performance. Therefore, there is a need to develop a framework capable of achieving high performance without being time intensive.
While these advancements have significantly improved GWL forecasting in general, modeling groundwater in urban environments presents an additional layer of complexity. Urban groundwater systems are influenced by a unique interplay of natural and anthropogenic processes, making them more difficult to characterize using conventional or even hybrid ML models. Enhancing the understanding of water movement through urban landscapes remains a major challenge in urban hydrological modeling. There are several major limitations to urban hydrological modeling that hinder its progress, as follows:
1. A limited capacity to represent processes occurring in both engineered and natural systems.
2. A general lack of understanding of how processes change over time and space in urban locations.
3. A lack of data to characterize local heterogeneity at the catchment or city levels (LaBianca et al. 2023; Oswald et al. 2023).
ML is expected to play a major role in advancing the understanding of hydrology and making significant strides in hydrological predictions, forecasts, and downscaling (Nearing et al. 2021). However, the restricted availability of groundwater data compared to that of the surface water domain leads to lower expectations for ML applications in groundwater predictions (Nearing et al. 2021). One of the main obstacles to the development of ML in the groundwater domain is the lack of data on shallow GWLs (Ma et al. 2021). Despite these inherent challenges, various ML frameworks have proven effective in modeling shallow groundwater dynamics. These frameworks include ANNs, random forest (RF), SVMs (Wu et al. 2023), bagging decision trees (Gupta et al. 2024), recurrent neural networks, and long short-term memory (Bowes et al. 2019). However, few ML approaches have been specifically tailored for urban environments. For example, Yadav et al. (2020) used ML frameworks to forecast monthly GWLs based on observations from 24 wells in an urban groundwater-stressed district in India. Gonzalez & Arsanjani (2021) found that several ML approaches underestimated shallow GWLs in the Danish Capital Region when a small number of long time series were used to predict the influence of climate change.
The primary objective of this study is to build on previous efforts to forecast GWL fluctuations in urban aquifer systems using ML-based approaches. While ML has shown promise in various groundwater applications, the unique hydrological and infrastructural challenges of urban environments in hyper-arid climates hinder modeling outcomes. These challenges are particularly evident in the case of the Kuwait City urban aquifer system, where rapid urbanization, high population density, data scarcity, and limited surface water availability place immense pressure on shallow groundwater resources.
In this hyper-arid setting, anthropogenic activities such as excess irrigation, infrastructure leakage, and stormwater mismanagement have significantly altered subsurface water fluxes. These factors contribute to artificial recharge that raises shallow water tables, increasing the risk of geotechnical instability beneath the city. Without sustained groundwater management and carefully implemented dewatering practices, there is a tangible threat of partial aquifer collapse. This, in turn, endangers vital infrastructure including foundations, roads, sewer networks, and water distribution systems. Traditional dewatering methods rely on limited and shallow observations, often failing to account for deeper aquifer behavior. Therefore, robust modeling approaches such as ML applied to the Kuwait Group Formation are essential for capturing spatiotemporal variability and guiding sustainable groundwater strategies. These tools are critical for ensuring the long-term resilience of large-scale infrastructure.
Another principal objective of this study is the development of an interpretable ML-based decision-support framework for regulating GWL fluctuations within the shallow aquifers of Kuwait City. The selected study area is characterized by extreme hydroclimatic conditions and significant anthropogenic modification of subsurface hydrological processes, particularly due to artificial recharge resulting from irrigation practices and infrastructure leakage. These factors exert a pronounced influence on aquifer dynamics, yet they remain under-represented in traditional physically based and even many data-driven groundwater models. The complexity and spatial heterogeneity of these processes necessitate advanced modeling techniques capable of both predictive accuracy and interpretability to facilitate informed management interventions. A persistent limitation of many ML approaches in hydrological modeling is their ‘black-box’ nature, which inhibits transparency and limits the trust and adoption of such models by practitioners and decision-makers. To address this gap, the present study proposes an interpretable ML framework tailored specifically for shallow urban aquifer systems in hyper-arid regions. The framework incorporates SVM, Gaussian process regression (GPR), and regression tree (RT) models, algorithms selected for their robustness under nonlinear and sparse-data conditions. To address the interpretability component, the modeling framework is integrated with two state-of-the-art explanatory tools: Local Interpretable Model-agnostic Explanation (LIME) and SHapley Additive exPlanations (SHAP). These tools provide both local and global insight into feature importance, enabling the identification of key hydroclimatic and anthropogenic drivers influencing groundwater dynamics.
To the best of the author's knowledge, this study represents the first application of LIME and SHAP in the context of shallow urban groundwater forecasting under hyper-arid climatic conditions. The proposed framework offers a novel, data-efficient, and transferable approach for interpreting and forecasting groundwater fluctuations in complex urban settings. By improving model transparency and elucidating the relationships between predictor variables and groundwater response, the framework facilitates evidence-based groundwater management and supports long-term planning for urban infrastructure resilience. Furthermore, the interpretability analysis lays the foundation for future research comparing the diagnostic utility of LIME and SHAP, thereby contributing to the refinement of explainable ML applications in hydrological sciences and engineering.
STUDY AREA
Kuwait, located in the northwest corner of the Arabian Gulf in Western Asia, experiences a hot and dry climate. Rainfall is scarce, and most of the available water evaporates because of the high temperatures. Maximum temperatures in this region typically range from 25 to 45 °C, with extremes surpassing 50 °C. In contrast, the mild winter season (November–March) is characterized by average temperatures ranging from 5 to 15 °C. Daily weather data provided by the Kuwait International Airport Weather Station were used to train the proposed ML models. Kuwait City and its metropolitan area cover approximately 205 km2 and are primarily residential, with some commercial use. Kuwait has almost no surface water bodies, making its water supply scarce. Seawater desalination is the primary source of domestic water supply. Numerous desalination facilities have been constructed to meet the increasing water demand. Rapid economic development and population growth, with the population now at approximately 4.5 million, have exacerbated the region's severe water shortages. Most of the population resides in Kuwait City and its coastal suburbs, which constitute less than 2% of the nation's total land area.
The rapid expansion of the metropolitan area in Kuwait City has significantly altered the natural recharge of the subsurface. This alteration is attributed to the increase in impervious surfaces and the implementation of urban stormwater-drainage systems. Previous studies have reported decreasing rates of natural recharge using subsurface approaches (Al-Sanad & Shaqour 1991). However, urbanization has reduced evaporation from shallow water tables. GWL variations in the area of interest have been modeled using physically based (PB) numerical modeling frameworks (Hamdan & Mukhopadhyay 1991; Székely 1999) or statistically based periodic models (Almedeij & Al-Ruwaih 2006). PB models are limited in their ability to simulate fluctuations in GWLs owing to the complexity of predicting artificial recharge activities and the complex hydro-stratigraphic features of the study area. Additionally, the intricate and varied subsurface lithology of the aquifer system imposes further limitations when aquifer parameters are scarce. Periodic models also fail to provide reliable methods for predicting future GWLs because of discrepancies in their projections (Almedeij & Al-Ruwaih 2006).
METHODS
Support vector machine
Based on ML concepts, the SVM model is a data-driven methodology grounded in the theory of statistical learning, specifically the structural risk minimization hypothesis. The formulation of an SVM was first presented by Cortes & Vapnik (1995) as a reliable classification model and has gained interest across various study domains owing to its simple theoretical foundations and superior predictive power over other artificial intelligence (AI) methods. To facilitate the classification process within the feature space, the input variables are first transformed into a high-dimensional space. SVM employs a kernel approach to construct a linear classifier that addresses nonlinear classification issues. This method leverages AI strategies to determine the relationship between the data in the input and feature spaces. Finally, the error term is minimized within the model structure using three well-known mathematical concepts (Fermat, Lagrange, and Kuhn–Tucker). According to Cherkassky & Ma (2004), these theoretical underpinnings enhance the dependability of SVM algorithms.
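When an SVM is used for regression rather than classification, as with the linear SVM models in this study, a common formulation penalizes only errors larger than a tolerance epsilon. The following is a minimal sketch of that primal objective for a linear kernel, not the study's implementation; the function names, the values of `C` and `epsilon`, and the toy data are assumptions made for illustration.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(residual) - epsilon)


def svr_primal_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """0.5*||w||^2 + C * sum of epsilon-insensitive losses (linear kernel)."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    loss_term = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(wj * xj for wj, xj in zip(w, xi)) + b
        loss_term += epsilon_insensitive_loss(yi - prediction, epsilon)
    return margin_term + C * loss_term


# Hypothetical toy data: a single predictor with y = 2*x exactly.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 2.0, 4.0]

exact_fit = svr_primal_objective([2.0], 0.0, X, y)  # no loss, only the margin term
flat_fit = svr_primal_objective([0.0], 2.0, X, y)   # no margin term, only loss
```

Training an SVM regressor amounts to finding the `w` and `b` that minimize this objective; the box constraint `C` sets the balance between a flat function and tolerance to errors beyond epsilon.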









Gaussian process regression



A Gaussian process is a collection of random variables, any finite subset of which follows a joint Gaussian probability distribution. Methods such as maximum likelihood estimation and maximum a posteriori estimation are commonly used to optimize the regression performance. The choice of an appropriate covariance function for the training set is crucial because it significantly influences GPR performance. In this study, the target data (groundwater data) were used to identify the most suitable covariance function. During the training phase, various sets were evaluated to determine the optimal set for model construction. The selection of the GPR model for the current study is supported by its probabilistic structure with uncertainty quantification capability. Moreover, the GPR model does not require a predefined functional form, which makes it well suited for stochastic simulation of groundwater fluctuations (Deringer et al. 2021).
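To illustrate how a covariance function drives GPR predictions, the sketch below computes the posterior mean of a Gaussian process with a rational quadratic kernel. It is a simplified illustration under stated assumptions, not the study's implementation: the kernel hyperparameters are fixed rather than optimized, the constant basis function is approximated by the sample mean, and the helper names and toy data are hypothetical.

```python
def rational_quadratic(x1, x2, sigma_f=1.0, length=1.0, alpha=1.0):
    """k(x1, x2) = sigma_f^2 * (1 + r^2 / (2*alpha*length^2))^(-alpha)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * length ** 2)) ** (-alpha)


def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def gpr_predict_mean(X_train, y_train, X_new, noise_var=1e-6):
    """Posterior mean: y_bar + k(x*, X) (K + noise_var*I)^-1 (y - y_bar)."""
    y_bar = sum(y_train) / len(y_train)  # stands in for the constant basis
    K = [[rational_quadratic(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    weights = solve_linear_system(K, [yi - y_bar for yi in y_train])
    return [y_bar + sum(rational_quadratic(x, xt) * w
                        for xt, w in zip(X_train, weights))
            for x in X_new]


# Hypothetical toy data: one predictor, four observations.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.8, 0.9, 0.1]
fitted = gpr_predict_mean(X_train, y_train, X_train)
```

With a very small noise variance the posterior mean nearly interpolates the training targets; a larger `noise_var` would smooth the fit instead.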
RT model
RT models are ML methods that recursively partition data into groups in a statistical context. The simplicity and strong predictive power of this approach have led to its widespread application in hydrological process modeling (Wen et al. 2009; Wilkes et al. 2016). The algorithm first partitions the data into subsets by generating child nodes, thereby ensuring that the child nodes are more homogeneous than the parent nodes. The splitting process continues until further splitting does not improve the tree. For regression problems, the decision tree model uses the least-squares deviation criterion to predict the target variable, which is typically a continuous real number. Unlike other models that have black-box transfer functions, the decision tree enables visualization of how each variable influences the tree structure. Although smaller trees may achieve similar accuracy, the tree with the lowest cross-validation error was selected. RT was incorporated in the current study because of its interpretability and ability to handle nonlinear interactions between input variables without requiring extensive data preprocessing. RT is particularly effective in identifying dominant predictors, which aligns with the study's emphasis on model interpretability. Table 1 summarizes the models developed in this study.
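The least-squares splitting step described above can be sketched for a single predictor as follows. This is an illustrative sketch only; the function names, the `min_leaf_size` parameter, and the toy data are assumptions, and a full regression tree would apply the same search recursively to each child node and across all predictors.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (the least-squares criterion)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf_size=1):
    """Return (threshold, total SSE) of the best single split on feature x.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best = (None, sum_squared_error(y))
    for i in range(min_leaf_size, len(pairs) - min_leaf_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best[1]:
            threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
            best = (threshold, sse)
    return best


# Hypothetical toy data with an obvious break between x = 3 and x = 10.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
threshold, sse = best_split(x, y)
```

Each child node predicts the mean of its training targets, so the split that minimizes the summed squared error yields the most homogeneous children.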
Description of the ML models used for groundwater level modeling in urban aquifers of Kuwait City
Well | Model # | ML model |
---|---|---|
BN | m1 | Fine regression trees |
BN | m2 | Linear SVM |
BN | m3 | Rational quadratic GPR |
NZ | m1 | Fine regression trees |
NZ | m2 | Linear SVM |
NZ | m3 | Rational quadratic GPR |
JB | m1 | Fine regression trees |
JB | m2 | Linear SVM |
JB | m3 | Rational quadratic GPR |
HL | m1 | Fine regression trees |
HL | m2 | Linear SVM |
HL | m3 | Rational quadratic GPR |
Predictors: previous month groundwater depth (g_1); groundwater depth before 2 months (g_2); groundwater depth before 3 months (g_3); groundwater depth before 4 months (g_4); monthly rainfall (R); average monthly temperature (T).
Data preprocessing
In the proposed modeling approach, the initial step involved detrending the GWL data. The groundwater datasets exhibited a cumulative trend in the water table, indicating shallowing of the water level owing to increased artificial recharge activities. Detrending isolates the impact of external factors on the data, which is particularly crucial where dewatering or artificial recharge is involved, because the magnitude of these activities is challenging to estimate. This process significantly enhances the accuracy of the ML models.
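As an illustration of the detrending step, the sketch below removes a least-squares straight line from a series and returns the remainder. The text does not state which trend form was used, so the linear trend is an assumption, as are the function name and the sample values.

```python
def linear_detrend(series):
    """Remove a least-squares straight line from an evenly spaced series.

    Returns (remainder, slope, intercept), with time measured in steps.
    """
    n = len(series)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(series) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    remainder = [yi - (intercept + slope * ti) for ti, yi in zip(t, series)]
    return remainder, slope, intercept


# Hypothetical monthly depths to water table (m) with a shallowing trend.
depths = [10.0, 9.9, 9.7, 9.8, 9.5, 9.4, 9.3, 9.1]
remainder, slope, intercept = linear_detrend(depths)
```

The remainder has zero mean and no linear trend, so a model trained on it focuses on the fluctuations around the long-term change rather than on the change itself.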
LIME for local interpretability
LIME is a technique for fitting a surrogate glass-box model to the decision space of any black-box model's prediction. By focusing on a sufficiently limited decision surface, even simple linear models can yield accurate approximations of the black-box model behavior. The primary goal of LIME is to approximate the model in the local neighborhood of a given prediction. Users can then examine the glass-box model to understand how the black-box model behaves in that specific region. LIME generates synthetic data by perturbing individual data points; these synthetic points are then evaluated by the black-box model and used as a training set for the glass-box model. In other words, LIME examines how a model's predictions change when it is fed varying sets of data. LIME then trains an interpretable weighted model, such as a linear model, on this new dataset. The resulting explanation is locally faithful, a property known as local fidelity. The advantages of LIME are its applicability to nearly all models and its interpretability, which is similar to that of a linear model. However, these explanations are highly dependent on the perturbation process and can occasionally be unstable. LIME has been utilized for interpreting hydrological models with promising results (Perera et al. 2024).
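The perturb, weight, and fit procedure described above can be sketched in a few lines. This is a LIME-style illustration under stated assumptions, not the LIME package or the study's code: the Gaussian perturbation, the exponential proximity kernel, the parameter values, and the toy black-box function are all hypothetical choices.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def lime_local_slopes(black_box, instance, n_samples=500, sigma=0.3,
                      kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns one local slope per feature (the surrogate's coefficients).
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, sigma) for _ in instance]
        perturbed = [xi + di for xi, di in zip(instance, offsets)]
        dist2 = sum(di * di for di in offsets)
        rows.append([1.0] + offsets)                 # intercept + centered features
        targets.append(black_box(perturbed))         # query the black-box model
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer = heavier
    p = len(instance) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
         for i in range(p)]
    beta = solve_linear_system(A, b)
    return beta[1:]


def toy_black_box(x):
    """Hypothetical nonlinear model with local slopes (3, x[1])."""
    return 3.0 * x[0] + 0.5 * x[1] ** 2


slopes = lime_local_slopes(toy_black_box, [1.0, 2.0])
```

Near the instance (1, 2), the surrogate's slopes should be close to 3 and 2, the local sensitivities of the toy function. Changing `seed` or `sigma` alters the result slightly, which reflects the dependence on the perturbation process noted above.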
SHAP for global interpretability
To improve the interpretability of the ML models, this study used SHAP, a feature attribution method grounded in coalitional game theory. By estimating how the prediction changes when a feature is included or removed, SHAP computes Shapley values, that is, the average marginal contribution of each feature to the model output. Unlike other interpretability methods, SHAP provides global insight into model behavior and can reveal which features matter most across the entire dataset.
This study employed SHAP on all the trained ML models (SVM, GPR, and RT) to assess the relative impact of the predictors, namely past groundwater levels (g_1, g_2, g_3, and g_4), temperature (T), and rainfall (R). The generated SHAP summary plots visually represent feature importance, where predictors with higher absolute Shapley values contribute more significantly to GWL predictions. Moreover, the color gradient in the plots indicates the variation in predictor magnitudes, offering further insight into feature distributions. Combining SHAP with the LIME-based local interpretability results yields a complete multiscale interpretability framework. While LIME offers instance-specific explanations, SHAP provides global feature importance and dependencies, which helps in identifying the dominant predictors and interactions that control groundwater dynamics. Together, the two methods improve both the transparency of the ML models and the basis for groundwater decision-making by providing localized and holistic insights into the models' predictive mechanisms. Recent studies have successfully integrated SHAP for interpreting environmental variables (Makumbura et al. 2024; Mishra et al. 2025).
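For a handful of features, Shapley values can be computed exactly by enumerating every coalition. The sketch below does this for a toy model. It is an illustration only: features outside a coalition are replaced with a single baseline value, whereas SHAP implementations typically average over a background dataset, and the model, instance, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(instance)
    features = range(n)

    def value(coalition):
        x = [instance[j] if j in coalition else baseline[j] for j in features]
        return model(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical model of two predictors with an interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1]


instance, baseline = [1.0, 3.0], [0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features. Averaging the absolute values over many instances gives the kind of global importance ranking shown in a SHAP summary plot.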
Performance measures
The generated models were validated using the conventional method of splitting the observed data into training and validation subsets. All ML models were constructed using 80% of the data for model training and 20% for model validation. To evaluate the performance of the ML models, three performance metrics, the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE), were computed for each of the three validation rounds. R2 assesses the degree to which the simulated and observed targets are associated. The R2 value ranges from zero to one, where zero denotes the absence of any statistical relationship and one represents an exact match between the simulated and observed targets. R2 is computed as the square of the Pearson correlation coefficient; it therefore measures the strength of the linear association between the simulated and observed values rather than the magnitude of the prediction errors.
MAE served as the second criterion in this study. The MAE quantifies the average absolute deviation of the simulated targets from the observations. The RMSE was the third criterion used to evaluate the effectiveness of the model. The RMSE is the square root of the mean squared difference between the values predicted by the model and the observed values. Mathematically, the RMSE represents the standard deviation of the residuals, which are the differences between the observed data points and the model predictions. The RMSE indicates the spread of the residuals, reflecting how well the predictions fit the observed data. A lower RMSE value indicates that the predictions are closer to the observations, suggesting a more accurate model. The RMSE values are expressed in units of the dependent variable and can range from zero to positive infinity.
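The three metrics can be computed directly from paired observed and simulated values. The sketch below follows the definitions given above, with R2 taken as the squared Pearson correlation; the function names and sample values are hypothetical.

```python
import math


def mean_absolute_error(observed, simulated):
    """Average absolute deviation between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def root_mean_square_error(observed, simulated):
    """Square root of the mean squared deviation."""
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse)


def r2_squared_pearson(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Hypothetical observed and simulated groundwater remainders (m).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

mae = mean_absolute_error(observed, simulated)
rmse = root_mean_square_error(observed, simulated)
r2 = r2_squared_pearson(observed, simulated)
```

Because the squared Pearson correlation ignores systematic bias, it is best read alongside the MAE and RMSE, which carry the units of the groundwater variable.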
RESULTS AND DISCUSSION
ML model results
The autoregressive term in the current study was limited to four months, even though all wells exhibited higher-order autocorrelation. Limiting the autoregressive term to four months avoids model overfitting and provides adequate short-term GWL forecasts. The remainders of the groundwater signals were then fed into the different ML models, as shown in Figure 2. The data were chronologically divided into training and validation subsets for analysis. A total of 80% of the datasets were used for training, and the remaining 20% were used for model validation. The initial hyperparameters for all the ML models are listed in Table 2. The hyperparameters were carefully assigned to boost the efficacy of the models, avoid overfitting, and reduce the computation time. Hyperparameter selection was guided by model performance in the validation stage using RMSE as the primary criterion, with an emphasis on achieving both high accuracy and generalization across wells. This process allowed for a balanced trade-off between complexity and performance, ensuring that the final models were both efficient and robust.
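The input construction described above (autocorrelation check, four lagged groundwater predictors plus rainfall and temperature, and a chronological 80/20 split) can be sketched as follows. This is a minimal sketch under assumptions: the function names and synthetic series are hypothetical, and pairing the same-month rainfall and temperature with each target is a choice made here for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def make_lagged_dataset(remainder, rainfall, temperature, max_lag=4):
    """Rows are [g_1, ..., g_max_lag, R, T]; targets are the current remainder."""
    X, y = [], []
    for t in range(max_lag, len(remainder)):
        lags = [remainder[t - k] for k in range(1, max_lag + 1)]
        X.append(lags + [rainfall[t], temperature[t]])
        y.append(remainder[t])
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Keep time order: the earliest rows train, the latest rows validate."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Hypothetical monthly series, 24 months long.
remainder = [0.1 * ((i % 12) - 6) for i in range(24)]
rainfall = [0.0] * 24
temperature = [20.0 + (i % 12) for i in range(24)]

acf = [autocorrelation(remainder, lag) for lag in range(1, 5)]
X, y = make_lagged_dataset(remainder, rainfall, temperature)
X_train, y_train, X_val, y_val = chronological_split(X, y)
```

A chronological split keeps every validation sample later in time than the training samples, which avoids leaking future information into training, something a random split of an autocorrelated series would do.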
Hyperparameters for ML model construction
ML model | Hyperparameter | Value |
---|---|---|
Regression tree | Preset | Fine tree |
Regression tree | Min. leaf size | 4 |
Regression tree | Surrogate decision split | Off |
SVM | Kernel function | Linear |
SVM | Kernel scale | Automatic |
SVM | Box constraint | Automatic |
SVM | Standardize data | Yes |
GPR | Preset | Rational quadratic GPR |
GPR | Basis function | Constant |
GPR | Kernel function | Rational quadratic |
GPR | Optimize numeric parameters | Yes |
GPR | Standardize data | Yes |
Observed (true) versus predicted groundwater levels remainder for all wells.
Statistical performance metrics for different wells in urban aquifers of Kuwait City (validation period)
Well | Model # | R2 | MAE (m) | RMSE (m) |
---|---|---|---|---|
BN | m1 | 0.85 | 0.088 | 0.106 |
BN | m2 | 0.95 | 0.052 | 0.063 |
BN | m3 | 0.95 | 0.052 | 0.064 |
NZ | m1 | 0.83 | 0.097 | 0.125 |
NZ | m2 | 0.98 | 0.029 | 0.038 |
NZ | m3 | 0.92 | 0.063 | 0.085 |
JB | m1 | 0.85 | 0.053 | 0.065 |
JB | m2 | 0.93 | 0.034 | 0.044 |
JB | m3 | 0.92 | 0.039 | 0.049 |
HL | m1 | 0.75 | 0.112 | 0.145 |
HL | m2 | 0.85 | 0.085 | 0.114 |
HL | m3 | 0.88 | 0.077 | 0.102 |
Best model for each well is indicated in bold.
Error distribution of model predictions against true groundwater levels.
Interpreting ML models using LIME and SHAP
A LIME approach was employed in this study to enhance the local interpretability of the ML models. This investigation confirmed the significance of groundwater at the previous time step in predicting the current GWLs on a monthly time scale. The influence of meteorological inputs, such as rainfall and temperature, was of secondary importance. LIME highlights the dependence of predictions on one or two features owing to the strong temporal autocorrelation in groundwater levels, where past values heavily influence future predictions. In arid climates such as Kuwait, natural recharge is minimal, and artificial recharge (e.g., irrigation) dominates, making temperature and previous groundwater levels the most influential factors. ML models optimize predictions based on data variability and naturally prioritize highly correlated inputs. Additionally, LIME provides local explanations, meaning that it identifies the most significant features for a specific instance rather than across the entire dataset. This result aligns with the hydroclimatic features of the study area, where natural recharge and natural evaporation from the water table are limited owing to the few rainfall events and the high percentage of surface imperviousness. In addition, from a theoretical perspective, LIME explanations align with hydrological and ML theories. The strong temporal autocorrelation of GWLs is attributed to the slow movement of subsurface water, as explained by Darcy's equation. More specifically, and considering the arid climatology of the study site, artificial forcing primarily controls recharge (Almedeij & Al-Ruwaih 2006; Alsumaiei 2020). On the ML side, models optimize predictions by prioritizing dominant predictors, consistent with Vapnik's Statistical Learning Theory (Cortes & Vapnik 1995).
LIME's local approximation relies on the Locally Weighted Learning principle, which gives more weight to temporally adjacent values in autoregressive data. These theoretical foundations justify why past groundwater levels and temperature dominate the LIME results, reinforcing the model's reliability in hydrological forecasting.
The LIME approach is based on the assumption that the decision boundary of a complex ML model is linear around the instance for which the explanation should be provided. It works by training an interpretable model on a perturbed sample around an instance of interest and provides an explanation for the observed prediction. Specifically, LIME generates a perturbed sample around the instance for which an explanation is required. Subsequently, LIME obtains the black-box model's prediction for each instance in the perturbed sample. The perturbed sample and the corresponding predictions are then used as the training dataset for the interpretable model. Thereafter, the approach assigns weights to the examples in the newly formed training dataset depending on how close these examples are to the instance being explained. Finally, LIME uses the weighted training dataset to fit an interpretable model. This embedded algorithm ensures that the input features are well explained within the ML modeling framework.
SHAP summary plots illustrating global feature importance for all models.
Models such as BN-m1 and JB-m3 are cases where the longer-lag groundwater predictors (g_3 and g_4) are more influential; here SHAP is a better option than LIME because it can capture feature interactions. This implies that some models depend more strongly on longer-term effects, which could be overlooked by LIME because its explanations are local in scope. However, SHAP is more computationally intensive because it evaluates the model more often than LIME, which is faster, provides instance-based explanations, and requires fewer computational resources.
SHAP and LIME complement each other; together, they provide better overall interpretability of the ML models. SHAP provides global insight into feature importance, identifying the dominant drivers of the groundwater trend. In contrast, LIME is more applicable for localized decision-making because it provides case-specific explanations. The results from both methods align, which strengthens confidence in the reliability of the ML framework and supports its suitability for groundwater forecasting in urban aquifers. From a computational perspective, SHAP is more demanding than LIME, as it evaluates numerous feature combinations to estimate feature contributions. LIME, on the other hand, generates faster, instance-level explanations at lower computational cost, making it more suitable for real-time or operational use. While SHAP offers broader insight during model evaluation, LIME provides a quicker and more practical option for everyday decision-making.
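The computational cost and the treatment of feature interactions discussed above can be illustrated with an exact Shapley value calculation. The sketch below enumerates every feature coalition, which is feasible only for a handful of inputs; practical SHAP implementations rely on approximations. The three-input `toy_model`, the instance `x`, and the zero `baseline` are hypothetical and are not taken from the models in this study.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function value_fn(frozenset)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def toy_model(v):
    """Hypothetical model with an interaction between input 0 and input 2."""
    return 0.8 * v[0] + 0.1 * v[1] + 0.05 * v[0] * v[2]

x = [1.0, 0.5, 2.0]          # instance to explain (assumed values)
baseline = [0.0, 0.0, 0.0]   # reference input for "absent" features (assumed)

def value_fn(coalition):
    # Features in the coalition take their instance value, others the baseline.
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(z)

phi = shapley_values(value_fn, n_features=len(x))
print(phi)
```

The contributions sum to the difference between the prediction at the instance and the prediction at the baseline, and the interaction term is shared equally between the two inputs involved. The number of coalitions doubles with every added feature, which is the source of the higher cost of SHAP relative to LIME.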
Comparisons with previous studies
The findings of this study using the ML modeling approach closely align with those of previous studies in the field (Sahoo et al. 2017; Kardan Moghaddam et al. 2021; Pham et al. 2022; Tao et al. 2022; LaBianca et al. 2024). The results confirmed that ML can provide an effective model for simulating changes in the GWLs. The R2 values for the wells examined in this study ranged from 0.75 to 0.98, which is consistent with or even surpasses the values reported in other studies using similar ML approaches within similar hydrologic settings. For instance, LaBianca et al. (2024) reported R2 values between 0.4 and 0.7 for modeling GWL fluctuations with different ML schemes applied to an urban aquifer in Denmark. The higher values obtained here can be attributed to the differences in recharge drivers between the present study area and the study location examined by LaBianca et al. (2024). Considering the Kuwait City urban aquifer specifically, the SVM and GPR models developed in the present study achieved R2 values of 0.98, which are comparable to the performance of the NARX models reported by Alsumaiei (2020), where R2 ranged between 0.76 and 0.99. However, the proposed framework introduces interpretability to these black-box models, an aspect not previously addressed in the study area, by quantitatively explaining how lagged groundwater levels influence current water table dynamics. In comparison, the numerical model developed by Alkandari & Alsumaiei (2025) reported a best-case RMSE of 1.13 m, whereas the maximum RMSE across all wells in the current study did not exceed 0.145 m. Notably, the numerical model was highly dependent on detailed aquifer parameters and boundary condition calibration, while the ML framework relied solely on time series inputs. These results underscore the data efficiency, robustness, and practical strength of the proposed approach in simulating groundwater fluctuations in hyper-arid, anthropogenically influenced environments.
A fundamental consideration when using ML models to simulate hydrological processes is the setting in which the models are applied. Unfortunately, ML model applications for simulating GWL dynamics in urbanized aquifer systems under arid climatic conditions have not been thoroughly explored, except in a limited number of studies (e.g., LaBianca et al. 2024). This study addresses this gap and demonstrates the promising results of ML applications in such aquifer systems. Additionally, to the best of the author's knowledge, this study presents the first attempt to utilize interpretable techniques to explain the ML model predictors. According to Khan et al. (2023), none of the studies using ML techniques or physical models to forecast GWLs published between 2008 and 2022 used LIME techniques to explain model forcing. The efficiency metrics of the current study were also found to surpass those of the statistical models applied to the same study area (Almedeij & Al-Ruwaih 2006). The MAE for the wells investigated in this study was nearly 50% lower than that obtained using the periodic statistical models.
The ML approach accurately reflected GWL variability, with minimal differences between modeled and observed values. Urban water managers in the study area can utilize the presented ML approach to support decisions related to groundwater table control, as it has proven to be more effective in capturing the impact of artificial recharge than other statistical or PB methodologies implemented in the same region. Furthermore, the ML model can predict changes in GWLs without requiring detailed measurements of the aquifer parameters. Such measurements often do not exist or only partially characterize aquifer heterogeneity. This advantage was evident when comparing the findings of the current study with the outcomes of numerical simulations conducted in the study area (Hamdan & Mukhopadhyay 1991; Alkandari & Alsumaiei 2025). In contrast, the proposed method requires only meteorological and groundwater data, both of which are readily available. Therefore, the proposed method can be easily implemented in similar aquifer systems.
Although hybridization with meta-heuristic optimization has been found to be a sound approach for enhancing the efficacy of predictive models in numerous previous studies, the current study focuses on the interpretability of ML model outcomes. Emerging hybrid ML models include the long short-term memory (LSTM) neural network coupled with the Ant Lion Optimizer (LSTM-ALO), the LSTM coupled with the INFO optimization algorithm (LSTM-INFO), and the support vector machine combined with the firefly algorithm and particle swarm optimization (FFAPSO). The absence of such hybridization is considered a drawback of the current study. Nevertheless, considering the simplicity of this modeling framework for practical purposes, the modeling results can be considered acceptable. In many cases, hybrid approaches require considerable computational capacity, which is not always available. To further improve the effectiveness of the model, this study recommends that future research be directed toward optimizing the ML models.
Model shortcomings and generalizability
Despite the validity of using ML models, several shortcomings must be considered. First, the selection of hyperparameters for the ML models is largely arbitrary. The hyperparameters were determined through trial and error, which does not guarantee that the selected values represent a globally optimal solution. Second, overfitting can cause an ML model to perform poorly in forecasting. Overfitting occurs when the ML model fits the noise component of the data during the training phase rather than the actual data pattern, leading to a significant decline in performance during the validation phase. The performance metrics obtained during the validation period suggest that overfitting was not a major issue in this study. Nevertheless, a more comprehensive analysis of the models' tendency to overfit would be more insightful and should be conducted in future studies. To verify the generalizability of the results, the monthly GWLs at four monitoring wells were modeled using the ML-based approach. Data were collected over a temporal scale that was considered sufficient for model testing and generalizability assessment. Comparable statistical metrics across all wells indicate that the models performed consistently, highlighting the robustness of the developed ML-based models. Therefore, the proposed modeling framework can be applied to regions with similar climates. If the modeling framework is to be used in a different climate, it would need to be recalibrated and its parameters reevaluated. It is also recommended that the evaluation of model generalizability be expanded to include additional data-driven models.
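A simple way to screen for the overfitting described above is to compare the same efficiency metrics on the training and validation periods of a chronological split. The sketch below is illustrative only: the synthetic series, the 70/30 split, the one-lag linear model standing in for the SVM, GPR, and RT models, and the use of the squared Pearson correlation as R2 are all assumptions.

```python
import numpy as np

def metrics(obs, sim):
    """R2 (squared Pearson correlation, an assumed definition), RMSE, and MAE."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = sim - obs
    return {
        "R2": float(np.corrcoef(obs, sim)[0, 1] ** 2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }

# Synthetic, autocorrelated monthly series (not data from this study).
rng = np.random.default_rng(1)
n = 120
g = np.zeros(n)
for t in range(1, n):
    g[t] = 0.9 * g[t - 1] + rng.normal(scale=0.05)

# One-lag input and target; chronological split without shuffling.
X, y = g[:-1], g[1:]
split = int(0.7 * len(y))  # assumed 70% training, 30% validation

# Stand-in model: linear regression on the previous value, fitted on training data only.
slope, intercept = np.polyfit(X[:split], y[:split], 1)
train = metrics(y[:split], slope * X[:split] + intercept)
valid = metrics(y[split:], slope * X[split:] + intercept)

# A validation error much larger than the training error would point to overfitting.
print("train:", train)
print("valid:", valid)
print("RMSE gap:", valid["RMSE"] - train["RMSE"])
```

A validation RMSE or MAE that is far above the training value, or a sharp drop in R2, would warrant a closer look, for example through repeated splits with different cut-off dates.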
One limitation of the current research is the lack of ML hybridization, that is, coupling the ML model with a complementary algorithm to predict groundwater fluctuations. Hybridization can significantly enhance prediction accuracy and reliability. Standalone ML models, as presented in the current study, showed reliable efficacy and adequacy in capturing nonlinear relationships, but their performance might deteriorate for long-term dependencies and regional groundwater flow patterns. These gaps can be bridged by hybrid approaches that combine different modeling strengths. Integrating DL-based hybrid models (e.g., LSTM-ALO and CNN-SVM) can improve accuracy by capturing long-term dependencies in groundwater fluctuations. Furthermore, optimizing feature selection using genetic algorithms (GA) or particle swarm optimization (PSO) can enhance the efficiency of the model. In addition, ensemble methods, such as combining random forest (RF) with gradient boosting (GB), can reduce errors by aggregating multiple ML predictions, thereby improving overall accuracy.
Practical applications and policy implications
The findings of this study provide critical insights into groundwater management in urban aquifers. In particular, they clarify the process of groundwater table recharge in arid climates, where artificial recharge forcing significantly impacts water table dynamics. The high predictive accuracy of the ML models (R2 = 0.75–0.98) demonstrates their practical viability for assisting urban planners and policymakers in making informed decisions regarding groundwater management. The foundational policy implication of this study is the need for real-time monitoring and adaptive management of shallow water tables. The interpretability analysis showed that the current GWL was influenced far more strongly by the previous GWL than by rainfall. This finding provides municipal authorities with scientific guidance to prioritize long-term groundwater monitoring and enhanced dewatering strategies rather than relying on traditional hydrometeorological assumptions. Given that artificial recharge, primarily from irrigation and leakage, was identified as a major driver of groundwater fluctuations, policymakers should consider regulations to control excessive water use in urban landscaping and to reduce infrastructure leakage. Additionally, the ML framework developed in this study offers a scalable and transferable approach for urbanized arid regions facing similar groundwater management challenges. The ability of the ML models to outperform conventional statistical methods, reducing the MAE by approximately 50% (Table 3), further supports their integration into national water management frameworks. Moreover, the interpretability of the ML models through LIME and SHAP provides a transparent, explainable AI-based decision-support tool, fostering greater trust among stakeholders, including hydrologists, engineers, and policymakers. This enhances cross-disciplinary collaboration for developing sustainable groundwater policies and urban planning strategies.
Advancing groundwater modeling: overcoming the limitations of traditional approaches with ML
Traditional groundwater models, such as numerical models (e.g., MODFLOW), require extensive parameterization of aquifer properties, such as hydraulic conductivity and storage coefficients (Alsumaiei & Bailey 2018a, b). These data can be scarce or difficult to obtain, particularly in urban and arid aquifers with complex hydrogeological conditions. Moreover, numerical models require large amounts of computational resources and long simulation run times, which makes them impractical for real-time groundwater management. Additionally, they rely on simplifying assumptions, such as homogeneous aquifer conditions, that do not account for the spatial variability of artificial recharge sources, namely irrigation and leakage. In addition, these models do not adapt; they require manual recalibration when environmental conditions change (e.g., urbanization or climate variability), making them unsuitable for dynamic groundwater systems.
ML models offer a data-driven alternative that addresses the above challenges. In contrast to physical models, ML models require only time series data of groundwater levels and meteorological variables; therefore, they are appropriate for places with low data availability. In this study, SVM, GPR, and RT models were developed, demonstrating high predictive accuracy (R2 = 0.75–0.98) that surpassed that of traditional statistical models. These ML models learn the nonlinear relationships governing groundwater fluctuations without assuming an explicit form of the system dynamics. Moreover, ML models are computationally efficient and can be updated in real time with new data. Unlike traditional models, the ML models can be paired with LIME-based interpretability, which allows urban planners to identify the most significant factors causing groundwater fluctuations. By combining ML models with groundwater management frameworks, decision-makers can achieve more adaptive and accurate groundwater control and, thus, more sustainable urban planning in arid environments. For long-term groundwater forecasts, model robustness could be further improved through additional research on hybrid ML-physical modeling and DL methods.
Broader applicability of the proposed framework to other urban and arid/semi-arid regions
The ML-based groundwater modeling framework developed in this study is highly adaptable and can be applied to other urban and arid/semi-arid regions that experience similar hydrological challenges. Urban aquifers worldwide, particularly in arid and semi-arid environments, face groundwater fluctuations driven by artificial recharge (e.g., irrigation and leaks), limited natural infiltration, and urbanization-induced changes. The proposed framework, which integrates SVM, GPR, and RT models with interpretability, offers a scalable and transferable approach to groundwater management. Regions such as the Arabian Gulf, Mediterranean coastal cities, arid regions in the United States, and parts of Australia face comparable challenges with shallow water tables, urban expansion, and climate variability. As the framework requires minimal input data (past GWL records and meteorological variables), it is particularly useful for data-scarce environments where traditional physical models are impractical. However, such applications should fine-tune the model parameters during training to tailor the models to the unique regional hydrogeological characteristics. Furthermore, incorporating remote sensing data and integrating hybrid ML-physical models could enhance predictive accuracy for city planners. Expanding this approach to different climatic zones can further validate its effectiveness for global groundwater management.
SUMMARY AND CONCLUSIONS
This study investigated the applicability of interpretable ML models for simulating changes in GWLs in dry and urbanized aquifer systems, with a specific focus on Kuwait City. A computational framework was developed and applied to selected groundwater wells in the study area. A linear detrending technique was employed to preprocess the GWL time series data. Autocorrelation analysis of the detrended groundwater remainder time series revealed high autocorrelation coefficients. The modeling approach used the detrended groundwater data to construct the SVM, GPR, and RT models for the examined wells. Owing to the distinct hydrological processes within the study area, artificial recharge sources are the primary cause of shallow water table development. Six predictors were used as forcing inputs to the different ML models. The analysis revealed that the GWLs in the previous month were the most influential predictors of the current GWLs. The LIME technique was employed to interpret the ML model results based on the hydro-lithological features of the study area. In addition, SHAP analysis was conducted to extend the interpretability of the ML models to a global scale. The efficiency of the proposed procedure was evaluated by applying a conventional chronological division of the groundwater data into training and validation subsets.
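The preprocessing summarized above (linear detrending followed by autocorrelation analysis to choose the input delays) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the synthetic series is invented, and selecting lags whose autocorrelation exceeds the approximate 95% bound of 1.96/sqrt(n) is one common rule that is assumed here rather than taken from the paper.

```python
import numpy as np

def linear_detrend(series):
    """Remove a least-squares linear trend; return the remainder and the slope."""
    y = np.asarray(series, float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept), float(slope)

def autocorr(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:x.size - k]) / denom
                     for k in range(max_lag + 1)])

def significant_lags(acf, n):
    """Lags whose coefficient exceeds the approximate 95% bound (assumed rule)."""
    bound = 1.96 / np.sqrt(n)
    return [k for k in range(1, acf.size) if abs(acf[k]) > bound]

# Synthetic monthly GWL series: rising linear trend plus an autocorrelated remainder.
rng = np.random.default_rng(0)
n = 144
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + rng.normal(scale=0.05)
gwl = 2.0 + 0.01 * np.arange(n) + noise

remainder, slope = linear_detrend(gwl)
acf = autocorr(remainder, max_lag=6)
lags = significant_lags(acf, n)
print("estimated trend slope per month:", round(slope, 4))
print("candidate input delays (months):", lags)
```

The lags returned by `significant_lags` would then define how many previous monthly GWL values are supplied to each model as inputs.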
Longer groundwater data inputs to the ML models were shown to improve model performance. The R2 values for the wells examined in this study ranged from 0.75 to 0.98 during the validation period. Comparisons with other GWL modeling techniques applied at the research site demonstrated that the ML-based approach surpassed statistical and other PB models, with a notable 50% decrease in MAE compared with that of statistical periodic models. In contrast to the PB models, the proposed ML model is user-friendly and does not require detailed field data. The findings of this study align with those of previous studies on modeling changes in GWLs using AI techniques. Further research on groundwater resources in arid regions is crucial for developing effective water management strategies, particularly in areas where groundwater is the primary accessible water source. Although the methodology was tested on a limited number of wells in Kuwait City, the modeling approach could be applicable to other urban aquifer settings, provided that hydro-lithologic similarities are taken into account. This study underscores the potential of interpretable ML models as robust tools for urban groundwater forecasting in arid environments. The findings highlight the viability of ML-based approaches as data-efficient, adaptable alternatives to traditional groundwater models, supporting informed water resource management strategies.
FUNDING
The author declares that no funds, grants, or other support were received during the preparation of this manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The author declares there is no conflict.