This study presents the first attempt to develop interpretable machine learning (ML) models for simulating groundwater fluctuations in urbanized aquifers in rainfall-scarce regions. The ML-based modeling approach was designed to provide urban water managers with a reliable tool for controlling the development of shallow water tables resulting from artificial recharge. Support vector machine, Gaussian process regression, and regression tree models were constructed to simulate historical groundwater levels (GWLs) in four wells in Kuwait City. Groundwater data preprocessing was conducted to isolate the effects of artificial recharge activities and improve the performance of the ML models. The detrended GWLs were autocorrelated to determine the input delays for the ML models. The Local Interpretable Model-agnostic Explanation (LIME) technique and SHapley Additive exPlanations (SHAP) were utilized to interpret the models' outcomes. The R² values for the wells examined in this study ranged from 0.75 to 0.98 during validation. The outcomes of the techniques employed revealed that the ML-based approach was superior to other frameworks, with a 50% decrease in the mean absolute error compared to statistical models. The findings of this study provide urban planners in arid regions with a useful strategy for managing shallow water tables.

  • The study presents a novel interpretable machine learning model for groundwater dynamics in urban arid regions.

  • The proposed explainable groundwater modeling framework aids urban water managers and planners.

  • Machine learning models outperformed traditional methods in simulating groundwater levels.

  • The presented methodology provides a user-friendly and effective tool for managing shallow water tables in arid regions.

Groundwater is the primary source for meeting the daily water needs of 2.5 billion individuals globally (Klein Goldewijk et al. 2010). Over 50% of the global population depends on groundwater for potable use (Bierkens & Wada 2019). Additionally, a significant portion of the world's irrigation water is derived from groundwater (Winpenny et al. 2010; Vyas et al. 2024). Combined with issues such as population growth, deteriorating groundwater quality, and climate change, these facts underscore the importance of improving the use, management, and sharing of water (Connor 2015; Jodhani et al. 2025). In many regions, groundwater acts as a buffer during periods of surface water scarcity, playing a vital role in maintaining water supply resilience amid increasing climate variability. Therefore, research efforts on accurate and reliable groundwater level (GWL) forecasts are essential in this context. This research direction can serve as a foundation for management decisions and plans by providing valuable quantitative data on groundwater availability.

Deep learning (DL) approaches have recently shown great potential and are increasingly being integrated into various scientific fields, including the water sciences (Shen 2018). Within this context, machine learning (ML) techniques are used in various water science applications to address the correlation between applicable input and influential system forcings, such as runoff (Rathnayake et al. 2023) and water table depth (Alsumaiei 2020), without constructing mathematical models or explicitly defining physical relationships. Classical groundwater models often fall short, either by being overly simplistic or, in the case of numerical models, by requiring vast amounts of data, presenting significant challenges, incurring high setup and maintenance costs, or requiring substantial effort. In contrast, data-driven approaches have been successful in various research areas, including surface water studies (Maier et al. 2010; Alsumaiei 2024a) and GWL applications (Rajaee et al. 2019; Alsumaiei 2020). Although DL was initially adopted gradually in water science (Shen 2018), it is now poised for significant growth, as evidenced by the steadily increasing number of publications related to DL and water resources (Abed et al. 2022; Barrera-Animas et al. 2022; Alsumaiei 2024b; 2025a, 2025b; Jiang et al. 2024).

To meet the growing water needs of the global population, governments and urban planners are seeking methods to precisely forecast GWLs, which are critical for developing new urban areas. Representing groundwater systems and their responses to climatic variables is difficult owing to their nonlinear characteristics. However, modern modeling approaches using ML algorithms provide a more feasible solution for predicting GWLs because they bypass the need to understand the system's physical properties. ML algorithms use mathematical principles to identify optimal functions from given data and learn new patterns from incoming data. Over the last decade, numerous studies have employed ML algorithms, such as artificial neural networks (ANNs), support vector machines (SVMs), and genetic programming, to model groundwater and predict variations in its levels. These studies demonstrated the potential of ML models compared to physical modeling techniques. Standalone ML models exhibit significant performance variations, particularly regarding lead times. To enhance performance, researchers in both academia and industry have created innovative models by integrating various algorithms with ML models. For example, a wavelet transformer can break down a data series into its essential components and capture most of the information (He et al. 2014; Mohammed et al. 2025). These components are then fed into a neural network model to predict the outcomes more accurately. This approach can significantly improve prediction accuracy by identifying critical information and excluding noisy data. Despite the potential of hybrid models to outperform standalone models, a trade-off is expected between the time required to train the model and its overall performance. Therefore, there is a need to develop a framework capable of achieving high performance without being time intensive.

While these advancements have significantly improved GWL forecasting in general, modeling groundwater in urban environments presents an additional layer of complexity. Urban groundwater systems are influenced by a unique interplay of natural and anthropogenic processes, making them more difficult to characterize using conventional or even hybrid ML models. Enhancing the understanding of water movement through urban landscapes remains a major challenge in urban hydrological modeling. There are several major limitations to urban hydrological modeling that hinder its progress, as follows:

  • Limited capacity to represent processes occurring in both engineered and natural systems.

  • A general lack of understanding of how processes change over time and space in urban locations.

  • A lack of data to characterize local heterogeneity at the catchment or city levels (LaBianca et al. 2023; Oswald et al. 2023).

ML is expected to play a major role in advancing the understanding of hydrology and making significant strides in hydrological predictions, forecasts, and downscaling (Nearing et al. 2021). However, the restricted availability of groundwater data compared to that of the surface water domain leads to lower expectations for ML applications in groundwater predictions (Nearing et al. 2021). One of the main obstacles to the development of ML in the groundwater domain is the lack of data on shallow GWLs (Ma et al. 2021). Despite these inherent challenges, various ML frameworks have proven effective in modeling shallow groundwater dynamics. These frameworks include ANNs, random forest (RF), SVMs (Wu et al. 2023), bagging decision trees (Gupta et al. 2024), recurrent neural networks, and long short-term memory (Bowes et al. 2019). Few ML approaches have been specifically tailored for urban environments. For example, Yadav et al. (2020) used ML frameworks to forecast monthly GWLs based on observations from 24 wells in an urban groundwater-stressed district in India. Gonzalez & Arsanjani (2021) found that several ML framework approaches underestimated shallow GWLs in the Danish Capital Region by using a small number of long time series to predict the influence of climate change.

The primary objective of this study is to build on previous efforts to forecast GWL fluctuations in urban aquifer systems using ML-based approaches. While ML has shown promise in various groundwater applications, the unique hydrological and infrastructural challenges of urban environments in hyper-arid climates hinder modeling outcomes. These challenges are particularly evident in the case of the Kuwait City urban aquifer system, where rapid urbanization, high population density, data scarcity, and limited surface water availability place immense pressure on shallow groundwater resources.

In this hyper-arid setting, anthropogenic activities such as excess irrigation, infrastructure leakage, and stormwater mismanagement have significantly altered subsurface water fluxes. These factors contribute to artificial recharge that raises shallow water tables, increasing the risk of geotechnical instability beneath the city. Without sustained groundwater management and carefully implemented dewatering practices, there is a tangible threat of partial aquifer collapse. This, in turn, endangers vital infrastructure including foundations, roads, sewer networks, and water distribution systems. Traditional dewatering methods rely on limited and shallow observations, often failing to account for deeper aquifer behavior. Therefore, robust modeling approaches such as ML applied to the Kuwait Group Formation are essential for capturing spatiotemporal variability and guiding sustainable groundwater strategies. These tools are critical for ensuring the long-term resilience of large-scale infrastructure.

Another principal objective of this study is the development of an interpretable ML-based decision-support framework for regulating GWL fluctuations within the shallow aquifers of Kuwait City. The selected study area is characterized by extreme hydroclimatic conditions and significant anthropogenic modification of subsurface hydrological processes, particularly due to artificial recharge resulting from irrigation practices and infrastructure leakage. These factors exert a pronounced influence on aquifer dynamics, yet they remain under-represented in traditional physically based and even many data-driven groundwater models. The complexity and spatial heterogeneity of these processes necessitate advanced modeling techniques capable of both predictive accuracy and interpretability to facilitate informed management interventions. A persistent limitation of many ML approaches in hydrological modeling is their ‘black-box’ nature, which inhibits transparency and limits the trust and adoption of such models by practitioners and decision-makers. To address this gap, the present study proposes an interpretable ML framework tailored specifically for shallow urban aquifer systems in hyper-arid regions. The framework incorporates SVMs, Gaussian process regression (GPR), and regression tree (RT) models, algorithms selected for their robustness under nonlinear and sparse-data conditions. To address the interpretability component, the modeling framework is integrated with two state-of-the-art explanatory tools: Local Interpretable Model-Agnostic Explanation (LIME) and SHapley Additive exPlanations (SHAP). These tools provide both local and global insight into feature importance, enabling the identification of key hydroclimatic and anthropogenic drivers influencing groundwater dynamics.

To the best of the author's knowledge, this study represents the first application of LIME and SHAP in the context of shallow urban groundwater forecasting under hyper-arid climatic conditions. The proposed framework offers a novel, data-efficient, and transferable approach for interpreting and forecasting groundwater fluctuations in complex urban settings. By improving model transparency and elucidating the relationships between predictor variables and groundwater response, the framework facilitates evidence-based groundwater management and supports long-term planning for urban infrastructure resilience. Furthermore, the interpretability analysis lays the foundation for future research comparing the diagnostic utility of LIME and SHAP, thereby contributing to the refinement of explainable ML applications in hydrological sciences and engineering.

Kuwait, located in the northwest corner of the Arabian Gulf in Western Asia, experiences a hot and dry climate. Rainfall is scarce, and most of the available water evaporates because of the high temperatures. Maximum temperatures in this region typically range from 25 to 45 °C, with extremes surpassing 50 °C. In contrast, the mild winter season (November–March) is characterized by average temperatures ranging from 5 to 15 °C. Daily weather data provided by the Kuwait International Airport Weather Station were used to train the proposed ML models. Kuwait City and its metropolitan area cover approximately 205 km² and are primarily residential, with some commercial use. Kuwait has almost no surface water bodies, making its water supply scarce. Seawater desalination is the primary source of domestic water supply. Numerous desalination facilities have been constructed to meet the increasing water demand. Rapid economic development and population growth, now approximately 4.5 million, have exacerbated the region's severe water shortages. Most of the population resides in Kuwait City and its coastal suburbs, which constitute less than 2% of the nation's total land area.

Figure 1 depicts several groundwater monitoring wells in Kuwait City and additional descriptive information. These wells are operated by the Ministry of Electricity and Water (MEW). The groundwater table in the study area is typically shallow, ranging from 1.0 m along the coast to more than 15.0 m inland. Generally, the water table gradually slopes from the inland areas toward the coast, reflecting ground surface morphology. Since 1985, the official authorities have collected monthly water table depth data. One significant concern is that the development of a shallow water table can damage municipal infrastructure. Gaining a hydro-stratigraphic understanding is crucial for developing efficient modeling schemes in the study area. The shallow lithology of Kuwait City (extending to a depth of 20 m) is primarily composed of granular silty sand with high hydraulic conductivity with cemented discontinuities in some areas. Beneath the granular deposits, there is a layer of gatch soil with low conductivity. This gatch layer acts as an aquitard, preventing infiltrated water from percolating deeper into the regional groundwater flow system.
Figure 1

Map of the study area and groundwater observation wells.


The rapid expansion of the metropolitan area in Kuwait City has significantly altered the natural recharge of the subsurface. This alteration is attributed to the increase in impervious surfaces and the implementation of urban stormwater-drainage systems. Previous studies have reported decreasing rates of natural recharge using subsurface approaches (Al-Sanad & Shaqour 1991). However, urbanization has reduced evaporation from shallow water tables. GWL variations in the area of interest have been modeled using physically based (PB) numerical modeling frameworks (Hamdan & Mukhopadhyay 1991; Székely 1999) or statistically based periodic models (Almedeij & Al-Ruwaih 2006). PB models are limited in their ability to simulate fluctuations in GWLs owing to the complexity of predicting artificial recharge activities and the complex hydro-stratigraphic features of the study area. Additionally, the intricate and varied subsurface lithology of the aquifer system imposes further limitations when aquifer parameters are scarce. Periodic models also fail to provide reliable methods for predicting future GWLs because of discrepancies in their projections (Almedeij & Al-Ruwaih 2006).

Support vector machine

Based on ML concepts, the SVM model is a data-driven methodology grounded in the theory of statistical learning, specifically the structural risk minimization hypothesis. The formulation of an SVM was first presented by Cortes & Vapnik (1995) as a reliable classification model and has gained interest across various study domains owing to its simple theoretical foundations and superior predictive power over other artificial intelligence (AI) methods. To facilitate the classification process within the feature space, the input variables are first transformed into a high-dimensional space. SVM employs a kernel approach to construct a linear classifier that addresses nonlinear classification issues. This method leverages AI strategies to determine the relationship between the data in the input and feature space. Finally, using three well-known mathematical concepts, Fermat, Lagrange, and Kuhn–Tucker, the error term is minimized simultaneously within the model structure. According to Cherkassky & Ma (2004), these theoretical underpinnings enhance the dependability of SVM algorithms.

Regression involves the statistical process of determining the hyperplane that minimizes the error and best fits a given dataset. The error term represents the difference between the regression output and the observed data points. To optimize performance, the SVM model must be sensitive to minor deviations from the target data. Therefore, a penalty function was implemented to account for these deviations. According to Kisi & Cimen (2011) and Zhou et al. (2013), the penalty function is defined as follows:
$$|y - f(x)|_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \tag{1}$$
where $y$ is the target variable, $f(x)$ is the general regression function, and $\varepsilon$ is the error term. A penalty function was used to optimize the performance of the general regression function $f(x)$, which has the format outlined in Equation (2):
$$f(x) = \langle w, x \rangle + b, \quad w \in X,\; b \in \mathbb{R} \tag{2}$$
where $X$ denotes the input space, $w$ is the weight vector, and $b$ is a real number that defines the bias term (Shabani et al. 2020). A convex optimization was then formulated to minimize the norm of the vector $w$, which determines the flatness of Equation (2). The minimization of the norm vector is defined as follows:
$$\min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|w\|^{2} + E \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} y_i - f(x_i) \le \varepsilon + \xi_i \\ f(x_i) - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\, \xi_i^{*} \ge 0 \end{cases} \tag{3}$$
where $\xi_i$ and $\xi_i^{*}$ are slack deficiency variables, and $E$ is the positive number that influences the penalizing loss during model training. Minimizing the objective function (Equation (3)) helps in reducing the risk of overfitting and improves the reliability of the regression. SVM was chosen because of its ability to efficiently handle high-dimensional data and nonlinearity using kernel functions. Owing to its use of the structural risk minimization principle, it is also less prone to overfitting, making it appropriate for limited and noisy groundwater datasets (Cortes & Vapnik 1995). More comprehensive details on the SVM procedure and its formulations are available in Ma & Guo (2014). SVM algorithms for GWL modeling were implemented using MATLAB software, specifically the built-in Statistics and Machine Learning Toolbox.
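As an illustration of the penalty function and the objective described above, the following minimal Python sketch evaluates the ε-insensitive loss and the regularized objective for a linear regression function. It is not the MATLAB implementation used in the study; the function names and the argument `penalty` (standing in for the positive constant of Equation (3)) are hypothetical and chosen only for this example.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Penalty in the spirit of Equation (1): zero inside the eps-tube, linear outside."""
    return max(0.0, abs(y - f_x) - eps)


def svr_objective(w, b, xs, ys, eps, penalty):
    """Flatness term plus penalized tube violations, as in Equation (3).

    `penalty` is a hypothetical name for the positive constant that
    weights the slack variables; at the optimum the two slack variables
    of a sample sum to its eps-insensitive loss.
    """
    flatness = 0.5 * sum(wi * wi for wi in w)
    violations = 0.0
    for x, y in zip(xs, ys):
        # Linear regression function of Equation (2): f(x) = <w, x> + b
        f_x = sum(wi * xi for wi, xi in zip(w, x)) + b
        violations += eps_insensitive_loss(y, f_x, eps)
    return flatness + penalty * violations
```

A larger `penalty` makes deviations outside the tube more costly relative to flatness, which is the trade-off the objective function controls.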

Gaussian process regression

GPR is a nonparametric, kernel-based supervised learning method used for probabilistic regression. It is assumed that both the input and output variables follow Gaussian probability distributions (Deringer et al. 2021). Let x and y represent input and output variables, respectively. The fundamental premise of GPR is as follows:
$$y = x^{T}\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^{2}) \tag{4}$$
where $\varepsilon$ denotes Gaussian noise with variance of $\sigma^{2}$. GPR assumes that the coefficient $\beta$ is determined from the data for each input variable, $x$. The error term, $\varepsilon$, is considered independent and follows a normal distribution with a mean of zero and a variance of $\sigma^{2}$. The latent variable $f$ is introduced via a Gaussian process with an explicit basis function, $h$.

A Gaussian process comprises a collection of random variables, each associated with a joint Gaussian probability distribution for a finite set of real numbers. Methods such as maximum likelihood estimation and maximum a posteriori are commonly used to optimize the regression performance. The choice of an appropriate covariance function for the training set is crucial because it significantly influences GPR performance. In this study, the target data (groundwater data) were used to identify the most suitable covariance function. During the training phase, various sets were evaluated to determine the optimal set for model construction. The selection of the GPR model for the current study is supported by its probabilistic structure with uncertainty quantification capability. Moreover, the GPR model does not require a predefined function, which is well suited for stochastic simulation of groundwater fluctuations (Deringer et al. 2021).
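To make the role of the covariance function concrete, the sketch below implements the rational quadratic kernel in its standard textbook form and assembles a covariance matrix for a set of training inputs. The hyperparameter names (`sigma_f`, `length`, `alpha`) and their default values are assumptions for illustration only and are not the values used in the study.

```python
def rational_quadratic(x1, x2, sigma_f=1.0, length=1.0, alpha=1.0):
    """Rational quadratic covariance between two input vectors.

    k(x1, x2) = sigma_f^2 * (1 + r^2 / (2 * alpha * length^2)) ** (-alpha),
    where r is the Euclidean distance between x1 and x2.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return sigma_f ** 2 * (1.0 + sq_dist / (2.0 * alpha * length ** 2)) ** (-alpha)


def covariance_matrix(points, noise_var=0.0, **kernel_params):
    """Covariance matrix of the training inputs, with noise variance on the diagonal."""
    n = len(points)
    return [
        [
            rational_quadratic(points[i], points[j], **kernel_params)
            + (noise_var if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
```

The covariance decays smoothly with distance, so inputs with similar predictor values are assumed to have similar groundwater responses; `alpha` controls how quickly that similarity falls off across length scales.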

RT model

RT models are ML methods that utilize categorization data in a statistical context. The simplicity and strong predictive power of this approach have led to its widespread application in hydrological process modeling (Wen et al. 2009; Wilkes et al. 2016). The algorithm first partitions the data into subsets by generating child nodes, thereby ensuring that the child nodes are more homogeneous than the parent nodes. The splitting process continues until further classification does not improve the trees. For regression problems, the decision tree model uses the least-squares deviation criterion to predict the target variable, which is typically a continuous real number. Unlike other models that have black-box transfer functions, the decision tree enables visualization of how each variable influences the tree structure. Although the model may include smaller trees with similar accuracy, the tree with the lowest cross-validation error was selected. RT was incorporated in the current study because of its interpretability and ability to handle nonlinear interactions between input variables without requiring extensive data preprocessing. RT is particularly effective in identifying dominant predictors, which aligns with the study's emphasis on models' interpretability. Table 1 summarizes the models developed in this study.

Table 1

Description of the ML models used for groundwater level modeling in urban aquifers of Kuwait City

| Well | Model # | ML model | Predictors |
|------|---------|----------|------------|
| BN | m1 | Fine regression trees | g_1, g_2, g_3, g_4, R, T |
| BN | m2 | Linear SVM | g_1, g_2, g_3, g_4, R, T |
| BN | m3 | Rational quadratic GPR | g_1, g_2, g_3, g_4, R, T |
| NZ | m1 | Fine regression trees | g_1, g_2, g_3, g_4, R, T |
| NZ | m2 | Linear SVM | g_1, g_2, g_3, g_4, R, T |
| NZ | m3 | Rational quadratic GPR | g_1, g_2, g_3, g_4, R, T |
| JB | m1 | Fine regression trees | g_1, g_2, g_3, g_4, R, T |
| JB | m2 | Linear SVM | g_1, g_2, g_3, g_4, R, T |
| JB | m3 | Rational quadratic GPR | g_1, g_2, g_3, g_4, R, T |
| HL | m1 | Fine regression trees | g_1, g_2, g_3, g_4, R, T |
| HL | m2 | Linear SVM | g_1, g_2, g_3, g_4, R, T |
| HL | m3 | Rational quadratic GPR | g_1, g_2, g_3, g_4, R, T |

Predictors: g_1, previous month groundwater depth; g_2, groundwater depth before 2 months; g_3, groundwater depth before 3 months; g_4, groundwater depth before 4 months; R, monthly rainfall; T, average monthly temperature.
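The least-squares deviation criterion used by the regression tree models (m1 in Table 1) can be sketched as follows for a single predictor. This is a simplified illustration of one split, not the full tree-growing procedure of the study; the function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (least-squares node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the split that minimizes child impurity.

    Candidate thresholds are midpoints between consecutive distinct
    predictor values; (None, parent_sse) is returned if no split helps.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical predictor values cannot be separated
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best
```

Applying the same search recursively to each child node, and across all predictors, yields the tree; the chosen thresholds can be read directly, which is the source of the model's interpretability.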

Data preprocessing

In the proposed modeling approach, the initial step involved detrending GWL data. The groundwater datasets exhibited a cumulative trend in the water table, indicating the shallowing of the water level owing to increased artificial recharge activities. The detrending process isolates the impact of external factors on the data, which is particularly crucial for projects involving dewatering or artificial recharge, which are challenging to estimate. This process significantly enhances the accuracy of ML models.

Detrending raw groundwater data offers another significant benefit: ensuring that large autocorrelation delays in groundwater time series are accurately specified. Strongly trending time series data often exhibit high autocorrelation coefficients. A commonly used detrending procedure in groundwater modeling is the first-order polynomial detrending process, also known as linear detrending (Ghanbari & Bravo 2011; Cao & Zheng 2016). In this method, groundwater data are fitted to a linear regression model using the least-squares criterion, and the remaining signal or detrended groundwater signal serves as the target variable for the ML models. The detrended groundwater datasets (the remainder) were then tested for autocorrelation using an autocorrelation function (ACF). The ML models simulated groundwater volatility in this scenario, independent of the impact of external fluxes. To account for these external influences, trends were added back to the groundwater signals after simulation using the ML models. The two primary hydrological and statistical considerations that justify employing linear detrending are: (i) addressing non-stationarity in GWL time series and (ii) reducing the impact of autocorrelation for robust ML forecasts. Figure 2 illustrates the sequential procedure of the proposed methodology in the present study. The choice of input features was based on practical hydrological understanding of the study area. Lagged groundwater levels were used due to their strong autocorrelation, which reflects the slow response of subsurface flow. Temperature was included as it captures the effect of artificial recharge in the study area (Alsumaiei 2020; Alkandari & Alsumaiei 2025), especially during hot months when irrigation and water use increase significantly. Rainfall was retained for completeness, though its influence is minimal in such an arid setting. 
This feature selection helped ensure that the models focused on the main drivers of GWL variation without being affected by less relevant or noisy variables.
Figure 2

Flowchart of the proposed groundwater modeling framework.
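The linear detrending step and the subsequent re-addition of the trend can be sketched in Python as follows. The sketch assumes equally spaced monthly observations indexed 0, 1, 2, ... and uses hypothetical function names; it illustrates the procedure rather than reproducing the study's MATLAB code.

```python
def fit_linear_trend(series):
    """Least-squares slope and intercept of a series against its time index."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def detrend(series):
    """Split a series into (residual, trend) using a first-order polynomial."""
    slope, intercept = fit_linear_trend(series)
    trend = [intercept + slope * t for t in range(len(series))]
    residual = [y - tr for y, tr in zip(series, trend)]
    return residual, trend


def retrend(simulated_residual, trend):
    """Add the fitted trend back to a simulated residual signal."""
    return [r + tr for r, tr in zip(simulated_residual, trend)]
```

In this arrangement the ML models are trained on `residual` only, and `retrend` restores the long-term component to their output so that simulated and observed water table depths are compared on the original scale.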


LIME for local interpretability

LIME fits a surrogate glass-box model to the decision space around any black-box model's prediction. Over a sufficiently small region of the decision surface, even a simple linear model can approximate the black-box behavior accurately. The primary goal of LIME is therefore to model the local neighborhood of a given prediction, so that users can examine the glass-box model to understand how the black-box model behaves in that region. To do so, LIME generates synthetic data by perturbing an individual data point, obtains the black-box predictions for the perturbed points, and uses them as the training set for an interpretable weighted model, such as a linear model. The resulting explanation is locally faithful to the black-box model, a property known as local fidelity. The advantages of LIME are its applicability to nearly all models and its interpretability, which is similar to that of a linear model. However, the explanations depend strongly on the perturbation process and can occasionally be unstable. LIME has been utilized for interpreting hydrological models with promising results (Perera et al. 2024).
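The perturb, predict, and fit steps described above can be sketched with a small weighted least-squares surrogate. This is a simplified illustration and not the LIME software used in the study: the Gaussian perturbation scale, the exponential proximity kernel, and all function and parameter names are assumptions made for this example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lime_explain(black_box, instance, n_samples=500, scale=0.5,
                 kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns (intercept, coefficients); each coefficient approximates the
    local effect of one predictor on the black-box prediction.
    """
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in instance]  # perturbed point
        dist_sq = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(black_box(z))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(d + 1)] for i in range(d + 1)]
    xtwy = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights))
            for i in range(d + 1)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]
```

Changing `seed`, `scale`, or `kernel_width` changes the synthetic neighborhood, which is one way to observe the sensitivity to the perturbation process noted above.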

SHAP for global interpretability

To improve the interpretability of the ML models, this study used SHAP, a feature attribution method grounded in coalitional game theory. By estimating how the prediction changes when a feature is included or removed, SHAP computes Shapley values, that is, the average marginal contribution of each feature to the model output. Aggregated over many predictions, these values provide global insight into model behavior and reveal which features matter most across the entire dataset.

This study applied SHAP to all the trained ML models (SVM, GPR, and RT) to assess the relative impact of the predictors: past groundwater levels (g_1, g_2, g_3, and g_4), temperature (T), and rainfall (R). The generated SHAP summary plots visually represent feature importance, where predictors with higher absolute Shapley values contribute more significantly to GWL predictions. The color gradient in the plots indicates the magnitude of each predictor, offering further insight into feature distributions. Combining SHAP with the LIME-based local interpretability results yields a multiscale interpretability framework. While LIME offers instance-specific explanations, SHAP provides global feature importance and dependencies, which helps identify the dominant predictors and interactions that control groundwater dynamics. Together, the two techniques improve the transparency of the ML models and support better-informed groundwater decision-making through both localized and holistic insights into the models' predictive mechanisms. Recent studies have successfully integrated SHAP for interpreting environmental variables (Makumbura et al. 2024; Mishra et al. 2025).
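The average marginal contribution that defines a Shapley value can be computed exactly for a small predictor set by enumerating all coalitions. The sketch below does this by replacing "absent" predictors with a baseline value, which is an assumption of this example; the SHAP software relies on more efficient estimators, and the function name here is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Predictors outside a coalition take their baseline value. The cost
    grows as 2^n, which remains manageable for six predictors.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        z = [instance[i] if i in coalition else baseline[i] for i in features]
        return model(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values sum to the difference between the prediction at `instance` and the prediction at `baseline`, so each predictor's share of a simulated GWL departure can be read directly; averaging absolute values over many instances gives a global ranking of the kind shown in summary plots.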

Performance measures

The generated model was validated using a conventional method of splitting the observed data into training and validation subsets. All ML models were constructed using 80% of the data for model training and 20% for model validation. To evaluate the performance of the ML models, three performance metrics, the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE), were computed for each of the three validation rounds. R² assesses the degree to which the simulated and observed targets are associated. The R² value ranges from zero to one, where zero denotes the absence of any statistical relationship and one represents an exact match between the simulated and observed targets. R² is the square of the Pearson correlation coefficient and evaluates how well a predictor can be generated from the model, rather than directly assessing the quality of the predictions, as the Pearson correlation does.

MAE served as the second criterion in this study. The MAE quantifies the variation of the simulated targets from the observations. The RMSE was the third criterion used to evaluate the effectiveness of the model. The RMSE measures the average difference between the values predicted by the statistical model and the actual observed values. Mathematically, the RMSE represents the standard deviation of the residuals, which are the differences between the observed data points and the regression line. The RMSE indicates the spread of the residuals, reflecting how well the observed data fit the predicted model. A lower RMSE value indicates that the data points are closer to the regression line, suggesting a more accurate model. The RMSE values are expressed in units of the dependent variable and can range from zero to positive infinity.
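The three metrics can be written compactly as follows. In line with the description above, the coefficient of determination is computed here as the squared Pearson correlation between observed and simulated values; the function names are hypothetical.

```python
import math


def mae(observed, simulated):
    """Mean absolute error, in the units of the target variable."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root mean square error, in the units of the target variable."""
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse)


def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    o_mean = sum(observed) / n
    s_mean = sum(simulated) / n
    cov = sum((o - o_mean) * (s - s_mean) for o, s in zip(observed, simulated))
    o_var = sum((o - o_mean) ** 2 for o in observed)
    s_var = sum((s - s_mean) ** 2 for s in simulated)
    return cov * cov / (o_var * s_var)
```

Because this form of the coefficient of determination measures association only, a simulation that is offset from the observations by a constant still scores close to one, while MAE and RMSE register the offset; reporting the three metrics together guards against that blind spot.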

ML model results

The modeling procedure shown in Figure 2 begins by applying a linear detrending transformation to the collected groundwater data. The collected data exhibited evident decreasing trends (the water table became shallower) owing to excessive irrigation during the extremely hot and prolonged dry summer seasons; irrigation water percolates through the soil and artificially recharges the phreatic water table. In urban aquifers, where long-term artificial recharge and climate variations cause non-stationarity in the dataset, detrending is a crucial preprocessing step in groundwater modeling. Accordingly, before the data were fed into the ML models, the GWL time series was detrended using a first-order polynomial (linear) fit to ensure the stationarity of the input data and thereby support the validity of the autoregressive modeling scheme. GWLs in Kuwait City exhibit an overall shallowing trend, primarily owing to anthropogenic activities such as irrigation and infrastructure leakage. If this long-term trend is not removed, the ML models may misinterpret it as short-term fluctuation, leading to biased predictions that do not reflect the actual groundwater dynamics. Removing the persistent trend isolates the underlying short-term variability, so that the ML models capture short-term variations in the GWL rather than long-term cumulative changes. Figure 3 depicts the detrending transformation of the groundwater signal in well NZ and demonstrates that linear detrending successfully removed the cumulative effect of artificial recharge, making the seasonal and short-term variations more apparent. The detrended signals for all wells (remainders) were then tested for autocorrelation using the autocorrelation function (ACF).
The detrending procedure is particularly important for constructing autoregressive models, as a predominant upward or downward trend may give rise to spurious correlations that reduce the accuracy of the ML-based forecasts. Figure 4 illustrates the ACF results for all the examined wells. The ACF results demonstrated strong autocorrelation within the groundwater remainder signals. This supports the basic hypothesis of this study regarding the suitability of an autoregressive computational scheme for simulating groundwater fluctuations at the study sites.
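The detrending and autocorrelation steps described above can be illustrated with a minimal Python sketch. It assumes an ordinary least-squares straight-line fit against the time index and the standard (biased) sample ACF estimator; the function names and the synthetic monthly series are hypothetical and are not taken from the study.

```python
import math

def linear_detrend(series):
    """Fit y = intercept + slope * t by least squares; return (remainder, intercept, slope)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    remainder = [y - (intercept + slope * t) for t, y in enumerate(series)]
    return remainder, intercept, slope

def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[i] - mean) * (series[i + k] - mean) for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# Synthetic depth-to-water series (m): a shallowing trend plus a 12-month cycle.
gwl = [5.0 - 0.01 * m + 0.2 * math.sin(2 * math.pi * m / 12) for m in range(120)]

remainder, intercept, slope = linear_detrend(gwl)
rho = acf(remainder, 12)
print(round(slope, 4), [round(r, 2) for r in rho[:5]])
```

In this synthetic case, the fitted slope is negative (a shallowing water table), the remainder has zero mean, and the ACF stays high at short lags, which is the kind of behavior that motivates an autoregressive input structure.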
Figure 3

Groundwater level detrending procedure for well NZ.

Figure 4

Autocorrelation function (ACF) analysis for the examined wells.


The autoregressive term in the current study was limited to four months, even though all wells exhibited higher-order autocorrelation. Limiting the autoregressive term to four months avoids model overfitting while providing adequate short-term GWL forecasts. The groundwater remainder signals were then fed into the different ML models, as shown in Figure 2. The data were divided chronologically into training and validation subsets: 80% of each dataset was used for training, and the remaining 20% was used for model validation. The initial hyperparameters for all the ML models are listed in Table 2. The hyperparameters were carefully assigned to boost the efficacy of the models, avoid overfitting, and reduce the computation time. Hyperparameter selection was guided by model performance during the validation stage, using RMSE as the primary criterion, with an emphasis on achieving both high accuracy and generalization across wells. This process allowed for a balanced trade-off between complexity and performance, so that the final models were both efficient and robust.
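A minimal Python sketch of the input construction described above is given below. It assumes that each sample consists of the four preceding monthly remainder values and that the chronological split simply cuts the ordered samples at the 80% mark; the function names are hypothetical, and meteorological columns (temperature and rainfall) could be appended to each row in the same way.

```python
def make_lagged_dataset(remainder, n_lags=4):
    """Build rows [g_1, ..., g_n_lags] of previous values and the matching targets."""
    features, targets = [], []
    for t in range(n_lags, len(remainder)):
        # g_1 is the value one month back, g_2 two months back, and so on.
        features.append([remainder[t - k] for k in range(1, n_lags + 1)])
        targets.append(remainder[t])
    return features, targets

def chronological_split(features, targets, train_percent=80):
    """Split ordered samples without shuffling: earliest part trains, latest part validates."""
    cut = len(targets) * train_percent // 100
    return features[:cut], targets[:cut], features[cut:], targets[cut:]

# Hypothetical monthly remainder series, used only to show the indexing.
remainder = [float(m) for m in range(24)]

X, y = make_lagged_dataset(remainder, n_lags=4)
X_train, y_train, X_val, y_val = chronological_split(X, y)
print(X[0], y[0], len(X_train), len(X_val))
```

Keeping the split chronological rather than random means the validation samples always come after the training samples in time, which avoids leaking future information into training.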

Table 2

Hyperparameters for ML model construction

ML model          Hyperparameter                 Value
Regression tree   Preset                         Fine tree
                  Min. leaf size                 4
                  Surrogate decision split       Off
SVM               Kernel function                Linear
                  Kernel scale                   Automatic
                  Box constraint                 Automatic
                  Standardize data               Yes
GPR               Preset                         Rational quadratic GPR
                  Basis function                 Constant
                  Kernel function                Rational quadratic
                  Optimize numeric parameters    Yes
                  Standardize data               Yes

A linear kernel function was selected for the SVM model so that it could handle pattern recognition within the GWL dataset efficiently. Furthermore, the box constraint (C), which controls how strongly the SVM penalizes prediction errors, was set automatically. For the GPR model, the rational quadratic kernel enabled the model to capture both short- and long-term movements in groundwater fluctuations, and generalization was improved by the automatic optimization of numerical parameters. To prevent the RT models from splitting excessively, a minimum leaf size of four was assigned to all RT models. This limits the tree depth while keeping the models relatively interpretable. Although the ML model hyperparameters were assigned manually, sound model accuracies (R2 up to 0.98) and low prediction errors were achieved. Figure 5 shows the observed (true) versus simulated (predicted) groundwater remainder time series at the examined wells in the study area. All ML models reproduced the observed data reliably within the training and validation periods. Specifically, seasonal variations in GWLs were well captured, with notable efficiency in replicating the observed peak values. Although the ML models described above generated good results, fine-tuning the model parameters could produce even better outcomes. The model settings were initially selected arbitrarily; nevertheless, the outcomes were satisfactory in terms of the statistical indicators. A thorough sensitivity analysis of several ML models is recommended for future studies, as such an analysis is outside the scope of this study owing to the multidimensional optimization involved. From the standpoint of hydrological modeling, the model outputs are deemed sufficient and trustworthy for managing water resources.
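For reference, the rational quadratic kernel is commonly written in the form below. The symbols are assumptions introduced here for illustration (they do not appear in the study): \sigma_f is the signal standard deviation, \ell is the characteristic length scale, and \alpha > 0 is the scale-mixture parameter.

```latex
k(\mathbf{x}, \mathbf{x}') = \sigma_f^{2}
\left( 1 + \frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2 \alpha \ell^{2}} \right)^{-\alpha}
```

This kernel can be viewed as a mixture of squared exponential kernels with different length scales, which is one way to understand why it can represent both short- and long-term variations; as \alpha grows large, it approaches a single squared exponential kernel with length scale \ell.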
Figure 5

Observed (true) versus predicted groundwater levels remainder for all wells.

Figure 6 shows the performance of the ML models with respect to the 1:1 perfect-matching line. The data points scatter randomly about the 45° line, which indicates that all models generated forecasts without any considerable bias. Figure 7 depicts the distribution of the model errors with respect to the true response values. The error distributions supported the unbiased predictions of the models; however, slightly larger errors were associated with larger true values. This indicates that the efficiency of the ML models deteriorated slightly as the water table became deeper. This decrease was considered minimal and did not compromise the reliability of the predictions. Table 3 lists the statistical evaluations of all models for the validation period. The primary statistical metric was R2, which provides a commonly recognized quantitative measure of modeling efficiency. The R2 values ranged between 0.75 and 0.98 across all wells, with the SVM model demonstrating notable superiority for all examined wells except well HL. The efficient performance of the ML models in terms of R2 during the validation period supports the robustness of the modeling approach developed in this study. This performance is mainly attributed to the detrending procedure applied to the data before the ML models were trained and to the high autocorrelation within the groundwater remainder datasets. The detrending procedure reduced the data noise and rendered the groundwater data patterns more detectable for all the ML models. The other error metrics listed in Table 3 confirm that the predicted groundwater time series closely resemble the true time series, with minimal error values.
Another notable pattern in Table 3 is that the ML models produced higher errors, and hence lower predictive performance, in wells with deeper groundwater tables. This pattern is likely due to weaker autocorrelation in deeper groundwater systems, where fluctuations respond over longer timescales to delayed recharge and regional flow dynamics rather than to immediate surface interaction. In addition, the artificial recharge effects that strongly influence shallow aquifers have a reduced effect on deeper water tables, making short-term ML predictions less successful. Overall, the SVM model was the top performer, delivering the highest R2 values and the lowest errors in three of the four wells (Table 3). GPR produced similar results in some wells but showed more variation. Although the RT models produced good results, their predictive performance was generally lower, particularly in deeper aquifers. These results suggest that SVM offers a reliable balance between accuracy and generalization, especially when combined with effective preprocessing steps such as detrending and autocorrelation-based input selection.
Table 3

Statistical performance metrics for different wells in urban aquifers of Kuwait City (validation period)

Well   Model #   R2     MAE     RMSE
BN     m1        0.85   0.088   0.106
       m2        0.95   0.052   0.063
       m3        0.95   0.052   0.064
NZ     m1        0.83   0.097   0.125
       m2        0.98   0.029   0.038
       m3        0.92   0.063   0.085
JB     m1        0.85   0.053   0.065
       m2        0.93   0.034   0.044
       m3        0.92   0.039   0.049
HL     m1        0.75   0.112   0.145
       m2        0.85   0.085   0.114
       m3        0.88   0.077   0.102

Best model for each well is indicated in bold.

Figure 6

Model performance comparison with the 1:1 perfect-matching line.

Figure 7

Error distribution of model predictions against true groundwater levels.


Interpreting ML models using LIME and SHAP

Figure 8 shows the LIME values for all the ML models for the examined wells. The GWL of the previous month had the highest LIME value for all wells under all the ML models. This result can be attributed to the strong correlation between the current and previous month's GWLs in all wells. From a hydrological perspective, the subsurface groundwater flow regime is characterized by slow flow velocities, so successive monthly groundwater depth values are very close to one another. The LIME values for the rainfall input were minimal, indicating that rainfall did not contribute significantly to the ML models. These results should be interpreted in light of the hydrological processes controlling the variation in GWLs in the study area. In arid climates, prolonged drought periods limit the natural recharge of the groundwater table by rainfall. Instead, the elevated heat in summer promotes the excessive use of domestic water supply for lawn irrigation in urban areas. In addition, frequent dust storms in summer increase the use of water for street and yard cleaning. These activities artificially recharge the water table, leading to a shallower water table in late summer. Therefore, for nearly all wells, the temperature variable had higher LIME values than rainfall, as these artificial recharge activities were strongly associated with elevated temperatures.
Figure 8

LIME values for ML models across the examined wells.


The LIME approach was employed in this study to enhance the local interpretability of the ML models. This investigation confirmed the significance of the GWL at the previous time step in predicting the current GWL on a monthly time scale. The influence of meteorological inputs, such as rainfall and temperature, was of secondary importance. LIME highlights the dependence of predictions on one or two features owing to the strong temporal autocorrelation in groundwater levels, where past values heavily influence future predictions. In arid climates such as that of Kuwait, natural recharge is minimal and artificial recharge (e.g., irrigation) dominates, making temperature and previous groundwater levels the most influential factors. ML models optimize predictions based on data variability and naturally prioritize highly correlated inputs. Additionally, LIME provides local explanations, meaning that it identifies the most significant features for a specific instance rather than across the entire dataset. This result aligns with the hydroclimatic features of the study area, where natural recharge and natural evaporation from the water table are limited owing to the scarcity of rainfall events and the high percentage of impervious surfaces. In addition, from a theoretical perspective, the LIME explanations align with hydrological and ML theories. The strong temporal autocorrelation of GWLs is attributed to the slow movement of subsurface water, as explained by Darcy's equation. More specifically, and considering the arid climatology of the study site, artificial forcing primarily controls recharge (Almedeij & Al-Ruwaih 2006; Alsumaiei 2020). On the ML side, models optimize predictions by prioritizing dominant predictors, consistent with Vapnik's Statistical Learning Theory (Cortes & Vapnik 1995).
LIME's local approximation relies on the Locally Weighted Learning principle, which, in autoregressive data, gives more weight to temporally adjacent values. These theoretical foundations help explain why past groundwater levels and temperature dominate the LIME results, reinforcing the model's reliability in hydrological forecasting.

The LIME approach is based on the assumption that the decision boundary of a complex ML model is approximately linear around the instance for which an explanation is required. It works by training an interpretable model on a perturbed sample around the instance of interest. Specifically, LIME first generates a perturbed sample around the instance to be explained. Subsequently, LIME obtains the prediction of the complex model for each instance in the perturbed sample. The perturbed sample and the corresponding predictions are then used as the training dataset for the interpretable model. Thereafter, the approach assigns weights to the examples in this training dataset depending on how close they are to the instance being explained. Finally, LIME fits the interpretable model to the weighted training dataset, and the coefficients of this model serve as the explanation. This procedure clarifies how each input feature contributes to a given prediction within the ML modeling framework.
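The steps listed above can be sketched as follows. This is a simplified, self-contained illustration of the idea and not the LIME library or the implementation used in the study: the Gaussian perturbation, the exponential proximity kernel, the weighted least-squares surrogate, and the two-feature `black_box` function are all assumptions chosen for brevity.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lime_explain(predict, instance, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Return local linear weights, one per feature, around `instance`."""
    rng = random.Random(seed)
    # Step 1: generate a perturbed sample around the instance.
    samples = [[v + rng.gauss(0.0, scale) for v in instance] for _ in range(n_samples)]
    # Step 2: obtain the complex model's prediction for each perturbed instance.
    preds = [predict(s) for s in samples]
    # Step 3: weight each example by its proximity to the instance.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(s, instance)) / kernel_width ** 2)
        for s in samples
    ]
    # Step 4: fit a weighted linear surrogate (intercept plus one slope per feature).
    z = [[1.0] + [a - b for a, b in zip(s, instance)] for s in samples]
    p = len(instance) + 1
    lhs = [[sum(w * r[i] * r[j] for w, r in zip(weights, z)) for j in range(p)] for i in range(p)]
    rhs = [sum(w * r[i] * y for w, r, y in zip(weights, z, preds)) for i in range(p)]
    return solve_linear_system(lhs, rhs)[1:]

def black_box(x):
    """Hypothetical model: strong dependence on x[0], weak nonlinear dependence on x[1]."""
    return 0.8 * x[0] + 0.1 * x[1] ** 2

local_weights = lime_explain(black_box, [0.5, 1.0])
print([round(w, 2) for w in local_weights])
```

In this toy case, the surrogate weights approximate the local slopes of `black_box` at the chosen instance (about 0.8 and 0.2), so the first feature dominates the explanation. The same reading applies when the previous month's GWL receives the largest LIME value.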

The SHAP summary plots in Figure 9 extend the explanation of the ML models by providing a global perspective on feature importance, highlighting how each predictor contributes to GWL predictions across different models. Unlike LIME, which offers local interpretability for individual predictions, SHAP quantifies the overall impact of predictors across the dataset, thereby allowing a more comprehensive assessment of model behavior. The SHAP results confirmed that past groundwater levels (g_1, g_2, g_3, and g_4) exerted the most decisive influence on GWL predictions across all models. This aligns with the findings of LIME and supports the high temporal autocorrelation of groundwater fluctuations observed in the study area. Temperature (T) was also identified as a moderately influential feature, particularly in models where seasonal variations play a role in artificial recharge and evaporation processes. Rainfall (R), however, had relatively low SHAP values, reinforcing the conclusion that natural recharge plays a minimal role in groundwater dynamics in arid urban environments.
Figure 9

SHAP summary plots illustrating global feature importance for all models.


In models such as BN-m1 and JB-m3, the longer-lag groundwater predictors (g_3 and g_4) are more influential, and SHAP is a better option than LIME in these cases because it can capture feature interactions. This implies that some models depend more strongly on longer-term effects, which could be overlooked by LIME because its explanations are confined to the neighborhood of a single instance. However, SHAP is more computationally intensive because it requires more model evaluations than LIME, which is faster, provides instance-based explanations, and requires fewer computational resources.

SHAP and LIME complement each other; together, they provide better overall interpretability of the ML models. SHAP provides global insight into feature importance, identifying the dominant drivers of groundwater trends. In contrast, LIME is more applicable to localized decision-making because it provides case-specific explanations. The results from both methods align, which strengthens confidence in the reliability of the ML framework and supports its suitability for groundwater forecasting in urban aquifers. From a computational perspective, SHAP is more demanding than LIME, as it evaluates numerous feature combinations to estimate global importance. LIME, on the other hand, generates faster, instance-level explanations at lower computational cost, making it more suitable for real-time or operational use. While SHAP offers broader insight during model evaluation, LIME provides a quicker and more practical option for everyday decision-making.
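To illustrate why SHAP involves many model evaluations, the following sketch computes exact Shapley values by enumerating every feature coalition. It is a simplified illustration rather than the SHAP library or the procedure used in the study: features outside a coalition are replaced by a single hypothetical background value (SHAP implementations typically average over a background dataset), and the `linear_model` function and its coefficients are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, background):
    """Exact Shapley values for one instance relative to a background reference."""
    n = len(instance)

    def value(coalition):
        # Features in the coalition take the instance value; the rest take the background value.
        x = [instance[i] if i in coalition else background[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi

def linear_model(x):
    """Hypothetical model with three inputs, e.g. a lagged GWL, temperature, and rainfall."""
    return 0.7 * x[0] + 0.2 * x[1] + 0.05 * x[2]

instance = [1.0, 1.0, 1.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(linear_model, instance, background)
print([round(v, 3) for v in phi])
```

For this linear example, the attributions equal the coefficients times the feature offsets, and they sum to the difference between the prediction for the instance and the prediction for the background. The number of distinct coalitions is 2^n for n features (64 for six predictors), which is why practical SHAP implementations rely on approximations or model-specific shortcuts.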

Comparisons with previous studies

The findings of this study closely align with those of previous ML-based studies in the field (Sahoo et al. 2017; Kardan Moghaddam et al. 2021; Pham et al. 2022; Tao et al. 2022; LaBianca et al. 2024). The results confirmed that ML can provide an effective model for simulating changes in GWLs. The R2 values for the wells examined in this study ranged from 0.75 to 0.98, which are consistent with or even surpass those of other studies using similar ML approaches within similar hydrologic settings. For instance, LaBianca et al. (2024) reported R2 values between 0.4 and 0.7 for modeling GWL fluctuations with different ML schemes applied to an urban aquifer in Denmark. The higher values obtained here can be attributed to the differences in recharge drivers between the present study area and the location examined by LaBianca et al. (2024). Considering the Kuwait City urban aquifer specifically, the SVM and GPR models developed in the present study achieved R2 values of up to 0.98, which are comparable to the performance of the NARX models reported by Alsumaiei (2020), where R2 ranged between 0.76 and 0.99. However, the proposed framework introduces interpretability to these black-box models, an aspect not previously addressed in the study area, by quantitatively explaining how lagged groundwater levels influence current water table dynamics. In comparison, the numerical model developed by Alkandari & Alsumaiei (2025) reported a best-case RMSE of 1.13 m, whereas the maximum RMSE across all wells in the current study did not exceed 0.145 m. Notably, the numerical model was highly dependent on detailed aquifer parameters and boundary condition calibration, while the ML framework relied solely on time series inputs. These results underscore the data efficiency, robustness, and practical strength of the proposed approach in simulating groundwater fluctuations in hyper-arid, anthropogenically influenced environments.

A fundamental consideration when using ML models to simulate hydrological processes is the setting in which the models are applied. Unfortunately, ML model applications for simulating GWL dynamics in urbanized aquifer systems under arid climatic conditions have not been thoroughly explored, except in a limited number of studies (e.g., LaBianca et al. 2024). This study addresses this gap and demonstrates the promising results of ML applications in such aquifer systems. Additionally, to the best of the author's knowledge, this study presents the first attempt to utilize interpretability techniques to explain the ML model predictors. According to Khan et al. (2023), none of the studies using ML techniques or physical models to forecast GWLs published between 2008 and 2022 used LIME techniques to explain model forcing. The efficiency metrics of the current study were also found to surpass those of the statistical models applied to the same study area (Almedeij & Al-Ruwaih 2006). The MAE for the wells investigated in this study was nearly 50% lower than that obtained using the periodic statistical models.

The ML approach accurately reflected GWL variability, with minimal differences between modeled and observed values. Urban water managers in the study area can utilize the presented ML approach to support decisions related to groundwater table control, as it has proven to be effective in capturing the impact of artificial recharge compared to other statistical or PB methodologies implemented in the same region. Furthermore, the ML model can predict changes in GWLs without requiring detailed measurements of the aquifer parameters. Such measurements often do not exist or only partially characterize aquifer heterogeneity. This advantage was evident when comparing the findings of the current study with the outcomes of numerical simulations conducted in the study area (Hamdan & Mukhopadhyay 1991; Alkandari & Alsumaiei 2025). In contrast, the proposed method requires only meteorological and groundwater data, both of which are readily available. Therefore, the proposed method can be easily implemented in similar aquifer systems.

Although hybridization with meta-heuristic optimization has been found to be a sound approach for enhancing the efficacy of predictive models in numerous previous studies, the focus of the current study is the interpretability of ML model outcomes. Emerging hybrid ML models include the long short-term memory neural network coupled with the Ant Lion Optimizer (LSTM-ALO); LSTM-INFO, which involves the long short-term memory neural network and an information maximization objective; and the support vector machine combined with the firefly algorithm and particle swarm optimization (FFAPSO). The lack of such hybridization is considered a drawback of the current study. However, considering the simplicity of this modeling framework for practical purposes, the modeling results can be considered acceptable. In many cases, hybrid approaches require considerable computational capacity, which is not always available. To further improve the effectiveness of the model, this study recommends that future research be directed toward optimizing ML models.

Model shortcomings and generalizability

Despite the validity of using ML models, several minor flaws must be considered. First, the selection of hyperparameters for the ML models is highly arbitrary. The hyperparameters were determined through trial and error, which does not guarantee that the parameter values represent a globally optimal solution. Second, overfitting can cause an ML model to perform poorly in forecasting. Overfitting occurs when the ML model fits the noise component of the data during the training phase rather than the actual data pattern, leading to a significant decline in performance during the validation phase. The performance metrics during the validation period suggest that overfitting is not an issue here; nevertheless, a more comprehensive analysis of the models' tendencies toward overfitting should be conducted in future studies. The monthly GWLs across four monitoring wells were modeled using an ML-based modeling approach to verify the generalizability of the results. Data were collected over a temporal scale that was considered sufficient for model testing and generalizability assessment. Comparable statistical metrics across all wells indicate that the models demonstrated consistent efficiency, highlighting the robustness of the ML-based models. Therefore, the proposed modeling framework can be applied to regions with similar climates. If the modeling framework is to be used in a different climate, it would need to be recalibrated and its parameters reevaluated. It is also recommended that the evaluation of model generalizability be expanded to include additional data-driven models.

One limitation of the current research is the lack of ML hybridization, that is, coupling a complementary algorithm with a standalone ML model to predict groundwater fluctuations, which can significantly enhance prediction accuracy and reliability. Standalone ML models, as presented in the current study, showed reliable efficacy and adequacy in capturing nonlinear relationships, but their performance might deteriorate with long-term dependencies and regional groundwater flow patterns. These gaps can be bridged by hybrid approaches that combine different modeling strengths. Integrating deep learning (DL) models (e.g., LSTM-ALO and CNN-SVM) can improve accuracy by capturing long-term dependencies in groundwater fluctuations. Furthermore, optimizing feature selection using genetic algorithms (GA) or particle swarm optimization (PSO) can enhance the efficiency of the model. In addition, ensemble methods, such as random forest (RF) combined with gradient boosting (GB), can reduce errors by aggregating multiple ML predictions and improving overall accuracy.

Practical applications and policy implications

The findings of this study provide critical insights into groundwater management in urban aquifers. In particular, they clarify the hydrological process of groundwater recharge in arid climates, where artificial recharge forcing significantly impacts water table dynamics. The high predictive accuracy of the ML models (R2 = 0.75–0.98) demonstrates their practical viability for assisting urban planners and policymakers in making informed decisions regarding groundwater management. The foundational policy implication of this study is the need for real-time monitoring and adaptive management of shallow water tables. Interpretability analysis showed that the previous GWL, rather than rainfall, strongly influenced the current GWL values. This finding provides municipal authorities with scientific guidance to prioritize long-term groundwater monitoring and enhanced dewatering strategies rather than relying on traditional hydrometeorological assumptions. Given that artificial recharge, primarily from irrigation and leakage, was identified as a major driver of groundwater fluctuations, policymakers should consider regulations to control excessive water use in urban landscaping and to reduce infrastructure leakage. Additionally, the ML framework developed in this study offers a scalable and transferable approach for urbanized arid regions facing similar groundwater management challenges. The ability of ML models to outperform conventional statistical methods by reducing the MAE by 50% (Table 3) further supports their integration into national water management frameworks. Moreover, the interpretability of ML models through LIME and SHAP provides a transparent, explainable AI-based decision-support tool, fostering greater trust among stakeholders, including hydrologists, engineers, and policymakers. This enhances cross-disciplinary collaboration for developing sustainable groundwater policies and urban planning strategies.

Advancing groundwater modeling: overcoming the limitations of traditional approaches with ML

Traditional groundwater models, such as numerical models (e.g., MODFLOW), require extensive parameterization of aquifer properties, such as hydraulic conductivity and storage coefficients (Alsumaiei & Bailey 2018a, b). These data can be scarce or difficult to obtain, particularly in urban and arid aquifers with complex hydrogeological conditions. Moreover, numerical models require large amounts of computational resources and long simulation run times, which makes them impractical for real-time groundwater management. Additionally, they rely on simplifying assumptions, such as homogeneous aquifer conditions, that do not consider the spatial variability of artificial recharge sources, such as irrigation and leakage. In addition, these models do not adapt; they require manual recalibration when environmental conditions change (e.g., through urbanization or climate variability), making them unsuitable for dynamic groundwater systems.

ML models offer a data-driven alternative that addresses the above challenges. In contrast to physical models, ML models require only time series data of groundwater and meteorological variables; therefore, they are appropriate for places with low data availability. In this study, SVM, GPR, and RT models were developed, demonstrating high predictive accuracy (R2 = 0.75–0.98) that surpassed that of traditional statistical models. These ML models learn the nonlinear relationships governing groundwater fluctuations without assuming an explicit form of the system dynamics. Moreover, ML models are computationally efficient and can be updated in real time with new data. Unlike traditional models, ML models coupled with LIME-based interpretability allow urban planners to determine the most significant factors driving groundwater fluctuations. By combining ML models with groundwater management frameworks, decision-makers can obtain more adaptive and accurate groundwater control and, thus, more sustainable urban planning in arid environments. For long-term groundwater forecasts, model robustness could be further improved through additional research on hybrid ML-physical modeling and DL methods.

Broader applicability of the proposed framework to other urban and arid/semi-arid regions

The ML-based groundwater modeling framework developed in this study is highly adaptable and can be applied to other urban and arid/semi-arid regions that experience similar hydrological challenges. Urban aquifers worldwide, particularly in arid and semi-arid environments, face groundwater fluctuations driven by artificial recharge (e.g., irrigation and leaks), limited natural infiltration, and urbanization-induced changes. The proposed framework, which integrates SVM, GPR, and RT models with interpretability, offers a scalable and transferable approach to groundwater management. Regions such as the Arabian Gulf, Mediterranean coastal cities, arid regions in the United States, and parts of Australia face comparable challenges with shallow water tables, urban expansion, and climate variability. As the framework requires minimal input data (past GWL records and meteorological variables), it is particularly useful for data-scarce environments where traditional physical models are impractical. However, such applications should include fine-tuning of the model parameters during ML model training to tailor the models to the unique regional hydrogeological characteristics. Furthermore, incorporating remote sensing data and integrating hybrid ML-physical models could enhance predictive accuracy for city planners. Expanding this approach to different climatic zones can further validate its effectiveness for global groundwater management.

This study investigated the applicability of interpretable ML models for simulating changes in GWLs in dry and urbanized aquifer systems, with a specific focus on Kuwait City. A computational framework was developed and applied to selected groundwater wells in the study area. A linear detrending technique was employed to preprocess the GWL time series data. Autocorrelation analysis of the detrended groundwater remainder time series data revealed high autocorrelation coefficients. The modeling approach used detrended groundwater data to construct the SVM, GPR, and RT models for the examined wells within the study area. Owing to the distinct hydrological processes within the study area, artificial recharge sources are the primary cause of shallow water table development. This study used six predictors to force the different ML models. The analysis revealed that the GWL in the previous month was the most influential predictor of the current GWL. The LIME technique was employed to interpret the ML model results based on the hydro-lithological features of the study area. In addition, SHAP analysis was conducted to extend the interpretability of the ML models to a global scale. The efficiency of the proposed procedure was evaluated by applying a conventional chronological division of groundwater data into training and validation subsets.

Longer groundwater input records were found to improve the performance of the ML models. The R2 values for the wells examined in this study ranged from 0.75 to 0.98 during the validation period. Comparisons with other GWL modeling techniques applied at the research site demonstrated that the ML-based approach outperformed the statistical and PB models, with a notable 50% decrease in MAE compared with that of statistical periodic models. In contrast to the PB models, the proposed ML models are user-friendly and do not require detailed field data. The findings of this study align with those of previous studies on modeling changes in GWLs using AI techniques. Further research on groundwater resources in arid regions is crucial for developing effective water management strategies, particularly in areas where groundwater is the primary accessible water source. Although the methodology was tested on a limited number of wells in Kuwait City, the modeling approach could be applied to other urban aquifer settings, provided that differences in hydro-lithology are taken into account. This study underscores the potential of interpretable ML models as robust tools for urban groundwater forecasting in arid environments. The findings highlight the viability of ML-based approaches as data-efficient, adaptable alternatives to traditional groundwater models, supporting informed water resource management strategies.
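
For readers less familiar with the validation metrics quoted above, the following sketch shows how R2, MAE, and a relative MAE reduction could be computed. It assumes R2 is the coefficient of determination (the source does not state which definition was used), and the observed and simulated values are toy numbers chosen for illustration, not results from the study.

```python
def mean_absolute_error(observed, simulated):
    """Average absolute difference between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mae_reduction_percent(mae_reference, mae_candidate):
    """Percentage decrease in MAE of a candidate model relative to a reference."""
    return 100 * (mae_reference - mae_candidate) / mae_reference


# Toy values (hypothetical, NOT data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
ml_simulated = [1.1, 1.9, 3.2, 3.8]           # stands in for an ML model
statistical_simulated = [1.2, 1.8, 3.4, 3.6]  # stands in for a reference model

mae_ml = mean_absolute_error(observed, ml_simulated)
mae_stat = mean_absolute_error(observed, statistical_simulated)
r2_ml = r_squared(observed, ml_simulated)
reduction = mae_reduction_percent(mae_stat, mae_ml)
```

With these toy numbers the candidate model halves the reference MAE, which is the kind of comparison a "50% decrease in MAE" statement expresses.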

The author declares that no funds, grants, or other support were received during the preparation of this manuscript.

All relevant data are included in the paper or its Supplementary Information.

The author declares there is no conflict.

Al-Sanad, H. A. & Shaqour, F. M. (1991) Geotechnical implications of subsurface water rise in Kuwait, Eng. Geol., 31, 59–69.
Alsumaiei, A. A. (2024b) Long-term rainfall forecasting in arid climates using artificial intelligence and statistical recurrent models, J. Eng. Res. https://doi.org/10.1016/j.jer.2024.03.001
Alsumaiei, A. A. (2025a) Improving evaporative loss forecasts in arid climates by integrating machine learning models with feature selection algorithms, J. Am. Water Resour. Assoc., 61, e70025. https://doi.org/10.1111/1752-1688.70025
Alsumaiei, A. A. (2025b) Modeling of drought periods onset using explainable machine learning models enhanced by Bayesian optimization, J. Hydrol. Eng. https://doi.org/10.1061/JHYEFF/HEENG-6515
Barrera-Animas, A. Y., Oyedele, L. O., Bilal, M., Akinosho, T. D., Delgado, J. M. D. & Akanbi, L. A. (2022) Rainfall prediction: a comparative analysis of modern machine learning algorithms for time-series forecasting, Mach. Learn. Appl., 7, 100204.
Bierkens, M. F. P. & Wada, Y. (2019) Non-renewable groundwater use and groundwater depletion: a review, Environ. Res. Lett., 14, 63002.
Bowes, B. D., Sadler, J. M., Morsy, M. M., Behl, M. & Goodall, J. L. (2019) Forecasting groundwater table in a flood prone coastal city with long short-term memory and recurrent neural networks, Water, 11 (5), 1098.
Connor, R. (2015) The United Nations World Water Development Report 2015: Water for A Sustainable World. Paris, France: UNESCO publishing.
Cortes, C. & Vapnik, V. (1995) Support-vector networks, Mach. Learn., 20, 273–297.
Deringer, V. L., Bartók, A. P., Bernstein, N., Wilkins, D. M., Ceriotti, M. & Csányi, G. (2021) Gaussian process regression for materials and molecules, Chem. Rev., 121, 10073–10141.
Ghanbari, R. N. & Bravo, H. R. (2011) Coherence among climate signals, precipitation, and groundwater, Groundwater, 49, 476–490.
Gonzalez, R. Q. & Arsanjani, J. J. (2021) Prediction of groundwater level variations in a changing climate: a Danish case study. ISPRS Int. J. Geo-Inform., 10 (11), 792.
Gupta, S. K., Sahoo, S., Sahoo, B. B., Srivastava, P. K., Pateriya, B. & Santosh, D. T. (2024) Prediction of groundwater level changes based on machine learning technique in highly groundwater irrigated alluvial aquifers of south-central Punjab, India, Phys. Chem. Earth, Parts A/B/C, 135, 103603.
Hamdan, L. & Mukhopadhyay, A. (1991) Numerical simulation of subsurface-water rise in Kuwait City, Groundwater, 29, 93–104.
Jodhani, K. H., Gupta, N., Dadia, S., Patel, H., Patel, D., Jamjareegulgarn, P., Singh, S. K. & Rathnayake, U. (2025) Sustainable groundwater management through water quality index and geochemical insights in Valsad India, Sci. Rep., 15, 8769.
Kardan Moghaddam, H., Ghordoyee Milan, S., Kayhomayoon, Z., Rahimzadeh kivi, Z. & Arya Azar, N. (2021) The prediction of aquifer groundwater level based on spatial clustering approach using machine learning, Environ. Monit. Assess., 193, 173.
LaBianca, A., Mortensen, M. H., Sandersen, P., Sonnenborg, T. O., Jensen, K. H. & Kidmose, J. (2023) Impact of urban geology on model simulations of shallow groundwater levels and flow paths, Hydrol. Earth Syst. Sci., 27, 1645–1666.
LaBianca, A., Koch, J., Jensen, K. H., Sonnenborg, T. O. & Kidmose, J. (2024) Machine learning for predicting shallow groundwater levels in urban areas, J. Hydrol., 632, 130902.
Ma, Y. & Guo, G. (2014) Support Vector Machines Applications. Springer. https://doi.org/10.1007/978-3-319-02300-7
Ma, Y., Montzka, C., Bayat, B. & Kollet, S. (2021) Using long short-term memory networks to connect water table depth anomalies to precipitation anomalies over Europe, Hydrol. Earth Syst. Sci., 25, 3555–3575. https://doi.org/10.5194/hess-25-3555-2021
Maier, H. R., Jain, A., Dandy, G. C. & Sudheer, K. P. (2010) Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions, Environ. Model. Softw., 25 (8), 891–909.
Mishra, U., Tiwari, D., Pandey, K. K., Pagariya, A., Kumar, K., Gupta, N., Jodhani, K. H. & Rathnayake, U. (2025) Explainable machine learning to analyze the optimized reverse curve geometry for flow over ogee spillways, Water Resour. Manage, 39 (5), 1–23.
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C. & Gupta, H. V. (2021) What role does hydrological science play in the age of machine learning?, Water Resour. Res., 57, e2020WR028091.
Oswald, C. J., Kelleher, C., Ledford, S. H., Hopkins, K. G., Sytsma, A., Tetzlaff, D., Toran, L. & Voter, C. (2023) Integrating urban water fluxes and moving beyond impervious surface cover: a review, J. Hydrol., 618, 129188.
Perera, U., Coralage, D. T. S., Ekanayake, I. U., Alawatugoda, J. & Meddage, D. P. P. (2024) A new frontier in streamflow modeling in ungauged basins with sparse data: a modified generative adversarial network with explainable AI, Results Eng., 21, 101920.
Pham, Q. B., Kumar, M., Di Nunno, F., Elbeltagi, A., Granata, F., Islam, A. R. M. T., Talukdar, S., Nguyen, X. C., Ahmed, A. N. & Anh, D. T. (2022) Groundwater level prediction using machine learning algorithms in a drought-prone area, Neural Comput. Appl., 34, 10751–10773.
Rajaee, T., Ebrahimi, H. & Nourani, V. (2019) A review of the artificial intelligence methods in groundwater level modeling. J. Hydrol., 572, 336–351.
Rathnayake, N., Rathnayake, U., Chathuranika, I., Dang, T. L. & Hoshino, Y. (2023) Cascaded-ANFIS to simulate nonlinear rainfall–runoff relationship, Appl. Soft Comput., 147, 110722.
Sahoo, S., Russo, T. A., Elliott, J. & Foster, I. (2017) Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US, Water Resour. Res., 53, 3878–3895.
Shabani, S., Samadianfard, S., Sattari, M. T., Mosavi, A., Shamshirband, S., Kmet, T. & Várkonyi-Kóczy, A. R. (2020) Modeling pan evaporation using Gaussian process regression K-nearest neighbors random forest and support vector machines; comparative analysis, Atmosphere (Basel), 11, 66.
Tao, H., Hameed, M. M., Marhoon, H. A., Zounemat-Kermani, M., Heddam, S., Kim, S., Sulaiman, S. O., Tan, M. L., Sa'adi, Z. & Mehr, A. D. (2022) Groundwater level prediction using machine learning models: a comprehensive review, Neurocomputing, 489, 271–308.
Vyas, U., Patel, D., Vakharia, V. & Jodhani, K. H. (2024) Integrating GEE and IWQI for sustainable irrigation: a geospatial water quality assessment, Groundwater Sustainable Dev., 27, 101332.
Wen, L., Ling, J., Saintilan, N. & Rogers, K. (2009) An investigation of the hydrological requirements of River Red Gum (Eucalyptus camaldulensis) forest, using classification and regression tree modelling, Ecohydrol. Ecosyst. L. Water Process Interact. Ecohydrogeomorphology, 2, 143–155.
Winpenny, J., Heinz, I., Koo-Oshima, S., Salgot, M., Collado, J., Hernandez, F. & Torricelli, R. (2010) The Wealth of Waste: the Economics of Wastewater use in Agriculture. Rome: Food and Agriculture Organization of the United Nations (FAO).
Wu, Z., Lu, C., Sun, Q., Lu, W., He, X., Qin, T., Yan, L. & Wu, C. (2023) Predicting groundwater level based on machine learning: a case study of the Hebei plain, Water, 15, 823.
Yadav, B., Gupta, P. K., Patidar, N. & Himanshu, S. K. (2020) Ensemble modelling framework for groundwater level prediction in urban areas of India, Sci. Total Environ., 712, 135539.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).