Groundwater is essential for sustaining water needs, industrial growth, agriculture, and ecosystems, particularly in arid regions. This study uses data from the GRACE and MODIS satellites, integrating environmental variables such as land surface temperature, soil moisture, terrestrial water storage, precipitation, and vegetation indices to predict groundwater levels in Morocco’s Rabat-Salé-Kénitra region. These environmental variables serve as input parameters, and the output is the predicted groundwater level. Four machine learning models, Gradient Boosting Regression (GBR), Support Vector Regression (SVR), Random Forest (RF), and Decision Tree (DT), were employed to capture the relationships between these variables and groundwater levels. The GBR model performed best, with an R² value of 0.99, a Mean Absolute Error (MAE) of 1.94, and a Root Mean Squared Error (RMSE) of 2.98, improving on traditional methods that struggle with non-linear relationships and data noise. Compared to existing methods, our approach offers enhanced accuracy and robustness because the GBR model can handle complex, non-linear relationships. This study demonstrates the advantages of integrating diverse environmental datasets with advanced machine learning techniques, improving groundwater management strategies and prediction reliability, especially in regions facing significant water scarcity and climate change impacts.

  • Pioneering application of GRACE satellite data and MODIS imagery for groundwater forecasting in a water-stressed region of North Africa.

  • Development of a robust machine learning modeling framework that achieves high accuracy in predicting monthly groundwater fluctuations, outperforming traditional time series approaches.

GIS Geographic Information System 
GWS groundwater storage 
GWL groundwater level 
GRACE Gravity Recovery and Climate Experiment 
DT decision tree 
ML machine learning 
MAE mean absolute error 
MSE mean squared error 
MODIS Moderate Resolution Imaging Spectroradiometer 
NASA National Aeronautics and Space Administration 
NDVI Normalized Difference Vegetation Index 
RMSE root mean squared error 
RS remote sensing 
RF random forest 
SVR support vector regression 
SVM support vector machine 
TWS terrestrial water storage 
XGB Extreme Gradient Boosting 

Groundwater is an essential resource for fulfilling water demands, sustaining industrial growth, supporting agricultural production, and preserving ecosystems on a global scale (Li et al. 2023). In arid and semi-arid regions, where surface water resources are scarce or unreliable, groundwater serves as an essential source of water for human survival and economic activities. Globally, groundwater accounts for nearly 30% of the world’s freshwater supply and supports over 2 billion people, making it indispensable for both current and future generations (Davamani et al. 2024). The significance of groundwater is further underscored by its role in mitigating the impacts of climate change on water availability, particularly in regions where climate-induced changes in precipitation patterns and increased evapotranspiration are expected to exacerbate water scarcity. Climate change poses significant challenges to groundwater recharge by altering precipitation patterns, increasing evapotranspiration, and intensifying droughts. These changes threaten the sustainability of groundwater resources and have profound consequences for critical and sensitive ecosystems (Loucks 2021). Additionally, groundwater over-extraction in response to declining surface water availability exacerbates the depletion of this vital resource, posing severe risks not only to ecological stability but also to human livelihoods and food security in affected regions.

These global challenges are particularly acute in Morocco, situated in North Africa, where water scarcity and the impacts of climate change are severe (Naghibi et al. 2015). Morocco faces inadequate water availability during critical periods, despite having underground reservoirs theoretically capable of meeting agricultural, industrial, and domestic needs (Alley 2009). However, these reservoirs are increasingly vulnerable to overexploitation and insufficient recharge, a situation exacerbated by the impacts of climate change (Swain et al. 2022). The consequences of water scarcity have become increasingly evident in Morocco, particularly in regions such as Rabat-Salé-Kénitra, where the pressures on groundwater resources continue to escalate. These challenges underscore the urgent need for advanced and sustainable groundwater management practices in Morocco, especially in light of the region’s growing population and economic demands.

Traditionally, detecting groundwater has involved costly and time-consuming fieldwork, such as borehole drilling and well construction, which provide limited spatial insights (Wang et al. 2018; Anomohanran et al. 2021). In response to these challenges, researchers are increasingly turning to advanced techniques, including artificial intelligence, machine learning models, and remote sensing, as promising alternatives to traditional methods (Tao et al. 2022). These approaches not only offer broader spatial coverage but also enable the integration of diverse datasets, thereby improving the accuracy and efficiency of groundwater predictions. By leveraging data from GRACE and MODIS satellites, researchers can derive environmental variables such as land surface temperature, soil moisture, terrestrial water storage, precipitation, and vegetation indices. The convergence of remote sensing and machine learning has revolutionized our understanding of groundwater resources, allowing for enhanced integration of ground observations and satellite data into comprehensive databases, which improves global insights into groundwater dynamics (Rodell et al. 2009; Rateb et al. 2020).

Literature review and gap analysis

Groundwater is a critical resource for sustaining water supply, particularly in arid and semi-arid regions where surface water resources are limited or unreliable. In these regions, groundwater acts as a lifeline, supporting agricultural production, industrial operations, and domestic consumption, thereby playing a pivotal role in socioeconomic development. The sustainable management of groundwater is essential not only for meeting current water demands but also for ensuring the long-term resilience of ecosystems and communities. With the ongoing challenges posed by climate change, including more frequent droughts and shifts in precipitation patterns, the reliance on groundwater in arid regions is expected to increase, making its careful management a matter of urgent priority. Recent studies underscore the need for integrated water resource management approaches that prioritize groundwater sustainability to mitigate the risks associated with water scarcity and ecosystem degradation (Li et al. 2023; Noori et al. 2023; Sarkar et al. 2024; Yi et al. 2024). However, groundwater systems in these regions face increasing pressure due to climate change, which exacerbates water scarcity by significantly altering precipitation patterns, increasing the frequency and intensity of droughts, and elevating evapotranspiration rates. These changes disrupt the natural replenishment of aquifers, leading to decreased groundwater levels and heightened vulnerability of water resources. In regions already facing water scarcity, such as arid and semi-arid areas, the compounded effects of these climate-driven alterations can result in severe and prolonged water shortages, threatening both ecological stability and human water security (Noori et al. 2023; Mahdian et al. 2024). The consequences of these changes are particularly severe for ecosystems that rely on consistent groundwater recharge. 
Reduced recharge rates can lead to the desiccation of wetlands, the decline of groundwater-fed streams, and the loss of critical habitats that support a wide range of biodiversity. These impacts not only threaten the survival of numerous plant and animal species but also disrupt ecosystem services that are vital for human livelihoods, such as water purification, flood regulation, and climate stabilization. The degradation of these ecosystems can trigger a cascade of negative effects, further exacerbating the vulnerability of both natural and human systems to climate change (Mahdian et al. 2024).

Recent studies have highlighted significant declines in groundwater recharge across various regions, including Iran, where groundwater depletion has reached alarming levels. Noori et al. (2023) demonstrated that Iran’s groundwater recharge has decreased by approximately 3.8 mm/yr nationwide, primarily due to unsustainable water management practices compounded by climatic changes. Similarly, in Bangladesh, future groundwater potential mapping under various climate change scenarios has revealed substantial regional variability, emphasizing the importance of adaptive water management strategies (Sarkar et al. 2024). Yi et al. (2024) also investigated the impacts of climate variability on groundwater levels in South Korea, highlighting the need for advanced predictive models to mitigate the adverse effects of these changes.

Despite these advancements in understanding the challenges facing groundwater resources, significant gaps remain in the literature, particularly concerning the integration of machine learning models with diverse environmental datasets. Traditional methods, while valuable, often fall short in capturing the complex and non-linear relationships inherent in groundwater systems. For instance, Naghibi et al. (2015) employed basic regression models for groundwater level prediction, but these models exhibited limitations in handling the intricate interplay between multiple environmental variables, leading to reduced predictive accuracy.

Furthermore, Anomohanran et al. (2021) highlighted the constraints of traditional field methods, such as the labor-intensive nature of data collection and the challenges associated with large-scale or real-time monitoring. These methods, although accurate in localized studies, are not scalable for broader applications, particularly in regions with diverse hydrogeological conditions.

Recent advancements have explored the potential of machine learning techniques, offering promising alternatives to traditional methods. Tao et al. (2022) demonstrated the effectiveness of Support Vector Regression (SVR) and Random Forest (RF) models in predicting groundwater levels, leveraging remote sensing data to enhance model inputs. However, these studies frequently encounter challenges related to data integration, where inconsistencies between datasets can hinder model performance and generalization.

Moreover, while Rodell et al. (2009) and Rateb et al. (2020) underscored the critical importance of integrating diverse datasets such as satellite-based observations, climate data, and in-situ measurements to improve the robustness of predictions, a significant gap persists in the literature. Specifically, there is a lack of comprehensive approaches that combine multiple machine learning models with large-scale, high-resolution environmental data to improve prediction accuracy and reliability.

This study aims to address these gaps by integrating advanced machine learning models with extensive remote sensing and environmental datasets to develop a robust framework for predicting groundwater levels in the Rabat-Salé-Kénitra region.

Objectives and novelty of the study

This study aims to address the gaps in the current research by:

Integrating Diverse Environmental Variables: Utilizing data from GRACE and MODIS satellites to include a broad range of environmental variables such as land surface temperature, soil moisture, terrestrial water storage, precipitation, and vegetation indices.

Applying Advanced Machine Learning Models: Employing a range of machine learning models, including Gradient Boosting Regression (GBR), SVR, RF, and Decision Tree (DT) models, to handle complex and non-linear relationships between variables.

Focusing on a Region-Specific Case Study: Enhancing the prediction of groundwater levels specifically in the Rabat-Salé-Kénitra region, providing valuable insights and data-driven recommendations for sustainable groundwater management.

The novelty of this work lies in its comprehensive approach to integrating diverse environmental datasets with advanced machine learning techniques, which has not been extensively explored in the context of groundwater level prediction. By combining these methods, this research aims to improve the accuracy and reliability of groundwater predictions, contributing to more effective water resource management strategies.

Geological and hydrological characteristics of the Rharb Aquifer

The Rharb Aquifer, a crucial water source in the Rabat-Kénitra region, is characterized by its complex geology and hydrology. It comprises a sequence of alluvial deposits and permeable sandstone formations interbedded with less permeable clay layers (Rimi et al. 2006). The primary recharge processes involve infiltration from precipitation and lateral inflow from the surrounding Gharb Basin margins. The aquifer system is also influenced by the geomorphology of the region, including river networks that contribute to both recharge and discharge dynamics.

Historical data on groundwater levels and usage trends

Historical data indicates that groundwater levels in the Rharb Aquifer have experienced significant fluctuations over the past decades. These fluctuations show a general decline in groundwater levels, attributed to increased extraction for agricultural and urban use (El Haouari & Khattabi 2012). The region has also seen rising groundwater demand driven by population growth and agricultural expansion, leading to over-extraction and subsequent concerns about sustainability.

Challenges in groundwater management in the Rabat-Kénitra region

The Rabat-Kénitra region faces several challenges related to groundwater management:

Over-Extraction: Intensive agricultural practices and urban development have led to excessive groundwater extraction, surpassing natural recharge rates.

Contamination: Agricultural runoff and industrial pollutants pose risks to groundwater quality.

Climate Change: Variability in precipitation patterns and increasing temperatures impact recharge rates and groundwater availability.

Socioeconomic Impacts: Water scarcity affects local communities, leading to conflicts over resource allocation and necessitating robust management policies.

Methodology overview

In this study, we employ a variety of machine learning models, including GBR, SVR, RF, and DT models. These models are trained using environmental variables derived from GRACE and MODIS satellite data. The reasons for choosing these methods include their robustness in handling complex and non-linear relationships, their ability to integrate diverse datasets, and their proven effectiveness in previous studies. The detailed methodology and performance evaluation of these models are discussed in the subsequent sections.

This section provides comprehensive information on the research area, data sources, modeling techniques, and assessment measures. The methodological approach employed in this research is time series regression using machine learning techniques (Aderemi et al. 2023). The performance of these models is evaluated using metrics such as Root Mean Squared Error (RMSE), R-squared, Mean Squared Error (MSE), and Mean Absolute Error (MAE).
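As a minimal sketch of how these four measures relate to one another, the snippet below computes them with scikit-learn on a small set of made-up values. The helper name `regression_report` and the sample arrays are illustrative assumptions, not data or code from this study.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }


# Hypothetical observed and predicted values, for illustration only.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 12.0, 13.0, 18.0])
report = regression_report(y_true, y_pred)
print(report)
```

Because RMSE squares the residuals before averaging, it penalizes large errors more heavily than MAE, which is why both are usually reported together.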

Study area

The Rabat-Salé-Kénitra region, established in 2015 as one of the 12 regions of Morocco, includes the nation’s capital. This region resulted from the merger of the former regions of Gharb-Chrarda-Beni-Hssen and Rabat-Salé-Zemmour-Zaer. It is geographically bounded by Tanger-Tetouan-Al Hoceima to the north, Fès-Meknès to the east, Casablanca-Settat to the south, Beni-Mellal-Khénifra to the southeast, and the Atlantic Ocean to the west. The region spans a total area of 17,570 km², accounting for 2.5% of Morocco’s total area (HCP 2012).

The climate in Rabat-Salé-Kénitra is Mediterranean, characterized by chilly and wet winters, with average temperatures ranging from 0 to 5 degrees Celsius at night and reaching up to 17 degrees Celsius during the day. Summers are marked by an oceanic climate, with nights cooled by ocean moisture and daytime temperatures around 30 degrees Celsius. During spring and summer, the ‘Chergui’ desert winds can occasionally raise temperatures to 40 degrees Celsius for a few days (Hakimi & Brech 2021).

Figure 1 presents a detailed map of the Rabat-Salé-Kénitra region, emphasizing its geographical context within Morocco. The map illustrates the region’s location along the Atlantic coast and its proximity to the Mediterranean Sea. The legend provides clear explanations of the various administrative divisions and boundaries, aiding in the straightforward interpretation of the spatial information.
Figure 1

Detailed map of the Rharb Aquifer showing key geographical and administrative boundaries.

Aquifers in the Rabat-Salé-Kénitra region

The region’s groundwater resources are sustained by several key aquifers, each with distinct geological and hydrological characteristics:

Maamora Aquifer: This extensive aquifer, primarily composed of Quaternary sands and limestones, serves as a crucial water reservoir with substantial storage capacity. It supports both agricultural and urban water needs, highlighting its importance for the region’s water supply (Zouhri 2001).

Bouregreg Coastal Aquifer: Located along the Atlantic coast, this aquifer is significantly influenced by marine conditions. It provides essential water resources to coastal cities such as Rabat and Salé, playing a vital role in supporting urban populations and industrial activities (Mahdaoui et al. 2024).

Gharb Aquifer: Situated in the Gharb plain, this aquifer consists of highly permeable alluvial deposits ideal for irrigation. It is extensively utilized for agricultural purposes, significantly contributing to the region’s agricultural productivity (Bouita et al. 2021).

Tiflet Aquifer: Located inland from the coastal areas, the Tiflet Aquifer is characterized by sandstone and clay formations. It is critical for supplying water to rural communities, thereby enhancing the overall groundwater resources of the region (Faqihi et al. 2020).

Challenges and conservation efforts

Despite their critical importance, the aquifers in the Rabat-Salé-Kénitra region face several significant challenges. Overexploitation is a primary concern, as intensive agricultural and urban demands have led to groundwater extraction rates that exceed natural replenishment rates, resulting in depletion. Pollution also poses a substantial threat to groundwater quality, with contamination stemming from agricultural runoff, industrial effluents, and urban activities. Additionally, climate change exacerbates these issues by altering precipitation patterns and increasing temperatures, which in turn affect aquifer recharge rates and overall water availability.

To ensure the long-term sustainability of groundwater resources in the Rabat-Salé-Kénitra region, it is imperative to implement sustainable management practices. This includes robust monitoring and regulation of groundwater extraction to prevent overuse. Promoting water-efficient technologies in both agriculture and industry is essential to reduce water consumption. Furthermore, adopting integrated water resource management strategies that recognize the interconnectedness of surface water and groundwater systems is crucial for maintaining the balance and health of the region’s water resources.

Choice of the study area

The Rabat-Salé-Kénitra region is recognized for its significant groundwater reserves, with two primary aquifers of particular importance:

The Gharb Aquifer: Covering an area of 390 km², the Gharb aquifer boasts 126 million cubic meters (Mm³) of renewable water resources annually. This aquifer maintains a generally good water balance due to substantial recharge from precipitation and infiltration from the margins of the Gharb basin, highlighting its regional hydrogeological relevance (noa 2022).

The Maâmora Aquifer: Spanning approximately 4,000 km², the Maâmora aquifer is characterized as an unconfined aquifer, primarily recharged by rainfall infiltration. It serves as a substantial reservoir, providing 134 Mm³ of renewable water resources annually (noa 2022).

Figure 2 illustrates a comprehensive machine learning workflow, detailing each step from data preprocessing to model deployment. The process begins with data collection, where the training set is gathered. The subsequent step, data preprocessing, involves cleaning and normalizing the data to ensure it is ready for analysis.
Figure 2

A comprehensive machine learning workflow: from data preprocessing to model deployment.

During the training validation phase, analysts assess how well different models have learned from their training sets by evaluating their performance on separate validation sets. This step allows for model adjustments and the exploration of alternative techniques. Through repeated testing and refinement, model improvements are achieved.

Once the best-performing model is identified, it undergoes evaluation with new data, referred to as the testing set. This case study utilizes four types of models: SVR, Random Forest, Decision Trees, and Gradient Boosting. Following rigorous testing, the final models are selected and deployed, culminating in their real-world application as depicted in the final part of the flowchart labeled Applied Model.
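The comparison step described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual pipeline: the data are synthetic, the hyperparameters are scikit-learn defaults, and validation RMSE is assumed as the selection criterion.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 5 predictors and a non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=300)

# Hold out a validation set used only to compare the candidate models.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "SVR": SVR(),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and record its RMSE on the validation set.
val_rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_val)
    val_rmse[name] = float(np.sqrt(mean_squared_error(y_val, predictions)))

# The model with the lowest validation RMSE moves on to the final test.
best_name = min(val_rmse, key=val_rmse.get)
print(val_rmse, best_name)
```

Which model ranks first on this synthetic data says nothing about the ranking on the real dataset; the point is only the shape of the train, validate, and select loop.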

Datasets and parameters

Nature and sources of data

The daily data used in this study were collected from nine different sites in the Rabat-Salé-Kénitra region. The study utilized satellite data from the Moderate Resolution Imaging Spectroradiometer (MODIS) to evaluate a range of environmental factors in the region. Data were obtained from the NASA Earth Observing System Data and Information System (EOSDIS) data portal, covering a period from January 2010 to December 2022. The environmental factors analyzed include average surface temperature (AvgSurfT-tavg), groundwater storage (GWS-tavg), soil moisture (SoilMoist-S-tavg), terrestrial water storage (TWS-tavg), elevation, soil type, land surface temperature (LST), evapotranspiration (evap-Tr), normalized difference vegetation index (NDVI), and precipitation. The analysis also utilized GRACE data.

Groundwater storage

Groundwater storage represents the total volume of water stored underground in aquifers, averaged over a specific period. Data were sourced from GRACE-DA and are available from February 24, 2000, to February 17, 2023. Declines in groundwater storage reflect reductions in available groundwater, often due to excessive extraction for agricultural, industrial, and domestic purposes. Sustained over-extraction can lead to significant drops in groundwater levels, affecting water availability and quality (Rodell et al. 2009).

Precipitation

Precipitation is any form of water, liquid or solid, that falls from the atmosphere and reaches the ground. Data were sourced from CHIRPS and are available from February 24, 2000, to May 5, 2023. Precipitation is the primary source of groundwater recharge. Reduced precipitation can lead to lower groundwater levels, especially in regions that rely heavily on rainfall to replenish aquifers (Taylor et al. 2013).

Average surface temperature

Average surface temperature refers to the mean temperature recorded at the earth’s surface over a specified period. Data were sourced from GRACE-DA and are available from February 24, 2000, to February 17, 2023. Increased surface temperatures can lead to higher rates of evaporation, reducing the amount of water available to infiltrate into the ground and recharge aquifers. This reduction in infiltration can significantly exacerbate groundwater depletion, especially in arid and semi-arid regions (Kundzewicz et al. 2007).

Soil moisture

Soil moisture refers to the amount of water present in the soil, averaged over a specific period. Data were sourced from GRACE-DA and are available from February 24, 2000, to May 5, 2023. Low soil moisture can reduce the amount of water that percolates down to recharge aquifers. During periods of drought, reduced soil moisture can lead to increased reliance on groundwater for irrigation, further depleting groundwater resources (Döll 2009).

Land surface temperature

Land surface temperature is the temperature of the land surface as measured by remote sensing instruments. Data were sourced from MODIS and are available from February 24, 2000, to May 5, 2023. High land surface temperatures can increase evapotranspiration rates, reducing soil moisture and groundwater recharge. This can lead to a more rapid depletion of groundwater reserves, especially in hot climates (Wang & Dickinson 2012).

Evapotranspiration

Evapotranspiration is the sum of evaporation from the land surface and transpiration from plants. Data were sourced from GRACE-DA and are available from February 24, 2000, to May 5, 2023. High evapotranspiration rates can significantly reduce the amount of water available for groundwater recharge. In agricultural areas, high evapotranspiration can lead to increased irrigation demands, further stressing groundwater resources (Wada et al. 2010).

NDVI (Normalized Difference Vegetation Index)

The NDVI is a measure of vegetation health and density, calculated using the formula:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$
where NIR is the near-infrared light reflected by vegetation and Red is the visible red light reflected by vegetation. Data were sourced from MODIS and are available from February 24, 2000, to February 17, 2023. Higher NDVI values indicate healthier and denser vegetation, which can influence groundwater recharge through processes like transpiration and root water uptake (Pettorelli et al. 2005).
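For readers who want to compute the index from reflectance arrays, the sketch below applies the standard NDVI ratio element-wise with NumPy. The function name `ndvi` and the sample reflectance values are illustrative assumptions; returning NaN where both bands are zero is a design choice made here, not something specified in the study.

```python
import numpy as np


def ndvi(nir, red):
    """Element-wise NDVI = (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red equals zero are returned as NaN instead of
    raising a division warning (an assumption made for this sketch).
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectances: vegetated pixel, bare pixel, no-signal pixel.
example = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
print(example)  # roughly 0.667, 0.0, nan
```

The result always lies between -1 and 1 for non-negative reflectances, with dense green vegetation toward the upper end of the range.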

Statistical analysis of dataset parameters

To provide a comprehensive overview of the dataset characteristics, we performed a statistical analysis of the key environmental parameters. Table 1 presents the mean, standard deviation (Std), minimum (Min), median (50%), and maximum (Max) values for each parameter.

This statistical analysis provides a clear understanding of the data distribution and variability for each environmental factor, which is crucial for interpreting the modeling results and understanding the influence of these factors on groundwater levels.
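A summary of this kind can be produced directly with pandas. The snippet below is a minimal sketch on made-up numbers: the column names and values are placeholders and do not reproduce Table 1.

```python
import pandas as pd

# Hypothetical readings for two parameters (placeholders, not study data).
df = pd.DataFrame(
    {
        "LST": [290.0, 295.0, 300.0, 305.0],
        "NDVI": [0.2, 0.4, 0.3, 0.5],
    }
)

# describe() reports count, mean, std, min, quartiles, and max;
# keep only the statistics listed above, one row per parameter.
summary = df.describe().loc[["mean", "std", "min", "50%", "max"]].T
print(summary)
```

Note that pandas reports the sample standard deviation (dividing by n - 1) by default.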

Monitoring dataset

The monitoring dataset for this study was obtained from various sources, including NASA’s GRACE and MODIS satellites, and Google Earth. The dataset includes environmental variables such as land surface temperature, soil moisture, terrestrial water storage, precipitation, and vegetation indices. These data points were preprocessed through extraction, cleaning, and normalization before being used as predictors in the machine learning models. The use of Google Earth provided additional geospatial data to complement the satellite-derived variables.
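The cleaning and normalization steps might look like the sketch below. The source does not state which normalization scheme was used, so min-max scaling to [0, 1] is an assumption, as are the column names and the choice to drop rows with missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical extracted readings with one missing precipitation value.
raw = pd.DataFrame(
    {
        "precip": [0.0, 5.0, np.nan, 10.0, 5.0],
        "soil_moist": [20.0, 25.0, 30.0, 40.0, 25.0],
    }
)

# Cleaning (assumption): discard rows that contain missing values.
clean = raw.dropna().reset_index(drop=True)

# Normalization (assumption): rescale every predictor to the range [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(clean), columns=clean.columns)
print(scaled)
```

In a full pipeline the scaler would be fitted on the training portion only and then applied to the test portion, so that no information from the test data leaks into training.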

Table 2 presents a comprehensive overview of environmental data sources and their availability for the Rabat-Salé-Kénitra region in Morocco. It includes various environmental factors such as average surface temperature, groundwater storage, soil moisture, terrestrial water storage, and others, along with their data sources, specific bands, and the periods for which the data is available.

Calibration of machine learning models

The machine learning models were calibrated using a combination of evaluation metrics and hyperparameter tuning. Each model (SVR, DT, RF, and GBR) was trained and evaluated using R-squared (R²), MAE, MSE, and RMSE. The process involved:

  1. Model Training and Evaluation: The models were trained on the training dataset and evaluated on the test dataset. The performance of each model was measured using the aforementioned metrics.

  2. Hyperparameter Tuning: The models underwent hyperparameter tuning to optimize their performance. For instance, the Gradient Boosting Regression model was fine-tuned by running a grid search with cross-validation, optimizing hyperparameters such as learning rate, number of estimators, and maximum tree depth.
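The grid search described in step 2 can be sketched with scikit-learn's `GridSearchCV`. The grid values, the 5-fold setting, the RMSE scoring choice, and the synthetic data are all assumptions made for illustration; the source names the tuned hyperparameters but not the values searched.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 4 predictors, non-linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=200)

# Illustrative grid over the three hyperparameters named in the text.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    cv=5,  # 5-fold cross-validation (assumption)
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, -search.best_score_)
```

scikit-learn maximizes scores, so RMSE is exposed in negated form; flipping the sign of `best_score_` recovers the cross-validated RMSE of the best configuration.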

Data processing

One critical factor in determining the accuracy of predictions when using machine learning models is the proper preparation of input data. The preprocessing and normalization of variables can significantly impact the quality of insights gained, as some features may inherently hold more weight than others (Yu et al. 2018; Guzman et al. 2019; Mohammadi 2019). In this study, we examined 42,000 readings collected from multiple sites, all of which were thoroughly checked and cleaned prior to analysis. This dataset included up-to-date measurements of water levels and various other environmental factors, recorded daily.

To handle and analyze this extensive dataset, we utilized Python 3.10.11 and several essential libraries, including NumPy, SciPy, pandas, and Matplotlib. For machine learning tasks, such as SVR, DTs, and RFs, we employed the scikit-learn library. This library facilitated not only model fitting but also efficient cross-validation, a process that can be computationally expensive due to the repeated splitting of the dataset.

To effectively train and test our models without overfitting to the training data, we divided the dataset into two subsets: 70% for training and 30% for testing and validation (Davis et al. 1999). The concept of ‘generalization’ is crucial in this context; a well-trained model should perform comparably well on both the training and testing datasets. Performance evaluation involves training the model on the training set (70% of the data) and then assessing its learning on the remaining 30%. To further fine-tune the models and avoid overfitting, researchers often employ a validation set drawn from the testing data. This additional step ensures that the model does not merely memorize the training data but can generalize its learning to new, unseen data.
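A minimal sketch of this partitioning is shown below. The 70/30 ratio comes from the text; dividing the held-out 30% evenly into validation and test halves, the random shuffling, and the placeholder arrays are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: 70% for training, 30% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Step 2 (assumption): split the held-out part evenly into a validation
# set for tuning and a test set for the final evaluation.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

For strongly autocorrelated daily series, a chronological split (training on earlier dates and testing on later ones) is a common alternative to random shuffling, since it avoids near-duplicate neighbouring days landing on both sides of the split.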

Selection of the groundwater conditioning factors

In this study, we analyzed various environmental and meteorological factors that could potentially influence groundwater levels. The factors considered were: average surface temperature, soil moisture, terrestrial water storage, altitude, soil type, land surface temperature, evapotranspiration (the combined process of evaporation and plant transpiration), normalized difference vegetation index (NDVI) as an indicator of plant health, and precipitation. For each of these factors, we calculated the correlation coefficient with groundwater storage. Factors with low correlation values were excluded from further analysis. This approach enabled the identification of the most relevant features for estimating groundwater levels. Utilizing correlation analysis in this manner enhances the transparency and reliability of the study by providing well-founded justifications for the inclusion or exclusion of specific factors (Smith & Vasishth 2020).
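The screening step can be illustrated as follows. The source does not report the cut-off used to decide that a correlation is "low", so the threshold of 0.3 is a hypothetical value, as are the helper name `select_by_correlation`, the column names, and the synthetic data.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target, threshold=0.3):
    """Keep predictors whose absolute Pearson correlation with the
    target is at least `threshold` (a hypothetical cut-off)."""
    corr = df.corr()[target].drop(target)
    keep = corr[corr.abs() >= threshold].index.tolist()
    return keep, corr


# Synthetic stand-in data (assumption): one strongly positive predictor,
# one negatively related predictor, and one unrelated predictor.
rng = np.random.default_rng(1)
n = 500
tws = rng.normal(size=n)
df = pd.DataFrame(
    {
        "TWS": tws,
        "LST": -0.8 * tws + rng.normal(size=n),
        "noise": rng.normal(size=n),
        "GWS": tws + 0.2 * rng.normal(size=n),
    }
)

kept, corr = select_by_correlation(df, target="GWS")
print(corr.round(2))
print(kept)
```

Filtering on the absolute value keeps strongly negative predictors as well as strongly positive ones. Pearson correlation only measures linear association, so a predictor with a purely non-linear effect could be discarded by this filter.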

Figure 3 illustrates the correlation coefficients (ranging from -0.4 to 1.0) depicting the relationships between various environmental variables and groundwater levels. The vertical axis enumerates the different variables, while the horizontal axis quantifies the strength of their correlations. The color-coding employed in the figure differentiates between positive and negative correlations, with red indicating negative correlations and blue indicating positive correlations.
Figure 3

Correlation of environmental factors with groundwater levels.


Based on Figure 3, the following key observations can be made: Total terrestrial water storage exhibits the strongest positive correlations with groundwater levels, as indicated by the dark blue bars. Elevation and soil type also display moderately positive relationships with groundwater. In contrast, variables such as precipitation, evapotranspiration, and vegetation indices demonstrate negative correlations with groundwater levels. Additionally, land surface temperature and elevation show relatively strong negative correlations with groundwater levels.

This correlation analysis provides valuable insights into the complex interrelationships between various environmental factors and groundwater dynamics, which can be instrumental for groundwater management and decision-making processes.

Supplementary material, Figure A.11 illustrates the correlation between average surface temperature and groundwater levels, highlighting the potential influence of temperature on groundwater dynamics.

Supplementary material, Figure A.12 illustrates the relationship between total water storage (TWS) derived from GRACE satellite data and groundwater levels, suggesting that variations in terrestrial water storage can serve as a critical predictor for fluctuations in groundwater levels.

Supplementary material, Figure A.13 demonstrates the correlation between soil moisture, surface air temperature, and groundwater levels, emphasizing the potential influence of soil moisture and temperature conditions on groundwater recharge and depletion processes.

The correlations between groundwater levels and various factors are depicted in Supplementary material, Figures A.11, A.12, and A.13. These figures indicate that average surface temperature may impact groundwater dynamics, changes in terrestrial water storage (from GRACE satellite data) are significant predictors for groundwater level fluctuations, and soil moisture levels and temperatures could influence the recharge and depletion of aquifers. Figure 3 presents a matrix illustrating all these relationships, aiding in the identification of key factors to consider when developing our AI models.

Correlation analysis methodology

Correlation analysis was performed to identify the most relevant predictors. The Pearson correlation coefficient, which measures the linear correlation between two variables on a scale from -1 (perfect negative correlation) to 1 (perfect positive correlation), was used to determine the strength and direction of the relationship between each environmental factor and groundwater storage. A color-coded bar chart of the coefficients (Figure 3) allowed strong and weak correlations to be identified quickly. Basing the selection of predictors on these measured relationships enhances the robustness and reliability of the groundwater level predictions.
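The correlation-based screening can be sketched with pandas as follows. The data are synthetic, the column names are hypothetical, and the cut-off of 0.3 is an assumption chosen for illustration, since the text does not state the threshold that was applied.

```python
import numpy as np
import pandas as pd

# Synthetic data: one strong positive driver, one negative driver, one unrelated factor.
rng = np.random.default_rng(0)
n = 300
tws = rng.normal(1167.0, 45.0, n)
df = pd.DataFrame({
    "terrestrial_water_storage": tws,
    "land_surface_temperature": rng.normal(32.0, 9.0, n) - 0.2 * (tws - 1167.0),
    "noise_factor": rng.normal(0.0, 1.0, n),
    "groundwater_storage": 0.7 * tws + rng.normal(0.0, 5.0, n),
})

# Pearson correlation of every factor with the target variable.
corr = df.corr(method="pearson")["groundwater_storage"].drop("groundwater_storage")

# Keep factors whose absolute correlation reaches the (hypothetical) threshold.
THRESHOLD = 0.3
selected = corr[corr.abs() >= THRESHOLD].index.tolist()

print(corr.round(2))
print(selected)
```

Filtering on the absolute value keeps strongly negative predictors as well as strongly positive ones.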

Models

In this study, four machine learning models were employed to address groundwater-related tasks, including modeling, prediction, and forecasting. The selected models, DT, RF, Gradient Boosting Regression (GBR), and SVR, were trained on a prepared training set. Each model’s performance was subsequently evaluated using a test set, with metrics such as MSE, MAE, and R-squared (R2) serving as the primary indicators of accuracy. Among these models, Gradient Boosting Regression demonstrated superior performance. This model was further optimized through hyperparameter tuning using a grid search with cross-validation. The final evaluation, conducted on an independent dataset, confirmed the robustness of the optimized GBR model, which consistently outperformed the other models in accuracy.
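A grid search with cross-validation of the kind mentioned above might look like the following sketch, assuming scikit-learn's `GridSearchCV`. The grid values, the five folds, the scoring choice, and the synthetic data are assumptions made for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic non-linear regression problem (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(-2.0, 2.0, size=(300, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Small illustrative grid over the hyperparameters named in the text.
param_grid = {
    "learning_rate": [0.1, 0.5],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

# Each combination is scored by 5-fold cross-validation on the training set only.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# The refitted best model is evaluated once on the held-out test set.
best_gbr = search.best_estimator_
test_r2 = r2_score(y_test, best_gbr.predict(X_test))
print(search.best_params_, round(test_r2, 3))
```

Keeping the test set out of the search means the final score is not influenced by the hyperparameter selection.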

Support Vector Regression

SVR is a machine learning technique designed for regression tasks, extending the principles of Support Vector Machines (SVM) to predict continuous values (Figure 4). The objective of SVR is to identify a function whose predictions deviate from the observed targets y by no more than a margin ε, while keeping the function as flat as possible (Cortes & Vapnik 1995; Aghelpour et al. 2019).

A significant advantage of SVR is its flexibility in handling both linear and non-linear relationships between input variables and the target variable. This flexibility is achieved through the application of kernel functions, such as the Radial Basis Function (RBF), which enable the mapping of input features into high-dimensional space, thereby enhancing SVR’s performance with complex datasets (Chengcheng et al. 2021).

The optimization problem in SVR is mathematically structured to minimize the norm of the weight vector, under the constraint that the predicted values remain within a margin of ε from the actual values, with allowances made for outliers. This process is expressed mathematically as:
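$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^*\right) \quad \text{subject to} \quad \begin{cases} y_i - f(x_i) \le \varepsilon + \xi_i, \\ f(x_i) - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\ \xi_i^* \ge 0, \end{cases} \tag{2}$$
where $\xi_i$ and $\xi_i^*$ represent slack variables that quantify the degree of deviation from the ε-tube. Here, N denotes the total number of data points, and C is a regularization parameter that balances between the flatness of $f$ and the tolerable level of deviations exceeding ε (Smola & Schölkopf 2004).
The function $f(x)$ in the context of SVR is expressed as:
$$f(x) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b \tag{3}$$
where $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers, $K(x_i, x)$ is the kernel function representing the similarity between input vectors $x_i$ and x, and b is the bias term.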
  • Strengths:

    • Flexibility: SVR’s capability to handle both linear and non-linear data using kernel functions makes it highly adaptable for various types of datasets (Chengcheng et al. 2021).

    • Robustness: SVR remains effective in high-dimensional spaces and is robust even when the number of dimensions exceeds the number of samples (Kawashima & Kumano 2017).

    • Generalization: By maximizing the margin and minimizing prediction errors, SVR aims to enhance the generalization capability of the model, which is crucial for reliable predictions in diverse applications (Smola & Schölkopf 2004).

  • Weaknesses:

    • Computational Complexity: The training process of SVR can be computationally demanding, particularly with large datasets, due to the quadratic programming problem it addresses (Ben-Hur & Weston 2010).

    • Choice of Hyperparameters: SVR’s performance is highly sensitive to the selection of hyperparameters (e.g., regularization parameter C, kernel type, and kernel parameters), which can be challenging to tune optimally (Chang & Lin 2011).

    • Sensitivity to Outliers: Despite incorporating slack variables to manage outliers, SVR can still be affected by extreme values, which may significantly influence the regression hyperplane (Zhou 2014).

    • Interpretability: The use of kernel functions and high-dimensional space transformations reduces the interpretability of SVR models compared to simpler linear models (Gunn 1998).

SVR is recognized for its robust performance in handling non-linear and high-dimensional data, making it a strong candidate for complex regression tasks (Kawashima & Kumano 2017; Lan & Xiao 2023; Habib et al. 2024). Nonetheless, its computational intensity and its sensitivity to hyperparameter tuning and outliers are significant limitations that need careful consideration.
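To illustrate the roles of the kernel, C, and ε discussed in this section, the sketch below fits an RBF-kernel SVR with scikit-learn on a synthetic one-dimensional problem. The parameter values and the data are illustrative assumptions, and the feature scaling step is a common practice added here rather than something stated in the text.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic non-linear target with small noise (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=200)

# Standardize the input, then fit an epsilon-insensitive SVR with an RBF kernel.
# C weights the penalty on deviations larger than epsilon.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.1),
)
model.fit(X, y)

# Only samples on or outside the epsilon-tube become support vectors.
svr = model.named_steps["svr"]
n_support = len(svr.support_)
train_r2 = model.score(X, y)
print(n_support, round(train_r2, 3))
```

Because points inside the ε-tube carry no loss, the fitted function depends only on the support vectors, which is why the count printed here is smaller than the training set.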

Gradient Boosting Regression (GBR)

GBR is a sophisticated ensemble learning technique that excels in predictive modeling by aggregating the contributions of multiple weak learners, typically decision trees. Unlike traditional regression approaches, GBR operates by sequentially fitting trees to the residuals or errors of previous models (Figure 5), effectively enhancing the model’s accuracy at each iteration. This method allows GBR to manage complex, non-linear relationships within the data, making it particularly useful for tasks where traditional linear models fall short (Chen & Guestrin 2016).

The iterative optimization process inherent to GBR is geared towards minimizing a specified loss function by progressively introducing models that target the residual errors of prior iterations. This strategy is crucial for improving predictive performance, especially in scenarios involving complex, non-linear data distributions. Furthermore, GBR’s ability to focus on the most challenging predictions during each iteration helps mitigate bias, ultimately producing a highly accurate and robust model (Awan et al. 2024).
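The objective can be written as:
$$J = \sum_{i=1}^{N} L\left(y_i, \hat{y}_i\right) + \Omega(F) \tag{4}$$
where J is the objective function to be minimized, $L(y_i, \hat{y}_i)$ represents the loss function measuring the difference between actual targets $y_i$ and predicted targets $\hat{y}_i$, and $\Omega(F)$ is the regularization term that controls model complexity.
During each iteration, GBR adds a new tree to the model to minimize the loss function:
$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x) \tag{5}$$
where $F_m(x)$ is the updated model, $F_{m-1}(x)$ is the previous model, ν is the learning rate, and $h_m(x)$ is the new tree added to correct the residuals.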
  • Strengths:

    • High Predictive Accuracy: GBR is renowned for its superior accuracy, achieved through iterative refinement of errors. This capability makes it highly effective in scenarios requiring precise predictions, particularly when dealing with complex, non-linear relationships in the data (Ahmadi et al. 2024).

    • Effective in Capturing Complex Relationships: GBR’s iterative approach enables it to capture intricate patterns and dependencies within the dataset that might be overlooked by simpler models. This strength is particularly valuable in environmental and hydrological modeling, where interactions between variables are often non-linear and complex (Sarkar et al. 2024).

    • Insight into Feature Importance: GBR models provide valuable insights into the importance of different features, allowing researchers to understand which variables contribute most significantly to the model’s predictions. This is crucial for developing targeted strategies in resource management and policy-making (Awan et al. 2024).

  • Weaknesses:

    • Computational Intensity: The primary drawback of GBR lies in its computational demands. The iterative nature of the model, combined with the need for extensive hyperparameter tuning, can result in significant computational and time costs, especially when dealing with large datasets or high-dimensional feature spaces (Ahmadi et al. 2024).

    • Risk of Overfitting: While GBR is powerful, it is also prone to overfitting, particularly if the model is not properly regularized or if the number of boosting iterations is too high. This risk necessitates careful cross-validation and hyperparameter tuning to ensure the model generalizes well to unseen data (Sarkar et al. 2024).

    • Complexity and Interpretability: As GBR models become more complex with additional iterations and trees, they can become less interpretable. This complexity can pose challenges in explaining the model’s predictions to stakeholders or in understanding the underlying processes being modeled (Noori et al. 2023).
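The sequential residual fitting described in this section can be sketched as a hand-written boosting loop for squared-error loss. This is a simplified illustration under stated assumptions: the regularization term is omitted, the tree depth, learning rate, and number of iterations are arbitrary, and the data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear regression data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.2, size=200)

nu = 0.1        # learning rate
n_trees = 100   # number of boosting iterations

# Start from a constant model: the mean of the targets.
prediction = np.full_like(y, y.mean())
mse_history = [float(np.mean((y - prediction) ** 2))]
trees = []

for _ in range(n_trees):
    # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Add the new tree's output, shrunk by the learning rate, to the model.
    prediction = prediction + nu * tree.predict(X)
    trees.append(tree)
    mse_history.append(float(np.mean((y - prediction) ** 2)))

print(round(mse_history[0], 3), round(mse_history[-1], 3))
```

The training error shrinks at every step, which also shows why too many iterations can overfit: the loop keeps fitting whatever residual remains, including noise.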

Random Forest

RF is a highly robust ensemble learning method that leverages the power of multiple decision trees to enhance predictive accuracy and model stability (Figure 5). Unlike single decision tree models, which can be prone to overfitting, RF mitigates this risk by averaging the predictions of numerous trees, each trained on a random subset of the data (Figure 6). This approach not only reduces variance but also enhances the model’s ability to generalize to new data, making it particularly well-suited for complex and high-dimensional datasets (Breiman 2001).

The operational framework of RF involves the construction of multiple decision trees, each trained on a different bootstrap sample of the original dataset, with a random subset of features considered for splitting at each node. This combination of bagging (bootstrap aggregating) and random feature selection introduces diversity among the trees, which in turn reduces the overall variance of the model. The final prediction is obtained by averaging the outputs of all trees (in the case of regression) or by taking the majority vote (in classification tasks), thus ensuring a robust and reliable model (Yi et al. 2024).
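For regression, the averaged prediction is:
$$\hat{y}(x) = \frac{1}{M} \sum_{m=1}^{M} T_m(x) \tag{6}$$
where $\hat{y}(x)$ is the predicted output for input $x$, M is the total number of trees, and $T_m(x)$ represents the prediction of the mth tree for input $x$.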
  • Strengths:

    • Robustness to Overfitting: RF is particularly effective in mitigating overfitting, a common issue with single decision trees. The aggregation of multiple trees reduces the risk of the model capturing noise in the data, thereby enhancing its generalizability to new datasets (Breiman 2001).

    • Handling of High-Dimensional Data: RF excels in environments with large, complex datasets. Its ability to handle a vast number of input variables without overfitting makes it an ideal choice for applications in fields like remote sensing and environmental modeling, where data can be both high-dimensional and noisy (Yi et al. 2024).

    • Feature Importance Insights: One of the notable advantages of RF is its capacity to provide estimates of feature importance, allowing researchers to identify the most significant predictors in their models. This feature is particularly valuable in exploratory analysis and in scenarios where understanding the driving factors behind predictions is crucial (Probst et al. 2019).

  • Weaknesses:

    • Computational Complexity: The main limitation of RF lies in its computational cost. The need to build and aggregate a large number of trees can be resource-intensive, both in terms of time and memory, particularly when working with very large datasets (Probst et al. 2019).

    • Reduced Interpretability: While RF provides reliable predictions, the model’s complexity, arising from the aggregation of numerous trees, can make it less interpretable compared to simpler models like single decision trees or linear regression (Hastie et al. 2009).

    • Bias-Variance Tradeoff: Although RF reduces variance by averaging multiple trees, it can introduce bias, especially when the individual trees are not deep enough to capture the complexities of the data. This tradeoff must be carefully managed to optimize the model’s performance (Al-Bayati & Alwan 2022).
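The averaging step that defines an RF regression prediction can be checked directly in scikit-learn by comparing the forest's output with the mean of its individual trees. The data and hyperparameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=150)

# Each tree is grown on a bootstrap sample, with a random feature subset per split.
forest = RandomForestRegressor(
    n_estimators=25, max_features="sqrt", random_state=0
).fit(X, y)

# Collect every tree's prediction: one row per tree, one column per sample.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# Averaging over trees reproduces the forest's own prediction.
manual_average = per_tree.mean(axis=0)
matches = bool(np.allclose(manual_average, forest.predict(X)))
print(per_tree.shape, matches)
```

The spread of the rows in `per_tree` for a given sample gives a rough sense of how much the individual trees disagree before averaging.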

Decision Tree

DTs are one of the most widely used and intuitive machine learning techniques, applicable to both classification and regression tasks. They are particularly valued for their simplicity and the clarity with which they visualize the decision-making process. This makes DTs an excellent choice for tasks where interpretability and transparency are crucial, such as in policy-making or educational contexts. Moreover, DTs are well-suited for handling complex, non-linear relationships between input features and the target variable, making them versatile tools for a wide range of applications (Safavian & Landgrebe 1991).

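Mathematically, a decision tree is constructed by recursively partitioning the feature space into disjoint regions. Let $X \in \mathbb{R}^{N \times p}$ represent the feature matrix, and $y \in \mathbb{R}^{N}$ represent the target variable. The decision tree model divides the feature space into regions as follows:
$$\mathbb{R}^{p} = \bigcup_{k=1}^{K} R_k, \qquad R_k \cap R_l = \emptyset \quad (k \neq l) \tag{7}$$
At each decision node t, the model selects a feature $x_j$ and a threshold θ that maximizes the reduction in impurity. This can be represented by the impurity measure J, calculated as:
$$\Delta J(t; j, \theta) = J(t) - \frac{N_{t_L}}{N_t}\, J(t_L) - \frac{N_{t_R}}{N_t}\, J(t_R) \tag{8}$$
where $t_L$ and $t_R$ are the child nodes containing the samples of node t with $x_j \le \theta$ and $x_j > \theta$ respectively, and $N_t$, $N_{t_L}$, and $N_{t_R}$ are the numbers of samples in each node.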
The impurity measure J varies depending on the task: for classification, it might be Gini impurity or entropy; for regression, it typically involves the MSE or another suitable loss function.
  • Strengths:

    • Simplicity and Interpretability: DTs are straightforward to understand and interpret, making them accessible tools for non-experts and suitable for scenarios where model transparency is crucial (Safavian & Landgrebe 1991).

    • Efficiency: DTs are computationally efficient, particularly when applied to large datasets, as they require less processing power compared to more complex models (Quinlan 1986).

    • Capability to Handle Non-linear Relationships: DTs are capable of capturing non-linear relationships between input features and the target variable, which makes them effective in complex prediction tasks where these relationships are not straightforward (Breiman et al. 1984).

    • Handling of Missing Data: DTs can effectively handle missing values in the dataset, providing robustness in scenarios where data completeness is an issue (Quinlan 1986).

  • Weaknesses:

    • Prone to Overfitting: A major drawback of DTs is their tendency to overfit, especially when the trees are deep or when the model is trained on noisy data. Overfitting occurs when the model becomes too complex and starts capturing noise instead of the underlying data patterns (Breiman et al. 1984).

    • Instability: DTs can be unstable because small variations in the training data might lead to completely different trees being generated. This sensitivity to data variations can result in inconsistent predictions (Breiman et al. 1984).

    • Bias Towards Certain Features: DTs may exhibit bias towards features with more levels or categories, potentially skewing the model’s interpretation of feature importance (Safavian & Landgrebe 1991).

    • Lack of Smooth Predictions: In regression tasks, DT predictions are piecewise constant, leading to a lack of smoothness in the predicted values compared to other regression methods, such as linear regression or SVR (Breiman et al. 1984).
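The split selection at a single node can be sketched for one feature with MSE as the impurity measure: every candidate threshold is scored by the impurity reduction it achieves, and the best one is kept. The function names and the toy data are hypothetical, and a full tree would repeat this search over all features and recurse into the child nodes.

```python
import numpy as np

def mse_impurity(y):
    """Mean squared deviation from the node mean (regression impurity)."""
    if y.size == 0:
        return 0.0
    return float(np.mean((y - y.mean()) ** 2))

def impurity_reduction(x, y, theta):
    """Parent impurity minus the sample-weighted impurity of the two children."""
    left, right = y[x <= theta], y[x > theta]
    n = y.size
    weighted_children = (
        left.size / n * mse_impurity(left) + right.size / n * mse_impurity(right)
    )
    return mse_impurity(y) - weighted_children

def best_threshold(x, y):
    """Try midpoints between consecutive distinct values and keep the best one."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2.0
    gains = [impurity_reduction(x, y, th) for th in candidates]
    best = int(np.argmax(gains))
    return float(candidates[best]), float(gains[best])

# Toy node: the target jumps from 5 to 9 between x = 3 and x = 10.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.0, 5.0, 9.0, 9.0, 9.0])

theta, gain = best_threshold(x, y)
print(theta, gain)  # 6.5 4.0
```

Both children are perfectly pure after this split, so the reduction equals the parent's impurity of 4.0, and each child would predict a single constant value.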

Summary of pros and cons of machine learning methods

To provide a comprehensive understanding of the strengths and limitations of each machine learning model employed in this study, Table 3 presents a detailed summary of the key advantages and disadvantages associated with the models utilized, supported by recent studies.

The selection of an appropriate machine learning model for groundwater prediction involves balancing several factors, including predictive accuracy, computational efficiency, and model interpretability. Table 3 highlights the strengths and weaknesses of each model, offering insights into their suitability for different aspects of groundwater modeling.

GBR is noted for its high predictive accuracy, which it achieves by iteratively minimizing residual errors. This makes GBR particularly effective for capturing complex, non-linear relationships in environmental data, which are common in groundwater systems. However, this accuracy comes at the cost of increased computational demands and a higher risk of overfitting if the model’s hyperparameters are not carefully tuned. The complexity of GBR models can also make them less interpretable, posing challenges in scenarios where model transparency is required.

RF offers a robust alternative, particularly in its ability to prevent overfitting through the ensemble method of averaging multiple decision trees. RF is well-suited for handling large datasets and provides valuable feature importance metrics, aiding in model interpretation. Nevertheless, the computational cost associated with constructing numerous trees can be significant, and the model’s complexity may reduce its interpretability compared to simpler models.

SVR is highly flexible, capable of modeling both linear and non-linear relationships with high-dimensional data. Its strength lies in its generalization ability, which enhances predictive reliability. However, SVR is computationally intensive and sensitive to the choice of hyperparameters, which can complicate its application, particularly in large-scale datasets.

DTs are perhaps the simplest and most interpretable model, making them an attractive option for initial analyses or when transparency is a priority. DTs are computationally efficient and capable of capturing non-linear relationships. However, they are prone to overfitting, especially with deep trees, and can be unstable, with small changes in the data potentially leading to significantly different tree structures. Additionally, DTs may exhibit bias towards features with more levels, affecting the accuracy of the model’s predictions.

In conclusion, the choice of model should be guided by the specific needs of the groundwater prediction task at hand. GBR is recommended when high accuracy and the ability to model complex relationships are prioritized, despite the computational costs. RF is ideal for large datasets and scenarios where overfitting must be minimized, although it may require substantial computational resources. SVR is best suited for high-dimensional data with a focus on generalization, though it demands careful hyperparameter tuning. Finally, DTs are most appropriate when interpretability and computational efficiency are paramount, with the caveat that their susceptibility to overfitting and instability must be managed. By understanding these tradeoffs, researchers can select the most appropriate model, or combination of models, to optimize groundwater predictions.

Gradient boosting regression

Table 4 compares Gradient Boosting Regression models fitted with different sets of hyperparameters: learning rate, number of estimators, maximum tree depth, and random state. The table shows how the models perform according to four metrics: R-squared, MAE, MSE, and RMSE.

Table 1

Statistical analysis of dataset parameters

| Parameters | Mean | Std | Min | 50% | Max | Median |
|---|---|---|---|---|---|---|
| Average Surface Temperature | 295.616 | 7.968 | 275.982 | 297.439 | 311.806 | 297.439 |
| Groundwater Storage | 884.460 | 35.525 | 818.079 | 879.773 | 1025.321 | 879.773 |
| Soil Moisture | 4.213 | 1.435 | 1.424 | 3.853 | 7.783 | 3.853 |
| Terrestrial Water Storage | 1167.752 | 45.826 | 1090.147 | 1156.628 | 1362.260 | 1156.628 |
| Elevation | 58.000 | 0.000 | 58.000 | 58.000 | 58.000 | 58.000 |
| Land Surface Temperature | 32.770 | 9.120 | 6.250 | 35.150 | 53.210 | 35.150 |
| Evapotranspiration | 0.010 | 0.007 | −0.007 | 0.009 | 0.043 | 0.009 |
| Normalized Difference Vegetation Index | 0.257 | 0.066 | 0.000 | 0.235 | 0.618 | 0.235 |
| Precipitation | 0.240 | 1.936 | 0.000 | 0.000 | 61.666 | 0.000 |

Table 2

Environmental data sources and availability for the Rabat-Salé-Kénitra region, Morocco

| Environmental Factors | Data Source | Band | Dataset Availability |
|---|---|---|---|
| Average Surface Temperature | GRACE-DA | AvgSurfT_tavg | 2000-02-24 to 2023-02-17 |
| Groundwater Storage | GRACE-DA | GWS_tavg | 2000-02-24 to 2023-02-17 |
| Soil Moisture | GRACE-DA | ssm | 2000-02-24 to 2023-05-05 |
| Terrestrial Water Storage | GRACE-DA | TWS_tavg | 2000-02-24 to 2023-05-05 |
| Elevation | SRTM | elevation | 2000-02-24 to 2023-05-05 |
| Soil Type | USDA system/[B0] | [’b0’] | - |
| Land Surface Temperature | MODIS | LST_Day_1km | 2000-02-24 to 2023-05-05 |
| Evapotranspiration | GRACE-DA | Evap_tavg | 2000-02-24 to 2023-05-05 |
| Normalized Difference Vegetation Index | MODIS | NDVI | 2000-02-24 to 2023-02-17 |
| Precipitation | CHIRPS | precipitation | 2000-02-24 to 2023-05-05 |

Table 3

Summary of pros and cons of machine learning methods

| Method | Pros | Cons |
|---|---|---|
| Gradient Boosting Regression (GBR) | • Provides high predictive accuracy by iteratively minimizing residual errors (Ahmadi et al. 2024; Sarkar et al. 2024). | • Computationally intensive and time-consuming, especially with large datasets (Ahmadi et al. 2024). |
| | • Effectively captures complex, non-linear relationships in data (Sarkar et al. 2024). | • Prone to overfitting, particularly if hyperparameters are not carefully tuned (Sarkar et al. 2024; Yi et al. 2024). |
| | • Offers insights into feature importance, which can be beneficial for model interpretation and understanding (Awan et al. 2024). | • Complexity of the model can hinder interpretability (Noori et al. 2023). |
| Random Forest (RF) | • Robust against overfitting due to the ensemble approach of averaging multiple decision trees (Breiman 2001). | • Computationally expensive due to the need to construct a large number of decision trees (Probst et al. 2019; Yi et al. 2024). |
| | • Capable of handling large datasets with high dimensionality effectively (Yi et al. 2024). | • Reduced interpretability compared to simpler models, given the complexity of the ensemble approach (Hastie et al. 2009). |
| | • Provides estimates of feature importance, aiding in model interpretation (Probst et al. 2019; Sarkar et al. 2024). | • May introduce bias, especially in cases with highly correlated features (Al-Bayati & Alwan 2022). |
| Support Vector Regression (SVR) | • Flexible in handling both linear and non-linear data relationships through kernel functions (Chengcheng et al. 2021). | • Computationally demanding, particularly when applied to large datasets (Ben-Hur & Weston 2010; Habib et al. 2024). |
| | • Effective in high-dimensional spaces, robust even with many features (Kawashima & Kumano 2017; Habib et al. 2024). | • Sensitive to the choice of hyperparameters and outliers, which can impact model performance (Zhou 2014; Lan & Xiao 2023). |
| | • Generalizes well, thereby enhancing model reliability and predictive accuracy (Smola & Schölkopf 2004; Lan & Xiao 2023). | • Reduced interpretability due to the use of kernel functions and transformations into high-dimensional space (Gunn 1998; Habib et al. 2024). |
| Decision Tree (DT) | • Simple and intuitive, making it easy to interpret and understand (Safavian & Landgrebe 1991; Yeganeh-Bakhtiary et al. 2023). | • Prone to overfitting, especially with deep trees, which can reduce generalization to new data (Breiman et al. 1984). |
| | • Computationally efficient, especially suitable for large datasets with minimal resources (Yeganeh-Bakhtiary et al. 2023). | • Sensitive to small changes in the data, leading to instability and inconsistent predictions (Breiman et al. 1984). |
| | • Capable of capturing non-linear relationships between input features and the target variable (Al-Bayati & Alwan 2022; Yeganeh-Bakhtiary et al. 2023). | • Bias towards features with more levels can skew the model’s understanding of feature importance (Safavian & Landgrebe 1991; Yeganeh-Bakhtiary et al. 2023). |

Table 4

Comparison of gradient boosting regression models with different hyperparameters

| Hyperparameters (Learning Rate, N Estimators, Max Depth, Random State) | R square | MAE | MSE | RMSE |
|---|---|---|---|---|
| 0.1 42 75 42 | 0.95 | 7.56 | 22 | 52 |
| 0.1 100 42 | 0.99 | 1.94 | 8.87 | 2.98 |
| 0.5 500 42 | 0.97 | 6.2 | 38.46 | 3.2 |

With an R-squared (R2) score of 0.99, the Gradient Boosting Regression model performed well, explaining 99% of the variation in the target variable. The model had an MAE of 1.94, which means that the predicted values were 1.94 units off from the actual values on average. This model’s MSE was 8.87, indicating very minor squared discrepancies between predicted and actual values. The RMSE was 2.98, demonstrating a low average magnitude of prediction errors on the target variable’s original scale.
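For reference, the four metrics can be computed as in the sketch below, which uses scikit-learn's metric functions on a handful of made-up values. The `y_true` and `y_pred` arrays are hypothetical and are not taken from the study. The last line checks the relationship RMSE = √MSE against the reported figures for the best GBR configuration (MSE 8.87, RMSE 2.98).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical observed and predicted values, for illustration only.
y_true = np.array([880.0, 875.0, 890.0, 900.0, 870.0])
y_pred = np.array([882.0, 874.0, 887.0, 901.0, 873.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |error|
mse = mean_squared_error(y_true, y_pred)    # mean of error^2
rmse = float(np.sqrt(mse))                  # back on the target's original scale
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot

print(mae, mse, round(rmse, 3), round(r2, 4))

# RMSE is the square root of MSE: sqrt(8.87) rounds to the reported 2.98.
reported_rmse = float(np.sqrt(8.87))
print(round(reported_rmse, 2))
```

Because MSE squares each error, it weights large misses more heavily than MAE does, so the two metrics can rank models differently.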

Decision tree

In Table 5, we assess how well different decision tree models perform under various hyperparameter configurations. These configuration settings cover maximum depth, minimum samples per leaf, and the number of features considered when splitting. Our evaluation criteria consist of four statistics for each model: R-squared, MAE, MSE, and RMSE. While R-squared indicates the proportion of variance in the dependent variable explained by the model, MAE, MSE, and RMSE all speak to how accurately these models make predictions; lower values for RMSE indicate better performance (as do higher ones for R-squared).

Table 5

Comparison of Decision Tree (DT) models with different hyperparameters

| Hyperparameters (max_depth, min_samples_leaf, max_features) | R² | MAE | MSE | RMSE |
|---|---|---|---|---|
| 10, sqrt | 0.80 | 255.58 | 255.58 | 10.7 |
| none | 0.98 | 3.05 | 24.26 | 4.93 |
| 15, 10, log2 | 0.78 | 16.5 | 272.3 | 11.12 |

With an R² score of 0.98, the Decision Tree model explained 98% of the variance in the target variable. Its MAE of 3.05 and RMSE of 4.93 indicate larger average prediction errors than those of the Gradient Boosting Regression model (MAE = 1.94, RMSE = 2.98), but smaller errors than those of the Random Forest model (MAE = 4.05, RMSE = 5.63; Table 8).
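To illustrate how the minimum-samples-per-leaf setting constrains a regression tree, the sketch below searches for the best split on a single feature by minimizing the summed squared error of the two child nodes. It is a simplified, assumed illustration (one feature, one split, hypothetical data and function names), not the implementation used in the study.

```python
def sum_squared_error(values):
    """Squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys, min_samples_leaf=1):
    """Return (threshold, sse) for the best split, or None if no split is allowed."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    # i is the number of samples sent to the left child.
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[1]:
            best = (threshold, sse)
    return best


# Hypothetical data: the last target value is much larger than the rest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 11.0, 10.0, 12.0, 11.0, 30.0]

unconstrained = best_split(xs, ys, min_samples_leaf=1)
constrained = best_split(xs, ys, min_samples_leaf=2)
print(unconstrained[0], constrained[0])
```

With `min_samples_leaf=1` the tree isolates the single large value in its own leaf; requiring two samples per leaf moves the threshold, which shows how a small change in the data or settings can alter the tree structure.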

Random Forest

Table 6 compares random forest models under different hyperparameters: maximum tree depth, minimum samples per leaf node, and the maximum number of features considered at each split. The models are assessed with R-squared, MAE, MSE, and RMSE. A higher R-squared means that the model explains more of the variation in the dependent variable, and lower MAE, MSE, and RMSE values indicate more accurate predictions.

Table 6

Random Forest model performance with different hyperparameters

| max_depth | min_samples_leaf | max_features | R² | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|
| | | sqrt | 0.79 | 16.38 | 268.47 | 11.55 |
| | | Auto | 0.98 | 3.05 | 24.26 | 4.93 |
| | | log2 | 0.83 | 14.55 | 211.66 | 10.27 |

The Random Forest model produced a high R² value of 0.97, close to that of the Decision Tree model (0.98). The MAE of 4.05 indicates a reasonably small average deviation from the actual values, and the MSE of 31.67 suggests moderate squared deviations between predicted and actual values.
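The ingredients that distinguish a random forest from a single tree can be sketched briefly: each tree is trained on a bootstrap sample, each split considers only a subset of features, and the forest averages the tree predictions. The code below is an assumed illustration; the study does not name its software library, and the mapping of `sqrt` and `log2` to a feature count follows the convention used in scikit-learn. The feature count of five is an assumption based on the input variables listed in the abstract, and all function names are hypothetical.

```python
import math
import random


def resolve_max_features(setting, n_features):
    """Translate a max_features setting into a number of features per split."""
    if setting == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if setting == "log2":
        return max(1, int(math.log2(n_features)))
    return n_features  # any other setting: consider all features (assumption)


def bootstrap_indices(n_samples, rng):
    """Draw row indices with replacement, one draw per training row."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def forest_predict(tree_predictions):
    """A regression forest averages the predictions of its trees."""
    return sum(tree_predictions) / len(tree_predictions)


n_features = 5  # assumed: LST, soil moisture, TWS, precipitation, vegetation index
per_split = {s: resolve_max_features(s, n_features) for s in ("sqrt", "log2", "all")}
print(per_split)

rng = random.Random(42)
sample = bootstrap_indices(8, rng)
print(sample)

averaged = forest_predict([10.0, 12.0, 14.0])
print(averaged)
```

With only five predictors, `sqrt` and `log2` both resolve to two features per split, so any difference between those two rows would have to come from the other hyperparameters or from random variation.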

Support Vector Regression

In Table 7, we assess how three SVR models perform under different hyperparameters: the kernel type, the gamma value, and the regularization parameter (C). The evaluation uses R², MAE, MSE, and RMSE (the square root of MSE).

Table 7

Comparison of Support Vector Regressor (SVR) models with different hyperparameters

| Regularization parameter (C) | Kernel | Gamma | R² | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|
| 0.1 | Poly | Scale | 0.88 | 12.27 | 150.57 | 9.92 |
| 0.2 | Linear | Auto | 0.90 | 10.00 | 76.50 | 9.02 |
| 0.1 | RBF | Auto | 0.95 | 6.73 | 69.28 | 8.32 |

The SVR model’s R² value of 0.95 indicates that it accounts for 95% of the variance in the target variable. However, its MAE of 6.73 reflects a larger average deviation between predicted and actual values, and its MSE of 69.28 is substantially higher than that of the Gradient Boosting Regression model.
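The three kernel types in Table 7 differ in how they measure similarity between two input vectors. The sketch below writes them out in plain Python. It assumes scikit-learn-style definitions, including the `scale` and `auto` rules for gamma and the default polynomial degree of 3, because the study does not state which library was used; the input vectors are hypothetical.

```python
import math


def linear_kernel(x, y):
    """Dot product of the two vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, gamma, degree=3, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gamma_auto(n_features):
    """'auto' rule: 1 / n_features."""
    return 1.0 / n_features


def gamma_scale(rows):
    """'scale' rule: 1 / (n_features * variance of all feature values)."""
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(rows[0]) * variance)


# Two hypothetical, already-scaled input vectors with two features each.
x = [1.0, 2.0]
y = [2.0, 0.0]
g_auto = gamma_auto(len(x))
g_scale = gamma_scale([x, y])

print(linear_kernel(x, y))
print(poly_kernel(x, y, gamma=g_auto))
print(rbf_kernel(x, y, gamma=g_auto))
print(g_scale)
```

Because the RBF kernel depends on distances, its behavior changes with the scale of the inputs, which is one reason SVR results can be sensitive to gamma and to feature scaling.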

Hyperparameter tuning and model performance

The performance of four AI models (Gradient Boosting Regression, SVR, Random Forest, and Decision Tree) was evaluated for groundwater level prediction using R-squared (R²), MAE, MSE, and RMSE.

Table 8 shows that Gradient Boosting Regression performed best on all reported metrics. Its R² of 0.99 indicates a very close fit between predicted and actual values, and it also had the lowest Mean Squared Error (8.87) and Mean Absolute Error (1.94) of the four models.

Table 8

Performance comparison of regression models: Gradient Boosting, SVR, Random Forest, and Decision Tree

| Model | R² | MAE | MSE | RMSE |
|---|---|---|---|---|
| GBR | 0.99 | 1.94 | 8.87 | 2.98 |
| SVR | 0.95 | 6.73 | 69.28 | 8.32 |
| RF | 0.97 | 4.05 | 31.67 | 5.63 |
| DT | 0.98 | 3.05 | 24.26 | 4.93 |
Table 9

Cross-validation performance metrics for regression models

| Model | R-squared | MAE | MSE | RMSE |
|---|---|---|---|---|
| GBR | ± 0.0012 | ± 0.1046 | ± 1.6761 | ± 0.3070 |
| SVR | ± 0.0027 | ± 0.1739 | ± 17.8083 | ± 2.6623 |
| RF | ± 0.0023 | ± 0.0268 | ± 12.1126 | ± 0.2027 |
| DT | ± 0.0020 | ± 0.1846 | ± 6.9131 | ± 0.3995 |

As shown in Figure 7, the GBR model achieved the highest performance among the four machine learning models evaluated.
Figure 4

Schematic representation of the support vector regression methodology.

Figure 5

Schematic representation of Gradient Boosting Regression methodology.

Figure 6

Schematic representation of Random Forest technique.

Figure 7

Comparison of the performance obtained by the GB, SVR, RF, and DT models.

Figure 7 compares the R-squared, MAE, and RMSE values for the GBR, SVR, RF, and DT models.

We performed hyperparameter tuning for each model to optimize its performance. For the GBR model, we evaluated several configurations with varying learning rates, numbers of estimators, and maximum depths (Table 4). The best-performing GBR configuration had a learning rate of 0.1, 100 estimators, a maximum depth of 6, and a random state of 42. It achieved the highest R-squared (0.99), the lowest MAE (1.94), and the lowest RMSE (2.98) among the GBR settings.

The GBR model clearly outperformed the other models, with the highest R-squared value of 0.99, the lowest MAE of 1.94, and the lowest RMSE of 2.98. This indicates that the GBR model had the best fit between the predicted and observed groundwater levels, and the lowest average magnitude of errors in its predictions.

The DT model demonstrated the second-best performance, with an R-squared of 0.98, followed by the RF model with an R-squared of 0.97. Both the DT and RF models outperformed the SVR model, which had a lower R-squared of 0.95 and higher error values (MAE = 6.73, RMSE = 8.32).
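The ranking described above can be reproduced from the values reported in Table 8. The short sketch below sorts the models by R-squared and by MAE, and also checks that each reported RMSE matches the square root of the reported MSE up to rounding. The variable names are illustrative; the numbers are those in Table 8.

```python
import math

# Metrics as reported in Table 8: (R-squared, MAE, MSE, RMSE).
table8 = {
    "GBR": (0.99, 1.94, 8.87, 2.98),
    "SVR": (0.95, 6.73, 69.28, 8.32),
    "RF": (0.97, 4.05, 31.67, 5.63),
    "DT": (0.98, 3.05, 24.26, 4.93),
}

# Higher R-squared is better; lower MAE is better.
ranking_by_r2 = sorted(table8, key=lambda m: table8[m][0], reverse=True)
ranking_by_mae = sorted(table8, key=lambda m: table8[m][1])
print(ranking_by_r2)
print(ranking_by_mae)

# Consistency check: RMSE should equal sqrt(MSE) within rounding error.
max_gap = max(abs(math.sqrt(mse) - rmse) for _, _, mse, rmse in table8.values())
print(round(max_gap, 4))
```

Both orderings place GBR first, followed by DT, RF, and SVR, so the conclusion does not depend on which of these metrics is used.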

The hyperparameter tuning results and the visual comparison in Figure 7 further support the superior performance of the GBR model relative to the other machine learning techniques evaluated in this study. The clear separation in the performance metrics underscores the GBR model’s ability to accurately predict groundwater levels.

Cross-validation to improve model reliability

We performed cross-validation on the training set to assess the model’s reliability. The results of the cross-validation for each model are summarized in Table 9. The cross-validation results demonstrate that the GBR model consistently outperforms the other models across all evaluation metrics. The high R-squared value and low error metrics indicate the robustness and reliability of the GBR model in predicting groundwater levels. The low standard deviations in the cross-validation scores further emphasize the model’s stability and generalization capability.

The SVR model, while performing well, shows higher variability in its predictions, as indicated by the higher standard deviations in the cross-validation metrics. This suggests that the SVR model might be more sensitive to the specific training data subsets.

The RF and DT models also perform well, with the DT model showing slightly better performance in terms of MAE and RMSE compared to the RF model. However, both models exhibit higher error metrics compared to the GBR model, indicating that GBR is more suitable for this specific application.

Overall, the cross-validation results confirm the superior performance and reliability of the GBR model for groundwater level prediction, making it the preferred choice among the evaluated models.
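The cross-validation procedure can be sketched as follows: the training data are divided into k folds, the model is refitted k times with a different fold held out each time, and the mean and standard deviation of the held-out scores are reported (the "±" values in Table 9 are of this kind). The sketch below is an assumed illustration in plain Python. The number of folds (k = 5), the simple linear stand-in model, and the synthetic data are all assumptions, since the study does not state these details.

```python
import math


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def cross_validate_rmse(xs, ys, k=5):
    """Return the mean and standard deviation of the held-out RMSE over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((a + b * xs[i] - ys[i]) ** 2 for i in fold) / len(fold)
        scores.append(math.sqrt(mse))
    mean = sum(scores) / k
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / k)
    return mean, std


# Synthetic series: a linear trend plus a small deterministic disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (((int(x) * 7) % 5) - 2) * 0.5 for x in xs]

mean_rmse, std_rmse = cross_validate_rmse(xs, ys, k=5)
print(round(mean_rmse, 3), round(std_rmse, 3))
```

A small standard deviation relative to the mean indicates that performance does not depend strongly on which subset of the data is held out.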

The results demonstrate the effectiveness of AI models, particularly the GBR model, in predicting groundwater levels. High R2 values and low error metrics suggest that these models can accurately capture the complex relationships between groundwater levels and factors such as weather patterns, remote sensing data, and soil moisture content.

The superior performance of the GBR model can be attributed to its ability to handle non-linear relationships and its robustness to outliers and noise in the data. The ensemble nature of the model, which combines multiple weak learners (decision trees) into a strong predictor, contributes to its high accuracy and generalization capability.
  • Gradient Boosting Regression: Figure 8 illustrates how varying the number of estimators affects four performance metrics for the GBR model. As the number of estimators increases, the R² value improves, indicating better model fit, and the MSE and RMSE decrease, showing more accurate predictions. The MAE declines markedly up to 150–200 estimators and then levels off, suggesting that the model benefits from additional estimators up to that range, with diminishing returns beyond it.

  • Decision Tree and Random Forest Models: These models also showed promising results, indicating their potential for groundwater level prediction. They effectively capture non-linear relationships and automatically identify the most relevant features, making them suitable for modeling complex groundwater systems.

  • Support Vector Regression (SVR): While the SVR model achieved a reasonable R2 value, its higher error metrics compared to other models might be due to sensitivity to the choice of kernel function and hyperparameters, as well as potential overfitting on small or noisy datasets.
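The way gradient boosting combines weak learners, and the effect of the number of estimators on the fit, can be sketched with depth-1 trees (stumps) on a single feature. This is a simplified illustration under stated assumptions (squared-error loss, one input variable, synthetic data, hypothetical function names) and is not the configuration used in the study. It tracks the training error only, whereas a held-out error curve may flatten or rise.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold and a mean value per side."""
    best = None
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1):
    """Each stump is fitted to the residuals left by the stumps before it."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the target
    preds = [base] * len(ys)
    stumps, mse_history = [], []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, mse_history


# Synthetic non-linear target (illustrative only).
xs = [float(x) for x in range(12)]
ys = [(x - 6.0) ** 2 / 4.0 for x in xs]

base, stumps, mse_history = gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1)
for n in (10, 50, 100):
    print(n, round(mse_history[n - 1], 4))
```

The training MSE decreases with every added stump, but each step removes a smaller share of the remaining error, which is consistent with the levelling-off pattern described for Figure 8.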

Figure 8

Model performance obtained for Gradient Boosting Regression.

Figure 9 shows that the SVR model captures 95% of the variance in the data (R² = 0.95), leaving room for improvement. Its higher error metrics (MAE = 6.73, MSE = 69.28) indicate moderate deviations of the predictions from the observed values. The RMSE of 8.32 supports this observation and suggests that the model struggles with outlying values.
Figure 9

Model performance obtained for SVR.

The correlation numbers provide insights into the relationships between groundwater levels and various factors. For instance, a strong relationship with terrestrial water storage (TWS) suggests that incorporating more satellite data, such as GRACE, could enhance predictions. Similar correlations with soil moisture and surface air temperatures highlight their significant roles alongside weather information in modeling groundwater levels.
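A correlation analysis of this kind can be sketched with the Pearson coefficient, which measures the strength of the linear relationship between two series. The study does not state which correlation measure was used, so Pearson is an assumption here, and the monthly values below are hypothetical and chosen only to illustrate a positive and a negative relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical monthly series (illustrative values, not the study's data).
gwl = [50.0, 52.0, 55.0, 53.0, 58.0, 60.0]
predictors = {
    "tws": [10.0, 11.0, 13.0, 12.0, 15.0, 16.0],  # rises with groundwater level
    "lst": [30.0, 29.0, 27.0, 28.0, 25.0, 24.0],  # falls as groundwater level rises
}

correlations = {name: pearson_r(series, gwl) for name, series in predictors.items()}
for name, value in correlations.items():
    print(name, round(value, 3))
```

A coefficient near +1 or -1 marks a variable as a strong linear predictor, although tree-based models such as GBR can also exploit non-linear relationships that this coefficient does not capture.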

In Figure 10, observed groundwater level (GWL) values are compared with those predicted by the GBR model across different time points. The close tracking of fluctuations in GWL by the model suggests it has learned the underlying relationships well. While occasional discrepancies exist, such as slight crossovers, the overall performance is impressive, implying accurate modeling of groundwater levels.
Figure 10

Observed vs predicted groundwater level by GBR.

However, it is important to acknowledge the limitations of this research. A key constraint is the exclusion of all geological factors influencing groundwater levels in the study region. Geology plays a crucial role in groundwater dynamics, and its omission may have impacted the model’s accuracy. Future studies should incorporate geological information to provide a more comprehensive understanding of groundwater behavior.

Despite these limitations, the results of this study are valuable for land management. Accurate knowledge of groundwater levels can inform decisions on crop selection, irrigation planning, and resource allocation, especially in regions with limited water supplies. Understanding these levels is essential for sustainable land use practices.

Furthermore, this research underscores the importance of precise and comprehensive environmental data. Reliable measurements of soil moisture, temperature, precipitation, and other factors can improve groundwater level predictions and assist in areas with insufficient data. This information is crucial for understanding the impact of human activities and natural events on groundwater supplies and their interactions with surface water systems.

Potential applications of groundwater level predictions in Moroccan water resource management, sustainable agriculture, and environmental decision-making

Water resource management

Groundwater level predictions can inform the development of comprehensive water management strategies at the regional or watershed level. Accurate forecasts of groundwater availability can help water authorities make informed decisions about water allocation, groundwater extraction limits, and infrastructure planning to ensure sustainable water supplies. For example, water managers could use the model outputs to determine safe yield limits for groundwater pumping and implement policies to restrict extraction in areas prone to depletion.

Sustainable agriculture

Farmers and agricultural agencies can utilize the groundwater level predictions to optimize irrigation scheduling and water usage. By knowing the expected groundwater levels, they can adjust irrigation practices to match crop water requirements, reducing over-extraction and improving the efficiency of water use. For instance, farmers could use the model outputs to determine the ideal timing and volumes of groundwater pumping for irrigation, ensuring crops receive the necessary water without depleting the aquifer.

The model outputs can help identify areas at risk of groundwater depletion, allowing farmers and policymakers to implement sustainable agricultural practices, such as crop selection, soil management, and precision irrigation techniques, to maintain long-term agricultural productivity. This information can guide the transition to water-efficient crops and the adoption of precision irrigation systems in vulnerable regions.

Groundwater level information can be incorporated into precision farming applications, enabling farmers to make data-driven decisions about irrigation, fertilizer application, and other agricultural inputs to maximize yields while minimizing environmental impacts. Farmers could integrate the model predictions into their digital farming platforms to optimize water use and improve the overall sustainability of their operations.

Environmental decision-making

The model outputs can be used to assess the potential impacts of climate change, land use changes, or other environmental factors on groundwater resources, supporting the development of adaptive management strategies and environmental impact assessments. Researchers and environmental agencies can leverage the model predictions to evaluate the long-term sustainability of groundwater-dependent ecosystems and plan for mitigation or adaptation measures.

Comparative analysis with existing studies

The results of this study, particularly the performance of the GBR model in predicting groundwater levels, were evaluated against existing studies employing various machine learning techniques. This comparative analysis provides insights into the relative strengths and weaknesses of different modeling approaches in similar contexts.

Study 1: Feng et al. (2024) explored the application of traditional and deep machine learning algorithms, including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), SVM, DT, RF, and Generative Adversarial Network (GAN), for groundwater level prediction in Izeh City, Iran. Their findings showed that the CNN model achieved the highest accuracy, with an RMSE of 0.0558 and an R² of 0.9948. The high performance of CNN is attributed to its robustness against noise and its capability to handle large datasets. In comparison, the GBR model in our study achieved an R² value of 0.99 and an RMSE of 2.98, indicating that traditional ensemble methods can still provide highly accurate predictions comparable to advanced deep learning models under certain conditions.

Study 2: Ibrahem Ahmed Osman et al. (2021) studied groundwater level prediction in Selangor, Malaysia, using XGBoost, Artificial Neural Network (ANN), and SVM models. Their results indicated that the XGBoost model outperformed the others, with an RMSE of 0.138 and an R² of 0.920. Although XGBoost demonstrated promising results, the GBR model in our study exhibited a higher R² value, suggesting that ensemble methods like Gradient Boosting can sometimes offer superior performance in capturing complex relationships in groundwater data.

Study 3: Our approach to groundwater level prediction using GBR aligns with existing literature and offers contributions relevant to regions with limited time series data. Li et al. (2023) demonstrated GBR’s superiority in the Bilate watershed in Ethiopia, achieving an R-squared value of 0.77 with non-time series data; similarly, GBR excelled in our study owing to its ability to handle non-linear relationships. Our study also aligns with Zhang et al. (2023) in emphasizing the importance of variables such as LST, NDVI, and precipitation.

Study 4: Zhang et al. (2023) developed a real-time groundwater level forecasting strategy using hybrid models combined with remote sensing data, focusing on arid and semi-arid regions. They utilized Deep Learning algorithms and Ensemble Machine Learning models such as LSTM, RF, and XGB, enhanced with Wavelet Transformation (WT). Their findings showed that hybrid models outperformed standalone models, with WT-LSTM achieving the best performance. Similarly, our study employed GBR, SVR, RF, and DT models, with GBR showing the highest accuracy. The integration of diverse environmental variables like terrestrial water storage and soil moisture in our study provides a more comprehensive view of factors influencing groundwater levels, aligning with Zhang et al.’s emphasis on hybrid models. Additionally, Zhang et al. used the SHapley Additive exPlanations (SHAP) method for model interpretability, which parallels our correlation analysis and variable importance assessments.

In conclusion, the findings of this study align with existing literature, emphasizing the effectiveness of machine learning models in groundwater level prediction. The superior performance of the GBR model, with an R2 value of 0.99, underscores the potential of traditional ensemble methods in achieving high predictive accuracy. Additionally, this comparative analysis demonstrates that both traditional and deep learning models have unique strengths, and the choice of model can depend on specific dataset characteristics and prediction objectives.

This study presents a comprehensive analysis of machine learning models for predicting groundwater levels, with a specific focus on the GBR model. The findings highlight the potential of these models in capturing the complex relationships between groundwater levels and various environmental factors.

Novelty and contribution

The novelty of this work lies in its specific application and combination of methodologies:

Integration of GRACE and MODIS Data: This study uniquely integrates environmental variables from both GRACE and MODIS satellites to predict groundwater levels, which is not commonly done in existing studies.

Comprehensive Model Comparison: It provides a thorough comparison of multiple machine learning models (SVR, DT, RF, GBR) for groundwater level prediction, offering insights into their relative performances.

Focus on Rabat-Salé-Kénitra Region: The research addresses the specific challenges and conditions of the Rabat-Salé-Kénitra region in Morocco, contributing to localized water resource management strategies.

Environmental and Climatic Variable Analysis: The study includes an in-depth analysis of how various environmental and climatic variables influence groundwater levels, which can help in understanding the dynamics of groundwater systems in similar regions.

Key findings and relative importance in the literature

The key findings of this study underscore the effectiveness of the GBR model in predicting groundwater levels, with an R2 value of 0.99, highlighting its superior performance compared to other models evaluated. This aligns with and expands upon existing literature, demonstrating that traditional ensemble methods can still provide highly accurate predictions comparable to advanced deep learning models under certain conditions.

The comparative analysis with existing studies reveals that while deep learning models such as CNN and EDL have shown high accuracy in some contexts, the GBR model’s robust performance, particularly in handling non-linear relationships and data noise, makes it a valuable tool for groundwater prediction in various settings.

Limitations of the study

Despite the promising results, this study has several limitations:

Exclusion of Geological Factors: The study does not consider geological factors influencing groundwater levels in the region, which could significantly impact model accuracy.

Data Availability and Quality: The accuracy of predictions is highly dependent on the quality and availability of input data. Limited or low-quality data could affect the model’s performance.

Generalization to Other Regions: While the models perform well in the Rabat-Salé-Kénitra region, their generalizability to other regions with different hydrogeological conditions needs further investigation.

Directions for future research

Incorporating Geological Factors: Including geological information in the models to provide a more comprehensive understanding of groundwater behavior.

Advanced Machine Learning Techniques: Exploring advanced machine learning techniques such as deep learning and artificial neural networks to improve prediction accuracy.

Long-term Data Collection: Enhancing long-term data collection efforts to improve the quality and reliability of input data.

Impact of Land Use Changes: Investigating the effects of land use changes and ecosystem health on groundwater levels to build more robust models.

Implementation in similar environmental problems

The results and methods of this study can be implemented in similar environmental problems as follows:

Water Resource Management: Groundwater level predictions can inform the development of comprehensive water management strategies at the regional or watershed level, helping water authorities make informed decisions about water allocation, groundwater extraction limits, and infrastructure planning.

Sustainable Agriculture: Farmers and agricultural agencies can utilize groundwater level predictions to optimize irrigation scheduling and water usage, reducing over-extraction and improving water use efficiency.

Environmental Impact Assessments: The model outputs can be used to assess the potential impacts of climate change, land use changes, or other environmental factors on groundwater resources, supporting the development of adaptive management strategies and environmental impact assessments.

In summary, this study provides valuable insights into the application of machine learning models for groundwater level prediction. The findings underscore the importance of precise and comprehensive environmental data and highlight the potential of these models in addressing water resource management and environmental sustainability challenges.

The authors declare there is no conflict.

All relevant data are included in the paper or its Supplementary Information.

2012 hcp.ma. High Commission for Planning.
2022. Hydrographie de Rabat-Sale-Kenitra (accessed: 26 Apr 2024).
Aderemi B. A., Olwal T. O., Ndambuki J. M. & Rwanga S. S. 2023 Groundwater levels forecasting using machine learning models: A case study of the groundwater region 10 at karst belt, South Africa. Systems and Soft Computing 5 (200049), 200049.
Aghelpour P., Mohammadi B. & Biazar S. M. 2019 Long-term monthly average temperature forecasting in some climate types of Iran, using the models SARIMA, SVR, and SVR-FA. Theoretical and Applied Climatology 138 (3–4), 1471–1480.
Ahmadi S. M., Balahang S. & Abolfathi S. 2024 Predicting the hydraulic response of critical transport infrastructures during extreme flood events. Engineering Applications of Artificial Intelligence 133, 108573. https://www.sciencedirect.com/science/article/pii/S0952197624007310.
Al-Bayati A. H. & Alwan A. A. 2022 Artificial intelligence-based approaches for groundwater level prediction: A review and application. Journal of Water and Climate Change 2022, 1–13. https://doi.org/10.1155/2022/8451812.
Alley W. 2009 Ground water. In: Encyclopedia of Inland Waters (G. Likens, ed.). Academic Press, Oxford, pp. 684–690.
Anomohanran O., Oseme J. I., Iserhien-Emekeme R. E. & Ofomola M. O. 2021 Determination of groundwater potential and aquifer hydraulic characteristics in Agbor, Nigeria using geo-electric, geophysical well logging and pumping test techniques. Modeling Earth Systems and Environment 7 (3), 1639–1649.
Awan A., Majid A., Riaz R., Rizvi D. S. & Kwon S. 2024 A novel deep stacking-based ensemble approach for short-term traffic speed prediction. IEEE Access 12, 1–14.
Ben-Hur A. & Weston J. 2010 A user’s guide to support vector machines. Methods in Molecular Biology 609, 223–239.
Bouita Et Al M. 2021 Assessment of nitrogen pollution of groundwater in the Maamora gharb aquifer, Morocco. Egyptian Journal of Aquatic Biology and Fisheries 25 (3), 739–758.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Breiman L., Friedman J. H., Olshen R. A. & Stone C. J. 1984 Classification and Regression Trees. Wadsworth International Group, Belmont.
Chang C.-C. & Lin C.-J. 2011 LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 1–27.
Chen T. & Guestrin C. 2016 XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 13-17 August 2016. ACM, New York, pp. 785–794.
Chengcheng W., Zhang X., Wang W., Lu C., Zhang Y., Qin W., Tick G., Liu B. & Shu L. 2021 Groundwater level modeling framework by combining the wavelet transform with a long short-term memory data-driven model. Science of The Total Environment 783, 146948.
Cortes C. & Vapnik V. 1995 Support-vector networks. Machine Learning 20 (3), 273–297.
Davamani V., John J., Poornachandhra C., Gopalakrishnan B., Arulmani S., Parameswari E., Santhosh A., Srinivasulu A., Lal A. & Naidu R. 2024 A critical review of climate change impacts on groundwater resources: A focus on the current status, future possibilities, and role of simulation models. Atmosphere 15 (1), 122.
Davis J. F., Piovoso M. J., Hoo K. A. & Bakshi B. R. 1999 Process data analysis and interpretation. In: Advances in Chemical Engineering. Volume 25 Advances in Chemical Engineering. Elsevier, Amsterdam, pp. 1–103.
El Haouari N. & Khattabi A. 2012 Water resources management in Morocco. Environmental Earth Sciences 65, 2171–2186.
Faqihi F. Z., Benslimane A., Lahrach A., Chibout M. & Mokhtar M. E. 2020 Recognition of the hydrogeological potential using electrical sounding in the Khemisset-Tiflet region, Morocco. Journal of Groundwater Science and Engineering 8 (2), 172–179. https://doi.org/10.19637/j.cnki.2305-7068.2020.02.008.
Feng F., Ghorbani H. & Radwan A. E. 2024 Predicting groundwater level using traditional and deep machine learning algorithms. Frontiers in Environmental Science 12, 1–16. https://doi.org/10.3389/fenvs.2024.1291327.
Gunn S. R. 1998 Support vector machines for classification and regression. Technical report, University of Southampton.
Guzman S. M., Paz J. O., Tagert M. L. M. & Mercer A. E. 2019 Evaluation of seasonally classified inputs for the prediction of daily groundwater levels: NARX networks vs support vector machines. Environmental Modeling and Assessment 24 (2), 223–234.
Habib M. A., Abolfathi S., O’Sullivan J. J. & Salauddin M. 2024 Efficient data-driven machine learning models for scour depth predictions at sloping sea defences. Frontiers in Built Environment 10, 1–16. https://doi.org/10.3389/fbuil.2024.1343398.
Hakimi F. & Brech M. 2021 Opportunities and challenges of peri-urban agriculture on the fringes of the Metropolis of Rabat, Morocco. International Journal of Food Science and Agriculture 5 (2), 269–274.
Hastie T., Tibshirani R. & Friedman J. 2009 The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, New York.
Ibrahem Ahmed Osman A., Najah Ahmed A., Chow M. F., Feng Huang Y. & El-Shafie A. 2021 Extreme gradient boosting (xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Engineering Journal 12 (2), 1545–1556. https://www.sciencedirect.com/science/article/pii/S2090447921000125.
Kawashima T. & Kumano M. 2017 Robustness of support vector regression in high-dimensional spaces. Applied Intelligence 46 (3), 600–615.
Kundzewicz Z. W., Mata L. J., Arnell N. W., Doll P., Kabat P., Jimenez B., Miller K., Oki T., Zekai S. & Shiklomanov I. 2007 Freshwater resources and their management.
Lan H. & Xiao X. 2023 Comparative study of machine learning algorithms for groundwater level prediction. Journal of Hydrology 615, 128829.
Li W., Finsa M., Laskey K., Houser P. & Douglas-Bate R. 2023 Groundwater level prediction with machine learning to support sustainable irrigation in water scarcity regions. Water 15, 1–21.
Li S., Abdelkareem M. & Al-Arifi N. 2023 Mapping groundwater prospective areas using remote sensing and GIS-based data driven frequency ratio techniques and detecting land cover changes in the Yellow River basin, China. Land (Basel) 12 (4), 771.
Loucks
D. P.
2021
Chapter 2 - impacts of climate change on economies, ecosystems, energy, environments, and human equity: a systems perspective. In: The Impacts of Climate Change, (T. M. Letcher, ed.). Elsevier, pp. 19–50. https://www.sciencedirect.com/science/article/pii/B9780128223734000161
.
Mahdaoui
K.
,
Chafiq
T.
,
Asmlal
L.
&
Tahiri
M.
2024
Assessing hydrological response to future climate change in the Bouregreg Watershed, Morocco
.
Scientific African
23
,
e02046
.
https://www.sciencedirect.com/science/article/pii/S2468227623005008
.
Mahdian
M.
,
Noori
R.
,
Salamattalab
M. M.
,
Heggy
E.
,
Bateni
S.
,
Nohegar
A.
,
Hosseinzadeh
M.
,
Siadatmousavi
S. M.
,
Fadaei
M.
&
Abolfathi
S.
2024
Anzali wetland crisis: Unraveling the decline of iran’s ecological gem
.
Journal of Geophysical Research: Atmospheres
129
(
4
),
e2023JD039538
.
Mohammadi B. 2019 Predicting total phosphorus levels as indicators for shallow lake management. Ecological Indicators 107, 105664.
Naghibi S. A., Pourghasemi H. R., Pourtaghi Z. S. & Rezaei A. 2015 Groundwater qanat potential mapping using frequency ratio and Shannon’s entropy models in the Moghan watershed, Iran. Earth Science Informatics 8 (1), 171–186.
Noori R., Maghrebi M., Jessen S., Bateni S. M., Heggy E., Javadi S., Noury M., Pistre S., Abolfathi S. & AghaKouchak A. 2023 Decline in Iran’s groundwater recharge. Nature Communications 14 (1), 6674. Published October 21, 2023. https://doi.org/10.1038/s41467-023-42411-2.
Pettorelli N., Vik J. A., Mysterud A., Gaillard J.-M., Tucker C. J. & Stenseth N. C. 2005 Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends in Ecology & Evolution 20 (9), 503–510.
Probst P., Wright M. N. & Boulesteix A. L. 2019 Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (3), e1301.
Quinlan J. R. 1986 Induction of decision trees. In: Machine Learning Proceedings. Elsevier, Amsterdam, pp. 81–106.
Rateb A., Scanlon B. R., Pool D. R., Sun A., Zhang Z., Chen J., Clark B., Faunt C. C., Haugh C. J., Hill M., Hobza C., McGuire V. L., Reitz M., Müller Schmied H., Sutanudjaja E. H., Swenson S., Wiese D., Xia Y. & Zell W. 2020 Comparison of groundwater storage changes from GRACE satellites with monitoring and modeling of major U.S. aquifers. Water Resources Research 56 (12), e2020WR027556.
Rimi A., Dahmani M. & Bouchaou L. 2006 Hydrogeology of the Rharb basin, Morocco. Hydrogeology Journal 14 (1), 79–92.
Rodell M., Velicogna I. & Famiglietti J. 2009 Satellite-based estimates of groundwater depletion in India. Nature 460 (7258), 999–1002.
Safavian S. & Landgrebe D. 1991 A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21 (3), 660–674.
Sarkar S. K., Rudra R. R., Talukdar S., Das P. C., Nur M. S., Alam E., Islam M. K. & Islam A. R. M. T. 2024 Future groundwater potential mapping using machine learning algorithms and climate change scenarios in Bangladesh. Scientific Reports 14, 10328. https://doi.org/10.1038/s41598-024-60560-2.
Smola A. J. & Schölkopf B. 2004 A tutorial on support vector regression. Statistics and Computing 14 (3), 199–222.
Swain S., Taloor A. K., Dhal L., Sahoo S. & Al-Ansari N. 2022 Impact of climate change on groundwater hydrology: A comprehensive review and current status of the Indian hydrogeology. Applied Water Science 12 (6), 120.
Tao H., Hameed M. M., Marhoon H. A., Zounemat-Kermani M., Heddam S., Kim S., Sulaiman S. O., Tan M. L., Sa’adi Z., Mehr A. D., Allawi M. F., Abba S. I., Zain J. M., Falah M. W., Jamei M., Bokde N. D., Bayatvarkeshi M., Al-Mukhtar M., Bhagat S. K., Tiyasha T., Khedher K. M., Al-Ansari N., Shahid S. & Yaseen Z. M. 2022 Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 489, 271–308.
Taylor R. G., Scanlon B., Döll P., Rodell M., Van Beek R., Wada Y., Longuevergne L., Leblanc M., Famiglietti J. S., Edmunds M. & Konikow L. 2013 Ground water and climate change. Nature Climate Change 3 (4), 322–329.
Wada Y., Van Beek L., Van Kempen C., Reckman J., Vasak S. & Bierkens M. 2010 Global depletion of groundwater resources. Geophysical Research Letters 37 (20), 1–5.
Yeganeh-Bakhtiary A., EyvazOghli H., Shabakhty N. & Abolfathi S. 2023 Machine learning prediction of wave characteristics: Comparison between semi-empirical approaches and DT model. Ocean Engineering 286, 115583. https://www.sciencedirect.com/science/article/pii/S0029801823019674.
Yi S., Kondolf G. M., Solís S. S. & Dale L. 2024 Groundwater level forecasting using machine learning: A case study of the Baekje Weir in Four Major Rivers Project, South Korea. Water Resources Research 60, e2022WR032779. https://api.semanticscholar.org/CorpusID:269620818.
Zhang Q., Li P., Ren X., Ning J., Li J., Liu C., Wang Y. & Wang G. 2023 A new real-time groundwater level forecasting strategy: Coupling hybrid data-driven models with remote sensing data. Journal of Hydrology 625, 129962. https://www.sciencedirect.com/science/article/pii/S0022169423009046.
Zhou Z.-H. 2014 Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC, Boca Raton.
Zouhri L. 2001 L’aquifère du bassin de la Mamora, Maroc: Géométrie et écoulements souterrains. Journal of African Earth Sciences 32 (4), 837–850. https://www.sciencedirect.com/science/article/pii/S0899536202000581.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).
