The Rabat–Salé–Kénitra region of Morocco faces critical groundwater challenges due to increasing demands from population growth, agricultural expansion, and the impacts of prolonged droughts and climate change. This study employs advanced machine learning models, including artificial neural networks (ANN), gradient boosting (GB), support vector regression (SVR), decision tree (DT), and random forest (RF), to predict groundwater storage variations. The dataset encompasses hydrological, meteorological, and geological factors. Among the models evaluated, RF demonstrated superior performance, achieving a mean squared error (MSE) of 484.800, a root mean squared error (RMSE) of 22.018, a mean absolute error (MAE) of 14.986, and a coefficient of determination (R2) of 0.981. Sensitivity analysis revealed significant insights into how different models respond to variations in key environmental factors such as evapotranspiration and precipitation. Prophet was also integrated for its ability to handle seasonality in time-series data, further enhancing prediction reliability. The findings emphasize the urgent need to integrate advanced predictive models into groundwater management to address groundwater depletion and ensure sustainable water resources amid rising drought conditions. Policymakers can use these models to regulate extraction, promote water-saving technologies, and enhance recharge efforts, ensuring the sustainability of vital groundwater resources for future generations.

  • Environmental impact: the study addresses groundwater resources in Morocco, especially in the Rabat–Salé–Kénitra region.

  • Data-driven insights: the research integrates comprehensive environmental data.

  • Innovative use of prophet model: the inclusion of the prophet forecasting model enhances the analysis.

  • Policy implications: the findings have significant implications for policymakers, offering data-driven recommendations to mitigate groundwater decline.

ANN: Artificial neural networks
GB: Gradient boosting
DT: Decision tree
GIS: Geographic information system
GWS: Groundwater storage
GWL: Groundwater level
GRACE: Gravity Recovery and Climate Experiment
Location 1: Kenitra
Location 2: Khemisset
Location 3: Tiflet
Location 4: Ouezzane
Location 5: Mechraa Belaksiri
Location 6: Souk El Arbaa
Location 7: Moulay Bouselham
Location 8: Al Kansera
ML: Machine learning
MAE: Mean absolute error
MSE: Mean squared error
MODIS: Moderate Resolution Imaging Spectroradiometer
NASA: National Aeronautics and Space Administration
NDVI: Normalized difference vegetation index
RMSE: Root mean squared error
RS: Remote sensing
RF: Random forest
SVR: Support vector regression
SVM: Support vector machine
TWS: Terrestrial water storage

Morocco faces significant water scarcity, with an annual renewable water resource of 29 billion cubic meters (BCM), including 4 BCM from groundwater. Groundwater supplies as much as 60–70% of the nation’s potable water, though some deep aquifers are either nonrenewable or have minimal recharge capacity (Faysse et al. 2010; Hssaisoune et al. 2020). Rapid development and agricultural growth have led to aquifer overuse, with extraction rates surpassing natural recharge in most major basins (Ait Brahim et al. 2017; Fakir et al. 2021; Echogdali et al. 2023). This depletion is exacerbated by the region’s projected increase in water stress due to climate change (Aboutalebi et al. 2022). Ensuring sustainability remains a significant challenge.

The Rabat–Salé–Kénitra region faces distinct groundwater challenges that exacerbate the general water scarcity issues prevalent in Morocco. One of the primary concerns is the over-extraction of groundwater, where demand significantly exceeds natural recharge rates, leading to a steady decline in aquifer levels. This decline is driven by rapid population growth, which increases the demand for potable water, alongside heavy reliance on groundwater for agricultural irrigation. These agricultural practices are often inefficient, resulting in high consumption rates and further straining the region’s water resources (Cheikhaoui et al. 2024).

Existing research on groundwater modeling in Morocco and similar regions predominantly relies on traditional statistical techniques, such as linear regression and time-series analysis, which often inadequately capture the complex, nonlinear interactions inherent in groundwater dynamics. For instance, studies have illustrated the limitations of these conventional methods in accurately predicting groundwater behavior, particularly under changing environmental conditions, and have underscored the reliance on limited datasets that hinder comprehensive analyses (Ait Brahim et al. 2017; Fakir et al. 2021).

Several studies have examined groundwater modeling using various machine learning techniques across different regions. Osman et al. (2021) in Malaysia utilized a 9-month daily dataset, applying XGBoost, artificial neural network (ANN), and SVR, achieving an R2 of 0.920. Malakar et al. (2021) analyzed 23 years of groundwater level data in India, collected quarterly, employing FNN, RNN, and LSTM, also with an R2 of 0.920. Afan et al. (2021) similarly focused on a daily dataset from Malaysia from 2017 to mid-2018 using DL and EDL, reporting an R2 of 0.790. Dehghani (2022) utilized a comprehensive monthly dataset from Iran spanning 11 years, leveraging ANN and other methods to achieve a notable R2 of 0.995. Conversely, Khan et al. (2023) conducted a broad review of 109 articles with 15 years of data without specifying data processing methods, limiting the applicability of their findings. Mohammed et al. (2023) worked with a dataset from Sonqor over 306 months, employing GA-ANN and ELM, attaining an R2 of 0.916. While these studies provide valuable insights, common gaps include a lack of comprehensive datasets that encompass diverse climatic conditions and groundwater contexts, as well as insufficient details regarding data processing and validation methods, which may hinder the robustness and generalizability of the models employed. The novelty of this study is rooted in its integration of advanced machine learning techniques, specifically utilizing ANN, gradient boosting (GB), random forest (RF), and support vector machines (SVM), collectively enhancing the robustness of predictive models for groundwater assessment. 
This research employs comprehensive datasets derived from both satellite observations and ground measurements, incorporating critical variables such as surface temperature, groundwater storage, and soil moisture, facilitating a nuanced understanding of groundwater dynamics and addressing the limitations of previous studies that often relied on restricted datasets. Furthermore, the emphasis on sensitivity analysis fills a notable gap in the literature by examining how variations in environmental factors impact groundwater discharge points. By concentrating specifically on the unique groundwater challenges faced by the Rabat–Salé–Kénitra region, this research provides invaluable insights that are directly applicable to local water management strategies, thereby making a substantial contribution to the field of groundwater management.

This study highlights the potential of integrating machine learning techniques with remote sensing data to enhance groundwater resource management. It provides a foundation for future research aimed at improving the precision of these models, understanding the long-term impacts of climate change on groundwater resources, and integrating socio-economic factors into groundwater prediction models.

The key findings of this study demonstrate that the application of advanced machine learning techniques, specifically ANN, gradient boosting (GB), RF, and SVMs, substantially enhances the predictive accuracy of groundwater models in the Rabat–Salé–Kénitra region. This research utilizes comprehensive datasets that incorporate essential variables such as surface temperature, groundwater storage, and soil moisture, allowing for a deeper understanding of groundwater dynamics. Additionally, the focus on sensitivity analysis provides critical insights into how variations in environmental factors impact groundwater discharge points, effectively addressing notable gaps in the existing literature and contributing significantly to informed local water management strategies.

This study utilizes a systematic approach to predict groundwater discharge using advanced machine learning models (Figure 1). The process begins with the collection of a comprehensive dataset that includes various environmental factors. These factors are then classified into three main categories: hydrology, meteorology, and geology. The hydrological data includes groundwater storage, soil moisture, and evapotranspiration. Meteorological data comprises average surface temperature, land surface temperature, and precipitation. Geological data involves elevation and soil type. This classification ensures a detailed understanding of the factors affecting groundwater dynamics.
Figure 1

Methodology workflow for predicting groundwater discharge points.


Following data categorization, the data undergoes preprocessing, which includes cleaning, normalization, and transformation. The processed data is then used to train four machine learning models: ANN, GB, SVMs, and RF. These models are selected for their capacity to handle complex datasets and make accurate predictions. Their performance is evaluated using specific metrics to determine accuracy and reliability. This workflow leverages various machine learning techniques to effectively predict groundwater discharge and enhance groundwater resource management.

Study area

The Rabat–Salé–Kénitra region is one of the twelve regions of Morocco, established under the new regional division in 2015, encompassing the nation’s capital. This region resulted from the merger of the former regions of Gharb-Chrarda-Beni Hssen and Rabat–Salé–Zemmour-Zaer. It is bordered to the north by Tanger-Tétouan-Al Hoceima, to the east by Fès-Meknès, to the south by Casablanca-Settat and Beni Mellal-Khénifra, and to the west by the Atlantic Ocean. The region spans a total area of 17,570 km2, which is 2.5% of Morocco’s total area (Plan 2024). The climate is Mediterranean, characterized by cold and wet winters with average night temperatures ranging from 0 to 5°C, and daytime temperatures reaching up to 17°C. Summers are oceanic, with nights cooled by ocean humidity and daytime temperatures around 30°C. Occasionally in spring and summer, the ‘Chergui’ wind from the desert can raise temperatures to 40°C (Hakimi & Brech 2021).

Figure 2 illustrates a detailed map of the Rabat–Salé–Kénitra region in Morocco, which serves as the study area. Key observations from the map include:
Figure 2

Map of Morocco highlighting the Rabat–Salé–Kénitra region.


Geographical context: the map situates the Rabat–Salé–Kénitra region within Morocco, emphasizing its location along the Atlantic coast and its proximity to the Mediterranean Sea.

Legend: the legend offers a clear explanation of the various administrative divisions and boundaries depicted on the map, facilitating easy interpretation of the spatial data.

Aquifers in the Rabat–Salé–Kénitra region

The groundwater resources of the region are supported by several key aquifers, each exhibiting distinct geological and hydrological characteristics:

Maamora aquifer: this extensive aquifer, primarily composed of Quaternary sands and limestones, functions as a vital water reservoir with significant storage capacity. It meets both agricultural and urban water demands, underscoring its critical role in the region’s water supply (Jelbi et al. 2024).

Bouregreg coastal aquifer: situated along the Atlantic coast, this aquifer is notably influenced by marine conditions. It provides essential water resources to coastal cities such as Rabat and Salé, thereby playing a crucial role in sustaining urban populations and supporting industrial activities (Hssaisoune et al. 2020).

Gharb aquifer: located in the Gharb plain, this aquifer comprises highly permeable alluvial deposits that are well suited for irrigation. It is extensively utilized for agricultural purposes, making a substantial contribution to the region’s agricultural productivity (Hilal et al. 2024).

Tiddas aquifer: found inland from the coastal areas, this aquifer is characterized by sandstone and clay formations. It is essential for providing water to rural communities, thereby enhancing the overall groundwater resources available in the region (Ouharba et al. 2022).

Key factors contributing to groundwater depletion

Key factors contributing to groundwater depletion in the Rabat–Salé–Kénitra region include climate variability, land use changes, and population growth. Climate variability has led to prolonged droughts and altered precipitation patterns, significantly reducing natural recharge rates and causing pronounced declines in aquifer levels, particularly in areas experiencing less rainfall (Hssaisoune et al. 2020). Additionally, land use changes driven by urban expansion and intensive agricultural practices have increased water demand; regions such as the Gharb aquifer, which is heavily utilized for irrigation, are particularly affected (Asadollahi et al. 2024). Rapid population growth in urban centers like Rabat and Salé exacerbates this issue, as the increasing demand for potable water results in higher extraction rates (Bounoua et al. 2024). These factors exhibit spatial variation within the study area, with different aquifers experiencing varying degrees of depletion based on their specific hydrological characteristics and surrounding land use practices (Sajjad et al. 2023).

Selection of the study area

The region is significant due to its substantial groundwater reserves, held in four primary aquifers. The Gharb aquifer, covering 390 km2, with 126 Mm3/year of renewable resources, is generally in balance. It is regionally significant due to substantial recharge from precipitation and infiltration from the margins of the Gharb basin (noa 2022). The Maâmora aquifer, spanning approximately 4,000 km2, is an unconfined aquifer recharged solely by the infiltration of rainfall. It represents a significant water reservoir with 134 Mm3/year of renewable resources (noa 2022). The Tiddas aquifer, at 250 km2, provides around 30 Mm3/year, primarily through rainfall (Rodell et al. 2009). The Bouregreg aquifer, covering 800 km2, offers approximately 60 Mm3/year of renewable resources through both rainfall and river infiltration (Hakimi & Brech 2021). These aquifers are vital for the region’s water supply, supporting agricultural, industrial, and domestic needs.

Nature and sources of data

The daily dataset for this study was gathered from nine distinct locations within the Rabat–Salé–Kénitra region. Satellite data from the Moderate Resolution Imaging Spectroradiometer (MODIS) was employed to assess various environmental factors in the area. This data was sourced from the NASA Earth Observing System Data and Information System (EOSDIS) portal, spanning from January 2010 to December 2022.

Due to the unavailability of specific agricultural data for Morocco, this study relies on generalized data and studies that illustrate the broad impact of agricultural practices on groundwater resources. These include the effects of water-intensive crops, irrigation practices, and the expansion of agricultural lands, which are known to influence groundwater depletion rates.

The environmental factors analyzed include average surface temperature (AvgSurfT-tavg), groundwater storage (GWS-tavg), soil moisture (SoilMoist-S-tavg), terrestrial water storage (TWS-tavg), elevation, soil type, land surface temperature (LST), evapotranspiration (evap-Tr), normalized difference vegetation index (NDVI), and precipitation. Additionally, GRACE data were utilized in the analysis.

Spatial and temporal coverage

The dataset used in this study covers nine distinct locations within the Rabat–Salé–Kénitra region, strategically selected to represent a variety of environmental conditions and land uses. These sites include urban, agricultural, and coastal areas, providing a comprehensive view of groundwater dynamics. The temporal coverage spans from January 2010 to December 2022, capturing significant climatic variations and socio-economic changes that influence groundwater levels. This time period is essential for analyzing long-term trends in groundwater depletion. While the dataset offers valuable insights, certain areas may lack full representation, particularly those with limited data availability. Nonetheless, the selected locations and extensive temporal range enhance the dataset’s overall representativeness for the Rabat–Salé–Kénitra region. This facilitates informed conclusions regarding groundwater management strategies. Overall, the dataset is well suited for assessing the challenges and dynamics of groundwater resources in the region.

Groundwater storage

Groundwater storage represents the total volume of water stored underground in aquifers, averaged over a specific period. Data were sourced from GRACE-DA and are available from 24 February 2000 to 17 February 2023.

Declines in groundwater storage reflect reductions in available groundwater, often due to excessive extraction for agricultural, industrial, and domestic purposes. Sustained over-extraction can lead to significant drops in groundwater levels, affecting water availability and quality (Rodell et al. 2009).

Precipitation

Precipitation is any form of water, liquid or solid, that falls from the atmosphere and reaches the ground. Data were sourced from CHIRPS and are available from 24 February 2000 to 5 May 2023.

Precipitation is the primary source of groundwater recharge. Reduced precipitation can lead to lower groundwater levels, especially in regions that rely heavily on rainfall to replenish aquifers (Taylor et al. 2013).

Average surface temperature (AvgSurfT-tavg)

Average surface temperature refers to the mean temperature recorded at the earth’s surface over a specified period. Data were sourced from GRACE-DA and are available from 24 February 2000 to 17 February 2023.

Increased surface temperatures can lead to higher rates of evaporation, reducing the amount of water available to infiltrate into the ground and recharge aquifers. This reduction in infiltration can significantly exacerbate groundwater depletion, especially in arid and semiarid regions (Kundzewicz et al. 2007).

Soil moisture (SoilMoist-S-tavg)

Soil moisture refers to the amount of water present in the soil, averaged over a specific period. Data were sourced from GRACE-DA and are available from 24 February 2000 to 5 May 2023.

Low soil moisture can reduce the amount of water that percolates down to recharge aquifers. During periods of drought, reduced soil moisture can lead to increased reliance on groundwater for irrigation, further depleting groundwater resources (Döll 2009).

Land surface temperature (LST)

Land surface temperature is the temperature of the land surface as measured by remote sensing instruments. Data were sourced from MODIS and are available from 24 February 2000 to 5 May 2023.

High land surface temperatures can increase evapotranspiration rates, reducing soil moisture and groundwater recharge. This can lead to a more rapid depletion of groundwater reserves, especially in hot climates (Wang & Dickinson 2012).

Evapotranspiration (evap-Tr)

Evapotranspiration is the sum of evaporation from the land surface and transpiration from plants. Data were sourced from GRACE-DA and are available from 24 February 2000 to 5 May 2023.

High evapotranspiration rates can significantly reduce the amount of water available for groundwater recharge. In agricultural areas, high evapotranspiration can lead to increased irrigation demands, further stressing groundwater resources (Wada et al. 2010).

Data processing

One essential element in achieving accurate predictions in machine learning is the proper preparation of input data. The preprocessing or normalization of variables, which involves assigning appropriate weights to features, can significantly enhance the quality of the resulting insights (Guzman et al. 2019; Mohammadi 2019). In our research, we analyzed 42,000 readings from various sites, all meticulously verified and cleaned before further analysis.

Missing values were addressed by using the ‘dropna()’ method to remove any rows with incomplete data, ensuring that our analysis was based solely on complete and reliable observations. Additionally, we implemented outlier detection methods to identify and mitigate the influence of extreme values on our results. Although the availability of this data varied, it consistently included current measurements of water levels and other parameters recorded daily without fail.
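The cleaning step described above can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: the text names only `dropna()`, so the interquartile-range (IQR) rule used here for outliers, the helper name `clean_readings`, and the toy column names are all assumptions.

```python
import numpy as np
import pandas as pd


def clean_readings(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop incomplete rows, then drop rows holding an IQR outlier in any numeric column."""
    complete = df.dropna()  # keep only rows with no missing values
    numeric = complete.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Assumed rule: a value is an outlier if it lies more than k * IQR outside [q1, q3].
    inside = ((numeric >= q1 - k * iqr) & (numeric <= q3 + k * iqr)).all(axis=1)
    return complete[inside]


# Hypothetical toy readings; the column names are illustrative only.
raw = pd.DataFrame({
    "precipitation": [1.0, 2.0, np.nan, 1.5, 2.5, 1.8, 2.2, 500.0],
    "gws": [410.0, 405.0, 400.0, 398.0, 402.0, 407.0, 403.0, 404.0],
})
cleaned = clean_readings(raw)
```

In this toy frame, one row is dropped for a missing value and one for an implausible precipitation reading. Other outlier treatments (capping, z-scores) would fit the description equally well.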

We managed and analyzed this data using Python 3.10.11, employing libraries such as NumPy, SciPy, pandas, and Matplotlib. For implementing methods such as support vector regression (SVR), gradient boosting machines (GBM), and RFs, we utilized scikit-learn, which efficiently handled tasks such as cross-validation, a process that can be computationally demanding due to the repeated splitting of the dataset.

To effectively train and test our model and avoid overfitting, we split our dataset into two parts: 70% for training and 30% for testing and validation (James et al. 2013). The primary goal is ‘generalization’: a well-trained model should perform equally well on both the training and testing datasets. The model’s performance is evaluated by training it on the training set (70% of the data) and then assessing its accuracy using the remaining 30%. To fine-tune the models and prevent overfitting to noise in the data, a validation set is derived from the 30% testing data.
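A 70/30 split of this kind might look like the sketch below, using scikit-learn's `train_test_split`. The synthetic arrays stand in for the real features and target, and halving the held-out 30% into validation and test subsets is an assumption, since the text does not state the proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # hypothetical feature matrix
y = rng.normal(size=1000)       # hypothetical groundwater storage target

# 70% for training, 30% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Assumption: the held-out part is halved into validation and test subsets.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=42
)
```

Because the readings are daily time series, a chronological split (training on earlier dates, testing on later ones) may be a safer choice than a shuffled one when the goal is forecasting.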

Groundwater conditioning factor analysis

Groundwater modeling and prediction rely heavily on analyzing the factors that influence groundwater dynamics. Before constructing the groundwater model, it is crucial to assess the selected factors for potential multicollinearity issues. This section thoroughly examines the multicollinearity among the conditioning factors and describes the steps taken to resolve any identified issues. Multicollinearity occurs when the independent conditioning factors are highly correlated or interdependent.

Several methods have been suggested for detecting multicollinearity. The variance inflation factor (VIF) and tolerance (TOL) are particularly prominent in environmental modeling (O’Brien 2007; Bui et al. 2011) and were thus selected for this study. A VIF greater than 10 and a TOL less than 0.1 indicate the presence of multicollinearity problems (O’Brien 2007; Bui et al. 2011). In our analysis, we identified variables with a VIF exceeding 10, which were subsequently removed to enhance the model’s robustness. Additionally, the mutual information (MI) technique was employed to assess the relative importance of predictor variables and to identify those with minimal impact. Eliminating these variables is crucial for enhancing model accuracy (Khosravi et al. 2018).

This research analyzed six groundwater conditioning factors, including:

  • Average surface temperature (AvgSurfT_tavg): influences evapotranspiration rates and indirectly affects groundwater recharge.

  • Groundwater storage (GWS_tavg): indicates the volume of water stored underground, crucial for understanding groundwater availability.

  • Soil moisture (SoilMoist_S_tavg): reflects the amount of water present in the soil, affecting groundwater recharge rates.

  • Land surface temperature (LST): affects soil moisture and evaporation rates.

  • Precipitation: directly contributes to groundwater recharge through infiltration.

In total, six factors that influence the groundwater level were analyzed, including precipitation and elevation, among others, as detailed above.

Feature selection

Effective feature selection is crucial for building robust groundwater models by ensuring that only the most relevant and independent conditioning factors are included. This study employed several methods to detect and address multicollinearity and to identify the most informative features.

Analysis of multicollinearity among indicators

In the process of developing the model, it is crucial to conduct a multicollinearity analysis on the conditioning factors to identify and address high correlations that could negatively affect the model’s performance. The variance inflation factor (VIF) and tolerance are widely used methods for assessing multicollinearity (Ahmad 2021). A tolerance value less than 0.1 or a VIF value greater than 10 indicates significant multicollinearity, necessitating the removal of such variables from the analysis (Nhu 2020). The formulas for calculating tolerance and VIF are as follows:
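$$\mathrm{TOL}_j = 1 - R_j^2 \tag{1}$$
$$\mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2} \tag{2}$$
where $R_j^2$ represents the coefficient of determination for the regression of the predisposing factor j on all other predisposing factors (Yang et al. 2021).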

Table 1 shows the variance inflation factor (VIF) and tolerance (TOL) values for each feature to assess multicollinearity.

Table 1

Multicollinearity analysis results: showing the variance inflation factor (VIF) and tolerance (TOL) for each feature

Feature                        VIF        TOL
Average surface temperature    6.596058   0.151606
Soil moisture                  5.184165   0.192895
Land surface temperature       6.616301   0.151142
Evapotranspiration             2.064074   0.484479
Precipitation                  1.025388   0.975240
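
The VIF and TOL columns of Table 1 can be computed along the following lines. This is a NumPy-only sketch on synthetic data, not the study's code: each feature is regressed on the remaining features by ordinary least squares, and the function name `vif_and_tolerance` and the toy columns are illustrative assumptions.

```python
import numpy as np


def vif_and_tolerance(X: np.ndarray):
    """Return (vif, tol) arrays with one entry per column of X."""
    n, p = X.shape
    vif = np.empty(p)
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other features
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()  # R^2 of feature j on the others
        tol[j] = 1.0 - r2
        vif[j] = 1.0 / tol[j]
    return vif, tol


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)  # nearly collinear with a
vif, tol = vif_and_tolerance(np.column_stack([a, b, c]))
```

Here the near-duplicate columns `a` and `c` exceed the VIF threshold of 10 while `b` stays close to 1. Since TOL is the reciprocal of VIF, the rows of Table 1 can be checked the same way, for example 1/6.596058 ≈ 0.151606.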

Mutual information (MI): evaluation of conditioning factors

Mutual Information (MI) is a measure of the mutual dependence between two variables. It quantifies the amount of information obtained about one random variable through observing another random variable. In the context of groundwater modeling, MI is used to evaluate the importance of conditioning factors by measuring the dependency between each factor and the target variable, groundwater storage.

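Let X and Y be two random variables. The Mutual Information is defined as:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
where:
  • $p(x,y)$ is the joint probability distribution function of X and Y.

  • $p(x)$ and $p(y)$ are the marginal probability distribution functions of X and Y, respectively.

The MI can also be expressed in terms of entropy:
$$I(X;Y) = H(X) - H(X\mid Y) = H(X) + H(Y) - H(X,Y)$$
where:
  • $H(X)$ is the entropy of X (and likewise $H(Y)$ for Y).

  • $H(X\mid Y)$ is the conditional entropy of X given Y.

  • $H(X,Y)$ is the joint entropy of X and Y.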

In this study, MI is used to evaluate the significance of each conditioning factor for predicting groundwater storage. Factors with higher MI values have greater predictive power, as they share more information with the target variable (Shannon 1948; Cover & Thomas 2006; Hajirahimi et al. 2019). Table 2 shows the calculated MI values for the conditioning factors.
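
To make the MI ranking concrete, the sketch below estimates MI from a two-dimensional histogram of a factor and the target. The estimator, the bin count, and the synthetic variables are assumptions for illustration; the study does not state which estimator it used, and scikit-learn's `mutual_info_regression` is a common alternative.

```python
import numpy as np


def mutual_information(x, y, bins: int = 16) -> float:
    """Plug-in MI estimate (in nats) from a 2-D histogram of x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = pxy > 0                       # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


rng = np.random.default_rng(0)
factor = rng.normal(size=5000)
dependent_target = factor ** 2 + 0.1 * rng.normal(size=5000)  # nonlinear dependence
unrelated_target = rng.normal(size=5000)                      # independent of factor

mi_dependent = mutual_information(factor, dependent_target)
mi_unrelated = mutual_information(factor, unrelated_target)
```

The quadratic relationship has near-zero linear correlation yet a clearly higher MI than the unrelated pair, which is why MI is a useful complement to correlation-based screening.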

Application of Prophet in groundwater forecasting

Prophet is particularly advantageous for modeling groundwater levels, as these levels are often influenced by seasonal variations resulting from climate fluctuations and human agricultural activities. One of the key strengths of Prophet is its ability to decompose time-series data into distinct components: the overall trend, seasonal patterns, and any holiday effects. This decomposition is critical for accurate forecasting, especially in regions like Rabat–Salé–Kénitra, where groundwater levels are subject to annual cycles influenced by both natural phenomena (such as seasonal rainfall) and human agricultural activities. The seasonal component of Prophet captures recurring patterns in the data, allowing the model to account for predictable fluctuations in groundwater levels that occur at specific intervals throughout the year. This is particularly relevant in agricultural regions where groundwater extraction tends to increase during planting and harvesting seasons. By modeling these seasonal effects, Prophet can provide forecasts that not only predict future groundwater levels but also identify potential periods of stress on water resources.

In our analysis, we integrated Prophet alongside other machine learning models to enhance our forecasting framework. This combination allows us to leverage the strengths of Prophet’s time series decomposition with the predictive capabilities of machine learning algorithms. The result is a more robust and reliable forecasting system that can adapt to both short-term variability and long-term trends in groundwater data. By utilizing Prophet, we are better equipped to inform water resource management strategies, ensuring that decision-makers have access to timely and accurate forecasts that can aid in the sustainable management of groundwater resources in the region (Taylor & Letham 2018; Seeger et al. 2017).
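
For reference, the decomposition that Prophet fits can be written compactly. The form below follows the model description in Taylor & Letham (2018); the symbols are theirs and are not defined elsewhere in this article, and the period P = 365.25 days for yearly seasonality is the conventional choice rather than a setting reported in this study.

```latex
% Additive Prophet model: trend + seasonality + holiday effects + noise
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

% Seasonal component as a truncated Fourier series with period P
% (e.g. P = 365.25 days for a yearly cycle in daily data)
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```

Here g(t) is the trend, s(t) the seasonal pattern, h(t) the holiday effect, and ε_t the residual. For groundwater series the holiday term would typically be left out or set to zero.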

In this study, we utilized several machine learning algorithms, specifically ANN, GB, SVR, decision trees (DTs), and RF. Each of these models was chosen based on their distinct strengths in handling complex data relationships and their proven effectiveness in similar applications.

Artificial Neural Networks (ANN): it was selected for its ability to model nonlinear relationships and interactions among multiple input variables, making it suitable for capturing the complex dynamics of groundwater levels.

Gradient Boosting (GB): it was chosen for its robustness in handling overfitting while maintaining high predictive power.

Support Vector Regression (SVR): SVR was included due to its effectiveness in high-dimensional spaces and its capacity to manage nonlinear relationships using kernel functions. This makes it particularly advantageous for groundwater prediction tasks.

Decision Trees (DTs): DTs provide interpretability and straightforward modeling of data, making it easier to visualize the decision-making process. Their simplicity and effectiveness in capturing nonlinear interactions justified their inclusion.

Random Forest: RF, an ensemble method based on DTs, was selected for its ability to enhance predictive accuracy and reduce overfitting by aggregating results from multiple trees. It is particularly effective in dealing with high-dimensional data.

To determine the optimal hyperparameters for each model, we employed a grid search approach (Table 3). This method systematically explores a range of hyperparameter values to identify the combination that yields the best model performance based on evaluation metrics such as R-squared (R2), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). This rigorous process ensures that the selected hyperparameters contribute to the overall robustness and accuracy of the models.
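
A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The parameter grid, the synthetic data, and the choice of RF as the tuned model are placeholders for illustration; the values actually searched are those reported in Table 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 4))  # hypothetical conditioning factors
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Hypothetical grid; not the values used in the study.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out data.
y_pred = search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
metrics = {
    "MSE": mse,
    "RMSE": float(np.sqrt(mse)),
    "MAE": mean_absolute_error(y_test, y_pred),
    "R2": r2_score(y_test, y_pred),
}
```

After fitting, `search.best_params_` holds the selected combination. Note that RMSE is simply the square root of MSE, so the two metrics always rank models identically.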

Support vector machine

SVMs, initially designed for classification tasks, can be adapted for regression problems using SVR. The fundamental principle of SVM involves identifying a hyperplane that optimally fits the data, ensuring that prediction errors remain within a specified margin while preserving the hyperplane’s flatness (Noori et al. 2022). A key benefit of SVM is its capability to operate in a transformed feature space via the kernel trick, allowing it to effectively manage nonlinear relationships (Cortes & Vapnik 1995). For regression purposes, the simplified SVM model can be formulated as follows:
$$f(x) = w^{\top}\phi(x) + b$$
where b is the bias term, $w$ is the weight vector, and $\phi(x)$ represents the transformation of the input x using the kernel function.
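
As a small illustration of kernel-based regression, the sketch below fits scikit-learn's `SVR` with an RBF kernel to a noise-free sine curve. The data and the values of `C` and `epsilon` are arbitrary choices for the example, not settings from the study.

```python
import numpy as np
from sklearn.svm import SVR

# Toy nonlinear target: one period of a sine wave.
x = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

# epsilon sets the width of the tube inside which errors are ignored;
# C trades off flatness of the function against violations of that tube.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
model.fit(x, y)

r2 = model.score(x, y)          # coefficient of determination on the toy data
y_fit = model.predict(x)
```

A linear kernel could not follow this curve, whereas the RBF kernel does so without any hand-crafted nonlinear features.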

Random forest

RF is an advanced ensemble learning method that aggregates predictions from multiple DTs to determine the mode for classification tasks or the mean for regression tasks on an unseen dataset. During the training phase, each tree in the forest is constructed using a unique bootstrap sample, and at each node split, a random subset of features is evaluated (Putra et al. 2023). This technique allows RF to efficiently handle large datasets with high dimensionality, making it a valuable tool in various scientific disciplines. One notable advantage of RF is its ability to evaluate the importance of individual features, which is especially beneficial for feature selection in complex hydrodynamic modeling. The overall prediction is obtained by averaging the outputs of all individual trees:

$$\hat{y}(x) = \frac{1}{T}\sum_{i=1}^{T} h_i(x)$$

where $T$ represents the number of trees, and $h_i(x)$ denotes the prediction of the $i$th tree.
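The bootstrap-and-average idea can be illustrated with a small sketch. It is a simplification under stated assumptions: each "tree" is a hypothetical one-split stump on a single synthetic feature, so the random feature subset evaluated at each node of a real forest is not shown.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement (one bootstrap sample per tree)."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Toy stand-in for a decision tree: split 1-D data at the median x and
    predict the mean y on each side."""
    xs = sorted(x for x, _ in sample)
    threshold = xs[len(xs) // 2]
    left = [y for x, y in sample if x < threshold]
    right = [y for x, y in sample if x >= threshold]
    overall = sum(y for _, y in sample) / len(sample)
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x < threshold else right_mean

def forest_predict(trees, x):
    """Regression forest output: the mean of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)

rng = random.Random(0)
data = [(float(x), 2.0 * x) for x in range(20)]  # synthetic, illustrative only
trees = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(forest_predict(trees, 3.0), forest_predict(trees, 15.0))
```

Because every tree sees a different resample, their individual errors partly cancel in the average, which is the source of the variance reduction.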

Gradient boosting

GB is an effective ensemble learning technique employed for both classification and regression tasks. It constructs models sequentially, with each new model aiming to correct the errors of the previous one. This approach leverages the strengths of multiple weak learners, usually DTs, to develop a robust predictive model (Friedman 2001).

The fundamental principle of GB is to minimize the loss function by adding new models that are trained to predict the residuals or errors of the previous models. The overall prediction is given by the sum of the predictions from all individual models, adjusted by a learning rate $\eta$. The general form of the GB model can be expressed as:

$$F_M(x) = \sum_{m=1}^{M} \eta\, h_m(x)$$

where $h_m(x)$ represents the $m$th weak learner, $M$ is the total number of learners, and $\eta$ is the learning rate.

One of the key advantages of GB is its flexibility in handling various types of data and its ability to model complex relationships. Additionally, GB includes mechanisms to reduce overfitting, such as regularization techniques and early stopping.
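The sequential residual-fitting idea can be sketched as follows. This is a minimal squared-error example on synthetic one-dimensional data, with one-split stumps as the weak learners; the constant starting prediction (the mean of the targets) is a common convention assumed here, and none of the settings are those of the study.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x < threshold else rm

def gradient_boost(xs, ys, n_learners=50, learning_rate=0.1):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)            # initial constant prediction (assumption)
    learners = []
    preds = [base] * len(xs)
    history = []                        # training MSE after each round
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    predict = lambda x: base + learning_rate * sum(h(x) for h in learners)
    return predict, history

xs = [float(i) for i in range(20)]      # synthetic, illustrative only
ys = [0.5 * x * x for x in xs]
predict, history = gradient_boost(xs, ys)
print(round(history[0], 2), round(history[-1], 2))
```

The training error falls with each round; a smaller learning rate slows that fall and typically needs more learners, which is the trade-off behind early stopping.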

Artificial neural networks

ANNs are computational models that mimic the structure of biological neural networks, and they are designed to capture complex and nonlinear relationships between inputs and outputs (Le et al. 2021). ANNs are composed of layers of interconnected neurons, which enable them to learn and represent a wide array of functions, provided they have sufficient depth and data. In applications such as predicting maximum scour depth, ANNs demonstrate adaptive learning abilities that are not reliant on predefined functional forms, making them well-suited for modeling the intricate nonlinearities found in hydraulic processes.

In a basic single-layer ANN configuration, the output for an input vector $x$ is calculated using the formula:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where $w = (w_1, \ldots, w_n)$ represents the weight vector, $n$ is the number of input nodes, $\sigma$ denotes the activation function, and $b$ is the bias term.

This architecture and its inherent flexibility make ANNs particularly valuable in hydroinformatics and other fields where understanding and predicting complex interactions are essential.
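As a small worked example of the single-layer computation, the sketch below evaluates one neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and all numeric values are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b, activation=sigmoid):
    """y = activation(sum_i w_i * x_i + b) for one output unit."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Hypothetical inputs, weights, and bias, for illustration only.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.5
y = neuron_output(x, w, b)
print(y)
```

A multilayer network such as the two 100-unit hidden layers listed in Table 3 repeats this computation for every unit, feeding each layer's outputs to the next.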

The performance of the models was evaluated using several metrics, including MSE, RMSE, MAE, and the coefficient of determination (R2). These metrics have been widely adopted in previous studies of groundwater management (Abd-Elmaboud et al. 2024; Saqr et al. 2023), highlighting their relevance and effectiveness in assessing model accuracy.

Mean squared error

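The MSE measures the average squared difference between the observed and predicted values. It is defined as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations. Lower MSE values indicate better model performance (Kaliappan et al. 2021).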

Root mean squared error

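The RMSE is the square root of the MSE, providing a measure of the average magnitude of the errors. It is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

with $y_i$, $\hat{y}_i$, and $n$ denoting the observed values, the predicted values, and the number of observations. Since RMSE is in the same units as the observed and predicted values, it makes the error magnitude easier to interpret (Willmott 1982).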

Mean absolute error

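The MAE calculates the average absolute difference between the observed and predicted values. It is expressed as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

with $y_i$, $\hat{y}_i$, and $n$ denoting the observed values, the predicted values, and the number of observations. MAE provides a straightforward interpretation of the average prediction error in the same units as the observed data (Chai & Draxler 2014).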

Coefficient of determination (R2)

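The coefficient of determination (R2) represents the proportion of the variance in the observed data that can be predicted from the independent variables. It is defined as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted values and $\bar{y}$ is the mean of the observed values. R2 values typically range from 0 to 1, with higher values indicating better model performance (Miles 2014); a negative value is possible when a model fits worse than simply predicting the mean.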
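The four metrics can be computed directly from paired observed and predicted values. The sketch below uses made-up numbers purely to show the arithmetic; they are not data from this study.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 245.0]

print(mse(observed, predicted), rmse(observed, predicted))
print(mae(observed, predicted), r2(observed, predicted))
```

Here the errors are -10, 10, -5, and 5, so MSE = 250/4 = 62.5, MAE = 30/4 = 7.5, and R2 = 1 - 250/12500 = 0.98. Because squaring weights large errors more heavily, RMSE is never smaller than MAE.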

Friedman test

Nonparametric statistical methods, such as the Friedman test (Friedman 1937), can be applied without the need to meet statistical assumptions (Derrac et al. 2011), and they do not require the data to follow a normal distribution. The primary goal of the Friedman test is to determine if there are significant differences in the performance of various models.

Essentially, it conducts multiple comparisons to identify notable differences between the behaviors of two or more models (Beasley & Zumbo 2003). The null hypothesis (H0) posits that there are no differences among the performances of the groundwater storage models. A higher p-value indicates a lower likelihood of rejecting the null hypothesis. If the p-value is less than the significance level ($\alpha$), the null hypothesis is rejected.
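The rank-based statistic behind the Friedman test can be sketched as follows. The layout (one row per block, one column per model), the scores, and the omission of a tie correction are assumptions made for illustration; the source does not specify how the test was set up.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """blocks: one row per block, one column per model.
    Returns (chi-square statistic without tie correction, mean rank per model)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return stat, [s / n for s in rank_sums]

# Hypothetical error scores: 5 blocks x 4 models (lower is better).
scores = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.1, 3.3, 4.5],
    [0.9, 1.8, 2.9, 4.1],
    [1.1, 2.2, 3.1, 4.0],
    [1.0, 2.5, 3.5, 4.2],
]
stat, mean_ranks = friedman_statistic(scores)
print(stat, mean_ranks)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of models. When every block ranks the models identically, as in this toy input, the statistic reaches its maximum of n(k - 1). In practice `scipy.stats.friedmanchisquare` returns both the statistic and the p-value.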

Wilcoxon signed-rank test

While the Friedman test identifies whether there are differences between models, it does not provide pairwise comparisons among them. Therefore, the Wilcoxon signed-rank test, another nonparametric statistical method, is used for this purpose. To evaluate the significance of differences between the performances of groundwater storage models, both the p-value and z-value are used (Wilcoxon 1945).
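For small samples the Wilcoxon signed-rank p-value can be computed exactly by enumerating every possible pattern of signs. The sketch below does this for paired scores; the paired values are hypothetical, and the code assumes no zero differences and no tied absolute differences.

```python
import itertools

def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test by enumerating sign patterns.
    Assumes no zero differences and no tied absolute differences.
    Returns (W, p) where W is the smaller of the two signed rank sums."""
    diffs = [x - y for x, y in zip(a, b)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    count = 0
    for signs in itertools.product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            count += 1
    return w_obs, count / 2 ** len(diffs)

# Hypothetical paired scores for two models over four evaluations.
model_a = [22.0, 14.9, 30.1, 18.2]
model_b = [23.5, 17.0, 29.8, 21.4]
w, p = wilcoxon_exact(model_a, model_b)
print(w, p)
```

For this input the function returns W = 1 and p = 0.25. With n pairs the smallest attainable two-sided p-value is 2/2^n, which is 0.25 for n = 3 and 0.125 for n = 4, so at least six pairs are needed before the test can reach significance at the 0.05 level. `scipy.stats.wilcoxon` provides the same test for larger samples.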

Information gain analysis

To identify the most influential factors affecting groundwater storage, we calculated the information gain for various environmental variables (Table 2). This analysis highlights the importance of each variable in predicting groundwater storage.

Table 2

Information gain scores for environmental variables

| Feature | Information gain |
| --- | --- |
| Soil moisture (SoilMoist_S_tavg) | 0.949030 |
| Evapotranspiration (evap_Tr) | 0.576318 |
| Average surface temperature (AvgSurfT_tavg) | 0.309102 |
| Land surface temperature (LST) | 0.222442 |
| Precipitation | 0.006421 |

Table 3

Model hyperparameters and best parameters from the grid search

| Hyperparameter | Random forest | ANN | SVR | Decision tree | Gradient boosting |
| --- | --- | --- | --- | --- | --- |
| Initial parameters | | | | | |
| Parameter 1 | n_estimators=100 | hidden_layer_sizes=(100,100) | kernel='rbf' | max_depth=20 | n_estimators=100 |
| Parameter 2 | random_state=42 | max_iter=500 | C=100 | min_samples_leaf=1 | random_state=42 |
| Parameter 3 | | random_state=42 | gamma=0.1 | min_samples_split=5 | |
| Parameter 4 | | | epsilon=0.1 | random_state=42 | |

The information gain analysis reveals that soil moisture and evapotranspiration are the most significant predictors of groundwater storage. These variables will be prioritized in the subsequent predictive models.
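One way to compute an information gain score for continuous variables is to discretize both the feature and the target and measure how much the feature reduces the entropy of the target. The sketch below follows that approach; the equal-width binning, the bin count, and the synthetic data are assumptions, since the estimator used for Table 2 is not stated. scikit-learn's `mutual_info_regression` is one commonly used alternative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def bin_values(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # constant input falls into a single bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def information_gain(feature, target, n_bins=4):
    """IG = H(target) - H(target | feature) after equal-width discretization."""
    f_bins = bin_values(feature, n_bins)
    t_bins = bin_values(target, n_bins)
    n = len(t_bins)
    conditional = 0.0
    for b in set(f_bins):
        subset = [t for f, t in zip(f_bins, t_bins) if f == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(t_bins) - conditional

# Synthetic data, for illustration only.
target = [float(i) for i in range(16)]
informative = list(target)          # fully determines the target
uninformative = [1.0] * 16          # carries no information

print(information_gain(informative, target))
print(information_gain(uninformative, target))
```

A feature that fully determines the binned target recovers the target's whole entropy, while a constant feature scores zero. Scores from binned estimators depend on the number of bins, so they are best read as a ranking rather than as absolute values.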

Trend analysis

Following the identification of key predictors, we analyzed the groundwater storage trends over time. Figure 3 illustrates the groundwater storage (GWS_tavg) for three locations (Kenitra, Khemisset, and Tiflet), with each series accompanied by a trend line indicating the rate of decline.
Figure 3

Groundwater storage over time by location with trend lines. The graph shows groundwater storage for three locations, each with a trend line indicating the slope of decline.


As seen in Figure 3, each location exhibits a notable downward trend in groundwater storage over the study period. The calculated slopes of the trend lines (-0.02 for all locations) highlight the consistent decline across the regions. This trend analysis sets the foundation for the subsequent predictive modeling of groundwater levels.
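A trend-line slope of this kind is typically obtained by ordinary least squares. The sketch below fits a straight line to a synthetic series constructed with a slope of -0.02 per time step; the series, its length, and its starting level are hypothetical, and the units of the slope depend on the time step of the data.

```python
def linear_trend(t, y):
    """Ordinary least-squares slope and intercept of y against t."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

# Synthetic storage series, for illustration only.
t = list(range(120))
gws = [500.0 - 0.02 * ti for ti in t]
slope, intercept = linear_trend(t, gws)
print(round(slope, 4), round(intercept, 2))
```

A negative slope indicates declining storage; multiplying it by the number of time steps gives the total change implied by the fitted line over the period.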

Table 3 presents the initial hyperparameters for each model and the best parameters obtained from the grid search. In this table, the hyperparameters are listed as rows under the respective model columns.

Detailed performance evaluation of models

We assessed the performance of the five machine learning models (ANN, SVR, RF, GB, DT) using evaluation metrics such as R2, RMSE, MSE, and MAE, employing cross-validation techniques. Table 4 offers a detailed summary of the models’ performance during both the training and validation phases.

Table 4

Performance metrics for different models

| Model | MSE | RMSE | MAE | R2 |
| --- | --- | --- | --- | --- |
| Random forest | 484.800 | 22.018 | 14.986 | 0.981 |
| Artificial neural network | 535.565 | 23.142 | 16.949 | 0.979 |
| Support vector regression | 806.711 | 28.402 | 20.373 | 0.969 |
| Decision tree | 824.247 | 28.709 | 18.350 | 0.968 |
| Gradient boosting | 561.091 | 23.687 | 17.462 | 0.978 |

Figure 4 illustrates the performance of five machine learning models in predicting groundwater storage, showcasing each model’s RMSE in distinct colors alongside their respective R2 values. The lower RMSE values for the RF and GB models indicate superior predictive accuracy, while the high R2 values reflect their effectiveness in capturing the underlying patterns in the data.
Figure 4

Comparison of model performance: RMSE and R2 metrics.


Artificial neural network

The ANN model demonstrates strong predictive performance, as evidenced by its evaluation metrics and the prediction plot. The performance metrics for the ANN model are summarized as follows:

MSE is 535.565, the average squared difference between the observed and predicted values. RMSE is 23.142, the square root of MSE, which measures the average magnitude of the prediction errors. MAE is 16.949, the average absolute difference between the predicted and actual values. The R2 value is 0.979, indicating that 97.9% of the variance in the actual data is explained by the model.

The prediction plot displays the actual versus predicted groundwater storage values for the ANN model. The points closely follow the 1:1 line, indicating a strong correlation between the measured and predicted values, suggesting that the model accurately captures the underlying patterns in the data. The majority of the data points are concentrated near the 1:1 line, signifying minimal prediction errors (Figure 5). This alignment indicates that the ANN model makes precise predictions across the entire range of measured values. The residuals (differences between actual and predicted values) are small and evenly distributed around the line, which implies that the model does not have significant bias and is not overfitting or underfitting the data. Overall, the ANN model exhibits robust performance in predicting groundwater storage, as demonstrated by its high R2 value, low error metrics (MSE, RMSE, MAE), and the visual accuracy depicted in the prediction plot.
Figure 5

The prediction plot displays the actual versus predicted groundwater storage values for the ANN model.


Gradient boosting

The GB model showcases excellent predictive performance, as reflected in its evaluation metrics and the corresponding prediction plot. The metrics for the GB model are as follows: MSE is 561.091, representing the average squared difference between the observed actual outcomes and the outcomes predicted by the model. RMSE is 23.687, which is the square root of MSE and provides an understanding of the magnitude of the prediction errors. MAE is 17.462, indicating the average absolute difference between the predicted and actual values. The R2 value is 0.978, signifying that 97.8% of the variance in the actual data is explained by the model.

The prediction plot displays the actual versus predicted groundwater storage values for the GB model. The points closely align with the 1:1 line, indicating a strong correlation between the measured and predicted values and demonstrating that the model accurately captures the underlying patterns in the data (Figure 6). The concentration of data points near the 1:1 line signifies minimal prediction errors. This close alignment indicates that the GB model makes precise predictions across the entire range of measured values. The residuals (differences between actual and predicted values) are small and evenly distributed around the line, implying that the model does not exhibit significant bias and is not overfitting or underfitting the data. Overall, the GB model exhibits robust performance in predicting groundwater storage, as evidenced by its high R2 value, low error metrics (MSE, RMSE, MAE), and the visual accuracy depicted in the prediction plot.
Figure 6

The prediction plot displays the actual versus predicted groundwater storage values for the GB model.

Random forest

The RF model demonstrates strong predictive capabilities, as indicated by its evaluation metrics and the corresponding prediction plot. The metrics for the RF model (Table 4) are as follows: MSE is 484.800, representing the average squared difference between the actual and predicted outcomes. RMSE is 22.018, which is the square root of MSE and provides an understanding of the magnitude of the prediction errors. MAE is 14.986, indicating the average absolute difference between the predicted and actual values. The R2 value is 0.981, signifying that 98.1% of the variance in the actual data is explained by the model.

Figure 7 displays the actual versus predicted groundwater storage values for the RF model. The points closely align with the 1:1 line, indicating a strong correlation between the measured and predicted values, demonstrating that the model accurately captures the underlying patterns in the data. The concentration of data points near the 1:1 line signifies minimal prediction errors. This close alignment indicates that the RF model makes precise predictions across the entire range of measured values. The residuals (differences between actual and predicted values) are small and evenly distributed around the line, implying that the model does not exhibit significant bias and is not overfitting or underfitting the data. Overall, the RF model exhibits robust performance in predicting groundwater storage, as evidenced by its high R2 value, low error metrics (MSE, RMSE, MAE), and the visual accuracy depicted in the prediction plot.
Figure 7

The prediction plot displays the actual versus predicted groundwater storage values for the RF model.

Support vector regression

The SVR model’s performance metrics and the corresponding prediction plot provide insights into its predictive capabilities. The SVR model has the following evaluation metrics (Table 4): MSE is 806.711, representing the average squared difference between the actual and predicted values, which is relatively high compared to other models. RMSE is 28.402, indicating the typical magnitude of the prediction errors. MAE is 20.373, reflecting the average absolute difference between the predicted and actual values. The R2 value is 0.969, meaning that 96.9% of the variance in the actual data is explained by the model.

Figure 8 displays the relationship between actual and predicted groundwater storage values. The data points largely align with the 1:1 line, indicating a strong correlation between measured and predicted values. However, there is significant scatter, especially at higher measured values, suggesting that the SVR model does not capture the underlying data patterns as accurately as other models. This results in higher prediction errors. Additionally, the residuals (differences between actual and predicted values) are larger and more dispersed compared to other models, implying that the SVR model may struggle with certain aspects of the data, possibly due to its kernel choice and parameter settings.
Figure 8

The prediction plot displays the actual versus predicted groundwater storage values for the SVR model.

Decision tree

The DT model’s performance metrics and the corresponding prediction plot provide insights into its predictive capabilities. The DT model has the following evaluation metrics (Table 4): MSE is 824.247, representing the average squared difference between the actual and predicted values, which is relatively high compared to other models. RMSE is 28.709, indicating the typical magnitude of the prediction errors. MAE is 18.350, reflecting the average absolute difference between the predicted and actual values. The R2 value is 0.968, meaning that 96.8% of the variance in the actual data is explained by the model.

Figure 9 displays the relationship between actual and predicted groundwater storage values. The data points largely align with the 1:1 line, indicating a strong correlation between measured and predicted values. However, there is significant scatter, especially at higher measured values, suggesting that the DT model does not capture the underlying data patterns as accurately as other models. This results in higher prediction errors. Additionally, the residuals (differences between actual and predicted values) are larger and more dispersed compared to other models, implying that the DT model may struggle with certain aspects of the data, possibly due to its depth and splitting criteria.
Figure 9

The prediction plot displays the actual versus predicted groundwater storage values for the DT model.

The results from the five models show varying degrees of accuracy in predicting groundwater storage (Table 4). The RF model achieved the lowest errors on every metric, with an MSE of 484.800, RMSE of 22.018, MAE of 14.986, and R2 of 0.981. The ANN model followed closely, with an MSE of 535.565, RMSE of 23.142, MAE of 16.949, and R2 of 0.979; its prediction plot aligns closely with the 1:1 line, indicating a strong fit between actual and predicted values. The GB model performed comparably, with an MSE of 561.091, RMSE of 23.687, MAE of 17.462, and R2 of 0.978. The SVR model, while still fitting well with an R2 of 0.969, showed higher errors (MSE of 806.711, RMSE of 28.402, MAE of 20.373) and more scatter in the prediction plot, particularly at higher measured values, indicating some challenges in capturing the data patterns accurately. The DT model had the highest MSE (824.247) and RMSE (28.709) and the lowest R2 (0.968), although its MAE of 18.350 was lower than that of SVR, which points to a few large deviations rather than uniformly larger errors.

Predictive model evaluation

The performance of the five models (RF, ANN, SVR, DT, and GB) was evaluated across three different locations for predicting groundwater storage. Figure 10 illustrates the actual, predicted, and future predicted groundwater storage values for each model and location.
Figure 10

Predictions of groundwater storage for RF, artificial neural network, support vector regression, DT, and GB models across three locations. The blue line represents the actual observed values, the red line represents the predicted values during the training period, and the green line represents future predicted values.


The figure demonstrates the following key points:

  • Random forest: The model’s prediction lines closely follow the actual values, indicating strong predictive performance across all three locations.

  • Artificial neural network: This model also shows a strong correlation between actual and predicted values, though with some visible scatter.

  • Support vector regression: The SVR model exhibits higher scatter and deviation from the actual values, particularly at higher measured values, indicating some challenges in capturing the data patterns accurately.

  • Decision tree: The DT model shows more scatter and larger prediction errors compared to the RF and ANN models, particularly in higher value ranges.

  • Gradient boosting: The GB model demonstrates strong performance with predicted values closely following the actual values, similar to the RF and ANN models.

Overall, the figure highlights the varying degrees of accuracy and reliability among the different models in predicting the decline in groundwater storage, with the RF and GB models generally showing the best performance.

Statistical tests for model comparison

To statistically validate the differences in performance among the models, we conducted the Friedman test and the Wilcoxon signed-rank test.

Friedman test

The Friedman test, a nonparametric statistical test, was used to detect differences in the performance metrics (MSE, RMSE, MAE, and R2) across the models (Table 5). The null hypothesis (H0) posits that there are no differences among the performances of the models. A p-value less than the significance level ($\alpha$) indicates significant differences among the models.

Table 5

Friedman test results

| Metric | Chi-square statistic | p-value |
| --- | --- | --- |
| MSE | 15.0 | 0.001817 |
| RMSE | 15.0 | 0.001817 |
| MAE | 15.0 | 0.001817 |
| R2 | 15.0 | 0.001817 |

The Friedman test results indicate significant differences in the performance metrics among the models (all p-values = 0.001817, below the significance level $\alpha$).

Wilcoxon signed-rank test

Following the Friedman test, the Wilcoxon signed-rank test was performed for pairwise comparisons among the models. This test helps identify which specific models differ significantly in their performance metrics (Table 6).

Table 6

Wilcoxon signed-rank test results (p-values)

| Model comparison | MSE | RMSE | MAE | R2 |
| --- | --- | --- | --- | --- |
| RF vs. ANN | 0.250 | 0.250 | 0.250 | 0.250 |
| RF vs. SVR | 0.250 | 0.250 | 0.250 | 0.250 |
| RF vs. DT | 0.250 | 0.250 | 0.250 | 0.250 |
| RF vs. GB | 0.250 | 0.250 | 0.250 | 0.250 |
| ANN vs. SVR | 0.250 | 0.250 | 0.250 | 0.250 |
| ANN vs. DT | 0.250 | 0.250 | 0.250 | 0.250 |
| ANN vs. GB | 0.250 | 0.250 | 0.250 | 0.250 |
| SVR vs. DT | 0.875 | 0.875 | 0.875 | 0.875 |
| SVR vs. GB | 0.250 | 0.250 | 0.250 | 0.250 |
| DT vs. GB | 0.250 | 0.250 | 0.250 | 0.250 |

The Wilcoxon signed-rank test results show that none of the pairwise comparisons among models is statistically significant (all p-values are 0.250 or higher, Table 6), indicating that the pairwise differences in performance metrics cannot be distinguished from random chance at the chosen significance level.

The RF model has the lowest mean rank (2.00), indicating that it generally performed better than the other models across all metrics (Table 7). The ANN model follows with a mean rank of 2.50, and the GB model with 3.00. The SVR and DT models share the highest mean rank (3.75), indicating lower overall performance than the RF, ANN, and GB models.

Table 7

Mean ranks of the models

| Performed models | Mean rank |
| --- | --- |
| Random forest | 2.00 |
| Artificial neural network | 2.50 |
| Support vector regression | 3.75 |
| Decision tree | 3.75 |
| Gradient boosting | 3.00 |

Sensitivity analysis explanation and model evaluation

The sensitivity analysis was conducted to assess the impact of different factors on the predicted groundwater storage using various machine learning models (Figure 15). The factors analyzed included average surface temperature (Figure 11), soil moisture (Figure 12), land surface temperature (Figure 13) and evapotranspiration (Figure 14). The results indicate varying responses of the models to these factors. For instance, the RF and DT models showed a relatively stable prediction across different values of average surface temperature, while the ANN and SVR models demonstrated more pronounced changes. Similar trends were observed for other factors, where the response curves varied significantly among the models, highlighting their sensitivity to changes in the input features.
Figure 11

Sensitivity analysis: average surface temperature.

Figure 12

Sensitivity analysis: soil moisture.

Figure 13

Sensitivity analysis: land surface temperature.

Figure 14

Sensitivity analysis: evapotranspiration.

Figure 15

Sensitivity analysis of groundwater storage predictions.

The analysis further revealed that the models responded differently to changes in evapotranspiration and precipitation. For example, the SVR model exhibited a significant decrease in predicted groundwater storage with increasing evapotranspiration, while the RF and GB models showed a more stable response. Conversely, the predictions for precipitation indicated that the ANN model had a more sensitive response, showing large fluctuations, whereas the DT and GB models maintained a steadier prediction. Overall, these sensitivity analyses underscore the importance of considering model-specific behaviors and responses to input variations when predicting groundwater storage, aiding in selecting the most robust model for specific scenarios.
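Response curves of this kind can be produced with a one-at-a-time procedure: one input is swept over a range while the others are held at baseline values, and the model's prediction is recorded at each step. The sketch below assumes that procedure, which the source does not spell out; the stand-in model, feature names, baseline values, and ranges are all hypothetical.

```python
def one_at_a_time(predict, baseline, feature, values):
    """Vary one feature over `values`, holding the others at their baseline."""
    curve = []
    for v in values:
        inputs = dict(baseline)      # copy so the baseline is left untouched
        inputs[feature] = v
        curve.append((v, predict(inputs)))
    return curve

def response_range(curve):
    """Spread of predictions along a response curve (a simple sensitivity score)."""
    preds = [p for _, p in curve]
    return max(preds) - min(preds)

def toy_model(inputs):
    """Hypothetical stand-in for a fitted model's predict function."""
    return 400.0 + 50.0 * inputs["soil_moisture"] - 2.0 * inputs["evapotranspiration"]

baseline = {"soil_moisture": 0.3, "evapotranspiration": 10.0, "precipitation": 1.0}
et_curve = one_at_a_time(toy_model, baseline, "evapotranspiration",
                         [0.0, 5.0, 10.0, 15.0, 20.0])
pr_curve = one_at_a_time(toy_model, baseline, "precipitation",
                         [0.0, 1.0, 2.0, 3.0])
print(response_range(et_curve), response_range(pr_curve))
```

Running the same sweep through each fitted model and comparing the curves shows which models react sharply to a given input and which stay flat. One-at-a-time sweeps do not capture interactions between inputs, so they complement rather than replace a joint analysis.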

Comparative performance analysis of machine learning models

The comparative analysis of machine learning models in predicting groundwater storage revealed distinct performance differences among the employed algorithms, specifically highlighting the RF and GB models as superior performers. Several key factors contributed to their robust predictive capabilities.

  1. Ensemble learning approach: Both RF and GB are ensemble methods that combine the predictions of multiple DTs. This approach enhances their generalization ability by averaging predictions (RF) or sequentially improving model accuracy (GB). The aggregation of diverse models reduces overfitting, leading to more accurate predictions on unseen data.

  2. Handling nonlinearity and interactions: The inherent design of decision tree-based algorithms allows them to capture complex nonlinear relationships and interactions among input variables effectively. This is particularly advantageous in groundwater modeling, where relationships between environmental factors and groundwater levels are often intricate and nonlinear.

  3. Feature importance and selection: Both models provide insights into feature importance, allowing for the identification and prioritization of critical factors influencing groundwater storage. This focus on significant predictors, such as soil moisture and evapotranspiration, likely improved model accuracy, as highlighted in the information gain analysis.

  4. Hyperparameter optimization: The performance of both RF and GB models benefited significantly from rigorous hyperparameter tuning. Optimizing parameters such as the number of trees, learning rates, and maximum depth ensured that these models were configured to achieve the best possible accuracy, contributing to their superior performance metrics (e.g., lower RMSE and higher R2).

  5. Robustness to noise: The RF model, in particular, is known for its robustness against noise in the dataset. By averaging predictions across numerous trees, it mitigates the impact of outliers and erroneous data points, which may adversely affect other models like SVR and DTs.

  6. Comparative metrics: The performance metrics indicate that the RF model achieved an MSE of 484.800 and an R2 of 0.981, while the GB model attained an MSE of 561.091 with an R2 of 0.978. These results suggest that both models not only accurately captured the trends in groundwater levels but also maintained reliability across different validation datasets.

The superior performance of the RF and GB models can be attributed to their ensemble learning capabilities, ability to manage complex data relationships, effective feature selection, meticulous hyperparameter tuning, and robustness to noise. These factors collectively contribute to their effectiveness in predicting groundwater storage, highlighting their potential for application in groundwater management strategies.

Insights from sensitivity analysis

The sensitivity analysis conducted in this study revealed critical insights into how variations in key environmental factors, such as evapotranspiration and precipitation, affect groundwater storage predictions across different machine learning models. The analysis highlighted that models like RF and GB exhibited a relatively stable response to changes in average surface temperature, whereas ANN and SVR showed more pronounced fluctuations in predictions with varying evapotranspiration rates. Specifically, the SVR model demonstrated a significant decrease in predicted groundwater storage as evapotranspiration increased, indicating a sensitivity to this factor that could affect reliability in specific scenarios.

In terms of precipitation, the ANN model exhibited a notably sensitive response, with larger fluctuations in predicted groundwater levels compared to RF and GB models, which maintained steadier predictions. This sensitivity suggests that while RF and GB are more robust in varying conditions, ANN may require further refinement to enhance its predictive stability.

These findings have important implications for groundwater management practices. The demonstrated sensitivity of groundwater storage to environmental factors underscores the need for adaptive management strategies that consider the variability and interaction of these factors. Specifically, understanding how changes in evapotranspiration and precipitation influence groundwater levels can inform the timing and scale of water extraction, irrigation practices, and conservation efforts. By integrating these insights into management policies, stakeholders can better anticipate groundwater fluctuations, leading to more sustainable resource use and enhanced resilience against climate variability and extreme weather events.

Implications for groundwater management in Morocco

Morocco is facing significant groundwater challenges, exacerbated by prolonged periods of drought over the past 6 years. The findings from this study have important implications for policymakers and groundwater management strategies.

Factors influencing groundwater decline

The study identified key environmental factors influencing groundwater storage, including average surface temperature, soil moisture, evapotranspiration, precipitation, and land surface temperature. These factors directly impact groundwater recharge and depletion rates. For instance, higher surface temperatures and increased evapotranspiration rates can accelerate groundwater depletion, while effective soil moisture and adequate precipitation are critical for groundwater recharge. Sensitivity analysis revealed that the models responded differently to changes in these factors. For example, the SVR model exhibited a significant decrease in predicted groundwater storage with increasing evapotranspiration, while the RF and GB models showed a more stable response.

Scenario analysis and policy recommendations

To better understand the resilience of groundwater resources in the Rabat–Salé–Kénitra region, it is crucial to explore the impact of various scenarios, including varying precipitation levels and population growth rates, on groundwater storage predictions. Scenario analysis can illuminate potential risks and vulnerabilities associated with changing environmental and socio-economic conditions. For instance, modeling scenarios with reduced precipitation can help assess the implications of prolonged droughts on groundwater levels, while simulations incorporating increased population growth can indicate a heightened demand for water resources. By evaluating these different scenarios, stakeholders can identify critical thresholds that, if exceeded, may compromise groundwater sustainability.
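A scenario analysis of this kind can be sketched as below. The sketch rests on several assumptions that are not part of the study: the `Scenario` fields, the `water_demand` input (the models in this study did not include socio-economic variables), the linear `toy_storage_model`, and the threshold value are all hypothetical and serve only to show the mechanics of scaling inputs and flagging threshold exceedance.

```python
from typing import Callable, Dict, List, NamedTuple

Features = Dict[str, float]


class Scenario(NamedTuple):
    name: str
    precipitation_factor: float  # 0.5 means precipitation is halved
    demand_factor: float         # 1.5 means water demand grows by 50 %


def run_scenarios(
    predict: Callable[[Features], float],
    baseline: Features,
    scenarios: List[Scenario],
    threshold: float,
) -> List[dict]:
    """Scale the baseline inputs per scenario and flag predictions that
    fall below a critical storage threshold."""
    results = []
    for s in scenarios:
        inputs = dict(baseline)
        inputs["precipitation"] = baseline["precipitation"] * s.precipitation_factor
        inputs["water_demand"] = baseline["water_demand"] * s.demand_factor
        storage = predict(inputs)
        results.append(
            {"scenario": s.name, "storage": storage, "below_threshold": storage < threshold}
        )
    return results


def toy_storage_model(x: Features) -> float:
    # Hypothetical stand-in for a trained model (assumption, not from the study).
    return 50.0 + 1.0 * x["precipitation"] - 0.5 * x["water_demand"]


baseline = {"precipitation": 40.0, "water_demand": 60.0}
scenarios = [
    Scenario("baseline", 1.0, 1.0),
    Scenario("prolonged drought", 0.5, 1.0),
    Scenario("population growth", 1.0, 1.5),
    Scenario("drought and growth", 0.5, 1.5),
]

# The threshold of 42.0 is an arbitrary illustrative value.
results = run_scenarios(toy_storage_model, baseline, scenarios, threshold=42.0)
for r in results:
    print(r)
```

In practice the threshold would come from a hydrogeological assessment of the aquifer, and the scenario factors from climate projections and demographic forecasts rather than from fixed multipliers.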

In light of the findings from this study, we recommend the following specific and actionable strategies for policymakers to enhance groundwater management:

  1. Groundwater extraction limits: Establish and enforce extraction limits tailored to the specific aquifer capacities and recharge rates. These limits should be based on scientific assessments of sustainable yield to prevent over-extraction, especially in regions experiencing significant declines in groundwater levels.

  2. Water-saving technologies: Promote the adoption of water-saving technologies in agriculture, such as drip irrigation and soil moisture sensors. Financial incentives or subsidies can encourage farmers to implement these practices, ultimately reducing groundwater demand and enhancing efficiency.

  3. Recharge strategies: Develop and implement artificial recharge strategies, such as constructing check dams, percolation tanks, and recharge wells. These initiatives can enhance groundwater levels, particularly during wet seasons, ensuring a more reliable supply during dry periods.

  4. Public awareness campaigns: Launch public awareness campaigns to educate local communities about the importance of sustainable groundwater management practices. Engaging residents in conservation efforts can foster a collective commitment to protecting this vital resource.

By integrating scenario analysis into groundwater management planning and implementing the recommended strategies, policymakers can enhance the region’s resilience to climate variability and socio-economic pressures, safeguarding groundwater resources for future generations.

Limitations of the current modeling approach and future research directions

Despite the promising results obtained from the machine learning models utilized in this study, several limitations must be acknowledged that could affect the robustness and applicability of the findings in real-world groundwater management scenarios.

One primary limitation is the reliance on environmental data, which, while critical, does not encompass the full range of factors influencing groundwater dynamics. The models primarily focused on variables such as soil moisture, evapotranspiration, and precipitation, potentially overlooking significant socio-economic factors such as population growth, land use changes, agricultural practices, and industrial demands. These socio-economic elements can profoundly impact groundwater availability and usage, as they dictate water demand patterns and the implementation of water management practices.

Additionally, the models were developed using datasets that, although comprehensive, may not fully capture the spatial and temporal variability of groundwater systems across different regions. The lack of high-resolution, long-term datasets can lead to a limited understanding of groundwater dynamics and hinder the models’ ability to generalize across varying contexts.

To address these limitations, we propose the following directions for future research:

  1. Integration of socio-economic factors: Future models should incorporate socio-economic data alongside environmental variables. This integration could enhance the predictive capability of models by accounting for human activities and their impacts on groundwater resources. Collaborative efforts with social scientists and policymakers can facilitate the collection and integration of relevant socio-economic data into groundwater models.

  2. Comprehensive datasets: There is a need for more extensive and high-resolution datasets that capture the complexities of groundwater systems. Future research should aim to compile datasets from various sources, including satellite observations, ground measurements, and socio-economic surveys, to develop a more holistic understanding of groundwater dynamics.

  3. Longitudinal studies: Conducting longitudinal studies will allow researchers to assess the long-term impacts of climate change and human activities on groundwater levels. These studies can provide valuable insights into trends and inform adaptive management strategies to mitigate groundwater depletion.

  4. Stakeholder engagement: Engaging local stakeholders and communities in the research process can help identify relevant socio-economic factors and improve the model’s applicability in real-world scenarios. Participatory approaches can also foster collaborative management strategies that consider both environmental sustainability and socio-economic needs.

Novelty of the approach

This study presents a novel approach to predicting groundwater storage variations by integrating advanced machine learning models, namely RF and GB, with a comprehensive analysis of environmental factors. Unlike traditional groundwater modeling techniques that often rely solely on hydrological data, our approach incorporates critical environmental variables such as soil moisture, evapotranspiration, and precipitation, thereby providing a more holistic understanding of groundwater dynamics. Additionally, the sensitivity analysis reveals the models’ responsiveness to variations in these factors, underscoring the intricate relationships that govern groundwater behavior. Furthermore, this research lays the groundwork for future studies to include socio-economic factors, such as population growth and agricultural practices, which are pivotal in shaping groundwater management strategies. By addressing these critical elements, our findings not only advance the methodological framework for groundwater prediction but also offer actionable insights for policymakers in Morocco and similar regions facing water scarcity challenges.

Conclusions

This study highlights the critical issue of groundwater decline in the Rabat–Salé–Kénitra region of Morocco, exacerbated by prolonged droughts and increasing demands from agriculture and population growth. Using advanced machine learning models (ANN, GB, SVR, DT, and RF), we predicted variations in groundwater storage. Among these models, RF and GB consistently demonstrated superior predictive performance, making them valuable tools for groundwater management.

The integration of Prophet, known for its ability to handle seasonality in time series data, further enhanced the reliability of our predictions. Sensitivity analysis provided crucial insights into how different environmental factors, such as evapotranspiration and precipitation, impact groundwater levels. This analysis underscored the importance of model-specific responses to input variations, aiding in the selection of the most robust models for specific scenarios.

Our findings emphasize the urgent need for advanced predictive models in groundwater management strategies to mitigate the adverse effects of groundwater decline. Policymakers can utilize these models to implement sustainable water management practices, regulate groundwater extraction, promote water-saving technologies, and enhance groundwater recharge. Addressing these issues is vital to ensuring the sustainability of Morocco’s groundwater resources in the face of increasing drought conditions and climate change impacts.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict of interest.

We declare there is no financial support for this research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).