ABSTRACT
The Rabat–Salé–Kénitra region of Morocco faces critical groundwater challenges due to increasing demands from population growth, agricultural expansion, and the impacts of prolonged droughts and climate change. This study employs advanced machine learning models, including artificial neural networks (ANN), gradient boosting (GB), support vector regression (SVR), decision tree (DT), and random forest (RF), to predict groundwater storage variations. The dataset encompasses hydrological, meteorological, and geological factors. Among the models evaluated, RF demonstrated superior performance, achieving a mean squared error (MSE) of 484.800, a root mean squared error (RMSE) of 22.018, a mean absolute error (MAE) of 14.986, and a coefficient of determination (R2) of 0.981. Sensitivity analysis revealed significant insights into how different models respond to variations in key environmental factors such as evapotranspiration and precipitation. Prophet was also integrated for its ability to handle seasonality in time-series data, further enhancing prediction reliability. The findings emphasize the urgent need to integrate advanced predictive models into groundwater management to address groundwater depletion and ensure sustainable water resources amid rising drought conditions. Policymakers can use these models to regulate extraction, promote water-saving technologies, and enhance recharge efforts, ensuring the sustainability of vital groundwater resources for future generations.
HIGHLIGHTS
Environmental impact: groundwater resources in Morocco, especially in the Rabat–Salé–Kénitra region.
Data-driven insights: the research integrates comprehensive environmental data.
Innovative use of the Prophet model: the inclusion of the Prophet forecasting model enhances the analysis.
Policy implications: the findings have significant implications for policymakers, offering data-driven recommendations to mitigate groundwater decline.
ABBREVIATIONS
- ANN: Artificial neural networks
- GB: Gradient boosting
- DT: Decision tree
- GIS: Geographic information system
- GWS: Groundwater storage
- GWL: Groundwater level
- GRACE: Gravity recovery and climate experiment
- Location 1: Kenitra
- Location 2: Khemisset
- Location 3: Tiflet
- Location 4: Ouezzane
- Location 5: Mechraa Belaksiri
- Location 6: Souk El Arbaa
- Location 7: Moulay Bouselham
- Location 8: Al Kansera
- ML: Machine learning
- MAE: Mean absolute error
- MSE: Mean squared error
- MODIS: Moderate resolution imaging spectroradiometer
- NASA: National Aeronautics and Space Administration
- NDVI: Normalized difference vegetation index
- RMSE: Root mean squared error
- RS: Remote sensing
- RF: Random forest
- SVR: Support vector regression
- SVM: Support vector machine
- TWS: Terrestrial water storage
INTRODUCTION
Morocco faces significant water scarcity, with an annual renewable water resource of 29 billion cubic meters (BCM), including 4 BCM from groundwater. Groundwater supplies as much as 60–70% of the nation’s potable water, though some deep aquifers are either nonrenewable or have minimal recharge capacity (Faysse et al. 2010; Hssaisoune et al. 2020). Rapid development and agricultural growth have led to aquifer overuse, with extraction rates surpassing natural recharge in most major basins (Ait Brahim et al. 2017; Fakir et al. 2021; Echogdali et al. 2023). This depletion is exacerbated by the region’s projected increase in water stress due to climate change (Aboutalebi et al. 2022). Ensuring sustainability remains a significant challenge.
The Rabat–Salé–Kénitra region faces distinct groundwater challenges that exacerbate the general water scarcity issues prevalent in Morocco. One of the primary concerns is the over-extraction of groundwater, where demand significantly exceeds natural recharge rates, leading to a steady decline in aquifer levels. This decline is driven by rapid population growth, which increases the demand for potable water, alongside heavy reliance on groundwater for agricultural irrigation. These agricultural practices are often inefficient, resulting in high consumption rates and further straining the region’s water resources (Cheikhaoui et al. 2024).
Existing research on groundwater modeling in Morocco and similar regions predominantly relies on traditional statistical techniques, such as linear regression and time-series analysis, which often inadequately capture the complex, nonlinear interactions inherent in groundwater dynamics. For instance, studies have illustrated the limitations of these conventional methods in accurately predicting groundwater behavior, particularly under changing environmental conditions, and have underscored the reliance on limited datasets that hinder comprehensive analyses (Ait Brahim et al. 2017; Fakir et al. 2021).
Several studies have examined groundwater modeling using various machine learning techniques across different regions. Osman et al. (2021) in Malaysia utilized a 9-month daily dataset, applying XGBoost, artificial neural network (ANN), and SVR, achieving an R2 of 0.920. Malakar et al. (2021) analyzed 23 years of groundwater level data in India, collected quarterly, employing FNN, RNN, and LSTM, also with an R2 of 0.920. Afan et al. (2021) similarly focused on a daily dataset from Malaysia from 2017 to mid-2018 using DL and EDL, reporting an R2 of 0.790. Dehghani (2022) utilized a comprehensive monthly dataset from Iran spanning 11 years, leveraging ANN and other methods to achieve a notable R2 of 0.995. Conversely, Khan et al. (2023) conducted a broad review of 109 articles with 15 years of data without specifying data processing methods, limiting the applicability of their findings. Mohammed et al. (2023) worked with a dataset from Sonqor over 306 months, employing GA-ANN and ELM, attaining an R2 of 0.916. While these studies provide valuable insights, common gaps include a lack of comprehensive datasets that encompass diverse climatic conditions and groundwater contexts, as well as insufficient details regarding data processing and validation methods, which may hinder the robustness and generalizability of the models employed. The novelty of this study is rooted in its integration of advanced machine learning techniques, specifically utilizing ANN, gradient boosting (GB), random forest (RF), and support vector machines (SVM), collectively enhancing the robustness of predictive models for groundwater assessment. 
This research employs comprehensive datasets derived from both satellite observations and ground measurements, incorporating critical variables such as surface temperature, groundwater storage, and soil moisture, facilitating a nuanced understanding of groundwater dynamics and addressing the limitations of previous studies that often relied on restricted datasets. Furthermore, the emphasis on sensitivity analysis fills a notable gap in the literature by examining how variations in environmental factors impact groundwater discharge points. By concentrating specifically on the unique groundwater challenges faced by the Rabat–Salé–Kénitra region, this research provides invaluable insights that are directly applicable to local water management strategies, thereby making a substantial contribution to the field of groundwater management.
This study highlights the potential of integrating machine learning techniques with remote sensing data to enhance groundwater resource management. It provides a foundation for future research aimed at improving the precision of these models, understanding the long-term impacts of climate change on groundwater resources, and integrating socio-economic factors into groundwater prediction models.
The key findings of this study demonstrate that the application of advanced machine learning techniques, specifically ANN, gradient boosting (GB), RF, and SVMs, substantially enhances the predictive accuracy of groundwater models in the Rabat–Salé–Kénitra region. This research utilizes comprehensive datasets that incorporate essential variables such as surface temperature, groundwater storage, and soil moisture, allowing for a deeper understanding of groundwater dynamics. Additionally, the focus on sensitivity analysis provides critical insights into how variations in environmental factors impact groundwater discharge points, addressing notable gaps in the existing literature and contributing to informed local water management strategies.
MATERIALS AND METHODS
Following data categorization, the data undergoes preprocessing, which includes cleaning, normalization, and transformation. The processed data is then used to train four machine learning models: ANN, GB, SVMs, and RF. These models are selected for their capacity to handle complex datasets and make accurate predictions. Their performance is evaluated using specific metrics to determine accuracy and reliability. This workflow leverages various machine learning techniques to effectively predict groundwater discharge and enhance groundwater resource management.
Study area
The Rabat–Salé–Kénitra region is one of the twelve regions of Morocco, established under the new regional division in 2015, encompassing the nation’s capital. This region resulted from the merger of the former regions of Gharb-Chrarda-Beni Hssen and Rabat–Salé–Zemmour-Zaer. It is bordered to the north by Tanger-Tétouan-Al Hoceima, to the east by Fès-Meknès, to the south by Casablanca-Settat and Beni Mellal-Khénifra, and to the west by the Atlantic Ocean. The region spans a total area of 17,570 km2, which is 2.5% of Morocco’s total area (Plan 2024). The climate is Mediterranean, characterized by cold and wet winters with average night temperatures ranging from 0 to 5°C, and daytime temperatures reaching up to 17°C. Summers are oceanic, with nights cooled by ocean humidity and daytime temperatures around 30°C. Occasionally in spring and summer, the ‘Chergui’ wind from the desert can raise temperatures to 40°C (Hakimi & Brech 2021).
Geographical context: the map situates the Rabat–Salé–Kénitra region within Morocco, emphasizing its location along the Atlantic coast and its proximity to the Mediterranean Sea.
Legend: the legend offers a clear explanation of the various administrative divisions and boundaries depicted on the map, facilitating easy interpretation of the spatial data.
Aquifers in the Rabat–Salé–Kénitra region
The groundwater resources of the region are supported by several key aquifers, each exhibiting distinct geological and hydrological characteristics:
Maamora aquifer: this extensive aquifer, primarily composed of Quaternary sands and limestones, functions as a vital water reservoir with significant storage capacity. It meets both agricultural and urban water demands, underscoring its critical role in the region’s water supply (Jelbi et al. 2024).
Bouregreg coastal aquifer: situated along the Atlantic coast, this aquifer is notably influenced by marine conditions. It provides essential water resources to coastal cities such as Rabat and Salé, thereby playing a crucial role in sustaining urban populations and supporting industrial activities (Hssaisoune et al. 2020).
Gharb aquifer: located in the Gharb plain, this aquifer comprises highly permeable alluvial deposits that are well suited for irrigation. It is extensively utilized for agricultural purposes, making a substantial contribution to the region’s agricultural productivity (Hilal et al. 2024).
Tiddas Aquifer: Found inland from the coastal areas, the Tiddas Aquifer is characterized by sandstone and clay formations. It is essential for providing water to rural communities, thereby enhancing the overall groundwater resources available in the region (Ouharba et al. 2022).
Key factors contributing to groundwater depletion
Key factors contributing to groundwater depletion in the Rabat–Salé–Kénitra region include climate variability, land use changes, and population growth. Climate variability has led to prolonged droughts and altered precipitation patterns, significantly reducing natural recharge rates and causing pronounced declines in aquifer levels, particularly in areas experiencing less rainfall (Hssaisoune et al. 2020). Additionally, land use changes driven by urban expansion and intensive agricultural practices have increased water demand. Areas such as the Gharb aquifer, which is heavily utilized for irrigation, are particularly affected (Asadollahi et al. 2024). Rapid population growth in urban centers like Rabat and Salé exacerbates this issue, as the increasing demand for potable water results in higher extraction rates (Bounoua et al. 2024). These factors vary spatially within the study area, with different aquifers experiencing varying degrees of depletion based on their specific hydrological characteristics and surrounding land use practices (Sajjad et al. 2023).
Selection of the study area
The region is significant due to its substantial groundwater reserves, held in four primary aquifers. The Gharb aquifer, covering 390 km2 with 126 Mm3/year of renewable resources, generally maintains a balanced water budget. It is regionally significant due to substantial recharge from precipitation and infiltration from the margins of the Gharb basin (noa 2022). The Maâmora aquifer, spanning approximately 4,000 km2, is an unconfined aquifer recharged solely by the infiltration of rainfall. It represents a significant water reservoir with 134 Mm3/year of renewable resources (noa 2022). The Tiddas aquifer, at 250 km2, provides around 30 Mm3/year, primarily through rainfall (Rodell et al. 2009). The Bouregreg aquifer, covering 800 km2, offers approximately 60 Mm3/year of renewable resources through both rainfall and river infiltration (Hakimi & Brech 2021). These aquifers are vital for the region’s water supply, supporting agricultural, industrial, and domestic needs.
Nature and sources of data
The daily dataset for this study was gathered from nine distinct locations within the Rabat–Salé–Kénitra region. Satellite data from the Moderate Resolution Imaging Spectroradiometer (MODIS) was employed to assess various environmental factors in the area. This data was sourced from the NASA Earth Observing System Data and Information System (EOSDIS) portal, spanning from January 2010 to December 2022.
Due to the unavailability of specific agricultural data for Morocco, this study relies on generalized data and studies that illustrate the broad impact of agricultural practices on groundwater resources. These include the effects of water-intensive crops, irrigation practices, and the expansion of agricultural lands, which are known to influence groundwater depletion rates.
The environmental factors analyzed include average surface temperature (AvgSurfT-tavg), groundwater storage (GWS-tavg), soil moisture (SoilMoist-S-tavg), terrestrial water storage (TWS-tavg), elevation, soil type, land surface temperature (LST), evapotranspiration (evap-Tr), normalized difference vegetation index (NDVI), and precipitation. Additionally, GRACE data were utilized in the analysis.
Spatial and temporal coverage
The dataset used in this study covers nine distinct locations within the Rabat–Salé–Kénitra region, strategically selected to represent a variety of environmental conditions and land uses. These sites include urban, agricultural, and coastal areas, providing a comprehensive view of groundwater dynamics. The temporal coverage spans from January 2010 to December 2022, capturing significant climatic variations and socio-economic changes that influence groundwater levels. This time period is essential for analyzing long-term trends in groundwater depletion. While the dataset offers valuable insights, certain areas may lack full representation, particularly those with limited data availability. Nonetheless, the selected locations and extensive temporal range enhance the dataset’s overall representativeness for the Rabat–Salé–Kénitra region. This facilitates informed conclusions regarding groundwater management strategies. Overall, the dataset is well suited for assessing the challenges and dynamics of groundwater resources in the region.
Groundwater storage
Groundwater storage represents the total volume of water stored underground in aquifers, averaged over a specific period. The data were sourced from GRACE-DA and are available from 24 February 2000 to 17 February 2023.
Declines in groundwater storage reflect reductions in available groundwater, often due to excessive extraction for agricultural, industrial, and domestic purposes. Sustained over-extraction can lead to significant drops in groundwater levels, affecting water availability and quality (Rodell et al. 2009).
Precipitation
Precipitation is any form of water, liquid or solid, that falls from the atmosphere and reaches the ground. The data were sourced from CHIRPS and are available from 24 February 2000 to 5 May 2023.
Precipitation is the primary source of groundwater recharge. Reduced precipitation can lead to lower groundwater levels, especially in regions that rely heavily on rainfall to replenish aquifers (Taylor et al. 2013).
Average surface temperature (AvgSurfT-tavg)
Average surface temperature refers to the mean temperature recorded at the earth’s surface over a specified period. The data were sourced from GRACE-DA and are available from 24 February 2000 to 17 February 2023.
Increased surface temperatures can lead to higher rates of evaporation, reducing the amount of water available to infiltrate into the ground and recharge aquifers. This reduction in infiltration can significantly exacerbate groundwater depletion, especially in arid and semiarid regions (Kundzewicz et al. 2007).
Soil moisture (SoilMoist-S-tavg)
Soil moisture refers to the amount of water present in the soil, averaged over a specific period. The data were sourced from GRACE-DA and are available from 24 February 2000 to 5 May 2023.
Low soil moisture can reduce the amount of water that percolates down to recharge aquifers. During periods of drought, reduced soil moisture can lead to increased reliance on groundwater for irrigation, further depleting groundwater resources (Döll 2009).
Land surface temperature (LST)
Land surface temperature is the temperature of the land surface as measured by remote sensing instruments. The data were sourced from MODIS and are available from 24 February 2000 to 5 May 2023.
High land surface temperatures can increase evapotranspiration rates, reducing soil moisture and groundwater recharge. This can lead to a more rapid depletion of groundwater reserves, especially in hot climates (Wang & Dickinson 2012).
Evapotranspiration (evap-Tr)
Evapotranspiration is the sum of evaporation from the land surface and transpiration from plants. The data were sourced from GRACE-DA and are available from 24 February 2000 to 5 May 2023.
High evapotranspiration rates can significantly reduce the amount of water available for groundwater recharge. In agricultural areas, high evapotranspiration can lead to increased irrigation demands, further stressing groundwater resources (Wada et al. 2010).
Data processing
One essential element in achieving accurate predictions in machine learning is the proper preparation of input data. The preprocessing or normalization of variables, which involves assigning appropriate weights to features, can significantly enhance the quality of the resulting insights (Guzman et al. 2019; Mohammadi 2019). In our research, we analyzed 42,000 readings from various sites, all meticulously verified and cleaned before further analysis.
Missing values were addressed by using the ‘dropna()’ method to remove any rows with incomplete data, ensuring that our analysis was based solely on complete and reliable observations. Additionally, we implemented outlier detection methods to identify and mitigate the influence of extreme values on our results. Although the availability of this data varied, it consistently included current measurements of water levels and other parameters recorded daily without fail.
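The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the text does not name the outlier method or the normalization scheme, so the interquartile-range (IQR) rule, the min-max scaling, the helper names `clean_readings` and `min_max_scale`, and the toy values are all assumptions.

```python
import numpy as np
import pandas as pd


def clean_readings(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop incomplete rows, then drop rows with any value outside the IQR fences.

    The IQR rule (with multiplier k) is an assumed outlier criterion.
    """
    complete = df.dropna()  # same call as named in the text
    q1 = complete.quantile(0.25)
    q3 = complete.quantile(0.75)
    iqr = q3 - q1
    inside = ((complete >= q1 - k * iqr) & (complete <= q3 + k * iqr)).all(axis=1)
    return complete[inside]


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] (assumed normalization scheme)."""
    return (df - df.min()) / (df.max() - df.min())


# Hypothetical toy readings: one missing value and one extreme value.
raw = pd.DataFrame({
    "precipitation": [1.0, 2.0, np.nan, 1.5, 2.5, 1.8, 50.0],
    "soil_moisture": [0.20, 0.22, 0.21, 0.19, 0.23, 0.21, 0.20],
})
clean = clean_readings(raw)
scaled = min_max_scale(clean)
```

In this toy example the row with the missing precipitation value and the row with the extreme value of 50.0 are both removed, leaving five complete rows.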
We managed and analyzed this data using Python 3.10.11, employing libraries such as NumPy, SciPy, pandas, and Matplotlib. For implementing methods like support vector regression (SVR), gradient boosting machines (GBM), and RFs, we utilized scikit-learn, which efficiently handled tasks such as cross-validation, a process that can be computationally demanding due to the repeated splitting of the dataset.
To effectively train and test our model and avoid overfitting, we split our dataset into two parts: 70% for training and 30% for testing and validation (James et al. 2013). The primary goal is ‘generalization’: a well-trained model should perform equally well on both the training and testing datasets. The model’s performance is evaluated by training it on the training set (70% of the data) and then assessing its accuracy using the remaining 30%. To fine-tune the models and prevent overfitting to noise in the data, a validation set is derived from the 30% testing data.
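A 70/30 split of this kind might look like the sketch below, using scikit-learn's `train_test_split`. The synthetic arrays, the `random_state` value, and the equal division of the 30% hold-out into validation and test halves are assumptions for illustration; the text only states that the validation set is derived from the 30% portion.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1,000 samples with 5 conditioning factors.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# 70% training, 30% held out, as described in the text.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Assumed: the hold-out portion is divided equally into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=42
)
```

Because the readings are daily time series, a chronological split (passing `shuffle=False`) may be preferable where leakage between neighbouring days is a concern; the text does not say which variant was used.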
Groundwater conditioning factor analysis
Groundwater modeling and prediction rely heavily on analyzing the factors that influence groundwater dynamics. Before constructing the groundwater model, it is crucial to assess the selected factors for potential multicollinearity issues. This section thoroughly examines the multicollinearity among the conditioning factors and describes the steps taken to resolve any identified issues. Multicollinearity occurs when the independent conditioning factors are highly correlated or interdependent.
Several methods have been suggested for detecting multicollinearity. The variance inflation factor (VIF) and tolerance (TOL) are particularly prominent in environmental modeling (O’Brien 2007; Bui et al. 2011) and were thus selected for this study. A VIF greater than 10 or, equivalently, a TOL less than 0.1 indicates the presence of multicollinearity problems (O’Brien 2007; Bui et al. 2011). In our analysis, we identified variables with a VIF exceeding 10, which were subsequently removed to enhance the model’s robustness. Additionally, the mutual information (MI) technique was employed to assess the relative importance of predictor variables and to identify those with minimal impact. Eliminating these variables is crucial for enhancing model accuracy (Khosravi et al. 2018).
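As an illustration of how VIF and TOL are obtained, the sketch below regresses each factor on the remaining ones and uses the standard definitions VIF = 1 / (1 − R²) and TOL = 1 / VIF. The helper name `vif_table` and the synthetic data are assumptions; the study's own computation is not shown in the text.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def vif_table(features: pd.DataFrame) -> pd.DataFrame:
    """Return VIF and TOL for each column, regressing it on all other columns."""
    rows = []
    for col in features.columns:
        others = features.drop(columns=col)
        r2 = LinearRegression().fit(others, features[col]).score(others, features[col])
        vif = 1.0 / (1.0 - r2)
        rows.append({"feature": col, "VIF": vif, "TOL": 1.0 / vif})
    return pd.DataFrame(rows).set_index("feature")


# Hypothetical data: "a_copy" is nearly identical to "a", "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
demo = pd.DataFrame({"a": a, "b": b, "a_copy": a + 0.05 * rng.normal(size=500)})

table = vif_table(demo)
# Apply the thresholds stated in the text (VIF > 10, TOL < 0.1).
flagged = table[(table["VIF"] > 10) | (table["TOL"] < 0.1)]
```

Here the two near-duplicate columns are flagged, while the independent column has a VIF close to 1.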
This research analyzed six groundwater conditioning factors, including:
Average surface temperature (AvgSurfT_tavg): influences evapotranspiration rates and indirectly affects groundwater recharge.
Groundwater storage (GWS_tavg): indicates the volume of water stored underground, crucial for understanding groundwater availability.
Soil moisture (SoilMoist_S_tavg): reflects the amount of water present in the soil, affecting groundwater recharge rates.
Land surface temperature (LST): affects soil moisture and evaporation rates.
Precipitation: directly contributes to groundwater recharge through infiltration.
Feature selection
Effective feature selection is crucial for building robust groundwater models by ensuring that only the most relevant and independent conditioning factors are included. This study employed several methods to detect and address multicollinearity and to identify the most informative features.
Analysis of multicollinearity among indicators
Table 1 shows the variance inflation factor (VIF) and tolerance (TOL) values for each feature to assess multicollinearity.
Feature | VIF | TOL |
---|---|---|
Average surface temperature | 6.596058 | 0.151606 |
Soil moisture | 5.184165 | 0.192895 |
Land surface temperature | 6.616301 | 0.151142 |
Evapotranspiration | 2.064074 | 0.484479 |
Precipitation | 1.025388 | 0.975240 |
Mutual information (MI): evaluation of conditioning factors
Mutual Information (MI) is a measure of the mutual dependence between two variables. It quantifies the amount of information obtained about one random variable through observing another random variable. In the context of groundwater modeling, MI is used to evaluate the importance of conditioning factors by measuring the dependency between each factor and the target variable, groundwater storage.
In this study, MI is used to evaluate the significance of each conditioning factor for predicting groundwater storage. Factors with higher MI values have greater predictive power, as they share more information with the target variable (Shannon 1948; Cover & Thomas 2006; Hajirahimi et al. 2019). Table 2 shows the calculated MI values for the conditioning factors.
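A minimal sketch of such an MI ranking is given below, using scikit-learn's `mutual_info_regression`. The choice of this estimator, the synthetic data, and the column names are assumptions; the text does not specify how MI was computed.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Hypothetical data: the target depends on "soil_moisture" but not on "noise_feature".
rng = np.random.default_rng(1)
n = 2000
soil_moisture = rng.uniform(0, 1, n)
noise_feature = rng.uniform(0, 1, n)
X = pd.DataFrame({"soil_moisture": soil_moisture, "noise_feature": noise_feature})
gws = np.sin(3 * soil_moisture) + 0.05 * rng.normal(size=n)

# Estimate MI between each factor and the target, then rank from high to low.
mi = pd.Series(
    mutual_info_regression(X, gws, random_state=0), index=X.columns
).sort_values(ascending=False)
```

The informative factor ranks first, and the unrelated factor receives an MI estimate near zero, which is the pattern used to identify factors with minimal impact.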
Application of Prophet in groundwater forecasting
Prophet is particularly advantageous for modeling groundwater levels, as these levels are often influenced by seasonal variations resulting from climate fluctuations and human agricultural activities. One of the key strengths of Prophet is its ability to decompose time-series data into distinct components: the overall trend, seasonal patterns, and any holiday effects. This decomposition is critical for accurate forecasting, especially in regions like Rabat–Salé–Kénitra, where groundwater levels are subject to annual cycles influenced by natural phenomena such as seasonal rainfall. The seasonal component of Prophet captures recurring patterns in the data, allowing the model to account for predictable fluctuations in groundwater levels that occur at specific intervals throughout the year. This is particularly relevant in agricultural regions where groundwater extraction tends to increase during planting and harvesting seasons. By modeling these seasonal effects, Prophet can provide forecasts that not only predict future groundwater levels but also identify potential periods of stress on water resources.
In our analysis, we integrated Prophet alongside other machine learning models to enhance our forecasting framework. This combination allows us to leverage the strengths of Prophet’s time series decomposition with the predictive capabilities of machine learning algorithms. The result is a more robust and reliable forecasting system that can adapt to both short-term variability and long-term trends in groundwater data. By utilizing Prophet, we are better equipped to inform water resource management strategies, ensuring that decision-makers have access to timely and accurate forecasts that can aid in the sustainable management of groundwater resources in the region (Taylor & Letham 2018; Seeger et al. 2017).
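The basic Prophet workflow can be sketched as below. This is an assumed configuration, not the one used in the study: the helper names `to_prophet_frame` and `forecast_gws`, the seasonality flags, the forecast horizon, and the synthetic series are all illustrative. Prophet is imported inside the function so that the data preparation step can be run without the `prophet` package installed.

```python
import numpy as np
import pandas as pd


def to_prophet_frame(series: pd.Series) -> pd.DataFrame:
    """Prophet expects two columns: 'ds' (timestamps) and 'y' (observed values)."""
    return pd.DataFrame({"ds": pd.to_datetime(series.index), "y": series.to_numpy()})


def forecast_gws(series: pd.Series, horizon_days: int = 365) -> pd.DataFrame:
    """Fit Prophet with yearly seasonality and forecast `horizon_days` ahead."""
    from prophet import Prophet  # requires the `prophet` package

    model = Prophet(
        yearly_seasonality=True,   # annual cycle, e.g. seasonal rainfall
        weekly_seasonality=False,  # assumed irrelevant for groundwater storage
        daily_seasonality=False,
    )
    model.fit(to_prophet_frame(series))
    future = model.make_future_dataframe(periods=horizon_days, freq="D")
    forecast = model.predict(future)
    # 'trend' and 'yearly' are the decomposed components; 'yhat' is the forecast.
    return forecast[["ds", "trend", "yearly", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical daily series with a slow decline and an annual cycle.
dates = pd.date_range("2010-01-01", "2012-12-31", freq="D")
t = np.arange(len(dates))
gws_series = pd.Series(500 - 0.02 * t + 2 * np.sin(2 * np.pi * t / 365.25), index=dates)
frame = to_prophet_frame(gws_series)
```

Calling `forecast_gws(gws_series)` would return the trend, the yearly component, and the forecast with its uncertainty interval for the historical period plus the following year.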
MACHINE LEARNING MODELS AND EVALUATION METRICS
In this study, we utilized several machine learning algorithms, specifically ANN, GB, SVR, decision trees (DTs), and RF. Each of these models was chosen based on their distinct strengths in handling complex data relationships and their proven effectiveness in similar applications.
Artificial Neural Networks (ANN): ANN was selected for its ability to model nonlinear relationships and interactions among multiple input variables, making it suitable for capturing the complex dynamics of groundwater levels.
Gradient Boosting (GB): GB was chosen for its robustness against overfitting while maintaining high predictive power.
Support Vector Regression (SVR): SVR was included due to its effectiveness in high-dimensional spaces and its capacity to manage nonlinear relationships using kernel functions. This makes it particularly advantageous for groundwater prediction tasks.
Decision Trees (DTs): DTs provide interpretability and straightforward modeling of data, making it easier to visualize the decision-making process. Their simplicity and effectiveness in capturing nonlinear interactions justified their inclusion.
Random Forest: RF, an ensemble method based on DTs, was selected for its ability to enhance predictive accuracy and reduce overfitting by aggregating results from multiple trees. It is particularly effective in dealing with high-dimensional data.
To determine the optimal hyperparameters for each model, we employed a grid search approach (Table 3). This method systematically explores a range of hyperparameter values to identify the combination that yields the best model performance based on evaluation metrics such as R-squared (R2), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). This rigorous process ensures that the selected hyperparameters contribute to the overall robustness and accuracy of the models.
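A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`, shown here for RF only. The candidate values in `param_grid`, the three-fold cross-validation, the RMSE scoring choice, and the synthetic data are assumptions for illustration; the actual search ranges are not listed in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical data: 300 samples, 5 conditioning factors, nonlinear target.
rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 5))
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# Assumed candidate values; every combination is evaluated by cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 20]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
```

The same pattern applies to the other models by swapping the estimator and its parameter grid.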
Support vector machine
Random forest
Gradient boosting
GB is an effective ensemble learning technique employed for both classification and regression tasks. It constructs models sequentially, with each new model aiming to correct the errors of the previous one. This approach leverages the strengths of multiple weak learners, usually DTs, to develop a robust predictive model (Friedman 2001).
One of the key advantages of GB is its flexibility in handling various types of data and its ability to model complex relationships. Additionally, GB includes mechanisms to reduce overfitting, such as regularization techniques and early stopping.
Artificial neural networks
ANNs are computational models that mimic the structure of biological neural networks, and they are designed to capture complex and nonlinear relationships between inputs and outputs (Le et al. 2021). ANNs are composed of layers of interconnected neurons, which enable them to learn and represent a wide array of functions, provided they have sufficient depth and data. In applications such as predicting maximum scour depth, ANNs demonstrate adaptive learning abilities that are not reliant on predefined functional forms, making them well-suited for modeling the intricate nonlinearities found in hydraulic processes.
This architecture and its inherent flexibility make ANNs particularly valuable in hydroinformatics and other fields where understanding and predicting complex interactions are essential.
EVALUATION METRICS
The performance of the models was evaluated using several metrics, including R-squared (R2), MAE, MSE, and RMSE. These metrics have been widely adopted in previous studies of groundwater management, such as Abd-Elmaboud et al. (2024) and Saqr et al. (2023), highlighting their relevance and effectiveness in assessing model accuracy.
Mean squared error
Root mean squared error
Mean absolute error
Coefficient of determination (R2)
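The four metrics named above can be computed from their standard definitions as in the following sketch. The function name `regression_metrics` and the toy values are illustrative assumptions; the definitions used are the conventional ones, with MSE as the mean of squared errors, RMSE as its square root, MAE as the mean of absolute errors, and R2 as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MSE, RMSE, MAE, and R2 from observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                          # mean of squared errors
    rmse = np.sqrt(mse)                              # same units as the target
    mae = np.mean(np.abs(err))                       # mean of absolute errors
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Toy example: errors are 1, 0, -1, 0.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
# metrics["MSE"] is 0.5, metrics["MAE"] is 0.5, and metrics["R2"] is 0.9
```

Because RMSE is the square root of MSE, the RF values reported in the abstract are mutually consistent: the square root of 484.800 is approximately 22.018.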
INFERENTIAL STATISTICS
Friedman test
Nonparametric statistical methods, such as the Friedman test (Friedman 1937), can be applied without the need to meet statistical assumptions (Derrac et al. 2011), and they do not require the data to follow a normal distribution. The primary goal of the Friedman test is to determine if there are significant differences in the performance of various models.
Essentially, it conducts multiple comparisons to identify notable differences between the behaviors of two or more models (Beasley & Zumbo 2003). The null hypothesis (H0) posits that there are no differences among the performances of the groundwater potential models. A higher p-value indicates a lower likelihood of rejecting the null hypothesis. If the p-value is less than the significance level (α), the null hypothesis will be rejected.
Wilcoxon signed-rank test
While the Friedman test identifies if there are differences between models, it does not provide pairwise comparisons among them. Therefore, the Wilcoxon signed-rank test, another nonparametric statistical method, is used for this purpose. To evaluate the significance of differences between the performances of groundwater potential models, both the p-value and z-value are used (Wilcoxon 1945).
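The two-stage procedure (Friedman test first, then pairwise Wilcoxon signed-rank tests) can be sketched with SciPy as below. The per-block error values are synthetic, the three model labels are used only as dictionary keys, and the significance level of 0.05 is an assumption, since the value used in the study is not given in the text. The sketch reports p-values only and does not compute the z-value mentioned above.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical absolute errors of three models on the same 30 test blocks.
rng = np.random.default_rng(7)
n_blocks = 30
base = rng.uniform(10, 30, n_blocks)
errors = {
    "RF": base + rng.normal(0, 0.5, n_blocks),
    "GB": base + 2.0 + rng.normal(0, 0.5, n_blocks),
    "SVR": base + 4.0 + rng.normal(0, 0.5, n_blocks),
}

alpha = 0.05  # assumed significance level

# Stage 1: is there any difference among the models?
stat, p_friedman = friedmanchisquare(*errors.values())

# Stage 2: if so, compare each pair of models on the paired errors.
pairwise = {}
if p_friedman < alpha:
    names = list(errors)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            w_stat, p_value = wilcoxon(errors[names[i]], errors[names[j]])
            pairwise[(names[i], names[j])] = p_value
```

When many pairs are compared, a multiple-comparison correction of the pairwise p-values may be appropriate; the text does not state whether one was applied.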
RESULTS
Information gain analysis
To identify the most influential factors affecting groundwater storage, we calculated the information gain for various environmental variables (Table 2). This analysis highlights the importance of each variable in predicting groundwater storage.
Feature . | Information gain . |
---|---|
Soil moisture (SoilMoist_S_tavg) | 0.949030 |
Evapotranspiration (evap_Tr) | 0.576318 |
Average surface temperature (AvgSurfT_tavg) | 0.309102 |
Land surface temperature (LST) | 0.222442 |
Precipitation | 0.006421 |
Hyperparameter . | Random forest . | ANN . | SVR . | Decision tree . | Gradient boosting . |
---|---|---|---|---|---|
Initial parameters | |||||
Parameter 1 | n_estimators=100 | hidden_layer_sizes=(100,100) | kernel='rbf' | max_depth=20 | n_estimators=100 |
Parameter 2 | random_state=42 | max_iter=500 | C=100 | min_samples_leaf=1 | random_state=42 |
Parameter 3 | | random_state=42 | gamma=0.1 | min_samples_split=5 | |
Parameter 4 | | | epsilon=0.1 | random_state=42 | |
The information gain analysis reveals that soil moisture (0.949) and evapotranspiration (0.576) are the most significant predictors of groundwater storage, whereas precipitation contributes very little (0.006). The two leading variables will be prioritized in the subsequent predictive models.
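Information gain measures how much knowing a feature reduces the uncertainty (entropy) of the target. The study does not state which estimator was used, so the following is only a minimal sketch under an assumed approach: both the feature and the target are discretized into equal-width bins, and the gain is the target entropy minus the conditional entropy given the feature. The bin count and sample data are hypothetical, and the values it produces are not expected to match Table 2.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def bin_values(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature, target, n_bins=4):
    """H(target) - H(target | feature), both variables binned (assumed scheme)."""
    x = bin_values(feature, n_bins)
    y = bin_values(target, n_bins)
    n = len(y)
    conditional = 0.0
    for b in set(x):
        subset = [yi for xi, yi in zip(x, y) if xi == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(y) - conditional


# Hypothetical data: one informative feature and one constant feature.
soil_moisture = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
storage = [2.0 * v for v in soil_moisture]
constant_feature = [1.0] * 8

gain_informative = information_gain(soil_moisture, storage)
gain_constant = information_gain(constant_feature, storage)
```

A feature that fully determines the binned target recovers the whole target entropy, while a constant feature yields a gain of zero.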
Trend analysis
As seen in Figure 3, each location exhibits a notable downward trend in groundwater storage over the study period. The calculated slopes of the trend lines (-0.02 for all locations) highlight the consistent decline across the regions. This trend analysis sets the foundation for the subsequent predictive modeling of groundwater levels.
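The text does not say how the trend lines were fitted. Assuming an ordinary least-squares line through the time series, the slope can be sketched as below; the function name and the synthetic series are illustrative assumptions.

```python
def trend_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    return covariance / variance


# Hypothetical series that declines by 0.02 units per time step.
months = list(range(24))
storage = [50.0 - 0.02 * m for m in months]
slope = trend_slope(months, storage)
```

A negative slope indicates declining storage; its magnitude depends on the time unit used for the horizontal axis.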
MODEL HYPERPARAMETERS
Table 3 presents the initial hyperparameters for each model and the best parameters obtained from the grid search:
In this table, the hyperparameters are listed as rows under the respective model columns; the best parameters are those obtained from the grid search.
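Grid search evaluates every combination of candidate hyperparameter values and keeps the one with the best validation score. The sketch below shows the idea in plain Python. Only the parameter names `max_depth` and `min_samples_split` come from Table 3; the candidate values and the scoring function are hypothetical stand-ins, since the search grid used in the study is not given.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score).

    Lower scores are treated as better (for example, a validation error).
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for a decision tree.
dt_grid = {"max_depth": [10, 20, 30], "min_samples_split": [2, 5, 10]}


def toy_validation_error(params):
    # Stand-in for a cross-validated error; a real run would fit and
    # score a model with these parameters here.
    return abs(params["max_depth"] - 20) + abs(params["min_samples_split"] - 5)


best_params, best_score = grid_search(dt_grid, toy_validation_error)
```

The cost grows with the product of the candidate list lengths, which is why grids are usually kept small.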
Detailed performance evaluation of models
We assessed the performance of the five machine learning models (ANN, SVR, RF, GB, DT) using the evaluation metrics R2, RMSE, MSE, and MAE, employing cross-validation techniques. Table 4 offers a detailed summary of the models’ performance during both the training and validation phases.
Model . | MSE . | RMSE . | MAE . | R2 . |
---|---|---|---|---|
Random forest | 484.800 | 22.018 | 14.986 | 0.981 |
Artificial neural network | 535.565 | 23.142 | 16.949 | 0.979 |
Support vector regression | 806.711 | 28.402 | 20.373 | 0.969 |
Decision tree | 824.247 | 28.709 | 18.350 | 0.968 |
Gradient boosting | 561.091 | 23.687 | 17.462 | 0.978 |
Artificial neural network
The ANN model demonstrates strong predictive performance, as evidenced by its evaluation metrics and the prediction plot. The performance metrics for the ANN model are summarized as follows:
MSE is 535.565, indicating the average squared difference between the observed outcomes and the outcomes predicted by the model. RMSE is 23.142, which is the square root of MSE and provides a measure of the average magnitude of the prediction errors. MAE is 16.949, showing the average absolute difference between the predicted and actual values. The R2 value is 0.979, indicating that 97.9% of the variance in the actual data is explained by the model.
Gradient boosting
The GB model showcases excellent predictive performance, as reflected in its evaluation metrics and the corresponding prediction plot. The metrics for the GB model are as follows: MSE is 561.091, representing the average squared difference between the observed actual outcomes and the outcomes predicted by the model. RMSE is 23.687, which is the square root of MSE and provides an understanding of the magnitude of the prediction errors. MAE is 17.462, indicating the average absolute difference between the predicted and actual values. The R2 value is 0.978, signifying that 97.8% of the variance in the actual data is explained by the model.
Random forest
The RF model demonstrates strong predictive capabilities, as indicated by its evaluation metrics and the corresponding prediction plot. The metrics for the RF model are: MSE is 484.800, representing the average squared difference between the actual and predicted outcomes. RMSE is 22.018, which is the square root of MSE and provides an understanding of the magnitude of the prediction errors. MAE is 14.986, indicating the average absolute difference between the predicted and actual values. The R2 value is 0.981, signifying that 98.1% of the variance in the actual data is explained by the model.
Support vector regression
The SVR model’s performance metrics and the corresponding prediction plot provide insights into its predictive capabilities. The SVR model has the following evaluation metrics: MSE is 806.711, representing the average squared difference between the actual and predicted values, which is relatively high compared to other models. RMSE is 28.402, indicating the typical magnitude of the prediction errors. MAE is 20.373, reflecting the average absolute difference between the predicted and actual values. The R2 value is 0.969, meaning that 96.9% of the variance in the actual data is explained by the model.
Decision tree
The DT model’s performance metrics and the corresponding prediction plot provide insights into its predictive capabilities. The DT model has the following evaluation metrics: MSE is 824.247, representing the average squared difference between the actual and predicted values, which is relatively high compared to other models. RMSE is 28.709, indicating the typical magnitude of the prediction errors. MAE is 18.350, reflecting the average absolute difference between the predicted and actual values. The R2 value is 0.968, meaning that 96.8% of the variance in the actual data is explained by the model.
The results from the five models show varying degrees of accuracy in predicting groundwater storage. The RF model achieved the lowest errors and the highest R2 (MSE of 484.800, RMSE of 22.018, MAE of 14.986, and R2 of 0.981), indicating that it captured the data patterns most effectively. The ANN model also exhibited a high level of accuracy, with an MSE of 535.565, RMSE of 23.142, MAE of 16.949, and R2 of 0.979; its prediction plot aligns closely with the 1:1 line, indicating a strong fit between actual and predicted values. The GB model performed comparably, with an MSE of 561.091, RMSE of 23.687, MAE of 17.462, and R2 of 0.978. The SVR model, while demonstrating a good fit with an R2 of 0.969, showed higher errors (MSE of 806.711, RMSE of 28.402, and the highest MAE among the models at 20.373) and more scatter in the prediction plot, particularly at higher measured values, indicating some challenges in capturing the data patterns accurately. The DT model showed the highest MSE and RMSE among the models (MSE of 824.247, RMSE of 28.709, MAE of 18.350, and R2 of 0.968), suggesting that it struggles the most in capturing the underlying data patterns.
Predictive model evaluation
The figure demonstrates the following key points:
Random forest: The model’s prediction lines closely follow the actual values, indicating strong predictive performance across all three locations.
Artificial neural network: This model also shows a strong correlation between actual and predicted values, though with some visible scatter.
Support vector regression: The SVR model exhibits higher scatter and deviation from the actual values, particularly at higher measured values, indicating some challenges in capturing the data patterns accurately.
Decision tree: The DT model shows more scatter and larger prediction errors compared to the RF and ANN models, particularly in higher value ranges.
Gradient boosting: The GB model demonstrates strong performance with predicted values closely following the actual values, similar to the RF and ANN models.
Statistical tests for model comparison
To statistically validate the differences in performance among the models, we conducted the Friedman test and the Wilcoxon signed-rank test.
Friedman test
The Friedman test, a nonparametric statistical test, was used to detect differences in the performance metrics (MSE, RMSE, MAE, and R2) across the models (Table 5). The null hypothesis (H0) posits that there are no differences among the performances of the models. A p-value less than the significance level (α) indicates significant differences among the models.
Metric . | Chi-square statistic . | p-value . |
---|---|---|
MSE | 15.0 | 0.001817 |
RMSE | 15.0 | 0.001817 |
MAE | 15.0 | 0.001817 |
R2 | 15.0 | 0.001817 |
The Friedman test results indicate significant differences in the performance metrics among the models (p = 0.001817 for every metric, below the significance level α).
Wilcoxon signed-rank test
Following the Friedman test, the Wilcoxon signed-rank test was performed for pairwise comparisons among the models. This test helps identify which specific models differ significantly in their performance metrics (Table 6).
Model comparison . | MSE . | RMSE . | MAE . | R2 . |
---|---|---|---|---|
RF vs. ANN | 0.250 | 0.250 | 0.250 | 0.250 |
RF vs. SVR | 0.250 | 0.250 | 0.250 | 0.250 |
RF vs. DT | 0.250 | 0.250 | 0.250 | 0.250 |
RF vs. GBM | 0.250 | 0.250 | 0.250 | 0.250 |
ANN vs. SVR | 0.250 | 0.250 | 0.250 | 0.250 |
ANN vs. DT | 0.250 | 0.250 | 0.250 | 0.250 |
ANN vs. GBM | 0.250 | 0.250 | 0.250 | 0.250 |
SVR vs. DT | 0.875 | 0.875 | 0.875 | 0.875 |
SVR vs. GBM | 0.250 | 0.250 | 0.250 | 0.250 |
DT vs. GBM | 0.250 | 0.250 | 0.250 | 0.250 |
The Wilcoxon signed-rank test results show that none of the pairwise comparisons among models is statistically significant (all p-values are 0.250 or higher, above the significance level α), indicating that the pairwise differences in performance metrics cannot be distinguished from random chance.
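The study does not describe how the paired samples for the Wilcoxon test were formed. One reading that reproduces the p-values in Table 6 is to treat the four metric values of each model in Table 4 (MSE, RMSE, MAE, R2) as four paired observations and apply the exact two-sided test. The sketch below follows that assumption; the function name is illustrative, and the code assumes no zero or tied differences.

```python
from itertools import product


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (W, p), where W is the smaller of the two signed-rank sums.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under H0 all 2**n sign patterns are equally likely: enumerate them
    # and count those whose positive-rank sum does not exceed W.
    count = 0
    for signs in product((0, 1), repeat=n):
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= w:
            count += 1
    p = min(1.0, 2.0 * count / 2 ** n)
    return w, p


# Rows of Table 4 in the order MSE, RMSE, MAE, R2.
table4 = {
    "RF": [484.800, 22.018, 14.986, 0.981],
    "ANN": [535.565, 23.142, 16.949, 0.979],
    "SVR": [806.711, 28.402, 20.373, 0.969],
    "DT": [824.247, 28.709, 18.350, 0.968],
    "GB": [561.091, 23.687, 17.462, 0.978],
}
w_rf_ann, p_rf_ann = wilcoxon_exact(table4["RF"], table4["ANN"])
w_svr_dt, p_svr_dt = wilcoxon_exact(table4["SVR"], table4["DT"])
```

With only four paired values there are 16 possible sign patterns, so the smallest attainable two-sided p-value is 2/16 = 0.125. Under this reading, no pairwise comparison could reach significance at conventional levels, whatever the size of the differences.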
The RF model has the lowest mean rank (2.00), indicating that it generally performed better than the other models across all metrics (Table 7). The ANN model follows with a mean rank of 2.50, and the GB model with 3.00. The SVR and DT models share the highest mean rank (3.75), indicating lower overall performance than the RF, ANN, and GB models.
No. . | Performed models . | Mean rank . |
---|---|---|
1 | Random forest | 2.00 |
2 | Artificial neural network | 2.50 |
3 | Support vector regression | 3.75 |
4 | Decision tree | 3.75 |
5 | Gradient boosting | 3.00 |
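The ranking procedure behind Table 7 is not spelled out. One procedure that reproduces the reported mean ranks from Table 4 is to rank the five models on each metric in ascending order of the raw value and then average the four ranks per model. The sketch below implements that assumed procedure; the names are illustrative.

```python
def ascending_ranks(values):
    """Rank values 1..k from smallest to largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


models = ["RF", "ANN", "SVR", "DT", "GB"]
# Columns of Table 4, one list per metric, in the model order above.
table4 = {
    "MSE": [484.800, 535.565, 806.711, 824.247, 561.091],
    "RMSE": [22.018, 23.142, 28.402, 28.709, 23.687],
    "MAE": [14.986, 16.949, 20.373, 18.350, 17.462],
    "R2": [0.981, 0.979, 0.969, 0.968, 0.978],
}
rank_rows = [ascending_ranks(column) for column in table4.values()]
mean_ranks = {
    model: sum(row[i] for row in rank_rows) / len(rank_rows)
    for i, model in enumerate(models)
}
```

Under this convention R2 is also ranked in ascending order, so the model with the highest R2 receives the largest rank number on that metric. If rank 1 were meant to denote the best model on every metric, R2 would need to be ranked in descending order instead.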
Sensitivity analysis explanation and model evaluation
The analysis further revealed that the models responded differently to changes in evapotranspiration and precipitation. For example, the SVR model exhibited a significant decrease in predicted groundwater storage with increasing evapotranspiration, while the RF and GB models showed a more stable response. Conversely, the predictions for precipitation indicated that the ANN model had a more sensitive response, showing large fluctuations, whereas the DT and GB models maintained a steadier prediction. Overall, these sensitivity analyses underscore the importance of considering model-specific behaviors and responses to input variations when predicting groundwater storage, aiding in selecting the most robust model for specific scenarios.
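A common way to carry out this kind of analysis is a one-at-a-time sweep: one input is varied over a range while the others are held at baseline values, and the change in the prediction is recorded. Whether the study used exactly this design is an assumption. In the sketch below, `toy_model`, its coefficients, and the input values are hypothetical stand-ins for a fitted model.

```python
def one_at_a_time_sensitivity(predict, baseline, feature, values):
    """Vary one input over `values`, hold the others at `baseline`."""
    responses = []
    for v in values:
        inputs = dict(baseline)  # copy so the baseline is left unchanged
        inputs[feature] = v
        responses.append(predict(inputs))
    return responses


def toy_model(inputs):
    # Hypothetical stand-in for a fitted model; coefficients are illustrative.
    return 200.0 - 1.5 * inputs["evapotranspiration"] + 0.5 * inputs["precipitation"]


baseline = {"evapotranspiration": 40.0, "precipitation": 30.0}
et_values = [20.0, 40.0, 60.0]
et_response = one_at_a_time_sensitivity(
    toy_model, baseline, "evapotranspiration", et_values
)
spread = max(et_response) - min(et_response)
```

Comparing the spread of responses across models gives a simple measure of which model reacts most strongly to a given input.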
DISCUSSION
Comparative performance analysis of machine learning models
The comparative analysis of machine learning models in predicting groundwater storage revealed distinct performance differences among the employed algorithms, specifically highlighting the RF and GB models as superior performers. Several key factors contributed to their robust predictive capabilities.
Ensemble learning approach: Both RF and GB are ensemble methods that combine the predictions of multiple DTs. This approach enhances their generalization ability by averaging predictions (RF) or sequentially improving model accuracy (GB). The aggregation of diverse models reduces overfitting, leading to more accurate predictions on unseen data.
Handling nonlinearity and interactions: The inherent design of decision tree-based algorithms allows them to capture complex nonlinear relationships and interactions among input variables effectively. This is particularly advantageous in groundwater modeling, where relationships between environmental factors and groundwater levels are often intricate and nonlinear.
Feature importance and selection: Both models provide insights into feature importance, allowing for the identification and prioritization of critical factors influencing groundwater storage. This focus on significant predictors, such as soil moisture and evapotranspiration, likely improved model accuracy, as highlighted in the information gain analysis.
Hyperparameter optimization: The performance of both RF and GB models benefited significantly from rigorous hyperparameter tuning. Optimizing parameters such as the number of trees, learning rates, and maximum depth ensured that these models were configured to achieve the best possible accuracy, contributing to their superior performance metrics (e.g., lower RMSE and higher R2).
Robustness to noise: The RF model, in particular, is known for its robustness against noise in the dataset. By averaging predictions across numerous trees, it mitigates the impact of outliers and erroneous data points, which may adversely affect other models like SVR and DTs.
Comparative metrics: The performance metrics indicate that the RF model achieved an MSE of 484.800 and an R2 of 0.981, while the GB model attained an MSE of 561.091 with an R2 of 0.978. These results suggest that both models not only accurately captured the trends in groundwater levels but also maintained reliability across different validation datasets.
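The sequential improvement attributed to GB above can be illustrated with a minimal boosting loop under squared error, in which each round fits a depth-one tree (a stump) to the current residuals and adds a damped copy of it to the prediction. This is a generic sketch on hypothetical one-dimensional data, not the study's implementation or settings.

```python
def fit_stump(x, residuals):
    """Best single-split regressor on 1D inputs under squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds, learning_rate=0.5):
    """Return the training MSE after each boosting round."""
    prediction = [sum(y) / len(y)] * len(y)  # start from the mean
    history = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        # Move the prediction a fraction of the way toward the residual fit.
        prediction = [pi + learning_rate * stump(xi) for xi, pi in zip(x, prediction)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, prediction)) / len(y))
    return history


# Hypothetical nonlinear target.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
mse_history = boost(x, y, n_rounds=20)
```

Each round reduces the training error, which is also why boosting relies on settings such as the learning rate and the number of rounds to limit overfitting.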
Insights from sensitivity analysis
The sensitivity analysis conducted in this study revealed critical insights into how variations in key environmental factors, such as evapotranspiration and precipitation, affect groundwater storage predictions across different machine learning models. The analysis highlighted that models like RF and GB exhibited a relatively stable response to changes in average surface temperature, whereas ANN and SVR showed more pronounced fluctuations in predictions with varying evapotranspiration rates. Specifically, the SVR model demonstrated a significant decrease in predicted groundwater storage as evapotranspiration increased, indicating a sensitivity to this factor that could affect reliability in specific scenarios.
In terms of precipitation, the ANN model exhibited a notably sensitive response, with larger fluctuations in predicted groundwater levels compared to RF and GB models, which maintained steadier predictions. This sensitivity suggests that while RF and GB are more robust in varying conditions, ANN may require further refinement to enhance its predictive stability.
These findings have important implications for groundwater management practices. The demonstrated sensitivity of groundwater storage to environmental factors underscores the need for adaptive management strategies that consider the variability and interaction of these factors. Specifically, understanding how changes in evapotranspiration and precipitation influence groundwater levels can inform the timing and scale of water extraction, irrigation practices, and conservation efforts. By integrating these insights into management policies, stakeholders can better anticipate groundwater fluctuations, leading to more sustainable resource use and enhanced resilience against climate variability and extreme weather events.
Implications for groundwater management in Morocco
Morocco is facing significant groundwater challenges, exacerbated by prolonged periods of drought over the past 6 years. The findings from this study have important implications for policymakers and groundwater management strategies.
Factors influencing groundwater decline
The study identified key environmental factors influencing groundwater storage, including average surface temperature, soil moisture, evapotranspiration, precipitation, and land surface temperature. These factors directly impact groundwater recharge and depletion rates. For instance, higher surface temperatures and increased evapotranspiration rates can accelerate groundwater depletion, while effective soil moisture and adequate precipitation are critical for groundwater recharge. Sensitivity analysis revealed that the models responded differently to changes in these factors. For example, the SVR model exhibited a significant decrease in predicted groundwater storage with increasing evapotranspiration, while the RF and GB models showed a more stable response.
Scenario analysis and policy recommendations
To better understand the resilience of groundwater resources in the Rabat–Salé–Kénitra region, it is crucial to explore the impact of various scenarios, including varying precipitation levels and population growth rates, on groundwater storage predictions. Scenario analysis can illuminate potential risks and vulnerabilities associated with changing environmental and socio-economic conditions. For instance, modeling scenarios with reduced precipitation can help assess the implications of prolonged droughts on groundwater levels, while simulations incorporating increased population growth can indicate a heightened demand for water resources. By evaluating these different scenarios, stakeholders can identify critical thresholds that, if exceeded, may compromise groundwater sustainability.
In light of the findings from this study, we recommend the following specific and actionable strategies for policymakers to enhance groundwater management:
Groundwater extraction limits: Establish and enforce extraction limits tailored to the specific aquifer capacities and recharge rates. These limits should be based on scientific assessments of sustainable yield to prevent over-extraction, especially in regions experiencing significant declines in groundwater levels.
Water-saving technologies: Promote the adoption of water-saving technologies in agriculture, such as drip irrigation and soil moisture sensors. Financial incentives or subsidies can encourage farmers to implement these practices, ultimately reducing groundwater demand and enhancing efficiency.
Recharge strategies: Develop and implement artificial recharge strategies, such as constructing check dams, percolation tanks, and recharge wells. These initiatives can enhance groundwater levels, particularly during wet seasons, ensuring a more reliable supply during dry periods.
Public awareness campaigns: Launch public awareness campaigns to educate local communities about the importance of sustainable groundwater management practices. Engaging residents in conservation efforts can foster a collective commitment to protecting this vital resource.
Limitations of the current modeling approach and future research directions
Despite the promising results obtained from the machine learning models utilized in this study, several limitations must be acknowledged that could affect the robustness and applicability of the findings in real-world groundwater management scenarios.
One primary limitation is the reliance on environmental data, which, while critical, does not encompass the full range of factors influencing groundwater dynamics. The models primarily focused on variables such as soil moisture, evapotranspiration, and precipitation, potentially overlooking significant socio-economic factors such as population growth, land use changes, agricultural practices, and industrial demands. These socio-economic elements can profoundly impact groundwater availability and usage, as they dictate water demand patterns and the implementation of water management practices.
Additionally, the models were developed using datasets that, although comprehensive, may not fully capture the spatial and temporal variability of groundwater systems across different regions. The lack of high-resolution, long-term datasets can lead to a limited understanding of groundwater dynamics and hinder the models’ ability to generalize across varying contexts.
Integration of socio-economic factors: Future models should incorporate socio-economic data alongside environmental variables. This integration could enhance the predictive capability of models by accounting for human activities and their impacts on groundwater resources. Collaborative efforts with social scientists and policymakers can facilitate the collection and integration of relevant socio-economic data into groundwater models.
Comprehensive datasets: There is a need for more extensive and high-resolution datasets that capture the complexities of groundwater systems. Future research should aim to compile datasets from various sources, including satellite observations, ground measurements, and socio-economic surveys, to develop a more holistic understanding of groundwater dynamics.
Longitudinal studies: Conducting longitudinal studies will allow researchers to assess the long-term impacts of climate change and human activities on groundwater levels. These studies can provide valuable insights into trends and inform adaptive management strategies to mitigate groundwater depletion.
Stakeholder engagement: Engaging local stakeholders and communities in the research process can help identify relevant socio-economic factors and improve the model’s applicability in real-world scenarios. Participatory approaches can also foster collaborative management strategies that consider both environmental sustainability and socio-economic needs.
Novelty of the approach
This study presents a novel approach to predicting groundwater storage variations by integrating advanced machine learning models, namely RF and GB, with a comprehensive analysis of environmental factors. Unlike traditional groundwater modeling techniques that often rely solely on hydrological data, our approach incorporates critical environmental variables such as soil moisture, evapotranspiration, and precipitation, thereby providing a more holistic understanding of groundwater dynamics. Additionally, the sensitivity analysis reveals the models’ responsiveness to variations in these factors, underscoring the intricate relationships that govern groundwater behavior. Furthermore, this research lays the groundwork for future studies to include socio-economic factors, such as population growth and agricultural practices, which are pivotal in shaping groundwater management strategies. By addressing these critical elements, our findings not only advance the methodological framework for groundwater prediction but also offer actionable insights for policymakers in Morocco and similar regions facing water scarcity challenges.
CONCLUSION
This study highlights the critical issue of groundwater decline in the Rabat–Salé–Kénitra region of Morocco, exacerbated by prolonged droughts and increasing demands from agriculture and population growth. By leveraging advanced machine learning models such as ANN, GB, SVR, DT, and RF, we were able to effectively predict variations in groundwater storage. Among these models, RF and GB consistently demonstrated superior predictive performance, making them valuable tools for groundwater management.
The integration of Prophet, known for its ability to handle seasonality in time series data, further enhanced the reliability of our predictions. Sensitivity analysis provided crucial insights into how different environmental factors, such as evapotranspiration and precipitation, impact groundwater levels. This analysis underscored the importance of model-specific responses to input variations, aiding in the selection of the most robust models for specific scenarios.
Our findings emphasize the urgent need for advanced predictive models in groundwater management strategies to mitigate the adverse effects of groundwater decline. Policymakers can utilize these models to implement sustainable water management practices, regulate groundwater extraction, promote water-saving technologies, and enhance groundwater recharge. Addressing these issues is vital to ensuring the sustainability of Morocco’s groundwater resources in the face of increasing drought conditions and climate change impacts.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.
FUNDING
We declare there is no financial support for this research.