Groundwater availability is a key concern in most semi-arid regions of Ethiopia. The purpose of this study was to map the groundwater potential zones (GWPZ) of the alluvial plain of Gambela. The study applied the analytic hierarchy process (AHP) and four machine learning algorithms: random forest classifier (RFC), gradient boosting classifier (GBC), decision tree classifier (DTC), and K-neighbor classifier (KNC). The predictor features were geology, geomorphology, slope, soil, lineament density, drainage density, land use and land cover (LULC), normalized difference vegetation index (NDVI), topographic wetness index (TWI), topographic roughness index (TRI), and rainfall. The final groundwater potential maps were classified into very low, low, moderate, high, and very high potential zones. Validation with the receiver operating characteristic (ROC) curve gave area under the curve (AUC) values of 78.2, 93.4, 92.5, 72.4, and 87.7% for AHP, RFC, GBC, DTC, and KNC, respectively. The results show that RFC and GBC are the best GWPZ map estimators. The study also shows that rainfall and geomorphology are the primary factors influencing the GWPZ. The outcome may support improved groundwater management in other parts of the country with a comparable climate.

  • The study applies four MLAs and AHP to groundwater potential zone mapping.

  • Eleven groundwater-influencing criteria are used to compare models and criteria for the first time.

  • The study area is remote and has received little attention from researchers.

  • The study can serve as a benchmark for researchers in the area.

  • The study applies as many of the criteria expected to influence groundwater as possible.

‘Water is a great mover and doer, constantly modifying the landscape’ (Fieth 1973). Water resources have concerned scholars since the earliest days of research. Groundwater is the chief source of potable water in Africa (MacDonald et al. 2012; Godfrey et al. 2019). In the semi-arid and dry parts of Africa, the subsurface hosts large sedimentary basins of sandstone and limestone, alongside basement rocks and mudstones of low transmissivity. Some of the world's largest freshwater reserves are found in these basins (Medici et al. 2023). In semi-arid regions where surface water is scarce, groundwater becomes the primary source of water (Magesh et al. 2012; Ostad-Ali-Askari et al. 2017; Manna et al. 2022).

Ethiopia has a vast surface water capacity in eight major river basins; 90% of this water flows throughout the year and crosses international borders. Abundant surface water can, in turn, increase subsurface storage (Ostad-Ali-Askari et al. 2017; Morbidelli et al. 2018). However, the groundwater potential of Ethiopia is not well known (Worqlul et al. 2017). The country's freshwater supply relies on groundwater drawn from springs, shallow hand-dug wells, and deep wells. This groundwater is used by many sectors, from private citizens to government organizations and from urban supply to industry. Because little research has been done, much remains to be learned about the subsurface hydrology of Ethiopia. In addition, the country faces a severe water problem driven by climate change and unsustainable economic growth, which places increasing strain on already precarious groundwater resources (Kebede 2013; Seifu et al. 2022).

Groundwater is a concealed resource, and investigating it can be a daunting undertaking. Numerous techniques for groundwater exploration are used worldwide (Jothimani et al. 2021). Among them, test drilling (Regenspurg et al. 2018) and stratigraphic analysis (Campo et al. 2020) are direct methods for investigating aquifer parameters. However, these methods are costly and time-consuming and require specialized technology and skilled personnel (Mukherjee et al. 2012). Various other methods and strategies have been used in different parts of the world to map groundwater potential zones (Razandi et al. 2015; Golkarian et al. 2018; Azma et al. 2021). Machine learning algorithms (MLAs) are a relatively new technology showing promising results (Gómez-Escalonilla et al. 2022). These models are entirely computational and are suited to problems involving complex datasets. Their main benefit is that they work directly with raw data, which greatly reduces expert bias (Gómez-Escalonilla et al. 2021). MLAs such as the random forest classifier (RFC), gradient boosting classifier (GBC), decision tree classifier (DTC), and K-neighbor classifier (KNC) have been used in several studies to map groundwater potential zones (Naghibi et al. 2017; Avand et al. 2019; Nguyen et al. 2020; Al-Abadi et al. 2021). These studies produced reliable maps with a high degree of accuracy.

The study region has received little attention from researchers because of its remoteness. It is characterized by a semi-arid climate, limited water resources, and erratic weather. Because of this remoteness and security concerns during data collection, the methods described above have not previously been applied in the study area or tested in a comparable oasis region. This study therefore applies MLAs and the analytic hierarchy process (AHP), combined with a geographical information system (GIS) and remote sensing, to predict groundwater potential. Four MLAs (RFC, GBC, DTC, and KNC) are used to predict potential groundwater sites in the region. The region is covered by Quaternary alluvial-lacustrine sediments, an important geologic unit that extends across the plain (Kebede 2013). Numerous variables, such as temperature, agricultural practices, and land use and land cover, can affect groundwater patterns in alluvial zones (Jannis et al. 2021).

The main novelty of the present study is the use of four MLA models and AHP to compare the effectiveness of each model in predicting groundwater potential sites. In addition, this study uses 11 influencing variables for groundwater potential zone mapping to assess models and criteria for the first time. The objectives of the current investigation are to (i) establish the significance of the geological, morphological, and hydroclimatic aspects in assessing groundwater, (ii) recognize the factors that affect groundwater and appraise their impact, (iii) evaluate the AHP technique for GWPZ study, and (iv) assess the efficiency of MLAs in identifying GWPZ.

Description of the study area

The Gambela Plain is situated in the western part of Ethiopia's Baro-Akobo Basin, one of the country's major river basins (Figure 1). The study region covers an area of 18,222 km², comprising the alluvial plain in the western part of the basin. It lies between longitudes 33°00′ and 34°48′ E and latitudes 7°18′ and 8°35′ N. Elevation decreases from 965 m a.m.s.l. in the east to 388 m a.m.s.l. in the west, along the border with South Sudan. The eastern catchments are moderately sloping, whereas the western catchments are flat to very gently undulating plains. Most of the area is covered by Pleistocene–Holocene continental sediments (alluvial and lacustrine), which are underlain by basalt and by high-grade, undivided basement rock. Four hydrostratigraphic units have been reported in the study area: basement complex aquifers, Holocene alluvial aquifers, the Quaternary alluvial-lacustrine aquifer, and the regional sandstone aquifer of the Pliocene Alwero formation (Kebede 2013; Alemayehu et al. 2017). The eastern hilly regions are underlain by shallow and very shallow basement aquifers with high to relatively low groundwater potential (Kebede 2013). The region's climate is arid to semi-arid (Berhanu et al. 2013). Forest and woodland resources are the primary sources of daily sustenance in the region, and fisheries are also an important food source on the Gambela Plain. The region is accessed via the Addis Ababa-Ambo-Nekemte-Gambela main road.
Figure 1

Geographical location of the study area.


Preparation of thematic layers

The predictor features were geology, geomorphology, slope, soil, lineament density, drainage density, land use and land cover (LULC), normalized difference vegetation index (NDVI), topographic wetness index (TWI), topographic roughness index (TRI), and rainfall. The geological map was acquired from the Ethiopian Geological Survey Institute (GSI). Groundwater data, soil data (in shapefile format), and LULC data were collected from the Ministry of Water (MoW), and the National Meteorological Agency (NMA) provided the meteorological data. Satellite data, such as the SRTM DEM and Sentinel-2 images, were downloaded from the USGS Earth Explorer website (https://earthexplorer.usgs.gov). The satellite imagery was acquired between January and September 2021 with a cloud-cover threshold of 20%. The LULC of the area was mapped by interactive supervised classification. As a common preprocessing step, all available data were converted and georeferenced to the Universal Transverse Mercator (UTM) zone 37 projection. Each thematic map was divided into 5–10 subclasses according to the specific character of each thematic layer (Supplementary Table A2). RockWorks 16, ERDAS Imagine, Surfer, and ArcGIS were used for data processing.

Geology

The Baro-Akobo Basin in general, and the Gambela Plain in particular, consist of crystalline bedrock in the eastern highlands and porphyritic layers of Tertiary lava and Quaternary sediments in the western lowlands. According to the general geological description of Ethiopia (Mengesha et al. 1996), most of the lowlands of the Baro-Akobo River Basin are covered by undifferentiated Quaternary deposits, mostly recent alluvial, fluvial, and lacustrine sediments. The geological map of the study area was derived from the published geological map of Ethiopia (1996). The study area is covered by shallow alluvial-lacustrine sediments, and reports indicate that alluvial terrain covers up to one-fourth of the country's landmass (Kebede 2013). Alluvial-lacustrine deposits comprising sand, silt, clay, diatomite, limestone, and beach sand are the primary geological units, covering 80% of the region. The remaining 20%, along the eastern border, is covered by late- to post-tectonic granite, Makonnen basalts, the Baro group, Enticho sandstone, and the Birbir group (Figure 2(a)). The geological units of the study area are described in Supplementary Table A3 in the appendix.
Figure 2

Groundwater influencing parameters of the study area: (a) geology, (b) geomorphology, (c) soil texture, and (d) slope.


Geomorphology

Most of the study area is flat with a gentle inclination. Ridges and mountains are located in the eastern headwater area. The elevation difference between the lowest and highest points is 577 m. The geomorphology of the study area was derived using the topographic position method (Starbuck et al. 2022).

Five landform classes, namely canyons, shallow valleys, plains, local ridges, and high ridges and mesas, were identified in the study area (Figure 2(b)). Plains cover the largest share of the landscape (82.6%), while high ridges cover the smallest (0.12%).

Soil texture

The aquifer's capacity to transmit water is significantly influenced by the type, texture, permeability, and structure of the soil, because the characteristics of the soil particles control how much water infiltrates into the aquifer medium. Soil texture strongly affects groundwater recharge because different textures have different infiltration rates: fine-grained soils yield relatively little recharge compared with coarse-grained soils because of their low effective porosity and permeability (Seifu et al. 2023). Clay soils have the lowest infiltration capacity, whereas sandy soils have high infiltration rates (Juandi & Syahril 2017). According to the soil classification report by Berhanu et al. (2013), the most common soil textures in the study region are clay (60%), loamy sand (28%), and loam (12%) (Figure 2(c)).

Slope

Groundwater recharge depends on runoff and infiltration, which are controlled by the steepness of the surface. Because runoff increases with slope steepness, recharge to the subsurface decreases, and vice versa (Nag & Kundu 2018). Groundwater potential is therefore higher in flat areas with low runoff (Rajaveni et al. 2017).

The slope map of the study area was produced from the DEM using ArcGIS tools; slopes range from 0 to 70.7° (Figure 2(d)). The slope was then reclassified into five subclasses: flat (0–2°), gently sloping (2–5°), strongly sloping (5–10°), moderately steep (10–15°), and very steep (>15°).

Land use and land cover (LULC)

Land cover is one of the most important variables influencing how rainfall behaves at the surface: vegetated surfaces facilitate infiltration, whereas bare soil promotes runoff (Shayannejad et al. 2022). LULC maps were created from Sentinel-2 satellite imagery. The first step was to acquire cloud-free Sentinel-2 imagery of the study area, which was then geometrically corrected to a common projection (Universal Transverse Mercator (UTM)). An interactive supervised classification technique was applied to a composite of the 10 m Sentinel-2 bands (2, 3, 4, and 8). The land use types identified include forest land, agricultural land (annual and perennial), shrubland, grassland, waterbody, sandy area, rock outcrops, wetland, and built-up area (Figure 3(a)). Built-up and bare land have lower infiltration, whereas waterbodies and wetlands have greater infiltration capacity.
Figure 3

Groundwater influencing parameters of the study area: (a) land use land cover, (b) rainfall, (c) NDVI, and (d) drainage density.


Rainfall

Rainfall is one of the major factors affecting a region's groundwater potential. Precipitation increases the amount of water at the surface, which raises the likelihood that water will percolate deeper. Compared with other parts of the country, the study area has sparse data and few meteorological stations. The average annual rainfall in the region is 1,037.6 mm. The spatial distribution of rainfall over the study area was interpolated by kriging in ArcGIS (Figure 3(b)).

Normalized difference vegetation index (NDVI)

NDVI is a numerical indicator that describes the vegetation cover of an area. It is derived from remote sensing images to identify whether a target contains green vegetation (Ozyavuz et al. 2015). NDVI is computed from near-infrared (NIR) and visible red (R) reflectance:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R} \tag{1}$$

Values approaching 1 indicate vegetation cover, whereas negative values (−1 to 0) may indicate clouds, snow, or water (Figure 3(c)). Positive values near zero indicate bare land, rock areas, and similar surfaces (Gandhi et al. 2015).
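To make Eq. (1) concrete, the following minimal Python sketch computes NDVI per pixel from plain lists of reflectance values. It is not the study's processing chain (which used ERDAS Imagine and ArcGIS); the band assignment noted in the docstring and the reflectance values are illustrative assumptions.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - R) / (NIR + R).

    nir, red: equal-length sequences of surface reflectance values
    (for Sentinel-2, NIR is band 8 and red is band 4).
    Pixels where NIR + R == 0 are undefined and returned as None.
    """
    result = []
    for n, r in zip(nir, red):
        total = n + r
        result.append(None if total == 0 else (n - r) / total)
    return result


# Hypothetical reflectances: dense vegetation, bare soil, open water
values = ndvi([0.50, 0.30, 0.02], [0.05, 0.25, 0.06])
print([round(v, 2) for v in values])  # [0.82, 0.09, -0.5]
```

The three sample pixels fall into the three ranges described above: close to 1 for vegetation, slightly positive for bare soil, and negative for water.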

Drainage density

In the study region, the drainage pattern resembles a dendritic pattern (Yang et al. 2020). The drainage density of a region is calculated by dividing the total length of streams by the total area. A high drainage density indicates closely spaced streams, and a low density indicates sparse streams. Groundwater potential is inversely related to drainage density, because densely drained areas favor runoff over infiltration (Magesh et al. 2012; Nag & Kundu 2018). The drainage density (Dd) of an area is calculated using the formula:
$$D_d = \frac{\sum_{i=1}^{n} D_i}{A} \tag{2}$$
where Di is the length of the ith stream (km), n is the number of streams, and A is the watershed area (km²) (Figure 3(d)).
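A minimal sketch of Eq. (2) for a single area, assuming stream lengths are already measured in kilometers. In a GIS, the same ratio is typically evaluated within a moving search window (for example, with the ArcGIS Line Density tool) to produce a raster; the segment lengths below are hypothetical.

```python
def drainage_density(stream_lengths_km, area_km2):
    """Dd = (sum of stream lengths, km) / (area, km^2), returned in km/km^2."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return sum(stream_lengths_km) / area_km2


# Hypothetical 4 km^2 window crossed by three stream segments
print(drainage_density([1.5, 0.5, 2.0], 4.0))  # 1.0
```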

Lineament density

Lineaments are linear or curvilinear surface features that reflect geologic structures such as fractures, faults, and discontinuities, and they can be identified using remote sensing and GIS techniques. They reveal the structural architecture of the basement rock and are important in hydrological studies because they act as pathways for groundwater movement, which makes them valuable in groundwater exploration (Ruidas et al. 2021). Lineaments indicate zones of enhanced porosity that act as channels for groundwater circulation (Yamusa et al. 2018). Lineament maps are produced by mapping linear features from remotely sensed data and are necessary for understanding the occurrence of groundwater (Ifediegwu 2022). The rose diagram (Figure 4(b)), which shows the orientation of lineaments within the region, reveals a predominantly N–S trend.
Figure 4

Groundwater influencing parameters of the study area: (a) lineament density map (faults at the surface), (b) rose diagram of the general lineament orientation, (c) TWI, and (d) TRI.

The lineament density (Ld) of an area is defined as the ratio of the total length of the lineament structure to the area coverage:
$$L_d = \frac{\sum_{i=1}^{n} L_i}{A} \tag{3}$$
where Li is the length of the ith lineament (km), n is the number of lineaments, and A is the area of the catchment (km²). The lineament density (km/km²) was classified into five classes (Figure 4(a)).
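The sketch below is a hypothetical illustration, not the study's workflow. It shows how lineament segments, assumed to be straight lines with projected endpoints in kilometers, can be turned into two outputs: the length-weighted orientation histogram behind a rose diagram, and the density of Eq. (3). Orientations are treated as axial, so an azimuth of 190° is folded to 10°.

```python
import math
from collections import defaultdict


def length_and_azimuth(x1, y1, x2, y2):
    """Segment length and axial azimuth in [0, 180) degrees, clockwise from
    grid north. Coordinates are assumed to be projected (e.g. UTM, in km)."""
    dx, dy = x2 - x1, y2 - y1
    azimuth = math.degrees(math.atan2(dx, dy)) % 180.0
    return math.hypot(dx, dy), azimuth


def rose_bins(segments, bin_width=10):
    """Length-weighted orientation histogram, i.e. the data behind a rose diagram."""
    bins = defaultdict(float)
    for seg in segments:
        length, azimuth = length_and_azimuth(*seg)
        bins[int(azimuth // bin_width) * bin_width] += length
    return dict(sorted(bins.items()))


def lineament_density(segments, area_km2):
    """Ld = total lineament length (km) / area (km^2)."""
    return sum(length_and_azimuth(*seg)[0] for seg in segments) / area_km2


# Hypothetical lineaments as (x1, y1, x2, y2) in km
segments = [(0, 0, 0, 3), (1, 0, 1, -2), (0, 0, 1, 1)]
print(rose_bins(segments))
print(lineament_density(segments, 2.0))
```

A predominantly N–S trend, as reported for the study area, would concentrate most of the length in the bins near 0° and 170°.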

Topographic wetness index (TWI)

The TWI is a terrain parameter that describes how topography controls water accumulation. It was introduced by Beven and Kirkby with TOPMODEL (Beven et al. 2021).

The TWI reflects the tendency of water to accumulate at a given site and is used to assess how topography controls groundwater infiltration potential. It combines the slope gradient with the local upslope contributing area (Pandey et al. 2022). The TWI of the area is calculated as (Figure 4(c)):
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \tag{4}$$
where α is the upslope contributing area per unit contour length (specific catchment area) and β is the local slope gradient.
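A minimal per-cell sketch of Eq. (4). It assumes the upslope contributing area has already been derived from a flow-accumulation grid, and it uses the DEM cell size as the contour width. The minimum-slope clamp is a hypothetical safeguard for perfectly flat cells, which are common on the western plain, and is not taken from the study.

```python
import math


def twi(upslope_area_m2, contour_width_m, slope_deg, min_slope_deg=0.1):
    """TWI = ln(alpha / tan(beta)).

    alpha: specific catchment area = upslope contributing area / contour width
           (the DEM cell size is a common stand-in for contour width).
    beta:  local slope in degrees, clamped to min_slope_deg (a hypothetical
           threshold) so that flat cells do not cause division by zero.
    """
    alpha = upslope_area_m2 / contour_width_m
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(alpha / math.tan(beta))


# Hypothetical 30 m cell draining 9,000 m^2 upslope, on a gentle and a steep slope
print(round(twi(9000, 30, 1.0), 2), round(twi(9000, 30, 15.0), 2))
```

For the same contributing area, the gentler slope gives the higher TWI, consistent with greater water accumulation on flat terrain.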

Topographic roughness index (TRI)

TRI is a topographic statistic that measures the local variation in the elevation of landforms. Terrain texture can range from flat to uneven or irregular, and the roughness index measures how much the ground surface changes within a given area. The topographic roughness map depicts the ruggedness of the land surface in each DEM grid cell (Lindsay et al. 2019). The TRI is given as:
$$\mathrm{TRI} = \frac{FS_{\mathrm{mean}} - FS_{\mathrm{min}}}{FS_{\mathrm{max}} - FS_{\mathrm{min}}} \tag{5}$$
where FSmean, FSmax, and FSmin are the mean, maximum, and minimum focal statistics of the elevation surface, respectively (Mukherjee & Singh 2020). The lowest terrain roughness values coincide with high groundwater potential zones and therefore receive higher ranks, and vice versa (Figure 4(d)).
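Eq. (5) can be sketched with pure-Python focal statistics over a 3 × 3 moving window. The window size, the edge handling (windows are clipped at the raster border), and the value assigned to perfectly flat windows are assumptions. GIS focal-statistics tools may handle these cases differently.

```python
def focal_values(dem, r, c, radius=1):
    """Cell values in the (2*radius+1) x (2*radius+1) window around (r, c),
    clipped at the raster edges."""
    rows, cols = len(dem), len(dem[0])
    return [dem[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def roughness_index(dem, radius=1):
    """TRI = (FSmean - FSmin) / (FSmax - FSmin) for every cell of a 2-D DEM."""
    tri = []
    for r in range(len(dem)):
        row = []
        for c in range(len(dem[0])):
            window = focal_values(dem, r, c, radius)
            lo, hi = min(window), max(window)
            mean = sum(window) / len(window)
            # Assumption: perfectly flat windows (hi == lo) are assigned 0.
            row.append(0.0 if hi == lo else (mean - lo) / (hi - lo))
        tri.append(row)
    return tri


# Hypothetical 3 x 3 DEM (m) with a single raised cell in the center
dem = [[400, 400, 400],
       [400, 409, 400],
       [400, 400, 400]]
print(roughness_index(dem)[1][1])
```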
Figure 5

Methodology framework of the study.


Methods applied for GWPZ

Analytic hierarchy process (AHP)

Multi-criteria decision-making (MCDM) was used to assign weights to the chosen criteria. MCDM helps individuals or groups of decision-makers evaluate their choices in complicated situations involving several factors (Arabameri et al. 2019; Ravichandran et al. 2022).

The groundwater potential zone is defined by 11 influencing criteria, whose weights were determined using Saaty's scale of relative importance. The first step was to build a hierarchical structure of these parameters' influence on groundwater potential zonation. The relative importance of the criteria was then evaluated with an 11 × 11 pairwise comparison matrix (Table 1). Each value was normalized by dividing it by the sum of its column, and the weight of each thematic layer was taken as the average of its row in the normalized pairwise matrix (Supplementary Table A1). Each thematic layer thus received a weight according to its influence on the GWPZ of the study area (Supplementary Table A2).
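The normalization step described above (divide by column sums, then average each row) can be sketched as follows. The 3 × 3 matrix is a hypothetical example on Saaty's scale, not the study's 11 × 11 matrix in Table 1. The row-average method approximates the principal eigenvector of the matrix.

```python
def ahp_weights(matrix):
    """Approximate AHP priority vector.

    Each entry is divided by its column sum, and each criterion's weight is the
    average of its row in the normalized matrix.
    """
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


# Hypothetical 3-criterion matrix on Saaty's 1-9 scale (not the study's Table 1):
# rainfall vs. geomorphology vs. slope.
A = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
print([round(w, 3) for w in ahp_weights(A)])
```

The weights sum to one, and the criterion judged most important (here rainfall) receives the largest weight.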

Table 1

Pairwise comparison matrix

Criterion RF Geom. Geo LD Slope Soil LULC DD NDVI TWI TRI
RF           
Geom. 0.33          
Geo 0.33 0.33         
LD 0.2 0.33        
Slope 0.2 0.2 0.33       
Soil 0.2 0.2 0.33 0.5      
LULC 0.14 0.2 0.2 0.33 0.33     
DD 0.14 0.14 0.2 0.33 0.33 0.33    
NDVI 0.14 0.14 0.2 0.2 0.2 0.33 0.33 0.33   
TWI 0.13 0.13 0.14 0.2 0.2 0.2 0.33 0.33  
TRI 0.11 0.13 0.14 0.14 0.17 0.2 0.2 0.33 0.5 

Note: RF, rainfall; Geom., geomorphology; Geo, geology; LD, lineament density; LULC, land use/land cover; DD, drainage density; NDVI, normalized difference vegetation index; TWI, topographic wetness index; TRI, topographic roughness index.

Checking consistency

Within the AHP methodology, the consistency index (CI) and consistency ratio (CR) were used to assess the consistency of the assigned weights (Ravichandran et al. 2022). The CI is defined as:
$$CI = \frac{\lambda_{\max} - n}{n - 1} \tag{6}$$
where λmax is the maximum eigenvalue and n is the matrix size (number of criteria). λmax is calculated as the average ratio of the weighted sum value to the criterion weight. The CR is calculated as:
$$CR = \frac{CI}{RCI} \tag{7}$$
where RCI is the random consistency index, whose value depends on the number of criteria (n). In this case, n = 11, and the corresponding value was taken from the random consistency index table (Supplementary Table A4).
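A minimal sketch of Eqs. (6) and (7). It estimates λmax as described above, as the mean ratio of weighted sum value to criterion weight. The random consistency indices are the values commonly cited from Saaty; treat them as an assumption and prefer the study's Supplementary Table A4. The matrix is hypothetical.

```python
# Commonly cited Saaty random consistency indices for n = 1..11
# (assumption: the study's Supplementary Table A4 is the authoritative source).
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
       7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}


def consistency_ratio(matrix):
    """Return (lambda_max, CI, CR) for an n x n pairwise comparison matrix, n >= 2."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    weights = [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
    # Weighted sum value of each criterion: row of the matrix times the weights
    weighted_sums = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    # lambda_max is estimated as the mean ratio of weighted sum value to weight
    lambda_max = sum(ws / w for ws, w in zip(weighted_sums, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RCI[n] if RCI[n] > 0 else 0.0
    return lambda_max, ci, cr


A = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
lambda_max, ci, cr = consistency_ratio(A)
print(f"lambda_max={lambda_max:.3f}, CI={ci:.3f}, CR={cr:.3f}, acceptable={cr < 0.1}")
```

A perfectly consistent matrix (one where a_ij × a_jk = a_ik for all i, j, k) gives λmax = n and CI = 0.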
Here, CR = 0.05, which is below the acceptable threshold of 0.1 (Franek & Kresta 2014), so the pairwise judgments are consistent. The weighted overlay analysis tool in ArcGIS was then used to combine the thematic maps into the groundwater potential zone map of the study area. Each thematic layer was divided into subclasses according to its type, and the subclasses were assigned ratings of 1–5 based on their influence on groundwater potential zonation. The overlay analysis was performed according to the equation:
$$GWPZ = \sum_{i=1}^{n} W_i \times R_i \tag{8}$$
where GWPZ is the groundwater potential zone, Wi is the weight of each thematic layer, Ri is the rank of the subclasses of each thematic layer, and n is the number of thematic layers.
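Eq. (8) is a cell-by-cell weighted sum. The following sketch reproduces it on small in-memory grids. The layer names, ranks, and weights are hypothetical; in the study, this step was performed with the ArcGIS weighted overlay tool.

```python
def weighted_overlay(rank_layers, weights):
    """GWPZ index per cell = sum_i W_i * R_i.

    rank_layers: dict layer name -> 2-D list of subclass ranks (1-5)
    weights:     dict layer name -> AHP weight (assumed to sum to 1)
    """
    names = list(rank_layers)
    rows = len(rank_layers[names[0]])
    cols = len(rank_layers[names[0]][0])
    return [[sum(weights[name] * rank_layers[name][r][c] for name in names)
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2 x 2 rasters and weights (not the study's values)
ranks = {
    "rainfall": [[5, 4], [3, 1]],
    "geomorphology": [[5, 5], [2, 1]],
    "slope": [[4, 3], [3, 2]],
}
layer_weights = {"rainfall": 0.5, "geomorphology": 0.3, "slope": 0.2}
print(weighted_overlay(ranks, layer_weights))
```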

Random forest classifier (RFC)

Random forests are supervised MLAs that combine many decision trees, using majority voting for classification or averaging for regression. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean prediction of the individual trees is returned (Tyralis et al. 2019).

As an ensemble learning method, the random forest combines numerous classifiers to solve complex problems, and it corrects the tendency of individual decision trees to overfit their training set. If Cb(x) is the class prediction of the bth tree in a forest of B trees, the random forest prediction is:
$$\hat{C}_{rf}^{B}(x) = \text{majority vote}\,\{C_b(x)\}_{b=1}^{B} \tag{9}$$
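Eq. (9) amounts to a majority vote over the per-tree predictions. The helper below is a hypothetical sketch of that vote. Note that scikit-learn's RandomForestClassifier does not hard-vote: it averages the trees' predicted class probabilities, so its output can occasionally differ from Eq. (9).

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Eq. (9): combine per-tree class predictions C_b(x), b = 1..B.

    tree_predictions holds B lists, each with one predicted class per sample.
    Ties go to the class that appears first in tree order (an arbitrary choice).
    """
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in tree_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions of three trees for four pixels
trees = [["high", "low", "moderate", "high"],
         ["high", "high", "low", "low"],
         ["low", "low", "low", "high"]]
print(majority_vote(trees))  # ['high', 'low', 'low', 'high']
```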

Gradient boosting classifier (GBC)

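Gradient boosting is a functional gradient descent algorithm that, at each iteration, selects a function (a weak hypothesis) pointing in the direction of the negative gradient in order to minimize a loss function (Sachdeva & Kumar 2021). The gradient boosting model update is:
$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x) \tag{10}$$
where F_m is the model after m iterations, h_m is the weak learner (typically a shallow regression tree) fitted to the negative gradient of the loss evaluated at F_{m−1}, and ν is the learning rate.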
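To illustrate Eq. (10), here is a minimal from-scratch sketch of binary gradient boosting. It uses the log-loss and one-split regression trees (stumps) as weak learners, on a single hypothetical feature. It applies a fixed learning rate without the per-leaf line search that library implementations typically add, so it is a didactic approximation, not the GBC configuration used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump on a single feature.

    Returns (threshold, left_value, right_value); xs must contain at least two
    distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Binary gradient boosting with log-loss; ys are 0/1 labels (both present)."""
    p = sum(ys) / len(ys)
    f0 = math.log(p / (1 - p))  # F_0: initial log-odds
    F = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to F is y - sigmoid(F)
        residuals = [y - sigmoid(f) for y, f in zip(ys, F)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # F_m(x) = F_{m-1}(x) + nu * h_m(x)
        F = [f + learning_rate * (lv if x <= t else rv) for f, x in zip(F, xs)]

    def predict_proba(x):
        z = f0 + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)
        return sigmoid(z)

    return predict_proba


# Hypothetical 1-D example: larger values of a scaled predictor -> "high potential" (1)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(7), 3))
```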

Decision tree classifier (DTC)

A decision tree is a nonparametric supervised learning technique for classification and regression. Its objective is to build a model that predicts the value of a target variable by learning simple decision rules from the data features; a tree can be viewed as a piecewise constant approximation. Algorithms known as decision tree inducers automatically build a decision tree from a given dataset, and the best tree is typically the one that minimizes the generalization error (Tanimu et al. 2022).
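Decision tree inducers grow a tree by choosing, at each node, the split that most reduces impurity. The sketch below shows that step for one numeric feature. It uses the Gini and entropy criteria that appear in the search space of Table 4; the slope values and labels are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    probabilities = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probabilities)


def best_split(xs, labels, impurity=gini):
    """Threshold on one numeric feature minimizing weighted child impurity."""
    best_t, best_score = None, float("inf")
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical slope values (degrees) and groundwater potential labels
slope_deg = [1, 2, 3, 10, 12, 15]
labels = ["high", "high", "high", "low", "low", "low"]
print(best_split(slope_deg, labels))  # (6.5, 0.0)
```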

K-neighbor classifier (KNC)

The KNC is a nonparametric learning algorithm. In classification, it computes the distance between a target point and the training points, selects the K nearest neighbors, and assigns the class that receives the most votes among them. Given training data with known features (X) and class labels, the KNC can classify an unknown item (Y) (Avand et al. 2019). This approach uses a weighted Euclidean distance function (Drt) to determine how close (in the neighborhood sense) the real-time prediction vector Xr = (X1r, X2r, X3r, …, Xmr) is to each historical observation Xt = (X1t, X2t, X3t, …, Xmt):
$$D_{rt} = \sqrt{\sum_{i=1}^{m} W_i \left(X_{ir} - X_{it}\right)^2} \tag{11}$$
where Wi (i = 1, 2, …, m) is the weight of the ith predictor, and the weights sum to one.
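The weighted distance of Eq. (11) and the K-neighbor vote can be sketched as below. The predictor weights, the value of K, and the training points are hypothetical. Ties in the vote go to the class encountered first among the nearest neighbors.

```python
import math
from collections import Counter


def weighted_distance(xr, xt, weights):
    """D_rt = sqrt(sum_i W_i * (X_ir - X_it)^2); the weights are assumed to sum to 1."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, xr, xt)))


def knn_predict(x_query, train_X, train_y, weights, k=5):
    """Majority class among the k training points closest to x_query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: weighted_distance(x_query, pair[0], weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled predictors (e.g. rainfall, slope) with known classes
train_X = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.9, 0.9], [0.9, 1.0], [1.0, 0.9]]
train_y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict([0.2, 0.1], train_X, train_y, weights=[0.6, 0.4], k=3))  # low
```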

Dataset for MLAs

The dataset of 2,959 points was split into training, validation, and test sets in proportions of 70, 20, and 10%, respectively, for training, tuning, and evaluating the MLA models (Figure 5). Normalization or standardization is then generally applied to improve model performance. Normalization ensures that each feature contributes on a comparable scale; this does not imply that all features are equally important to the classifier. Normalization techniques have been used to improve classification performance in many application domains (Thanh et al. 2022). In this study, Min-Max scaling was used (Deepa & Ramesh 2022).
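The 70/20/10 split and Min-Max scaling described above can be sketched with the standard library alone. The shuffling seed is a hypothetical choice. Fitting the scaler on the training subset only, then reusing it for the validation and test subsets, is a common safeguard against information leakage rather than a detail reported in the study.

```python
import random


def split_70_20_10(records, seed=42):
    """Shuffle records and split them into train/validation/test (70/20/10)."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = int(0.7 * len(rows))
    n_val = int(0.2 * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_min_max(values):
    """Return a scaler x -> (x - min) / (max - min) fitted on `values`."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant feature
    return lambda x: (x - lo) / span


train, val, test = split_70_20_10(range(2959))
print(len(train), len(val), len(test))  # 2071 591 297

# Fit the scaler on training data only, then apply it to every subset.
scale = fit_min_max(train)
train_scaled = [scale(x) for x in train]
print(min(train_scaled), max(train_scaled))  # 0.0 1.0
```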

Random Search CV: Hyperparameters were tuned with randomized search cross-validation (Randomized Search CV) (Bergstra et al. 2012). Its inputs include the estimator, the parameter distributions, the number of iterations, the scoring metric, and the cross-validation scheme. Unlike grid search, which evaluates every combination, this method draws a new random set of hyperparameters at each iteration. Because not all hyperparameters are equally important, random sampling explores more distinct values of the influential ones and increases the probability of finding effective combinations.
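The following minimal, library-free sketch performs randomized search over the RFC search space transcribed from Table 4. Each iteration draws one random configuration and scores it with k-fold cross-validation, so 100 iterations with 3 folds give 300 fits. The scoring function is a hypothetical placeholder. Unlike scikit-learn's RandomizedSearchCV, which samples list-only spaces without replacement, this sketch may draw the same configuration twice.

```python
import random

# RFC search space transcribed from Table 4. 'auto' is kept as listed there;
# recent scikit-learn releases have deprecated it for random forests.
RFC_SPACE = {
    "n_estimators": list(range(200, 2001, 200)),  # 10 values, 200..2000
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": list(range(10, 111, 10)),         # 11 values, 10..110
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}


def randomized_search(score_fn, space, n_iter=100, n_folds=3, seed=42):
    """Sample n_iter random configurations and keep the best mean CV score.

    score_fn(params, fold) -> float is supplied by the caller (hypothetical);
    it should train on all folds except `fold` and score on `fold`.
    """
    rng = random.Random(seed)
    best_params, best_score, n_fits = None, float("-inf"), 0
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        scores = [score_fn(params, fold) for fold in range(n_folds)]
        n_fits += n_folds
        mean_score = sum(scores) / n_folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score, n_fits


# Hypothetical scoring function standing in for real cross-validated training
def toy_score(params, fold):
    return -abs(params["n_estimators"] - 600) / 1000 - 0.001 * fold


best, score, n_fits = randomized_search(toy_score, RFC_SPACE)
print(best, n_fits)
```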

Model validation

Model validation ensures that a model actually achieves its intended purpose (Naghibi et al. 2016), and verifiability is a gauge of the quality of any scientific research. The GWPZ predictions of each model were assessed using the ROC curve and the statistical measures of accuracy, precision, recall, F1-score, and kappa index (Table 2).

Table 2

Accuracy indices calculated for each classifier

No  Parameters  Formulas
1  Accuracy  (TP + TN) / (TP + TN + FP + FN)
2  Precision  TP / (TP + FP)
3  Recall  TP / (TP + FN)
4  F1-score  2 × Precision × Recall / (Precision + Recall)
5  Overall accuracy (OA)  (TP + TN) / N
6  Expected accuracy (EA)  [(TP + FN)(TP + FP) + (FP + TN)(FN + TN)] / N²
7  Kappa  (OA − EA) / (1 − EA)

Note: TP (true positives) is the number of positive points correctly classified as positive, TN (true negatives) is the number of negative points correctly classified as negative, FP (false positives) is the number of negative points incorrectly classified as positive, FN (false negatives) is the number of positive points incorrectly classified as negative, N is the total number of points, and EA is the expected accuracy.
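The formulas in Table 2 can be checked with a short sketch that computes every index from the four confusion-matrix counts of a binary classification. The counts are hypothetical. For the multiclass GWPZ maps, these indices would be computed per class and then macro- or weighted-averaged.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy indices of Table 2 from binary confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / n
    # Expected (chance) agreement from actual and predicted class totals
    ea = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (oa - ea) / (1 - ea)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "oa": oa, "ea": ea, "kappa": kappa}


# Hypothetical counts for one class treated as "positive"
for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
    print(f"{name}: {value:.3f}")
```

With these counts the kappa is 0.70, which falls in the 0.61–0.80 band that the study interprets as substantial agreement.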

Multicollinearity and feature selection

The variance inflation factor (VIF) was used to check the severity of collinearity between the independent variables; it measures how much collinearity inflates the variance of the coefficient estimates. Tolerance, the reciprocal of the VIF, indicates the proportion of variance in a predictor that cannot be explained by the other predictors. In practice, a VIF greater than 4 or a tolerance below 0.25 indicates possible multicollinearity that needs further investigation (Ashwini et al. 2023). The results (Table 3) show no severe collinearity among the independent variables.
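A minimal sketch of the VIF and tolerance check. It uses the identity that VIF_j equals the jth diagonal element of the inverse correlation matrix, which is equivalent to 1/(1 − R_j²). The predictor columns are hypothetical, and the thresholds of 4 and 0.25 are those cited above.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        var_a = sum((x - ma) ** 2 for x in a)
        var_b = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(var_a * var_b)
    return [[corr(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix must be non-singular)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(columns):
    """(VIF, tolerance) per column; VIF_j is the jth diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(columns))]


# Hypothetical predictor columns (not the study's data)
rainfall = [900, 950, 1000, 1100, 1050, 980]
ndvi_values = [0.30, 0.35, 0.40, 0.52, 0.45, 0.41]
slope = [2.0, 8.0, 1.0, 5.0, 3.0, 6.0]
for name, (vif, tol) in zip(["rainfall", "NDVI", "slope"],
                            vif_and_tolerance([rainfall, ndvi_values, slope])):
    flag = "check" if vif > 4 or tol < 0.25 else "ok"
    print(f"{name}: VIF={vif:.2f}, tolerance={tol:.2f} ({flag})")
```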

Table 3

VIF and tolerance result of the predictors

Predictors VIF Tolerance
Geomorphology 1.04 0.96 
Geology 1.19 0.83 
Drainage density 1.07 0.93 
Lineament density 1.19 0.84 
LULC 1.43 0.69 
NDVI 1.42 0.7 
Slope 1.13 0.88 
Soil 1.11 0.9 
RF 2.15 0.46 
TWI 1.13 0.88 
TRI 1.01 0.98 

Prediction results of GWPZ

Creating groundwater potential maps is one of the most crucial aspects of groundwater modeling research; here, the maps were created after training and validation in two main steps. First, a groundwater potential index was computed for each pixel in the study area. Next, these indices were reclassified using the natural breaks method. Each GWPZ map was divided into five potential classes: very low, low, moderate, high, and very high. In most models, the very high potential zones cover the largest areas, whereas the low class covers the smallest. A region of very low potential lies along the southern edge of the study area (Figure 6).
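The natural breaks reclassification can be sketched with the Fisher-Jenks dynamic program, which finds class boundaries that minimize the total within-class squared deviation. This is a hypothetical illustration: GIS software may use a different (faster or approximate) implementation. Because the cost grows as O(k·n²), a full raster would typically be classified from a sample of index values.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Fisher-Jenks optimal breaks: minimize the total within-class sum of
    squared deviations. Returns the upper bound of each class (ascending)."""
    data = sorted(values)
    n = len(data)
    if n < n_classes:
        raise ValueError("need at least as many values as classes")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes
    # start[k][j]: index where the kth (last) class begins in that solution
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j], start[k][j] = candidate, i
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


def classify(value, breaks, labels):
    """Label of the first class whose upper bound is >= value."""
    return labels[min(bisect_left(breaks, value), len(labels) - 1)]


# Hypothetical GWPZ index values for a handful of pixels
labels = ["very low", "low", "moderate", "high", "very high"]
index_values = [1.2, 1.3, 1.5, 2.1, 2.2, 2.4, 3.0, 3.1, 3.3, 3.9, 4.0, 4.2, 4.6, 4.8, 4.9]
breaks = jenks_breaks(index_values, 5)
print(breaks, classify(3.1, breaks, labels))
```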
Figure 6

Groundwater potential zone maps produced by the AHP, RFC, KNC, GBC, and DTC models.

According to the AHP results, the very high potential zone covers 43.4% of the area, followed by the high (22%) and low (17%) potential zones (Figures 6 and 7). In the RFC map, the very high and very low classes cover the largest areas, at 44.9 and 20.6%, respectively, so these two extreme classes together cover 65.4% of the map. The GBC classification gives 16.1, 12.7, 12.3, 20.8, and 38% area coverage for the very low, low, moderate, high, and very high classes, respectively; the high and very high classes together cover most of the area (58.8%). In the DTC map, the high and very high potential classes make up the majority of the coverage (66.23%). The KNC classification gives 16.2, 7.9, 15.1, 21.2, and 39.6% area coverage for the very low, low, moderate, high, and very high classes, respectively (Figures 6 and 7).
Figure 7

GWPZ mapping using AHP and four MLA models.


Model validation

Model performance was assessed with five metrics: accuracy, precision, kappa, recall, and F1-score. The hyperparameters were tuned using the Randomized Search CV technique (Bergstra et al. 2012), for which it is crucial to specify the search space and provide a starting point. The estimator uses a random state of 42 and performs threefold cross-validation for each of 100 candidates, resulting in 300 fits. Table 4 shows the search space and the hyperparameters of the baseline and best models.

Table 4

Hyperparameters used for the baseline and best models, including the search space

Hyperparameters  Search space  RFC (Baseline, Best)  GBC (Baseline, Best)  DTC (Baseline, Best)  KNC (Baseline, Best)
n_estimators [200, 2000, num = 10] [‘100’] [‘600’] [‘100’] [‘200’] 
max_features [‘auto’, ‘sqrt’, ‘log2’] [‘auto’] [‘auto’] [‘auto’] [‘sqrt’] [‘None’] [‘auto’] 
max_depth [10, 110, num = 11] [‘None’] [‘60’] [‘None’] [‘50’] [‘10’] [‘80’] 
min_samples_split [2, 5, 10] [‘2’] [‘2’] [‘2’] [‘10’] [‘2’] [‘2’] 
min_samples_leaf [1, 2, 4] [‘1’] [‘2’] [‘1’] [‘2’] [‘1’] [‘2’] 
bootstrap [True, False] [‘True’] [‘False’] None None 
metric [‘euclidean’, ‘manhattan’, ‘minkowski’] [‘minkowski’] [‘manhattan’] 
n_neighbors [3, 5, 11, 19]       [‘5’] [‘19’] 
weights [‘uniform’, ‘distance’]       [‘uniform’] [‘distance’] 
criterion [‘gini’, ‘entropy’]     [‘gini’] [‘entropy’]   
HyperparametersSearch spaceRFC
GBC
DTC
KNC
BaselineBest parametersBaselineBest parametersBaselineBest parametersBaselineBest parameters
n_estimators [200, 2,000, num = 10] [‘100’] [‘600’] [‘100’] [‘200’] 
max_features [‘auto’, ‘sqrt’, ‘log2’] [‘auto’] [‘auto’] [‘auto’] [‘sqrt’] [‘None’] [‘auto’] 
max_depth [10, 110, num = 11] [‘None’] [‘60’] [‘None’] [‘50’] [‘10’] [‘80’] 
min_samples_split [2, 5, 10] [‘2’] [‘2’] [‘2’] [‘10’] [‘2’] [‘2’] 
min_samples_leaf [1, 2, 4] [‘1’] [‘2’] [‘1’] [‘2’] [‘1’] [‘2’] 
bootstrap [True, False] [‘True’] [‘False’] None None 
Metric [‘euclidean’, ‘manhattan’, ‘minkowski’] [‘minkowski’] [‘'manhattan’] 
N_Neighors [3, 5, 11, 19]       [‘5’] [‘19’] 
Weights [‘uniform’, ‘distance’]       [‘uniform’] [‘distance’] 
critrion [‘gini’, ‘entropy’]     [‘gini’] [‘entropy’]   

Table 5 gives the results obtained with the two hyperparameter scenarios (baseline and random search). In this experiment, RFC outperformed the other models, with an accuracy of 96.42%, precision of 89%, recall of 89%, F1-score of 88%, and kappa of 0.76. GBC followed, with an accuracy of 96.11%, precision of 88%, recall of 87%, F1-score of 88%, and kappa of 0.74. The kappa values of the top-performing models lie between 0.61 and 0.80, so their agreement can be interpreted as substantial (Czodrowski 2014). Both weighted and macro averages were computed, and the weighted average produced satisfactory results. All models improved once their hyperparameters were optimized with random search CV. GBC shows the greatest improvement (2.09%), followed by DTC (1.66%).
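The five metrics can be computed with scikit-learn as in the minimal sketch below. It assumes integer-coded GWPZ class labels for the test samples, and `evaluate_gwpz` is a hypothetical name. The `average` argument switches between the weighted and macro schemes mentioned above. Accuracy and Cohen's kappa do not depend on it.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)


def evaluate_gwpz(y_true, y_pred, average="weighted"):
    """Accuracy, precision, recall, F1-score, and Cohen's kappa for one model.

    average="weighted" weights each class by its support (number of true
    samples); average="macro" gives every class equal weight.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }


if __name__ == "__main__":
    # Toy labels: the minority class (1) is never predicted.
    y_true = [0, 0, 0, 1]
    y_pred = [0, 0, 0, 0]
    print(evaluate_gwpz(y_true, y_pred, "weighted"))
    print(evaluate_gwpz(y_true, y_pred, "macro"))
```

In the toy case, the weighted recall (0.75) exceeds the macro recall (0.5). This happens because the misclassified class is the minority class, which shows why weighted averages can look more favorable when the class distribution is imbalanced.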

Table 5

Performance analysis results for all algorithms

| Model | Scenario | Accuracy | Precision | Recall | F1-score | Kappa | Improvement |
|---|---|---|---|---|---|---|---|
| GBC | Baseline | 94.14% | 80% | 81% | 81% | 0.61 | 2.09% |
| | Random search CV | 96.11% | 88% | 88% | 87% | 0.74 | |
| RFC | Baseline | 95.30% | 85% | 84% | 84% | 0.68 | 1.18% |
| | Random search CV | 96.42% | 89% | 89% | 88% | 0.76 | |
| DTC | Baseline | 93.02% | 78% | 77% | 77% | 0.54 | 1.66% |
| | Random search CV | 94.57% | 82% | 82% | 82% | 0.63 | |
| KNC | Baseline | 92.09% | 74% | 74% | 74% | 0.46 | 1.01% |
| | Random search CV | 93.02% | 77% | 77% | 77% | 0.52 | |
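The Improvement column appears to be the relative gain in accuracy from the baseline to the tuned model, (tuned - baseline) / baseline x 100. This reading is inferred from the numbers rather than stated in the text. The short check below reproduces 2.09, 1.18, and 1.01% for GBC, RFC, and KNC. For DTC it gives about 1.666%, which rounds to 1.67% rather than the tabulated 1.66%.

```python
# Accuracy (%) before and after random search CV, copied from Table 5.
ACCURACY = {
    "GBC": (94.14, 96.11),
    "RFC": (95.30, 96.42),
    "DTC": (93.02, 94.57),
    "KNC": (92.09, 93.02),
}


def relative_improvement(baseline, tuned):
    """Relative accuracy gain in percent: (tuned - baseline) / baseline * 100."""
    return (tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    for model, (base, tuned) in ACCURACY.items():
        print(f"{model}: {relative_improvement(base, tuned):.2f}%")
```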

Feature importance

The left-hand panels of Figure 8 list the features in order of their relevance to the model predictions. The horizontal axis shows the mean absolute SHAP value of each feature. Geomorphology and rainfall are the most important features, followed by drainage density and NDVI, whereas TRI and geology are the least important for GWPZ prediction. The right-hand panels show not only the importance of each feature but also whether it has a positive or negative influence on the prediction. High values of rainfall, drainage density, NDVI, and TRI have a positive influence on the GBC predictions. High rainfall, NDVI, and geology values have a positive influence on the RFC predictions. High rainfall, geology, and lineament density values have a positive influence on the DTC predictions. For all three models, average (mid-range) geomorphology values are associated with favorable predictions.
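The left-hand summary bars rank features by mean absolute SHAP value. Suppose the SHAP values have already been computed, for tree models for example with the `shap` package's `TreeExplainer`. The ranking then reduces to a few NumPy operations. This is a sketch with two assumptions: the function name is hypothetical, and multiclass output is handled by averaging over classes. Averaging gives the same order as summing.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| (the x-axis of a SHAP summary bar plot).

    shap_values: (n_samples, n_features) for a single output, or
    (n_samples, n_features, n_classes) for a multiclass model; in the latter
    case the per-class means are averaged (same order as summing them).
    """
    values = np.abs(np.asarray(shap_values, dtype=float))
    if values.ndim not in (2, 3) or values.shape[1] != len(feature_names):
        raise ValueError("shap_values must be (n_samples, n_features[, n_classes])")
    importance = values.mean(axis=0)          # mean over samples
    if importance.ndim == 2:                  # (n_features, n_classes)
        importance = importance.mean(axis=1)  # combine classes
    order = np.argsort(importance)[::-1]      # most important first
    return [(feature_names[i], float(importance[i])) for i in order]


if __name__ == "__main__":
    toy = [[1.0, -2.0, 0.5],
           [-1.0, 2.0, 0.5]]
    print(rank_by_mean_abs_shap(toy, ["rainfall", "geomorphology", "TRI"]))
```

The sign information shown in the right-hand beeswarm panels is discarded here by design, since the bar plot only measures the magnitude of each feature's contribution.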
Figure 8

The Summary bar (on the left) and global interpretability plot (on the right) for GBC, RFC, and DTC.

ROC analysis

The validation was performed using the receiver operating characteristic (ROC) curve. The ROC curve plots the outcomes obtained over the full range of decision thresholds, so it accounts for all possible combinations of correct and incorrect classifications. ROC analysis is a well-established method for validating groundwater potential zone maps (Naghibi et al. 2017; Andualem & Demeke 2019; Sameen et al. 2019). The curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity), and the area under the curve (AUC) summarizes the prediction accuracy. AUC values range from 0 to 1, and values close to 1 indicate better model accuracy. AUC values are rated as poor (0.5–0.6), average (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), or excellent (0.9–1). Well data were used to check the accuracy of the GWPZ delineation (Mukherjee & Singh 2020). The GWPZ map produced by each model was compared with the water-level map of the study area and with well-point data from across the region, and the AUC was then determined. The AUC values of AHP, RFC, GBC, KNC, and DTC are 78.2, 93.4, 92.5, 87.7, and 72.4%, respectively (Figure 9). The ROC results show that RFC and GBC are the best models for GWPZ estimation.
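The AUC check can be sketched with scikit-learn. This minimal sketch rests on two assumptions that the text does not specify: each well point is labelled 1 (productive) or 0 (non-productive), and each map supplies a continuous potential score at the well location. The helper names `auc_class` and `evaluate_map` are hypothetical. The rating bands are the ones listed above, with each lower bound treated as inclusive.

```python
from sklearn.metrics import roc_auc_score, roc_curve


def auc_class(auc):
    """Rate an AUC value using the bands given in the text (lower bounds inclusive)."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    for lower, label in [(0.9, "excellent"), (0.8, "very good"),
                         (0.7, "good"), (0.6, "average"), (0.5, "poor")]:
        if auc >= lower:
            return label
    return "below 0.5 (worse than random)"


def evaluate_map(well_labels, map_scores):
    """ROC points, AUC, and rating for one GWPZ map sampled at the wells.

    well_labels: hypothetical binary labels (1 = productive well).
    map_scores:  potential score read from the map at each well.
    """
    fpr, tpr, _ = roc_curve(well_labels, map_scores)  # x: 1 - specificity, y: sensitivity
    auc = roc_auc_score(well_labels, map_scores)
    return fpr, tpr, auc, auc_class(auc)


if __name__ == "__main__":
    # AUC values reported in the text for each model.
    for model, auc in {"AHP": 0.782, "RFC": 0.934, "GBC": 0.925,
                       "KNC": 0.877, "DTC": 0.724}.items():
        print(model, auc_class(auc))
```

Applied to the reported AUCs, these bands rate RFC and GBC as excellent, KNC as very good, and AHP and DTC as good.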
Figure 9

ROC of the groundwater potential zone maps for each model.

According to the results, the AUC values for AHP, RFC, GBC, DTC, and KNC were 0.782, 0.934, 0.925, 0.724, and 0.877, respectively. All models therefore perform well, with AUC values above 0.70. These results are similar to those of previous groundwater potential prediction studies (Naghibi et al. 2019; Thanh et al. 2022). Compared with the other models, RFC and GBC deliver excellent results and KNC provides very good results. AHP and DTC fall in the good category, and DTC has the lowest predictive performance of all the models. The very high AUC values of RFC and GBC suggest that these models may be overfitted. In terms of accuracy and the kappa index, the MLAs rank RFC > GBC > DTC > KNC for GWPZ prediction. The RFC and GBC results agree with studies from various regions of the world (Rahmati et al. 2016; Naghibi et al. 2017, 2020). Our findings are also comparable to those of Maskooni et al. (2020) in northern Iran and Sachdeva & Kumar (2021) in India. In both of those studies, GBC and RFC performed best, and GBC (0.874 and 0.79, respectively) was more accurate than RFC (0.864 and 0.71, respectively).

The analysis shows that much of the Gambela Plain falls in the high and very high GWPZ classes. The very high potential class covers 43.37, 44.63, 38.03, 33.81, and 39.6% of the area for AHP, RFC, GBC, DTC, and KNC, respectively (Figure 7). RFC therefore predicts the largest very high class and DTC the smallest. The very low potential class covers 3.86, 20.87, 16.13, 15.54, and 16.17% for AHP, RFC, GBC, DTC, and KNC, respectively. RFC thus also predicts the largest very low class, whereas AHP predicts the smallest. The DTC model has the largest area in the high and moderate potential classes, at 32 and 19.8%, respectively (Figure 7).

Geomorphology and rainfall are the key factors affecting the GWPZ of the study region in all model outputs, whereas slope, geology, TWI, and TRI have little influence on the MLA predictions. MLAs are among the most effective tools for dealing with high-dimensional, unstable, real-world problems and have become very popular, especially in geospatial applications such as groundwater potential assessment. Ensemble models are often considered more effective at prediction than single ML approaches (Naghibi et al. 2017). The methodology considers eleven commonly used groundwater-influencing factors, which contributed to the high accuracy of the study. A number of recent studies worldwide have applied AHP and MLAs to investigate groundwater potential (Zabihi et al. 2016; Golkarian et al. 2018; Avand et al. 2019). The main limitation of this study, common to spatial applications such as groundwater potential mapping, is that model outputs must be tested in several study areas before they can be considered generalizable. A further limitation is that the final GWPZ maps may change when new datasets or models are incorporated. The MLA accuracies obtained here compare favorably with those of other studies in the field, which we attribute to the use of random search hyperparameter tuning. The difference between the baseline and the best parameters is clearly distinguishable after tuning. The success of the method depends on several characteristics of the input dataset, as well as on how the algorithms are applied and validated. To improve performance in future studies, we therefore strongly recommend testing additional MLAs and models and including further groundwater-influencing parameters.
To support both groundwater availability assessment and management, we also recommend that future studies evaluate this technique with qualitative indicators in other locations with varied geo-environmental features.

Sustainable development depends on accurate GWPZ evaluation. In areas where data are scarce, RS-based data products can provide valuable information. AHP, RFC, GBC, KNC, and DTC were all used in this investigation, based on eleven thematic maps of factors that influence local groundwater. In general, the study region is characterized by gentle slopes, few lineaments, a wetland character, and homogeneous alluvial-lacustrine deposits, all of which contribute to a high groundwater potential. The eastern corner of the study region is less favorable because of its more diverse geological materials, steeper slopes, and higher lineament density. Geomorphology and rainfall had the greatest influence on GWPZ mapping for all models. The GWPZ map produced with ArcGIS spatial analysis tools depicts five potential zones: very low, low, moderate, high, and very high. GWPZ is high to very high in agricultural, grazing, and wetland areas. The ROC curve was used to validate the predictive ability of the GWPZ maps produced by each method. The results reveal that RFC and GBC outperform the other models in GWPZ prediction. The resulting GWPZ map of this data-scarce and overlooked region (Gambela Plain) can help stakeholders and decision-makers manage and plan the resource efficiently. Given their high efficiency and speed, the approaches used demonstrate the usefulness of MLAs, remote sensing, and GIS in spatial decision-making, notably in groundwater management.

The author thanks all the governmental bodies that supplied the information needed for this research project. The author also thanks Haramaya University for providing the opportunity to pursue PhD studies and for sponsoring the tuition. Finally, the author is grateful to the anonymous reviewer, whose insightful comments greatly improved the quality of the paper.

I certify that the information presented here is true and complete to the best of my knowledge. I declare that this work has not been published elsewhere and has not been submitted to any other journal for publication.

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

All authors contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by T.K.S. and T.A.W. The first draft of the manuscript was written by T.K.S. and edited by T.A.W. T. Alemayehu and T. Ayenew read and commented on previous versions of the manuscript. T.K.S. proofread the final version. All authors read and approved the final manuscript.

All relevant data are included in the paper or its Supplementary Information.

The authors declare that there is no conflict of interest.

Al-Abadi A. M., Fryar A. E., Rasheed A. A. & Pradhan B. 2021 Assessment of groundwater potential in terms of the availability and quality of the resource: a case study from Iraq. Environmental Earth Sciences 80 (12). https://doi.org/10.1007/s12665-021-09725-0.
Alemayehu T., Kebede S., Liu L. & Kebede T. 2017 Basin hydrogeological characterization using remote sensing, hydrogeochemical and isotope methods (the case of Baro-Akobo, Eastern Nile, Ethiopia). Environmental Earth Sciences 76 (13). https://doi.org/10.1007/s12665-017-6773-8.
Andualem T. G. & Demeke G. G. 2019 Groundwater potential assessment using GIS and remote sensing: a case study of Guna tana landscape, upper blue Nile Basin, Ethiopia. Journal of Hydrology: Regional Studies 24. https://doi.org/10.1016/j.ejrh.2019.100610.
Arabameri A., Rezaei K., Cerda A., Lombardo L. & Rodrigo-Comino J. 2019 GIS-based groundwater potential mapping in Shahroud plain, Iran. A comparison among statistical (bivariate and multivariate), data mining and MCDM approaches. Science of the Total Environment 658, 160–177. https://doi.org/10.1016/j.scitotenv.2018.12.115.
Ashwini K., Verma R. K., Sriharsha S., Chourasiya S. & Singh A. 2023 Delineation of groundwater potential zone for sustainable water resources management using remote sensing-GIS and analytic hierarchy approach in the state of Jharkhand, India. Groundwater for Sustainable Development 21, 100908. https://doi.org/10.1016/J.GSD.2023.100908.
Avand M., Janizadeh S., Naghibi S. A., Pourghasemi H. R., Bozchaloei S. K. & Blaschke T. 2019 A comparative assessment of random forest and k-nearest neighbor classifiers for gully erosion susceptibility mapping. Water (Switzerland) 11 (10). https://doi.org/10.3390/w11102076.
Azma A., Narreie E., Shojaaddini A., Kianfar N., Kiyanfar R., Alizadeh S. M. S. & Davarpanah A. 2021 Statistical modeling for spatial groundwater potential map based on GIS technique. Sustainability (Switzerland) 13 (7). https://doi.org/10.3390/su13073788.
Bergstra J. & Bengio Y. 2012 Random search for hyper-parameter optimization. Journal of Machine Learning Research 13.
Berhanu B., Melesse A. M. & Seleshi Y. 2013 GIS-based hydrological zones and soil geo-database of Ethiopia. Catena 104, 21–31. https://doi.org/10.1016/j.catena.2012.12.007.
Beven K., Kirkby M., Freer J. E. & Lamb R. 2021 A history of TOPMODEL. Hydrology and Earth System Sciences 25 (2), 527–549. https://doi.org/10.5194/HESS-25-527-2021.
Campo B., Bohacs K. M. & Amorosi A. 2020 Late Quaternary sequence stratigraphy as a tool for groundwater exploration: lessons from the Po River Basin (northern Italy). AAPG Bulletin 104 (3), 681–710. https://doi.org/10.1306/06121918116.
Czodrowski P. 2014 Count on kappa. Journal of Computer-Aided Molecular Design 28 (11), 1049–1055. https://doi.org/10.1007/s10822-014-9759-6.
Deepa B. & Ramesh K. R. 2022 Epileptic seizure detection using deep learning through min max scaler normalization. International Journal of Health Sciences 6 (S1), 10981–10996. https://doi.org/10.53730/ijhs.v6nS1.7801.
Fieth J. H. 1973 Water Facts and Figures for Planners and Managers. Available from: https://books.google.com.et/books?hl=en&lr=&id=VE7PdFwgKVsC&oi=fnd&pg=PA3&dq=water+facts+&ots=m1AR8nr5Wg&sig=PQzJmilJy-MxlUFbNVcWLy2Q7x0&redir_esc=y#v=onepage&q=water%20facts&f=false
Franek J. & Kresta A. 2014 Judgment scales and consistency measure in AHP. Procedia Economics and Finance 12, 164–173. https://doi.org/10.1016/S2212-5671(14)00332-3.
Gandhi G. M., Parthiban S., Thummalu N. & Christy A. 2015 NDVI: vegetation change detection using remote sensing and GIS – a case study of Vellore District. Procedia Computer Science 57, 1199–1210. https://doi.org/10.1016/j.procs.2015.07.415.
Godfrey S., Hailemichael G. & Serele C. 2019 Deep groundwater as an alternative source of water in the Ogaden Jesoma sandstone aquifers of Somali region, Ethiopia. Water (Switzerland) 11 (8). https://doi.org/10.3390/w11081735.
Golkarian A., Naghibi S. A., Kalantar B. & Pradhan B. 2018 Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS. Environmental Monitoring and Assessment 190 (3). https://doi.org/10.1007/s10661-018-6507-8.
Gómez-Escalonilla V., Vogt M.-L., Destro E., Isseini M., Origgi G., Djoret D., Martínez-Santos P. & Holecz F. 2021 Delineation of groundwater potential zones by means of ensemble tree supervised classification methods in the Eastern Lake Chad basin. https://doi.org/10.1080/10106049.2021.2007298.
Gómez-Escalonilla V., Martínez-Santos P. & Martín-Loeches M. 2022 Preprocessing approaches in machine-learning-based groundwater potential mapping: an application to the Koulikoro and Bamako regions, Mali. Hydrology and Earth System Sciences 26 (2), 221–243. https://doi.org/10.5194/hess-26-221-2022.
Ifediegwu S. I. 2022 Assessment of groundwater potential zones using GIS and AHP techniques: a case study of the Lafia district, Nasarawa State, Nigeria. Applied Water Science 12 (1), 1–17. https://doi.org/10.1007/S13201-021-01556-5.
Jannis E., Adrien M., Annette A. & Peter H. 2021 Climate change effects on groundwater recharge and temperatures in Swiss alluvial aquifers. Journal of Hydrology X 11, 100071. https://doi.org/10.1016/J.HYDROA.2020.100071.
Jothimani M., Abebe A. & Duraisamy R. 2021 Groundwater potential zones identification in Arba Minch town, Rift Valley, Ethiopia, using geospatial and AHP tools. IOP Conference Series: Earth and Environmental Science 822 (1). https://doi.org/10.1088/1755-1315/822/1/012048.
Lindsay J. B., Newman D. R. & Francioni A. 2019 Scale-optimized surface roughness for topographic analysis. Geosciences (Switzerland) 9 (7). https://doi.org/10.3390/geosciences9070322.
MacDonald A. M., Bonsor H. C., Dochartaigh B. É. Ó. & Taylor R. G. 2012 Quantitative maps of groundwater resources in Africa. Environmental Research Letters 7 (2). https://doi.org/10.1088/1748-9326/7/2/024009.
Magesh N. S., Chandrasekar N. & Soundranayagam J. P. 2012 Delineation of groundwater potential zones in Theni district, Tamil Nadu, using remote sensing, GIS and MIF techniques. Geoscience Frontiers 3 (2), 189–196. https://doi.org/10.1016/j.gsf.2011.10.007.
Manna F., Allocca V., De Vita P., Medici G. & Langman J. B. 2022 Pathways and estimate of aquifer recharge in a flood basalt terrain; a review from the South Fork Palouse River Basin (Columbia River Plateau, USA). Sustainability 14 (18), 11349. https://doi.org/10.3390/SU141811349.
Maskooni E. K., Naghibi S. A., Hashemi H. & Berndtsson R. 2020 Application of advanced machine learning algorithms to assess groundwater potential using remote sensing-derived data. Remote Sensing 12 (17), 2742. https://doi.org/10.3390/RS12172742.
Mengesha T., Tadiwos C. & Workineh H. 1996 Geological Survey of Ethiopia.
Morbidelli R., Saltalippi C., Flammini A. & Govindaraju R. S. 2018 Role of slope on infiltration: a review. Journal of Hydrology 557, 878–886. https://doi.org/10.1016/j.jhydrol.2018.01.019.
Mukherjee P., Singh C. K. & Mukherjee S. 2012 Delineation of groundwater potential zones in arid region of India – a remote sensing and GIS approach. Water Resources Management 26 (9), 2643–2672. https://doi.org/10.1007/s11269-012-0038-9.
Naghibi S. A., Pourghasemi H. R. & Dixon B. 2016 GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environmental Monitoring and Assessment 188 (1), 1–27. https://doi.org/10.1007/S10661-015-5049-6.
Naghibi S. A., Ahmadi K. & Daneshi A. 2017 Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resources Management 31 (9), 2761–2775. https://doi.org/10.1007/s11269-017-1660-3.
Naghibi S. A., Dolatkordestani M., Rezaei A., Amouzegari P., Heravi M. T., Kalantar B. & Pradhan B. 2019 Application of rotation forest with decision trees as base classifier and a novel ensemble model in spatial modeling of groundwater potential. Environmental Monitoring and Assessment 191 (4), 1–20. https://doi.org/10.1007/S10661-019-7362-Y.
Naghibi S. A., Hashemi H., Berndtsson R. & Lee S. 2020 Application of extreme gradient boosting and parallel random forest algorithms for assessing groundwater spring potential using DEM-derived factors. Journal of Hydrology 589, 125197. https://doi.org/10.1016/J.JHYDROL.2020.125197.
Nguyen P. T., Ha D. H., Nguyen H. D., Phong T. V., Trinh P. T., Al-Ansari N., Van Le H., Pham B. T., Ho L. S. & Prakash I. 2020 Improvement of credal decision trees using ensemble frameworks for groundwater potential modeling. Sustainability (Switzerland) 12 (7). https://doi.org/10.3390/su12072622.
Ostad-Ali-Askari K., Shayannejad M. & Ghorbanizadeh-Kharazi H. 2017 Artificial neural network for modeling nitrate pollution of groundwater in marginal area of Zayandeh-rood River, Isfahan, Iran. KSCE Journal of Civil Engineering 21 (1), 134–140. https://doi.org/10.1007/s12205-016-0572-8.
Ozyavuz M., Bilgili B. C. & Salici A. 2015 Determination of vegetation changes with NDVI method. Journal of Environmental Protection and Ecology 16 (1), 264–273.
Pandey H. K., Singh V. K. & Singh S. K. 2022 Multi-criteria decision making and Dempster-Shafer model-based delineation of groundwater prospect zones from a semi-arid environment. Environmental Science and Pollution Research 29 (31), 47740–47758. https://doi.org/10.1007/S11356-022-19211-0.
Rahmati O., Pourghasemi H. R. & Melesse A. M. 2016 Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran Region, Iran. CATENA 137, 360–372. https://doi.org/10.1016/J.CATENA.2015.10.010.
Rajaveni S. P., Brindha K. & Elango L. 2017 Geological and geomorphological controls on groundwater occurrence in a hard rock region. Applied Water Science 7 (3), 1377–1389. https://doi.org/10.1007/s13201-015-0327-6.
Ravichandran R., Ayyavoo R., Rajangam L., Madasamy N., Murugaiyan B. & Shanmugam S. 2022 Identification of groundwater potential zone using analytical hierarchical process (AHP) and multi-criteria decision analysis (MCDA) for Bhavani river basin, Tamil Nadu, southern India. Groundwater for Sustainable Development 18, 100806. https://doi.org/10.1016/J.GSD.2022.100806.
Razandi Y., Pourghasemi H. R., Neisani N. S. & Rahmati O. 2015 Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Science Informatics 8 (4), 867–883. https://doi.org/10.1007/s12145-015-0220-8.
Regenspurg S., Alawi M., Blöcher G., Börger M., Kranz S., Norden B., Saadat A., Scheytt T., Virchow L. & Vieth-Hillebrand A. 2018 Impact of drilling mud on chemistry and microbiology of an Upper Triassic groundwater after drilling and testing an exploration well for aquifer thermal energy storage in Berlin (Germany). Environmental Earth Sciences 77 (13). https://doi.org/10.1007/s12665-018-7696-8.
Ruidas D., Pal S. C., Islam A. R. M. T. & Saha A. 2021 Characterization of groundwater potential zones in water-scarce hardrock regions using data driven model. Environmental Earth Sciences 80 (24). https://doi.org/10.1007/S12665-021-10116-8.
Sachdeva S. & Kumar B. 2021 Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India. Stochastic Environmental Research and Risk Assessment 35 (2), 287–306. https://doi.org/10.1007/S00477-020-01891-0.
Sameen M. I., Pradhan B. & Lee S. 2019 Self-learning random forests model for mapping groundwater yield in data-scarce areas. Natural Resources Research 28 (3), 757–775. https://doi.org/10.1007/s11053-018-9416-1.
Seifu T. K., Ayenew T., Woldesenbet T. A. & Alemayehu T. 2022 Identification of groundwater potential sites in the drought-prone area using geospatial techniques at Fafen-Jerer sub-basin, Ethiopia. https://doi.org/10.1080/24749508.2022.2141993.
Seifu T., Ayenew T. T. A. & Woldesenbet T. A. 2023 Groundwater Potential Mapping Using GIS and Remote Sensing with Multi-Criteria Decision-Making in Shinile Sub-Basin, Eastern Ethiopia.
Shayannejad M., Ghobadi M. & Ostad-Ali-Askari K. 2022 Modeling of surface flow and infiltration during surface irrigation advance based on numerical solution of Saint–Venant Equations Using Preissmann's scheme. Pure and Applied Geophysics 179 (3), 1103–1113. https://doi.org/10.1007/S00024-022-02962-9.
Starbuck C. A., Dickson B. G. & Chambers C. L. 2022 Informing wind energy development: land cover and topography predict occupancy for Arizona bats. PLoS ONE 17 (6), e0268573. https://doi.org/10.1371/JOURNAL.PONE.0268573.
Tanimu J. J., Hamada M., Hassan M., Kakudi H. A. & Abiodun J. O. 2022 A machine learning method for classification of cervical cancer. Electronics (Switzerland) 11 (3). https://doi.org/10.3390/electronics11030463.
Thanh N. N., Chotpantarat S., Trung N. H., Ngu N. H. & Muoi L. V. 2022 Mapping groundwater potential zones in Kanchanaburi Province, Thailand by integrating of analytic hierarchy process, frequency ratio, and random forest. Ecological Indicators 145, 109591. https://doi.org/10.1016/J.ECOLIND.2022.109591.
Tyralis H., Papacharalampous G. & Langousis A. 2019 A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11 (5), 910. https://doi.org/10.3390/W11050910.
Worqlul A. W., Jeong J., Dile Y. T., Osorio J., Schmitter P., Gerik T., Srinivasan R. & Clark N. 2017 Assessing potential land suitable for surface irrigation using groundwater in Ethiopia. Applied Geography 85, 1–13. https://doi.org/10.1016/j.apgeog.2017.05.010.
Yamusa I. B., Yamusa Y. B., Danbatta U. A. & Najime T. 2018 Geological and structural analysis using remote sensing for lineament and lithological mapping. IOP Conference Series: Earth and Environmental Science 169 (1). https://doi.org/10.1088/1755-1315/169/1/012082.
Yang R., Suhail H. A., Gourbet L., Willett S. D., Fellin M. G., Lin X., Gong J., Wei X., Maden C., Jiao R. & Chen H. 2020 Early Pleistocene drainage pattern changes in Eastern Tibet: constraints from provenance analysis, thermochronometry, and numerical modeling. Earth and Planetary Science Letters 531, 115955. https://doi.org/10.1016/J.EPSL.2019.115955.
Zabihi M., Pourghasemi H. R., Pourtaghi Z. S. & Behzadfar M. 2016 GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environmental Earth Sciences 75 (8). https://doi.org/10.1007/s12665-016-5424-9.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data