Determining groundwater potential is vital for groundwater resource management. This study presents a comparative analysis of three widely used ensemble techniques (averaging, bagging, and boosting) in groundwater spring potential mapping. Firstly, 12 spring-related factors and a total of 75 groundwater spring locations were collected and used as the dataset. Secondly, three typical ensemble models were adopted to predict groundwater spring potential, namely, Bayesian model averaging (BMA), random forest (RF), and the gradient boosting decision tree (GBDT). The area under the receiver operating characteristics curve (AUC) and four statistical indexes (accuracy, sensitivity, specificity, and the root mean square error (RMSE)) were used to estimate the models' accuracy. The results indicate that the three models had a good predictive performance and that the AUC values of the GBDT, RF, and BMA were 0.88, 0.84, and 0.78, respectively. Furthermore, the GBDT had the best performance (accuracy = 0.89, sensitivity = 0.91, specificity = 0.87, and RMSE = 0.33) in terms of the four indexes, followed by RF (accuracy = 0.87, sensitivity = 0.91, specificity = 0.83, and RMSE = 0.36) and BMA (accuracy = 0.76, sensitivity = 0.87, specificity = 0.65, and RMSE = 0.49). This research can provide effective guidance for using ensemble models for mapping groundwater spring potential in the future.

  • Ensemble machine learning algorithms were compared and used to identify groundwater spring potential zones.

  • The Bayesian model averaging can be selected to map groundwater spring potential.

  • The three ensemble models show good predictive performance.

Graphical Abstract

Groundwater is a vital water source for domestic, agricultural, and industrial water supplies all over the world. More than half the world's population depends on groundwater, especially in arid and semi-arid regions (Arabameri et al. 2019). Owing to the rapidly increasing demands of the population and the quick pace of industrialization, the demand for groundwater resources has increased, which can cause a water shortage (Rizeei et al. 2019). Moreover, excessive pumping of groundwater leads to a drawdown of groundwater and deterioration of groundwater quality. As a result, it is necessary to identify groundwater potential zones (Pal et al. 2020). Groundwater springs usually emerge in regions with a high level of groundwater and high groundwater storage potential (Kamali Maskooni et al. 2020; Yousefi et al. 2020). Therefore, evaluating groundwater spring potential (GSP) has been effective in identifying the state of groundwater resources and implementing a successful groundwater determination, protection, and management program (Naghibi et al. 2020).

GSP is commonly defined as the likelihood of groundwater springs in an area (Yousefi et al. 2020). The traditional tools to determine groundwater potential zones are mainly related to geophysical, geological, and hydrological methods. Although these methods can precisely estimate the groundwater potential zone, they can be time-consuming and costly (Rizeei et al. 2019). The geographic information system (GIS) and remote sensing (RS) are very practical for managing spatial data and preparing thematic maps commonly employed in various modeling techniques to evaluate GSP (Chowdhury et al. 2009; Kamali Maskooni et al. 2020). Various knowledge-driven approaches and statistical techniques have been used to map GSP, such as the analytical hierarchy process, frequency ratio, the logistic regression method, weights of evidence, multi-criteria decision analysis, index of entropy, and the evidential belief function (Chowdhury et al. 2009; Dilekoglu & Aslan 2022). These methods are relatively easy to implement, especially in large-scale regions. However, they are ineffective in dealing with the complex relationship between the conditioning factors and GSP (Chen et al. 2020; Naghibi et al. 2020; Mosavi et al. 2021). GSP depends on several surface and subsurface factors, so its delineation is a complex task (Misi et al. 2018; Patidar et al. 2021; Dilekoglu & Aslan 2022).

In the past ten years, machine learning methods have been viewed as potential cost-effective tools to predict the spatial probability of GSP. Various researchers have developed many machine learning approaches, including the artificial neural network, classification and regression tree, logistic regression, random forest (RF), boosted regression trees, adaptive boosting classification trees, support vector machine, and deep neural networks (Chowdhury et al. 2009; Rahmati et al. 2016; Nguyen et al. 2020; Patidar et al. 2021). Several related studies have shown that machine learning methods can deal with complex and high-dimensional data and usually outperform conventional methods for mapping GSP (Kamali Maskooni et al. 2020; Naghibi et al. 2020; Zounemat-Kermani et al. 2021).

To avoid the bias of a single machine learning method in spatial modeling fields, the use of ensemble machine learning techniques has grown rapidly in recent years (Rizeei et al. 2019; Kamali Maskooni et al. 2020; Yousefi et al. 2020; Patidar et al. 2021; Rashki Ghaleh Nou & Azhdary Moghaddam 2021). Ensemble learning started in the 1990s. It combines two or more base classifiers to create a new, more powerful model (Zounemat-Kermani et al. 2021). The results of related research indicate that ensemble methods can obtain more reliable prediction results than single predictive models and have been successfully applied in similar studies, such as landslide susceptibility, flood susceptibility, and groundwater pollution assessment (Hong et al. 2018; Sachdeva & Kumar 2021). The use of ensemble models is a developing trend in delineating GSP (Yousefi et al. 2020; Mosavi et al. 2021). Generally, there are three representative types of ensemble methods, i.e., averaging, bagging, and boosting (Zounemat-Kermani et al. 2021). For instance, Rahmati et al. (2016) introduced the RF model for assessing GSP and reported good performance results of the ensemble model. Kamali Maskooni et al. (2020) applied a boosting algorithm, i.e., gradient boosting decision tree (GBDT), and a statistical method for assessing groundwater potential. The results indicated that GBDT was a more suitable candidate for mapping GSP.

The application of hybrid models indicates that the optimum ensemble algorithm should be explored in different regions or for different evaluation objects. As can be seen from a review of related literature, a limited number of studies compared the three types of ensemble techniques (averaging, bagging, and boosting). For example, Bayesian model averaging (BMA) as a typical averaging algorithm was rarely used to map GSP but had been successfully employed in other fields (Mosavi et al. 2021). Thus, it is necessary to further discuss the application of advanced ensemble models in GSP.

This research aims to introduce three popular ensemble techniques (averaging, bagging, and boosting) and compare their performance when selecting the best model to assess GSP. Taking Chengde City in China as the case study, three typical ensemble methods were employed to map its GSP, including averaging (BMA), bagging (RF), and boosting (the GBDT). The objectives of this research are: (1) to explore the three ensemble models used for mapping GSP, (2) to discuss the applicability of ensemble techniques, and (3) to generate the reliable groundwater spring potential zone in the study area.

The study area is located in the south of Chengde City (Hebei Province, China), which consists of Xinglong, Kuancheng, and Pingquan County, thus covering an area of approximately 8,372 km² (Figure 1). The area is characterized by a semi-arid climate, which varies greatly from north to south. The average annual precipitation over the last 20 years was 600 mm and the annual average temperature equals 7.5 °C. The terrain is high in the northwest and low in the southeast, mainly comprising three geomorphological compositions, the plateau, the mountainous land, and the basin. The elevation of the study area varies from 100 to 2,500 m above sea level.
Figure 1

The study area.

The geological deposits of the area are characterized by many outcrops from the Archaeozoic to the Cenozoic. The groundwater in the study area is a vital resource for industrial, agricultural, and domestic purposes (Tian et al. 2019). The main aquifers in the area can be divided into pore water in loose sediments and fissure water in bedrock, which is generally unconfined. The pore water in loose Quaternary sediments can be found in the river valley and the intermountain basin in the study area, which is also the main aquifer. The groundwater system in fractured rock mainly consists of the clastic rock fissure aquifer and the carbonate bedrock aquifer. Precipitation is the primary way for groundwater to refill, followed by river leakage and irrigation infiltration. Springs are an important point of exit for fissure groundwater systems. Due to intensive groundwater exploitation, the drawdown of the water table is serious. However, research conducted in the study area indicates that the resources of natural mineral water were abundant but had a high level of strontium (Sr), a low level of sodium, and low alkalinity (Wang et al. 2021). Therefore, the delineation of the GSP zone can help in managing the water supply in the study area, especially natural mineral water.

The methodology was divided into three stages: (1) the preparation of the dataset, including spring locations, non-spring locations and groundwater spring condition factors; (2) the construction and comparison of the three ensemble machine learning methods for mapping GSP; and (3) the selection of the most suitable model to map GSP in the study area. Conditioning factors maps were converted in GIS to a raster format for the input dataset of the ensemble models. The ensemble models were implemented in the Python 3.7 environment. The related code in this study is mainly based on the scikit-learn package, which is an open source Python module. Moreover, k-fold cross-validation and the trial-and-error process were used to find the optimum parameters of the ensemble models. The detailed flowchart of the proposed methodology is presented in Figure 2.
Figure 2

Flow chart of the methodology.
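The k-fold cross-validation used for parameter tuning can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' scikit-learn code; the function name `k_fold_indices`, the seed, and the choice of 104 training points (52 springs plus 52 non-springs) are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

# Assumed training set size: 52 springs + 52 non-springs.
splits = list(k_fold_indices(104, k=10))
print(len(splits), [len(val) for _, val in splits])
```

Each candidate parameter setting would be scored on every validation fold and the scores averaged, so that the chosen setting does not depend on a single split.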

Data collection and preparation

Spring inventory map

The locations of the springs were finally identified based on extensive field surveys and collected documents. A total of 75 spring locations were found and presented in a raster format (Figure 1). The springs were randomly split into the training set (70%, 52 springs) and the validation dataset (30%, 23 springs). Similarly, 75 locations without springs were randomly selected from these areas, as shown in Figure 1. These locations were also divided into the training and validation dataset using the same proportion.
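The 70/30 split described above can be reproduced in a few lines. The sketch below is illustrative only: the point identifiers, the seed, and the helper name `split_points` are hypothetical, and the real inventory consists of mapped locations rather than integer labels.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle the points and split them into training and validation sets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # rounds down: 75 * 0.7 -> 52
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the mapped locations.
springs = [("spring", i) for i in range(75)]
non_springs = [("non-spring", i) for i in range(75)]

train_springs, val_springs = split_points(springs)
train_non, val_non = split_points(non_springs, seed=43)
print(len(train_springs), len(val_springs), len(train_non), len(val_non))
```

Splitting each class separately keeps the training and validation sets balanced between spring and non-spring points.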

Conditioning factors

It is vital to select the conditioning factors of GSP. Based on a literature review, occurrence of springs, and the availability of data in the study area (Chen et al. 2020; Naghibi et al. 2020; Yousefi et al. 2020; Sarkar et al. 2022), this research selected 12 groundwater spring conditioning variables. The selected factors include the altitude, slope angle, aspect, plan curvature, profile curvature, topographic wetness index (TWI), normalized difference vegetation index (NDVI), lithology, distance to faults, distance to rivers, rainfall, and land use. All factors were converted into a thematic map with a grid size of 30 × 30 m. Furthermore, the kriging spatial interpolation technique was selected to transform the discrete data into a continuous surface.

Surface topography has a direct and indirect influence on flow direction and accumulation, which can affect the occurrence of springs (Chen et al. 2020; Mosavi et al. 2021). Higher elevation generates higher slope degrees and lower infiltration rates, which are vital for the occurrence of springs. The altitude of the study area varies between 100 and 2,200 m (Figure 3(a)). Because the slope has an important effect on the infiltration rate and surface runoff, it was also used in this study (Figure 3(b)) (Kamali Maskooni et al. 2020). Aspect represents the dominant direction of the slope and influences infiltration rates, i.e., the slopes facing the north have more potential for a spring to occur on them (Kamali Maskooni et al. 2020). The aspect map of the entire area was classified into nine categories, including flat (−1), north (337.5–360 and 0–22.5), northeast (22.5–67.5), east (67.5–112.5), southeast (112.5–157.5), south (157.5–202.5), southwest (202.5–247.5), west (247.5–292.5), and northwest (292.5–337.5) (Figure 3(c)). Related research indicates that the curvature mainly affects the divergence and convergence, acceleration and deceleration of the flow (Naghibi et al. 2020). Thus, the plan and profile curvatures were considered in this research, as shown in Figures 3(d) and 3(e), respectively. The TWI was used to demonstrate soil moisture and calculate the accumulation of water, playing an important role in the spatial diversity of hydrological conditions (Arabameri et al. 2019). Thus, this factor was used to determine the occurrence of springs and identify the location of saturated zones (Figure 3(f)), defined by Equation (1) (Yousefi et al. 2020). The elevation, slope, aspect, curvature, and TWI were provided in a GIS environment using a digital elevation model (DEM). DEM data with a 30 × 30 m resolution was derived from the geospatial data cloud.
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{1}$$
where $A_s$ is the specific catchment area and $\beta$ is the slope gradient.
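As a rough illustration, the TWI of a single grid cell can be computed from its specific catchment area and slope using the standard definition, the natural logarithm of the specific catchment area divided by the tangent of the slope. The sketch assumes the slope is given in degrees and clamps it to a small minimum so that flat cells do not cause a division by zero; both the function name `twi` and the `min_slope_deg` parameter are assumptions, not part of the study.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(As / tan(beta)) for one grid cell."""
    # Assumption: clamp very flat cells so tan(beta) never reaches zero.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

# Same catchment area, gentle versus steep slope.
gentle = twi(1000.0, 5.0)
steep = twi(1000.0, 30.0)
print(round(gentle, 2), round(steep, 2))
```

For a fixed catchment area, the gentler slope yields the higher index, which matches the interpretation of TWI as a proxy for where water tends to accumulate.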
Figure 3

Spatial distribution of the parameters: (a) elevation, (b) slope angle, (c) aspect, (d) plan curvature, (e) profile curvature, (f) TWI, (g) NDVI, (h) lithology, (i) distance to faults, (j) distance to rivers, (k) rainfall, (l) land use.

The NDVI indicates the status of vegetation and is frequently identified as a significant factor affecting the distribution of groundwater spring locations (Pal et al. 2020). The parameter of the NDVI was calculated with red and near-infrared bands using Landsat 8 images from 2016. Figure 3(g) presents the NDVI distribution in the study area.
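The NDVI is the normalized difference of the near-infrared and red reflectances, (NIR − Red) / (NIR + Red); for Landsat 8 these correspond to bands 5 and 4. A minimal per-pixel sketch is shown below. The reflectance values are hypothetical, and returning 0 when both bands are zero is an assumption made to avoid a division by zero.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat empty pixels as neutral
    return (nir - red) / denominator

# Hypothetical surface reflectances: dense vegetation versus bare soil.
vegetated = ndvi(nir=0.50, red=0.10)
bare = ndvi(nir=0.25, red=0.20)
print(round(vegetated, 3), round(bare, 3))
```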

Groundwater storage is governed by lithological units directly affecting the groundwater potential zone (Yousefi et al. 2020). The lithological classes of the study area were obtained through boreholes and a geological map on a 1:100,000 scale (Figure 3(h)). The rock around fault structures was severely fragmented and percolated a huge amount of water towards the underground. Springs usually emerge near the fault zone in the study area. The shorter the distance to faults, the higher the potential of infiltration. The location of faults was extracted from geological maps on a 1:100,000 scale. The map of the distance to faults was produced by a buffering method in the GIS environment (Figure 3(i)).

Rivers are one of the main natural sources that can refill groundwater systems, especially in semi-arid and arid areas. When an area is nearer to rivers, there is a higher chance for springs to occur in it (Chen et al. 2020). Thus, the distance to rivers was obtained in the GIS environment using a buffering technique, as shown in Figure 3(j). Rainfall is a primary source of groundwater recharge in this region. Therefore, rainfall is also considered as a significant hydrologic factor, which can directly influence water infiltration and the water table. The average annual rainfall (from 2000 to 2015) was collected from five precipitation stations in the study area (Figure 3(k)).

Land use can describe the status of ecological conditions and reflect anthropological activities, which can impact the groundwater resources (Yousefi et al. 2020). Since the type of land cover indirectly influences the recharge rates and soil infiltration capacity, it can also influence GSP. The patterns of land use in Chengde City in 2016 were included in the research and classified into five types, namely, agricultural land and grassland, forest, waterbody, residential land, and barren land (Figure 3(l)).

BMA

BMA is a statistical analysis method and one of the most efficient and well-known averaging approaches, which takes into account model uncertainty in the Bayesian framework (Zounemat-Kermani et al. 2021). In Bayesian statistics, BMA infers the probability distribution of the forecast variable using the posterior probability of each ensemble member. It differs from other model averaging techniques because it combines the predictions of each model based on the relative performance of the selected base learners during the training period, which is why it has been successfully applied in various fields (Raftery et al. 2005). Furthermore, the key advantage of this ensemble algorithm is its ability to obtain optimized weights of each base learner by maximizing the merits of different methods (Raftery et al. 2005). Therefore, BMA was selected as an averaging ensemble technique to assess GSP. In this research, the expectation maximization (EM) algorithm was used to estimate the optimized weights and variances of every ensemble member by maximizing the log-likelihood function.
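The EM estimation of BMA weights and variances can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a Gaussian predictive density around each member's prediction (a common choice in the BMA literature), the base learners and their outputs are hypothetical, and a small variance floor is added for numerical safety.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bma_em(member_preds, observed, n_iter=50, var_floor=1e-6):
    """Estimate BMA weights and per-member variances; member_preds[k][i] is member k on sample i."""
    n_members, n = len(member_preds), len(observed)
    weights = [1.0 / n_members] * n_members
    # Start each variance at the member's mean squared error.
    variances = [
        max(sum((o - p) ** 2 for o, p in zip(observed, preds)) / n, var_floor)
        for preds in member_preds
    ]
    for _ in range(n_iter):
        # E-step: responsibility z[k][i] of member k for sample i.
        z = [[0.0] * n for _ in range(n_members)]
        for i in range(n):
            dens = [weights[k] * gaussian_pdf(observed[i], member_preds[k][i], variances[k])
                    for k in range(n_members)]
            total = sum(dens) or 1e-300
            for k in range(n_members):
                z[k][i] = dens[k] / total
        # M-step: update the weights and variances that maximize the expected log-likelihood.
        for k in range(n_members):
            zk = sum(z[k])
            weights[k] = zk / n
            sq = sum(z[k][i] * (observed[i] - member_preds[k][i]) ** 2 for i in range(n))
            variances[k] = max(sq / max(zk, 1e-300), var_floor)
    return weights, variances

def bma_predict(weights, member_outputs):
    """BMA prediction: weighted average of the members' outputs."""
    return sum(w * p for w, p in zip(weights, member_outputs))

# Hypothetical example: member A is informative, member B always predicts 0.5.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
member_a = [0.9 if y == 1 else 0.1 for y in observed]
member_b = [0.5] * len(observed)
weights, variances = bma_em([member_a, member_b], observed)
print([round(w, 3) for w in weights])
```

In this toy case the informative member receives almost all of the weight, which is the behaviour the weighting scheme is designed to produce.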

RF

Bagging is an ensemble technique that generates a unique bootstrap sample from the original training dataset for each decision tree. RF is a supervised ensemble machine learning algorithm and an efficient bagging ensemble model based on decision tree algorithms used for classification and regression problems (Breiman 2001; Rahmati et al. 2016). In assessing GSP, the RF algorithm operates by building many classification trees, each of which produces an estimate of the classes during the training period, while bootstrap sampling with replacement is performed to create a new training set for each tree. The data not selected in a bootstrap sample are often called out-of-bag samples. The prediction error and validation are calculated based on these samples. The bootstrapping technique combined with the bagging algorithm helps the RF model avoid over-fitting problems.
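The bootstrap step and the resulting out-of-bag samples can be illustrated with the standard library alone. The sample size of 104 and the seed are assumptions for the example; in an actual RF, one such sample would be drawn for every tree.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement and list the indices never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
in_bag, out_of_bag = bootstrap_sample(104, rng)
print(len(in_bag), len(set(in_bag)), len(out_of_bag))
```

On average about 37% of the training points are left out of each bootstrap sample, which is why the out-of-bag points can serve as an internal validation set for the tree grown on that sample.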

The key parameters of the RF model are the number of variables included in each classification tree and the number of trees (Breiman 2001). The RF algorithm uses the Gini impurity as a measure for the best split selection. The value of the Gini index can measure the impurity of a given element in relation to the rest of the classes. The number of trees and variables in this research were selected based on previous research (Rahmati et al. 2016; Naghibi et al. 2020). Likewise, the 10-fold cross-validation scheme and the grid search method were used to determine the suitable number of trees and variables.
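The Gini impurity used for split selection can be written in a few lines. The sketch below shows the impurity of a node and the weighted impurity of a candidate split; the label lists are hypothetical (1 for spring, 0 for non-spring).

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

print(gini_impurity([1, 1, 0, 0]))         # evenly mixed node
print(split_impurity([1, 1], [0, 0]))      # split that separates the classes
print(split_impurity([1, 0], [1, 0]))      # split that separates nothing
```

A tree chooses, among the candidate splits, the one with the lowest weighted impurity, so the second split above would be preferred over the third.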

GBDT

Boosting is a hybrid multi-classifier combination algorithm and can be used for both classification and regression tasks. The main difference between it and the bagging ensemble techniques is the relevance of weak base learners. In boosting, each subsequent classifier is trained based on the classification result from the previous base learner (Liang et al. 2021). The main objective of the boosting technique is to combine several weak learners into a strong classifier through multiple iterations. In contrast, the base learners in bagging are independent of each other, so the dataset of each base learner is created from the sample sets by the bootstrap resampling technique (Naghibi et al. 2020).

GBDT is an ensemble learning method with a decision tree as the base classifier. It is also one of the most effective ensemble techniques in the boosting family (Friedman 2001). In essence, the residual of the previous base learner is selected as the input for the next base learner (Liang et al. 2021). The main characteristic of the GBDT algorithm is its training strategy. The value of the loss function of the current iteration decreases along the gradient of the reducing residuals from the previous step (Sachdeva & Kumar 2021). At the same time, the input dataset of the new base learner is changed by adjusting the weights of misclassified samples from the previous learner. The iteration process is not terminated until the number of iterations and the error value reach the pre-established condition. The weighted outputs of the weak classifiers are summed to produce the final result.
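The residual-fitting idea behind gradient boosting can be sketched with one-split trees (stumps) on a single feature. This is a deliberately simplified illustration under stated assumptions: it uses a squared-error loss, for which the negative gradient is exactly the residual, whereas a GBDT classifier would normally use a log-loss and multi-feature trees; the data, the learning rate, and the number of rounds are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split regression tree (stump) on a single feature."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty (needs >= 2 distinct values).
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.1):
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def gbdt_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical values of one conditioning factor
ys = [0, 0, 0, 1, 0, 1, 1, 1]   # 1 = spring, 0 = non-spring
model = fit_gbdt(xs, ys)
mse = sum((y - gbdt_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(round(mse, 3))
```

The learning rate shrinks each stump's contribution, so many small corrections are accumulated rather than a few large ones; the training error falls below that of the constant starting prediction.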

Validation and comparison

The validation and comparison process is an essential step in the development and application of multiple predictive models. Several indices have been proposed and used for this purpose. In this study, the performance of ensemble models was assessed by utilizing the receiver operating characteristics (ROC) curve, accuracy, sensitivity, specificity, and the root mean square error (RMSE) (Arabameri et al. 2019; Yousefi et al. 2020).

The ROC curve is one of the most commonly used methods to evaluate the performance of supervised machine learning models (Liang et al. 2021). In the ROC method, the area under the curve (AUC) is used to evaluate the accuracy of models. In general, the relationship between the AUC and model performance can be described as 0.5–0.6 (low accuracy), 0.6–0.7 (moderate accuracy), 0.7–0.8 (high accuracy), 0.8–0.9 (very high accuracy), and 0.9–1 (excellent accuracy) (Hong et al. 2018; Arabameri et al. 2019).
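The AUC equals the probability that a randomly chosen spring point receives a higher predicted score than a randomly chosen non-spring point, which gives a direct way to compute it. The sketch below uses that pairwise definition and maps the result to the accuracy classes listed above; the scores are hypothetical, and assigning each boundary value to the higher class is an assumption, since the ranges in the text share their end points.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_category(auc):
    """Accuracy class for an AUC value (boundary handling is an assumption)."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very high"
    if auc >= 0.7:
        return "high"
    if auc >= 0.6:
        return "moderate"
    if auc >= 0.5:
        return "low"
    return "below chance"

labels = [1, 1, 0, 0]              # 1 = spring, 0 = non-spring
scores = [0.9, 0.4, 0.6, 0.1]      # hypothetical predicted probabilities
auc = roc_auc(labels, scores)
print(auc, auc_category(auc))
```

The pairwise form is quadratic in the number of points, which is acceptable for a validation set of a few dozen locations.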

In addition, this research uses four statistical indexes to compare the performance of the ensemble models, calculated with Equations (2)–(5) (Rizeei et al. 2019; Chen et al. 2020). Accuracy represents the proportion of the total number of correctly classified points. Sensitivity is the proportion of spring pixels correctly classified as springs, while specificity is the proportion of non-spring pixels correctly classified as non-springs. RMSE is used to evaluate differences between the observed sample values and the predicted values. Overall, values of accuracy, sensitivity, and specificity closer to 1 indicate that the model has good performance, while a lower value of RMSE indicates better accuracy (Hong et al. 2018; Sachdeva & Kumar 2021).
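$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - y_i\right)^2} \tag{5}$$
where true positive ($TP$) and true negative ($TN$) are the numbers of pixels that are correctly classified as spring and non-spring locations, respectively. False positive ($FP$) and false negative ($FN$) are the numbers of pixels erroneously classified. $O_i$ is the observed value of the $i$-th sample, $y_i$ is the corresponding predicted value, and $N$ is the total number of samples.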
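The four indexes follow directly from the confusion-matrix counts and the paired observed and predicted values. In the sketch below, the counts are illustrative and not taken from the paper; they were chosen for a validation set of 23 spring and 23 non-spring points so that the rounded outputs coincide with the values reported for the GBDT.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - y) ** 2 for o, y in zip(observed, predicted)) / n)

# Illustrative counts (assumption): 23 springs and 23 non-springs in validation.
metrics = classification_metrics(tp=21, tn=20, fp=3, fn=2)
print({name: round(value, 2) for name, value in metrics.items()})

# RMSE on a small hypothetical example with one wrong hard label out of four.
print(rmse([1, 0, 1, 0], [1, 0, 0, 0]))
```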

Multicollinearity analysis

The independence of the 12 selected factors in the ensemble model is vital to ensure model accuracy and avoid over-fitting problems (Mosavi et al. 2021). To diagnose potential collinearity problems among the factors, two commonly used indicators were calculated in this research, namely, the tolerance (TOL) and variance inflation factor (VIF), as presented in Equation (6) (Arabameri et al. 2019; Mosavi et al. 2021).
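$$\mathrm{TOL} = 1 - R^2, \qquad \mathrm{VIF} = \frac{1}{\mathrm{TOL}} \tag{6}$$
where $R^2$ is the R-squared value obtained by regressing one factor on the remaining factors.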

When the VIF value is greater than 5 and TOL is less than 0.1, a strong multicollinearity problem occurs among the predicted factors (Arabameri et al. 2019). The results are presented in Table 1. Elevation had the highest VIF value (1.803) and the lowest TOL value (0.555). Overall, there was no obvious multicollinearity among the 12 factors, so they were proven suitable for the further analysis of ensemble models.
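The two indicators are simple transformations of the R-squared value obtained when one factor is regressed on the others: TOL is one minus R-squared, and VIF is the reciprocal of TOL. The sketch below applies them to the elevation factor; the R-squared value is back-calculated from the reported TOL for illustration, and the function names and threshold parameters are assumptions that mirror the rule stated in the text.

```python
def tolerance_and_vif(r_squared):
    """Return (TOL, VIF) for a factor given its R-squared against the other factors."""
    tol = 1.0 - r_squared
    return tol, 1.0 / tol

def has_strong_collinearity(tol, vif, vif_limit=5.0, tol_limit=0.1):
    """Rule as stated in the text: VIF above the limit and TOL below the limit."""
    return vif > vif_limit and tol < tol_limit

# Elevation: R-squared back-calculated from the reported TOL of 0.555 (illustrative).
tol, vif = tolerance_and_vif(1.0 - 0.555)
print(round(tol, 3), round(vif, 3), has_strong_collinearity(tol, vif))
```

The small difference between the computed VIF and the tabulated 1.803 comes from the rounding of TOL to three decimals.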

Table 1

The results of the multi-collinearity analysis

| Parameters | Collinearity statistics: TOL | Collinearity statistics: VIF |
|---|---|---|
| Aspect | 0.947 | 1.056 |
| Elevation | 0.555 | 1.803 |
| Distance to faults | 0.754 | 1.327 |
| Land use | 0.927 | 1.079 |
| NDVI | 0.680 | 1.470 |
| Plan curvature | 0.871 | 1.148 |
| Profile curvature | 0.655 | 1.527 |
| Rainfall | 0.609 | 1.642 |
| Distance to rivers | 0.815 | 1.228 |
| Lithology | 0.771 | 1.298 |

Model performance

The ROC curve and the four statistical indexes were used to evaluate and compare the performance of the three ensemble models. The results of the evaluation are presented in Figure 4. The AUC values of all three models were larger than 0.70, which indicates that the three models had a high degree of accuracy. More specifically, the values of AUC were 0.88 for the GBDT model, 0.84 for the RF model, and 0.78 for the BMA model.
Figure 4

The ROC curve.

A similar pattern emerged in the statistical indexes. The results show that the GBDT had the highest accuracy (0.89) and specificity (0.87) and shared the highest sensitivity (0.91) with the RF (accuracy = 0.87, sensitivity = 0.91, and specificity = 0.83), followed by BMA (accuracy = 0.76, sensitivity = 0.87, and specificity = 0.65). In the case of RMSE, the values for GBDT, RF, and BMA were 0.33, 0.36, and 0.49, respectively.

In general, the GBDT model ranked best, with slightly better performance than the RF model. The BMA model had the lowest performance among the three ensemble models.

The groundwater potential map

Based on the 12 mentioned non-collinearity factors, the probability of GSP was predicted keeping in mind that 0 indicates no possibility and 1 indicates a 100% probability of a spring to occur. Finally, the GSP results of the BMA, RF, and GBDT models were categorized into five classifications using the natural break method, i.e., very low, low, moderate, high, and very high. The convenient tool GIS was selected to classify and visualize GSP zones. The detailed results of the three typical models can be seen in Figure 5.
Figure 5

GSP map using ensemble models: (a) the BMA model, (b) the RF model, and (c) the GBDT model.

As shown in Figure 5(a), for the BMA model, the GSP of the total area ranging from very low to very high was 14.20%, 36.03%, 29.25%, 15.61%, and 4.91%, respectively. The RF model showed that 12.32% and 25.94% of the total area has very low and low potential for groundwater springs, while 25.93%, 22.80%, and 13.01% of the area has moderate, high and very high potential degrees, respectively (Figure 5(b)).

The GBDT model proved to be the most accurate among the three ensemble models and was selected to map GSP in the study area. As can be seen from Figure 5(c), very high and high potentials accounted for 6.12% and 26.12% of the total area, respectively. The area with the highest potential was found on the gentle slopes, where the land is mainly agricultural land and grassland. Moreover, rivers had a direct relationship with GSP. The areas located in the vicinity of the main rivers fell in the category of very high or high GSP. The moderate potential was estimated in 38.30% of the study area. Furthermore, 23.91% and 5.55% of the study area were classified as having low and very low groundwater potential, respectively. The areas with a high elevation and steeper slopes had low GSP.

Determining groundwater spring potential zones is a cost-effective way to properly plan groundwater resource management. In recent years, research on mapping GSP has significantly increased, especially research using machine learning methods. In most cases, ensemble machine learning models have proven to be significantly better than single ones (Rizeei et al. 2019; Chen et al. 2020). However, existing research has rarely compared the performance of the popular ensemble techniques (averaging, boosting, and bagging) in predicting GSP. Therefore, this research aimed to compare and use the three ensemble techniques for mapping GSP.

BMA has received a lot of attention in hydrology because it has proved to be one of the most efficient averaging approaches. However, it has not yet been employed in GSP mapping (Zounemat-Kermani et al. 2021). The RF model, on the other hand, has often been used for assessing groundwater potential (Rahmati et al. 2016; Pal et al. 2020). The GBDT model has a good comprehensive performance in environmental and ecological modeling (Liang et al. 2021). Therefore, this study selected these three popular ensemble models (BMA, RF, and GBDT) for mapping GSP in Chengde City (Hebei Province, China). The three models were further compared using the ROC method and the four statistical indexes (accuracy, sensitivity, specificity, and RMSE).

The results of this research are in line with previous research, indicating that the ensemble models perform well in mapping GSP (Chen et al. 2020; Naghibi et al. 2020; Zounemat-Kermani et al. 2021), with an AUC value greater than 0.70. Li & Tsai (2009) and Moazamnia et al. (2019) recommended the BMA model to predict groundwater storage because it has higher accuracy compared to individual models. The BMA approach outperforms the single model technique owing to its ability to assign a weight and variance to each of the multiple models based on their relative performance (Li & Tsai 2009). Even though BMA has been widely used in hydrology, few studies used this algorithm for GSP. The AUC value of the BMA model in this research reached 0.78. The results indicate that BMA has good performance and that it can be used for mapping GSP.

Meanwhile, the bagging and boosting ensemble techniques have been widely used in groundwater potential mapping. Rizeei et al. (2019) stated that a novel ensemble boosting method had superior predictive performance in assessing GSP, while Patidar et al. (2021) noted that the RF model showed good prediction results of groundwater potential zones. Similarly, the high accuracy of bagging and boosting ensemble models was observed in this research, with the values of AUC for RF and GBDT being 0.84 and 0.88, respectively.

Furthermore, this research presented a detailed comparison of the three ensemble techniques using four statistical indexes. The GBDT model had the best performance, followed by RF and BMA. Similarly, various studies found that the GBDT had higher performance than the RF model in landslide susceptibility mapping (Hong et al. 2018; Yousefi et al. 2020; Liang et al. 2021). The results of this research are in agreement with Sachdeva & Kumar (2021), who also indicated that the GBDT performed better than the RF model in groundwater potential mapping. However, the results of this study and several other studies cannot verify that the boosting algorithm always performs better than the bagging ensemble model. Some studies reported the opposite ranking. For instance, Mosavi et al. (2021) and Chen et al. (2020) indicated that the bagging algorithm RF had a higher performance than the boosting algorithm, in that case adaptive boosting classification trees, in mapping groundwater potential.

This study applied three popular ensemble models, BMA, RF, and GBDT, in GSP mapping in Chengde City (Hebei Province, China). A total of 75 groundwater spring locations and 12 influencing factors were collected from the study area. Factor selection was verified by multicollinearity analysis. The dataset was split into training (approximately 70%) and testing (approximately 30%) datasets. Furthermore, the comparison and validation of the ensemble models were performed using the ROC method and four statistical indexes (accuracy, sensitivity, specificity, and RMSE). Based on the ROC method, all three ensemble models exhibited good predictive performance, with AUC values greater than 0.70. The averaging method can also be used to assess GSP instead of the bagging and boosting ensemble models. The comparison further indicated that the GBDT model had the best accuracy, followed by RF and BMA. Finally, the three models were used to delineate the GSP of the study area. The result of the best model, GBDT, revealed that 32.24% of the area had a high or very high GSP, while the moderate, low, and very low potentials occupied 38.30%, 23.91%, and 5.55% of the study area, respectively. Areas with high GSP were mainly distributed in regions with gentle slopes or near rivers.

In general, the results can provide effective guidance for the further use of ensemble models in GSP mapping. However, the main limitation of this research is that the model hyperparameters were not optimized further: they were tuned only by the trial-and-error method and grid search. Hyperparameter optimization methods should be comprehensively and comparatively analyzed in future work.
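
For readers unfamiliar with grid search, the idea can be sketched in a few lines of Python: every combination of candidate hyperparameter values is scored and the best one is kept. This is a generic sketch, not the tuning procedure used in this study. The parameter names (`learning_rate`, `n_trees`), the candidate values, and the scoring function `toy_score` are hypothetical; in practice the scoring function would train a model with the given settings and return a validation measure such as the AUC.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid.

    param_grid: dict mapping a hyperparameter name to its candidate values.
    score_fn: callable taking a dict of settings and returning a score
              where higher is better (e.g. validation AUC).
    Returns the best settings and their score.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for 'train the model and return its validation score'.

    Purely illustrative: it peaks at learning_rate=0.1 and n_trees=200.
    """
    lr_penalty = (params["learning_rate"] - 0.1) ** 2
    tree_penalty = ((params["n_trees"] - 200) / 1000) ** 2
    return -(lr_penalty + tree_penalty)


# Hypothetical candidate values for a boosting model.
grid = {"learning_rate": [0.01, 0.1, 0.3], "n_trees": [100, 200, 500]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The cost of this approach grows with the product of the number of candidate values per hyperparameter, which is one reason alternative optimization strategies are worth comparing.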

This research was supported by the S&T Program of Hebei (D2022403032) and the Graduate Students Teaching Case of Hebei Province (KCJSZ2019090). The authors are indebted to the anonymous reviewers and the editors, who significantly improved the quality of the paper.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Arabameri A., Rezaei K., Cerda A., Lombardo L. & Rodrigo-Comino J. 2019 GIS-based groundwater potential mapping in Shahroud plain, Iran. A comparison among statistical (bivariate and multivariate), data mining and MCDM approaches. Sci. Total Environ. 658, 160–177. https://doi.org/10.1016/j.scitotenv.2018.12.115

Breiman L. 2001 Random forests. Mach. Learn. 45 (1), 5–32. https://doi.org/10.1023/A:1010933404324

Chen W., Zhao X., Tsangaratos P., Shahabi H., Ilia I., Xue W., Wang X. & Ahmad B. B. 2020 Evaluating the usage of tree-based ensemble methods in groundwater spring potential mapping. J. Hydrol. 583, 124602. https://doi.org/10.1016/j.jhydrol.2020.124602

Chowdhury A., Jha M. K., Chowdary V. M. & Mal B. C. 2009 Integrated remote sensing and GIS-based approach for assessing groundwater potential in West Medinipur district, West Bengal, India. Int. J. Remote Sens. 30 (1), 231–250. https://doi.org/10.1080/01431160802270131

Friedman J. H. 2001 Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232. https://doi.org/10.1214/aos/1013203451

Hong H., Liu J., Tien Bui D., Pradhan B., Acharya T. D., Pham B. T., Zhu A. X., Chen W. & Ahmad B. B. 2018 Landslide susceptibility mapping using J48 decision tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). Catena 163, 399–413. https://doi.org/10.1016/j.catena.2018.01.005

Kamali Maskooni E., Naghibi S. A., Hashemi H. & Berndtsson R. 2020 Application of advanced machine learning algorithms to assess groundwater potential using remote sensing-derived data. Remote Sens. 12 (17), 2742. https://doi.org/10.3390/rs12172742

Li X. & Tsai F. T. C. 2009 Bayesian model averaging for groundwater head prediction and uncertainty analysis using multimodel and multimethod. Water Resour. Res. 45 (9), 627–643. https://doi.org/10.1029/2008WR007488

Liang Z., Wang C. & Khan K. U. J. 2021 Application and comparison of different ensemble learning machines combining with a novel sampling strategy for shallow landslide susceptibility mapping. Stochastic Environ. Res. Risk Assess. 35 (6), 1243–1256. https://doi.org/10.1007/s00477-020-01893-y

Misi A., Gumindoga W. & Hoko Z. 2018 An assessment of groundwater potential and vulnerability in the Upper Manyame Sub-Catchment of Zimbabwe. Phys. Chem. Earth 105, 72–83. https://doi.org/10.1016/j.pce.2018.03.003

Moazamnia M., Hassanzadeh Y., Nadiri A. A., Khatibi R. & Sadeghfam S. 2019 Formulating a strategy to combine artificial intelligence models using Bayesian model averaging to study a distressed aquifer with sparse data availability. J. Hydrol. 571, 765–781. https://doi.org/10.1016/j.jhydrol.2019.02.011

Mosavi A., Sajedi Hosseini F., Choubin B., Goodarzi M., Dineva A. A. & Rafiei Sardooi E. 2021 Ensemble boosting and bagging based machine learning models for groundwater potential prediction. Water Resour. Manage. 35 (1), 23–37. https://doi.org/10.1007/s11269-020-02704-3

Naghibi S. A., Hashemi H., Berndtsson R. & Lee S. 2020 Application of extreme gradient boosting and parallel random forest algorithms for assessing groundwater spring potential using DEM-derived factors. J. Hydrol. 589, 125197. https://doi.org/10.1016/j.jhydrol.2020.125197

Nguyen P. T., Ha D. H., Avand M., Jaafari A., Nguyen H. D., Al-Ansari N., Van Phong T., Sharma R., Kumar R., Le H. V., Ho L. S., Prakash I. & Pham B. T. 2020 Soft computing ensemble models based on logistic regression for groundwater potential mapping. Appl. Sci. 10 (7), 2469. https://doi.org/10.3390/app10072469

Pal S., Kundu S. & Mahato S. 2020 Groundwater potential zones for sustainable management plans in a river basin of India and Bangladesh. J. Clean. Prod. 257, 120311. https://doi.org/10.1016/j.jclepro.2020.120311

Patidar R., Pingale S. M. & Khare D. 2021 An integration of geospatial and machine learning techniques for mapping groundwater potential: a case study of the Shipra river basin, India. Arab. J. Geosci. 14 (16), 1–16. https://doi.org/10.1007/s12517-021-07871-0

Raftery A. E., Gneiting T., Balabdaoui F. & Polakowski M. 2005 Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133 (5), 1155–1174. https://doi.org/10.1175/MWR2906.1

Rahmati O., Pourghasemi H. R. & Melesse A. M. 2016 Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran Region, Iran. Catena 137, 360–372. https://doi.org/10.1016/j.catena.2015.10.010

Rashki Ghaleh Nou M. & Azhdary Moghaddam M. 2021 Improving the prediction of scour depth downstream of the flip bucket with machine learning techniques. In: Proceedings of the Institution of Civil Engineers-Water Management. Thomas Telford Ltd. https://doi.org/10.1680/jwama.20.00089

Rizeei H. M., Pradhan B., Saharkhiz M. A. & Lee S. 2019 Groundwater aquifer potential modeling using an ensemble multi-adoptive boosting logistic regression technique. J. Hydrol. 579, 124172. https://doi.org/10.1016/j.jhydrol.2019.124172

Sachdeva S. & Kumar B. 2021 Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India. Stochastic Environ. Res. Risk Assess. 35 (2), 287–306. https://doi.org/10.1007/s00477-020-01891-0

Sarkar S. K., Esraz-Ul-Zannat M., Das P. C. & Ekram K. M. M. 2022 Delineating the groundwater potential zones in Bangladesh. Water Supply 22 (4), 4500–4516. https://doi.org/10.2166/ws.2022.113

Tian Y., Jiang Y., Liu Q., Dong M., Xu D., Liu Y. & Xu X. 2019 Using a water quality index to assess the water quality of the upper and middle streams of the Luanhe River, northern China. Sci. Total Environ. 667, 142–151. https://doi.org/10.1016/j.scitotenv.2019.02.356

Yousefi S., Sadhasivam N., Pourghasemi H. R., Nazarlou H. G., Golkar F., Tavangar S. & Santosh M. 2020 Groundwater spring potential assessment using new ensemble data mining techniques. Measurement 157, 107652. https://doi.org/10.1016/j.measurement.2020.107652

Zounemat-Kermani M., Batelaan O., Fadaee M. & Hinkelmann R. 2021 Ensemble machine learning paradigms in hydrology: a review. J. Hydrol. 598, 126266. https://doi.org/10.1016/j.jhydrol.2021.126266

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).