Abstract
Determining groundwater potential is vital for groundwater resource management. This study presents a comparative analysis of three widely used ensemble techniques (averaging, bagging, and boosting) in groundwater spring potential mapping. Firstly, 12 spring-related factors and a total of 75 groundwater spring locations were collected and used as the dataset. Secondly, three typical ensemble models were adopted to predict groundwater spring potential, namely, Bayesian model averaging (BMA), random forest (RF), and the gradient boosting decision tree (GBDT). The area under the receiver operating characteristics curve (AUC) and four statistical indexes (accuracy, sensitivity, specificity, and the root mean square error (RMSE)) were used to evaluate the accuracy of the models. The results indicate that the three models had a good predictive performance and that the AUC values of the GBDT, RF, and BMA were 0.88, 0.84, and 0.78, respectively. Furthermore, the GBDT had the best performance (accuracy = 0.89, sensitivity = 0.91, specificity = 0.87, and RMSE = 0.33) in terms of the four indexes, followed by RF (accuracy = 0.87, sensitivity = 0.91, specificity = 0.83, and RMSE = 0.36) and BMA (accuracy = 0.76, sensitivity = 0.87, specificity = 0.65, and RMSE = 0.49). This research can provide effective guidance for using ensemble models for mapping groundwater spring potential in the future.
HIGHLIGHTS
Ensemble machine learning algorithms were compared and used to identify groundwater spring potential zones.
Bayesian model averaging can be used to map groundwater spring potential.
The three ensemble models show good predictive performance.
INTRODUCTION
Groundwater is a vital water source for domestic, agricultural, and industrial water supplies all over the world. More than half the world's population depends on groundwater, especially in arid and semi-arid regions (Arabameri et al. 2019). Owing to the rapidly increasing demands of the population and the quick pace of industrialization, the demand for groundwater resources has increased, which can cause a water shortage (Rizeei et al. 2019). Moreover, excessive pumping of groundwater leads to a drawdown of groundwater and deterioration of groundwater quality. As a result, it is necessary to identify groundwater potential zones (Pal et al. 2020). Groundwater springs usually emerge in regions with a high level of groundwater and high groundwater storage potential (Kamali Maskooni et al. 2020; Yousefi et al. 2020). Therefore, evaluating groundwater spring potential (GSP) has been effective in identifying the state of groundwater resources and implementing a successful groundwater determination, protection, and management program (Naghibi et al. 2020).
GSP is commonly defined as the likelihood of groundwater springs in an area (Yousefi et al. 2020). The traditional tools to determine groundwater potential zones are mainly related to geophysical, geological, and hydrological methods. Although these methods can precisely estimate the groundwater potential zone, they can be time-consuming and costly (Rizeei et al. 2019). The geographic information system (GIS) and remote sensing (RS) are very practical for managing spatial data and preparing thematic maps commonly employed in various modeling techniques to evaluate GSP (Chowdhury et al. 2009; Kamali Maskooni et al. 2020). Various knowledge-driven approaches and statistical techniques have been used to map GSP, such as the analytical hierarchy process, frequency ratio, the logistic regression method, weights of evidence, multi-criteria decision analysis, index of entropy, and the evidential belief function (Chowdhury et al. 2009; Dilekoglu & Aslan 2022). These methods are relatively easy to implement, especially in large-scale regions. However, they are ineffective in dealing with the complex relationship between the conditioning factors and GSP (Chen et al. 2020; Naghibi et al. 2020; Mosavi et al. 2021). GSP depends on several surface and subsurface factors, so delineating GSP is a complex task (Misi et al. 2018; Patidar et al. 2021; Dilekoglu & Aslan 2022).
In the past ten years, machine learning methods have been viewed as potentially cost-effective tools to predict the spatial probability of GSP. Various researchers have developed many machine learning approaches, including the artificial neural network, classification and regression tree, logistic regression, random forest (RF), boosted regression trees, adaptive boosting classification trees, support vector machine, and deep neural networks (Chowdhury et al. 2009; Rahmati et al. 2016; Nguyen et al. 2020; Patidar et al. 2021). Several related studies have shown that machine learning methods can deal with complex and high-dimensional data and usually outperform conventional methods for mapping GSP (Kamali Maskooni et al. 2020; Naghibi et al. 2020; Zounemat-Kermani et al. 2021).
To avoid the bias of a single machine learning method in spatial modeling fields, the use of ensemble machine learning techniques has grown rapidly in recent years (Rizeei et al. 2019; Kamali Maskooni et al. 2020; Yousefi et al. 2020; Patidar et al. 2021; Rashki Ghaleh Nou & Azhdary Moghaddam 2021). Ensemble learning started in the 1990s. It combines two or more base classifiers to create a new, more powerful model (Zounemat-Kermani et al. 2021). The results of related research indicate that ensemble methods can obtain more reliable prediction results than single predictive models and that they have been successfully applied in similar studies, such as landslide susceptibility, flood susceptibility, and groundwater pollution assessment (Hong et al. 2018; Sachdeva & Kumar 2021). The use of ensemble models is a developing trend in delineating GSP (Yousefi et al. 2020; Mosavi et al. 2021). Generally, there are three representative types of ensemble methods, i.e., averaging, bagging, and boosting (Zounemat-Kermani et al. 2021). For instance, Rahmati et al. (2016) introduced the RF model for assessing GSP and reported good performance results of the ensemble model. Kamali Maskooni et al. (2020) applied a boosting algorithm, i.e., gradient boosting decision tree (GBDT), and a statistical method for assessing groundwater potential. The results indicated that GBDT was a more suitable candidate for mapping GSP.
The application of hybrid models indicates that the optimum ensemble algorithm should be explored in different regions or for different evaluation objects. As can be seen from a review of related literature, a limited number of studies compared the three types of ensemble techniques (averaging, bagging, and boosting). For example, Bayesian model averaging (BMA) as a typical averaging algorithm was rarely used to map GSP but had been successfully employed in other fields (Mosavi et al. 2021). Thus, it is necessary to further discuss the application of advanced ensemble models in GSP.
This research aims to introduce three popular ensemble techniques (averaging, bagging, and boosting) and compare their performance in order to select the best model for assessing GSP. Taking Chengde City in China as the case study, three typical ensemble methods were employed to map its GSP, including averaging (BMA), bagging (RF), and boosting (the GBDT). The objectives of this research are: (1) to explore the three ensemble models used for mapping GSP, (2) to discuss the applicability of ensemble techniques, and (3) to generate a reliable groundwater spring potential zone map for the study area.
STUDY AREA
The geological deposits of the area are characterized by many outcrops from the Archaeozoic to the Cenozoic. The groundwater in the study area is a vital resource for industrial, agricultural, and domestic purposes (Tian et al. 2019). The main aquifers in the area can be divided into pore water in loose sediments and fissure water in bedrock, which is generally unconfined. The pore water in loose Quaternary sediments can be found in the river valley and the intermountain basin in the study area, which is also the main aquifer. The groundwater system in fractured rock mainly consists of the clastic rock fissure aquifer and the carbonate bedrock aquifer. Precipitation is the primary source of groundwater recharge, followed by river leakage and irrigation infiltration. Springs are an important discharge point for fissure groundwater systems. Due to intensive groundwater exploitation, the drawdown of the water table is serious. However, research conducted in the study area indicates that natural mineral water resources are abundant and have a high level of strontium (Sr), a low level of sodium, and low alkalinity (Wang et al. 2021). Therefore, the delineation of the GSP zone can help in managing the water supply in the study area, especially natural mineral water.
MATERIALS AND METHODS
Data collection and preparation
Spring inventory map
The locations of the springs were identified based on extensive field surveys and collected documents. A total of 75 spring locations were found and presented in a raster format (Figure 1). The springs were randomly split into the training dataset (70%, 52 springs) and the validation dataset (30%, 23 springs). Similarly, 75 locations without springs were randomly selected from the study area, as shown in Figure 1. These locations were also divided into the training and validation datasets using the same proportion.
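The random 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_points`, the seeds, and the placeholder point identifiers are all assumptions, and the training share is rounded down so that 75 points yield 52 training and 23 validation points, matching the counts in the text.

```python
import random


def split_points(points, train_fraction=0.7, seed=0):
    """Randomly split a list of locations into training and validation subsets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # rounds down: 75 -> 52
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 75 spring and 75 non-spring points.
springs = [("spring", i) for i in range(75)]
non_springs = [("non_spring", i) for i in range(75)]

train_s, valid_s = split_points(springs, seed=1)
train_n, valid_n = split_points(non_springs, seed=2)
train = train_s + train_n
valid = valid_s + valid_n
print(len(train_s), len(valid_s), len(train), len(valid))
```

Splitting the spring and non-spring points separately keeps both classes balanced in each subset.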
Conditioning factors
It is vital to select the conditioning factors of GSP. Based on a literature review, occurrence of springs, and the availability of data in the study area (Chen et al. 2020; Naghibi et al. 2020; Yousefi et al. 2020; Sarkar et al. 2022), this research selected 12 groundwater spring conditioning variables. The selected factors include the altitude, slope angle, aspect, plan curvature, profile curvature, topographic wetness index (TWI), normalized difference vegetation index (NDVI), lithology, distance to faults, distance to rivers, rainfall, and land use. All factors were converted into a thematic map with a grid size of 30 × 30 m. Furthermore, the kriging spatial interpolation technique was selected to transform the discrete data into a continuous surface.
Spatial distribution of the parameters: (a) elevation, (b) slope angle, (c) aspect, (d) plan curvature, (e) profile curvature, (f) TWI, (g) NDVI, (h) lithology, (i) distance to faults, (j) distance to rivers, (k) rainfall, (l) land use.
The NDVI indicates the status of vegetation and is frequently identified as a significant factor affecting the distribution of groundwater spring locations (Pal et al. 2020). The parameter of the NDVI was calculated with red and near-infrared bands using Landsat 8 images from 2016. Figure 3(g) presents the NDVI distribution in the study area.
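For reference, the standard NDVI definition is (NIR - Red) / (NIR + Red). The sketch below applies it to a single pixel; the function name and the zero-denominator handling are assumptions, and for Landsat 8 OLI the red and near-infrared inputs would normally be bands 4 and 5.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel.

    Returns 0.0 when both reflectances are zero (an assumed convention
    to avoid division by zero on no-data pixels).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance values for a vegetated pixel and a bare pixel.
print(ndvi(0.5, 0.1))
print(ndvi(0.2, 0.2))
```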
Groundwater storage is governed by lithological units directly affecting the groundwater potential zone (Yousefi et al. 2020). The lithological classes of the study area were obtained through boreholes and a geological map on a 1:100,000 scale (Figure 3(h)). The rock around fault structures is severely fragmented, which allows a large amount of water to percolate underground. Springs usually emerge near the fault zone in the study area. The shorter the distance to faults, the higher the potential of infiltration. The location of faults was extracted from geological maps on a 1:100,000 scale. The map of the distance to faults was produced by a buffering method in the GIS environment (Figure 3(i)).
Rivers are one of the main natural sources that can refill groundwater systems, especially in semi-arid and arid areas. When an area is nearer to rivers, there is a higher chance for springs to occur in it (Chen et al. 2020). Thus, the distance to rivers was obtained in the GIS environment using a buffering technique, as shown in Figure 3(j). Rainfall is a primary source of groundwater recharge in this region. Therefore, rainfall is also considered as a significant hydrologic factor, which can directly influence water infiltration and the water table. The average annual rainfall (from 2000 to 2015) was collected from five precipitation stations in the study area (Figure 3(k)).
Land use can describe the status of ecological conditions and reflect anthropogenic activities, which can impact the groundwater resources (Yousefi et al. 2020). Since the type of land cover indirectly influences the recharge rates and soil infiltration capacity, it can also influence GSP. The patterns of land use in Chengde City in 2016 were included in the research and classified into five types, namely, agricultural land and grassland, forest, waterbody, residential land, and barren land (Figure 3(l)).
BMA
BMA is a statistical analysis method and one of the most efficient and well-known averaging approaches, which takes into account model uncertainty in the Bayesian framework (Zounemat-Kermani et al. 2021). In Bayesian statistics, BMA infers the probability distribution of forecasting variables using the posterior probability of each ensemble member. It differs from other model averaging techniques because it combines the predictions of each model based on the relative performance of selected base learners during the training period, because of which it has been successfully applied in various fields (Raftery et al. 2005). Furthermore, the key advantage of this ensemble algorithm is its ability to obtain optimized weights of each base learner by maximizing the merits of different methods (Raftery et al. 2005). Therefore, BMA was selected as an averaging ensemble technique to assess GSP. In this research, the expectation-maximization algorithm was used to estimate the optimized weights and variances of every ensemble member by maximizing the log-likelihood function.
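The weight and variance estimation can be illustrated with a small expectation-maximization loop. This is only a sketch under stated assumptions: each member is given a Gaussian error model centred on its own prediction (in the spirit of Raftery et al. 2005), and the function names, the variance floor, the stopping rule, and the example predictions are all hypothetical rather than taken from the study.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bma_em(forecasts, observed, n_iter=200, tol=1e-8, var_floor=1e-6):
    """Estimate BMA weights and per-member variances by EM.

    forecasts[k][t] is the prediction of member k for sample t,
    observed[t] is the target value for sample t.
    """
    n_models, n = len(forecasts), len(observed)
    weights = [1.0 / n_models] * n_models
    variances = [
        max(sum((y - p) ** 2 for y, p in zip(observed, f)) / n, var_floor)
        for f in forecasts
    ]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each member for each sample.
        resp = [[0.0] * n for _ in range(n_models)]
        ll = 0.0
        for t in range(n):
            dens = [
                weights[k] * gaussian_pdf(observed[t], forecasts[k][t], variances[k])
                for k in range(n_models)
            ]
            total = max(sum(dens), 1e-300)   # guard against log(0)
            ll += math.log(total)
            for k in range(n_models):
                resp[k][t] = dens[k] / total
        # M-step: update weights and variances from the responsibilities.
        for k in range(n_models):
            r_sum = sum(resp[k])
            weights[k] = r_sum / n
            if r_sum > 0:
                sq = sum(resp[k][t] * (observed[t] - forecasts[k][t]) ** 2
                         for t in range(n))
                variances[k] = max(sq / r_sum, var_floor)
        if abs(ll - prev_ll) < tol:          # log-likelihood has converged
            break
        prev_ll = ll
    total_w = sum(weights)
    return [w / total_w for w in weights], variances


def bma_predict(weights, member_predictions):
    """Weighted average of the members' predictions for one location."""
    return sum(w * p for w, p in zip(weights, member_predictions))


# Hypothetical training targets (1 = spring, 0 = no spring) and two members.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
member_a = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.9, 0.2]
member_b = [0.6, 0.5, 0.4, 0.5, 0.6, 0.4, 0.5, 0.5]
weights, variances = bma_em([member_a, member_b], observed)
print(weights, variances, bma_predict(weights, [0.9, 0.6]))
```

Because the weights are non-negative and sum to one, the combined prediction always lies between the lowest and highest member prediction.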
RF
Bagging is an ensemble technique that generates a unique bootstrap sample from the original training dataset for each decision tree. RF is a supervised ensemble machine learning algorithm and an efficient bagging ensemble model based on decision tree algorithms used for classification and regression problems (Breiman 2001; Rahmati et al. 2016). In assessing GSP, the RF algorithm operates by building many classification trees, each of which produces a class estimate during the training period, while bootstrap sampling with replacement is performed to create a new training set for each tree. The samples that are not selected are called out-of-bag samples. The prediction error and validation are calculated based on these samples. The bootstrapping technique combined with the bagging algorithm helps the RF model avoid over-fitting problems.
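Bootstrap sampling and the resulting out-of-bag set can be sketched in a few lines. The function name, the seed, and the sample count of 104 (52 spring plus 52 non-spring training points) are assumptions made for illustration only.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns (in_bag, out_of_bag): the drawn indices, possibly repeated,
    and the indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(42)
in_bag, oob = bootstrap_indices(104, rng)
print(len(set(in_bag)), len(oob))
```

On average roughly 37% of the samples are left out of each bootstrap draw, since the chance that a given sample is never selected approaches 1/e as the sample size grows.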
The key parameters of the RF model are the number of variables included in each classification tree and the number of trees (Breiman 2001). The RF algorithm uses the Gini impurity as a measure for the best split selection. The value of the Gini index can measure the impurity of a given element in relation to the rest of the classes. The number of trees and variables in this research were selected based on previous research (Rahmati et al. 2016; Naghibi et al. 2020). Likewise, the 10-fold cross-validation scheme and the grid search method were used to determine the suitable number of trees and variables.
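The Gini impurity used for split selection can be written out directly. The sketch below uses the standard definition (one minus the sum of squared class proportions) and scores a candidate split by the size-weighted impurity of its two children; the function names and example labels are illustrative assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical labels (1 = spring, 0 = no spring).
print(gini_impurity([1, 1, 0, 0]))   # evenly mixed node
print(split_gini([1, 1], [0, 0]))    # split that separates the classes
```

A tree-growing routine would evaluate `split_gini` for each candidate threshold and keep the split with the lowest value.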
GBDT
Boosting is a hybrid multi-classifier combination algorithm and can be used for both classification and regression tasks. The main difference between it and the bagging ensemble techniques is the dependence among the weak base learners. In boosting, each subsequent classifier is trained based on the classification result from the previous base learner (Liang et al. 2021). The main objective of the boosting technique is to combine several weak learners into a strong classifier through multiple iterations. In contrast, the base learners in bagging are independent of each other, so the dataset of each base learner is created from the sample sets by the bootstrap resampling technique (Naghibi et al. 2020).
GBDT is an ensemble learning method with a decision tree as the base classifier. It is also one of the most effective ensemble techniques in the boosting family (Friedman 2001). In essence, the residual of the previous base learner is selected as the input for the next base learner (Liang et al. 2021). The main characteristic of the GBDT algorithm is its training strategy. The value of the loss function of the current iteration decreases along the gradient of the reducing residuals from the previous step (Sachdeva & Kumar 2021). At the same time, the input dataset of the new base learner is changed by adjusting the weights of misclassified samples from the previous learner. The iteration process continues until the number of iterations or the error value reaches the pre-established condition. The weighted outputs of the weak classifiers are summed to produce the final result.
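The residual-fitting idea can be illustrated with a deliberately small example. The sketch assumes a squared-error loss, a single input feature, and one-split regression trees (stumps) as base learners; a classification setting would more commonly use a logistic loss, and the function names, learning rate, and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    mean_all = sum(residuals) / len(residuals)
    values = sorted(set(xs))
    best = (float("inf"), values[0], mean_all, mean_all)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]


def gbdt_fit(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((thr, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= thr else right_value)
                 for x, p in zip(xs, preds)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def gbdt_predict(model, x):
    """Sum the base value and the shrunken contributions of all stumps."""
    score = model["base"]
    for thr, left_value, right_value in model["stumps"]:
        score += model["learning_rate"] * (left_value if x <= thr else right_value)
    return score


# Hypothetical single feature and binary targets.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
target = [0, 0, 0, 0, 1, 1, 1, 1]
model = gbdt_fit(feature, target)
print(gbdt_predict(model, 2), gbdt_predict(model, 7))
```

The learning rate shrinks each stump's contribution, so the residuals decrease gradually over the rounds rather than in one step.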
Validation and comparison
The validation and comparison process is an essential step in the development and application of multiple predictive models. Several indices have been proposed and used for this purpose. In this study, the performance of ensemble models was assessed by utilizing the receiver operating characteristics (ROC) curve, accuracy, sensitivity, specificity, and the root mean square error (RMSE) (Arabameri et al. 2019; Yousefi et al. 2020).
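The four statistical indexes follow standard definitions, sketched below. The confusion-matrix counts in the example are hypothetical: they are not reported in the text and were chosen only because they are consistent with a validation set of 23 spring and 23 non-spring points. Computing RMSE between the binary labels and the predicted probabilities is likewise an assumption.

```python
import math


def indices_from_counts(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of springs correctly identified
        "specificity": tn / (tn + fp),   # share of non-springs correctly identified
    }


def rmse(y_true, y_prob):
    """Root mean square error between labels and predicted probabilities."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / n)


# Hypothetical counts for 23 spring and 23 non-spring validation points.
example = indices_from_counts(tp=21, fn=2, tn=20, fp=3)
print({name: round(value, 2) for name, value in example.items()})
print(rmse([1, 0], [0.5, 0.5]))
```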
The ROC curve is one of the most commonly used methods to evaluate the performance of supervised machine learning models (Liang et al. 2021). In the ROC method, the area under the curve (AUC) is used to evaluate the accuracy of models. In general, the relationship between the AUC and model performance can be described as 0.5–0.6 (low accuracy), 0.6–0.7 (moderate accuracy), 0.7–0.8 (high accuracy), 0.8–0.9 (very high accuracy), and 0.9–1 (excellent accuracy) (Hong et al. 2018; Arabameri et al. 2019).
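One way to compute the AUC without plotting the curve is the rank-based form: the probability that a randomly chosen spring point receives a higher score than a randomly chosen non-spring point, with ties counted as one half. The sketch below uses that form and maps a value to the accuracy bands listed above; the function names and the assignment of boundary values to the upper band are assumptions.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_category(value):
    """Map an AUC value to the descriptive bands quoted in the text."""
    bands = [
        (0.9, "excellent accuracy"),
        (0.8, "very high accuracy"),
        (0.7, "high accuracy"),
        (0.6, "moderate accuracy"),
        (0.5, "low accuracy"),
    ]
    for lower, label in bands:
        if value >= lower:
            return label
    return "below the listed ranges"


# Hypothetical labels and scores, followed by the AUC values reported in the study.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
for name, value in [("GBDT", 0.88), ("RF", 0.84), ("BMA", 0.78)]:
    print(name, auc_category(value))
```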
RESULTS
Multicollinearity analysis
When the VIF value is greater than 5 and TOL is less than 0.1, a strong multicollinearity problem occurs among the predicted factors (Arabameri et al. 2019). The results are presented in Table 1. Elevation had the highest VIF value (1.803) and the lowest TOL value (0.555). Overall, there was no obvious multicollinearity among the 12 factors, so they were proven suitable for the further analysis of ensemble models.
The results of the multi-collinearity analysis
Parameters | Collinearity statistics: TOL | Collinearity statistics: VIF
---|---|---
Aspect | 0.947 | 1.056
Elevation | 0.555 | 1.803
Distance to faults | 0.754 | 1.327
Land use | 0.927 | 1.079
NDVI | 0.680 | 1.470
Plan curvature | 0.871 | 1.148
Profile curvature | 0.655 | 1.527
Rainfall | 0.609 | 1.642
Distance to rivers | 0.815 | 1.228
Lithology | 0.771 | 1.298
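Tolerance and VIF are reciprocals of each other (VIF = 1 / TOL), which the tabulated values reflect to within rounding. The short check below uses the values from Table 1 and the thresholds quoted in the text; flagging a factor when either threshold is crossed is an assumption, as are the variable names.

```python
# (TOL, VIF) pairs copied from Table 1.
collinearity = {
    "Aspect": (0.947, 1.056),
    "Elevation": (0.555, 1.803),
    "Distance to faults": (0.754, 1.327),
    "Land use": (0.927, 1.079),
    "NDVI": (0.680, 1.470),
    "Plan curvature": (0.871, 1.148),
    "Profile curvature": (0.655, 1.527),
    "Rainfall": (0.609, 1.642),
    "Distance to rivers": (0.815, 1.228),
    "Lithology": (0.771, 1.298),
}

# Largest gap between the tabulated VIF and 1 / TOL (rounding differences only).
max_gap = max(abs(1.0 / tol - vif) for tol, vif in collinearity.values())

# Factors that would be flagged with the thresholds quoted in the text.
flagged = [name for name, (tol, vif) in collinearity.items() if vif > 5 or tol < 0.1]

print(round(max_gap, 4), flagged)
```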
Model performance
A similar pattern emerged for the statistical indexes. The results show that the GBDT had the highest accuracy (0.89), sensitivity (0.91), and specificity (0.87), followed by the RF (accuracy = 0.87, sensitivity = 0.91, and specificity = 0.83) and BMA (accuracy = 0.76, sensitivity = 0.87, and specificity = 0.65). The RMSE values of the GBDT, RF, and BMA were 0.33, 0.36, and 0.49, respectively.
In general, the GBDT model ranked best, with slightly better performance than the RF model. The BMA model had the lowest performance among the three ensemble models.
The groundwater potential map
GSP map using ensemble models: (a) the BMA model, (b) the RF model, and (c) the GBDT model.
As shown in Figure 5(a), for the BMA model, the GSP of the total area ranging from very low to very high was 14.20%, 36.03%, 29.25%, 15.61%, and 4.91%, respectively. The RF model showed that 12.32% and 25.94% of the total area has very low and low potential for groundwater springs, while 25.93%, 22.80%, and 13.01% of the area has moderate, high and very high potential degrees, respectively (Figure 5(b)).
The GBDT model proved to be the most accurate among the three ensemble models and was selected to map GSP in the study area. As can be seen from Figure 5(c), very high and high potentials accounted for 6.12% and 26.12% of the total area, respectively. The area with the highest potential was found on the gentle slopes, where the land is mainly agricultural land and grassland. Moreover, rivers had a direct relationship with GSP. The areas located in the vicinity of the main rivers fell in the category of very high or high GSP. The moderate potential was estimated in 38.30% of the study area. Furthermore, 23.91% and 5.55% of the study area were classified as low and very low groundwater potential, respectively. The areas with a high elevation and steeper slopes had low GSP.
DISCUSSION
Determining groundwater spring potential zones is a cost-effective way to properly plan groundwater resource management. In recent years, research on mapping GSP has increased significantly, especially studies that use machine learning methods. In most cases, ensemble machine learning models have proven to be significantly better than single ones (Rizeei et al. 2019; Chen et al. 2020). However, existing research has rarely compared the performance of the popular ensemble techniques (averaging, boosting, and bagging) in predicting GSP. Therefore, this research aimed to compare and use the three ensemble techniques for mapping GSP.
BMA has received a lot of attention in hydrology because it has proved to be one of the most efficient averaging approaches. However, it has not yet been employed in GSP mapping (Zounemat-Kermani et al. 2021). The RF model, on the other hand, has often been used for assessing groundwater potential (Rahmati et al. 2016; Pal et al. 2020). The GBDT model has a good comprehensive performance in environmental and ecological modeling (Liang et al. 2021). Therefore, this study selected these three popular ensemble models (BMA, RF, and GBDT) for mapping GSP in Chengde City (Hebei Province, China). The three models were further compared using the ROC method and the four statistical indexes (accuracy, sensitivity, specificity, and RMSE).
The results of this research are in line with previous research, indicating that the ensemble models perform well in mapping GSP (Chen et al. 2020; Naghibi et al. 2020; Zounemat-Kermani et al. 2021), with an AUC value greater than 0.70. Li & Tsai (2009) and Moazamnia et al. (2019) recommended the BMA model to predict groundwater storage because it has higher accuracy compared to individual models. The BMA approach outperforms the single model technique owing to its ability to assign a weight and variance to each of the multiple models based on their relative performance (Li & Tsai 2009). Even though BMA has been widely used in hydrology, few studies have used this algorithm for GSP. The AUC value of the BMA model in this research reached 0.78. The results indicate that BMA has good performance and that it can be used for mapping GSP.
Meanwhile, the bagging and boosting ensemble techniques have been widely used in groundwater potential mapping. Rizeei et al. (2019) stated that a novel ensemble boosting method had superior predictive performance in assessing GSP, while Patidar et al. (2021) noted that the RF model showed good prediction results of groundwater potential zones. Similarly, the high accuracy of bagging and boosting ensemble models was observed in this research, with the values of AUC for RF and GBDT being 0.84 and 0.88, respectively.
Furthermore, this research presented a detailed comparison of the three ensemble techniques using four statistical indexes. The GBDT model had the best performance, followed by RF and BMA. Similarly, various studies found that the GBDT had higher performance than the RF model in landslide susceptibility mapping (Hong et al. 2018; Yousefi et al. 2020; Liang et al. 2021). The results of this research are in agreement with Sachdeva & Kumar (2021), who also indicated that the GBDT performed better than the RF model in groundwater potential mapping. However, the results of this study and several others cannot verify that the boosting algorithm always performs better than the bagging ensemble model. Some studies reported different comparative results. For instance, Mosavi et al. (2021) and Chen et al. (2020) indicated that the bagging algorithm RF had a higher performance than the boosting algorithm, namely adaptive boosting classification trees, in mapping groundwater potential.
CONCLUSION
This study applied three popular ensemble models, BMA, RF, and GBDT, in GSP mapping in Chengde City (Hebei Province, China). 75 groundwater spring locations and 12 influencing factors were collected from the study area. Factor selection was verified by multicollinearity analysis. The dataset was split into training (approximately 70%) and testing (30%) datasets. Furthermore, the comparison and validation of the ensemble models were performed using the ROC method and four statistical indexes (accuracy, sensitivity, specificity, and RMSE). Based on the ROC method, all three ensemble models exhibited good predictive performance, with the AUC value greater than 0.70. The averaging method can also be used to assess GSP instead of the bagging and boosting ensemble models. It is further indicated that the GBDT model had the best accuracy, followed by RF and BMA. Finally, the three models were further used to delineate the GSP of the study area. The result of the best model, GBDT, revealed that 32.24% of the area had a high or very high GSP, while the moderate, low, and very low potentials occupied 38.30%, 23.91%, and 5.55% of the study area, respectively. Areas with high GSP were mainly distributed in regions with gentle slopes or near rivers.
In general, the results can provide effective guidance for the further use of ensemble models in GSP mapping. However, the main limitation of this research is that the model parameters, such as the hyperparameters, were not further optimized. Only the trial-and-error method and grid search were used, and other optimization approaches should be comprehensively and comparatively analyzed in the future.
ACKNOWLEDGEMENTS
This research was supported by S&T Program of Hebei (D2022403032) and the Graduate Students Teaching Case of Hebei Province (KCJSZ2019090). The authors are indebted to the anonymous reviewers and the editors, who significantly improved the quality of the paper.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.