Soil erosion represents a significant environmental threat, particularly in watersheds with complex geomorphological characteristics. This study evaluates soil erosion vulnerability (SEV) in the Saddang Watershed, Indonesia, using tree-based machine learning (ML) models: decision tree (DT) and random forest (RF). The research incorporates key hydrological, topographical, and environmental factors, such as drainage density, rainfall erosivity, soil bulk density, and vegetation index, which influence soil erosion dynamics. A dataset of 2,000 locations, validated through 20-fold cross-validation, was utilized for model training and evaluation. Results indicate that the RF model demonstrated superior predictive accuracy (AUCROC: 0.953) compared with the DT model (AUCROC: 0.917), attributed to RF's ensemble methodology that enhances robustness against overfitting. Spatial analysis classified SEV into five categories, highlighting moderate to very high vulnerability near riverbanks and steep terrains. Drainage density and river proximity emerged as the most influential factors, underscoring the necessity for targeted conservation measures in these areas. The integration of ML models with GIS and remote sensing provides a robust framework for real-time SEV assessment, aiding sustainable land-use planning. This approach offers valuable insights for policymakers and practitioners, enabling evidence-based interventions to mitigate soil erosion and enhance environmental resilience in watersheds worldwide.

  • Innovative modeling techniques.

  • Comprehensive factor analysis.

  • Climate change integration.

  • Practical conservation framework.

  • Holistic analytical approach.

Globally, soil erosion poses critical risks to food security, environmental sustainability, and socioeconomic growth due to factors like unsustainable land use, deforestation, climate change, and population growth (Chakrabortty et al. 2020; Sahour et al. 2021; Folharini et al. 2023; Nguyen et al. 2023; Olii et al. 2023). This process degrades arable land, reduces soil fertility, and impairs water quality through sedimentation, which disrupts aquatic ecosystems and water supply (Borrelli et al. 2017; François et al. 2024). To address soil erosion, vulnerability assessments that consider soil type, slope, land use, vegetation, rainfall, and human activities are essential (Shit et al. 2015; Pandey et al. 2021; Mosaid et al. 2022; Nguyen et al. 2023). Identifying soil erosion vulnerability (SEV) areas enables targeted soil erosion control, supports resource allocation, and provides critical insights into underlying soil erosion drivers, which aid in developing tailored soil conservation strategies (Olii & Ichsan 2020; Avand et al. 2021; Olii et al. 2024b).

Traditional SEV assessment methods, such as RUSLE, plot-scale studies, and MCDM, offer useful insights but face limitations, including high uncertainty and limited accuracy across diverse conditions (Alewell et al. 2019; Mosavi et al. 2020; Sahour et al. 2021; Olii et al. 2024b). Machine learning (ML) has improved soil erosion prediction accuracy by efficiently handling large datasets and capturing complex, non-linear interactions (Dinh et al. 2021; Nguyen et al. 2023). ML models can incorporate data from diverse sources, such as remote sensing (RS) imagery, soil properties, and climate variables, and can therefore be applied across spatial and temporal scales (Chakraborty et al. 2024). This capability enables precise SEV modeling to inform sustainable land management and improve decision-making (Band et al. 2020; Ghorbanzadeh et al. 2020; Mosavi et al. 2020; Saha et al. 2020; Avand et al. 2021; Dinh et al. 2021; Sahour et al. 2021; Baiddah et al. 2023; Folharini et al. 2023; Nguyen et al. 2023; Singh et al. 2023; Wang et al. 2023). Integrating RS, GIS, and ML techniques enables robust SEV modeling, as RS provides essential spatial data, while GIS organizes and visualizes these datasets (Mosavi et al. 2020; Baiddah et al. 2023; Nguyen et al. 2023). This study uses tree-based ML algorithms, known for their interpretability, scalability, and resilience, as well as their ability to handle heterogeneous data and provide feature importance analysis (Chen & Guestrin 2016). Despite these benefits, the trade-offs between performance and interpretability require careful examination (Friedman 2001; Liaw & Wiener 2002).

Most SEV studies using ML rely on continuous data, which may decrease interpretability and lead to inaccuracies. To address this, we propose a novel method that combines traditional classification techniques with ML-based SEV factor weighting, with the aim of improving both interpretability for decision-makers and model accuracy, especially in unique environments (Ghorbanzadeh et al. 2020; Mosavi et al. 2020; Saha et al. 2020; Dinh et al. 2021; Sahour et al. 2021; Baiddah et al. 2023; Chakrabortty & Pal 2023; Nguyen et al. 2023; Wang et al. 2023). By mitigating issues such as data complexity, overfitting, and multicollinearity, this approach makes model results more accessible to decision-makers and adaptable to varied geographic contexts, and its flexible classification system is intended to improve prediction stability in regions with unique environmental factors. We compare two tree-based ML algorithms, decision tree (DT) and random forest (RF), and integrate them with RS and GIS data to identify the most effective SEV prediction model for the Saddang Watershed in South Sulawesi. Beyond improving forecast precision, the method offers deeper insights into soil erosion dynamics. The findings are expected to support evidence-based decision-making, enabling stakeholders to create sustainable land-use policies tailored to the region's characteristics. The study also illustrates how tree-based ML, GIS, and RS methodologies can be adapted to other landscapes with similar environmental challenges.

Study area

The Saddang Watershed is located in the southwestern part of Sulawesi Island, between 2°43′42.4992″ S–3°34′51.4992″ S latitude and 119°14′49.4988″ E–120°3′43.4988″ E longitude, and covers 4,909 km² (see Figure 1). The region has a tropical rainforest climate. The main river is the Saddang River, which crosses two provinces, South Sulawesi and West Sulawesi. The watershed spans several districts, from Enrekang, Tana Toraja, and North Toraja in South Sulawesi Province to Polewali in West Sulawesi Province. The Saddang River flows into the Makassar Strait through two estuaries, the Barbana and Paria estuaries. The study area is characterized by steep terrain, with elevation ranging from 44 to 2,880 m and averaging 1,277 m above mean sea level (MSL). The region has four geomorphological units: fallen deposits, mountain/hill volcanoes, strongly incised folded mountains/hills, and karst hills. Land use consists of settlements, rice fields, plantations, grasslands, swamps, water bodies, and other designated areas, and is dominated by mixed dryland agricultural land and dryland forests. The average yearly temperature is about 23 °C. The hottest month is October, with an average temperature of 26 °C, while the coldest month is June, with an average temperature of 22 °C. The yearly average rainfall is 2,500 mm. The wettest month is May, with an average of 387 mm, and the driest is September, with an average of 68 mm. The Benteng Dam supplies irrigation to 94,222 ha in the Saddang Watershed, while the Bakaru Hydroelectric Power Plant (2 × 64 MW) is located downstream along the Saddang River. The groundwater potential in the region is estimated at around 1.354 million m³ per year.
Figure 1

Location of the study area.


Overview of methodological framework

Different types of data were collected from various sources to carry out the intent of our study (Table 1). The methodology consisted of several fundamental building blocks to ensure the accuracy of the SEV prediction. Figure 2 presents the schematic of the methodology workflow from data sampling to SEV prediction. The method consisted of five sections: (i) preparation and collection of the relevant factors for SEV modeling; (ii) soil erosion inventory mapping; (iii) selection of the SEV factors; (iv) soil erosion modeling using ML models; and (v) evaluating the models' performance.
Table 1

Data sources

| Data types | Sources | Scale |
| SRTM 1 Arc-Second Global: s03_e119_1arc_v3; s03_e120_1arc_v3; s04_e119_1arc_v3; and s04_e120_1arc_v3 | https://earthexplorer.usgs.gov/ | 30 × 30 m |
| Landsat 9 OLI/TIRS C2 L1: LC09_L1TP_115062_20230929_20230929_02_T1 | https://earthexplorer.usgs.gov/ | 30 × 30 m |
| Soil texture map (clay, silt, and sand content) | https://soilgrids.org/ | 250 × 250 m |
| Soil organic carbon map | https://soilgrids.org/ | 250 × 250 m |
| Soil bulk density map | https://soilgrids.org/ | 250 × 250 m |
| Rainfall data | https://power.larc.nasa.gov/data-access-viewer/ | 0.25° × 0.25° |
| Google Earth Image | Data SIO, NOAA, US Navy, NGA, GEBCO | 30 × 30 image |
| Boundary administration | https://gadm.org/ | Shapefile |
Figure 2

Flowchart illustrating the methods of the current study.


Soil erosion inventory mapping

The soil erosion inventory map is essential for preparing the SEV model and served as the dependent variable in this study. Susceptibility mapping of the Saddang Watershed requires known locations of eroded and non-eroded areas. Therefore, the x and y coordinates of 2,000 locations (1,000 soil erosion locations and 1,000 non-soil erosion locations) were sampled through field surveys and image interpretation using SAS Planet and Google Earth to model SEV on a binary scale (occurrence/non-occurrence). The erosion types recorded include sheet, rill, and gully erosion, as well as mass movements.

Stratified 20-fold cross-validation was used to evaluate model performance. The dataset is split into 20 equally sized subsets (folds), each maintaining the same class distribution as the original dataset. The model is trained on 19 folds and validated on the remaining one, and the process is repeated 20 times with a different validation fold each time. Averaging the results across all folds gives a more reliable performance estimate than a single train/test split and makes overfitting easier to detect. Stratified sampling keeps eroded and non-eroded locations equally represented in every fold; it is particularly useful for imbalanced datasets and keeps evaluation metrics comparable across folds.
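The fold construction described above can be illustrated with a minimal sketch. It uses only the Python standard library; the function name `stratified_k_fold`, the random seed, and the synthetic label list are assumptions made for illustration and are not taken from the study's actual workflow.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=20, seed=0):
    """Return k folds (lists of sample indices) that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets an equal share per class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Synthetic stand-in for the inventory: 1,000 eroded (1) and 1,000 non-eroded (0).
labels = [1] * 1000 + [0] * 1000
folds = stratified_k_fold(labels, k=20)

for validation in folds:
    held_out = set(validation)
    training = [j for j in range(len(labels)) if j not in held_out]
    # A model would be fitted on `training` and scored on `validation` here.
    assert len(training) == 1900
```

With 2,000 balanced samples and 20 folds, each validation fold holds 100 locations, 50 from each class.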

Selection of the SEV factors

The factors were chosen using the following criteria: data availability, previous experiences and reports in the literature, data connectivity and heterogeneity, and local geo-environmental characteristics. According to these criteria, 11 important soil erosion factors were collected and compiled for this study, including hydrological (rainfall erosivity, topographical wetness index (TWI), distance to the river, stream power index (SPI), and drainage density), topographic (slope length factor and topographic roughness index (TRI)), and environmental (bulk density, clay ratio, soil organic carbon, and normalized difference vegetation index (NDVI)) factors.

Hydrological factors
Hydrological factors play an important role in SEV because they directly influence the movement of water across the terrain, which affects soil erosion processes (Figure 3). Rainfall erosivity is the ability of rainfall to cause soil erosion, which is influenced by its intensity, duration, and frequency. High rainfall erosivity raises SEV by promoting surface flow and soil detachment. Intense rainfall increases erosive forces, especially on exposed soil surfaces. Prolonged or frequent rainfall increases soil erosion by extending the duration of erosive conditions. Furthermore, uneven rainfall distribution can result in concentrated runoff, accelerating soil erosion in sensitive locations. SEV is therefore higher in locations with high rainfall erosivity, especially when combined with factors such as steep slopes or low vegetation cover. In the present study, monthly rainfall data for 10 years (2013–2022) were used to calculate the rainfall erosivity from the following equation developed by Wischmeier & Smith (1978), as cited in Prasannakumar et al. (2011):
$$R = \sum_{m=1}^{12} 1.735 \times 10^{\left(1.5\,\log_{10}\left(P_m^{2}/P_a\right) - 0.08188\right)} \qquad (1)$$
where R is the rainfall erosivity factor (MJ mm ha−1 h−1 year−1), P_m is the monthly rainfall (mm), and P_a is the annual rainfall (mm).
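As a worked illustration, the monthly-rainfall form of the erosivity relation, R = Σ 1.735 × 10^(1.5 log10(Pm²/Pa) − 0.08188) summed over the 12 months, can be evaluated as in the sketch below. The function name and the rainfall values are hypothetical, and skipping months with zero rainfall is an assumption made here to avoid an undefined logarithm.

```python
import math

def rainfall_erosivity(monthly_rainfall_mm):
    """Annual erosivity R (MJ mm ha^-1 h^-1 year^-1) from 12 monthly totals in mm."""
    if len(monthly_rainfall_mm) != 12:
        raise ValueError("expected 12 monthly rainfall totals")
    annual = sum(monthly_rainfall_mm)
    total = 0.0
    for p_m in monthly_rainfall_mm:
        if p_m <= 0:
            continue  # assumption: a dry month contributes nothing
        total += 1.735 * 10 ** (1.5 * math.log10(p_m ** 2 / annual) - 0.08188)
    return total

# Hypothetical years with the same annual total (1,200 mm) but different seasonality.
uniform_year = [100.0] * 12
single_wet_month = [1200.0] + [0.0] * 11

r_uniform = rainfall_erosivity(uniform_year)
r_concentrated = rainfall_erosivity(single_wet_month)
```

For the same annual total, the relation yields a larger R when rainfall is concentrated in fewer months, which is consistent with the role of intensity described above.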
Figure 3

Hydrological factors of SEV in Saddang Watershed.

The TWI measures landscape wetness by combining slope and upslope contributing areas. High TWI values imply wetter environments with greater soil saturation and runoff. SEV is higher in places with high TWI due to increased soil moisture content, resulting in lower soil cohesiveness and an increased possibility of mass movements. Steep slopes and high TWI exacerbate SEV since runoff accumulates momentum more quickly. The TWI was derived from DEM using the following equation:
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (2)$$
where As is the upstream contributing area and β is the slope gradient (in radians).
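A minimal sketch of the TWI calculation is given below, assuming the standard form TWI = ln(As / tan β). The function name, the sample values, and the small lower bound on the slope (used to avoid division by zero on flat cells) are illustrative assumptions, not details taken from the study.

```python
import math

def twi(contributing_area, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index ln(As / tan(beta)) for one grid cell."""
    # Assumption: clamp near-flat cells so that tan(beta) is never zero.
    beta = max(slope_rad, min_slope_rad)
    return math.log(contributing_area / math.tan(beta))

# Same hypothetical contributing area on a gentle and on a steep slope.
gentle = twi(1000.0, math.radians(5))
steep = twi(1000.0, math.radians(30))
```

For a fixed contributing area, the gentler slope returns the higher TWI, reflecting the greater tendency of water to accumulate there.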
The SPI evaluates the erosive force of flowing water using channel slope and drainage area. High SPI values indicate places with a greater SEV due to increased stream energies. Higher SPI values are caused by steeper slopes and larger drainage areas, which increase SEV. High SPI locations have more intense hydraulic stresses, which leads to increased sediment transport capacity and soil erosion detachment. As a result, soil erosion rates are enhanced in places with high SPI values. The SPI was calculated from DEM using the following equation:
(3)
where As is the upstream contributing area and β is the slope gradient (in radians).

The distance to the river and drainage density are crucial factors in determining SEV. Areas close to rivers frequently face higher soil erosion rates due to the erosive power of concentrated water flow and the increased possibility of floods. Proximity to channels also facilitates sediment transport, resulting in higher soil erosion downstream. Furthermore, rivers can modify local microclimates, influencing vegetation growth and soil stability. Drainage density, or the concentration of streams and channels in a landscape, is also an important consideration. High drainage density is associated with enhanced runoff potential, sediment transport, and more frequent erosive events.

Topographic factors
Topographic factors critically impact SEV by dictating water runoff speed and volume (Figure 4). The slope length factor is a metric that quantifies the potential for sediment to be transported by surface water within a certain area. It considers parameters such as the slope of the terrain, flow length, and intensity of water runoff. The slope length factor aids in determining the vulnerability and amount of soil erosion by measuring how easily soil particles can become separated and transported away by surface water. The slope length factor was calculated using Equation (4) or (5).
(4)
(5)
where λ is the slope length (m), β is the slope gradient (in radians), and m is the slope length exponent.
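$$m = \frac{F}{1 + F} \qquad (6)$$
$$F = \frac{\sin\beta / 0.0896}{3\,(\sin\beta)^{0.8} + 0.56} \qquad (7)$$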
where F is the ratio between rill soil erosion and interrill soil erosion, and β is the slope gradient (in radians).
Figure 4

Topographic factors of SEV in Saddang Watershed.

The TRI is a measure that quantifies the variability in terrain elevation over a specific area, indicating the degree of unevenness or ruggedness of the landscape. It is calculated by comparing the elevation differences between a central point and its surrounding points (Stambaugh & Guyette 2008). High TRI values signify more rugged, uneven terrain with significant elevation changes, while low TRI values indicate smoother, more uniform landscapes. TRI assesses terrain complexity and its impact on soil erosion, water flow, and habitat suitability. The following equation was adopted to compute the TRI:
(8)
where x_ij is the elevation of each neighbor cell of the central cell (0,0).
Environmental factors
Environmental factors impact SEV by influencing the conditions under which soil erosion occurs (Figure 5). Bulk density, clay ratio, and soil organic carbon significantly affect soil erosion. High bulk density indicates compacted soil with reduced pore space, leading to lower infiltration rates and increased surface runoff, which enhances soil erosion. The clay ratio affects soil structure and cohesion; higher clay content generally improves the soil's resistance to erosion due to better particle binding, although excessive clay can lead to poor drainage and surface crusting. Soil organic carbon plays a crucial role in maintaining soil structure and fertility; higher organic carbon levels improve soil aggregation and porosity, enhancing water infiltration and reducing soil erosion. Together, these factors determine the soil's vulnerability to erosive forces and its ability to withstand soil erosion.
Figure 5

Environmental factors of SEV in Saddang Watershed.

Figure 6

Spatial distribution of SEV of each method in Saddang Watershed: DT (Left) and RF (Right).

The NDVI is an RS-based measure that assesses vegetation health and density by comparing near-infrared and red (visible) reflectances. High NDVI values denote dense, healthy vegetation, whereas low values imply sparse or stressed vegetation. NDVI relates to soil erosion because it reflects the amount of vegetation cover, which protects the soil surface from direct raindrop impact, lowers surface runoff, and stabilizes soil with plant roots. Higher NDVI values correlate with reduced soil erosion rates because dense plant cover dampens these erosive processes, whereas lower NDVI values correlate with higher soil erosion due to insufficient vegetative protection. The NDVI is calculated using the following formula:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \qquad (9)$$
where NIR and Red are the reflectances in the near-infrared and red bands, respectively.
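The NDVI calculation, NDVI = (NIR − Red) / (NIR + Red), can be sketched per pixel as follows. The function names and reflectance values are hypothetical; for Landsat 9 OLI-2 imagery the near-infrared and red bands are bands 5 and 4, although the band choice used in the study is an assumption here. Returning NaN where both reflectances are zero is also an illustrative choice.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return math.nan  # assumption: no-data pixels are flagged as NaN
    return (nir - red) / total

def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]

# Hypothetical 2 x 2 reflectance grids (unitless, 0 to 1).
nir_band = [[0.45, 0.30], [0.10, 0.05]]
red_band = [[0.05, 0.10], [0.10, 0.04]]
grid = ndvi_grid(nir_band, red_band)
```

In this toy grid the densely vegetated pixel (NIR 0.45, red 0.05) scores about 0.8, while the pixel with equal reflectances scores 0.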
Multicollinearity analysis

Collinearity occurs when an independent variable is a linear function of another independent variable (Mosaid et al. 2024). High collinearity in a regression equation indicates a strong correlation between independent variables, which can compromise the model's accuracy and the reliability of its coefficients (Dormann et al. 2013). To evaluate collinearity among independent variables, this study employed two common indicators: tolerance (TOL) and variance inflation factor (VIF) (Miles 2014). These measures provide insights into the extent of collinearity present in the data. While no universally accepted thresholds exist, the literature suggests widely used criteria for interpreting these indices: a VIF value of ≤ 5 or 10 and a TOL value of ≥ 0.1 or 0.2 generally indicate acceptable levels of collinearity (Arabameri et al. 2019). These thresholds imply that the independent variables are not excessively correlated and can function reliably in the regression model. This assessment is essential for ensuring the robustness and accuracy of the model's results.
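Both indicators follow from the coefficient of determination R² obtained when one factor is regressed on all the others: TOL = 1 − R² and VIF = 1 / TOL. The short sketch below applies these two relations and the thresholds quoted above; the function names are illustrative, and the example R² of 0.317 is the value reported for rainfall erosivity in Table 2.

```python
def tolerance(r_squared):
    """Tolerance: share of a factor's variance not explained by the other factors."""
    return 1.0 - r_squared

def variance_inflation_factor(r_squared):
    """VIF is the reciprocal of tolerance."""
    tol = tolerance(r_squared)
    if tol <= 0:
        raise ValueError("R^2 of 1 means perfect collinearity; VIF is undefined")
    return 1.0 / tol

def is_acceptable(r_squared, max_vif=10.0, min_tol=0.1):
    """Check a factor against the lenient thresholds (VIF <= 10, TOL >= 0.1)."""
    return (variance_inflation_factor(r_squared) <= max_vif
            and tolerance(r_squared) >= min_tol)

# R^2 for rainfall erosivity as reported in Table 2.
tol = tolerance(0.317)
vif = variance_inflation_factor(0.317)
```

Rounded to three decimals this gives TOL = 0.683 and VIF = 1.464, matching the tabulated values.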

Soil erosion modeling using ML models

Tree-based ML algorithms, such as DT and RF, are highly effective for modeling soil erosion. These algorithms use DTs to understand and predict SEV patterns. Their ability to handle complex, non-linear relationships in data makes them well-suited for accurately predicting SEV, aiding in developing targeted soil conservation strategies.

Decision tree

The DT algorithm generates a tree-like structure, with each internal node representing a decision based on a feature attribute and each leaf node representing a class label or regression value (Kotsiantis 2013). The technique starts with the complete dataset and separates it recursively at each node based on feature qualities to maximize information gain or decrease impurity (Ghosh & Maiti 2021). This procedure will continue until a stopping requirement, such as a maximum tree depth or minimum samples per leaf, is fulfilled. DTs provide interpretable models and can handle categorical and continuous data. They are constructed iteratively by picking the optimum split at each node depending on criteria such as Gini impurity or information gain, giving a flexible approach for classification and regression tasks (Band et al. 2020).
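The split selection at a single node can be sketched as follows: each candidate threshold is scored by the weighted Gini impurity of the two child nodes, and the threshold with the lowest impurity is kept. The function names and the six sample points (drainage density values with eroded/non-eroded labels) are hypothetical and serve only to illustrate the criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a balanced binary node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split of the form value <= t."""
    best = (None, gini(labels))
    n = len(labels)
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical samples: drainage density (km/km2) and erosion occurrence (1/0).
drainage_density = [0.1, 0.15, 0.3, 0.5, 0.7, 0.9]
eroded = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(drainage_density, eroded)
```

Here the split at 0.3 separates the two classes completely, so the weighted impurity drops from 0.5 at the parent node to 0. A full tree repeats this search recursively on each child until a stopping criterion is met.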

Random forest

The RF algorithm is a non-parametric, supervised ensemble ML model (Breiman 2001). It builds an ensemble of DTs using bootstrap sampling and random feature selection. Each DT is grown on a bootstrap sample drawn at random, with replacement, from the training data. At each tree node, only a randomly chosen subset of the full feature set is considered for splitting. This randomness reduces the likelihood of overfitting and encourages diversity among the trees. At prediction time, each tree generates its prediction independently, and the final prediction is obtained by aggregating the outputs of all trees, usually by majority voting for classification problems or averaging for regression tasks. This ensemble approach produces a robust and highly accurate model capable of identifying complicated relationships within data while resisting noise and outliers (Herrera et al. 2019). The model is a powerful tool for both classification and regression problems and has been widely used for SEV mapping (Gayen et al. 2019; Ghorbanzadeh et al. 2020; Mosavi et al. 2020; Avand et al. 2021; Jiang et al. 2021; Folharini et al. 2023; Wang et al. 2023).
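The three ingredients named above (bootstrap sampling, random feature selection, and majority voting) can be shown in a deliberately simplified sketch. For brevity it grows depth-one trees (decision stumps) rather than full DTs, so it is not a complete RF implementation; all names, the seed, and the eight sample points (drainage density, distance to river in m) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Best single-threshold split over the candidate features, or None."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    n = len(rows)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as the dataset has, with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Random feature subset considered for the split.
        features = rng.sample(all_features, n_features)
        stump = fit_stump(sample_rows, sample_labels, features)
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, row):
    """Majority vote over the individual stump predictions."""
    votes = [left if row[f] <= t else right for _, f, t, left, right in forest]
    return majority(votes)

# Hypothetical samples: (drainage density, distance to river in m) -> eroded (1/0).
rows = [(0.1, 1800), (0.2, 1500), (0.15, 1700), (0.3, 1300),
        (0.6, 300), (0.7, 200), (0.5, 350), (0.8, 100)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)

near_river = predict(forest, (0.9, 50))
far_from_river = predict(forest, (0.05, 2000))
```

Individual stumps may differ because each sees a different bootstrap sample and feature, but the vote is stable for clear-cut cases such as the two query points.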

Evaluating models’ performance

The quality of the produced maps was assessed using receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC). The ROC curve plots sensitivity (y-axis) against 1 − specificity (x-axis). The AUC summarizes how well the model separates occurrence from non-occurrence locations, and thus how accurately future occurrences are predicted. For a model that performs at least as well as random guessing, the AUC ranges from 0.5 to 1.0 and is classified into five grades: bad (<0.6), fair (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (>0.9) (Wei et al. 2022; Naceur et al. 2024).
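The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen occurrence location receives a higher predicted score than a randomly chosen non-occurrence location. The sketch below uses this pairwise definition together with the five grades listed above; the function names, the handling of the exact boundary values, and the four sample scores are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def auc_grade(auc):
    """Map an AUC value to the five grades used in the text."""
    if auc < 0.6:
        return "bad"
    if auc < 0.7:
        return "fair"
    if auc < 0.8:
        return "good"
    if auc <= 0.9:
        return "very good"
    return "excellent"

# Hypothetical predictions for two eroded (1) and two non-eroded (0) locations.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, which falls in the "good" grade.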

Multicollinearity analysis for SEV factors

Based on Table 2, all factors listed can be used in the SEV model, as their collinearity metrics fall within acceptable thresholds reported in the literature. The TOL values for all variables are above the critical threshold of 0.1, and their VIF values are below 10, indicating that none of the variables exhibit excessive multicollinearity (Miles 2014). While some variables, such as bulk density (VIF = 5.307, TOL = 0.188), show relatively higher VIF and lower TOL values, these remain within the permissible range. This suggests that their inclusion in the model is justifiable and unlikely to compromise the stability or predictive accuracy. Other factors, including rainfall erosivity, NDVI, and distance to the river, demonstrate low VIF values (<2) and high TOL values (>0.5), indicating strong independence and minimal multicollinearity, making them reliable contributors to the model. Variables with moderate collinearity, such as TWI, TRI, and clay ratio, have VIF values below 5 and can also be included without significant risk of instability.

Table 2

Multicollinearity test among SEV factors

| SEV factors | R² | TOL | VIF |
| Rainfall erosivity | 0.317 | 0.683 | 1.464 |
| Topographical wetness index | 0.764 | 0.236 | 4.241 |
| Stream power index | 0.676 | 0.324 | 3.089 |
| Distance to river | 0.454 | 0.546 | 1.831 |
| Drainage density | 0.663 | 0.337 | 2.965 |
| Slope length factor | 0.516 | 0.484 | 2.065 |
| Topographic roughness index | 0.764 | 0.236 | 4.241 |
| Bulk density | 0.812 | 0.188 | 5.307 |
| Clay ratio | 0.751 | 0.249 | 4.013 |
| Soil organic carbon | 0.701 | 0.299 | 3.345 |
| Normalized difference vegetation index | 0.307 | 0.693 | 1.442 |

Weights for SEV factors

The study evaluates SEV using various factors, each weighted according to its influence, and classifies these factors into grids with corresponding area coverage and scores (Table 3). The highest weight is attributed to drainage density, with values of 0.384 in RF and 0.385 in DT. This highlights the critical role of drainage networks in intensifying soil erosion. A higher drainage density indicates a more extensive network of streams and channels, which facilitates concentrated water flow and thereby increases soil erosion potential in the affected areas (Bhattacharya et al. 2019). The largest share of the area (41.0%) falls within the 0.0–0.2 km/km² class, suggesting a lower vulnerability due to sparser drainage networks. The distance to rivers is the second most influential factor, with weights of 0.257 (RF) and 0.251 (DT). Distance to rivers significantly affects soil erosion, as areas closer to rivers are more vulnerable to the erosive force of flowing water, especially during high-intensity rainfall events. This factor underscores the importance of spatial planning and soil erosion control measures near riverbanks (Gayen et al. 2019). The analysis shows that 56.6% of the area lies more than 1,600 m from rivers, suggesting a lower SEV for those regions. However, 13.1% of the area lies within 400 m of rivers and is therefore more vulnerable.

Table 3

Weights, classes, and scores of SEV factors in Saddang Watershed

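| Variables | SEV factors | Weight (Random forest) | Weight (Decision tree) | Classes of SEV factors | Grid total | Area (km²) | Area (%) | Scores |
| Hydrological data | Rainfall erosivity (MJ mm ha−1 h−1 year−1) | 0.094 | 0.100 | <1,750 | – | – | – | |
| | | | | 1,750–2,000 | 1,589,039 | 1,430 | 29.1 | |
| | | | | 2,000–2,250 | 2,167,987 | 1,951 | 39.7 | |
| | | | | 2,250–2,500 | 1,697,289 | 1,528 | 31.1 | |
| | | | | >2,500 | – | – | – | |
| | Topographical wetness index | 0.003 | 0.003 | <5 | 2,177,564 | 1,960 | 39.9 | |
| | | | | 5–10 | 2,990,637 | 2,692 | 54.8 | |
| | | | | 10–15 | 252,194 | 227 | 4.6 | |
| | | | | 15–20 | 31,256 | 28 | 0.6 | |
| | | | | >20 | 2,664 | 2 | 0.0 | |
| | Stream power index | 0.002 | 0.003 | <0 | 21,442 | 19 | 0.4 | |
| | | | | 0–5 | 4,459,813 | 4,014 | 81.8 | |
| | | | | 5–10 | 922,106 | 830 | 16.9 | |
| | | | | 10–15 | 45,804 | 41 | 0.8 | |
| | | | | >15 | 5,150 | 5 | 0.1 | |
| | Distance to river (m) | 0.257 | 0.251 | >1,600 | 3,085,972 | 2,777 | 56.6 | |
| | | | | 1,200–1,600 | 504,789 | 454 | 9.3 | |
| | | | | 800–1,200 | 546,529 | 492 | 10.0 | |
| | | | | 400–800 | 600,307 | 540 | 11.0 | |
| | | | | <400 | 716,718 | 645 | 13.1 | |
| | Drainage density (km/km²) | 0.384 | 0.385 | 0.0–0.2 | 2,234,961 | 2,011 | 41.0 | |
| | | | | 0.2–0.4 | 2,016,758 | 1,815 | 37.0 | |
| | | | | 0.4–0.6 | 1,027,461 | 925 | 18.8 | |
| | | | | 0.6–0.8 | 166,738 | 150 | 3.1 | |
| | | | | 0.8–1.0 | 8,397 | 8 | 0.2 | |
| Topographic data | Slope length factor | 0.005 | 0.003 | <0.4 | 1,054,041 | 949 | 19.3 | |
| | | | | 0.4–1.4 | 105,427 | 95 | 1.9 | |
| | | | | 1.4–3.1 | 233,839 | 210 | 4.3 | |
| | | | | 3.1–6.8 | 682,546 | 614 | 12.5 | |
| | | | | >6.8 | 3,378,462 | 3,041 | 61.9 | |
| | Topographic roughness index | 0.032 | 0.031 | 0.0–0.2 | 1,054,041 | 949 | 19.3 | |
| | | | | 0.2–0.4 | 105,427 | 95 | 1.9 | |
| | | | | 0.4–0.6 | 233,839 | 210 | 4.3 | |
| | | | | 0.6–0.8 | 682,546 | 614 | 12.5 | |
| | | | | 0.8–1.0 | 3,378,462 | 3,041 | 61.9 | |
| Environmental data | Bulk density (cg/cm³) | 0.159 | 0.157 | <50 | – | – | – | |
| | | | | 50–75 | 20,451 | 18 | 0.4 | |
| | | | | 75–100 | 3,219,692 | 2,898 | 59.0 | |
| | | | | 100–125 | 2,214,172 | 1,993 | 40.6 | |
| | | | | >125 | – | – | – | |
| | Clay ratio | 0.168 | 0.166 | 0.0–0.2 | – | – | – | |
| | | | | 0.2–0.4 | 143,651 | 129 | 2.6 | |
| | | | | 0.5–0.6 | 3,152,259 | 2,837 | 57.8 | |
| | | | | 0.7–0.8 | 2,127,587 | 1,915 | 39.0 | |
| | | | | 0.8–1.0 | 30,818 | 28 | 0.6 | |
| | Soil organic carbon (dg/kg) | 0.230 | 0.225 | >125 | 68,582 | 62 | 1.3 | |
| | | | | 100–125 | 2,072,569 | 1,865 | 38.0 | |
| | | | | 75–100 | 2,416,955 | 2,175 | 44.3 | |
| | | | | 50–75 | 875,170 | 788 | 16.0 | |
| | | | | <50 | 21,039 | 19 | 0.4 | |
| | Normalized difference vegetation index | 0.134 | 0.133 | >0.7 | – | – | – | |
| | | | | 0.5–0.7 | 348,182 | 313 | 6.4 | |
| | | | | 0.3–0.5 | 4,320,562 | 3,889 | 79.2 | |
| | | | | 0.2–0.3 | 338,890 | 305 | 6.2 | |
| | | | | <0.2 | 446,681 | 402 | 8.2 | |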

Bulk density, with weights of 0.159 (RF) and 0.157 (DT), also plays a substantial role in SEV. This parameter reflects soil compaction, where higher bulk density can reduce water infiltration and increase surface runoff (Zhao et al. 2018). Although compact soils are more resistant to detachment, they exacerbate runoff, leading to higher SEV. Similarly, the clay ratio, weighted at 0.168 (RF) and 0.166 (DT), influences soil texture and cohesiveness (Soinne et al. 2023). Soils with higher clay content are generally more resistant to soil erosion; however, once detached, clay particles are easily transported by water due to their fine size. Another critical factor is soil organic carbon content, with weights of 0.230 (RF) and 0.225 (DT). This factor reflects the role of vegetation and organic matter in stabilizing soil. Areas with higher organic carbon content typically exhibit lower soil erosion potential due to the binding effect of organic material on soil particles and the protective role of vegetation cover.

Rainfall erosivity, with weights of 0.094 (RF) and 0.100 (DT), indicates the impact of rainfall intensity and duration on soil erosion. High rainfall erosivity values signify greater potential for soil detachment and transport (Sujatha & Sridhar 2018; Jothimani et al. 2022). The largest share of the area (39.7%) falls within the 2,000–2,250 MJ mm ha⁻¹ h⁻¹ year⁻¹ range, indicating significant soil erosion potential due to rainfall. NDVI, a vegetation-related factor, has weights of 0.134 (RF) and 0.133 (DT). NDVI reflects vegetation health and density: higher values signify better vegetation cover, which mitigates soil erosion because root systems stabilize the soil (Mokarram & Zarei 2021). Other factors, such as the TWI, SPI, TRI, and slope length factor, have lower weights but still contribute to SEV. These factors represent terrain characteristics that influence water accumulation, runoff energy, and surface irregularity (Gómez-Gutiérrez et al. 2015; Arabameri et al. 2019; Getnet & Mulu 2021). Though less influential than the other factors, they provide valuable insights into localized soil erosion processes.

Cross-validation

By employing 20-fold cross-validation with stratified sampling, the study achieves a thorough and fair assessment of the RF and DT models' performance in analyzing SEV. Because every sample is used for testing exactly once and each fold preserves the class proportions, this method strengthens the reliability of the factor weights and reduces the risk that the results reflect overfitting to a single train-test split. It also indicates that key findings, such as the critical role of drainage density and distance to rivers, are robust and generalizable. This reinforces the practical utility of the models for identifying and mitigating soil erosion in vulnerable regions like the Saddang Watershed.
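The stratified 20-fold procedure described above can be illustrated with a minimal, dependency-free sketch. The helper name `stratified_k_fold`, the round-robin fold assignment, the random seed, and the balanced toy labels (1,000 eroded and 1,000 non-eroded locations) are assumptions made for illustration only; the study does not report its implementation or class balance.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=20, seed=42):
    """Yield (train_idx, test_idx) pairs that preserve class proportions.

    Hypothetical helper for illustration; not the study's code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Assumed toy labels: 1 = eroded location, 0 = non-eroded location.
labels = [1] * 1000 + [0] * 1000
splits = list(stratified_k_fold(labels, k=20))
first_train, first_test = splits[0]
```

With these toy labels, each of the 20 test folds holds 100 locations (50 per class), and every location appears in exactly one test fold. A model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the 20 scores then averaged.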

Spatial distribution of SEV

This section assesses the spatial distribution of SEV in the Saddang Watershed using the two ML models, DT and RF. SEV is grouped into five classes: very low, low, moderate, high, and very high (Table 4). The area percentages show how closely the two models agree (Figure 6). For the very low class, the DT model identifies 1,194 km² (24.43%), nearly identical to the RF model's 1,201 km² (24.58%). In the low class, the DT model estimates 1,729 km² (35.38%), slightly more than the RF model's 1,720 km² (35.20%). The models agree most closely in the moderate class, with 1,256 km² (25.70%) for DT and 1,257 km² (25.73%) for RF. In the high class, the DT model identifies 640 km² (13.09%), slightly less than the RF model's 642 km² (13.14%). For the very high class, the DT model estimates 68 km² (1.39%) and the RF model 66 km² (1.36%). Overall, the two models produce consistent classifications, differing by less than 10 km² (under 0.2 percentage points) in every class; the small differences reflect each model's sensitivity in recognizing the various SEV levels. Notably, the areas classified as moderate, high, and very high are predominantly located near riverbanks and river branches, which underscores the influence of fluvial processes on SEV. This pattern is in line with Nwilo et al. (2021) and Ratiat et al. (2023), who also found that areas of moderate to very high erosion vulnerability lie mainly near riverbanks and branches. The comparison highlights the value of using more than one model to understand soil erosion dynamics and to inform effective soil conservation strategies.

Table 4

Distribution of spatial classes of SEV in Saddang Watershed

| SEV class | Total score class | DT grid total | DT area (km²) | DT area (%) | RF grid total | RF area (km²) | RF area (%) |
|---|---|---|---|---|---|---|---|
| Very low | 0.0–0.2 | 1,326,473 | 1,194 | 24.43 | 1,334,524 | 1,201 | 24.58 |
| Low | 0.2–0.4 | 1,921,084 | 1,729 | 35.38 | 1,910,978 | 1,720 | 35.20 |
| Moderate | 0.4–0.6 | 1,395,457 | 1,256 | 25.70 | 1,396,923 | 1,257 | 25.73 |
| High | 0.6–0.8 | 710,876 | 640 | 13.09 | 713,142 | 642 | 13.14 |
| Very high | 0.8–1.0 | 75,425 | 68 | 1.39 | 73,748 | 66 | 1.36 |
| Total | | 5,429,315 | 4,886 | 100 | 5,429,315 | 4,886 | 100 |
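The arithmetic behind Table 4 can be checked with a short sketch. The 30 m cell size is an assumption inferred from the table totals (5,429,315 cells for about 4,886 km²), not a value stated in this section. The treatment of class boundaries (a score of exactly 0.2 falling in the higher class) and the names `classify_score` and `area_summary` are likewise illustrative choices.

```python
from bisect import bisect_right

# Assumed cell size: 30 m x 30 m, inferred from the Table 4 totals.
CELL_AREA_KM2 = 30 * 30 / 1_000_000

SEV_CLASSES = ["Very low", "Low", "Moderate", "High", "Very high"]
CLASS_BREAKS = [0.2, 0.4, 0.6, 0.8]  # equal-interval breaks from Table 4

# Decision tree grid totals per class, copied from Table 4.
DT_GRID_COUNTS = {
    "Very low": 1_326_473,
    "Low": 1_921_084,
    "Moderate": 1_395_457,
    "High": 710_876,
    "Very high": 75_425,
}


def classify_score(score):
    """Map a 0-1 SEV score to one of the five equal-interval classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # Assumption: a score on a break belongs to the higher class.
    return SEV_CLASSES[bisect_right(CLASS_BREAKS, score)]


def area_summary(grid_counts):
    """Return {class: (area in km2, share of total area in %)}."""
    total = sum(grid_counts.values())
    return {
        name: (round(count * CELL_AREA_KM2), round(100 * count / total, 2))
        for name, count in grid_counts.items()
    }


dt_summary = area_summary(DT_GRID_COUNTS)
```

Under the 30 m assumption, `dt_summary` reproduces the DT columns of Table 4, for example 1,194 km² and 24.43% for the very low class.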

Validation model

Validation is an important step in SEV modeling. The AUCROC is a widely accepted metric for quantifying the performance of classification models: it measures a model's ability to distinguish between classes across all decision thresholds. A value above 0.9 is typically taken to indicate that a model discriminates very well. The DT and RF models have AUCROC values of 0.917 and 0.953, respectively (Figure 7), so both models classify areas reliably according to their SEV levels. This quantitative assessment supports the reliability of the predictive models and their use in guiding soil conservation and management decisions.
Figure 7

AUCROC graph.
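For readers unfamiliar with the metric, the AUCROC can be read as the probability that a randomly chosen eroded location receives a higher predicted score than a randomly chosen non-eroded one. The sketch below computes it directly from that definition; the function name `auc_roc` and the eight toy labels and scores are invented for illustration and are not the study's data.

```python
def auc_roc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (assumed values): 1 = eroded, 0 = non-eroded.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.92, 0.81, 0.67, 0.70, 0.55, 0.40, 0.35, 0.10]
auc = auc_roc(labels, scores)  # 14 of 16 pairs ranked correctly -> 0.875
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; library implementations typically use a sort-based equivalent.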

Discussions

Performance evaluation of DT and RF algorithms

The performance of the DT and RF algorithms in predicting SEV can be assessed in terms of accuracy, robustness, and generalization capability. The models are evaluated using AUCROC values, which indicate how well they distinguish between different levels of SEV. In this study, the higher AUCROC of the RF model (0.953 versus 0.917) suggests greater discriminatory power than the DT model. This enhanced performance is attributed to the RF's ensemble approach, which combines predictions from multiple decision trees. By aggregating the results, the RF model improves accuracy and reduces the risk of overfitting, making it better at capturing complex patterns and providing reliable predictions for new, unseen data. On the other hand, despite its lower AUCROC, the DT model offers significant advantages in simplicity and interpretability. The visual clarity of a decision tree helps stakeholders understand and communicate the factors influencing SEV, which is particularly valuable in applications where the decision-making process must be transparent. In summary, the RF model's higher AUCROC underscores its robustness in handling complex environmental data, while the DT model's ease of interpretation makes it a useful tool for straightforward applications. Both models are valuable for predicting SEV and informing soil conservation policies, and combining their strengths offers a comprehensive method for evaluating and managing SEV.
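The intuition that aggregating many trees improves on a single tree can be made concrete under a deliberately idealized assumption: each tree is correct with the same probability and the trees err independently. Real random forest trees are correlated, so the figures below overstate the gain and are not a description of the study's models; the function name `majority_vote_accuracy` and the 0.7 single-tree accuracy are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_single):
    """Probability that a strict majority of independent voters is correct.

    Assumes n_trees independent classifiers, each correct with
    probability p_single (an idealization of an ensemble).
    """
    if n_trees < 1 or n_trees % 2 == 0:
        raise ValueError("use an odd, positive number of trees to avoid ties")
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_single**k * (1 - p_single) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


single = majority_vote_accuracy(1, 0.7)    # one tree: 0.7
three = majority_vote_accuracy(3, 0.7)     # 3 * 0.7^2 * 0.3 + 0.7^3 = 0.784
many = majority_vote_accuracy(101, 0.7)    # approaches 1 as trees are added
```

The same formula also shows the limit of the argument: if each tree is no better than chance (`p_single = 0.5`), voting does not help.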

Significance of input variables and their interactions

Understanding the significance and interactions of input variables is crucial for accurately predicting SEV using ML models, because each input variable provides unique insights into soil erosion dynamics. Drainage density is a key factor: higher values indicate extensive stream networks that intensify concentrated water flow, increasing soil erosion potential (Sajedi-Hosseini et al. 2018; Arabameri et al. 2020). Conversely, areas with sparse drainage networks exhibit lower SEV. Distance to rivers further underscores the vulnerability of areas near riverbanks, where flowing water exerts erosive forces, particularly during high-intensity rainfall events (Gayen et al. 2019). Bulk density and the clay ratio significantly affect soil texture and water behavior. High bulk density reduces infiltration, increasing surface runoff and erosion risk, while clay-rich soils, though resistant to detachment, are easily transported once eroded (Gayen et al. 2019). Organic carbon content plays a stabilizing role by reducing erosion through the binding effect of vegetation and organic matter. Meanwhile, rainfall erosivity quantifies the intensity and duration of rainfall, which drive soil detachment (Jothimani et al. 2022; Olii et al. 2023, 2024a). These factors act together. For instance, areas with high drainage density near rivers and on steep, rugged terrain (high TRI) experience severe soil erosion during intense rainfall (high rainfall erosivity). In such regions, soils with high bulk density exacerbate runoff, while low organic carbon content and sparse vegetation (low NDVI) fail to provide stabilization. This interplay highlights the need for integrated conservation strategies.

Implications for soil conservation and management practices

The significance and interactions of the input variables offer direct guidance for soil conservation and management practices in the Saddang Watershed. Key variables such as drainage density, distance to rivers, rainfall erosivity, and soil properties indicate where interventions should be prioritized. High drainage density areas, characterized by concentrated water flow, require measures such as check dams or vegetative barriers to disrupt soil erosion processes and stabilize soil (Zema et al. 2022). Regions near rivers, where the distance to rivers is a critical factor, benefit from riparian buffer zones and reinforced riverbanks to mitigate soil erosion during intense rainfall (Singh et al. 2021; Graziano et al. 2022). Areas with high rainfall erosivity and sparse vegetation should adopt conservation practices such as afforestation, cover cropping, or reduced tillage to protect soil surfaces from raindrop impact and enhance ground cover (Farmaha et al. 2022). Soil properties call for tailored strategies. Areas with high bulk density should focus on soil aeration and organic amendments to improve infiltration and reduce runoff (Basset et al. 2023). For clay-rich soils, SEV can be reduced by installing sediment traps or constructing retaining walls to manage particle transport. Integrating these approaches with terrain-specific interventions, such as terracing in rugged areas, supports effective erosion control and sustainable land management.

Integrating ML models like DT and RF into GIS and RS enables real-time SEV assessments and supports proactive decision-making. This approach facilitates the monitoring of SEV and the evaluation of conservation measures' effectiveness over time. Furthermore, the insights gained from SEV models can drive educational initiatives and inform policy development, promoting sustainable land management practices and raising awareness about the importance of soil conservation. By applying these insights, stakeholders can enhance their soil management strategies, protect soil health, and foster sustainable land-use practices, ultimately reducing soil erosion vulnerabilities and improving environmental outcomes.

This study demonstrates the effectiveness of DT and RF models in determining SEV in the Saddang Watershed. Drainage density is the most critical factor influencing SEV, intensifying soil erosion through concentrated water flow. Distance to rivers, bulk density, clay ratio, organic carbon, and rainfall erosivity also significantly affect SEV. Vegetation cover and terrain factors contribute less but provide insights into localized soil erosion, aiding targeted soil conservation efforts. Both models are dependable, but the RF model has superior sensitivity and accuracy, as indicated by its AUCROC score of 0.953 versus the DT's 0.917. The RF model's ensemble technique, which integrates numerous decision trees, improves its capacity to capture complicated patterns while reducing overfitting, making it more suitable for environmental modeling. However, the DT model's simplicity and interpretability make it useful for stakeholders who require a thorough understanding of the decision-making process. In short, both models provide useful insights into SEV, with the RF model outperforming the DT model in accuracy and the DT model offering greater interpretability. These findings highlight the need to employ multiple models to gain a thorough understanding of soil erosion dynamics, which will aid in developing successful soil conservation methods in the Saddang Watershed.

We thank the Faculty of Engineering at Universitas Gorontalo for their exceptional support and resources throughout this research. Their commitment to academic excellence and research development has been instrumental in the success of this study. We also extend our thanks to all colleagues, research assistants, and collaborators who contributed their expertise and time, enhancing the quality of this work. Your dedication and collaborative efforts have been crucial in achieving the objectives of this research.

M.R.O. conceived and designed the study, performed data analysis, and drafted the manuscript. A.K.Z.O. contributed to data collection and interpretation and revised the manuscript critically for important intellectual content. A.O. assisted with data analysis and provided technical support and expertise in ML algorithms. R.A.D. contributed to research methodology and experimental design, and reviewed and edited the manuscript. M.A.M. conducted fieldwork and data acquisition and contributed to the discussion and interpretation of results. B.A.K. guided statistical analysis and assisted with data visualization. B.B. contributed to the literature review, supported the research design, and participated in manuscript revision. R.S.N.O. assisted with manuscript preparation and contributed to the discussion of findings. R.P. coordinated the research project, contributed to manuscript drafting, and managed the overall project. All authors reviewed and approved the final version of the manuscript.

The data used for analysis in this study are available from the corresponding author upon reasonable request.

The authors declare there is no conflict.

Alewell C., Borrelli P., Meusburger K. & Panagos P. (2019) Using the USLE: chances, challenges and limitations of soil erosion modelling, International Soil and Water Conservation Research, 7 (3), 203–225. https://doi.org/10.1016/j.iswcr.2019.05.004
Arabameri A., Cerda A. & Tiefenbacher J. P. (2019) Spatial pattern analysis and prediction of gully erosion using novel hybrid model of entropy-weight of evidence, Water (Switzerland), 11 (6), 1–23. https://doi.org/10.3390/w11061129
Arabameri A., Tiefenbacher J. P., Blaschke T., Pradhan B. & Bui D. T. (2020) Morphometric analysis for soil erosion susceptibility mapping using novel GIS-based ensemble model, Remote Sensing, 12 (5), 1–24. https://doi.org/10.3390/rs12050874
Avand M., Mohammadi M., Mirchooli F., Kavian A. & Tiefenbacher J. P. (2021) A new approach for smart soil erosion modelling: integration of empirical and machine learning models, Environmental Modeling & Assessment, 28 (1), 145–160. https://doi.org/10.21203/rs.3.rs-809330/v1
Baiddah A., Krimissa S., Hajji S., Ismaili M., Abdelrahman K., El Bouzekraoui M., Eloudi H., Elaloui A., Khouz A., Badreldin N. & Namous M. (2023) Head-cut gully erosion susceptibility mapping in semi-arid region using machine learning methods: insight from the High Atlas, Morocco, Frontiers in Earth Science, 11 (May), 1–19. https://doi.org/10.3389/feart.2023.1184038
Band S. S., Janizadeh S., Saha S., Mukherjee K., Bozchaloei S. K., Cerdà A., Shokri M. & Mosavi A. (2020) Evaluating the efficiency of different regression, decision tree, and Bayesian machine learning algorithms in spatial piping erosion susceptibility using ALOS/PALSAR data, Land, 9 (10), 1–22. https://doi.org/10.3390/land9100346
Basset C., Abou Najm M., Ghezzehei T., Hao X. & Daccache A. (2023) How does soil structure affect water infiltration? A meta-data systematic review, Soil and Tillage Research, 226 (June 2022), 1–15. https://doi.org/10.1016/j.still.2022.105577
Bhattacharya R. K., Das Chatterjee N. & Das K. (2019) Multi-criteria-based sub-basin prioritization and its risk assessment of erosion susceptibility in Kansai–Kumari catchment area, India, Applied Water Science, 9 (76), 1–30. https://doi.org/10.1007/s13201-019-0954-4
Breiman L. (2001) Random forest, Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Borrelli P., Robinson D. A., Fleischer L. R., Lugato E., Ballabio C., Alewell C., Meusburger K., Modugno S., Schütt B., Ferro V., Bagarello V., Van Oost K., Montanarella L. & Panagos P. (2017) An assessment of the global impact of 21st century land use change on soil erosion, Nature Communications, 8 (1). https://doi.org/10.1038/s41467-017-02142-7
Chakrabortty R. & Pal S. C. (2023) Modeling soil erosion susceptibility using GIS-based different machine learning algorithms in monsoon dominated diversified landscape in India, Modeling Earth Systems and Environment, 9 (2), 2927–2942. https://doi.org/10.1007/s40808-022-01681-3
Chakrabortty R., Pradhan B., Mondal P. & Pal S. C. (2020) The use of RUSLE and GCMs to predict potential soil erosion associated with climate change in a monsoon-dominated region of eastern India, Arabian Journal of Geosciences, 13 (1073), 1–26. https://doi.org/10.1007/s12517-020-06033-y
Chakraborty C., Bhattacharya M., Pal S. & Lee S. S. (2024) From machine learning to deep learning: advances of the recent data-driven paradigm shift in medicine and healthcare, Current Research in Biotechnology, 7 (November 2023), 100164. https://doi.org/10.1016/j.crbiot.2023.100164
Chen T. & Guestrin C. (2016) 'XGBoost: a scalable tree boosting system', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13–17 August, pp. 785–794.
Dinh T. V., Hoang N. D. & Tran X. L. (2021) Evaluation of different machine learning models for predicting soil erosion in tropical sloping lands of northeast Vietnam, Applied and Environmental Soil Science, 2021, 1–14. https://doi.org/10.1155/2021/6665485
Dormann C. F., Elith J., Bacher S., Buchmann C., Carl G., Carré G., Marquéz J. R. G., Gruber B., Lafourcade B., Leitão P. J., Münkemüller T., Mcclean C., Osborne P. E., Reineking B., Schröder B., Skidmore A. K., Zurell D. & Lautenbach S. (2013) Collinearity: a review of methods to deal with it and a simulation study evaluating their performance, Ecography, 36 (1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x
Farmaha B. S., Sekaran U. & Franzluebbers A. J. (2022) Cover cropping and conservation tillage improve soil health in the southeastern United States, Agronomy Journal, 114 (1), 296–316. https://doi.org/10.1002/agj2.20865
Folharini S., Vieira A., Bento-Gonçalves A., Silva S., Marques T. & Novais J. (2023) Soil erosion quantification using machine learning in sub-watersheds of northern Portugal, Hydrology, 10 (1), 1–17. https://doi.org/10.3390/hydrology10010007
François M., Gonçalves Pontes M. C., de Vasconcelos R. N., de Oliveira U. C., Peixoto da Silva H., Faria D. & Mariano-Neto E. (2024) Assessing soil erosion and its drivers in agricultural landscapes: a case study in southern Bahia, Brazil, Journal of Water and Climate Change, 00 (0), 1–16. https://doi.org/10.2166/wcc.2024.147
Friedman J. H. (2001) Greedy function approximation: a gradient boosting machine, Annals of Statistics, 29 (5), 1189–1232. https://doi.org/10.1214/aos/1013203451
Gayen A., Pourghasemi H. R., Saha S., Keesstra S. & Bai S. (2019) Gully erosion susceptibility assessment and management of hazard-prone areas in India using different machine learning algorithms, Science of the Total Environment, 668, 124–138. https://doi.org/10.1016/j.scitotenv.2019.02.436
Ghorbanzadeh O., Shahabi H., Mirchooli F., Valizadeh Kamran K., Lim S., Aryal J., Jarihani B. & Blaschke T. (2020) Gully erosion susceptibility mapping (GESM) using machine learning methods optimized by the multi-collinearity analysis and K-fold cross-validation, Geomatics, Natural Hazards and Risk, 11 (1), 1653–1678. https://doi.org/10.1080/19475705.2020.1810138
Ghosh A. & Maiti R. (2021) Soil erosion susceptibility assessment using logistic regression, decision tree and random forest: study on the Mayurakshi River basin of eastern India, Environmental Earth Sciences, 80 (8), 1–16. https://doi.org/10.1007/s12665-021-09631-5
Gómez-Gutiérrez Á., Conoscenti C., Angileri S. E., Rotigliano E. & Schnabel S. (2015) Using topographical attributes to evaluate gully erosion proneness (susceptibility) in two Mediterranean basins: advantages and limitations, Natural Hazards, 79, 291–314. https://doi.org/10.1007/s11069-015-1703-0
Graziano M. P., Deguire A. K. & Surasinghe T. D. (2022) Riparian buffers as a critical landscape feature: insights for riverscape conservation and policy renovations, Diversity, 14 (3), 1–20. https://doi.org/10.3390/d14030172
Herrera V. M., Khoshgoftaar T. M., Villanustre F. & Furht B. (2019) Random forest implementation and optimization for Big Data analytics on LexisNexis's high performance computing cluster platform, Journal of Big Data, 6 (1), 1–36. https://doi.org/10.1186/s40537-019-0232-1
Jiang C., Fan W., Yu N. & Liu E. (2021) Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model, Science of the Total Environment, 783, 1–13. https://doi.org/10.1016/j.scitotenv.2021.147040
Jothimani M., Getahun E. & Abebe A. (2022) Remote sensing, GIS, and RUSLE in soil loss estimation in the Kulfo River catchment, Rift valley, southern Ethiopia, Journal of Degraded and Mining Lands Management, 9 (2), 3307–3315. https://doi.org/10.15243/jdmlm.2022.092.3307
Kotsiantis S. B. (2013) Decision trees: a recent overview, Artificial Intelligence Review, 39 (4), 261–283. https://doi.org/10.1007/s10462-011-9272-4
Liaw A. & Wiener M. (2002) Classification and regression by random forest, R Journal, 2 (3), 18–22.
Miles J. (2014) Tolerance and variance inflation factor, Wiley StatsRef: Statistics Reference Online, 4, 2055–2056. https://doi.org/10.1002/9781118445112.stat06593
Mosaid H., Barakat A., Bustillo V. & Rais J. (2022) Modeling and mapping of soil water erosion risks in the Srou Basin (Middle Atlas, Morocco) using the EPM model, GIS and magnetic susceptibility, Journal of Landscape Ecology (Czech Republic), 15 (1), 126–147. https://doi.org/10.2478/jlecol-2022-0007
Mosaid H., Barakat A., John K., Faouzi E., Bustillo V., El Garnaoui M. & Heung B. (2024) Improved soil carbon stock spatial prediction in a Mediterranean soil erosion site through robust machine learning techniques, Environmental Monitoring and Assessment, 196 (2), 130. https://doi.org/10.1007/s10661-024-12294-x
Mosavi A., Sajedi-Hosseini F., Choubin B., Taromideh F., Rahi G. & Dineva A. A. (2020) Susceptibility mapping of soil water erosion using machine learning models, Water (Switzerland), 12 (7), 1–17. https://doi.org/10.3390/w12071995
Naceur H. A., Abdo H. G., Igmoullan B., Namous M., Alshehri F. & Albanai J. A. (2024) Implementation of random forest, adaptive boosting, and gradient boosting decision trees algorithms for gully erosion susceptibility mapping using remote sensing and GIS, Environmental Earth Sciences, 83 (3), 1–21. https://doi.org/10.1007/s12665-024-11424-5
Nguyen C. Q., Tran T. T., Nguyen T. T. T., Nguyen T. T., Astarkhanova T. S., Vu L. V., Dau K. T., Nguyen H. N., Pham G. H., Nguyen D. D., Prakash I. & Pham B. (2023) Mapping of soil erosion susceptibility using advanced machine learning models at Nghe An, Vietnam, Journal of Hydroinformatics, 26 (1), 1–16. https://doi.org/10.2166/hydro.2023.327
Nwilo P. C., Ogbeta C. O., Daramola O. E., Okolie C. J. & Orji M. J. (2021) Soil erosion susceptibility mapping of Imo River basin using modified geomorphometric prioritisation method, Quaestiones Geographicae, 40 (3), 143–162. https://doi.org/10.2478/quageo-2021-0029
Olii M. R. & Ichsan I. (2020) Assessment of critical land using geographic information systems - a case study of Limboto watershed, Gorontalo, IOP Conference Series: Earth and Environmental Science, 437 (1), 1–9. https://doi.org/10.1088/1755-1315/437/1/012053
Olii M. R., Olii A., Pakaya R. & Olii M. Y. U. P. (2023) GIS-based analytic hierarchy process (AHP) for soil erosion-prone areas mapping in the Bone Watershed, Gorontalo, Indonesia, Environmental Earth Sciences, 82 (9), 1–14. https://doi.org/10.1007/s12665-023-10913-3
Olii M. R., Kironoto B. A., Olii A., Pakaya R. & Olii A. K. Z. (2024a) Advancing soil erosion assessment: application of remote sensing and geospatial techniques in Bulango Ulu Reservoir Basin, E3S Web of Conferences, 476, 1–15. https://doi.org/10.1051/e3sconf/202447601041
Olii M. R., Olii A. K. Z., Olii A., Pakaya R. & Kironoto B. A. (2024b) Spatial modeling of soil erosion risk: a multi-criteria decision-making (MCDM) approach in the Paguyaman Watershed, Gorontalo, Indonesia, Arabian Journal of Geosciences, 17 (226), 1–13. https://doi.org/10.1007/s12517-024-12032-0
Pandey S., Kumar P., Zlatic M., Nautiyal R. & Panwar V. P. (2021) Recent advances in assessment of soil erosion vulnerability in a watershed, International Soil and Water Conservation Research, 9 (3), 305–318. https://doi.org/10.1016/j.iswcr.2021.03.001
Prasannakumar V., Shiny R., Geetha N. & Vijith H. (2011) Spatial prediction of soil erosion risk by remote sensing, GIS and RUSLE approach: a case study of Siruvani River watershed in Attapady valley, Kerala, India, Environmental Earth Sciences, 64 (4), 965–972. https://doi.org/10.1007/s12665-011-0913-3
Ratiat A., Haddad A., Bouaichi I. & Matene C. N. (2023) Analysis and mapping of water erosion vulnerability using GIS for the Mghila Watershed, northwest of Algeria, Archives of Hydroengineering and Environmental Mechanics, 70 (1), 71–87. https://doi.org/10.2478/heem-2023-0005
Saha S., Roy J., Arabameri A., Blaschke T. & Bui D. T. (2020) Machine learning-based gully erosion susceptibility mapping: a case study of eastern India, Sensors (Switzerland), 20 (5). https://doi.org/10.3390/s20051313
Sahour H., Gholami V., Vazifedan M. & Saeedi S. (2021) Machine learning applications for water-induced soil erosion modeling and mapping, Soil and Tillage Research, 211, 1–12. https://doi.org/10.1016/j.still.2021.105032
Sajedi-Hosseini F., Choubin B., Solaimani K., Cerdà A. & Kavian A. (2018) Spatial prediction of soil erosion susceptibility using a fuzzy analytical network process: application of the fuzzy decision making trial and evaluation laboratory approach, Land Degradation and Development, 29 (9), 3092–3103. https://doi.org/10.1002/ldr.3058
Shit P. K., Nandi A. S. & Bhunia G. S. (2015) Soil erosion risk mapping using RUSLE model on Jhargram sub-division at West Bengal in India, Modeling Earth Systems and Environment, 1 (3), 1–12. https://doi.org/10.1007/s40808-015-0032-3
Singh R., Tiwari A. K. & Singh G. S. (2021) Managing riparian zones for river health improvement: an integrated approach, Landscape and Ecological Engineering, 17 (2), 195–223. https://doi.org/10.1007/s11355-020-00436-5
Soinne H., Keskinen R., Tähtikarhu M., Kuva J. & Hyväluoma J. (2023) Effects of organic carbon and clay contents on structure-related properties of arable soils with high clay content, European Journal of Soil Science, 74, 1–16. https://doi.org/10.1111/ejss.13424
Stambaugh M. C. & Guyette R. P. (2008) Predicting spatio-temporal variability in fire return intervals using a topographic roughness index, Forest Ecology and Management, 254 (3), 463–473. https://doi.org/10.1016/j.foreco.2007.08.029
Sujatha E. R. & Sridhar V. (2018) Spatial prediction of erosion risk of a small mountainous watershed using RUSLE: a case-study of the Palar sub-watershed in Kodaikanal, south India, Water (Switzerland), 10 (11), 1–17. https://doi.org/10.3390/w10111608
Wischmeier W. H. & Smith D. D. (1978) Predicting rainfall erosion loss: a guide to conservation planning, The USDA Agricultural Handbook No. 537, Maryland.
Wang Y., Zhang Y. & Chen H. (2023) Gully erosion susceptibility prediction in Mollisols using machine learning models, Journal of Soil and Water Conservation, 78 (5), 385–396. https://doi.org/10.2489/jswc.2023.00019
Wei Y., Liu Z., Zhang Y., Cui T., Guo Z., Cai C. & Li Z. (2022) Analysis of gully erosion susceptibility and spatial modelling using a GIS-based approach, Geoderma, 420, 1–14. https://doi.org/10.1016/j.geoderma.2022.115869
Zema D. A., Carrà B. G., Lucas-Borja M. E., Filianoti P. G. F., Pérez-Cutillas P. & Conesa-García C. (2022) Modelling water flow and soil erosion in Mediterranean headwaters (with or without check dams) under land-use and climate change scenarios using SWAT, Water (Switzerland), 14 (2338), 1–27. https://doi.org/10.3390/w14152338
Zhao W., Wei H., Jia L., Daryanto S., Zhang X. & Liu Y. (2018) Soil erodibility and its influencing factors on the Loess Plateau of China: a case study in the Ansai Watershed, Solid Earth, 9 (6), 1507–1516. https://doi.org/10.5194/se-9-1507-2018
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).