The parameters considered here are subsample (PX1), number of estimators (PX2), minimum child weight (PX3), maximum depth (PX4), learning rate (PX5), colsample bytree (PX6), lambda (PX7), alpha (PX8), verbosity (PX9), and maximum bin (PX10). PX1 to PX6 significantly influence the loss function and are therefore treated as hyperparameters. PX1 sets the fraction of the training data that is sampled for building the trees. PX2, PX3, PX4, and PX6 govern how the decision trees are built and partitioned, which in turn affects the similarity score. PX5 sets the pace of learning, that is, how strongly each boosting iteration updates the predicted values on the way to a minimal loss. PX7 to PX10 do not significantly influence the simulated predictor values and are therefore assigned default values. PX7 and PX8 are regularization parameters: they add a penalty to the loss function to counter overfitting. Table 3 lists each parameter's range and optimal value (column 2) together with remarks (column 3).
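To make the roles of PX3, PX7, and PX8 concrete, the sketch below computes the similarity score and output weight of a single leaf from the summed gradients and Hessians of the samples it holds. It assumes the standard second-order XGBoost formulation, in which lambda is added to the denominator and alpha soft-thresholds the gradient sum. The function names and the numeric inputs are illustrative and are not taken from the study.

```python
def soft_threshold(g_sum, reg_alpha):
    """Shrink the gradient sum toward zero by alpha (effect of the L1 penalty)."""
    if g_sum > reg_alpha:
        return g_sum - reg_alpha
    if g_sum < -reg_alpha:
        return g_sum + reg_alpha
    return 0.0


def leaf_similarity(g_sum, h_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Similarity score of a leaf: thresholded G squared over (H + lambda)."""
    return soft_threshold(g_sum, reg_alpha) ** 2 / (h_sum + reg_lambda)


def leaf_weight(g_sum, h_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Output value of a leaf: minus thresholded G over (H + lambda)."""
    return -soft_threshold(g_sum, reg_alpha) / (h_sum + reg_lambda)


def split_allowed(h_left, h_right, min_child_weight):
    """A partition is kept only if both children carry enough Hessian weight."""
    return h_left >= min_child_weight and h_right >= min_child_weight


# Hypothetical leaf: gradient sum -6.0, Hessian sum 4.0.
score_l2_only = leaf_similarity(-6.0, 4.0, reg_lambda=1.0, reg_alpha=0.0)
score_with_l1 = leaf_similarity(-6.0, 4.0, reg_lambda=1.0, reg_alpha=2.75)
weight_with_l1 = leaf_weight(-6.0, 4.0, reg_lambda=1.0, reg_alpha=2.75)

# With the minimum child weight of 5.5 from Table 3, a child holding a
# Hessian sum of 4.0 would block the partition.
blocked = not split_allowed(4.0, 7.0, min_child_weight=5.5)
```

With these inputs, raising alpha from 0 to 2.75 lowers the similarity score and shrinks the leaf weight. This is the sense in which the penalty counters overfitting.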

Table 3. Hyperparameters of XGBoost

| S.No (1) | Parameter, range, and optimal value (2) | Remarks (3) |
|---|---|---|
| PX1 | Subsample, 0.5–1 (0.8) | Subsample is the randomly chosen fraction of the training data drawn before the decision trees are built, which helps avoid overfitting. |
| PX2 | Number of estimators, 100–1,000 (100) | The number of estimators is the number of decision trees built to reach the minimum loss function. It should be chosen for the specific problem so that it yields a considerable reduction in the loss function. |
| PX3 | Minimum child weight, 0–10 (5.5) | Minimum child weight is the minimum sum of instance weights required in a child node for a further partition of the tree to be made. The greater the value, the more conservative the partitioning. Lower values allow more partitions per tree, which can improve the convergence rate; nevertheless, lower values are computationally more expensive. |
| PX4 | Maximum depth, 3–15 (8) | Maximum depth is the number of levels from the root to the farthest leaf. The greater the maximum depth of a tree, the more complex the model, which tends to overfit and consumes memory aggressively. Values lower than optimal, on the other hand, produce trees with too few partitions, which has a negative impact on convergence. |
| PX5 | Learning rate, 0–1 (0.55) | The learning rate is the step size by which the predictions are updated at each boosting iteration to reach the minimal loss function. A lower learning rate increases the likelihood of pinpointing precise outcomes; the lower the value, the more conservative the boosting procedure. |
| PX6 | Colsample bytree, 0.5–1 (0.65) | Colsample bytree is the column subsample ratio used in building each tree; the value specifies the fraction of columns to be subsampled. Lower values make the model more conservative. |
| PX7 | Lambda, default (1) | Lambda is the L2 regularization term on leaf weights. It adjusts the loss function and counters overfitting by penalizing the sum of the squared leaf weights. |
| PX8 | Alpha (α), default (2.75) | Alpha is the L1 regularization term on leaf weights. It adjusts the loss function and counters overfitting by penalizing the sum of the absolute leaf weights. |
| PX9 | Verbosity, default (1) | Verbosity sets how much information XGBoost reports during training, which helps in following the training process. |
| PX10 | Maximum bin, default (256) | Maximum bin is the maximum number of bins used to bucket the feature values. |
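The ranges and values in Table 3 can be written down as a configuration. The sketch below is one possible encoding: it uses the parameter names of XGBoost's scikit-learn wrapper (`n_estimators`, `reg_lambda`, `reg_alpha`, and so on) and draws a few random candidates from the PX1 to PX6 ranges. The source does not state which search procedure produced the optimal values, so the random sampling here is an assumption for illustration only, and all constant and function names are hypothetical.

```python
import random

# Search ranges for the tuned hyperparameters PX1-PX6 (Table 3, column 2).
SEARCH_SPACE = {
    "subsample": (0.5, 1.0),         # PX1
    "n_estimators": (100, 1000),     # PX2
    "min_child_weight": (0.0, 10.0), # PX3
    "max_depth": (3, 15),            # PX4
    "learning_rate": (0.0, 1.0),     # PX5
    "colsample_bytree": (0.5, 1.0),  # PX6
}

# Values held fixed for PX7-PX10, as listed in Table 3.
FIXED = {"reg_lambda": 1, "reg_alpha": 2.75, "verbosity": 1, "max_bin": 256}

# Optimal values reported in Table 3 (in parentheses in column 2).
OPTIMAL = {
    "subsample": 0.8,
    "n_estimators": 100,
    "min_child_weight": 5.5,
    "max_depth": 8,
    "learning_rate": 0.55,
    "colsample_bytree": 0.65,
}

# These two parameters only make sense as whole numbers.
INTEGER_PARAMS = {"n_estimators", "max_depth"}


def sample_candidate(rng):
    """Draw one configuration uniformly from the Table 3 ranges (assumed search)."""
    candidate = dict(FIXED)
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            candidate[name] = rng.randint(low, high)
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


rng = random.Random(0)  # seeded so the draws are reproducible
candidates = [sample_candidate(rng) for _ in range(5)]

# Full parameter set at the reported optimum.
best_params = {**FIXED, **OPTIMAL}
```

If the `xgboost` package is installed, `xgboost.XGBRegressor(**best_params)` would build a model at the reported optimum. The source does not say whether the task is regression or classification, so the choice of estimator class is also an assumption.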
