Here, the parameters considered are subsample (P_{X1}), the number of estimators (P_{X2}), minimum child weight (P_{X3}), maximum depth (P_{X4}), learning rate (P_{X5}), colsample bytree (P_{X6}), lambda (P_{X7}), alpha (P_{X8}), verbosity (P_{X9}), and maximum bin (P_{X10}). Of these, P_{X1} to P_{X6} significantly influence the loss function and are therefore treated as hyperparameters to be tuned. P_{X1} selects the fraction of the training data used to build each tree. P_{X2}, P_{X3}, P_{X4}, and P_{X6} govern the partitioning of the decision trees and hence affect the similarity score. P_{X5} sets the pace of the learning process, from which iterated predictor values with a minimal loss function are obtained. P_{X7} to P_{X10} do not significantly influence the simulated predictor values and are therefore assigned default values. Among them, P_{X7} and P_{X8} are regularization terms added as penalties to the loss function to minimize the overall error and counter overfitting. Table 3 presents the parameter ranges with remarks (columns 2–3).
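The parameter settings described above can be sketched as an XGBoost configuration. This is an illustrative reconstruction, not the authors' code: the tuned values for P_{X1}–P_{X6} and the defaults for P_{X7}–P_{X10} are taken from Table 3, and the keyword names are those of the XGBoost scikit-learn API (`reg_lambda` and `reg_alpha` correspond to lambda and alpha).

```python
# Parameter configuration mirroring P_X1–P_X10 from Table 3.
# P_X1–P_X6 carry the reported optimal values; P_X7–P_X10 keep
# the default values noted in the text.
params = {
    "subsample": 0.8,          # P_X1: fraction of training data per tree
    "n_estimators": 100,       # P_X2: number of boosted trees
    "min_child_weight": 5.5,   # P_X3: weight threshold for further splits
    "max_depth": 8,            # P_X4: maximum tree depth
    "learning_rate": 0.55,     # P_X5: step size for weight updates
    "colsample_bytree": 0.65,  # P_X6: column subsample ratio per tree
    "reg_lambda": 1,           # P_X7: L2 regularization (default)
    "reg_alpha": 2.75,         # P_X8: L1 regularization (default)
    "verbosity": 1,            # P_X9: training-process reporting
    "max_bin": 256,            # P_X10: buckets for feature values (default)
}

# With xgboost installed, a model could then be constructed as:
#   import xgboost as xgb
#   model = xgb.XGBRegressor(**params)
```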

Table 3

| S.No (1) | Parameters, their range, and optimal value (2) | Remarks (3) |
|---|---|---|
| P_{X1} | Subsample: 0.5–1 (0.8) | Subsample is a randomly chosen fraction of the training data used before building each decision tree, which helps avoid overfitting. |
| P_{X2} | Number of estimators: 100–1,000 (100) | The number of estimators is the population of estimators used in forming decision trees to achieve the minimum loss function. The value picked for a specific problem should yield a considerable reduction in the loss function. |
| P_{X3} | Minimum child weight: 0–10 (5.5) | The minimum child weight is the weight threshold that decides the successive partitioning of a decision tree. The greater the minimum child weight, the more conservative the partitioning during tree building. Lower values produce more decision trees, which improves the convergence rate but is computationally more expensive. |
| P_{X4} | Maximum depth: 3–15 (8) | The number of levels from the root to the farthest leaf defines the maximum depth. The greater the maximum depth, the more complex the model, which tends to overfit and consume memory aggressively. Values lower than optimal develop an insufficient number of splits, which negatively affects the convergence criteria. |
| P_{X5} | Learning rate: 0–1 (0.55) | The learning rate is the step size at which the weights are updated to reach the minimal loss function. A lower learning rate increases the likelihood of pinpointing precise outcomes and makes the boosting procedure more conservative. |
| P_{X6} | Colsample bytree: 0.5–1 (0.65) | Colsample bytree is the column subsample ratio used in constructing each tree; the value specifies the fraction of columns to be subsampled. Lower values make the model more conservative. |
| P_{X7} | Lambda (λ): default (1) | Lambda is the L_{2} regularization term on leaf weights, which adjusts the loss function and counters overfitting by penalizing the sum of squared leaf weights. |
| P_{X8} | Alpha (α): default (2.75) | Alpha is the L_{1} regularization term on leaf weights, which adjusts the loss function and counters overfitting by penalizing the sum of absolute leaf weights. |
| P_{X9} | Verbosity: default (1) | Verbosity is an XGBoost setting that controls how much of the training process is reported. |
| P_{X10} | Maximum bin: default (256) | The maximum bin sets the number of buckets used to discretize continuous feature values. |

