The results in this section come from RF and HGB models trained on ideal data, i.e. without pressure sensor uncertainty. The comparison serves two purposes: to provide a baseline against which the decline in accuracy caused by uncertainty in the collected data can be measured, and to identify the better ML algorithm for the given problem in an ideal setting. Tables 4 and 5 show the results for the HGB and RF algorithms, respectively. When trained on one million instances, the HGB model outperforms the RF model by almost 30 percentage points in Top 1 accuracy (85.59% versus 57.17%). The results were obtained by evaluating the models with stratified k-fold cross-validation using five folds. Up to one million simulated instances were generated, but training started from 100,000 instances and was repeated at increasing dataset sizes to trace the learning curve and to see where the learning of the models begins to plateau.
Optimized hyper-parameters for the HGB model
Parameter name | Parameter value |
---|---|
Learning rate | 0.15 |
L2 regularization | 17 |
Loss | Categorical crossentropy |
Max bins | 156 |
Max depth | 2 |
Max iter | 429 |
Max leaf nodes | 184 |
Min sample leaf | 75 |
Optimized hyper-parameters for the RF model
Parameter name | Parameter value |
---|---|
Max depth | 444 |
Max leaf nodes | 457 |
Min sample leaf | 2 |
Min samples split | 65 |
Number of estimators | 481 |
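The RF values in the table above could be set in the same spirit. The sketch assumes scikit-learn's `RandomForestClassifier`; the mapping from the table's parameter names to keyword arguments is an assumption, and any setting not listed in the table is left at the library default.

```python
from sklearn.ensemble import RandomForestClassifier

# Values from the table above; keyword names are scikit-learn's (assumed mapping).
rf = RandomForestClassifier(
    n_estimators=481,      # "Number of estimators"
    max_depth=444,         # "Max depth"
    max_leaf_nodes=457,    # "Max leaf nodes"
    min_samples_leaf=2,    # "Min sample leaf"
    min_samples_split=65,  # "Min samples split"
)
print(rf.get_params()["n_estimators"])
```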
Table 4. HGB prediction results without added pressure uncertainty
Number of instances | Top 1 (%) | Top 3 (%) | Top 5 (%) | Top 10 (%) |
---|---|---|---|---|
100k | 85.03 | 90.35 | 94.12 | 98.30 |
300k | 83.25 | 88.69 | 92.23 | 95.97 |
500k | 84.63 | 90.07 | 93.57 | 97.07 |
1 million | 85.59 | 91.18 | 94.59 | 97.98 |
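The Top k columns can be read with the usual definition of top-k accuracy, in which a prediction counts as correct when the true class is among the k classes the model scores highest. The sketch below assumes that standard definition (the text does not spell it out); the function name `top_k_accuracy` and the toy scores are invented for illustration.

```python
def top_k_accuracy(y_true, scores, k):
    """Percentage of samples whose true class is among the k highest scores.

    y_true: list of true class indices.
    scores: list of per-class score lists, one list per sample.
    """
    hits = 0
    for label, row in zip(y_true, scores):
        # Class indices ranked from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(y_true)


# Toy example (invented numbers): 4 samples, 4 candidate classes.
y_true = [0, 2, 1, 3]
scores = [
    [0.7, 0.1, 0.1, 0.1],  # true class ranked 1st
    [0.4, 0.3, 0.2, 0.1],  # true class ranked 3rd
    [0.5, 0.3, 0.1, 0.1],  # true class ranked 2nd
    [0.4, 0.3, 0.2, 0.1],  # true class ranked 4th
]

print(top_k_accuracy(y_true, scores, 1))  # 25.0
print(top_k_accuracy(y_true, scores, 2))  # 50.0
print(top_k_accuracy(y_true, scores, 3))  # 75.0
```

Because the set of top-ranked classes only grows with k, this metric cannot decrease as k increases.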
Table 5. RF prediction results without added pressure uncertainty
Number of instances | Top 1 (%) | Top 3 (%) | Top 5 (%) | Top 10 (%) |
---|---|---|---|---|
100k | 22.58 | 21.99 | 22.56 | 22.60 |
300k | 35.86 | 34.50 | 34.87 | 35.33 |
500k | 43.57 | 42.23 | 42.76 | 43.25 |
1 million | 57.17 | 56.11 | 56.74 | 57.50 |
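The gap between the two algorithms can be checked directly from the Top 1 columns of the two results tables. The short calculation below uses only the tabulated values; the variable names are illustrative.

```python
# Top 1 accuracy (%) copied from the HGB and RF results tables.
sizes = ["100k", "300k", "500k", "1 million"]
hgb_top1 = [85.03, 83.25, 84.63, 85.59]
rf_top1 = [22.58, 35.86, 43.57, 57.17]

# Absolute gap in percentage points and relative gain of HGB over RF.
gaps = [round(h - r, 2) for h, r in zip(hgb_top1, rf_top1)]
relative = [round(100.0 * (h - r) / r, 1) for h, r in zip(hgb_top1, rf_top1)]

for size, gap, rel in zip(sizes, gaps, relative):
    print(f"{size}: gap = {gap} points, relative gain = {rel}%")
```

The gap narrows from 62.45 points at 100k instances to 28.42 points at one million, which corresponds to the "almost 30" figure quoted in the text. In relative terms the HGB accuracy at one million instances is still about 50% higher than that of RF.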