The results in this section come from RF and HGB models trained on ideal data, i.e. without pressure sensor uncertainty. The comparison serves two purposes: to quantify how much accuracy declines once uncertainty is added to the collected data, and to determine which ML algorithm suits the problem better in an ideal setting. Tables 4 and 5 show the results for the HGB and RF algorithms, respectively. When trained on one million instances, the HGB model outperforms the RF model by almost 30 percentage points in Top 1 accuracy (85.59% versus 57.17%). The results were obtained by evaluating the models with a Stratified KFold cross-validation procedure using five folds. Up to one million simulated instances were generated, but training started from 100,000 instances to trace the learning curve and to get a sense of where the models' learning begins to plateau.

Table 2

Optimized hyper-parameters for the HGB model

| Parameter name | Parameter value |
| --- | --- |
| Learning rate | 0.15 |
| L2 regularization | 17 |
| Loss | Categorical crossentropy |
| Max bins | 156 |
| Max depth | |
| Max iter | 429 |
| Max leaf nodes | 184 |
| Min sample leaf | 75 |
Table 3

Optimized hyper-parameters for the RF model

| Parameter name | Parameter value |
| --- | --- |
| Max depth | 444 |
| Max leaf nodes | 457 |
| Min sample leaf | |
| Min samples split | 65 |
| Number of estimators | 481 |
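A comparable sketch for Table 3, again assuming scikit-learn (here `RandomForestClassifier`) and its argument names, neither of which is confirmed by the text. Min sample leaf has no value in the table, so it is left at the library default.

```python
from sklearn.ensemble import RandomForestClassifier

# Assumed mapping of Table 3 onto scikit-learn argument names.
rf = RandomForestClassifier(
    n_estimators=481,
    max_depth=444,
    max_leaf_nodes=457,
    min_samples_split=65,
    # min_samples_leaf: Table 3 lists no value, so the library default is kept.
)

rf_params = rf.get_params()
print({name: rf_params[name] for name in ("n_estimators", "max_depth", "max_leaf_nodes")})
```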
Table 4

HGB prediction results without added pressure uncertainty

| Inputs | Top 1 (%) | Top 3 (%) | Top 5 (%) | Top 10 (%) |
| --- | --- | --- | --- | --- |
| 100k | 85.03 | 90.35 | 94.12 | 98.30 |
| 300k | 83.25 | 88.69 | 92.23 | 95.97 |
| 500k | 84.63 | 90.07 | 93.57 | 97.07 |
| 1 million | 85.59 | 91.18 | 94.59 | 97.98 |
Table 5

RF prediction results without added pressure uncertainty

| Inputs | Top 1 (%) | Top 3 (%) | Top 5 (%) | Top 10 (%) |
| --- | --- | --- | --- | --- |
| 100k | 22.58 | 21.99 | 22.56 | 22.60 |
| 300k | 35.86 | 34.50 | 34.87 | 35.33 |
| 500k | 43.57 | 42.23 | 42.76 | 43.25 |
| 1 million | 57.17 | 56.11 | 56.74 | 57.50 |
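To make the comparison between Tables 4 and 5 concrete, the short sketch below computes the Top 1 gap between the two models at each training size, in percentage points. The numbers are copied from the tables; the variable names are illustrative only.

```python
# Top 1 accuracies (%) copied from Table 4 (HGB) and Table 5 (RF).
sizes = ["100k", "300k", "500k", "1 million"]
hgb_top1 = [85.03, 83.25, 84.63, 85.59]
rf_top1 = [22.58, 35.86, 43.57, 57.17]

# Gap in percentage points (HGB minus RF) at each training size.
gap = {s: round(h - r, 2) for s, h, r in zip(sizes, hgb_top1, rf_top1)}

# Change in Top 1 accuracy between the smallest and largest training size.
hgb_gain = round(hgb_top1[-1] - hgb_top1[0], 2)
rf_gain = round(rf_top1[-1] - rf_top1[0], 2)

for s in sizes:
    print(f"{s:>10}: HGB leads RF by {gap[s]:.2f} points")
print(f"HGB gain from 100k to 1 million: {hgb_gain:.2f} points")
print(f"RF gain from 100k to 1 million: {rf_gain:.2f} points")
```

On these values the gap narrows as the training set grows, because the RF accuracy keeps rising while the HGB accuracy changes very little between 100k and one million instances.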
