Internal components of the configured model are referred to as parameters, and how these perform depends on the training dataset used. The explicitly specified settings that regulate the training process are known as hyperparameters and are fixed before training rather than learned from the data. In the Bi-LSTM, six hyperparameters are considered important and tuned: Bi-LSTM layer nodes (P_{L1}), learning rate (P_{L2}), batch size (P_{L3}), number of neurons (P_{L4}), epochs (P_{L5}), and dropout (P_{L6}). The weight updates in the input, forget, memory, and output gates are influenced by P_{L1} to P_{L6} (refer to Table 1). Increasing P_{L1}, P_{L4}, and P_{L5} generally improves model performance, as larger values allow more information to be stored in the cell states and give greater scope for the weights to be updated efficiently. P_{L2} and P_{L3} govern how quickly the weights are updated; lower values are preferable because they allow the model to learn complex patterns effectively. P_{L6} reduces overfitting by randomly dropping units during training and also helps to shorten processing time. These six hyperparameters are therefore assessed to identify their optimum values using the Random Search hyperparameter tuning method, chosen for its ease of application and its ability to sample an effectively unlimited number of parameter combinations (Elgeldawi *et al.* 2021). The ranges of the Bi-LSTM parameters, with remarks, are described in Table 1 (columns 2–3).

Table 1. Bi-LSTM hyperparameters: ranges, optimal values, and remarks

| Index | Parameter, range, and optimal value | Remarks |
|---|---|---|
| P_{L1} | Bi-LSTM layer nodes: 16–256 (64) | Bi-LSTM layer nodes are hidden-layer nodes that help convey data through the input, forget, memory, and output gates. The more Bi-LSTM layer nodes, the more comprehensive the information from the given data stored in the cell state, resulting in better output. |
| P_{L2} | Learning rate: 0–1 (0.2) | The learning rate is the step size by which the weights are adjusted to minimize the loss function. Larger learning rates may result in underfitting, while very low values may result in overfitting. |
| P_{L3} | Batch size: 16–256 (64) | Batch size specifies the number of samples passed to the network at a time. Larger batch sizes can result in complex data patterns being captured erroneously. |
| P_{L4} | Number of neurons: 16–256 (128) | The number of neurons in the dense layer plays a major role in model performance. Overfitting may arise if there are more neurons than necessary. |
| P_{L5} | Epochs: 10–100 (15) | An epoch is one pass through the entire training dataset. A larger number of epochs consumes more memory and rapidly increases processing time. |
| P_{L6} | Dropout: 0–0.5 (0.25) | Dropout is the fraction of units intentionally dropped from the network during training; it helps prevent overfitting and reduces overall computational time. |
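To make the procedure concrete, the following is a minimal sketch of a Random Search over the six hyperparameters of Table 1, written in Python with TensorFlow/Keras. Only the search ranges (P_{L1}–P_{L6}) follow Table 1; the network depth, loss function, data shapes, number of trials, and the non-zero lower bound on the learning rate are illustrative assumptions rather than details taken from the study.

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

rng = np.random.default_rng(42)

def sample_hyperparameters():
    """Draw one random configuration from the Table 1 ranges (P_L1 to P_L6)."""
    return {
        "lstm_nodes": int(rng.integers(16, 257)),        # P_L1: Bi-LSTM layer nodes
        "learning_rate": float(rng.uniform(1e-4, 1.0)),  # P_L2: learning rate (>0 assumed)
        "batch_size": int(rng.integers(16, 257)),        # P_L3: batch size
        "dense_neurons": int(rng.integers(16, 257)),     # P_L4: number of neurons
        "epochs": int(rng.integers(10, 101)),            # P_L5: epochs
        "dropout": float(rng.uniform(0.0, 0.5)),         # P_L6: dropout rate
    }

def build_model(hp, timesteps, n_features):
    """Assemble a Bi-LSTM regressor from one sampled configuration."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.Bidirectional(layers.LSTM(hp["lstm_nodes"])),   # P_L1
        layers.Dropout(hp["dropout"]),                          # P_L6
        layers.Dense(hp["dense_neurons"], activation="relu"),   # P_L4
        layers.Dense(1),                                         # single-value output (assumed)
    ])
    model.compile(optimizer=optimizers.Adam(hp["learning_rate"]),  # P_L2
                  loss="mse")
    return model

def random_search(x_train, y_train, x_val, y_val, n_trials=20):
    """Train n_trials random configurations and keep the best by validation loss."""
    best_hp, best_loss = None, np.inf
    for _ in range(n_trials):
        hp = sample_hyperparameters()
        model = build_model(hp, x_train.shape[1], x_train.shape[2])
        model.fit(x_train, y_train,
                  batch_size=hp["batch_size"],   # P_L3
                  epochs=hp["epochs"],            # P_L5
                  validation_data=(x_val, y_val),
                  verbose=0)
        val_loss = model.evaluate(x_val, y_val, verbose=0)
        if val_loss < best_loss:
            best_hp, best_loss = hp, val_loss
    return best_hp, best_loss
```

Each trial samples one configuration, trains the model, and the configuration with the lowest validation loss is retained; the optimal values reported in parentheses in Table 1 would be selected by a search of this kind.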

