
The internal components of a model's configuration are referred to as parameters, and their values depend on the training dataset used. The explicitly specified settings that regulate the training process are known as hyperparameters and are set before training begins. In the Bi-LSTM model, all such settings are considered important and treated as hyperparameters: Bi-LSTM layer nodes (PL1), learning rate (PL2), batch size (PL3), number of neurons (PL4), epochs (PL5), and dropout (PL6). The weight updates in the input, forget, memory, and output gates are influenced by PL1 to PL6 (refer to Table 1). Increasing PL1, PL4, and PL5 generally improves model performance: they allow more information to be stored in the cell states and broaden the scope for efficient weight updates. PL2 and PL3 govern how rapidly the weights are updated; lower values of PL2 and PL3 are preferable so that the model can learn complex patterns effectively. PL6 shortens the processing time, thereby reducing bias. These hyperparameters are therefore assessed to identify optimum values using the Random Search tuning method, which is chosen because it is easy to apply and can generate an effectively unlimited number of parameter combinations (Elgeldawi et al. 2021). The ranges of the Bi-LSTM hyperparameters, along with remarks, are described in Table 1 (columns 2–3).
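As a concrete illustration of where each hyperparameter acts, the following sketch shows how PL1, PL2, PL4, and PL6 could enter a Bi-LSTM model definition, with PL3 and PL5 reserved for model fitting. The Keras/TensorFlow implementation, the input shape, and the single-output regression head are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch, assuming a Keras/TensorFlow Bi-LSTM; input shape and the
# regression output layer are illustrative assumptions.
from tensorflow.keras import layers, models, optimizers

def build_bilstm(pl1_layer_nodes, pl2_learning_rate, pl4_neurons, pl6_dropout,
                 timesteps=10, features=1):
    model = models.Sequential([
        layers.Input(shape=(timesteps, features)),
        # PL1: number of nodes in the bidirectional LSTM layer
        layers.Bidirectional(layers.LSTM(pl1_layer_nodes)),
        # PL6: fraction of units intentionally dropped during training
        layers.Dropout(pl6_dropout),
        # PL4: number of neurons in the dense layer
        layers.Dense(pl4_neurons, activation="relu"),
        layers.Dense(1),
    ])
    # PL2: step size used by the optimizer when updating the weights
    model.compile(optimizer=optimizers.Adam(learning_rate=pl2_learning_rate),
                  loss="mse")
    return model
```

PL3 (batch size) and PL5 (epochs) would be supplied to `model.fit`, as shown in the Random Search sketch after Table 1.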

Table 1

Hyperparameters of Bi-LSTM

Index (1) | Parameter, range, and optimal value (2) | Remarks (3)
PL1 | Bi-LSTM layer nodes, 16–256 (64) | Bi-LSTM layer nodes are hidden-layer nodes that convey data through the input, forget, memory, and output gates. The larger the number of Bi-LSTM layer nodes, the more comprehensive the information from the data stored in the cell state, resulting in better output.
PL2 | Learning rate, 0–1 (0.2) | The learning rate is the step size by which the weights are adjusted to minimize the loss function. Larger learning rates may result in underfitting, while very low values may lead to overfitting.
PL3 | Batch size, 16–256 (64) | Batch size specifies the number of samples passed to the network at a time. Larger batch sizes can lead to erroneous capture of complex data patterns.
PL4 | Number of neurons, 16–256 (128) | The number of neurons in the dense layer plays a major role in model performance. Overfitting may arise if there are more neurons than necessary.
PL5 | Epochs, 10–100 (15) | An epoch is one pass through the entire training dataset. A larger number of epochs consumes more memory and rapidly increases processing time.
PL6 | Dropout, 0–0.5 (0.25) | Dropout refers to data or noise intentionally dropped from the neural network to improve processing and reduce overall computational time.
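Below is a minimal Random Search sketch over the Table 1 ranges. It reuses the hypothetical `build_bilstm()` helper from the earlier sketch; the discrete candidate values, the lower bound on the learning rate, the synthetic data, and the train/validation split are illustrative assumptions, not the authors' exact setup.

```python
# Minimal Random Search sketch over the Table 1 ranges (assumptions noted above).
import random
import numpy as np

def random_search(x_train, y_train, x_val, y_val, n_trials=20):
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {
            "pl1_layer_nodes": random.choice([16, 32, 64, 128, 256]),  # PL1: 16-256
            "pl2_learning_rate": random.uniform(1e-4, 1.0),            # PL2: 0-1 (nonzero floor assumed)
            "pl3_batch_size": random.choice([16, 32, 64, 128, 256]),   # PL3: 16-256
            "pl4_neurons": random.choice([16, 32, 64, 128, 256]),      # PL4: 16-256
            "pl5_epochs": random.randint(10, 100),                     # PL5: 10-100
            "pl6_dropout": random.uniform(0.0, 0.5),                   # PL6: 0-0.5
        }
        model = build_bilstm(params["pl1_layer_nodes"], params["pl2_learning_rate"],
                             params["pl4_neurons"], params["pl6_dropout"],
                             timesteps=x_train.shape[1], features=x_train.shape[2])
        model.fit(x_train, y_train,
                  batch_size=params["pl3_batch_size"],  # PL3
                  epochs=params["pl5_epochs"],          # PL5
                  verbose=0)
        loss = model.evaluate(x_val, y_val, verbose=0)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_params, best_loss

if __name__ == "__main__":
    # Hypothetical synthetic data: 200 samples, 10 timesteps, 1 feature.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 10, 1)).astype("float32")
    y = x.sum(axis=(1, 2))
    best_params, best_loss = random_search(x[:160], y[:160], x[160:], y[160:],
                                           n_trials=3)
    print(best_params, best_loss)
```

Each trial draws an independent configuration, trains a fresh model, and keeps the configuration with the lowest validation loss, which reflects how Random Search explores the hyperparameter space without enumerating a full grid.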
