Big Loss And Low Accuracy On Training Data In Both Bert And Albert

I am using the Hugging Face TFBertModel for a classification task (from here: ). I am using the bare TFBertModel with an added dense head layer, not TFBertForSequenceClassification.

Solution 1:

The default learning rate is too high for fine-tuning BERT (Keras Adam, for instance, defaults to 1e-3). Try one of the learning rates recommended in Appendix A.3 of the original BERT paper: 5e-5, 3e-5, or 2e-5.
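
As a rough sketch of what that change might look like with Keras, the snippet below compiles an already-built model with an explicit small learning rate. The helper name `compile_for_finetuning` and the constant `BERT_LEARNING_RATES` are made up for illustration, and the loss is an assumption: it only fits if your dense head outputs raw logits and your labels are integer class ids.

```python
import tensorflow as tf

# Fine-tuning learning rates listed in Appendix A.3 of the BERT paper.
BERT_LEARNING_RATES = (5e-5, 3e-5, 2e-5)


def compile_for_finetuning(model, learning_rate=2e-5):
    """Compile a Keras model with an explicit, small Adam learning rate.

    Hypothetical helper: `model` is assumed to be the bare encoder plus
    your own dense head, already assembled as a tf.keras.Model.
    """
    # Pass the learning rate explicitly instead of relying on Adam's default.
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        # Assumption: the head returns logits and labels are integer ids.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

If training still does not improve, it may be worth trying each value in `BERT_LEARNING_RATES` in turn, since the paper reports that the best value is task-specific.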

