Big Loss And Low Accuracy On Training Data In Both Bert And Albert
I am using the Hugging Face TFBertModel for a classification task. I am using the bare TFBertModel with an added dense head layer, not TFBertForSequenceClassification.
Solution 1:
The optimizer's default learning rate (for example, 1e-3 for Keras Adam) is too high for fine-tuning BERT. Try one of the learning rates recommended in Appendix A.3 of the original BERT paper: 5e-5, 3e-5, or 2e-5.
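As a rough illustration of this suggestion, the sketch below compiles a Keras model with Adam at one of those rates. It assumes the model is a tf.keras.Model (for instance TFBertModel with a dense head) and that the loss is chosen by the caller, since the question does not state either. The names BERT_LEARNING_RATES, make_optimizer_kwargs, and compile_for_fine_tuning are hypothetical helpers, not part of TensorFlow or transformers.

```python
# Learning rates listed for fine-tuning in Appendix A.3 of the BERT paper.
BERT_LEARNING_RATES = (5e-5, 3e-5, 2e-5)


def make_optimizer_kwargs(learning_rate=3e-5):
    """Return Adam keyword arguments, rejecting rates outside the listed set.

    Hypothetical helper: restricting to exactly these three values is a
    choice made for this sketch, not a rule from the paper.
    """
    if learning_rate not in BERT_LEARNING_RATES:
        raise ValueError(
            f"learning_rate={learning_rate} is not one of {BERT_LEARNING_RATES}"
        )
    return {"learning_rate": learning_rate}


def compile_for_fine_tuning(model, loss, learning_rate=3e-5):
    """Compile a Keras model with Adam at a small fine-tuning learning rate.

    Assumes `model` is a tf.keras.Model and `loss` is any loss accepted by
    model.compile (a string or a tf.keras.losses object).
    """
    # Imported lazily so make_optimizer_kwargs works without TensorFlow.
    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(**make_optimizer_kwargs(learning_rate))
    model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
    return model
```

With a model already built, a call such as compile_for_fine_tuning(model, loss="sparse_categorical_crossentropy", learning_rate=2e-5) would replace a plain model.compile(optimizer="adam", ...), which would otherwise use Adam's default rate of 1e-3. The loss shown here is only an example and depends on how the labels are encoded.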