Does model size matter?


If you haven't already, check out my tutorial on training a model using Hugging Face and Weights & Biases, because we'll be building on that knowledge today. This tutorial covers two models, BERT and DistilBERT, and explains how to run a hyperparameter search with W&B Sweeps. We'll aim to answer two questions:

  1. How does DistilBERT's performance compare to that of the larger BERT?
  2. Should BERT and DistilBERT be fine-tuned with different hyperparameters?

