26 Feb 2020 / Stacey Svetlichnaya, Deep Learning Engineer

Scrying for Significance: Effective Hyperparameter Search in Deep Learning

TL;DR: To understand the significance of hyperparameter tuning outcomes, quantify the random variance of your experiments

Hyperparameter tuning is a game of accumulating tiny victories. When trying lots of different values in many combinations, it can be hard to tell the difference between background noise and real improvement in target metrics. Is a 0.6% uptick in validation accuracy when I increase the learning rate meaningful? What if I’m also changing the batch size? How many values and combinations do I need to try to be sure? Maybe my model isn’t that sensitive to these, and I should focus on more interesting variables.

One approach is to compare my observations across two conditions: how much do the results change in my active experiments versus in a random background-noise condition (e.g. when the random seed is not set)? Comparing the magnitude of the variance in the signal condition (changing hyperparameters, fixed random seed) to the variance in the noise condition (changing random seed, fixed hyperparameters) quantifies the observed improvement relative to a baseline (a null hypothesis, if you will). If the variance in validation accuracy over many trials from noise alone is 0.5%, then my 0.6% change isn’t very interesting. However, if the variance from noise alone is 0.06%, then my 0.6% improvement from tuning the learning rate is ten times the noise level, which is much more promising.
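As a rough sketch of this comparison (the accuracy values below are made up for illustration, not taken from the report), the two spreads can be computed directly from the logged validation accuracies of each set of runs:

```python
import numpy as np

# Hypothetical readings; in practice these come from your own training runs.
# Noise condition: identical hyperparameters, only the random seed varies.
noise_accuracies = np.array([0.9812, 0.9806, 0.9815, 0.9809, 0.9811])
# Signal condition: fixed random seed, only the learning rate varies.
signal_accuracies = np.array([0.9795, 0.9820, 0.9841, 0.9833, 0.9858])

# Spread of validation accuracy attributable to randomness alone.
noise_spread = noise_accuracies.std()
# Spread attributable to the hyperparameter being tuned.
signal_spread = signal_accuracies.std()

# A ratio near 1 means the "improvement" is indistinguishable from noise;
# the larger the ratio, the more the hyperparameter actually matters.
print(f"signal / noise spread ratio: {signal_spread / noise_spread:.1f}")
```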

In this report I use W&B Sweeps to explore this approach on a simple example (a bidirectional RNN trained on MNIST) and visualize the difference between the noise condition (left image below) and the signal condition (right image). See the full report for details.
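One way to set up the two conditions with W&B Sweeps is to run two small grid sweeps: one over the random seed with hyperparameters fixed, and one over the hyperparameter of interest with the seed fixed. The sketch below assumes a training function that logs a metric named `val_accuracy`; the parameter names, values, and project name are illustrative, not taken from the report.

```python
import random
import wandb

# Noise condition: hyperparameters fixed, only the random seed varies.
noise_sweep = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"value": 0.01},
        "seed": {"values": [0, 1, 2, 3, 4]},
    },
}

# Signal condition: random seed fixed, the hyperparameter of interest varies.
signal_sweep = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [0.001, 0.003, 0.01, 0.03, 0.1]},
        "seed": {"value": 0},
    },
}

def train():
    # Placeholder training function: a real one would build and train the RNN
    # using run.config, then log the true validation accuracy instead of a
    # random number.
    run = wandb.init()
    random.seed(run.config.seed)
    wandb.log({"val_accuracy": random.uniform(0.97, 0.99)})

# Launch one condition at a time and compare the spread of val_accuracy
# across the runs of each sweep.
sweep_id = wandb.sweep(noise_sweep, project="sweep-significance")
wandb.agent(sweep_id, function=train)
```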
