How to Use Weights & Biases for Experiment Tracking and Collaboration: A Case Study

Andre Graubner, Ege Karaismailoglu, Leonard von Kleist and Lukas Kapp-Schwoerer

We are a team of four friends pursuing our bachelor’s degrees at ETH Zurich. This June we competed in the Student Cluster Competition at the International Supercomputing Conference 2019. Specifically, we represented ETH in the AI Task, which involved detecting extreme weather events such as cyclones in weather data. Weights & Biases was a useful tool for us, so we want to describe how we used it while developing and improving our model, in the hope that others can benefit from it as well.

Our basic workflow for using Weights & Biases:

  1. Create a shared project on W&B
  2. Set up W&B in our code with the API
  3. Make a list of experiments to run
  4. Automatically track experiments 
  5. Discuss, review results, and repeat steps 3 - 5

To detect extreme events in weather data, we decided to start from DeepLabv3+, a state-of-the-art deep learning architecture for image segmentation. After adapting the architecture to handle the weather data, we compiled a list of potential improvements to the model. Here is the paper we worked from.

As our list grew, we realized we would need a systematic way to keep track of the experiments we were planning to run. After some research, we found Weights & Biases.

The barrier to entry for W&B is low, so we decided to try it and see whether it would help us track our experiments. Turns out, it did!

Set Up

We integrated Weights & Biases into our Python training code through its Keras callback (we used TensorFlow), so that every change to our architecture was tracked across our experiments.

import wandb
from wandb.keras import WandbCallback
wandb.init(entity="RACKlette", project="isc", name="test lower learning rate", tags=["lr"])
config = wandb.config
# Pass variables to wandb that should be stored:
config.epochs = epochs
config.batch_size = batch_size
config.samples = samples
config.output_stride = output_stride
config.channels = channels
config…, yTrain, 
                  validation_data = (xVal, yVal),
                  callbacks=[WandbCallback(), reduce_lr])
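As a complement to the snippet above, here is a minimal sketch of an alternative way to hand the same hyperparameters to W&B: collect them in a dictionary and pass it through the `config` argument of `wandb.init`. The helper names `build_config` and `start_run` and all numeric values are hypothetical and only serve as illustration; the entity, project, run name and tag are copied from our setup.

```python
def build_config(epochs, batch_size, samples, output_stride, channels):
    """Collect the hyperparameters that should be stored with the run."""
    return {
        "epochs": epochs,
        "batch_size": batch_size,
        "samples": samples,
        "output_stride": output_stride,
        "channels": channels,
    }


def start_run(config, mode="disabled"):
    """Start a W&B run that stores the given config.

    mode="disabled" turns logging into a no-op, which is handy for dry runs;
    pass mode="online" to actually send data to the project.
    """
    import wandb  # imported lazily so build_config works without wandb installed

    return wandb.init(
        entity="RACKlette",
        project="isc",
        name="test lower learning rate",
        tags=["lr"],
        config=config,
        mode=mode,
    )


# Hypothetical values, for illustration only.
config = build_config(epochs=10, batch_size=4, samples=1000,
                      output_stride=16, channels=16)
```

Calling `start_run(config)` then returns the run object. Keeping the hyperparameters in one dictionary makes it easy to reuse exactly the same values when building the model.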

Experiment Tracking 

We started running experiments, varying loss functions, changing optimizers and adjusting more general architecture properties. W&B logged each run and made it easy to compare different approaches with visualizations.  

Example visualization of the validation IoU in W&B: Baseline Model vs. Model with Learning Rate Scheduler vs. Additional false-positive loss.
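For readers unfamiliar with the metric, the IoU (intersection over union) compares predicted and labeled pixels class by class. The following is a minimal pure-Python sketch of the idea, not the code we ran in the competition; the function names and the toy label maps are hypothetical, and the class ids (0 = background, 1 = cyclone) are assumptions made for the example.

```python
def class_iou(pred, target, cls):
    """IoU of one class between two equally shaped 2D label maps (lists of lists)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            if in_pred and in_target:
                intersection += 1
            if in_pred or in_target:
                union += 1
    # If the class appears in neither map, the IoU is undefined: return None.
    return intersection / union if union else None


def mean_iou(pred, target, classes):
    """Average the per-class IoU over all classes where it is defined."""
    scores = [class_iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Toy 2x3 label maps (illustrative class ids: 0 = background, 1 = cyclone).
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 1, 1]]

cyclone_iou = class_iou(pred, target, cls=1)      # 2 shared pixels / 4 in the union
overall_iou = mean_iou(pred, target, classes=[0, 1])
```

A value computed this way on the validation set after each epoch could then be logged to W&B like any other metric.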

We saw a significant improvement with a learning rate scheduler, but when we analyzed our predictions vs the actual labels, we noticed that our model still produced far too many false positives for one of the classes. We then changed the loss function to penalize these false positives more strongly.
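To make the idea of penalizing false positives concrete, here is a minimal pure-Python sketch of one possible formulation: a per-pixel cross-entropy with an additional term that grows when the model assigns probability to a given class on pixels that do not belong to it. This is an illustration under our own assumptions, not the exact loss from our model; `fp_weighted_cross_entropy`, `fp_class` and `fp_weight` are hypothetical names.

```python
import math


def fp_weighted_cross_entropy(probs, labels, fp_class, fp_weight=2.0, eps=1e-7):
    """Mean per-pixel cross-entropy with an extra penalty on false positives.

    probs:  list of per-pixel probability lists (one probability per class)
    labels: list of true class ids, one per pixel
    A pixel whose true class is not `fp_class` is charged an additional
    -fp_weight * log(1 - p[fp_class]), which grows as the model puts
    probability on `fp_class` where it does not belong.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        loss = -math.log(max(p[y], eps))  # standard cross-entropy term
        if y != fp_class:
            # Extra term that punishes probability mass on the wrong class.
            loss += -fp_weight * math.log(max(1.0 - p[fp_class], eps))
        total += loss
    return total / len(labels)


# One background pixel (true class 0), scored by two hypothetical models.
false_positive = fp_weighted_cross_entropy([[0.2, 0.8]], [0], fp_class=1)
correct = fp_weighted_cross_entropy([[0.9, 0.1]], [0], fp_class=1)
plain = fp_weighted_cross_entropy([[0.2, 0.8]], [0], fp_class=1, fp_weight=0.0)
```

With `fp_weight=0.0` the function reduces to ordinary cross-entropy, so the weight can be tuned as one more hyperparameter and compared across runs.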

One big advantage of W&B is that it lets users easily compare any subset of runs from the entire project. The example above illustrates this: plotting the three runs side by side showed us directly how much each change contributed.

Comparing our models also made it easier to cooperate. Each team member could access the entire project history from anywhere and at any time. Because we could watch every experiment unfold in real time, we no longer needed to spend time catching up on what the other team members had tried and could focus on discussing our next approaches instead. Furthermore, being able to view our experiments from any device was motivating, as it provided a continual sense of progress.

Weights & Biases is a valuable tool to help coordinate team efforts for big projects with lots of diverging solution attempts because it can make any progress instantly accessible to every team member, decreasing time wasted on status updates and increasing team spirit. 

Tips and Tricks

  1. Filters: Use filters to search previous experiments instead of scrolling endlessly.
  2. Crash Alerts: It’s possible to get a Slack alert anytime your run crashes, to detect failure early and avoid wasting computation time. See
  3. Writing reports: Summarizing progress in written reports can provide structure and minimize confusion.

Thanks for reading! Feel free to reach out with further questions about our workflow or our project at
