Model flexibility: Some pretrained nets parallelize six times better

Stacey Svetlichnaya

A pretrained network gives a solid baseline for many machine learning problems. When choosing among pretrained architectures, we often compare performance metrics like loss and accuracy. However, a model’s training time can have a bigger impact on a project if it slows down every later iteration. In this short report, I use W&B system metrics to diagnose why one version of Inception trains six times faster than another when parallelized across 8 GPUs.
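To make the training-time comparison concrete, here is a minimal sketch of how one might time epochs and express each model's speed relative to the slowest one. It is not the method used in this report and does not touch the W&B API; the helper names `time_epochs` and `relative_speedup` are hypothetical, and the numbers in the usage line are placeholders chosen for illustration, not measurements.

```python
import time
from typing import Callable, Dict


def time_epochs(train_one_epoch: Callable[[], None], epochs: int = 3) -> float:
    """Return the mean wall-clock seconds per epoch for a training callable.

    `train_one_epoch` is a hypothetical zero-argument function that runs
    one full pass over the training data.
    """
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        train_one_epoch()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def relative_speedup(seconds_per_epoch: Dict[str, float]) -> Dict[str, float]:
    """Express each model's speed as a multiple of the slowest model's speed."""
    slowest = max(seconds_per_epoch.values())
    return {name: slowest / secs for name, secs in seconds_per_epoch.items()}


if __name__ == "__main__":
    # Placeholder timings (seconds per epoch), not results from the report.
    print(relative_speedup({"model_a": 60.0, "model_b": 360.0}))
```

Running the same timing on 1 GPU and on 8 GPUs for each architecture would show whether the gap comes from the model itself or from how well it parallelizes.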
