Lavanya Shukla, ML engineer at Weights & Biases

MNIST - A Deep Dive

MNIST is like the secret handshake we use to introduce people to the world of machine learning. Almost all of us have taken a shot at classifying the squiggly handwritten digits in this dataset at some point in our ML careers.

In this post we dive deeper into a state-of-the-art model that achieves over 99.4% validation accuracy on MNIST, and try to figure out what kinds of images leave it flummoxed! We hope this analysis will be useful to people trying to build models with near-perfect MNIST accuracy.

If you’re curious, grab yourself a coffee and join the quest! You can find the Weights and Biases dashboard used to generate these insights here.

I encourage you to dive deeper into your own models’ predictions using W&B! It’s easy to get started. Good luck!

Trends and Summary

The primary results of our exploration are summarized in the table on the left.

Saliency Heatmaps

Let’s put our data detective hat on and dig a little deeper. We’ll start by plotting the saliency heatmaps, which help us visualize how influential each pixel is with respect to the predicted class – they basically tell us which regions of the image contribute the most to the predicted output.
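If you'd like to see the idea in code, here is a minimal sketch. A saliency map is the absolute gradient of the predicted class score with respect to each input pixel. In practice you would get that gradient from your framework's autodiff; to keep this example dependency-free, it approximates the gradient with finite differences instead. The `class_scores` function and its random `weights` are hypothetical stand-ins for a trained model, not the CNN analyzed in this post.

```python
import numpy as np


def saliency_map(score_fn, image, eps=1e-3):
    """Approximate |d score / d pixel| using central finite differences."""
    image = np.asarray(image, dtype=np.float64)
    grad = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        bumped_up = image.copy()
        bumped_down = image.copy()
        bumped_up[idx] += eps
        bumped_down[idx] -= eps
        grad[idx] = (score_fn(bumped_up) - score_fn(bumped_down)) / (2 * eps)
    return np.abs(grad)


# Hypothetical stand-in for a trained model: one random linear layer
# that maps a 28x28 image to 10 class scores.
rng = np.random.default_rng(0)
weights = rng.normal(size=(28 * 28, 10))


def class_scores(image):
    return image.reshape(-1) @ weights


image = rng.random((28, 28))  # random pixels, standing in for an MNIST digit
predicted = int(np.argmax(class_scores(image)))

# Saliency of the predicted class score with respect to every pixel.
heatmap = saliency_map(lambda img: class_scores(img)[predicted], image)
```

The resulting `heatmap` has the same shape as the image, so it can be overlaid on the digit with any plotting library. Finite differences need two forward passes per pixel, which is fine for a toy model but slow for a real CNN, where an autodiff gradient is the better choice.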

Confusion Matrix

Intermediate Class Activations

To round out the analysis, we look at the intermediate class activations, which are quite useful for understanding what features successive layers of the CNN extract from the input image. The key takeaways from the maps are:

Digging Deeper with Data Frames


Last, we track the loss and accuracy metrics for the training and validation sets at each epoch. The model’s final validation accuracy was 99.48%. We could potentially have used EarlyStopping here and truncated training at around epoch 15, when we reached 99.3% validation accuracy!
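To make the early stopping idea concrete, here is a small plain-Python sketch of the patience rule: stop once validation accuracy has failed to improve for `patience` epochs in a row. It mimics the general behavior of Keras's `EarlyStopping` callback but is not that implementation, and the accuracy values are invented for illustration rather than taken from the run in this post.

```python
def early_stopping_epoch(val_accuracies, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is the last epoch that would run; best_epoch is the epoch
    with the best validation accuracy seen up to that point.
    """
    best = float("-inf")
    best_epoch = 0
    waited = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best + min_delta:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Never ran out of patience: training runs to the final epoch.
    return len(val_accuracies), best_epoch


# Made-up validation accuracies, not the metrics from this post.
history = [0.970, 0.982, 0.988, 0.991, 0.993, 0.9929, 0.9930, 0.9928, 0.9931]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

With a rule like this, you would normally keep the weights from `best_epoch` rather than from the epoch where training stopped.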

Now it’s your turn!

I encourage you to create a W&B dashboard for your own model with one line of code. If you do, please share it with me! I’d love to see it!


