Training a classification model is interesting, but have you ever wondered how your model is making its predictions? Is your model actually looking at the dog in the image before classifying it as a dog with 98% accuracy? Interesting, isn't it. In today’s report, we will explore why deep learning models need to be interpretable, and some interesting methods to peek under the hood of a deep learning model. Deep learning interpretability is a very exciting area of research and much progress is being made in this direction already.
So why should you care about interpretability? After all, the success of your business or your project is judged primarily by how good the accuracy of your model is. But in order to deploy our models in the real world, we need to consider other factors too. For instance, is racially biased? Or, what if it’s classifying humans with 97% accuracy, but while it classifies men with 99% accuracy, it only achieves 95% accuracy on women?
Understanding how a model makes its predictions can also help us debug your network. [Check out this blog post on 'Debugging Neural Networks with PyTorch and W&B Using Gradients and Visualizations' for some other techniques that can help].
At this point, we are all familiar with the concept that deep learning models make predictions based on the learned representation expressed in terms of other simpler representations. That is, deep learning allows us to build complex concepts out of simpler concepts. Here’s an amazing Distill Pub post to help you understand this concept better. We also know that these representations are learned while we train the model with our input data and the label, in case of some supervised learning task like image classification. One of the criticisms of this approach is that the learned features in a neural network are not interpretable.
Today we'll look at 3 techniques that address this criticism and shed light into neural networks' “black-box” nature of learning.