BatchNorm has been a favorite topic among the ML research community since it was proposed. It is often misunderstood, and even quite poorly understood. So far, the research community has mostly focused on its normalization component. It's also important to note that a BatchNorm has two learnable parameters - a coefficient that is responsible for scaling and a bias that is responsible for shifting. Not much work has been done in order to study the effect of these two parameters systematically.
Earlier this year, Jonathan Frankle and his team published a paper on Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs. They studied how well the scaled and shifted parameters of the BatchNorm layers adjust themselves with the random parameter initializations of CNNs. In this report, I am going to present my experiments based on the ideas presented in this paper.