Lavanya Shukla, ML engineer at Weights & Biases

Fundamentals of Neural Networks

Training neural networks can be very confusing. What’s a good learning rate? How many hidden layers should your network have? Is dropout actually useful? Why are your gradients vanishing?

In this post we’ll pull back the curtain on some of the more confusing aspects of neural nets, and help you make smart decisions about your neural network architecture.

We’ll also see how we can use Weights & Biases inside Kaggle kernels to monitor performance and pick the best architecture for our neural network!

I highly recommend forking this kernel and playing with the different building blocks to hone your intuition.

If you have any questions, feel free to message me. Good luck!

1. Basic Neural Network Structure

Input neurons

Output neurons

Hidden Layers and Neurons per Hidden Layer

Loss function

Batch Size

Number of epochs

Scaling your features
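
As a quick illustration of that last point, here’s a minimal sketch (not from the kernel) of standardizing a feature to zero mean and unit variance — essentially what scalers like sklearn’s StandardScaler do under the hood:

```python
# Hypothetical example: standardize one feature column so the network's
# inputs share a common scale, which helps gradient descent converge.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / var ** 0.5 for v in values]

raw = [2.0, 4.0, 6.0, 8.0]
scaled = standardize(raw)
print(scaled)  # zero mean, unit variance
```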

2. Learning Rate
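
To build intuition for why the learning rate matters so much, here’s a toy sketch (my own, not from the kernel) of gradient descent on f(w) = w², showing that a small rate converges while a too-large one diverges:

```python
# Gradient descent on f(w) = w**2, whose derivative is 2*w.
def descend(lr, steps=50, w=10.0):
    for _ in range(steps):
        grad = 2 * w      # gradient of the loss at the current weight
        w -= lr * grad    # the core update rule: step against the gradient
    return w

print(abs(descend(0.1)))  # small rate: converges toward the minimum at 0
print(abs(descend(1.1)))  # large rate: each step overshoots, and w blows up
```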

3. Momentum
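
The update rule behind momentum can be sketched in a few lines (again a toy example, assuming the same f(w) = w² loss as above): a velocity term accumulates an exponentially-decayed sum of past gradients, which typically speeds up convergence compared to plain SGD at the same learning rate.

```python
# SGD with momentum: beta is the momentum coefficient (0.9 is a common default).
def descend_momentum(lr=0.01, beta=0.9, steps=200, w=10.0):
    v = 0.0
    for _ in range(steps):
        grad = 2 * w
        v = beta * v - lr * grad   # velocity: decayed sum of past gradients
        w += v
    return w

# Plain SGD with the same learning rate, for comparison.
def descend_plain(lr=0.01, steps=200, w=10.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(descend_momentum()) < abs(descend_plain()))  # → True: momentum is faster here
```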

4. Vanishing + Exploding Gradients

Activation functions

Weight initialization method


Gradient Clipping

Early Stopping
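
To make gradient clipping concrete, here’s a small sketch (mine, not from the kernel) of clipping by norm — the idea behind the clipnorm argument that Keras optimizers accept: if the gradient’s norm exceeds a threshold, rescale it to the threshold so its direction is preserved but exploding updates are capped.

```python
# Clip a gradient vector so its L2 norm never exceeds max_norm.
def clip_by_norm(grad, max_norm=1.0):
    norm = sum(g * g for g in grad) ** 0.5
    if norm > max_norm:
        # Rescale: same direction, but the norm is capped at max_norm.
        return [g * max_norm / norm for g in grad]
    return grad

print(clip_by_norm([3.0, 4.0]))  # norm 5 → rescaled to [0.6, 0.8]
```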

5. Dropout
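
A minimal sketch of the mechanism (the "inverted dropout" variant that frameworks like Keras use internally): at training time each activation is zeroed with probability p, and the survivors are scaled up by 1/(1-p) so the expected activation is unchanged and no rescaling is needed at test time.

```python
import random

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations          # dropout is a no-op at inference time
    keep = 1.0 - p
    # Zero each unit with probability p; scale survivors to keep the
    # expected value of each activation the same.
    return [a / keep if random.random() < keep else 0.0 for a in activations]
```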

6. Optimizers
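
As a rough sketch of what a modern optimizer does beyond plain SGD, here is a single-parameter Adam update using the standard default hyperparameters (a simplification of the real thing, which operates on whole tensors): it combines a momentum-like first moment m with a per-parameter adaptive step size from the second moment v.

```python
# One Adam update step for a single weight w at timestep t (t starts at 1).
def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias-correct the zero initialization
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

# Minimizing f(w) = w**2 from w = 10:
w, m, v = 10.0, 0.0, 0.0
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)  # ends up close to the minimum at 0
```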

7. Learning Rate Scheduling
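
One common schedule, sketched below with made-up numbers, is step decay: drop the learning rate by a fixed factor every few epochs. A function like this is the kind of thing you could hand to Keras’ LearningRateScheduler callback.

```python
# Step decay: halve the learning rate every 10 epochs.
def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    return initial_lr * drop ** (epoch // epochs_per_drop)

print([step_decay(e) for e in (0, 9, 10, 25)])  # → [0.1, 0.1, 0.05, 0.025]
```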


8. A Few More Things


We’ve explored a lot of different facets of neural networks in this post!

We’ve looked at how to set up a basic neural network (including choosing the number of hidden layers, hidden neurons, batch sizes, etc.)

We’ve learnt about the role momentum and learning rates play in influencing model performance.

And finally we’ve explored the problem of vanishing gradients and how to tackle it using non-saturating activation functions, BatchNorm, better weight initialization techniques and early stopping.

You can compare the accuracy and loss curves for the various techniques we tried in a single chart by visiting your Weights & Biases dashboard.

Neural networks are powerful beasts that give you a lot of levers to tweak to get the best performance for the problems you’re trying to solve! The sheer number of customization options they offer can overwhelm even seasoned practitioners. Tools like Weights & Biases are your best friends in navigating the land of hyperparameters, trying different experiments and picking the most powerful models.

I hope this guide will serve as a good starting point in your adventures. Good luck!

And here’s a demo to walk you through using W&B to pick the perfect neural network architecture.
