14 Aug 2019 / Lavanya Shukla, ML engineer at Weights & Biases

Fundamentals of Neural Networks

Training neural networks can be very confusing. What’s a good learning rate? How many hidden layers should your network have? Is dropout actually useful? Why are your gradients vanishing?

In this post, we’ll pull back the curtain on some of the more confusing aspects of neural nets and help you make smart decisions about your neural network architecture.

We’ll also see how we can use Weights & Biases inside Kaggle kernels to monitor performance and pick the best architecture for our neural network!
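If you haven’t wired W&B into a kernel before, here’s a minimal sketch of what the integration typically looks like. I’m assuming a Keras setup with the `wandb` library installed and logged in; the project name, model, and hyperparameter values below are placeholders, not the ones from this kernel.

```python
import numpy as np
import wandb
from wandb.keras import WandbCallback
from tensorflow import keras

# Dummy data purely for illustration: 500 samples, 10 features, binary labels.
X, y = np.random.rand(500, 10), np.random.randint(0, 2, size=500)

# Start a W&B run (assumes you've run `wandb login`); the project name is a placeholder.
wandb.init(project="nn-fundamentals", config={"epochs": 10, "batch_size": 32})
config = wandb.config

# Any model works here; this tiny one exists only to have something to log.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# WandbCallback streams losses and metrics to your dashboard as training runs.
model.fit(X, y, validation_split=0.2,
          epochs=config.epochs, batch_size=config.batch_size,
          callbacks=[WandbCallback()])
```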

I highly recommend forking this kernel and playing with the different building blocks to hone your intuition.

If you have any questions, feel free to message me. Good luck!

1. Basic Neural Network Structure

Input neurons

Output neurons

Hidden Layers and Neurons per Hidden Layer

Loss function

Batch Size

Number of epochs

Scaling your features
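To see how these building blocks fit together, here’s a minimal sketch of a basic network. The framework (Keras), layer sizes, loss, batch size, and epoch count are all illustrative assumptions, not recommendations for your problem.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Dummy data purely for illustration: 1000 samples, 20 features, binary labels.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# Scaling your features: networks train much more smoothly on standardized inputs.
X = StandardScaler().fit_transform(X)

model = keras.Sequential([
    # Input neurons: one per feature, declared here via input_shape.
    # Hidden layers and neurons per hidden layer: two layers of 64, chosen arbitrarily.
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    # Output neurons: a single sigmoid unit for binary classification.
    keras.layers.Dense(1, activation="sigmoid"),
])

# Loss function: binary cross-entropy matches the sigmoid output above.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Batch size and number of epochs are set in fit(); these values are just examples.
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.2)
```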

2. Learning Rate
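In Keras-style code, the learning rate is just an argument to the optimizer. A quick sketch (1e-3 is a common starting point for Adam, not the “right” value for your problem):

```python
from tensorflow import keras

# The learning rate is passed to the optimizer when you construct it.
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# A rate that's too large diverges and one that's too small crawls, so in practice
# you sweep a few values (say 1e-2, 1e-3, 1e-4) and compare the runs side by side.
```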

3. Momentum
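Momentum lives in the same place. A sketch assuming plain SGD, with 0.9 as the conventional starting value:

```python
from tensorflow import keras

# Momentum is an optimizer argument; 0.9 is the usual default to try first.
optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Nesterov momentum is a small variant of the same idea that often helps a bit more.
nesterov_optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
```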

4. Vanishing + Exploding Gradients

Activation functions

Weight initialization method

BatchNorm

Gradient Clipping

Early Stopping
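Here’s one sketch that wires all five of these remedies into a single Keras model. Every concrete value, from the layer sizes to the clipnorm and the early-stopping patience, is an illustrative assumption.

```python
from tensorflow import keras

model = keras.Sequential([
    # Non-saturating activation (ReLU) + He weight initialization keep gradients healthy.
    keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal",
                       input_shape=(20,)),
    # BatchNorm re-centres and re-scales each layer's inputs during training.
    keras.layers.BatchNormalization(),
    keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Gradient clipping: cap the norm of each gradient update to tame exploding gradients.
optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: halt training once validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
# Pass early_stop via callbacks=[...] when you call model.fit().
```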

5. Dropout
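A quick sketch of dropout in Keras. The 0.5 rate below is the classic default for dense layers, not a tuned value; dropout zeroes a random fraction of activations during training only.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    keras.layers.Dropout(0.5),  # drop half the activations of the layer above
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
```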

6. Optimizers
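As a sketch, here are a few common optimizer choices ready to drop into model.compile(). The learning rates are just typical defaults, and the short characterizations in the comments are rules of thumb rather than guarantees.

```python
from tensorflow import keras

optimizers = {
    # SGD + momentum: often generalizes well once tuned.
    "sgd_momentum": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    # RMSprop: a popular adaptive choice, especially for recurrent nets.
    "rmsprop": keras.optimizers.RMSprop(learning_rate=1e-3),
    # Adam: a reasonable default that usually works out of the box.
    "adam": keras.optimizers.Adam(learning_rate=1e-3),
}
# Train one run per optimizer and compare the loss curves in your W&B dashboard.
```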

7. Learning Rate Scheduling
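Two common ways to schedule the learning rate in Keras, sketched with illustrative decay and patience values:

```python
from tensorflow import keras

# 1) Exponential decay: multiply the learning rate by 0.95 after every epoch.
def exponential_decay(epoch, lr):
    return lr * 0.95

schedule = keras.callbacks.LearningRateScheduler(exponential_decay)

# 2) Reduce-on-plateau: cut the rate in half whenever validation loss stalls.
plateau = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3)

# Pass either (or both) via callbacks=[...] in model.fit().
```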


8. A Few More Things

Results

We’ve explored a lot of different facets of neural networks in this post!

We’ve looked at how to set up a basic neural network (including choosing the number of hidden layers, hidden neurons, batch sizes, etc.).

We’ve learnt about the role momentum and learning rates play in influencing model performance.

And finally we’ve explored the problem of vanishing gradients and how to tackle it using non-saturating activation functions, BatchNorm, better weight initialization techniques and early stopping.

You can compare the accuracy and loss for the various techniques we tried in a single chart by visiting your Weights & Biases dashboard.

Neural networks are powerful beasts that give you a lot of levers to tweak to get the best performance for the problems you’re trying to solve! The sheer number of customizations they offer can overwhelm even seasoned practitioners. Tools like Weights & Biases are your best friends in navigating the land of hyperparameters, trying different experiments, and picking the most powerful models.

I hope this guide will serve as a good starting point in your adventures. Good luck!

I highly recommend forking this kernel and playing with the different building blocks to hone your intuition. And here’s a demo to walk you through using W&B to pick the perfect neural network architecture.

