In this tutorial we are going to look at Recurrent Neural Networks and time series data. In future videos, we are going to show how to take these RNNs and apply them to text data.
First of all, what is time series data? In the real world, data changes over time. For instance, if we look at the famous dataset of monthly airline passengers from 1949 to 1960, we can see that there is a general upward trend, plus a seasonal pattern that repeats each year. To accurately predict the airline passenger numbers for 1960, we need some way of taking that time trend into account.
So how can we convert this data over time into the machine learning API (a fixed-length vector of numbers in, a fixed-length vector of numbers out)? We use a simple and effective approach known as the sliding window. We take a window of fixed size, e.g. 10 elements, and put it in the first row of our training data. From those 10 elements, we want to predict the 11th element, which we call the label. We keep sliding the window along, and eventually we have a dataset that we can feed into our perceptron, in the same way that we did in the very first tutorial in this series.
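The sliding-window construction can be sketched in a few lines of NumPy (the function name and the toy series here are illustrative, not from the actual course code):

```python
import numpy as np

def sliding_window(series, window_size=10):
    """Turn a 1-D time series into (X, y) pairs:
    each row of X holds `window_size` consecutive values,
    and y is the value that comes right after that window."""
    X, y = [], []
    for start in range(len(series) - window_size):
        X.append(series[start:start + window_size])
        y.append(series[start + window_size])
    return np.array(X), np.array(y)

series = np.arange(15)  # toy stand-in for the airline data
X, y = sliding_window(series, window_size=10)
print(X.shape)  # (5, 10)
print(y)        # [10 11 12 13 14]
```

Each row of X is one window, and the corresponding entry of y is its label, so the result has exactly the fixed-length-in, fixed-length-out shape the perceptron expects.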
Go into the timeseries directory in ml-class, and open up perceptron.py. This code sets up timeseries data and inputs it into a perceptron. Let’s walk through the code.
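We won't reproduce perceptron.py here, but the core of such a script looks roughly like this sketch: window the series, then fit a single linear unit (a perceptron for regression) by gradient descent. The sine-wave data, learning rate, and variable names are all illustrative assumptions, not the actual file's contents:

```python
import numpy as np

# Toy series standing in for the airline data
# (the real script loads the dataset from a file).
series = np.sin(np.linspace(0, 20, 200))

window_size = 10
X = np.array([series[i:i + window_size]
              for i in range(len(series) - window_size)])
y = series[window_size:]

# A perceptron for regression: one weight per input plus a bias,
# trained with plain gradient descent on mean squared error.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=window_size)
b = 0.0
lr = 0.1
for _ in range(500):
    pred = X @ w + b
    err = pred - y
    w -= lr * X.T @ err / len(y)
    b -= lr * err.mean()

print(np.mean((X @ w + b - y) ** 2))  # training MSE, close to zero
```

The key point is the data shape: every training example is one window of past values, and the label is the value that follows it.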
We get pretty good accuracy from our simple perceptron, but we can improve it by using a Recurrent Neural Network (RNN). What makes an RNN better than a perceptron? The difference is that an RNN respects the order of events in time. If you scrambled the window of inputs to the previous perceptron, it would have no effect on the prediction. But from looking at the graph, we know that the order of those events clearly does matter. An RNN takes the order of its inputs into account, which becomes especially important on larger datasets.
Recurrent Neural Networks generally take in the same input as a dense neural network: a vector of numbers over time, and they output a single number or a vector of numbers. The difference is that they also keep track of a state, which is passed from one time step to the next.
So what happens inside the RNN? The RNN takes in a list of numbers (e.g. 1, 3, 4, 7), and at each step it passes along a state, which is also its output. In this case, the state is a single number. In this diagram of the second step, the network takes in the input 3 along with the state passed from the previous cell, 0.4. Using these two inputs, the network does a perceptron calculation - a weighted sum followed by a hyperbolic tangent activation function - and produces a single number, 0.9. This number is both the cell's output and the state passed to the next cell. The same calculation is performed 10 times (or however long the window is), and we take the final output to be the prediction of the label.
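The step-by-step calculation above can be sketched with a scalar state. The weight values here are made up for illustration (a real network learns them by backpropagation), so the numbers won't match the diagram:

```python
import numpy as np

def simple_rnn(inputs, w_x, w_h, b):
    """Scalar-state RNN: at each step, combine the current input and the
    previous state with a weighted sum, then apply tanh. The new state
    is also the cell's output."""
    state = 0.0
    for x in inputs:
        state = np.tanh(w_x * x + w_h * state + b)
    return state  # the final output is the prediction

# Illustrative weights, not learned ones.
prediction = simple_rnn([1, 3, 4, 7], w_x=0.2, w_h=0.5, b=0.0)
print(prediction)  # some value strictly between -1 and 1
```

Because each step's output depends on the state from the step before, feeding the same numbers in a different order produces a different prediction - exactly the order-sensitivity that distinguishes the RNN from the perceptron.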
What is a hyperbolic tangent activation function?
A hyperbolic tangent activation function acts like a sigmoid, but the values range from -1 to 1 instead of from 0 to 1.
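In fact, tanh is just a shifted and rescaled sigmoid: tanh(x) = 2·sigmoid(2x) − 1. A quick numeric check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.linspace(-5, 5, 11)
# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1,
# so its outputs range from -1 to 1 instead of from 0 to 1.
assert np.allclose(np.tanh(xs), 2 * sigmoid(2 * xs) - 1)
print(np.tanh(-5.0), np.tanh(0.0), np.tanh(5.0))  # ~-1, 0, ~1
```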