Google Cloud Platform (GCP) makes it easy to quickly fire up a virtual machine that’s loaded with a GPU and all the setup needed to accelerate your machine learning experimentation. Among the services that GCP offers for this, Notebooks from the AI Platform section are my favorite.
In this article, I will show you how to quickly spin up a Notebook instance via the AI Platform and how to configure Weights and Biases in that instance. I’ll also highlight some cool features that come with Weights and Biases (W&B) that make it a lot of fun to work with.
Note: The rest of this post assumes you already have a billing-enabled GCP account.
3. After navigating to the Notebooks section, click on NEW INSTANCE.
Note that under Customize Instance you get many more options, including a range of GPU choices. After you select the environment you wish to proceed with, a popup like the following asks for additional details. Be sure to tick the installation option for the NVIDIA GPU drivers.
Now let’s configure Weights and Biases within this instance.
Your notebook instance comes with both Python 2 and Python 3 (3.5) installed. So, we will have to be very careful about using the python and pip aliases here. By default, when you type python in the terminal, version 2.7 gets selected. We can, of course, configure it accordingly. But let’s not focus on that for the time being.
Let’s go ahead and execute the following from a terminal of your notebook instance:
pip3 install wandb
Notice, I am using pip3 here. Let’s verify the installation.
Next, let’s ensure that the W&B CLI is also working. We can verify that here:
In this instance, it is not working as expected. So what is going on here? This is a common error for beginners and those new to working with Python. The PATH environment variable tells the operating system where to look for executables such as wandb, and the directory into which pip3 placed the wandb script is not on it.
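The diagnosis above can be checked from Python before changing anything. The following is a minimal sketch using only the standard library; `diagnose_cli` is a hypothetical helper name, and the assumption is that the package was installed at user level (as a plain pip3 install typically does on this kind of instance), so its scripts land in the `bin` directory under the user base.

```python
import importlib.util
import os
import shutil
import site


def diagnose_cli(package="wandb", command="wandb", path=None):
    """Report whether a package is importable and whether its CLI is on PATH."""
    # Use the caller-supplied search path if given, otherwise the real PATH.
    path = os.environ.get("PATH", "") if path is None else path
    # User-level installs put scripts here (typically ~/.local/bin on Linux).
    user_bin = os.path.join(site.getuserbase(), "bin")
    return {
        "package_importable": importlib.util.find_spec(package) is not None,
        "cli_location": shutil.which(command, path=path),  # None if not found
        "user_bin": user_bin,
        "user_bin_on_path": user_bin in path.split(os.pathsep),
    }


if __name__ == "__main__":
    for key, value in diagnose_cli().items():
        print(f"{key}: {value}")
```

If `package_importable` is true while `cli_location` is `None` and `user_bin_on_path` is false, the library is installed correctly and only the shell's search path needs attention.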
Let’s fix this.
First, we need to find the directory where wandb was installed. The easiest way I know of is to run pip3 uninstall wandb: it lists the files it would remove before asking for confirmation, so you can answer n and leave the installation untouched. Below is an example -
Note the path of wandb (this is not the Python library but the CLI utility) which in this case is: /home/jupyter/.local/bin/wandb. Now, we need to symlink wandb inside /usr/bin -
$ cd /usr/bin
$ sudo ln -s /home/jupyter/.local/bin/wandb wandb
The next step is to update the PATH environment variable. Open your shell configuration with nano ~/.bashrc and enter the following -
After you are done typing the above, go ahead and save it. After that, do not forget to run source ~/.bashrc; otherwise, the changes won’t take effect in the current shell.
Now, when you run wandb login, you should see the following -
Proceed with an option accordingly and you should be good to go. After this step has been completed, W&B should be authorized.
Now that we have set up and authorized W&B for our Notebook instance, let’s see how to use it to keep track of a model’s performance while it is training. We will be using TensorFlow 2.0, in particular the high-level tf.keras API.
I will walk you through the primary steps you would need to perform in order to use W&B to keep track of your model’s performance. You may find this notebook handy if you want to follow along.
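import wandb
from wandb.keras import WandbCallback
# Initialize your W&B project allowing it to sync with TensorBoard
# (the project name here is assumed from the run URL of this example)
wandb.init(project="tensorboard-integration", sync_tensorboard=True)
config = wandb.config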
# Specify the configuration variables
config.dropout = 0.2
config.hidden_layer_size = 128
config.layer_1_size = 16
config.layer_2_size = 32
config.learn_rate = 0.01
config.decay = 1e-6
config.momentum = 0.9
config.epochs = 25
# Specify your model definition
# Try supplying the configuration variables
model = Sequential([...])
# Compile the model
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
# The WandbCallback logs metrics and some examples of the test data
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=config.epochs,
          callbacks=[WandbCallback(data_type="image", labels=labels)])
And that’s it! Now if you go to this run page (https://app.wandb.ai/sayakpaul/tensorboard-integration/runs/e8kv5zab) you will see:
Weights and Biases lets you read your previous runs back for analysis. Here’s an excellent analysis done by Lukas on some publicly available Weights and Biases runs. Loading a run is as easy as -
import wandb
api = wandb.Api()
run = api.run("sayakpaul/arxiv-project-complex-models/6t93vdp7")
In the above example, https://app.wandb.ai/sayakpaul/arxiv-project-complex-models/runs/6t93vdp7 is a publicly available run. Once the run is loaded, you can read its configuration variables with run.config. It will print out -
If you want to read the metrics associated with a particular run, along with its other details, you can do so by -
api = wandb.Api()
run = api.run("sayakpaul/arxiv-project-complex-models/6t93vdp7")
You get -
To read multiple runs residing in a project and summarize them, you need three lines of code -
runs = api.runs("sayakpaul/arxiv-project-complex-models")
for run in runs:
    print(run.summary)  # assumed loop body: show each run's summary
Of course, you have the flexibility of trimming the parts of the summary you don’t need. To learn more about what the Weights and Biases API can do, check out the official documentation: https://docs.wandb.com/library/api.
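As a rough illustration of that trimming, here is a minimal sketch that turns runs into flat rows. `summarize_runs` is a hypothetical helper, not part of the W&B library; it works on plain `(name, config, summary)` tuples so that it does not depend on how the API objects behave, and treating keys that start with an underscore as internal bookkeeping is an assumption.

```python
def summarize_runs(records, keep=None):
    """Build one flat row per run from (name, config, summary) tuples.

    `config` and `summary` are plain dicts. `keep` optionally restricts
    which summary keys are retained. If a key appears in both dicts,
    the summary value wins.
    """
    rows = []
    for name, config, summary in records:
        # Assumption: keys starting with "_" are internal and not of interest.
        public_config = {k: v for k, v in config.items() if not k.startswith("_")}
        public_summary = {k: v for k, v in summary.items() if not k.startswith("_")}
        if keep is not None:
            public_summary = {k: v for k, v in public_summary.items() if k in keep}
        rows.append({"name": name, **public_config, **public_summary})
    return rows


# Possible use with real runs (assumes the summary exposes a dict through
# `_json_dict`; verify this against the API documentation before relying on it):
# rows = summarize_runs((r.name, r.config, r.summary._json_dict) for r in runs)
```

Rows of this shape can be handed to a table library or written to CSV for further analysis.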
In machine learning, MemoryErrors are extremely common, even when training takes place on GPUs. When these errors occur, training stops midway. Beyond MemoryErrors, it is also common for those new to machine learning to hit a power failure and lose days of training progress. All of this is to say that setting up your development environment thoughtfully pays dividends in the long run.
But let’s say you are training a model, water spills on your computer, and it shuts down unexpectedly. Weights and Biases helps you prepare for these situations by letting you resume a run that was not completed. Let’s use an example to demonstrate.
First things first - you need to set the resume argument in wandb.init() to True:
wandb.init(project="resume-read-group-runs", name="resume_runs", resume=True)
Now, let’s say, while training my model I accidentally restarted my Jupyter Notebook instance. After I restarted the kernel, to be able to resume the training from exactly where it was last, I would do the following -
model = tf.keras.models.load_model(wandb.restore("model-best.h5").name)
It is important to note this will only work if you supply the WandbCallback while calling model.fit().
After loading the model, I simply need to compile it in exactly the same way I did before restarting the kernel and then call model.fit() -
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=config.epochs, initial_epoch=wandb.run.step,  # continue from the last logged step
          callbacks=[WandbCallback(data_type="image", labels=labels, save_model=True)])
Pay attention to the initial_epoch argument. The training will begin from exactly where it was left off.
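One detail that is easy to get wrong when resuming: Keras treats epochs as the index of the final epoch, not as a count of additional epochs. The sketch below spells out that arithmetic in plain Python; `resume_plan` is a hypothetical helper, and the numbers are illustrative only.

```python
def resume_plan(total_epochs, completed_epochs):
    """Return the fit() keyword arguments for a resumed run and the epochs left.

    `epochs` stays at the originally planned total; only `initial_epoch`
    moves forward to the number of epochs already completed.
    """
    if not 0 <= completed_epochs <= total_epochs:
        raise ValueError("completed_epochs must be between 0 and total_epochs")
    fit_kwargs = {"epochs": total_epochs, "initial_epoch": completed_epochs}
    remaining = total_epochs - completed_epochs
    return fit_kwargs, remaining


# A run planned for 25 epochs that was interrupted after 10 of them.
fit_kwargs, remaining = resume_plan(total_epochs=25, completed_epochs=10)
print(fit_kwargs, remaining)  # {'epochs': 25, 'initial_epoch': 10} 15
```

Passing epochs=15 in the resumed call would be a mistake here: with initial_epoch=10, Keras would stop after epoch 15 and train only 5 more epochs.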
You might want to group many different runs with respect to one or more configuration variables. This makes it easier to compare many runs within a project, or even the runs of a distributed training job. I am going to show a simple example of grouping runs together.
Go to your project page (the URL should be like https://app.wandb.ai/sayakpaul/arxiv-project-complex-models) and press “ALT + Space”. It should look like -
Now, click on Group and you will see a list of the available configuration variables -
Now, I wish to group the runs with respect to the learning rate which is present as lr. I will select it accordingly and I am done -
You can go beyond just one field and select any field you may find necessary to group together the runs for your purpose -
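The grouping done in the UI can also be approximated offline once run records have been exported. This is a minimal sketch, not the dashboard's actual implementation: `group_runs` is a hypothetical helper, the records are made-up values, and `batch_size` and `val_accuracy` are placeholder field names (only `lr` comes from the project above).

```python
from collections import defaultdict
from statistics import mean


def group_runs(rows, by, metric):
    """Group run records by the config fields in `by` and average `metric`."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is a tuple, so grouping by several fields works too.
        key = tuple(row[field] for field in by)
        groups[key].append(row[metric])
    return {key: mean(values) for key, values in groups.items()}


# Made-up records standing in for exported runs.
rows = [
    {"lr": 0.01, "batch_size": 32, "val_accuracy": 0.5},
    {"lr": 0.01, "batch_size": 64, "val_accuracy": 0.75},
    {"lr": 0.001, "batch_size": 32, "val_accuracy": 0.5},
]

by_lr = group_runs(rows, by=["lr"], metric="val_accuracy")
by_lr_and_batch = group_runs(rows, by=["lr", "batch_size"], metric="val_accuracy")
```

Grouping by `["lr"]` collapses the three records into two groups, while adding `batch_size` keeps all three apart, which mirrors selecting one or several fields in the Group menu.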
Note: Be sure to turn off your notebook instance after you are done with your work -
I hope this article gave you a flavor of different useful features offered by Weights and Biases to help you keep track of your deep learning experiments smoothly and systematically. I have made available all the experiments I did for this article here. I hope they are useful :)