Ayush Chaurasia, Contributor

Exploring Neural Style Transfer with Weights & Biases

In this tutorial, we’ll go through the neural style transfer algorithm by Gatys et al., implement it, and track it using the W&B library. Let’s assume that we’re building a style transfer app for production. We’ll need to compare the results produced by different parameter settings. This comparison is subjective: there is no accuracy metric, because no style transfer result is more “accurate” than another. So we’ll have to choose the parameters according to our preference, which requires side-by-side comparison, something the wandb library makes easy. Let’s take it step by step.

Intro To Neural Style Transfer

Neural style transfer is an optimization technique that takes a content image and a style image as input and optimizes a target image to resemble the content of the content image and the style of the style image. The key finding of the research paper by Gatys et al. was that the content and style features of an image can be separated using deep neural networks.

Separating Style and Content

Our next task is to understand how the style and content of a given image can be approximated using a deep neural network. If you look closely at what a network learns during training, you’ll find that the lower layers learn to recognize lines, edges, and colors, which represent the low-level details of the image, while the deeper layers learn to recognize shapes, orientation, and other high-level aspects of the image. Thus, we can use the lower layers to extract style, as it consists of the textures and colors of the image, while the deeper layers can be used to extract the content. Here’s an excerpt from the research paper: “along the processing hierarchy of the network, the input image is transformed into representations that increasingly care about the actual content of the image compared to its detailed pixel values.”

Now we’ve got the tools necessary to build our style transfer algorithm. We just need to optimize another image (a tensor) to represent the style and content of the input images. But to optimize, we’ll need a quantity to minimize, which brings us to the cost function.

Cost Function For Style Transfer

The total cost function is the weighted sum of two costs that are calculated separately: the content cost and the style cost.
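In symbols, following the notation of the Gatys et al. paper, the total cost can be written as below, where p is the content image, a is the style image, and x is the generated image. The subscripted loss names are labels chosen here for readability.

```latex
\mathcal{L}_{\text{total}}(p, a, x) = \alpha \, \mathcal{L}_{\text{content}}(p, x) + \beta \, \mathcal{L}_{\text{style}}(a, x)
```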

Here, α is called the content weight and 𝛽 is called the style weight. Generally, the style weight is much higher than the content weight, as we want to emphasize the “style transfer”. In this experiment, we’ll set α = 1e2 and 𝛽 = 1e8. We’ll use a pre-trained VGG19 network to calculate the style and content losses.

The layers underlined in red will be used for the extraction of style features, with more weight given to the earlier layers than to the deeper ones.

The layer underlined in blue will be used for the extraction of content features. Only one layer, deep in the network, is used for content extraction, as the lower layers capture texture (style) rather than content.

Let’s look into the details of the two cost functions.

Content Loss

The content loss is pretty straightforward. Let p and x be the original image and the generated image, and let P and F be their respective feature representations in layer i. We then define the squared-error loss between the two feature representations as the content loss.
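Written out in the notation of the Gatys et al. paper, the content loss looks like this. The layer is indexed by l here (the text above calls it i) so that i and j are free to index the filter and the position within its feature map; that relabelling is a choice made here for readability.

```latex
\mathcal{L}_{\text{content}}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```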

Style Loss

Let a and x be the original style image and the generated image, and let Ai and Gi be their respective style representations in layer i. The contribution of that layer to the total style loss is the normalized squared difference between Gi and Ai.
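In the notation of the Gatys et al. paper this can be written as follows, with the layer indexed by l rather than i. The symbols N_l (the number of feature maps in layer l), M_l (the height times width of each feature map), and E_l (the contribution of layer l) come from the paper, not from the text above.

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}

E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}

\mathcal{L}_{\text{style}}(a, x) = \sum_{l} w_{l} E_{l}
```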

Here, G represents the correlations between the style features of a layer, known as the “Gram matrix”.

It is essentially a measure of style captured by the features.

Wl represents the weight of a particular layer.

We’ll provide more weight to the lower layers as they are responsible for capturing the style.
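To make the Gram matrix and the layer weighting concrete, here is a minimal NumPy sketch of the style loss. It is an illustration under stated assumptions, not the implementation used for this experiment: the layer names, feature shapes, and weights are made up, and random arrays stand in for real VGG19 activations.

```python
import numpy as np


def gram_matrix(features):
    """Map features of shape (channels, height, width) to a (channels, channels) Gram matrix."""
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    return flat @ flat.T


def layer_style_loss(generated, style):
    """Normalized squared difference between the Gram matrices of one layer."""
    channels, height, width = generated.shape
    diff = gram_matrix(generated) - gram_matrix(style)
    return np.sum(diff ** 2) / (4.0 * channels ** 2 * (height * width) ** 2)


def style_loss(generated_feats, style_feats, layer_weights):
    """Weighted sum of the per-layer style losses."""
    return sum(
        weight * layer_style_loss(generated_feats[name], style_feats[name])
        for name, weight in layer_weights.items()
    )


# Hypothetical layers, shapes, and weights, for illustration only.
rng = np.random.default_rng(0)
shapes = {"conv1_1": (4, 8, 8), "conv2_1": (8, 4, 4)}
style_feats = {name: rng.standard_normal(shape) for name, shape in shapes.items()}
generated_feats = {name: rng.standard_normal(shape) for name, shape in shapes.items()}
layer_weights = {"conv1_1": 1.0, "conv2_1": 0.75}  # earlier layer weighted more

print(style_loss(generated_feats, style_feats, layer_weights))
# Identical features have identical Gram matrices, so the loss vanishes.
print(style_loss(style_feats, style_feats, layer_weights))
```

In a real run, the feature dictionaries would hold the activations of the chosen VGG19 layers for the style image and for the image being optimized.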

Here are the weights that I’ve used for this experiment.

Now that we have our separate losses, let’s combine them to form the total cost function: the sum of the content loss weighted by α and the style loss weighted by 𝛽.

This is all we need to optimize our algorithm to generate artistic style. I’ve covered a more detailed explanation of the research paper and code implementation from scratch here.

Weights and Biases Integration

The wandb library can be installed easily using pip. We’ll use it to track the performance and outputs of the various optimizers that we’ll use to optimize our style transfer algorithm. The saved logs and metrics for this project are public and can be viewed on this wandb workspace.

In order to use wandb to track a project, we need to initialize it using the init() function, which takes the name of the workspace and an optional name for the current execution. wandb.config helps us keep track of other important parameters.
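As a rough sketch of what such an initialization can look like, the snippet below builds a configuration dictionary and passes it to wandb.init. The project name, the helper names build_config and start_run, and the learning rate are assumptions made for illustration; only the two loss weights come from this article.

```python
def build_config(optimizer_name):
    """Hyper-parameters worth recording for one execution."""
    return {
        "optimizer": optimizer_name,
        "content_weight": 1e2,  # alpha, as set in this tutorial
        "style_weight": 1e8,    # beta, as set in this tutorial
        "learning_rate": 0.01,  # assumed value, not taken from the article
    }


def start_run(optimizer_name, project="neural-style-transfer", mode=None):
    """Start a wandb run named after the optimizer (project name is hypothetical)."""
    import wandb  # imported lazily so build_config works without wandb installed

    return wandb.init(
        project=project,
        name=optimizer_name,
        config=build_config(optimizer_name),
        mode=mode,  # e.g. "disabled" for a dry run without syncing
    )
```

Calling start_run("Adam") would start a run named after the optimizer, and the values passed as config are then available through wandb.config.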

The picture above represents the initialization for the execution where we’ve used the Adam optimizer.

Tracking Loss and Outputs

Next, we’ll track the total cost function and the target image that our optimizer produces. But before that, let’s look at the style and content images that we’ll be using for this task.

Now let’s print and log the total loss after every 10 iterations using wandb. The syntax to do that is pretty straightforward.
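A minimal sketch of that logging pattern is shown below. The function names and the metric key total_loss are illustrative, and a stand-in step function with a shrinking loss replaces the real optimizer step so that the loop can be run on its own.

```python
def run_optimization(step_fn, num_steps, log_fn, log_every=10):
    """Call step_fn num_steps times; print and log the loss every log_every steps."""
    for step in range(1, num_steps + 1):
        loss = float(step_fn())
        if step % log_every == 0:
            print(f"step {step}: total loss = {loss:.4f}")
            log_fn({"total_loss": loss}, step=step)


def make_fake_step(start=100.0, decay=0.9):
    """Stand-in for one optimizer step: returns a loss that shrinks on each call."""
    state = {"loss": start}

    def step():
        state["loss"] *= decay
        return state["loss"]

    return step


# Collect the logged values in a list instead of sending them to wandb.
logged = []
run_optimization(
    make_fake_step(),
    num_steps=30,
    log_fn=lambda data, step: logged.append((step, data)),
)
```

Once a run has been initialized, passing log_fn=wandb.log sends the same dictionaries to the workspace, since wandb.log accepts a dictionary of metrics and an optional step argument.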

Here, we are logging only the loss and not an accuracy, because there is no accuracy metric for style transfer. We’ll also save the output image generated after every 20 iterations in a list and log that list in order to track the generated outputs. Here’s the code snippet for logging images.
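The idea can be sketched roughly as follows. The helper names and the log key are assumptions made for illustration, and the demo loop uses placeholder strings where a real run would store NumPy arrays or PIL images.

```python
def keep_snapshot(step, image, snapshots, every=20):
    """Append the image to snapshots on every `every`-th step."""
    if step % every == 0:
        snapshots.append(image)


def log_snapshots(snapshots, key="generated_images"):
    """Log the collected images to the active wandb run as a list of wandb.Image."""
    import wandb  # imported lazily; requires an initialized run

    images = [
        wandb.Image(image, caption=f"snapshot {index}")
        for index, image in enumerate(snapshots, start=1)
    ]
    wandb.log({key: images})


# Demo of the collection step only, with placeholders instead of real images.
snapshots = []
for step in range(1, 101):
    keep_snapshot(step, f"image at step {step}", snapshots)
print(len(snapshots))
```

With real images collected this way, a single call to log_snapshots(snapshots) at the end of the run uploads the whole list.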

Wandb can also be used to log plots, charts, tensors, audio, video, and other data formats. Detailed information about the log function can be found in the official documentation for wandb.log.

Moving On To Experimentation

In the neural style transfer algorithm, a slight change in any of the parameters or hyper-parameters, such as the optimizer, learning rate, layer weights, or style and content weights, can dramatically change the end results. In this case study, we’ll analyse how different optimization algorithms affect the results. We’ll compare these optimizers: Adam, Adagrad, RMSProp, and LBFGS. After running the code with the different optimizers, wandb will automatically log the data needed for comparing the executions in the workspace. Let’s take a look at our wandb workspace.

Various visualizations are automatically generated by wandb and are listed under the AUTO VISUALIZATIONS tab.

We can manually add visualizations for the desired parameters using the “add visualization” button. The generated visualizations are listed under the VISUALIZATION tab.

All the executions are listed on the left pane and the details of each execution can be viewed by clicking on its name.

Visualizing The Cost Function

Let’s take a look at how various optimization algorithms stack up against each other. To do that, we’ll have a look at the visualization of losses that we’ve logged.

As is clearly visible from the plot, the Adam optimizer starts to converge faster than any other optimizer, but in the end LBFGS is the clear winner if minimizing the loss is the comparison metric.

Adagrad runs fairly smoothly but takes much longer to converge. RMSProp is the most unstable of the four optimizers, as is clearly visible from its plot.

Deduction - If minimizing the loss function has the highest priority, the best optimizer is LBFGS. We’ve easily deduced this by comparing the plots generated by wandb.

But in style transfer, the loss does not capture the whole picture. Remember that we’re building a style transfer app for production, and choosing one technique over another is fairly subjective, as “style” is subjective. We’ve logged 50 outputs for each execution, so let’s compare them.

Visualizing Results at Index 15

Wandb allows us to generate and save reports by choosing and comparing specific runs. I’ve saved a report which compares the outputs at various indices. Let’s have a look at index 15.

Looking at the results, we can easily conclude that RMSProp is the outlier: its output has a lot of distortion and no visible sign of style being transferred. The image looks dull and faded. This could be fixed by tuning the hyper-parameters, but for the sake of the experiment, we’ll hold the parameters constant. Again, this is purely subjective, and the output might be considered a “whitened” version of the image, so we could use this optimizer when trying to apply a whitening filter.

Adagrad’s output has sharp contrast, but as its convergence is slow, the style is not yet prominent when compared with the outputs of LBFGS and Adam.


Visualizing Results at Index 25

The LBFGS optimizer has achieved the highest level of style transfer, closely followed by Adam. RMSProp again seems to be the obvious outlier, with its white-filter touch. Adagrad has the least prominent style features.

Deduction - If speed of execution (number of epochs) is the concern, we should go for LBFGS as our optimizer of choice.

Visualizing the Final Results

Let’s have a look at the final output of our algorithm.

Final Deduction

Now we have sufficient insight to choose an optimizer for our application. We can even offer multiple forms of style transfer, as we’ve got different yet consistent results in all 4 experiments.

Finally, we can easily tune this algorithm by changing even one of many parameters. I’d recommend that you try to change the hyper-parameters and compare the executions at various levels. Wandb makes it really easy to compare and log the results. So go ahead and experiment!

Creating Reports on wandb

Wandb allows you to create reports which contain a detailed analysis of an experiment with supporting visualizations and explanations. The detailed report of this experiment can be found here.

More Results

Here are some more results from the test runs that I performed on my system.


The original neural style transfer research paper

A video explaining the research paper

A video explaining the code implementation of the research paper

Wandb workspace for this project.
