Transformer Reinforcement Learning


Transformer Reinforcement Learning is a library for training transformer language models with Proximal Policy Optimization (PPO), built on top of Hugging Face's transformers library.
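As a rough illustration of what PPO optimizes, here is a minimal Python sketch of its clipped surrogate objective over a batch of sampled tokens. The function name and inputs are hypothetical, not the library's API. It assumes per-token log-probabilities under the new and old policies and precomputed advantage estimates, and it shows only the policy term: a full PPO setup for language models typically also adds a value-function loss and a KL penalty against a reference model.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    Hypothetical helper: logp_new / logp_old are log-probabilities of the
    sampled tokens under the current and the data-collecting policy.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# With identical policies the ratio is 1, so this is just the mean advantage.
print(ppo_clipped_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # 2.0
```

Clipping the ratio and taking the minimum keeps a single update from pushing the policy far away from the one that generated the samples, which is what makes PPO comparatively stable for fine-tuning large models.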

In this report you can see logged metrics and gradients from an example project: a GPT-2 experiment that fine-tunes the model to generate positive movie reviews. The language model receives the first few words of a movie review as input and must complete the review in a positive tone. A sentiment classifier scores each completed review, and that score is used as the reward.
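The reward step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the word weights below are a hypothetical stand-in for a trained sentiment classifier, the reward is assumed to be the classifier's positive-class logit, and the classifier is assumed to see the full review (prompt plus generated completion). None of these names come from the library.

```python
import re

# Hypothetical word weights standing in for a trained sentiment classifier.
POSITIVE_WEIGHTS = {"great": 2.0, "loved": 2.5, "fun": 1.0, "boring": -2.0, "awful": -3.0}
BIAS = -0.5


def positive_logit(text):
    """Toy stand-in for a classifier's positive-class logit."""
    words = re.findall(r"[a-z']+", text.lower())
    return BIAS + sum(POSITIVE_WEIGHTS.get(w, 0.0) for w in words)


def rewards_for(queries, responses):
    """One scalar reward per sample, computed on prompt + completion."""
    return [positive_logit(q + r) for q, r in zip(queries, responses)]


print(rewards_for(["This movie was"], [" great, I loved it"]))  # [4.0]
print(rewards_for(["The plot was"], [" boring and awful"]))     # [-5.5]
```

In the real setup these scalar rewards would be passed to the PPO update, so completions the classifier rates as more positive become more likely over training.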

