Benchmarks

Weights & Biases is an experiment tracking platform for deep learning. Our tools make collaborative deep learning easy for teams by organizing experiments and notes in a shared workspace, tracking all the code and hyperparameters, and visualizing output metrics.

With public benchmarks, we aim to encourage collaboration at the broader community level by helping people coordinate their efforts. You can easily see and share the code, workflow, and results of other people's experiments, so if you're new to a problem, you don't have to start from scratch. We encourage you to include your code along with any notes or observations about your process. You can use this platform to see what others have tried, discuss ideas, and synthesize different approaches.

Our hope is that everyone can see other people's work on similar problems in a single place. The more accessible, transparent, and collaborative the process of model training becomes, the safer and better the models we build.

Browse and contribute to our current benchmarks. If you have an idea for a benchmark or would like us to host one for you, please reach out.

Aerial Segment by DroneDeploy
Segment high-resolution aerial orthomosaics and elevation images
CodeSearchNet by GitHub
Model source code as a language for semantic search, understanding, translation, and more
Drought Watch
Predict drought severity from satellite imagery and ground-level photos
KMNIST
Learn to read classical Japanese handwriting from images
The Witness
Teach an AI to play a video game (spoiler alert for The Witness!)
CATZ Benchmark
Learn to predict cat behavior from consecutive frames in GIFs
Super Resolution
Learn to upscale low-resolution images without losing quality
Colorizer
Add realistic color to black-and-white photos