Organize Your Machine Learning Pipelines with Artifacts

Ayush Chaurasia, Contributor
28 Aug 2020

In this report, we will show you how to use W&B Artifacts to store and keep track of datasets, models, and evaluation results across machine learning pipelines. Think of an artifact as a versioned folder of data. You can store entire datasets directly in artifacts, or use artifact references to point to data in other systems.
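As a quick illustration, here is a minimal sketch of logging a dataset as an artifact with the wandb Python client; the project name, artifact name, and paths are placeholders rather than values from this project:

```python
import wandb

# Start a run that produces the dataset artifact.
run = wandb.init(project="workplace-safety", job_type="dataset-upload")

# An artifact behaves like a versioned folder: add files or whole directories.
artifact = wandb.Artifact("helmet-dataset", type="dataset")
artifact.add_dir("data/images")

# Alternatively, store a reference to data that lives in another system
# (for example an S3 bucket) instead of uploading the files themselves.
# artifact.add_reference("s3://my-bucket/helmet-dataset")

run.log_artifact(artifact)
run.finish()
```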

Let's get started!

The term “employee monitoring” does not carry the best connotation. That is largely because it covers a wide range of practices, from monitoring computer activity to GPS-tracking employees’ cars, many of which tend to infringe on employees’ perceived rights and privacy. However, when used for worker protection, especially for lone workers, monitoring can give employees safety benefits they might not otherwise have. One such use case, where monitoring becomes necessary, is ensuring workplace safety in accident-prone environments.

Workplace Safety App

Keeping this use case in mind, let us build a workplace monitoring app for a construction company that will track whether the workers present in a scene are wearing helmets and/or masks. To ensure workers’ privacy, we will not extract, operate on, or save the facial embeddings. We will build multiple models for different use cases. The datasets will be collected from open-source repositories, and some data points will be annotated manually.

Before building the first working version of the app, let us first build our experimental setup upon which the entire project will be based.
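One way such a setup might look, assuming the dataset was logged as an artifact as sketched above (the names, paths, and hyperparameter values here are illustrative, not taken from the project), is a training run that declares its inputs by consuming a specific artifact version:

```python
import wandb

# Start a training run and record its hyperparameters.
run = wandb.init(
    project="workplace-safety",
    job_type="train",
    config={"epochs": 100, "batch_size": 8, "img_size": 416},
)

# Declare the dataset version this run depends on and download it locally.
dataset = run.use_artifact("helmet-dataset:latest")
data_dir = dataset.download()

# ... build the dataloaders from data_dir and train YOLO v3 here ...

run.finish()
```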

The Base Repository

All the project files can be viewed or forked from the workplace-safety-app repo. My go-to object detection architecture is YOLO v3, and my deep learning framework of choice is PyTorch. We will build the project on top of YOLO v3, making the required changes and bug fixes along the way. However, I will not go into the architectural details of the model; if you are not familiar with YOLO or want to brush up on it, please refer to the References and Resources section.

One of the best-known lightweight PyTorch implementations of YOLO v3 on GitHub is by Erik Linder-Norén, and it can serve as the base for this project with a few changes. An important detail about the custom dataset format is that the annotations must be plain-text (.txt) files with the same name as the corresponding image file (more details in the repo's README.md).
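In this Darknet-style format, each label file typically holds one object per line as `class_id x_center y_center width height`, with coordinates normalized to the image size. As a small, hypothetical sanity check (not part of the repo), the snippet below verifies that every image has a matching label file; the directory layout mirrors the repo's data/custom convention, but the paths are assumptions:

```python
from pathlib import Path

# Illustrative paths; adjust to wherever the images and labels actually live.
IMAGE_DIR = Path("data/custom/images")
LABEL_DIR = Path("data/custom/labels")

# Collect images whose same-named .txt annotation file is missing.
missing = [
    img.name
    for img in sorted(IMAGE_DIR.glob("*.jpg"))
    if not (LABEL_DIR / img.with_suffix(".txt").name).exists()
]

if missing:
    print(f"{len(missing)} images are missing annotations, e.g. {missing[:5]}")
else:
    print("Every image has a matching .txt annotation file.")
```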

