Mask R-CNN is a computer vision model developed by Facebook AI Research (FAIR) that achieves state-of-the-art results on instance segmentation tasks (detecting objects and labeling the pixels belonging to each one). An implementation of the model is made available by Matterport on their GitHub page. The code in the repo works with the MS COCO dataset out of the box, and is easy to extend to other datasets and image segmentation tasks.
To jump straight to the results of the experiment, go here.
Implementing Mask R-CNN from scratch would be highly non-trivial, but we can get a feel for its moving parts by tweaking the hyperparameters that govern how the model operates and seeing how they affect performance. The Matterport implementation provides a lot of knobs to turn, including:
- Learning rate
- Gradient clip norm
- Learning momentum
- Weight decay
- Backbone
- Backbone strides
- Scales, ratios, and anchors per image for the Region Proposal Network
- The weights of the various loss functions (segmentation vs. classification, etc.)
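In the Matterport code these knobs live as class attributes on `mrcnn.config.Config`, and you change them by subclassing. A minimal sketch of the pattern (the base class below is a stand-in with what we believe are the repo's defaults, not an import of the real one, and the sweep values are purely illustrative):

```python
# Stand-in for mrcnn.config.Config with a few of its defaults,
# so this sketch runs without the repo installed.
class Config:
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9
    WEIGHT_DECAY = 0.0001
    GRADIENT_CLIP_NORM = 5.0
    BACKBONE = "resnet101"
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

# To try out a different setting, subclass and override the attribute,
# the same way CocoConfig overrides Config in the repo.
class SweepConfig(Config):
    LEARNING_RATE = 0.01       # illustrative sweep value
    GRADIENT_CLIP_NORM = 10.0  # illustrative sweep value
```

Anything not overridden falls through to the base class, so a sweep subclass only needs to mention the knobs under study.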
For an overview of Mask R-CNN and to get a sense of the significance of these hyperparameters see this great post about Mask R-CNN and its genealogy.
To run our hyperparameter experiments we used Weights & Biases (wandb).
Wandb is a platform that helps machine learning teams coordinate the training of models. It is roughly analogous to GitHub for software 2.0 projects: team members can experiment with different model configurations while staying on the same page, because hyperparameters and performance metrics are tracked and synchronized automatically. It also makes it much easier for individuals to dive in and try a bunch of different things without getting lost in the mire.
We were particularly curious to see how the segmentation of images progressed as the model trained under different hyperparameters, so we integrated an ImageCallback() class into the Matterport code to sync predictions to wandb.
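The core of such a callback is simple: at the end of each epoch, run inference on a few fixed validation images and log the predicted masks. Here is a framework-agnostic sketch (in the actual integration the class subclasses `keras.callbacks.Callback` and the logging function is `wandb.log`; the helper and constructor arguments below are our own, only `detect()` and `load_image()` are real Matterport APIs):

```python
import numpy as np

def masks_to_class_map(masks, class_ids):
    """Collapse per-instance boolean masks of shape (H, W, N) into a
    single 2-D integer label image that an image logger can render."""
    label = np.zeros(masks.shape[:2], dtype=np.int32)
    for i, cls in enumerate(class_ids):
        label[masks[:, :, i] > 0] = cls
    return label

class ImageCallback:
    """Sketch of an epoch-end image logger. In the real code this would
    subclass keras.callbacks.Callback; the log function is injected here
    so the logic stays independent of wandb."""

    def __init__(self, inference_model, dataset, image_ids, log_fn):
        self.inference_model = inference_model
        self.dataset = dataset
        self.image_ids = image_ids  # fixed images, so progress is comparable
        self.log_fn = log_fn

    def on_epoch_end(self, epoch, logs=None):
        for image_id in self.image_ids:
            image = self.dataset.load_image(image_id)
            # detect() is the Matterport inference entry point; it returns
            # a list of dicts with "masks" (H, W, N) and "class_ids" (N,)
            result = self.inference_model.detect([image])[0]
            label_map = masks_to_class_map(result["masks"],
                                           result["class_ids"])
            self.log_fn({"epoch": epoch,
                         "image_id": image_id,
                         "prediction": label_map})
```

Logging the same handful of validation images every epoch is what makes the training-progress animations on the run page possible.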
We tried to maintain a minimal footprint on the code in the original repo, restricting our modifications to coco.py wherever possible. However, ImageCallback() needed access to the COCO dataset, so we either had to put the callbacks in coco.py or bring the dataset into model.py. We settled on the former for the sake of generality, but this required modifying the MaskRCNN() class in model.py to accept a callbacks argument, and modifying the class's train function to pass the callbacks initialized in the constructor through to keras.fit_generator().
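Stripped of the Keras machinery, the shape of that change looks roughly like this (the class and argument names follow the repo, but the body is our simplification, with the fit call and default callbacks abstracted into parameters so the sketch runs standalone):

```python
class MaskRCNN:
    """Simplified stand-in for model.MaskRCNN showing only the callback
    plumbing; the real class also builds the Keras model, loads weights,
    and so on."""

    def __init__(self, mode, config, model_dir, callbacks=None):
        self.mode = mode
        self.config = config
        self.model_dir = model_dir
        # New: remember caller-supplied callbacks for train() to use later.
        self.custom_callbacks = list(callbacks or [])

    def train(self, fit_fn, default_callbacks):
        # In the real train(), default_callbacks holds things like
        # TensorBoard and ModelCheckpoint, and fit_fn is
        # self.keras_model.fit_generator; the only change is appending
        # the callbacks stored by the constructor.
        fit_fn(callbacks=default_callbacks + self.custom_callbacks)
```

Because the extra callbacks ride along with whatever the train function already sets up, the rest of the training loop is untouched.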
We also include a script to run parameter sweeps. It can be adapted to sweep different hyperparameters, or different values of the same ones, but note that the CocoConfig() class in coco.py has to be adapted to read the values from environment variables.
To get it up and running, just follow the steps in the README of our GitHub repo.
The results of our parameter sweep can be found on the wandb run page.
Two of the most interesting runs from our sweep were the one with gradient clipping set aggressively (norm of 10.0) and the one with a higher learning rate (0.01). Their validation losses were high and bounced around a lot early in training, but both ended up among the top-performing models.
You can see these and other graphs on the run page on the wandb app, and check out the progress of the image labeling through training.