Recently, Anthony Goldbloom took some time to answer questions from our Slack community about his vision for Kaggle, how Kaggle & the competitions have changed over the years, how they're handling TPUs, and how competitive data science can prepare you for the real world.
For more AMAs with industry leaders, join the W&B Slack community.
Q from Charles: It’s a very common bit of folklore in the machine learning community that deep neural networks don’t do well in Kaggle competitions and that the winners are usually (ensembles of) tree-based models. Is that accurate? If so, why do you think there’s a disconnect between the methods that win contests on Kaggle datasets and the methods that push the boundaries of ML? If not, do you have any idea where this misconception came from?
Anthony: That’s only partially true. Deep neural networks win all unstructured data challenges on Kaggle (that is, competitions with speech, images, text, etc.). It is still true that gradient boosting machines (XGBoost/LightGBM) plus clever feature engineering are the best approach for smaller datasets with structured data, and they do still win Kaggle challenges. Deep neural networks start to take over again for very large structured datasets.
Q from Stacey Svetlichnaya: Kaggle is such an amazing resource, thanks for building it and taking the time to chat with us! I’m really curious how you think about the balance of competition (e.g. working in secret to beat the state of the art) and collaboration (sharing details of different approaches and code, building on existing work) in the field of machine learning. Of course we need both strategies, and a lot depends on the dataset/problem/context. Still, what are some of the biggest tradeoffs or edge cases you’ve encountered? How has this balance influenced the evolution of Kaggle as a platform and a community? How can we encourage the best aspects of both approaches as the field grows increasingly complex and the stakes get higher?
Anthony: Thanks for the nice words about Kaggle. While Kaggle runs ML challenges, there’s a huge amount of sharing that goes on as part of the challenges. And in almost all cases, winners end up sharing their approach at the end of the competition. That is one of the main reasons people get value out of Kaggle: it is very rewarding to spend a lot of time competing in a challenge and then learn what the winners did that you might have missed. We try to balance the incentive to compete with an incentive to share. For example, we offer points for competition but also upvotes on notebooks and discussion posts.