ML Research and Production Pipelines with Chip Huyen
Chip has worked on ML research at Snorkel, NVIDIA, Netflix and Primer. She joins us to share the biggest challenges of moving machine learning pipelines from research to production.
Gradient Dissent - a Machine Learning Podcast · Evolution of Reinforcement Learning and the Robot Hand

Chip Huyen is a writer and machine learning engineer currently working at Snorkel AI, a startup that focuses on machine learning production pipelines. Previously, she’s worked at NVIDIA, Netflix, and Primer. She helped launch Coc Coc - Vietnam’s second most popular web browser with 20+ million monthly active users. Before all of that, she was a best selling author and traveled the world. Chip graduated from Stanford, where she created and taught the course on TensorFlow for Deep Learning Research.

Check out Chip's recent article on ML Tools, and her website. Finally you can follow her on Twitter.


Lukas: My first question for you when I was thinking about interviewing you was actually.. I really want to hear the whole story about how you got into machine learning. I kind of have bits and pieces of your background that you've told in the past, but tell me your life story.

Chip: How much time do we have?

Lukas: You can cut it down [laughs] How did you get into tech in the first place?

Chip: It’s a funny story because I come from a very non-tech background, as far away as you can think of. After high school, I didn't go to college and I started traveling. I did that for three years, and in the process I was writing: I wrote for a newspaper, I hosted a couple of columns, and I wrote a couple of books, which got me into more trouble than I wished for.

Lukas: Wait, what?!

Chip: You know, internet popularity is a double-edged sword. So my books got to be popular, very popular. That sounds a bit arrogant.

Lukas: They were best sellers, right? I think it's fair to say they were popular.

Chip: In Vietnam. While traveling, I met people on the road, and I was young; I didn't know how to handle all the attention. People were saying things like, “It’s not possible for a girl to travel by herself,” and most young people were like, “She didn’t write the book. She didn't write any of that. She must’ve had people writing things for her, doing things for her.” They accused me of having a lot of money to travel and all. So there was a lot of controversy and I was a little bit offended. I was like, who are these people? Why are they making me answer all these stupid questions? But at that time, I did not know how to handle that and it caused a lot of backlash. So I was like, okay, I'm so tired of this. I'm going back to school. So I went to Stanford and I was thinking of doing something like writing or political science. At Stanford everyone told me that the question is not whether you should take a CS course but when, because 90 percent of undergrads take a CS course at some point. So I just took a course in the first quarter and I really liked it.

Lukas: When you came to Stanford, how old were you? You were already a bestselling author then.

Chip: You don't ask that question [laughs]. So I was older than my classmates when I took my first quarter and it was fun and I kept on doing more courses. Before I knew it, I was a C.S major and I took an A.I course; I cried a lot in the first class because it was so difficult. [laughs] I think when I came into Stanford it was a peak of the A.I. Hype. Can I say AI hype?

Lukas: You can say literally whatever you want. [laughs]

Chip: So I did that and it was fun and yeah here I am and I think in my third year I taught a course.

Lukas: Your third year as an undergrad?

Chip: Yeah.

Lukas: Wow! I’m trying to think about what I was doing as an undergrad, I feel embarrassed.

Chip: To be fair, I was like older than most people, right? And I actually didn't have to spend time on frat parties, or trying to like impress people. I was pretty much done with the party scene by then.

Lukas: Right, so you taught a really popular class, right? I think you could say it was a popular class, I could say.

Chip: I think you can say that. It was quite unexpected. I didn’t even know that the class was popular. I was just teaching it. And one day walking to the dining hall, a friend was like, did you see that comment about you on Hacker News? And I was like, why would anyone say anything about me on Hacker News? And it turned out that my course had been picked up by Hacker News and I was like, wow, that's interesting. And at some point, you know what happened? I was not really active on Twitter back then, and one day I opened Twitter and I saw I had ten thousand followers and I went, “Wow. Who are these people?” It was great.

Lukas: Well, it was a timely class. What was the topic of the class? Just for people who might not know.

Chip: Oh, it was TensorFlow. I think it was the right time. TensorFlow was very popular in 2016. Wow. It's crazy how fast things change. Back then in 2016, TensorFlow was all people could talk about and now it’s what people complain about. So, yeah. I taught a course on TensorFlow and the official name was TensorFlow for Deep Learning Research, which is a not-so-flashy name, and I think I also put a lot of materials online. And I think it was maybe the first college-level course on TensorFlow.

Lukas: I’m trying to remember it in 2016, I think you must’ve had to compile it yourself right, to use the GPU back then? I'm trying to remember… I remember just installing TensorFlow as a pretty painful experience for me.

Chip: I don't remember it being so painful. It was just that some concepts were a little bit hard to grasp, like the computation graph: there has to be a graph first before you can run it. I think TensorFlow 2.0 now is a bit different.

Lukas: Got it. How did you come up with the material for that class? How did you even think of that?

Chip: So when I started teaching the course, I was just hoping to learn personally, you know? I started taking courses as a sophomore, sometime in my second year. So I didn't know a lot. I interned at a start-up where they used TensorFlow, and I was blown away. Like, wow, I didn't realize you could do so many things with it. So I went to a couple of my professors and was like, can you teach a course on this? And my professor was like, I don't have time. Why don't you teach it? So I said OK. And I got a lot of people to help me. I had some friends at Google who knew a lot about TensorFlow, and I had professors take a look at my curriculum and got a lot of feedback. I read lecture notes. I was really nervous, so I had really good friends who were coerced into being my fake students. For every lecture, I would make them sit and listen to me and give them a fake lecture. So I think I got a lot of help. It was like learning together with my students. I didn't think of it as teaching as much as a group study.

Lukas: That's super cool. When you came to Stanford, you hadn't taken any computer science class before?

Chip: So I came from a math background, so I did math in high school. So I think I took some C.S courses but, you know, it was more like very, very basic. And if I remember, it was a blue screen back then. Wow. Yeah.

Lukas: This is simply amazing, you went from introductory computer science type stuff to teaching a TensorFlow class two years later. It’s amazing. Do you have any advice to other people who want to learn this stuff?

Chip: I think this is the beauty of computer science: the barrier to entry is really low. Also with ML, especially with the experiment-oriented progress, which means that you actually don't need to know a lot of theory to make a contribution. So I've seen people who get into ML for like a year and are able to make pretty great projects, which I am still a little ambivalent about. It’s good as it lowers the entry barrier and allows more people from different backgrounds to get into it. But what does it say about the field, when somebody joins for a year and is able to make a pretty mind-blowing experiment? So I don't know how to feel...

Lukas: Maybe it means that there's lots of interesting stuff to try [laughs]

Chip: I'm so skeptical of giving advice. I think I’ll just say get your hands dirty. Then try things, try things out and be friends with smart people.

Lukas: Be friends with smart people [laughs]

Chip: [laughs] Have friends smarter than you, I think. I don't think I could have got anything done without my friends. Really.

Lukas: That’s so cool. I think that's good advice. Why did you choose to go into AI?

Chip: For me, it was just the promise that AI held. I come from a village in Vietnam and I travelled, and at some point I realized that it would be great to actually overcome language barriers. The majority of human knowledge is written in English, and people who don't speak English can't access it. People in my hometown can't really read anything that I write in English, and my parents are afraid of visiting me in the U.S. because they wouldn't be able to navigate the airport or figure out how to get here. So at that time, I was really interested in machine translation, thinking that if you can automate the translation process, it could be really, really helpful; if we can overcome language barriers, then maybe people from my village can access human knowledge or just step out of the border.

Lukas: That's so cool.

Chip: That's what I thought back then and very idealistic.

Lukas: What are the topics that are most interesting to you right now in machine learning?

Chip: I think what we are lacking is better engineering in machine learning. There are two aspects: engineering in research and engineering in production. In research, there are a lot of researchers who are amazing at what they do but who are not good at engineering, and it's not their fault; it's just that as humans, our time is limited. They focus so much on research that we can't expect them to be great engineers. So I wonder if we can build a good tool or toolset to help researchers carry out their research more efficiently. Also, if you have clean code, it’s easier to experiment, and this helps with reproducibility. In production, I also think that there’s a gap between researchers and production engineers. There's been a lot of progress in machine learning research, and now the question is how we bring the research into production. That’s what I'm very interested in, and the start-up I'm part of right now is also focusing on that by helping companies productionize machine learning research.

Lukas: You've worked with some big companies, generally what kinds of problems do you see when companies try to take on research and production? What are the main ways you see companies fail at this?

Chip: One of the big things is that a lot of companies are chasing buzzwords. They are like, how can we use BERT? How can we use transformers? And you can look at them and say, you actually don't need that. You don't even need deep learning. A lot of your problems can be solved by classical algorithms. So sometimes companies insist that they should use very fancy techniques. The reason can be that they don't understand what is happening, because I think there is a lot of misunderstanding in AI reporting. You see a lot of journalists and reporters talking about AI, and if they don't have a background in AI they can oversimplify it or present it inaccurately. Like, what? What exactly is going on? And some companies may just want to attract clients, so they say, we are using state-of-the-art techniques, and they actually go out of their way to try to use them. So that’s one problem. I think the second is a lack of data, and I think you guys know that very well because you are also trying to solve that problem, right? In research, people work with very clean, static datasets, and you want clean, standard datasets because you want to focus on the models. But more and more, as models are being commoditized, you can take off-the-shelf models. So now the bottleneck is data, and real-world data is nowhere close to research data. The problem is how you collect and verify data, and how you cope with constant distribution shift, or data drift. So there's a problem with data. Another problem is interpretability. In research, you can focus on beating the state of the art, but in the real world, you don't just care about accuracy or F1 or whatever metric you are using. How can we explain the decisions that the model is making? I think a lot of people are spending a lot of time on this, so I think we are making progress.

Lukas: It's funny, I saw a tweet recently. I think it was someone at OpenAI who was arguing that folks should not teach anything besides neural nets.

Chip: Oh my God, it’s such click bait.

Lukas: And it's funny because it reminded me of when I got my first job out of school. There was sort of a similar debate but it was different topics, it was basically like machine learning versus rule-based systems. And there were a lot of older researchers who had kind of built their careers on logic and rule-based systems. And they would say, oh, obviously you should do both and I was like, “Come on! Rule-based systems don’t really work on anything. Can you find me a benchmark where it actually makes sense to use a rule-based system?” And I was thinking, that’s how I felt at the time and I think I might still… I mean, you don't see a lot of rule-based systems in production in the last decade or two. I don't come across them, I guess. And then I was thinking, you know, when that person made a click-bait tweet and I went, no, no, it's ridiculous. And then now I’m thinking, like, am I now the old guy who is just justifying the things that I know…

Chip: No.. No.. Did you get baited? Did you participate in the discussion?

Lukas: No. I'm always afraid of controversial topics on Twitter.

Chip: [laughs] It is not even controversial, it’s just wrong. Like I don’t get it. I mean, if it is not from some company, I would think that person was trolling. Maybe he's trolling. I'm not sure.

Lukas: Unlike my work where it just feels like mild research - it's not in neural nets at all, you actually work in neural nets. What are the non-deep learning algorithms that you think are useful that you would keep around? And why would you use a different one?

Chip: I mean, XGBoost is still like the most popular algorithm in Kaggle competitions. K-nearest neighbors is still really good for anomaly detection. There are a lot of really great algorithms. A lot of people don’t even know what boosting is, which is a bit sad. The other day my friend was telling me about how he was interviewing somebody and the person could explain perfectly well what a transformer model is but couldn't explain what a decision tree is. And it was… I don’t know, maybe I'm old, too. I don't even know anymore.

Lukas: What are the situations where you would recommend using a boosted tree versus a neural network approach?

Chip: I think definitely for baselines. For example, if a simple model does the job reasonably well, there’s always value in trying it. And in production, the simpler a model is, the easier it is to understand, implement, and avoid mistakes with.

Lukas: So if you don't get improvement from more complicated methods, don't go there.

Chip: It's also hard to tell because a lot of improvement is incremental, right? You can say this only gives a 1% improvement, so it's not worth it. But after a 1% improvement, you invest more time and get another, and another, and over time you can get up to a 10% improvement. If you stifle it from the beginning, you would never be able to reach the point where it should be. I’m not against using machine learning or deep learning; I’m very pro deep learning. I’m just saying we should not forget simple baselines, and I don't think we spend enough time talking about or defending baselines.

Lukas: Interesting. A lot of people have said that and I happen to agree with them, but if someone were to ask you why are baselines important, how would you answer that?

Chip: A metric, by itself, doesn't mean anything. Saying 90% accuracy doesn't mean anything. We say, “Oh, my model is amazing. It has this accuracy.” What does that even mean? For example, somebody showed me a model and was like, it has 90% accuracy, and I said, wow, if you predict at random it's like 89% accuracy already, so what is the point of getting this? So I think baselines and landmarks should help you localize where the model performance is and where you want it to get to. It's also interesting to look at human baselines, to understand how well humans can do. If the model gets 90% and the human baseline is 85%, we say, oh, that’s superhuman performance, but with a human baseline of 99%, we know that we still have a long road to go.
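[Editor's sketch] Chip's point that a metric only means something next to a baseline can be illustrated with a few lines of Python. This is a minimal sketch, not anything from the episode: the labels and predictions are made up, and the baseline used here is "always predict the most common class", which is one reasonable reading of the 89% figure she mentions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of a trivial model that always predicts the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up, imbalanced dataset: 89 negatives and 11 positives.
labels = [0] * 89 + [1] * 11
# Hypothetical model output: all negatives right, only 1 of 11 positives right.
model_predictions = [0] * 89 + [1] * 1 + [0] * 10

model_acc = accuracy(labels, model_predictions)
baseline_acc = majority_baseline_accuracy(labels)
lift = model_acc - baseline_acc  # how much the model adds over doing nothing clever

print(f"model={model_acc:.2f} baseline={baseline_acc:.2f} lift={lift:.2f}")
```

Read alone, 90% looks strong; next to the trivial baseline it is a one-point gain.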

Lukas: So the human baseline is maybe in some sense like a best case scenario.

Chip: Yeah. I think in some cases, in a lot of cases, human baselines are equal, but it's not so always.

Lukas: Right. Another thing I'm curious about is your work on this, because I think fake news is probably going to be a big topic again with the election coming up.

Chip: I think this was a fun class project. It was after the election and we were curious to see… I think echo chambers can also help echo fake news. I feel like the same fake news usually circulates within the same echo chamber. And, you know, if one echo chamber shares a certain piece of news and another echo chamber shares similar news but from a different perspective, I'm not sure how healthy this might be, but it might be interesting to cross-share, to bring a similar piece of news from one perspective to another echo chamber. Not fake news, of course; there’s no point in spreading fake news from one echo chamber to another. So what we did was get a lot of tweets and their hashtags. At that time, as I said, it was after the election, so we collected a lot of tweets from during the election in U.S. swing states, and we had some seed hashtags that we knew came from tweets that were pro-Republican or pro-Democratic. From those seed hashtags, we made the assumption that if two hashtags appear in the same tweet, then they likely have the same sentiment. If one hashtag appears next to a hashtag that’s pro-Republican, then it's likely to also be pro-Republican. On top of that, we had an algorithm just to resolve conflicts. It's very simple, like majority voting, and from that we were able to label about a thousand hashtags. And so from those hashtags…
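[Editor's sketch] The hashtag labeling Chip describes (seed hashtags, labels spreading through co-occurrence in the same tweet, conflicts resolved by a simple vote) might look roughly like the following. This is an illustration under assumptions, not the project's actual code: the function name, the number of rounds, the tie handling, and the example hashtags are all hypothetical.

```python
from collections import Counter, defaultdict


def propagate_hashtag_labels(tweets, seed_labels, rounds=3):
    """Spread labels from seed hashtags to hashtags that co-occur with them.

    tweets: iterable of sets of hashtags, one set per tweet.
    seed_labels: dict mapping a few hashtags to a known leaning.
    Conflicts are resolved by majority vote; exact ties stay unlabeled.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        votes = defaultdict(Counter)
        for hashtags in tweets:
            known = [labels[h] for h in hashtags if h in labels]
            for h in hashtags:
                if h not in labels:
                    votes[h].update(known)  # each labeled neighbor casts one vote
        newly_labeled = {}
        for h, counter in votes.items():
            ranked = counter.most_common(2)
            if not ranked:
                continue  # no labeled neighbor yet
            if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
                continue  # tie: leave unlabeled rather than guess
            newly_labeled[h] = ranked[0][0]
        if not newly_labeled:
            break
        labels.update(newly_labeled)
    return labels


# Hypothetical data: placeholder hashtags, not real ones from the project.
seeds = {"#seed_red": "conservative", "#seed_blue": "liberal"}
tweets = [
    {"#seed_red", "#tag_a"},
    {"#seed_red", "#tag_a"},
    {"#seed_blue", "#tag_a"},   # minority vote, outvoted 2 to 1
    {"#seed_blue", "#tag_b"},
    {"#tag_b", "#tag_c"},       # labeled only in the second round, via #tag_b
    {"#seed_red", "#tag_d"},
    {"#seed_blue", "#tag_d"},   # tie, stays unlabeled
]
hashtag_labels = propagate_hashtag_labels(tweets, seeds)
```

Running more rounds lets labels reach hashtags that never appear next to a seed directly, at the cost of compounding any early mistakes.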

Lukas: So just to be clear, you labelled those hashtags as liberal or conservative essentially?

Chip: Yes.

Lukas: Based on what they co-occurred with?

Chip: Yes. So after that, we looked into tweets. We built a graph of relationships between users: for example, if user A replied to or retweeted another user, we linked them. So we built graphs of users and we looked at the kinds of hashtags they used. We tried to predict whether a person is liberal or conservative, then we used some graph algorithms to detect communities, and then we looked at those communities and checked whether a community had many more conservative than liberal members. It was really fascinating, because what we found out was that about 50% of the communities we found are neutral, meaning the difference between the number of conservatives and liberals is not that high. But then about 25% of them are conservative communities, where the number of conservative members is more than three times the number of Democrats, and about 25% are Democratic communities.
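[Editor's sketch] Chip does not say which graph algorithm the project used to detect communities. As a stand-in, the sketch below uses plain connected components on the reply/retweet graph, then applies the "more than three times" rule she mentions to call a community conservative, liberal, or neutral. All names and data here are hypothetical.

```python
from collections import Counter, defaultdict


def find_communities(edges):
    """Group users into connected components of the interaction graph.

    Stand-in for the unspecified community detection algorithm.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, communities = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # iterative depth-first search
            user = stack.pop()
            members.add(user)
            for other in neighbors[user]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities


def community_leaning(members, user_leaning, ratio=3):
    """Label a community by whether one side outnumbers the other by `ratio`."""
    counts = Counter(user_leaning[u] for u in members if u in user_leaning)
    conservative, liberal = counts["conservative"], counts["liberal"]
    if conservative > ratio * liberal:
        return "conservative"
    if liberal > ratio * conservative:
        return "liberal"
    return "neutral"


# Hypothetical reply/retweet edges and predicted user leanings.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u5", "u6")]
user_leaning = {
    "u1": "conservative", "u2": "conservative",
    "u3": "conservative", "u4": "conservative",
    "u5": "conservative", "u6": "liberal",
}
communities = find_communities(edges)
leanings = [community_leaning(c, user_leaning) for c in communities]
```

A real social graph is usually one giant connected component, so a modularity-based or label-propagation method would be needed in practice; connected components only keeps the example short.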

Lukas: So you found the echo chambers?

Chip: I'm not sure I would say echo chamber, but I do feel like people who share similar beliefs definitely have stronger ties with each other than with people with different beliefs.

Lukas: I see. So then were you suggesting to sort of like spread information between these communities or...?

Chip: We never had access to that because it would require having a big social network, but we did have ideas. We read some literature, and it says that if you show somebody a news article with a point of view opposite to their beliefs, they're going to ignore it. If you believe in something and see the opposite of it, you’d only be like, “oh, just fake news”, right? So you actually have to slowly show them a similar news article but with a slightly different point of view. You can't give people a totally opposite viewpoint and expect them to listen to it.

Lukas: I see. That makes sense.

Chip: So we never had a chance to test an algorithm, but we thought that detecting echo chambers may be a first step in finding a way to break them up.

Lukas: I see. Yeah. A lot of people just want to live in their echo chambers [laughs]

Chip: I think that Silicon Valley is a massive echo chamber, really. I think we all live in the bubble and I feel like somehow this pandemic has made me realize how different our bubble is and how strong it is.

Lukas: The other question I wanted to ask you about is you've recently gone from a bigger company to a start-up, which is a little bit of a shift. Has there been any surprises there? How does that feel to go from, big company to start-up? Was that a big cultural shift or...?

Chip: It’s a big difference. It’s such a big difference, and it's just what I wanted. I thought that after graduation I would like to try different working environments to see which one I would like. So leaving NVIDIA was not a reflection on NVIDIA, it was just a reflection on myself, because I just wanted a change. My co-workers at NVIDIA have been really, really helpful, and they're really great. So it was a big shock to join a start-up. The first thing, I think, is that the workload is so much more, which is a good thing. At a big company, you might be leaving work at like 5 or 6:00pm, but at a start-up, you might have pull requests at like midnight on Saturday, right? Which is not a bad thing… I don't know. I think I'm still very ambivalent about work-life balance discussions, you know. Some people say, “You shouldn’t work on the weekends” or “No company should expect its employees to work on a Saturday evening.” But it isn't so much what companies expect of us as what we expect of ourselves. I don't want to promote the pressure of working too hard, but I do believe that in your career there are certain compromises you might have to make, depending on what you want out of life. Anyway, that's a very roundabout way of saying that I work on weekends. [Both laugh]

Lukas: It sounds like you're enjoying that experience.

Chip: I also don't have much of a life, you know, so...

Lukas: I guess right now there's less going on, yeah

Chip: So yes, it’s a big shock that people work on the weekends. I think I like it. I'm not sure how much longer I would like it for because I also heard that when you have family, like for you guys, I heard you had a baby recently. Congratulations!

Lukas: Yeah.

Chip: Do you still work on the weekends?

Lukas: I don't think I have a very strong point of view. I feel like it's a little bit weird to tell people not to work super hard. Like, I worked incredibly hard in my 20s, and I kind of like to imagine that hard work pays off. I feel proud of the stuff that I did.

Chip: I think you did a great job. I mean, I’ve heard you have done a lot of great things. I’m a fan.

Lukas: Oh thank you. In the best situation, working really hard can be incredibly fun like for me. So I’ve realized that I’d rather run a company than, you know, work for someone else, that’s my particular point of view. But for me, working really hard, can be like a real joy. Like when I started my second company, one of the really fun things for me was that it actually made sense for me to pull an all-nighter once in a while, allow it to happen.

Chip: Really? Do you still pull all-nighters?

Lukas: No, now I have a baby so I have a different kind of all-nighter. [Both laugh] I think that the important thing is that the company is trying to do something and figuring out how to do it over the long term is the important thing and it needs to be at a sustainable pace because if you work hard and burnout, that's super counterproductive. But I don't always think that burnout actually comes from working hard. I think burnout comes more from like working hard at things that seem pointless or not seeing the success that you wanted.

Chip: Yeah. That makes sense. For me, I see people working on the weekends doing better, and I also like working on the weekends because I feel motivated, because I really like what I do, I have faith in it, and I want to contribute. Like on a Saturday night I could stay at home and watch, I don’t know, The Bachelor, or whatever people seem to be watching, or I could just go on GitHub and check out some PRs. I know that sounds like a horrible analogy, but yeah, I feel motivated. So okay. Anyway, back to the question: the first thing I noticed was the different workload. And I feel okay working on the weekends because I also get to talk to my co-workers on the weekend, and I like it. There’s also a shared understanding, because everyone knows that some people work on the weekends, and it gives them more flexibility. If you work on the weekend and you feel burned out and tired, you can take a day or two off during the week; that's fine as well. That understanding is taken into consideration when making the schedule, so it's not necessary for us to follow a typical five-day workweek where you work on the weekdays and then take the weekend off. I think the second thing is the exposure I have to the entire stack. Working at a big company, I was focused on very specific products and shielded from aspects like QA or clients, but at a start-up, I get the chance to see everything, and we sort of build it from scratch. So I'm exposed to the decisions that we have to make, like what tools to use or how to structure the repo. For example, do you want a monorepo or do you want many small repos? When you join a big company, you usually have to use standard tools that someone has already chosen for you, but at a start-up, you have a say in choosing the tools, which exposes you to various problems as well. So I really like it.
And also, of course, there’s the size. At a big company there are a lot of people, and it’s nice that you have access to a lot of people. But at a start-up, you have only a small number of colleagues, so you can’t send a message to somebody from another team because all you have are the people on your team.

Lukas: Cool. Makes sense. So I think we’ve gone a bit overtime but we always end with two questions, really kind of curious to hear your takes on these. So the first question is, what is a topic in machine learning that you think people don't talk enough about or is an underrated topic that people should talk about more, but they don't?

Chip: So I have a list of things I usually carry around. One of them is graphs. I love graphs and I used to think they were underrated; I tweeted about it about three years ago. But I think that has changed. I was at NeurIPS in December and I saw so many papers on graphs, and there was a workshop on graphs that was one of the most attended workshops.

Lukas: You're talking about the computer science sense of graph, right?

Chip: More like graph networks, so graph theory. Now there are a lot of graph models like GNNs or GCNNs, graph convolutional networks. I think there are multiple uses of graphs in deep learning. A graph is a natural representation of many inputs, right? Data from a social network, for example, is a graph, or in recommendation systems we have users and items, and it’s a bipartite graph. So a graph is a natural representation of input, and a lot of distributions can be represented using graphs, like probabilistic graphical models. So graphs can be both input and output. Graphs also have a lot of relationships with convolutions, which focus on local connections: a point in a graph is connected to its neighboring points, and a convolution is a very local linear transformation. I don’t know if I’m making any headway with all of the explanation right now, but yeah, I’m in love with graphs and I'm so happy to see that they're catching on. I think the other thing that's underrated is the engineering aspect of machine learning. We see a lot of people talking about integrations for deep learning or version control, but I think people, from what I’ve seen, are beginning to catch up with it.

Lukas: Cool. Good answer. Obviously I agree. The second question is, in your experience, I'm really curious I guess at this and actually say in your experience...

Chip: If you don’t mind, I’d like to add that another thing that’s underrated, since I’m very production-oriented, is monitoring. If you deploy a system, how do you monitor it? How do you know when you need to retrain the model? How do you know the data distribution has shifted? You know, I haven't seen a lot of monitoring. So I think it's still very underrated.
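[Editor's sketch] One simple way to approach the question "how do you know the data distribution has shifted?" is to compare a feature's values at training time with its values in live traffic. The sketch below computes the two-sample Kolmogorov-Smirnov statistic by hand. It is only an illustration: the threshold of 0.3 and the sample data are arbitrary assumptions, and Chip does not name any specific monitoring method in the episode.

```python
from bisect import bisect_right


def ks_statistic(reference, live):
    """Largest gap between the empirical CDFs of two samples.

    0.0 means the samples look identical; 1.0 means they do not overlap.
    """
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference, live, threshold=0.3):
    """Flag drift when the KS statistic exceeds an (arbitrary) threshold."""
    return ks_statistic(reference, live) > threshold


# Made-up feature values.
training_feature = list(range(100))                # seen at training time
similar_traffic = [x + 0.5 for x in range(100)]    # nearly the same distribution
shifted_traffic = [x + 50 for x in range(100)]     # shifted upward by 50

print(has_drifted(training_feature, similar_traffic))  # small gap, no alert
print(has_drifted(training_feature, shifted_traffic))  # large gap, alert
```

In a deployed system a check like this would run per feature on a schedule, and a sustained alert would be one signal that the model may need retraining.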

Lukas: Yeah, totally.

Chip: Yeah. Sorry.

Lukas: No, no.

Chip: Trying to sound smart, you know?

Lukas: I think you’re successfully sounding smart. But the second question is, in your experience, taking projects from training into potentially deployed systems, where is the biggest bottleneck? What's the hardest step in the process?

Chip: What’s really clear right now is that when you have very big models, it's really slow to run inference. It's very slow and it can be very costly. For example, you can try to take GPT-2 into production, and even on spot instances it costs quite a bit for every inference, every time you make a prediction or want to generate something. So this is why we haven't seen a lot of GPT-2 in production yet, and it's a very interesting problem. I'm not sure if I can mention the exact company, but some start-up told me it was using GPT-2 in production, and they said if they could reduce the inference time by half, they would be able to break even. So I presume that instead of using full-precision floating point, if they can somehow make it work in FP16, at half precision, then it can reduce the inference time by half and therefore help the company stay afloat. It’ll make a really big difference, like whether you break even or not. Especially in this economy.
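[Editor's sketch] To make the half-precision idea concrete, the snippet below uses Python's standard `struct` module to store a few made-up weights as 32-bit floats and as 16-bit floats. It only shows the storage and rounding side of the trade-off: half precision halves the bytes per weight at the cost of some rounding error. Whether inference time actually halves depends on the hardware and the model, which this sketch does not measure.

```python
import struct


def roundtrip(value, fmt):
    """Pack a float in the given format and read it back, exposing rounding."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Made-up weights, standing in for model parameters.
weights = [0.1234567, -1.5, 3.14159, 0.0009765625]

FP32, FP16 = "<f", "<e"  # struct codes for single and half precision

bytes_fp32 = len(weights) * struct.calcsize(FP32)  # 4 bytes per weight
bytes_fp16 = len(weights) * struct.calcsize(FP16)  # 2 bytes per weight

# Worst-case difference between the half and single precision copies.
max_error = max(abs(roundtrip(w, FP16) - roundtrip(w, FP32)) for w in weights)

print(f"fp32: {bytes_fp32} bytes, fp16: {bytes_fp16} bytes")
print(f"largest rounding error from fp16: {max_error:.6f}")
```

Deep learning frameworks offer their own mixed-precision tooling for real models; the point here is only that each weight takes half the memory and bandwidth, which is where the hoped-for speedup comes from.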

Lukas: Yeah, totally. Wow, great answer. Interesting. My final question actually is simple. If people want to learn more about your work, do you have a Twitter account or a company account you want to tell us about?

Chip: Yes. I spend too much time on Twitter and I'm ashamed about that. Follow me on Twitter.

Lukas: What's your Twitter handle?

Chip: It’s @chipro. That’s c-h-i-p-r-o. as in professional, but r-o means crazy, so chip-crazy.

Lukas: Oh really? I did not know that. That's awesome.

Chip: Yeah. Also I have a blog where I write about tech and stuff, but I write long form; each of my blog posts takes me like two to three months to write. So I don't write a lot, but you should check it out.

Lukas: That's great. Thank you.

Chip: Thank you. Coming from you it means a lot.
