Building Level 5 Autonomous Vehicles with Lyft's Anantha Kancherla
Anantha and Lukas dive into the challenges of building deep learning models for self driving cars and deploying them into production.
Gradient Dissent - a Machine Learning Podcast · Evolution of Reinforcement Learning and the Robot Hand

Anantha Kancherla, VP of Engineering at Level 5, the Autonomous Vehicle Program

As Lyft’s VP of Engineering at Level 5, the Autonomous Vehicle Program, Anantha Kancherla has an insider's view on what it takes to make self-driving cars work in the real world. He previously worked on Windows at Microsoft focusing on DirectX, Graphics and UI; Facebook’s mobile Newsfeed and core mobile experiences; and led the Collaboration efforts at Dropbox involving launching Dropbox Paper as well as improving core collaboration functionality in Dropbox.

He and Lukas dive into the challenges of working on large projects and how to approach breaking down a major project into pieces, tracking progress and addressing bugs.

Check out Lyft’s Self-Driving Website. And this article on building the self-driving team at Lyft. Follow Lyft Level 5 on Twitter.


Topics Covered:

0:00 Sharp Knives

0:44 Introduction

1:07 Breaking down a big goal

8:15 Measuring the performance of deep learning models

10:50 Allocating Resources

12:40 Interventions

13:27 What part still has lots of room for improvement?

14:25 Various ways of deploying models

15:30 Rideshare

15:57 Infrastructure, updates

17:28 Model versioning

19:16 Model improvement goals

22:42 Unit testing

25:12 Interactions of models

26:30 Improvements in data vs models

29:50 Finding the right data

30:38 Deploying models into production

32:17 Feature drift

34:20 When to file bug tickets

37:25 Processes and growth

40:56 Underrated aspect

42:34 Major challenges to deploying deep learning models for self-driving cars into production

Lukas: I assume that the goal is to make Level 5 automation then?

Anantha: Yes, Level 5 automation is the aspiration, but I'll take Level 4.

Lukas: I was actually wondering... I don't think I've ever been part of a team with such a huge singular technical ambition. I was wondering how you break that problem down into constituent parts, like how you think about what the weekly KPIs should be when you have this gigantic goal.

Anantha: Honestly, this is how I've always worked, and I don't know why. I guess it's because I started my career at Windows and it was like a few ... by the time I left it was well into tens of thousands of people, all working towards one product, with a singular focus on one product.

Lukas: Right.

Anantha: This is obviously way smaller than what Windows was, but it's the same idea. It's a good question, you know? What does it mean and how does it break down? Usually in a project like this, you're going to have so many different skillsets, and ML is just one of them. When people think about self-driving cars, most think about the AI part of it. But the way we think about it is that, first of all, the work we do comes in two parts: the code that runs in the car and the code that runs in the cloud. For the code that runs in the car, if you really want to simplify it, you can think of it in two or maybe three parts. At the lowest level, you have the operating system. If you want to compare it with a traditional development environment, imagine you have the OS that you're targeting, and on top of that OS there's a runtime that you write against, like Java if it's Android. And then on top of that runtime, you have your applications. That's how you would write a typical one. It's a similar idea here. You have your OS, which is running on the car hardware. Now, the car hardware is way more complicated than anything you've seen on phones or PCs. We say it's like a data center on wheels. Typically a car has a lot of computers in it; some of them we write software for, some of them come with the car, and they're all on a gigantic network. Most of the code that we write runs on what we call High-Performance Compute, but different companies do things differently, so they may factor the workload in different ways. You can imagine multiple smaller computers, or one large computer and one small computer... there are different configurations possible, and you just have to figure out how you're going to break down your workload.
And then on each of those computers, you're going to run, depending on how big it is, a fairly beefy operating system, or possibly no operating system at all if it is a microcontroller. Sometimes there'll be embedded processors and various microcontrollers. On top of that, we have a framework that we build that basically enables the software components running on one computer to work with each other. A good equivalent in the open source world is something you may have run into called ROS. It's very similar to that, but you can imagine the components can also communicate across the computers on the network. And then on top of that, you write the functionality that actually makes it autonomous. That's just one piece, though; there could also be a calibration functionality that requires a whole bunch of other little pieces of functionality that you would write. Autonomy itself breaks down into your classical robotics paradigm: Sense, Plan, Act. Sensing is what we in our world call Perception. You have a block of code that predicts how the world is going to change, and there's a block of code that figures out where I am within the world; that's called localization. Then there's another block of code that, once it knows "This is what the world looks like, this is where I am, and this is how the world is going to change in the next few seconds," asks, "How am I going to act? What's the plan?" And then it sends that down to the actuators, which is the control part. Those are all the kinds of components that run on the car. Then there's a lot of code that runs in the cloud, for development as well as during deployment. There are teams that build infrastructure because, even though we are part of Lyft and Lyft is very much a cloud company, the kind of workload that the ride-sharing side runs is very different from the kind of workload that we run.
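
As an editorial illustration of the Sense, Plan, Act breakdown described above, here is a minimal Python sketch of one pass through perception, prediction, localization, planning and control. Every name, threshold and the constant-velocity prediction are assumptions made for the example; none of it reflects Lyft's actual stack.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# All names and numbers here are hypothetical; they only mirror the stages
# named in the conversation (perception, prediction, localization, planning, control).

@dataclass
class Track:
    """A perceived object with position (m) and velocity (m/s) in the road frame."""
    position: Tuple[float, float]
    velocity: Tuple[float, float]

def perceive(sensor_frame: List[Tuple[float, float, float, float]]) -> List[Track]:
    """Perception: turn raw detections (x, y, vx, vy) into tracks."""
    return [Track((x, y), (vx, vy)) for x, y, vx, vy in sensor_frame]

def predict(tracks: List[Track], horizon_s: float) -> List[Tuple[float, float]]:
    """Prediction: constant-velocity guess of where each track will be."""
    return [(t.position[0] + t.velocity[0] * horizon_s,
             t.position[1] + t.velocity[1] * horizon_s) for t in tracks]

def localize(odometry: Tuple[float, float]) -> Tuple[float, float]:
    """Localization: this sketch simply trusts the odometry reading."""
    return odometry

def plan(ego: Tuple[float, float], futures: List[Tuple[float, float]],
         safe_gap_m: float = 10.0) -> str:
    """Planning: brake if a predicted position is in our lane and closer than the gap."""
    for fx, fy in futures:
        in_lane = abs(fy - ego[1]) < 1.5
        ahead = 0.0 <= fx - ego[0] < safe_gap_m
        if in_lane and ahead:
            return "brake"
    return "cruise"

def act(command: str) -> Dict[str, float]:
    """Control: map the plan to actuator targets."""
    targets = {"brake": {"throttle": 0.0, "brake": 1.0},
               "cruise": {"throttle": 0.3, "brake": 0.0}}
    return targets[command]

def tick(sensor_frame, odometry, horizon_s: float = 2.0) -> Dict[str, float]:
    """One pass through the loop: sense, predict, localize, plan, act."""
    futures = predict(perceive(sensor_frame), horizon_s)
    return act(plan(localize(odometry), futures))
```

Calling `tick([(20.0, 0.0, -8.0, 0.0)], (0.0, 0.0))` models an object 20 m ahead closing at 8 m/s; with a 2 s horizon it is predicted 4 m ahead in the ego lane, so the sketch brakes.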

Lukas: Right.

Anantha: The amount of data that we collect and the amount of compute that we need are on a very, very different scale, and the requirements tend to be very different. So we have teams that think about the data part of that infrastructure and teams that think about the compute part, and then we also have to think about testing all of this. There's testing with unit tests, which you do with your code as usual, and at the other end there's testing on the road, where you build everything and deploy it. But there's a whole lot of other testing that needs to happen in between. Simulation is one example, where you run the software that you built, which will eventually run in your car, somewhere in the cloud. And then we also have rigs that we build; we call them test beds, but another term you will often hear is hardware-in-the-loop testing. What you build depends on which team it is: every team will build their own smaller versions of the hardware, and then there'll be full system tests. So there are different types of these test beds, and you can think of them as mini data centers that we have. You run the code on those as well. We treat them as if they were another cloud. So we have teams that do all of that, and then there are teams that work on simulation... All of these teams eventually come together at the end of the day. It all gets packaged up into the software that runs in the car, and then we test the car on the road. The metrics from that are used to drive work on the software that runs on the car, but quite often they can also impact what happens in the cloud. For example, if you change your sensor and capture a lot more data per hour, you may have to replan your storage capacity. Things like that. Did I answer your question?

Lukas: There are so many more questions actually. I think you're at the point where you probably have some metric, like how long you can drive without intervention, that you're trying to optimize, or something like that. But then do you break it down by teams, like, "we need to make our perception 10% better", or something like that? How do you think about that?

Anantha: It's kind of broken down, right? For example, there's a top-level metric for the whole system performance and then...

Lukas: Is it like time between intervention? Is that the right metric to look at?

Anantha: In California DMV reporting, that's what they look at. They look at what's called MPI, Miles Per Intervention: how many miles do you drive before you have an intervention? It's a really common metric that people track, but then there are so many other metrics that you have to think about. End-to-end latency is one example: how long does it take from, say, the time your camera captured a frame to the time that you reacted to it? So there are a number of other metrics that matter, but of course you can argue that all of them come down to an intervention, as in some human had to intervene. That's generally what the industry has standardized around today but, you know, it's really controversial, because what is an intervention? How do you report it? It's up for debate. Internally we track a number of other broader system-level metrics, and you can do two things. One is you can apportion. Let's say you use MPI: you could apportion MPI to different components and say, "hey, the reason this intervention happened was because we misperceived; the reason that intervention happened was because our map was wrong." And you can go as granular as you want. But really, that's only part of the problem, so each component will also have its own separate metrics. For Perception, for example, we may want to track the precision and recall of seeing different agents. What's the precision and recall for seeing a pedestrian or a car or a bus? And then you can slice that further: how good is my precision/recall at 50 meters? 100 meters? 500 meters? So there are lots and lots of metrics that eventually break down to the component level.
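
To make the metrics in this answer concrete, here is a small Python sketch of Miles Per Intervention, of apportioning interventions to components by root cause, and of precision/recall bucketed by distance. The function names, the root-cause labels and the `"tp"`/`"fp"`/`"fn"` record format are assumptions for illustration, not Lyft's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

def miles_per_intervention(miles: float, interventions: int) -> float:
    """MPI: miles driven divided by the number of interventions."""
    if interventions == 0:
        return float("inf")
    return miles / interventions

def apportion(root_causes: List[str]) -> Dict[str, float]:
    """Share of interventions attributed to each component (hypothetical labels)."""
    counts = Counter(root_causes)
    total = len(root_causes)
    return {cause: n / total for cause, n in counts.items()}

def precision_recall_by_range(
    records: Iterable[Tuple[float, str]],
    edges: Tuple[float, ...] = (50.0, 100.0, 500.0),
) -> Dict[float, Tuple[Optional[float], Optional[float]]]:
    """Precision and recall per distance bucket.

    Each record is (range_m, outcome) with outcome in {"tp", "fp", "fn"}.
    A record lands in the first bucket whose upper edge is >= its range;
    records beyond the last edge are ignored.
    """
    stats = {edge: Counter() for edge in edges}
    for range_m, outcome in records:
        for edge in edges:
            if range_m <= edge:
                stats[edge][outcome] += 1
                break
    result = {}
    for edge, c in stats.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        result[edge] = (precision, recall)
    return result
```

Buckets with no data return `None` rather than zero, so an empty range is not mistaken for a poorly performing one.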

Lukas: How do you allocate your resources? Is the perception team a lot bigger than the planning team?

Anantha: That's a very good question. I'd say it's roughly similar between the two teams. I don't think that there's a perfect science when you want to allocate resources. You kind of have to look at the stage of the project you are in and the majority... Because each part of the stack will move at a different pace depending upon what they're building. Let's say you're doing something which is highly machine-learning-dependent, and I'm talking about when you're starting from scratch. Steady state, of course, is very different. When you're starting from scratch, you first may have to spend quite a bit of time building your machine learning infrastructure, data gathering, all of that. So those teams you probably want to populate first, before you start throwing models at it. Maybe you can get away with a relatively rudimentary team, a smaller team of just a few core experts, in the perception part. Once the infrastructure is ready, you start putting more people in that area, and maybe you don't have to work so hard to add people on the infrastructure side of things. Once you start unlocking the ability to see the world, you can start doing more and more complicated maneuvers and planning, so you start pushing more into the planning world. Then you start hitting bottlenecks on that side, and you say, "oh, yeah, I should look at adding a few more people to unlock this thing in that area." It's very, very dynamic, so I wouldn't say there's one standard formula through which we do resource allocation here.

Lukas: I see. But is it driven by where you're seeing the most interventions being caused, or is it where you see the most opportunity for improvement?

Anantha: I think in a steady-state world, interventions are a lot like bugs. I mean, just replace "interventions" with "bugs". It's the same problem in any software: where do you have your biggest bugs? Sometimes throwing more people at the problem is the right answer. And sometimes it is not the right answer. In fact, it's the wrong answer. So you may want to figure out how to put the right people on it. Maybe you don't have the right expertise. So it's not always clear that the resource allocation is directly proportional to the number of bugs that you have.

Lukas: I see. That makes sense. At this moment, is there a particular part of the chain that feels like the most challenging for you, that feels like there's the most room to improve?

Anantha: I think that if you look at the state of understanding and the state of research in this space, the place where there is a lot of scope for improvement is in the area of prediction and, say, behavior planning. That's an area where there's still a lot of active development going on. The industry is changing fast. Just recently, I saw a really cool paper from Waymo's research team. So there's a lot of activity going on in that work. I would say that's an area which is developing quite a bit.

Lukas: So this is like predicting where another car is going to go, or whether a pedestrian...?

Anantha: What will happen in the world in the next few seconds? Who should I pay attention to? What should I watch out for? You know, all the things that we as humans take for granted. Those kinds of problems.

Lukas: So inside of the car when it's operating right now, how many different models are running approximately?

Anantha: Oh, boy. I'm not sure I can tell you the actual number, but in terms of ML, we have so many different ways of deploying these models, right? On the car, you deploy them, and then in the cloud, you have a couple of different ways of deploying them. And if you look at Lyft at large, including ride share, there are so many different ways. There are cases where people just run their models on the desktop once in a while, pretty ad hoc; there are online loops and there is online learning happening, all running in the cloud. Then there are models that are running on the phone, and models that are running on the car. So we have models pretty much everywhere.

Lukas: So I guess you're responsible for the ones that run in the car and the cloud for now.

Anantha: We also help the ride share team as well. So there's a few people on our team who help out, because we have a lot of pretty amazing machine learning people so we also help the core part of it. We do have visibility into like how they do it also. In fact, sometimes our teams work on the cell phone models or like the offline models.

Lukas: It's cool to talk to someone that's working on so many models at the same time. I'm really curious about your infrastructure for all these. How often are these models updated?

Anantha: It completely depends on which one you're talking about. If it's the mapping ones, that really depends on why you're using the model. Sometimes these models are used to help the operators as they work, with new UI techniques or whatever, where the model provides some additional assist. They may update it when that time comes. Otherwise, they're generally assisting humans as opposed to doing it on their own, so those models don't update as frequently. But for the models that operate on the car, it depends on what you're addressing and which area you're trying to improve. Let's say you're working in winter and vapour or smoke is much more visible; some parts of the code will be more impacted by that, so you'll see those iterating very fast. In general, though, the models that go on the car tend to get trained and iterated upon on an almost continuous basis.

Lukas: Do these models feed into each other? I remember when I was working on models, there was a big versioning problem where one changes and the downstream ones need to update. Do you actually retrain everything downstream from a model if you change an upstream model? How do you keep track of that?

Anantha: The models do feed into each other. That does happen. This is where I would say that, since I'm not involved in this work day-to-day, I don't know the specific details of how the team manages it, but I don't think they have to go and retrain their downstream models. Maybe think about it this way: you have your model metrics, right? You train your model and you get a bunch of metrics around the model. But that's not enough. You have to look at downstream metrics also, because quite often those tend to be the trickiest bugs. You bring in your model, it all looks good in terms of the metrics, and it's all working fine. You deploy it in the car and then you see the behavior change quite a bit. The car may decide to brake more often, or do something different. And then you have to debug that, because the model's behavior has impacted something downstream. So it's not necessarily that you have to retrain those downstream models; you may just want to figure out where the interaction is happening. There are a few things you do have to be very careful about, though, like the validation set that you use to validate this model and the training set that you use for training downstream. You have to be very careful to keep them all separated, with no overlap. Otherwise, you may introduce artifacts. So those are things the team has to be very careful about.
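
The point about keeping the upstream validation set and the downstream training set separated can be checked mechanically. Below is a minimal Python sketch, assuming each sample carries some stable identifier such as a scene or log ID; the split names and ID format are hypothetical.

```python
from typing import Dict, Iterable, List

def find_leakage(upstream_validation_ids: Iterable[str],
                 downstream_training_ids: Iterable[str]) -> List[str]:
    """Return the IDs that appear in both splits, sorted for stable reporting."""
    return sorted(set(upstream_validation_ids) & set(downstream_training_ids))

def assert_disjoint(splits: Dict[str, Iterable[str]]) -> None:
    """Raise ValueError if any two named splits share an ID."""
    names = sorted(splits)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = find_leakage(splits[first], splits[second])
            if shared:
                raise ValueError(
                    f"{first} and {second} overlap on {len(shared)} IDs, e.g. {shared[0]}"
                )
```

A check like this would typically run whenever a dataset split is regenerated, before any training job consumes it.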

Lukas: Do you think it's harder with these kinds of models? A lot of people have talked about like predicting timelines is much more difficult. Have you found that to be the case or..?

Anantha: Predicting timelines?

Lukas: Timelines of improvements. I feel like software's already hard, right? But with the models, it almost seems like it might be unknowable to know how we get X% improvement. Do you give your team goals where you say, "look, I want to see a 10% improvement on this accuracy metric"?

Anantha: They set themselves goals for improving it. I think ultimately it's the same as with any software. You set yourself a goal. Just because it's machine learning doesn't mean that it's a new problem. The problem has to do with the fact that you really don't know the perfect solution, and you can't really estimate what it'll take for you to get there. So the way you do that is by a series of experiments, by iteration. Because if you knew exactly what to write, you could estimate the time pretty accurately. And sometimes that's the case. Say you decide, "oh, I need to refactor this thing." You know roughly how long it will take, you have test cases around it, you can test it, so you know your unknowns, all of those things that people get tripped up on. So you become more and more predictable over time. That's basically what I'm trying to say. As you keep working on the problem, you start having a better idea of how long it will take, because you develop intuition about that particular area. Then you probably have unit tests, integration tests and some other tests that help guide you, focus on the right areas and cut out the noise. Then you tend to get a lot more predictable in the work that you do. After that, if you come in with a new model, you know how many experiments you need to run. You know how to scale. At that point, it's like throwing money at the problem: you parallelize it and you do a lot more work. But you are getting more and more predictable just because you've built up all that intuition and all that collateral through tests. So, going back to your question, are they predictable? Definitely not, I would say. But as they work on it, they get better and better and more accurate about how much they can do.

Lukas: And this is in predicting improvements, like incremental improvements?

Anantha: I would say it's more that when they're trying to fix issues, they tend to get more and more predictable. Now, say they come in with the goal of "I'm going to improve it by X%." The best way they can do that is by running a whole bunch of experiments and seeing how fast they can converge. Even there, if your infrastructure is better and you have a good set of people, you can get incrementally better. But I don't think it's any different from any other software development in terms of how predictable you can get.

Lukas: I guess one question or one thing that some people say is different, or that I imagine is different is testing these models. Before you put them into production, do you test them against a set of unit tests where it's, "I insist that the model does this or that the car does this in this situation?" Or is it more like an overall accuracy of I want it to make the right decision 99% of the time. How do you think about that? Because aren't these models somewhat inherently unpredictable or they're not always going to do exactly the same thing, right?

Anantha: Right. So the way it works is that you have a model and you will have a certain... I'm talking in terms of Perception models, because if you're doing something else downstream in the area of planning, it's pretty different. So you will have certain metrics that you have reached today, and obviously you're working to go beyond those metrics, right? You can establish that as part of your model development: you develop it and you have the model results. But that's not enough. You then have to do some level of integration testing. You put it all together and then you ask, "how's the downstream metric?" Let's say it's perception. The output of perception that planning really consumes is what we call Tracks. These are basically objects that you track over time. So you have to see those tracking metrics improved, or at least understand how they are impacted. And then when you put that in the car, there are the top-level metrics: how's the car behaving, how is it driving, is it comfortable, is it safe? What are the metrics that you track for each of those things? You have to get that right too. So you have to go through this entire journey repeatedly. It's not that you just run the model once and it works.
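
The journey described here (model metrics first, then integration metrics such as tracking, then car-level metrics) can be pictured as a sequence of gates where each stage is compared against the current baseline. The Python sketch below is an assumption-laden illustration: the stage names, metric names and the rule "stop at the first stage that regresses" are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Stage = Tuple[str, Callable[[], Dict[str, float]]]

def regressed(value: float, baseline: float, higher_is_better: bool,
              tolerance: float = 0.0) -> bool:
    """True if the candidate is worse than the baseline by more than the tolerance."""
    if higher_is_better:
        return value < baseline - tolerance
    return value > baseline + tolerance

def run_gates(
    stages: List[Stage],
    baselines: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> Tuple[bool, Optional[str], List[str]]:
    """Evaluate stages in order and stop at the first one with a regression.

    Returns (passed, failing_stage_name, failing_metric_names). Later, more
    expensive stages are not evaluated once an earlier stage fails.
    """
    for stage_name, evaluate in stages:
        metrics = evaluate()
        failures = [
            name for name, value in metrics.items()
            if regressed(value, baselines[name], higher_is_better[name])
        ]
        if failures:
            return False, stage_name, failures
    return True, None, []
```

In this sketch a candidate whose model-level recall improves but whose integration-level tracking metric gets worse is stopped at the integration stage, before any road testing is scheduled.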

Lukas: And are you able to run these tests every time there's a new model, as you try to pass the first test and then sort of expand out?

Anantha: Yeah. You have to run through the entire gamut if you're doing something brand new.

Lukas: Do you have any interesting examples of something that improved the local tests but made the overall test worse?

Anantha: Yeah, yeah. There's a bunch of examples. The thing is, I don't know what I can tell you.

Lukas: Fair enough.

Anantha: I mean, there are all these cases where you'll see the interaction between the model upstream, the perception, and what happens on the planning side. I mean, I can tell you as a friend, but maybe I shouldn't put it out there on the clip. But anyway, I was just giving you the example of winter; you have a lot more smoke that you have to deal with. We could see that the model performance was really good, but when we integrated it, it didn't work right. So we had to go back and see if there was some interaction going on between the upstream model and the downstream model that caused this problem. These kinds of things happen all the time. So the team over time has become much more rigorous about all these things. Any time they do this, there's a lot of automation built in. They test all these things. So they have to go through the whole thing.

Lukas: I see. When you see teams improve models, is it typically that they've collected more different data, changed the data pipeline or improved the model itself? Do you have a sense of what...?

Anantha: Most often you feel like the improvements happen with the right data.

Lukas: Interesting, the right data.

Anantha: It's less often that the model architecture itself has to be changed.

Lukas: Got it.

Anantha: Yeah.

Lukas: But is it the ML team themselves that's asking for different types of datasets? Do they control that process, or is there a data team?

Anantha: OK, so this is another big thing that we've been somewhat religious about at Level 5. We don't have a notion of an engineering team, a science team or a data team. We just have a perception team. I'll tell you my mental model around ML. I think ML is a skill. It's like anything you learn. You know how to write Python, great. Or you know how to write C++; that's a skill. So ML is a skill, but a skill alone is not enough. You need domain expertise. Knowing how to write Python may be good enough for some things, but if you're trying to build some complex insurance thing, you probably need to understand insurance. So how do you divide the domain knowledge from the skill? In some cases you can't. You do see it happen: you'll have EMs write a spec, and then they'll hand it to you and say, go and implement it. But more often you'll see that the engineer has to really ramp up and truly understand what the actual problem is, because when they debug something they have to really understand what is going on. It's the same thing in ML. So we have a team called the prediction team; their job is to predict. We don't distinguish between a data scientist or a data team and an engineer. It's the same people who have the domain expertise and the ML skills. And that's how we've been operating so far.

Lukas: That's cool. So all of your teams have a mix of skillsets.

Anantha: Yeah. This seems to be a pretty big debate in the industry, you know: should we have separate science and engineering teams? The mental model I've come up with is that the job of science is to develop knowledge; what they produce is knowledge. And what an engineering team produces is an artifact. In most cases, we are actually building an artifact. We are building a product. So I find the science vs. engineering divide to be less germane in these areas. You can have a research team. Their job is to produce knowledge. And that's OK. But when it comes to developing a product, I've always found that it is better to have the domain knowledge and the skill together. If you can find the unicorns who have both in your fold, that's awesome. We have very few of those, so we kind of have to bracket them with people with the right skills. Does it make sense now?

Lukas: Totally. I mean, we see the same thing with a lot of the companies we work with, and if I was in charge, I think I would lean the same way you have: make sure that the people doing ML are right inside the teams that are actually trying to accomplish something.

Anantha: And now coming back to the data question that you asked..

Lukas: Yeah.

Anantha: So if you are a domain specialist, you already have a very good intuition about what data you want. And like we just said, most of the problems seem to be about finding the right data rather than the right model. So you have this nice property where the team just knows what the right data to seek is.

Lukas: Got it. That makes sense. Is it challenging for you to deploy models into production? Like I've never had to deploy into hardware, is that a challenging step for you? Do you find sometimes the model doesn't perform the same way you expected when it's actually inside the hardware?

Anantha: I think the difference between building it for, like, a cloud service and what we are doing here is in the model: there may be a transform after training, right? Generally, the transform after training could be, say, quantization or something like that. So you have to develop a good understanding of the impact of that on the model. The other thing that becomes really important when you're deploying, and this is no different from deploying models on your mobile phone, is that you have to be really, really careful about power and latency. So you have to be really rigorous about your op count: how much time does it take? All of that you have to think about.

Lukas: Do you build infrastructure to actually monitor these models as they run in production?

Anantha: Yes. So actually we try... We have an internal framework; in fact, I was just watching a video of it right before this. They were doing a demo because they'd just built a new one. And that takes care of all of this for you. So it would do the stats and everything, like when you are building and bringing up your model and running your experiments. And in fact, we dump all of that, probably, into your tool.
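
As a rough illustration of why a post-training transform needs its own evaluation, the Python sketch below applies symmetric per-tensor int8 quantization to a weight vector, measures how far a dot product drifts after the round trip, and times a callable. This is a generic textbook scheme chosen for the example, not the transform used at Level 5, and all function names are hypothetical.

```python
import statistics
import time
from typing import Callable, List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate float weights."""
    return [q * scale for q in quantized]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def output_drift(weights: List[float], inputs: List[float]) -> float:
    """Absolute change in a dot-product output caused by the quantization round trip."""
    quantized, scale = quantize_int8(weights)
    return abs(dot(weights, inputs) - dot(dequantize(quantized, scale), inputs))

def median_latency_ms(fn: Callable[[], object], runs: int = 50) -> float:
    """Median wall-clock time of fn over several runs, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Each reconstructed weight differs from the original by at most half a quantization step, but those small errors accumulate through the model, which is why the quantized model is re-evaluated rather than assumed equivalent.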

Lukas: Cool. Some people talk to us about worrying about feature drift. Would you notice stuff like a sensor breaking, or the model getting a different kind of data? Is that something you look for? Or is it mainly just the latency of the model?

Anantha: Oh, I see. So you're talking about the model. Some strange behavior...

Lukas: Yeah some weird situation where it seems to be struggling.

Anantha: So it could be so many problems. Yeah. It could be some sensor has gone bad. I was just thinking of something specific here. Yeah, those things happen. But the way we find out a bunch of these things is that increasingly we depend on something we call unsupervised metrics. Like, what's the rough size of a bicycle?

Lukas: Yeah, like a meter or two meters.

Anantha: No, it's... I mean, probably a lot more than that, say three or four. But if you see a 50 meter bicycle, then there's probably something wrong with that, right? I'm just giving that as a very extreme example, but you can imagine that there are a lot of such heuristics that people put together and track. If you start seeing weird things happening on those, that enables you to catch lots of crazy bugs. Sometimes that's a really good way of catching long-tail issues as well, because they may not result in a disengagement, but you may see some weird behavior; or they may trigger a disengagement, but not often enough to look like an important problem to focus on. So increasingly we depend on all these unsupervised metrics: the data comes in, then you compute all these various interesting statistics, and then you figure out what is actually going on. And then you go back and resolve it.
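
The "50 meter bicycle" heuristic can be sketched as a label-free sanity check over perception output. In the Python below, the detection format, the class names and the plausible size ranges are all illustrative assumptions, not values from Lyft.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical plausible length ranges in meters per class; illustrative values only.
PLAUSIBLE_LENGTH_M: Dict[str, Tuple[float, float]] = {
    "pedestrian": (0.2, 1.5),
    "bicycle": (0.5, 5.0),
    "car": (2.5, 7.0),
    "bus": (6.0, 20.0),
}

def implausible_detections(detections: List[dict]) -> List[dict]:
    """Return detections whose length falls outside the plausible range for their label.

    Each detection is a dict with "label" and "length_m" keys (assumed format).
    Labels without a configured range are skipped rather than flagged.
    """
    flagged = []
    for det in detections:
        bounds = PLAUSIBLE_LENGTH_M.get(det["label"])
        if bounds is None:
            continue
        low, high = bounds
        if not low <= det["length_m"] <= high:
            flagged.append(det)
    return flagged

def anomaly_rate_by_label(detections: List[dict]) -> Dict[str, float]:
    """Fraction of detections per label that fail the size check."""
    totals = Counter(det["label"] for det in detections)
    bad = Counter(det["label"] for det in implausible_detections(detections))
    return {label: bad[label] / count for label, count in totals.items()}
```

Tracking the per-label rate over time, rather than only individual hits, is one way such a check can surface rare problems that never trigger a disengagement.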

Lukas: It's funny I was just looking at Jira tickets for my company... if you see, like one thing wrong, does that warrant you a ticket? Like if you see one bicycle that's too big, will you actually file a ticket against that?

Anantha: You should. I mean, all of these things should be like filing a ticket. If it is that glaringly obvious, and you have the time to do it, to take a look at it, you should.

Lukas: So even if there's only one example where it seems wrong, you're going to take a look at it?

Anantha: Yeah. Again, this has nothing to do with self-driving cars. I mean, we used to have similar problems in Windows. You know, there'd be some weird one-off thing that you saw and we would record it. And then next thing you know, after some other changes, suddenly things start to pop up and then you're like, "oh yeah, I've seen it in these environments and situations." And then you kind of are like... So it's important that anytime you see something anomalous, you just file it. And hopefully, you have more context that you capture and then it'll help you debug.

Lukas: Do you have a team that's tasked with looking for that or is that kind of everybody's looking for those things?

Anantha: So we have a team that... obviously a lot of our reports come from the drivers driving on the road. But then we also have to have additional people to go back and look at the data and see if there's something weird going on. They're not necessarily engineers, we call them Operations. So they scan and take a look at these things. And, of course, engineers also, you know, look into these interesting cases. And then we actually look at it as well. But there's so much data coming in that "which one do you look at?" and "how do you prioritize?" really become the more interesting problems.

Lukas: It sounds incredibly challenging.

Anantha: Yeah, yeah, yeah. I mean, I believe that this is a problem for any software at scale. In my case, I've gotten to work on major products that operated at scale. And it's the same problem, whether you're running Newsfeed at Facebook, you're running into some issues in Windows, or you're running a car on the road for thousands of miles. You get lots and lots of reports and that's the issue of diversity.

Lukas: Yeah. I guess these are the sort of issues for complexity and scale.

Anantha: Yeah. It's a complexity and scale problem. These are extremely simple problems if you have a sanitized test track and you're running your car there; you can be very selective about what you do and be really rigorous. But when you're running it on the road, anything can happen. It's like when you run your operating system on any kind of PC: anybody will tell you that you still haven't figured out what's happening.

Lukas: So you've been at Lyft for almost three years now?

Anantha: Yeah.

Lukas: And the organization must have grown quite a bit in that time. I'm curious about how processes have changed as the organization has grown and things have solidified.

Anantha: A lot, right? It's interesting. There are teams which were nonexistent and then they've been built and now they've gotten to operating at scale. I would say the current Perception team is one of those which is now operating at scale. But then there are still some new teams that are forming and it almost feels like they're doing things which, say, the Perception team or some other team was doing at the very beginning. Of course, they have a lot more guidance now because there are other teams that are looking out for them and they get through it. A few things happened. One is as you get bigger and operate on a wider scale... In the beginning, we would not care about where the training was happening. The engineer would train it on their desktop; the workstation that they had, it could be like that. As the team started growing, more engineers started coming in and the reproducibility and all of that started becoming a real problem because multiple people are working on the same thing. And so then you start becoming much more rigorous in your process, and that's fine. But it would work only if you have maybe four of them working together. But once you go beyond that, process won't fix it, mutual agreements won't fix it. So you probably should be building a framework to help you standardize that process and just make you not worry about all the moving parts. Then after that, you'll find that the framework doesn't last you and then you'll write a new framework for the new scale of problems that you'll run into. And we've gone through all of that. Then another thing that will happen is as you ramp up and you grow that much, you'll start becoming cognizant of your cost. Especially if you're doing it on the cloud, they provide a lot of sharp knives. And as you play with them, you can cut yourself and bill yourself for that, in tons of money.
And so again, you start becoming very, very careful and you try to build your ML frameworks or whatever, maybe it's not just ML, but even simulation. You start building your frameworks to help keep that in check. Then you start becoming very, very rigorous about your data partitioning. And you also have versioning and you track all of them in a tool. And you probably want a custom tool, which is kind of how we ended up building our own analytics tool internally to track a number of these things. And then you start getting even more rigorous about tracking all your experiments, and then you begin to use Weights and Biases. And then you start running into time problems. You know, you do more and more complicated models and then you want to get done with your experiments faster and then you start getting into distributed training. And, you know, you've gone through this entire journey... I'm sure there's more.

Lukas: Yeah. What's next?

Anantha: I'm sure there's more. I'm sure if I talk to somebody at Google, they will probably have gone even further in the order of things to be done.

Lukas: Thanks. That was well said. We always close with these two questions. I'm curious how you're going to answer these. You may be very specific, but maybe you can expand it to autonomous vehicles. What's one underrated aspect of machine learning or AVs that you think people should pay more attention to than they do?

Anantha: I noticed that there's a tendency to think of it as just a skill, and it's like you throw data at it and it gets better and you can do that over and over again and then you may be able to get a good result. But I always go back to the idea that one very underrated aspect of machine learning is that it has to be coupled with domain knowledge. You really have to have a good understanding about what problem you're solving and have a good understanding of the domain. In fact, I would say spend quite a bit of time really understanding the data that you're going to get. And then, because I said the right data is more important than a lot of data, actually, there was this interesting case where we made some change and we cut down our data usage by half, it became way cheaper for us and our model became more accurate. So that's what you get by actually genuinely understanding. And I would say people don't care too much about domain knowledge, but I think that's something that is very important in this area. Especially with machine learning. But you could argue it's true for anything.

Lukas: Yeah. Yeah. All right. Well said. Today's last question is, you've actually deployed several serious machine learning products at scale, what's been the biggest challenge in taking them from the experimental stage to actually running in production?

Anantha: So one of the biggest problems with machine learning is getting it to generalize, right? There are a lot of tail events and the data is typically really sparse in your dataset, and trying to figure out why that happened, what happened, is generally really difficult. So this is one area where I think you have to figure out how to combine machine learning with other techniques. You know, a place where you want to have absolute guarantees in a system like a robot where there is actually no human intervention. In other areas, I think there are some very nice properties if you're doing a human assist. So ML has a really nice property of being able to do good enough. And let's say you get some 99... some number of nines, and then the remaining can be augmented by human intelligence. But if you really are trying to build a perfect robot, then it has to really be completely autonomous. You have to figure out additional ways in which you can have some guarantees. And that's actually quite challenging. Figuring out everything.

Lukas: Seems challenging.

Anantha: Yeah.

Lukas: Awesome. Thank you so much. It was a real pleasure to talk to you.
