Machine Learning in Production for Self Driving Cars with Nicolas Koumchatzky
Director of AI Infrastructure at NVIDIA, Nicolas is responsible for MagLev, NVIDIA's production-grade ML platform
BIO

Nicolas Koumchatzky is the Director of AI Infrastructure at NVIDIA, where he's responsible for MagLev, NVIDIA's production-grade machine learning platform. His team supports diverse ML use cases: autonomous vehicles, medical imaging, super resolution, predictive analytics, cybersecurity, and robotics. He started as a quant in Paris, then joined Madbits, a startup specialized in using deep learning for content understanding. When Madbits was acquired by Twitter in 2014, he joined as a deep learning expert and led several projects in Cortex, including a real-time live video classification product for Periscope. In 2016, he focused on building a scalable AI platform for the company. In early 2017, he became the lead for the Cortex team. He joined NVIDIA in 2018.

Follow Nicolas on Twitter: https://twitter.com/nkoumchatzky

MagLev: https://blogs.nvidia.com/blog/2018/09...

Scalable Active Learning for Autonomous Driving: https://medium.com/nvidia-ai/scalable...

Active Learning - Finding the right self-driving training data doesn't have to take a swarm of human labelers: https://blogs.nvidia.com/blog/2020/01...

TRANSCRIPT

Lukas: Nicolas - welcome and thanks so much for taking the time to talk with us. You're an expert on deploying deep learning in the real world. I would love to hear how things have changed since you started doing this in 2016, or maybe even earlier at Twitter. What were the challenges then - and what are the challenges now - that you're seeing, that make these models actually work?


Nicolas: Thank you, Lukas. I actually started learning about deep learning in 2014, so I'm not one of the old-schoolers of deep learning, but I got hooked pretty quickly. I started at a small startup with five people, and we were acquired by Twitter. At Twitter we started the first deep learning team. Basically, Twitter didn't have any deep learning knowledge back then, or at least very little, so we were paired with software engineers to get deep learning into production in some product areas that could benefit from it.


Lukas: What were the first areas where they felt they could benefit? Was it in vision work?


Nicolas: Mostly vision, yes, and text. We started with two main projects: One was filtering out bad image content, and the other was determining which user profiles were safe for ad placements. This is a big deal for advertisers because they wanted to make sure that they could put ads on appropriate profiles and avoid any that might be toxic or insulting. We were able to classify those profiles using text, images, and user features. As you can imagine, this was an important revenue generator. And that was the beginning of the team.


Lukas: Was this shifting from an existing, more traditional model to a deep learning model, or a new deployment for a new problem?


Nicolas: There had been interest in those areas previously, but without deep learning it was almost impossible to perform at the required accuracy. Advertisers were expecting 99.9% accuracy which was not achievable just using tabular features and decision trees.


Lukas: These sound like applications that could be run on a batch process in the background. It doesn't need to run live on user queries, does it?


Nicolas: That's true in some cases, but not for images. When a user posts an image, an undesirable image must be hidden right away. Catching very bad images required a lot of real-time processing. In that case, we would have a target of a hundred milliseconds or less.


Lukas: Wow, this is 2016. So how did you get that working in real time in production?


Nicolas: So that was interesting. I'm sure you know the details of deep learning frameworks back then, but basically there were Theano, Lua Torch, and Caffe.


Nicolas: We weren't using Caffe, although it would have been a pretty good solution for this case. We were using Lua Torch for our model training, and it was great. Deploying to production was more difficult, though, so we took what we had and wrapped it up into Scala services. That took a lot of effort to make sure it was working, stable, and so on.


Lukas: So did you actually run Torch in production?


Nicolas: Yes, we were running Torch in production. That was a lot of effort and required a lot of engineering.


Lukas: And you have fifty milliseconds to make a decision? That sounds like a real feat. Were you doing that yourself, or handing it over to a different team?


Nicolas: We were doing it internally. We had to because it required a lot of expertise about deep nets, and what could make them go faster or slower, such as batch size, etc. So we had to do everything in-house.


Lukas: Were you retraining live? How did that work?


Nicolas: Later, we did. At first, we used deep learning only for images and text, mostly looking for abusive content. After a while we started looking at other problems that were more fundamental to Twitter, such as timing and ranking of ads placement, which are tabular-based: user features and item features. We were trying to make the best prediction of whether a user was going to engage with certain content. We used deep learning for that, and managed to get better results than traditional techniques. The reason I started with online learning is because for ads placement it's very important to have access to the latest and greatest features. We needed to do online learning - learning continuously with very high frequency - because otherwise there is a quick decay of performance. The half-life is maybe one, two, three minutes, and then the model starts to decay. So we had to do online learning for that.


Lukas: You would retrain every minute?


Nicolas: There are multiple ways of doing it. One way is just to do online learning right away, so just keep training online. You could also freeze some of the layers and only retrain the last logistic regression - that's the easiest one. You can use a wide and deep architecture, for example: keep retraining the memorization part, which is usually where the decay happens, and keep the other part constant. Or, I think some companies - Google maybe - retrain regularly, every five minutes maybe. They take the existing model in production, fine-tune it, redeploy, fine-tune, redeploy, and so on.
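
[Editor's note: to make the "freeze the layers, retrain only the last logistic regression" option concrete, here is a minimal PyTorch sketch. The model shape, feature dimensions, and the data passed to online_update are hypothetical placeholders, not Twitter's actual system.]

```python
import torch
import torch.nn as nn

# Hypothetical engagement model: a frozen feature tower plus a lightweight
# logistic-regression head that is retrained online on fresh traffic.
backbone = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                         nn.Linear(128, 64), nn.ReLU())
head = nn.Linear(64, 1)  # the "last logistic regression" layer

for p in backbone.parameters():
    p.requires_grad = False  # freeze everything except the head

optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def online_update(features: torch.Tensor, clicks: torch.Tensor) -> float:
    """One online-learning step on the freshest minibatch.
    clicks is a float tensor of 0/1 engagement labels."""
    with torch.no_grad():
        embeddings = backbone(features)       # frozen part, no gradients
    logits = head(embeddings).squeeze(-1)     # trainable head
    loss = loss_fn(logits, clicks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

[The fine-tune/redeploy variant Nicolas mentions would wrap a loop like this around a periodic checkpoint export instead of updating weights in place.]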


Lukas: Do you ever go back and retrain it from scratch or is it always just online?


Nicolas: Yes, we do. Mostly, if we want to add new features, or radically change the model architecture.


Lukas: How do you even evaluate, then, if a new architecture is going to be better? That would be tricky, right?


Nicolas: We have to simulate the fact that it's online learning. In that case there has to be a point in time where we stop training, hold out everything that comes after it, and then evaluate while we keep learning on that held-out stream. It's possible to simulate the situation; it's just more infrastructure.
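
[Editor's note: one common way to simulate this offline is a "progressive" temporal backtest: replay a time-ordered stream, score each event before learning from it, and compare candidate models on the post-cutoff portion. The sketch below is illustrative; partial_fit and loss are hypothetical interface methods, not a specific library.]

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class BacktestResult:
    n_events: int
    total_loss: float

def progressive_backtest(model, stream: Iterable[Tuple[float, dict, int]],
                         cutoff_ts: float) -> BacktestResult:
    """Replay a timestamp-sorted stream of (ts, features, label) events.
    Before the cutoff we only train; after it we evaluate first, then keep
    learning online, mimicking production behavior."""
    n, total = 0, 0.0
    for ts, features, label in stream:
        if ts < cutoff_ts:
            model.partial_fit(features, label)   # warm-up: train only
            continue
        total += model.loss(features, label)     # evaluate first...
        model.partial_fit(features, label)       # ...then continue online learning
        n += 1
    return BacktestResult(n, total)
```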


Lukas: Wow. So this must've been an incredibly high amount of compute.


Nicolas: Yeah, pretty high. Interestingly, all CPU though.


Lukas: All CPU? Interesting.


Nicolas: I work at NVIDIA now, but GPUs were not that easy to use in that context. There was less tooling. Now it's changing with RAPIDS, for example - data science acceleration on GPU, with a lot of libraries available now. But back then there was none of that. So we executed the code on CPU and made it really, really fast.


Lukas: Were there other pieces of infrastructure that you had to build to get this working in 2016?


Nicolas: Oh, yeah, 2016. Well, you mean besides the inference serving? Yes, we had to for the training part. One of the challenges we had is that a lot of our customers were used to decision trees and their APIs and configurations. They were not familiar with Lua Torch; few people were familiar with Lua at all. So we had to hide the configurations. We built infrastructure to simplify things, so they could copy-paste configurations, just specify the features they wanted and the steps they wanted to run - training, validation, whatever - and have everything saved automatically down the road. So we brought a lot of automation to the training phase, at the cost of flexibility at the beginning. Then it changed.


Lukas: And how did it change? How did it evolve?


Nicolas: Once the company started realizing the impact and the importance of this at Twitter, they decided to hire people who could understand it and really invest in education. That was one thing. At the same time, we decided to move to TensorFlow for a centralized machine learning platform. Why TensorFlow? Because back then PyTorch was very new and unstable - not even 1.0, I think. So we moved to TensorFlow, which also had an inference story, which PyTorch didn't have at the time. That's why there were so many recommender systems using TensorFlow in those days: it had that inference story. So we moved to TensorFlow pretty quickly after that for training and inference.


Lukas: I see.


Nicolas: For training at least, not for inference. Only some part of it for inference - just the C++ library part. But the thing is, Twitter has its own data formats and serialization formats, so we had to work with that - for example, Thrift instead of protobuf.


Lukas: Were there any other surprisingly challenging things at that time? Things that academics, or people who don't work on these large-scale deployments, wouldn't know about. Any other tricky pieces?


Nicolas: So you mean at Twitter? Yes. In some parts there was a disbelief around deep learning. I think it still exists in the medical field, for example, where people ask for interpretability or explainability, etc., even at the expense of better performance. But eventually that will disappear.


Lukas: It's so funny. In 2005, I worked at Yahoo!, moving models from rule-based ranking systems to boosted trees. They had all the exact same complaints: these models are not explainable, they're impossible to deploy, they're making chaos on our infrastructure... Exactly the same complaints, only now it's about moving away from decision trees.


Nicolas: Yeah, exactly. The difference for our infrastructure is that we replicated almost the same API as what Twitter had for decision trees. So it was a little bit easier. It was already sort of ML ready. But in your case, I guess it was completely different, right?


Lukas: Yeah. But it sounds like you had to add some weird components and abstract away in the same way that we did.


Nicolas: Oh yeah. Just basically abstracted everything away and made it look the same to gain adoption. That was fun.


Lukas: So when did you move to NVIDIA?


Nicolas: That was a year and a half ago - 18 months.


Lukas: So tell me about the stuff that you've been working at NVIDIA.


Nicolas: It's quite different in terms of the application domain. I'm managing the team building the platform used to develop autonomous vehicle software - and in autonomous vehicle software I include the deep neural networks, all the validation required for them, and so on. That's what I'm managing. It's a pretty big endeavor. The reason is that autonomous vehicles are at such a large scale: so many models, so many people working on them. There are so many specific needs that we have to build relatively custom infrastructure in order to be efficient and good at it. And competitive.


Lukas: And do you mean it's custom infrastructure for self-driving cars or custom infrastructure for every individual team working on self-driving cars?


Nicolas: It's the nature of developing models and what we call perception - the ability to understand the world from the car - so multiple models, plus custom modules. Developing this requires a lot of customization in the cloud infrastructure. As an example, all machine learning teams use a workflow system in order to say, hey, I want to do this task, and then this task, and then another task.


Nicolas: In the case of autonomous vehicles, the big difficulty is that the various steps are going to be in so many different languages and so many different libraries. One is going to be data preparation using Spark. Another is going to be: I want to run the actual software from the car on the target hardware - the actual embedded hardware, racked in the cloud - using CUDA, and then I want to run a Golang container. All of these things are so different that they require a workflow system that's agnostic to all of it. Basically, we had to develop our own customized infrastructure.
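
[Editor's note: a minimal, hypothetical sketch of what such a language-agnostic workflow step might look like - the scheduler only sees container images, commands, resources, and dependencies, so Spark jobs, embedded CUDA replays, and Go binaries can live in one pipeline. The names and images below are invented for illustration, not MagLev's API.]

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Step:
    """A container-agnostic workflow step: image + command + resources."""
    name: str
    image: str                       # any container image
    command: List[str]               # language/framework-agnostic entry point
    resources: Dict[str, str] = field(default_factory=dict)
    depends_on: List[str] = field(default_factory=list)

pipeline = [
    Step("prepare_data", "spark:3.0", ["spark-submit", "prepare.py"],
         {"cpu": "32"}),
    Step("replay_on_target", "drive-sw:latest", ["./run_replay"],
         {"accelerator": "embedded-gpu"}, depends_on=["prepare_data"]),
    Step("aggregate_kpis", "golang:1.14", ["/bin/kpi-aggregator"],
         depends_on=["replay_on_target"]),
]
```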


Lukas: Got it. I guess you're sort of starting to talk about this, but what are the big components of the infrastructure that you built, and what are the big problems each component solves?


Nicolas: At the top level, where we interact with our users, we provide tools, SDKs, and libraries. At the bottom level we have core components that enable this top level. At the top level we cover everything outside the car. When people drive, they collect data or test a new build of the software system, then take out the data or send it over Wi-Fi. Then it gets into the system and needs to be ingested. So that's the first step: ingestion. Ingestion is already pretty complicated because, similar to Yahoo! or Twitter, it's write-heavy, and then you need some way to process the data to transform it into datasets that are more consumable by multiple users. The challenges are pretty massive because we need to test for data quality, for example, or index the data, or process it into something that's easier to consume downstream. So that's the first step. The second step is to build the best datasets, and that's actually a big challenge. The way we approach it is that we view machine learning as Software 2.0, as Karpathy laid it out - I don't know if he was the first - where data is the source code for machine learning. So we need to be very careful about how we write our source code. To do that we curate datasets: create datasets, select the right frames and the right videos with the right filters, and make sure there is no overlap between training and validation. We have a lot of tooling for that.


Lukas: The tools - do they actually do this themselves, or do they help a user pick, or do they somehow automatically pick the best...


Nicolas: Yeah, both actually. We are also investing a lot in active learning - since you were at Figure Eight, you have a lot of experience there.


Lukas: Yeah, but I'm always fascinated by how other people do it too.


Nicolas: We published a blog post recently about exactly that. Autonomous vehicles lend themselves perfectly to active learning: massive amounts of data, but very costly human labelling. If you want to do 3D cuboid labelling, it's extremely costly. However, there are thousands and thousands of hours of driving data available. So we really have to select the data that's going to be the most efficient - the data that contains the patterns the DNN, the deep neural network, is not yet able to find. In order to do that, we use active learning. Active learning gives us uncertainty scores, and the frames or videos with the highest uncertainty are the ones we want to label to improve performance. So we tried it, and we got a 3X higher improvement - 3X to 5X actually - using actively selected data versus manually curated data - not even random - curated manually by humans...


Lukas: Humans guessing what data is going to be the best?


Nicolas: Yes, exactly. The challenge was to find VRUs - vulnerable road users - at night. It's a super challenging problem, because at night it's difficult for a camera to see. Pedestrians and bicycles are the two most difficult categories to detect, so we wanted to focus on those first. The first pool of people, doing manual curation, were told to look through the videos and find images that are relevant for these classes. The second group just used the models to identify the frames that had the highest uncertainty for these two classes. So one approach was completely automated, and we were able to find frames that were very, very uncertain for both pedestrians and bicycles. We chose maybe twenty thousand of them. The manual curation produced the same - twenty thousand. They swiped through videos, and when they found pedestrians or bicycles at night, they stopped and selected a few frames in that segment of video. We trained the model with these two sub-datasets and looked at the validation performance, and the increase in validation performance was three times higher for the actively selected data. So it does work, and it can be completely automated.
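
[Editor's note: a minimal sketch of uncertainty-based frame selection in PyTorch, to illustrate the idea rather than NVIDIA's actual pipeline. The model, the unlabeled_loader, and the use of predictive entropy as the uncertainty score are assumptions; ensembles or MC dropout are common alternatives.]

```python
import heapq
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_most_uncertain(model, unlabeled_loader, k=20_000, device="cuda"):
    """Score every unlabeled frame and keep the k frames the model is
    least certain about, to be sent to human labelers."""
    model.eval()
    top = []  # min-heap of (uncertainty, frame_id)
    for frame_ids, images in unlabeled_loader:          # hypothetical loader
        probs = F.softmax(model(images.to(device)), dim=-1)
        # Predictive entropy as a simple per-frame uncertainty score.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        for fid, u in zip(frame_ids, entropy.tolist()):
            if len(top) < k:
                heapq.heappush(top, (u, fid))
            else:
                heapq.heappushpop(top, (u, fid))
    return [fid for _, fid in sorted(top, reverse=True)]
```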


Lukas: Wow, that's really impressive. Yeah, that's amazing. And that's in your blog post. We should definitely get a link to that.


Nicolas: I can share that with you. We were impressed too, because it was just a research experiment. Now we're working to automate that, to be able to automatically select data, retrain models, and improve performance. We could have a machine fabricating DNNs and getting them right, with humans in the loop just for the annotations.


Lukas: OK, what else?


Nicolas: Then there's labelling, which I'm sure you're very familiar with because of your work at Figure Eight. For autonomous vehicles there are some pretty massive challenges. First, the scale is massive. NVIDIA has a thousand-plus labellers in India - we do it ourselves. We use software to dispatch requests to these labellers and manage the work. As you probably know, it's quite complex because it's all about human workflows and the way people behave (including when they refuse, or make mistakes), so we have to integrate quality assessment into the loop. The UI tools themselves are also pretty tricky. For example, sometimes we need to be able to draw 3D cuboids and make sure we can link LIDAR data with image data. To link those two things we need a mix of human labelling and automated computation to build a new representation of the data that is then usable by the humans. Both at the same time makes it pretty complex and difficult to do. So we built all of that. That's step number three, labelling the data. Then step four is training.


Nicolas: So we've developed a lot of code to enable our developers to train their models. One of the biggest challenges is that once we train, we need to export the model and do inference on an embedded system, so we are compute-constrained - that's one of the constraints. We cannot deploy a thousand servers to crunch everything; we need a single chip to compute everything. So to make that work, we use multi-task training, where we have one single model body that can predict multiple things, like path detection or obstacle detection - traffic signs, intersections, and so on. I think this is similar to what Tesla's doing; I know they've given a talk on that recently. And then there's a lot of optimization, such as pruning the models, int8 quantization, or neural architecture search, that we can use to further reduce the size of the model with equal performance.
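
[Editor's note: a minimal, illustrative sketch of the multi-task idea - one shared backbone feeding several task heads, so a single forward pass on the embedded chip serves every prediction. The layer sizes and task heads are invented placeholders, not NVIDIA's architecture.]

```python
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    def __init__(self, num_obstacle_classes=10, num_sign_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a real CNN trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.obstacle_head = nn.Linear(64, num_obstacle_classes)
        self.sign_head = nn.Linear(64, num_sign_classes)
        self.path_head = nn.Linear(64, 2)          # e.g. lateral offset + heading

    def forward(self, images):
        features = self.backbone(images)           # shared compute
        return {
            "obstacles": self.obstacle_head(features),
            "signs": self.sign_head(features),
            "path": self.path_head(features),
        }

# The training loss would typically be a weighted sum of per-head losses.
```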


Lukas: So your tools do all of this. Is there anything left for a perception team at a customer to do?


Nicolas: Yes. We provide this as part of the core libraries. But of course sometimes they need to do something new, and when they do, they can add their own algorithm, for example, and then we basically productionize and platformize it. Also, on the perception side, they are really focused on looking into new types of predictions. They were predicting bicycles, and now they want to predict more fine-grained things, so they will add classes of objects - things that we don't have to worry about. We just provide the core infrastructure.


Lukas: So you provide core infrastructure to do multi-task learning and quantization, but then the customer will provide the different types of classifications that they would want.


Nicolas: Exactly. There is a small overlap between the two, and there we help each other.


Lukas: Do you handle some of the newer stuff such as trying to figure out intentions or trying to map out the underlying dynamics of a person -  where their arms are, and head is - stuff like that. Is that within your scope?


Nicolas: Yes. Not my team specifically, but the perception team is definitely looking into things like that. That's usually more the research side, a bit more advanced.


Lukas: Does this work with different ML frameworks?


Nicolas: Yes, we do work with the ML frameworks, because they provide so much value. For training specifically we use TensorFlow a lot, and PyTorch a little bit too - I think that's mostly historical. Then for deployment we use TensorRT, which is NVIDIA's deep learning inference library. What's great is that it's optimized for NVIDIA hardware, of course, and it's optimized for inference, so you can do some optimization of the graph, for example. So we deploy using TensorRT, and we get pretty deep performance gains with that.
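
[Editor's note: the interview doesn't spell out the export path, but a common pattern is to export the trained framework model to ONNX and then build a TensorRT engine from it. The sketch below is a hedged example with a toy model; the trtexec invocation in the comment is one standard way to do the TensorRT build.]

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained perception model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
).eval()

dummy_input = torch.randn(1, 3, 544, 960)          # example camera-like input

# Serialize to ONNX so it can be consumed by TensorRT.
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["image"], output_names=["logits"],
    opset_version=11,
)

# The ONNX file can then be optimized into a serialized engine, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```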


Lukas: Cool. Wait, so is that the whole thing?


Nicolas: No, no - that's not all. There is evaluation: the validation of the model. So let's say you train one model that's doing obstacle detection. What you really want to understand is how the modifications to that one model are going to impact the overall system. It's a very tricky system that requires a pretty fine-grained understanding of the impact. So let's say we have this perception system that's a mix of post-processing, Kalman filters, neural networks and so on, all mixed together in pretty complex ways. What we want is much better KPIs to understand what's happening in the system - for example, false positives and false negatives. That's level number one. Then there's the next level, the perception API level - a higher level: how many mistakes do I make per hour, for example, in detecting a car. And then I also want to understand even further than that: how do I drive the car? This involves simulation. We want to be able to run simulation jobs with this new perception system to understand how the system behaves now, within the same simulation.


Nicolas: So we want to do all of this, and we have a system to evaluate all of these things at scale together, on the same infrastructure: same data structures, same dashboards, same output data, and same analytics library. The output of this is all these KPIs plus what we call events - a false positive is an event, for example; we can define an event as anything. Once we have all of this, the AV developers can look at all this information. And this is the next step, what we call debugging, which is a bit like Software 2.0 - debugging the output of a predictor. We look at the output of the predictor, at the KPIs, at the events, and then zoom in on these events in a very fine-grained manner. We can compare the prediction against the ground truth, for example, to see if there's something missing when there is a lens flare. We can go very deep, then come back up high, and then make a diagnosis about what's going wrong with the system.


Nicolas: This diagnosis is how we improve the system. It tells us, for example, that I need more data on Japan at night. Then we can go back to the curation step, which is building better datasets. This is the feedback loop that goes from debugging back to the curation step, and that helps us improve our perception system.
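
[Editor's note: an illustrative sketch - not the MagLev implementation - of turning per-frame detection results into the two artifacts described above: aggregate KPIs (mistakes per hour) and a list of tagged "events" that developers can zoom into during debugging. All field names are invented.]

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FrameResult:
    timestamp: float          # seconds since start of drive
    false_positives: int
    false_negatives: int
    tags: List[str]           # e.g. ["night", "lens_flare"]

@dataclass
class Event:
    timestamp: float
    kind: str                 # "false_positive" / "false_negative"
    tags: List[str]

def evaluate_drive(frames: List[FrameResult]):
    """Aggregate frame-level results into KPIs and a flat list of events."""
    hours = max((frames[-1].timestamp - frames[0].timestamp) / 3600.0, 1e-9)
    events, fp, fn = [], 0, 0
    for f in frames:
        fp += f.false_positives
        fn += f.false_negatives
        if f.false_positives:
            events.append(Event(f.timestamp, "false_positive", f.tags))
        if f.false_negatives:
            events.append(Event(f.timestamp, "false_negative", f.tags))
    kpis = {"false_positives_per_hour": fp / hours,
            "false_negatives_per_hour": fn / hours}
    return kpis, events
```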


Lukas: So your user could automatically request more images of a bicyclist in the snow or something like that, and then let the curation step go out and look for more of those, or weight these more?


Nicolas: Yes, exactly. It's based on what the curation can do, which could be  geographical conditions or maybe temperature if we have access to that type of sensor.


Lukas: That's amazing! How do you approach making such a sophisticated system on behalf of customers? Do you build your own perception systems just to try your own software?


Nicolas: At the bottom of this end-to-end workflow, we have core components: our data platform and our workflow management system. Those two things power everything - being able to write ETLs, register datasets, run queries over the data, and make sure that all of these things are traceable end to end, which is a major requirement in the autonomous vehicles industry, so that if there's a problem in the future, we can go back in time and understand everything that happened. So all those things at the bottom power the top, and they're pretty beefy and made for scaling.


Lukas: The first thing is data storage? Did I have that right?


Nicolas: No, it's data platform.


Lukas: Then a workflow management system.


Nicolas: Then workflow management system, yes.


Lukas: Gotcha. So the data platform is just keeping track of where all the data is or ...


Nicolas: No, it's a bit more than that. It's all the infrastructure required to store structured and unstructured data. Structured data could be anything like simple floating points - continuous values - and raw data is all the sensor recordings, in general. We have all of this and we can organize it. The second piece is that we want to be able to query all of this at scale. We use Hive, Presto, and Spark SQL - Spark in general - to do all of this. This is what the data platform provides, all those pieces. Then the workflow management is more about the ability to schedule those complex compute and data-access tasks and stitch them together. We know we can organize data in a certain way, and we know we can access the cluster, but then we want to make sure, like I explained earlier, that we can run those graphs of tasks. Sometimes we require a lot of scale, especially when we do evaluation at scale on thousands of hours of data. We need a workflow system that enables us to do all of this.
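
[Editor's note: a hedged example of the kind of query such a data platform enables, using Spark SQL. The table name, columns, and output path are invented for illustration; they are not MagLev's schema.]

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("curation-query").getOrCreate()

# Pull frames matching a curation request, e.g. "pedestrians at night in Japan".
night_pedestrian_frames = spark.sql("""
    SELECT session_id, frame_id, camera, timestamp
    FROM   indexed_frames
    WHERE  time_of_day = 'night'
      AND  array_contains(detected_classes, 'pedestrian')
      AND  country = 'JP'
""")

night_pedestrian_frames.write.parquet("s3://datasets/japan-night-pedestrians/")
```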


Lukas: Interesting. One thing I didn't hear you say that I think a lot of people talk about is synthetic data. Is that interesting to you?


Nicolas: Yes, it is. We have a simulation team - I think I mentioned it for testing purposes.


Lukas: I wasn't sure if that was for totally synthetic simulations or something else?


Nicolas: Well, we can do both. For open loop - no control and planning in the loop, no actual driving - we can replay existing data. That's really good because then we can measure on real data. But for closed loop, which is actually driving, we need truly simulated data. And this is where NVIDIA shines, because we can generate simulated worlds like in video games. Even more than that, we have the ability to generate all kinds of sensor data for the car: not just LIDAR data or radar data, but also CAN, IMU - all those things that are car-specific, we can generate all of this. And we have a special box that we call Constellation, which has the simulation generator on one side and what we call the ECU - the embedded system - on the other side, which can process all those sensor inputs in the same box. So it does the exact simulation run and the exact processing of the simulated data. We can use it for testing, and we can also use it for collecting data and training on data that just doesn't exist in the real world. So it's very helpful for bootstrapping perception efforts and bootstrapping new neural networks.


Lukas: It sounds like you have almost a complete end to end solution. Can I come to you with a car and some sensors and get a system that could make an autonomous vehicle for me?


Nicolas: Yeah, exactly. Yes, you can.


Nicolas: Except that it's difficult to change sensors. If you use a different sensor, we are going to have to re-collect data to revalidate and retrain the models, or fine-tune them and so on. But assuming that your sensors are similar to what we have, or that you're willing to pay some money, we can redo that work entirely.


Lukas: So you're the perfect person to ask this question: what do you think is left to do? I mean, I live in San Francisco, so I do see autonomous vehicles driving around a fair bit. What is left to work on, to make it a real thing that I would use every day?


Nicolas: So you don't use it every day, is what you're saying?


Lukas: Well, I rarely get into an autonomous vehicle, but I feel like I'm in the industry...


Nicolas: Do you have a Tesla? Model 3 or...?


Lukas: I've played with them and I think they're very, very impressive. It's a good point you're making... What do you think are the next steps with systems like yours? What are you thinking of focusing on?


Nicolas: First, I think this is going to be pervasive, and in the future everyone's going to have autonomous vehicle functionality. That's number one. But the vision goes even further than that: cars are going to become software-defined, or are already on the way to becoming software-defined. People are going to see a centralized computer with a really nice UI/UX, and they are going to be able to buy new software to upgrade their cars. This is already what's happening with Tesla, and I think that's one of the reasons why their capitalization, their valuation, is so high. But the other car makers are also looking into that model and are interested in it, and I think this is the future of the industry. So for us, it's also our future. We need to be ready for this world at NVIDIA. That means having an open, programmable platform. We want to enable all those car makers or Tier 1s to build those systems together on the same chip, on the centralized computer. We don't want to exclude them from our chip; we want to enable people to write software on it - infotainment software, self-driving software, and so on. And since self-driving is so difficult, we can provide it for them as a given application, for big car makers or even smaller ones, and just develop it as an application for them. But then where are we going with that? I think it's a matter of performance and improving the control, the planning, and the entire perception system. I think we're still at the beginning of it. We're going to be able to do better and better over time by building a lot of automation first, and by potentially adding machine learning in areas that don't have machine learning yet, such as predicting the planning path, for example, and things like that.


Lukas: What I'm hearing is you think there's iterative improvement in a bunch of different things, and then there's applying machine learning to planning - these are just the next steps in making these systems work better. What is the stuff that people are really wrestling with right now to make these things really work?


Nicolas: Oh, I see. I think the hardest thing is urban areas right now - being able to drive in urban areas such as New York City. It's really, really hard. That's the next frontier. And it requires all sorts of new signals coming from the car: intersections, lights, lack of lanes, unknown vehicles such as garbage collection trucks, etc. Things like that can be very tricky. All of this is still newer. All the self-driving providers started with easier areas such as highways, except for some of the Level 5 players like Lyft, who are trying to leapfrog that. It's a big challenge. Yeah, that's the next frontier.


Lukas: Do you think your approach at NVIDIA is significantly different from a Tesla or Lyft?


Nicolas: All these companies are targeting different things, so there are some differences. Lyft is targeting Level 5; they want to have fully autonomous vehicles. Tesla is building cars, so they don't need to build a platform that's usable by other people. On our side, we build a platform and we make money with the hardware. We also make money with our software, but it has to be usable by everyone else, so we have to build it in a way that is general. This is one of the constraints we have. As a result, the platform we are building - the core infrastructure - is designed in a way that can be ported to other car makers, or other people developing self-driving.


Lukas: It's interesting because all the pieces of your platform that you mentioned, are super relevant to health care applications, and to almost any kind of deep learning application. Would you ever expand your platform to other applications?


Nicolas: Yeah, I think that's possible. Some of these pieces are not required, per se, and sometimes the scale we aim for is not required either. Usually what we try to do is build a superset of those tools and push the frontier a little bit further. Some things are a little bit tied to autonomous vehicles, but the entire end-to-end workflow seems very applicable. The toolsets we built were sometimes customized to the data we have. So yes, we could extend them; it would just require some work.


Lukas: This is kind of a specific question, but I've been thinking about this lately. How important do you think is the hyperparameter search piece, like the neural architecture search you're talking about? Is that really essential or is that a nice to have?


Nicolas: Neural architecture search is still something we're exploring. It can be important because we can really reduce the compute footprint. For example, we can constrain the search space of the neural architecture search to architectures that are going to perform really well on the target hardware in terms of latency. NVIDIA has some hardware accelerators that are quite specific, so we can make sure we target those and find the right architecture. Hyperparameter search, on the other hand, is something we have available, but the advantage of using it, relative to the compute it requires, is often not that interesting for developers. So we do it sometimes, but it's not a big competitive advantage for us.


Lukas: Your platform sounds amazing and it solves a whole slew of problems. Is there like a piece of it that you're especially proud of, that stands out to you as best in class?


Nicolas: I really like the active learning part and everything that goes around it. One other thing we are doing is what we call targeted learning, which is the ability to take a perception bug - for example, I'm not able to detect trucks in a certain position - use that, and sample a dataset that is then used for training to fix the perception bug. Doing that is similar to active learning, but it's conditional active learning. I'm really proud of these two things because I really love the automation of it. We could just go on vacation and say, okay, now let the system work. We'd have customers sending their bugs, and automatically we would just fix them.
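
[Editor's note: a small illustrative sketch of the "conditional active learning" idea - restrict the candidate pool to frames matching the reported bug's conditions, then rank by model uncertainty. The metadata fields and the uncertainty_of scoring helper are hypothetical; the score could come from something like the entropy-based selector sketched earlier.]

```python
def targeted_selection(bug_conditions, frame_index, uncertainty_of, k=5000):
    """bug_conditions: e.g. {"class": "truck", "orientation": "sideways"}.
    frame_index: iterable of frames carrying a .metadata dict.
    uncertainty_of: callable returning a per-frame uncertainty score."""
    candidates = [
        frame for frame in frame_index
        if all(frame.metadata.get(key) == value
               for key, value in bug_conditions.items())
    ]
    candidates.sort(key=uncertainty_of, reverse=True)   # most uncertain first
    return candidates[:k]
```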


Lukas: Cool. Well, this is so fascinating. Even if we weren't recording this for something, I think I would've really enjoyed this conversation. It's great to meet you. Thanks so much for doing this with us.


Nicolas: Thanks. I love talking about it.


Nicolas: It's great to meet you, too. Cool. Well, thanks a lot.

