Evolution of Reinforcement Learning and the robot hand with Peter Welinder
Peter Welinder, Robotics lead at OpenAI, talks about his love of robotics, the early days of reinforcement learning, and the evolution of the robot hand.

Peter Welinder is a research scientist and roboticist at OpenAI. Before that, he was an engineer at Dropbox, where he ran the machine learning team, and before that, he co-founded Anchovi Labs, a startup using computer vision to organize photos that was acquired by Dropbox in 2012.

Lukas: I'm most excited to talk to you about the OpenAI stuff, but I think we should start with your career, which I think is pretty interesting. You've launched a startup, you've run machine learning at Dropbox, and then you went to OpenAI as a researcher. Could you tell us a little bit about how you first got into deep learning? I think it was at the startup, right, where you first used this?

Peter: Yeah. So, I did machine learning in grad school. I didn't really know what I was doing when I went to grad school. I knew I wanted to learn about how intelligence worked, like AI, and the place where I started was neuroscience. So I spent a fair amount of time just sitting in a basement and building these little microdrives that you implant into rats' brains.

Lukas: Oh really? Wow I didn't know that. Wow!

Peter: Yeah. So I did that for probably half a year. And I realized how lonely that work was. You're at it for about twelve hours a day, because you're a grad student and you have to work really hard, building this thing, and the whole thing takes like three months to build. And then at some point you need to do a surgery and implant this into the rat, and if something goes wrong at that point, you're screwed. You have to go back to square one and rebuild. So if you go down this path, it takes a really, really long time to get through grad school. I realized at that point that neuroscience wasn't for me. So then I ended up wanting to focus more on robotics. But robotics has a similar problem, where everything takes a really long time because you first have to build your robot and get the robot to work, and that's probably three quarters of your PhD. And then you have to do the experiments on it. So instead I decided, let's pick one aspect of this, something that's more broadly useful. If it applies to robotics, that's awesome, but it's great if you can use it in other applications too. So I ended up doing a lot of computer vision, and that's how I got started on my startup, which was doing image organization: things like finding faces in photos and finding what photos are about. That's how I got more into machine learning and computer vision, and eventually I ended up at Dropbox doing more of that stuff there.

Lukas: What were the problems at Dropbox that you worked on?

Peter: So initially, the premise we had when we started, which was really interesting, this was back in 2012, was that half of the files in Dropbox were images. There were many, many billions of photos, and it was sort of like the dark matter of Dropbox. It took up all the space, but nobody knew what was in it, and it was not useful at all for our users. So the mission for me and my co-founder, after we joined, was to see if we could start making sense of all that data and actually make it useful for people. There were a lot of pretty mundane things that you had to clear up first, like just being able to sort photos by date, extracting the metadata from the photos, stuff like that. So a fair amount of time was spent on how to index billions of photos and how to do simple things on them, like searching by GPS or by timestamp. But the idea was that eventually we could actually start extracting useful information from images. One thing that we realized through that work, and one of the main features that I worked on while there, at least on Photos, was that a lot of people use their Dropbox for a lot more than, say, family photos. Over time, the use of Dropbox shifted towards more business users, and it turns out that if you are a business user, you take a lot of photos of documents. Those photos are really, really boring. They're just photos of documents. People are kind of too lazy to put stuff in a scanner, so they just take a photo with their phone and hope that they will find this photo again somewhere. But obviously, two days after you've taken that photo, it is lost in your photo library among baby photos, photos of food, photos of all the random crap you take photos of. You're just taking photos of everything, right?
So we built this thing which, first of all, was just finding those photos that had documents in them, bringing them up and showing them to the users, and then doing more useful things on top of that, like actually extracting the text of what's in the documents or...

Lukas: Oh so you would see all the photos? That must be such a magical user experience.

Peter: Yeah. Anyway, that was a really fun thing, actually, because I honestly use these features probably a few times a week. So, I don't know, probably just somebody whose eyes are still very dark. But I'm very, very proud of that. You can actually scan your documents using Dropbox as well, just by taking a photo using Dropbox's scanner. So I was a part of that. Even building that OCR experience was really fun, because my past in computer vision was pre-deep learning, you know. This was the old-school computer vision, and there were HOG features or pyramid features, bag-of-words kind of stuff, all these weird things. People don't know about them anymore because they don't matter anymore.

Lukas: Yeah

Peter: When I was at Dropbox, that's really when this whole deep learning revolution happened. And it was kind of mind-blowing to work on computer vision problems before and after it. Before, it was like that xkcd comic: if you want to see whether you're in a national park, you just look at the GPS, but if you want to take a photo of a bird and recognize the species, you need five years and a research team. I guess that's gone totally. Any time we would brainstorm about features, it would be like, "I don't know, maybe in a decade we can make that work." And then once we started using deep learning for stuff, it was different. I remember with OCR, we took the best OCR systems that were out there and created a benchmark using those, like Google's OCR and one of the other big OCR and text recognition companies. And we sat down and just started building our own OCR engine from scratch, where we extract the text. We did the text recognition, the word recognition, and so on. In three months we had beaten all of the public dataset benchmarks, you know. And that was just mind-blowing to me. That's the stuff that would have taken so much longer before.

Lukas: Wow! That's amazing. What year was that?

Peter: That must have been 2014 or something like that.

Lukas: Wow. Was it Caffe?

Peter: Yeah. It was one of those times. We started it when Caffe was still a thing, and by the end of it TensorFlow was a thing. There was just so much changing from month to month in deep learning. I think Theano was probably what we were prototyping with at first. The libraries were just changing every month, you know. But yeah, I think in what we shipped in production initially, some part of it was in Caffe. Probably not anymore.

Lukas: But that must have been challenging to just run that on every document, I mean, that seems like a huge production challenge.

Peter: The truth of deploying machine learning systems in general is that, you know, we did that three-month stint where we just created the algorithm and got it all working. And then it was like a year to ship the feature, because of all of that other stuff: actually putting it together in production, making sure that the errors are not disasters, but then also scaling the thing up. You take the cost of running it for one photo, and you multiply it by a few billion, and those are very high numbers, and then you give that to some finance person. So there was a lot of optimization work and stuff like that, but we got it down to a place where people were happy with it. I don't know what the status is now, but I think it's probably one of the things where you still have to be a paid Dropbox user to run it. I don't think we have it for free users.
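
The "cost per photo times a few billion" arithmetic Peter describes can be sketched in a few lines of Python. Every number below is a made-up placeholder chosen only to show the shape of the calculation; none of them comes from Dropbox, and the function names are hypothetical.

```python
def total_inference_cost(cost_per_item_usd, num_items):
    """Cost of running a model once over every item in a collection."""
    return cost_per_item_usd * num_items


def cost_after_speedup(total_cost_usd, speedup):
    """Cost after making inference `speedup` times cheaper per item."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return total_cost_usd / speedup


# Hypothetical inputs: a tenth of a cent per photo, three billion photos.
COST_PER_PHOTO_USD = 0.001
NUM_PHOTOS = 3_000_000_000

baseline = total_inference_cost(COST_PER_PHOTO_USD, NUM_PHOTOS)
# Assumed 4x saving from a smaller network and tuned libraries.
optimized = cost_after_speedup(baseline, 4.0)

print(f"baseline:  ${baseline:,.0f}")
print(f"optimized: ${optimized:,.0f}")
```

Even a tiny per-photo cost turns into a large bill at this scale, which is why a per-item speedup matters so much when the multiplier is in the billions.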

Lukas: Were there some tricks to getting the size down and the cost down? I'm trying to remember what people did back then. Did you do quantization and stuff yet?

Peter: I think it was in the very early stages of that, so I don't think we did stuff like that. At that point, once we had gotten everything working, it was mostly a very manual process: can we get away with a smaller network? Can we have five layers instead of six? It was all about finding where the cost was. But it was also pretty early in the state of these neural network libraries, so even optimizing those libraries, doing little optimizations to make them run on the particular architectures we had on our machines, stuff like that, those things mattered. All of that is done automatically for you now, but that was the good stuff. I think we had at least one or two people who worked full time on this for a few months just to improve the speed and reduce the compute footprint of these things.

Lukas: And then you left to go to OpenAI, and you work on the robotics team, right?

Peter: Yeah, exactly. I had kind of always wanted to work on robotics since grad school, but again, I abandoned it because I thought there was a little bit too much work in actually working on the robots. If you build a robot, you have a whole system and all of the things were kind of broken, communication was kinda the promising part. The really cool thing that I had started noticing was that this was kind of when results came out where deep learning was doing really well on simple computer games and so on. And deep reinforcement learning in particular was the thing that people actually started to get to work. I started feeling at that point that deep learning solved a lot of the perception problems in robotics. Before deep learning, it felt like you just didn't know if anything was ever going to work, and after deep learning it was like, "Yeah, it'll probably work if we have enough data." It's a very different feeling, feeling that you kind of know how to get there if you just have enough data. Obviously, getting the data is still hard, but it's more of a solvable engineering and product problem. But you still have this other aspect of robotics, which is the control part, and control is also really, really hard. What was really promising in those early deep reinforcement learning results was that suddenly there was a learning-based approach to control that seemed to scale to more interesting action spaces. What I mean by that is that you can manipulate all the joints on a robot, for example. I knew some of the people who were working on that at OpenAI, and they were just starting up a robotics team, so it seemed like a really good time to get into deep reinforcement learning and see if we could actually get robots to do much more interesting things using deep learning.

Lukas: So how has it evolved? Do you still feel deep reinforcement learning is as promising as it felt in 2017 or whatever year you joined the robotics team?

Peter: Yeah, I think at that point I felt like there was a chance that this could work. Now I feel like this should work. There might be other ways to get there, but this should work. You know, in the limit, this will work out. How long it will take to get there is really hard to say. I feel like it's always one or two years away. People have thought we'd be one or two years away for more than one or two years. But I feel like there's something fundamental about deep learning and deep reinforcement learning where it really feels like this should be able to take us relatively far on the problem. By the problem I mean getting to more general-purpose robots: robots that can do more of the things that humans do, that actually move around in a home, not locked into a factory, but actually dealing with all the complexities of the real world. I feel very strongly that, just because of the complexity of the world, the way you need to tackle this is really through learning, and deep reinforcement learning is such a simple paradigm, whereas most other things seem like they would be much more complex. I guess my bias is that complex things have never really worked; it's the simple things that really, really work. That's kind of what I saw at Dropbox: the simplest approach was always the way. If you tried to be a little bit clever with algorithms and stuff, usually you would end up being disappointed. The most important thing was really setting up the data. And I think there's something very fundamental about deep reinforcement learning that makes me think we can push it far, and that we're really just getting started.

Lukas: When you say work, or push it really far, what are some of the things that you've seen so far that make you think that it works? And then what are some of the things that would make you feel like, wow, this is really successful?

Peter: That's a good question. First of all, I don't know if all the listeners know what deep reinforcement learning is, so I'd like to describe that a little bit more. Reinforcement learning is really about learning from trial and error. A lot of machine learning is based on supervised learning, where you show examples and each example has a label. Reinforcement learning, again, is trial and error: you do a series of actions to get some kind of score at the end. We call this the reward. You do something and you get rewarded or punished at the end. But really you should talk about reward; you know, we're all optimists. That is the core algorithm. It's very, very simple. And the reason I feel it is promising is this: the biggest criticism that reinforcement learning gets is that you just need lots and lots of experience. You need to do so many of these trials and errors in order to learn anything. So people usually don't like reinforcement learning on robots, because you cannot do that on a real robot. If you do anything on a real robot and you don't do it in a very controlled way, you're going to break the robot, and you're going to break the things around the robot. It's kind of dangerous to do.
And the reason I think some of that criticism is misplaced is that we can do a lot of that learning in simulation. What we have really focused on at OpenAI over the past two years is seeing if we can solve robotics problems in simulation, then take the agents we have trained in the simulator, put them onto real-world robots, and see if they can do on the real robots the same thing they were trained to do in the simulator. The hypothesis behind this, the somewhat controversial hypothesis, is that if you have any problem in a simulator, you can solve it using reinforcement learning if you just have enough compute. You can have really complex problems like Go, or like Dota, which is a computer game. These things require a lot of strategy and so on, and you can still solve them with enough compute. So we trained an agent in the simulator to operate a robotic hand, and we got it to solve the Rubik's Cube in the simulator. Then, by setting up the environment in the right way in the simulator and throwing lots and lots of compute at it, we were able to train a robust enough algorithm to put it on a real-world robot and have it solve a real-world Rubik's Cube. I feel like this was a hard enough problem, a manipulation problem that's tricky even for humans. We had one hand that was fixed to a wall, and it didn't move very much, and it can still do this thing. So it's a hard manipulation problem, and still we can solve it using reinforcement learning, and solve it on a real-world robot. In some way that gave me enough confidence that I now feel like there must be more problems we can tackle using this approach, since a lot of things would be easier than solving a Rubik's Cube.
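
To make the trial-and-error loop concrete, here is a minimal sketch in Python. It uses tabular Q-learning on a tiny made-up "chain" environment, which stands in for a simulator. This is not the method OpenAI used for the robot hand; the environment, the function names, and the hyperparameters are all assumptions chosen for illustration.

```python
import random


class ChainEnv:
    """Hypothetical toy simulator: walk right along a short chain to the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only arrives at the end
        return self.state, reward, done


def train(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=100, seed=0):
    """Learn action values purely from repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try something random
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = env.step(action)
            # Nudge the estimate towards reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train(ChainEnv())
# Greedy action in each non-terminal state after training.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

Because every episode runs inside the simulated environment, the agent can afford hundreds of clumsy attempts. That is the property Peter points to when he argues that the "too much experience" criticism loses force once the learning happens in simulation.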


Lukas: When the robotics team got started, what was its, like, charter? Did you know that you were going to do simulation? Did you know that you were going to do reinforcement learning?

Peter: I think the short answer is that we didn't know at all what we were doing. We had this goal, that we want to build general-purpose robots, but I don't think we had a super clear idea of how to get there. One core belief we had, though, was that deep learning would be a big part of it. Reinforcement learning would also probably be a big part of it. But exactly what flavors of reinforcement learning and so on, that we didn't really know yet. There was a philosophy around this: can we take some of these approaches that are pretty simple and, by really pushing them super, super far, solve really hard problems with them? I think that was our overall strategy. We kind of hoped that just taking really simple reinforcement learning algorithms and putting them on a really, really hard problem would be successful. But I think we were somewhat scared for the first two years that maybe this wouldn't work out. It really felt like that a lot of times. It seemed like every time this robotic hand broke and we had to send it off for repairs, we'd have like a month sitting there thinking about our mistakes, and we'd be like, "Will this ever work? I'm not sure. It is probably completely the wrong path." But in the end, I think our belief in that is now stronger than ever.

Lukas: Why did you choose to work on a hand? I feel like if I was trying to build a general-purpose robot, I might even leave out the hands. It seems like the hand has got to be the most complicated thing, and I feel like in movies, robots don't even have hands. Maybe they don't even need them, I don't know.

Peter: Yeah. You know, it's interesting how that started, because for the first problems we tackled, we didn't use a robotic hand. We had one of these Fetch robots, which is basically a mobile robot with a robotic arm and a two-finger gripper. It's a super simple robot, and we would even just screw it into the floor so it couldn't move. So it was just a robot arm, basically. A very expensive robot arm. That's how we started. And we started with the simplest of problems in robotics, which is block stacking. People have been stacking blocks for 60 years. I kid you not. There are movies from Stanford in the 1950s or 60s where they have robots stacking blocks. So, you know, we had to start simple. That was one of the first things we were doing. And we had this realization that even the simplest thing, just picking up the blocks and manipulating them, was pretty hard. So then we were like, "OK, we need to solve this problem of doing manipulation." And then we were like, "We need to be pretty ambitious about this. Let's do the hardest thing we can think of. Let's have a robotic hand." There was another thing that we did at the same time: we went to a robotics conference, and we asked a bunch of roboticists from across the world, what is the hardest thing you can imagine doing in robotics right now? If we take a really hard problem and we can show that deep reinforcement learning works on it, would you be impressed? And all of them would answer, "Well, the problem I'm working on is really, really hard." But if you pushed them enough, two things became really clear. One was high degrees of freedom, like having lots of joints in your robot. That's hard because a lot of the control-theoretic approaches just don't scale very well with the number of joints on the robot.
So, you know, if you have a robot arm, a hand is like five robotic arms, which is hard. It's really complex. It doesn't really get more complex than that. The other thing people said was that doing things with contact is really hard, where you're actually manipulating objects. So that's why we felt, well, if we really want to convince people that deep reinforcement learning can solve really complex robotics problems, let's just pick a really hard problem. If we solve that, and we feel like we've solved the manipulation problem for that particular robot, we won't be afraid of manipulation problems anymore in some sense. As for the Rubik's Cube: once you have the robotic hand, it's like, what do you do with it? If it's just stuck to the wall, you need to put something in the hand, and a ball or something is not very exciting. So what is the most complex object we can think of? A Rubik's Cube. It's pretty complex. That's how we got started on the Rubik's Cube in the robotic hand. In hindsight, I don't know how smart that was, but it gave us a really tricky problem to work on.

Lukas: Interesting. You would have done something else in hindsight?

Peter: I mean, when we started out with this project, it was pretty crazy. We started by solving it in a simulator, and we thought, okay, this is going to take maybe half a year to solve in the simulator. It's going to be tricky to come up with the right reinforcement learning algorithms. We'd probably have to iterate on the algorithms and stuff like that. Then we started on it, and within two or three weeks we had solved it in the simulator. So we went, "Holy shit, that was simple. We can probably solve it on the physical robot in another month or so." And those were the famous last words. It took another two years from that point. I definitely feel like there were certain things we didn't know about these robotic hands. Nobody had run reinforcement learning algorithms on these robotic hands before, and when you train a reinforcement learning algorithm from scratch, and especially if you train it in the simulator and deploy it on a real hand, it has no respect for the delicacy of the hardware. It would just push all the motors at maximum speed in different directions. We would bring in the manufacturers of the Shadow Hand and show them what we were doing, and they would just watch in horror, like, "We only run it for five minutes, and you have it running all day?" Within the hour, you'd have one of the fingers, the thumb or the little finger, loose or hanging by a thread. We would completely destroy the robotic hands, so the iteration time on this hand was really, really long.
If we had picked a simpler problem, we would have completed it faster, just because of the physical aspect of waiting for hands to be repaired and figuring out the dynamics of this really complex hand. It's definitely easier to tackle a problem if you start from a simpler problem and make it more advanced than if you pick a really advanced problem and just go at it, because then you don't know where the issues are. I think it took us a long time to narrow down and shrink the complexity, to then be able to expand the complexity again as we were solving this task. So I think this is the main thing. If you could buy super robust, industrial-grade robot hands, that might have been different, but basically there were like two companies in the world that made these robot hands, because nobody knows how to use them.

Lukas: I'm surprised it's even two because I've never seen a robot hand except your robot hand.

Peter: [Laughs] Right. Oh, my God. You know, they sell it to these research institutes and effectively, they tell us that they go to these researchers and they sell it and then two years later, they visit them and the hand is in this pristine condition. Nobody dares to touch these robots because they're so complicated.

Lukas: So tell me about the team; how big is the team working on this? And how do you divide up roles? How do you break apart such a difficult goal that might actually be impossible into smaller pieces? And what does a performance review look like? Do you actually do that?

Peter: Those are good questions. I think we've learned a lot about this, because it's very different from a lot of other situations. When I was at Dropbox, it was all about shipping a product, and you do everything to ship the product. Here at OpenAI, we have a pretty ambitious goal of building more general AI algorithms and eventually general intelligence. So we want to set really ambitious goals for ourselves, where we can really feel like we are pushing the envelope on what we can do with AI, and it can be tricky to make that into something concrete, especially when you have lots of people working on it. The other thing that we believe pretty strongly, especially in the robotics team, is this idea of having more of a team effort to achieve big things. There are just too many things you need to solve in robotics to have only one or two people working on it. You need a bigger team. Right now, we have found that the sweet spot is somewhere between 10 and 20 people in terms of the size of the team. If we get bigger than that, overhead starts slowing you down quite a bit, but if you're smaller than, like, 10 people, it's relatively hard to make progress just because there are a few too many things to do. So what we try to do is to have pretty concrete goals. For example, we knew that we were working towards solving a Rubik's Cube for like two years. This was a very concrete goal: once we can see this robot hand solving a real Rubik's Cube, then we've solved this problem. Having a very concrete goal like that makes it easier to focus and not digress too much, because if you're doing research, it's like walking through a forest when you want to get to a mountain, and there are all these nice fruits and berries around. You just want to go, "Oh, this looks really good. I want to taste this for a while and see what I can do with it."
It's very tempting at every point in time to just stop and explore for a really long time. But if you want to solve a really big problem, you have to be much more focused than that. And having this thing in sight, this clear goal, helps a lot. So that's been one core component of how we do things: having clear goals and then having the whole team work towards them, more in the philosophy of a startup, but maybe with fewer short-term priorities. We kind of have to try out really ambitious things, things that will fail with very high likelihood. We want to leapfrog a lot of other approaches with the approach we take.

Lukas: I guess my question is like I'm imagining and that makes total sense, but what is everyone doing?

Peter: You know, yeah. It's a good question. So what is everybody doing? If you ask anybody at any point in time what they are doing, they're going to tell you, "Well, I have this bug. I'm trying to figure out this bug." It's like any engineer's life. That is what you're doing almost all of the time, working on a bug. But there are different levels of bugs. Usually the work is split between engineering towards building up tooling to understand more of where we're going in terms of our experiments, running our experiments and so on; engineering in terms of running our training, like training our models and stuff like that; and a lot of research work. What I mean by research is more coming up with new algorithms and trying them out, coming up with hypotheses and trying them out, and figuring out the best way to set up experiments. Sometimes that involves doing something that we have come up with ourselves based on where we are in our research. Sometimes a new paper has come out that might be promising, so we re-implement it and see how it compares to our baseline. There are just a lot of different things going on. That's really interesting, because it's very different from, say, working at a company where you're working on a feature, and you're working on that feature for at least a quarter, often many quarters of a year. Here, things switch very, very quickly. You're working on one thing for a week, then you're working on another thing for maybe three weeks, and then another thing for a week. Each project is very different. One day it might be, let's make these things faster, let's dig really deep into CUDA optimization for training faster. Another day it's, how do I control this new robot that we got? And another day it's, how do I render things really quickly in OpenGL or Unity or something like that.
It's just highly varied work.

Lukas: That sounds so fun, I wanna work with you.

Peter: I can tell you it's never boring. It's definitely one of those things where, whenever you get bored, there's another project just around the corner that you could jump on. So you learn a lot. It's really nice.

Lukas: I actually don't know if you have thoughts on this, but I was wondering about one really practical thing. When I talked to you maybe a year or two ago, you were completely like, "Hey, TensorFlow is the best language, it's clear that's the thing to use." And then you guys switched to PyTorch. Why? What happened? Switching a framework mid-project sounds unbelievably daunting. What prompted it? How did that come about?

Peter: I think most of these things happen pretty bottom-up at OpenAI. Everybody has their main project and then one or more side projects; it's just a natural thing, right? And in those side projects you always want to try some new tool so you can learn a little bit more. So people started playing around with PyTorch for their side projects, and they pretty quickly realized that their code was much shorter, much more pleasant to read, and much faster to iterate on. You can just get all the data out in the middle of your network without running it through your graph and extracting it from the graph, which is an intense process. It was just a much more pleasant tool for people to use on their own projects. So at first it lives very separately, on the side. Then there's the question of when you start your next project. Some teams at OpenAI are smaller and have projects that run for a month or two, and then they try a different project, and when they switch projects, that's a pretty easy point at which to switch to a new tool. So that's what started happening. Then some teams started building tools on top of PyTorch, and other teams were like, "Oh, that tool looks really nice. Oh, that's PyTorch." And then suddenly this starts spreading to other teams, and eventually it's too much to ignore. I think we just realized that people had adopted this tool and we should go with the flow and everybody should adopt it. The other thing was that we had started building more and more really good tooling, and we wanted the whole company to use that tooling because it made everybody move faster. Robotics was probably one of the biggest teams that had to face this switch. We were pretty lucky in that we had just released our results with the Rubik's Cube, so we had some time where we could take a step back, do a little more refactoring of our tooling, and change the framework. I wouldn't have wanted to do this in the middle of a project; as you said, that seems like a recipe for disaster. You know, there is this thing that whenever somebody reimplements a reinforcement learning algorithm, even if it's the same person who implemented it last time, it's still going to take them a month to get it right.
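Editor's note: Peter's point about getting data out of the middle of a network can be illustrated with a toy contrast between deferred graph execution and eager execution. This is a minimal sketch in plain Python, not TensorFlow or PyTorch code; `Node`, `run_graph`, and `eager_forward` are hypothetical names made up for the illustration.

```python
# Toy illustration only: nothing here is a TensorFlow or PyTorch API.


class Node:
    """A node in a deferred graph: it records an operation but holds no value."""

    def __init__(self, op=None, inputs=()):
        self.op = op
        self.inputs = inputs


def run_graph(fetches, feed):
    """Evaluate the requested nodes, given concrete values for input nodes."""
    cache = dict(feed)

    def evaluate(node):
        if node not in cache:
            args = [evaluate(parent) for parent in node.inputs]
            cache[node] = node.op(*args)
        return cache[node]

    return [evaluate(node) for node in fetches]


# Graph style: build the computation first, then ask a runner for values.
x = Node()
hidden = Node(lambda v: max(0.0, 2.0 * v - 1.0), (x,))
output = Node(lambda h: h * h, (hidden,))
# To see the hidden activation, it has to be requested as an extra fetch.
graph_hidden, graph_output = run_graph([hidden, output], {x: 3.0})


# Eager style: every intermediate result is an ordinary Python value.
def eager_forward(v):
    hidden_value = max(0.0, 2.0 * v - 1.0)
    print("hidden activation:", hidden_value)  # inspect it right where it is computed
    return hidden_value, hidden_value * hidden_value


eager_hidden, eager_output = eager_forward(3.0)
```

Both styles compute the same numbers. The difference is that in the eager version the intermediate value can be printed or inspected at the line where it is produced, which is the iteration-speed benefit Peter describes.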

Lukas: Totally. So what other internal tools are you really proud of? What stuff have you built, and do you have any plans to open source any of it, or is it just for people at OpenAI?

Peter: There are a few things we have released that have been really useful for ourselves, and a lot of people have adopted them, so that's some recognition that they've been useful for other people too. I think the biggest one is OpenAI Gym, which has been there more or less since the beginning of OpenAI. OpenAI was founded around the time when reinforcement learning started to work again, with deep reinforcement learning. People would just reimplement all these very basic environments on which you benchmark your algorithms, so OpenAI built this library called OpenAI Gym, which has all the environments people were benchmarking on, so that instead of reimplementing them you could just use that. It has a really simple abstraction layer and a very simple interface, so people would build more environments on top of that API. That became really popular. I think that's a really good one. I think there are two others. Whenever we come up with new algorithms that we find we use a lot ourselves, we release them. For example, we have this Baselines library, which has a lot of reimplementations of reinforcement learning algorithms. Getting those implementations right is really, really hard, so releasing them is good because we've seen that it saves people a lot of work. And the robotics team in particular wants to separate out core components of our workflow as soon as we can. We did this with something called mujoco-py, which is a Python wrapper for a physics simulator called MuJoCo, which we use in all our work. We would have released it quite a long time ago, but, you know, once it was stable enough, we released it. Similarly, we have a rendering pipeline we call ORRB, and we have released that too. So usually we try to open source things.
Now, the tricky thing is that we cannot open source everything we're working on, not because we don't want to, but because it would add a lot of overhead. Oftentimes the code in our repositories is not very long-lived. Most of it, I would say 90%, is not used after half a year to a year, because there are all these hypotheses that we're trying out and most of them fail. So you're left with a bunch of code that you basically have to delete because it doesn't matter. And we don't want to release stuff just to release it if we don't really believe in it. So it's really the stuff that survives that we want to release. That's the philosophy we have around it, and when we do have those components, we try to release them.
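Editor's note: the "very simple interface" Peter mentions for Gym is essentially a `reset()` call that starts an episode and a `step(action)` call that advances it. The sketch below follows the classic four-value `step` return (observation, reward, done, info) without importing the library itself; newer releases of the library use a slightly different return signature. `CoinGuessEnv` and `run_episode` are hypothetical names invented for this example.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment with a classic Gym-style interface.

    The observation is a coin value (0 or 1). The agent is rewarded when its
    action equals the coin value it was just shown.
    """

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        done = self.steps >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Generic loop that works with any environment exposing reset/step."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        observation, reward, done, _info = env.step(policy(observation))
        total_reward += reward
    return total_reward


copy_return = run_episode(CoinGuessEnv(), lambda obs: obs)
flip_return = run_episode(CoinGuessEnv(), lambda obs: 1 - obs)
```

Because `run_episode` only relies on `reset` and `step`, the same loop can benchmark an algorithm on any environment that implements those two methods, which is why a shared interface made it easy for people to add environments.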

Lukas: Interesting. I also want to ask you, and this is kind of a loaded question coming from me, I realize, but I feel really proud that you guys use our product, Weights and Biases, or WandB. I'm curious if you could say a little bit about how you use it. I'm not trying to turn this into an infomercial. I'm genuinely curious what your workflow is around it, because I see you using reports more and more.

Peter: Yeah. No, I mean, we have been using it for a while now, and it's one of those things that we started using as a robotics team because at some point, as the team grew, everybody was basically running their own TensorBoard on their own computer, pasting graphs into Slack, and sharing them with each other. It's really tricky to keep track of all that stuff. So we ended up using it a lot for tracking our experiments, and I think that brought a certain level of sanity to the chaos of all the research we were doing as a team. The latest feature that we have started using quite a lot is Reports. It feels like a pretty pro-user feature in some ways, because it's not just putting runs into graphs, it's also putting the graphs together in a nice report. It's one of those funny things where it adds a certain level of process and bureaucracy to how people work, but we've found it super useful. When you're a small team of two or three people, you're talking all the time about your progress, so everybody has a mental state of what is happening. But once a team gets bigger than maybe five or six people, giving each other feedback and understanding what other people are working on can be really hard. It's this n-squared problem where you cannot meet and talk to everybody pairwise about what's going on. So Weights and Biases really fans out information from one person to all the others in an efficient way. That's really, really important. The way we use these reports is that we're actually pretty strict about it now. If you're running an experiment, or you have some kind of research hypothesis that you're going after, and you think it's going to take more than a day or two, we very strongly push everybody towards writing a report. And what goes into a report is: What are you doing?
What is the experiment that you're going to be running over the next few days? Because you're probably going to spend thousands of dollars on your experiment and lots of your own time on it, so it's good to spend at least a few minutes justifying to yourself and others what you are going to do. It's not that we ever say, "No, you shouldn't do this." It's more that we can say, "Oh, I don't know if I believe in that, but, you know, okay, that's fine, at least now I understand it." You write down what it is you're going to do, and we try to frame it from a scientific standpoint: "Here are my hypotheses, and here's my plan for proving or disproving them." Then the report is usually a number of graphs from our training runs, plus things like example photos that we've generated as part of our evaluation scripts. We've found it useful because it's a little bit like rubber ducking. You're talking to yourself as you're doing this, and it forces you to clarify your own thoughts. It gives other people a way to learn from both the positive and the negative outcomes of the experiments you run. And another thing it does is reduce the stigma around things not working out. Because ultimately, in order to do the stuff we're doing, most things are not going to work out. But if you sweep those things under the rug every time, then it looks like somebody is only doing amazing work and things are working all the time, and you don't hear about the 90% of the time it didn't work. That's usually how it goes with papers: you see all these papers coming out and it's like everything is working, but you don't know about all the things they tried that didn't work out.
And internally, you can look at those things that didn't work out. You see that, okay, other people ran this experiment, they believed it would work, but it didn't. And so you don't feel as afraid yourself to pursue experiments, because as long as you have a good reason for why it would work, it's OK if it didn't work out, you know?
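Editor's note: the "n-squared problem" Peter mentions earlier in this answer comes from counting pairs. A team of n people has n(n-1)/2 distinct pairs who would each need to talk, while the number of written reports grows only linearly if each person writes one. A small sketch of that arithmetic follows; the function names are hypothetical and the one-report-per-person assumption is a simplification for illustration.

```python
def pairwise_conversations(team_size):
    """Number of distinct pairs in a team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


def reports_written(team_size):
    """Assumes, for illustration, that each person writes one shared report."""
    return team_size


# Compare how the two quantities grow with team size.
growth = {n: (pairwise_conversations(n), reports_written(n)) for n in (3, 6, 12)}
for size, (pairs, reports) in growth.items():
    print(f"team of {size}: {pairs} pairwise conversations vs {reports} reports")
```

Going from three people to six doubles the number of reports but multiplies the number of pairs by five, which matches Peter's observation that informal conversation stops scaling somewhere around five or six people.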

Lukas: That makes me so proud to hear that. I'm so glad that it's useful for you.

Peter: Yeah. It's awesome.

Lukas: So we always end with two questions. I'm wondering how you'll answer these. The first one is, what's a topic in machine learning that you feel like people don't talk about as much as they should, like an underrated topic?

Peter: I don't know if this is something that people don't talk about, but it's definitely a thing we don't understand well enough: knowing when our algorithms are uncertain about what they're doing. For humans it's very natural. When you don't know what's going on, you slow down, you are more perceptive, you think more, and so on. The algorithms that we have today just make split-second decisions all the time. They don't think very much at all. They just open their eyes, see something, react, and that's it. It's not like when we walk into a dark room we run around and flail our arms. No, we feel our way around, we take it easy. Our algorithms don't do that at all. And so giving them a sense of their own confidence matters. My PhD advisor would sometimes, a bit meanly, comment on people being "high confidence, low competence." That's very much how our algorithms are a lot of the time. They should be a little bit lower confidence a lot of the time, and not be so confident as to make split-second decisions.
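Editor's note: one simple way to make "slow down when uncertain" concrete is to measure how spread out a policy's action probabilities are and to defer when they are too flat. The sketch below uses normalized Shannon entropy for this. It is an illustrative assumption, not a method described in the interview: the threshold value and the function names are hypothetical, and predicted probabilities are often poorly calibrated, so entropy is only a crude proxy for real uncertainty.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1] by its maximum log(n)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def choose_action(probs, max_uncertainty=0.5):
    """Return the index of the most likely action, or None to signal 'slow down'.

    max_uncertainty is a hypothetical threshold chosen for illustration.
    """
    if normalized_entropy(probs) > max_uncertainty:
        return None  # too uncertain: the caller should gather more information
    return max(range(len(probs)), key=lambda i: probs[i])


confident = choose_action([0.97, 0.01, 0.01, 0.01])
unsure = choose_action([0.25, 0.25, 0.25, 0.25])
```

Here `confident` is an action index because the distribution is sharply peaked, while `unsure` is `None` because a uniform distribution carries maximal entropy. What the caller does on `None` (take a cautious action, collect another observation, ask for help) is left open.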

Lukas: I love it. All right. Here's my last question. So if you look at the projects you've been involved in from inception to deployment, what's generally been the biggest bottleneck or the thing that makes you the most worried if you're doing another project to get it deployed? It sounds like hardware was the biggest issue.

Peter: For robotics, totally. It's definitely one of those things. If you can get good hardware that's really useful and really reliable, then you should just pay all the money you can. I remember when we started at OpenAI, we were like, "We can get by on these $100 webcams." For our robots now it's like, "How much is this camera? It's $10,000? It's probably worth $12,000," because it would save me half a year of misery. So that's a big thing. More generally, one thing that always gets me a little worried is when you don't start with the simplest thing. I really think this is one of the core things: for every project, you should start with a very strong but simple baseline. If people don't start in that direction and try more complex methods first, those are often exponentially harder to get to work in terms of the number of parameters. You can do all this work and maybe make it work, but if you then try the simpler approach and it works too, you're just going to feel really embarrassed. That should teach you to always start with the simplest thing. Then you can try the more complicated things, but if you cannot beat the simple thing, after a while your warmth for the simple thing increases and you're like, "Actually, maybe I should just use the simple thing," and you learn to appreciate the simple things. So one of the core things I always look for is: are we trying the simplest thing possible? Because that's probably the thing that's going to work.

Lukas: Well, what a great way to end. Thank you so much, Peter.

Peter: Thank you so much. It was great being on your show. Thank you so much.
