Product Management for AI with Peter Skomoroch
Machine Learning Executive & Entrepreneur, Peter Skomoroch shares his experience building and running successful data science and machine learning teams
TRANSCRIPT

Our guest on this episode of Gradient Dissent is Peter Skomoroch!

Peter is the former head of data products at Workday and LinkedIn. Previously, he was the cofounder and CEO of venture-backed deep learning startup SkipFlag, which was acquired by Workday, and a principal data scientist at LinkedIn.

Follow Peter on Twitter and check out some of his other work:


Lukas: So Peter, what are you working on?


Peter: So I'm spending much of my time at home like everybody else, but I'm doing a bunch of writing and some angel investing, and I'm actually doing a bunch of gardening as well. I'm taking this time to kind of reboot my garden. And one of the problems that's driving me crazy is raccoons. I'm in San Francisco and there are a lot of raccoons in our neighborhood that are just tearing everything up. Pete Warden has this great book, TinyML.


Lukas: Oh, yeah.


Peter: And it uses TensorFlow Lite. I'm going to play around with that, and actually there are these robots; you can get them on Amazon or from Walmart or something. It's really cool. It's called Mekamon. The company unfortunately no longer exists, but you can get these pretty cheap. Somebody did some Bluetooth sniffing and put out a Python API so you can actually control it. It's got servos, it's pretty robust, it can move pretty quickly, and so I'm going to do a project to make this a sentinel in my backyard to protect the garden from raccoons.


Lukas: Hopefully non-lethal.


Peter: Protects. Protect and serve. Yeah, non-lethal. But I think raccoons are afraid of things like dogs. So if I can get this to bark like a dog, maybe that would work.


Lukas: I feel like the San Francisco raccoons are afraid of nothing. I don't know. I'm curious how this goes for you. But I got to show you, I got...


Peter: You got the same book!


Lukas: This guy is bigger (shows him a robot). I put this together myself, bought a kit off of Alibaba.


Peter: By the way, there's an open source project called Spot Micro. If I go and buy a 3D printer, I could put it together; it's like an open source Spot Mini. I think it might have been OpenAI or somebody else who put it in their simulation framework. So there is a model, MuJoCo I think it might be, so you can actually train it in a simulated environment, but I think it's still very nascent. I haven't seen a video of this thing actually working, so I think people are still trying to build it. It's not working yet.


Lukas: I want to start off by going back to your first job doing ML and data science, maybe even before LinkedIn. AOL, I think it was, right?


Peter: Well, yeah. Even before that. I think when we met I was at AOL, or just finishing up at AOL Search. And that's an interesting conversation in and of itself. I was working on the search team there. I had just started. I came there from M.I.T., and the week I started, there was a release of user search data. That was the weirdest first week on the job ever, for sure. Everything was in disarray when I started. But going back to the first job: when I was an undergrad, I worked in physics and neuroscience, and machine learning was, I think, kind of seen as voodoo to those folks back then. For example, there was this thing called deconvolution of signals. I worked on a summer project on antimatter, which sounds cool, right? Positronium is basically a positron and an electron that form an atom, like hydrogen except with matter and antimatter. Anyway, long story longer, the hard part was that you're taking these measurements, real-world sensor data, and you're trying to detect the decay of this atom. When the positron and electron are near each other, eventually they annihilate and give off radiation. So you have all these sensors and equipment set up to measure that annihilation, but the problem is that everything you're using to measure it pollutes the signal. What you actually measure is the convolution of the raw data with those instrument signals. And so at the time, one of my projects was to deconvolve that signal. Deconvolution is one of the classic signal processing and machine learning problems from back in the day. It's kind of like the cocktail party problem: you have a bunch of people speaking and you need to disentangle the different voices in a recording. Anyway, that's the first work I did. In neuroscience, I was really interested in neural networks, and I found the processing of the data more interesting than the actual experiments, which are a lot of hard work and a lot of time in the lab. So I really just dug deeply into signal processing and machine learning. Before I got to AOL, I actually had two other roles where I cut my teeth on big datasets. The first one was back in the first dot-com crash, at a company called ProfitLogic, which ended up getting acquired by Oracle and became Oracle Retail. This was back in the day when they'd ship the data to you on tapes. People say it's harder now to get access to customer data when you're dealing with enterprise customers, but back then we actually had to get tapes of data sent to us - a lot of point-of-sale data, retail sales data - and we were building predictive models for retail sales. Then I worked in biodefense at M.I.T., which is unfortunately a relevant topic today with the pandemic happening. We were working on questions like: how do you detect and measure and prevent these kinds of outbreaks, both naturally occurring ones and, you know, a terrorist attack with some kind of weaponized bio agent?
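To make the deconvolution idea concrete, here is a minimal sketch of a Wiener-style frequency-domain deconvolution. The signal, the instrument response, and the noise level are invented for illustration; this is not the actual analysis from the positronium experiment.

```python
import numpy as np

# Toy setup: an idealized exponential decay convolved with a Gaussian
# instrument response, plus sensor noise. All values are invented.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
true_signal = np.exp(-t / 80.0)                     # idealized decay curve
kernel = np.exp(-0.5 * ((t - n // 2) / 5.0) ** 2)   # instrument response
kernel /= kernel.sum()

measured = np.convolve(true_signal, kernel, mode="same")
measured += rng.normal(scale=0.01, size=n)

# Wiener-style deconvolution: divide by the kernel's spectrum, with a small
# regularization term so noise doesn't explode where the response is weak.
H = np.fft.rfft(np.fft.ifftshift(kernel))
Y = np.fft.rfft(measured)
reg = 1e-2
estimate = np.fft.irfft(Y * np.conj(H) / (np.abs(H) ** 2 + reg), n=n)
```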


Lukas: So you did predictive modeling in the first dot com boom.


Peter: Yeah


Lukas: You got tapes - how big was the data? What kind of models were you building? What was your tool set?


Peter: That's a long time ago... Thinking back, you know what's funny? We were actually using Python, right? So we were using Python and C++. And at that time, almost nobody was using Python.


Lukas: Google, right.


Peter: Google. That's right. So it was Google and then a handful of other startups around the late '90s and early 2000s, but in the enterprise, Python was not really widely adopted yet. This was really close-to-the-metal kind of work, because you'd get the data; I don't remember the exact volume, but you can imagine a customer like Walmart, right? They have thousands of stores, thousands of products, and each of those products has different sizes and colors and styles. So the granularity of the data was really at the SKU-store level, plus every transaction, similar to what something like Square might have now, where somebody buys a coffee and you get that transactional event. But at the time, the lag was really long. These are brick-and-mortar retailers, so they were mostly running on Oracle or DB2 or something like that. They would have their point-of-sale system, and then all that raw data would be aggregated up, usually to something like: this shirt would be a SKU, and if you look at the sales of that SKU, you could see the sales nationally. Then typically what would happen is, there are markdowns. So after a couple of weeks you lower the price, and then there's a sales bump depending on the price elasticity of the item. And so really, the machine learning problem there was - actually, I think we cut it pretty close, we had a week turnaround time - we would get the raw data in the first few days, and then we would run our models, some Python and C++ models, to do forecasting and optimization and model fitting. Then we would basically spit back price recommendations. So for every SKU, we're telling the retailer, "Hey, mark it down 10%, mark it down 15%, so that you'll optimize sales over the entire season." And many times what ended up happening is it would take a few days to load the data. We'd have to get the data from tapes, we'd have to upload it (there could be an issue there). Then you'd have to get it into our models and run the models, which could take about a day or two at that time. And then at the end of the day, this is going to sound really bad, but QA at that point in time looked like... the final product we would send was basically a CSV report that would go to the retailer, which they would then put into their point-of-sale system or their buying system. But there were some nights where we would just print out the CSVs and look at them, mark "this looks wrong" with a marker, and then go back in with Python and fix and change the models. Those were late nights.
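As a rough illustration of that kind of markdown modeling, here is a minimal sketch that fits a constant-elasticity demand curve to weekly sales for one SKU and scores a few candidate markdowns. The numbers and the simple revenue objective are invented; the production models Peter describes were far more involved.

```python
import numpy as np

# Toy markdown model: fit a constant-elasticity demand curve for one SKU from
# weekly sales history, then score a few candidate markdowns. All numbers are
# invented for illustration.
weekly_price = np.array([40.0, 40.0, 36.0, 36.0, 32.0])
weekly_units = np.array([120.0, 110.0, 150.0, 145.0, 190.0])

# log(units) = a + e * log(price), where e is the price elasticity.
X = np.column_stack([np.ones_like(weekly_price), np.log(weekly_price)])
a, e = np.linalg.lstsq(X, np.log(weekly_units), rcond=None)[0]

current_price = 32.0
for markdown in [0.0, 0.10, 0.15, 0.25]:
    price = current_price * (1 - markdown)
    forecast_units = np.exp(a + e * np.log(price))
    revenue = price * forecast_units
    print(f"markdown {markdown:.0%}: ~{forecast_units:.0f} units, revenue ${revenue:,.0f}")
```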


Lukas: What was your title at that point? What did they call you?


Peter: I think back then it was kind of like being a glorified grad student, but I think my title was Analyst or Data Analyst. Data scientist didn't come around until like 2009 or 2010.


Lukas: But then you actually were one of the original data scientists at LinkedIn in the early days.


Peter: Yeah, yeah.


Lukas: I think everyone's gone on to do really awesome stuff. How did you come across LinkedIn that early?


Peter: Yeah. So after I did the stuff at M.I.T., I went back and did some neural network grad work at M.I.T., and then I wanted to move into consumer internet. It was actually about a year after I left ProfitLogic that they got acquired, so I think I kind of got bit by the startup bug when I saw, "Hey, there is a light at the end of the tunnel, all these things can work." And so I was eager to get into consumer internet. I was at AOL Search in the D.C. area; they were based in Herndon, Virginia, and my goal was to just move out to the Bay Area. Actually, when I left AOL, that was the first time I signed up for LinkedIn because, back then at least, when you would leave a company, you would get all these LinkedIn invites from your co-workers, and I hadn't created an account yet. This must have been back in 2008. I got a LinkedIn invite, signed up, and I liked the product. There were groups - groups was a bigger feature back then - so I was in a lot of the early Hadoop groups and machine learning groups on LinkedIn and connected with a lot of people. I think that's how we actually met, maybe in Mike Driscoll's big data community back then. He was blogging, and he later went on to found Metamarkets. But yeah, basically I came out to the Bay Area and was interviewing, talking to a bunch of different startups. LinkedIn was interesting to me because, I think a big reason was, Reid Hoffman, the founder, is really big on networks, and I was a big believer in the power of networks and connecting people to communities. This was when Twitter was out but still pretty early. So there was LinkedIn, Twitter, Facebook. I was a big believer in the mission of LinkedIn specifically because, and it's unfortunately relevant again now, I think there's nothing more meaningful you can do right now than get someone a job, and having people work on things that are important to them and fulfilling and that feel like they matter is really important. So it seemed like a great opportunity to leverage data. They had amassed this large dataset of all these people, their profiles, their connections, but they hadn't really flipped that machine learning or data science switch yet to leverage it and have impact. That was just beginning when I got there.


Lukas: So what were the early projects? What were you doing there?


Peter: It was interesting. A lot of the core elements that you see today were there at that point. So there was a profile, you could connect to people, there was an early version of 'People You May Know' - Jonathan Goldman was the first data scientist to build that - but back then, it was running on SQL. So it was a big SQL job, a series of queries that would take a few days to run, and the network was much smaller; I think there were only maybe 10 million members or something at that point. Now there's probably over 500 million, I don't know the latest number on LinkedIn, but at that point the data was small enough that you could sort of make that work. I think it would actually take over a week to run 'People You May Know' at that point. So for the first product I worked on: DJ Patil, who was running the team at the time, I was lucky enough that, based on some of the stuff I'd done before, he gave me a bit of latitude. And he said, "Just come up with a new product, come up with something that you think we should do, and we'll pitch it to the board." And what I came up with was LinkedIn Skills. At this time, I was still basically an IC data scientist, product-manager-type person. So I pitched this idea of, "Hey, skills seem like an obvious thing you should have as an element of somebody's profile, and there are all these other cool things we could do with it: we could use it in search and ad targeting, and you could also endorse each other for skills, things like that." So we had these early notions that we could do stuff like that, but the first task is: how do you bootstrap something like that? So I'd say the first year was basically bootstrapping and building that from scratch. And so we put a team together. Jay Kreps, who's the co-founder of Confluent, was my first engineering partner on that project. And then Sam Shah, who rewrote 'People You May Know' from SQL to MapReduce, was my second engineering partner and later became my co-founder at SkipFlag, the startup we did. But that was the first big project.


Lukas: How did you bootstrap it?


Peter: Yeah. So actually, crowdsourcing came in. I think we may have used CrowdFlower; we definitely used Mechanical Turk. Basically it was a mix of different things - maybe in the show notes I can give you some references for papers on how we did it. The core prototype I actually built in, I think, a few weeks, and I really just slapped it together with duct tape. Again, some Python; SciPy and scikit-learn existed but were still pretty nascent at the time; and MapReduce. The stack back then looked like Hadoop, Pig, some Hive, but we mostly settled on Pig, which came out of Yahoo. And it was basically a bunch of batch jobs. The idea, the trick, I had kind of picked up when I was at AOL. At AOL, I was working on mining patterns from search query data and then crawling external websites and trying to understand the topics in those sites. You can imagine, if you're on TripAdvisor, what are the topics on TripAdvisor? What are people writing about in reviews? What are the locations in search queries? So I spent a lot of time working on NLP and information extraction. And that was basically the idea for bootstrapping Skills. We had about 10 or 12 million English-language profiles, and basically it was a bit like word2vec, but pre-word2vec: extracting commonly co-occurring phrases from those profiles and getting a bunch of candidates for named entities from the raw text. And from those candidates for named entities - again, similar to how a lot of people do named entity work now, using Wikipedia or Wikidata or things like that as a source of truth - if we could map those phrases, those surface forms, to an entity in Wikipedia, then I could normalize them. So if you say ROR or Ruby on Rails or Rails, we could disambiguate those down to the same entity. It was a really primitive, in some sense, form of things that we went on to do at our startup SkipFlag.
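To make the surface-form normalization idea concrete, here is a minimal sketch. The alias table and skill names are invented; in practice the mapping came from co-occurrence statistics, Wikipedia links, and crowdsourced labels rather than a hand-written dictionary.

```python
from typing import Optional

# Map the different ways people write a skill on their profiles to one
# canonical entity (keyed here by a Wikipedia-style title). The alias table
# is invented for illustration.
ALIASES = {
    "ror": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "rails": "Ruby on Rails",
    "tf": "TensorFlow",
    "tensorflow": "TensorFlow",
}

def normalize_skill(surface_form: str) -> Optional[str]:
    """Map a raw profile phrase to its canonical entity, if we know it."""
    return ALIASES.get(surface_form.strip().lower())

print(normalize_skill("ROR"))               # Ruby on Rails
print(normalize_skill("Rails"))             # Ruby on Rails
print(normalize_skill("Weights & Biases"))  # None: emerging topic, not yet linked
```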


Lukas: But wait, wait. So you just used Wikipedia to pull in the skills?


Peter: So we would use that as, I guess I would say, a means of normalizing the things that people would say on their profile. So it was basically named entity disambiguation. If somebody says a phrase like, let's say, angel - if you say angel, do you mean an angel investor, or... there are people on LinkedIn who do psychic healing and can talk to angels, and so you want to be able to distinguish those two.


Lukas: So you used Wikipedia for your ontology.


Peter: The ontology, yeah. So basically the knowledge graph was complicated. Not everything is in Wikipedia; the notability criterion for creating a Wikipedia entry is pretty high. So a lot of jargon, a lot of domain-specific stuff, is not in Wikipedia. So we would only use that as one source. Let's say Weights and Biases - people start putting that on their LinkedIn profile; that's an emerging topic which can be in the knowledge graph, but if we can link it to Wikipedia, that gives us a lot more evidence and data for tagging.


Lukas: Would you manually review new skills? Like, how would you know? Say Weights and Biases becomes a skill people want to put up, how would you create a new one?


Peter: Initially it was a combination. So we would have an automated skill-discovery job that would detect emerging topics that probably were skills. This is where production machine learning gets really complicated when it's user-facing. I think I was maybe overly paranoid, but we really ended up not having a lot of issues with things like profanity and other offensive topics, and part of the reason was that we had many layers of vetting. Some of that was human-curated, meaning we had humans come up with whitelists and blacklists and grey lists. So it might be OK for you to put something on your profile - for example, alcoholism, if you are a psychiatrist who helps deal with alcoholism and drinking disorders and things like that - but you wouldn't want the machine learning algorithm to automatically suggest that to someone and be incorrect.


Lukas: Oh I see. That's an example of a grey list?


Peter: That's a grey list. So we had multiple tiers of "Where is it appropriate to use this data?" We may be correct that that person's profile said alcoholism, but we shouldn't necessarily suggest it as a skill.
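A minimal sketch of that kind of tiered vetting might look like the following. The tiers, the example entries, and the policy functions are invented; the point is only that "valid on a profile" and "safe to auto-suggest" are separate decisions.

```python
from enum import Enum

# Tiered vetting sketch: tiers, entries, and policy are invented placeholders.
class Tier(Enum):
    ALLOW = "allow"   # fine to extract and fine to auto-suggest
    GREY = "grey"     # fine if the member typed it themselves, never auto-suggest
    BLOCK = "block"   # never surface anywhere

SKILL_TIERS = {
    "machine learning": Tier.ALLOW,
    "alcoholism counseling": Tier.GREY,
    "some-offensive-term": Tier.BLOCK,
}

def can_display(skill, member_entered):
    tier = SKILL_TIERS.get(skill.lower(), Tier.GREY)  # unknown -> be conservative
    if tier is Tier.BLOCK:
        return False
    if tier is Tier.GREY:
        return member_entered
    return True

def can_suggest(skill):
    return SKILL_TIERS.get(skill.lower(), Tier.GREY) is Tier.ALLOW
```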


Lukas: Right. Right.


Peter: But I think the other thing that's interesting there is the use of crowdsourcing. In that first month when I was bootstrapping the system, that was a way to get labeled data. I think the Wikipedia task was something like: we would show phrases in context and then ask workers, "Hey, which Wikipedia entity is this phrase? Is this the correct one?" And then that powers the machine learning training, which does it automatically.


Lukas: Got it. Cool. So you worked on it for like a year? How long did it take you to get something that you could deploy into production?


Peter: So I think the prototype, with the front end, using SciPy and Hadoop and Pig, et cetera, took maybe about two to three months to get something reasonable. And then there was a bunch of design work and a bunch of engineering work. Taking something that runs on your desktop or your laptop as a prototype app that recommends skills for people, that's one thing. But at the time, the way our production stack worked, we had something called Voldemort, which was a NoSQL key-value store, and we had Hadoop jobs that would then push the data out, essentially. So let's say you extract suggested skills for all of those 10 million members: you would compute those suggested skills, push them up to the key-value store, and then a recommender service, which sends data to the frontend, would pull from that data store and display it to the user. And then there's all the logic around: did someone accept the recommendation? Did they decline it? Tracking, which would eventually go to Kafka. All that machinery and all that engineering is where folks like Jay and Sam Shah came in and designed that. The other hard part, I would say, is you always have these choices of "Did we do the thing to get it done quickly, or did we do the thing to set us up so we can do 10 more products like this?" And in those early days, we were at this transition point. We had done a lot of things in the past like, as I mentioned, the three-day SQL query. We wanted to do things in a more scalable way. So we bit the bullet, and a lot of things like Kafka and other projects came out of those efforts to make it more scalable.
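For readers less familiar with that batch-to-serving pattern, here is a minimal sketch of its shape: an offline job precomputes recommendations per member and writes them to a key-value store, and the online service just does a lookup. The dict stands in for a store like Voldemort, and the scoring is a placeholder.

```python
# The dict stands in for a key-value store like Voldemort; everything here is
# a simplified placeholder.
KV_STORE = {}  # member_id -> list of suggested skills

def offline_batch_job(member_candidate_scores):
    """Pretend Hadoop job: rank candidate skills per member, keep the top 3."""
    for member_id, scores in member_candidate_scores.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        KV_STORE[member_id] = ranked[:3]

def recommender_service(member_id):
    """Online read path: no model runs at request time, just a store lookup."""
    return KV_STORE.get(member_id, [])

offline_batch_job({
    "member_42": {"Python": 0.92, "Hadoop": 0.81, "Pig": 0.64, "Excel": 0.20},
})
print(recommender_service("member_42"))  # ['Python', 'Hadoop', 'Pig']
```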


Lukas: So it was like two to three months to make the prototype, how long did it take you to get it out?


Peter: I think it took about a year to get it out. It was phased, right? This is another good point; people should look for opportunities to do an MVP. The way that we approached it was we actually did email first, which was much easier. Changing the front end of LinkedIn at that time was a big, heavy process; we had this framework which was pretty hard to work with, and it could take an intern about a month to learn how to commit a change and push it to the site. Email was much easier. So if we could push the data out to an email job - you can imagine something like an email campaign on Mailchimp; we weren't using Mailchimp, but something like that - you could push the recommendation. So I could send an email to you saying, "Hey, Lukas, do you have these skills? Add them to your profile." It's a much lighter-weight way to do that. That, I think, took another few months to get all the pieces powered on the backend so that we could do it, and then the work to get the frontend done and actually roll it out in an A/B test was probably another few months. So all in, it probably took about six to seven months at that point in time to get this out for all users on LinkedIn.


Lukas: Must have been satisfying when it was deployed to so many people.


Peter: Yeah, I think it was actually around the first Strata. I remember DJ Patil had a keynote and he was going to announce it, but it wasn't quite ready to ship. And I think I had a talk a couple of days later. And yeah, I think we announced it sometime around then, right around the first Strata in 2010.


Lukas: Cool.


Ad: Hi. We'd love to take a moment to tell you guys about Weights and Biases. Weights and Biases is a tool that helps you track and visualize every detail of your machine learning models. We help you debug your machine learning models in real time, collaborate easily, and advance the state of the art in machine learning. You can integrate Weights and Biases into your models with just a few lines of code. With hyperparameter Sweeps, you can find the best set of hyperparameters for your models automatically. You can also track and compare how many GPU resources your models are using. With one line of code, you can visualize model predictions in the form of images, videos, audio, plots and charts, molecular data, segmentation maps, and 3D point clouds. You can save everything you need to reproduce your models days, weeks, or even months after training. Finally, with Reports, you can make your models come alive. Reports are like blog posts in which your readers can interact with your model's metrics and predictions. Reports serve as a centralized repository of metrics, predictions, hyperparameters tried, and accompanying notes. All of this together gives you a bird's-eye view of your machine learning workflow. You can use Reports to share your model insights, keep your team on the same page, and collaborate effectively, remotely. I'll leave a link in the show notes below to help you get started. And now let's get back to the episode.


Lukas: So you went on to start SkipFlag. What was it like going from a bigger company into your own startup? What was the experience like for you?


Peter: It was interesting. I think the part that I left out in that transition was moving into management. I had managed projects and small teams before, but as LinkedIn grew - when I joined LinkedIn, there were about 300 employees; when I left, there were over, I think, six thousand - our data team, like the Facebook data team and a number of other data teams at that time, grew fairly large, and so like everybody else in a hypergrowth company, we had to learn a lot about how to run data teams. And one of the challenges that I found was that the tools we were using, enterprise tools essentially, and workplace software, were still really dumb, right? You're building all these cool, smart systems on the frontend at Facebook and Google and LinkedIn, but the tools that all of us are using at tech companies are still pretty stupid. And so what I really wanted to do was apply some of that technology to those workplace problems, and more specifically, intelligent assistants. So moving to a startup, what was it like? I think it's not as obvious when you're in these larger companies where you've hit scale; everybody specializes. There was a great joke on Twitter last night - I forget who said it, but I think it was someone who was at Twitter in the past. Anyway, she said: who came up with "data engineer"? It should have been "data lakers." [laughs] Data lakers are data engineers doing all that hard work, creating those NoSQL stores and infrastructure. I think a lot of folks go off to launch a startup and then the reality hits them that there's actually a lot to build. This was like 2015, and even with Amazon Web Services or Google or whatever, there are still a lot of pieces and a lot of glue that you rely on at companies like Google and Facebook that is just not there, and you have to put it together yourself. So that was a big journey, building a lot of that. But I would say overall, it was extremely fun. When you're at a big company, there's a lot of impact you can have, but at the same time, you spend a huge amount of time on coordination and red tape and getting everyone on the same page. One of the advantages of a startup, obviously, is you can move a lot faster. You make mistakes, but they're your mistakes. That was really exciting.


Lukas: I've never asked you this; I'm actually genuinely curious about it. You had an enterprise tool that helped you organize information at SkipFlag. I've never had this problem as an entrepreneur just because of the spaces I've gone into, but how did you prototype it? Did you get someone to give you their Slack logs and do it for them? How do you even build ML models without the data?


Peter: Yeah. So that's the chicken-and-egg problem. I'm a little OCD when it comes to data and datasets. Back in maybe 2007 or 2008, I wrote a blog post on datasets available on the web, and it was in those early days... Aaron Swartz was another person who was really big on making datasets available and open, and he famously got in trouble for scraping academic journals. So he was working on some projects in this area. I'd been collecting a lot of public datasets for years. Using the Twitter API, for example, I had been doing firehose crawls for years, and Wikipedia. I'm an adviser to Common Crawl, which was used to create GloVe and a lot of these other NLP datasets that we all enjoy today.


Lukas: Is the Enron corpus still relevant? Is that still around? I remember working on that.


Peter: Funnily enough, it really is. We did some work... I don't want to give away too much secret sauce or detail, but we were working with one enterprise customer. Let me say a little more about the tool first. So how did we get going? What SkipFlag was doing: we were a knowledge base that would build itself out of your enterprise communication. We started with Slack because one of our first investors was Accel, and I actually did an EIR at Accel and was hanging out there at the time as we put the company together. They were investors in Slack, so I was using Slack a little bit before it launched - I think that was maybe 2014, 2015 - and I really liked it. I'm pretty picky when it comes to workplace tools, and I enjoyed it. Obviously, they've been massively successful; tons of people use Slack. But it felt like an opportunity: nobody was really using that dataset yet. It posed some unique challenges, similar to Twitter. There had been many email startups before, and there are startups working on documents, but Slack was interesting because to me it felt closer to Twitter - short-form messaging data, very hard to do the kinds of things we were working on, knowledge extraction and entity disambiguation. But there's a lot of data and it's accessible; they had a pretty good API in the early days for actually pulling public-channel Slack data. So the initial way we bootstrapped was using Slack. One of the hard parts is if you were to build the whole product... I think we still have a video online: we did a paper at KDD, and they have you do a short video describing the paper. We did a paper on entity extraction on noisy text, and in the video we have a short product demo, so we could put that in the show notes so people can check it out. The full-blown product looked a bit like Notion or other modern wiki-like products, except it had this AI infused into it that could auto-organize all your docs and answer questions. Eventually, after we were acquired by Workday, one of the things we worked on was that you could give it a PDF of your workplace HR policies, and it could do fact extraction across all of that and then automatically answer questions, which is pretty cool for an HR person - to have this thing automatically answer those questions based on just a document. But before you get to that, how do you do this with little data? Basically, I went around to my friends - I got about 100 or so startups that I knew in as beta users - and said, "Hey, Slack is confusing. It's noisy. It's hard to sift through. What if we gave you a smart email digest and just summarized what's going on in your Slack team so that you can keep up to date with what's happening and see interesting stuff? And oh, it'll have news articles recommended based on what you're talking about, and things like that." So we did that prototype. And to do that, we had other auxiliary datasets and we could do a bit of transfer learning. We had the Wikipedia corpus and Common Crawl, which is a fairly large dataset for NLP - I think it's a few terabytes of web crawl data. So we were able to train and bootstrap on open datasets and web crawl data and Twitter data, in combination then with customer data, to do something like smart summarization and extraction.
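As a rough illustration of bootstrapping on open data before you have customer labels, here is a minimal sketch that ranks messages for a digest using word vectors pretrained on Wikipedia and newswire text. The gensim downloader and the glove-wiki-gigaword-100 vectors are real, but the topic string and the similarity heuristic are invented; SkipFlag's actual summarization pipeline was considerably more sophisticated.

```python
import numpy as np
import gensim.downloader as api

# Word vectors pretrained on open corpora; no customer data needed to start.
vectors = api.load("glove-wiki-gigaword-100")

def embed(text):
    """Average the word vectors of the in-vocabulary tokens in a message."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return None
    return np.mean([vectors[t] for t in tokens], axis=0)

def rank_for_digest(messages, team_topic):
    """Rank messages by cosine similarity to a rough topic description."""
    topic_vec = embed(team_topic)
    scored = []
    for msg in messages:
        vec = embed(msg)
        if vec is None:
            continue
        sim = float(np.dot(vec, topic_vec) /
                    (np.linalg.norm(vec) * np.linalg.norm(topic_vec)))
        scored.append((sim, msg))
    return [msg for _, msg in sorted(scored, reverse=True)]

digest = rank_for_digest(
    ["anyone seen the new hadoop job failing?", "lunch at noon?"],
    team_topic="data infrastructure and machine learning",
)
```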


Lukas: You were going to mention the Enron corpus. Did you also use that?


Peter: Oh yeah, that actually came much later. So we started with Slack, we did the email digest, and then in parallel we were building out the product that became SkipFlag. One of the things that we found was that Slack is great, but at the time - this is like 2016, 2017 - larger companies were still not all in on Slack, and I think they probably still are not all in on Slack. Teams is getting a lot of adoption, obviously, things like that. But the world still runs on email, right? So email is a big deal, and we had worked with email before; my co-founder Sam Shah actually ran email relevance at LinkedIn. So we were pretty familiar with working with email, and with the Enron corpus. I actually took a class with Leslie Kaelbling, I think, back at M.I.T. She bought the corpus from Enron when they went bankrupt or whatever; so she bought the dataset, and that's how it became an open dataset. She curated it and put it out there. Anyway, long story longer, as we got into email, that's one of the few public email datasets out there. So when we would want to show a customer how well this could work, that was the dataset that we would benchmark on. And a lot of academic papers still use it as a benchmark.


Lukas: It's funny, I remember working on it in grad school, and I just kind of feel sorry for all the employees when you look at it, even though it was released. I think it was a good lesson for me, because now I'm really religious about keeping work stuff at work and personal stuff personal.


Peter: Yeah, there's nobody more careful with that stuff than a data scientist or engineer, for sure. We were really careful from the beginning and really rigorous. We had PII scrubbing and all kinds of stuff in our machine learning pipeline from day one.


Lukas: How did you do automatic PII scrubbing?


Peter: So there's a bunch of techniques out there. So obviously nothing is foolproof but this actually goes back to that first week at AOL. I had just started, so I was almost in quarantine. So I hadn't been involved with the release of the search dataset. So for whatever reason, I was tasked with going in and putting in place a bunch of the PII protection. So you can imagine in search query logs what are common things that you...


Lukas: Wait, so you put in place what PII protection was relevant for the AOL systems then?


Peter: I didn't put it fully in place. I was a data scientist; I wasn't really doing the production engineering. But I did put together the scrubbing layer, which was things like, "OK, how do you detect FedEx IDs? How do you detect a Social Security number?" That kind of stuff. But this was years ago. If you were going to do it now, Microsoft actually has an open source project for this, and there's a whole bunch of others out there - about a half dozen open source efforts to do this kind of thing now.
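As a flavor of what a rule-based scrubbing layer looks like, here is a minimal sketch. The patterns are simplified for illustration; real tools, such as Microsoft's open source Presidio, combine patterns like these with NER models and checksum validation.

```python
import re

# Simplified PII patterns for illustration only; production scrubbers need
# far more robust detection.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scrub(text):
    """Replace anything matching a PII pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("my ssn is 123-45-6789, call me at 555-867-5309"))
# -> "my ssn is [SSN], call me at [PHONE]"
```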


Lukas: So tell people what's the AOL query story?


Peter: So I think, I mean, you should really have Ron on; I don't know if he's going to talk about it. He was chief scientist at Twitter and previously was driving search at AOL. It was a strange situation because technically nothing was out of the norm. You mentioned the Enron dataset - at the time in academic search research there were a few datasets: there were the Excite logs, MSN had a dataset of search query logs out there, maybe Lycos had one. So there was kind of an accepted format; there were two or three datasets in the academic world that were search query logs. And AOL basically released one in the same format. I think they didn't really expect anything like this to happen, but what ended up happening was Reddit... This was in the early days of Reddit too, right? I remember I was actually looking at Reddit that weekend - I had just started this job - and I saw on the new page, where emerging stories pop up, this story: AOL releases search logs. I don't know if that user was the first person to see it on Reddit or if it was a reporter, but someone on Reddit said, "Hey, check this out. The search queries are here." And then what happened was a whole bunch of people started putting up sites with a web app in front of the search query logs, and you could go in and explore crazy queries. And query session logs are deeply private, sensitive things. So even if you remove the user ID... this is what the world discovered, basically. When that happened, it was a wakeup call. I think a lot of people in technology already knew this stuff could be sensitive, but for a lot of the world it was a wakeup call that what I type into my browser goes somewhere and can have an impact. If you think about the advice back in the early 2000s dealing with Google, the general advice was always: Google your name every few weeks, to make sure there isn't something bad on the internet about you - reputation management. But what that means is that in those anonymized search query logs, one of the more common things people were Googling was actually their own name. So that made it fairly easy to triangulate individuals in many cases. Ultimately, the search dataset was taken down, and then I think we entered this period where it was much more difficult for academia. Like you mentioned with the Enron dataset, I don't think something like that would happen today. You wouldn't have an Enron dataset. You wouldn't have the AOL search logs. It became much more locked down. I think the Netflix Prize dataset was one of the last big ones.


Lukas: Yeah. And I hear there were complaints; I think Netflix didn't actually do a follow-up to their competition because of the privacy issues you brought up.


Peter: Yeah. It was a good run, and that kind of correlated with the rebirth of deep learning. I remember I was actually working on it on the side while it was going on - I think I was at AOL at the time - and I was in all the forums. This was before Kaggle, but it was basically one of the first... there were the data mining competitions, then there was the Netflix Prize, and then there was Kaggle. And it was really interesting to see the progress, because deep learning kind of came out of left field and ended up working really well, and then ensemble techniques. I think that period was the catalyst for a lot of what day-to-day folks like you and I deal with in machine learning. And this massive surge of progress, I think, is largely because of these benchmarks and then things like ImageNet. So anyway, I know we're on a tangent here, but that was a really exciting period, and we're seeing the compounding effect of it now, where the technologies people have at their disposal are amazing compared to what we had 10 years ago.


Lukas: Totally.


Peter: And it's so powerful.


Lukas: I really want to make sure I get this in before we run out of time. You've lately been doing some consulting for different companies - what are you seeing out there? I'm really curious. What kinds of stuff are people doing and what kind of technology are they using at this point? Are Python and C++ still the standard? What's going on?


Peter: I haven't seen C++ in a while, actually. If you're using devices and things, maybe. But it's interesting. After the acquisition of the startup, I took a break, started doing some angel investing, and I get people paying me periodically to come in and help with strategy and consulting and running data orgs, or rebooting data orgs sometimes. And it's interesting: go back three or four years ago, deep learning was seen as a risky proposition unless you were a small startup. Bigger companies were not doing it as much in, like, 2014, 2015, but now it does seem like everybody wants to be using TensorFlow, PyTorch, things like that. I think also, obviously, the cloud providers have become a big player. So a lot of people are using SageMaker, they might be using Google Cloud Platform...


Lukas: Do you have stuff you recommend, like when you come in? Do you have an opinion as to what people should be using?


Peter: I'd say I am still somewhat agnostic. Generally I tend to use Amazon myself, but I'm open to using other tools and other platforms. For example, I've got a camera here, so this is pretty cool - I've got this Azure smart camera, so I'm going to play around with that. But that's the cool thing: if you're using TensorFlow, all the big players use the same open source stuff, so I can run TensorFlow on Microsoft or Google or whatever. Then I think it really depends on the problem and on your company's stack. If you're already all in on Google, then using a lot of the Google tooling can make sense. So that's where I'm not religious about one platform or another; I think they're all converging to some degree. But I have a lot more experience with the Amazon stack, probably like most people. In terms of what I see at these companies, I think what ends up happening is actually similar to how things were a decade ago, in that with data science - when we changed the branding from research scientist or machine learning scientist to data scientist, a lot of that was because you needed people who could put stuff into production, and that production machine learning engineering and data engineering is different from an academic who can write a paper. And so you have this challenge when it comes to hiring and shipping products. Managing research scientists is difficult for anybody, but it's especially difficult if it's not your area of expertise. If you're an enterprise company, if you're building workplace software, or even if you're a consumer company, typically they'll have a V.P. of engineering or something try to manage those teams. But if they don't have that background, it can be really hard, because planning is hard, prioritizing those projects is hard, knowing what's likely to work is hard if somebody hasn't done it before. And then you hire some people out of school or people who've worked on Kaggle competitions, and there are a lot of pieces they're missing to actually ship and execute.


Lukas: Do you think there's something different about data stuff than other things? Because if I'm a V.P. of engineering, I can't be an expert on, you know, dev ops and architecture and all these things, so I have to rely on folks. Is there something that makes data science particularly challenging in this way?


Peter: So, in all these other areas, the tools and technologies definitely change and people have to keep up with them. I think one of the challenges with machine learning, partly because of the people who work on it and partly just because of the nature of the field and how rapidly it's changing, is that they're always trying the latest thing, right? And I think that's very hard to manage, whether it's even just something as simple as library dependencies or methodology. Things are changing rapidly, and that is at the root of it. The other thing is, if you're doing dev ops, dev ops at Company A probably looks a lot like dev ops at Company B, right? You choose your stack, you choose your tooling, and then you live with the consequences of those decisions. But for machine learning, almost every problem is different. It's like that saying: every family is dysfunctional in its own unique way. I think the same thing is true of machine learning teams and projects. If you're trying to predict financial fraud, and before that you were working on detecting porn in user profile images, those are two vastly different problems. So SREs may look like SREs across companies, but machine learning problems are not all identical.


Lukas: Interesting. Let's stick with this article you wrote, "What You Need to Know About Product Management for AI." It seems like the best place to start is a question I have about a lot of things written about AI, which is: what makes product management for AI different than product management in general?


Peter: So I think the main difference between product management for AI and product management for traditional software projects is that machine learning software is inherently probabilistic, whereas classic software development is more deterministic. Ideally with software, you have this rich methodology around unit tests and functional tests, integration tests and builds, and as you develop the software you expect it to always behave the same way if you've instrumented the right tests. That creates a very clear, comfortable development process, and engineering leaders, architects, and product managers are comfortable with that, right? Most product managers like to run projects that are predictable. They like to be able to commit to deadlines, to work with partner teams and customers and be able to commit to a date. If you are mostly building things that are clear and understandable, or things that you've built before, you can come up with good estimates if you have enough experience. With machine learning, the uncertainty comes from a bunch of places, all the way down to the individual algorithms. If you're training a model, there's some amount of randomization; different random seeds can lead to different results. And it goes all the way up to your approach: your approach may be different, because every problem is a little bit different in machine learning - otherwise it would essentially be solved. So there's always something different about it. Maybe it's a slightly different application or a different dataset. Let's say it's a movie recommender. Recommending videos on TikTok on the surface seems similar to recommending Netflix movies, but if you peel back the onion, it's really pretty different. They are short videos, there's not a lot of context, they're very fresh, very new every day, and they're user-generated, right? So you don't know what is going to be in that video, and there's not a lot of dialogue; it's more visual. Versus Netflix, which has a curated catalog of blockbuster movies or self-produced movies where everything is very carefully controlled. So on the surface those both look like recommender problems, but a machine learning person would realize, "OK, there's a huge set of different things you'd have to do for TikTok than for Netflix." And that's just scratching the surface. Why are these different? Really, the planning process is often very different because it's very data-dependent and very application-dependent, versus, you know, a user signup flow, which looks very similar across many different software applications.


Lukas: So what does the planning process look like in the face of this amount of uncertainty?


Peter: I think there are two things: there's what the planning process should maybe look like, and then realistically what it looks like in most companies. In most companies, the pattern I've seen is people just do what they know; they continue to try to plan these like traditional software products. I think the better teams are aware of some of these issues, and they treat the uncertainty from day one and build it into their planning. Effectively, most machine learning projects are much closer to R&D than to something very clear and easy to execute on. So I think the best planning I've seen involves first starting with the core problems that matter to your business. One of the problems could just be that the set of machine learning projects you're working on may not be the right ones. Typically, companies have some kind of product planning process, roadmap building - they may do this quarterly, they may do this annually - where they come up with a set of funded projects that they're going to staff, that they're going to resource, and that they're going to execute on. And so I think fundamentally, you need to have a clear set of projects that align with your company strategy. Let's say you're a consumer app and growth is important, so your key metrics may be daily active users and time on site and signups, things like that - clear business metrics where, if your machine learning project has an impact, you can see the number change, right? What you don't want to have happen is to spend six months to a year working on a machine learning project and then, at the end of it, not be able to see a material impact in any numbers. And this does happen a lot. A lot of the time, I'd say, especially in enterprise, machine learning is seen more as a feature, an interesting checkbox to have, but it's not necessarily tied to a clear business outcome.


Lukas: Why do you think that happens? I mean, a lot of the folks we've talked to have mentioned it, but it seems like connecting a project to a business outcome is a best practice for any kind of project. I do hear this over and over, so there must be something to it. What do you think it is?


Peter: I think some of it is just lack of familiarity with the domain. In some ways, unfortunately, it's like blockchain. People hear a buzzword, they hear blockchain, and they say, "OK, we need to have a blockchain story." So I think that happens when these things start top down sometimes. The company may say, "OK, our board is pushing us to have a blockchain strategy," and then they get some consultants in, and maybe they have internal execs come up with something. It can be good to have executive support, don't get me wrong, but I think you do need the bottom-up expertise and experience to connect those dots. And that's really where product management shines. So I think if you have a good product manager who's thinking about this and who's very numbers-driven, that can help. And I do think that tends to happen more in these instrumented companies, versus enterprises, which are usually more sales-driven.


Lukas: What about specifically addressing the uncertainty? Say you have a thing that's connected to a business outcome but, you know, like you said earlier, with ML, it's hard enough to even know how good of a system you can build, so how do you plan around that level of uncertainty?


Peter: So there are a number of different strategies. One that I'm really honing in on lately is an old strategy: it's what DARPA used for self-driving cars, it's what Netflix used for the Netflix Prize, and it's what all the data mining competitions used for years, which is benchmarks. So I think if you have a clear... you could go back to video recommendation again. Netflix created a benchmark dataset, held back some test data, released training data for people to train models, and then had a clear set of evaluation metrics. And they said, here's the current state of the art, or what is in production right now, and we're going to pay $1 million to the first team to get a 10% improvement. I think it was 10%, right?


Lukas: Yeah, 10%. Yeah.


Peter: The nice thing - and this will appeal to a lot of product managers, I think - is that they like clear objectives and goals that people can rally around, that your team can rally around. So I found that really effective. It's surprising the number of teams that skip constructing that benchmark; they just don't even bother with it. They may have some business metrics they're measuring and they may have model metrics that they're using internally, but they're not really connecting those in a clear way. For example, they may just test in production: they have an A/B test and they say, "Hey, when we rolled out this new model code, in our A/B test we saw a 5% lift. So it's better than the old one. Good job. Work on the next model." If you're doing that, it's very easy to fool yourself, and it's very hard to debug. That's where the uncertainty creeps in; you don't really know where you stand. You don't know, "Was there something else happening in the data during that time period that affected the model?", "If we reran that same model on new data, would we get the same result?" And so this is where experiment management is one way you could frame it. It's really critical that you build those benchmarks, that you hold out some dataset - some stream of traffic, for example - and that you keep running multiple models against it so that you can ensure you haven't regressed in terms of performance.
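A minimal sketch of that kind of benchmark check, under the assumption of a scikit-learn-style classifier and a frozen holdout set, might look like this. The metric, the promotion threshold, and the model names are placeholders, not a prescription.

```python
from sklearn.metrics import roc_auc_score

# Score every candidate on the same frozen holdout and require a minimum lift
# over the production model before promoting.
def evaluate(model, X_holdout, y_holdout):
    """AUC on the frozen holdout set (assumes a scikit-learn-style classifier)."""
    scores = model.predict_proba(X_holdout)[:, 1]
    return roc_auc_score(y_holdout, scores)

def is_improvement(candidate_auc, production_auc, min_lift=0.005):
    """Not just 'bigger': require a meaningful absolute lift."""
    return candidate_auc >= production_auc + min_lift

# Hypothetical usage:
# prod_auc = evaluate(production_model, X_holdout, y_holdout)
# cand_auc = evaluate(candidate_model, X_holdout, y_holdout)
# if is_improvement(cand_auc, prod_auc):
#     promote(candidate_model)
```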


Lukas: One of the things that you said to me in a private conversation was that standups with ML teams feel a little different than standups with an engineering team, because with engineering it's like, "Hey, I wrote this feature," but with an ML team, it's a little harder. Do you have any thoughts around that?


Peter: Yeah. I think I've mentioned this on Twitter a few times as well, and it's interesting to see the discussion people have around it. With a lot of data scientists, it resonates. Some of it may just be understanding the nature of the work that scientists and machine learning researchers are doing and how to translate that into that kind of standup format. I think you see the clash of cultures immediately in a standup, because often - say you're doing, I don't know, agile development or scrum or something - you have very clear chunks of work that may take one to two days in traditional software development. If all the people working on other parts of the product's launch have work that's easily chunked in that way, then they can close their Jira tickets more easily. They say, "Oh yeah, I implemented that API that will talk to the email system and we're all set. It's in testing." That's very different from when they get to the data scientist in the standup, who says, "Well, I'm still training the model and something is not working, but I'm not quite sure what it is. I'm going to look into the initialization parameters and maybe try to optimize that, and I'll report back next time" - and then repeat. Right? And it's always something. The model isn't working until it's working, unfortunately. And so I think that can create some stress. Maybe it's just me, but I feel that stress. In terms of strategies to deal with that, for product managers, I think it's at least good to call it out. If you don't talk about it, it can start to seem strange. And this gets to the point of organizational support for these ML projects. If you listen to the chatter - there are all kinds of apps for back channels now, there's Slack and there are other things like Blind where people talk about their companies - and especially in an environment like now, where there's economic uncertainty and pressure, I think increasingly you're going to have this chatter, which is already there, around, "Hey, what are those ML people actually doing? They're getting these big paychecks - where's the beef? What are they delivering?" So I think this is critically important. With standups, I think it would be good to have more clarity around what machine learning folks should report and make it clear that the progress meter is going to be a little different. It may be research results: here's the objective we have this week, here are the things we want to put in place, and we may have accomplished them even if the results are not there. If you say, "Hey, we're going to improve by 5% this week, and that's our goal for the standup," that can be very hard, because you may not hit it.


Lukas: Chris Albon, also on this show, was talking a lot about creating a sense of emotional security for his ML team, and I think a big part of that for him was not focusing people too much on these sorts of external 5%-increase goals. I have to say he made some sense, but I think I was only partly convinced myself. I mean, as someone running a company under pressure, I feel like the way that I run my teams is pretty external-metrics focused.


Peter: Yeah.


Lukas: But I think there's downsides to it for sure. And it's very hard to know what a reasonable goal is. So yeah, I go back and forth.


Peter: So I think it varies depending on the stage of the project and of the company. When a team is just starting with machine learning, the hard reality is it's going to be hard to get to that number when you don't even have your ML infrastructure in place. So in the early stages, before you actually have a working product in production, unfortunately it's going to be really hard to be metrics-driven. You may have decoupled things - you may have some set of people working on sample data, training a model - and maybe quickly you can get to a benchmark. What I would suggest people do is... so I kind of agree with you and I agree with Chris, but I think you have to encapsulate it in different ways. If no part of your team is numbers-driven, then you're in trouble. So let's say you're starting a new project - say it's fraud detection in accounting or something - and you're going to roll out that model. This gets back to project management and planning and how you run these projects: as quickly as possible, you need to get to a benchmark dataset. I remember Leslie Kaelbling - I think we talked about her before; she's a professor at M.I.T. and I took a machine learning course she taught years ago. There was a project in the course and people had to pick projects, so in some ways it's similar to picking a project in your company. And she said one of the most important things is: if you don't have the dataset in hand now, pick a different problem, because you're going to spend the whole semester just gathering the dataset. I wouldn't necessarily give that exact advice to a company - if you don't have a dataset, maybe you do need to gather it - but if you can have that dataset ready to go and get people working on it and benchmarking right away, then you can get on this nice track where you can track progress. We had a whiteboard with a goal, you know, "Hey, in two weeks, this is the number we want to hit." Maybe it's an AUC of X, and everybody would just keep their eye on that number and we'd know what we were shooting for.


Lukas: And it would be on kind of a two-week horizon.


Peter: I think, yes, two to three weeks. Now, that's once you have a working model. Once you have a baseline model... one of the most important things you can do is just start with an MVP, get something basic in place so that you can get on that AUC improvement track. Once you can do that, you can create this momentum where the team feels like there's progress. And I think for project planning, it then becomes more clear. So once something has shipped, and assuming it's tied to a business metric, where you improve AUC and then you see revenue increase or users increase or something, it becomes very clear what impact your team is having. And I think this solves that backchannel chatter issue as well. Part of the PM's job here is to keep people moving towards that objective, but then also to communicate it to the rest of the company. So good weekly status emails, getting into the company update emails that go out to everybody, and making it very clear that, "Hey, we have these model improvements which are worth X or Y to our bottom-line metrics."
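
(For concreteness, here's a minimal sketch of the "get a baseline and put one number on the whiteboard" workflow Peter describes. Nothing in it comes from the conversation: the dataset, file name, column names, and the choice of logistic regression as the baseline are all illustrative assumptions.)

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical labeled sample: one row per transaction, numeric features,
# and an "is_fraud" label column.
df = pd.read_csv("transactions_labeled.csv")
X, y = df.drop(columns=["is_fraud"]), df["is_fraud"]

# Freeze the split once so every future model is scored on the same benchmark.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Simple baseline: the point is to get a number on the whiteboard fast,
# then track improvements against it every sprint.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_auc = roc_auc_score(y_val, baseline.predict_proba(X_val)[:, 1])
print(f"Baseline validation AUC: {val_auc:.3f}")
```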


Lukas: That makes total sense.


Peter: Yeah. So anyway, I don't know what Chris's comments were, but it sounds like shielding your team so they don't feel overly stressed by metrics can make sense. I think that's more important in deep R&D. So, for example, Google would have 20% projects, right? At LinkedIn, we did this as well, except it's not as simple as, "OK, one day a week of someone's time." Often what would happen is entire sets of people would be 100% on a 20% project. In ML, that's really what you need to do sometimes, and so shield those people. Definitely.


Lukas: I remember years ago, the DEF CEO told me that he basically gave out bonuses to his ML team based on the lift that they got on the projects they were working on. On one hand, it seems like a very fair management strategy, kind of pushing down the decision making to the folks doing the work. On the other hand, I've worked on many projects and been really surprised by not making any progress, or not making the progress that I wanted. So, you know, I think that could be a more stressful environment for sure, right?


Peter: Yeah. I think in the article I mentioned something related to this. I'm not sure if I mentioned OKRs explicitly, but I do talk about setting goals and setting objectives, and how that can go wrong. So I think this is a great example. I am a believer in that general framework. I think what becomes difficult... So for people who aren't familiar with product management measurement and OKRs, what typically happens is at the beginning of a quarter, everybody signs up for an OKR, and you're not supposed to sandbag, you're not supposed to set the bar low. You're supposed to have a reasoned, ambitious goal. Typically you have something like three OKRs per quarter, and it might be increase user signups by 20%, increase revenue per user by, you know, 10%, something like that, and they may be a little more granular, especially in a larger company. It may be something like increase search relevance, as measured by F1 or whatever, by 30%. In any case, those metrics are important, and I think the hard part, among the product leaders, is that you have to be very careful about how much latitude you give on pure ML metrics versus business metrics, because it's very easy for ML teams to optimize those. That's how you get this bubble where everybody is just doing R&D. And then I think what ends up happening is a lot of the business leaders and PMs see those OKRs and they just shrug and say, "I don't understand how that relates to the business."


Lukas: Right. Right.


Peter: Rewarding people for OKRs is pretty standard. And where that can go wrong... I think the YouTube example is one of the best known ones, where by all OKR metrics, YouTube has been succeeding wildly over the last five or six years, probably. But the downside is, when you manage to a single number, PMs become machines. They're like the parable of A.I. and paperclips, where you build an amazing paperclip optimizer with AI and then it destroys the world to make as many paperclips as it can. PMs are the same way: you give them a metric, they're gonna hit it, but there may be a lot of collateral damage. In the case of YouTube, there's a lot of misinformation and conspiracy theories, because they lead to clicks. So by the measure of engagement on YouTube, they're doing fantastic. But at what cost?


Lukas: Right. Right. I guess that's a good segue to another topic that you talked about in your paper that might be interesting to talk about here, which is building infrastructure to make your AI or machine learning actually scale. What kinds of recommendations do you have there?


Peter: So that's a deep, deep topic. I think there's a spectrum of companies. Companies like Google and Facebook are already deep into building ML and they have all the frameworks, so it's hard to say... The advice that makes sense for them may not make sense for other companies, I guess, is one key thing to be aware of. So realistically, I would break out a few different types of companies. For your hypergrowth technology companies that are more consumer facing, or enterprise SaaS apps that are in the cloud from day one, those are your modern technology stack companies. In many cases, those companies have good tracking in place, are using some modern frameworks and tools, they have Kafka, they have things like that, and they probably have good data ETL. Now, it varies quite a bit. Some companies just move so fast that things are duct taped together, right? Even for successful startups. But it tends to be the case that those companies at least have a lot of the raw pieces in place, so that when they do get to a stage where they want to use machine learning to make their products better, there's some amount of work, but it's maybe one year, or 18 months, of work to get things pretty solid. I think the bigger challenge is the more legacy or enterprise companies, where organizationally these very large organizations may have different data systems. It's typically hard in those companies to get access to data. People may even hold on to data and not want to give it up without 10 meetings. And even when you get the data, "getting the data" means different things, right? Someone giving you a static dump of data is very different from, "Hey, we want to do this in production. We want to do fraud detection. We need a Kafka feed. We need all this infra." And so for those companies, my recommendation has been don't try to reinvent the wheel. There are 20 or 30 companies... you see what Uber did with Michelangelo, what Salesforce is doing with Einstein. Everybody is trying to build their own ML platform internally. That typically happens when the top-down guidance says, "Hey, we need an AI strategy." Somewhere in those early discussions, they jump to the conclusion, "Oh, well, first we need to build our own A.I. framework, and let's give that project a name." This happens a lot in software development at big companies. It's project-name-driven development: they come up with a name, that'll be our infrastructure ETL system, let's go build that. And that might take two years. I don't know. Is this actually what you're seeing in big customers?
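
(As a rough illustration of the gap Peter draws between a static data dump and a production feed, here's a sketch of scoring events as they arrive on a Kafka topic. The topic name, message schema, threshold, and model file are hypothetical, and it assumes the kafka-python client plus a pre-trained scikit-learn-style model.)

```python
import json
import joblib
from kafka import KafkaConsumer  # kafka-python client

# Hypothetical pre-trained fraud model saved earlier with joblib.dump(...)
model = joblib.load("fraud_model.joblib")

consumer = KafkaConsumer(
    "transactions",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

# Score each event as it arrives, instead of waiting for a batch dump.
for message in consumer:
    event = message.value
    features = [[event["amount"], event["merchant_risk"], event["hour"]]]  # assumed schema
    fraud_score = model.predict_proba(features)[0, 1]
    if fraud_score > 0.9:                # illustrative alerting threshold
        print(f"Flagging transaction {event['id']} (score={fraud_score:.2f})")
```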


Lukas: Yeah. That's funny. You put my quote in the article, which was probably the tweet I got made fun of the most: that big companies shouldn't build their own ML tools. And it's like, "Okay, I am selling an ML tool," so I am incredibly biased. But I would say from my experience, it's just baffling how you go into enterprise companies in particular, they do this, and they build so much infrastructure they could integrate cheaply or for free. And it's actually funny, because you go into Google or Facebook, which have been around longer, and they actually pull in a lot more open source infrastructure than companies less far along. So yeah, it's always surprising, and I think part of it is actually that folks want to... building ML infrastructure inside a company is a little bit of a career development path, or...


Peter: Yeah. I totally agree. I think a big part of it is incentives. If you think about it, and this isn't just Silicon Valley, I think it's all over. You know, PMs might be rewarded for OKRs and hitting product metrics, and a lot of engineers are rewarded for releasing a new open source framework, or for giving a talk about some new infrastructure piece they built that everybody in the company is using. And so I think engineering and product leaders really need to think about what they reward. One way to think about it is to reward leverage as well. If I were in a situation where somebody made the choice to use an open source system, or even to use a vendor, and they delivered ahead of schedule and everything's working, you need to find a way to reward that, as well as just rewarding, "Hey, I did an 18-month sprint to build something that is not as good as what I could get off the shelf."


Lukas: Yeah. Totally.


Peter: I think the hard part is that when you roll out to customers, you also deal with the engineers on the ground. And I think there are good reservations there: sometimes using a third party thing can make people uncomfortable. They don't feel like they can adapt it or change it to fit all their needs, and so that's where I think a lot of the frameworks need to be really responsive to what a customer needs, and flexible. Because I think we've all been in a situation where you use some third party thing and it's too rigid, and eventually it causes a lot of headaches.


Lukas: And it does seem like a lot of the tools and infrastructure that come out are built by engineers coming out of Facebook and Uber and others. They might not actually realize the different needs that other companies might have.


Peter: Yeah. So I think when it comes to the infrastructure side, that's another common pattern. So I worked with Jay Kreps, who is the CEO of Confluent. He was originally my engineering partner back at LinkedIn, when we were building some of the data products. We had built a number of these things, and his work on a lot of the infrastructure pieces, which eventually became Kafka and other open source projects, grew out of, "Hey, we've built this four or five times now. I think we should abstract this out." That's a very different approach than saying, "Hey, we've never done this, but let's go design what we think the right thing is and then build this abstract platform." So I agree with him; I tend to think frameworks that grew out of real experience tend to be better. So if you're selecting, as a product manager or engineering leader, make sure you know the origin of the framework. And ideally, if you're at one of these companies building this yourself, one of the best things you can do is just see more problems and map what you're doing to those customer problems.


Lukas: We always end with two questions, and I'm wondering how you're going to answer these. I think they relate to a lot of the stuff we talked about, but here's my first one: what do you think is the topic in data science or machine learning that people don't talk about enough, the underrated thing that, in your experience, matters more than the time people spend thinking about it?


Peter: I think it's actually constructing good benchmarks. If I were to look at the teams we talked about that are struggling or having trouble, I would say nine times out of ten, they haven't done the hard work to construct a crisp, clean, precise benchmark and an evaluation of how well their model is doing. What often happens is people have these notions, "Hey, I'm going to build a recommender. How do I build a recommender?" and "Oh yeah, we'll get this data," and they just start building the model. And then after the fact, maybe they label some sample data and say, "Oh, this is my gold standard." And that's maybe the better case, I'd say. A lot of the time people don't even do that. They just launch the thing, and then it becomes very hard, or they may use proxy metrics: did it increase lift? Did it increase CTR? Maybe an A/B test, and A/B testing is not a benchmark, basically, I would say. So build a benchmark, be rigorous about it, and if at all possible... because the other thing that happens is when things aren't working... Say you're six months in and your model isn't working and you don't know why. You need that benchmark so that you can debug what's going on. And if you don't have it, you're going to flail.
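
(A small sketch of the kind of frozen, gold-standard evaluation Peter is arguing for, as opposed to proxy metrics or A/B tests: a fixed, hand-labeled file that every candidate model is scored against. The file name, label column, and metric choices are illustrative assumptions, not anything from the conversation.)

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Hand-labeled gold standard, frozen once; never train on it.
GOLD = pd.read_csv("gold_standard_v1.csv")

def evaluate(model, threshold=0.5):
    """Score a candidate model against the fixed gold-standard set."""
    X, y = GOLD.drop(columns=["label"]), GOLD["label"]
    scores = model.predict_proba(X)[:, 1]
    preds = (scores >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y, scores),
        "precision": precision_score(y, preds),
        "recall": recall_score(y, preds),
    }

# Usage: run every model version through the same harness so regressions
# show up against one benchmark rather than a shifting proxy metric.
# print(evaluate(baseline_model))
# print(evaluate(candidate_model))
```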


Lukas: It's funny, we had a guy from OpenAI on the podcast a while back, and he said the exact same thing. He was mentioning that the Dota team spent about six months building a hand-tuned benchmark, a baseline rule-based system. They really spent six months actually building the benchmarks.


Peter: Yeah. I don't know the exact amount of time, but at our startup, I would almost say we spent 20 to 30% of our time on that kind of thing.


Lukas: I think it totally makes sense. The second and last question, and this is a good one for you, actually: of all the ML projects you've seen, consulted on, or been a part of, what do you think is the hardest part of getting them from a model to a production deployed model? Where's the biggest bottleneck there?


Peter: So from a working model?


Lukas: No, I would say from conception, from "here's the goal," to a deployed model that people can actually use.


Peter: I would say that the two... actually, I'd say there are three hard parts; it's hard for me to pick just one. One hard part is around actually getting the data. A lot of companies... you were asking where we got the data for the startup. Even within companies that seemingly have a lot of data, getting the dataset you need to train the model is often really costly and hard, and there may be a lot of internal roadblocks. So that's one place I've seen people stumble. The other hard part about getting things to production is the modeling approach itself. A lot of people, I see this all the time in blogs and on Twitter, say, "Oh, the modeling doesn't actually matter that much. It's all these other auxiliary things, and it's commodity." I don't believe that. I don't believe modeling is a commodity at all. I think it's actually really hard to get models to work correctly, especially when you move beyond a toy or benchmark dataset to real world data. Building something robust that works at scale is really difficult. So I'd say the second part is the hard elbow grease and research work of getting a working model; it's usually harder than people think. And then the last part, I would say, is actually getting buy-in from executives. This is a long journey to getting something out to production. If you're running your own company and you're the CEO, that's one thing, and maybe you can push it through because you think it's really important. But if you're in any large organization, there are a bunch of stakeholders, a bunch of business units, a bunch of engineering teams juggling resources. I think a lot of people struggle just to convince companies that it's actually a priority to push out their machine learning effort. And so that's where I'll go back and plug that product management article that I wrote, because I think it's really important. You need somebody who's your advocate. It could be your head of data science or V.P. of data, or it could be a product leader who is driving AI. But if you don't have someone at the exec table who believes in this and is really supporting it and pushing it, a lot of these things will die on the vine.


Lukas: Interesting. Cool. Thank you so much. That was a lot of fun.


Peter: Yeah, man. I like your garage. [Both laugh]


Lukas: Thanks
