Ray's co-creator Robert Nishihara on the state of distributed computing in Machine Learning
The story of Ray and what led Robert to go from reinforcement learning researcher to creating open-source tools for machine learning and beyond
BIO

Robert is currently working on Ray, a high-performance distributed execution framework for AI applications. He studied mathematics at Harvard. He’s broadly interested in applied math, machine learning, and optimization, and was a member of the Statistical AI Lab, the AMPLab/RISELab, and the Berkeley AI Research Lab at UC Berkeley.


https://www.robertnishihara.com

https://anyscale.com/

https://github.com/ray-project/ray

https://twitter.com/robertnishihara

https://www.linkedin.com/in/robert-nishihara-b6465444/

Topics covered:

0:00 sneak peek + intro

1:09 what is Ray?

3:07 Spark and Ray

5:48 reinforcement learning

8:15 non-ML use cases of Ray

10:00 RL in the real world and common uses of Ray

13:49 Python in ML

16:38 from grad school to ML tools company

20:40 pulling product requirements in surprising directions

23:25 how to manage a large open source community

27:05 Ray Tune

29:35 where do you see bottlenecks in production?

31:39 An underrated aspect of Machine Learning

TRANSCRIPT

Robert:

We have all these machine learning researchers, some of them with backgrounds in math or statistics, things like that. And they want to be spending more of their time thinking about designing better algorithms or better strategies for learning, but actually quite a few of them are spending quite a bit of time on the tooling side, building better tools or scaffolding, or doing fairly low-level engineering to speed things up or scale things up.


Lukas:

You're listening to Gradient Dissent, a show where we learn about making machine learning models work in the real world. I'm your host Lukas Biewald. Robert Nishihara is the CEO of the company that makes Ray, a high-performance distributed execution framework for AI applications and more. His Ray project came out of his work at the RISELab at UC Berkeley. And prior to that, he studied mathematics at Harvard. I'm super excited to talk to him. So I'm curious about how Ray came to be and how you think about it, but maybe before we go into that, if you could just kind of give a high-level overview of what Ray does and why people use it.


Robert:

At a high level, the underlying trend that is giving rise to the need for Ray is just the fact that distributed computing is becoming the norm. More and more applications, especially applications that involve machine learning in some capacity, need to run on clusters. They're just not happening on your laptop or a single machine. And the challenge is that actually developing and running these distributed or scalable applications is quite hard. When you're developing these scalable applications, you're often not only building your application logic, like the machine learning part. You're often also building a lot of infrastructure or scaffolding to run your application.


And we're trying to make it as simple as developing on your laptop, essentially to let people focus just on building their application logic, and then be able to run it anywhere from your laptop to a large cluster. And take advantage of all the cluster resources, but without having to be experts in infrastructure.


Lukas:

What's the real challenge of making that work? Speaking as probably more of an ML person than a DevOps person, and they'll probably kill me for even thinking this, but conceptually it seems like a pretty simple idea. So what makes it hard to actually abstract away the underlying distributed system from the ML logic?


Robert:

A lot of the challenge is actually being general enough. If you have a specific use case in mind, of course you can build a specialized tool for that use case. But then the challenge is that it often doesn't generalize to the next use case you have. Maybe you build some setup or some infrastructure for training neural networks at a large scale, but then you want to do reinforcement learning and all of a sudden you need a different system, or all of a sudden you want to do online learning and it's different. The challenge is really trying to anticipate these use cases, or without even knowing what the future use cases will be, trying to provide the infrastructure that will support them. Ray achieves this by being a little bit lower level than a lot of other systems out there. So take tools like Apache Spark, for example.


The core abstraction that Spark provides is a dataset, and it lets you manipulate datasets. So if you're doing data processing, that's the perfect abstraction. If you look at something like TensorFlow, TensorFlow provides the abstraction of a neural network. So if you're training neural networks, it's the right abstraction. What Ray is doing is it's not providing a dataset abstraction or a neural network abstraction or anything like that. It's actually just taking more primitive concepts like Python functions and Python classes and letting people translate those concepts into the distributed setting. So you can take your Python functions and execute them in the cluster setting, or take your Python classes and instantiate them as services or microservices or actors. And in some sense, the generality comes from the fact that we are not introducing new concepts. Instead of forcing you to coerce your application into new concepts, we're taking the existing concepts of functions and classes, which we already know are quite general, and providing a way to translate those into the distributed setting.
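[For readers who haven't used Ray, the translation Robert describes looks roughly like this. A minimal sketch: the `square` and `Counter` bodies are made up for illustration, but `ray.init`, `@ray.remote`, `.remote()`, and `ray.get` are Ray's core API.]

```python
import ray

ray.init()  # connect to (or start) a Ray cluster

# An ordinary Python function becomes a remote task.
@ray.remote
def square(x):
    return x * x

# Calls return futures immediately; the work runs across the cluster.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

# An ordinary Python class becomes a stateful actor, i.e. a long-lived service.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()                   # instantiate the actor in the cluster
print(ray.get(counter.increment.remote()))   # 1
```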


Lukas:

So what's something that would be painful to do in Spark, but it would be easy to do in Ray?

Robert:

For example, training neural networks, or building AlphaGo or building an online learning application or deploying your machine learning models in production. Those are some examples.

Lukas:

Now let's take building AlphaGo as an example. Maybe this is going to annoy you, because it's a super naive question, maybe in a challenging way. But AlphaGo seems almost like a very embarrassingly parallel learning problem. It seems like you could run a lot of learning at once and combine the results. Why wouldn't that work on Spark, for example?


Robert:

There are a lot of subtleties. So if you're implementing something like AlphaGo, yes, you are running a lot of simulations in parallel, and that's one part of it. You're also doing a lot of gradient descent and actually updating your models, and these things, each of them individually, are embarrassingly parallel perhaps. But one of them is happening on GPUs and one of them is happening on CPU machines. There's this tight communication loop between the two, where you take the rollouts and the Monte Carlo tree search that you do and pass those over to the training part, and then take the new models from the training part and pass those over to do the rollouts. So there's a lot of this sort of communication. And a natural way to express this is to have these kinds of stateful actors or stateful services that hold these machine learning models that are getting updated over time.

And the way it's often natural to express these things is with stateful computation, which is different from what Spark is providing. So those are a couple examples.


Lukas:

Is there something specific about reinforcement learning? Because that is actually your background, right? And it seems like that might've been some of the impetus for making this. Is there something core to reinforcement learning, as opposed to supervised learning, that makes this more necessary?


Robert:

I think the one reason we focused on some reinforcement learning applications initially with Ray is that... well, beyond the fact that it's an exciting application area, it's quite difficult to do with existing systems, right? So when DeepMind is building AlphaGo or when OpenAI is doing Dota, they're not doing it on top of Spark. They're not doing it on top of just TensorFlow. They're building new distributed systems to run these applications. And part of the reason for that is that reinforcement learning combines a bunch of different computational patterns together. Yes, there's the training part with the gradient descent. There's also these embarrassingly parallel simulations that are happening. There's also some kind of inference or serving, where you take the models and you use them to do the rollouts.

In some cases you also have a data processing component, where you are storing these rollouts and then using them later. So it combines a lot of different computational patterns together, and it ends up being tough for specialized systems. This is an area where you benefit from having a more general-purpose system.


Lukas:

A lot of that seems like it would overlap with supervised learning, but it sounds like there are more things going on in parallel that are different. Why do you think reinforcement learning specifically requires a totally different framework?


Robert:

Well, I don't think it's just reinforcement learning that requires a different framework. I think when people build new applications and want to scale them up, they often end up having to build new systems. For example, with companies that we see wanting to do online learning, there's a training component, where you're learning from your interactions with the real world, but you're also taking those models that you're training and doing inference and serving predictions back to some application in the real world, right? To do this there's often a streaming component, where you have data streaming in, a training component, where you're updating your models, and then a serving component, where you're sending recommendations or predictions or whatever back out to the real world.

And to do this, again, it's not just TensorFlow, it's not just a stream processing system, and it's not just a serving system. People end up building new systems to do this. But this is also an area, because of Ray's generality, where in some of the coolest applications we see, you can do the entire thing on top of Ray.


Lukas:

I think one thing you mentioned to me, or maybe it was someone on your team mentioned to me in the past, is that a lot of folks are not even doing machine learning on top of Ray.


Robert:

Yeah. There's a mixture. Certainly a lot of people are doing machine learning, but a lot of people are Python developers. They're developing their application on their laptop and it's too slow. They want to scale it up, but they don't want the investment of needing to build a lot of infrastructure, and they're looking for a simple way to do that. So you're absolutely right. A lot of these people, even if they're not doing machine learning today, do plan to do machine learning. And you start to see machine learning being integrated into more and more different kinds of applications. So often our users are not just training a neural network in isolation. Sometimes they are, but they're often not just deploying the model in production either. They often have machine learning models that are integrated in interesting ways with other application logic or business logic.


Lukas:

And is your thought that all of that logic will run on top of Ray, or just the trickiest bits? The machine learning parts are the most complicated, right?


Robert:

Yeah. Well, to be clear, Ray integrates really nicely with the whole Python ecosystem. So our users, they're using TensorFlow, they're using PyTorch, they're using Pandas and spaCy. This is part of why Python is so great, right? It has all these great libraries. So our users are using these libraries, and what they're using Ray for is to scale up their applications and to run them easily on clusters. It's not replacing all of these things. It's letting people scale them up.


Lukas:

For sure. That makes sense. I guess slightly switching gears, I feel like a lot of people have been talking about reinforcement learning for quite a long time, and there are such evocative examples, like Go. I absolutely love those examples, but I think maybe a knock on it has been that it's not used in industry as much as supervised learning. Is that consistent with your experience, or do you see reinforcement learning catching on inside of more real-world applications?


Robert:

It's certainly not being used to the extent that supervised learning is being used. I think a lot of companies are exploring reinforcement learning or experimenting with it to see. I think the areas where we see reinforcement learning having a lot of success are in optimizing supply chains and these kinds of operations areas, some financial applications, recommendation systems, and things like that. Of course, that's one application area that Ray supports really well, but it's far from the main focus of Ray or the only focus.


Lukas:

Sure. I guess it's interesting, because online learning I would view as more best practice, and I think lots of companies are at least trying to do online learning. Do you have any way of knowing the sort of volumes of the different kinds of applications, or do you have any sense of the relative ... Just from the tickets that come in, do you have any sense of what are the ... Can you stack rank the most common uses of Ray? Is that even possible?


Robert:

I don't actually know the exact breakdown. There are certainly a lot of people doing stuff on the more machine learning experimentation training models. There's a number of people building their companies' products or services and running them on top of Ray, building back ends for user-facing products. A lot of people who are ... It's really just distributed Python, right? Independent of machine learning. Then there are a number of people, and, actually, this is a really important use case, a number of people building not just the end applications, but actually building libraries that other people will use, and scalable libraries.

That's exciting because Ray, it's not just good for building applications. It's actually great for if you want to build a distributed system, because it is low enough level that if you were to build a system or library for machine learning training or data processing or stream processing or model serving, it can let you focus on just that application logic, right? Just on your model serving application logic or your stream processing application logic, and then Ray can take care of a lot of the distributed systems details that you would normally have to take care of, like scheduling or handling machine failures or managing the resources or transferring data efficiently between machines, right?

Typically, if you want to build, say, a system for stream processing, you would have to build all of that logic yourself, not just the streaming logic, but also the scheduling and the fault tolerance and so on. By taking care of that inside of Ray, we can let library developers easily build these distributed libraries, and that can all give rise to this kind of ecosystem that a lot of other developers can benefit from.


Lukas:

Do you think, ultimately, it subsumes what Spark does, or does it live alongside it for different use cases?


Robert:

I think Spark is the kind of thing where, of course, if Spark were being ... Essentially, what we would like is if Spark were to be created today, instead of back when it was created, and if Ray is living up to its promise and really delivering on what we're trying to do, then our hope is that Spark would be created on top of Ray and that for developers who want to build things like Spark, then Ray would make them successful or enable them to do that more easily.

So that's a little bit of how ... Ray, it's a lower level API. One analogy is if you compare with Python, Python has a really rich ecosystem of libraries. There's Pandas and NumPy and so on. Spark is a bit more like Pandas, and Ray is a bit more like Python, if that makes sense.


Lukas:

Yep. That makes total sense. That kind of reminds me of another question that I wanted to ask, which is: is it important to you to support other languages? Do you see it as essential to ... It's funny, because we've had a couple of folks recently on this podcast who have just been surprisingly negative on Python. It's actually not my most native language, but I love it for training machine learning, but it seems like maybe there's some sense that it's slow or hard to scale. Where do you land on that?


Robert:

Well, it is hard to scale, and that's something we're trying to address. You're right, of course. It can be slow, although the way that libraries like TensorFlow and NumPy and Ray deal with this is that the bulk of the library is written in C++ or C, and then they provide Python bindings.

So Ray, like you mentioned, is actually written in a language-agnostic way. The bulk of the system is written in C++, and we provide Python and Java APIs. Of course, Python is our main focus. That's the lingua franca of machine learning today, and it's one of the fastest growing programming languages. So it makes sense to focus there. But at the same time, a lot of companies are using Java in production very heavily, and you have companies where, a lot of times, their machine learning is in Python and their business logic is in Java. So being able to have a seamless story for how to invoke the machine learning from the business logic is actually a pretty nice feature of Ray. Down the road, we do plan to add more languages.


Lukas:

Setting aside your pragmatic CEO hat of wanting to support the languages that people actually want, do you think that Python will stay the lingua franca of machine learning for 20 years? Do you have any set feeling on that?


Robert:

I don't think I have any particularly special insight here. I could see that going either way.



Lukas:

But I guess you've dug deep, though, and I feel like sometimes the people building the tools get more frustrated with Python than the people using the tools. I'm not sure.


Robert:

Certainly there are a lot of newer features in Python that are making people's lives easier. There's more happening in terms of typing and things like that. You can really do anything with Python. It's extremely flexible. When we design APIs, for example, pretty much any API that we can imagine wanting for Ray, we can just implement in Python. Of course, when we say, "Okay, what should the API be in Java?", a lot of times you run into limitations with the language. You can't just have any API that you want.



Lukas:

But maybe that flexibility trades off with fundamental constraints on speed, or do you not feel that way?



Robert:

It trades off with something. I don't know if it's the performance or something else.



Lukas:

Interesting. Okay. To switch gears again, another thing that I wanted to ask you about is when you started grad school, were you imagining that you'd become a person that runs an ML tools company? Were you trying to become an ML researcher? What were you thinking at that moment? Then what were you thinking when you started this project? Did you imagine that it could become a company, an important open source project, or was it to meet a need that you had at that moment?


Robert:

Yeah, that's a great question. So when I started grad school, I was very focused on machine learning research, and I was actually coming from more of the theoretical side, trying to design better algorithms for optimization or learning or things like that. This was definitely a change in direction, although it was gradual. You have all these machine learning researchers, some of whom have backgrounds in math or statistics, things like that, and they want to be spending more of their time thinking about designing better algorithms or better strategies for learning, but actually, quite a few of them are spending quite a bit of time on the tooling side, building better tools or scaffolding, or doing fairly low-level engineering to speed things up or scale things up.

We were in this situation where we were trying to run our machine learning experiments, but we built these tools over and over. These were always one-off tools that we built just for ourselves and maybe even just for one project.

At the same time, we were in this lab in Berkeley, surrounded by people who had created Spark and all these other highly successful systems and tools, and we felt there had to be something useful here that we could build; we knew the tools that we wanted. And so we started building those, and the goal from the start was to build useful open source tools that could make people's lives easier. When we had the idea for Ray initially, we thought we would be done in about a month. And of course you can get a prototype up and running pretty quickly, but to really make something useful, to take it all the way, there's quite a lot of extra work that has to happen. So that's how we got into it.


Lukas:

When did you feel like, "Okay, this could be a company"?


Robert:

The scope of what we wanted to do was pretty large from the start. We didn't envision this as just a tool for machine learning or just a tool for reinforcement learning or anything like that. We really thought this could be a great way to do distributed computing and to build scalable applications, combined with the fact that, from where we were sitting, it seemed like all the applications, or many of the applications, are going to be distributed. So what we wanted to build was quite large from the start, and to really achieve that, it's an effort from a lot of different people, and a company is a natural vehicle for these kinds of large projects.



We were seeing a lot of adoption, a lot of people using it, and a lot of excitement, and we thought it made sense as a business. Combined with the fact that it was a problem that we thought was important and timely, those are the factors that led to us wanting to start a company.


Lukas:

And how's the transition been from grad student to startup CEO?


Robert:

It's really exciting. As you can imagine, there's a lot of differences and there's a lot to learn, that's for sure. But I'm working with really fantastic people and even in grad school, like before we started the company, we were working with a great group of highly motivated people. And we had already started thinking about some of the same kinds of problems of how do we combine our efforts to do something that adds up to something larger and how do we grow the community around Ray. It was a pretty smooth or gradual transition.


Lukas:

Have there been users or customers that have pulled your product or requirements in surprising directions?


Robert:

Yes, absolutely. I can start with one example on the API side. So actually, some of the initial applications that we wanted to support, like training machine learning models with a parameter server, or even implementing some reinforcement learning algorithms, those actually weren't possible with the initial Ray API. I mentioned that Ray lets you take Python functions and classes and translate those into the distributed setting. When we started, it was actually just functions, not classes. So we didn't have the stateful aspect. And that was pretty limiting.

Just functions are pretty powerful, you can do a lot with functions, but one day we just realized we were doing all sorts of contortions to try to support these more stateful applications. And so at some point we realized, "Oh, we really need actors. We really need this concept of an actor framework." Once we realized this, I remember Philip and I mapped it out and divided up the work and tried to implement it really quickly, and that just opened up a ton of use cases that we didn't imagine before.

But there were still multiple steps to that. So when we first had actors, only the process that created the actor could invoke methods on the actor, could talk to the actor. At some point we realized we needed to have these handles to the actor that can be passed around to other processes and let anyone invoke methods on any actor. And when we introduced these actor handles, that just opened up a flood of new use cases. So there have been a couple of key additions like this, which really increased the generality and the scope of the kinds of applications we can support. But there haven't been too many changes. It's actually been a fairly minimal and stable API for quite a while. So there's that.
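[A minimal sketch of the actor-handle idea: the `ParameterServer` class here is hypothetical, but it shows how a handle can be passed to other tasks so that any worker, not just the creating process, can invoke methods on the same actor.]

```python
import ray

ray.init()

# A hypothetical stateful service holding shared model state.
@ray.remote
class ParameterServer:
    def __init__(self):
        self.weights = 0.0

    def apply_update(self, delta):
        self.weights += delta

    def get_weights(self):
        return self.weights

# A task that receives an actor handle and invokes methods on it.
@ray.remote
def worker(ps, delta):
    # Block until the actor has applied this worker's update.
    ray.get(ps.apply_update.remote(delta))

ps = ParameterServer.remote()
# Pass the handle to ten concurrent workers; all talk to the same actor.
ray.get([worker.remote(ps, 0.1) for _ in range(10)])
print(ray.get(ps.get_weights.remote()))  # ~1.0
```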

And I would say there are other important Ray users that have really pushed a lot and done a lot in terms of things like performance, improving performance, how can we keep making it better. Also on the availability side: they're running this in production during really mission-critical times, and how can we make sure that it's not going to fail, ever. And also the support for Java. That's something that came from the community, both initially adding the Java bindings as well as then doing a lot of refactoring to move a lot of the shared Python and Java logic into C++. Those are some examples. There have been pretty tremendous contributions from the community.


Lukas:

This isn't just a request, it's actually committing code-


Robert:

Absolutely. Yeah.


Lukas:

How do you think about managing like a large open source community? Like how do you do basic things, like make a roadmap when people are coming and going and have different opinions on what to do?


Robert:

It's a good question. And I wouldn't say that we have totally nailed it just yet, but we use a lot of Google Docs, a lot of design docs in Google Docs. We use Slack very heavily. So we have a Slack that anybody can join, and that's a good way to poll people and for users to ask questions, anything from the roadmap to just some error message that they're seeing, or asking if there's anyone else using Ray on Slurm or something like that. And then a number of other things: just before the pandemic, we were doing a lot of meetups, and we have this Ray Summit coming up this September, these kinds of events to really meet users in person or virtually and just get a sense of what people are working on, that kind of thing.


Lukas:

That's cool. Have you ever had this situation where someone submits a pull request and they obviously put in a ton of work, and you're just like, "Ooh, I just don't want to go there"?


Robert:

Yeah, that's certainly happened. And we try to get in front of that by having a design doc ahead of time. You don't want people to spend a huge amount of time on something like that if people are not on the same page about whether it's even desirable or not. I think a lot of the time, a lot of those conversations are happening over Google Docs and over the design docs, and that kind of pushback is moved earlier in the conversation.


Lukas:

Yes. But I feel like there's this knock on ML researchers from some people, definitely not any Weights and Biases employees, but I think some people that I've met feel like ML researchers' code is low quality, maybe because, for instance, when they want to get the paper published, then they wash their hands of it, and so they don't actually see the maintenance life cycle and they don't learn to architect things well. I think it's interesting that you started as an ML researcher, and actually more of a theoretical ML researcher, which I hear some people think are the worst culprits in this domain, and you went to this very architecture-heavy, tricky kind of programming project. Has it been a transition for you to up-level your skills around this, or have you learned stuff along the way, or do you feel like you took to it naturally?


Robert:

Oh, I've definitely learned a lot along the way. And I think a lot of this was Philip, who I work with, one of my co-founders. He's been building systems for quite a long time and has a lot of expertise in this area. So I think maybe there was less of a transition for him. Combined with the fact that we were in the AMPLab and RISELab at UC Berkeley, where people had created Spark, had created Mesos, a lot of the leading people in distributed systems. And Berkeley also has a long tradition of creating great open-source software. So if we were doing this in isolation, it would probably look very different, but we were in this great environment with all these experts we could really learn from. So I think that played a big role.


Lukas:

What a great lab. It's amazing how many amazing projects have come out of it.


Robert:

Yeah. And of course, you're probably familiar with Caffe and deep learning frameworks like that also coming out of Berkeley at the same time, or actually Caffe was a little earlier. One advantage of machine learning researchers building tools is that they know exactly what problem they're trying to solve. So there are some advantages there as well.


Lukas:

Totally. I should say, you maybe can't say this, but we can definitely say it: a lot of our customers are huge fans of Ray, and one thing that a lot of our customers use and really like is Ray Tune. And I'm curious specifically how that came about and what your goals are for that.


Robert:

Our goals there are to build really great tools, and ideally the best tools, for hyperparameter tuning. And hyperparameter tuning is one of these things which is pretty ubiquitous in machine learning. If you're doing machine learning and training a machine learning model, chances are you're not just doing it once, but actually a bunch of times, trying to find the best one. And this is something, again, where a lot of times people are building their own tools. And you can write your own basic hyperparameter search library pretty quickly. It's a for loop if you're doing something simple. But these experiments can be quite expensive. And if you're trying to make it more efficient or trying to speed up the process, there's quite a bit you can do in terms of stopping some experiments early, or investing more resources in the more promising experiments, or sharing information between the different experiments, like with population-based training or HyperBand or things like that. So there's quite a lot of stuff you can do to really make the experiments more efficient. And we're trying to just provide that off the shelf for people who want to do that at a large scale, in a way that's compatible with any deep learning framework they're using and just works out of the box.
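[A rough sketch of what this looks like with Ray Tune: the training function is a stand-in for a real training loop, and exact API details vary across Ray versions.]

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Stand-in training function; a real one would train a model
# and report a validation metric back to Tune each iteration.
def trainable(config):
    score = 0.0
    for step in range(10):
        score += config["lr"]        # fake "metric" for illustration
        tune.report(score=score)     # report progress so Tune can stop bad trials

analysis = tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # the search space
    num_samples=20,                              # number of trials to run
    # Stop unpromising trials early instead of running every trial to completion.
    scheduler=ASHAScheduler(metric="score", mode="max"),
)
print(analysis.get_best_config(metric="score", mode="max"))
```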


Lukas:

And so is part of the vision there to show people some of the libraries that you think should be built on Ray so that they get inspired to build more libraries?


Robert:

Yes.


Lukas:

How do you think about what libraries you should build as a core part of your project, and which ones should be third-party?


Robert:

So in the long run, most of the libraries will be built by third parties. But I think it's important to start off with a few high-quality libraries that address some of the big pain points that people have right away and are the kinds of things that people would want to use Ray for, or would have to build themselves if we didn't provide a library. We essentially started with scalable machine learning, trying to provide libraries that let people address some of their bigger pain points. And then for everything else that we're not providing, they can just build it themselves using Ray. Or hopefully, in the longer run, other people will build libraries that really flesh out this ecosystem.


Lukas:

When you look at machine learning projects that you've been part of or that you've seen and you look at the whole arc from conception and experimentation to deployed and useful and production, where do you see the most surprising bottlenecks?


Robert:

So one obvious aspect is the bottleneck around scaling things up. This is one of the core things we're trying to address with Ray. One less obvious bottleneck is interfacing machine learning models and your machine learning logic with the rest of your application logic. And one example where this comes up is with deploying or serving machine learning models in production. So web serving has been around for a long time, and you have Python libraries like Flask which let you easily serve webpages and things like that. So what's the difference between regular web serving and serving machine learning models? Superficially, they might seem pretty similar. There's some endpoint that you can query. And in fact, when people are deploying machine learning models in production, they're often starting with something like Flask, wrapping their machine learning model in a Flask server.
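[The Flask-wrapping pattern Robert describes typically looks something like this; `load_model` is a hypothetical helper standing in for however you deserialize your trained model.]

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
model = load_model("model.pkl")  # hypothetical: load the trained model once at startup

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # pre-processing of the raw request
    prediction = model.predict([features])[0]   # one-at-a-time inference, no batching
    return jsonify({"prediction": prediction})  # post-processing into a JSON response
```

[This works for a single model, but it is exactly the setup that strains once you want batching, incremental rollouts, or model composition.]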

But you run into a lot of pain points there as you start to deploy more models, or you start to want to batch things together, or you want to incrementally roll out models or roll back, or compose models together. At the other end of the spectrum, you have specialized systems or tools for machine learning model serving, things like TensorFlow Serving. I think there's a PyTorch one as well. And the challenge with a lot of these frameworks for machine learning model serving is that they're a little too restrictive. Often it's just a neural network behind some endpoint. It's a tensor-to-tensor API, so a tensor going in and then a tensor going out. Often what you want is to have the machine learning model as part of the serving logic, but actually to have other generic or application logic surrounding that machine learning model, whether that's doing some processing on the input or some post-processing on the output, and really combining these things together. So that's one pain point I've seen quite a bit. And we're actually building a library called Ray Serve on top of Ray to really get the best of both of these worlds.


Lukas:

Cool. That's awesome. Okay. My final question is, when you look at machine learning broadly, research but also production and deployment, all these things, what's a topic that comes to mind as something that people don't pay enough attention to, that's more important than the credit it gets?


Robert:

So I'm not sure if this is underrated, but one area that I think has a ton of potential is using natural language processing to help people ask questions about data, and about all the information and data out there. For example, if you Google a simple fact, like what year was George Washington born, or what's the capital of California, you immediately get an answer. And so it becomes easy and natural for people to ask interesting questions about facts and to realize that there's some ground truth out there. Now, if we can provide similar tools that let people ask lots of questions about datasets, not simple facts that you can look up in a database, but things that have to be inferred by performing some regression or some filtering or some basic statistics, like what's the correlation between years of school and income, the hope is that it would become very natural for people to ask questions about data and to get in the habit of trying to glean information from the datasets out there. I think that's something that's becoming more possible, and it will be very exciting.


Lukas:

What a great answer. I love it. Thank you. Awesome. So I think one of the things that's coming up shortly that I'm really excited about is your Ray Summit. Can you tell me a little bit about what you're hoping to accomplish there and who should come to it?


Robert:

Absolutely. And also, really excited for your talk there as well. So if you're interested in learning more about how companies, ranging from tech companies like Microsoft or AWS, to companies in finance like JP Morgan and Morgan Stanley or Ant Financial, to startups and researchers, are using Ray in production to scale up or speed up their machine learning or their Python applications, this is going to be the best place to do that. And we're really excited. We're going to be hearing from some of the leading figures in the machine learning ecosystem, people like Michael Jordan, people from DeepMind and Google Brain, as well as prominent people in the Python community, like Wes McKinney, who created pandas, as well as tons and tons of companies using Ray to really do machine learning or scale up their applications. It's an opportunity for the Ray community to see more of what everyone else is doing, to get to know each other better, and to really showcase some of those use cases.


Lukas:

Nice. I'm really looking forward to it. I'm definitely going to be there. And yeah, I think I'm giving a talk.


Robert:

Yes, you are and we're super excited about that.


Lukas:

Awesome. Thanks so much. We appreciate it.


Robert:

Thank you.
