Friday, July 24, 2015

A Physicist and an Engineer Go Looking for Data Scientists

In early April we had a visit from two data scientists at Engage3, who came to speak in our Alumni Seminar Series. Toward the end we had some particularly interesting discussion about the value of a physicist as a data scientist. I include that part of the conversation here, addressing the question: you need a lot of machine learning, probability and statistics, subjects never taught in physics departments, so why do you hire physicists?

Anup Doshi, Director of Data Science at Engage3
As background, in late March we had a physics department colloquium by the founder of Engage3, Ken Ouimet. According to the Engage3 website, "Ken earned a BS in Chemical Engineering from UC Davis and went on to UC Santa Barbara to pursue his PhD work in Chemical Engineering and Theoretical Physics. While studying Statistical Physics he realized he could utilize these same principles to model retail markets and optimize retail pricing decisions."


The colloquium title was "The Physics of Shopping and Algorithmic Trading in Consumer Marketplaces." The visit by Ouimet led to our meeting with the Engage3 data scientists. They were James Holliday (PhD in physics from UC Davis in 2007) and Anup Doshi (PhD in EE).


We pick up the discussion toward the very end of Holliday's presentation.


Holliday: We’re trying to hire people. I tend to look for physicists or people who have gone through a physics education. And the reason I do that is I believe that physicists have a way of solving problems and approaching problems that’s unique. I love the way that we’re taught to take a problem, a complex problem that we’ve never seen before, and break it down into fundamental blocks: things that we have seen before or things that we understand very well. And it can be a really complicated thing and maybe we have to make some approximations, but the ability to look at something that we have not seen before and come up with a way to solve it – it’s just wonderful and I think it’s unique to physics. 

LK: I’m curious for an engineer’s perspective. We can tell ourselves stuff like this all the time but I’m a physicist. You [Doshi] have physicists working for you, and you’re in the market for hiring talent. So I’m also really interested in your perspective on what’s valuable about a physicist.

Doshi: Sure. Just to preface that question: my background – I did a PhD in Electrical Engineering and since then I’ve been working in this field of data science for a number of years now. I’ve had bosses that are physicists, colleagues that are physicists, and folks that are working for me as physicists, and I always enjoyed working with all of them. I think the key qualities that physicists bring to problem solving are the ability to approach a problem from first principles, mathematically model a problem from first principles, and then follow in some sense a scientific method to get all the way through the problem. That's formulating the problem, doing background research, modeling, generating a hypothesis, doing experiments, doing tests, skills from high energy physics like Monte Carlo simulations, for example, solving great, tough optimizations, going to the whiteboard and actually writing out the optimization problem; working out better ways to solve this. And then beyond that just getting the results and interpreting the results and then communicating those back. Those skills are unique to I think the mathematically-oriented, scientific person, like physicists. You don’t get that necessarily in any other discipline, that I've seen.

Holliday: Exactly. I like to look for the physicists when I'm hiring data scientists. One thing that I do when I’m interviewing people is I’ll throw a problem – I’ll throw it very quickly – I’ll throw a very difficult problem that I don’t expect people to necessarily be able to solve; I don’t give them all the information they need to solve that because I want to see if they can ask the right questions to understand the problem to make progress on it. And I want to see how they think about it as they’re pushing forward; to see if they can work in those situations. The ones we wind up hiring tend to be ones from the mathematical, the scientific-oriented fields, that can think through the problem. So I would encourage everyone as you’re pursuing science or whatever, make a habit out of asking for clarifying information if you don’t understand something. That is the real world: sometimes you’re not given all the information you need; you need to get that information to make progress.

Questioner: Why don’t you guys hire from mathematicians rather than physicists?

Holliday: We have talked to a few statisticians, and we hired somebody recently with a statistics background; a statistician. 

Questioner: I think you’re thinking biased because for data analysis you need a lot of statistics and machine learning and probability, which are the courses that are never taught in physics departments. You have to spend a lot of time investing in some people to teach them those courses.

Somebody else: I think if you’re a physicist you’re assumed to know probability and statistics; that is the basic –

Original questioner: A little, but not as much as mathematicians need statistics. I am working on complexity, but in all interviews I say that I have a statistics background with probability and machine learning.

Holliday: Yeah, that’s very fair, and I appreciate the question. I suppose it is coming off like I’m saying I’m putting up a filter: only physicists apply. One nice thing, what I get when I assume that a physicist or data scientist is coming in, like we said, there’s the assumption that they have some of that mathematical foundation. If someone were to come with just a mathematical degree, I would be happy to interview. I would obviously be impressed with the math; there’s probably a lot that could be said about the problem solving, and we’d just have to see. 

Doshi: So if I could follow up on that: we see a lot of candidates come across our desk who have X, Y, Z background, and then they’ve got a Master’s in Data Science. And there’s lots of programs now. Data science itself is a big, growing field, and a lot of universities are offering the “Master’s in Data Science.” And they’ll teach you skills like basic statistics, basic machine learning, computational skills – learn python, whatever you need to learn – they’ll teach you that for a year or two, then pump you out with a degree in Data Science. You see a lot of those candidates coming across our desk. They’ll come across, and we’ll pose them one of the simplest problems, a Bayesian problem, and they won’t know how to approach it properly because it doesn’t fit into the things that they’ve learned.

Math questioner: I didn’t mean those. Because those are programs that you pay for them; you don’t get admitted to university for data science; it’s like an MBA.

Doshi: Maybe, but there’s also courses – you can go out and learn by yourself whatever machine learning you want to learn, and so the key missing qualities there are that inquisitiveness and the ability to approach a problem from a first principles kind of concept. Even if you don’t know how to solve the problem the way it’s supposed to be solved, can you think about a solid approach, and can you formulate it in a way that, given your background, that will get you to a reasonable answer – a reasonable hypothesis even – a reasonable answer relatively quickly? And then can you follow through that logic? That kind of inquisitiveness and the ability to approach a problem correctly is much more valuable than actually having those skills because then if we grab the people that have that ability, then we can go out and say hey, here, read this book and come back. 

Other questioner: What are some of the things that you thought you needed to learn as you entered industry, that have helped you succeed in industry?

Holliday: So here’s the academic world versus the real world: in academia you spend a lot of time making sure that the calculations are correct, the foundations are right, the assumptions are correct; in industry there’s a whole lot of “I need something right now.” And that’s a little bit hard for me. And I could see how somebody who has definitely gone through the path of academia would not want to maybe compromise the ethics of the math in order to get a very sloppy calculation out now that we can give to the investor because we’re on the hook for something that needs to be delivered.