This week, we chat with Alexey Grigorev, a lead data scientist from OLX Group. We actually met in person last year at OLX Group’s Prod Tech Conference, where he presented how to deduplicate images. However, we didn’t recognize each other online, and only found out when I asked him about it right before this chat!
Alexey has an interesting career. He started as a software engineer focused on Java. Then, he stumbled upon something that made him want to switch to data science. He did this by taking a Master’s in Business Intelligence (which he will share was not the best time investment).
After that, in a single interview, he secured his first data science role, and has since grown in his career. In this chat, I probe him about his thought process, as well as his advice for others looking to emulate his career.
Alexey: Hi, I’m Alexey, a lead data scientist at OLX. My role includes too many things and involves overseeing anything related to machine learning. I try to help where help is needed, and stay on top of everything.
I also work a lot with infrastructure, such as making sure that once we have a model, we can serve it to real users. I also mentor a lot of people, such as data analysts or engineers who want to get into machine learning. I help them with training their first model and then deploying it.
Alexey: It’s funny that you asked what I saw. I actually saw a video, or rather, a video course by Andrew Ng (laughs). I saw that course, and I thought, okay, this is what I want to do. This led me to taking more courses, also on Coursera, and chatting with a couple of companies who were looking for data scientists.
However, everyone was telling me that I didn’t have enough education and the background wasn’t a good fit. This was how I decided that I needed to get a Master’s.
Back then, I didn’t know that BI is not really data science (laughs). But, I still had a couple of courses on machine learning, and the BI courses were also helpful.
Alexey: I think the path I took wasn’t the most optimal. I spent two years doing the Master’s, but I think I could have done the switch in like 6 months. Just watching courses, doing projects, and trying to find a job. But back then, it wasn’t easy to get a job without “proper” education; the field was still new, and people didn’t know who to hire, what kind of background was needed.
It’s easier now. People kinda know more or less what they want to see in a data scientist. For example, a PhD is a nice to have, but it’s not a showstopper.
Thus, I wouldn’t advise doing a Master’s—you’ll spend two years doing it, and not all the courses are needed. While I did study useful things, not all of them turned out to be useful for me.
Alexey: Haha, of course! (Smiles). Yes, I know that people can acquire useful skills from a Master’s or PhD. But, I don’t think it’s the only way of doing this.
Eugene: There you have it, everyone. A lead data scientist, a hiring manager himself, is saying that it’s not necessary to have a Master’s. If any recruiters turn you down, you can show them this video (haha).
Alexey: I remember that interview. It was a long interview, just one interview. Usually these days, you have a series of interviews. But that was just one interview, 2.5 hours; it was pretty tough.
Most of the time, we talked about my thesis. It was about math information retrieval. Basically, trying to work with formulas in Wikipedia. Let’s say we have an expression, e = mc**2; we want to know that e is referring to energy.
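To make the task concrete, here is a toy sketch of linking formula identifiers to their meanings. It is not the method from Alexey's thesis, which the interview does not describe. The helper names, the sample sentence, and the "X is Y" pattern are all assumptions made for illustration.

```python
import re


def extract_identifiers(formula: str) -> list[str]:
    """Return the distinct single-letter identifiers in a formula, in order."""
    seen = []
    for letter in re.findall(r"[A-Za-z]", formula):
        if letter not in seen:
            seen.append(letter)
    return seen


def extract_definitions(text: str) -> dict[str, str]:
    """Find phrases such as 'e is energy' or 'c is the speed of light'."""
    # Hypothetical pattern: a single letter, ' is ', an optional 'the',
    # then one word optionally followed by ' of <word>'.
    pattern = r"\b([A-Za-z]) is (?:the )?([a-z]+(?: of [a-z]+)?)"
    return {ident: meaning for ident, meaning in re.findall(pattern, text)}


formula = "e = mc**2"
# Made-up surrounding text; a real article would be far messier.
context = "Here e is energy, m is mass and c is the speed of light."

definitions = extract_definitions(context)
for identifier in extract_identifiers(formula):
    print(identifier, "->", definitions.get(identifier, "unknown"))
```

A fixed pattern like this breaks on almost any real sentence, which hints at why the problem was substantial enough for a thesis.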
My takeaway is that it’s good to have a project to talk about. It doesn’t have to be a thesis, you don’t really have to do a Master’s. But you should have a real-life project, a real application of machine learning.
This is especially important if you don’t have a lot of experience in your CV, if it’s your first full-time job. It can be a thesis, a course project, a side project, a Kaggle competition. If you have something to talk about, it’s a big plus; it makes the conversation smoother.
One other thing the hiring manager was interested in was my Java experience. The company had a lot of services written in Java, and they needed somebody who can help integrate machine learning into these services.
So knowing Java, and having a project (I could talk about), made the interview go well.
Eugene: That’s a great experience.
I had a similar experience where I attended a three-hour long interview which included lunch. The company, an e-commerce startup, was interested in my experience in a Kaggle competition on product classification. Turns out that they had a problem with product classification too, and needed someone to fix it. And eventually, I got hired.
But these are very unusual experiences, and I think I got lucky. Maybe you got lucky too. And so, for the benefit of our listeners…
Alexey: (Laughs) Sorry, I’m going to disappoint you. After graduating, I just had one interview, got an offer after that, and accepted it.
Maybe it was a mistake actually. Maybe I shouldn’t have stopped at one interview. Maybe I should have interviewed with more companies.
Don’t get me wrong, I’m very happy with my experience at Searchmetrics. It was a really great job. I learned a lot and met a lot of great people, some of whom I’m still in touch with.
But now when I look back, I think it’s always good to have multiple opportunities to choose from. Maybe in some other instances, it’s good to have an offer, and continue to interview at other places. Maybe there’s something better out there. But if you don’t try, you don’t know.
Alexey: First of all, let’s talk about the title itself; “senior”, what does it actually mean?
If you talk to people from different companies, maybe all the answers will be different. But I think there’s one common aspect in all the answers.
A senior is someone who can take end-to-end responsibility.
And the important word is not end-to-end, but responsibility. If you have a problem, a senior is someone who will find a way to get it done. Even if it’s a complex project, even if there are obstacles, they will figure it out. They will find the right people and remove the blockers. Instead of sitting there with the blocker, they will find the solution. This is the most important quality of a senior, in my opinion.
Specific to data science, I think a senior is someone who can do a project end-to-end. This includes talking to stakeholders, figuring out if ML is actually the right tool, translating requirements into the language of ML, understanding if the problem is worth solving, and breaking a big, ambiguous problem into smaller tasks for other members of the team.
It doesn’t mean that they are a rockstar who can do everything. They’ll work with other people, with product managers, with stakeholders. But they’ll need to assess the situation—is it really worth spending time working on this problem? Do we need ML, or is something simpler good enough? They might work with data engineers, or build the data pipelines themselves. Then, they’ll train the model, and serve it.
If someone can do all these things, they can 100% call themselves a senior data scientist. I think the important part here is the first part. A mid-level data scientist will also be able to train a model, or work with data engineers on pipelines. But to become a senior, the communication and problem framing aspects are essential.
Some people might call this position a lead data scientist. For me, the main distinction is that a senior is mostly involved in one project. They’re making all the decisions in one project. They’re still very hands-on, spending more than 50% of the time coding.
For a lead, it’s more projects, less hands-on. It’s more communication with multiple stakeholders.
Eugene: To summarize, a senior data scientist is someone you can trust. You give them a problem, they’ll be independent, they’ll run with it. (Alexey: Exactly)
And how do they do it? They need to have the end-to-end understanding. They might not be doing everything themselves. They can get other people to help them.
And it’s not just doing things the right way. It’s questioning even the decision of whether to do it or not. This is what makes a data scientist senior.
Alexey: What I liked at a startup was exposure to pretty much everything. I could do everything I wanted, there were no boundaries.
And there was a lot more work than everyone in the company could possibly do. Our data science team had only three people. When you only have three people, you really have to think carefully about what to do that would have the biggest impact on the customer. How would it affect our product? How does it help customers?
And we didn’t have anything, and had to build everything from scratch. How to do this so that everything doesn’t fall apart (i.e., sustainable), and yet still move fast? We had to make many trade-offs. In some areas, we decided to move faster. In other areas, we dedicated more time to make the product more robust and sustainable.
That was a fun experience.
I think it’s also possible to have such experience in a bigger company. But, a bigger company has more people. You don’t always have as much freedom as a startup, to make all these technical decisions. Especially in a corporation, you have all these existing infra and tools that you have to use, so you’ll have some limits.
Nonetheless, a startup will also have limits, such as time and money, which also force you to be creative.
What I like about corporations is that there’s always someone who can teach you how to do these things. At a startup, you’re mostly on your own. In corporations, you might find more senior colleagues, more mature processes, and also more resources.
What I would suggest to people who are just starting, and if they have two offers, is to go with a startup. It’s more likely that you’ll have a broader experience there. You’ll be exposed to many different things. You have to talk to sales, engineers, product. You’ll learn more.
Eugene: Haha people are going to listen to this, and just hear that Alexey said: “Startup” (Alexey: laughs).
I like the startup life too, but I’m going to add a bit of balance here.
I think that in a bigger company, you can work at scale (Alexey: nods). OLX is in 40, 45 marketplaces. You get to learn about the various cultural differences, regulation differences, and work at scale across multiple teams.
For people who are very early in their careers, I think there’s no way you can make a mistake. Everything is just going to be a great learning experience.
Alexey: Effective is somebody who gets the job done, in a reasonable amount of time. Also, pragmatic to some extent, not a perfectionist.
I remember my team lead in Searchmetrics. I learned a lot from him. He was really effective. He could just come with a problem, dig deep, and focus on solving this problem. After a couple of days, he would have a solution.
I found myself thinking, hey, I want to be like him. I want to quickly come up with an idea, and develop it. That was cool.
Also, being effective is being able to focus. I know it’s very difficult. You have so many things you have to do, and have to spend the day in meetings. Now that I know this, I admire his work even more.
I think what helps with effectiveness is to focus on the problem. When it comes to data science, what we often think of first is the solution.
Okay, I have this problem, and I’m going to hit it with this hammer of machine learning. We see this with people starting with the latest TensorFlow models that are huge.
It helps to take a step back, and think: Do we really need to solve this with the fanciest model out there? Or is there a simpler solution?
Just focus on solving the problem, not how to solve it. Usually, the simplest way to solve it is the right way.
Also, on learning effectively, I think it’s good to have the right balance of courses and hands-on practice. Don’t just do courses.
Sometimes, we follow a tutorial and come away thinking that we can now do everything. Usually, this is not true. For me, after watching a tutorial, and then trying to apply it to a project, I find that I don’t know the topic well (Eugene: exactly).
Then I need to do a lot of googling, a lot of research. I think this is what helps. After doing a course, do a project. Or just do projects and learn along the way.
Eugene: Alexey mentioned a great point and I just wanted to add to that. After you do a course, and you try to apply what you learned, you’re gonna realize that there are 1,001 things that you don’t know.
The same thing happens when you’re writing. You think you know something and try to write about it. Then you realize that your mind is a blank.
Doing projects and writing are a great way for you to consolidate your thinking and to fill the gaps.
Alexey: Usually, when I interview, I ask a simple Python question. It’s a very simple problem, similar to fizz buzz. It requires one for-loop and an if-statement within it. Some people can solve it in 3 - 5 minutes.
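For readers unfamiliar with it, here is the classic FizzBuzz exercise in Python. Alexey does not say what his actual interview question is, so this only illustrates the level of difficulty he describes: one for-loop with an if-statement inside it. The function name is an assumption.

```python
def fizz_buzz(n: int) -> list[str]:
    """Return the FizzBuzz sequence for the numbers 1 through n."""
    result = []
    for i in range(1, n + 1):
        if i % 15 == 0:       # divisible by both 3 and 5
            result.append("FizzBuzz")
        elif i % 3 == 0:
            result.append("Fizz")
        elif i % 5 == 0:
            result.append("Buzz")
        else:
            result.append(str(i))
    return result


print(fizz_buzz(15))
```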
If someone has Python in their CV, and they can’t solve it, it’s a no-go. If someone starts on the problem, and then says “I usually google how to do for-loops in Python”, then for me, it’s very suspicious (Eugene: yeah).
I mean, then why do you put Python in your CV, if you need to google how to do a for-loop? For me, this is a big red flag.
Apart from this Python thing, there are no other showstoppers. It shows that the person claims they can program, but they actually cannot.
On the other hand, what stands out?
Having projects stands out. Having end-to-end responsibility, having delivered a project end-to-end, being able to talk about the trade-offs they made, all the decisions, why the project started. These stand out.
I understand that not everyone will work on this level to be able to answer this. But it’s good to ask, why are we doing that. And then be able to think through that question and answer it properly.
Alexey: People often underestimate the amount of time it takes to deploy a model. Not just the deployment, but building data pipelines, etc.
Also, we don’t spend enough time making sure that we’re solving the right problem, making sure that what we do actually matters. If you spend half a year developing this great model that nobody cares about, then you’ve just wasted half a year.
Ask yourself, why are we doing this? What kind of problem are we trying to solve? Who is the user? How will they use it? Are they going to use it the way we imagine, or will they do something different? Having this conversation with the user is very important.
Alexey: Currently, I’m an IC as lead data scientist. I haven’t been a manager, so I can’t really answer this question.
Nonetheless, for the IC track, what I see is exposure to many projects, with less hands-on. I like this. I like to guide and mentor people. What should they watch out for? What should they make sure to understand? Why do we need to solve this? I like asking these questions, and engaging with stakeholders, and mentoring people.
Instead of solving the problem myself, I show others how to do it. This scales a lot better. As a lead, I can work on multiple projects, and teach people by showing them how to do it. While they may not be effective immediately, in a year, they’ll be able to work at the same level as I am now. This means I can scale out my skills, and in this way, be more effective.
That’s what I like about being an IC. It’s technical, still hands-on, I still do a lot of pair-programming.
For a manager, it’s pretty different. They need to think about things like performance reviews, and other things. (Eugene: budgeting, fighting for resources, politics).
But now that I’m thinking about this, I think it might not be a bad idea to take on the managerial path too. Maybe it’s a good idea to first have a small team, maybe 3 - 4 people, and test if you really enjoy the people management aspect of the job. If, after 3 months, you find that it’s not your cup of tea, you can safely backtrack.
Eugene: I think that’s a great idea, and companies should provide that option to their technical contributors. It can be overwhelming for people to become a manager, and they realize there’s no u-turn, and they burn out and leave.
Instead, if we give people an option of being a manager, with the option to backtrack, and after the trial, they decide it’s not for them, they can stop. And we still retain them. It’s a win-win.
Alexey: The Master’s, I don’t think it was really necessary. Maybe back then it kinda was, but now, it’s not necessary for sure. Just doing courses and projects should be enough.
One thing that helped was starting to freelance in parallel with my studies. That gave me a lot of projects, and gave me a great portfolio. That was helpful. While freelance is not for everyone, if you’re studying and have some free time, it’s probably a good idea to freelance a bit.
Eugene: Yes, that’s a great point. What’s difficult for people who just come out of school is that they don’t have much experience to show for it. Freelancing, internships, these are great ways to demonstrate your work. You’ll also learn a lot of stuff that you don’t learn in school.
Alexey: Another thing is this document I have for each project. From the very first project meeting, I capture everything in a document. What’s the problem? Why do they think ML is a good solution? What does success look like? What are the next steps? Every time we follow up on a topic, I capture it in the document. And over time, it captures the history of how this problem evolves.
I started doing this at OLX; I didn’t do this previously. But now thinking back, I should have started doing this even when I was freelancing.
Another thing that’s helpful is writing blog posts about things and sharing online. A blog post helps you consolidate what you learn, what is important. I don’t think I did this enough, and only started it at OLX.
Alexey: It’s difficult to write, and make it crisp on paper. All my thoughts, it seems really clear in my head. But when I’m trying to pull it out into a document, it’s super difficult. Just two minutes ago, when I was thinking about this, it was so clear in my head. But why doesn’t it come out on paper like that?!
I know why people don’t write. It’s difficult. They spend days, or weeks writing something. And when they ask for comments, they get a lot of comments. And they get … sigh… discouraged, and they give up.
I think it helps to do it more. The more you do this, the easier it becomes. Even though I’ve been writing for quite a while, it’s still very difficult for me. But when I force myself to write things, it becomes easier for me.
Writing helps with speaking as well. In future, when I have a conversation, or a podcast, like now, these things are clear in my head, because I’ve written them. And speaking about it becomes easier.
I think this is the power of writing, and everyone should do this to structure their thoughts and think clearly. Even if you just write your own internal notes, that nobody sees, it still helps to structure your thoughts.
More advice about writing from Alexey and other leaders here.
Alexey: I remember trying. I had one RSS reader to subscribe to arXiv RSS. It was basically impossible to keep up. I also had a folder on Dropbox, and I called it “To Read”. At some point, it became half a gig. I remember that day, when I decided that I know I’m not going to read this, that was a good day (laughs).
At some point, I just thought to myself: Do I really need all this information? What am I going to do with it? Just realizing that there’s no way to digest all this information, it helps a lot.
How to choose what to learn? I don’t know, I just try to focus on the problem that I’m solving. Whatever works for that problem, I try to find.
Also, for the last 2 - 3 years, I’ve been trying to learn things outside of data science. A bit of marketing, how to speak, how to read. If you like something, you learn, and when you stop liking it, maybe you’ve learned enough, and you move on.
Eugene: What Alexey is sharing is just-in-time learning (Alexey: Yes).
You have a problem you need to solve, you try to learn about it, and immediately you get hands-on practice. That’s how it sticks.
Alexey: It somehow happened naturally. I’m writing a book, and one of the readers asked, “Is there a place where I can talk about this book?” I realized that there’s actually no such place, and decided to create a place for that.
I also get a lot of questions on LinkedIn, Twitter, email, Quora. I try to answer these questions, but it doesn’t scale. That was another reason I started the community Slack. People can ask these questions in public, and we can answer them. Maybe if I start by showing an example, more people can help with the questions?
The community also hosts meetups. One reason why I do this, is sometimes I want to talk at a conference, I submit a great proposal, and I’m rejected (laughs).
It’s disappointing, and I think, why do I have to submit a proposal? Why do I need a conference? Can’t I just talk about it myself? And this is how it happened, maybe I can just host meetups.
Also, a friend asked, do you know of a place where I can give a talk? Why yes I do! This was the SageMaker event that you attended. It was our first talk.
Eugene: Currently, Alexey hosts one talk a week. I don’t know how he keeps up with the cadence, but it’s a great time to join datatalks.club.
This was a great chat, thanks Alexey. I’ll share the videos on YouTube and write this up.
This week, we chat with @Al_Grigor, lead data scientist @ OLX Group.
He started as a Java dev, did a Master's in BI, then in a single interview, got his first DS role.
I asked about his experience, thought process, and advice for new data scientists. https://t.co/8oDiVYqGJQ
Eugene Yan (@eugeneyan), December 2, 2020
Thanks to Alexey Grigorev for the interview. Thanks to Yang Xinyi and Alexey Grigorev for reading drafts of this.