I recently had the pleasure of being on the DataTalks.Club podcast, where Alexey and I chatted about the importance of writing in a tech career. We discussed why I started writing, what keeps me motivated to write week after week, my writing process, how to start writing, the writing culture and process at Amazon, and more.
Here’s the transcript for our chat, lightly edited for clarity and readability.
Alexey Grigorev: Hello everyone. Thanks for coming to our event today. This event is brought to you by datatalks.club, which is a community of people who like to talk about data.
Today we’ll talk about technical writing and blogging. We have a special guest today, Eugene Yan. Eugene works at the intersection of machine learning and product. He likes building pragmatic machine learning systems. And he also writes and speaks about effective data science, machine learning in production, and career growth.
If you follow Eugene, you’ll know that he writes a lot every week on his personal site. Naturally, it was very difficult for me to think of anyone else who is more suited for this podcast episode. So thank you, Eugene. Welcome.
Eugene Yan: Thank you, Alexey, it’s a pleasure to be here.
Alexey Grigorev: Before we go into our main topic of writing, let’s start with your background. Could you tell us a bit about your journey so far, about your career, about how you started?
Eugene Yan: It’s a bit of an unusual journey. I graduated with a Psychology degree about 10 years ago. I spent a few years working on investment policy and decided that I didn’t really like it. I was doing a lot of contracts and agreements and I wanted to work more with data. So, I took about 20 - 30 MOOCs and interviewed at a few places and very luckily got accepted at IBM.
While I was at IBM I did a Kaggle competition on product classification. My team and I managed to get into the top 3% and we happened to share our attempt at a meetup. It happened that an e-commerce startup was facing a similar problem, how to categorize products correctly, and they had someone in the audience.
So they invited me in for lunch and asked how I would solve the problem for them in different languages (Vietnamese, Indonesian, and English). And the next day they offered me a job. I was excited to experience what a start-up was like. I was pretty early in my career then and joined as their third data scientist.
That startup was Lazada. A few years later Alibaba acquired us and I moved on to a health tech start-up. Unfortunately, that didn’t work out very well. So now here I am. I’m now an Applied Scientist at Amazon in Seattle, working on recommendation and machine learning systems.
Alexey Grigorev: So if anyone tells you that Kaggle is a waste of time, you’re living proof that it isn’t.
Eugene Yan: Haha, I was really lucky.
Alexey Grigorev: There is an article about your career journey, right? And it’s one of your most viewed ones, right? What is it called?
Eugene Yan: I’ll post a link later. I’ll find the link and I’ll share it later.
Here’s the article: From Psych Grad to Leading Data Science at Lazada
Alexey Grigorev: It’s a pretty detailed career journey. That’s an awesome read, and I recommend everyone read it, like pretty much every other article that you wrote. Since we’re talking about writing today, I’m curious: when did you start writing? By writing I don’t necessarily mean internal things like technical documentation, because when you’ve just started your career I guess you need to write a bit anyway. I mean something external, like blogging. When did you start doing that?
Eugene Yan: So I’m looking at my site now and my first post was from September 2015. This was a report for DataKind, which is an NGO that works on projects. So this was a project accelerator with a few NGOs that we were trying to help out. Why did I write that? Because they were just asking for volunteers to write, so I volunteered. Why did I volunteer? I’ll admit that my reason for volunteering was not entirely altruistic. I volunteered because I wanted to practice writing.
I know Alexey asked, when did I start writing? But I also want to answer the question of why did I start writing, which I think is a very important one. Back then, I interviewed several data scientists and data science leads that I admire, some of them two to three steps ahead of me, similar to how I interviewed Alexey a few months ago for the informal mentors series.
Alexey Grigorev: Haha, I’m not two or three steps ahead of you.
Eugene Yan: The key question I asked them was, what skills do I need to be an effective data scientist? Was it domain expertise, ninja hacking skills, PhD-level research, data pipelining? What was it? The answer I got really surprised me. Um, anyone want to make a guess? … I guess not, everyone’s too shy.
Alexey Grigorev: I can try, business skills?
Eugene Yan: No, you know, they say that all that could be learned. But the thing that really made someone on the team really stand out was the ability to communicate, the ability to write, the ability to speak with non-technical speakers, so that things could happen.
To be honest, when they first told me that I thought they were just bullshitting, that they were just kidding me. But then I decided, okay, I asked them this, and 80% of them said the same thing. I’m going to try it for one year. So for one year, anytime people need something written, I’m going to do that.
So I wrote for a year and in that same year, I also spoke at my very first meetup, which was about the Kaggle competition. And in that one year so many good things happened to me, and I decided that, hey, you know, I’m going to try this again for another year. And so that eventually became a habit. So I guess it’s a long, long answer. I started writing about five years ago because people told me to write, because they said that this is important for being effective. And that’s why I did it.
Alexey Grigorev: And actually, Ricky in chat wrote “communication”. So he was correct.
Eugene Yan: He got it, yes, he got it.
Alexey Grigorev: Yeah. And so this is, I guess, what keeps you motivated, right? Because I think it’s pretty similar for me. I know that writing is important. I know all these benefits that it can give me, like all these connections. And every time I post something, some people reach out to me and ask things, but still, it’s very difficult to write, because, you know, writing is difficult.
It takes a lot of time. It takes preparation and sometimes, you even interview some people to actually get material. It takes a lot of time and effort. And the reward is not immediate. So yeah, you still need to put a lot of effort and then you need to do this regularly. So what keeps you motivated, like this reward or something else?
Eugene Yan: That’s a good question. At the end of 2020, I did some reflection. I went back through all my posts and one thing really stood out. I realized that the main reason why I write is to share.
For example, I would do my projects and I would share with people: this is how I clean the Amazon data, this is how I deploy an API. I also wrote about the classes I took at the Online Masters of Computer Science at Georgia Tech. I wrote a summary for each class because a lot of people were asking me questions about which classes are good, which professors are good. So I wrote that. And when I attend conferences, some of which are expensive, I want to share my notes as well. So, I just write and put them online.
And that was the first reason, and it’s a very satisfying one, because when I share and other people find it helpful, it motivates me to write more. So thankfully I happen to be motivated by this and that keeps me writing.
But along the way, I also found other reasons to write. Alexey, you’ve mentioned this before. Sometimes you think you know enough about something to write about it. And then when you actually try to write about it, you realize that you know nothing. So that happens to me a lot. I forget the details. So I write to consolidate all these details and fill the gaps. And that’s another reason to write: to learn, to consolidate your knowledge.
Recently I applied this technique to topics that I had some familiarity with but I wanted to consolidate my knowledge, such as the survey on natural language processing from RNNs up to Big Bird. And also on some topics that are new to me, such as the summary on data discovery platforms. And also, you know, consolidating my knowledge and sharing about real-time recommendations. So by forcing myself to write, I was forcing myself to learn and that’s awesome.
So I write to share, and I write to learn. And I guess the last reason, which is really awesome and is what led to this, is that something really interesting happens when you write online. People start reaching out to you to give feedback and discuss ideas. They say, “this is really useful”, “this part could be improved”.
And some of these people, I’m really inspired by, and I’m really humbled to have a chance to chat with these people, my idols that I admire online. And this is also what led to the current chat I’m having with Alexey right now, to share about writing online and to help people more.
So the third reason why I write is to be a lighthouse. I’m saying: “This is what I’m interested in. If you’re interested in this, I hope you talk to me about it. I hope to have a conversation about it so we can mutually learn and mutually inspire each other.” So I write to share, to learn, and to be a lighthouse.
Alexey Grigorev: Yeah, that’s awesome. Actually now I remember the first time we spoke on Twitter. And my first question was, do you remember what I asked?
Eugene Yan: No, I can’t. (Alexey laughed) Did you reach out to me or did I reach out to you?
Alexey Grigorev: I think I reached out to you. I actually asked, “Hey, what do you use for blogging because I also want to blog. What do you use for that?”
Eugene Yan: Oh, yeah
Alexey Grigorev: That’s you being a lighthouse.
Eugene Yan: Haha true. By the way, if anyone’s interested, I use a very simple Jekyll framework and just host it on GitHub Pages.
Alexey Grigorev: Yeah, that’s pretty amazing. And especially the learning part—I can 100% relate to that. I guess I think okay, this will take me two hours to write because I know everything, then I sit down and then like a week after the post’s still not finished because like there are so many things I need to look up, especially if it’s something technical and things like frameworks evolve. They no longer work. But, but, yeah, it also helps to learn a lot.
Alexey Grigorev: I’m curious to know, how do you write? Now we understand the motivation. Right, and this is awesome, I can totally relate to that. But it’s also interesting, how do you write like yeah, you have an article about that, but I just wanted to hear from you. Like, do you have any process, maybe, like, how do you go about writing?
Eugene Yan: It’s a good question. Let me first share how I went about writing the wrong way. So when I first started writing, I thought that, you know, writing is sitting in front of a computer, typing it out, and something beautiful will come out. But it doesn’t happen that way.
And I thought that whatever I write must be 100% original, no one should have written about it. It must be 100% useful. And I realized that, looking back on it, this is a very, very, very high bar. It made it very difficult for me to write.
So, how do I write now? I realized, first, it’s not to have that mindset. So now my mindset is “I write to share, to learn, to be a lighthouse”. And I realized that writing doesn’t start with writing. You start with reading, reading things you’re interested in, learning things you’re interested in, and you think about it, and then you write notes. And now that I view it this way, my writing doesn’t have to be original. It doesn’t have to be perfect. It doesn’t have to be 100% useful. It makes it a lot easier for me to write.
So how do I write, do you mean about how I select topics, or how I actually craft a post? What do you mean?
Alexey Grigorev: Yeah, basically your process. Let’s say you want to write about something. All of a sudden, I don’t know, you’re taking a walk and an idea occurs to you, or you see a tweet about real-time machine learning in China, and you think, okay, this is something I can write about. So now you have this idea, right? You think, okay, I have some knowledge about this topic and I want to write. What happens next?
Eugene Yan: So usually when I have an idea for a topic, I’m just gonna put a title in my notes. I’m gonna say, “Hey, you know, here’s a possible topic”. Right now my notes have like 50 topics that I just need to go through week by week, pick one, and write about it.
So how do I write? My process is pretty straightforward. It starts from the timeline I set for myself. I set myself a timeline of seven days. Every week I aim to publish regardless of what state it’s in. I will ship every week. So starting from day one, I just pick a topic that I want to tackle. I write an outline of how I want to structure it.
For example, let’s take real-time machine learning. I want to write about why you should not do real-time recommendations and why it would be useful, and after that what real-time recommendations look like, and then how to do an MVP yourself. So there’s the big picture. And then I start to add bullet points. So that takes about two hours.
Then day two, I look at that outline again and I rewrite it from memory. So all the crap or the bad ideas go out and the new ideas go in. And then maybe at day three, I look, if I’m okay with the outline, maybe I stop iterating on the outline. If I’m not okay on day three I iterate again. So I do this several times on the outline section until I’m satisfied.
By the time it’s Saturday, which is day six and I have no more time to iterate on the outline, I’m forced to write the prose. So how I write the prose is to just take the outline… essentially the outline is pretty detailed, everything’s there. I just need to write it in proper human language, instead of just bullet points, so people can read it. And find images. So by the end of day six, I will always have a prose draft. Sometimes this takes 12 hours, sometimes this takes six hours, but I will always have it.
On day seven, after a night of sleep, I will read through it again, organize my ideas, organize paragraphs, organize the sentences, find images and all that. So by day seven, my time is up. I must publish it. If people give me feedback, I might still make more tweaks on the next Monday and Tuesday, but usually after that I stop.
So the key part of this is the iteration. I mainly iterate on the outline. Why do I iterate on the outline? Because iterating on the prose takes too long for me. I care too much about the language, the sentence structure and the right words. By forcing myself to only focus on the ideas, on the outline, I improve it and only leave very little time to iterate in the prose. So that’s my process.
Alexey Grigorev: That’s pretty interesting. So basically, if you look at the amount of time you spend writing an article, how much time would you say you spend on this outline? 50 percent?
Eugene Yan: Definitely more. I would say that the outline is at least 50%, sometimes 70%.
Alexey Grigorev: So most of the time goes to this outline, and that makes me really curious to know what this outline looks like, because to me, when I hear outline it’s just three, four bullet points. So how do you structure this outline?
Eugene Yan: So the outline really is the key section headers. And then under each section header, for each paragraph, what the key topic sentence should be, and then maybe some supporting evidence, etc. That’s it. In the later stages of the outline, the outline becomes sort of like the actual content itself. It’s just that it is not in paragraph form. And that makes it easier for me to write. I just force myself to write in bullet points.
Alexey Grigorev: Yeah, so basically the idea is already there, you already know what to write about. You probably already have sections, sub-sections, maybe even paragraphs in that outline, so what you need to do on day six is just take these bullet points that you have and, you know, translate them into normal, natural language.
Eugene Yan: Exactly.
Alexey Grigorev: Okay, that makes sense. That’s pretty interesting, because it’s not very similar to what I do. So I actually write prose as you say, and then I have to, okay, like this paragraph, I need to delete it. And this is super ineffective and takes ages and then also deleting things from an article is really difficult.
I spend so much time writing this piece and then I see that it’s a 20-minute read. I know that nobody’s going to read this thing, right, because it takes just too much time to read. And then I have to edit. I have to delete things. And that makes the writing process very difficult. What you have, I think I should definitely try that.
Eugene Yan: I fully understand. Yeah, I recommend it. I have the same problem. Writing a paragraph just takes me 20, 30 minutes, whereas writing bullet points is just so easy, because I don’t care; I’m going to throw it away anyway.
Alexey Grigorev: Yeah, I’m also curious; you said that you try to reconstruct an outline from memory. Why do you do that?
Eugene Yan: Okay, let’s say you read a book and your friend asked you, what is that book about? You tell him the best parts, the most important parts right? That’s what reconstructing an outline from memory does. I remember only the best parts of my argument, the best parts of the story I’m trying to tell. And that’s it.
And sometimes after I reconstruct it, I would look at it side by side. Hey, you know, did I miss out any key points? If I couldn’t quite remember it maybe it’s not important. So reconstructing from memory is like, you know, putting it through a neural network (auto-encoder to be specific), only the important stuff filters through. And I do that several times.
Alexey Grigorev: Yeah, that’s interesting because I also have this problem that I cannot simply rely on my memory and I was going to ask you about that, but you also mentioned that you still look back at the original thinking. How often do you forget things?
Eugene Yan: Quite often. I will always do a regression analysis: Is this new one better than the old one? It gets to a certain point when the difference is very small. That’s when you know you can sort of start converting the outline to prose.
Alexey Grigorev: So you do this every day. Every day you try to start from scratch, basically, from a blank page?
Eugene Yan: Not necessarily. It depends. There are some days where I can’t finish the outline, so the next day I just continue. On days where I completed an outline the previous day and I’m not satisfied with it yet, I would try to rewrite it. If I’m satisfied with it, I would just start writing my prose earlier, so I get to have a bit of a break on the weekends.
Alexey Grigorev: Having a break is important, right, especially if you spent that much time on writing.
Eugene Yan: Yeah, the problem is I’m a very slow writer. I’m a very slow thinker. So that’s why I need to spend a lot of time on it. I hope that if any of you here are trying to write, that you won’t have the same problem, but this is the problem that I have. That’s why iterating through the outline helps me a lot.
Alexey Grigorev: Yeah, that’s interesting because sometimes when you read a post and then you know that this person publishes every week, it’s very difficult to, you know, see this process, like all this work that the writer has to do. And then you say you’re a slow writer. And then if I go to your blog and then I see the number of things you publish, that makes it…
Eugene Yan: I mean, if you think about it, it’s like one to two hours every weekday in the morning. That’s like seven hours and then over the weekends, let’s say it’s like 13 hours plus another five hours for final editing. That’s like 25 hours a week. So it does take a bit of time. Yeah.
Alexey Grigorev: And you also mentioned you have a list of topics, a list of ideas, around 50, right? Even before you start writing, how do things get into that backlog, and where do you take ideas from?
Eugene Yan: I see a pattern when I look back at what I write. For example, I write when I see people are uncertain about certain topics. Whenever I share that my team runs on Scrum, we use Agile and Scrum, people always ask, how do you do that? What is it like? And I was like, when enough people ask me about it, I’m going to write about it. People also ask me, how did you get from a psychology degree into data science? So many people ask about it, I just decide to write it. So I get a lot of questions. So I decide to write answers to them, and those answers can scale indefinitely.
That’s one way. Sometimes I also write about topics I would like to disagree on. For example, I saw this Reddit post which had 10 different roles in data science: data scientist, decision scientist, AI engineer, AI product manager, etc. I disagree. I think data scientists should be more end-to-end, and Alexey feels the same way. He thinks that data scientists should be more full stack. I thought what I was writing was an unpopular opinion (though Netflix and Stitch Fix adopt the same approach), but it turned out to be surprisingly popular.
Sometimes I also see people focusing on the wrong things. For example, a lot of people reach out to me and ask if writing is important. They say writing documentation takes a long time. I disagree. I think writing becomes more important in your career as you progress. Sometimes people ask me, what’s the point of reading papers? I disagree. You need to read papers to keep up in your field.
So I write to answer questions that I get from people. And occasionally, if I may, once I build up enough trust with the community, I disagree on something and try to share my differing opinion.
Vaibhav: Can I ask a question, because it kind of relates to what Alexey was asking. So you said you are kind of thinking about 50 topics. But then, let’s say you have to finish your topic in a week’s time. How do you prioritize these 50 topics or n number of topics that you’re thinking about?
Eugene Yan: I realized that people who come to my site enjoy reading the machine learning stuff. But to be honest guys, I can’t write about machine learning every week. I’m really interested in it, but not to the extent of writing about it every week. So I will try to make sure that I write about machine learning once or twice a month and then I add in some other stuff, which I think is really important.
For example, next week, I want to write something about why you should not be doing online courses in 2021. I know it’s controversial, but I think Alexey and I talk a lot about this as well. And he probably knows the reason why I’m saying this. I want to write that. To just tell people, guys, there’s a better way to learn. Because a lot of people, when they talk to me, they ask me, you know, is this course good, is that course good, should I do this masters.
Thus, the only prioritization I have is at least one or two machine learning topics in a month. The rest of the time is messages that I want to try to get across. I imagine writing for my teammates. I imagine writing for my mentees. What is my view that I want to share with them to help them be more effective data scientists and this may not be related to machine learning at all sometimes.
Alexey Grigorev: So you basically have an image in your head of the target audience right, of the person who is going to read that. And do you use this somehow in your writing like when you prepare this outline how do you use it?
Eugene Yan: That’s a great question. So I write mainly for three groups of people.
The first person I write for is myself. When you write, you should write for yourself because let’s be honest, no one’s going to read it. If you write for yourself, at least you’re going to benefit from it, by furthering your learning.
The second person I write for is my wife. I hope to at least help her understand what it is that I do.
The third person I write for is my current team members and my future team members, and like-minded friends in the community. So I’m going to assume that these are people with a bit of a technical background, maybe a bit of machine learning background, people who are driven to learn, to level up.
Sometimes, some of the topics I write about don’t get a lot of traction, are not very popular, like the topic I wrote about the importance of writing versus coding. I think that’s a very important topic, but it didn’t get as much traction as I’d like. But maybe the community is not ready for it yet. So yes, I do have a specific group of people. My future teammates and my current teammates in my industry.
Evgeniy: Can I ask a question? First of all, thanks for all the detail. It resonates in a lot of respects, and you have a very organized workflow for reviewing the draft, reviewing the concept. I was wondering, does it ever happen that there is some tipping point where editing it more makes your text worse? Like sometimes you know that you have to release it, and you know that the extra time spent on editing or drafting would just make things worse. Or is it always kind of linear? So do you see any non-linearities in your writing process?
Eugene Yan: That’s a good question. So let me answer—do further iterations make it worse? Yes, further iterations can make it worse. That’s why when I iterate on outlines I always compare them side by side. Which one communicates the story, the message I’m trying to get across better?
When I’m editing paragraphs, I would just type a new paragraph above the old paragraph and compare the two paragraphs. Usually my metric for success is: is it shorter, and does it still convey all the key ideas? If it is, good. If it’s longer and it gets a bit more messed up, it’s a bit harder to write, that’s when you know it’s worse and you delete it. So it can regress. You should always try to do some sort of regression analysis to just check, is it better.
Alexey Grigorev: This is what you mean by regression analysis. I was going to ask that. If it has to do anything with linear regression.
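As an aside, the side-by-side check Eugene describes (is the new version shorter, and does it still carry the key ideas?) can be sketched in a few lines of Python. This is only an illustration, not his actual tooling: the function names, the sample paragraphs, and the naive substring matching for "key ideas" are all assumptions made for the example.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def is_improvement(old: str, new: str, key_ideas: list[str]) -> bool:
    """Return True if `new` is shorter than `old` and still mentions every key idea.

    Matching is a naive, case-insensitive substring check (an assumption for
    this sketch); in practice the judgment comes from reading both versions.
    """
    new_lower = new.lower()
    keeps_ideas = all(idea.lower() in new_lower for idea in key_ideas)
    return keeps_ideas and word_count(new) < word_count(old)


# Hypothetical old and new versions of the same paragraph.
old = "I iterate mainly on the outline because iterating on the prose takes me far too long."
new = "I iterate on the outline because iterating on prose takes too long."

print(is_improvement(old, new, ["outline", "prose"]))
```

A rewrite that drops a key idea, or one that grows longer, fails the check and would be the version to delete.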
Evgeniy: Do you also see your result as a product, with design, a product life cycle, and, you know, versioning?
Eugene Yan: Yeah, I do. I see my writing as little packages of love (cringe). When I write about stuff, I really want to help people understand it. If they don’t understand it, then I fail. So that means my product UX is bad. If I write about a product and the substance is not there, it’s not useful, that means the back end is bad. So it’s very difficult. And I don’t think I will ever get there. But the way for me to get there is to just force myself to write every week. And every year, there’ll be 50 iterations, or rather 50 different products, and I would have gotten enough practice; maybe in 5, 10 years I will be good enough.
Alexey Grigorev: 5, 10 years… Like now if your articles are that good, like, what will happen in 5, 10 years haha.
Eugene Yan: In 5, 10 years I can write the same thing in five hours instead of 20.
Alexey Grigorev: Do you somehow try to control the length? Because for me, personally, this is a problem because, like, I have this idea, but to actually elaborate this idea it takes just too much space. I need too many pages. And then I know that people, with the attention span we have today, will not read this. Is this a problem for you? And if it is, how do you overcome it?
Eugene Yan: This is an interesting question. So I’ve read, and a lot of people have told me, that the ideal length for an article is about 1500 words, that’s about 10 minutes. And that’s the average attention span you can get from reading online.
But honestly, before this, and even now, I don’t really think about it. I want to write just enough to communicate the substance and the content and get the message across. Sometimes this is just 600 words or 1000 and that’s it. Perfect. Sometimes I need to go into 4000 or 5000 words like the recent one about real-time recommendations—I feel that if I cut out anything, if I cut out any paragraph, it loses the big picture.
I assume that my audience are people like me, my future team members, my current team members, people who want to learn about real-time recommendations. I assume that they are really interested in it and they will read it. I don’t expect them to read it all at once. They can read it and refer to it anytime. But to be honest, I don’t have a maximum length. I just try to keep it as short as I can because it’s also a courtesy to the reader’s time.
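As an aside, the rule of thumb quoted above (about 1500 words for about 10 minutes) implies a reading speed of roughly 150 words per minute. The sketch below applies that rate to the article lengths mentioned in this answer. The rate is derived only from that rule of thumb, not from any measurement, and the names `WORDS_PER_MINUTE` and `reading_minutes` are made up for the example.

```python
# Assumed rate, derived from "1500 words, that's about 10 minutes".
WORDS_PER_MINUTE = 150


def reading_minutes(word_count: int, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate reading time in minutes for an article of `word_count` words."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return word_count / wpm


# Article lengths mentioned in the conversation.
for words in (600, 1000, 1500, 5000):
    print(f"{words} words: about {reading_minutes(words):.0f} minutes")
```

At this assumed rate, the longest posts mentioned run to roughly half an hour, which fits the point that readers are not expected to finish them in one sitting.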
Alexey Grigorev: And I guess having this deadline of one week helps not to, you know not to put too much information there.
Eugene Yan: Yes, definitely.
Alexey Grigorev: And for me, one of the most difficult parts is, let’s say I already finished writing something. What do you call this thing? How do you come up with a title? Maybe you even do this before you start writing. What does the process look like for you when you come up with the title?
Eugene Yan: Hmm, I title my posts like how I name my functions—in the simplest way so that no one would misunderstand. This is a difficult thing which I’m still trying to learn and I just try to come up with the sentence that explains what this is about. It’s just like how you write documentation, like how you write code.
However, I’ve heard that there are other things to consider such as SEO or how to get people to click on your title more. But usually, by the time it’s Sunday night I’m publishing it, I just don’t have the energy to think about the title anymore, honestly speaking. So I don’t put a lot of effort into that.
Alexey Grigorev: When we were planning this event, I was talking to you, Eugene. And then I said, I want to call the event “Blogging and Technical Writing”. But then you said, no, no, no. You didn’t say that it’s boring. But you said the title should be “The Importance of Writing in a Tech Career”.
That sounds much much better than what I suggested. It’s not just the essence, because I think like blogging and technical writing also capture the essence. But you made it sound so much more interesting, right. And we can see that. If we look at the number of people who registered for this event, it’s more than usual. So you really nailed the title.
Eugene Yan: Thank you. I want to try to explain why I named it that way. So when Alexey first reached out to me for this talk, I was like, hey, this is a great opportunity to get people to write more. And how do I get people to write more? I say it’s important. And this is a very specific audience, right, in tech, and you know tech people are very focused on their careers.
So I wanted to talk about why writing is important in a tech career, and hopefully convince people to start writing and practicing. And that’s how the title came about.
Alexey Grigorev: That makes sense. Again, you keep the audience in mind when coming up with a title. Now let’s say you’ve convinced—we have 34 people online now—everybody on this call to start writing. What do they do next?
Eugene Yan: Just start writing. I know it’s not a very good answer. Let me share some questions people ask me. “I’m afraid of writing online. Can I write about this? It’s just a summary of articles, it’s just a book summary. Will it be useful? Will it be worthy of writing online?” I’m going to share with you a pretty brutal truth: For your first article, no one is going to read it. So just like they say “dance like nobody’s watching”, write like nobody’s reading. It doesn’t matter. You’re writing for yourself to practice.
Then, “what to write about?” A lot of people ask this as well. Write what you’re thinking about now. Write what’s on your mind, what keeps you up at night. It doesn’t have to be related to your career. It could be about gardening. It could be about, I don’t know, recipes. It doesn’t matter. Just, just write.
Also, some people say, “oh, I need to find my topic, my theme, to write about.” Right. And I’ll share with you the example of Patrick McKenzie. I don’t know if you know him. He goes by patio11 on Twitter. So for years, he just wrote about what was on his mind. I think compilers or Ruby or Java, whatever was happening in Japan.
But when he looked back after several years, he found that the things he wrote at the intersection of engineering and marketing really resonated with the audience. And that’s how he found his theme. But it took him several pieces and one or two years to figure that out. It’s just the same way: you cannot connect the dots looking forward. It’s only after you’ve written a lot that you realize… what is your motivation, what are the topics that you sort of write about, and then it happens. So it’s going to be a long-term process.
Alexey Grigorev: And one recommendation I often see on the internet in blog posts is like, first you need to find your niche. So what you’re saying is, don’t care about finding your niche, just write whatever is on your mind, just go sit at your computer. And, you know, start writing.
Eugene Yan: I would say that it’s useful to find your niche, but you won’t find your niche before you write a lot. So it’s sort of a circular thing. They don’t see that you need to first write a lot to find a niche. It’s after you’ve written a lot, when you reflect, that you see your niche.
Alexey Grigorev: But then also writing about the same thing over and over again is pretty boring, right?
Eugene Yan: Exactly. I can’t write about machine learning every single week. I’m interested in it, but I have other interests too that are important, like about career, about data science processes. So I try to write about that. So I haven’t found my topic of focus yet and that’s fine because I’m just writing for myself. My topic of focus is whatever Eugene is thinking about. That’s it.
Alexey Grigorev: And what tools specifically, would you recommend to use? Should it be Medium, should it be WordPress, should it be Jekyll? What would you recommend?
Eugene Yan: I recommend whatever is easy for you to use. A lot of people spend too much time on, oh, I need to find the perfect domain, and I need to have the perfect Hugo or Gatsby or Jekyll setup, and it needs to be CI/CD. Come on guys, the time you spend on that, just spend it on writing. Just pick whatever’s the least amount of work. Writing is already really difficult. Remove all the excess work and just focus on writing.
When I started I used WordPress. It was really easy to just start writing. It got to a certain point in time when I wanted to customize my theme and WordPress didn’t allow me to do it. That’s why I switched to hosting my own site, but by then I had already written like 50, 60 pieces. The tool is the least of your concerns. Just write and push it online. Substack, WordPress, Medium. Whatever reduces your barriers to entry.
Alexey Grigorev: I think Medium is pretty easy. Like, you don’t need to do much there, just get an account and start writing (Eugene: Exactly). With WordPress, this might require some setup, right (Eugene: Yup). And we have a question. The question is, I assume you’re busy. How do you schedule and prioritize this writing work into your daily routine?
Eugene Yan: Yeah, well, I am busy, but, um, I guess let me share a bit. From 2017 to 2019, I was doing an online masters of computer science. At that time, I was spending 20 to 40 hours a week on the online masters while I was working. So after I graduated I suddenly had a lot of time. That’s when I decided, okay, I need to pour this energy somewhere else, and I decided to pour it into writing.
It’s just one to two hours a day, early in the morning. It’s the same as exercise, as meditation. It’s just a daily habit. My Saturday is just hammering out the prose and Sundays are just trying to edit a bit. I do make time to go out and thankfully, I have a very understanding wife. But I think it’s possible. And you don’t have to spend so much time, you can just write short snippets, maybe 500 words. Just start small. Start with whatever you’re comfortable with.
Alexey Grigorev: Or tweet.
Eugene Yan: Yeah, just tweet.
Alexey Grigorev: Haha actually tweets are even more difficult, right, because of the limitation you have. (Eugene: True) By the way, speaking of Twitter, yesterday I posted a tweet saying that you will tell us the secret sauce of writing haha.
Eugene Yan: Haha I have no secret sauce. Honestly, when I woke up today and I saw Alexey’s tweet, I thought “crap, I have no secret sauce”. But I have been completely transparent and honest. Maybe the outline iteration approach is my secret sauce. I don’t know. It’s something that I hear is something new. But yeah, I don’t have a secret sauce really.
Alexey Grigorev: We have a question—how did you make your blog popular? What did you do to attract people so they go and read and then get in touch with you?
Eugene Yan: I don’t know. I wish I had the answer to that so I could repeat it. I would write and then I would write a tweet about it. And then I would copy that same tweet and post it on LinkedIn. And that’s it. And people who saw it would maybe circulate it. That’s it.
That’s my only distribution channel because it’s already so much effort to just compose tweets right. I’m just gonna write 280 characters and just post it there and that’s it. Eventually I sort of find likeminded people who read my stuff. And I didn’t really put a lot of effort into it, honestly, and I don’t think it’s very popular. But I’m at least thankful that I can find people who have the same interests that can engage with me on that.
Alexey Grigorev: Well, you’re saying not popular, but how many times did you end up on the first page of Hacker News?
Eugene Yan: Ah, I can’t remember. (Alexey: So that many times haha.) Honestly, but Hacker News is a completely different thing. Hacker News likes disagreements. I think my post was like “Stop taking regular notes, use a Zettelkasten instead” and you know people were debating about it. “Oh this Zettelkasten is a new fad”. And that’s why.
Alexey Grigorev: Yeah, but I think it wasn’t just one post that ended up there. There’s also your post about how data science should be more end-to-end, right? But at least for this article, I think the secret sauce is disagreement, right? So find what people disagree about, then.
Eugene Yan: It could be. And now that I think about it, maybe the real secret sauce is that I threw a lot of darts. I think last year I shot 55 arrows (essays) and, you know, three of them hit the mark. So my hit rate is about, I don’t know, 2.5%, maybe 5%. So if you shoot a lot, things happen.
Alexey Grigorev: Basically consistency right.
We also wanted to talk about not just writing for an external audience on the internet but also internal writing, like at work. And this is also an important part of your career, right? We’re not just writing code the entire day. We need to communicate with other people, and writing is a form of communication. When we write documentation, we always need to think about the reader. So it’s not just, you know, an afterthought. So what do you think about writing at work? Why is it important, and how should you go about doing it?
Eugene Yan: So why is writing at work important? Well that’s a very big question to answer.
Alexey Grigorev: Let’s say, why should a project need documentation?
Eugene Yan: So let me share it this way. Before you start a project you sort of need to socialize your idea and check your idea with other people. Let’s just take a document that Amazon is famous for. It’s called the press release. Before we start a project, we would write a press release and, you know, maybe the press release for Amazon S3 is like, you know, highly scalable storage at a very cheap price and we send it out internally. Are people excited about this? If people are excited about this, hey, maybe this product could work. And this is before we even do any coding. So we first do this to test the audience.
And after you decide to do it—if the press release document works well—then maybe you write something called the design document. I know software engineers are very familiar with this, but the design doc is essentially… let’s say I’m gonna write the design document for real-time recommendations on how I would train the model, how I would serve the model, what are my business and technical requirements in terms of latency, in terms of throughput, in terms of cost. I write that and again, I will circulate that.
So writing at work gives you a way to test your ideas at scale before writing any code. So that’s really powerful. First you test it from the customer perspective: Is the idea exciting? Then you test it from the implementation: Is it useful? Is it possible? You’re going to check with your principal engineers, your fellow engineers, and make sure it works.
And documentation. Alexey mentioned documentation. Personally, I’m a very forgetful guy. If I write some code, six months later, I would look at it and definitely forget about it. I will be looking at my code and be thinking: “What is this idiot Eugene writing about? Why did he implement it this way?” So documenting my code is like me, in the past, sending a note to my future self: “Hey, you might think this is really stupid but here’s the reason for this. It’s because the data before this day is like this. Or it’s because I had to optimize the code to meet the latency requirements.” That’s what documenting your code does.
Similarly for documenting your projects. Why did we decide to go with DynamoDB instead of Redis? What are the pros and cons of each? Where’s the decision log? Why did we decide to go with Flink instead of Spark? You know all these answers. Or at least, you would have done the research for them. It would all be in your head. But you want to write it down so that you remember, and you want to write it down to share the knowledge with your team.
I hope that answers the question. Writing is important at work to test your ideas through a press release, test your design through the design doc. You document at work, you document your stuff so that the knowledge is not lost. And this helps you scale the knowledge more effectively.
Alexey Grigorev: And then also, like you said, you write for your future self, right? Because you can forget things. That resonates with me because I know that if I do this sequence of terminal commands, in a few days I will forget them. I need to write them down. So the next time I need to do the same thing, I have a reference and I can just copy-paste things. I also wanted to ask you about the press release and the design document you mentioned. Is that what I think Amazon calls working backwards? Is it that thing?
Eugene Yan: The press release is part of the working backwards process. So the working backwards process, it’s not exactly a secret. I mean, it is online. So essentially, working backwards is what it is, we work backwards from the customer. And that’s the right way to work. It means that we try to understand what the customer problem is, what the customer potential demand is and we work backwards from it. The other way of doing this is, we build it and people use it because what we build is so awesome. But, you know, often, that doesn’t work. So that’s the working backwards process.
And Alexey, you talked about this a lot. First, do some research. Understand what is the problem the customer has before you even train a model to solve that problem. If that’s not a problem or if the potential cost saved or if the potential revenue is too minor, we shouldn’t put effort into that. Right. So we work backwards.
The press release in the working backwards process helps us to solidify our idea around a single idea. For S3, we want to provide highly scalable, low latency, low cost storage. And so the team is going to build around that.
I think for Alexa initially the press release was “You can start music with your voice command”. Initially it was just that. Starting music via voice. But over time, of course, it has become very popular and it has grown and grown such that you can shop for groceries, you can ask questions, you can control lighting. But the first press release is just to test the market. Are people interested in it? Are internal stakeholders interested in it? Are your VPs, are your directors interested in it to channel resources into this research?
Alexey Grigorev: Yeah, makes sense. And the writing is an important part of this process right?
Eugene Yan: Yeah. I’m sure everyone knows, or at least most people know that Amazon doesn’t have slides. We don’t use PowerPoint. At least most of the time. So we write documents and that makes it very easy to scale, right? To share something you don’t need to be there to present for half an hour, you can just send them the document and they can just read it. So it really saves a lot of time.
Of course, you know, writing the document takes a lot of time. But once you have written it, it’s an upfront cost that you almost never have to repeat. Unlike a presentation, where you always have to be there to present, or you could video yourself, I don’t know. But I can tell you this document writing process is very useful and it will follow me wherever I go. I will adopt the same practice.
Alexey Grigorev: And I think we still have a bit of time and I have another question for you. What I want to ask you about is writing for a portfolio. So let’s say you did an awesome project, let’s say it was a Kaggle competition where you finished in the top 20 or top 30 with a medal, and you want to put all your code on GitHub. What do you do? What do you need to put in the README so people can immediately see the value of this thing?
Eugene Yan: Let’s think about it. Imagine I’m searching on GitHub for the code for some Kaggle competition. What would I find? I would find hundreds of repos on that Kaggle competition. What would distinguish one repository from another?
I think having a README is useful. A README that says how to start using this code, how to quick start, how to install this code, if it has requirements, that will be really useful. And maybe it explains the big processes. The data prep step is in this folder, the machine learning step is in this folder, the validation step is in this folder. So I think a basic README that sort of explains it would be good enough.
Next, imagine thinking of it from a hiring manager’s point of view. If they’re going to hire you, you definitely need to be able to write code. Then the question is: Are you able to document code and share about it? When they read your code does the documentation make sense? When they read your README, does it make sense? Do you explain it well enough that they feel confident that you can do the same thing at work? So I think that’s something that would be helpful.
Alexey Grigorev: Okay, we’re running out of time, and you probably need to go. So thanks a lot for coming today, for sharing your knowledge with us, and for sharing your secret sauce. I like what you do differently, and it deserves this name. I’ll try to follow your process and see what happens. And I think it will be a lot better than what I previously wrote.
Eugene Yan: Thank you. I hope so. Thanks Alexey. Thanks everyone.
Had a great chat with @Al_Grigor on the importance of writing in a tech career.
We discussed about:
• Motivations for writing
• My writing process (iterating outlines)
• How to start writing
• Writing culture at Amazon
• And more...👇https://t.co/5GJ8IAxJ29
- Eugene Yan (@eugeneyan), January 19, 2021