Bukalapak - Fireside Chat with the Data Science team

[ datascience leadership ] · 10 min read

I was recently invited to speak at a fireside chat with Bukalapak, an e-commerce unicorn in Indonesia. It was a fun discussion with the data science and engineering team where I shared my experience building and running data teams in South East Asia and answered questions from the audience. I got feedback that my responses helped with their thinking on running data teams and engaging with stakeholders. Maybe they’ll benefit you as well.

• • •

What’s your perspective of Data team in companies where the main product is not Data? Should Data be in a leading or supporting role?

I think this depends on the organization that Data is part of. One way to look at is whether Data is a cost center or a profit center. I believe that, in the case of Bukalapak, data is a profit center. In e-commerce companies, Data drives Gross Merchandise Value (GMV) though powering search, recommendations, marketing, customer acquisition, etc.

If Data is part of a cost center, such as working on anti-money laundering because it’s a government regulation, or working on process automation, there’s only so much money that data can save. The maximum value that data can provide is 100% of the cost involved. And even then, it will never achieve 100% savings as it costs infra and people to run those (data and machine learning) systems.

On the other hand, if data is part of a profit center, the maximum value it can provide is infinite. Take Google’s ad systems for example. Since Google started selling ads, ad revenue has kept growing and growing. In general, data teams in profit centers tend to have more respect and influence, because, well, they bring in the bacon. Data teams in cost centers tend to have a supporting role (and are also likelier to get outsourced).

That said, there are exceptions. One exception is based on the amount of trust Data leaders have in the organization. I’ve seen Data leaders where, regardless of their function (i.e., profit or cost center), they have a lot of influence in the organization. They’re consulted by the executive team and senior leadership because they’ve earned the trust and demonstrated that data can add value. In those cases, Data plays a leading role.

On the other hand, if a Data leader has lost the trust of the organization, either by making bad decisions, or not playing well with the other functions or teams, they might lose the trust of the org and be put in a position where they can only play a supporting function.

How do you increase collaboration between data and the rest of the organization? How do you encourage stakeholders to be more data-savvy beyond looking at dashboards?

Reflecting on my previous experience, I think there are three main mechanisms that my teams did well.

The first approach is more formal. In Lazada, we started sending out a data science newsletter internally, where we shared with the larger organization about the results of our collaboration with various stakeholders. This helped to increase awareness of what the data team was doing and brought in new “business”. Stakeholders would reach out after a newsletter and say, “You know that thing you did for them? Could you do the same for us?”

We also did roadshows, where I travelled to several of our marketplaces (i.e., countries) to share about the data team’s work through presentations, as well as consult with each country on their key needs. In a previous startup, our tech team had demos every Friday. These demos were open to everyone (<20 people) and it helped the business development and marketing team gain awareness of the work we were doing so they could pitch better.

The second thing that made a big difference was Friday happy hour (i.e., more informal). Back in Lazada, every Friday night, the VP of Data Engineering and I would start Friday happy hour. We would have food and drinks—essentially, it was just a reason for the team to get together and chill. But every now and then, a senior functional leader would join us. I mean, free food and drinks!—many people from across the org joined us and we learned more about the various functions during happy hour than the entire week.

After a few drinks, I would sneak in the question: “What’s keeping you up at night? How can data help?” Sometimes, they would share problems that data could help on, such as automatically detecting spam reviews, or improving delivery forecasts. Then the team would act on it and deliver. This helped tremendously with gaining trust with the organization. And that person now becomes a champion for data.

Also, sometimes, when I get a request or some information I don’t understand, I like to just walk around the office and ask stakeholders for “2 minutes”. This usually means 10 - 15 minutes. But in that short meeting, everything would be cleared up and it improves the understanding between teams. I also tell them that they can do the same with me, and I’m glad that some do reach out to me for “2 minutes”.

Finally, and this is probably more relevant to your second question, is that I try to provide stakeholders with white-glove service when they take an interest in data. For example, if a stakeholder is asking about some data, I would go and sit with them, set up the SQL client on their laptop, make sure they have DB access, and share a simple query so they can get the data they need. I would also then take the time to work with them to tweak the query to suit their needs.

Most of the time, it’s a one-off request (they probably learn that asking Eugene for data help takes too much time and learning on their part haha). But sometimes, a few stakeholders get interested enough that they start playing with the database themselves. This happens about one in five times.

How do you work with business to define the right problems and prioritize solving them? Can you give some concrete examples?

This is a good question. I usually do this by just being curious and growing a thick skin to ask “Why?”. Here’s an example where the logistics team reached out with this request:

“Could you improve the rank of products that are Fulfilled by Lazada (FBL)?” I thought the request made sense. We want to rank products that use our services higher and incentivize merchants to use FBL. Nonetheless, I decided to ask “Why?”.

“Because those products that are fulfilled by Lazada are delivered faster.” Again, very reasonable response. We want customers to get their products faster. Nonetheless, at the risk of sounding dumb, I decided to ask “Why?” again.

“Because when it’s delivered faster, we get fewer complaints about late deliveries.” Ah, now the root cause emerges. They want to rank products that are fulfilled by Lazada higher to reduce customer complaints about late deliveries.

But IMHO, this was not a product ranking problem; instead, it was a delivery forecasting problem. Our forecasts were underestimating the time it took to deliver products when sellers choose to ship the products themselves (e.g., via 3rd-party or dropshipping). In this case, we choose to tackle this problem by improving the delivery forecasting algorithm instead of tweaking the product ranking system.

So that’s how to define the right problem. Then, how do we prioritize which problems to solve? It so happens that I’m writing a post about this now, but the gist of it is … (Please refer to this post from last week.)

As a senior/lead in the team, with knowledge across the company to keep on top of, how do you find a balance and avoid being overwhelmed with such scope?

I think that as you become more senior, it’s inevitable that you lose touch with some of the details that you had in your previous role. For example, if you’re in Bukalapak, you might try to learn everything that’s happening in Bukalapak, read all the docs, explore all the data, and by the time you’re done, what you’ve learned would already be out of date.

I think that as a senior, you should not expect yourself to know, do, and deliver everything, especially if you’re in a leadership position. That’s not what’s expected of you, and if you did that, I think you are actually leading poorly. Instead, your role is to provide guidance and intuition to the team, mentor and grow them, and trust them to make the right decisions. And because the team is directly working on the problem or system, they’ll have a more in-depth understand and a more updated context than you.

Also, as a senior or lead, you should be more focused on business outcomes. How do you identify the most pressing problems that needs your attention? How do you structure the problem and solution and help your team to execute it effectively? How do you hire, train, and motivate your team to deliver? All this takes a lot of energy. It’s unlikely you can stay in touch with the details—like an IC can—while working on this.

Nonetheless, sometimes, you might need to dive into the details, perhaps when the problem is especially nebulous, or when it’s a critical, time-sensitive issue. If so, you have to apply just-in-time learning to quickly get up to speed on the issue, consult with your team, and trust your experience and intuition to make the right decisions.

What’s your opinion about using data from an excel spreadsheet? It usually comes from ops/admin and business/product teams who want the data to be accessible via monitoring.

I think it’s a pain. Nonetheless, it’s tolerable as long as it’s automated. I used to work with teams that provided data via CSV format, and for the longest time, it wasn’t automated. This means that sometimes the column names would change, or go missing, or the encoding might change from UTF-8 or ASCII or Latin ISO.

If it were up to me, I would not accept any data that comes from a non-automate process. What this means is that CSV, Excel, and text files are fine, as long as they are automated and have a consistent schema and format. Once it’s automated, I don’t think it’s too difficult to export the CSV or Excel files into an S3 bucket. The data team can then build an ETL pipeline to read and store that data in a database.

Alternatively, the data team can directly query the data from their database. They should be getting their data from somewhere, right? Ask them how they’re querying the data and get it from their database directly.

You have experience in the US and Singapore. How do you compare the data culture between South East Asia and the US?

I don’t think that region plays a big part in terms of data culture. In South East Asia, we also have our tech unicorns doing awesome data and machine learning work such as Grab, GoJek, Traveloka, Tokopedia, Sea, Bukalapak, etc.

I think what makes the difference is the organization. For example, in the US, you might also find older, more mature companies where data and machine learning is not as advanced as our champions in South East Asia.


If you found this useful, please cite this write-up as:

Yan, Ziyou. (Mar 2021). Bukalapak - Fireside Chat with the Data Science team. eugeneyan.com. https://eugeneyan.com/speaking/bukalapak-fireside/.

or

@article{yan2021bukalapak,
  title   = {Bukalapak - Fireside Chat with the Data Science team},
  author  = {Yan, Ziyou},
  journal = {eugeneyan.com},
  year    = {2021},
  month   = {Mar},
  url     = {https://eugeneyan.com/speaking/bukalapak-fireside/}
}

Share on:

Browse related tags: [ datascience leadership ]

Join 6,700+ readers getting updates on machine learning, RecSys, LLMs, and engineering.