Setting up my new MacBook Pro from scratch
17 Nov 2024  ·  5 min  ·  engineering misc
ML systems, production & scaling, execution & collaboration, building for users, conference etiquette.
03 Nov 2024  ·  10 min  ·  machinelearning engineering production leadership
Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.
27 Oct 2024  ·  13 min  ·  llm eval learning 🛠 🩷
FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.
08 Sep 2024  ·  8 min  ·  learning engineering 🛠
Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.
What to interview for, how to structure the phone screen, interview loop, and debrief, and a few tips.
07 Jul 2024  ·  21 min  ·  machinelearning career leadership
Structured input/output, prefilling, n-shot prompting, chain-of-thought, reducing hallucinations, etc.
26 May 2024  ·  17 min  ·  llm production
From the tactical nuts & bolts to the operational day-to-day to the long-term business strategy.
12 May 2024  ·  1 min  ·  llm engineering production leadership
Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number.
Evals for classification, summarization, translation, copyright regurgitation, and toxicity.
31 Mar 2024  ·  33 min  ·  llm eval machinelearning
How unit testing machine learning code differs from typical software practices
25 Feb 2024  ·  6 min  ·  machinelearning engineering
Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining.
Some fundamental papers and a one-sentence summary for each; start your own paper club!
An expanded charter, lots of writing and speaking, and finally learning to snowboard.
Sending helpful & engaging pushes, filtering annoying pushes, and finding the frequency sweet spot.
24 Dec 2023  ·  18 min  ·  teardown recsys machinelearning production
How to use open-source, permissive-use data and collect fewer labeled samples for our tasks.
05 Nov 2023  ·  12 min  ·  llm eval machinelearning
The biggest deployment challenges, backward compatibility, multi-modality, and SF work ethic.
Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.
Distinguishing problems with external vs. internal LLMs, and data vs non-data patterns
13 Aug 2023  ·  6 min  ·  llm production
Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.
30 Jul 2023  ·  66 min  ·  llm engineering production 🔥
Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.
11 Jun 2023  ·  6 min  ·  llm engineering 🛠
What's the big deal, intuition on query-key-value vectors, multiple heads, multiple layers, and more.
21 May 2023  ·  8 min  ·  deeplearning nlp llm
It started with a question that had no clear answer, and led to eight PRs from the community.
Should chat be the main UX for LLMs? I don't think so and believe we can do better.
9 patterns including HITL, hard mining, reframing, cascade, data flywheel, business rules layer, and more.
23 Apr 2023  ·  20 min  ·  machinelearning engineering production recsys
Generating Dr. Seuss headlines, fake WSJ quotes, HackerNews troll comments, and more.
Also, shortcomings in document retrieval and how to overcome them with search & recsys techniques.
09 Apr 2023  ·  14 min  ·  llm deeplearning learning 🛠 🔥
Asking LLMs to generate biographies to get a sense of how they memorize and regurgitate.
Writing good instructions to achieve high precision and throughput.
12 Mar 2023  ·  6 min  ·  machinelearning mechanism
Collecting ground truth, data augmentation, cascading heuristics and models, and more.
26 Feb 2023  ·  16 min  ·  teardown machinelearning production
End of week debrief, weekly business review, monthly learning sessions, and quarter review.
05 Feb 2023  ·  7 min  ·  mechanism leadership
Pilot & copilot, literature review, methodology review, and timeboxing.
22 Jan 2023  ·  7 min  ·  mechanism machinelearning productivity
How to migrate and sync notes & images across devices
15 Jan 2023  ·  2 min  ·  til productivity
Seeking first to understand, earning trust, and preparing for away team work.
08 Jan 2023  ·  3 min  ·  leadership misc
Travelled, wrote, and learned a lot, L5 -> L6, gave a keynote at RecSys, and started a meetup.
A quick overview of variational and denoising autoencoders and comparing them to diffusers.
11 Dec 2022  ·  3 min  ·  deeplearning
The fundamentals of text-to-image generation, relevant papers, and experimenting with DDPM.
27 Nov 2022  ·  19 min  ·  deeplearning nlp survey
My three favorite papers, 17 paper summaries, and ML and non-ML lessons.
02 Oct 2022  ·  14 min  ·  recsys engineering production
Or why I should write fewer integration tests.
04 Sep 2022  ·  19 min  ·  engineering machinelearning production 🩷
Pushing back on the cult of complexity.
14 Aug 2022  ·  10 min  ·  machinelearning engineering production 🔥
Some off-the-beaten-path uses of Python learned from reading libraries.
31 Jul 2022  ·  10 min  ·  python engineering 🔥
15 minutes a week to document your work, increase visibility, and earn trust.
26 Jun 2022  ·  4 min  ·  mechanism productivity career
Understanding and spotting patterns to use code and components as intended.
12 Jun 2022  ·  13 min  ·  machinelearning engineering python 🔥
Mindset, 100-day plan, and balancing learning and taking action to earn trust.
Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.
08 May 2022  ·  14 min  ·  teardown recsys machinelearning
Introducing randomness and/or learning from inherent randomness to mitigate position bias.
17 Apr 2022  ·  7 min  ·  recsys
Thinking about recsys as interventional vs. observational, and inverse propensity scoring.
10 Apr 2022  ·  8 min  ·  recsys eval machinelearning
How they differ and why they work better in different situations.
20 Mar 2022  ·  7 min  ·  engineering productivity misc
Hard-won lessons on how to start data science projects effectively.
06 Mar 2022  ·  7 min  ·  datascience engineering productivity
I'm heading into a team lead role and would like to define the vision and roadmap.
18 Feb 2022  ·  3 min  ·  leadership datascience career 📬
What to consider in terms of data, roadmap, role, manager, tooling, etc.
13 Feb 2022  ·  8 min  ·  datascience machinelearning career 🔥
Beyond getting that starting role, how does one continue growing in the field?
19 Jan 2022  ·  6 min  ·  learning career machinelearning
Met most of my goals, adopted a puppy, and built ApplyingML.com.
28 Nov 2021  ·  7 min  ·  productivity life
More than two dozen interviews with ML Practitioners sharing their stories and advice
25 Nov 2021  ·  1 min  ·  machinelearning career 🛠
Susan shares 5 lessons she gained from writing online in public over the past year.
07 Nov 2021  ·  6 min  ·  writing
Write before you're ready, write for yourself, quantity over quality, and a few other lessons.
17 Oct 2021  ·  7 min  ·  writing
Simple baselines, ideas, tech stacks, and packages to try.
03 Oct 2021  ·  5 min  ·  recsys deeplearning production survey
Why this is the first rule, some baseline heuristics, and when to move on to machine learning.
19 Sep 2021  ·  8 min  ·  machinelearning career 🔥
Focusing on long-term rewards, exploration, and frequently updated items.
05 Sep 2021  ·  13 min  ·  teardown recsys deeplearning
How to generate labels from scratch with semi, active, and weakly supervised learning.
01 Aug 2021  ·  12 min  ·  teardown machinelearning
Building semantic search; how to calculate recall when relevant documents are unknown.
20 Jul 2021  ·  1 min  ·  machinelearning 📬
Show them the data, the Socratic method, earning trust, and more.
04 Jul 2021  ·  9 min  ·  leadership career
Breaking it into offline vs. online environments, and candidate retrieval vs. ranking steps.
27 Jun 2021  ·  13 min  ·  teardown production engineering recsys 🔥
A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.
13 Jun 2021  ·  25 min  ·  teardown recsys machinelearning deeplearning
How to go from knowing machine learning to applying it at work to drive impact.
02 May 2021  ·  12 min  ·  machinelearning career 🩷
An overview and comparison of the various approaches, with examples from industry search systems.
25 Apr 2021  ·  21 min  ·  teardown machinelearning production 🔥
Even high achieving individuals experience impostor syndrome; here's how Susan learned to manage it.
18 Apr 2021  ·  8 min  ·  career life learning
More education, achievements, and awards don't shoo away imposter syndrome. Here's what might help.
11 Apr 2021  ·  9 min  ·  career life learning
What do you deeply care about? What do you excel at? Build a career out of that.
Short vs. long-term gain, incremental vs. disruptive innovation, and resume-driven development.
21 Mar 2021  ·  12 min  ·  datascience machinelearning leadership
I wish I started sooner. All have improved my life and several have compounding effects.
Pointers to think through your methodology and implementation, and the review process.
07 Mar 2021  ·  15 min  ·  writing machinelearning engineering
Three documents I write (one-pager, design doc, after-action review) and how I structure them.
28 Feb 2021  ·  10 min  ·  writing engineering productivity 🩷 🔥
Access, serving, integrity, convenience, autopilot; use what you need.
21 Feb 2021  ·  19 min  ·  teardown machinelearning engineering 🔥
What the top teams did to win the 36-hour data hackathon. No, not machine learning.
14 Feb 2021  ·  6 min  ·  datascience engineering misc
What I learned about hiring and training, and fostering innovation, discipline, and camaraderie.
31 Jan 2021  ·  16 min  ·  leadership datascience
Stop procrastinating, go off the happy path, learn just-in-time, and get your hands dirty.
How to increase the chances of getting called up by recruiters?
16 Jan 2021  ·  5 min  ·  career datascience 📬
Why real-time? How have China & US companies built them? How to design & build an MVP?
10 Jan 2021  ·  21 min  ·  teardown machinelearning recsys production 🔥
A public roadmap to track and share my progress; nothing mission or work-related.
03 Jan 2021  ·  4 min  ·  productivity life
Wrapping up 2020 with writing and site statistics, graphs, and a word cloud.
20 Dec 2020  ·  8 min  ·  productivity life
A short story on flying daggers and life's challenges.
Time to clear the cache, evaluate existing processes, and start new threads.
06 Dec 2020  ·  4 min  ·  productivity life
How he switched from engineering to data science, what "senior" means, and how writing helps.
How did you set up your site and what's an easy way to replicate it?
Data cleaning, transfer learning, overfitting, ensembling, and more.
22 Nov 2020  ·  11 min  ·  machinelearning career life
Interview questions you should ask and how to evolve your job scope.
15 Nov 2020  ·  8 min  ·  datascience career
A personal take on their deliverables and skills, and what it means for the industry and your team.
08 Nov 2020  ·  11 min  ·  datascience machinelearning engineering career
Setbacks she faced, overcoming them, and how writing changed her life.
01 Nov 2020  ·  11 min  ·  career machinelearning writing
What questions do they answer? How do they compare? What open-source solutions are available?
25 Oct 2020  ·  16 min  ·  teardown datascience engineering 🔥
DNS server snafus led to email & security issues. Also, limited free build minutes monthly.
21 Oct 2020  ·  3 min  ·  misc
Not 'How to build a data science portfolio', but 'Whys' and 'Whats'.
18 Oct 2020  ·  15 min  ·  datascience learning career
Step-by-step walkthrough on the environment, compilers, and installation for ScaNN.
14 Oct 2020  ·  3 min  ·  python machinelearning til
Building prototypes helped get buy-in when roadmaps & design docs failed.
11 Oct 2020  ·  7 min  ·  productivity
As careers grow, how does the balance between writing & coding change? Hear from 4 tech leaders.
04 Oct 2020  ·  13 min  ·  writing career leadership
Emphasis on bias, more sequential models & bandits, robust offline evaluation, and recsys in the wild.
27 Sep 2020  ·  16 min  ·  recsys deeplearning production survey
What if the alternative was nothingness?
26 Sep 2020  ·  1 min  ·  life
For years I've refined my routines and found tools to manage my time. Here I share them with readers.
20 Sep 2020  ·  16 min  ·  productivity
My tools for organization and creation, autopilot routines, and Maker's schedule
13 Sep 2020  ·  11 min  ·  productivity
A step-by-step guide on how to migrate from JSON comments to Utterances.
Checking for correct implementation, expected learned behaviour, and satisfactory performance.
06 Sep 2020  ·  14 min  ·  machinelearning engineering python
Should I switch from a regex-based to ML-based solution on my application?
04 Sep 2020  ·  4 min  ·  machinelearning 📬
Why read papers, what papers to read, and how to read them.
30 Aug 2020  ·  6 min  ·  learning
Becoming a senior after three years and dealing with imposter syndrome.
27 Aug 2020  ·  2 min  ·  career datascience 📬
How not to become an expert beginner and to progress through beginner, intermediate, and so on.
Examining the broad strokes of NLP progress and comparing between models
16 Aug 2020  ·  23 min  ·  nlp deeplearning survey
Why (and why not) be more end-to-end, how to, and Stitch Fix and Netflix's experience
09 Aug 2020  ·  17 min  ·  datascience machinelearning leadership 🔥
Updating our FastAPI app to let users select options and download results.
05 Aug 2020  ·  3 min  ·  engineering python til
Surprising lessons I picked up from the best books, essays, and videos on writing non-fiction.
02 Aug 2020  ·  11 min  ·  learning writing 🩷
Why OMSCS? How can I get accepted? How much time needed? Did it help your career? And more...
I couldn't find any guides on serving HTML with FastAPI, thus I wrote this to plug the hole on the internet.
23 Jul 2020  ·  3 min  ·  engineering python til 🔥
Ever revisit a project & replicate the results the first time round? Me neither. Thus I adopted these habits.
19 Jul 2020  ·  12 min  ·  mechanism datascience productivity
It's not enough to have a good strategy and plan. Execution is just as important.
12 Jul 2020  ·  7 min  ·  mechanism datascience productivity
I wanted to add my recent writing to my GitHub Profile README but was too lazy to do manual updates.
11 Jul 2020  ·  3 min  ·  engineering python til
I thought giving it my all led to maximum outcomes; then I learnt about the 85% rule.
09 Jul 2020  ·  2 min  ·  productivity life 🩷
Part II of the previous write-up, this time on applications and frameworks of Spark in production
05 Jul 2020  ·  15 min  ·  machinelearning deeplearning spark production survey
Sharing my notes & practical knowledge from the conference for people who don't have the time.
28 Jun 2020  ·  11 min  ·  machinelearning deeplearning spark survey
After this article, we'll have a workflow of tests and checks that run automatically with each git push.
21 Jun 2020  ·  20 min  ·  engineering production python productivity 🔥
Does DS have business requirements? When does it make sense to split DS and DE?
21 Jun 2020  ·  3 min  ·  datascience 📬
A curious discussion made me realize my expert blind spot. And no, Airflow is not late.
17 Jun 2020  ·  3 min  ·  engineering production til
Haste makes waste. Diving into a data science problem may not be the fastest route to getting it done.
15 Jun 2020  ·  12 min  ·  mechanism datascience productivity
Initially, I didn't like it. But over time, it grew on me. Here's why.
07 Jun 2020  ·  10 min  ·  mechanism agile leadership datascience
Crocker's Law, cognitive dissonance, and how to receive (uncomfortable) feedback better.
Can maintaining machine learning in production be easier? I go through some practical tips.
25 May 2020  ·  16 min  ·  machinelearning engineering production
I thought deploying machine learning was hard. Then I had to maintain multiple systems in prod.
18 May 2020  ·  14 min  ·  machinelearning engineering production
An expansion of my Twitter thread that went viral.
09 May 2020  ·  4 min  ·  writing
What I learnt about evaluating ideas from first-hand participation in a hackathon.
03 May 2020  ·  7 min  ·  datascience lazada
What I learned about measuring diversity, novelty, surprise, and serendipity from 10+ papers.
Why you should give a talk and some tips from five years of speaking and hosting meet-ups.
18 Apr 2020  ·  7 min  ·  datascience writing
Should I join a start-up? Which offer should I accept? A simple metaphor to guide your decisions.
12 Apr 2020  ·  6 min  ·  career
Using a Zettelkasten helps you make connections between notes, improving learning and memory.
05 Apr 2020  ·  6 min  ·  writing learning productivity 🔥
Writing begins before actually writing; it's a cycle of reading -> note-taking -> writing.
Automate your experimentation workflow to minimize effort and iterate faster.
15 Mar 2020  ·  6 min  ·  productivity python
How hard work, many failures, and a bit of luck got me into the field and up the ladder.
Beating the baseline using Graph & NLP techniques on PyTorch, AUC improvement of ~21% (Part 2 of 2).
13 Jan 2020  ·  17 min  ·  recsys deeplearning nlp python 🛠
Building a baseline recsys based on data scraped off Amazon. Warning - Lots of charts! (Part 1 of 2).
06 Jan 2020  ·  14 min  ·  recsys deeplearning python 🛠
OMSCS CS6200 (Introduction to OS) - Moving data from one process to another, multi-threaded.
15 Dec 2019  ·  7 min  ·  omscs learning engineering
OMSCS CS6750 (Human Computer Interaction) - You are not your user! Or how to build great products.
Moving off WordPress and hosting for free on GitHub. And gaining full customization!
25 Aug 2019  ·  1 min  ·  misc
OMSCS CS6440 (Intro to Health Informatics) - A primer on key tech and standards in healthtech.
OMSCS CS7646 (Machine Learning for Trading) - Don't sell your house to trade algorithmically.
11 May 2019  ·  9 min  ·  omscs learning machinelearning python
No, you don't need a PhD or 10+ years of experience.
30 Apr 2019  ·  8 min  ·  career datascience
Taking the best from agile and modifying it to fit the data science process (Part 2 of 2).
02 Feb 2019  ·  14 min  ·  mechanism agile datascience productivity
A deeper look into the strengths and weaknesses of Agile in Data Science projects (Part 1 of 2).
26 Jan 2019  ·  13 min  ·  agile datascience productivity 🔥
OMSCS CS6601 (Artificial Intelligence) - First, start with the simplest solution, and then add intelligence.
20 Dec 2018  ·  8 min  ·  omscs learning machinelearning python
OMSCS CS6460 (Education Technology) - How to scale education widely through technology.
OMSCS CS7642 (Reinforcement Learning) - Landing rockets (fun!) via deep Q-Learning (and its variants).
30 Jul 2018  ·  6 min  ·  omscs learning deeplearning python
Culture >> Hierarchy, Process, Bureaucracy.
12 May 2018  ·  5 min  ·  leadership datascience lazada
OMSCS CS7641 (Machine Learning) - Revisiting the fundamentals and learning new techniques.
27 Dec 2017  ·  4 min  ·  omscs learning machinelearning python
How being a Lead / Manager is different from being an individual contributor.
25 Sep 2017  ·  5 min  ·  leadership datascience lazada
OMSCS CS6300 (Software Development Process) - Java and collaboratively developing an Android app.
13 Aug 2017  ·  5 min  ·  omscs learning engineering
Tools and skills to pick up, and how to practice them.
25 Jun 2017  ·  9 min  ·  datascience learning career
OMSCS CS6476 (Computer Vision) - Performing computer vision tasks with ONLY numpy.
15 May 2017  ·  5 min  ·  omscs learning deeplearning python
If things are not failing, you're not innovating enough. - Elon Musk
19 Feb 2017  ·  3 min  ·  leadership datascience lazada
Or how to put machine learning models into production.
13 Feb 2017  ·  8 min  ·  machinelearning production python 🛠
A web app to find similar products based on image.
14 Jan 2017  ·  4 min  ·  deeplearning python production 🛠
Cleaning up text and messing with ASCII (urgh!)
11 Dec 2016  ·  8 min  ·  machinelearning python 🛠
A simple web app to classify fashion images into Amazon categories.
27 Nov 2016  ·  2 min  ·  deeplearning python production 🛠
Got accepted into Georgia Tech's Computer Science Masters!
A card sorting game to discover your passion by identifying skills you like and dislike.
Parsing JSON and formatting product titles and categories.
11 Oct 2016  ·  9 min  ·  machinelearning python 🛠
Learning Scala from Martin Odersky, father of Scala.
31 Jul 2016  ·  4 min  ·  learning
Guest post on how DataKind SG worked with NGOs to frame their problems and suggest solutions.
17 Sep 2015  ·  8 min  ·  datascience
193 posts, 29 talks, 16 prototypes, 382,697 words, and countless hours.