Start Here • Eugene Yan

Key themes in my work

New here? These are topics I write & speak about. Alternatively, browse tags or search.

Exploring ML systems in industry and how they're implemented.

Lessons from a Year of Building with LLMs: As seen on O’Reilly (Parts 1, 2, 3, book)
Patterns for LLM Systems: Evals, RAG, fine-tuning, caching, guardrails, defensive UX.
Improving RecSys & Search with LLMs: Model architecture, data generation, etc.
System Design for RecSys & Search: Offline vs. online, retrieval vs. ranking.
Real-time Retrieval: Examples from various companies and how to build an MVP.
Patterns for Personalization: Via bandits, sequences, graphs, and user embeddings.
Reinforcement Learning for Recsys: Long-term rewards and explore-exploit.
Bandits for RecSys: Industry examples, warm-start, off-policy evaluation.
Search Query Matching: Via lexical, graph, and representation learning methods.
Push Notifications: What to send, what not to send, and how many to send.
Content Moderation: Collecting labels, data augmentation, cascade pattern, etc.
Bootstrapping Data Labels: With semi, active, and weakly supervised learning.
Feature Stores: As a hierarchy of needs (e.g., access, serving, integrity).
Data Discovery Platforms: How they help with finding data, and open source options.

Surveys on machine learning methods.

Prompting Fundamentals: Structured I/O, prefilling, n-shot prompting, CoT, etc.
LLM Evals: Evals that actually work for classification, summarization, translation.
LLM-Evaluators: Use cases, techniques, alignment, finetuning, and critiques.
Q&A Evals: Metrics, how to build datasets and evaluators, and existing benchmarks.
Out-of-Domain Finetuning: How to bootstrap on open data and label fewer samples.
Synthetic Data: For pretraining, instruction-tuning, and preference-tuning.
Intuition on Attention: Why Q, K, V vectors, multiple heads and layers, etc.
Counterfactual Evaluation & IPS: Aka the observational vs. interventional problem.
Measuring and Mitigating Position Bias: Accounting for bias in ordered results.
Primer on Language Models: From RNN to Word2Vec to Transformer to BERT to T5.
Primer on Text-to-Image: Diffusion, conditioning, guidance, and latent space.

Practices at the intersection of ML and engineering.

ML Design Patterns: Patterns in code & systems such as factory, decorator, proxy, etc.
ML Design Patterns for Systems: HITL, hard mining, reframe, cascade, flywheel, etc.
Python Patterns: Lesser seen patterns such as mixins, relative imports, etc.
ML Testing: Implementation, expected learned behavior, and evaluation metrics.
Pipeline Testing: Managing brittle tests in data & machine learning pipelines.
Model Testing: Why you shouldn’t mock machine learning models in unit tests.
Challenges with ML in Production and A Practical Guide to Overcome them.
Python Project Setup: With a workflow of checks that run with each git push.

I don't get to hack as much as I want, but when I do, it's a ton of fun.

Processes and tools for effective projects and teams.

How to Interview ML/AI Engineers: How to run phone screens, loops, and debriefs.
Mechanisms for ML Projects: Pilot & copilot, literature reviews, and more.
Mechanisms for Technical Teams: Weekly debriefs, learning sessions, WBRs, etc.
Weekly 15-5s: Increase visibility and earn trust with this 15-minute habit.
Scrum for Data Science: Parts of Scrum that fit the data science process well.

Practices that worked well for me and general advice.

How to Run A Weekly Paper Club: Benefits, and how to read and facilitate papers.
Onboarding Effectively: Mindset, 100-day plan, and balancing learning and doing.
Data Science Red Flags: What to look out for when interviewing with a new team.
Influencing without Authority: Socratic method, earning trust, finding advocates, etc.
Comparing Data/ML Roles: Data/Applied/Research scientist, and ML engineer.
Georgia Tech OMSCS FAQ: A guide to applying for and completing OMSCS.

Especially in the context of a career in tech and data.

Talks that've received the most positive feedback and engagement.

AIE World’s Fair 2024 Keynote: What We Learned from a Year of Building with LLMs.
Netflix PRS 2024: LLMs-based recommendation experiences—Challenges & Lessons.
AIE Summit 2023 Keynote: Lego blocks of LLM systems (evals, RAG, guardrails, etc.).
RecSys 2022 Keynote: Is the juice worth the squeeze, for real-time recommendations?
System Design for RecSys & Search: Candidate retrieval and ranking in industry.
OLX Product & Tech Keynote: How Asia’s tech giants scale & the SuperApp strategy.