🔖 Key themes and curated posts
New here? These are some topics I write & speak about. Or navigate via tags or search.
Machine Learning Systems
Exploring ML systems in industry and how they're implemented.
- System Design for RecSys & Search: Offline vs. online, retrieval vs. ranking.
- Real-time Retrieval: Examples from various companies and how to build an MVP.
- Search Query Matching: Via lexical, graph, and representation learning methods.
- Patterns for Personalization: Via bandits, sequences, graphs, and user embeddings.
- Bandits for RecSys: Industry examples, warm-start, off-policy evaluation.
- Reinforcement Learning for RecSys: Long-term rewards and explore-exploit.
- Feature Stores: As a hierarchy of needs (e.g., access, serving, integrity).
- Data Discovery Platforms: How they help with finding data and open source options.
- Bootstrapping Data Labels: With semi, active, and weakly supervised learning.
Machine Learning Techniques
Surveys on machine learning methods.
Machine Learning & Engineering
Practices at the intersection of ML and engineering.
Mechanisms for ML and Data Science
Thoughts on what an effective data science process should look like.
Writing
Especially in the context of a career in tech and data.
Learning & Career
Practices that worked well for me and general advice.
Ideas
Random philosophical ideas and thoughts.
Summaries & Notes
Summaries and permanent notes, tidied up for public consumption.
Other resources
That are mostly scattered across the internet.
- applied-ml: Papers and tech blogs on real-world machine learning in industry.
- ml-surveys: Papers summarizing machine learning advances.
- applyingml: Papers, guides, and interviews on how to apply ML effectively.
- ml-design-docs: Template of design docs for machine learning systems.
- testing-ml: Examples of implementation & behavioral tests for ML code.
- python-collab-template: Template with tests, type checks, linting, etc.
- recsys-nlp-graph: Simple recsys and experiment results (built on PyTorch).
- papermill-mlflow: Experimentation workflow for machine learning.