Key themes in my work
New here? These are topics I write & speak about. Alternatively, browse tags or search.
Machine Learning Systems in Industry
Exploring ML systems in industry and how they're implemented.
- Patterns for LLM Systems: Evals, RAG, fine-tuning, caching, guardrails, defensive UX.
- System Design for RecSys & Search: Offline vs. online, retrieval vs. ranking.
- Real-time Retrieval: Examples from various companies and how to build an MVP.
- Patterns for Personalization: Via bandits, sequences, graphs, and user embeddings.
- Reinforcement Learning for Recsys: Long-term rewards and explore-exploit.
- Bandits for RecSys: Industry examples, warm-start, off-policy evaluation.
- Search Query Matching: Via lexical, graph, and representation learning methods.
- Content Moderation: Collecting labels, data augmentation, cascade pattern, etc.
- Bootstrapping Data Labels: With semi, active, and weakly supervised learning.
- Feature Stores: As a hierarchy of needs (e.g., access, serving, integrity).
- Data Discovery Platforms: How they help with finding data, and open source options.
Machine Learning Techniques
Surveys on machine learning methods.
- Evals for Abstractive Summaries: Reference, context, and preference-based metrics.
- Intuition on Attention: Why Q, K, V vectors, multiple heads and layers, etc.
- Counterfactual Evaluation & IPS: Aka the observational vs. interventional problem.
- Measuring and Mitigating Position Bias: Accounting for bias in ordered results.
- Measuring Serendipity in Recommendations: Diversity, novelty, unexpectedness, etc.
- A Brief Survey of NLP: From RNN to Word2Vec to Transformer to BERT to T5.
- A Survey of Text-to-Image: Diffusion, conditioning, guidance, and latent space.
Machine Learning & Engineering
Practices at the intersection of ML and engineering.
- LLM Problems & Patterns: External vs. internal LLMs; data vs. non-data patterns.
- ML Design Patterns: Patterns in code & systems such as factory, decorator, proxy, etc.
- ML Design Patterns for Systems: HITL, hard mining, reframe, cascade, flywheel, etc.
- Challenges with ML in Production and A Practical Guide to Overcome them.
- Python Patterns: Lesser-seen patterns such as mixins, relative imports, etc.
- ML Testing: Implementation, expected learned behavior, and evaluation metrics.
- Pipeline Testing: Managing brittle tests in data & machine learning pipelines.
- Python Project Setup: With a workflow of checks that run with each
I don't get to hack as much as I want, but when I do, it's a ton of fun.
- Obsidian-Copilot: A Prototype Assistant for Writing and Reflecting.
- LLM UXs: Interacting with LLMs with Minimal Chat.
- Raspberry-LLM: Dr. Seuss headlines, HackerNews trolls, etc.
- LLMs to Research, Reflect, and Plan: LLMs, document retrieval, and Discord.
- ApplyingML.com: Ghost knowledge of applying machine learning.
- RecSys with Graphs & NLP: Node2Vec, Gensim word2vec, PyTorch word2vec.
- Baseline RecSys in PyTorch: The humble matrix factorization.
- SortMySkills: Sorting 50 skills to identify your passion and strengths.
- Image Search: Using image embeddings and cosine similarity.
- Image Classification: Transfer learning via Keras, Theano, and AWS.
- Title Classification: Data acquisition, data prep, building a frontend.
Mechanisms for Business, Product, and Tech Teams
Processes and tools for effective projects and teams.
Learning & Career
Practices that worked well for me and general advice.
Especially in the context of a career in tech and data.
Talks that've received the most positive feedback and engagement.
Ideas & Opinions
Random ideas and unnecessarily strong opinions.
Summaries & Notes
Summaries and permanent notes, tidied up for public consumption.
That are mostly scattered across the internet.
- applied-ml: Papers and tech blogs on real-world machine learning in industry.
- applyingml: Papers, guides, and interviews on how to apply ML effectively.
- open-llms: Open LLMs available for commercial use.
- ml-design-docs: Template of design docs for machine learning systems.
- testing-ml: Examples of implementation & behavioral tests for ML code.
- python-collab-template: Template with tests, type checks, linting, etc.
- papermill-mlflow: Experimentation workflow for machine learning.
- 1-on-1s: Questions to ask during 1-on-1s, from my time as a manager.
172 posts, 25 talks, 13 prototypes, 320,185 words, and countless hours.