Simple baselines, ideas, tech stacks, and packages to try.
An overview of system design, candidate retrieval, and ranking, with industry examples.
Focusing on long-term rewards, exploration, and frequently updated items.
Why real-time RecSys? What does the system design look like in industry? How to build an MVP?
Breaking it into offline vs. online environments, and candidate retrieval vs. ranking steps.
A whirlwind tour of bandits, embedding+MLP, sequences, graphs, and user embeddings.
Why real-time? How have China & US companies built them? How to design & build an MVP?
Emphasis on bias, more sequential models & bandits, robust offline evaluation, and recsys in the wild.
What I learned about measuring diversity, novelty, surprise, and serendipity from 10+ papers.
Comparing baselines (matrix factorization) against novel approaches using graphs & NLP.
Beating the baseline using graph & NLP techniques in PyTorch, for an AUC improvement of ~21% (Part 2 of 2).
Building a baseline recsys based on data scraped off Amazon. Warning: lots of charts! (Part 1 of 2).