Tag: ≥20min

Evaluating Long-Context Question & Answer Systems

Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.

22 Jun 2025 · 28 min · llm eval survey

Improving Recommendation Systems & Search in the Age of LLMs

Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.

16 Mar 2025 · 43 min · recsys llm teardown 🔥

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

Use cases, techniques, alignment, finetuning, and critiques of LLM-evaluators.

18 Aug 2024 · 49 min · llm eval production survey 🔥

How to Interview and Hire ML/AI Engineers

What to interview for, how to structure the phone screen, interview loop, and debrief, and a few tips.

07 Jul 2024 · 21 min · machinelearning career leadership 🔥

Task-Specific LLM Evals that Do & Don't Work

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

31 Mar 2024 · 33 min · llm eval survey

How to Generate and Use Synthetic Data for Finetuning

Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining.

11 Feb 2024 · 42 min · llm survey

Evaluation & Hallucination Detection for Abstractive Summaries

Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.

03 Sep 2023 · 23 min · llm eval survey

Patterns for Building LLM-based Systems & Products

Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.

30 Jul 2023 · 66 min · llm engineering production 🔥

More Design Patterns For Machine Learning Systems

9 patterns including HITL, hard mining, reframing, cascade, data flywheel, business rules layer, and more.

23 Apr 2023 · 20 min · machinelearning engineering production recsys

Patterns for Personalization in Recommendations and Search

A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.

13 Jun 2021 · 25 min · teardown recsys machinelearning deeplearning

Search: Query Matching via Lexical, Graph, and Embedding Methods

An overview and comparison of the various approaches, with examples from industry search systems.

25 Apr 2021 · 21 min · teardown machinelearning production 🔥

Real-time Machine Learning For Recommendations

Why real-time? How have China & US companies built them? How to design & build an MVP?

10 Jan 2021 · 21 min · teardown machinelearning recsys production 🔥

NLP for Supervised Learning - A Brief Survey

Examining the broad strokes of NLP progress and comparing models.

16 Aug 2020 · 23 min · llm deeplearning survey

How to Set Up a Python Project For Automation and Collaboration

After this article, we'll have a workflow of tests and checks that run automatically with each git push.

21 Jun 2020 · 20 min · engineering production python productivity

eugeneyan

Tag: ≥20min (14)