Tag: llm

Evaluating Long-Context Question & Answer Systems

Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.

22 Jun 2025 · 28 min · llm eval survey

AI Engineer 2025 - Improving RecSys & Search with LLM techniques

Recsys & search are converging with LLMs via semantic IDs, data augmentation, and unified foundation models.

04 Jun 2025 · 1 min · recsys llm engineering production

Building News Agents for Daily News Recaps with MCP, Q, and tmux

Learning to automate simple agentic workflows with Amazon Q CLI, Anthropic MCP, and tmux.

04 May 2025 · 8 min · llm learning 🛠

An LLM-as-Judge Won't Save The Product—Fixing Your Process Will

Applying the scientific method, building via eval-driven development, and monitoring AI output.

20 Apr 2025 · 5 min · eval llm engineering

NVIDIA GTC 2025 - Building LLM-Powered Applications

Chip Huyen and I share what we've learned, best practices, and insights at NVIDIA GTC 2025.

18 Mar 2025 · 1 min · llm engineering production

Improving Recommendation Systems & Search in the Age of LLMs

Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.

16 Mar 2025 · 43 min · recsys llm teardown 🔥

Building AI Reading Club: Features & Behind the Scenes

Exploring what an AI-powered reading experience could look like.

12 Jan 2025 · 10 min · llm learning 🛠 🩷

A Spark of the Anti-AI Butlerian Jihad (on Bluesky)

How the sharing of 1M Bluesky posts uncovered the strong anti-AI sentiment on Bluesky.

08 Dec 2024 · 7 min · llm ai misc

AlignEval: Building an App to Make Evals Easy, Fun, and Automated

Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.

27 Oct 2024 · 14 min · llm eval learning 🛠 🩷

Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge

Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon.

22 Sep 2024 · 2 min · llm eval

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.

18 Aug 2024 · 49 min · llm eval production survey 🔥

AI Engineer 2024 Keynote - What We Learned from a Year of LLMs

Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs.

27 Jun 2024 · 2 min · llm ai engineering production

Netflix PRS 2024 - Applying LLMs to Recommendation Experiences

Challenges and lessons from deploying LLM experiences: evals, scalability, guardrails.

31 May 2024 · 2 min · llm engineering production leadership

Prompting Fundamentals and How to Apply them Effectively

Structured input/output, prefilling, n-shot prompting, chain-of-thought, reducing hallucinations, etc.

26 May 2024 · 17 min · llm production 🔥

What We've Learned From A Year of Building with LLMs

From the tactical nuts & bolts to the operational day-to-day to the long-term business strategy.

12 May 2024 · 1 min · llm engineering production leadership 🔥

Building an AI Coach to Help Tame My Monkey Mind

Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number.

07 Apr 2024 · 4 min · llm ai life 🛠

Task-Specific LLM Evals that Do & Don't Work

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

31 Mar 2024 · 33 min · llm eval survey

How to Generate and Use Synthetic Data for Finetuning

Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining.

11 Feb 2024 · 42 min · llm survey

Language Modeling Reading List (to Start Your Paper Club)

Some fundamental papers and a one-sentence summary for each; start your own paper club!

07 Jan 2024 · 6 min · llm learning

Out-of-Domain Finetuning to Bootstrap Hallucination Detection

How to use open-source, permissive-use data and collect fewer labeled samples for our tasks.

05 Nov 2023 · 12 min · llm eval machinelearning python

Reflections on AI Engineer Summit 2023

The biggest deployment challenges, backward compatibility, multi-modality, and SF work ethic.

15 Oct 2023 · 7 min · llm ai misc

AI Engineer 2023 Keynote - Building Blocks for LLM Systems

Evals, retrieval-augmented generation, guardrails, and collecting feedback; all that good stuff.

09 Oct 2023 · 17 min · llm ai engineering production

Evaluation & Hallucination Detection for Abstractive Summaries

Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.

03 Sep 2023 · 23 min · llm eval survey

How to Match LLM Patterns to Problems

Distinguishing problems with external vs. internal LLMs, and data vs. non-data patterns.

13 Aug 2023 · 6 min · llm production

Patterns for Building LLM-based Systems & Products

Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.

30 Jul 2023 · 66 min · llm engineering production 🔥

Obsidian-Copilot: An Assistant for Writing & Reflecting

Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.

11 Jun 2023 · 6 min · llm engineering 🛠

Some Intuition on Attention and the Transformer

What's the big deal, intuition on query-key-value vectors, multiple heads, multiple layers, and more.

21 May 2023 · 8 min · deeplearning llm

Open-LLMs - A list of LLMs for Commercial Use

It started with a question that had no clear answer, and led to eight PRs from the community.

07 May 2023 · 1 min · llm misc

Interacting with LLMs with Minimal Chat

Should chat be the main UX for LLMs? I don't think so and believe we can do better.

30 Apr 2023 · 3 min · llm misc 🛠

Raspberry-LLM - Making My Raspberry Pico a Little Smarter

Generating Dr. Seuss headlines, fake WSJ quotes, HackerNews troll comments, and more.

16 Apr 2023 · 2 min · llm 🛠

Experimenting with LLMs to Research, Reflect, and Plan

Also, shortcomings in document retrieval and how to overcome them with search & recsys techniques.

09 Apr 2023 · 14 min · llm deeplearning learning 🛠 🔥

LLM-powered Biographies

Asking LLMs to generate biographies to get a sense of how they memorize and regurgitate.

19 Mar 2023 · 11 min · llm misc

Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space

The fundamentals of text-to-image generation, relevant papers, and experimenting with DDPM.

27 Nov 2022 · 19 min · deeplearning llm survey

NLP for Supervised Learning - A Brief Survey

Examining the broad strokes of NLP progress and comparing models.

16 Aug 2020 · 23 min · llm deeplearning survey

eugeneyan

Tag: llm (34)