eval (7)
Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.
27 Oct 2024  ·  13 min  ·  llm eval learning 🛠 🩷
Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon
Use cases, techniques, alignment, finetuning, and critiques of LLM-evaluators.
Evals for classification, summarization, translation, copyright regurgitation, and toxicity.
31 Mar 2024  ·  33 min  ·  llm eval machinelearning
How to use open-source, permissive-use data and collect fewer labeled samples for our tasks.
05 Nov 2023  ·  12 min  ·  llm eval machinelearning
Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.
Thinking about recsys as interventional vs. observational, and inverse propensity scoring.
10 Apr 2022  ·  8 min  ·  recsys eval machinelearning