- Evals don't work if you cannot define what "great" means for your use case.
- [Evals replace LGTM-vibes development](https://newsletter.pragmaticengineer.com/p/evals). They systematize quality when outputs are non-deterministic.
- [Error analysis](https://youtu.be/ORrStCArmP4) workflow: build a simple trace viewer, review ~100 traces, annotate the first upstream failure ([open coding](https://shribe.eu/open-coding/)), cluster into themes ([axial coding](https://delvetool.com/blog/openaxialselective)), and use counts to prioritize. Bootstrap with grounded synthetic data if real data is thin.
- Evals are fundamentally data science: look at your data, run experiments and measure where appropriate, and iterate on metrics and approaches.
- Pick the right evaluator: code-based assertions for deterministic failures; LLM-as-judge for subjective ones. Keep labels binary (PASS/FAIL) with human critiques. Partition data so the judge cannot memorize answers; validate the judge against human labels (TPR/TNR) before trusting it.
- Run evals in CI/CD and keep monitoring with production data.
- [Analyze → measure → improve → automate → repeat](https://newsletter.pragmaticengineer.com/p/evals).
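The open/axial coding step above boils down to counting themes. A minimal sketch, assuming hypothetical trace annotations (the trace IDs, failure labels, and theme names are illustrative, not from the source):

```python
from collections import Counter

# Hypothetical annotations: each reviewed trace gets a label for its first
# upstream failure (open coding), then a broader theme (axial coding).
annotations = [
    {"trace_id": 1, "failure": "ignored user constraint", "theme": "instruction-following"},
    {"trace_id": 2, "failure": "hallucinated citation",   "theme": "grounding"},
    {"trace_id": 3, "failure": "wrong date format",       "theme": "formatting"},
    {"trace_id": 4, "failure": "fabricated API field",    "theme": "grounding"},
]

# Counts per theme tell you which failure mode to fix first.
theme_counts = Counter(a["theme"] for a in annotations)
for theme, count in theme_counts.most_common():
    print(theme, count)
```

Sorting by count is the whole prioritization trick: the most frequent theme is the first one worth engineering against.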
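Validating an LLM judge against human labels reduces to two numbers on a held-out set: TPR (how often the judge agrees with human PASS) and TNR (agreement on human FAIL). A minimal sketch with made-up labels:

```python
# Hypothetical held-out set: human PASS/FAIL labels vs. the judge's verdicts.
human = ["PASS", "PASS", "FAIL", "FAIL", "PASS", "FAIL"]
judge = ["PASS", "FAIL", "FAIL", "FAIL", "PASS", "PASS"]

tp = sum(h == "PASS" and j == "PASS" for h, j in zip(human, judge))
tn = sum(h == "FAIL" and j == "FAIL" for h, j in zip(human, judge))
tpr = tp / human.count("PASS")  # judge agreement on human-labeled PASS
tnr = tn / human.count("FAIL")  # judge agreement on human-labeled FAIL
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")  # → TPR=0.67 TNR=0.67
```

Binary PASS/FAIL labels are what make this check cheap; with graded scores you would need a thresholding step before you could compute either rate.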
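A code-based assertion eval of the kind that runs in CI can be very small. A sketch, assuming hypothetical model outputs and an invented `check_output` rule (required field present, no refusal):

```python
# Hypothetical deterministic checks; both the cases and the rule are illustrative.
def check_output(output: str) -> bool:
    return "total:" in output and "sorry" not in output.lower()

cases = [
    ("order summary", "items: 3, total: $42.00"),
    ("order summary", "Sorry, I can't help with that."),
]
results = [(name, check_output(out)) for name, out in cases]
pass_rate = sum(ok for _, ok in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # → pass rate: 50%
```

In CI you would fail the build when `pass_rate` drops below a chosen threshold; the same checks can then run over sampled production traffic for monitoring.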