📚 Personal bits of knowledge

๐Ÿ“ Add evals data science note

Machine Learning.md
- Don't work if you cannot define what "great" means for your use case.
- [Evals replace LGTM-vibes development](https://newsletter.pragmaticengineer.com/p/evals). They systematize quality when outputs are non-deterministic.
- [Error analysis](https://youtu.be/ORrStCArmP4) workflow: build a simple trace viewer, review ~100 traces, annotate the first upstream failure ([open coding](https://shribe.eu/open-coding/)), cluster into themes ([axial coding](https://delvetool.com/blog/openaxialselective)), and use counts to prioritize. Bootstrap with grounded synthetic data if real data is thin.
- Evals are fundamentally "data science": look at your data, run experiments, measure where appropriate, and iterate on metrics and approaches.
- Pick the right evaluator: code-based assertions for deterministic failures; LLM-as-judge for subjective ones. Keep labels binary (PASS/FAIL) with human critiques. Partition data so the judge cannot memorize answers; validate the judge against human labels (TPR/TNR) before trusting it.
- Run evals in CI/CD and keep monitoring with production data.
- [Analyze → measure → improve → automate → repeat](https://newsletter.pragmaticengineer.com/p/evals).
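The "use counts to prioritize" step of error analysis can be sketched in a few lines. A minimal example, where the failure-theme annotations are hypothetical axial-coding labels:

```python
from collections import Counter

# Hypothetical first-upstream-failure annotations from ~100 reviewed traces,
# already clustered into themes via open/axial coding.
annotations = [
    "retrieval_miss", "hallucination", "retrieval_miss",
    "formatting", "retrieval_miss",
]

# Count each theme; the most frequent failure mode is what to fix first.
counts = Counter(annotations).most_common()
print(counts)  # [('retrieval_miss', 3), ('hallucination', 1), ('formatting', 1)]
```

Sorting by count turns the qualitative coding pass into a concrete priority list.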
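Validating the judge against human labels boils down to computing TPR/TNR over a held-out set. A minimal sketch, assuming binary PASS/FAIL labels; the label lists are hypothetical:

```python
def judge_agreement(human_labels, judge_labels):
    """TPR/TNR of an LLM judge's verdicts versus human PASS/FAIL labels.

    TPR: fraction of human-PASS cases the judge also passes.
    TNR: fraction of human-FAIL cases the judge also fails.
    """
    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h == "PASS" and j == "PASS")
    tn = sum(1 for h, j in pairs if h == "FAIL" and j == "FAIL")
    pos = sum(1 for h in human_labels if h == "PASS")
    neg = sum(1 for h in human_labels if h == "FAIL")
    return (tp / pos if pos else 0.0, tn / neg if neg else 0.0)

# Hypothetical labels from a partition the judge never saw.
human = ["PASS", "PASS", "FAIL", "FAIL", "PASS"]
judge = ["PASS", "FAIL", "FAIL", "FAIL", "PASS"]
tpr, tnr = judge_agreement(human, judge)
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")  # TPR=0.67 TNR=1.00
```

If either rate is low, fix the judge prompt (with human critiques as examples) before using it at scale.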
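A code-based assertion eval for a deterministic failure mode can be small enough to run on every CI build. A sketch, assuming a hypothetical requirement that outputs be JSON with a `summary` key:

```python
import json

def eval_json_output(trace_output: str) -> str:
    """PASS if the output parses as a JSON object with a 'summary' key."""
    try:
        obj = json.loads(trace_output)
    except json.JSONDecodeError:
        return "FAIL"
    return "PASS" if isinstance(obj, dict) and "summary" in obj else "FAIL"

# Hypothetical model outputs collected as traces.
traces = ['{"summary": "ok"}', "not json", '{"other": 1}']
verdicts = [eval_json_output(t) for t in traces]
print(verdicts)  # ['PASS', 'FAIL', 'FAIL']
```

Deterministic checks like this need no judge; wire them into CI/CD and keep running them on sampled production traces.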