
Scott Condron
@_ScottCondron
Followers
5K
Following
5K
Media
506
Statuses
3K
Helping build AI/ML dev tools at @weights_biases. I post about machine learning, data visualisation, software tools.
Dublin, Ireland
Joined April 2018
Here's an animation of a @PyTorch DataLoader. It turns your dataset into an iterator of shuffled, batched tensors. (This is my first animation using @manim_community, the community fork of @3blue1brown's manim.) Here's a little summary of the different parts for those curious: 1/5
32
500
3K
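For readers who want the idea from the DataLoader post in code: a minimal plain-Python sketch of "shuffle, then batch, then iterate". This is not PyTorch's implementation. `simple_loader` is a hypothetical name, and it yields plain lists where a real DataLoader would collate each batch into tensors.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def simple_loader(
    dataset: Sequence[T],
    batch_size: int,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield batches of items from `dataset` (hypothetical stand-in for a DataLoader)."""
    # Work on indices so the dataset itself is never reordered.
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Slice the (possibly shuffled) indices into consecutive batches.
    # The final batch may be smaller than batch_size; it is kept, not dropped.
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


if __name__ == "__main__":
    for batch in simple_loader(list(range(10)), batch_size=4, seed=0):
        print(batch)
```

Each pass over the loader visits every item exactly once, in a new order when `shuffle` is on and no fixed `seed` is given.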
Who does evals (eng vs product vs domain experts), how often they do it, and how they do it vary wildly based on team size, personas, task complexity, and risk tolerance. There's no way simple off-the-shelf evals would work for @bytefuse_ai
1
0
2
RT @corbtt: At OpenPipe we built an entire SFT platform before pivoting to RL. It's theoretically possible to get similar results with eit…
0
12
0
This was my favourite talk at the AI Engineer World's Fair. It makes a good case that teams sophisticated enough to have good evals can use RL to fine-tune open models into custom agents that are more reliable at their tasks.
🆕 Training Agentic Reasoners. Today's feature is @willccbb's triumphant return to the AIE stage RL track - now as part of @PrimeIntellect! A lot of agent builders are basically doing "RL by hand". He concisely explains current RL algorithms in one slide (!) but then argues
0
0
5
RT @weights_biases: Unsure where to get started with AI Evals for your business? Scott the PM for our W&B Weave product, he's talked to m…
0
4
0
RT @sh_reya: Big fan of Scott’s eval guide. I like that it’s highly interactive (“choose your own adventure”), and that it distills a lot o…
0
6
0
Most AI teams optimize eval metrics without knowing how those metrics map to business impact. From @chipro's AI Engineering: if 80% factual consistency → 30% ticket automation, 90% → 50%, you can calculate ROI on improvements and set deployment thresholds. This is how you know when you're.
@chipro @eugeneyan Thanks @chipro! I included a quote from your book about connecting your eval metric to a business metric
0
0
3
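A rough worked example of the metric-to-business mapping in the post above. Only the two points (80% → 30%, 90% → 50%) come from the post. The linear interpolation between them, the function names, and the ticket volume and per-ticket cost are all hypothetical assumptions for illustration.

```python
# The two points quoted in the post: factual consistency -> share of tickets automated.
QUOTED_POINTS = [(0.80, 0.30), (0.90, 0.50)]


def automation_rate(consistency: float) -> float:
    """Estimate ticket automation for a consistency score.

    Assumption: the relationship is linear between the two quoted points.
    Scores outside that range are rejected rather than extrapolated.
    """
    (x0, y0), (x1, y1) = QUOTED_POINTS
    if not x0 <= consistency <= x1:
        raise ValueError("score is outside the range covered by the quoted points")
    return y0 + (y1 - y0) * (consistency - x0) / (x1 - x0)


def monthly_savings(consistency: float, tickets_per_month: int, cost_per_ticket: float) -> float:
    """Savings = automated share * ticket volume * cost of handling one ticket."""
    return automation_rate(consistency) * tickets_per_month * cost_per_ticket


if __name__ == "__main__":
    # Hypothetical business inputs, not taken from the post.
    tickets, cost = 10_000, 4.0
    gain = monthly_savings(0.90, tickets, cost) - monthly_savings(0.80, tickets, cost)
    print(f"Extra monthly savings from 80% to 90% consistency: {gain:,.0f}")
```

Comparing that gain against the cost of getting from 80% to 90% is one way to decide whether an improvement is worth shipping, and the same function can be inverted to pick a deployment threshold.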
RT @AtharvaIngle7: this is really cool - like how it builds a personalized evaluation roadmap based on your specific situation.
0
1
0
How I built this:
- @sh_reya's DocETL to help find relevant quotes / tips from my favourite eval guides / chapters / case studies across different key dimensions (defining eval requirements, dataset building, scoring, etc.) and prompt iteration
- Claude Code to synthesize the.
I made a choose-your-own-adventure for AI evaluation strategy. Your answers build a personalized roadmap based on task complexity, cost of failure, and your current evaluation. It also includes my favourite selection of tips from industry experts like @eugeneyan, @chipro and
3
6
25
I made a choose-your-own-adventure for AI evaluation strategy. Your answers build a personalized roadmap based on task complexity, cost of failure, and your current evaluation. It also includes my favourite selection of tips from industry experts like @eugeneyan, @chipro and
5
15
90
RT @corbtt: Our customers who are using RL to train agents on their specific domain to build reliable agents are *extremely* happy fyi.
0
14
0
RT @capetorch: My multi-turn GRPO runs keep crashing as the vLLM server can't keep up (long traces and a lot of them when doing multiturn):…
0
1
0
RT @zmkzmkz: just finished the pretraining of our 7B baseline. this is the first time I've pretrained a model of this scale, just a measly,…
0
6
0