Justus Mattern
@MatternJustus
Followers: 6K
Following: 3K
Media: 134
Statuses: 954
RL for code | prev. research @PrimeIntellect, @MPI_IS and built revideo
San Francisco, CA
Joined March 2021
Interesting case of GPT-5.1 remembering its training harness when only given a bash tool ("the editing helper I usually use isn’t available in this environment")
the fact that a company that has raised over 60M dollars and is able to recruit top AI researchers uses Tinker rather than in-house infra to train frontier models is an incredibly positive sign that Tinker can be used for serious large scale training runs
Congratulations to @axiommathai on their achievement! AxiomProver, a mathematics model fine-tuned with Tinker, got top scores on the Putnam Math Competition.
I'll be at NeurIPS starting Thursday - would love to chat about post-training for code and SWE evals / RL environments!
Excited about these results! Was a lot of fun building the RL stack with the team ❤️
Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack Achieving state-of-the-art performance for its size across math, code and reasoning Built using the same tools we put in your hands, from environments & evals, RL frameworks, sandboxes & more
if you're an engineer looking for new opportunities, reach out :)
a long-term investment that is paying off is investing heavily into evals (which we can use as RL environments)
we aim for a hyper-realistic task distribution and use a variety of different grading techniques:
- classical tests (e.g. unit tests, integration tests) for reliably
Update: After an incredible year at @PrimeIntellect, I have decided to take my next step in August. Grateful that I got to work with such a talented team and build the best open-source RL infra! For now, I'm continuing to work on RL for coding agents. Will share updates :)
Next up solving competitive programming challenges from the Waymo entertainment system
Uber will give its drivers in the US an option to make money by doing “digital tasks”. These short minute-long tasks can be done anytime including while idling for passengers: ▫️data-labelling (for AI training) ▫️uploading restaurant menus ▫️recording audio samples of
my turn: if you are interested in working on coding agent research and being a core contributor to what will be an impactful paper, some Stanford friends are working on an industry collab and are looking for motivated researchers to join! Can be paid internship if full-time, DM
The only piece of advice I give to undergrads that want to get into research is to cold email PhD students with a good track record. Most undergrads are bottlenecked by research ideas whereas good PhD students have way too many ideas that they cannot execute. If you can code
there's nothing like the feeling of looking up from your computer and going for a walk after an incredibly stressful 8h lock-in without food or breaks to meet a deadline