Jacob Phillips

@jacob_dphillips

Followers
806
Following
11K
Media
26
Statuses
248

Engineering Fellow @a16z, American Dynamism. prev ML @scale_AI, CTO @Themis_AI, AI + History @MIT

Joined April 2016
@jacob_dphillips
Jacob Phillips
3 months
We’re entering a new era in robotics where generalized systems are starting to work in the real world, but researchers still don’t have good tools for understanding their data. That’s why I built ARES, an open-source platform for ingesting, annotating, and curating robotics data.
14
32
163
@jacob_dphillips
Jacob Phillips
1 day
RT @davideasnaghi: Exciting news on @diodeinc published on Business Insider today. 1/ We raised capital! Over $14.5m, most recently in a $…
0
68
0
@jacob_dphillips
Jacob Phillips
3 days
RT @rmcentush: Conflicts are won not just by what we produce, but how fast we move it. Yet military logistics still run on spreadsheets and…
0
5
0
@jacob_dphillips
Jacob Phillips
4 days
RT @svlevine: I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. ht…
0
116
0
@jacob_dphillips
Jacob Phillips
22 days
RT @_mattfreed: Get in. We’ve got fields to tend
0
11
0
@jacob_dphillips
Jacob Phillips
1 month
RT @_ConnorSweeney: Her brain went 6 hours without oxygen before they could operate. On New Year's Eve in 2024, a 3mm-wide clump of cells…
0
9
0
@jacob_dphillips
Jacob Phillips
1 month
RoboArena from @pranav_atreya -- real-world, scalable benchmarking for robots! Another step towards infrastructure for robot learning, similar to @lmarena_ai
@jacob_dphillips
Jacob Phillips
3 months
I wrote a second piece on “How to Build ChatGPT for Robotics”, covering the history of robot data labeling, current best practices, and what the future holds for robots – across benchmarks, safety, red-teaming, and real-world deployment.
0
0
7
@jacob_dphillips
Jacob Phillips
1 month
RT @SeanHendryx: What will the learning environments of the future look like that train artificial super intelligence? In recent work at @s…
0
30
0
@jacob_dphillips
Jacob Phillips
1 month
RT @jsuarez5341: PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, t…
0
93
0
@jacob_dphillips
Jacob Phillips
1 month
RT @oyhsu: Want to tinker with robots but don't have one on hand? @jacob_dphillips on our team @a16z built MALLET, a simple toolkit for a…
0
1
0
@jacob_dphillips
Jacob Phillips
1 month
MALLET provides a simple toolkit for anyone to become a robotics researcher. Check out the GitHub repo below. Thanks to @zhiyuan_zhou_ for setting up AutoEval and @oyhsu, @espricewright, and the rest of the @a16z American Dynamism team for their support.
github.com
Cloud-based tools and an evaluation harness for VLMs to control real-world robots - jacobphillips99/mallet
1
0
9
@jacob_dphillips
Jacob Phillips
1 month
However, VLMs are getting stronger and stronger in multimodal reasoning. MALLET helps VLMs achieve low error approaching that of an actual VLA robot policy! Using MALLET, we can also experiment with in-context learning by providing different amounts of historical observations to
1
0
3
@jacob_dphillips
Jacob Phillips
1 month
Does it work? Not quite yet -- VLMs really struggle with embodied, 3D visual problems like occlusion, or optical illusions like the parallax effect. Here's an example of gemini-2.5-flash trying to "open the drawer". From the reasoning traces, we see that the model can't tell that it
1
0
5
@jacob_dphillips
Jacob Phillips
1 month
We host CPU and GPU servers on @modal_labs, enabling researchers to train and evaluate VLMs or vision-language-action (VLA) models. We can also use MALLET as an evaluation benchmark to test the spatial reasoning capabilities of VLMs in comparison to VLAs.
1
0
4
@jacob_dphillips
Jacob Phillips
1 month
Have you ever wondered if o4-mini could control a robot? Ever wanted to do robotics research, but didn't have any robots or GPUs? MALLET is a toolkit and benchmark for letting vision-language models like GPT-4o drive robots in the real world. MALLET is built on top of
3
9
59
@jacob_dphillips
Jacob Phillips
1 month
@chris_j_paxton On learning from real-world deployments: "Most deployed robots are doing the same task, over and over again, in the same environment. So the pool of useful robots for learning “robot GPT” is going to be quite a bit lower."
0
0
3
@jacob_dphillips
Jacob Phillips
1 month
A great point from @chris_j_paxton in "It Can Think" this morning that a lot of people working in robot data collection tend to miss! This may actually be more bullish on robot learning from human videos.
2
2
16
@jacob_dphillips
Jacob Phillips
2 months
RT @espricewright: who is building American Dynamism and will be @CVPR in Nashville next week? hit us up @oyhsu @jacob_dphillips @MillenA…
0
2
0
@jacob_dphillips
Jacob Phillips
2 months
Releasing updated data and datasets on @huggingface! Now compatible with @MLCommons Croissant metadata format.
huggingface.co
@jacob_dphillips
Jacob Phillips
3 months
We’re entering a new era in robotics where generalized systems are starting to work in the real world, but researchers still don’t have good tools for understanding their data. That’s why I built ARES, an open-source platform for ingesting, annotating, and curating robotics data.
1
0
16
@jacob_dphillips
Jacob Phillips
2 months
28 miles, 5k vertical feet of elevation gain, 4 tiny bass, 1 unknown skull
3
0
16
@jacob_dphillips
Jacob Phillips
2 months
The recent Sonnet release actually showed a small regression on MMMU, a visual reasoning benchmark, despite large advances in long-context reasoning for agentic coding and AIME. Excited to see better embodied reasoning benchmarks in the future!
@oyhsu
Oliver Hsu
2 months
Feels like there’s more discussion lately around evaluation criteria for physical reasoning abilities of AI. Maybe an extension of evaluating visual reasoning, but likely something wholly different. “The people yearn for benchmarks” — @jacob_dphillips.
0
1
6