
Jacob Phillips
@jacob_dphillips
Followers: 806
Following: 11K
Media: 26
Statuses: 248
Engineering Fellow @a16z, American Dynamism. prev ML @scale_AI, CTO @Themis_AI, AI + History @MIT
Joined April 2016
We’re entering a new era in robotics where generalized systems are starting to work in the real world, but researchers still don’t have good tools for understanding their data. That’s why I built ARES, an open-source platform for ingesting, annotating, and curating robotics data.
14
32
163
RT @davideasnaghi: Exciting news on @diodeinc published on Business Insider today. 1/ We raised capital! Over $14.5m, most recently in a $…
0
68
0
RT @rmcentush: Conflicts are won not just by what we produce, but how fast we move it. Yet military logistics still run on spreadsheets and…
0
5
0
RT @svlevine: I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. ht…
0
116
0
RT @_ConnorSweeney: Her brain went 6 hours without oxygen before they could operate. On New Year's Eve in 2024, a 3mm-wide clump of cells…
0
9
0
RoboArena from @pranav_atreya -- real-world, scalable benchmarking for robots! Another step towards infrastructure for robot learning, similar to @lmarena_ai
I wrote a second piece on “How to Build ChatGPT for Robotics”, covering the history of robot data labeling, current best practices, and what the future holds for robots – across benchmarks, safety, red-teaming, and real-world deployment.
0
0
7
RT @SeanHendryx: What will the learning environments of the future look like that train artificial super intelligence? In recent work at @s…
0
30
0
RT @jsuarez5341: PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, t…
0
93
0
RT @oyhsu: Want to tinker with robots but don't have one on hand? @jacob_dphillips on our team @a16z built MALLET, a simple toolkit for a…
0
1
0
MALLET provides a simple toolkit for anyone to become a robotics researcher. Check out the GitHub repo. Thanks to @zhiyuan_zhou_ for setting up AutoEval and @oyhsu, @espricewright, and the rest of the @a16z American Dynamism team for their support.
github.com
Cloud-based tools and an evaluation harness for VLMs to control real-world robots - jacobphillips99/mallet
1
0
9
We host CPU and GPU servers on @modal_labs, enabling researchers to train and evaluate VLMs or vision-language-action (VLA) models. We can also use MALLET as an evaluation benchmark to test the spatial reasoning capabilities of VLMs in comparison to VLAs.
1
0
4
@chris_j_paxton On learning from real-world deployments: "Most deployed robots are doing the same task, over and over again, in the same environment. So the pool of useful robots for learning “robot GPT” is going to be quite a bit lower."
0
0
3
A great point from @chris_j_paxton in "It Can Think" this morning that a lot of people working in robot data collection tend to miss! This may actually be more bullish on robot learning from human videos.
2
2
16
RT @espricewright: who is building American Dynamism and will be @CVPR in Nashville next week? hit us up @oyhsu @jacob_dphillips @MillenA…
0
2
0
Releasing updated data and datasets on @huggingface! Now compatible with @MLCommons Croissant metadata format.
huggingface.co
1
0
16
The recent Sonnet release actually showed a small regression on MMMU, a visual reasoning benchmark, despite large advances in long-context reasoning for agentic coding and AIME. Excited to see better embodied reasoning benchmarks in the future!
Feels like there’s more discussion lately around evaluation criteria for physical reasoning abilities of AI. Maybe an extension of evaluating visual reasoning, but likely something wholly different. “The people yearn for benchmarks” — @jacob_dphillips.
0
1
6