
Jesse Dodge (@JesseDodge)
3K Followers · 7K Following · 25 Media · 813 Statuses
Research Scientist at Meta. 10-yr test-of-time ACL 22, Best Demo ACL 25, Best Resource Paper ACL 24, Best Theme Paper ACL 24, Best Student Paper NAACL 15 🏳️‍🌈
Joined March 2009
huge congrats to the team! this was a massive effort over a long time 😁 happy to see it come to fruition.
With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
I unfortunately can't make this event this year, but it's an excellent list of people who will be there! Def go if you can!
🎉Only 20 days to go - MLRC 2025 (@repro_challenge) happening this month (Aug 21st)!! We have an exciting array of keynote talks - @random_walker, @soumithchintala, @BlancheMinerva & @jefrankle, orals, poster sessions & panel discussion led by @sayashk!
RT @allen_ai: The first API endpoints for our fully open Olmo and Molmo models! Thank you, Cirrascale.
Today we released SciArena! It's totally free -- try asking questions about scientific topics, papers, citations, etc! Every query gets responses from two different models -- be sure to vote on which you prefer 😁
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
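The thread doesn't say how SciArena aggregates the votes; Chatbot Arena-style leaderboards typically turn pairwise preferences into a ranking with an Elo-style (or Bradley-Terry) update. A minimal sketch of that idea, with made-up model names and votes:

```python
# Minimal Elo-style aggregation of pairwise votes, as used by
# Chatbot Arena-style leaderboards. SciArena's exact ranking method
# isn't specified in the thread; model names and votes are made up.
from collections import defaultdict

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings, winner, loser):
    """Apply one pairwise vote: `winner` was preferred over `loser`."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_w)
    ratings[loser] -= K * (1 - e_w)

ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000
votes = [("model_a", "model_b"), ("model_b", "model_c"), ("model_a", "model_c")]
for winner, loser in votes:
    update(ratings, winner, loser)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```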
love seeing the transparency! as the original source of the quote, "one query to chatgpt uses about as much electricity as could light a light bulb for 20 minutes", i'm happy that my estimate was only off by 10x! it's great that we don't have to guess any more.
also, here is one part that people not interested in the rest of the post might still be interested in:
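Separately from the excerpt referenced above, a back-of-envelope sketch of the light-bulb comparison, assuming a ~10 W LED bulb and a reported per-query figure on the order of 0.3 Wh (both numbers are assumptions, not from the tweet):

```python
# Back-of-envelope check of the "light bulb for 20 minutes" estimate.
# Assumptions (not from the tweet): a ~10 W LED bulb and a reported
# figure on the order of 0.3 Wh per query.
bulb_watts = 10                          # W, typical LED bulb
minutes = 20
estimate_wh = bulb_watts * minutes / 60  # ≈ 3.3 Wh per query (the original estimate)
reported_wh = 0.3                        # Wh per query, assumed reported figure

print(f"original estimate: {estimate_wh:.1f} Wh/query")
print(f"reported figure:   {reported_wh:.1f} Wh/query")
print(f"ratio: ~{estimate_wh / reported_wh:.0f}x")  # roughly the "10x" in the tweet
```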
RT @IanMagnusson: Excited to share that DataDecide, our suite of language models pretrained over differences in data and scale, has been ac….
RT @gu_yuling: Excited to be at #NAACL2025 in Albuquerque this week! I'll be presenting "OLMES: A Standard for Language Model Evaluations"….
arxiv.org: Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models can be particularly challenging, as choices of...
RT @gabriberton: How to select pre-training data for LLMs?. Two papers came out last week from AllenAI and Nvidia that do it in a similar….
We just released more than 30k model checkpoints, trained on 25 different pretraining corpora, all evaluated on 10+ benchmarks! We applied a rigorous, scientific approach to deciding what data to train and evaluate on. Check it out! 🎉
Ever wonder how LLM developers choose their pretraining data? It's not guesswork -- all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
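As a toy illustration of the idea in the thread (small-scale proxy models used to choose pretraining data), here's a sketch that ranks candidate corpora by the average benchmark accuracy of a small model trained on each; the corpus names and scores are made up, and DataDecide's actual analysis is far more extensive:

```python
# Toy sketch of using small-scale proxy models to choose pretraining data:
# rank candidate corpora by the average benchmark accuracy of a small model
# trained on each one. All names and numbers below are made up.
from statistics import mean

small_model_scores = {
    # corpus -> {benchmark: accuracy of a small proxy model}
    "corpus_a": {"task_1": 0.42, "task_2": 0.55},
    "corpus_b": {"task_1": 0.47, "task_2": 0.51},
    "corpus_c": {"task_1": 0.39, "task_2": 0.58},
}

ranking = sorted(
    small_model_scores.items(),
    key=lambda kv: mean(kv[1].values()),
    reverse=True,
)

for corpus, scores in ranking:
    print(f"{corpus}: mean accuracy {mean(scores.values()):.3f}")
# The top-ranked corpus is the one you'd pick for the full-scale run,
# assuming small-scale results are predictive of large-scale performance.
```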
RT @allen_ai: Here we go #googlecloudnext! Excited to connect with developers and builders here about our fully open models, now available….
We've released #OLMoTrace! This tool matches spans in language model output to exact matches in the training data. It searches over trillions of pretraining tokens in seconds, showing where a model picked up the facts or word sequences it produces. Only possible with open data! 🎉
For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting? Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦
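OLMoTrace itself searches an index over trillions of pretraining tokens in seconds; the brute-force sketch below only illustrates the underlying idea of finding spans of model output that appear verbatim in training text (the strings are made up):

```python
# Toy illustration of exact span matching between model output and training
# text. OLMoTrace does this at scale over trillions of tokens with a proper
# index; this brute-force version is only for intuition. Strings are made up.
def matching_spans(output_tokens, corpus_tokens, min_len=3):
    """Return (start, length) of maximal output spans found verbatim in the corpus."""
    # Note: substring matching on joined strings ignores token-boundary edge cases.
    corpus_text = " ".join(corpus_tokens)
    spans = []
    i = 0
    while i < len(output_tokens):
        best = 0
        for j in range(i + min_len, len(output_tokens) + 1):
            if " ".join(output_tokens[i:j]) in corpus_text:
                best = j - i  # keep extending the span while it still matches
            else:
                break
        if best:
            spans.append((i, best))
            i += best
        else:
            i += 1
    return spans

corpus = "the capital of france is paris and it lies on the seine".split()
output = "i believe the capital of france is paris today".split()
for start, length in matching_spans(output, corpus):
    print(" ".join(output[start:start + length]))
```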
RT @allen_ai: Ai2 is coming to #GoogleCloudNext! 🚀. Follow along as we bring our fully open AI to the main stage in Las Vegas, and don't mi….
RT @allen_ai: Announcing OLMo 2 32B: the first fully open model to beat GPT 3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks….
RT @koustuvsinha: 🚨 We are pleased to announce the first, in-person event for the Machine Learning Reproducibility Challenge, MLRC 2025! Sa….
RT @IanMagnusson: Come chat with me at #NeurIPS2024 and learn about how to use Paloma to evaluate perplexity over hundreds of domains! ✨We….
super excited about this work! we can predict the downstream task performance of a 13B model using only ~1% of the compute of actually training it. I think there's a lot more to do in this space, and I'm looking forward to continuing to work on it. Excellent work @JiachengNLP @AkshitaB93!
Want to predict the task performance of LMs before pretraining them? We develop task scaling laws and model ladders, which predict the accuracy of OLMo 2 7B & 13B models on individual tasks within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
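The paper's actual task scaling laws and model ladders are more involved; as a rough sketch of the extrapolation idea, here's a saturating curve fit (an assumed functional form, with made-up numbers) to small-model accuracies, used to predict accuracy at a larger compute budget:

```python
# Rough sketch of the model-ladder idea: fit a saturating curve to task
# accuracies of small "ladder" models and extrapolate to a larger compute
# budget. The functional form and every number here are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def acc_curve(c, a, b, k):
    """Accuracy approaches an asymptote `a` as compute `c` grows."""
    return a - b * np.power(c, -k)

# Made-up (compute in FLOPs, task accuracy) points from small ladder models.
compute = np.array([1e19, 3e19, 1e20, 3e20])
accuracy = np.array([0.33, 0.39, 0.45, 0.50])

params, _ = curve_fit(acc_curve, compute, accuracy, p0=[0.9, 45.0, 0.1], maxfev=20000)

target_compute = 1e22  # a much larger, hypothetical pretraining run
print(f"predicted accuracy at {target_compute:.0e} FLOPs: "
      f"{acc_curve(target_compute, *params):.3f}")
```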
RT @claranahhh: Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good….