
Jesse Dodge (@JesseDodge)
3K Followers · 7K Following · 25 Media · 813 Statuses
Research Scientist at Meta. 10-yr test-of-time ACL 22, Best Demo ACL 25, Best Resource Paper ACL 24, Best Theme Paper ACL 24, Best Student Paper NAACL 15 🏳️‍🌈
Joined March 2009
huge congrats to the team! this was a massive effort over a long time 😁 happy to see it come to fruition.
With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
I unfortunately can't make this event this year, but it's an excellent list of people who will be there! Def go if you can!
🎉Only 20 days to go - MLRC 2025 (@repro_challenge) happening this month (Aug 21st)!! We have an exciting array of keynote talks - @random_walker, @soumithchintala, @BlancheMinerva & @jefrankle, orals, poster sessions & panel discussion led by @sayashk!
RT @allen_ai: The first API endpoints for our fully open Olmo and Molmo models! Thank you, Cirrascale.
Today we released SciArena! It's totally free -- try asking questions about scientific topics, papers, citations, etc! Every query gets responses from two different models -- be sure to vote on which you prefer 😁
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
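The thread doesn't say how SciArena aggregates the votes; Chatbot Arena-style leaderboards typically turn pairwise preferences into a ranking with an Elo-style (or Bradley-Terry) update. A minimal sketch of that idea, with made-up model names and votes:

```python
# Minimal Elo-style aggregation of pairwise votes, as used by
# Chatbot Arena-style leaderboards. SciArena's exact ranking method
# isn't specified in the thread; model names and votes are made up.
from collections import defaultdict

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings, winner, loser):
    """Apply one pairwise vote: `winner` was preferred over `loser`."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_w)
    ratings[loser] -= K * (1 - e_w)

ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000
votes = [("model_a", "model_b"), ("model_b", "model_c"), ("model_a", "model_c")]
for winner, loser in votes:
    update(ratings, winner, loser)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```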
love seeing the transparency! as the original source of the quote, "one query to chatgpt uses about as much electricity as could light a light bulb for 20 minutes", i'm happy that my estimate was only off by 10x! it's great that we don't have to guess any more.
also, here is one part that people not interested in the rest of the post might still be interested in:
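Separately from the excerpt referenced above, a back-of-envelope sketch of the light-bulb comparison, assuming a ~10 W LED bulb and a reported per-query figure on the order of 0.3 Wh (both numbers are assumptions, not from the tweet):

```python
# Back-of-envelope check of the "light bulb for 20 minutes" estimate.
# Assumptions (not from the tweet): a ~10 W LED bulb and a reported
# figure on the order of 0.3 Wh per query.
bulb_watts = 10                          # W, typical LED bulb
minutes = 20
estimate_wh = bulb_watts * minutes / 60  # ≈ 3.3 Wh per query (the original estimate)
reported_wh = 0.3                        # Wh per query, assumed reported figure

print(f"original estimate: {estimate_wh:.1f} Wh/query")
print(f"reported figure:   {reported_wh:.1f} Wh/query")
print(f"ratio: ~{estimate_wh / reported_wh:.0f}x")  # roughly the "10x" in the tweet
```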
RT @IanMagnusson: Excited to share that DataDecide, our suite of language models pretrained over differences in data and scale, has been ac….
RT @gu_yuling: Excited to be at #NAACL2025 in Albuquerque this week! I'll be presenting "OLMES: A Standard for Language Model Evaluations"….
arxiv.org: Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models can be particularly challenging, as choices of...
RT @gabriberton: How to select pre-training data for LLMs?. Two papers came out last week from AllenAI and Nvidia that do it in a similar….
We just released more than 30k model checkpoints, trained on 25 different pretraining corpora, all evaluated on 10+ benchmarks! We applied a rigorous, scientific approach to deciding what data to train and evaluate on. Check it out! 🎉
Ever wonder how LLM developers choose their pretraining data? It's not guesswork -- all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
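As a toy illustration of the idea in the thread (small-scale proxy models used to choose pretraining data), here's a sketch that ranks candidate corpora by the average benchmark accuracy of a small model trained on each; the corpus names and scores are made up, and DataDecide's actual analysis is far more extensive:

```python
# Toy sketch of using small-scale proxy models to choose pretraining data:
# rank candidate corpora by the average benchmark accuracy of a small model
# trained on each one. All names and numbers below are made up.
from statistics import mean

small_model_scores = {
    # corpus -> {benchmark: accuracy of a small proxy model}
    "corpus_a": {"task_1": 0.42, "task_2": 0.55},
    "corpus_b": {"task_1": 0.47, "task_2": 0.51},
    "corpus_c": {"task_1": 0.39, "task_2": 0.58},
}

ranking = sorted(
    small_model_scores.items(),
    key=lambda kv: mean(kv[1].values()),
    reverse=True,
)

for corpus, scores in ranking:
    print(f"{corpus}: mean accuracy {mean(scores.values()):.3f}")
# The top-ranked corpus is the one you'd pick for the full-scale run,
# assuming small-scale results are predictive of large-scale performance.
```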
RT @allen_ai: Here we go #googlecloudnext! Excited to connect with developers and builders here about our fully open models, now available….
We've released #OLMoTrace! This tool matches spans in language model output to exact matches in the training data. It searches over trillions of pretraining tokens in seconds, showing where a model picked up the facts or word sequences it produces. Only possible with open data! 🎉
For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting? Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦
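OLMoTrace itself searches an index over trillions of pretraining tokens in seconds; the brute-force sketch below only illustrates the underlying idea of finding spans of model output that appear verbatim in training text (the strings are made up):

```python
# Toy illustration of exact span matching between model output and training
# text. OLMoTrace does this at scale over trillions of tokens with a proper
# index; this brute-force version is only for intuition. Strings are made up.
def matching_spans(output_tokens, corpus_tokens, min_len=3):
    """Return (start, length) of maximal output spans found verbatim in the corpus."""
    # Note: substring matching on joined strings ignores token-boundary edge cases.
    corpus_text = " ".join(corpus_tokens)
    spans = []
    i = 0
    while i < len(output_tokens):
        best = 0
        for j in range(i + min_len, len(output_tokens) + 1):
            if " ".join(output_tokens[i:j]) in corpus_text:
                best = j - i  # keep extending the span while it still matches
            else:
                break
        if best:
            spans.append((i, best))
            i += best
        else:
            i += 1
    return spans

corpus = "the capital of france is paris and it lies on the seine".split()
output = "i believe the capital of france is paris today".split()
for start, length in matching_spans(output, corpus):
    print(" ".join(output[start:start + length]))
```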
RT @allen_ai: Ai2 is coming to #GoogleCloudNext! 🚀. Follow along as we bring our fully open AI to the main stage in Las Vegas, and don't mi….
RT @allen_ai: Announcing OLMo 2 32B: the first fully open model to beat GPT 3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks….
RT @koustuvsinha: 🚨 We are pleased to announce the first, in-person event for the Machine Learning Reproducibility Challenge, MLRC 2025! Sa….
RT @IanMagnusson: Come chat with me at #NeurIPS2024 and learn about how to use Paloma to evaluate perplexity over hundreds of domains! ✨We….
super excited about this work! we can predict the downstream task performance of a 13B model using only ~1% of the compute of actually training it. I think there's a lot more to do in this space, and I'm looking forward to continuing to work on it. Excellent work @JiachengNLP @AkshitaB93!
Want to predict the task performance of LMs before pretraining them? We develop task scaling laws and model ladders, which predict the accuracy of OLMo 2 7B & 13B models on individual tasks within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
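The paper's actual task scaling laws and model ladders are more involved; as a rough sketch of the extrapolation idea, here's a saturating curve fit (an assumed functional form, with made-up numbers) to small-model accuracies, used to predict accuracy at a larger compute budget:

```python
# Rough sketch of the model-ladder idea: fit a saturating curve to task
# accuracies of small "ladder" models and extrapolate to a larger compute
# budget. The functional form and every number here are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def acc_curve(c, a, b, k):
    """Accuracy approaches an asymptote `a` as compute `c` grows."""
    return a - b * np.power(c, -k)

# Made-up (compute in FLOPs, task accuracy) points from small ladder models.
compute = np.array([1e19, 3e19, 1e20, 3e20])
accuracy = np.array([0.33, 0.39, 0.45, 0.50])

params, _ = curve_fit(acc_curve, compute, accuracy, p0=[0.9, 45.0, 0.1], maxfev=20000)

target_compute = 1e22  # a much larger, hypothetical pretraining run
print(f"predicted accuracy at {target_compute:.0e} FLOPs: "
      f"{acc_curve(target_compute, *params):.3f}")
```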
RT @claranahhh: Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good….