Parag Jain ✈️ NeurIPS
@jparag123
Followers: 319
Following: 1K
Media: 3
Statuses: 149
RS @Meta Ex SR @GoogleDeepMind, PhD @EdinburghNLP
United Kingdom
Joined January 2022
I’ll be at #NeurIPS2025 until 12/7! 👋 Please reach out if you want to chat about RL, reasoning, self-evolving models, or LLM diversity. My presentations: 🌟 Fri, Dec 5 (11a-2p): Spotlight on Synthetic Data Scheduling, #4108 🌟 Sat, Dec 6 (11:30a & 4:30p): Spotlight on evaluating CoT, Hall F
0
1
7
Want to understand how to RL fine-tune your LLM without labels? I'll be presenting Compute as Teacher (CaT 🐈) as a spotlight ⭐️ poster at the Efficient Reasoning workshop at NeurIPS ✈️ next week. If you're around, come and chat about RL, LLMs, and brain decoding. #NeurIPS2025
🚨New Meta Superintelligence Labs Paper🚨 What do we do when we don’t have reference answers for RL? What if annotations are too expensive or unknown? Compute as Teacher (CaT🐈) turns inference compute into a post-training supervision signal. CaT improves up to 30% even on
2
13
75
Where do learning signals come from when there's no ground truth? Compute as Teacher: convert the model's exploration at inference time into reference-free supervision (rough sketch below).
🚨New Meta Superintelligence Labs Paper🚨 What do we do when we don’t have reference answers for RL? What if annotations are too expensive or unknown? Compute as Teacher (CaT🐈) turns inference compute into a post-training supervision signal. CaT improves up to 30% even on
6
41
270
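A minimal sketch of the "exploration as reference-free supervision" idea from the tweet above, under an assumed self-consistency-style proxy: treat the most common final answer across parallel rollouts as a pseudo-reference and reward agreement with it. This is not necessarily the CaT paper's recipe; the names `rollout_answers`, `pseudo_reference`, and the 0/1 reward are illustrative assumptions.

```python
# Hedged sketch: turn parallel inference rollouts into a reference-free reward.
# NOT the CaT implementation; a simple self-consistency-style stand-in.
from collections import Counter
from typing import List


def pseudo_reference(rollout_answers: List[str]) -> str:
    """Use the modal answer across rollouts as a stand-in for a ground-truth label."""
    return Counter(rollout_answers).most_common(1)[0][0]


def rewards(rollout_answers: List[str]) -> List[float]:
    """Score each rollout 1.0 if it agrees with the pseudo-reference, else 0.0.

    These scores could then feed a standard RL fine-tuning objective
    in place of human-provided labels.
    """
    ref = pseudo_reference(rollout_answers)
    return [1.0 if a == ref else 0.0 for a in rollout_answers]


if __name__ == "__main__":
    answers = ["42", "42", "41", "42"]          # hypothetical final answers from 4 rollouts
    print(pseudo_reference(answers), rewards(answers))  # -> 42 [1.0, 1.0, 0.0, 1.0]
```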
🔥 NEW PAPER: What makes reasoning traces effective in LLMs? Spoiler: It's NOT length or self-checking. We found a simple graph metric that predicts accuracy better than anything else—and proved it causally. 🧵[1/n]
4
27
177
🚨Brilliant New @AIatMeta Superintelligence Labs Paper. It asks a simple question: "Can inference compute substitute for missing supervision?" And the big deal is that this paper shows you don’t need humans to provide labels or feedback in reinforcement learning anymore.
11
37
261
🚨New Meta Superintelligence Labs Paper🚨 What do we do when we don’t have reference answers for RL? What if annotations are too expensive or unknown? Compute as Teacher (CaT🐈) turns inference compute into a post-training supervision signal. CaT improves up to 30% even on
13
87
552
We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute of majority voting@512. The secret? Simple: just prune rollouts when they show a consecutive stream of low confidence 😀 (rough sketch below). It can be applied to any model.
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
11
53
367
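A rough sketch of the confidence-based pruning idea described in the DeepConf tweets above, assuming per-token log-probabilities are available for each rollout: a rollout is dropped once its windowed mean log-probability dips below a threshold, and the surviving rollouts are majority-voted. The window size, threshold, and helper names (`window_confidence`, `should_prune`, `vote`) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of confidence-based rollout pruning + majority voting.
# NOT the DeepConf implementation; window/threshold values are placeholders.
from collections import Counter
from typing import Dict, List, Optional


def window_confidence(token_logprobs: List[float], window: int = 32) -> List[float]:
    """Mean token log-probability over a trailing window, one value per position."""
    confs = []
    for i in range(len(token_logprobs)):
        chunk = token_logprobs[max(0, i - window + 1): i + 1]
        confs.append(sum(chunk) / len(chunk))
    return confs


def should_prune(token_logprobs: List[float], threshold: float = -2.5, window: int = 32) -> bool:
    """Prune a rollout as soon as its windowed confidence falls below the threshold."""
    return any(c < threshold for c in window_confidence(token_logprobs, window))


def vote(rollouts: List[Dict]) -> Optional[str]:
    """Majority vote over the final answers of rollouts that survived pruning.

    Each rollout is a dict like {"answer": str, "logprobs": List[float]}.
    """
    survivors = [r["answer"] for r in rollouts if not should_prune(r["logprobs"])]
    if not survivors:
        return None
    return Counter(survivors).most_common(1)[0][0]
```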
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
58
168
1K
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
320
913
6K
Submitted my thesis 😀, next stop viva🚀. That means I'm on the job market! If you have a position that fits my profile, I'd love to chat 🙏. Web: https://t.co/p3ogTV40CV GScholar: https://t.co/Uomhm0QhW9
5
10
67
Do speakers of different languages talk differently about what they see? We measure the saliency of entities mentioned in image captions of 31 languages to answer: sometimes they do! Kudos to @uriberger88 for leading the project
1
10
39
Check this out if you are interested in text-to-SQL parsing! 🚀📊
⚡️ Accepted to #NeurIPS2024 @NeurIPSConf D&B track as a Spotlight! See you in Vancouver!
0
2
5
Feeling bad for my student, who extended Shtarkov's characterization of minimax rates to the adversarial setting, a problem open since the early 2000s; due to inexperienced reviewers, it got only a poster based on 8,6,6,4 reviews. Should we pull it and send it to IEEE Info Theory?
32
45
1K
This work has been accepted at WMT (the Conference on Machine Translation), held with EMNLP 2024! 🥳🎉 We also have another paper accepted, on cross-cultural transcreation of restaurant menus, led by Zhonghe Zhang https://t.co/hQuisCn8L9 See you all in Miami! Eager to discuss
arxiv.org
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different...
We know LLMs are poor at MT in low-resource languages (LRLs): curious how to adapt them to perform better? 🚀 Our new paper explores the interplay between scale (of MT data) and diversity (of tasks/langs) in instruction tuning in determining LLM-MT performance for LRLs💡
1
10
34
Thrilled to announce that my paper on media background checks has been accepted to #EMNLP Findings! 🎉 Very happy to see this, especially because the metareviewer was an LLM - and not great. Their review helps to illustrate an argument from my paper, though! 🧵
1
3
24
Really excited about the enthusiasm for our LLM Agents MOOC: 4,000+ already joined within 2.5 days of the announcement! 🎉🎉🎉 Join us today at https://t.co/LhgNbafGGA online for the 1st lecture, on LLM reasoning by @denny_zhou @GoogleDeepMind, 3:10pm PT!
6
64
401
@yeewhye @AmosStorkey reflects on Chris and his career in Edinburgh so far, from AutoML to part-based scene understanding, and @KhanAsif__'s work pops up!
2
2
4