Ameya P. Profile
Ameya P.

@AmyPrb

Followers
395
Following
2K
Media
17
Statuses
873

Exploring Science of Benchmarking & Scaling up Automated 🧬 Discovery. Postdoc @bethgelab @uni_tue; Previously: @OxfordTVG, @intelailabs RT != endorsement

Tübingen, Germany
Joined September 2021
@AmyPrb
Ameya P.
5 months
🚨 New paper! Exciting progress in GRPO variants, smarter training strategies, and curated datasets showing impressive improvements on math reasoning -- is the hype justified? Details below👇
1
4
27
@AmyPrb
Ameya P.
17 hours
RT @JJitsev: This is awesome. Hope such spots will pop up more and more all around the world. Eg EU would do well to support such entities….
0
1
0
@AmyPrb
Ameya P.
2 days
RT @polynoamial: To all undergrads interested in learning about AI: be wary of taking “Intro to AI” as your first AI course. In many progra….
0
114
0
@AmyPrb
Ameya P.
8 days
RT @nearcyan: sometimes someone goes completely insane when realizing the scale factory farming operates at and really that seems like a re….
0
50
0
@AmyPrb
Ameya P.
9 days
RT @Dorialexander: Ok small rant: I don't think we're in a generative AI bubble. Actual market is still small. It seems bigger only becaus….
0
22
0
@AmyPrb
Ameya P.
10 days
RT @spendergrsec: Huh, I didn't realize that the vibe-coded vulns inserted into 5 LTS kernels that still aren't fixed 22 days later haven't….
0
98
0
@AmyPrb
Ameya P.
11 days
RT @chalmermagne: new from me: UK AI policy is increasingly devoid of ambition, housed in a weak department writing cheques it can't cash.….
0
20
0
@AmyPrb
Ameya P.
11 days
Must read -- great work from top labs like Bytedance. Exciting times ahead for LLM forecasting!
@liujiashuo77
Jiashuo Liu
12 days
We built FutureX, the world’s first live benchmark for real future prediction: politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆. Elon didn’t lie. @elonmusk your model sees further 🚀🍀. LeaderBoard:
1
0
1
@AmyPrb
Ameya P.
15 days
RT @haifengxu0: Super excited to launch @ProphetArena, a platform for benchmarking AI's forecasting capabilities with a few unique features….
prophetarena.co
A Live Benchmark for Predictive Intelligence
0
12
0
@AmyPrb
Ameya P.
16 days
RT @natanielruizg: We are releasing a paper I'm very excited about. We know test-time scaling is a path to greatly improved results, and ac….
0
44
0
@AmyPrb
Ameya P.
16 days
RT @JJitsev: Debunking yet another study of many that claim benefit of "brain-inspired" mechanisms without doing proper controls - comparin….
0
8
0
@AmyPrb
Ameya P.
17 days
RT @Hidenori8Tanaka: @EkdeepL And of course, look out for @EkdeepL on the academic job market! You’ll see how creative science and frontier….
0
1
0
@AmyPrb
Ameya P.
18 days
RT @xeophon_: After thinking about this problem for months, I am so happy to finally introduce DetailBench! It answers a simple question:…
0
62
0
@AmyPrb
Ameya P.
19 days
RT @StephenLCasper: 🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien: Open-weight LLM safety is both important…
0
39
0
@AmyPrb
Ameya P.
19 days
Europe & UK really need to back off on their data centers & water nonsense.
@jameswilson
James Wilson
20 days
Holy shit it’s real. I am going to fucking LOSE IT.
0
0
7
@AmyPrb
Ameya P.
20 days
RT @ShashwatGoel7: Seems like OpenAI has been prioritising verification, hugely. We re-ran REFUTE, our code verification eval (COLM'25) o….
0
18
0
@AmyPrb
Ameya P.
20 days
Frontier LLMs can win gold at the IMO in generation tasks, but can they spot bugs in code? Our results show code analysis capabilities still lag!
@shiven_sinha
Shiven Sinha
20 days
LLMs are winning IOI golds & crushing code gen, but can they verify correctness? In Feb, our benchmark saw single-digit scores with o3-mini. We re-ran our evals with the latest open models: GPT-OSS gets 21.6% at demonstrating bugs in code! Progress ✅ But verification's still hard
1
1
16
@AmyPrb
Ameya P.
22 days
RT @ShakeelHashim: I'd totally missed this: pretty wild stuff
0
9
0
@AmyPrb
Ameya P.
24 days
RT @JustinLin610: this is it! it means that u can use qwen code for free unless u need more than 2000 runs every day! i hope u can better…
0
143
0
@AmyPrb
Ameya P.
24 days
RT @jaseweston: is today a good day for new paper posts? 🤖 Learning to Reason for Factuality 🤖 📝: - New reward f…
0
49
0
@AmyPrb
Ameya P.
26 days
Highly recommend that AI safety and alignment folks apply to Maksym's lab -- expect some of the coolest alignment work to come outta this group👇
@maksym_andr
Maksym Andriushchenko
26 days
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring: I'm looking for multiple PhD students: both those able to start
0
0
12