
Ameya P.
@AmyPrb
Followers: 395
Following: 2K
Media: 17
Statuses: 873
Exploring Science of Benchmarking & Scaling up Automated 🧬 Discovery. Postdoc @bethgelab @uni_tue; Previously: @OxfordTVG, @intelailabs. RT != endorsement
Tübingen, Germany
Joined September 2021
🚨 New paper! Exciting progress in GRPO variants, smarter training strategies, and curated datasets showing impressive improvements on math reasoning -- Is the hype justified? Details below👇
1
4
27
RT @polynoamial: To all undergrads interested in learning about AI: be wary of taking “Intro to AI” as your first AI course. In many progra…
0
114
0
RT @Dorialexander: Ok small rant: I don't think we're in a generative AI bubble. Actual market is still small. It seems bigger only becaus…
0
22
0
RT @spendergrsec: Huh, I didn't realize that the vibe-coded vulns inserted into 5 LTS kernels that still aren't fixed 22 days later haven't…
0
98
0
RT @chalmermagne: new from me: UK AI policy is increasingly devoid of ambition, housed in a weak department writing cheques it can't cash.…
0
20
0
Must read -- Great work from top labs like Bytedance. Exciting times ahead for LLM forecasting!
We built FutureX, the world’s first live benchmark for real future prediction: politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆. Elon didn’t lie. @elonmusk your model sees further 🚀🍀. LeaderBoard:
1
0
1
RT @haifengxu0: Super excited to launch @ProphetArena, a platform for benchmarking AI's forecasting capabilities with a few unique features…
prophetarena.co
A Live Benchmark for Predictive Intelligence
0
12
0
RT @natanielruizg: We are releasing a paper I'm very excited about. We know test-time scaling is a path to greatly improved results, and ac…
0
44
0
RT @Hidenori8Tanaka: @EkdeepL And of course, look out for @EkdeepL on the academic job market! You’ll see how creative science and frontier…
0
1
0
RT @StephenLCasper: 🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien: Open-weight LLM safety is both important…
0
39
0
RT @ShashwatGoel7: Seems like OpenAI has been prioritising verification, hugely. We re-ran REFUTE, our code verification eval (COLM'25) o…
0
18
0
Frontier LLMs can win gold at the IMO in generation tasks -- but can they spot bugs in code? Our results show code analysis capabilities still lag!
LLMs are winning IOI golds & crushing code gen -- but can they verify correctness? In Feb, our benchmark saw single digit scores with o3-mini. We re-ran our evals with the latest open models: GPT-OSS gets 21.6% at demonstrating bugs in code! Progress ✅ But verification's still hard
1
1
16
RT @JustinLin610: this is it! it means that u can use qwen code for free unless u need more than 2000 runs every day! i hope u can better…
0
143
0
RT @jaseweston: is today a good day for new paper posts? 🤖 Learning to Reason for Factuality 🤖 📝: - New reward f…
0
49
0
Highly recommend that AI safety and alignment folks apply to Maksym's lab -- expect some of the coolest alignment work to come outta this group👇
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start
0
0
12