Ameya P. @AmyPrb X Profile

Ameya P.

@AmyPrb

Followers

395

Following

2K

Media

17

Statuses

873

Exploring Science of Benchmarking & Scaling up Automated 🧬 Discovery. Postdoc @bethgelab @uni_tue; Previously: @OxfordTVG, @intelailabs. RT != endorsement

Tübingen, Germany

Joined September 2021

Ameya P.

@AmyPrb

5 months

🚨 New paper! Exciting progress in GRPO variants, smarter training strategies, and curated datasets showing impressive improvements on math reasoning -- is the hype justified? Details below 👇

1

4

27

Ameya P.

@AmyPrb

17 hours

RT @JJitsev: This is awesome. Hope such spots will pop up more and more all around the world. Eg EU would do well to support such entities….

0

1

0

Ameya P.

@AmyPrb

2 days

RT @polynoamial: To all undergrads interested in learning about AI: be wary of taking “Intro to AI” as your first AI course. In many progra….

0

114

0

Ameya P.

@AmyPrb

8 days

RT @nearcyan: sometimes someone goes completely insane when realizing the scale factory farming operates at and really that seems like a re….

0

50

0

Ameya P.

@AmyPrb

9 days

RT @Dorialexander: Ok small rant: I don't think we're in a generative AI bubble. Actual market is still small. It seems bigger only becaus….

0

22

0

Ameya P.

@AmyPrb

10 days

RT @spendergrsec: Huh, I didn't realize that the vibe-coded vulns inserted into 5 LTS kernels that still aren't fixed 22 days later haven't….

0

98

0

Ameya P.

@AmyPrb

11 days

RT @chalmermagne: new from me: UK AI policy is increasingly devoid of ambition, housed in a weak department writing cheques it can't cash.….

0

20

0

Ameya P.

@AmyPrb

11 days

Must read -- great work from top labs like ByteDance. Exciting times ahead for LLM forecasting!

Jiashuo Liu

@liujiashuo77

12 days

We built FutureX, the world’s first live benchmark for real future prediction: politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆. Elon didn’t lie. @elonmusk your model sees further 🚀🍀. LeaderBoard:

1

0

1

Ameya P.

@AmyPrb

15 days

RT @haifengxu0: Super excited to launch @ProphetArena, a platform for benchmarking AI's forecasting capabilities with a few unique features….

prophetarena.co

A Live Benchmark for Predictive Intelligence

0

12

0

Ameya P.

@AmyPrb

16 days

RT @natanielruizg: We are releasing a paper I'm very excited about. We know test-time scaling is a path to greatly improved results, and ac….

0

44

0

Ameya P.

@AmyPrb

16 days

RT @JJitsev: Debunking yet another study of many that claim benefit of "brain-inspired" mechanisms without doing proper controls - comparin….

0

8

0

Ameya P.

@AmyPrb

17 days

RT @Hidenori8Tanaka: @EkdeepL And of course, look out for @EkdeepL on the academic job market! You’ll see how creative science and frontier….

0

1

0

Ameya P.

@AmyPrb

18 days

RT @xeophon_: After thinking about this problem for months, I am so happy to finally introduce DetailBench! It answers a simple question:….

0

62

0

Ameya P.

@AmyPrb

19 days

RT @StephenLCasper: 🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien: Open-weight LLM safety is both important….

0

39

0

Ameya P.

@AmyPrb

19 days

Europe & UK really need to back off on their data centers & water nonsense.

James Wilson

@jameswilson

20 days

Holy shit it’s real. I am going to fucking LOSE IT.

0

7

Ameya P.

@AmyPrb

20 days

RT @ShashwatGoel7: Seems like OpenAI has been prioritising verification, hugely. We re-ran REFUTE, our code verification eval (COLM'25) o….

0

18

0

Ameya P.

@AmyPrb

20 days

Frontier LLMs can win gold at the IMO in generation tasks, but can they spot bugs in code? Our results show code analysis capabilities still lag!

Shiven Sinha

@shiven_sinha

20 days

LLMs are winning IOI golds & crushing code gen, but can they verify correctness? In Feb, our benchmark saw single digit scores with o3-mini. We re-ran our evals with the latest open models: GPT-OSS gets 21.6% at demonstrating bugs in code! Progress ✅ But verification's still hard

1

16

Ameya P.

@AmyPrb

22 days

RT @ShakeelHashim: I'd totally missed this: pretty wild stuff

0

9

0

Ameya P.

@AmyPrb

24 days

RT @JustinLin610: this is it! it means that u can use qwen code for free unless u need more than 2000 runs every day! i hope u can better….

0

143

0

Ameya P.

@AmyPrb

24 days

RT @jaseweston: is today a good day for new paper posts? 🤖 Learning to Reason for Factuality 🤖 📝: - New reward f….

0

49

0

Ameya P.

@AmyPrb

26 days

Highly recommend that AI safety and alignment folks apply to Maksym's lab -- expect some of the coolest alignment work to come outta this group 👇

Maksym Andriushchenko

@maksym_andr

26 days

🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring: I'm looking for multiple PhD students: both those able to start

0

12