Ramon Astudillo
@RamonAstudill12
Followers
579
Following
3K
Media
20
Statuses
3K
Principal RS at IBM Research AI. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG/RL. Opinions my own and non-stationary
Manhattan, NY
Joined April 2019
2016: lol python? That's not programming. Real programmers use C/C++ 2026: lol AI coding agents? That's not programming. Real programmers use AI tab completion
0
0
0
> Be AI PhD student > Submit paper to conference > LLM slop reviews > Rejected > Concurrent paper with same method accepted > Resubmit to next conference > Reviewer points to concurrent paper which was accepted by last conference > Lack of novelty > Rejected
36
62
2K
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
27
96
508
CodeClash is our new benchmark for evaluating coding abilities: it's much harder than anything we've built before. LMs must manage an entire codebase and develop it to compete in challenging arenas against other LM-generated programs. Current LMs really struggle, lots to do here!
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
6
9
71
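The thread describes the mechanism (multi-round tournaments between LM-maintained codebases scored on high-level goals) without showing its shape. Below is a rough scaffold of that loop under stated assumptions: `run_arena`, the round structure, and the agent names are placeholders I made up, not the actual CodeClash harness.

```python
# Illustrative tournament scaffold only; this is NOT the CodeClash harness,
# and run_arena / the agent names are placeholders.
import itertools
import random
from collections import defaultdict

def run_arena(codebase_a, codebase_b):
    # In the real benchmark this would execute both LM-maintained codebases
    # against each other and score a high-level goal (revenue, users, wins).
    # Here a coin flip stands in for that outcome.
    return random.choice([codebase_a, codebase_b])

def tournament(codebases, rounds=3):
    wins = defaultdict(int)
    for _ in range(rounds):
        for a, b in itertools.combinations(codebases, 2):
            wins[run_arena(a, b)] += 1
        # Between rounds, each LM would inspect results and edit its codebase.
    return dict(wins)

print(tournament(["agent_a", "agent_b", "agent_c"]))
```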
Hot take: DAgger (Ross 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio 2015). And before RL, study supervised learning thoroughly.
24
70
807
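Since the tweet only names DAgger, a minimal sketch of its core loop may help make the point: roll out the *learner*, have the expert relabel the states the learner actually visits, aggregate, and retrain as plain supervised learning. Everything below (the toy gridworld, the hand-coded expert, the use of scikit-learn) is an illustrative assumption, not from Ross et al. 2011.

```python
# Minimal DAgger sketch on a toy gridworld (all names and the environment are
# illustrative assumptions, not from the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression

GOAL = np.array([4, 4])
MOVES = {0: np.array([1, 0]), 1: np.array([0, 1])}  # 0 = right, 1 = up

def expert_action(state):
    # Hand-coded expert: step along the axis with the larger remaining gap.
    dx, dy = GOAL - state
    return 0 if dx >= dy else 1

def rollout(policy, steps=8):
    # Run the current policy and record the states it actually visits.
    state, visited = np.array([0, 0]), []
    for _ in range(steps):
        visited.append(state.copy())
        state = state + MOVES[policy(state)]
    return visited

X, y = [], []
policy = expert_action  # iteration 0 bootstraps from the expert (real DAgger mixes with a schedule)
for _ in range(5):
    for s in rollout(policy):
        X.append(s)
        y.append(expert_action(s))  # expert relabels the learner's own states
    clf = LogisticRegression().fit(np.array(X), np.array(y))
    policy = lambda s, clf=clf: int(clf.predict(s.reshape(1, -1))[0])

print(policy(np.array([0, 3])))  # expert would pick 0 (move right) here
```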
Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. https://t.co/DX9bbalx0B Excited for the potential of building specialized models to help in critical domains.
56
76
799
Damasio's proto-consciousness
Machines that can predict what their sensors (touch, cameras, keyboard, temperature, microphones, gyros, …) will perceive are already aware and have subjective experience. It’s all a matter of degree now. More sensors, data, compute, tasks will lead without any doubt to the “I
0
0
0
Geoffrey Hinton, father of deep learning (right) at age 31, with Chris Riesbeck (left). La Jolla, CA.
14
36
641
Strong motivation for us to get the COLM talks up quickly. https://t.co/4RtGArHEen
youtube.com
Official YouTube Account of the Conference on Language Modeling
2
4
68
The Generative Model Alignment team at IBM Research is looking for next summer interns! Two candidates for two topics 🍰Reinforcement Learning environments for LLMs 🐎Speculative and non-autoregressive generation for LLMs Interested/curious? DM / email ramon.astudillo@ibm.com
0
4
6
we finally have western qwen and you won't ever believe who this is.
okay i will go out on a limb: ibm really cooked with this and it might even dethrone my beloved qwen3 4b for local usage. only tried the 4b dense version so far and its outputs are way less slop. example: cleanup of a transcript: granite4 produced not a single em-dash vs. 16 by qwen3
12
11
317
Accepted at NeurIPS 2025! See you in San Diego! ☀️🌊
Excited to share our new paper on language model self-improvement! Paper: https://t.co/etbyTNwDK9 We introduce Self-Taught Principle Learning (STaPLe), a new approach for LMs to generate their own constitutions on-policy, by learning the principles that are most effective to
1
1
5
Whether domain adaptation provides a moat against LLM providers is a million (billion) dollar question. Watching Cursor's trajectory is probably one of the best ways to answer it.
We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
0
0
0
IMO the problem with telling young people "work very hard" is that, at best, it is a necessary but not sufficient condition. So much more is needed, like knowing what you want, how to get there, what your limits are, people skills, etc. "Work hard" alone is a recipe for burnout.
0
0
2
Is it me or has there been an update to ChatGPT's voice? Now it sounds like it has difficulties speaking and breathing at the same time.
0
0
1
Peer review is at risk of disappearing mainly for reasons unrelated to the rise of bureaucrats to power in the orgs that coordinate/control it, but this is definitely making the situation far worse.
1
0
4
Granite Embedding R2 Models are here! 🔥 8k context 🏆 Top performance on BEIR, MTEB, COIR, MLDR, MT-RAG, Table IR, LongEmbed ⚡Fast and lightweight 🎯 Apache 2.0 license (trained on commercial friendly data) Try them now on @huggingface 👉
huggingface.co
I really like the look of these, I reckon they would act as solid replacements for older models like all-MiniLM-L6-v2 and all-mpnet-base-v2. They'll do very well on retrieval in particular. Very solid work, IBM!
0
3
13
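For anyone who wants to kick the tires on the release, here is a minimal retrieval sketch using the sentence-transformers library. The model identifier is my assumption about the Hugging Face repo name and the texts are made up, so verify against the actual model card before running.

```python
# Hedged usage sketch with sentence-transformers; the model id below is a
# guess at the Hugging Face repo name, so check the real model card first.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")  # assumed id

query = ["what license are the Granite Embedding R2 models released under?"]
docs = [
    "Granite Embedding R2 models ship under an Apache 2.0 license.",
    "A recipe for slow-cooked tomato sauce.",
]

q = model.encode(query, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is just a dot product.
print(q @ d.T)  # the first document should score higher
```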
If you think about when the first rumours of Q* started, it was about a year before o1-preview ... shouldn't we have heard something about the next thing already? Not the best of signs?
0
0
1
This is happening in two hours!
Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020
0
1
13