Ramon Astudillo
@RamonAstudill12
Followers
579
Following
3K
Media
20
Statuses
3K
Principal RS at IBM Research AI. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG/RL. Opinions my own and non-stationary
Manhattan, NY
Joined April 2019
2016: lol python? That's not programming. Real programmers use C/C++ 2026: lol AI coding agents? That's not programming. Real programmers use AI tab completion
0
0
0
> Be AI PhD student > Submit paper to conference > LLM slop reviews > Rejected > Concurrent paper with same method accepted > Resubmit to next conference > Reviewer points to concurrent paper which was accepted by last conference > Lack of novelty > Rejected
36
62
2K
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
27
96
508
CodeClash is our new benchmark for evaluating coding abilities: it's much harder than anything we've built before. LMs must manage an entire codebase and develop it to compete in challenging arenas against other LM-generated programs. Current LMs really struggle, lots to do here!
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
6
9
71
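The thread describes the mechanism (multi-round tournaments between LM-maintained codebases scored on high-level goals) without showing its shape. Below is a rough scaffold of that loop under stated assumptions: `run_arena`, the round structure, and the agent names are placeholders I made up, not the actual CodeClash harness.

```python
# Illustrative tournament scaffold only; this is NOT the CodeClash harness,
# and run_arena / the agent names are placeholders.
import itertools
import random
from collections import defaultdict

def run_arena(codebase_a, codebase_b):
    # In the real benchmark this would execute both LM-maintained codebases
    # against each other and score a high-level goal (revenue, users, wins).
    # Here a coin flip stands in for that outcome.
    return random.choice([codebase_a, codebase_b])

def tournament(codebases, rounds=3):
    wins = defaultdict(int)
    for _ in range(rounds):
        for a, b in itertools.combinations(codebases, 2):
            wins[run_arena(a, b)] += 1
        # Between rounds, each LM would inspect results and edit its codebase.
    return dict(wins)

print(tournament(["agent_a", "agent_b", "agent_c"]))
```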
Hot take: DAgger (Ross 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio 2015). And before RL, study supervised learning thoroughly.
24
70
807
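Since the tweet only names DAgger, a minimal sketch of its core loop may help make the point: roll out the *learner*, have the expert relabel the states the learner actually visits, aggregate, and retrain as plain supervised learning. Everything below (the toy gridworld, the hand-coded expert, the use of scikit-learn) is an illustrative assumption, not from Ross et al. 2011.

```python
# Minimal DAgger sketch on a toy gridworld (all names and the environment are
# illustrative assumptions, not from the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression

GOAL = np.array([4, 4])
MOVES = {0: np.array([1, 0]), 1: np.array([0, 1])}  # 0 = right, 1 = up

def expert_action(state):
    # Hand-coded expert: step along the axis with the larger remaining gap.
    dx, dy = GOAL - state
    return 0 if dx >= dy else 1

def rollout(policy, steps=8):
    # Run the current policy and record the states it actually visits.
    state, visited = np.array([0, 0]), []
    for _ in range(steps):
        visited.append(state.copy())
        state = state + MOVES[policy(state)]
    return visited

X, y = [], []
policy = expert_action  # iteration 0 bootstraps from the expert (real DAgger mixes with a schedule)
for _ in range(5):
    for s in rollout(policy):
        X.append(s)
        y.append(expert_action(s))  # expert relabels the learner's own states
    clf = LogisticRegression().fit(np.array(X), np.array(y))
    policy = lambda s, clf=clf: int(clf.predict(s.reshape(1, -1))[0])

print(policy(np.array([0, 3])))  # expert would pick 0 (move right) here
```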
Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. https://t.co/DX9bbalx0B Excited for the potential of building specialized models to help in critical domains.
56
76
799
Damasio's proto-consciousness
Machines that can predict what their sensors (touch, cameras, keyboard, temperature, microphones, gyros, …) will perceive are already aware and have subjective experience. It’s all a matter of degree now. More sensors, data, compute, tasks will lead without any doubt to the “I
0
0
0
Geoffrey Hinton, father of deep learning (right) at age 31, with Chris Riesbeck (left). La Jolla, CA.
14
36
641
Strong motivation for us to get the COLM talks up quickly. https://t.co/4RtGArHEen
youtube.com
Official YouTube Account of the Conference on Language Modeling
2
4
68
The Generative Model Alignment team at IBM Research is looking for next summer interns! Two candidates for two topics 🍰Reinforcement Learning environments for LLMs 🐎Speculative and non-autoregressive generation for LLMs Interested/curious? DM / email ramon.astudillo@ibm.com
0
4
6
we finally have western qwen and you won't ever believe who this is.
okay i will go out on a limb: ibm really cooked with this and it might even dethrone my beloved qwen3 4b for local usage. only tried the 4b dense version so far and its outputs are way less slop. example: cleanup of a transcript: granite4 produced not a single em-dash vs. 16 by qwen3
12
11
317
Accepted at NeurIPS 2025! See you in San Diego! ☀️🌊
Excited to share our new paper on language model self-improvement! Paper: https://t.co/etbyTNwDK9 We introduce Self-Taught Principle Learning (STaPLe), a new approach for LMs to generate their own constitutions on-policy, by learning the principles that are most effective to
1
1
5
Whether domain adaptation provides a moat against LLM providers is a million (billion) dollar question. Watching Cursor's trajectory is probably one of the best ways to answer it.
We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
0
0
0
IMO the problem with telling young people "work very hard" is that, at best, it is a necessary but not sufficient condition. So much more is needed, like knowing what you want, how to get there, what your limits are, people skills, etc. "Work hard" alone is a recipe for burnout.
0
0
2
Is it me or has there been an update to ChatGPT's voice? Now it sounds like it has difficulties speaking and breathing at the same time.
0
0
1
Peer review is at risk of disappearing mainly for reasons unrelated to the rise of bureaucrats to power in the orgs that coordinate/control it, but this is definitely making the situation far worse.
1
0
4
Granite Embedding R2 Models are here! 🔥 8k context 🏆 Top performance on BEIR, MTEB, COIR, MLDR, MT-RAG, Table IR, LongEmbed ⚡Fast and lightweight 🎯 Apache 2.0 license (trained on commercial friendly data) Try them now on @huggingface 👉
huggingface.co
I really like the look of these, I reckon they would act as solid replacements for older models like all-MiniLM-L6-v2 and all-mpnet-base-v2. They'll do very well on retrieval in particular. Very solid work, IBM!
0
3
13
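For anyone who wants to kick the tires on the release, here is a minimal retrieval sketch using the sentence-transformers library. The model identifier is my assumption about the Hugging Face repo name and the texts are made up, so verify against the actual model card before running.

```python
# Hedged usage sketch with sentence-transformers; the model id below is a
# guess at the Hugging Face repo name, so check the real model card first.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")  # assumed id

query = ["what license are the Granite Embedding R2 models released under?"]
docs = [
    "Granite Embedding R2 models ship under an Apache 2.0 license.",
    "A recipe for slow-cooked tomato sauce.",
]

q = model.encode(query, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is just a dot product.
print(q @ d.T)  # the first document should score higher
```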
If you think about when the first rumours of Q* started, it was about a year before o1-preview ... shouldn't we have heard something about the next thing already? Not the best of signs?
0
0
1
This is happening in two hours!
Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020
0
1
13