Ramon Astudillo

@RamonAstudill12

Followers
579
Following
3K
Media
20
Statuses
3K

Principal RS at IBM Research AI. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG/RL. Opinions my own and non-stationary.

Manhattan, NY
Joined April 2019
@RamonAstudill12
Ramon Astudillo
14 hours
2016: lol python? That's not programming. Real programmers use C/C++ 2026: lol AI coding agents? That's not programming. Real programmers use AI tab completion
0
0
0
@siddarthv66
Siddarth Venkatraman
7 days
> Be AI PhD student > Submit paper to conference > LLM slop reviews > Rejected > Concurrent paper with same method accepted > Resubmit to next conference > Reviewer points to concurrent paper which was accepted by last conference > Lack of novelty > Rejected
36
62
2K
@gneubig
Graham Neubig
8 days
ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
27
96
508
@OfirPress
Ofir Press
18 days
CodeClash is our new benchmark for evaluating coding abilities; it's much harder than anything we've built before. LMs must manage an entire codebase and develop it to compete in challenging arenas against other LM-generated programs. Current LMs really struggle, lots to do here!
@jyangballin
John Yang
18 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
6
9
71
@shaneguML
Shane Gu
23 days
Hot take: DAgger (Ross 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio 2015). And before RL, study supervised learning thoroughly.
24
70
807
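The tweet above recommends DAgger as an on-ramp from supervised learning to RL. A minimal sketch of the idea, assuming a toy `env` with `reset()`/`step()` and an `expert_action()` oracle (both hypothetical placeholders, not from the tweet or any specific library): the learner's own rollouts are relabeled by the expert and folded into one aggregated supervised dataset, so the whole loop stays supervised learning plus on-policy data collection.

```python
# Minimal DAgger sketch (Ross et al., 2011). `env` and `expert_action` are
# hypothetical placeholders: env.reset() -> state, env.step(a) -> (state, done),
# expert_action(state) -> discrete action label.
import numpy as np
from sklearn.linear_model import LogisticRegression

def dagger(env, expert_action, n_iters=10, rollout_len=200):
    states, actions = [], []

    # Round 0: behavior cloning on expert rollouts.
    s = env.reset()
    for _ in range(rollout_len):
        a = expert_action(s)
        states.append(s)
        actions.append(a)
        s, done = env.step(a)
        if done:
            s = env.reset()

    policy = LogisticRegression(max_iter=1000)
    policy.fit(np.array(states), np.array(actions))

    # DAgger rounds: roll out the *learned* policy, but label the states it
    # visits with the expert's action, then retrain on the aggregated dataset.
    for _ in range(n_iters):
        s = env.reset()
        for _ in range(rollout_len):
            a_learner = int(policy.predict(np.array(s)[None])[0])
            states.append(s)
            actions.append(expert_action(s))  # expert labels the visited state
            s, done = env.step(a_learner)     # but the learner chooses the action
            if done:
                s = env.reset()
        policy.fit(np.array(states), np.array(actions))
    return policy
```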
@srush_nlp
Sasha Rush
25 days
Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. https://t.co/DX9bbalx0B Excited for the potential of building specialized models to help in critical domains.
56
76
799
@RamonAstudill12
Ramon Astudillo
26 days
Guys, none of you invented distillation
0
0
0
@RamonAstudill12
Ramon Astudillo
1 month
Damasio's proto-consciousness
@NandoDF
Nando de Freitas
1 month
Machines that can predict what their sensors (touch, cameras, keyboard, temperature, microphones, gyros, …) will perceive are already aware and have subjective experience. It’s all a matter of degree now. More sensors, data, compute, tasks will lead without any doubt to the “I
0
0
0
@AcquiredFM
Acquired Podcast
1 month
Geoffrey Hinton, father of deep learning (right) at age 31, with Chris Riesbeck (left). La Jolla, CA.
14
36
641
@srush_nlp
Sasha Rush
2 months
Strong motivation for us to get the COLM talks up quickly. https://t.co/4RtGArHEen
youtube.com
Official YouTube Account of the Conference on Language Modeling
2
4
68
@RamonAstudill12
Ramon Astudillo
2 months
The Generative Model Alignment team at IBM Research is looking for next summer interns! Two candidates for two topics: 🍰 Reinforcement Learning environments for LLMs 🐎 Speculative and non-autoregressive generation for LLMs. Interested/curious? DM / email ramon.astudillo@ibm.com
0
4
6
@Dorialexander
Alexander Doria
2 months
we finally have western qwen and you won't ever believe who this is.
@xeophon_
Xeophon
2 months
Okay, I will go out on a limb: IBM really cooked with this and it might even dethrone my beloved Qwen3 4B for local usage. Only tried the 4B dense version so far and its outputs are way less slop. Example: clean-up of a transcript: Granite 4 had not a single em-dash vs. 16 by Qwen3.
12
11
317
@KeshavRamji
Keshav Ramji 🔜 NeurIPS'25
2 months
Accepted at NeurIPS 2025! See you in San Diego! ☀️🌊
@KeshavRamji
Keshav Ramji 🔜 NeurIPS'25
6 months
Excited to share our new paper on language model self-improvement! Paper: https://t.co/etbyTNwDK9 We introduce Self-Taught Principle Learning (STaPLe), a new approach for LMs to generate their own constitutions on-policy, by learning the principles that are most effective to
1
1
5
@RamonAstudill12
Ramon Astudillo
2 months
Whether domain adaptation provides a moat against LLM providers is a million (billion) dollar question. Watching Cursor's trajectory is probably one of the best ways to answer it.
@cursor_ai
Cursor
2 months
We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
0
0
0
@RamonAstudill12
Ramon Astudillo
3 months
IMO the problem with telling young people "work very hard" is that, at best, it is a necessary but not sufficient condition. So much more is needed, like knowing what you want, how to get there, what your limits are, people skills, etc. "Work hard" alone is a recipe for burnout.
0
0
2
@RamonAstudill12
Ramon Astudillo
3 months
Is it just me, or has there been an update to ChatGPT's voice? It now sounds like it has difficulties speaking and breathing at the same time.
0
0
1
@RamonAstudill12
Ramon Astudillo
3 months
Peer review is at risk of disappearing mainly for reasons unrelated to the rise of bureaucrats to power in the orgs that coordinate/control it, but this is definitely making the situation far worse.
1
0
4
@aashkaa_
Aashka Trivedi
3 months
Granite Embedding R2 Models are here! 🔥 8k context 🏆 Top performance on BEIR, MTEB, COIR, MLDR, MT-RAG, Table IR, LongEmbed ⚡ Fast and lightweight 🎯 Apache 2.0 license (trained on commercially friendly data). Try them now on @huggingface 👉
huggingface.co
@tomaarsen
tomaarsen
3 months
I really like the look of these; I reckon they would act as solid replacements for older models like all-MiniLM-L6-v2 and all-mpnet-base-v2. They'll do very well on retrieval in particular. Very solid work, IBM!
0
3
13
@RamonAstudill12
Ramon Astudillo
3 months
If you think about when the first rumours of Q* started, it was about a year before o1-preview ... shouldn't we have heard something about the next thing already? Not the best of signs?
0
0
1
@AsafYehudai
Asaf Yehudai
3 months
This is happening in two hours!
@AsafYehudai
Asaf Yehudai
3 months
Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020
0
1
13