
Fahim Tajwar
@FahimTajwar10
Followers
628
Following
640
Media
21
Statuses
112
PhD Student @mldcmu @SCSatCMU BS/MS from @Stanford
Joined April 2021
RL with verifiable rewards has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground-truth answers? Introducing Self-Rewarding Training (SRT), where language models provide their own reward for RL training! 🧵 1/n
21
143
835
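A minimal sketch of the self-rewarding idea from the thread above, assuming (as in self-consistency-style approaches) that the pseudo-reward is agreement with the majority answer across a group of sampled completions; `self_reward` and `extract_answer` are hypothetical names for illustration, not the paper's API:

```python
from collections import Counter

def self_reward(sampled_answers, extract_answer=lambda s: s.strip()):
    """Assign each sampled completion a pseudo-reward based on agreement
    with the group's majority answer -- no ground-truth label needed.

    Hypothetical sketch: reward is 1.0 if a completion's final answer
    matches the most common answer across the group, else 0.0.
    """
    answers = [extract_answer(s) for s in sampled_answers]
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Example: four sampled solutions to the same prompt, no answer key available.
rollouts = ["... the answer is 42", "... the answer is 41",
            "... the answer is 42", "... the answer is 42"]
rewards = self_reward(rollouts, extract_answer=lambda s: s.split()[-1])
print(rewards)  # [1.0, 0.0, 1.0, 1.0] -> usable as rewards in an RL update
```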
Please check out Gaurav's insanely cool work on memorization if you are at ICML!
1/ So much of privacy research is designing post-hoc methods to make models memorization-free. It's time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training 🧵
0
0
13
RT @g_k_swamy: Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on thes…
0
71
0
RT @AlexRobey23: On Monday, I'll be presenting a tutorial on jailbreaking LLMs + the security of AI agents with @HamedSHassani and @aminkar…
0
9
0
RT @yidingjiang: @abitha___ will be presenting our work on training language models to predict further into the future beyond the next tok…
0
5
0
RT @yidingjiang: I will be at ICML next week. If you are interested in chatting about anything related to generalization, exploration, and…
0
9
0
Please attend @yidingjiang's oral presentation of our work, Paprika, at ICML!
I will talk about how to train agents with decision-making capabilities that generalize to completely new environments:
0
2
23
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw…
0
728
0
RT @yidingjiang: A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an explorati…
yidingjiang.github.io
This post explores the idea that the next breakthroughs in AI may hinge more on how we collect experience through exploration, and less on how many parameters and data points we have.
0
58
0
RT @allenainie: Decision-making with LLMs can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a per…
0
25
0
RT @g_k_swamy: Say ahoy to SAILOR⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to r…
0
74
0
RT @g_k_swamy: In my experience, the details of RLHF matter a shocking amount. If you'd like to avoid solving a hard exploration problem, t…
0
4
0
RT @askalphaxiv: "Can Large Reasoning Models Self-Train?". A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT huā¦.
0
79
0
RT @mihdalal: This is really great work by Fahim and co, moving out of the regime where we have ground truth rewards is critical for the ne…
0
5
0
RT @shafayat_sheikh: Check out our latest work on self-improving LLMs, where we try to see if LLMs can utilize their internal self consiste…
0
24
0
RT @askalphaxiv: This is pretty remarkable: AI systems learning to self-improve. We're seeing a wave of research where AI isn't just learn…
0
131
0
RT @gaurav_ghosal: While LLMs contain extensive factual knowledge, they are also unreliable when answering questions downstream. In our #IC…
0
35
0
RT @IntologyAI: The 1st fully AI-generated scientific discovery to pass the highest level of peer review, the main track of an A* conferen…
0
134
0
19/ This was an awesome collaboration with @shafayat_sheikh, my amazing advisors @rsalakhu and Jeff Schneider, and @Zanette_ai at @CarnegieMellon. I learned a lot throughout the project, and we appreciate any feedback! Paper + code + datasets:
1
0
8
RT @mihirp98: Excited to share our work: Maximizing Confidence Alone Improves Reasoning. Humans rely on confidence to learn when answer key…
0
35
0
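For the confidence-based signal mentioned in the retweet above, here is a minimal sketch assuming "confidence" is operationalized as low entropy of the model's distribution over candidate answers; `confidence_reward` and `answer_probs` are illustrative names, not the authors' implementation:

```python
import math

def confidence_reward(answer_probs):
    """Reward a completion by the model's confidence in its final answer,
    measured as negative entropy of the answer distribution.

    Hypothetical sketch: `answer_probs` lists the probabilities the model
    assigns to candidate final answers (summing to ~1). Lower entropy
    (higher confidence) gives a higher reward, with no ground truth used.
    """
    entropy = -sum(p * math.log(p) for p in answer_probs if p > 0.0)
    return -entropy  # maximizing confidence == minimizing entropy

# Example: a confident vs. an uncertain answer distribution.
print(confidence_reward([0.9, 0.05, 0.05]))  # ~ -0.39 (higher reward)
print(confidence_reward([0.4, 0.3, 0.3]))    # ~ -1.09 (lower reward)
```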