
Fahim Tajwar
@FahimTajwar10
Followers
628
Following
640
Media
21
Statuses
112
PhD Student @mldcmu @SCSatCMU BS/MS from @Stanford
Joined April 2021
RL with verifiable rewards has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground-truth answers? Introducing Self-Rewarding Training (SRT), where language models provide their own reward for RL training! 🧵 1/n
21
143
835
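A minimal sketch of the self-rewarding idea from the thread above, assuming (as in self-consistency-style approaches) that the pseudo-reward is agreement with the majority answer across a group of sampled completions; `self_reward` and `extract_answer` are hypothetical names for illustration, not the paper's API:

```python
from collections import Counter

def self_reward(sampled_answers, extract_answer=lambda s: s.strip()):
    """Assign each sampled completion a pseudo-reward based on agreement
    with the group's majority answer -- no ground-truth label needed.

    Hypothetical sketch: reward is 1.0 if a completion's final answer
    matches the most common answer across the group, else 0.0.
    """
    answers = [extract_answer(s) for s in sampled_answers]
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Example: four sampled solutions to the same prompt, no answer key available.
rollouts = ["... the answer is 42", "... the answer is 41",
            "... the answer is 42", "... the answer is 42"]
rewards = self_reward(rollouts, extract_answer=lambda s: s.split()[-1])
print(rewards)  # [1.0, 0.0, 1.0, 1.0] -> usable as rewards in an RL update
```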
Please check out Gaurav's insanely cool work on memorization if you are at ICML!
1/ So much of privacy research is designing post-hoc methods to make models memorization-free. It's time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training 🧵
0
0
13
RT @g_k_swamy: Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on thes…
0
71
0
RT @AlexRobey23: On Monday, I'll be presenting a tutorial on jailbreaking LLMs + the security of AI agents with @HamedSHassani and @aminkar…
0
9
0
RT @yidingjiang: @abitha___ will be presenting our work on training language models to predict further into the future beyond the next tok…
0
5
0
RT @yidingjiang: I will be at ICML next week. If you are interested in chatting about anything related to generalization, exploration, and…
0
9
0
Please attend @yidingjiang's oral presentation of our work, Paprika, at ICML!
I will talk about how to train agents with decision-making capabilities that generalize to completely new environments:
0
2
23
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw…
0
728
0
RT @yidingjiang: A mental model I find useful: all data acquisition (web scrapes, synthetic data, RL rollouts, etc.) is really an explorati…
yidingjiang.github.io
This post explores the idea that the next breakthroughs in AI may hinge more on how we collect experience through exploration, and less on how many parameters and data points we have.
0
58
0
RT @allenainie: Decision-making with LLMs can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a per…
0
25
0
RT @g_k_swamy: Say ahoy to SAILOR⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to r…
0
74
0
RT @g_k_swamy: In my experience, the details of RLHF matter a shocking amount. If you'd like to avoid solving a hard exploration problem, t…
0
4
0
RT @askalphaxiv: "Can Large Reasoning Models Self-Train?". A brilliant paper from CMU showing LLMs can improve at math reasoning WITHOUT huā¦.
0
79
0
RT @mihdalal: This is really great work by Fahim and co, moving out of the regime where we have ground truth rewards is critical for the ne…
0
5
0
RT @shafayat_sheikh: Check out our latest work on self-improving LLMs, where we try to see if LLMs can utilize their internal self consiste…
0
24
0
RT @askalphaxiv: This is pretty remarkable: AI systems learning to self-improve. We're seeing a wave of research where AI isn't just learn…
0
131
0
RT @gaurav_ghosal: While LLMs contain extensive factual knowledge, they are also unreliable when answering questions downstream. In our #IC…
0
35
0
RT @IntologyAI: The 1st fully AI-generated scientific discovery to pass the highest level of peer review, the main track of an A* conferen…
0
134
0
19/ This was an awesome collaboration with @shafayat_sheikh, my amazing advisors @rsalakhu and Jeff Schneider, and @Zanette_ai at @CarnegieMellon. I learned a lot throughout the project, and we appreciate any feedback! Paper + code + datasets:
1
0
8
RT @mihirp98: Excited to share our work: Maximizing Confidence Alone Improves Reasoning. Humans rely on confidence to learn when answer key…
0
35
0
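For the confidence-based signal mentioned in the retweet above, here is a minimal sketch assuming "confidence" is operationalized as low entropy of the model's distribution over candidate answers; `confidence_reward` and `answer_probs` are illustrative names, not the authors' implementation:

```python
import math

def confidence_reward(answer_probs):
    """Reward a completion by the model's confidence in its final answer,
    measured as negative entropy of the answer distribution.

    Hypothetical sketch: `answer_probs` lists the probabilities the model
    assigns to candidate final answers (summing to ~1). Lower entropy
    (higher confidence) gives a higher reward, with no ground truth used.
    """
    entropy = -sum(p * math.log(p) for p in answer_probs if p > 0.0)
    return -entropy  # maximizing confidence == minimizing entropy

# Example: a confident vs. an uncertain answer distribution.
print(confidence_reward([0.9, 0.05, 0.05]))  # ~ -0.39 (higher reward)
print(confidence_reward([0.4, 0.3, 0.3]))    # ~ -1.09 (lower reward)
```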