Seungju Han
@SeungjuHan3
Followers
1K
Following
2K
Media
14
Statuses
184
language models & reasoning. cs phd student @stanfordailab
Stanford, CA
Joined December 2020
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) an anchor (always used, capturing commonsense) and 2) a memory bank (selected per query, capturing world knowledge). [1/X]🧵
11
116
643
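A minimal PyTorch sketch of the split described in the post above, as I read it: an always-on anchor module plus a bank of small memory blocks, of which only a few are fetched per query. The module structure and the dot-product routing here are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class AnchorWithMemoryBank(nn.Module):
    """Toy sketch: an always-resident anchor plus a memory bank where
    only top_k memory blocks are fetched per query."""
    def __init__(self, d=256, n_mem=64, top_k=4):
        super().__init__()
        self.anchor = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.mem_keys = nn.Parameter(torch.randn(n_mem, d))  # routing keys, one per memory
        self.memories = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
            for _ in range(n_mem)
        )
        self.top_k = top_k

    def forward(self, x):                     # x: (batch, d) pooled query features
        out = self.anchor(x)                  # anchor: always used (commonsense)
        idx = (x @ self.mem_keys.T).topk(self.top_k, dim=-1).indices
        rows = []
        for b in range(x.shape[0]):           # only the selected memories touch each query
            h = x[b:b + 1]
            rows.append(sum(self.memories[i](h) for i in idx[b].tolist()))
        return out + torch.cat(rows, dim=0)
```

On device, only the anchor plus the handful of selected memory blocks would need to be resident per query; that is the deployment win the post is pointing at.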
Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: https://t.co/lw1PortD9E Paper: https://t.co/zYKFjyOy7C ↓
14
97
792
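A tiny numpy illustration of the dual framing from the post above, using a plain dot product as an assumed stand-in for whatever similarity the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, d = 100, 8
phi = rng.normal(size=(n_states, d))      # ordinary "primal" state features

# Dual representation of state i: its similarities to *all* states.
dual = phi @ phi.T                        # row i = (sim(s_i, s_1), ..., sim(s_i, s_n))

# One nice property: any value function in the span of the similarity kernel
# is just a linear readout of the dual representation.
w = rng.normal(size=n_states)
V = dual @ w                              # V(s_i) as a linear function of similarities
```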
🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐 https://t.co/Smp4uMNGI3 📄 https://t.co/e4pb6lnGqe AgentFlow unlocks the full potential of LLMs w/ tool use. (And yes, our 3/7B model beats GPT-4o)👇
30
240
1K
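A schematic sketch of the kind of plan-and-use-tools loop the post describes. Every name here is hypothetical rather than AgentFlow's actual API; the trained planner is abstracted to a callable.

```python
def run_episode(task, planner, tools, max_steps=8):
    """One tool-use rollout: the planner reads the trace so far and either
    calls a tool or finishes. The trace is what RL would train the planner on."""
    trace = [("task", task)]
    for _ in range(max_steps):
        action = planner(trace)               # e.g. {"tool": "search", "arg": "..."}
        if action["tool"] == "finish":
            return action["arg"], trace
        observation = tools[action["tool"]](action["arg"])
        trace.append((action["tool"], observation))
    return None, trace
```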
Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws to be 5x more data-efficient, offering better perf w/ sufficient compute
9
83
444
Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_s_feldmann @wsmoses @aleks_madry: we show how to optimize final model loss wrt any continuous variable. Key idea: Metagradients (grads through model training)
9
35
176
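A minimal toy instance of the metagradient idea from the post above, assuming continuous per-example data weights as the variable being optimized (the linear model and sizes are mine, not the paper's setup): backprop through an unrolled SGD run to get the gradient of the final loss with respect to the data weights.

```python
import torch

torch.manual_seed(0)
X_tr, X_val = torch.randn(64, 5), torch.randn(64, 5)
w_true = torch.randn(5)
y_tr = X_tr @ w_true + 0.1 * torch.randn(64)
y_val = X_val @ w_true + 0.1 * torch.randn(64)

data_w = torch.ones(64, requires_grad=True)   # continuous per-example data weights

theta = torch.zeros(5, requires_grad=True)    # model parameters
lr = 0.05
for _ in range(50):                           # unrolled inner training run
    resid = X_tr @ theta - y_tr
    train_loss = (data_w * resid ** 2).mean()
    (g,) = torch.autograd.grad(train_loss, theta, create_graph=True)
    theta = theta - lr * g                    # keeps the whole run on the graph

final_loss = ((X_val @ theta - y_val) ** 2).mean()
metagrad = torch.autograd.grad(final_loss, data_w)[0]  # d(final loss)/d(data weights)
# Gradient descent on data_w along -metagrad is data curation by gradient descent;
# ascending it would be a data-poisoning direction.
```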
Honored to be back on TIME100 AI for 2025 — alongside my longtime heroes @drfeifei and @BarzilayRegina! 😍 The recognition goes to my amazing students and colleagues, who strive to find ways to use AI to better humanity, as opposed to making AI for the sake of making AI better
40
39
490
Most takes on RL environments are bad. 1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis, and no one is talking about it because they’re being hoodwinked by labs
30
46
704
At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel Prize-winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️
159
653
4K
Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the model and data are in the thread.
38
240
1K
recently gave a talk on <Reality Checks> at two venues, and talked (and rambled) about how leaderboard chasing is awesome (and we want it to continue) but that this isn't easy because everyone (me! me! me!) wants to write more papers. the link to the slide deck is in the reply.
2
14
123
🚀 How far can RL scaling take LLMs? Dropping ProRLv2! 🔥We keep expanding LLMs’ reasoning boundaries through 3,000+ RL steps over 5 domains and set a new state-of-the-art ✨ among 1.5B reasoning models. 🔗Full blog: https://t.co/Xj1oaLK5gE 🤗Open model: huggingface.co
3
28
220
A common takeaway from "the bitter lesson" is we don't need to put effort into encoding inductive biases, we just need compute. Nothing could be further from the truth! Better inductive biases mean better scaling exponents, which means exponential improvements with computation.
20
35
421
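In the standard power-law reading of that claim (notation mine), an inductive bias that improves the scaling exponent dominates any constant-factor gain once compute is large:

```latex
% Loss under compute C for two methods, each following a power law:
\[
  L_i(C) \approx E + A_i\,C^{-\alpha_i}, \qquad
  \frac{L_1(C) - E}{L_2(C) - E}
  = \frac{A_1}{A_2}\,C^{\alpha_2 - \alpha_1}
  \longrightarrow \infty
  \quad \text{as } C \to \infty \text{ whenever } \alpha_2 > \alpha_1 .
\]
```

So a bias that only shrinks the constant \(A\) buys a fixed multiplicative gain, while one that raises \(\alpha\) compounds with compute; that is the "exponential improvements with computation" in the post.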
(1/x) Excited to share our new work on MAPoRL🍁: Multi-Agent Post-Co-Training for Collaborative LLMs with RL. Most current approaches just prompt pre-trained models and hope they’ll work together. But can we train LLMs to discover the collaboration strategy?
10
5
48
Gemini solved the math problems end-to-end in natural language (English). This differs from our results last year when experts first translated them into formal languages like Lean for specialized systems to tackle.
2
19
369
life update: I'll be starting my PhD in CS at Stanford this September! I'm very excited to continue my research on reasoning of language models and to make new friends in the Bay Area! I'm deeply grateful to everyone who supported me and made this milestone possible
35
19
742
https://t.co/KcfT8vHTSf i thought this paper was interesting and tried to reproduce the numbers in Table 2. very impressive that models can memorize the problems in benchmarks, especially MATH500 / AIME24 / AMC23. GPQA, AIME25, and LiveMathBench are less memorized
2
2
18
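for context, a rough sketch of one common way to probe benchmark memorization (my own construction; the paper's Table 2 protocol may differ): prompt with the first part of a problem and measure how much of the rest the model reproduces verbatim.

```python
from difflib import SequenceMatcher

def memorization_score(generate, problem: str, frac: float = 0.5) -> float:
    """Prompt with the first part of a benchmark problem and measure how much
    of the remainder the model reproduces (1.0 = verbatim recall)."""
    cut = int(len(problem) * frac)
    prefix, truth = problem[:cut], problem[cut:]
    guess = generate(prefix)                  # hypothetical completion function
    return SequenceMatcher(None, truth, guess[: len(truth)]).ratio()
```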
Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RLing on random rewards can make it reason better? Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you ablate certain heuristics in the algorithm, the apparent gains from random rewards disappear.
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find that this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: https://t.co/fPFfw17IIz
1
14
102
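A toy Monte Carlo construction of my own (not the blog's experiments) showing how one such heuristic, PPO/GRPO-style ratio clipping, makes the expected update nonzero under pure coin-flip rewards once the policy has drifted from the rollout policy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def expected_update(theta_old, theta_new, clip_eps=None, n=500_000):
    """Monte Carlo estimate of the surrogate-objective gradient for a one-state,
    two-action softmax policy under random +/-1 rewards."""
    p_old, p_new = sigmoid(theta_old), sigmoid(theta_new)
    a = rng.random(n) < p_old                   # actions sampled from the rollout policy
    r = rng.choice([-1.0, 1.0], size=n)         # random rewards, independent of actions
    ratio = np.where(a, p_new / p_old, (1 - p_new) / (1 - p_old))
    # d(ratio)/d(theta_new): ratio*(1 - p_new) for a=1, -ratio*p_new for a=0
    dratio = np.where(a, ratio * (1 - p_new), -ratio * p_new)
    grad = r * dratio                           # gradient of the unclipped surrogate
    if clip_eps is not None:
        # min(ratio*A, clip(ratio)*A): gradient is zeroed exactly when the
        # clipped branch is active, which depends on the sign of the advantage.
        active = np.where(r > 0, ratio <= 1 + clip_eps, ratio >= 1 - clip_eps)
        grad = grad * active
    return grad.mean()

print(expected_update(0.0, 1.0, clip_eps=None))  # ~0: random rewards give no net push
print(expected_update(0.0, 1.0, clip_eps=0.2))   # clearly nonzero (about -0.2)
```

In this toy, the clipped update consistently pushes the policy back toward the rollout policy, so the update direction comes from the heuristic, not from the (random) reward.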
how do people fairly evaluate agents with web access on benchmarks like HLE or GPQA? there could be content directly related to the benchmark on the web (e.g. a blog post showing an example from the benchmark). how is this issue addressed?
0
0
5
n-simplex attention makes incredible sense because of its honesty: it literally says you can put more compute into the attention operation to get more gains. We've seen this trend so many times. This differs from a lot of 'suspicious' claims, such as that you can use less compute to perform just as well.
14
19
523
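For concreteness, a numpy sketch of 2-simplicial (trilinear) attention in the spirit of the post; the Hadamard combination of values is one common choice from the literature, not a claim about any specific paper:

```python
import numpy as np

def two_simplex_attention(q, k1, k2, v1, v2):
    """Trilinear (2-simplex) attention: each query scores *pairs* of positions,
    so compute grows from O(n^2 d) to O(n^3 d) in exchange for modeling
    higher-order interactions."""
    n, d = q.shape
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2) / np.sqrt(d)   # score per (j, k) pair
    w = np.exp(logits - logits.max(axis=(1, 2), keepdims=True))
    w /= w.sum(axis=(1, 2), keepdims=True)                        # softmax over pairs
    return np.einsum("ijk,jd,kd->id", w, v1, v2)                  # pairwise-combined values

x = np.random.default_rng(0).normal(size=(16, 32))
out = two_simplex_attention(x, x, x, x, x)    # self-attention-style usage, shape (16, 32)
```

The cubic cost is exactly the "put more compute on attention" trade the post calls honest.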