Seungju Han

@SeungjuHan3

Followers 1K · Following 2K · Media 14 · Statuses 184

language models & reasoning. cs phd student @stanfordailab

Stanford, CA
Joined December 2020
@HPouransari
Hadi Pouransari
1 month
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
11
116
643
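A minimal numpy sketch of the split described above, as I read the tweet: always-resident anchor weights plus a memory bank routed per query. Shapes, routing, and the additive injection are my assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_blocks, k = 64, 1024, 4

anchor = rng.normal(size=(d, d))                 # always used: commonsense/reasoning
memory_bank = rng.normal(size=(n_blocks, d, d))  # world knowledge, stored off-device
block_keys = rng.normal(size=(n_blocks, d))      # one routing key per memory block

def forward(x):
    h = np.tanh(x @ anchor)                 # anchor path runs on every query
    scores = block_keys @ h                 # route the query to relevant memories
    top = np.argsort(scores)[-k:]           # fetch only k blocks onto the device
    for b in top:
        h = h + np.tanh(h @ memory_bank[b]) # inject the selected knowledge
    return h

print(forward(rng.normal(size=d)).shape)    # (64,)
```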
@seohong_park
Seohong Park
28 days
Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: https://t.co/lw1PortD9E Paper: https://t.co/zYKFjyOy7C
14
97
792
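A toy rendering of the idea above, under my own assumptions: take any base featurization of states and re-represent each state by its vector of similarities to all states, so the representation lives in "state space" rather than feature space. Cosine similarity here stands in for whatever similarity the paper actually uses.

```python
import numpy as np

def dual_representation(phi):
    """phi: (n, d) base features for n states (any featurization works
    for this sketch; the paper defines its own). Row i of the output is
    state i's dual representation: its similarity to every state."""
    phi = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    return phi @ phi.T                     # (n, n) cosine similarities

states = np.random.default_rng(0).normal(size=(5, 16))
D = dual_representation(states)
print(D.shape)   # (5, 5): each state described by its 5 similarities
```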
@lupantech
Pan Lu
29 days
🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐 https://t.co/Smp4uMNGI3 📄 https://t.co/e4pb6lnGqe AgentFlow unlocks full potential of LLMs w/ tool-use. (And yes, our 3/7B model beats GPT-4o)👇
30
240
1K
@kothasuhas
Suhas Kotha
2 months
Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws to be 5x data efficient, offering better perf w/ sufficient compute
9
83
444
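One way to read "improving the asymptote" above, using a standard saturating scaling-law parameterization (my notation, not necessarily the paper's):

```latex
% With data D fixed, loss as a function of compute C saturates:
% L_inf(D) is the asymptote, the best loss reachable with unbounded
% compute on that data. A better recipe lowers the asymptote, and
% "5x data efficient" reads as matching the old recipe's asymptote
% while using 5x less data:
\[
  L(C; D) \;=\; L_{\infty}(D) + a\,C^{-\alpha},
  \qquad
  L_{\infty}^{\text{new}}(D) \;\approx\; L_{\infty}^{\text{old}}(5D).
\]
```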
@logan_engstrom
Logan Engstrom
8 months
Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_s_feldmann @wsmoses @aleks_madry: we show how to optimize final model loss wrt any continuous variable. Key idea: Metagradients (grads through model training)
9
35
176
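A one-parameter toy of the key idea above: differentiate the *final* loss through a training step with respect to a per-example data weight w. This is my illustration of "grads through model training", not the paper's implementation, which scales this to real training runs.

```python
# lr, init, train target, final-eval target (arbitrary toy values)
eta, theta0, a, b = 0.1, 0.0, 1.0, 2.0

def final_loss(w):
    # inner SGD step: theta1 = theta0 - eta * d/dtheta [ w * (theta - a)^2 ]
    theta1 = theta0 - eta * w * 2 * (theta0 - a)
    return (theta1 - b) ** 2               # outer (final) objective

def metagrad(w):
    # chain rule through the training step:
    # dL/dw = 2*(theta1 - b) * dtheta1/dw, with dtheta1/dw = -eta*2*(theta0 - a)
    theta1 = theta0 - eta * w * 2 * (theta0 - a)
    return 2 * (theta1 - b) * (-eta * 2 * (theta0 - a))

w = 1.0
num = (final_loss(w + 1e-6) - final_loss(w - 1e-6)) / 2e-6
print(metagrad(w), num)  # analytic and finite-difference metagradients agree
```

With the metagradient in hand, the data weight itself can be updated by gradient descent, which is the sense in which curation becomes "just gradient descent".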
@YejinChoinka
Yejin Choi
2 months
Honored to be back on TIME100 AI for 2025 — alongside my longtime heroes @drfeifei and @BarzilayRegina! 😍 The recognition goes to my amazing students and colleagues, who strive to find ways to use AI to better humanity, as opposed to making AI for the sake of making AI better
40
39
490
@rosstaylor90
Ross Taylor
2 months
Most takes on RL environments are bad. 1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis, and no one is talking about it because they’re being hoodwinked by labs
30
46
704
@BorisMPower
Boris Power
3 months
At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️
159
653
4K
@ctnzr
Bryan Catanzaro
3 months
Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the
38
240
1K
@kchonyc
Kyunghyun Cho
3 months
recently gave a talk on <Reality Checks> at two venues, and discussed (and rambled) about how leaderboard chasing is awesome (and we want it to continue) but that this isn't easy because everyone (me! me! me!) wants to write more papers. the link to the slide deck is in the reply.
2
14
123
@GXiming
Ximing Lu
3 months
🚀 How far can RL scaling take LLMs? Drop ProRLv2! 🔥We keep expanding LLM’s reasoning boundaries through 3,000+ RL steps over 5 domains and set a new state-of-the-art ✨ among 1.5B reasoning models. 🔗Full blog: https://t.co/Xj1oaLK5gE 🤗Open model:
huggingface.co
@_AndrewZhao
Andrew Zhao
5 months
RL scaling is here https://t.co/IX8z8XV6WX
3
28
220
@andrewgwils
Andrew Gordon Wilson
3 months
A common takeaway from "the bitter lesson" is we don't need to put effort into encoding inductive biases, we just need compute. Nothing could be further from the truth! Better inductive biases mean better scaling exponents, which means exponential improvements with computation.
20
35
421
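A quick worked example of why a better scaling exponent compounds (my arithmetic, using the standard power-law form):

```latex
% Power-law scaling: the exponent sits in the exponent of the cost.
%   alpha = 0.05  =>  halving the loss costs 2^{20} ~ 10^6 x compute
%   alpha = 0.10  =>  halving the loss costs 2^{10} ~ 10^3 x compute
\[
  L(C) = a\,C^{-\alpha}
  \;\Longrightarrow\;
  \frac{L(C)}{L(kC)} = k^{\alpha},
  \qquad
  \text{so halving } L \text{ costs } k = 2^{1/\alpha}.
\]
```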
@chanwoopark20
Chanwoo Park
4 months
(1/x) Excited to share our new work on MAPoRL🍁: Multi-Agent Post-Co-Training for Collaborative LLMs with RL. Most current approaches just prompt pre-trained models and hope they’ll work together. But can we train LLMs to discover the collaboration strategy?
10
5
48
@GoogleDeepMind
Google DeepMind
4 months
Gemini solved the math problems end-to-end in natural language (English). This differs from our results last year when experts first translated them into formal languages like Lean for specialized systems to tackle.
2
19
369
@SeungjuHan3
Seungju Han
4 months
life update: I'll be starting my PhD in CS at Stanford this September! I'm very excited to continue my research on reasoning of language models and to make new friends in the Bay Area! I'm deeply grateful to everyone who supported me and made this milestone possible
35
19
742
@SeungjuHan3
Seungju Han
4 months
https://t.co/KcfT8vHTSf I thought this paper was interesting and tried to reproduce the numbers in Table 2. Very impressive that models can memorize the problems in benchmarks, especially MATH500 / AIME24 / AMC23. GPQA, AIME25, and LiveMathBench are less memorized
2
2
18
@WenSun1
Wen Sun
4 months
Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really that magical such that even RLing on random rewards can make it reason better? Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you
@g_k_swamy
Gokul Swamy
4 months
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find out this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: https://t.co/fPFfw17IIz
1
14
102
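For intuition, a Monte-Carlo toy of one candidate heuristic: with zero-mean random advantages, the plain policy-gradient update averages to zero, but PPO/GRPO-style ratio clipping masks the two branches asymmetrically and leaves a systematic drift. Clipping is my assumption of the kind of heuristic meant; the blog post pins down the exact ones.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 1_000_000, 0.2
A = rng.normal(size=N)                   # random, zero-mean advantages
r = np.exp(rng.normal(0, 0.3, size=N))   # importance ratios around 1

# For min(r*A, clip(r, 1-eps, 1+eps)*A), the gradient w.r.t. the
# log-ratio is A*r on the unclipped branch and 0 where clipped out.
unclipped_grad = A * r
clipped_out = ((A > 0) & (r > 1 + eps)) | ((A < 0) & (r < 1 - eps))
clipped_grad = np.where(clipped_out, 0.0, A * r)

print(unclipped_grad.mean())  # ~0: random rewards give no net update
print(clipped_grad.mean())    # != 0: the heuristic creates a drift
```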
@SeungjuHan3
Seungju Han
4 months
How do people fairly evaluate agents with web access on benchmarks like HLE or GPQA? There could be content directly related to the benchmark on the web (e.g. a blog post showing an example from the benchmark). How is this issue addressed?
0
0
5
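One standard mitigation (my addition, not an answer from the thread): decontaminate on the agent side by dropping any retrieved page with high n-gram overlap against the benchmark questions, in the spirit of the 13-gram overlap checks used in LLM pretraining reports. A minimal sketch with hypothetical helpers:

```python
def ngrams(text, n=13):
    toks = text.lower().split()
    return {' '.join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(page_text, benchmark_questions, n=13):
    """Flag a retrieved web page if it shares any n-gram with a
    benchmark question -- a crude but common decontamination test."""
    page = ngrams(page_text, n)
    return any(page & ngrams(q, n) for q in benchmark_questions)

# Usage: filter the agent's retrieved pages before they reach the model.
# pages = [p for p in retrieved_pages
#          if not is_contaminated(p.text, benchmark_questions)]
```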
@cloneofsimo
Simo Ryu
4 months
n-simplex attention makes incredible sense because of its honesty: it literally says you can put more compute on the attention operation to get more gains; we've seen this trend so many times. This differs from a lot of 'suspicious' claims, such as that you can use less compute to perform
14
19
523
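For concreteness, a dense numpy sketch of 2-simplex (trilinear) attention, one member of the family the tweet refers to: each query scores *pairs* of positions, so the attention tensor and its compute grow from O(n^2) to O(n^3). This is my simplified form; published variants differ in details.

```python
import numpy as np

def two_simplex_attention(x, Wq, Wk1, Wk2, Wv1, Wv2):
    q, k1, k2 = x @ Wq, x @ Wk1, x @ Wk2    # (n, d) each
    v1, v2 = x @ Wv1, x @ Wv2               # (n, d) each
    d = q.shape[-1]
    # logits[i, j, k] = sum_d q[i,d] * k1[j,d] * k2[k,d]
    logits = np.einsum('id,jd,kd->ijk', q, k1, k2) / np.sqrt(d)
    n = logits.shape[0]
    flat = logits.reshape(n, -1)
    w = np.exp(flat - flat.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)           # softmax over all (j, k) pairs
    w = w.reshape(n, n, n)
    # the value for pair (j, k) is the elementwise product v1[j] * v2[k]
    return np.einsum('ijk,jd,kd->id', w, v1, v2)

rng = np.random.default_rng(0)
n, d = 8, 16
out = two_simplex_attention(rng.normal(size=(n, d)),
                            *(rng.normal(size=(d, d)) for _ in range(5)))
print(out.shape)   # (8, 16)
```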