Houjun Liu Profile
Houjun Liu

@houjun_liu

Followers: 72 · Following: 48 · Media: 5 · Statuses: 56

CS @stanford. Reasoning enjoyer @stanfordnlp, (PO)MDPs @SISLaboratory, and speech technologies @CarnegieMellon. AGI and Emacs are cool.

Stanford, CA
Joined July 2024
@houjun_liu
Houjun Liu
17 hours
RT @tim_cook: Happy Birthday America! šŸ‡ŗšŸ‡ø.
0
3K
0
@houjun_liu
Houjun Liu
19 days
RT @chrmanning: Big congratulations to @ShikharMurty on receiving his @Stanford PhD today!
[image attached]
0
2
0
@houjun_liu
Houjun Liu
19 days
RT @chrmanning: Huge congratulations to @annadgoldie on receiving her @Stanford PhD today! It’s been a great journey!
0
9
0
@houjun_liu
Houjun Liu
26 days
RT @apartovi:
0
26
0
@houjun_liu
Houjun Liu
28 days
RT @stanfordnlp: Not on our bingo card!
0
8
0
@houjun_liu
Houjun Liu
1 month
RT @jordanjuravsky: Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for…
0
46
0
@houjun_liu
Houjun Liu
1 month
RT @lateinteraction: What did DPO say to GRPO after they had a fight on main? ā€œLet’s take this offline.ā€
0
6
0
@houjun_liu
Houjun Liu
1 month
RT @lambdaviking: Padding a transformer’s input with blank tokens (…) is a simple form of test-time compute. Can it increase the computat…
0
40
0
@houjun_liu
Houjun Liu
1 month
RT @aryaman2020: some day we will convince this guy to do more than a pinch of interp, but in the meantime definitely check out this cool p…
0
3
0
@houjun_liu
Houjun Liu
1 month
RT @houjun_liu: New Paper Day! For ACL Findings 2025: You should **drop dropout** when you are training your LMs AND MLMs!
0
16
0
@houjun_liu
Houjun Liu
1 month
@chrmanning I think there's a very cool training dynamics situation going on here; if you are pretraining with webtext and driving loss down to 0, dispersed representations matter a LOT less since your training corpus already regularizes decently.
0
0
9
@houjun_liu
Houjun Liu
1 month
@chrmanning Through a šŸ¤ pinch of interp, we show that model editing success gets degraded by pretraining with dropout. Dispersed representations built by dropout => less consistent representation of the world => worse models.
[image attached]
1
1
15
@houjun_liu
Houjun Liu
1 month
@chrmanning BERTs and encoder models are not spared either: MLM and SQuAD performance degrade from just turning on 10% dropout.
[image attached]
1
0
8
@houjun_liu
Houjun Liu
1 month
@chrmanning This stays true BOTH 1) at scale and 2) with early dropout, which is supposed to be a way to stabilize convergence.
[two images attached]
1
0
8
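(The "early dropout" referenced above means keeping dropout on only for an initial stretch of training and then switching it off. A minimal PyTorch sketch, assuming a simple step-count cutoff; the rate and cutoff below are illustrative placeholders, not the paper's settings:)

```python
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    """Set the drop probability of every nn.Dropout module in the model."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

def apply_early_dropout(model: nn.Module, step: int,
                        early_steps: int = 10_000, p_early: float = 0.1) -> None:
    """Illustrative 'early dropout' schedule: dropout on for the first
    `early_steps` updates, off (p = 0) for the rest of training."""
    set_dropout(model, p_early if step < early_steps else 0.0)
```

(Call `apply_early_dropout(model, step)` at the top of each training step to realize the schedule.)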
@houjun_liu
Houjun Liu
1 month
@chrmanning We show that applying dropout in pretraining kneecaps the models, even with downstream finetuning.
[image attached]
1
0
8
@houjun_liu
Houjun Liu
1 month
@chrmanning Most frontier shops already skip dropout in their biggest models, but BERTs, smaller LMs, and finetuning runs are still trained with plenty of dropout (sometimes up to 30% 😱).
1
0
7
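(For reference, zeroing those knobs in the HuggingFace Transformers configs is a one-liner per model family. A minimal sketch for a BERT-style MLM and a GPT-2-style LM; this is just the standard `transformers` config API, not the paper's training code:)

```python
from transformers import (BertConfig, BertForMaskedLM,
                          GPT2Config, GPT2LMHeadModel)

# Encoder-side (MLM): both dropout knobs default to 0.1; set them to 0.
bert_cfg = BertConfig(hidden_dropout_prob=0.0,
                      attention_probs_dropout_prob=0.0)
mlm = BertForMaskedLM(bert_cfg)

# Decoder-side (causal LM): GPT-2 exposes three dropout knobs, all default 0.1.
gpt2_cfg = GPT2Config(resid_pdrop=0.0, embd_pdrop=0.0, attn_pdrop=0.0)
lm = GPT2LMHeadModel(gpt2_cfg)
```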
@houjun_liu
Houjun Liu
1 month
Work done with John Bauer and @chrmanning. Paper:
1
0
10
@houjun_liu
Houjun Liu
1 month
New Paper Day! For ACL Findings 2025: You should **drop dropout** when you are training your LMs AND MLMs!
[image attached]
3
16
84
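(A quick way to double-check that a model really is dropout-free before a long run: walk its modules and list any dropout layer that is still active. A hypothetical helper, not from the paper:)

```python
import torch.nn as nn

def active_dropout(model: nn.Module) -> list[tuple[str, float]]:
    """Return (module name, p) for every nn.Dropout layer with p > 0."""
    return [(name, m.p) for name, m in model.named_modules()
            if isinstance(m, nn.Dropout) and m.p > 0]

# Expect an empty list if dropout has been fully disabled,
# e.g. print(active_dropout(mlm)) for the MLM configured above.
```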
@houjun_liu
Houjun Liu
1 month
RT @DSPyOSS: bro come on.
0
26
0
@houjun_liu
Houjun Liu
1 month
RT @neil_rathi: you should be using interp for evals 🌟.
0
3
0