
Houjun Liu
@houjun_liu
Followers: 72 · Following: 48 · Media: 5 · Statuses: 56
CS @stanford. Reasoning enjoyer @stanfordnlp, (PO)MDPs @SISLaboratory, and speech technologies @CarnegieMellon. AGI and Emacs are cool.
Stanford, CA
Joined July 2024
RT @chrmanning: Huge congratulations to @annadgoldie on receiving her @Stanford PhD today! It's been a great journey!
RT @jordanjuravsky: Happy Throughput Thursday! We're excited to release Tokasaurus: an LLM inference engine designed from the ground up for…
RT @lateinteraction: What did DPO say to GRPO after they had a fight on main? "Let's take this offline."
RT @lambdaviking: Padding a transformer's input with blank tokens (. ) is a simple form of test-time compute. Can it increase the computat…
RT @aryaman2020: some day we will convince this guy to do more than a pinch of interp, but in the meantime definitely check out this cool p…
RT @houjun_liu: New Paper Day! For ACL Findings 2025: You should **drop dropout** when you are training your LMs AND MLMs!
@chrmanning I think there's a very cool training dynamics situation going on here; if you are pretraining with webtext and driving loss down to 0, dispersed representations matter a LOT less since your training corpus already regularizes decently.
@chrmanning Through a 🤏 pinch of interp, we show that model editing success gets degraded by pretraining with dropout. Dispersed representations built by dropout => less consistent representation of the world => worse models.
@chrmanning BERTs and encoder models are not spared either, with MLM and SQuAD performance degraded by just turning on 10% dropout.
@chrmanning This stays true BOTH 1) at scale AND 2) with early dropout, which is supposed to be a way to stabilize convergence.
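"Early dropout" refers to keeping dropout active only for the first portion of training and then switching it off. The sketch below illustrates that schedule on a toy PyTorch model; the step counts, model, and data are made up for illustration and are not the paper's setup.

```python
# Generic "early dropout" schedule: dropout on for the first few steps, then off.
import torch
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    # Flip the drop probability of every nn.Dropout submodule in place.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

# Toy model and synthetic data, purely illustrative.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

early_steps = 200        # dropout stays on only for these first updates
for step in range(1_000):
    if step == early_steps:
        set_dropout(model, 0.0)  # disable dropout for the rest of training
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```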
@chrmanning We show that applying dropout in pretraining kneecaps the models, even with downstream finetuning.
@chrmanning Though most frontier shops already skip dropout in their biggest models, BERTs, smaller LMs, and finetuning runs are still trained with plenty of dropout (sometimes up to 30% 😱).
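Since finetuning runs often inherit that 10-30% dropout from checkpoint defaults, here is a hedged sketch of overriding it when loading an encoder for finetuning; the checkpoint name, task head, and label count are illustrative choices, not details from the thread.

```python
# Hedged sketch: override a checkpoint's default 10% dropout at load time.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",                 # example checkpoint
    num_labels=2,                        # example task head
    hidden_dropout_prob=0.0,             # library default is 0.1
    attention_probs_dropout_prob=0.0,    # library default is 0.1
)
```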