Emmy Liu

@_emliu

Followers
1K
Following
1K
Media
51
Statuses
299

PhD student @LTIatCMU, working with @gneubig on NLP || intern @AIatMeta || UofT ‘21 🇨🇦 ||🤖✨🔡

Pittsburgh, PA
Joined November 2021
@_emliu
Emmy Liu
9 months
What design decisions in LLM training affect the final performance of LLMs? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
6
57
217
@aryaman2020
Aryaman Arora
28 days
i hate ML conference reviewers. i take back everything bad i ever said about ACL. every ACL reviewer i ever got was at least literate
15
20
478
@nlpxuhui
Xuhui Zhou@NeurIPS
1 month
Hoping your coding agents could understand you and adapt to your preferences? Meet TOM-SWE, our new framework for coding agents that don’t just write code, but model the user's mind persistently (ranging from general preferences to small details) arxiv: https://t.co/uznLAjgWKr
6
41
124
@_emliu
Emmy Liu
1 month
Happy to visit Suzhou for #EMNLP2025 soon! Check out our work on how things like data affect deviation from scaling laws. DM me if you want to talk about scaling laws, pre/midtraining, or the science of LM development! Fri. Nov 7 at 14:00-15:30 Poster Hall C
@_emliu
Emmy Liu
9 months
What design decisions in LLM training affect the final performance of LLMs? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
0
6
31
@gneubig
Graham Neubig
1 month
If you're at #EMNLP2025 next week, say hi to: - @_emliu, presenting Not Just Scaling Laws: https://t.co/u61AxfxABA - @Jeande_d, presenting CulturalGround: https://t.co/rBxvtdY4am - @yueqi_song, presenting CulturalGround and Synthetic Socratic Debates:
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
@_emliu
Emmy Liu
1 month
MIA from twitter for a while, but brief life update! ✨ Passed my thesis proposal during the summer, a huge thank you to all my committee members! @gneubig, @XiongChenyan, @AdtRaghunathan, @jacobandreas, and @AndrewLampinen ✨ I will be at #EMNLP2025 next week (more l8r)
2
8
63
@yueqi_song
Yueqi Song
1 month
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents is rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
175
1K
@TuhinChakr
Tuhin Chakrabarty
2 months
🚨New paper on AI and copyright Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this
9
173
526
@niloofar_mire
Niloofar
2 months
I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!
27
188
1K
@ChengleiSi
CLS
5 months
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12
196
635
@kgashteo
Kiril Gashteovski
4 months
Happy to share that our paper got accepted at #EMNLP2025! In this work, we study which factors—other than scaling model size or training data size—affect the performance of LLMs. For details, see Emmy's thread 👇
@_emliu
Emmy Liu
9 months
What design decisions in LLM training affect the final performance of LLMs? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
0
1
5
@_emliu
Emmy Liu
4 months
Check out this exciting workshop at CMU on Sept 12! AI for science is an exciting upcoming area, hope to hear some great discussions!
@JiayiiGeng
Jiayi Geng
4 months
📢 We're thrilled to announce the CMU AI for Science Workshop on Sept 12 at CUC-MPW! Featuring an amazing lineup of speakers: - Akari Asai (AI2/CMU) - Gabe Gomes (CMU) - Chenglei Si (Stanford) - Keyon Vafa (Harvard) Join us on campus, submit your poster & register here:
0
0
8
@DengHokin
Hokin Deng
4 months
#embodied All forms of biological intelligence are grounded movements🏃‍♂️ muscles & motor neurons 🧠 emerge before visual cortex & rods & cones in eyes 👁️ Building monocular better-than-mocap-studio #video2motion is our critical step towards human embodied intelligence.
@MyolabAI
myolab.ai
4 months
💥We are excited to share that our #Video2Animation feature is now live at @MyolabAI's discord servers. We're giving away massive 𝐟𝐫𝐞𝐞 credits for early users. Head now to 👉 https://t.co/c4rLQxorBM
1
17
33
@gneubig
Graham Neubig
7 months
I do think that AI has a lot of promise for science, but we need lots of serious work to get there! One thing I'm very interested in is how we can build AI systems that can effectively judge the quality of research.
0
1
16
@_emliu
Emmy Liu
7 months
Returning from #NAACL2025 & had some interesting discussions! One topic that came up a lot was AI scientists and how they should be implemented, evaluated, etc. Used this as inspiration to finish up a blog post on AI science and the state of reviewing: https://t.co/gXnMJshJor
4
7
82
@kayo_yin
Kayo Yin
9 months
Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale? We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary drivers of few-shot ICL. https://t.co/zTpiOKatEF 🧵
16
115
776
@_emliu
Emmy Liu
8 months
@gneubig @LuWang__ @dilekhakkanitur @Ritam_Dutt @abertsch72 If you have any questions ahead of the panel, please submit them here! Faculty panel: https://t.co/KNNboWMFB4 Early career panel:
1
0
0
@_emliu
Emmy Liu
8 months
Faculty panel: James H. Martin @gneubig @LuWang__ @dilekhakkanitur Early career panel: Adam Wiemerslage Manuel Mager @Ritam_Dutt @abertsch72
1
0
2
@_emliu
Emmy Liu
8 months
Also at #NAACL SRW next week, 2 career panels aimed at students! We have a diverse group of faculty and early-career researchers sharing their advice and experiences! (all on May 1) 🕐 Faculty panel: 10:45-12PM MDT 🕑 Early career panel: 2-3PM MDT 📍 San Miguel, or online
1
8
36
@_emliu
Emmy Liu
8 months
Join us this week at the #NAACL student research workshop for an exciting keynote presentation by @psresnik! Fun fact, he was a co-creator of the SRW back in the day, so this is coming back full circle ⭐ ⏱️ May 1st, 4-5PM MDT 📍 San Miguel, or online
0
4
15
@belindazli
Belinda Li
9 months
Past work has shown that world state is linearly decodable from LMs trained on text and games like Othello. But how do LMs *compute* these states? We investigate state tracking using permutation composition as a model problem, and discover interpretable, controllable procedures🧵
3
46
228