Peter West

@PeterWestTM

Followers: 1K
Following: 2K
Media: 16
Statuses: 278

AI / NLP Researcher · Incoming faculty at @UBC_CS and @CAIDA_UBC · Postdoctoral fellow at @StanfordHAI @stanfordnlp · Former PhD student at @uwcse @uwnlp · he/him

Joined September 2019
@PeterWestTM
Peter West
1 year
I have multiple MSc/PhD openings in my lab at @UBC_CS! Come discover the hidden capabilities/limits of LLMs, e.g. how to learn from, guide, and understand the outputs of models. See my website (bio) for more details. https://t.co/GWEH8yOO2k Apply by December 15th! Also...
8
61
163
@hila_gonen
Hila Gonen
1 month
Considering a PhD/MSc in NLP? I’m hiring students this cycle! If you are passionate about making language models reliable and safe, eager about understanding and controlling language models, and would like to add to your research some multilingual flavor - apply to my group! 👇
16
102
728
@divingwithorcas
Dang Nguyen
1 month
The top places in all of our leaderboards have been cracked. The reign of AI is over.
@universeinanegg
Ari Holtzman
2 months
For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
6
@UBC_CS
UBC Computer Science
1 month
UBC Computer Science invites applications for up to two full-time tenure-track positions with the following priority areas: visualization, robotics, reinforcement learning, data management, and data mining. Applications are due Wed Dec 10, 2025. https://t.co/ARgHUbnGny
0
11
16
@m2saxon
Michael Saxon ✈️ NeurIPS SD
1 month
𝑵𝒆𝒘 𝒃𝒍𝒐𝒈𝒑𝒐𝒔𝒕! In which I share some brief reflections on #COLM2025 and a rundown of a few great papers I checked out!
5
24
147
@ma_tay_
Taylor Sorensen
2 months
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
49
194
@shi_weiyan
Weiyan Shi
1 month
New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
60
155
1K
@wzhao_nlp
Wenting Zhao
2 months
Want to hear some hot takes about the future of language modeling, and share your takes too? Stop by the Visions of Language Modeling workshop at COLM on Friday, October 10 in room 519A! There will be over a dozen speakers working on all kinds of problems in modeling language and
1
15
80
@PeterWestTM
Peter West
2 months
Check out @eunjeong_hwang’s paper—how do we give LLMs aspects of social intelligence that actually *help* in conversation?
@eunjeong_hwang
EunJeong Hwang
2 months
Theory of Mind is key to human social intelligence, but does giving LLMs ToM make them better social reasoners?🤔 We find that ToM makes LLMs better at dialogue: more strategic, goal-oriented, enabling long-horizon adaptation! We introduce ToMA, a ToM-focused dialogue agent🧵👇
0
0
13
@PeterWestTM
Peter West
2 months
I considered myself a pretty effective email writer until we (led by the amazing @divingwithorcas!) started building this game. See if you fare any better than I did...
@universeinanegg
Ari Holtzman
2 months
For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
9
@universeinanegg
Ari Holtzman
3 months
testing a game we're building where the mechanic is writing tricky HR emails, and noticing that LLMs have a built-in secret handshake with users to bypass safety guardrails. This seems both necessary to make LLMs actually useful and likely to make guardrails essentially useless
0
1
7
@niloofar_mire
Niloofar
4 months
🧵 Academic job market season is almost here! There's so much rarely discussed—nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! ⬇️ (1/N)
3
40
261
@universeinanegg
Ari Holtzman
4 months
The Economist published my little letter about the necessity of chaos for discovery
@DSI_UChicago
Data Science Institute
4 months
How can chaos create brilliance and breakthroughs? Ari Holtzman (@universeinanegg), Assistant Professor of Computer Science and Data Science, explores how embracing chaos has unlocked the capabilities of AI systems in a letter to @TheEconomist! https://t.co/U1RJKcFyO6
0
1
17
@universeinanegg
Ari Holtzman
5 months
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper:
arxiv.org
Prompting is the primary method by which we study and control large language models. It is also one of the most powerful: nearly every major capability attributed to LLMs-few-shot learning,...
6
33
160
@KaiserWhoLearns
Kaiser Sun
6 months
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8 #NLProc #LLM #AIResearch
4
23
86
@universeinanegg
Ari Holtzman
5 months
The fact that in pretty much all LLMs the generative branching factor goes down as the model keeps generating feels like a fundamental limit of LLM creativity, and I've never seen a satisfying solution.
@chrome1996
Chenghao Yang
5 months
Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and
2
5
29
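The "generative branching factor" in the two posts above can be made concrete with a small sketch: one common way to measure it is as the perplexity of the model's next-token distribution at each step (exp of its Shannon entropy). The distributions below are hypothetical illustrative numbers, not outputs from any real model; the point is only how the measure behaves as probability mass concentrates over the course of generation, as the thread describes.

```python
import math

def branching_factor(probs):
    """Effective branching factor of a next-token distribution:
    exp of its Shannon entropy (i.e., the distribution's perplexity)."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0))

# Toy next-token distributions at three generation steps (hypothetical
# numbers): probability mass concentrates as generation proceeds.
steps = [
    [0.25, 0.25, 0.25, 0.25],  # early: near-uniform over 4 candidate tokens
    [0.6, 0.2, 0.1, 0.1],      # middle: one token starts to dominate
    [0.9, 0.05, 0.03, 0.02],   # late: highly concentrated
]
for i, dist in enumerate(steps):
    print(f"step {i}: branching factor ~ {branching_factor(dist):.2f}")
```

A uniform distribution over 4 tokens gives a branching factor of exactly 4; the concentrated late-step distribution gives a value near 1, matching the observation that generations get more predictable as they progress.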
@harveyiyun
Harvey Yiyun Fu @NeurIPS
5 months
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing? 🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents. paper:
11
33
160
@alex_gill_nlp
Alex Gill
6 months
𝐖𝐡𝐚𝐭 𝐇𝐚𝐬 𝐁𝐞𝐞𝐧 𝐋𝐨𝐬𝐭 𝐖𝐢𝐭𝐡 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha_nlp and @anmarasovic (Full link below 👇)
1
20
76
@jaehunjung_com
Jaehun Jung
6 months
Data curation is crucial for LLM reasoning, but how do we know if our dataset is not overfit to one benchmark and generalizes to unseen distributions? 🤔 𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key when measured correctly: it strongly predicts model generalization in reasoning tasks! 🧵
6
37
181
@robert_csordas
Csordás Róbert
6 months
Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations. In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
35
140
1K
@Mike_A_Merrill
Mike A. Merrill
6 months
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244