Peter West
@PeterWestTM
Followers: 1K
Following: 2K
Media: 16
Statuses: 278
AI / NLP Researcher · Incoming faculty at @UBC_CS and @CAIDA_UBC · Postdoctoral fellow at @StanfordHAI @stanfordnlp · Former PhD student at @uwcse @uwnlp · he/him
Joined September 2019
I have multiple MSc/PhD openings in my lab at @UBC_CS! Come discover the hidden capabilities/limits of LLMs, e.g. how to learn from, guide, and understand the outputs of models. See my website (bio) for more details. https://t.co/GWEH8yOO2k Apply by December 15th! Also...
8
61
163
Considering a PhD/MSc in NLP? I’m hiring students this cycle! If you are passionate about making language models reliable and safe, eager about understanding and controlling language models, and would like to add to your research some multilingual flavor - apply to my group! 👇
16
102
728
The top places in all of our leaderboards have been cracked. The reign of AI is over.
For those who missed it, we just released a little LLM-backed game called HR Simulator™ You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
6
UBC Computer Science invites applications for up to two full-time tenure-track positions with the following priority areas: visualization, robotics, reinforcement learning, data management, and data mining. Applications are due Wed Dec 10, 2025. https://t.co/ARgHUbnGny
0
11
16
🤖➡️📉 Post-training made LLMs better at chat and reasoning, but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning, a method to bring these capabilities back! 🌈 1/🧵
5
49
194
New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
60
155
1K
Want to hear some hot takes about the future of language modeling, and share your takes too? Stop by the Visions of Language Modeling workshop at COLM on Friday, October 10 in room 519A! There will be over a dozen speakers working on all kinds of problems in modeling language and
1
15
80
Check out @eunjeong_hwang’s paper—how do we give LLMs aspects of social intelligence that actually *help* in conversation?
Theory of Mind is key to human social intelligence, but does giving LLMs ToM make them better social reasoners?🤔 We find that ToM makes LLMs better at dialogue: more strategic and goal-oriented, with long-horizon adaptation! We introduce ToMA, a ToM-focused dialogue agent🧵👇
0
0
13
I considered myself a pretty effective email writer until we (led by the amazing @divingwithorcas!) started building this game. See if you fare any better than I did...
For those who missed it, we just released a little LLM-backed game called HR Simulator™ You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
9
testing a game we're building where the mechanic is writing tricky HR emails, and noticing that LLMs have a built-in secret handshake with users to bypass safety guardrails. This seems both necessary to make LLMs actually useful and likely to render guardrails essentially useless
0
1
7
🧵 Academic job market season is almost here! There's so much rarely discussed—nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! ⬇️ (1/N)
3
40
261
the economist published my little letter about the necessity of chaos for discovery
How can chaos create brilliance and breakthroughs? Ari Holtzman (@universeinanegg), Assistant Professor of Computer Science and Data Science, explores how embracing chaos has unlocked the capabilities of AI systems in a letter to @TheEconomist! https://t.co/U1RJKcFyO6
0
1
17
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper:
arxiv.org
Prompting is the primary method by which we study and control large language models. It is also one of the most powerful: nearly every major capability attributed to LLMs: few-shot learning,...
6
33
160
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8 #NLProc #LLM #AIResearch
4
23
86
The fact that in pretty much all LLMs the generative branching factor goes down as the model keeps generating feels like a fundamental limit of LLM creativity, and I've never seen a satisfying solution.
Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and
2
5
29
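The “branching factor” idea in the thread above can be made concrete: a common proxy for the effective branching factor at each generation step is exp of the entropy of the next-token distribution. A minimal sketch with toy, made-up distributions (not numbers from the paper): a spread-out distribution has many “live” continuations, a concentrated one has close to one.

```python
import math

def effective_branching_factor(probs):
    """exp(entropy): roughly how many next tokens are live options."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)

# Toy per-step next-token distributions: early in generation the
# distribution is spread out; later, mass concentrates on one token.
early = [0.25, 0.25, 0.25, 0.25]   # 4 equally likely tokens
late = [0.97, 0.01, 0.01, 0.01]    # mass concentrated on one token

print(effective_branching_factor(early))  # ≈ 4.0
print(effective_branching_factor(late))   # ≈ 1.18
```

Tracking this quantity over the steps of a real generation is one way to see the concentration effect the thread describes.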
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing? 🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents. paper:
11
33
160
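The task framing in the AbsenceBench tweet can be sketched with a toy generator (this is an illustration of the idea, not the benchmark's actual construction): show a model an original list and a copy with one item silently removed, and ask it to name what is missing.

```python
import random

def make_absence_task(items, seed=0):
    """Remove one item from a list; the task is to name the missing one."""
    rng = random.Random(seed)
    missing = rng.choice(items)
    shown = [x for x in items if x != missing]
    prompt = (
        "Original list: " + ", ".join(items) + "\n"
        "Partial list: " + ", ".join(shown) + "\n"
        "Which item is missing from the partial list?"
    )
    return prompt, missing

items = [f"item-{i}" for i in range(10)]
prompt, answer = make_absence_task(items)
```

Scoring is then exact match against `answer`; the tweet's point is that models good at finding an inserted “needle” still do poorly when the signal is an absence rather than a presence.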
What Has Been Lost With Synthetic Evaluation? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha_nlp and @anmarasovic (Full link below 👇)
1
20
76
Data curation is crucial for LLM reasoning, but how do we know whether our dataset is overfit to one benchmark or generalizes to unseen distributions? 🤔 𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
6
37
181
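One simple way to quantify the “data diversity” the tweet refers to (a generic proxy, not necessarily the paper's measure) is the average pairwise distance between example embeddings: a more spread-out dataset scores higher than a tightly clustered one.

```python
import numpy as np

def mean_pairwise_distance(embs):
    """Average pairwise Euclidean distance: a crude diversity proxy."""
    n = len(embs)
    total = sum(
        np.linalg.norm(embs[i] - embs[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)

# Toy 2-D "embeddings": a clustered set vs. a spread-out set.
clustered = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.0, 0.1])]
spread = [np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([6.0, 0.0])]
```

Here `mean_pairwise_distance(spread)` comes out larger than for `clustered`, matching the intuition that diversity should reward coverage of the embedding space.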
Your language model is wasting half of its layers just refining probability distributions rather than doing interesting computations. In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
35
140
1K
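The layer-finding above is typically probed by skip-layer ablation: run the model with some layers removed and compare outputs. A toy residual network (hypothetical weights; the magnitude of the effect in real models is the paper's result, not something this sketch shows) illustrates the mechanics:

```python
import numpy as np

def forward(x, layers, skip=()):
    """Toy residual net: x <- x + tanh(x @ W_i); optionally skip layers."""
    for i, w in enumerate(layers):
        if i in skip:
            continue
        x = x + np.tanh(x @ w)
    return x

rng = np.random.default_rng(0)
d = 8
layers = [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]
x = rng.normal(size=d)

full = forward(x, layers)
no_late = forward(x, layers, skip={3, 4, 5})  # ablate the second half

# Relative change in the output when later layers are skipped.
print(np.linalg.norm(full - no_late) / np.linalg.norm(full))
```

In a real study one would compare downstream behavior (e.g. logits or task accuracy) rather than raw activations, but the skip-and-compare loop is the same.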
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244