Peter West
@PeterWestTM
Followers: 1K
Following: 2K
Media: 16
Statuses: 278
AI / NLP Researcher · Incoming faculty at @UBC_CS and @CAIDA_UBC · Postdoctoral fellow at @StanfordHAI @stanfordnlp · Former PhD student at @uwcse @uwnlp · he/him
Joined September 2019
I have multiple MSc/PhD openings in my lab at @UBC_CS! Come discover the hidden capabilities/limits of LLMs, e.g. how to learn from, guide, and understand the outputs of models. See my website (bio) for more details. https://t.co/GWEH8yOO2k Apply by December 15th! Also...
8
61
163
Considering a PhD/MSc in NLP? I’m hiring students this cycle! If you are passionate about making language models reliable and safe, eager about understanding and controlling language models, and would like to add to your research some multilingual flavor - apply to my group! 👇
16
102
728
The top places in all of our leaderboards have been cracked. The reign of AI is over.
For those who missed it, we just released a little LLM-backed game called HR Simulator™ You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
6
UBC Computer Science invites applications for up to two full-time tenure-track positions with the following priority areas: visualization, robotics, reinforcement learning, data management, and data mining. Applications are due Wed Dec 10, 2025. https://t.co/ARgHUbnGny
0
11
16
🤖➡️📉 Post-training made LLMs better at chat and reasoning, but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning, a method to bring these capabilities back! 🌈 1/🧵
5
49
194
New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
60
155
1K
Want to hear some hot takes about the future of language modeling, and share your takes too? Stop by the Visions of Language Modeling workshop at COLM on Friday, October 10 in room 519A! There will be over a dozen speakers working on all kinds of problems in modeling language and
1
15
80
Check out @eunjeong_hwang’s paper—how do we give LLMs aspects of social intelligence that actually *help* in conversation?
Theory of Mind is key to human social intelligence, but does giving LLMs ToM make them better social reasoners?🤔 We find that ToM makes LLMs better at dialogue: more strategic and goal-oriented, with long-horizon adaptation! We introduce ToMA, a ToM-focused dialogue agent🧵👇
0
0
13
I considered myself a pretty effective email writer until we (led by the amazing @divingwithorcas!) started building this game. See if you fare any better than I did...
For those who missed it, we just released a little LLM-backed game called HR Simulator™ You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
1
2
9
testing a game we're building where the mechanic is writing tricky HR emails, and noticing that LLMs have a built-in secret handshake with users to bypass safety guardrails. This seems both necessary to make LLMs actually useful and likely to render guardrails essentially useless
0
1
7
🧵 Academic job market season is almost here! There's so much rarely discussed—nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! ⬇️ (1/N)
3
40
261
the economist published my little letter about the necessity of chaos for discovery
How can chaos create brilliance and breakthroughs? Ari Holtzman (@universeinanegg), Assistant Professor of Computer Science and Data Science, explores how embracing chaos has unlocked the capabilities of AI systems in a letter to @TheEconomist! https://t.co/U1RJKcFyO6
0
1
17
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper:
arxiv.org
Prompting is the primary method by which we study and control large language models. It is also one of the most powerful: nearly every major capability attributed to LLMs: few-shot learning,...
6
33
160
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8 #NLProc #LLM #AIResearch
4
23
86
The fact that in pretty much all LLMs the generative branching factor goes down as the model keeps generating feels like a fundamental limit of LLM creativity, and I've never seen a satisfying solution.
Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and
2
5
29
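The “branching factor” idea in the thread above can be made concrete: a common proxy for the effective branching factor at each generation step is exp of the entropy of the next-token distribution. A minimal sketch with toy, made-up distributions (not numbers from the paper): a spread-out distribution has many “live” continuations, a concentrated one has close to one.

```python
import math

def effective_branching_factor(probs):
    """exp(entropy): roughly how many next tokens are live options."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)

# Toy per-step next-token distributions: early in generation the
# distribution is spread out; later, mass concentrates on one token.
early = [0.25, 0.25, 0.25, 0.25]   # 4 equally likely tokens
late = [0.97, 0.01, 0.01, 0.01]    # mass concentrated on one token

print(effective_branching_factor(early))  # ≈ 4.0
print(effective_branching_factor(late))   # ≈ 1.18
```

Tracking this quantity over the steps of a real generation is one way to see the concentration effect the thread describes.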
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing? 🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents. paper:
11
33
160
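The task framing in the AbsenceBench tweet can be sketched with a toy generator (this is an illustration of the idea, not the benchmark's actual construction): show a model an original list and a copy with one item silently removed, and ask it to name what is missing.

```python
import random

def make_absence_task(items, seed=0):
    """Remove one item from a list; the task is to name the missing one."""
    rng = random.Random(seed)
    missing = rng.choice(items)
    shown = [x for x in items if x != missing]
    prompt = (
        "Original list: " + ", ".join(items) + "\n"
        "Partial list: " + ", ".join(shown) + "\n"
        "Which item is missing from the partial list?"
    )
    return prompt, missing

items = [f"item-{i}" for i in range(10)]
prompt, answer = make_absence_task(items)
```

Scoring is then exact match against `answer`; the tweet's point is that models good at finding an inserted “needle” still do poorly when the signal is an absence rather than a presence.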
What Has Been Lost With Synthetic Evaluation? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha_nlp and @anmarasovic (Full link below 👇)
1
20
76
Data curation is crucial for LLM reasoning, but how do we know whether our dataset is overfit to one benchmark or generalizes to unseen distributions? 🤔 𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
6
37
181
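One simple way to quantify the “data diversity” the tweet refers to (a generic proxy, not necessarily the paper's measure) is the average pairwise distance between example embeddings: a more spread-out dataset scores higher than a tightly clustered one.

```python
import numpy as np

def mean_pairwise_distance(embs):
    """Average pairwise Euclidean distance: a crude diversity proxy."""
    n = len(embs)
    total = sum(
        np.linalg.norm(embs[i] - embs[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)

# Toy 2-D "embeddings": a clustered set vs. a spread-out set.
clustered = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.0, 0.1])]
spread = [np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([6.0, 0.0])]
```

Here `mean_pairwise_distance(spread)` comes out larger than for `clustered`, matching the intuition that diversity should reward coverage of the embedding space.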
Your language model is wasting half of its layers just refining probability distributions rather than doing interesting computations. In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
35
140
1K
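The layer-finding above is typically probed by skip-layer ablation: run the model with some layers removed and compare outputs. A toy residual network (hypothetical weights; the magnitude of the effect in real models is the paper's result, not something this sketch shows) illustrates the mechanics:

```python
import numpy as np

def forward(x, layers, skip=()):
    """Toy residual net: x <- x + tanh(x @ W_i); optionally skip layers."""
    for i, w in enumerate(layers):
        if i in skip:
            continue
        x = x + np.tanh(x @ w)
    return x

rng = np.random.default_rng(0)
d = 8
layers = [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]
x = rng.normal(size=d)

full = forward(x, layers)
no_late = forward(x, layers, skip={3, 4, 5})  # ablate the second half

# Relative change in the output when later layers are skipped.
print(np.linalg.norm(full - no_late) / np.linalg.norm(full))
```

In a real study one would compare downstream behavior (e.g. logits or task accuracy) rather than raw activations, but the skip-and-compare loop is the same.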
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244