John Hewitt
@johnhewtt
Followers: 7K · Following: 453 · Media: 46 · Statuses: 224
Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.
New York, NY
Joined February 2015
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! Pic: a run in Central Park
12
129
951
📢 Some big (& slightly belated) life updates! 1. I defended my PhD at MIT this summer! 🎓 2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉 🔬 My lab will focus on empirically studying
102
90
2K
I hire through the computer science department, and will be hiring 1-2ish PhD students this year. Columbia and New York have been amazing places to live and do research. And if you're not convinced, we just bought a mini fridge for snacks. Join us! https://t.co/STDGM5TBEB
6
12
101
We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts—differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, and @_beenkim. Blog here:
1
0
12
In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but it indeed causes very curt answers. We call this a machine-only synonym.
1
0
7
How can we tell if self-verbalizations are valid? In plug-in evaluation, we replace the neologism in a prompt with a self-verbalization, and measure the extent to which Gemma’s resulting responses reflect the neologism’s concept.
1
0
4
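A minimal sketch of what plug-in evaluation could look like in code, assuming a HuggingFace-style Gemma checkpoint; the model name, prompt template, and brevity metric below are illustrative assumptions for the "single-sentence answer" concept, not the paper's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b-it"  # assumed checkpoint; any chat LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def respond(prompt: str) -> str:
    """Greedy-decode a response and return only the newly generated text."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The self-verbalization under test ("lack", from the machine-only-synonym tweet)
# is plugged into the prompt in place of the neologism.
template = "Give me a {word} answer: why is the sky blue?"
verbalization = "lack"

# Proxy measurement for the "single-sentence answer" concept: response length in words.
plugin_len = len(respond(template.format(word=verbalization)).split())
baseline_len = len(respond(template.format(word="detailed")).split())
print(f"plug-in: {plugin_len} words vs. baseline: {baseline_len} words")
```

If the plugged-in verbalization is faithful, the responses should look much more like the neologism's behavior (here, far shorter) than the baseline prompt's responses.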
In our new work, Neologism Learning for Controllability and Self-Verbalization (https://t.co/VYUMcpW2H0), we show that by asking Gemma about the new word ~concept, like “what’s a synonym for ~concept”, Gemma can self-verbalize, generating English descriptions of the concept.
1
0
6
In neologism learning [HGK25], we freeze a language model, initialize one new word embedding, place that word in natural language contexts, and train it to optimize a loss on training examples that define some concept. Simple parameter-efficient finetuning, but you get a new word.
1
1
6
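A minimal sketch of that recipe, assuming a HuggingFace-style causal LM; the checkpoint, token string, training example, and hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b-it"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1. Freeze the entire pretrained model.
for p in model.parameters():
    p.requires_grad = False

# 2. Add a single new word and give it a fresh embedding row.
new_word = "<concept>"
tok.add_tokens([new_word])
model.resize_token_embeddings(len(tok))
emb = model.get_input_embeddings()
emb.weight.requires_grad = True  # gradients flow only into the embedding matrix
new_id = tok.convert_tokens_to_ids(new_word)

# 3. Place the word in natural-language contexts exemplifying the concept
#    (a single toy example here; real training would use many).
examples = [
    "Give me a <concept> answer: why is the sky blue? Because air scatters blue light.",
]

opt = torch.optim.Adam([emb.weight], lr=1e-3)
for text in examples:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    # Keep every embedding except the new word's row fixed.
    emb.weight.grad[:new_id] = 0
    emb.weight.grad[new_id + 1:] = 0
    opt.step()
    opt.zero_grad()
```

After training, the only parameters that changed are the single new embedding row, so the base model's behavior on ordinary text is untouched.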
New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”
3
29
188
Excited to give a talk at the INTERPLAY workshop tomorrow! Come say hi! Alas, it’s my only day at COLM. Catch me at the coffee breaks or the roundtable.
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨ 🗓️ October 10th, Room 518C 🔹 Invited talks from @sarahwiegreffe @johnhewtt @amuuueller @kmahowald 🔹 Paper presentations and posters 🔹 Closing roundtable discussion. Join us in Montréal! @COLM_conf
0
2
38
Lecture 1: Text Representation and Language Modeling https://t.co/ekTfKZWkbE
Lecture 2: Tokenization https://t.co/iM1oSbkjkd
2
6
80
My first NLP lectures at Columbia are in the books! In our first two lectures, we went over (1) learning from text with a simple word vector language model, and (2) tokenization of text. Lecture notes are brand new and freely available on my website (links in thread).
18
74
1K
Come chat with me at our ICML poster about interpretability as a communication problem, and the need to derive new words for referencing language model concepts! 4:30-7PM, East Exhibition Hall A-B, #E-500. We Can’t Understand AI Using Our Existing Vocabulary
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
2
10
79
I'll be at ICML this year! Reach out if:
- you want to chat -- great! -- sign up here https://t.co/F0DjWyzyv4 and/or DM me.
- you want to fund my lab @ Columbia -- also great! -- research into deeply understanding language models for alignment, safety, and performance. Email me.
5
10
118
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus https://t.co/BWwd4xLP9u
8
53
446
We (@_beenkim @johnhewtt @NeelNanda5 Noah Fiedel Oyvind Tafjord) propose a research direction called 🤖agentic interpretability: we can and should ask and help AI systems to build mental models of us, which will in turn help us build mental models of the LLMs. https://t.co/iw5lHnOlBU
8
35
222
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions. https://t.co/xrqHxdQNOr
6
24
235
‼️Skibidi for Machines! :) Developing language 🔠 between humans🧒 and machines🤖 has long been a dream - a language that will help us expand what we know so that we can communicate with machines better, and create machines better aligned with us. With @johnhewtt's amazing
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
4
13
96
The position paper is We Can’t Understand AI Using Our Existing Vocabulary: https://t.co/JGZh7gUHbe. Feedback and discussion are very welcome.
7
2
19
We give a qualitative example where we sample many times, and ask the model to score its own outputs. We distill its preferences into a word 'Good_M', as in, 'Give me responses you'd think are Good_M'. Negating it, as in 'Not Good_M', makes the model generate responses it scores lowly.
1
1
9
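For illustration only, a hedged sketch of how the trained 'Good_M' word and its negation might be used to steer generations and then collect self-scores; the respond() wrapper, prompts, and scoring format are assumptions, not the paper's exact protocol.

```python
# Sketch: steer with the trained neologism and its negation, then have the
# model rate its own outputs. respond() is assumed to be a wrapper around a
# chat LM (e.g., Gemma) that returns a string for a prompt.
def steer_and_score(respond, question: str):
    prompts = {
        "good": f"Give me responses you'd think are Good_M: {question}",
        "not_good": f"Give me responses you'd think are Not Good_M: {question}",
    }
    results = {}
    for key, prompt in prompts.items():
        answer = respond(prompt)
        # Mirror the self-scoring step: ask the model to rate its own answer.
        score = respond(f"Rate this response from 1 (worst) to 10 (best):\n{answer}\nScore:")
        results[key] = (answer, score)
    return results
```

Under the tweet's claim, the 'Not Good_M' generations should receive systematically lower self-scores than the 'Good_M' ones.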