John Hewitt

@johnhewtt

Followers
7K
Following
453
Media
46
Statuses
224

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

New York, NY
Joined February 2015
@johnhewtt
John Hewitt
17 days
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors, to understand and improve them, let's do it together! [photo: a run in Central Park]
12
129
951
@pratyusha_PS
Pratyusha Sharma ✈️ NeurIPS
15 days
📢 Some big (& slightly belated) life updates! 1. I defended my PhD at MIT this summer! 🎓 2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉 🔬 My lab will focus on empirically studying
102
90
2K
@johnhewtt
John Hewitt
17 days
I hire through the computer science department, and will be hiring 1-2ish PhD students this year. Columbia and New York have been amazing places to live and do research. And if you're not convinced, we just bought a mini fridge for snacks. Join us! https://t.co/STDGM5TBEB
6
12
101
@johnhewtt
John Hewitt
1 month
We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts—differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, and @_beenkim. Blog here:
1
0
12
@johnhewtt
John Hewitt
1 month
In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but indeed causes very curt answers. We call this a machine-only synonym.
1
0
7
@johnhewtt
John Hewitt
1 month
How can we tell if self-verbalizations are valid? In plug-in evaluation, we replace the neologism in a prompt with a self-verbalization, and measure the extent to which Gemma’s resulting responses reflect the neologism’s concept.
1
0
4
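The plug-in check described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the neologism placeholder `<nw>`, the helper names, and the stand-in "curt answer" metric (mean word count) are all my own assumptions; a real evaluation would query Gemma and measure concept-specific behavior in its responses.

```python
# Plug-in evaluation sketch (hypothetical): swap the neologism for a
# candidate self-verbalization, then score responses for the concept.

def plug_in(prompt: str, neologism: str, verbalization: str) -> str:
    """Replace the new word with the model's own English description of it."""
    return prompt.replace(neologism, verbalization)

def concept_score(responses: list[str]) -> float:
    """Stand-in metric for a 'curt answers' concept: mean word count
    (lower = more concept-consistent). A real evaluation would measure
    the concept directly in model outputs."""
    return sum(len(r.split()) for r in responses) / len(responses)

prompt = "Give me a <nw> answer."
swapped = plug_in(prompt, "<nw>", "lack")   # "Give me a lack answer."
```

If responses to the swapped prompt score similarly to responses to the original neologism prompt, the self-verbalization is treated as valid.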
@johnhewtt
John Hewitt
1 month
In our new work, Neologism Learning for Controllability and Self-Verbalization ( https://t.co/VYUMcpW2H0), we show that by asking Gemma about the new word ~concept, e.g. "what's a synonym for ~concept?", Gemma can self-verbalize, generating English descriptions of the concept.
1
0
6
@johnhewtt
John Hewitt
1 month
In neologism learning [HGK25] we freeze a language model, initialize one new word embedding, place that word in natural language contexts, and train it to optimize a loss on training examples that define some concept. Simple parameter-efficient finetuning, but you get a new word.
1
1
6
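The recipe above (freeze the model, add one new embedding, train only it on concept-defining examples) can be sketched on a toy model. This is a minimal illustration under my own assumptions, not the paper's implementation: the tiny bag-of-words "model", the contexts, and the targets are all placeholders standing in for a real frozen LM and real training data.

```python
# Neologism-learning sketch (toy, hypothetical): freeze all parameters,
# append one new word embedding row, and optimize only that row.
import torch
import torch.nn as nn

torch.manual_seed(0)
V, D = 10, 8                       # toy vocab size and embedding dim

emb = nn.Embedding(V + 1, D)       # +1 row: the neologism's embedding
head = nn.Linear(D, V + 1)         # toy output head (stands in for the LM)

# Freeze everything; we will allow gradients on emb.weight but mask
# them so only the new word's row ever updates.
for p in head.parameters():
    p.requires_grad_(False)
emb.weight.requires_grad_(True)

new_id = V                         # index of the new word
opt = torch.optim.Adam([emb.weight], lr=0.1)

# Natural-language contexts containing the new word, with next-token
# targets that define the concept (all placeholder data).
contexts = torch.tensor([[1, new_id, 3], [2, new_id, 4]])
targets = torch.tensor([5, 6])

before = emb.weight.detach().clone()
for _ in range(20):
    opt.zero_grad()
    h = emb(contexts).mean(dim=1)              # crude context encoding
    loss = nn.functional.cross_entropy(head(h), targets)
    loss.backward()
    with torch.no_grad():                      # keep frozen rows frozen
        mask = torch.zeros_like(emb.weight)
        mask[new_id] = 1.0
        emb.weight.grad *= mask
    opt.step()

changed = (emb.weight.detach() - before).abs().sum(dim=1)
# Only row `new_id` moves; the rest of the model is untouched.
```

The point of the sketch is the parameter count: one embedding row is trained, everything else stays fixed, so the result is a single new "word" the frozen model can use.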
@johnhewtt
John Hewitt
1 month
New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”
3
29
188
@johnhewtt
John Hewitt
2 months
Excited to give a talk at the interplay workshop tomorrow! Come say hi! Alas, it’s my only day at COLM. Catch me at the coffee breaks or the roundtable.
@interplaywrkshp
INTERPLAY Workshop
2 months
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨ 🗓️ October 10th, Room 518C 🔹 Invited talks from @sarahwiegreffe @johnhewtt @amuuueller @kmahowald 🔹 Paper presentations and posters 🔹 Closing roundtable discussion. Join us in Montréal! @COLM_conf
0
2
38
@johnhewtt
John Hewitt
3 months
Lecture 1: Text Representation and Language Modeling https://t.co/ekTfKZWkbE Lecture 2: Tokenization https://t.co/iM1oSbkjkd
2
6
80
@johnhewtt
John Hewitt
3 months
My first NLP lectures at Columbia are in the books! In our first two lectures, we went over (1) learning from text with a simple word vector language model, and (2) tokenization of text. Lecture notes are brand new and freely available on my website (links in thread).
18
74
1K
@johnhewtt
John Hewitt
5 months
Come chat with me at our ICML poster about interpretability as a communication problem, and the need to derive new words for referencing language model concepts! 4:30PM-7, East Exhibition Hall A-B #E-500 We Can’t Understand AI Using our Existing Vocabulary
@johnhewtt
John Hewitt
10 months
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
2
10
79
@johnhewtt
John Hewitt
5 months
I'll be at ICML this year! Reach out if:
- you want to chat -- great! -- sign up here https://t.co/F0DjWyzyv4 and/or DM me.
- you want to fund my lab @ Columbia -- also great! -- research into deeply understanding language models for alignment, safety, performance. Email me.
5
10
118
@johnhewtt
John Hewitt
6 months
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus https://t.co/BWwd4xLP9u
8
53
446
@_beenkim
Been Kim
6 months
We (@_beenkim @johnhewtt @NeelNanda5 Noah Fiedel Oyvind Tafjord) propose a research direction called 🤖agentic interpretability: we can and should ask and help AI systems to build mental models of us which will help us to build mental models of the LLMs. https://t.co/iw5lHnOlBU
8
35
222
@johnhewtt
John Hewitt
6 months
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions. https://t.co/xrqHxdQNOr
6
24
235
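Two of the topics the note covers have one-line demonstrations. This is my own toy illustration, not taken from the note: an outer product is always rank 1, and repeatedly applying a matrix (the heart of a linear RNN) is dominated by its top eigenvector.

```python
# Toy demos (mine, not the note's): outer products and eigenvectors.
import numpy as np

# 1) An outer product u v^T is rank 1: every column is a multiple of u.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])
A = np.outer(u, v)                       # 3x2 matrix u v^T
rank = np.linalg.matrix_rank(A)          # rank 1

# 2) A linear "RNN" x_{t+1} = M x_t aligns with the top eigenvector.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # eigenvalues 3 and 1
x = np.array([1.0, 0.0])
for _ in range(50):
    x = M @ x
    x /= np.linalg.norm(x)               # normalize to avoid blow-up
# x converges toward [1, 1] / sqrt(2), the eigenvector with eigenvalue 3.
```

The second demo is the usual power-iteration intuition: components along smaller eigenvalues shrink relative to the largest one at a geometric rate.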
@_beenkim
Been Kim
10 months
‼️Skibidi for Machines! :) Developing language 🔠 between humans🧒 and machines🤖 has long been a dream - the language that will help us expand what we know so that we can communicate with machines better, and create machines better aligned with us. With @johnhewtt's amazing
@johnhewtt
John Hewitt
10 months
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
4
13
96
@johnhewtt
John Hewitt
10 months
The position paper is We Can’t Understand AI Using Our Existing Vocabulary https://t.co/JGZh7gUHbe Feedback and discussion are very welcome.
7
2
19
@johnhewtt
John Hewitt
10 months
We give a qualitative example where we sample many times, and ask the model to score its own outputs. We distill its preferences into a word 'Good_M', as in, 'Give me responses you'd think are Good_M'. Negating, 'Not Good_M', makes the model generate responses it scores lowly.
1
1
9