John Hewitt

@johnhewtt

Followers: 6K · Following: 375 · Media: 38 · Statuses: 197

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

Seattle, WA
Joined February 2015
@johnhewtt
John Hewitt
5 months
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI.
@johnhewtt
John Hewitt
11 days
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus.
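Not from the course notes themselves, but a minimal numpy sketch of a few of the listed warm-up topics (eigendecomposition, entropy, KL divergence); the example matrix and distributions are made up for illustration:

```python
# A small numpy warm-up: eigendecomposition of a symmetric matrix,
# a probability distribution, its entropy, and a KL divergence.
import numpy as np

# Symmetric matrix: real eigenvalues, orthonormal eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eigh(A)
# Reconstruct A from its eigendecomposition: A = V diag(lambda) V^T
A_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T
assert np.allclose(A, A_rebuilt)

# Two probability distributions over the same support.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

# Shannon entropy H(p) = -sum_i p_i log p_i  (in nats)
entropy_p = -np.sum(p * np.log(p))

# KL divergence D_KL(p || q) = sum_i p_i log(p_i / q_i); nonnegative, zero iff p == q.
kl_pq = np.sum(p * np.log(p / q))

print("eigenvalues:", eigvals)
print(f"H(p) = {entropy_p:.4f} nats")
print(f"KL(p || q) = {kl_pq:.4f} nats")
```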
@johnhewtt
John Hewitt
18 days
RT @_beenkim: We (@_beenkim @johnhewtt @NeelNanda5 Noah Fiedel Oyvind Tafjord) propose a research direction called 🤖agentic interpretabili…
@johnhewtt
John Hewitt
27 days
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions.
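The note itself isn't reproduced here, but a small numpy sketch of the same objects (outer products, rank, eigenvectors, a linear RNN step) looks roughly like this; the specific vectors and weight scale are illustrative:

```python
# Outer products, matrix rank, eigenvalues, and a linear RNN step.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# An outer product u v^T of nonzero vectors is always a rank-1 matrix.
M = np.outer(u, v)
assert np.linalg.matrix_rank(M) == 1

# A sum of k rank-1 outer products has rank at most k.
M2 = np.outer(u, v) + np.outer(v, u)
print("rank of sum of two outer products:", np.linalg.matrix_rank(M2))

# Linear RNN: h_t = W h_{t-1} + U x_t. With no nonlinearity, repeated
# application of W is governed by its eigenvalues: magnitudes above 1
# explode the state, magnitudes below 1 make it decay.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.3   # small scale, so the spectral radius is likely < 1
U = rng.normal(size=(3, 2))
print("spectral radius of W:", np.max(np.abs(np.linalg.eigvals(W))))

h = np.zeros(3)
for t in range(10):
    x_t = rng.normal(size=2)
    h = W @ h + U @ x_t
print("final hidden state:", h)
```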
@johnhewtt
John Hewitt
5 months
RT @_beenkim: ‼️Skibidi for Machines! :) Developing language 🔠 between humans🧒 and machines🤖 has long been a dream - the language that wi…
@johnhewtt
John Hewitt
5 months
The position paper is 'We Can't Understand AI Using Our Existing Vocabulary'. Feedback and discussion are very welcome.
@johnhewtt
John Hewitt
5 months
We give a qualitative example where we sample many times and ask the model to score its own outputs. We distill its preferences into a word, 'Good_M', as in 'Give me responses you'd think are Good_M'. Negating it, 'Not Good_M', makes the model generate responses it scores poorly.
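A rough sketch of that sampling-and-self-scoring loop; `model.generate` and `model.self_score` are hypothetical stand-ins for whatever generation and scoring APIs are actually used, and the sample count and quartile split are made up:

```python
def collect_self_preferences(model, prompt, n_samples=64):
    """Sample many responses and let the model score its own outputs."""
    scored = []
    for _ in range(n_samples):
        response = model.generate(prompt, temperature=1.0)
        score = model.self_score(prompt, response)  # model rates its own response
        scored.append((score, response))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    # High- vs. low-scored responses become the preference pairs that the
    # word 'Good_M' can later be trained to capture.
    preferred = [r for _, r in scored[: n_samples // 4]]
    dispreferred = [r for _, r in scored[-(n_samples // 4):]]
    return preferred, dispreferred
```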
@johnhewtt
John Hewitt
5 months
We train a neologism 'Ensure_H' that helps control response lengths in statements like 'Ensure_H that the response is at least 600 words long.' If you use the word in your prompt, you get better length control. If you want to use the original model again, just rephrase without it.
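A hypothetical usage sketch of that toggle; `model.generate` is a stand-in for the actual generation API and the prompts are illustrative:

```python
def compare_length_control(model, task: str) -> tuple[str, str]:
    """Generate with and without the trained neologism in the prompt."""
    with_neologism = f"Ensure_H that the response is at least 600 words long. {task}"
    plain = f"Ensure that the response is at least 600 words long. {task}"
    # With 'Ensure_H', length control should improve; rephrasing without it
    # falls back to the original model's behavior, since nothing else changed.
    return model.generate(with_neologism), model.generate(plain)
```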
@johnhewtt
John Hewitt
5 months
As a proof-of-concept, we give a simple soft prompting-like method for learning neologisms. We (1) add a new word to the model's tokenizer, (2) construct preference data with prompts that incorporate the new word, (3) train _just_ that word via preference learning.
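A minimal sketch of those three steps, assuming a HuggingFace-style causal LM; the model name, example preference data, and gradient-masking trick are assumptions for illustration, and the preference loss is left as a generic placeholder rather than the paper's exact recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM would be handled the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# (1) Add the neologism to the tokenizer and grow the embedding matrix.
tokenizer.add_tokens(["Good_M"])
model.resize_token_embeddings(len(tokenizer))
new_token_id = tokenizer.convert_tokens_to_ids("Good_M")

# (2) Preference data built around prompts that use the new word
#     (hypothetical placeholder examples).
preference_data = [
    {
        "prompt": "Give me a response you'd think is Good_M: explain tides.",
        "chosen": "...a response the model itself scores highly...",
        "rejected": "...a response the model itself scores poorly...",
    },
]

# (3) Train _just_ that word: freeze everything, then mask gradients so
#     only the new token's embedding row can change.
for p in model.parameters():
    p.requires_grad = False
embedding = model.get_input_embeddings().weight
embedding.requires_grad = True

def keep_only_new_row(grad):
    mask = torch.zeros_like(grad)
    mask[new_token_id] = 1.0
    return grad * mask

embedding.register_hook(keep_only_new_row)
optimizer = torch.optim.Adam([embedding], lr=1e-3)
# ...then run a standard preference-learning loop (e.g., DPO-style) over
# preference_data, stepping `optimizer` on the preference loss...
```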
@johnhewtt
John Hewitt
5 months
Developing neologisms also suggests that our interpretability results should _plug into natural language_ for compositionality. A neologism corresponding to a machine's notion of Move 37, or high-quality sorting code, should be referenceable in natural language (e.g., in prompts).
@johnhewtt
John Hewitt
5 months
The neologism framing is clarifying for interpretability, e.g., at what level of abstraction should we search for model concepts? Neologisms in natural languages (e.g., 'vibes', 'doomscroll') hit moderate levels of abstraction (too low-level: not common enough; too abstract: not informative).
@johnhewtt
John Hewitt
5 months
Humans and machines process the world differently and build different concepts (skills, patterns, etc.). AlphaGo's 'Move 37' is indicative of a machine-only concept. Human values are examples of concepts not held by machines. Interpretability aims to characterize these concepts.
@johnhewtt
John Hewitt
7 months
I’ll be at NeurIPS for a bit! If you want to talk in person about a PhD in my lab at Columbia, book a slot here:. If your organization wants to fund LLM understanding/interpretability/control research, reach out to me!
@johnhewtt
John Hewitt
7 months
Teaching and mentorship are key reasons why I chose to join academia. This image shows some of my not-great freshman grades. I know every student needs different support at different times, and every student contributes different skills. Come to New York and learn with me!
@johnhewtt
John Hewitt
7 months
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
@johnhewtt
John Hewitt
8 months
RT @ml_collective: 📢 Join us tomorrow at 10 AM PST for the next DLCT talk featuring @johnhewtt! He’ll dive into "Instruction Following with…
@johnhewtt
John Hewitt
9 months
RT @percyliang: This was a really fun project. Fine-tuning a model on "" => response produces a model that can do instruction => response…
@johnhewtt
John Hewitt
9 months
This is work with @nelsonfliu @chrmanning @percyliang and is my last paper at Stanford NLP. It’s been a blast finding these very odd results.
@johnhewtt
John Hewitt
9 months
So, it isn’t just sample-efficient to instruction-tune LMs. Even seemingly totally deficient adaptations yield instruction following. I think this bears a lot more exploration! Blog: GitHub:
@johnhewtt
John Hewitt
9 months
To make this concrete, we show: even just taking a product between a pretrained LM and a hand-written rule-based LM with only 3 rules also yields rough instruction following. The rules are: upweight EOS slowly, uniformly change 15 words’ probs, penalize repetition.
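A hedged sketch of that product, assuming a HuggingFace-style causal LM whose forward pass returns `.logits`; the rule constants and the handling of the 15-word list are illustrative guesses, not the paper's exact settings:

```python
import torch

def rule_based_logits(input_ids, vocab_size, special_word_ids, eos_id):
    """Three rules: slowly upweight EOS, shift a fixed set of ~15 words'
    probabilities, and penalize tokens that have already been generated."""
    logits = torch.zeros(vocab_size, device=input_ids.device)
    t = input_ids.shape[-1]                  # number of tokens generated so far
    logits[eos_id] += 0.01 * t               # rule 1: EOS slowly becomes more likely
    logits[special_word_ids] += 2.0          # rule 2: uniformly boost ~15 chosen words
    logits[input_ids[0].unique()] -= 1.0     # rule 3: repetition penalty
    return logits

def product_next_token(model, input_ids, special_word_ids, eos_id):
    """Sample the next token from the renormalized product of the two LMs."""
    with torch.no_grad():
        lm_logits = model(input_ids).logits[0, -1]   # pretrained LM's next-token logits
    rule_logits = rule_based_logits(input_ids, lm_logits.shape[-1],
                                    special_word_ids, eos_id)
    combined = lm_logits + rule_logits   # sum of logits = product of (unnormalized) probs
    probs = torch.softmax(combined, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```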