John Hewitt

@johnhewtt

Followers: 6K · Following: 375 · Media: 38 · Statuses: 197

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

Seattle, WA
Joined February 2015
@johnhewtt
John Hewitt
5 months
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI.
@johnhewtt
John Hewitt
11 days
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus.
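Not from the course notes themselves, but a minimal numpy sketch of a few of the listed warm-up topics (eigendecomposition, entropy, KL divergence); the example matrix and distributions are made up for illustration:

```python
# A small numpy warm-up: eigendecomposition of a symmetric matrix,
# a probability distribution, its entropy, and a KL divergence.
import numpy as np

# Symmetric matrix: real eigenvalues, orthonormal eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eigh(A)
# Reconstruct A from its eigendecomposition: A = V diag(lambda) V^T
A_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T
assert np.allclose(A, A_rebuilt)

# Two probability distributions over the same support.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

# Shannon entropy H(p) = -sum_i p_i log p_i  (in nats)
entropy_p = -np.sum(p * np.log(p))

# KL divergence D_KL(p || q) = sum_i p_i log(p_i / q_i); nonnegative, zero iff p == q.
kl_pq = np.sum(p * np.log(p / q))

print("eigenvalues:", eigvals)
print(f"H(p) = {entropy_p:.4f} nats")
print(f"KL(p || q) = {kl_pq:.4f} nats")
```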
@johnhewtt
John Hewitt
18 days
RT @_beenkim: We (@_beenkim @johnhewtt @NeelNanda5 Noah Fiedel Oyvind Tafjord) propose a research direction called 🤖agentic interpretabili…
@johnhewtt
John Hewitt
27 days
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions.
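The note itself isn't reproduced here, but a small numpy sketch of the same objects (outer products, rank, eigenvectors, a linear RNN step) looks roughly like this; the specific vectors and weight scale are illustrative:

```python
# Outer products, matrix rank, eigenvalues, and a linear RNN step.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# An outer product u v^T of nonzero vectors is always a rank-1 matrix.
M = np.outer(u, v)
assert np.linalg.matrix_rank(M) == 1

# A sum of k rank-1 outer products has rank at most k.
M2 = np.outer(u, v) + np.outer(v, u)
print("rank of sum of two outer products:", np.linalg.matrix_rank(M2))

# Linear RNN: h_t = W h_{t-1} + U x_t. With no nonlinearity, repeated
# application of W is governed by its eigenvalues: magnitudes above 1
# explode the state, magnitudes below 1 make it decay.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.3   # small scale, so the spectral radius is likely < 1
U = rng.normal(size=(3, 2))
print("spectral radius of W:", np.max(np.abs(np.linalg.eigvals(W))))

h = np.zeros(3)
for t in range(10):
    x_t = rng.normal(size=2)
    h = W @ h + U @ x_t
print("final hidden state:", h)
```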
@johnhewtt
John Hewitt
5 months
RT @_beenkim: ‼️Skibidi for Machines! :) Developing language 🔠 between humans🧒 and machines🤖 has long been a dream - the language that wi…
@johnhewtt
John Hewitt
5 months
The position paper is 'We Can't Understand AI Using Our Existing Vocabulary'. Feedback and discussion are very welcome.
@johnhewtt
John Hewitt
5 months
We give a qualitative example where we sample many times and ask the model to score its own outputs. We distill its preferences into a word, 'Good_M', as in 'Give me responses you'd think are Good_M'. Negating it, 'Not Good_M', makes the model generate responses it scores poorly.
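A rough sketch of that sampling-and-self-scoring loop; `model.generate` and `model.self_score` are hypothetical stand-ins for whatever generation and scoring APIs are actually used, and the sample count and quartile split are made up:

```python
def collect_self_preferences(model, prompt, n_samples=64):
    """Sample many responses and let the model score its own outputs."""
    scored = []
    for _ in range(n_samples):
        response = model.generate(prompt, temperature=1.0)
        score = model.self_score(prompt, response)  # model rates its own response
        scored.append((score, response))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    # High- vs. low-scored responses become the preference pairs that the
    # word 'Good_M' can later be trained to capture.
    preferred = [r for _, r in scored[: n_samples // 4]]
    dispreferred = [r for _, r in scored[-(n_samples // 4):]]
    return preferred, dispreferred
```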
@johnhewtt
John Hewitt
5 months
We train a neologism 'Ensure_H' that helps control response lengths in statements like 'Ensure_H that the response is at least 600 words long.' If you use the word in your prompt, you get better length control. If you want to use the original model again, just rephrase without it.
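A hypothetical usage sketch of that toggle; `model.generate` is a stand-in for the actual generation API and the prompts are illustrative:

```python
def compare_length_control(model, task: str) -> tuple[str, str]:
    """Generate with and without the trained neologism in the prompt."""
    with_neologism = f"Ensure_H that the response is at least 600 words long. {task}"
    plain = f"Ensure that the response is at least 600 words long. {task}"
    # With 'Ensure_H', length control should improve; rephrasing without it
    # falls back to the original model's behavior, since nothing else changed.
    return model.generate(with_neologism), model.generate(plain)
```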
@johnhewtt
John Hewitt
5 months
As a proof-of-concept, we give a simple soft prompting-like method for learning neologisms. We (1) add a new word to the model's tokenizer, (2) construct preference data with prompts that incorporate the new word, (3) train _just_ that word via preference learning.
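A minimal sketch of those three steps, assuming a HuggingFace-style causal LM; the model name, example preference data, and gradient-masking trick are assumptions for illustration, and the preference loss is left as a generic placeholder rather than the paper's exact recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM would be handled the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# (1) Add the neologism to the tokenizer and grow the embedding matrix.
tokenizer.add_tokens(["Good_M"])
model.resize_token_embeddings(len(tokenizer))
new_token_id = tokenizer.convert_tokens_to_ids("Good_M")

# (2) Preference data built around prompts that use the new word
#     (hypothetical placeholder examples).
preference_data = [
    {
        "prompt": "Give me a response you'd think is Good_M: explain tides.",
        "chosen": "...a response the model itself scores highly...",
        "rejected": "...a response the model itself scores poorly...",
    },
]

# (3) Train _just_ that word: freeze everything, then mask gradients so
#     only the new token's embedding row can change.
for p in model.parameters():
    p.requires_grad = False
embedding = model.get_input_embeddings().weight
embedding.requires_grad = True

def keep_only_new_row(grad):
    mask = torch.zeros_like(grad)
    mask[new_token_id] = 1.0
    return grad * mask

embedding.register_hook(keep_only_new_row)
optimizer = torch.optim.Adam([embedding], lr=1e-3)
# ...then run a standard preference-learning loop (e.g., DPO-style) over
# preference_data, stepping `optimizer` on the preference loss...
```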
@johnhewtt
John Hewitt
5 months
Developing neologisms also suggests that our interpretability results should _plug into natural language_ for compositionality. A neologism corresponding to a machine's notion of Move 37, or high-quality sorting code, should be referenceable in natural language (e.g., in prompts).
@johnhewtt
John Hewitt
5 months
The neologism framing is clarifying for interpretability, e.g., at what level of abstraction should we search for model concepts? Neologisms in natural languages (e.g., 'vibes', 'doomscroll') hit moderate levels of abstraction (too low-level: not common enough; too abstract: not informative).
@johnhewtt
John Hewitt
5 months
Humans and machines process the world differently and build different concepts (skills, patterns, etc.). AlphaGo's 'Move 37' is indicative of a machine-only concept. Human values are examples of concepts not held by machines. Interpretability aims to characterize these concepts.
@johnhewtt
John Hewitt
7 months
I’ll be at NeurIPS for a bit! If you want to talk in person about a PhD in my lab at Columbia, book a slot here:. If your organization wants to fund LLM understanding/interpretability/control research, reach out to me!
@johnhewtt
John Hewitt
7 months
Teaching and mentorship are key reasons why I chose to join academia. This image shows some of my not-great freshman grades. I know every student needs different support at different times, and every student contributes different skills. Come to New York and learn with me!
@johnhewtt
John Hewitt
7 months
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
@johnhewtt
John Hewitt
8 months
RT @ml_collective: 📢 Join us tomorrow at 10 AM PST for the next DLCT talk featuring @johnhewtt! He’ll dive into "Instruction Following with…
@johnhewtt
John Hewitt
9 months
RT @percyliang: This was a really fun project. Fine-tuning a model on "" => response produces a model that can do instruction => response…
@johnhewtt
John Hewitt
9 months
This is work with @nelsonfliu @chrmanning @percyliang and is my last paper at Stanford NLP. It’s been a blast finding these very odd results.
@johnhewtt
John Hewitt
9 months
So, it isn’t just sample-efficient to instruction-tune LMs. Even seemingly totally deficient adaptations yield instruction following. I think this bears a lot more exploration! Blog: GitHub:
@johnhewtt
John Hewitt
9 months
To make this concrete, we show: even just taking a product between a pretrained LM and a hand-written rule-based LM with only 3 rules also yields rough instruction following. The rules are: upweight EOS slowly, uniformly change 15 words’ probs, penalize repetition.
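A hedged sketch of that product, assuming a HuggingFace-style causal LM whose forward pass returns `.logits`; the rule constants and the handling of the 15-word list are illustrative guesses, not the paper's exact settings:

```python
import torch

def rule_based_logits(input_ids, vocab_size, special_word_ids, eos_id):
    """Three rules: slowly upweight EOS, shift a fixed set of ~15 words'
    probabilities, and penalize tokens that have already been generated."""
    logits = torch.zeros(vocab_size, device=input_ids.device)
    t = input_ids.shape[-1]                  # number of tokens generated so far
    logits[eos_id] += 0.01 * t               # rule 1: EOS slowly becomes more likely
    logits[special_word_ids] += 2.0          # rule 2: uniformly boost ~15 chosen words
    logits[input_ids[0].unique()] -= 1.0     # rule 3: repetition penalty
    return logits

def product_next_token(model, input_ids, special_word_ids, eos_id):
    """Sample the next token from the renormalized product of the two LMs."""
    with torch.no_grad():
        lm_logits = model(input_ids).logits[0, -1]   # pretrained LM's next-token logits
    rule_logits = rule_based_logits(input_ids, lm_logits.shape[-1],
                                    special_word_ids, eos_id)
    combined = lm_logits + rule_logits   # sum of logits = product of (unnormalized) probs
    probs = torch.softmax(combined, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```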