Matthew Carrigan

@carrigmat

Followers
14K
Following
3K
Media
143
Statuses
2K

@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working. He/him

Dublin, Ireland
Joined April 2021
@carrigmat
Matthew Carrigan
1 year
Big announcement today @huggingface: We now have a unified API for tool use across models from @MistralAI, @AIatMeta, @cohere, @NousResearch and more! That means that you can reuse the same simplified, portable code to add tool capabilities to all of those models! 🧵
9
74
357
@carrigmat
Matthew Carrigan
2 days
As RL becomes a larger and larger part of training in the next 1-2 years, we should expect these metacognitive abilities to develop, and many of today's problems will be solved. What kinds of tasks will still be difficult for AI after that?
0
0
0
@carrigmat
Matthew Carrigan
2 days
For example, if you reward correct answers with +1, confident wrong answers with -1, and "I don't know" with 0, that -will- incentivize the development of metacognition, at least on that task. No amount of supervised learning can.
1
0
0
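The incentive structure in the tweet above can be checked with a few lines of arithmetic. This sketch is my own illustration, not code from the thread: it computes the expected reward under the +1 / -1 / 0 scheme and the behaviour that maximizes it.

```python
# Illustrative sketch of the +1 / -1 / 0 reward scheme described above:
# +1 for a correct answer, -1 for a confident wrong answer, 0 for abstaining.

def expected_reward(p_correct: float, answers: bool) -> float:
    """Expected reward for a model whose probability of being right is p_correct."""
    if not answers:
        return 0.0  # "I don't know" always scores 0
    return p_correct * 1.0 + (1.0 - p_correct) * -1.0  # = 2p - 1

def optimal_choice(p_correct: float) -> str:
    """Reward-maximizing behaviour: answer only when more likely right than
    wrong -- which requires the model to estimate its own reliability."""
    return "answer" if expected_reward(p_correct, True) > 0.0 else "abstain"
```

Under this scheme, always answering yields negative expected reward on questions the model is likely to miss, so abstaining when p_correct < 0.5 is directly rewarded: exactly the introspective skill the thread argues imitation alone cannot teach.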
@carrigmat
Matthew Carrigan
2 days
A lot of the things AI is bad at right now fall into this "metacognition" category, understanding your own knowledge state and marshaling your own thoughts. But you can learn metacognition from reinforcement learning!
1
0
0
@carrigmat
Matthew Carrigan
2 days
You want it to know what it knows and what it doesn't, but that skill can't be framed in terms of "model answers" to questions. The answer it should give depends on whether it knows or not; static training data can't supply that information.
1
0
0
@carrigmat
Matthew Carrigan
2 days
For example: AI hallucinates because "I don't know" is not a sentence you can learn by imitation. It requires introspection into your own knowledge state. You can try putting "I don't know" answers in the training data, but then it will say that even if it knows!
1
0
0
@carrigmat
Matthew Carrigan
2 days
You can get a pretty good idea of what AI will be good or bad at by asking if you can learn the thing by imitation, which is still the vast majority of training 🧵
1
0
1
@carrigmat
Matthew Carrigan
3 days
If you don't believe me, it's still there in the API. Progress got faster since the release of GPT-4, not slower! It's just easier not to notice when there aren't big discontinuous jumps.
3
0
2
@carrigmat
Matthew Carrigan
3 days
GPT-5 "AI is slowing down" takes are totally wrong. It feels smaller than the GPT-3 to GPT-4 leap only because everyone's been incrementally releasing every 3 months since then! Original GPT-4 is really stupid by modern standards, and the gap from there to today is huge.
1
0
3
@carrigmat
Matthew Carrigan
4 days
Realized after twenty minutes of talking to someone significantly older than me that they catastrophically misunderstood when I said I was writing a guide on "how to chat with a model".
1
0
23
@carrigmat
Matthew Carrigan
6 days
RT @julien_c: Please don’t download the weights all at once 🙏 or our servers will melt
0
117
0
@carrigmat
Matthew Carrigan
6 days
RT @OpenAIDevs: 🧑‍💻 Open Model Hackathon 🧑‍💻. @huggingface, @nvidia, @ollama, @vllm_project, and @OpenAIDevs challenge you to build somethi….
0
159
0
@carrigmat
Matthew Carrigan
6 days
RT @lvwerra: Harmony deep dive: OpenAI released Harmony along with the new gpt-oss models. It's a new chat template with several interestin….
0
54
0
@carrigmat
Matthew Carrigan
6 days
RT @carrigmat: Even more than the model itself, these new training techniques are going to change a lot, and everyone in the field is going….
0
1
0
@carrigmat
Matthew Carrigan
6 days
RT @sama: gpt-oss is a big deal; it is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4….
0
2K
0
@carrigmat
Matthew Carrigan
6 days
Even more than the model itself, these new training techniques are going to change a lot, and everyone in the field is going to take notice. Personal take, not the company's: Expect pivots and realignments among the big libraries as they adjust to new reality.
1
1
7
@carrigmat
Matthew Carrigan
6 days
Will that mean a convergence of training and inference code? Hard to say - maybe the learned attention sinks will make attention more quantizable too, and the benefits there may be just about big enough to keep quantization around. But why risk degradation for tiny savings?
1
0
4
@carrigmat
Matthew Carrigan
6 days
In other words: Quantization just isn't that relevant anymore. From the moment they start training these neurons are already running blazing hot, blazing fast, stripped of every unnecessary bit. The weights in inference will be identical to the training weights.
2
1
8
@carrigmat
Matthew Carrigan
6 days
MXFP4 weights just don't have much fat to trim - I expect that quantizing them even to 3-bit will devastate model performance. You could maybe squeeze attention weights from 16 to 8-bit, but those weights are very quantization-sensitive, and the overall memory saving will be <10%.
1
0
3
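The "<10%" figure above is easy to sanity-check with back-of-envelope arithmetic. The parameter split below is a hypothetical example of a gpt-oss-style MoE where expert MLP weights vastly outnumber attention weights, not official numbers:

```python
# Back-of-envelope check of the "<10%" claim: if MLP weights are already
# 4-bit MXFP4, halving only the attention weights (16 -> 8 bit) barely
# moves total memory. Parameter counts are assumed, not measured.

def total_bytes(mlp_params: float, attn_params: float, attn_bits: int) -> float:
    """Model size with MLP weights at 4-bit MXFP4 and attention at attn_bits."""
    return mlp_params * 4 / 8 + attn_params * attn_bits / 8

mlp, attn = 115e9, 5e9  # hypothetical MoE-heavy split
before = total_bytes(mlp, attn, 16)
after = total_bytes(mlp, attn, 8)
saving = 1 - after / before
print(f"{saving:.1%}")  # well under 10%
```

Because the 4-bit MLP weights dominate total memory, even a 2x compression of the attention weights only shaves a single-digit percentage off the whole model.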
@carrigmat
Matthew Carrigan
6 days
Squeezing a float32 model down to 8 or 6 bits quadrupled speed and cut the memory usage by over 75%, and at 8-bit in particular the performance was flawless. Models at train time were just carrying around a lot of unnecessary bits, but not anymore!
1
0
3
@carrigmat
Matthew Carrigan
6 days
This is unprecedented for open-weights models. What should we expect as a result? Some personal takes: Firstly, post-training quantization now becomes a lot less important. When models were trained with bfloat16 or float32, quantization for inference was essential.
1
0
3