Matthew Carrigan

@carrigmat

Followers
14K
Following
3K
Media
143
Statuses
2K

@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working. He/him

Dublin, Ireland
Joined April 2021
@carrigmat
Matthew Carrigan
1 year
Big announcement today @huggingface: We now have a unified API for tool use across models from @MistralAI, @AIatMeta, @cohere, @NousResearch and more! That means that you can reuse the same simplified, portable code to add tool capabilities to all of those models! 🧵
9
74
357
@carrigmat
Matthew Carrigan
2 days
As RL becomes a larger and larger part of training in the next 1-2 years, we should expect these metacognitive abilities to develop, and many of today's problems will be solved. What kinds of tasks will still be difficult for AI after that?
0
0
0
@carrigmat
Matthew Carrigan
2 days
For example, if you reward correct answers with +1, confident wrong answers with -1, and "I don't know" with 0, that -will- incentivize the development of metacognition, at least on that task. No amount of supervised learning can.
1
0
0
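The incentive structure in the tweet above can be checked with a few lines of arithmetic. This sketch is my own illustration, not code from the thread: it computes the expected reward under the +1 / -1 / 0 scheme and the behaviour that maximizes it.

```python
# Illustrative sketch of the +1 / -1 / 0 reward scheme described above:
# +1 for a correct answer, -1 for a confident wrong answer, 0 for abstaining.

def expected_reward(p_correct: float, answers: bool) -> float:
    """Expected reward for a model whose probability of being right is p_correct."""
    if not answers:
        return 0.0  # "I don't know" always scores 0
    return p_correct * 1.0 + (1.0 - p_correct) * -1.0  # = 2p - 1

def optimal_choice(p_correct: float) -> str:
    """Reward-maximizing behaviour: answer only when more likely right than
    wrong -- which requires the model to estimate its own reliability."""
    return "answer" if expected_reward(p_correct, True) > 0.0 else "abstain"
```

Under this scheme, always answering yields negative expected reward on questions the model is likely to miss, so abstaining when p_correct < 0.5 is directly rewarded: exactly the introspective skill the thread argues imitation alone cannot teach.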
@carrigmat
Matthew Carrigan
2 days
A lot of the things AI is bad at right now fall into this "metacognition" category, understanding your own knowledge state and marshaling your own thoughts. But you can learn metacognition from reinforcement learning!
1
0
0
@carrigmat
Matthew Carrigan
2 days
You want it to know what it knows and what it doesn't, but that skill can't be framed in terms of "model answers" to questions. The answer it should give depends on whether it knows or not; static training data can't supply that information.
1
0
0
@carrigmat
Matthew Carrigan
2 days
For example: AI hallucinates because "I don't know" is not a sentence you can learn by imitation. It requires introspection into your own knowledge state. You can try putting "I don't know" answers in the training data, but then it will say that even if it knows!
1
0
0
@carrigmat
Matthew Carrigan
2 days
You can get a pretty good idea of what AI will be good or bad at by asking if you can learn the thing by imitation, which is still the vast majority of training 🧵
1
0
1
@carrigmat
Matthew Carrigan
3 days
If you don't believe me, it's still there in the API. Progress got faster since the release of GPT-4, not slower! It's just easier not to notice when there aren't big discontinuous jumps.
3
0
2
@carrigmat
Matthew Carrigan
3 days
GPT-5 "AI is slowing down" takes are totally wrong. It feels smaller than the GPT-3 to GPT-4 leap only because everyone's been incrementally releasing every 3 months since then! Original GPT-4 is really stupid by modern standards, and the gap from there to today is huge.
1
0
3
@carrigmat
Matthew Carrigan
4 days
Realized after twenty minutes of talking to someone significantly older than me that they catastrophically misunderstood when I said I was writing a guide on "how to chat with a model".
1
0
23
@carrigmat
Matthew Carrigan
6 days
RT @julien_c: Please don’t download the weights all at once 🙏 or our servers will melt
0
117
0
@carrigmat
Matthew Carrigan
6 days
RT @OpenAIDevs: 🧑‍💻 Open Model Hackathon 🧑‍💻. @huggingface, @nvidia, @ollama, @vllm_project, and @OpenAIDevs challenge you to build somethi….
0
159
0
@carrigmat
Matthew Carrigan
6 days
RT @lvwerra: Harmony deep dive: OpenAI released Harmony along with the new gpt-oss models. It's a new chat template with several interestin….
0
54
0
@carrigmat
Matthew Carrigan
6 days
RT @carrigmat: Even more than the model itself, these new training techniques are going to change a lot, and everyone in the field is going….
0
1
0
@carrigmat
Matthew Carrigan
6 days
RT @sama: gpt-oss is a big deal; it is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4….
0
2K
0
@carrigmat
Matthew Carrigan
6 days
Even more than the model itself, these new training techniques are going to change a lot, and everyone in the field is going to take notice. Personal take, not the company's: Expect pivots and realignments among the big libraries as they adjust to new reality.
1
1
7
@carrigmat
Matthew Carrigan
6 days
Will that mean a convergence of training and inference code? Hard to say - maybe the learned attention sinks will make attention more quantizable too, and the benefits there may be just about big enough to keep quantization around. But why risk degradation for tiny savings?
1
0
4
@carrigmat
Matthew Carrigan
6 days
In other words: Quantization just isn't that relevant anymore. From the moment they start training these neurons are already running blazing hot, blazing fast, stripped of every unnecessary bit. The weights in inference will be identical to the training weights.
2
1
8
@carrigmat
Matthew Carrigan
6 days
MXFP4 weights just don't have much fat to trim - I expect that quantizing them even to 3-bit will devastate model performance. You could maybe squeeze attention weights from 16 to 8-bit, but those weights are very quantization-sensitive, and the overall memory saving will be <10%.
1
0
3
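The "<10%" figure above is easy to sanity-check with back-of-envelope arithmetic. The parameter split below is a hypothetical example of a gpt-oss-style MoE where expert MLP weights vastly outnumber attention weights, not official numbers:

```python
# Back-of-envelope check of the "<10%" claim: if MLP weights are already
# 4-bit MXFP4, halving only the attention weights (16 -> 8 bit) barely
# moves total memory. Parameter counts are assumed, not measured.

def total_bytes(mlp_params: float, attn_params: float, attn_bits: int) -> float:
    """Model size with MLP weights at 4-bit MXFP4 and attention at attn_bits."""
    return mlp_params * 4 / 8 + attn_params * attn_bits / 8

mlp, attn = 115e9, 5e9  # hypothetical MoE-heavy split
before = total_bytes(mlp, attn, 16)
after = total_bytes(mlp, attn, 8)
saving = 1 - after / before
print(f"{saving:.1%}")  # well under 10%
```

Because the 4-bit MLP weights dominate total memory, even a 2x compression of the attention weights only shaves a single-digit percentage off the whole model.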
@carrigmat
Matthew Carrigan
6 days
Squeezing a float32 model down to 8 or 6 bits quadrupled speed and cut the memory usage by over 75%, and at 8-bit in particular the performance was flawless. Models at train time were just carrying around a lot of unnecessary bits, but not anymore!
1
0
3
@carrigmat
Matthew Carrigan
6 days
This is unprecedented for open-weights models. What should we expect as a result? Some personal takes: Firstly, post-training quantization now becomes a lot less important. When models were trained with bfloat16 or float32, quantization for inference was essential.
1
0
3