Pierre Richemond 🇪🇺
@TheOneKloud
3K Followers · 33K Following · 51 Media · 13K Statuses
Multimodal lead @cohere. @ImperialCollege PhD, Paris VI - @Polytechnique, ENST, @HECParis alum. Prev @GoogleDeepMind scientist, @GoldmanSachs trader. Views mine
Joined July 2010
Excited to reveal what I've been working on for the last few months. Command-A-Vision is our new flagship 112B VLM that outperforms Llama 4 Maverick, Mistral Medium/Pixtral Large, GPT-4.1, and others. We release the weights on HF (https://t.co/7KZUGv2AT3) and hope you'll like it.
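A minimal usage sketch for trying released weights like these via Hugging Face transformers. The repo id, model classes, and image URL below are illustrative assumptions, not taken from the tweet.

```python
# Hedged sketch: the repo id and auto-classes are assumed, not confirmed.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereLabs/command-a-vision-07-2025"  # assumed HF repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image
    {"type": "text", "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```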
Excellent piece: “The real divide in our economy isn’t between rich and poor, but between those working to get us out of economic malaise and the legacy incumbents happy to keep coasting on the rents of stagnation”
New @BritishProgress piece w/ @pdmsero on the exit tax. tl;dr:
- an exit tax pushes founders abroad, killing a new economic engine just as it matures
- briefing it is like causing a bank run, and kills any upside
- tax rent-seekers, not those rebuilding British dynamism
https://t.co/SQgK7tjmBD
The Path Not Taken: RLVR Provably Learns Off the Principals
"we show that RL operates in a distinct optimization regime from SFT, so directly adapting SFT-era parameter-efficient fine-tuning (PEFT) methods can be flawed, as evidenced by our case studies on advanced sparse …"
UK Innovation Is Bleeding To Death: The House of Lords Science & Tech Committee just dropped a brutal 2025 report: "We're world-class at inventing… and world-class at losing it."
- DeepMind? London-born. Google's now.
- Arm? Cambridge chip genius. SoftBank-owned.
- …
In many industry frontier labs, there’s a perceived tension between breadth and depth. It’s often missed that breadth *enables* meaningful depth. It may seem like you are advancing a frontier, but you are in fact in a myopic echo chamber. ML theory suffers badly from this effect.
Elon Musk on discrete diffusion. Timeline.
@StefanoErmon @_inception_ai Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and …
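A toy back-of-the-envelope of the latency tradeoff raised above: autoregressive decoding streams its first token almost immediately, while a text diffusion model denoises the whole sequence before any of it is readable. All rates below are illustrative assumptions, not measurements.

```python
# Hedged toy comparison: every number here is an assumption.
tokens = 500
ar_tok_per_s = 50             # assumed autoregressive decode rate
diff_steps, step_s = 20, 0.4  # assumed denoising steps over the whole text

ar_first = 1 / ar_tok_per_s          # AR shows the first token right away
ar_total = tokens / ar_tok_per_s     # ...but finishes token by token
diff_total = diff_steps * step_s     # diffusion emits everything at the end

print(f"AR: first token at {ar_first:.2f}s, done at {ar_total:.1f}s")
print(f"Diffusion: first (and all) text at {diff_total:.1f}s")
```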
How is memorized data stored in a model? We disentangle MLP weights in LMs and ViTs into rank-1 components based on their curvature in the loss, and find representational signatures of both generalizing structure and memorized training data.
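A hedged sketch of the general idea, not the paper's method: split a weight matrix into rank-1 components via SVD and score each component by how much the loss moves when it is ablated, a crude stand-in for the curvature-based decomposition described above.

```python
# Toy model and data; the ablation score is an illustrative proxy for curvature.
import torch

torch.manual_seed(0)
W = torch.randn(64, 64)   # stand-in MLP weight matrix
x = torch.randn(128, 64)  # toy inputs
y = torch.randn(128, 64)  # toy regression targets

def loss(weight):
    return ((x @ weight.T - y) ** 2).mean()

U, S, Vh = torch.linalg.svd(W)  # W = sum_i S[i] * u_i v_i^T
base = loss(W)

scores = []
for i in range(len(S)):
    ablated = W - S[i] * torch.outer(U[:, i], Vh[i])  # remove one rank-1 part
    scores.append((loss(ablated) - base).item())      # loss sensitivity proxy

top = sorted(range(len(S)), key=lambda i: -scores[i])[:5]
print("most loss-sensitive rank-1 components:", top)
```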
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built …
Discover DeepOCR: a fully open-source reproduction of DeepSeek-OCR, complete with training & evaluation code! #DeepLearning #OCR
We reproduce DeepSeek-OCR training from scratch; the code, model, and results can be found on our website. #DeepSeek
https://t.co/lZyAnkIDaL
Leaving Meta and PyTorch
I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: I didn't want to be doing PyTorch forever, and it seemed like the perfect time to transition, right after I got back from a long leave and the project built itself around me. Eleven years …
A nice write-up by Terry Tao and coauthors on 67 problems they attempted with AlphaEvolve and other tools. They cited my exasperated thread below, where I spent a weekend trying to numerically solve one of the autocorrelation problems in their report. That was a bad weekend.
Google released AlphaEvolve. I'm trying to get a sense of whether the problems are hard to solve numerically. Let's focus on problem B.1. I'm going to do this live.
Online generation of rubrics: the critic generates a rubric given the generator's outputs, and a generation is validated against that rubric. If the response satisfies the rubric, the generator gets a reward; if not, the critic gets a reward.
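A minimal sketch of one step of the scheme as described in the tweet; the three LLM/verifier callables are hypothetical placeholders, not a real API.

```python
# Hedged sketch: generate / write_rubric / satisfies are placeholder callables.
def rubric_reward_step(prompt: str,
                       generate,      # generator LLM: prompt -> response
                       write_rubric,  # critic LLM: (prompt, response) -> rubric
                       satisfies) -> dict:  # verifier: (response, rubric) -> bool
    response = generate(prompt)
    rubric = write_rubric(prompt, response)  # rubric produced online, per sample
    ok = satisfies(response, rubric)
    # Zero-sum credit: the generator is rewarded for passing the rubric,
    # the critic is rewarded for writing a rubric the generator fails.
    return {"generator_reward": 1.0 if ok else 0.0,
            "critic_reward": 0.0 if ok else 1.0}
```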
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Trains SAIL-VL2 into SAIL-VL2-Thinking with a dual-reward RL stack (a toy sketch follows the quoted tweet below):
- Thinking reward checks reasoning quality: factual grounding, logical coherence, answer consistency
- Judging reward decides whether to …
Bytedance Douyin Content Team
SAIL-VL 1.5 (2B, 8B) and SAIL-VL (2B, 8B): small VLMs tuned for mobile deployment, positioned as drop-in replacements for InternVL2 with dynamic tiling/cropping. They target accessible, affordable on-device VLM use, trained on SAIL-Caption (~300M …
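A hedged sketch of how a dual-reward signal like the one described two tweets up could be combined; the scorer functions, weights, and the think/no-think bookkeeping are illustrative assumptions, not the SAIL-RL implementation.

```python
# Hedged sketch: scorers, weights, and the sample format are all assumed.
def dual_reward(sample: dict,
                grounding, coherence, consistency,  # scorers: sample -> [0, 1]
                should_think: bool,
                w_think: float = 0.5, w_judge: float = 0.5) -> float:
    # Thinking reward: average of the three reasoning-quality checks.
    thinking_r = (grounding(sample) + coherence(sample) + consistency(sample)) / 3.0
    # Judging reward: 1 if the model's think/no-think choice matched the need.
    judging_r = 1.0 if sample["used_thinking"] == should_think else 0.0
    return w_think * thinking_r + w_judge * judging_r
```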
It would end the UK's tech industry. No one would be able to WFH, so everyone would just leave. Even I, the person who said he'd stay and fix the place, would have to think long and hard if the UK became the only place in the world where tech employees can't WFH (when I'm not travelling).
The Government is launching a consultation on banning VPNs. We’d bet they don’t understand what a VPN is or how a ban would harm them and national industry. https://t.co/t7ocS8g6dW
Continuing our IMO-gold journey, I'm delighted to share our #EMNLP2025 paper "Towards Robust Mathematical Reasoning", which tells some of the key stories behind the success of our advanced Gemini #DeepThink at this year's IMO. Finding the right north-star metrics was highly …
Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It's been a wild run to lead this …
The DRAM market appears to be pricing a full return to fp64.
Hyperscaler standalone AI ROI is hard to measure, as AI investment benefits all parts of the business. Consensus estimates (likely with the full capex burden but no explicit new AI revenue streams) still indicate mid-teens post-tax incremental ROI, well above WACC. If these weren't already the best …
The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier
New blog post! NeurIPS 2025 papers are out, and it's a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations.
Just wow… Morgan Stanley predicts that hyperscalers will spend $700 billion on capex in 2027. This is the biggest investment supercycle in history. AWS alone will add 10 GW of data center capacity in the next 24 months. Hard to be bearish… $AMZN $GOOG $MSFT $META $ORCL