
Lucas Beyer (bl16)
@giffmana
Followers: 111K · Following: 53K · Media: 2K · Statuses: 23K
Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://t.co/xe2XUqkKit ✗DMs → email
Zürich, Switzerland
Joined December 2013
My Transformer tutorial slides are now available at https://t.co/aYfnVKPDjT. I'll append recordings to this thread as I get them. If you want to use some of the slides for your lecture, you may, as long as you credit me. If you'd like me to give the lecture: maybe; e-mail me.
Giving a lecture introducing the Transformer architecture in all gory details at @M2lSchool tomorrow. Also got permission to publish slides and will share recording if/when I get one. It's a pretty cool set of slides, largely thanks to @_basilM for inspiration!
I just did some quick tests, and it seems that when I link Codex to my ChatGPT account, it does not use the custom instructions that I have set in my ChatGPT account. I personally did expect it to, but I can see how one might also expect it not to. What would you expect?
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL? We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild: Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns. The scaling law?
Not saying this is happening (idk!), but one very simple way for Google to continuously increase this number over a reasonable span of time would be to increase the fraction of search queries that get an AI Overview, as well as the length of the overviews, even if they're not fully shown.
Sundar Pichai on the earnings call: - 'The growth in usage has been incredible. At I/O in May we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number. Now processing over 980 trillion tokens. A remarkable increase'
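A rough back-of-the-envelope sketch of the lever described above; every number here is hypothetical, purely to show how the product of coverage and length drives the headline figure.

```python
# Back-of-the-envelope sketch of the "more overviews, longer overviews" lever.
# All numbers are hypothetical, not Google's.

monthly_queries = 300e9        # hypothetical monthly search queries
overview_fraction = 0.30       # hypothetical share of queries that get an AI Overview
tokens_per_overview = 700      # hypothetical tokens generated per overview

monthly_tokens = monthly_queries * overview_fraction * tokens_per_overview
print(f"{monthly_tokens:.2e} tokens/month")      # 6.30e+13 with these numbers

# Doubling either the coverage or the overview length doubles the headline
# number without any change in underlying search demand.
print(f"{2 * monthly_tokens:.2e} tokens/month")
```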
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
I need people to know that back in OpenAI I was an early adopter of Codex, maybe in the first 1% of tech staff. Back when the TUI was so bad it couldn't even scroll. I just need you to know, that's all.
Quite the contrary: We're using the language that was designed as a glue language for gluing pieces together that are written in the language(s) that were designed for peak performance. Everything working exactly as designed.
It's an ironic twist of fate that the most performance-intensive workloads on the planet, running on eye-wateringly expensive hardware, are driven by one of the slowest programming languages, with a precarious parallelism story.
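To make the glue-language point concrete, here's a minimal sketch (my example, not from the thread): the Python on top only orchestrates, while the FLOPs run in compiled kernels.

```python
# Minimal sketch of Python as glue: one line of Python dispatches the actual
# work to an optimized, multi-threaded native BLAS matmul.
import numpy as np

a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)

# The interpreter overhead is negligible here; essentially all of the time
# is spent inside compiled code, which is the "working as designed" part.
c = a @ b
print(c.shape)
```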
The tweet below is a glimpse into what coding will look like in the near future, at least directionally. I predict a big uptick in people learning about and starting to use git worktrees over the next few years. I'd buy git worktree stock if there were such a thing :)
@simonw Here's how I'm currently using multiple agents. I switched from Windows to Omarchy partially because I was running multiple agents in parallel and wanted to have a dedicated workspace for each one, and virtual desktops in Windows are clunky. I use Codex CLI but with a lot of
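For anyone unfamiliar with the workflow being described: a minimal sketch of the one-worktree-per-agent setup, with a hypothetical repo path and agent names.

```python
# Sketch: give each coding agent its own git worktree, i.e. an isolated
# checkout and branch that share one underlying object store.
# The repo path and agent names below are hypothetical.
import subprocess

REPO = "./myrepo"
agents = ["agent-a", "agent-b", "agent-c"]

for name in agents:
    branch = f"wip/{name}"
    path = f"../worktrees/{name}"
    # `git worktree add -b <branch> <path>` creates the branch and checks it
    # out into its own directory next to the main checkout.
    subprocess.run(
        ["git", "-C", REPO, "worktree", "add", "-b", branch, path],
        check=True,
    )
```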
They...just...removed the link!
imo, pretraining is distilling evolution.
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea
Introducing LLM.Q: Quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM on consumer GPUs and single workstations, with natively quantized matmuls. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
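I don't know LLM.Q's internals, but the basic idea behind a weight-quantized matmul can be sketched like this (NumPy standing in for the CUDA kernel; this is not LLM.Q code):

```python
# Toy sketch of weight-only int8 quantization: weights become int8 plus a
# per-output-row scale, activations stay float, results are rescaled after
# the matmul. NumPy stands in for what a real CUDA kernel would do.
import numpy as np

def quantize_rows(w):
    # symmetric per-row quantization to int8
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def qmatmul(x, w):
    q, scale = quantize_rows(w)       # w: (out, in), x: (batch, in)
    acc = x @ q.T.astype(np.float32)  # matmul against the quantized weights
    return acc * scale.T              # undo the per-row scale

x = np.random.randn(2, 64).astype(np.float32)
w = np.random.randn(32, 64).astype(np.float32)
print(np.abs(qmatmul(x, w) - x @ w.T).max())  # small quantization error
```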
Wow this is a disappointingly bad take/comic. To all the students, PhD or earlier: If you spend a week trying out things that don't work, you didn't do nothing! If you ran your experiments properly, you should have confidence in the result, and at least some intuition as to why
It just dawned on me that this is TikTok for intellectuals-ish. We also have sudden trends where everybody does their own version of the trend. Just in the last few days:
- Thinky blogpost is not news
- Bitter lesson about Rich
- Frontier secret vs GRPO
It's fun though, ngl