Lucas Beyer (bl16)

@giffmana

Followers: 111K · Following: 53K · Media: 2K · Statuses: 23K

Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://t.co/xe2XUqkKit ✗DMs → email

Zürich, Switzerland
Joined December 2013
@giffmana
Lucas Beyer (bl16)
3 years
My Transformer tutorial slides are now available at https://t.co/aYfnVKPDjT I'll append recordings to this thread as I get them. If you want to use some of the slides for your lecture, you may, as long as you credit me. If you'd like me to give the lecture: maybe; e-mail me.
@giffmana
Lucas Beyer (bl16)
3 years
Giving a lecture introducing the Transformer architecture in all gory details at @M2lSchool tomorrow. Also got permission to publish slides and will share recording if/when I get one. It's a pretty cool set of slides, largely thanks to @_basilM for inspiration!
68
565
3K
@giffmana
Lucas Beyer (bl16)
4 hours
nice
@sainingxie
Saining Xie
8 hours
diffusion transformers have come a long way, but most still lean on the old 2021 sd-vae for their latent space. that causes a few big issues: 1. outdated backbones make the architecture more complex than it needs to be. the sd-vae runs at around 450 gflops, while a simple ViT-B
3
3
81
@giffmana
Lucas Beyer (bl16)
14 hours
@osanseviero
Omar Sanseviero
15 hours
Google is the org at Hugging Face with the most downloads 🤗
19
4
239
@giffmana
Lucas Beyer (bl16)
4 days
I just did some quick tests, and it seems that when I link codex to my chatgpt account, it does not use the custom instructions that I have set in my chatgpt account. I personally did expect it to, but I can see how one might also expect it not to. What would you expect?
8
0
17
@josancamon19
Joan Cabezas
4 days
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL? We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild: Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns. The scaling law?
38
115
752
@giffmana
Lucas Beyer (bl16)
4 days
Not saying this is happening (idk!), but one very simple way for Google to continuously increase this number over a reasonable span of time would be to increase the fraction of search queries that have ai overview, as well as the length of the overview even if not fully shown.
@AndrewCurran_
Andrew Curran
3 months
Sundar Pichai on the earnings call: - 'The growth in usage has been incredible. At I/O in May we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number. Now processing over 980 trillion tokens. A remarkable increase'
14
0
195
@giffmana
Lucas Beyer (bl16)
5 days
@beenwrekt
Ben Recht
5 days
Almost a decade ago, I coauthored a paper asking us to rethink our theory of generalization in machine learning. Today, I’m fine putting the theory back on the shelf.
1
8
153
@giffmana
Lucas Beyer (bl16)
5 days
A fun guy and a no funghi meet in a bar...
4
2
191
@nathanbenaich
Nathan Benaich
5 days
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
50
283
882
@giffmana
Lucas Beyer (bl16)
5 days
I need people to know that back in OpenAI I was an early adopter of codex, maybe in the first 1% of techstaff. Back when the TUI was so bad it couldn't even scroll. I just need you to know, that's all.
@scaling01
Lisan al Gaib
6 days
100% of PRs at OpenAI are reviewed by Codex
31
11
579
@giffmana
Lucas Beyer (bl16)
5 days
My experience with Claude:
@karpathy
Andrej Karpathy
5 days
POV: Your LLM agent is dividing a by b
30
21
885
@markerdmann
mark erdmann
2 months
@giffmana @an_vo12 @taesiri @anh_ng8 tested on gpt-5 pro out of curiosity. it failed with "how many legs does this animal have" but succeeded with "how many legs does this animal have? not like normally, but in this picture specifically."
2
3
22
@giffmana
Lucas Beyer (bl16)
7 days
Quite the contrary: We're using the language that was designed as a glue language for gluing pieces together that are written in the language(s) that were designed for peak performance. Everything working exactly as designed.
@MillionInt
Jerry Tworek
8 days
it's an ironic twist of fate that the most performance-intensive workloads on the planet, running on eye-wateringly expensive hardware, are run via one of the slowest programming languages with a precarious parallelism story
26
34
750
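The exchange above is the familiar "Python as glue over fast native kernels" point. A minimal sketch of that idea, assuming only that NumPy is installed: the Python layer does one dispatch call, and essentially all the runtime is spent inside compiled BLAS code.

```python
# Sketch of the "glue language" point above: Python only orchestrates,
# while the number crunching is dispatched to compiled BLAS kernels via NumPy.
import time
import numpy as np

a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)

t0 = time.perf_counter()
c = a @ b  # one Python-level call; the work happens in optimized native code
t1 = time.perf_counter()

print(f"matmul of {a.shape} x {b.shape} took {t1 - t0:.3f}s")
# The interpreter overhead here is microseconds of dispatch; the kernel
# dominates, which is exactly the division of labor the tweet describes.
```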
@giffmana
Lucas Beyer (bl16)
8 days
The tweet below is a glimpse into what coding will look like in the near future, at least directionally. I predict a big uptick in people learning about and starting to use git worktrees over the next few years. I'd buy git worktree stock if there was such a thing :)
@thomasrice_au
Thomas Rice
9 days
@simonw Here's how I'm currently using multiple agents. I switched from Windows to Omarchy partially because I was running multiple agents in parallel and wanted to have a dedicated workspace for each one, and virtual desktops in Windows are clunky. I use Codex CLI but with a lot of
12
7
204
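A minimal sketch of the workflow the two tweets above describe: one git worktree per agent, each on its own branch, so several coding agents can work on the same repository in parallel without sharing a checkout. The repo path and branch names are made up for illustration; the git subcommands (`worktree add -b`, `worktree list`, `worktree remove`) are standard.

```python
# One git worktree per agent: separate working directories that share the
# same underlying .git object store. Repo path and agent names are hypothetical.
import subprocess
from pathlib import Path

REPO = Path("~/code/myproject").expanduser()  # hypothetical existing git repo
AGENTS = ["agent-docs", "agent-tests", "agent-refactor"]

for name in AGENTS:
    worktree_dir = REPO.parent / f"myproject-{name}"
    # `git worktree add -b <branch> <path>` creates a new branch and checks it
    # out into its own working directory.
    subprocess.run(
        ["git", "-C", str(REPO), "worktree", "add", "-b", name, str(worktree_dir)],
        check=True,
    )
    print(f"{name}: working directory at {worktree_dir}")

# Inspect all worktrees; `git worktree remove <path>` cleans one up once the
# agent's branch has been merged or abandoned.
subprocess.run(["git", "-C", str(REPO), "worktree", "list"], check=True)
```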
@giffmana
Lucas Beyer (bl16)
9 days
They...just...removed the link!
@giffmana
Lucas Beyer (bl16)
13 days
@thinkymachines This most interesting link is broken 404:
5
2
120
@giffmana
Lucas Beyer (bl16)
13 days
imo, pretraining is distilling evolution.
@karpathy
Andrej Karpathy
13 days
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea
23
13
481
@DAlistarh
Dan Alistarh
15 days
Introducing LLM.Q: Quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM on consumer GPUs with natively quantized matmuls, on single workstations. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
3
16
138
@giffmana
Lucas Beyer (bl16)
14 days
Wow this is a disappointingly bad take/comic. To all the students, PhD or earlier: If you spend a week trying out things that don't work, you didn't do nothing! If you ran your experiments properly, you should have confidence in the result, and at least some intuition as to why
@PHDcomics
PHD Comics
15 days
Anything to report?
37
48
785
@adnancagri
Cagri Demir
15 days
1
2
17
@giffmana
Lucas Beyer (bl16)
15 days
It just dawned on me that this is tiktok for intellectuals-ish. We also have sudden trends where everybody does their version of the trends. Just in the last few days:
- Thinky blogpost is not news
- bitter lesson about Rich
- Frontier secret vs GRPO
It's fun though, ngl
42
16
737
@giffmana
Lucas Beyer (bl16)
16 days
Guys, I have a theory... see Fig1 below.
31
10
596