My Transformer tutorial slides are now available at
I'll append recordings to this thread as I get them.
If you want to use some of the slides for your lecture, you may, as long as you credit me.
If you'd like me to give the lecture: maybe; e-mail me.
Giving a lecture introducing the Transformer architecture in all gory details at
@M2lSchool
tomorrow. Also got permission to publish slides and will share recording if/when I get one.
It's a pretty cool set of slides, largely thanks to
@_basilM
for inspiration!
How good of a BERT can one get in ONE DAY on ONE GPU?
With all the recent studies about scaling compute up, this paper takes a refreshing turn and does a deep dive into scaling down compute.
It's well written and chock-full of insights. Here's my summary, with my opinions.
🧶 1/N
What makes CLIP work?
The contrast with negatives via softmax?
The more negatives, the better -> large batch-size?
We'll answer "no" to both in our ICCV oral🤓
By introducing SigLIP, a simpler CLIP that also works better and is more scalable, we can study the extremes.
Hop in🧶
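In case the softmax-vs-sigmoid distinction sounds abstract, here's a minimal numpy sketch of the pairwise sigmoid loss idea (variable names and the temperature/bias values are illustrative, not the paper's actual implementation):

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss: each (image, text) pair is an independent
    binary classification problem, so no softmax over the batch is needed."""
    logits = img_emb @ txt_emb.T * t + b       # (n, n) similarity logits
    labels = 2 * np.eye(len(img_emb)) - 1      # +1 on the diagonal (matches), -1 elsewhere
    # mean of log(1 + exp(-label * logit)), computed stably via logaddexp
    return np.mean(np.logaddexp(0.0, -labels * logits))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 8)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(siglip_loss(img, txt))
```

Because every pair is scored independently, the loss doesn't change meaning with batch size the way a softmax over in-batch negatives does.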
Ah yes, the well known "except you, FAANG" clause that's so common in *open source* licenses like GPL, MIT, BSD, Apache2, ...
Here I go again, this can't be for real lol
This is huge: Llama-v2 is open source, with a license that authorizes commercial use!
This is going to change the landscape of the LLM market.
Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face and other providers
Pretrained and fine-tuned…
lol, draft of full GPT-4 paper with architecture and data details is already leaked on torrent😂
The vision component in the architecture is an interesting twist to plain ViT, and scaled up quite a bit!
Link to the torrent for the curious:
This quote from
@demishassabis
is my favourite take on the "engineering vs science" debates in AI yet:
AI is an engineering science: unlike in natural sciences, the phenomenon you're studying doesn't exist in nature, so you have to build it first, and then you can study it.
1/N The return of patch-based self-supervision! It never worked well and you had to bend over backwards with ResNets (I tried).
Now with ViT, very simple patch-based self-supervised pre-training rocks! First BeIT, now Masked AutoEncoders i1k=87.8%
🧶
You're welcome, OpenAI. I'll share my home address in DM if you want to send us flowers and chocolate.
Actually, fun fact: one of the runners-up for ViT's name was "ToP", meaning "Transformer on Patches". However, we ditched it because "the ToP model" was kinda borderline.
Oh my, code editors could be so much more beautiful! Below are two different ways to display the exact same code, taking up the same space: standard way first, and a beautiful mock-up second.
I love the idea and style:
Ilya Sutskever unambiguously confirming what we all knew but just wanted to hear admitted:
OpenAI's current closing up is for competitive reasons, not because of safety concerns
This is exactly what I hate with all big frameworks. TF is terrible. PyTorch used to be straightforward but turned terrible too. Torch7 was very direct. JAX/Flax still ok, but I pray every day that it doesn’t end up with the same fate over time.
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of CPython? No? Well now you can! With llm.c:
To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly…
Holy cow, I consider myself an advanced matplotlib user, and I've never seen this before. So good. I should reconsider, and consider myself a noob again :)
`plt.subplot_mosaic(...)` is the single-most amazing
@matplotlib
function I'd never heard of 😍🤓🌍 Can't believe I've used Python for more than a decade and only just discovered it! Subplots will never be the same again 🌟
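For anyone who wants to try it, a minimal sketch (the layout string and figure contents are just illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# One ASCII-art string defines the whole layout: "B" spans both rows on the right.
fig, axs = plt.subplot_mosaic("AAB;CCB", figsize=(6, 4))

x = np.linspace(0, 2 * np.pi, 100)
axs["A"].plot(x, np.sin(x)); axs["A"].set_title("sin")
axs["C"].plot(x, np.cos(x)); axs["C"].set_title("cos")
axs["B"].imshow(np.outer(np.sin(x), np.cos(x)))
fig.tight_layout()
```

The returned `axs` is a dict keyed by the characters in the layout string, which makes the plotting code self-documenting.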
Hi everyone yes, I left OpenAI yesterday. First of all nothing "happened" and it’s not a result of any particular event, issue or drama (but please keep the conspiracy theories coming as they are highly entertaining :)). Actually, being at OpenAI over the last ~year has been…
Who killed non-contrastive image-text pretraining?
@AlecRad
and
@_jongwook_kim
with the below Fig2 in CLIP.
Who collected the 7 Dragonballs and asked Shenron to resurrect it? Yours truly, in this new paper of ours.
Generative captioning is not only competitive, it seems better!
Alright folks, after a full workday of discussion with lots of nuance and zero work done, we eventually solved everything. Alexey and
@MarioLucic_
suggested this solution, which I'm stealing to collect the likes.
So you think you know distillation; it's easy, right?
We thought so too with
@XiaohuaZhai
@__kolesnikov__
@_arohan_
and the amazing
@royaleerieme
and Larisa Markeeva.
Until we didn't. But now we do again. Hop on for a ride (+the best ever ResNet50?)
🧵👇
Here's what our (sub)team in Zürich has done for OSS vision over the past 5y, besides inventing ViT:
1) Make i21k a thing
Release:
2) best CLIP (siglip) by a large margin
3) best i1k ResNet50 ever
4) best pre-trained ResNets
5) >55k ViTs
6) Most efficient JAX/TPU CV code
deets👇
It can't be repeated enough: learning-rate is the single most bang-for-buck thing you can tune.
If you think you know *ze best* learning-rate, it just means you only train standard stuff!
This is not a "secret trick" either; it's stated very clearly in THE deep-learning book:
On the importance of tuning at least the learning rate for your experiments!!!
Here I just multiplied it by 10x and see the difference!
For most of my models, the same value is close to optimal, so initially I got lazy; but for this captioning model, a much larger value was needed.
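A toy illustration of the effect (gradient descent on a quadratic, obviously nothing like the actual captioning setup):

```python
def train(lr, steps=100):
    """Gradient descent on f(w) = 0.5 * w^2; the minimizer is w = 0."""
    w = 1.0
    for _ in range(steps):
        w -= lr * w  # gradient of 0.5 * w^2 is w
    return abs(w)

for lr in (0.001, 0.01, 0.1):
    print(f"lr={lr}: final |w| = {train(lr):.5f}")
```

Same model, same step budget; a 10x larger learning rate gets orders of magnitude closer to the optimum.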
Looks like the gzip paper I was enthusiastic about over-estimated its scores because of a bug in the code: it effectively measured top-2 accuracy instead of proper kNN with k=2.
We should remember this as (yet another) strong case for testing in ML code.
I still like that it put a new idea in my toolbox.
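For the record, here's my understanding of the difference, as a hypothetical sketch (not the paper's actual code):

```python
from collections import Counter

def knn_k2_label(neighbor_labels):
    """Proper k=2 kNN: commit to one prediction via a vote over the 2 nearest labels."""
    return Counter(neighbor_labels[:2]).most_common(1)[0][0]

def top2_correct(neighbor_labels, true_label):
    """The buggy variant: count it correct if *either* of the 2 nearest matches."""
    return true_label in neighbor_labels[:2]

# When the two nearest neighbors disagree, true kNN must pick one label,
# while the top-2 variant gets credit for both, inflating accuracy.
neighbors = ["cat", "dog"]
print(knn_k2_label(neighbors))
print(top2_correct(neighbors, "dog"))
```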
Want to turn any vision backbone into an image-text model? Want to show the age-old "your model wouldn't recognize a cow on the beach" is a red herring?
That's LiT🔥 (Locked-image Tuning), a new alternative to fine-tuning that combines the best of fine-tuning and zero-shot
1/n🧶
It's about time: analog clock reading in the wild
A great example of an applied vision paper, let me walk you through why I like it. 🧶
They also make good use of Spatial Transformer Networks (STN), one of the most elegant ideas that usually don't work :)
Especially for computer vision folks: beware the LayerNorm pitfall!
How LN is used in CNNs is actually different from how it's used in Transformers (including ViT)
Figure below from the paper by
@YueCao72324941
, Zhuliang Yao, Yutong Lin, et al.
This is wrong on multiple levels, ugh!
1. Don't get pressured into "not wasting your talent" bullshit. Just do whatever lets you enjoy life.
2. AI PhD != good founder or early engineer. Not by a mile.
3. Life with a proper salary is great.
4. There's cool research at BigCo.
PhD graduates in AI mostly take boring jobs at big tech companies due to short-term monetary incentives.
While understandable to some degree, it's also quite sad to see so many great researchers 'disappear' and give up their talent - join or do your own startup instead!
🧶PaLI-3 achieves SOTA across many vision-language (and video!) tasks while being 10x smaller than its predecessor PaLI-X.
At only 5B parameters, it's also smaller (and stronger) than the concurrent Fuyu-8B model, though sadly we cannot release the model (props to
@AdeptAILabs
)
It's Saturday. My 3yo is napping right now.
Once he wakes up, I'll go fire up some H100's and help him code some of the easy ideas I have in the back of my mind. We might do it just in time for NeurIPS.
Gotta start early and completely abuse my privilege, or so I heard🚀🚀🚀
This paper is all about large-scale pre-training of DL models. It completely lacks any mention of our work on this exact topic over the last >2 years. One of these is literally called "big transfer".
Do I need to write a Schmidthuber-like blogpost about all our group's work now?
Stanford's ~entire AI Department has just released a 200-page, 100-author Neural Scaling Laws Manifesto.
They're pivoting to positioning themselves as
#1
at academic ML Scaling (e.g. GPT-4) research.
"On the Opportunities and Risks of Foundation Models"
I recently overheard something like "Aren't Transformers standard in Vision now?"
I wasn't sure. So this weekend, whenever the kid was asleep, I scraped, parsed, analyzed ALL CVF proceedings of the last decade to find out!
Methodology, code, request for feedback in thread.
GPT-5 live-testing its Q* DotA mode (internally known as QpenAI-5), just one of many capabilities that I've heard emerge at ultramassive scale.
Still some work left to do though.
so just played a game where we encountered an actual AI learning program as a teammate
dude last picks Invoker, walks top and has Midas queued up. He nails sunstrikes, but plays super weird.
We decide to chat an AI shutdown code and he stops moving
??
I really had to travel to HQ for a week to convince everyone they should just add ai to whatever they're doing. It's been an uphill battle, but they are slowly starting to!
Did you know we can use scaling laws not only to predict the optimal number of params, but also the optimal model "shape"? (depth, width, MLP size)
Now you know!
With this, we get a 400M-param plain ViT to 90.3% on ImageNet, matching ViT-g on many benchmarks.
Read more below or in our paper:
Excited to share our work on optimizing vision transformers. We advance scaling laws to infer compute-optimal model shapes, achieving better results with smaller models, e.g. 90.3% on ImageNet with 400M params. This surpasses the much larger ViT-g!
abs:
Pleased to announce we are releasing checkpoints for our SigLIP models!
These are very strong image-text ViTs. We release them along with a colab to play around with. Most are english, but we also release a good i18n one.
Sorry, no magnet link mic drop. More in thread🧶
1/N After 3.8 wonderful years at Google Brain in Zürich, I have decided it is time for me to embark on a new adventure.
I'm thankful for all the amazing colleagues I've met so far, and hope to stay in touch and maybe even collaborate in the future.
My ambitious next venture:
Google presents:
Stealing Part of a Production Language Model
- Extracts the projection matrix of OpenAI’s ada and babbage LMs for <$20
- Confirms that their hidden dim is 1024 and 2048, respectively
- Also recovers the exact hidden dim size of gpt-3.5-turbo…
YES. Thanks Andrej. To this day, way Way WAY too many people doing DL are way Way WAY too careless.
I think each small DL team needs at least two people who are obsessed with detail. But the team shouldn't be composed of solely such people either, or it'll go nowhere.
Beautiful work / attention to detail trying to get Gemma to finetune correctly. There are so many foot guns here to be super careful with. All of these issues don't throw any errors, they silently make your network worse.
A great example of what I wrote about in my "A Recipe for…
Video generation is the one thing I've actually been a long-term pessimist on.
But with this, and the recent paper that generated 1h consistent (but low q) videos w/ diffusion, I may have to change my mind.
Maybe if a big lab jumps on it, we'll get seriously impressed next year
My colleagues managed to *actually* learn a generic optimizer. What was impressive to me is that with absolutely zero tuning, on the tasks I tried it, it matched our heavily tuned existing setup!
If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data.
But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
People are jumping on this as something special, meanwhile I'm just sitting here thinking «someone slid a few examples like that into the probably very large SFT/IT/FLAN/RLHF/... dataset and thought "this will be neat" as simple as that»
Am I over simplifying? 🫣
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.
For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of…
Beyond classification in vision, it always feels weird to optimize for a loss which doesn't _really_ match how we'll use the model later on*, but happens to be differentiable.
In our latest work, we tackle this discrepancy🧶
*unless the model is 100% perfect, which it never is.
Vision meets RL! We reveal that policy gradient can be used for tuning vision models to optimize complex metrics, such as mAP, PQ or “color diversity”, observing large performance boosts on tasks like object detection, panoptic segmentation, etc.
Seeing many simple Q's re Grok, let me answer w/o inside knowledge:
1. 😐benchmarks: a) raw model b) trained for interaction, not benchmarks.
2. Why tanh(30) attn? Avoid exploding logits.
3. gelu approx? Default in jax, most efficient.
4. 340b useless? not made for u.
cont/
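Re point 2, the tanh capping looks roughly like this (a sketch; the cap value 30 comes from the question, and the rest of the attention code is omitted):

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    """Smoothly squash attention logits into (-cap, cap): roughly the identity
    for small values, saturating instead of exploding for large ones."""
    return cap * np.tanh(logits / cap)

print(soft_cap(np.array([1.0, 50.0, 500.0])))
```

Unlike hard clipping, this stays differentiable everywhere, so gradients don't vanish abruptly at the boundary.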
The ConvNeXt paper is rightfully getting some attention: it's good work and has beautiful plots.
But Fig. 1 needs a little correction IMO. They compare heavily aug/reg Swin+ConvNeXt to plain ViT. We fixed this in , which is what should always be compared to.
A ConvNet for the 2020s
abs:
github:
Constructed entirely from standard ConvNet modules, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation
It happened!
Today I saw a video tutorial on TikTok explaining how to make a trading bot with the help of ChatGPT, which created a strategy that provides an insane 42,000% profit.
AI is amazing. I took tomorrow off to implement this, and then I'll see you on my yacht, suckers!
If you haven't read our latest ImageNet SOTA work "Vision Transformers (ViT)" yet, shame on you. But! There's hope! Here's the corresponding blogpost which is a nice tl;dr:
Disagree. As soon as you throw sparsity in (and depthwise/tiny-group conv is a form of sparsity) FLOPs detach from reality.
That's why sparse nets are hard (), and EffNetV2 actually UNDOES a lot of the depthwise convs.
EffNetV1 == MobileNetV3 == designed for CPU.
Some interesting discussion on r/machinelearning about EfficientNet and CNN efficiency.
TBH, I think FLOPS as a measurement of models sometimes gets a bad rap. It has its downsides, but it's one of the harder metrics to "game".
Wow! FAIR was home to the best computer vision researchers. But over the last couple years, one by one, they left. It's now a shell of its former self.
This is not a hate post: I like and admire them. But wonder what went wrong. I'd love to buy a book that tells their story.
Soooo language folks have rediscovered that generating and ensembling multiple predictions at test time helps.
In vision, it's multi-crop eval; we know it works, but collectively decided to stop reporting it.
but… somehow… now we call it agents? Did I get this right?
Now anyone can download and play with a TRILLION-parameter language model!
I'm obviously biased, but happy that Google Brain is showing some good leadership in the right direction for science here, by allowing them to release the model, no-nonsense.
PS: I haven't tried it yet!
Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced!
All thanks to the efforts of James Lee-Thorp,
@ada_rob
, and
@hwchung27
It literally does not matter the domain: sports, art, engineering, music, carpentry, ... I just love watching (or reading about) people at the absolute top of their game.
Even without knowing much about the domain, it's often easy to tell who's in a league of their own.
This comment I screenshotted below is a really on point description of the current vibe I feel. I share the commenter’s fear.
Though it’s not all lost yet! For instance, I’m involved in 4 neurips submissions and had 2 iccv ones. Really hoping our openness keeps going like this💪
I think this post channels what a lot of people in the AI community feel right now. As if the stark hypocrisy wasn't enough, there are now also blatant gatekeeping attempts. I've started thinking worse of people who choose to still work for them.
Most recent large transformer decoders use this trick of having multiple heads for the queries, but only one for the keys/values.
I always thought it was a small, not-well-documented trick of the trade. But no, there's a nice paper about "multi-query attention", of course by Noam.
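A rough numpy sketch of the idea (shapes and names are mine; I'm omitting masking, output projections, and other details):

```python
import numpy as np

def multi_query_attention(x, wq, wk, wv, n_heads):
    """Multi-query attention: n_heads query heads, but a single shared
    key/value head, which shrinks the KV cache by a factor of n_heads."""
    T, d = x.shape
    hd = d // n_heads
    q = (x @ wq).reshape(T, n_heads, hd)   # (T, H, hd): per-head queries
    k = x @ wk                             # (T, hd): one shared key head
    v = x @ wv                             # (T, hd): one shared value head
    att = np.einsum("thd,sd->hts", q, k) / np.sqrt(hd)
    att = np.exp(att - att.max(axis=-1, keepdims=True))  # stable softmax
    att /= att.sum(axis=-1, keepdims=True)
    out = np.einsum("hts,sd->thd", att, v)  # every head reuses the same k/v
    return out.reshape(T, d)

rng = np.random.default_rng(0)
T, d, H = 5, 16, 4
x = rng.normal(size=(T, d))
y = multi_query_attention(x, rng.normal(size=(d, d)),
                          rng.normal(size=(d, d // H)),
                          rng.normal(size=(d, d // H)), H)
print(y.shape)  # (5, 16)
```

The win is mostly at inference: the KV cache stores one head's worth of keys/values instead of H, which matters a lot for long sequences.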
Our Mixer has been cooking for a while!
We present a novel architecture composed of only MLPs. Look ma, no convs, no attention at all. And it works as well as the best ResNets and ViTs.
Major props to Ilya Tolstikhin and
@neilhoulsby
who led this investigation.
#Parti
: A photo of an astronaut riding a horse in the forest. There is a river in front of them with water lilies.
I like this visually very clear demonstration of the benefits of scale:
A new image generation model just dropped.
Great work by the team!
+ Auto-regressive, encoder->decoder Transformer
+ Classifier-free sampling.
+ ViT-VQGAN
Really amazing results: Image from the website.
Meanwhile, in central/west EU (east idk) as a PhD student you get:
- a standard wage you can live off just fine, no roommates needed.
- not just your own desk, but often a 2-person office (!)
- in return, you have to teach ~half the time & grade exams
I recommend doing PhD here.
LongNet/1B seqlen. Saving you the click:
- uses hierarchical dilated attention, similar to (but not the same as) BigBird.
- no experiment longer than 32k
- the scaling curve at least seems not pessimistic
So I’ll wait for v2 which actually scales this.
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Presents LONGNET, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences
abs:
repo:
Allow me a brief moment of not-so-humble brag time?
I had 6 CVPR submissions*, for which:
6/6 I wrote code/ran experiments.
4/6 I'm co-first-author.
6/6 avg(reviews) > borderline.
4/6 accepted.
Pretty happy!
*fine-print: but 3 of them are re-submissions, I'm no super-human :)
If this pans out to work robustly across models and tasks, I think this could be one of those rare huge breakthroughs where in a few years we'll wonder «what took us this long?»
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training.
Training LLMs from scratch currently requires huge…
This shit has to be taught in schools when my kid gets to puberty. This is a whole new genre like cubism, baroque, etc.
(I'm uneducated about art, but love this one - does it already have a name?)
Testing how our Unified Vision Model (UViM) works on the notoriously difficult, AGI-hard, "cow on beach" task.
Of course, no such picture exists: it's completely OOD from the real world. So I had to
#imagen
some.
Then ask UViM to panoptic segment them.
Please read full🧶
I've always been frustrated that, beyond image classification, computer vision is full of complex and task-specific components.
Thus, very excited to share our new work, where we propose a unified modeling approach for vision: .
More in the thread🧵.
Paper writing protip:
Most papers are not read end-to-end. Ain't nobody got time. Write with that in mind. Make sections, figs, tables and their captions as self-contained and "guessable" as reasonably possible.
Example: call your models Foo-M and Foo-S instead of Foo and Foo*
9/9 final thoughts.
- I really like the "trend reversal" of seeing how much can be done with limited compute.
- I am a big fan of the gray text passages for things that were tried but didn't work.
- The lr sched part is fishy, but not super important.
- Impressive bibliography!
You know what's my favourite part with our Gemma release?
That we do not misuse the term "open source" like other labs have. It was explicit in the comms briefing that we should call them "open models" and not "open source models". Much respect to the team.
What are some computer-vision tasks that are actually useful IRL and cannot be done by any of the current-gen LLM chatbots with image input?
Not looking for academic made-up benchmarks or brain-teaser tasks, only things that actually help you do stuff IRL.
Yann is trying to erase history!!
Before luatorch, there was in fact Torch3 (C++) and it had the most legendary author pictures of a software library to date. I’m not making this up:
If Claude 2 turns out to be as strong as GPT-4, thereby breaking the OpenAI monopoly on strong LMing, the number of companies building products on top of LMs will increase substantially.
There's (almost) nothing better on this earth than polishing a fancy matplotlib figure while listening to nice music and having a good (Belgian) beer or cappuccino.
Can't share the current one yet, so here are some past ones that I like, just because. (arxiv links in alt-text.)
This is *exactly* what I had in mind when disliking the term "emergent" recently.
It seems it's due to the metrics (like binary correct/incorrect); in reality, the model does smoothly approach the right answer.
But I was too lazy to verify this intuition myself, glad this paper did!
Are Emergent Abilities of Large Language Models a Mirage?
Presents an explanation in a simple mathematical model, then tests it in three complementary ways: (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with…
@eladgil
@patrickc
In AI at least, the real 30 under 30 imo you have never heard of. They are 5 layers down the org chart from the CEO. They are usually not on Twitter, they have an unmaintained LinkedIn, they don’t go on podcasts, and they maybe published at one point but don’t do so anymore. They…
Our NeurIPS'21 workshop on "ImageNet: past, present, and future" has been accepted!
I'm excited about our speaker line-up. I'm even more excited to see what papers researchers will submit to the workshop!
Please spread the word, and consider submitting.
This is actually the most sensible take I’ve read so far: Sam may have tried starting/running too many other startups on the side, that could become huge on the back of OpenAI, and may not have openly disclosed all of them?
They all make a lot of sense too!
1/4 Did you know bfloat16 stands for Brain Float16 and was invented by Google Brain for stable and fast NN training?
I feel like the rest of the world thinks half-precision training has to be painful, because NVIDIA didn't implement bf16 for the longest time and f16 sucks (loss scaling??).
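You can see why bf16 is so convenient with a tiny sketch: bf16 is essentially the top 16 bits of a float32 (real hardware rounds to nearest rather than truncating, so this is only an approximation):

```python
import struct

def to_bf16(x):
    """Truncate a float32 to its top 16 bits: bf16 keeps the full f32
    exponent range but only ~3 decimal digits of mantissa precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bf16(3.14159))  # coarse mantissa: close to, but not exactly, pi
print(to_bf16(1e38))     # still finite; float16 overflows past ~65504
```

Same exponent range as f32 means no loss scaling needed; you just lose mantissa bits, which NN training is remarkably tolerant of.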
I agree. Personally I still like the term "pre-trained models". It's short, clear and to the point.
The "large" part I feel is a current necessity, but not a key property. I think currently it's used to imply "rly good", but in the future we might get equally good small models.
Reminder to everyone starting to publish in ML: "Foundation models" is *not* a recognized ML term; it was coined by Stanford alongside announcing their center named for it, and continues to be pushed by Stanford as *the* term for what we've all generally (reasonably) called "base models".
Most people have absolutely no sense for the insane diversity of things covered in O(billion) web images.
I'm not sure it is meaningful to talk about ood, distribution shift, generalisation, etc. anymore at that scale.
It will take the collective us some time to digest this.
Do language models have an internal world model? A sense of time? At multiple spatiotemporal scales?
In a new paper with
@tegmark
we provide evidence that they do by finding a literal map of the world inside the activations of Llama-2!
So OpenAI is being sued now. Stability and Midjourney are already being sued. Things are getting "interesting".
Is there a website or someone to follow that writes summaries and updates covering these "ai lawsuits"?
I want to closely follow, but also am lazy.
I promised a thread this weekend about OpenAI and the lawsuit I filed against them, and an explanation of what I hope to achieve here. Sorry for the length, but there's a lot going on here.
To begin with, we need to understand what “OpenAI” really is: a poorly constructed scheme…
That’s why we need to get rid of tokenizers and try to use raw input/output, like in vision! ByT5 () and MEGABYTE () make nice first steps, we need more of that.
🔥Hot Take?🧨🧑🚒
LLM alignment starts with biases in token embeddings.
If you can't get that part right, reinforcement learning and/or a few thousand example chats isn't going to help!
We don't cite our tools enough.
I want to "boilerplate cite" all important tools in future papers, they deserve the credit. My candidates:
- numpy
- matplotlib
- jax
- TPUs (XLA?)
- Jupyter (colab)
What are yours? Which do I miss?
PS: I used to do this a bit, but lost habit:
@sama
A big Transformer-style robot taking an image, cutting it into a grid of 16x16 small patches, eating those patches up. Once done, a comicbook-style text bubble is shown, indicating the robot saying the words "mmm, this image was definitely worth 16x16 patches."
See, LLMs don’t magically get skills out of thin air, as some papers suggest.
This is a very nice paper taking a deep dive into one of them (translation skill) and it clearly comes from it being in the data.
I think that’s great and a good motivator for training on everything!
🔎1.4% of PaLM’s training instances are detected as bilingual, while 0.34% contain at least one translated sentence pair. We were able to mine such pairs across all languages studied; therefore, none of these languages is truly zero-shot in the context of translation.
In the same spirit, I keep preaching:
In today’s age, please stop taking the test set as an IID split from the training data. Create large noisy training data (or even none!), but *small, very high-quality* test data.
We currently suffer from benchmarking on low-quality test sets.
Can you reliably evaluate your model with just a handful of test examples? Yes, you often can!
Anchor Points are tiny -- but surprisingly representative -- subsets of benchmarks. They can predict which other points the model will fail on… without evaluating on those points! 🧵
getting a lot of DMs asking how to get into computer vision. i am no expert, i can only share what i did:
1. follow
@giffmana
2. read all of his papers
3. watch recordings of all of his talks on youtube
4. study every tweet he posts for extra alpha
Matting = creating an alpha mask to cutout a foreground object. Think of background effects in video-conf.
ViTMatte shows how to adapt plain, generally pre-trained ViTs to perform SOTA Matting.
I'll walk you through the paper and give context on ViT for detailed outputs:
Anyone else permanently annoyed by the mismatched length of True/False keywords? Even the strings yes/no have mismatched lengths, who the fuck invented English?
I present to you my newest solution to this eternal thorn in the eye: