Luke DH Lee
@luke_lee_ai
Followers 112 · Following 846 · Media 5 · Statuses 45
PhD student @UCBerkeley. CS master’s @UCL. Formerly visiting researcher @ Stanford AI Lab.
Joined August 2018
New paper! A rare approach to SAE training data: we train Sparse Autoencoders using only synthetic data generated by the model itself, revealing features that truly reflect what’s inside the model.
2
1
4
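The tweet above only sketches the idea of training sparse autoencoders on activations from model-generated text. As a rough, purely illustrative picture of the general SAE recipe (none of the architecture, data pipeline, or hyperparameters below come from the paper; they are assumptions), a minimal version might look like this:

```python
# Minimal sparse autoencoder (SAE) sketch in PyTorch.
# Hypothetical illustration only: the paper's actual setup is not shown in the tweet.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)   # activations -> feature space
        self.decoder = nn.Linear(d_dict, d_model)   # feature space -> reconstruction

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))              # non-negative sparse features
        x_hat = self.decoder(f)
        return x_hat, f

def train_sae(activations: torch.Tensor, d_dict: int = 4096,
              l1_coeff: float = 1e-3, epochs: int = 5):
    """activations: (n_samples, d_model) residual-stream activations, here assumed
    to be cached from text the model generated itself."""
    sae = SparseAutoencoder(activations.shape[1], d_dict)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(epochs):
        x_hat, f = sae(activations)
        loss = ((x_hat - activations) ** 2).mean() + l1_coeff * f.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

# Toy usage with random tensors standing in for real activations:
sae = train_sae(torch.randn(1024, 512))
```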
My group & collaborators have developed many popular benchmarks over the years, e.g., MMLU, MATH, APPS. Really excited about our latest benchmark OMEGA Ω: 🔍Can LLMs really think outside the box in math? A new benchmark probing 3 axes of generalization: 1️⃣ Exploratory 2️⃣
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math yet still failed at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found
4
35
157
🧵Working with #MCP or building a modular #RAG system, but not sure which rankers to use from your pool? 📊 Rank the Rankers⚡Route smart. This paper shows how. 👨🔬 w/ Fernando Diaz @841io 💻 Code: https://t.co/fPBzHWzuF2 Paper:
0
0
2
The beginning of autonomous bug discovery & defense! 🔥 AI agents now match elite hackers — 15 zero-days found, $30K+ worth of bounty bugs patched. Huge milestone by @dawnsongtweets & team!
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
0
0
0
In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling: the power-law relationship between the number of repeated attempts and the fraction of problems solved! The following amazing work theoretically proves the necessary and
Interested in test time / inference scaling laws? Then check out our newest preprint!! 📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉 https://t.co/Vz76RpmXdF w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1
0
41
170
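As a rough sketch of the power-law claim in the tweets above (illustrative only: the tasks, models, and exact fitting procedure are assumptions, and the data below is synthetic), one can estimate coverage — the fraction of problems solved by at least one of k attempts — and fit a line in log-log space:

```python
# Illustrative sketch of the "coverage vs. number of samples" power-law fit.
# All data here is synthetic; real experiments use many sampled solutions per
# problem plus a task-specific correctness check.
import numpy as np

def coverage_at_k(correct: np.ndarray, k: int) -> float:
    """correct: (n_problems, n_samples) boolean matrix, correct[i, j] = sample j solves problem i.
    Returns the fraction of problems solved by at least one of the first k samples."""
    return correct[:, :k].any(axis=1).mean()

rng = np.random.default_rng(0)
# Toy data: each problem has a hidden per-sample success probability.
p = rng.beta(0.3, 3.0, size=200)                   # 200 problems
correct = rng.random((200, 1024)) < p[:, None]     # 1024 independent attempts each

ks = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
cov = np.array([coverage_at_k(correct, k) for k in ks])

# Fit log(coverage) ~ a * log(k) + b; near-linearity is the power-law signature.
a, b = np.polyfit(np.log(ks), np.log(cov), 1)
print(f"fitted exponent a = {a:.3f}, coverage@1 = {cov[0]:.3f}, coverage@1024 = {cov[-1]:.3f}")
```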
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for complex TOT queries. Curious how TREC TOT track test queries are created? Check out this thread🧵 and our paper📄:
2
11
30
The most bullish AI capability I'm looking for is not whether it's able to solve PhD-grade problems. It's whether you'd hire it as a junior intern. Not "solve this theorem" but "get your Slack set up, read these onboarding docs, do this task and let's check in next week".
356
680
10K
Agency > Intelligence. I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are
1K
7K
37K
We are releasing CodeMonkeys, a system for solving SWE-bench problems with a focus on careful parallel and serial scaling of test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified, and running our selection mechanism on an ensemble of existing top
My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system
5
23
122
Announcing LANG-JEPA — a new language model architecture I’m working on that optimizes in “concept” space instead of “token” space. Inspired by @ylecun's JEPA (I-JEPA for images, V-JEPA for video), LANG-JEPA asks: What if we train for conceptual understanding directly, rather
24
118
943
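LANG-JEPA's internals aren't described in the tweet above, so the following is only a schematic guess at what a JEPA-style objective for text could look like: predict the representation of a target span from the representation of its context, with the loss computed in embedding ("concept") space rather than over tokens. Every module, dimension, and design choice below is hypothetical.

```python
# Schematic JEPA-style objective for text, in embedding ("concept") space.
# Hypothetical sketch; not the LANG-JEPA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 256
context_encoder = nn.Sequential(nn.Linear(768, d), nn.GELU(), nn.Linear(d, d))
target_encoder  = nn.Sequential(nn.Linear(768, d), nn.GELU(), nn.Linear(d, d))
predictor       = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

def jepa_loss(context_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """context_feats / target_feats: (batch, 768) pooled features of the visible
    context and of the masked target span (e.g. from a frozen text encoder)."""
    z_ctx = context_encoder(context_feats)
    with torch.no_grad():            # target branch is typically not backpropagated through
        z_tgt = target_encoder(target_feats)
    pred = predictor(z_ctx)
    return 1 - F.cosine_similarity(pred, z_tgt, dim=-1).mean()

# Toy usage with random features standing in for encoder outputs:
loss = jepa_loss(torch.randn(8, 768), torch.randn(8, 768))
loss.backward()
```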
Thanks for covering our work on test time scaling! Turns out repeated sampling alone is surprisingly effective (a roughly log-linear relationship between the number of samples and coverage across many reasoning tasks), and even better when combined with sequential “thinking”!
Test-time compute scaling, made simple! @OpenAI o1/o3 made big waves by showing that inference compute can be scaled to improve downstream performance. Here is a poor man's recipe for it. “Scaling Inference Compute with Repeated Sampling” is a paper that demonstrates how repeated
0
18
139
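The "poor man's recipe" in the quoted tweet boils down to: sample many independent completions and keep one that passes a check. A toy version of that loop (the sampler and verifier below are stand-ins, not the paper's setup) could look like:

```python
# Toy repeated-sampling recipe: sample up to k candidate answers and accept the
# first one that passes a verifier. The "model" and "verifier" are stand-ins.
import random

def sample_answer(problem: str, temperature: float = 0.8) -> int:
    """Stand-in for one stochastic LLM completion (here: a noisy guess)."""
    return random.randint(0, 9)

def verify(problem: str, answer: int) -> bool:
    """Stand-in for a checker (unit tests, a math verifier, exact match, ...)."""
    return answer == 7            # pretend the ground-truth answer is 7

def solve_with_repeated_sampling(problem: str, k: int = 64):
    for attempt in range(k):
        ans = sample_answer(problem)
        if verify(problem, ans):
            return ans, attempt + 1   # solved after this many samples
    return None, k                     # no success within this budget

answer, used = solve_with_repeated_sampling("toy problem", k=64)
print(f"answer={answer}, samples used={used}")
```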
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
208
708
4K
Excited to share our NeurIPS 2024 Oral, Convolutional Differentiable Logic Gate Networks, leading to a range of inference efficiency records, including inference in only 4 nanoseconds 🏎️. We reduce model sizes by factors of 29x-61x over the SOTA. Paper: https://t.co/Aptk35mKir
18
255
2K
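The core building block behind the logic gate networks mentioned above is a "gate" that learns a soft choice over two-input Boolean functions, using real-valued relaxations so the whole thing is trainable by gradient descent. The snippet below is a simplified sketch of that one idea (the real model uses all 16 two-input Boolean functions and a convolutional arrangement of gates; this is not the authors' code):

```python
# Simplified sketch of a differentiable logic gate "neuron" (a reduced version of
# the idea behind differentiable logic gate networks).
import torch
import torch.nn as nn

def gate_relaxations(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Real-valued relaxations of a few two-input Boolean functions, inputs in [0, 1]."""
    return torch.stack([
        a * b,               # AND
        a + b - a * b,       # OR
        a + b - 2 * a * b,   # XOR
        1 - a * b,           # NAND
        a,                   # pass-through A
        1 - a,               # NOT A
    ], dim=-1)

class DiffLogicGate(nn.Module):
    """One gate that learns a soft distribution over the candidate Boolean functions."""
    def __init__(self, n_ops: int = 6):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_ops))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        ops = gate_relaxations(a, b)                  # (..., n_ops)
        weights = torch.softmax(self.logits, dim=-1)  # soft gate selection
        return (ops * weights).sum(dim=-1)

# Toy usage: after training, the softmax typically concentrates on a single gate,
# which is what enables the very fast discretized inference mentioned in the tweet.
gate = DiffLogicGate()
out = gate(torch.rand(4), torch.rand(4))
out.sum().backward()
```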
I read Google's paper about their quantum computer so you don't have to. They claim to have run a quantum computation in 5 minutes that would take a normal computer 10^25 years. But what was that computation? Does it live up to the hype? I will break it down.🧵
505
3K
24K
Training Large Language Models to Reason in a Continuous Latent Space. Introduces a new paradigm for LLM reasoning called Chain of Continuous Thought (COCONUT). Extremely simple change: instead of mapping between hidden states and language tokens using the LLM head and embedding
51
301
2K
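The trick described in the tweet above is to feed the model's last hidden state back in as the next input embedding, skipping the round trip through the vocabulary. A rough sketch of that loop with a small Hugging Face model follows (illustrative only: the prompt, number of latent steps, and decoding step are assumptions, not COCONUT's training procedure):

```python
# Rough sketch of "continuous thought": reuse the last hidden state as the next
# input embedding instead of decoding a token. Not the paper's actual setup.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Question: 2 + 3 * 4 = ? Let's think."
input_ids = tok(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

num_latent_steps = 4   # assumed number of "continuous thoughts"
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]       # (1, 1, d_model)
        # Append the hidden state directly as the next "input token" embedding.
        inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

    # After the latent steps, switch back to ordinary token decoding.
    out = model(inputs_embeds=inputs_embeds)
    next_token = out.logits[:, -1, :].argmax(dim=-1)
print(tok.decode(next_token))
```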
AI as an AI compiler? Very excited to release KernelBench, a new code generation benchmark for evaluating models' ability to generate correct and efficient CUDA kernels. KernelBench has 4 levels:
Level 1 (100 tasks): Single-kernel operators (e.g. matmuls)
Level 2 (100 tasks):
Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
3
29
165
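A benchmark like this ultimately scores each generated kernel on two things: does it match a reference implementation, and is it faster. Below is a simplified harness sketch (not KernelBench's actual evaluation code; the "candidate" here is just another PyTorch function standing in for a compiled, model-generated CUDA kernel):

```python
# Simplified harness: check a candidate op against a reference for correctness,
# then compare wall-clock speed. A real CUDA-kernel benchmark would compile and
# launch generated kernels; here the candidate is a stand-in PyTorch function.
import time
import torch

def reference_op(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return x @ w                   # reference: plain matmul

def candidate_op(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return torch.matmul(x, w)      # stand-in for a model-generated kernel

def evaluate(candidate, reference, shape=(1024, 1024), trials=20, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.randn(shape, device=device)
    w = torch.randn(shape, device=device)

    # 1) Correctness: outputs must match within tolerance.
    ok = torch.allclose(candidate(x, w), reference(x, w), rtol=1e-3, atol=1e-3)

    # 2) Speed: median wall-clock time over several trials.
    def timeit(fn):
        times = []
        for _ in range(trials):
            if device == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            fn(x, w)
            if device == "cuda":
                torch.cuda.synchronize()
            times.append(time.perf_counter() - t0)
        return sorted(times)[len(times) // 2]

    speedup = timeit(reference) / timeit(candidate)
    return ok, speedup

correct, speedup = evaluate(candidate_op, reference_op)
print(f"correct={correct}, speedup vs. reference={speedup:.2f}x")
```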
AI for AI chips for AI!
Great article in Freethink about the use of AI in chip design by NVIDIA, Synopsys, and Cadence, as well as Google's use of AlphaChip! https://t.co/IvquPpVrO1
3
1
58
After the news was announced, John and Demis reunited with their teams in London. A snapshot of what they had to say ↓
3
21
246
Winning the @NobelPrize is the honour of a lifetime and the realisation of a lifelong dream - it still hasn’t really sunk in yet. With AlphaFold2 we cracked the 50-year grand challenge of protein structure prediction: predicting the 3D structure of a protein purely from its
BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
384
919
9K
Timely and fascinating research on LLM security in multi-agent systems! The discovery of 'Prompt Infection' highlights the vulnerabilities of larger models and the critical need for robust safeguards.
🚨 Multi-agent systems are no longer safe from prompt injection! In our paper, we introduce Prompt Infection—an infectious prompt injection attack that spreads like a virus across LLM agents, turning your multi-agent system into a network of compromised agents. TL;DR: 1. One
1
2
6