Luke DH Lee Profile
Luke DH Lee

@luke_lee_ai

Followers
112
Following
846
Media
5
Statuses
45

PhD student @UCBerkeley. CS master’s @UCL. Formerly visiting researcher @ Stanford AI Lab.

Joined August 2018
Don't wanna be here? Send us removal request.
@SeonglaeC
Seonglae Cho
4 months
New paper! Rare SAE dataset approach: We train Sparse Autoencoders using only synthetic data generated by the model itself, revealing features that truly reflect what’s inside the model.
2
1
4
@dawnsongtweets
Dawn Song
4 months
My group & collaborators have developed many popular benchmarks over the years, e.g., MMLU, MATH, APPS---really excited about our latest benchmark OMEGA Ω: 🔍Can LLMs really think outside the box in math? a new benchmark probing 3 axes of generalization: 1️⃣ Exploratory 2️⃣
@nouhadziri
Nouha Dziri
4 months
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember DeepSeek R1, o1 have impressed us on Olympiad-level math but also they were failing at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found
4
35
157
@luke_lee_ai
Luke DH Lee
5 months
"Rank the Rankers" from CMU! #RAG #MCP
@TEKnologyy
Danny To Eun Kim
5 months
🧵Working with #MCP or building a modular #RAG system, but not sure which rankers to use from your pool? 📊 Rank the Rankers⚡Route smart. This paper shows how. 👨‍🔬 w/ Fernando Diaz @841io 💻 Code: https://t.co/fPBzHWzuF2 Paper:
0
0
2
@luke_lee_ai
Luke DH Lee
5 months
Beginning of autonomous bug discovery & defense! 🔥 AI agents now match elite hackers — 15 zero-days found, $30K+ bugs patched. Huge milestone by @dawnsongtweets & team!
@dawnsongtweets
Dawn Song
5 months
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
0
0
0
@Azaliamirh
Azalia Mirhoseini
7 months
In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved! The following amazing work theoretically proves the necessary and
@RylanSchaeffer
Rylan Schaeffer
7 months
Interested in test time / inference scaling laws? Then check out our newest preprint!! 📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉 https://t.co/Vz76RpmXdF w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1
0
41
170
@TEKnologyy
Danny To Eun Kim
8 months
🚨New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for the TOT complex queries. Curious how TREC TOT track test queries are created? Check out this thread🧵 and our paper📄:
2
11
30
@karpathy
Andrej Karpathy
11 months
The most bullish AI capability I'm looking for is not whether it's able to solve PhD grade problems. It's whether you'd hire it as a junior intern. Not "solve this theorem" but "get your slack set up, read these onboarding docs, do this task and let's check in next week".
356
680
10K
@karpathy
Andrej Karpathy
8 months
Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are
@garrytan
Garry Tan
8 months
Intelligence is on tap now so agency is even more important
1K
7K
37K
@Azaliamirh
Azalia Mirhoseini
9 months
We are releasing CodeMonkeys, a system for solving SWE-bench problems with a focus on careful parallel and serial scaling of test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified and and running our selection mechanism on an ensemble of existing top
@brad19brown
Bradley Brown
9 months
My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system
5
23
122
@jerber888
Jeremy Berman
11 months
Announcing LANG-JEPA — a new language model architecture I’m working on that optimizes in “concept” space instead of “token” space. Inspired by @ylecun's JEPA (I-JEPA for images, V-JEPA for video), LANG-JEPA asks: What if we train for conceptual understanding directly, rather
24
118
943
@Azaliamirh
Azalia Mirhoseini
11 months
Thanks for covering our work on test time scaling! Turns out repeated sampling alone is surprisingly effective (~ log linear relationship between num samples and coverage across many reasoning tasks) and even better if combined with sequential “thinking”!
@_philschmid
Philipp Schmid
11 months
Test-Time Compute scaling but in simple! @OpenAI o1/o3 made big waves by being able to scale inference compute relative to downstream performance. Here is a poor man's recipe for it. “Scaling Inference Compute with Repeated Sampling” is a paper that demonstrates how repeated
0
18
139
@AnthropicAI
Anthropic
11 months
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
208
708
4K
@FHKPetersen
Felix Petersen
1 year
Excited to share our NeurIPS 2024 Oral, Convolutional Differentiable Logic Gate Networks, leading to a range of inference efficiency records, including inference in only 4 nanoseconds 🏎️. We reduce model sizes by factors of 29x-61x over the SOTA. Paper: https://t.co/Aptk35mKir
18
255
2K
@Jeyffre
Jeffrey Scholz
11 months
I read Google's paper about their quantum computer so you don't have to. They claim to have ran a quantum computation in 5 minutes that would take a normal computer 10^25 years. But what was that computation? Does it live up to the hype? I will break it down.🧵
505
3K
24K
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
11 months
Training Large Language Models to Reason in a Continuous Latent Space Introduces a new paradigm for LLM reasoning called Chain of Continuous Thought (COCONUT) Extremely simple change: instead of mapping between hidden states and language tokens using the LLM head and embedding
51
301
2K
@Azaliamirh
Azalia Mirhoseini
11 months
AI as AI compiler? Very excited to release KernelBench, a new code generation benchmark for evaluating models' ability to generate correct and efficient CUDA kernels. KernelBench has 4 levels: Level 1 (100 tasks): Single-kernel operators (e.g. matmuls) Level 2 (100 tasks):
@anneouyang
Anne Ouyang
11 months
Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
3
29
165
@Azaliamirh
Azalia Mirhoseini
1 year
AI for AI chips for AI!
@annadgoldie
Anna Goldie
1 year
Great article in Freethink about the use of AI in chip design by NVIDIA, Synopsys, and Cadence, as well as Google's use of AlphaChip! https://t.co/IvquPpVrO1
3
1
58
@GoogleDeepMind
Google DeepMind
1 year
After the news was announced, John and Demis reunited with their teams in London. A snapshot of what they had to say ↓
3
21
246
@demishassabis
Demis Hassabis
1 year
Winning the @NobelPrize is the honour of a lifetime and the realisation of a lifelong dream - it still hasn’t really sunk in yet. With AlphaFold2 we cracked the 50-year grand challenge of protein structure prediction: predicting the 3D structure of a protein purely from its
@NobelPrize
The Nobel Prize
1 year
BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
384
919
9K
@TEKnologyy
Danny To Eun Kim
1 year
Timely and fascinating research on LLM security in multi-agent systems! The discovery of 'Prompt Infection' highlights the vulnerabilities of larger models and the critical need for robust safeguards.
@luke_lee_ai
Luke DH Lee
1 year
🚨 Multi-agent systems are no longer safe from prompt injection! In our paper, we introduce Prompt Infection—an infectious prompt injection attack that spreads like a virus across LLM agents, turning your multi-agent system into a network of compromised agents. TL;DR: 1. One
1
2
6