iislucas (Lucas Dixon)
@iislucas
Followers: 389 · Following: 101 · Media: 4 · Statuses: 249
machines learn, graphs reason, identity is a non-identity, incompetence over conspiracy, evil by association is evil, expression is never free, stay curious
Paris
Joined January 2010
Mapping LLMs with Sparse Autoencoders https://t.co/FIcY91YkS4 An interactive introduction to Sparse Autoencoders and their use cases with Nada Hussein, @shivamravalxai, Jimbo Wilson, Ari Alberich, @NeelNanda5, @iislucas, and @Nithum
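For the gist in code: a sparse autoencoder learns an overcomplete, sparse dictionary of features that reconstructs a model's activations. The sketch below is a minimal illustration assuming PyTorch, toy dimensions, and random stand-in activations; it is not the code behind the article.

```python
# A minimal sparse autoencoder over model activations (illustrative:
# dimensions, L1 weight, and the random "activations" are assumptions,
# not code from the article).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 8 * 768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)   # overcomplete dictionary
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.enc(x))               # non-negative, sparse-able features
        return self.dec(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(4096, 768)  # stand-in for cached residual-stream activations

for batch in acts.split(256):
    x_hat, f = sae(batch)
    # Reconstruction loss plus an L1 penalty pushing features toward sparsity.
    loss = ((x_hat - batch) ** 2).mean() + 1e-3 * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```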
AI is everywhere, but the most impactful features start with a real user need. Ask how AI can add unique value to that experience, not just how to release an AI feature. Here are three questions to consider before you start building. 1️⃣ What’s the core user problem? 2️⃣
my coworkers have been subjected to this yap so I now continue here: this is genuinely one of the most useful papers to engage with for interp imo. easily one of my most referred to papers. if you have not read it, pls do. mandatory reading until the heat death of the universe.
Veo holograms 🦝⚡️ Visualizing animal superpowers! Just discovered Veo 3's amazing ability to render 3d holograms. Virtual interfaces within the simulated world. 🔊 Prompts in 🧵
We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with @NeelNanda5, the @GoogleDeepMind AGI Safety team, and me: apply by 28th February as a
We scaled training data attribution (TDA) methods ~1000x to find influential pretraining examples for thousands of queries in an 8B-parameter LLM over the entire 160B-token C4 corpus! https://t.co/4mglIOAjyB
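The paper scales far heavier machinery, but as a rough intuition for what TDA computes: many methods (TracIn-style ones, for instance) score a training example's influence on a query by a dot product of loss gradients. A toy sketch with a made-up linear model:

```python
# Toy gradient-dot-product influence (TracIn-style intuition), not the
# paper's scaled pipeline. Model and data are made up for illustration.
import torch

model = torch.nn.Linear(16, 2)
params = list(model.parameters())
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(loss):
    return torch.cat([g.reshape(-1) for g in torch.autograd.grad(loss, params)])

x_train, y_train = torch.randn(8, 16), torch.randint(0, 2, (8,))
x_query, y_query = torch.randn(1, 16), torch.randint(0, 2, (1,))

g_query = flat_grad(loss_fn(model(x_query), y_query))
scores = [
    torch.dot(g_query, flat_grad(loss_fn(model(x_train[i:i+1]), y_train[i:i+1]))).item()
    for i in range(len(x_train))
]
# Higher score = gradient points the same way as the query's = more influential.
print(sorted(range(len(scores)), key=lambda i: -scores[i]))
```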
What a way to celebrate one year of incredible Gemini progress -- #1🥇across the board on overall ranking, as well as on hard prompts, coding, math, instruction following, and more, including with style control on. Thanks to the hard work of everyone in the Gemini team and
Big news on Chatbot Arena 🔥 The new @GoogleDeepMind model gemini-exp-1206 is crushing it, and the race is heating up. Google is back in the #1 spot 🏆overall and tied with O1 for the top coding model! Highlights (improvement since gemini-exp-1121 in parentheses) - First
🚨 New Paper 🚨 Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts that never appeared together in training, rather than guessing the answer, but success depends heavily on the type of the bridge entity (80%+ for
I’m so proud of the updated version of #MusicFXDJ we developed in collaboration with @jacobcollier, available today at https://t.co/pYopej66CL. Over the past year I’ve spent countless hours experimenting with our real-time music models, and it feels like I’ve learned to play a
It’s here! Thrilled to collab with @jacobcollier on our latest #LabSession exploring the magical possibilities of generative music and #MusicFXDJ. Watch full video: https://t.co/5FuHxCZFtn Tune in 10/24 at 5 PM ET as Jacob livestreams a MusicFX DJ sesh on his YT channel.
🧵Responses to adversarial queries can still remain latent in a safety-tuned model. Why are they revealed sometimes, but not others? And what are the mechanics of this latent misalignment? Does it matter *who* the user is? (1/n)
We’re welcoming a new 2 billion parameter model to the Gemma 2 family. 🛠️ It offers best-in-class performance for its size and can run efficiently on a wide range of hardware. Developers can get started with 2B today → https://t.co/hQRWYwGY7q
Can large language models (LLMs) explain their internal mechanisms? Check out the latest AI Explorable on Patchscopes, an inspection framework that uses LLMs to explain the hidden representations of LLMs. Learn more → https://t.co/mvmix9hKs0
Join us live tomorrow at 2:30pm CET for some exciting updates on our research!
Gemma 2 27B is now the best open model while being 2.5x smaller than alternatives! This validates the work done by the team and Gemini. This is just the beginning 💙♊️
We have also collected more votes for Gemma-2-27B (now 5K+) over the past few days. Gemma-2 stays robust against Llama-3-70B and is now the new best open model!
Being able to interpret an #ML model’s hidden representations is key to understanding its behavior. Today we introduce Patchscopes, an approach that uses #LLMs to provide natural language explanations of their own hidden representations. Learn more → https://t.co/WfY1FYa1Wt
I love music most when it’s live, in the moment, and expressing something personal. This is why I’m psyched about the new “DJ mode” we developed for MusicFX: https://t.co/1Qk1VjnjEE It’s an infinite AI jam that you control 🎛️. Try mixing your unique 🌀 of instruments, genres,
Super excited for the Gemma model release, and with it a new debugging tool we built on 🔥LIT - use gradient-based salience to debug and refine complex LLM prompts!
ai.google.dev
Explore Gemma model’s behavior with The Learning Interpretability Tool (LIT), an open-source platform for debugging AI/ML models. ➡️ Improve prompts using saliency methods ➡️ Test hypotheses to improve model behavior ➡️ Democratize access to ML debugging https://t.co/pEyQAi75nk
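LIT computes and visualizes these scores for you; as for what "gradient-based salience" means mechanically, here is a minimal grad-L2 sketch, assuming GPT-2 via Hugging Face transformers as a stand-in for the actual model and tool:

```python
# Minimal grad-L2 token salience (a sketch of the kind of signal LIT's
# saliency view shows; GPT-2 and the prompt are stand-ins, not LIT internals).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
emb = model.transformer.wte(ids).detach().requires_grad_(True)

logits = model(inputs_embeds=emb).logits
target_id = tok(" Paris").input_ids[0]       # score the logit of one next token
logits[0, -1, target_id].backward()

salience = emb.grad.norm(dim=-1)[0]          # one gradient-norm score per token
for t, s in zip(tok.convert_ids_to_tokens(ids[0].tolist()), salience.tolist()):
    print(f"{t:>12} {s:.4f}")
```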
Happy to introduce our paper MusicRL, the first music generation system finetuned with human preferences. Paper link:
arxiv.org
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the...
🧵Can we “ask” an LLM to “translate” its own hidden representations into natural language? We propose 🩺Patchscopes, a new framework for decoding specific information from a representation by “patching” it into a separate inference pass, independently of its original context. 1/9
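A rough sketch of that patching step, assuming GPT-2 via Hugging Face transformers as a stand-in and an illustrative "identity"-style target prompt (the real framework, target prompts, and layer choices are in the paper):

```python
# Sketch of the core Patchscopes move: cache a hidden state from a source
# prompt, inject it into a target prompt's forward pass, and read what the
# model says. Model, layer, positions, and prompts are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, SRC_POS, TGT_POS = 6, -1, -1

# 1) Run the source prompt and cache one hidden state.
src = tok("Alexander the Great", return_tensors="pt")
with torch.no_grad():
    hs = model(**src, output_hidden_states=True).hidden_states
h_src = hs[LAYER + 1][0, SRC_POS]            # output of block LAYER at SRC_POS

# 2) Patch it into the target pass via a forward hook on the same block.
def patch(module, inputs, output):
    output[0][0, TGT_POS] = h_src            # overwrite the target position
    return output

handle = model.transformer.h[LAYER].register_forward_hook(patch)
tgt = tok("Syria: country in the Middle East, x:", return_tensors="pt")
with torch.no_grad():
    logits = model(**tgt).logits
handle.remove()
print(tok.decode(logits[0, -1].argmax().item()))  # model's reading of h_src
```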
We often interpret neural nets by studying simplified representations (e.g. low-dim visualization). But how faithful are these simplifications to the original model? In our new preprint, we found some surprising "interpretability illusions"... 1/6