Mehrdad Farajtabar Profile
Mehrdad Farajtabar

@MFarajtabar

Followers: 9K
Following: 479
Media: 54
Statuses: 202

Research Scientist at @Apple, prev @DeepMind, prev @GeorgiaTech

Seattle Area
Joined January 2021
@MFarajtabar
Mehrdad Farajtabar
4 days
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who:
- Is available to start by the end of this year
- Holds a PhD or will graduate by year-end
- Has 3-5
lnkd.in
9
31
257
@MFarajtabar
Mehrdad Farajtabar
15 days
One usually gets a PhD to become an expert! Then we combine experts to form a Mixture-of-Experts (MoE) — gaining efficiency through specialization. But what if you could educate your MoE even further? In our latest work, we show that you can push the boundaries of #efficient
0
0
4
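For readers unfamiliar with the MoE idea this tweet builds on, here is a minimal top-k routing sketch in PyTorch. It illustrates specialization-for-efficiency in general; it is not the paper's architecture, and all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative only)."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                             # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # each token visits only k experts
            for e, expert in enumerate(self.experts):
                hit = idx[:, slot] == e
                if hit.any():
                    out[hit] += weights[hit, slot, None] * expert(x[hit])
        return out
```

Only k of the n experts run for each token, which is where the efficiency through specialization comes from.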
@nathanbenaich
Nathan Benaich
15 days
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
53
310
963
@edfrenkel
Edward Frenkel
2 months
This is an unwise statement that can only make people confused about what LLMs can or cannot do. Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problem. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some
@SebastienBubeck
Sebastien Bubeck
2 months
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct. Details below.
252
233
2K
@JacksonAtkinsX
Jackson Atkins
3 months
Apple research just revealed a way to make LLMs 5.35x faster. 🤯 That's not a typo. They've found a method to get a >500% speedup for code & math tasks, with ZERO quality loss. Here's how they're unlocking AI models' "latent potential": 🧵
18
83
565
@FartashFg
Fartash Faghri
3 months
📢 Submissions are now open for the #NeurIPS2025 CCFM workshop.
Submission deadline: August 22, 2025, AoE.
Website: https://t.co/oIrrtiRKD6
Call for papers: https://t.co/9sUoMl7AJg
Submission link: https://t.co/2aXHQaqFDf
@FartashFg
Fartash Faghri
4 months
Is your AI keeping up with the world? Announcing the #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates. When/Where: Dec. 6-7, San Diego. Submission deadline: Aug. 22, 2025 (opening soon!) https://t.co/oIrrtiRcNy #FoundationModels #ContinualLearning
0
6
11
@MFarajtabar
Mehrdad Farajtabar
3 months
I noticed the same thing! Engaging in conversations, replies, or DMs with #DeepMind folks always feels safe and welcoming. Their culture is truly remarkable. Thanks to leaders like Samy Bengio, Devi Krishna, Daphne Luong, JG, and many others who've joined Apple, this incredible
@GaryMarcus
Gary Marcus
3 months
Personal observation: The level of intellectual discussion with @GoogleDeepMind vs @OpenAI that I am able to have is literally night and day. DeepMind knows my work, can raise serious objections, propose and develop alternatives, etc. OpenAI speaks to me with insulting memes
0
0
15
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 12/12 What’s next? Our method is just one way to unlock future-token knowledge in AR models. We hope to see new ideas build on this! Diffusion LMs explore the opposite extreme—fully non-AR—but suffer from slow inference. Multi-token prediction may be the sweet spot. 🔄✨
0
0
3
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 11/12 Tiny changes, big gains
We add two lightweight components:
- Gated LoRA (on each Linear layer)
- Sampler head (on final transformer output)
Memory overhead? Minimal. Even LoRA rank=1 yields speedup—proof that the AR model already knows the future. You just have to ask. 👀
1
0
3
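To make the overhead claim concrete, here is a minimal PyTorch sketch of a gated, rank-r LoRA wrapper around a frozen Linear layer. The class name, shapes, and gating interface are assumptions for illustration, not the paper's code; at rank 1 the adapter adds only in_features + out_features parameters per layer.

```python
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Frozen Linear plus a rank-r LoRA adapter that is gated per token."""
    def __init__(self, base: nn.Linear, r: int = 1):
        super().__init__()
        self.base = base.requires_grad_(False)        # pretrained weights stay frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)                 # adapter starts as a no-op

    def forward(self, x, mask_gate):
        # mask_gate: (..., 1) float tensor, 1.0 at <mask> positions, 0.0 elsewhere
        return self.base(x) + mask_gate * self.B(self.A(x))

base = nn.Linear(4096, 4096)
added = 2 * 4096                                      # rank 1: A and B each add 4096 params
print(f"adapter params: {added} vs base: {base.weight.numel()}")  # 8192 vs 16777216
```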
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 10/12 Building speed, step by step 🛠️
Our design improves in layers, each adding speedup (shown in the figure):
- Linear speculative decoding → light blue
- Quadratic decoding → yellow boost
- Sampler head → dark blue
- LCM loss → olive green
Each step stacks more gains. 📈
1
0
3
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 9/12 Speedup over different tasks. We trained a model to predict 8 future tokens at once—and saw 1.5× to 5× speedups, depending on the task. More predictable domains (like code & math) get the biggest gains. And the best part? No quality drop, thanks to gated LoRA
1
0
3
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 8/12 Latent Consistency Matching (LCM) loss We add an extra loss that encourages <mask> predictions to align with the AR model’s next-token predictions. This improves inference speedups by distilling knowledge from the AR model (teacher) to the multi-token predictor
1
0
3
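A hedged sketch of what such a consistency loss could look like, assuming simple logit distillation with a temperature (an illustration of the idea, not the paper's exact formulation): the student's <mask>-position logits are pulled toward the frozen AR teacher's next-token logits at the corresponding positions.

```python
import torch.nn.functional as F

def consistency_loss(mask_logits, teacher_logits, tau=1.0):
    """KL(student <mask> predictions || frozen AR teacher next-token predictions)."""
    student = F.log_softmax(mask_logits / tau, dim=-1)
    teacher = F.softmax(teacher_logits.detach() / tau, dim=-1)  # teacher gives soft targets
    return F.kl_div(student, teacher, reduction="batchmean") * tau**2
```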
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 7/12 Speculative decoding & multi-token prediction When generating multiple tokens, some might not match what the AR model would produce step-by-step. How do we catch and reject these? We use speculative decoding: generate extra tokens at step T, then verify or reject them at
2
0
2
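In spirit, the verification step looks like standard greedy speculative decoding. A simplified sketch, assuming a non-empty context and a HuggingFace-style model(...).logits interface (both of which are assumptions, not the paper's implementation):

```python
import torch

def verify_draft(model, context, draft):
    """Keep the longest prefix of `draft` that greedy AR decoding agrees with."""
    seq = torch.cat([context, draft])                 # check all drafted tokens in one pass
    logits = model(seq.unsqueeze(0)).logits[0]
    accepted = []
    for i, tok in enumerate(draft):
        # the AR model's greedy prediction for this position, given everything before it
        pred = logits[len(context) + i - 1].argmax()
        if pred != tok:
            accepted.append(pred)                     # keep the corrected token, then stop
            break
        accepted.append(tok)
    return accepted
```

Accepted tokens cost one verification pass instead of one full forward pass each, which is where the speedup comes from.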
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 6/12 Preserving generation quality with Gated LoRA Remember Gated LoRA? We fine-tune these modules to help the model fill in <mask> tokens. It’s a simple twist on LoRA: the adapter activates only on <mask> tokens, leaving all other tokens untouched. This ensures the model’s
1
0
5
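Reusing the hypothetical GatedLoRALinear sketch from the 11/12 note above, the "untouched" property is easy to check: with the gate at 0, the adapter term vanishes exactly, so non-mask positions reproduce the frozen model's outputs.

```python
import torch
import torch.nn as nn

layer = GatedLoRALinear(nn.Linear(64, 64), r=1)       # sketch class defined earlier
nn.init.normal_(layer.B.weight)                       # pretend the adapter has been trained
x = torch.randn(5, 64)                                # a 5-token sequence
gate = torch.tensor([[0.], [0.], [1.], [1.], [0.]])   # positions 2-3 are <mask>
base_out = layer.base(x)
out = layer(x, gate)
assert torch.allclose(out[0], base_out[0])            # non-mask token: untouched
assert not torch.allclose(out[2], base_out[2])        # <mask> token: adapter applied
```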
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 5/12 Sampling coherent sequences The <mask> tokens give us a distribution over future tokens, but we need to sample from it to create coherent sequences. To do this, we train a sampler head—a simple 2-layer perceptron. The blue token (in the figure) is generated just like
1
0
3
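One plausible shape for such a head, with the input features chosen as an assumption (the tweet only specifies a 2-layer perceptron): condition each masked position's hidden state on the embedding of the previously sampled token, so consecutive samples stay coherent.

```python
import torch
import torch.nn as nn

class SamplerHead(nn.Module):
    """2-layer MLP: (hidden state, prev token embedding) -> vocab logits."""
    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.GELU(),
            nn.Linear(d_model, vocab_size))

    def forward(self, hidden, prev_tok_emb):
        return self.mlp(torch.cat([hidden, prev_tok_emb], dim=-1))
```

At generation time the head would be applied once per <mask> slot, feeding each sampled token's embedding into the next slot.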
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 4/12 Training and generation with <mask> tokens We fine-tune the AR model with Gated LoRA layers (more on that soon). During training, we insert <mask> tokens in place of future tokens (shown in yellow below). The model learns to predict them accurately. At generation time,
1
0
3
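A toy sketch of how such masked training examples could be constructed; the mask token id and the insertion scheme are assumptions for illustration, not the paper's recipe.

```python
import torch

MASK_ID = 32000                                       # hypothetical <mask> token id

def make_masked_example(tokens, prefix_len, k=8):
    """Visible prefix + k <mask> slots; labels are the k held-out future tokens."""
    inputs = torch.cat([tokens[:prefix_len],
                        torch.full((k,), MASK_ID, dtype=tokens.dtype)])
    labels = tokens[prefix_len:prefix_len + k]        # targets at the mask positions
    return inputs, labels
```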
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 3/12 Converting AR model to multi-token predictor
We augmented a standard AR model with a few lightweight components to leverage its knowledge of future tokens:
1️⃣ Treat <mask> tokens as future tokens to predict
2️⃣ Add a sampler head to generate coherent multi-token sequences
1
0
2
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 2/12 AR training: the unsung hero with a blessing and a curse AR training made LLMs possible—it's simple, scalable, and needs no labeled data. But at inference, it’s costly: every token needs a full model pass. Although AR models are trained to predict one token at a time,
1
0
3
@MFarajtabar
Mehrdad Farajtabar
3 months
🧵 1/12 Your LLM Knows the Future: Revealing its Multi-token Prediction Capabilities Autoregressive (AR) models power today's LLMs by predicting one token at a time. But what if they could see into the future? In our latest work, we show how to turn AR-trained models into
6
23
155