Mehrdad Farajtabar
@MFarajtabar
Followers
9K
Following
479
Media
54
Statuses
202
Research Scientist at @Apple, prev @DeepMind, prev @GeorgiaTech
Seattle Area
Joined January 2021
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who:
- Is available to start by the end of this year
- Holds a PhD or will graduate by year-end
- Has 3-5 …
9
31
257
One usually gets a PhD to become an expert! Then we combine experts to form a Mixture-of-Experts (MoE) — gaining efficiency through specialization. But what if you could educate your MoE even further? In our latest work, we show that you can push the boundaries of #efficient …
0
0
4
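For readers unfamiliar with the MoE idea mentioned above, here is a minimal, generic top-1 routing layer in PyTorch. It only illustrates "efficiency through specialization" (each token is processed by a single small expert); it is not the method from the paper the tweet refers to, and the class name and dimensions are made up.

```python
# Generic top-1 Mixture-of-Experts layer: each token is routed to one small
# expert, so only a fraction of the parameters is active per token.
# Purely illustrative; not the architecture from the paper in the tweet above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)      # scores each token per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                   # x: (batch, seq, d_model)
        probs = F.softmax(self.router(x), dim=-1)           # routing distribution
        top_p, top_i = probs.max(dim=-1)                    # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_i == e                                 # tokens routed to expert e
            if sel.any():
                out[sel] = top_p[sel].unsqueeze(-1) * expert(x[sel])
        return out

# Usage: route a batch of 2 sequences of length 16 through 4 experts.
moe = TinyMoE(d_model=64, d_hidden=128, num_experts=4)
y = moe(torch.randn(2, 16, 64))                              # -> (2, 16, 64)
```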
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
53
310
963
This is an unwise statement that can only confuse people about what LLMs can and cannot do. Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problem. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some …
Claim: gpt-5-pro can prove new, interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof; it's correct. Details below.
252
233
2K
Apple research just revealed a way to make LLMs 5.35x faster. 🤯 That’s not a typo. They've found a method to get a >500% speedup for code & math tasks, with ZERO quality loss. Here's how they're unlocking AI models' "latent potential": 🧵
18
83
565
📢 Submissions are now open for the #NeurIPS2025 CCFM workshop.
Submission deadline: August 22, 2025, AoE.
Website: https://t.co/oIrrtiRKD6
Call for papers: https://t.co/9sUoMl7AJg
Submission link: https://t.co/2aXHQaqFDf
Is your AI keeping up with the world? Announcing the #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates.
When/Where: Dec. 6-7, San Diego
Submission deadline: Aug. 22, 2025 (opening soon!)
https://t.co/oIrrtiRcNy
#FoundationModels #ContinualLearning
0
6
11
I noticed the same thing! Engaging in conversations, replies, or DMs with #DeepMind folks always feels safe and welcoming. Their culture is truly remarkable. Thanks to leaders like Samy Bengio, Devi Krishna, Daphne Luong, JG, and many others who've joined Apple, this incredible …
Personal observation: The level of intellectual discussion with @GoogleDeepMind vs @OpenAI that I am able to have is literally night and day. DeepMind knows my work, can raise serious objections, propose and develop alternatives, etc. OpenAI speaks to me with insulting memes
0
0
15
🧵 12/12 What’s next? Our method is just one way to unlock future-token knowledge in AR models. We hope to see new ideas build on this! Diffusion LMs explore the opposite extreme—fully non-AR—but suffer from slow inference. Multi-token prediction may be the sweet spot. 🔄✨
0
0
3
🧵 11/12 Tiny changes, big gains
We add two lightweight components:
Gated LoRA (on each Linear layer)
Sampler head (on final transformer output)
Memory overhead? Minimal. Even LoRA rank=1 yields a speedup—proof that the AR model already knows the future. You just have to ask. 👀
1
0
3
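A quick back-of-the-envelope check on the "minimal memory overhead" claim: a rank-r LoRA adapter on a d_in x d_out linear layer adds only r*(d_in + d_out) parameters. The layer shapes below describe a hypothetical 4096-dim transformer block, not the actual model from the paper.

```python
# Rough parameter-overhead estimate for rank-1 LoRA on every linear layer of a
# hypothetical 4096-dim transformer block (shapes are assumptions, not the
# paper's model). A rank-r adapter adds two matrices: (d_in x r) and (r x d_out).
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

d = 4096
# attention q/k/v/o projections plus the MLP up/down projections
linear_shapes = [(d, d)] * 4 + [(d, 4 * d), (4 * d, d)]
base_params = sum(i * o for i, o in linear_shapes)
lora_extra = sum(lora_params(i, o, rank=1) for i, o in linear_shapes)
print(f"rank-1 LoRA adds {lora_extra:,} params per block "
      f"({lora_extra / base_params:.4%} of the block's linear weights)")
# -> roughly 0.04% extra parameters, which is why rank=1 is so cheap.
```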
🧵 10/12 Building speed, step by step 🛠️
Our design improves in layers, each adding speedup (shown in the figure):
Linear speculative decoding → light blue
Quadratic decoding → yellow boost
Sampler head → dark blue
LCM loss → olive green
Each step stacks more gains. 📈
1
0
3
🧵 9/12 Speedup over different tasks. We trained a model to predict 8 future tokens at once—and saw 1.5× to 5× speedups, depending on the task. More predictable domains (like code & math) get the biggest gains. And the best part? No quality drop, thanks to gated LoRA
1
0
3
🧵 8/12 Latent Consistency Matching (LCM) loss
We add an extra loss that encourages <mask> predictions to align with the AR model’s next-token predictions. This improves inference speedups by distilling knowledge from the AR model (teacher) to the multi-token predictor.
1
0
3
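As a hedged sketch of what such a consistency/distillation term could look like: a KL penalty pulling the student's distribution at each <mask> position toward the frozen AR teacher's next-token distribution for the same position. Whether the paper's LCM loss matches logits or hidden states, and how it is weighted, is not stated in the tweet, so treat these details as assumptions.

```python
# Illustrative consistency/distillation loss: KL(teacher || student) over the
# <mask> positions, with the AR teacher detached so only the adapters learn.
# The exact form of the paper's LCM loss may differ; this is an assumption.
import torch
import torch.nn.functional as F

def consistency_loss(student_mask_logits: torch.Tensor,
                     teacher_next_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    # student_mask_logits: (num_masks, vocab) predictions at <mask> slots
    # teacher_next_logits: (num_masks, vocab) AR model's step-by-step predictions
    log_p_student = F.log_softmax(student_mask_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_next_logits.detach() / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```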
🧵 7/12 Speculative decoding & multi-token prediction
When generating multiple tokens, some might not match what the AR model would produce step-by-step. How do we catch and reject these? We use speculative decoding: generate extra tokens at step T, then verify or reject them at …
2
0
2
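To give a concrete picture of the verify-or-reject step, here is a minimal greedy speculative-decoding check: run one forward pass over context plus drafted tokens, keep the longest prefix of the draft that agrees with what the base model would have emitted, and take the model's own token at the first mismatch. The `model(...).logits` convention and the greedy (argmax) acceptance rule are assumptions, not necessarily the paper's exact procedure.

```python
# Minimal greedy verification for speculative decoding (an assumption-level
# sketch). One forward pass scores the whole draft; the longest matching prefix
# is accepted, the rest is rejected.
import torch

@torch.no_grad()
def verify_draft(model, prefix_ids: torch.Tensor, draft_ids: torch.Tensor):
    # prefix_ids: (1, T) accepted context; draft_ids: (1, k) speculated tokens
    full = torch.cat([prefix_ids, draft_ids], dim=1)
    logits = model(full).logits                                   # (1, T + k, vocab)
    # The prediction for position T + i lives at logits position T + i - 1.
    preds = logits[:, prefix_ids.size(1) - 1 : -1].argmax(dim=-1) # (1, k)
    accepted = 0
    for i in range(draft_ids.size(1)):
        if preds[0, i] != draft_ids[0, i]:
            break                                                 # first disagreement: reject the tail
        accepted += 1
    # The base model's own token at the break point comes "for free"
    # (empty if the whole draft was accepted).
    correction = preds[:, accepted : accepted + 1]
    return draft_ids[:, :accepted], correction
```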
🧵 6/12 Preserving generation quality with Gated LoRA
Remember Gated LoRA? We fine-tune these modules to help the model fill in <mask> tokens. It’s a simple twist on LoRA: the adapter activates only on <mask> tokens, leaving all other tokens untouched. This ensures the model’s …
1
0
5
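A sketch of how such a gated adapter could be written, under the assumption described above: the low-rank update is multiplied by a 0/1 gate that is 1 only at <mask> positions, so outputs at ordinary tokens are exactly those of the frozen base layer. The class name, argument names, and initialization choices are hypothetical.

```python
# Hypothetical gated-LoRA wrapper: the low-rank update fires only where the
# gate is 1 (the <mask> positions), leaving every other token's output equal
# to the frozen base layer's output.
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 1, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)             # adapter starts as an exact no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in); gate: (batch, seq), 1.0 at <mask> tokens, 0.0 elsewhere
        return self.base(x) + gate.unsqueeze(-1) * self.scale * self.B(self.A(x))
```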
🧵 5/12 Sampling coherent sequences
The <mask> tokens give us a distribution over future tokens, but we need to sample from it to create coherent sequences. To do this, we train a sampler head—a simple 2-layer perceptron. The blue token (in the figure) is generated just like …
1
0
3
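One plausible reading of the "simple 2-layer perceptron" is sketched below: condition the head on the hidden state at a <mask> position together with the embedding of the token just sampled, so consecutive samples stay coherent. The exact inputs are not spelled out in the tweet, so this conditioning is an assumption.

```python
# Two-layer MLP sampler head (input conditioning is an assumption): it sees the
# hidden state of a <mask> slot plus the embedding of the previously sampled
# token and emits logits for the next future token.
import torch
import torch.nn as nn

class SamplerHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, vocab_size),
        )

    def forward(self, mask_hidden: torch.Tensor, prev_token_emb: torch.Tensor) -> torch.Tensor:
        # mask_hidden, prev_token_emb: (batch, d_model) -> logits: (batch, vocab_size)
        return self.mlp(torch.cat([mask_hidden, prev_token_emb], dim=-1))
```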
🧵 4/12 Training and generation with <mask> tokens
We fine-tune the AR model with Gated LoRA layers (more on that soon). During training, we insert <mask> tokens in place of future tokens (shown in yellow below). The model learns to predict them accurately. At generation time, …
1
0
3
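To make the training recipe concrete, here is a toy version of the data construction it describes: pick a position, replace the k ground-truth tokens after it with a <mask> id, and supervise only those slots. The helper name, the single-sequence shape, and the use of -100 as an "ignore" label are illustrative assumptions.

```python
# Toy construction of one training example: future tokens after `pos` are
# replaced by <mask> ids, and only those positions carry labels. The helper
# and its conventions (e.g., -100 = ignored by cross-entropy) are assumptions.
import torch

def insert_masks(input_ids: torch.Tensor, mask_id: int, k: int, pos: int):
    # input_ids: (seq,) ground-truth token ids; pos: where "generation" stops
    masked = input_ids.clone()
    labels = torch.full_like(input_ids, -100)
    end = min(pos + k, input_ids.size(0))
    labels[pos:end] = input_ids[pos:end]          # supervise only the masked slots
    masked[pos:end] = mask_id                     # model sees <mask> in place of the future
    return masked, labels

ids = torch.arange(10)                            # pretend token ids 0..9
masked, labels = insert_masks(ids, mask_id=999, k=4, pos=5)
# masked -> [0,1,2,3,4,999,999,999,999,9]; labels -> [-100]*5 + [5,6,7,8] + [-100]
```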
🧵 3/12 Converting AR model to multi-token predictor
We augmented a standard AR model with a few lightweight components to leverage its knowledge of future tokens:
1️⃣ Treat <mask> tokens as future tokens to predict
2️⃣ Add a sampler head to generate coherent multi-token sequences
1
0
2
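At inference, the idea in the tweet above reduces to something like the sketch below: append k <mask> ids to the context, run one forward pass, and read off the predictions at those k positions as draft future tokens. The HuggingFace-style `.logits` attribute and the greedy readout are assumptions made for illustration.

```python
# One-pass drafting sketch: append k <mask> tokens and take the model's
# predictions at those positions as candidate future tokens.
import torch

@torch.no_grad()
def draft_k_tokens(model, input_ids: torch.Tensor, mask_id: int, k: int) -> torch.Tensor:
    # input_ids: (1, T) current context -> returns (1, k) drafted token ids
    masks = torch.full((1, k), mask_id, dtype=input_ids.dtype, device=input_ids.device)
    logits = model(torch.cat([input_ids, masks], dim=1)).logits
    return logits[:, -k:].argmax(dim=-1)          # greedy readout at the <mask> slots
```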
🧵 2/12 AR training: the unsung hero with a blessing and a curse
AR training made LLMs possible—it's simple, scalable, and needs no labeled data. But at inference, it’s costly: every token needs a full model pass. Although AR models are trained to predict one token at a time, …
1
0
3
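The "costly at inference" point can be seen directly in a vanilla greedy decoding loop: each new token requires another full forward pass over the model. The KV-cache optimization is omitted for brevity, and `model(...).logits` is again a HuggingFace-style assumption.

```python
# Vanilla autoregressive greedy decoding: one model call per generated token,
# which is exactly the per-token cost the tweet calls the "curse" of AR models.
import torch

@torch.no_grad()
def greedy_decode(model, input_ids: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    for _ in range(num_new_tokens):
        logits = model(input_ids).logits                      # full forward pass
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=1)    # grow the context by one
    return input_ids
```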
🧵 1/12 Your LLM Knows the Future: Revealing its Multi-token Prediction Capabilities
Autoregressive (AR) models power today's LLMs by predicting one token at a time. But what if they could see into the future? In our latest work, we show how to turn AR-trained models into …
6
23
155