Sachin (@sacmehtauw)
Staff Research Scientist, GenAI @ Meta and Affiliate Assistant Professor @UW. Opinions are my own.
Seattle, WA · Joined June 2019
Followers: 733 · Following: 209 · Media: 24 · Statuses: 155
We've just published CoreNet. A few highlights:
⚡️OpenELM, a new efficient language model that optimizes parameters for accuracy with fewer tokens using layer-wise scaling.
⚡️MLX model conversion and inference.
⚡️A wide array of vision and language models with SOTA training recipes.
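The layer-wise scaling mentioned above can be sketched in a few lines. This is a toy illustration, not OpenELM's actual configuration: the function names, the linear interpolation rule, and every constant below are assumptions. The point is only that per-layer widths grow with depth instead of repeating one identical block size.

```python
def layerwise_scaling(num_layers, min_mult=0.5, max_mult=4.0):
    """Return an FFN width multiplier per layer, linearly interpolated
    from min_mult (first layer) to max_mult (last layer)."""
    if num_layers == 1:
        return [min_mult]
    step = (max_mult - min_mult) / (num_layers - 1)
    return [min_mult + i * step for i in range(num_layers)]

def ffn_dims(d_model, num_layers, **kw):
    # Round each layer's FFN hidden size to a multiple of 16, as
    # hardware-friendly implementations typically do.
    return [16 * round(m * d_model / 16) for m in layerwise_scaling(num_layers, **kw)]

print(ffn_dims(1280, 4))  # → [640, 2128, 3632, 5120]
```

Under this toy rule, early layers get narrow FFNs and later layers wide ones, so a fixed parameter budget is spent unevenly across depth rather than uniformly.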
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n
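The idea of a model providing its own reward can be sketched with a majority-vote (self-consistency) signal. A minimal sketch under simplified assumptions: `self_reward` and its exact rule are illustrative stand-ins, not the paper's method; real SRT operates on sampled model generations during RL training.

```python
from collections import Counter

def self_reward(sampled_answers):
    """Reward each sampled answer by its agreement with the majority vote
    over all samples, so no ground-truth label is needed. Illustrative
    stand-in for a self-consistency-style reward."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]

# With no labels, answers agreeing with the consensus get reward 1.0:
print(self_reward(["42", "42", "17", "42"]))  # → [1.0, 1.0, 0.0, 1.0]
```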
Ready, set, innovate! #Llama4 is out now! 👇
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
We're hiring PhD interns for Summer 2025 in Seattle to work with us on improving BLT even further! If this excites you, reach out to me via DM or email ASAP!
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ https://t.co/0iamZCRnMN
Join our team! Applications are open until December 16. Submit your application through the portal below, and feel free to send me a message afterward. This position is also available in Cupertino!
Our Machine Learning Research (MLR) team at #Apple is seeking a passionate AI resident to conduct research on multi-modal generative models (vision, 3D, language, audio) and to explore effective control mechanisms for these models. Application details: https://t.co/NwwTeLYoGX
I am recruiting PhD students for Fall '25 at Cornell! I plan to admit multiple students interested in building more controllable generative models, music technologies (or both!). 🎶 Please apply to @Cornell_CS.
To our collaborators & community: We’ve seen questions about AutoGen forks/clones vs. the official project. Here's a summary of the latest. Please share with others.
- The official repo is https://t.co/9OI3SK56zE.
- We're actively working on AutoGen v0.2, with v0.4 innovations…
github.com: microsoft/autogen, a programming framework for agentic AI.
Excited to finally release Magentic-One! The thing I love about this multi-agent team is that the same implementation achieves very strong performance across three challenging agentic benchmarks. If you are someone working on agentic systems, you know how challenging this can…
📢Introducing Magentic-One, a generalist 5-agent multi-agent system for solving open-ended web- and file-based tasks. 🤖🤖🤖🤖🤖 Magentic-One represents a significant step towards agents that can complete tasks that people encounter in their daily lives and can achieve strong…
We’ve released QUANTIZED Llama 3.2 1B/3B models.
⚡️FAST and EFFICIENT: 1B decodes at ~50 tok/s on a MOBILE PHONE CPU.
⚡️As ACCURATE as full-precision models.
⚡️Ready to CONSUME on mobile devices.
Looking forward to the on-device experiences these models will enable! Read more👇
We want to make it easier for more people to build with Llama, so today we’re releasing new quantized versions of Llama 3.2 1B & 3B that deliver up to 2-4x increases in inference speed, an average 56% reduction in model size, and a 41% reduction in memory footprint. Details…
Hey guys, I'm going to present LLM in a Flash at ACL 2024. Hit me up if you are in Bangkok. https://t.co/t67MbvpPOO
Updates from the previous version:
- Llama 2 results
- Some results on Apple GPUs (Metal)
- Speculative decoding
- Memory-latency tradeoff
- Impact of longer generation
arxiv.org: Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory…
Today is my last day @Apple. It's been a fantastic journey working on projects like OpenELM, CoreNet, and MobileViT with such a talented team. I’m excited for what’s next but will always remember the incredible experiences here. 🍎 #GoodbyeApple #NextChapter
Apple presents LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a…
🤔 Wondering how to leverage large foundation models to train small on-device task-specific models? Check out our ICML paper or stop by the poster next Thursday at 11:30am in Vienna. paper: https://t.co/VRqk6jXNYL dataset: https://t.co/F6wR9VkppT
#Apple @ #ICML:
Excited to release our work from last year showcasing a stable training recipe for fully token-based multi-modal early-fusion auto-regressive models! https://t.co/H0wOurpeuC Huge shout out to @ArmenAgha @ramakanth1729 @LukeZettlemoyer @gargighosh and other co-authors. (1/n)
arxiv.org: We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training…
Friendly reminder: if you are looking to get into mixed-modal early-fusion foundation models and want training and inference code, open model weights, and benchmarking across a suite of vision and NLP benchmarks, please take a look at our work from @allen_ai 2 years…
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ https://t.co/JQZHig977O
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ https://t.co/JQZHig977O
OpenELM (small 270M version) converted to Core ML, running on my M1 at 56 tok/s.
Interesting!
Run Apple's new OpenELM models in MLX LM thanks to @Prince_Canuma:
pip install -U mlx-lm
The 270M model in 16-bit runs quite fast on an 8GB M2 Mini (512 tokens at 115 toks/sec). Also pretty good quality for the size:
Like OpenELM, CatLIP is also "Open" https://t.co/66apun79Xm
github.com: CoreNet, a library for training deep neural networks (apple/corenet).
Apple presents CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data. Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text…