Sachin Profile
Sachin

@sacmehtauw

Followers: 733 · Following: 209 · Media: 24 · Statuses: 155

Staff Research Scientist, GenAI@Meta and Affiliate Assistant Professor @UW. Opinions are my own.

Seattle, WA
Joined June 2019
@sacmehtauw
Sachin
2 years
We've just published CoreNet. A few highlights: ⚡️OpenELM, a new efficient language model that optimizes parameters for accuracy with fewer tokens using layer-wise scaling. ⚡️MLX model conversion and inference. ⚡️Wide array of vision and language models with SOTA training recipes
1 reply · 8 reposts · 47 likes
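The layer-wise scaling mentioned above allocates parameters non-uniformly across transformer layers instead of repeating identical blocks. A rough sketch of the idea (the interpolation ranges and head schedule below are illustrative assumptions, not OpenELM's published hyperparameters):

```python
def layerwise_configs(num_layers, dim, head_dim=64,
                      ffn_mult=(0.5, 4.0), head_frac=(0.5, 1.0)):
    """Linearly interpolate FFN width and head count from the first
    to the last transformer layer (early layers thin, late layers wide)."""
    cfgs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        mult = ffn_mult[0] + t * (ffn_mult[1] - ffn_mult[0])
        frac = head_frac[0] + t * (head_frac[1] - head_frac[0])
        cfgs.append({
            "ffn_dim": int(round(dim * mult)),
            "n_heads": max(1, int(round(dim * frac / head_dim))),
        })
    return cfgs
```

The total parameter budget stays comparable to a uniform stack, but more of it lands in the deeper layers.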
@FahimTajwar10
Fahim Tajwar
6 months
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n
21 replies · 147 reposts · 839 likes
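One common way to instantiate a self-provided reward, and a plausible reading of the thread, is to reward agreement with a majority vote over the model's own samples. A toy sketch (the real SRT recipe is in the paper; this is illustrative only):

```python
from collections import Counter

def self_rewards(sampled_answers):
    """Given several sampled answers to the same question, use the
    majority answer as a pseudo-label and reward each sample by
    whether it matches it (1.0) or not (0.0)."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]
```

These per-sample rewards can then feed any standard RL objective in place of a ground-truth verifier.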
@sacmehtauw
Sachin
8 months
Ready, set, innovate! #Llama4 is out now! 👇
@Ahmad_Al_Dahle
Ahmad Al-Dahle
8 months
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
0 replies · 0 reposts · 2 likes
@sriniiyer88
Srini Iyer
11 months
We're hiring PhD interns for Summer 2025 in Seattle to work with us on improving BLT even more! If this is something that excites you, reach out to me via DM or email ASAP!
@AIatMeta
AI at Meta
11 months
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ https://t.co/0iamZCRnMN
4 replies · 29 reposts · 314 likes
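BLT replaces fixed tokenization with dynamic byte patches, spending more compute where the byte stream is harder to predict. A toy sketch of entropy-driven patching, using a sliding-window entropy estimate as a stand-in for the paper's learned byte-level language model:

```python
import math
from collections import Counter

def segment_into_patches(data: bytes, threshold: float = 2.0, window: int = 8):
    """Toy stand-in for BLT-style dynamic patching: start a new patch
    whenever a local entropy estimate (here, Shannon entropy of a
    sliding byte window) exceeds a threshold."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        ctx = data[max(0, i - window):i + 1]
        counts = Counter(ctx)
        n = len(ctx)
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        if current and entropy > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches
```

Predictable runs of bytes collapse into long patches, while high-entropy regions are split finely, which is where the claimed inference-efficiency gains come from.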
@OncelTuzel
Oncel Tuzel
1 year
Join our team! Applications are open until December 16. Submit your application through the portal below, and feel free to send me a message afterward. This position is also available in Cupertino!
@OncelTuzel
Oncel Tuzel
1 year
Our Machine Learning Research (MLR) team at #Apple is seeking a passionate AI resident to conduct research on multi-modal generative models (vision, 3D, language, audio) and to explore effective control mechanisms for these models. Application details: https://t.co/NwwTeLYoGX
0 replies · 3 reposts · 12 likes
@jwthickstun
John Thickstun
1 year
I am recruiting PhD students for Fall '25 at Cornell! I plan to admit multiple students interested in building more controllable generative models, music technologies (or both!). 🎶 Please apply to @Cornell_CS.
3 replies · 47 reposts · 249 likes
@pyautogen
AutoGen
1 year
To our collaborators & community: We’ve seen questions about AutoGen forks/clones vs. the official project. Here's a summary of the latest. Please share with others. - The official repo is https://t.co/9OI3SK56zE. - We're actively working on AutoGen v0.2, with v0.4 innovations…
github.com
A programming framework for agentic AI. Contribute to microsoft/autogen development by creating an account on GitHub.
9 replies · 24 reposts · 55 likes
@bansalg_
Gagan Bansal
1 year
Excited to finally release Magentic-One! The thing I love about this multi-agent team is that the same implementation achieves very strong performance across three challenging agentic benchmarks. If you are someone working on agentic systems, you know how challenging this can…
@pyautogen
AutoGen
1 year
📢Introducing Magentic-One, a generalist 5-agent multi-agent system for solving open-ended web- and file-based tasks. 🤖🤖🤖🤖🤖 Magentic-One represents a significant step towards agents that can complete tasks that people encounter in their daily lives and can achieve strong…
2 replies · 6 reposts · 40 likes
@sacmehtauw
Sachin
1 year
We’ve released QUANTIZED Llama 3.2 1B/3B models. ⚡️FAST and EFFICIENT: 1B decodes at ~50 tok/s on a MOBILE PHONE CPU. ⚡️As ACCURATE as full-precision models. ⚡️Ready to CONSUME on mobile devices. Looking forward to on-device experiences these models will enable! Read more👇
@AIatMeta
AI at Meta
1 year
We want to make it easier for more people to build with Llama — so today we’re releasing new quantized versions of Llama 3.2 1B & 3B that deliver up to 2-4x increases in inference speed and, on average, 56% reduction in model size, and 41% reduction in memory footprint. Details…
0 replies · 7 reposts · 17 likes
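For intuition, the size and memory savings come from storing low-bit integers plus a scale factor instead of full-precision floats. A minimal symmetric int8 sketch (the released models use a more sophisticated quantization recipe than this):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: store int8 values
    plus one float scale; dequantize as q * scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]
```

Each weight shrinks from 4 bytes to 1 (or half a byte at 4-bit), at the cost of a small, bounded rounding error.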
@KeivanAlizadeh2
Keivan Alizadeh
1 year
Hey guys, I'm gonna present LLM in a Flash at ACL 2024. Hit me up if you are in Bangkok. https://t.co/t67MbvpPOO Updates from the previous version: - Llama 2 results - Some results on Apple GPUs (Metal) - Speculative decoding - Memory-latency tradeoff - Impact of longer generation
arxiv.org
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory...
0 replies · 6 reposts · 45 likes
@sacmehtauw
Sachin
1 year
Today is my last day @Apple. It's been a fantastic journey working on projects like OpenELM, CoreNet, and MobileViT with such a talented team. I’m excited for what’s next but will always remember the incredible experiences here. 🍎 #GoodbyeApple #NextChapter
2 replies · 1 repost · 96 likes
@_akhaliq
AK
1 year
Apple presents LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a…
4 replies · 70 reposts · 296 likes
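LazyLLM's premise is that not every prompt token needs to flow through every layer during prefill. A toy sketch of the core operation, ranking token positions by an importance score and keeping only the top fraction (the scores and keep ratio here are placeholders, not the paper's actual criterion):

```python
def prune_tokens(importance, keep_ratio=0.5):
    """Toy sketch of dynamic token pruning: rank prompt positions by a
    per-token importance score (e.g. derived from attention weights)
    and keep only the top fraction for subsequent layers."""
    k = max(1, int(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])  # preserve original order of kept positions
```

Dropping low-importance positions shrinks the work done in later layers of the prefill stage, which dominates latency for long prompts.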
@HPouransari
Hadi Pouransari
1 year
🤔 Wondering how to leverage large foundation models to train small on-device task-specific models? Check out our ICML paper or stop by the poster next Thursday at 11:30am in Vienna. paper: https://t.co/VRqk6jXNYL dataset: https://t.co/F6wR9VkppT #Apple @ #ICML:
0 replies · 6 reposts · 15 likes
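One standard way to leverage a large foundation model when training a small task-specific model is knowledge distillation; the paper's actual method may differ, but the basic loss looks like this (pure-Python sketch):

```python
import math

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """Minimal knowledge-distillation sketch: cross-entropy between
    temperature-softened teacher and student distributions."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp((x - m) / temperature) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The small model is trained to match the teacher's soft predictions rather than only hard labels, which transfers more of the teacher's knowledge per example.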
@sriniiyer88
Srini Iyer
2 years
Excited to release our work from last year showcasing a stable training recipe for fully token-based multi-modal early-fusion auto-regressive models! https://t.co/H0wOurpeuC Huge shout out to @ArmenAgha @ramakanth1729 @LukeZettlemoyer @gargighosh and other co-authors. (1/n)
arxiv.org
We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training...
4 replies · 28 reposts · 102 likes
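Early fusion here means images and text share one token sequence from the start, rather than being combined later via cross-attention. A toy sketch of building such a sequence (the BOI/EOI image-delimiter token IDs below are hypothetical):

```python
def build_mixed_modal_sequence(segments, boi=9000, eoi=9001):
    """Toy sketch of early fusion: text and image segments are both
    represented as discrete tokens and concatenated into one sequence;
    hypothetical BOI/EOI tokens delimit each image span."""
    seq = []
    for kind, tokens in segments:
        if kind == "image":
            seq.append(boi)
            seq.extend(tokens)  # e.g. codes from a learned image tokenizer
            seq.append(eoi)
        else:
            seq.extend(tokens)  # ordinary text tokens
    return seq
```

A single autoregressive model over this flat sequence can then both understand and generate either modality in any order.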
@anikembhavi
Ani Kembhavi
2 years
Friendly reminder that if you are looking to get into mixed-modal early-fusion foundation models & are looking for training and inference code, open model weights and benchmarking across a suite of vision and NLP benchmarks, please take a look at our work from @allen_ai 2 years…
@AIatMeta
AI at Meta
2 years
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ https://t.co/JQZHig977O
0 replies · 10 reposts · 55 likes
@pcuenq
Pedro Cuenca
2 years
OpenELM (small 270M version) converted to Core ML, running on my M1 at 56 tok/s.
5 replies · 20 reposts · 135 likes
@ClementDelangue
clem 🤗
2 years
Apple OpenELM is now #1 trending model on HF! https://t.co/KNtnCiHXBl
7 replies · 25 reposts · 177 likes
@sacmehtauw
Sachin
2 years
Interesting!
@awnihannun
Awni Hannun
2 years
Run Apple's new OpenELM models in MLX LM thanks to @Prince_Canuma: pip install -U mlx-lm. The 270M model in 16-bit runs quite fast on an 8GB M2 Mini (512 tokens at 115 toks/sec). Also pretty good quality for the size:
0 replies · 1 repost · 3 likes
@sacmehtauw
Sachin
2 years
Like OpenELM, CatLIP is also "Open" https://t.co/66apun79Xm
github.com
CoreNet: A library for training deep neural networks - apple/corenet
@_akhaliq
AK
2 years
Apple presents CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data. Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text…
0 replies · 6 reposts · 15 likes
@sacmehtauw
Sachin
2 years
⚡️Yesterday: OpenELM (open LLM). Today: CatLIP (Open vision FM). 💡Embrace faster vision pre-training! By reframing pre-training on image-text as a classification task, we cut training time by 2.7× compared to CLIP. 🎯Match accuracy of private (JFT-3B) dataset-trained VFMs.
0 replies · 1 repost · 1 like
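The reframing described above can be sketched in a few lines: instead of contrasting image and text embeddings across a batch, each caption is converted into a multi-label classification target over a fixed label vocabulary (CatLIP derives labels from WordNet synsets; the simple word-matching below is a toy stand-in):

```python
def captions_to_multilabel_targets(captions, vocab):
    """Toy sketch of CatLIP's idea: map each caption to a binary
    multi-label target over a fixed label vocabulary, so image
    pre-training becomes plain classification with no contrastive
    pairing across the batch."""
    targets = []
    for cap in captions:
        words = set(cap.lower().split())
        targets.append([1 if label in words else 0 for label in vocab])
    return targets
```

Because there is no text encoder and no pairwise similarity matrix at training time, each step is cheaper, which is where the claimed 2.7x pre-training speedup comes from.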