
Akshat Shrivastava
@AkshatS07
Followers
758
Following
1K
Media
19
Statuses
142
Co-founder & CTO @perceptroninc; ex Research Scientist @MetaAI (FAIR, AR, Assistant)
Seattle, WA
Joined October 2013
1/n I’m excited to share our new venture, Perceptron AI. With the advancements we have made with AI in the digital world, Perceptron sets out to build for the real world.
We are Perceptron AI, a new foundation model company from @ArmenAgha, @AkshatS07. Foundation models transformed the digital realm, now it’s time for the physical world. We’re building the first foundation models designed for real-time, multi-modal intelligence across the real world.
4
10
45
RT @kilian_maciej: very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. s….
0
1
0
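For context on the EMA-plus-constant-LR recipe mentioned in the RT above, here is a minimal sketch in plain PyTorch (a hypothetical toy setup, not the referenced work): train with a constant learning rate and keep an exponential moving average of the weights, evaluating with the averaged copy instead of decaying the LR. The toy model and `ema_decay` value are illustrative assumptions.

```python
import copy
import torch
from torch import nn

# Toy regression model; stands in for a diffusion (or any) backbone.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
ema_model = copy.deepcopy(model)            # shadow copy used at eval time
for p in ema_model.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # constant LR, no schedule
ema_decay = 0.999

@torch.no_grad()
def update_ema(online: nn.Module, ema: nn.Module, decay: float) -> None:
    # ema <- decay * ema + (1 - decay) * online, parameter by parameter
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

for step in range(1000):
    x = torch.randn(32, 8)
    y = x.sum(dim=1, keepdim=True)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_ema(model, ema_model, ema_decay)  # weight averaging stands in for LR decay
```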
When @kilian_maciej and I first started talking about alignment and parameterization, he introduced several ideas presented in this blog post. As we continue to scale foundation models (esp. multimodal), and with data-aware, scale-aware parameterization becoming more prevalent…
@mm_wojnar and I have been playing around with tensor alignments in neural networks. here’s a summary of our exploration. we go into neural net parameterizations, measuring tensor alignments, and we develop a dynamic maximal learning rate scheduler which factors in alignment
0
0
3
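The linked post has the actual method; as a rough illustration of what “measuring tensor alignment” can mean, the sketch below computes the cosine similarity between a weight matrix and its gradient, then shrinks a base learning rate when the two are strongly aligned. The alignment metric and the `scaled_lr` rule are simplified stand-ins of my own, not the scheduler from the post.

```python
import torch
from torch import nn

def alignment(weight: torch.Tensor, update: torch.Tensor) -> float:
    # Cosine similarity between the flattened weight and its update;
    # ~0 means an incoherent/random update, ~1 means a fully aligned one.
    w, u = weight.flatten(), update.flatten()
    return torch.nn.functional.cosine_similarity(w, u, dim=0).abs().item()

def scaled_lr(base_lr: float, align: float, strength: float = 10.0) -> float:
    # Toy rule: lower the maximal LR as alignment grows, since aligned
    # updates move the weights further per step than random ones.
    return base_lr / (1.0 + strength * align)

layer = nn.Linear(128, 128)
x = torch.randn(64, 128)
loss = layer(x).pow(2).mean()
loss.backward()

a = alignment(layer.weight.data, layer.weight.grad)
lr = scaled_lr(1e-2, a)
print(f"alignment={a:.4f}, lr={lr:.5f}")
```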
Excited to see further studies into early fusion vs late fusion models, in particular a great analysis of multimodal MoEs that aligns with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented…
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- How can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
1
8
37
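To make the early- vs late-fusion distinction above concrete, here is a minimal hypothetical sketch: early fusion mixes image and text tokens in one shared transformer, while late fusion encodes each modality separately and only merges at the head. Module names and dimensions are illustrative, not taken from the study.

```python
import torch
from torch import nn

D = 256  # shared hidden size (illustrative)

class EarlyFusion(nn.Module):
    # One transformer sees a single interleaved sequence of both modalities.
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, img_tokens, txt_tokens):
        fused = torch.cat([img_tokens, txt_tokens], dim=1)  # fuse at the input
        return self.trunk(fused).mean(dim=1)

class LateFusion(nn.Module):
    # Separate per-modality encoders; representations meet only at the head.
    def __init__(self):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.img_enc = nn.TransformerEncoder(make(), num_layers=2)
        self.txt_enc = nn.TransformerEncoder(make(), num_layers=2)
        self.head = nn.Linear(2 * D, D)

    def forward(self, img_tokens, txt_tokens):
        img = self.img_enc(img_tokens).mean(dim=1)
        txt = self.txt_enc(txt_tokens).mean(dim=1)
        return self.head(torch.cat([img, txt], dim=-1))  # fuse at the output

img, txt = torch.randn(2, 196, D), torch.randn(2, 32, D)
print(EarlyFusion()(img, txt).shape, LateFusion()(img, txt).shape)
```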
RT @kilian_maciej: fun debugging journey w/@AkshatS07: be careful around FP8 w. activation checkpointing. activation checkpointing works un….
0
11
0
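The RT above is truncated, so the exact bug isn’t shown, but the general hazard it points at can be demonstrated with plain PyTorch: `torch.utils.checkpoint` re-runs the forward during backward, so any state mutated in the forward (for example an FP8-style running amax/scale buffer) gets updated twice per training step. This is a generic illustration under that assumption, not the specific issue from the thread.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class FakeQuantLinear(nn.Module):
    # Stand-in for an FP8 linear with delayed scaling: it tracks a running
    # amax of its input, i.e. stateful work done on every forward call.
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.register_buffer("amax", torch.zeros(()))
        self.forward_calls = 0

    def forward(self, x):
        self.forward_calls += 1
        self.amax = torch.maximum(self.amax, x.detach().abs().max())
        return self.linear(x)

layer = FakeQuantLinear(16)
x = torch.randn(4, 16, requires_grad=True)

# Checkpointing discards activations and re-runs the forward during backward,
# so the stateful layer sees two forward calls for a single training step.
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
print("forward calls for one step:", layer.forward_calls)  # -> 2
```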
RT @ariG23498: Bringing Efficiency to LLMs with Fine-Tuning. LayerSkip, introduced in the 2024 paper by @m_elhoushi et al. (arXiv:2404.1671….
0
4
0
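As a rough sketch of the early-exit idea behind LayerSkip (my simplification, not the paper’s implementation): run only the first k transformer blocks and decode from the shared LM head for a cheap draft, keeping the full depth available for verification. The layer counts and module names below are illustrative assumptions.

```python
import torch
from torch import nn

class TinyLM(nn.Module):
    def __init__(self, vocab=1000, dim=128, depth=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        make = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.layers = nn.ModuleList([make() for _ in range(depth)])
        self.lm_head = nn.Linear(dim, vocab)  # shared head usable at every exit

    def forward(self, tokens, exit_layer=None):
        # exit_layer=k runs only the first k blocks ("early exit");
        # exit_layer=None runs the full depth.
        h = self.embed(tokens)
        depth = exit_layer if exit_layer is not None else len(self.layers)
        for block in self.layers[:depth]:
            h = block(h)
        return self.lm_head(h)

model = TinyLM()
tokens = torch.randint(0, 1000, (1, 16))
draft_logits = model(tokens, exit_layer=4)   # cheap draft from the early exit
full_logits = model(tokens)                  # full-depth pass for verification
print(draft_logits.shape, full_logits.shape)
```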
MoEs have been a key driver in improving performance for LLMs when memory is abundant, but what happens when we get to resource-constrained devices? Check out our latest work led by @huberpa91 exploring design decisions in making MoEs optimal for on-device deployment!
1/n Introducing CoSMoEs 🪐, a set of Compact Sparse Mixture of Experts at on-device scale 📱. In CoSMoEs, we explore how to enable Sparse Mixture of Experts for on-device inference, focusing on quality, memory, and latency. This work is done with my…
0
0
10
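For readers unfamiliar with the building block, here is a minimal top-k sparse MoE feed-forward layer in plain PyTorch. It shows the tension the on-device work is about: all experts stay resident in memory, but each token only pays compute for k of them. Sizes and the routing scheme are illustrative, not CoSMoEs’ design.

```python
import torch
from torch import nn

class SparseMoE(nn.Module):
    # Top-k token routing over small expert MLPs. Every expert lives in memory,
    # but each token is processed by only k of them.
    def __init__(self, dim=64, num_experts=8, k=2, hidden=128):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(SparseMoE()(x).shape)   # torch.Size([10, 64])
```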
RT @apoorvkh: I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environme….
0
8
0
RT @ArmenAgha: There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyo….
0
494
0
RT @jecdohmann: I’m very excited to announce that I’ll be joining @perceptroninc as a researcher and founding m….
0
10
0
Physical world modeling introduces a set of challenges around designing the right interaction space for our model and building the right/scalable data strategy. Reach out to hiring@perceptron.inc if you're interested!
We have 2 open roles @perceptroninc, in-person in Seattle:
- Full Stack Software Engineer
- Software Engineer (Data)
Send resumes to hiring@perceptron.inc.
0
3
17
RT @S32_VC: Thank you to everyone who joined our Breakfast at NeurIPS! Our speakers @zicokolter shared insights on transitioning from acad….
0
3
0
RT @gargighosh: Sharing new research from my team: 1) Dynamic Byte Latent Transformer - First byte-level model that matches current LLM perfo….
0
6
0
We (@perceptroninc) will be at NeurIPS! Would love to meet folks there; reach out if you want to chat!
0
1
16