Akshat Shrivastava Profile
Akshat Shrivastava

@AkshatS07

Followers: 758 · Following: 1K · Media: 19 · Statuses: 142

Co-founder & CTO @perceptroninc; ex Research Scientist @MetaAI (FAIR, AR, Assistant)

Seattle, WA
Joined October 2013
@AkshatS07
Akshat Shrivastava
8 months
1/n I’m excited to share our new venture, Perceptron AI. With the advancements we have made with AI in the digital world, Perceptron sets out to build for the real world.
@perceptroninc
Perceptron AI
8 months
We are Perceptron AI, a new foundation model company from @ArmenAgha, @AkshatS07. Foundation models transformed the digital realm; now it’s time for the physical world. We’re building the first foundation models designed for real-time, multi-modal intelligence across the real world.
Replies: 4 · Retweets: 10 · Likes: 45
@AkshatS07
Akshat Shrivastava
2 months
RT @kilian_maciej: very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. s…
Replies: 0 · Retweets: 1 · Likes: 0
@AkshatS07
Akshat Shrivastava
3 months
When @kilian_maciej and I first started talking about alignment and parameterization, he introduced several of the ideas presented in this blog post. As we continue to scale foundation models (esp. multimodal), and with data-aware, scale-aware parameterization becoming more prevalent, …
@kilian_maciej
Maciej Kilian
3 months
@mm_wojnar and I have been playing around with tensor alignments in neural networks. here’s a summary of our exploration. we go into neural net parameterizations, measuring tensor alignments, and we develop a dynamic maximal learning rate scheduler which factors in alignment
Replies: 0 · Retweets: 0 · Likes: 3
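A rough illustration of the kind of measurement the thread describes (my own sketch; the metric and the LR rule here are assumptions, not the blog post's definitions): alignment can be probed by comparing how much a weight update actually moves a layer's output against what independent random tensors would produce, and a maximal-LR rule can then shrink the cap as alignment grows.

```python
# Hypothetical sketch, not the blog post's code: probe how "aligned" an update
# delta_w is with the incoming activations x, then cap the LR accordingly.
import torch

def tensor_alignment(delta_w: torch.Tensor, x: torch.Tensor) -> float:
    """Ratio of the realized output change to the unaligned (random) expectation.
    ~1 means unaligned; up to ~sqrt(fan_in) for a fully aligned rank-1 update."""
    fan_in = delta_w.shape[1]
    realized = (delta_w @ x).norm()
    expected_random = delta_w.norm() * x.norm() / fan_in ** 0.5
    return (realized / (expected_random + 1e-12)).item()

def max_lr(base_lr: float, alignment: float) -> float:
    # Assumed rule: more alignment => a given LR moves the output more,
    # so scale the maximal LR down proportionally.
    return base_lr / max(alignment, 1.0)

w_update = torch.randn(256, 128) * 1e-3
acts = torch.randn(128)
a = tensor_alignment(w_update, acts)
print(a, max_lr(1e-2, a))
```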
@AkshatS07
Akshat Shrivastava
3 months
MoMa Paper:
Replies: 0 · Retweets: 0 · Likes: 1
@AkshatS07
Akshat Shrivastava
3 months
To keep a single loss function and a static graph during training we leverage expert-choice MoE; however, this breaks causality. We added auxiliary routers (detached from the LM) based on …
Replies: 1 · Retweets: 0 · Likes: 3
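Reading between the lines of the truncated tweet, one way such an auxiliary router could look, sketched under my own assumptions: expert-choice routing selects the top-k tokens per expert using the whole sequence, so a small per-token classifier is trained on detached hidden states to imitate those selections, giving a causal routing path at inference without letting the auxiliary loss shape the LM's representations.

```python
# Minimal sketch (names and shapes are assumptions, not the MoMa code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxCausalRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, n_experts)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [tokens, d_model]; one logit per expert, no cross-token information.
        return self.proj(h)

def aux_router_loss(router: AuxCausalRouter, h: torch.Tensor,
                    chosen: torch.Tensor) -> torch.Tensor:
    """chosen: [tokens, n_experts] 0/1 mask from expert-choice top-k selection.
    h is detached so this loss never backprops into the LM itself."""
    logits = router(h.detach())
    return F.binary_cross_entropy_with_logits(logits, chosen.float())

router = AuxCausalRouter(d_model=64, n_experts=4)
h = torch.randn(32, 64)
chosen = torch.rand(32, 4) < 0.25   # stand-in for the expert-choice mask
aux_router_loss(router, h, chosen).backward()
```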
@AkshatS07
Akshat Shrivastava
3 months
We noticed a similar trend of expert specialization when training from scratch (modality agnostic). We hypothesized that our router wasn’t able to learn higher-order features, as it was simultaneously learning representations and routing; we found upcycling a dense model even for …
Replies: 1 · Retweets: 0 · Likes: 2
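"Upcycling" here presumably refers to initializing the sparse model from a pretrained dense one, as in the sparse-upcycling literature; a minimal sketch under that assumption:

```python
# Minimal sketch, assuming "upcycling" = seeding every expert with the pretrained
# dense FFN's weights, so the router learns over already-useful representations.
import copy
import torch.nn as nn

def upcycle_dense_ffn(dense_ffn: nn.Module, n_experts: int) -> nn.ModuleList:
    """Each expert starts as an exact copy of the dense FFN; the router is fresh."""
    return nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(n_experts))

ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
experts = upcycle_dense_ffn(ffn, n_experts=4)
```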
@AkshatS07
Akshat Shrivastava
3 months
When controlling for experts, we found that modality-aware routing outperformed modality-agnostic routing. Scaling both modality and text experts, with hierarchical routing (routing first by modality and then learned routing within the modality-specific MoE), gave …
Replies: 1 · Retweets: 0 · Likes: 2
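A sketch of what such hierarchical routing might look like (shapes, names, and the top-1 rule are my assumptions for illustration): stage one partitions tokens deterministically by modality id, stage two is a learned router over that modality's own expert pool.

```python
# Illustrative two-stage router, not the MoMa implementation.
import torch
import torch.nn as nn

class HierarchicalMoE(nn.Module):
    def __init__(self, d_model: int, modalities, experts_per_modality: int):
        super().__init__()
        self.modalities = list(modalities)
        self.routers = nn.ModuleDict(
            {m: nn.Linear(d_model, experts_per_modality) for m in self.modalities})
        self.experts = nn.ModuleDict(
            {m: nn.ModuleList(nn.Linear(d_model, d_model)
                              for _ in range(experts_per_modality))
             for m in self.modalities})

    def forward(self, h: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # h: [n_tokens, d_model]; modality_ids: [n_tokens] ints into self.modalities.
        out = torch.zeros_like(h)
        for mid, name in enumerate(self.modalities):
            token_idx = (modality_ids == mid).nonzero(as_tuple=True)[0]
            if token_idx.numel() == 0:
                continue  # stage 1: deterministic split by modality, nothing learned
            top1 = self.routers[name](h[token_idx]).argmax(dim=-1)  # stage 2: learned
            for e, expert in enumerate(self.experts[name]):
                sel = token_idx[top1 == e]
                if sel.numel():
                    out[sel] = expert(h[sel])
        return out

moe = HierarchicalMoE(64, ["text", "image"], experts_per_modality=4)
h = torch.randn(10, 64)
ids = torch.tensor([0, 0, 1, 1, 0, 1, 0, 0, 1, 1])
print(moe(h, ids).shape)  # torch.Size([10, 64])
```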
@AkshatS07
Akshat Shrivastava
3 months
Note: our setup was a little different, as we focused on training models with a single loss function (no aux loss) and multimodal tokenizers (omni-style).
Replies: 2 · Retweets: 0 · Likes: 2
@AkshatS07
Akshat Shrivastava
3 months
Excited to see further studies into early-fusion vs. late-fusion models, in particular a great analysis of multimodal MoEs that aligns with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented…
@MustafaShukor1
Mustafa Shukor
3 months
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- How can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
Replies: 1 · Retweets: 8 · Likes: 37
@AkshatS07
Akshat Shrivastava
3 months
RT @kilian_maciej: stay with me now
Replies: 0 · Retweets: 4 · Likes: 0
@AkshatS07
Akshat Shrivastava
3 months
RT @kilian_maciej: fun debugging journey w/ @AkshatS07: be careful around FP8 w. activation checkpointing. activation checkpointing works un…
Replies: 0 · Retweets: 11 · Likes: 0
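The RT is truncated, but the likely gotcha (my reading, illustrated with a toy module rather than a real FP8 kernel) is that activation checkpointing re-runs the forward during backward, and any forward that mutates scaling state, as delayed-scaling FP8 recipes do, sees different scales on the recompute and produces silently mismatched gradients.

```python
# Toy illustration (not real FP8): a forward with mutable scaling state behaves
# differently when activation checkpointing re-executes it during backward.
import torch
from torch.utils.checkpoint import checkpoint

class StatefulScaler(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 1.0          # stand-in for FP8 delayed-scaling amax state

    def forward(self, x):
        y = x * self.scale
        self.scale *= 0.5         # mutated again during the recompute!
        return y

mod = StatefulScaler()
x = torch.randn(4, requires_grad=True)
y = checkpoint(mod, x, use_reentrant=False)  # forward #1 runs with scale = 1.0
y.sum().backward()                           # recompute runs with scale = 0.5
print(x.grad)  # 0.5s, not the 1.0 the real output used: silently wrong gradients
```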
@AkshatS07
Akshat Shrivastava
4 months
RT @ariG23498: Bringing Efficiency to LLMs with Fine-Tuning. LayerSkip, introduced in the 2024 paper by @m_elhoushi et al. (arXiv:2404.1671…
Replies: 0 · Retweets: 4 · Likes: 0
@AkshatS07
Akshat Shrivastava
4 months
MoEs have been a key driver in improving performance for LLMs when memory is abundant, but what happens when we get to resource-constrained devices? Check out our latest work led by @huberpa91 exploring design decisions in making MoEs optimal for on-device deployment!
@huberpa91
Patrick Huber
4 months
1/n Introducing CoSMoEs 🪐, a set of Compact Sparse Mixture of Experts at on-device scale 📱. In CoSMoEs, we explore how to enable Sparse Mixture of Experts for on-device inference, focusing on quality, memory, and latency. This work is done with my …
Replies: 0 · Retweets: 0 · Likes: 10
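The tension the tweet points at can be made concrete with back-of-envelope numbers (illustrative values, not from the CoSMoEs paper): MoEs decouple quality (total parameters) from per-token FLOPs (active parameters), but device RAM still has to hold every expert.

```python
# Illustrative arithmetic only; the dimensions below are assumptions.
def moe_memory_and_flops(d_model=1024, d_ff=4096, n_experts=8, top_k=2):
    expert_params = 2 * d_model * d_ff      # up- and down-projection of one expert
    total = n_experts * expert_params       # what on-device memory must hold
    active = top_k * expert_params          # what each token actually computes
    return total, active

total, active = moe_memory_and_flops()
print(f"resident params {total/1e6:.0f}M vs active per token {active/1e6:.0f}M")
```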
@AkshatS07
Akshat Shrivastava
5 months
RT @apoorvkh: I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environme…
Replies: 0 · Retweets: 8 · Likes: 0
@AkshatS07
Akshat Shrivastava
5 months
RT @ArmenAgha: There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyo…
Replies: 0 · Retweets: 494 · Likes: 0
@AkshatS07
Akshat Shrivastava
6 months
RT @jecdohmann: I’m very excited to announce that I’ll be joining @perceptroninc () as a researcher and founding m…
Replies: 0 · Retweets: 10 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
Physical world modeling introduces a set of challenges around designing the right interaction space for our model and building the right/scalable data strategy. Reach out to hiring@perceptron.inc if you're interested!
@ArmenAgha
Armen Aghajanyan
7 months
We have 2 open roles @perceptroninc, in-person in Seattle: Full Stack Software Engineer and Software Engineer (Data). Send resumes to hiring@perceptron.inc.
Replies: 0 · Retweets: 3 · Likes: 17
@AkshatS07
Akshat Shrivastava
7 months
RT @S32_VC: Thank you to everyone who joined our Breakfast at NeurIPS! Our speakers @zicokolter shared insights on transitioning from acad…
Replies: 0 · Retweets: 3 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
Been waiting for this one, a strong step in removing tokenization from LLMs. Congrats to the team!
@sriniiyer88
Srini Iyer
7 months
New paper! Byte-level models are finally competitive with tokenizer-based models, with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: (1/n)
Replies: 0 · Retweets: 3 · Likes: 19
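A sketch of the dynamic-patching idea as I understand it (the threshold rule and values below are illustrative assumptions; the paper's exact boundary criterion may differ): a small byte-level model scores next-byte entropy, and a new patch opens wherever entropy spikes, so hard-to-predict spans get finer granularity and more compute.

```python
# Entropy-based patching sketch; not the paper's implementation.
import torch

def dynamic_patches(byte_entropy: torch.Tensor, threshold: float = 2.0):
    """byte_entropy: [seq_len] next-byte entropies from a small byte LM.
    Returns (start, end) spans; a new patch opens at every high-entropy byte."""
    patches, start = [], 0
    for i in range(1, len(byte_entropy)):
        if byte_entropy[i] > threshold:
            patches.append((start, i))
            start = i
    patches.append((start, len(byte_entropy)))
    return patches

ent = torch.tensor([0.1, 0.2, 3.1, 0.4, 0.3, 2.9, 0.2])
print(dynamic_patches(ent))  # [(0, 2), (2, 5), (5, 7)]
```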
@AkshatS07
Akshat Shrivastava
7 months
RT @gargighosh: Sharing new research from my team: 1) Dynamic Byte Latent Transformer, the first byte-level model that matches current LLM perfo…
Replies: 0 · Retweets: 6 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
We (@perceptroninc) will be at NeurIPS! Would love to meet folks there; reach out if you want to chat!
Replies: 0 · Retweets: 1 · Likes: 16