Akshat Shrivastava Profile
Akshat Shrivastava

@AkshatS07

Followers: 758 · Following: 1K · Media: 19 · Statuses: 142

Co-founder & CTO @perceptroninc; ex Research Scientist @MetaAI (FAIR, AR, Assistant)

Seattle, WA
Joined October 2013
@AkshatS07
Akshat Shrivastava
8 months
1/n I’m excited to share our new venture, Perceptron AI. With the advancements we have made with AI in the digital world, Perceptron sets out to build for the real world.
@perceptroninc
Perceptron AI
8 months
We are Perceptron AI, a new foundation model company from @ArmenAgha, @AkshatS07. Foundation models transformed the digital realm; now it’s time for the physical world. We’re building the first foundation models designed for real-time, multi-modal intelligence across the real world.
Replies: 4 · Retweets: 10 · Likes: 45
@AkshatS07
Akshat Shrivastava
2 months
RT @kilian_maciej: very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. s…
Replies: 0 · Retweets: 1 · Likes: 0
@AkshatS07
Akshat Shrivastava
3 months
When @kilian_maciej and I first started talking about alignment and parameterization, he introduced several of the ideas presented in this blog post. As we continue to scale foundation models (esp. multimodal), and with data-aware, scale-aware parameterization becoming more prevalent, …
@kilian_maciej
Maciej Kilian
3 months
@mm_wojnar and I have been playing around with tensor alignments in neural networks. here’s a summary of our exploration. we go into neural net parameterizations, measuring tensor alignments, and we develop a dynamic maximal learning rate scheduler which factors in alignment
Replies: 0 · Retweets: 0 · Likes: 3
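A rough illustration of the kind of measurement the thread describes (my own sketch; the metric and the LR rule here are assumptions, not the blog post's definitions): alignment can be probed by comparing how much a weight update actually moves a layer's output against what independent random tensors would produce, and a maximal-LR rule can then shrink the cap as alignment grows.

```python
# Hypothetical sketch, not the blog post's code: probe how "aligned" an update
# delta_w is with the incoming activations x, then cap the LR accordingly.
import torch

def tensor_alignment(delta_w: torch.Tensor, x: torch.Tensor) -> float:
    """Ratio of the realized output change to the unaligned (random) expectation.
    ~1 means unaligned; up to ~sqrt(fan_in) for a fully aligned rank-1 update."""
    fan_in = delta_w.shape[1]
    realized = (delta_w @ x).norm()
    expected_random = delta_w.norm() * x.norm() / fan_in ** 0.5
    return (realized / (expected_random + 1e-12)).item()

def max_lr(base_lr: float, alignment: float) -> float:
    # Assumed rule: more alignment => a given LR moves the output more,
    # so scale the maximal LR down proportionally.
    return base_lr / max(alignment, 1.0)

w_update = torch.randn(256, 128) * 1e-3
acts = torch.randn(128)
a = tensor_alignment(w_update, acts)
print(a, max_lr(1e-2, a))
```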
@AkshatS07
Akshat Shrivastava
3 months
MoMa Paper:
Replies: 0 · Retweets: 0 · Likes: 1
@AkshatS07
Akshat Shrivastava
3 months
To keep a single loss function and a static graph during training we leverage expert-choice MoE; however, this breaks causality. We added auxiliary routers (detached from the LM) based on …
Replies: 1 · Retweets: 0 · Likes: 3
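Reading between the lines of the truncated tweet, one way such an auxiliary router could look, sketched under my own assumptions: expert-choice routing selects the top-k tokens per expert using the whole sequence, so a small per-token classifier is trained on detached hidden states to imitate those selections, giving a causal routing path at inference without letting the auxiliary loss shape the LM's representations.

```python
# Minimal sketch (names and shapes are assumptions, not the MoMa code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxCausalRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, n_experts)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [tokens, d_model]; one logit per expert, no cross-token information.
        return self.proj(h)

def aux_router_loss(router: AuxCausalRouter, h: torch.Tensor,
                    chosen: torch.Tensor) -> torch.Tensor:
    """chosen: [tokens, n_experts] 0/1 mask from expert-choice top-k selection.
    h is detached so this loss never backprops into the LM itself."""
    logits = router(h.detach())
    return F.binary_cross_entropy_with_logits(logits, chosen.float())

router = AuxCausalRouter(d_model=64, n_experts=4)
h = torch.randn(32, 64)
chosen = torch.rand(32, 4) < 0.25   # stand-in for the expert-choice mask
aux_router_loss(router, h, chosen).backward()
```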
@AkshatS07
Akshat Shrivastava
3 months
We noticed a similar trend of expert specialization when training from scratch (modality agnostic). We hypothesized that our router wasn’t able to learn higher-order features, as it was simultaneously learning representations and routing; we found upcycling a dense model even for …
Replies: 1 · Retweets: 0 · Likes: 2
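"Upcycling" here presumably refers to initializing the sparse model from a pretrained dense one, as in the sparse-upcycling literature; a minimal sketch under that assumption:

```python
# Minimal sketch, assuming "upcycling" = seeding every expert with the pretrained
# dense FFN's weights, so the router learns over already-useful representations.
import copy
import torch.nn as nn

def upcycle_dense_ffn(dense_ffn: nn.Module, n_experts: int) -> nn.ModuleList:
    """Each expert starts as an exact copy of the dense FFN; the router is fresh."""
    return nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(n_experts))

ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
experts = upcycle_dense_ffn(ffn, n_experts=4)
```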
@AkshatS07
Akshat Shrivastava
3 months
When controlling for experts, we found that modality-aware routing outperformed modality-agnostic routing. Scaling both modality and text experts, with hierarchical routing (routing first by modality and then learned routing within the modality-specific MoE), gave …
Replies: 1 · Retweets: 0 · Likes: 2
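A sketch of what such hierarchical routing might look like (shapes, names, and the top-1 rule are my assumptions for illustration): stage one partitions tokens deterministically by modality id, stage two is a learned router over that modality's own expert pool.

```python
# Illustrative two-stage router, not the MoMa implementation.
import torch
import torch.nn as nn

class HierarchicalMoE(nn.Module):
    def __init__(self, d_model: int, modalities, experts_per_modality: int):
        super().__init__()
        self.modalities = list(modalities)
        self.routers = nn.ModuleDict(
            {m: nn.Linear(d_model, experts_per_modality) for m in self.modalities})
        self.experts = nn.ModuleDict(
            {m: nn.ModuleList(nn.Linear(d_model, d_model)
                              for _ in range(experts_per_modality))
             for m in self.modalities})

    def forward(self, h: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # h: [n_tokens, d_model]; modality_ids: [n_tokens] ints into self.modalities.
        out = torch.zeros_like(h)
        for mid, name in enumerate(self.modalities):
            token_idx = (modality_ids == mid).nonzero(as_tuple=True)[0]
            if token_idx.numel() == 0:
                continue  # stage 1: deterministic split by modality, nothing learned
            top1 = self.routers[name](h[token_idx]).argmax(dim=-1)  # stage 2: learned
            for e, expert in enumerate(self.experts[name]):
                sel = token_idx[top1 == e]
                if sel.numel():
                    out[sel] = expert(h[sel])
        return out

moe = HierarchicalMoE(64, ["text", "image"], experts_per_modality=4)
h = torch.randn(10, 64)
ids = torch.tensor([0, 0, 1, 1, 0, 1, 0, 0, 1, 1])
print(moe(h, ids).shape)  # torch.Size([10, 64])
```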
@AkshatS07
Akshat Shrivastava
3 months
Note: our setup was a little different, as we focused on training models with a single loss function (no aux loss) and multimodal tokenizers (omni-style).
Replies: 2 · Retweets: 0 · Likes: 2
@AkshatS07
Akshat Shrivastava
3 months
Excited to see further studies into early-fusion vs. late-fusion models, in particular a great analysis of multimodal MoEs that aligns with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented…
@MustafaShukor1
Mustafa Shukor
3 months
We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- How can sparsity (MoEs) play a detrimental role in handling heterogeneous modalities? 🧵
Replies: 1 · Retweets: 8 · Likes: 37
@AkshatS07
Akshat Shrivastava
3 months
RT @kilian_maciej: stay with me now
Replies: 0 · Retweets: 4 · Likes: 0
@AkshatS07
Akshat Shrivastava
3 months
RT @kilian_maciej: fun debugging journey w/ @AkshatS07: be careful around FP8 w. activation checkpointing. activation checkpointing works un…
Replies: 0 · Retweets: 11 · Likes: 0
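The RT is truncated, but the likely gotcha (my reading, illustrated with a toy module rather than a real FP8 kernel) is that activation checkpointing re-runs the forward during backward, and any forward that mutates scaling state, as delayed-scaling FP8 recipes do, sees different scales on the recompute and produces silently mismatched gradients.

```python
# Toy illustration (not real FP8): a forward with mutable scaling state behaves
# differently when activation checkpointing re-executes it during backward.
import torch
from torch.utils.checkpoint import checkpoint

class StatefulScaler(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 1.0          # stand-in for FP8 delayed-scaling amax state

    def forward(self, x):
        y = x * self.scale
        self.scale *= 0.5         # mutated again during the recompute!
        return y

mod = StatefulScaler()
x = torch.randn(4, requires_grad=True)
y = checkpoint(mod, x, use_reentrant=False)  # forward #1 runs with scale = 1.0
y.sum().backward()                           # recompute runs with scale = 0.5
print(x.grad)  # 0.5s, not the 1.0 the real output used: silently wrong gradients
```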
@AkshatS07
Akshat Shrivastava
4 months
RT @ariG23498: Bringing Efficiency to LLMs with Fine-Tuning. LayerSkip, introduced in the 2024 paper by @m_elhoushi et al. (arXiv:2404.1671…
Replies: 0 · Retweets: 4 · Likes: 0
@AkshatS07
Akshat Shrivastava
4 months
MoEs have been a key driver in improving performance for LLMs when memory is abundant, but what happens when we get to resource-constrained devices? Check out our latest work led by @huberpa91 exploring design decisions in making MoEs optimal for on-device deployment!
@huberpa91
Patrick Huber
4 months
1/n Introducing CoSMoEs 🪐, a set of Compact Sparse Mixture of Experts at on-device scale 📱. In CoSMoEs, we explore how to enable Sparse Mixture of Experts for on-device inference, focusing on quality, memory, and latency. This work is done with my …
Replies: 0 · Retweets: 0 · Likes: 10
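The tension the tweet points at can be made concrete with back-of-envelope numbers (illustrative values, not from the CoSMoEs paper): MoEs decouple quality (total parameters) from per-token FLOPs (active parameters), but device RAM still has to hold every expert.

```python
# Illustrative arithmetic only; the dimensions below are assumptions.
def moe_memory_and_flops(d_model=1024, d_ff=4096, n_experts=8, top_k=2):
    expert_params = 2 * d_model * d_ff      # up- and down-projection of one expert
    total = n_experts * expert_params       # what on-device memory must hold
    active = top_k * expert_params          # what each token actually computes
    return total, active

total, active = moe_memory_and_flops()
print(f"resident params {total/1e6:.0f}M vs active per token {active/1e6:.0f}M")
```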
@AkshatS07
Akshat Shrivastava
5 months
RT @apoorvkh: I started a blog! First post is everything I know about setting up (fast, reproducible, error-proof) Python project environme…
Replies: 0 · Retweets: 8 · Likes: 0
@AkshatS07
Akshat Shrivastava
5 months
RT @ArmenAgha: There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyo…
Replies: 0 · Retweets: 494 · Likes: 0
@AkshatS07
Akshat Shrivastava
6 months
RT @jecdohmann: I’m very excited to announce that I’ll be joining @perceptroninc () as a researcher and founding m…
Replies: 0 · Retweets: 10 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
Physical world modeling introduces a set of challenges around designing the right interaction space for our model and building the right/scalable data strategy. Reach out to hiring@perceptron.inc if you're interested!
@ArmenAgha
Armen Aghajanyan
7 months
We have 2 open roles @perceptroninc, in-person in Seattle: Full Stack Software Engineer and Software Engineer (Data). Send resumes to hiring@perceptron.inc.
Replies: 0 · Retweets: 3 · Likes: 17
@AkshatS07
Akshat Shrivastava
7 months
RT @S32_VC: Thank you to everyone who joined our Breakfast at NeurIPS! Our speakers @zicokolter shared insights on transitioning from acad…
Replies: 0 · Retweets: 3 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
Been waiting for this one, a strong step in removing tokenization from LLMs. Congrats to the team!
@sriniiyer88
Srini Iyer
7 months
New paper! Byte-level models are finally competitive with tokenizer-based models, with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: (1/n)
Replies: 0 · Retweets: 3 · Likes: 19
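A sketch of the dynamic-patching idea as I understand it (the threshold rule and values below are illustrative assumptions; the paper's exact boundary criterion may differ): a small byte-level model scores next-byte entropy, and a new patch opens wherever entropy spikes, so hard-to-predict spans get finer granularity and more compute.

```python
# Entropy-based patching sketch; not the paper's implementation.
import torch

def dynamic_patches(byte_entropy: torch.Tensor, threshold: float = 2.0):
    """byte_entropy: [seq_len] next-byte entropies from a small byte LM.
    Returns (start, end) spans; a new patch opens at every high-entropy byte."""
    patches, start = [], 0
    for i in range(1, len(byte_entropy)):
        if byte_entropy[i] > threshold:
            patches.append((start, i))
            start = i
    patches.append((start, len(byte_entropy)))
    return patches

ent = torch.tensor([0.1, 0.2, 3.1, 0.4, 0.3, 2.9, 0.2])
print(dynamic_patches(ent))  # [(0, 2), (2, 5), (5, 7)]
```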
@AkshatS07
Akshat Shrivastava
7 months
RT @gargighosh: Sharing new research from my team: 1) Dynamic Byte Latent Transformer, the first byte-level model that matches current LLM perfo…
Replies: 0 · Retweets: 6 · Likes: 0
@AkshatS07
Akshat Shrivastava
7 months
We (@perceptroninc) will be at NeurIPS! Would love to meet folks there; reach out if you want to chat!
Replies: 0 · Retweets: 1 · Likes: 16