Arc Jax Profile
Arc Jax

@arcjax7

Followers: 202
Following: 997
Media: 1
Statuses: 13

Joined May 2025
@wenhaocha1
Wenhao Chai
18 hours
From this project, I mainly learned three things: 1) Representation learning can fully emerge from generative objectives. In the language domain, this has almost become a consensus. However, in vision, discriminative representation learning methods such as CLIP and DINO still
@ziqiao_ma
Martin Ziqiao Ma
18 hours
NEPA: Next-Embedding Predictive Autoregression. A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings. Key ideas:
8
29
317
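A minimal JAX sketch of the next-embedding prediction objective the NEPA post describes: an autoregressive predictor sees embeddings e_1..e_t and is trained to match e_{t+1}. The toy causal predictor, the stop-gradient on targets, and the cosine loss are illustrative assumptions, not details taken from the post or paper.

```python
import jax
import jax.numpy as jnp

def nepa_loss(params, embeddings, predictor):
    """embeddings: (T, D) sequence of patch embeddings from some visual encoder.
    The predictor maps the prefix e_1..e_t to a prediction of e_{t+1}."""
    inputs = embeddings[:-1]                           # e_1 .. e_{T-1}
    targets = jax.lax.stop_gradient(embeddings[1:])    # e_2 .. e_T (stop-grad is an assumption)
    preds = predictor(params, inputs)                  # (T-1, D); pred_t uses only e_1..e_t
    # assumed objective: negative cosine similarity between prediction and target
    preds = preds / (jnp.linalg.norm(preds, axis=-1, keepdims=True) + 1e-6)
    targets = targets / (jnp.linalg.norm(targets, axis=-1, keepdims=True) + 1e-6)
    return -jnp.mean(jnp.sum(preds * targets, axis=-1))

def causal_linear_predictor(params, x):
    # toy stand-in for a causal transformer: running mean of the prefix,
    # followed by a learned linear map (hypothetical, for the sketch only)
    prefix_mean = jnp.cumsum(x, axis=0) / jnp.arange(1, x.shape[0] + 1)[:, None]
    return prefix_mean @ params["W"] + params["b"]

# usage with random stand-in embeddings
T, D = 16, 64
params = {"W": jax.random.normal(jax.random.PRNGKey(0), (D, D)) * 0.02, "b": jnp.zeros(D)}
emb = jax.random.normal(jax.random.PRNGKey(1), (T, D))
loss, grads = jax.value_and_grad(nepa_loss)(params, emb, causal_linear_predictor)
```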
@SwayStar123
sway
4 days
Speedrunning ImageNet Diffusion. Abstract: Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potential synergies from combining multiple
6
17
186
@gabriberton
Gabriele Berton
6 days
New cool paper on VGGT for noisy image sets. No training, simple method, good results, useful application. Here is a summary, things I like and things I don't. Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers. By KAIST, NYU, ETH, and Berkeley. [1/n]
6
24
211
@chinmayjindal_
Chinmay Jindal
6 days
gave a talk at the @Google JAX and @OpenXLA conference on scaling MoE pretraining on TPUs. check it out: https://t.co/aopum2ulnU
2
1
14
@arcjax7
Arc Jax
9 days
was very nice meeting @cgarciae88 @sharadvikram and others
0
1
4
@arcjax7
Arc Jax
9 days
howdy twitter, I spoke at Google's JAX Devlabs about scaling computer vision using JAX, please enjoy it:
2
4
22
@HeMuyu0327
Muyu He
10 days
We find that, surprisingly, for the attention sink it is the MLP rather than the attention layer that seems to be the driving force transforming the sink token's (token 0's) activation into something special. Earlier we discovered that layer 6 is responsible for producing an activation that the
4
5
98
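A rough sketch of the kind of probe the attention-sink post suggests: compare how much the attention sublayer versus the MLP sublayer moves the sink token's (position 0) residual-stream activation at each layer. The hooked activation arrays here are hypothetical placeholders; the thread's actual analysis may differ.

```python
import jax
import jax.numpy as jnp

def sink_update_norms(attn_out, mlp_out):
    """attn_out, mlp_out: (num_layers, seq_len, d_model) sublayer outputs that are
    added to the residual stream. Returns per-layer update norms at token 0."""
    attn_norm = jnp.linalg.norm(attn_out[:, 0, :], axis=-1)  # (num_layers,)
    mlp_norm = jnp.linalg.norm(mlp_out[:, 0, :], axis=-1)    # (num_layers,)
    return attn_norm, mlp_norm

# usage with dummy data standing in for activations captured via hooks
L, T, D = 12, 128, 768
attn_out = jax.random.normal(jax.random.PRNGKey(0), (L, T, D))
mlp_out = jax.random.normal(jax.random.PRNGKey(1), (L, T, D))
a, m = sink_update_norms(attn_out, mlp_out)
mlp_dominates = m > a  # True at layers where the MLP update to token 0 is larger
```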