Arc Jax
@arcjax7
Followers 202 · Following 997 · Media 1 · Statuses 13 · Joined May 2025
From this project, I mainly learned three things: 1) Representation learning can fully emerge from generative objectives. In the language domain, this has almost become a consensus. However, in vision, discriminative representation learning methods such as CLIP and DINO still
NEPA: Next-Embedding Predictive Autoregression
A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings. Key ideas:
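The post is cut off before the key ideas, but the stated objective (predict the next embedding given all previous embeddings) is enough for a rough illustration. Below is a minimal JAX sketch of such a loss, assuming a precomputed sequence of patch embeddings, a toy linear predictor applied to a causal prefix mean, an MSE loss, and stop-gradient targets; none of those specifics come from the post.

```python
# Minimal, assumption-laden sketch of a next-embedding prediction loss.
# A real setup would use a causal transformer predictor and an actual
# image encoder; here a prefix mean plus a linear map stands in for both.
import jax
import jax.numpy as jnp

def nepa_loss(params, embeddings):
    """MSE between predicted and actual next embeddings.

    embeddings: (T, D) sequence of visual embeddings for one image.
    params: (D, D) weight of a toy predictor that maps a causal summary
            of e_1..e_t to a prediction of e_{t+1}.
    """
    T = embeddings.shape[0]
    # Causal prefix mean: row t summarizes embeddings 0..t.
    prefix_mean = jnp.cumsum(embeddings, axis=0) / jnp.arange(1, T + 1)[:, None]
    preds = prefix_mean[:-1] @ params                # predict the next embedding
    targets = jax.lax.stop_gradient(embeddings[1:])  # assumed: no grad into targets
    return jnp.mean(jnp.sum((preds - targets) ** 2, axis=-1))

key = jax.random.PRNGKey(0)
T, D = 16, 64                                        # toy sequence length / dim
embeddings = jax.random.normal(key, (T, D))          # stand-in for encoder outputs
W = jax.random.normal(jax.random.PRNGKey(1), (D, D)) / jnp.sqrt(D)

loss, grads = jax.value_and_grad(nepa_loss)(W, embeddings)
print(float(loss), grads.shape)
```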
Speedrunning ImageNet Diffusion
Abstract: Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potential synergies from combining multiple
New cool paper on VGGT for noisy image sets. No training, simple method, good results, useful application.
Here is a summary, things I like and things I don't.
Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
By KAIST, NYU, ETH and Berkeley [1/n]
gave a talk at the @Google JAX and @OpenXLA conference on scaling MoE pretraining on TPUs. check it out: https://t.co/aopum2ulnU
howdy twitter, I spoke at Google's JAX Devlabs about scaling computer vision using JAX, please enjoy it:
Surprisingly, we find that for the attention sink, the MLP rather than the attention layer seems to be the driving force transforming the sink token's (token 0's) activation into something special. Earlier we discovered that layer 6 is responsible for producing an activation that the
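The thread is cut off here, but as a rough illustration of how one might probe that claim, here is a hedged JAX sketch that contrasts the size of the attention block's update with the MLP block's update to the sink token (token 0) at each layer. The per-layer component outputs are random placeholders for activations that would normally be captured with forward hooks on a real model; the shapes, names, and norm-based metric are assumptions, not details from the thread.

```python
# Sketch: compare per-layer attention vs. MLP contributions to the sink token.
# All tensors below are random stand-ins for hooked activations.
import jax
import jax.numpy as jnp

n_layers, seq_len, d_model = 12, 128, 768            # placeholder model sizes
k_attn, k_mlp = jax.random.split(jax.random.PRNGKey(0))

# attn_out[l, t]: what layer l's attention block adds to the residual stream
# at position t; mlp_out[l, t]: the same for the MLP block.
attn_out = jax.random.normal(k_attn, (n_layers, seq_len, d_model))
mlp_out = jax.random.normal(k_mlp, (n_layers, seq_len, d_model))

# Norm of each block's update to the sink token (position 0), per layer.
attn_norm_tok0 = jnp.linalg.norm(attn_out[:, 0, :], axis=-1)
mlp_norm_tok0 = jnp.linalg.norm(mlp_out[:, 0, :], axis=-1)

for layer in range(n_layers):
    print(f"layer {layer:2d}: "
          f"attn update {float(attn_norm_tok0[layer]):8.2f}  "
          f"mlp update {float(mlp_norm_tok0[layer]):8.2f}")
```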