Arc Jax Profile
Arc Jax

@arcjax7

Followers: 202
Following: 997
Media: 1
Statuses: 13

Joined May 2025
@wenhaocha1
Wenhao Chai
18 hours
From this project, I mainly learned three things: 1) Representation learning can fully emerge from generative objectives. In the language domain, this has almost become a consensus. However, in vision, discriminative representation learning methods such as CLIP and DINO still
@ziqiao_ma
Martin Ziqiao Ma
18 hours
NEPA: Next-Embedding Predictive Autoregression. A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings. Key ideas:
8
29
317
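A minimal JAX sketch of the next-embedding prediction objective the NEPA post describes: an autoregressive predictor sees embeddings e_1..e_t and is trained to match e_{t+1}. The toy causal predictor, the stop-gradient on targets, and the cosine loss are illustrative assumptions, not details taken from the post or paper.

```python
import jax
import jax.numpy as jnp

def nepa_loss(params, embeddings, predictor):
    """embeddings: (T, D) sequence of patch embeddings from some visual encoder.
    The predictor maps the prefix e_1..e_t to a prediction of e_{t+1}."""
    inputs = embeddings[:-1]                           # e_1 .. e_{T-1}
    targets = jax.lax.stop_gradient(embeddings[1:])    # e_2 .. e_T (stop-grad is an assumption)
    preds = predictor(params, inputs)                  # (T-1, D); pred_t uses only e_1..e_t
    # assumed objective: negative cosine similarity between prediction and target
    preds = preds / (jnp.linalg.norm(preds, axis=-1, keepdims=True) + 1e-6)
    targets = targets / (jnp.linalg.norm(targets, axis=-1, keepdims=True) + 1e-6)
    return -jnp.mean(jnp.sum(preds * targets, axis=-1))

def causal_linear_predictor(params, x):
    # toy stand-in for a causal transformer: running mean of the prefix,
    # followed by a learned linear map (hypothetical, for the sketch only)
    prefix_mean = jnp.cumsum(x, axis=0) / jnp.arange(1, x.shape[0] + 1)[:, None]
    return prefix_mean @ params["W"] + params["b"]

# usage with random stand-in embeddings
T, D = 16, 64
params = {"W": jax.random.normal(jax.random.PRNGKey(0), (D, D)) * 0.02, "b": jnp.zeros(D)}
emb = jax.random.normal(jax.random.PRNGKey(1), (T, D))
loss, grads = jax.value_and_grad(nepa_loss)(params, emb, causal_linear_predictor)
```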
@SwayStar123
sway
4 days
Speedrunning ImageNet Diffusion. Abstract: Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potential synergies from combining multiple
6
17
186
@gabriberton
Gabriele Berton
6 days
New cool paper on VGGT for noisy image sets. No training, simple method, good results, useful application. Here is a summary, things I like and things I don't. Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers. By KAIST, NYU, ETH, and Berkeley. [1/n]
6
24
211
@chinmayjindal_
Chinmay Jindal
6 days
gave a talk at the @Google JAX and @OpenXLA conference on scaling MoE pretraining on TPUs. check it out: https://t.co/aopum2ulnU
2
1
14
@arcjax7
Arc Jax
9 days
was very nice meeting @cgarciae88 @sharadvikram and others
0
1
4
@arcjax7
Arc Jax
9 days
howdy twitter, I spoke at Google's JAX Devlabs about scaling computer vision using JAX, please enjoy it:
2
4
22
@HeMuyu0327
Muyu He
10 days
We find that, surprisingly, for the attention sink it is the MLP rather than the attention layer that seems to be the driving force transforming the sink token's (token 0's) activation into something special. Earlier we discovered that layer 6 is responsible for producing an activation that the
4
5
98
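A rough sketch of the kind of probe the attention-sink post suggests: compare how much the attention sublayer versus the MLP sublayer moves the sink token's (position 0) residual-stream activation at each layer. The hooked activation arrays here are hypothetical placeholders; the thread's actual analysis may differ.

```python
import jax
import jax.numpy as jnp

def sink_update_norms(attn_out, mlp_out):
    """attn_out, mlp_out: (num_layers, seq_len, d_model) sublayer outputs that are
    added to the residual stream. Returns per-layer update norms at token 0."""
    attn_norm = jnp.linalg.norm(attn_out[:, 0, :], axis=-1)  # (num_layers,)
    mlp_norm = jnp.linalg.norm(mlp_out[:, 0, :], axis=-1)    # (num_layers,)
    return attn_norm, mlp_norm

# usage with dummy data standing in for activations captured via hooks
L, T, D = 12, 128, 768
attn_out = jax.random.normal(jax.random.PRNGKey(0), (L, T, D))
mlp_out = jax.random.normal(jax.random.PRNGKey(1), (L, T, D))
a, m = sink_update_norms(attn_out, mlp_out)
mlp_dominates = m > a  # True at layers where the MLP update to token 0 is larger
```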