Jaskirat Singh @ ICCV2025🌴 @1jaskiratsingh X Profile

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

Followers

326

Following

596

Media

26

Statuses

243

https://t.co/GyfwnXPjLE

Seattle, Washington

Joined June 2018

Don't wanna be here? Send us removal request.

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

7 months

Can we optimize both the VAE tokenizer and diffusion model together in an end-to-end manner? Short Answer: Yes. 🚨 Introducing REPA-E: the first end-to-end tuning approach for jointly optimizing both the VAE and the latent diffusion model using REPA loss 🚨 Key Idea: 🧠

7

31

170

Shusheng Yang

@shushengyang

1 day

[Videos are entanglements of space and time.] Around one year ago, we released VSI-Bench, in which we studied visual spatial intelligence: a fundamental but missing pillar of current MLLMs. Today, we are excited to introduce Cambrian-S, our further step that goes beyond visual

Saining Xie

@sainingxie

2 days

Introducing Cambrian-S it’s a position, a dataset, a benchmark, and a model but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶

2

15

67

Amanda Bertsch

@abertsch72

1 day

Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

7

44

163

Saining Xie

@sainingxie

2 days

Introducing Cambrian-S it’s a position, a dataset, a benchmark, and a model but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶

13

68

468

Fei-Fei Li

@drfeifei

2 days

It’s an honor to have received the @QEPrize along with my fellow laureates! But it’s also a responsibility. AI’s impact to humanity is in the hands of all of us.

The Royal Family

@RoyalFamily

3 days

Today, The King presented The Queen Elizabeth Prize for Engineering at St James's Palace, celebrating the innovations which are transforming our world. 🧠 This year’s prize honours seven pioneers whose work has shaped modern artificial intelligence. 🔗 Find out more:

92

94

2K

Saining Xie

@sainingxie

3 days

you can’t build superintelligence without first building supersensing

30

32

288

John Yang

@jyangballin

3 days

New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals

25

88

354

Linjie (Lindsey) Li

@LINJIEFUN

5 days

Check out our work ThinkMorph, which thinks in multi-modalities, not just with them.

Jiawei Gu

@Kuvvius

5 days

🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 https://t.co/2GPHnsPq7R (1/16)

1

9

28

Manish Shetty

@slimshetty_

5 days

Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which

1

3

15

Naman Jain

@StringChaos

5 days

We added LLM judge based hack detector to our code optimization evals and found models perform non-idiomatic code changes in upto 30% of the problems 🤯

Manish Shetty

@slimshetty_

5 days

Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which

0

2

7

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

end-to-end training just makes latent diffusion transformers better! with repa-e, we showed the power of end-to-end training on imagenet. today we are extending it to text-to-image (T2I) generation. #ICCV2025 🌴 🚨 Introducing "REPA-E for T2I: family of end-to-end tuned VAEs for

1

17

42

Sayak Paul

@RisingSayak

10 days

With simple changes, I was able to cut down @krea_ai's new real-time video gen's timing from 25.54s to 18.14s 🔥🚀 1. FA3 through `kernels` 2. Regional compilation 3. Selective (FP8) quantization Notes are in 🧵 below

5

13

108

Chieh-Hsin (Jesse) Lai

@JCJesseLai

11 days

Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core

43

431

2K

Wenhao Chai

@wenhaocha1

13 days

Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)

Brian Bo Li

@BoLi68567011

15 days

Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full

2

3

29

Sida Wang

@sidawxyz

16 days

I have one PhD intern opening to do research as a part of a model training effort at the FAIR CodeGen team (latest: Code World Model). If interested, email me directly and apply at

metacareers.com

Meta's mission is to build the future of human connection and the technology that makes it possible.

7

27

235

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

this release was a fun joint collaboration between @canva and the repa-e team. @xingjian_leng , @YunzhongH, @ZhenchangXing, @sainingxie , @LiangZheng_06 , @advadnoun , @torchcompiled 🙏 project page: https://t.co/uUiINuccVn code:

github.com

[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers - End2End-Diffusion/REPA-E

0

3

8

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

best part - all E2E-VAEs can be used within few lines of code!! from diffusers import AutoencoderKL vae = AutoencoderKL.from_pretrained("REPA-E/e2e-flux-vae").to("cuda") please see project page for more details. https://t.co/uUiINuccVn (7/n)

1

2

3

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

but why does end-to-end tuning on imagenet generalize to complex t2i training? answer: better latent space structure. >> we found that this works because e2e-tuning automatically injects better spatial structure and semantic details into the VAE representations. Thus E2E-VAEs

1

2

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

what about impact of end-to-end training on reconstruction performance of vae's? >> end-to-end vae's despite being only tuned on ImageNet 256×256 improve generation performance while maintaining reconstruction fidelity across challenging scenes with multiple faces, subjects and

1

2

3

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

how does it compare with repa? repa definitely helps. end-to-end tuning helps even more! >> surprisingly, we observed that once end-to-end tuned, E2E-VAEs lead to better performance over repa w/o requiring additional representation alignment losses during T2I training. (4/n)

1

2

3

Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

17 days

but do we require massive compute and datasets for end-to-end training for T2I? turns out no! >> we can end-to-end tune the vae on something as simple as imagenet, and use this end-to-end tuned VAE (E2E-VAE) for T2I training. and it just works! E2E-VAEs tuned on just imagenet

1

2

3