Jaskirat Singh @ ICCV2025🌴

@1jaskiratsingh

Followers: 326
Following: 596
Media: 26
Statuses: 243

Ph.D. Candidate at Australian National University | Intern @AIatMeta GenAI | @AdobeResearch | Multimodal Fusion Models and Agents | R2E-Gym | REPA-E

Seattle, Washington
Joined June 2018
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
7 months
Can we optimize both the VAE tokenizer and diffusion model together in an end-to-end manner? Short Answer: Yes. 🚨 Introducing REPA-E: the first end-to-end tuning approach for jointly optimizing both the VAE and the latent diffusion model using REPA loss 🚨 Key Idea: 🧠
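For intuition, a minimal sketch (not the paper's exact recipe) of how a REPA-style alignment loss can drive joint updates of the VAE and the diffusion transformer. The module interfaces (vae.encode, dit(..., return_features=True), the proj head), the flow-matching parameterization, the stop-gradient placement, and the loss weight are all assumptions for illustration, not details confirmed by the tweet.

import torch
import torch.nn.functional as F

def training_step(vae, dit, vision_encoder, proj, x, opt, lam=0.5):
    # Encode images to latents; keeping the graph lets gradients reach the VAE.
    z = vae.encode(x)
    t = torch.rand(z.size(0), device=z.device).view(-1, 1, 1, 1)
    noise = torch.randn_like(z)

    # Branch 1: standard latent-diffusion (flow-matching style) loss computed on
    # detached latents, so this term only updates the diffusion transformer.
    z_t = (1 - t) * z.detach() + t * noise
    pred = dit(z_t, t.flatten())
    loss_diff = F.mse_loss(pred, noise - z.detach())

    # Branch 2: REPA-style alignment loss on non-detached latents; matching
    # intermediate DiT features to a frozen vision encoder is the term that
    # drives the end-to-end update of the VAE.
    z_t_e2e = (1 - t) * z + t * noise
    _, feats = dit(z_t_e2e, t.flatten(), return_features=True)
    with torch.no_grad():
        target = vision_encoder(x)
    loss_repa = 1.0 - F.cosine_similarity(proj(feats), target, dim=-1).mean()

    loss = loss_diff + lam * loss_repa
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()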
7
31
170
@shushengyang
Shusheng Yang
1 day
[Videos are entanglements of space and time.] Around one year ago, we released VSI-Bench, in which we studied visual spatial intelligence: a fundamental but missing pillar of current MLLMs. Today, we are excited to introduce Cambrian-S, our further step that goes beyond visual
@sainingxie
Saining Xie
2 days
Introducing Cambrian-S: it’s a position, a dataset, a benchmark, and a model, but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶
2
15
67
@abertsch72
Amanda Bertsch
1 day
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
7
44
163
@sainingxie
Saining Xie
2 days
Introducing Cambrian-S: it’s a position, a dataset, a benchmark, and a model, but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶
13
68
468
@drfeifei
Fei-Fei Li
2 days
It’s an honor to have received the @QEPrize along with my fellow laureates! But it’s also a responsibility. AI’s impact on humanity is in the hands of all of us.
@RoyalFamily
The Royal Family
3 days
Today, The King presented The Queen Elizabeth Prize for Engineering at St James's Palace, celebrating the innovations which are transforming our world. 🧠 This year’s prize honours seven pioneers whose work has shaped modern artificial intelligence. 🔗 Find out more:
92
94
2K
@sainingxie
Saining Xie
3 days
you can’t build superintelligence without first building supersensing
30
32
288
@jyangballin
John Yang
3 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
25
88
354
@LINJIEFUN
Linjie (Lindsey) Li
5 days
Check out our work ThinkMorph, which thinks in multi-modalities, not just with them.
@Kuvvius
Jiawei Gu
5 days
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 https://t.co/2GPHnsPq7R (1/16)
1
9
28
@slimshetty_
Manish Shetty
5 days
Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which
1
3
15
@StringChaos
Naman Jain
5 days
We added an LLM-judge-based hack detector to our code optimization evals and found that models perform non-idiomatic code changes in up to 30% of the problems 🤯
@slimshetty_
Manish Shetty
5 days
Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which
0
2
7
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
end-to-end training just makes latent diffusion transformers better! with repa-e, we showed the power of end-to-end training on imagenet. today we are extending it to text-to-image (T2I) generation. #ICCV2025 🌴 🚨 Introducing "REPA-E for T2I: family of end-to-end tuned VAEs for
1
17
42
@RisingSayak
Sayak Paul
10 days
With simple changes, I was able to cut down @krea_ai's new real-time video gen's timing from 25.54s to 18.14s 🔥🚀
1. FA3 through `kernels`
2. Regional compilation
3. Selective (FP8) quantization
Notes are in 🧵 below
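For reference, a hedged sketch of items 2 and 3 (regional compilation and selective FP8 quantization) using stock torch / torchao / diffusers APIs. The model id and the transformer_blocks attribute are placeholders rather than the actual Krea real-time video pipeline, and the FA3-via-`kernels` step is omitted here.

import torch
from diffusers import DiffusionPipeline
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

pipe = DiffusionPipeline.from_pretrained(
    "org/video-model", torch_dtype=torch.bfloat16  # placeholder model id
).to("cuda")

# Selective (FP8) quantization: only quantize the large linear layers inside the
# denoiser, leaving embeddings and small projections in bf16.
def is_large_linear(module, fqn):
    return isinstance(module, torch.nn.Linear) and module.in_features >= 1024

quantize_(pipe.transformer, float8_dynamic_activation_float8_weight(), filter_fn=is_large_linear)

# Regional compilation: compile each repeated transformer block separately, so the
# compile cost is paid per block definition instead of for one huge graph.
for block in pipe.transformer.transformer_blocks:
    block.compile()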
5
13
108
@JCJesseLai
Chieh-Hsin (Jesse) Lai
11 days
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
43
431
2K
@wenhaocha1
Wenhao Chai
13 days
Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)
@BoLi68567011
Brian Bo Li
15 days
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full
2
3
29
@sidawxyz
Sida Wang
16 days
I have one PhD intern opening to do research as a part of a model training effort at the FAIR CodeGen team (latest: Code World Model). If interested, email me directly and apply at
metacareers.com
Meta's mission is to build the future of human connection and the technology that makes it possible.
7
27
235
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
best part - all E2E-VAEs can be used within a few lines of code!!

from diffusers import AutoencoderKL
vae = AutoencoderKL.from_pretrained("REPA-E/e2e-flux-vae").to("cuda")

please see project page for more details. https://t.co/uUiINuccVn (7/n)
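A hedged usage sketch extending the loading snippet above: a latent encode/decode roundtrip with standard diffusers APIs. The random image tensor is a placeholder just to show the call pattern.

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("REPA-E/e2e-flux-vae").to("cuda")

# Placeholder input: a batch of one 256x256 RGB image, values roughly in [-1, 1].
image = torch.randn(1, 3, 256, 256, device="cuda")
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # image -> latents
    recon = vae.decode(latents).sample                # latents -> reconstruction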
1
2
3
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
but why does end-to-end tuning on imagenet generalize to complex t2i training? answer: better latent space structure. >> we found that this works because e2e-tuning automatically injects better spatial structure and semantic details into the VAE representations. Thus E2E-VAEs
1
2
2
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
what about the impact of end-to-end training on the reconstruction performance of vaes? >> end-to-end vaes, despite being tuned only on ImageNet 256×256, improve generation performance while maintaining reconstruction fidelity across challenging scenes with multiple faces, subjects and
1
2
3
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
how does it compare with repa? repa definitely helps. end-to-end tuning helps even more! >> surprisingly, we observed that once end-to-end tuned, E2E-VAEs lead to better performance than repa w/o requiring additional representation alignment losses during T2I training. (4/n)
1
2
3
@1jaskiratsingh
Jaskirat Singh @ ICCV2025🌴
17 days
but do we require massive compute and datasets for end-to-end training for T2I? turns out no! >> we can end-to-end tune the vae on something as simple as imagenet, and use this end-to-end tuned VAE (E2E-VAE) for T2I training. and it just works! E2E-VAEs tuned on just imagenet
1
2
3