Jialuo Li
@JialuoLi1007
Followers: 102 · Following: 6 · Media: 12 · Statuses: 29
CS Graduate Student @Gatech | Computer Vision | Deep Learning; Prev: CS Undergraduate @Tsinghua Uni, Yao Class | Research Intern @nyuniversity @MSFTResearch
Atlanta, GA, USA
Joined February 2024
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
Replies: 4 · Retweets: 32 · Likes: 140
LiveCodeBench Pro remains one of the most challenging code benchmarks, but its evaluation and verification process is still a black box. We introduce AutoCode, which democratizes evaluation, allowing anyone to run verification locally and perform RL training! For the first time,
Replies: 4 · Retweets: 30 · Likes: 124
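To make "locally run verification" concrete, here is a minimal, generic judging harness: it runs a candidate program against (stdin, expected stdout) test cases and returns a pass rate, which could serve directly as an RL reward. The file name, test format, and pass-rate reward are illustrative assumptions, not the actual AutoCode pipeline.

```python
import subprocess

def verify(solution_path: str, tests: list[tuple[str, str]], timeout: float = 2.0) -> float:
    """Run a candidate program against (stdin, expected stdout) pairs and
    return the pass rate, which can double as an RL reward signal."""
    passed = 0
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                ["python", solution_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a timed-out test simply counts as failed
    return passed / len(tests)

# Hypothetical usage: reward = verify("candidate.py", [("1 2\n", "3")])
```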
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
Replies: 56 · Retweets: 334 · Likes: 2K
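A minimal sketch of the architectural shift described above, under the assumption that an RAE-style setup means a transformer denoising the tokens of a frozen, pretrained representation encoder instead of a VAE latent. Module names, sizes, and the linear noising schedule below are made up for illustration; this is not the paper's actual recipe.

```python
import torch
import torch.nn as nn

class FrozenRepEncoder(nn.Module):
    """Stand-in for a pretrained representation encoder (e.g. a ViT);
    in an RAE-style setup its features replace the VAE latent space."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify

    @torch.no_grad()
    def forward(self, x):
        feats = self.proj(x)                      # (B, dim, H/16, W/16)
        return feats.flatten(2).transpose(1, 2)   # (B, N_tokens, dim)

class LatentDenoiser(nn.Module):
    """Tiny transformer that predicts the noise added to encoder tokens."""
    def __init__(self, dim: int = 768, depth: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.out = nn.Linear(dim, dim)

    def forward(self, z_noisy, t_emb):
        return self.out(self.blocks(z_noisy + t_emb))

encoder, denoiser = FrozenRepEncoder(), LatentDenoiser()
x = torch.randn(2, 3, 256, 256)
z = encoder(x)                           # clean representation tokens
t = torch.rand(2, 1, 1)                  # random diffusion "time" per sample
noise = torch.randn_like(z)
z_noisy = (1 - t) * z + t * noise        # toy linear noising schedule
loss = ((denoiser(z_noisy, t.expand_as(z)) - noise) ** 2).mean()
```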
Introducing VideoNSA We started working on compression for video-LMM back in 2023. MovieChat focuses on inter-frame compression, while AuroraCap focuses on intra-frame compression. After the emergence of NSA, we realized that the manually set heuristics we relied on should
Token compression causes irreversible information loss in video understanding. 🤔 What can we do with sparse attention? We introduce VideoNSA, a hardware-aware and learnable hybrid sparse attention mechanism that scales to 128K context length.
Replies: 0 · Retweets: 14 · Likes: 99
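As a rough intuition for "hybrid sparse attention", the toy mask below mixes a local sliding window with a strided set of globally visible tokens under a causal constraint. It is a hand-built illustration of sparsity patterns, not VideoNSA's learned, hardware-aware kernel, and the window and stride values are arbitrary.

```python
import torch

def hybrid_sparse_mask(n_tokens: int, window: int = 128, stride: int = 1024) -> torch.Tensor:
    """Boolean attention mask combining a local sliding window with a strided
    set of globally visible tokens, a toy stand-in for learned hybrid sparsity."""
    i = torch.arange(n_tokens).unsqueeze(1)
    j = torch.arange(n_tokens).unsqueeze(0)
    local = (i - j).abs() < window      # nearby (e.g. intra-frame) tokens
    strided = (j % stride == 0)         # a sparse set of always-visible tokens
    causal = j <= i
    return (local | strided) & causal

mask = hybrid_sparse_mask(4096)
print(f"attended fraction: {mask.float().mean().item():.2%}")  # far below dense attention
```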
🚀 Qwen3-VL-30B-A3B-Instruct & Thinking are here! Smaller size, same powerhouse performance 💪—packed with all the capabilities of Qwen3-VL! 🔧 With just 3B active params, it’s rivaling GPT-5-Mini & Claude4-Sonnet — and often beating them across STEM, VQA, OCR, Video, Agent
Replies: 87 · Retweets: 320 · Likes: 2K
Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo2 → Veo3: edits, memory, symmetry, mazes, analogies •
Replies: 9 · Retweets: 58 · Likes: 309
🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision
Replies: 81 · Retweets: 301 · Likes: 2K
1/ Excited to share the latest work with @chenhuay17! We propose DiffusionNFT, a new online diffusion RL paradigm that optimizes directly on the forward diffusion process. Paper: https://t.co/oacDQZua6I Code: https://t.co/4UBx26TOyz
Replies: 3 · Retweets: 13 · Likes: 34
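The tweet gives only the high-level idea, so the snippet below is a deliberately generic, reward-weighted sketch: noise samples through the forward process and weight the standard noise-prediction loss by a per-sample reward. It is not the DiffusionNFT objective; the toy denoiser, reward, and schedule are all stand-ins.

```python
import torch
import torch.nn as nn

# Generic reward-weighted fine-tuning on forward-process samples, NOT DiffusionNFT.
denoiser = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def reward(x):                       # stand-in reward model
    return -x.pow(2).mean(dim=1)

samples = torch.randn(32, 16)        # pretend these came from the current sampler
t = torch.rand(32, 1)                # diffusion times in [0, 1]
noise = torch.randn_like(samples)
x_t = (1 - t) * samples + t * noise              # forward (noising) process
w = torch.softmax(reward(samples), dim=0)        # higher-reward samples count more
pred = denoiser(torch.cat([x_t, t], dim=1))      # predict the added noise
loss = (w * (pred - noise).pow(2).mean(dim=1)).sum()
opt.zero_grad(); loss.backward(); opt.step()
```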
Apple presents Manzano: Simple & scalable unified multimodal LLM • Hybrid vision tokenizer (continuous ↔ discrete) cuts task conflict • SOTA on text-rich benchmarks, competitive in gen vs GPT-4o/Nano Banana • One model for both understanding & generation • Joint recipe:
Replies: 4 · Retweets: 64 · Likes: 322
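To picture a "hybrid vision tokenizer (continuous ↔ discrete)", here is a toy tokenizer with one shared patch encoder and two outputs: continuous embeddings for understanding and nearest-codebook ids for generation. Every name, size, and the simple VQ lookup are assumptions for illustration, not Manzano's design.

```python
import torch
import torch.nn as nn

class HybridVisionTokenizer(nn.Module):
    """Toy hybrid tokenizer: one shared patch encoder, two outputs.
    Continuous embeddings feed understanding; nearest-codebook ids feed generation."""
    def __init__(self, dim: int = 256, codebook_size: int = 1024):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patch embed
        self.codebook = nn.Embedding(codebook_size, dim)             # VQ codebook

    def forward(self, images: torch.Tensor):
        feats = self.encoder(images).flatten(2).transpose(1, 2)      # (B, N, dim)
        continuous = feats                                           # continuous branch
        flat = feats.reshape(-1, feats.size(-1))                     # (B*N, dim)
        codes = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        discrete = codes.view(feats.shape[0], feats.shape[1])        # (B, N) token ids
        return continuous, discrete

tokenizer = HybridVisionTokenizer()
cont, ids = tokenizer(torch.randn(2, 3, 224, 224))
print(cont.shape, ids.shape)   # (2, 196, 256) and (2, 196)
```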
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
Replies: 237 · Retweets: 1K · Likes: 8K
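The root cause the blog post's title points at can be reproduced in a few lines: floating-point addition is not associative, so a reduction performed in a different order can return a slightly different result. The snippet only demonstrates that phenomenon; it is not Thinking Machines' fix.

```python
import numpy as np

# Floating-point addition is not associative, so any kernel whose reduction
# order changes (different batch sizes, different parallel split) can return
# slightly different logits for the same input, one root cause of
# nondeterministic LLM inference.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward = np.sum(x)             # one reduction order
s_sorted = np.sum(np.sort(x))     # a different order over the same numbers
print(s_forward, s_sorted, s_forward == s_sorted)  # the two sums typically differ slightly
```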
Image generation with Gemini just got a bananas upgrade: it's the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
Replies: 184 · Retweets: 539 · Likes: 3K
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: https://t.co/9awxf2Ogt9 A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
Replies: 0 · Retweets: 30 · Likes: 125
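A simplified picture of multi-dimensional neighborhood attention: each query position attends only to keys within a fixed radius along both spatial axes. The dense boolean mask below is for intuition only; the actual work is about fused sparse kernels, and the radius here is arbitrary.

```python
import torch

def neighborhood_mask_2d(h: int, w: int, radius: int = 3) -> torch.Tensor:
    """(h*w, h*w) boolean mask where each query pixel attends only to keys
    within `radius` of it in both spatial dimensions."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (h*w, 2) pixel coords
    diff = (coords[:, None, :] - coords[None, :, :]).abs()      # pairwise offsets
    return (diff <= radius).all(dim=-1)

mask = neighborhood_mask_2d(32, 32, radius=3)
print(f"attended fraction: {mask.float().mean().item():.2%}")  # a small fraction of dense attention
```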
This paper is interestingly thought-provoking for me. There is a chance that it's easier to "align a t2i model with real physics" in post-training, and let it learn to generate whatever (physically implausible) combinations in pretraining, as opposed to trying hard to come up with
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
Replies: 8 · Retweets: 17 · Likes: 212
Embedding a scientific basis in pre-trained T2I models can enhance the realism and consistency of the results. Cool work in "Science-T2I: Addressing Scientific Illusions in Image Synthesis" https://t.co/M6HisxOhK8
Replies: 1 · Retweets: 16 · Likes: 103
@NYU_Courant @upennnlp @mlpcucsd @UW 🥰 Working on this project was a truly unforgettable experience! Thanks to all my awesome collaborators @XingyuFu2 @wenhaocha1 @HaiyangXu3110 and advisor @sainingxie for making this project possible! [n/n]
Replies: 0 · Retweets: 0 · Likes: 1
@NYU_Courant @upennnlp @mlpcucsd @UW 🤩 For more technical details, please check out our paper, website, code, and dataset! If you have any questions related to our work, feel free to contact us! [7/n]
Replies: 1 · Retweets: 0 · Likes: 2
@NYU_Courant @upennnlp @mlpcucsd @UW To address this, we propose a two-stage training framework 🚀 for fine-tuning text-to-image (T2I) models to enhance scientific realism. It includes supervised fine-tuning followed by online fine-tuning. Experiments show a 50%+ improvement on Science-T2I-S! 📊 [6/n]
Replies: 1 · Retweets: 0 · Likes: 3
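A self-contained toy of the two-stage shape described in the tweet above: supervised fine-tuning toward reference outputs, then online fine-tuning in which the model's own samples are re-weighted by a reward. The 1-D "generator", synthetic targets, and reward are stand-ins; only the two-stage structure mirrors the tweet, not the Science-T2I method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gen = nn.Linear(8, 1)                           # toy "T2I model": prompt embedding -> output
opt = torch.optim.Adam(gen.parameters(), lr=1e-2)
prompts = torch.randn(256, 8)
sft_targets = prompts.sum(dim=1, keepdim=True)  # stand-in reference outputs

# Stage 1: supervised fine-tuning
for _ in range(200):
    loss = (gen(prompts) - sft_targets).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: online fine-tuning with a reward (here: a trivial "prefer larger outputs")
reward = lambda y: y.squeeze(-1)                # stand-in for a SciScore-like reward model
for _ in range(200):
    with torch.no_grad():
        samples = gen(prompts) + 0.5 * torch.randn(256, 1)   # explore around the model
        w = torch.softmax(reward(samples), dim=0)            # up-weight high-reward samples
    loss = (w * (gen(prompts) - samples).pow(2).squeeze(-1)).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```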
@NYU_Courant @upennnlp @mlpcucsd @UW 🌟 Leveraging SciScore, we can assess how well text-to-image diffusion models generate scientifically accurate visuals. While these models excel at creating detailed and aesthetically pleasing images from text, they often fail to align with real-world scientific principles! [5/n]
Replies: 1 · Retweets: 0 · Likes: 2
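Assuming SciScore behaves like a CLIP-style image-text scorer (an assumption, not something stated in the tweet), ranking two candidate images for one prompt could look like the sketch below. The base checkpoint is plain OpenAI CLIP and the candidate file names are hypothetical; swap in the released SciScore weights to get the actual metric.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score two candidate images against one scientifically phrased prompt and
# keep the one the scorer prefers. Plain CLIP here; a SciScore-style
# checkpoint would be substituted in practice.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "an unripe apple"
images = [Image.open("candidate_a.png"), Image.open("candidate_b.png")]  # hypothetical files

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(-1)   # one score per image

print("preferred candidate:", int(scores.argmax()))
```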
@NYU_Courant @upennnlp @mlpcucsd @UW Experiments conducted on Science-T2I-S&C reveal not only the impressive performance of SciScore✨ but also the significant limitations of LMMs and VLMs. Particularly concerning (‼️) is the performance of VLMs, which perform close to random guessing. [4/n]
Replies: 1 · Retweets: 0 · Likes: 2
@NYU_Courant @upennnlp @mlpcucsd @UW We also create two benchmarks from the Science-T2I Dataset, Science-T2I-S and Science-T2I-C, to evaluate how well LMMs and VLMs can distinguish ✅ real from ❌ fake scientific images. (Submit your results on the leaderboard!) [3/n]
Replies: 1 · Retweets: 0 · Likes: 1
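The two benchmarks imply a two-alternative evaluation: for each (prompt, scientifically correct image, plausible-but-wrong image) triple, a model must prefer the correct one, so a random scorer lands near 50% accuracy (the "random guessing" baseline mentioned in [4/n] above). The scorer below is a placeholder; plug in SciScore, an LMM, or a VLM.

```python
import random

def evaluate(scorer, triples):
    """Fraction of triples where the scorer prefers the scientifically correct image."""
    correct = sum(scorer(p, real) > scorer(p, fake) for p, real, fake in triples)
    return correct / len(triples)

random_scorer = lambda prompt, image: random.random()        # placeholder model
triples = [(f"prompt {i}", f"real_{i}.png", f"fake_{i}.png") for i in range(1000)]
print(f"random-guess accuracy: {evaluate(random_scorer, triples):.1%}")  # ~50%
```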