Jialuo Li
@JialuoLi1007
Followers: 102 · Following: 6 · Media: 12 · Statuses: 29
CS Graduate Student @Gatech | Computer Vision | Deep Learning; Prev: CS Undergraduate @Tsinghua Uni, Yao Class | Research Intern @nyuniversity @MSFTResearch
Atlanta, GA, USA
Joined February 2024
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
Replies: 4 · Retweets: 32 · Likes: 140
LiveCodeBench Pro remains one of the most challenging code benchmarks, but its evaluation and verification process is still a black box. We introduce AutoCode, which democratizes evaluation, allowing anyone to run verification locally and perform RL training! For the first time,
Replies: 4 · Retweets: 30 · Likes: 124
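To make "locally run verification" concrete, here is a minimal, generic judging harness: it runs a candidate program against (stdin, expected stdout) test cases and returns a pass rate, which could serve directly as an RL reward. The file name, test format, and pass-rate reward are illustrative assumptions, not the actual AutoCode pipeline.

```python
import subprocess

def verify(solution_path: str, tests: list[tuple[str, str]], timeout: float = 2.0) -> float:
    """Run a candidate program against (stdin, expected stdout) pairs and
    return the pass rate, which can double as an RL reward signal."""
    passed = 0
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                ["python", solution_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a timed-out test simply counts as failed
    return passed / len(tests)

# Hypothetical usage: reward = verify("candidate.py", [("1 2\n", "3")])
```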
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
Replies: 56 · Retweets: 334 · Likes: 2K
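A minimal sketch of the architectural shift described above, under the assumption that an RAE-style setup means a transformer denoising the tokens of a frozen, pretrained representation encoder instead of a VAE latent. Module names, sizes, and the linear noising schedule below are made up for illustration; this is not the paper's actual recipe.

```python
import torch
import torch.nn as nn

class FrozenRepEncoder(nn.Module):
    """Stand-in for a pretrained representation encoder (e.g. a ViT);
    in an RAE-style setup its features replace the VAE latent space."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify

    @torch.no_grad()
    def forward(self, x):
        feats = self.proj(x)                      # (B, dim, H/16, W/16)
        return feats.flatten(2).transpose(1, 2)   # (B, N_tokens, dim)

class LatentDenoiser(nn.Module):
    """Tiny transformer that predicts the noise added to encoder tokens."""
    def __init__(self, dim: int = 768, depth: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.out = nn.Linear(dim, dim)

    def forward(self, z_noisy, t_emb):
        return self.out(self.blocks(z_noisy + t_emb))

encoder, denoiser = FrozenRepEncoder(), LatentDenoiser()
x = torch.randn(2, 3, 256, 256)
z = encoder(x)                           # clean representation tokens
t = torch.rand(2, 1, 1)                  # random diffusion "time" per sample
noise = torch.randn_like(z)
z_noisy = (1 - t) * z + t * noise        # toy linear noising schedule
loss = ((denoiser(z_noisy, t.expand_as(z)) - noise) ** 2).mean()
```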
Introducing VideoNSA We started working on compression for video-LMM back in 2023. MovieChat focuses on inter-frame compression, while AuroraCap focuses on intra-frame compression. After the emergence of NSA, we realized that the manually set heuristics we relied on should
Token compression causes irreversible information loss in video understanding. 🤔 What can we do with sparse attention? We introduce VideoNSA, a hardware-aware and learnable hybrid sparse attention mechanism that scales to 128K context length.
Replies: 0 · Retweets: 14 · Likes: 99
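As a rough intuition for "hybrid sparse attention", the toy mask below mixes a local sliding window with a strided set of globally visible tokens under a causal constraint. It is a hand-built illustration of sparsity patterns, not VideoNSA's learned, hardware-aware kernel, and the window and stride values are arbitrary.

```python
import torch

def hybrid_sparse_mask(n_tokens: int, window: int = 128, stride: int = 1024) -> torch.Tensor:
    """Boolean attention mask combining a local sliding window with a strided
    set of globally visible tokens, a toy stand-in for learned hybrid sparsity."""
    i = torch.arange(n_tokens).unsqueeze(1)
    j = torch.arange(n_tokens).unsqueeze(0)
    local = (i - j).abs() < window      # nearby (e.g. intra-frame) tokens
    strided = (j % stride == 0)         # a sparse set of always-visible tokens
    causal = j <= i
    return (local | strided) & causal

mask = hybrid_sparse_mask(4096)
print(f"attended fraction: {mask.float().mean().item():.2%}")  # far below dense attention
```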
🚀 Qwen3-VL-30B-A3B-Instruct & Thinking are here! Smaller size, same powerhouse performance 💪—packed with all the capabilities of Qwen3-VL! 🔧 With just 3B active params, it’s rivaling GPT-5-Mini & Claude4-Sonnet — and often beating them across STEM, VQA, OCR, Video, Agent
Replies: 87 · Retweets: 320 · Likes: 2K
Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo2 → Veo3: edits, memory, symmetry, mazes, analogies •
Replies: 9 · Retweets: 58 · Likes: 309
🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision
Replies: 81 · Retweets: 301 · Likes: 2K
1/ Excited to share the latest work with @chenhuay17! We propose DiffusionNFT, a new online diffusion RL paradigm that optimizes directly on the forward diffusion process. Paper: https://t.co/oacDQZua6I Code: https://t.co/4UBx26TOyz
Replies: 3 · Retweets: 13 · Likes: 34
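The tweet gives only the high-level idea, so the snippet below is a deliberately generic, reward-weighted sketch: noise samples through the forward process and weight the standard noise-prediction loss by a per-sample reward. It is not the DiffusionNFT objective; the toy denoiser, reward, and schedule are all stand-ins.

```python
import torch
import torch.nn as nn

# Generic reward-weighted fine-tuning on forward-process samples, NOT DiffusionNFT.
denoiser = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def reward(x):                       # stand-in reward model
    return -x.pow(2).mean(dim=1)

samples = torch.randn(32, 16)        # pretend these came from the current sampler
t = torch.rand(32, 1)                # diffusion times in [0, 1]
noise = torch.randn_like(samples)
x_t = (1 - t) * samples + t * noise              # forward (noising) process
w = torch.softmax(reward(samples), dim=0)        # higher-reward samples count more
pred = denoiser(torch.cat([x_t, t], dim=1))      # predict the added noise
loss = (w * (pred - noise).pow(2).mean(dim=1)).sum()
opt.zero_grad(); loss.backward(); opt.step()
```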
Apple presents Manzano: Simple & scalable unified multimodal LLM • Hybrid vision tokenizer (continuous ↔ discrete) cuts task conflict • SOTA on text-rich benchmarks, competitive in gen vs GPT-4o/Nano Banana • One model for both understanding & generation • Joint recipe:
Replies: 4 · Retweets: 64 · Likes: 322
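To picture a "hybrid vision tokenizer (continuous ↔ discrete)", here is a toy tokenizer with one shared patch encoder and two outputs: continuous embeddings for understanding and nearest-codebook ids for generation. Every name, size, and the simple VQ lookup are assumptions for illustration, not Manzano's design.

```python
import torch
import torch.nn as nn

class HybridVisionTokenizer(nn.Module):
    """Toy hybrid tokenizer: one shared patch encoder, two outputs.
    Continuous embeddings feed understanding; nearest-codebook ids feed generation."""
    def __init__(self, dim: int = 256, codebook_size: int = 1024):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patch embed
        self.codebook = nn.Embedding(codebook_size, dim)             # VQ codebook

    def forward(self, images: torch.Tensor):
        feats = self.encoder(images).flatten(2).transpose(1, 2)      # (B, N, dim)
        continuous = feats                                           # continuous branch
        flat = feats.reshape(-1, feats.size(-1))                     # (B*N, dim)
        codes = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        discrete = codes.view(feats.shape[0], feats.shape[1])        # (B, N) token ids
        return continuous, discrete

tokenizer = HybridVisionTokenizer()
cont, ids = tokenizer(torch.randn(2, 3, 224, 224))
print(cont.shape, ids.shape)   # (2, 196, 256) and (2, 196)
```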
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
Replies: 237 · Retweets: 1K · Likes: 8K
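The root cause the blog post's title points at can be reproduced in a few lines: floating-point addition is not associative, so a reduction performed in a different order can return a slightly different result. The snippet only demonstrates that phenomenon; it is not Thinking Machines' fix.

```python
import numpy as np

# Floating-point addition is not associative, so any kernel whose reduction
# order changes (different batch sizes, different parallel split) can return
# slightly different logits for the same input, one root cause of
# nondeterministic LLM inference.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward = np.sum(x)             # one reduction order
s_sorted = np.sum(np.sort(x))     # a different order over the same numbers
print(s_forward, s_sorted, s_forward == s_sorted)  # the two sums typically differ slightly
```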
Image generation with Gemini just got a bananas upgrade: it's the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
Replies: 184 · Retweets: 539 · Likes: 3K
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: https://t.co/9awxf2Ogt9 A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
Replies: 0 · Retweets: 30 · Likes: 125
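A simplified picture of multi-dimensional neighborhood attention: each query position attends only to keys within a fixed radius along both spatial axes. The dense boolean mask below is for intuition only; the actual work is about fused sparse kernels, and the radius here is arbitrary.

```python
import torch

def neighborhood_mask_2d(h: int, w: int, radius: int = 3) -> torch.Tensor:
    """(h*w, h*w) boolean mask where each query pixel attends only to keys
    within `radius` of it in both spatial dimensions."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (h*w, 2) pixel coords
    diff = (coords[:, None, :] - coords[None, :, :]).abs()      # pairwise offsets
    return (diff <= radius).all(dim=-1)

mask = neighborhood_mask_2d(32, 32, radius=3)
print(f"attended fraction: {mask.float().mean().item():.2%}")  # a small fraction of dense attention
```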
This paper is interestingly thought-provoking for me. There is a chance that it's easier to "align a t2i model with real physics" in post-training, and let it learn to generate whatever (physically implausible) combinations in pretraining, as opposed to trying hard to come up with
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
Replies: 8 · Retweets: 17 · Likes: 212
Embedding a scientific basis in pre-trained T2I models can enhance the realism and consistency of the results. Cool work in "Science-T2I: Addressing Scientific Illusions in Image Synthesis" https://t.co/M6HisxOhK8
Replies: 1 · Retweets: 16 · Likes: 103
@NYU_Courant @upennnlp @mlpcucsd @UW 🥰 Working on this project was a truly unforgettable experience! Thanks to all my awesome collaborators @XingyuFu2 @wenhaocha1 @HaiyangXu3110 and advisor @sainingxie for making this project possible! [n/n]
Replies: 0 · Retweets: 0 · Likes: 1
@NYU_Courant @upennnlp @mlpcucsd @UW 🤩 For more technical details, please check out our paper, website, code, and dataset! If you have any questions related to our work, feel free to contact us! [7/n]
Replies: 1 · Retweets: 0 · Likes: 2
@NYU_Courant @upennnlp @mlpcucsd @UW To address this, we propose a two-stage training framework 🚀 for fine-tuning text-to-image (T2I) models to enhance scientific realism. It includes supervised fine-tuning followed by online fine-tuning. Experiments show a 50%+ improvement on Science-T2I-S! 📊 [6/n]
Replies: 1 · Retweets: 0 · Likes: 3
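A self-contained toy of the two-stage shape described in the tweet above: supervised fine-tuning toward reference outputs, then online fine-tuning in which the model's own samples are re-weighted by a reward. The 1-D "generator", synthetic targets, and reward are stand-ins; only the two-stage structure mirrors the tweet, not the Science-T2I method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gen = nn.Linear(8, 1)                           # toy "T2I model": prompt embedding -> output
opt = torch.optim.Adam(gen.parameters(), lr=1e-2)
prompts = torch.randn(256, 8)
sft_targets = prompts.sum(dim=1, keepdim=True)  # stand-in reference outputs

# Stage 1: supervised fine-tuning
for _ in range(200):
    loss = (gen(prompts) - sft_targets).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: online fine-tuning with a reward (here: a trivial "prefer larger outputs")
reward = lambda y: y.squeeze(-1)                # stand-in for a SciScore-like reward model
for _ in range(200):
    with torch.no_grad():
        samples = gen(prompts) + 0.5 * torch.randn(256, 1)   # explore around the model
        w = torch.softmax(reward(samples), dim=0)            # up-weight high-reward samples
    loss = (w * (gen(prompts) - samples).pow(2).squeeze(-1)).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```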
@NYU_Courant @upennnlp @mlpcucsd @UW 🌟 Leveraging SciScore, we can assess how well text-to-image diffusion models generate scientifically accurate visuals. While these models excel at creating detailed and aesthetically pleasing images from text, they often fail to align with real-world scientific principles! [5/n]
Replies: 1 · Retweets: 0 · Likes: 2
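Assuming SciScore behaves like a CLIP-style image-text scorer (an assumption, not something stated in the tweet), ranking two candidate images for one prompt could look like the sketch below. The base checkpoint is plain OpenAI CLIP and the candidate file names are hypothetical; swap in the released SciScore weights to get the actual metric.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score two candidate images against one scientifically phrased prompt and
# keep the one the scorer prefers. Plain CLIP here; a SciScore-style
# checkpoint would be substituted in practice.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "an unripe apple"
images = [Image.open("candidate_a.png"), Image.open("candidate_b.png")]  # hypothetical files

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(-1)   # one score per image

print("preferred candidate:", int(scores.argmax()))
```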
@NYU_Courant @upennnlp @mlpcucsd @UW Experiments conducted on Science-T2I-S&C reveal not only the impressive performance of SciScore✨ but also the significant limitations of LMMs and VLMs. Particularly concerning (‼️) is the performance of VLMs, which perform close to random guessing. [4/n]
Replies: 1 · Retweets: 0 · Likes: 2
@NYU_Courant @upennnlp @mlpcucsd @UW We also create two benchmarks from the Science-T2I Dataset, Science-T2I-S and Science-T2I-C, to evaluate how well LMMs and VLMs can distinguish ✅ real from ❌ fake scientific images. (Submit your results on the leaderboard!) [3/n]
Replies: 1 · Retweets: 0 · Likes: 1
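The two benchmarks imply a two-alternative evaluation: for each (prompt, scientifically correct image, plausible-but-wrong image) triple, a model must prefer the correct one, so a random scorer lands near 50% accuracy (the "random guessing" baseline mentioned in [4/n] above). The scorer below is a placeholder; plug in SciScore, an LMM, or a VLM.

```python
import random

def evaluate(scorer, triples):
    """Fraction of triples where the scorer prefers the scientifically correct image."""
    correct = sum(scorer(p, real) > scorer(p, fake) for p, real, fake in triples)
    return correct / len(triples)

random_scorer = lambda prompt, image: random.random()        # placeholder model
triples = [(f"prompt {i}", f"real_{i}.png", f"fake_{i}.png") for i in range(1000)]
print(f"random-guess accuracy: {evaluate(random_scorer, triples):.1%}")  # ~50%
```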