Jialuo Li

@JialuoLi1007

Followers: 102 · Following: 6 · Media: 12 · Statuses: 29

CS Graduate Student @Gatech | Computer Vision | Deep Learning; Prev: CS Undergraduate @Tsinghua Uni, Yao Class | Research Intern @nyuniversity @MSFTResearch

Atlanta, GA, USA
Joined February 2024
@JialuoLi1007
Jialuo Li
7 months
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
4
32
140
@wenhaocha1
Wenhao Chai
27 days
LiveCodeBench Pro remains one of the most challenging code benchmarks, but its evaluation and verification process is still a black box. We introduce AutoCode, which democratizes evaluation, allowing anyone to run verification locally and perform RL training! For the first time,
4
30
124
@sainingxie
Saining Xie
1 month
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
56
334
2K
@wenhaocha1
Wenhao Chai
1 month
Introducing VideoNSA. We started working on compression for video-LMMs back in 2023. MovieChat focuses on inter-frame compression, while AuroraCap focuses on intra-frame compression. After the emergence of NSA, we realized that the manually set heuristics we relied on should
@EnxinSong
Enxin Song
1 month
Token compression causes irreversible information loss in video understanding. 🤔 What can we do with sparse attention? We introduce VideoNSA, a hardware-aware and learnable hybrid sparse attention mechanism that scales to 128K context length.
0
14
99
@Alibaba_Qwen
Qwen
1 month
🚀 Qwen3-VL-30B-A3B-Instruct & Thinking are here! Smaller size, same powerhouse performance 💪—packed with all the capabilities of Qwen3-VL! 🔧 With just 3B active params, it’s rivaling GPT-5-Mini & Claude4-Sonnet — and often beating them across STEM, VQA, OCR, Video, Agent
87
320
2K
@OpenAI
OpenAI
1 month
Sora 2 is here.
2K
2K
22K
@arankomatsuzaki
Aran Komatsuzaki
2 months
Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo2 → Veo3: edits, memory, symmetry, mazes, analogies •
9
58
309
@Alibaba_Qwen
Qwen
2 months
🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision
81
301
2K
@zkwthu
Kaiwen Zheng
2 months
1/ Excited to share the latest work with @chenhuay17! We propose DiffusionNFT, a new online diffusion RL paradigm that optimizes directly on the forward diffusion process. Paper: https://t.co/oacDQZua6I Code: https://t.co/4UBx26TOyz
3
13
34
@arankomatsuzaki
Aran Komatsuzaki
2 months
Apple presents Manzano: Simple & scalable unified multimodal LLM • Hybrid vision tokenizer (continuous ↔ discrete) cuts task conflict • SOTA on text-rich benchmarks, competitive in gen vs GPT-4o/Nano Banana • One model for both understanding & generation • Joint recipe:
4
64
322
@thinkymachines
Thinking Machines
2 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237
1K
8K
@GoogleDeepMind
Google DeepMind
3 months
Image generation with Gemini just got a bananas upgrade: it is now the state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
184
539
3K
@humphrey_shi
Humphrey Shi
7 months
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: https://t.co/9awxf2Ogt9 A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
0
30
125
@giffmana
Lucas Beyer (bl16)
7 months
This paper is interestingly thought-provoking for me. There is a chance that it's easier to "align t2i model with real physics" in post-training, and let it learn to generate whatever (physically implausible) combinations in pretrain, as opposed to trying hard to come up with
@JialuoLi1007
Jialuo Li
7 months
🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: https://t.co/ybG6z3MQbd 🌐 Project: https://t.co/IBJodI0Uvm 💻 Code: https://t.co/voFOyXPRhi 🤗 Dataset: https://t.co/fjKgXkiB8q 🔍
8
17
212
@RisingSayak
Sayak Paul
7 months
Embedding a scientific basis in pre-trained T2I models can enhance the realism and consistency of the results. Cool work in "Science-T2I: Addressing Scientific Illusions in Image Synthesis" https://t.co/M6HisxOhK8
1
16
103
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW 🥰 Working on this project was a truly unforgettable experience! Thanks to all my awesome collaborators @XingyuFu2 @wenhaocha1 @HaiyangXu3110 and advisor @sainingxie for making this project possible! [n/n]
0
0
1
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW 🤩 For more technical details, please check out our paper, website, code and dataset! If you have any questions related to our work, feel free to reach out! [7/n]
1
0
2
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW To address this, we propose a two-stage training framework 🚀 for fine-tuning text-to-image (T2I) models to enhance scientific realism. It includes supervised fine-tuning followed by online fine-tuning. Experiments show a 50%+ improvement on Science-T2I-S! 📊 [6/n]
1
0
3
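A minimal sketch of the two-stage recipe this tweet describes: supervised fine-tuning toward scientifically correct targets, then online fine-tuning against a learned reward. Everything below (the tiny model, the stub reward, the training constants) is an illustrative stand-in, not the released Science-T2I code:

    import torch
    import torch.nn as nn

    class TinyT2I(nn.Module):
        # Stand-in for a text-to-image generator; the real work fine-tunes a diffusion model.
        def __init__(self, dim: int = 32):
            super().__init__()
            self.net = nn.Linear(dim, dim)
        def forward(self, prompt_emb):
            return self.net(prompt_emb)  # the "image" is represented as a feature vector

    def sci_score(img):
        # Stub reward: a real pipeline would call a trained scorer such as SciScore.
        return -img.pow(2).mean(dim=-1)

    model = TinyT2I()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Stage 1: supervised fine-tuning toward scientifically correct targets.
    for _ in range(100):
        prompt = torch.randn(8, 32)
        target = torch.zeros(8, 32)  # stand-in "correct" images
        loss = (model(prompt) - target).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: online fine-tuning -- sample from the current model, score the
    # samples, and reinforce the high-reward ones (REINFORCE-style surrogate).
    for _ in range(100):
        prompt = torch.randn(8, 32)
        sample = (model(prompt) + 0.1 * torch.randn(8, 32)).detach()  # stochastic generation
        reward = sci_score(sample)  # no gradient flows through the reward
        logp = -(sample - model(prompt)).pow(2).sum(dim=-1)  # Gaussian log-prob up to a constant
        loss = -(reward * logp).mean()
        opt.zero_grad(); loss.backward(); opt.step()

The design point worth noting: the reward is only a scoring signal, so gradients flow through the model's log-probability of its own samples, never through the scorer itself.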
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW 🌟 Leveraging SciScore, we can assess how well text-to-image diffusion models generate scientifically accurate visuals. While these models excel at creating detailed and aesthetically pleasing images from text, they often fail to align with real-world scientific principles! [5/n]
1
0
2
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW Experiments conducted on Science-T2I-S&C not only reveal the impressive performance of SciScore ✨ but also highlight the significant limitations of LMMs and VLMs. Particularly concerning (‼️) is the performance of VLMs, which perform close to random guessing. [4/n]
1
0
2
@JialuoLi1007
Jialuo Li
7 months
@NYU_Courant @upennnlp @mlpcucsd @UW We also create two benchmarks from the Science-T2I Dataset, Science-T2I-S and Science-T2I-C, to evaluate how well LMMs and VLMs can distinguish ✅ real from ❌ fake scientific images. (Submit your results on the leaderboard!) [3/n]
1
0
1
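As a rough illustration of the pairwise protocol these benchmarks imply: a scorer sees one scientifically correct image and one implausible one for the same prompt, and is counted correct when it ranks the real image higher. The cosine-similarity scorer and the random embeddings below are hypothetical stand-ins for SciScore or an LMM/VLM judge:

    import torch
    import torch.nn.functional as F

    def score(prompt_emb, image_emb):
        # Hypothetical stand-in scorer: cosine similarity between embeddings.
        # A real evaluation would query SciScore or an LMM/VLM judge instead.
        return F.cosine_similarity(prompt_emb, image_emb, dim=-1)

    def pairwise_accuracy(pairs):
        # pairs: list of (prompt_emb, real_img_emb, fake_img_emb) tensors.
        hits = sum(float(score(p, real) > score(p, fake)) for p, real, fake in pairs)
        return hits / len(pairs)

    # Synthetic stand-in data: random embeddings land near 50% -- the
    # chance-level behavior the thread reports for off-the-shelf VLMs.
    pairs = [(torch.randn(16), torch.randn(16), torch.randn(16)) for _ in range(1000)]
    print(f"pairwise accuracy: {pairwise_accuracy(pairs):.2%}")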