Junlin (Hans) Han Profile
Junlin (Hans) Han

@han_junlin

Followers
644
Following
2K
Media
21
Statuses
204

AI research. PhD student at Meta @AIatMeta and Oxford @OxfordTVG

London, England
Joined July 2020
@han_junlin
Junlin (Hans) Han
2 months
Excited to share our new work: “Learning to See Before Seeing”! 🧠➡️👀 We investigate an interesting phenomenon: how do LLMs, trained only on text, learn about the visual world? Project page: https://t.co/9mQt3qnckL
7
24
149
@han_junlin
Junlin (Hans) Han
2 months
Congrats to the team! Such a simple yet graceful way to unlock the latent multimodal capacities of text-trained LLMs. The idea of using sensory prompts to actively steer representation alignment is very cool—both practically useful and conceptually deep.
@SophieLWang
Sophie Wang
2 months
LLMs, trained only on text, might already know more about other modalities than we realized; we just need to find ways to elicit it. Project page: https://t.co/8cIf1DW0OQ w/ @phillip_isola and @thisismyhat
1
0
10
@han_junlin
Junlin (Hans) Han
2 months
I have been trying to advance 3D generation from video generation since back in 2024, and this project is the ultimate version of that path. My great honor to participate. 3D perception is a form of visual common sense; we do not need any explicit 3D representations for generation (as the bitter lesson suggests).
@liu_shikun
Shikun Liu
2 months
Introducing Kaleido💮 from @AIatMeta — a universal generative neural rendering engine for photorealistic, unified object and scene view synthesis. Kaleido is built on a simple but powerful design philosophy: 3D perception is a form of visual common sense. Following this idea,
3
1
47
@han_junlin
Junlin (Hans) Han
2 months
Many thanks for sharing our work!
@_akhaliq
AK
2 months
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
0
0
5
@han_junlin
Junlin (Hans) Han
2 months
Want to go deeper? Our full paper details 6 findings and 3 hypotheses. Beyond what's in this thread, we study the relation to the Platonic Representation Hypothesis, the contributions of language vs. vision, how language can 'hack' vision, and much more! See https://t.co/9mQt3qnckL
0
0
7
@han_junlin
Junlin (Hans) Han
2 months
Implications: Beyond a better understanding of visual priors, we show that rather than treating vision as an "add-on," stronger models can be built by instilling such priors during language pre-training from scratch, making later multimodal adaptation easier and more efficient.
1
0
5
@han_junlin
Junlin (Hans) Han
2 months
So how can we pre-train a "vision-aware" LLM? We search between language-favorable and vision-favorable recipes to find a "balanced" one: heavy on reasoning (>50%) plus a small portion of visually related text (~15%). This recipe boosts visual abilities by clear margins at the 7B-parameter, 1T-token scale.
1
0
5
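The recipe above amounts to a weighted pre-training data mixture. A minimal sketch of how such a mixture could be sampled is below; the category names and exact weights are illustrative assumptions in the spirit of the tweet, not the paper's actual recipe.

import random

# Hypothetical mixture: reasoning-heavy (>50%) plus a small slice (~15%) of
# visually related text, with the remainder drawn from broad web crawl.
MIXTURE_WEIGHTS = {
    "reasoning": 0.55,    # code, math, scientific papers
    "visual_text": 0.15,  # text about shapes, colors, spatial relations
    "general_web": 0.30,  # diverse crawl, feeding perception priors
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source according to the mixture weights."""
    r, cumulative = rng.random(), 0.0
    for source, weight in MIXTURE_WEIGHTS.items():
        cumulative += weight
        if r < cumulative:
            return source
    return "general_web"  # fallback for floating-point edge cases

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # roughly proportional to the mixture weights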
@han_junlin
Junlin (Hans) Han
2 months
Perception Priors emerge from broad, diverse text (like web crawl). This builds a vocabulary for "what things are". Strong perception priors lead to better visual perception skills, such as stronger OCR and object-existence VQA performance (measured with our proposed MLE-Bench).
1
0
6
@han_junlin
Junlin (Hans) Han
2 months
Reasoning Priors come from structured reasoning data like code, math, and scientific papers. This teaches the LLM abstract logic, which it can then easily transfer to solve visual tasks that require reasoning.
1
0
7
@han_junlin
Junlin (Hans) Han
2 months
We show LLMs build 'visual priors'—latent visual capabilities from language pre-training that give them a massive head start for understanding the visual world. These visual priors split into two distinct components, Reasoning and Perception, with very different origins.
1
0
6
@han_junlin
Junlin (Hans) Han
5 months
We’ll be presenting this Flex3D work at ICML soon! Unfortunately, I won’t be able to attend due to a pending visa, but Filippos @filippos_kok will present it in the 11am session on 17 July (W-216). Drop by if you’re interested in 3D generation!
@han_junlin
Junlin (Hans) Han
1 year
Releasing Flex3D, a two-stage pipeline for generating high-quality 3D assets in a feed-forward manner, as a further step toward high-quality 3D generation and reconstruction. Project page: https://t.co/F5tuEKjSBs
0
2
41
@han_junlin
Junlin (Hans) Han
5 months
Cool video generation project that leverages an explicit memory mechanism! Very exciting to see this as a big fan of memory in AI!
@JakabTomas
Tomas Jakab
5 months
Excited to share VMem: a novel memory mechanism for consistent video scene generation 🎞️✨ VMem evolves its understanding of scene geometry to retrieve the most relevant past frames, enabling long-term consistency 🌐 https://t.co/AHBj6j1ecE 🤗 https://t.co/FbUbJHWW4F 1/ 🧵
0
0
5
@han_junlin
Junlin (Hans) Han
6 months
A new framework for native parallel generation in LLMs, backed by a full-stack release (model, engine, data, and more)!
@InfiniAILab
Infini-AI-Lab
6 months
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
0
0
5
@han_junlin
Junlin (Hans) Han
6 months
improving the method further, adding more content to the paper, and only then releasing it. He doesn’t rush to jump into the next project, but keeps improving the method and updating the codebase. In today’s AI, this kind of patience is truly rare and admirable!!
0
0
6
@han_junlin
Junlin (Hans) Han
6 months
Beyond the paper itself, one thing I think is especially worth mentioning is that Jianyuan has built up long-term expertise in 3D geometry and has an exceptional ability to stay calm and focused. He always keeps polishing papers after the initial submission.
1
0
5
@han_junlin
Junlin (Hans) Han
6 months
Congrats to @jianyuan_wang and the team for winning the CVPR Best Paper!!! I was still in the office helping Jianyuan with some final edits just 5 minutes before the submission deadline (around 7am?). Now I know—even a Best Paper’s writing can be rushed out in a week 🤣.
1
6
77
@FM_in_Wild
Foundation Models in the Wild @ ICLR 2025
7 months
🤩 It's happening today! Join us at the 2nd Workshop on Foundation Models in the Wild — Hall 4, #6, Singapore EXPO! 🔥 10 amazing invited talks 🔥 12 exciting oral presentations 🔥 Cutting-edge ideas and lively discussions 🚀 Don't miss it — come say hi and explore the future
0
13
27
@Xinyu2ML
Xinyu Yang ✈️ NeurIPS '25
7 months
We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables: 🚀Pre-caching Contexts for Fast Inference 🐍Re-using Positions for Long Context Our poster session is located in Hall 3 and Hall 2B,
@Xinyu2ML
Xinyu Yang ✈️ NeurIPS '25
10 months
📢 Announcing our new work "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding" @iclr_conf 🚀 Enabling the efficient combination of multiple contexts with negligible prefilling cost 💅 Re-using the context window of LLMs to accommodate more and
0
23
51
@han_junlin
Junlin (Hans) Han
9 months
Check out this amazing work if you're interested in 3D reconstruction and geometry. One thought: co-scaling data + architecture (usually self-attention) works for almost all AI tasks and ill-posed problems, and 3D is no exception. Since 3D data is limited, combine and co-train on multiple tasks!
@jianyuan_wang
Jianyuan
9 months
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense
0
1
22
@FM_in_Wild
Foundation Models in the Wild @ ICLR 2025
9 months
😀We're delighted to announce that the review stage of our 2nd FM-Wild Workshop at ICLR has successfully concluded. We extend our sincere gratitude to all authors and reviewers for their valuable contributions. 👉The accepted papers are now available at:
openreview.net
Welcome to the OpenReview homepage for ICLR 2025 Workshop FM-Wild
0
4
9