Junlin (Hans) Han Profile
Junlin (Hans) Han

@han_junlin

Followers
644
Following
2K
Media
21
Statuses
204

AI research. PhD student at Meta @AIatMeta and Oxford @OxfordTVG

London, England
Joined July 2020
@han_junlin
Junlin (Hans) Han
2 months
Excited to share our new work: “Learning to See Before Seeing”! 🧠➡️👀 We investigate an interesting phenomenon: how do LLMs, trained only on text, learn about the visual world? Project page: https://t.co/9mQt3qnckL
7
24
149
@han_junlin
Junlin (Hans) Han
2 months
Congrats to the team! Such a simple yet graceful way to unlock the latent multimodal capacities of text-trained LLMs. The idea of using sensory prompts to actively steer representation alignment is very cool—both practically useful and conceptually deep.
@SophieLWang
Sophie Wang
2 months
LLMs, trained only on text, might already know more about other modalities than we realized; we just need to find ways to elicit it. Project page: https://t.co/8cIf1DW0OQ w/ @phillip_isola and @thisismyhat
1
0
10
@han_junlin
Junlin (Hans) Han
2 months
I have been trying to advance 3D generation from video generation since back in 2024, and this project is the ultimate version of that path. My great honor to participate. 3D perception is a form of visual common sense; we do not need any explicit 3D representations for generation (as the bitter lesson suggests).
@liu_shikun
Shikun Liu
2 months
Introducing Kaleido💮 from @AIatMeta — a universal generative neural rendering engine for photorealistic, unified object and scene view synthesis. Kaleido is built on a simple but powerful design philosophy: 3D perception is a form of visual common sense. Following this idea,
3
1
47
@han_junlin
Junlin (Hans) Han
2 months
Many thanks for sharing our work!
@_akhaliq
AK
2 months
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
0
0
5
@han_junlin
Junlin (Hans) Han
2 months
Want to go deeper? Our full paper details 6 findings and 3 hypotheses. Beyond what's in this thread, we study the relation to the Platonic Representation Hypothesis, the contributions of language vs. vision, how language can 'hack' vision, and much more! See https://t.co/9mQt3qnckL
0
0
7
@han_junlin
Junlin (Hans) Han
2 months
Implications: Beyond a better understanding of visual priors, we show that rather than treating vision as an "add-on," stronger models can be built by instilling such priors during language pre-training from scratch, making later multimodal adaptation easier and more efficient.
1
0
5
@han_junlin
Junlin (Hans) Han
2 months
So how can we pre-train a "vision-aware" LLM? We search between language-favorable and vision-favorable recipes to find a "balanced" one: heavy on reasoning (>50%) plus a small portion of visually related text (~15%). This recipe boosts visual abilities by clear margins at the 7B-parameter, 1T-token scale.
1
0
5
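The recipe above amounts to a weighted pre-training data mixture. A minimal sketch of how such a mixture could be sampled is below; the category names and exact weights are illustrative assumptions in the spirit of the tweet, not the paper's actual recipe.

import random

# Hypothetical mixture: reasoning-heavy (>50%) plus a small slice (~15%) of
# visually related text, with the remainder drawn from broad web crawl.
MIXTURE_WEIGHTS = {
    "reasoning": 0.55,    # code, math, scientific papers
    "visual_text": 0.15,  # text about shapes, colors, spatial relations
    "general_web": 0.30,  # diverse crawl, feeding perception priors
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source according to the mixture weights."""
    r, cumulative = rng.random(), 0.0
    for source, weight in MIXTURE_WEIGHTS.items():
        cumulative += weight
        if r < cumulative:
            return source
    return "general_web"  # fallback for floating-point edge cases

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE_WEIGHTS}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # roughly proportional to the mixture weights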
@han_junlin
Junlin (Hans) Han
2 months
Perception Priors emerge from broad, diverse text (like web crawl). This builds a vocabulary for "what things are". Strong perception priors lead to better visual perception skills, such as stronger OCR and object-existence VQA performance (measured with our proposed MLE-Bench).
1
0
6
@han_junlin
Junlin (Hans) Han
2 months
Reasoning Priors come from structured reasoning data like code, math, and scientific papers. This teaches the LLM abstract logic, which it can then easily transfer to solve visual tasks that require reasoning.
1
0
7
@han_junlin
Junlin (Hans) Han
2 months
We show LLMs build 'visual priors'—latent visual capabilities from language pre-training that give them a massive head start for understanding the visual world. These visual priors split into two distinct components, Reasoning and Perception, with very different origins.
1
0
6
@han_junlin
Junlin (Hans) Han
5 months
We’ll be presenting this Flex3D work at ICML soon! Unfortunately, I won’t be able to attend due to a pending visa, but Filippos @filippos_kok will present it in the 11am session on 17 July (W-216). Drop by if you’re interested in 3D generation!
@han_junlin
Junlin (Hans) Han
1 year
Releasing Flex3D, a two-stage pipeline for generating high-quality 3D assets in a feed-forward manner, as a further step toward high-quality 3D generation and reconstruction. Project page: https://t.co/F5tuEKjSBs
0
2
41
@han_junlin
Junlin (Hans) Han
5 months
Cool video generation project that leverages an explicit memory mechanism! Very exciting to see this as a big fan of memory in AI!
@JakabTomas
Tomas Jakab
5 months
Excited to share VMem: a novel memory mechanism for consistent video scene generation 🎞️✨ VMem evolves its understanding of scene geometry to retrieve the most relevant past frames, enabling long-term consistency 🌐 https://t.co/AHBj6j1ecE 🤗 https://t.co/FbUbJHWW4F 1/ 🧵
0
0
5
@han_junlin
Junlin (Hans) Han
6 months
A new framework for native parallel generation in LLMs, backed by a full-stack release (model, engine, data, and more)!
@InfiniAILab
Infini-AI-Lab
6 months
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
0
0
5
@han_junlin
Junlin (Hans) Han
6 months
improving the method further, adding more content to the paper, and only then releasing it. He doesn’t rush to jump into the next project, but keeps improving the method and updating the codebase. In today’s AI, this kind of patience is truly rare and admirable!!
0
0
6
@han_junlin
Junlin (Hans) Han
6 months
Beyond the paper itself, one thing I think is especially worth mentioning is that Jianyuan has built up long-term expertise in 3D geometry and has an exceptional ability to stay calm and focused. He always keeps polishing papers after the initial submission.
1
0
5
@han_junlin
Junlin (Hans) Han
6 months
Congrats to @jianyuan_wang and the team for winning the CVPR Best Paper!!! I was still in the office helping Jianyuan with some final edits just 5 minutes before the submission deadline (around 7am?). Now I know—even a Best Paper’s writing can be rushed out in a week 🤣.
1
6
77
@FM_in_Wild
Foundation Models in the Wild @ ICLR 2025
7 months
🤩 It's happening today! Join us at the 2nd Workshop on Foundation Models in the Wild — Hall 4, #6, Singapore EXPO! 🔥 10 amazing invited talks 🔥 12 exciting oral presentations 🔥 Cutting-edge ideas and lively discussions 🚀 Don't miss it — come say hi and explore the future
0
13
27
@Xinyu2ML
Xinyu Yang ✈️ NeurIPS '25
7 months
We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables: 🚀Pre-caching Contexts for Fast Inference 🐍Re-using Positions for Long Context Our poster session is located in Hall 3 and Hall 2B,
@Xinyu2ML
Xinyu Yang ✈️ NeurIPS '25
10 months
📢 Announcing our new work "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding" @iclr_conf 🚀 Enabling the efficient combination of multiple contexts with negligible prefilling cost 💅 Re-using the context window of LLMs to accommodate more and
0
23
51
@han_junlin
Junlin (Hans) Han
9 months
Check out this amazing work if you're interested in 3D reconstruction and geometry. One thought: co-scaling data + architecture (usually self-attention) works for almost all AI tasks and ill-posed problems, and 3D is no exception. Since 3D data is limited, combine and co-train on multiple tasks!
@jianyuan_wang
Jianyuan
9 months
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense
0
1
22
@FM_in_Wild
Foundation Models in the Wild @ ICLR 2025
9 months
😀We're delighted to announce that the review stage of our 2nd FM-Wild Workshop at ICLR has successfully concluded. We extend our sincere gratitude to all authors and reviewers for their valuable contributions. 👉The accepted papers are now available at:
openreview.net
Welcome to the OpenReview homepage for ICLR 2025 Workshop FM-Wild
0
4
9