HanRong YE

@leoyerrrr

Followers: 892
Following: 1K
Media: 62
Statuses: 371

@Nvidia Research Scientist | Open-Source Omni-Modality LLMs

Hong Kong
Joined October 2012
@leoyerrrr
HanRong YE
18 days
OmniVinci is now #1 paper on Huggingface!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems — leading to OmniVinci-9B, which introduces three key innovations: - OmniAlignNet – a
11
27
148
@leoyerrrr
HanRong YE
5 days
Well, our research finds that RL using FP4 + LoRA actually achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA: https://t.co/SmcEQMPiFv
@zzlccc
Zichen Liu
7 days
BF16 -> FP16 is such a simple (one configuration change in Oat) yet fundamental fix for inference-training mismatch. With FP16, the most basic importance sampling PG outperforms all algorithmic fixes in BF16. Let's rethink RL stability from the precision perspective.🔎
0
0
4
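The inference-training mismatch in the quoted thread comes down to the per-token importance weight π_train/π_rollout drifting away from 1 when the rollout engine and the trainer round log-probabilities differently. A minimal sketch of that effect (not the Oat implementation; `to_bf16` is a crude truncation-based emulation of BF16 rounding):

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by zeroing the low 16 mantissa bits of the
    float32 representation (truncation; good enough for illustration)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def importance_weight(train_lp: float, rollout_lp: float) -> float:
    """pi_train / pi_rollout for one token, from log-probabilities."""
    return math.exp(train_lp - rollout_lp)

lp = -2.3456789  # a token log-probability under the current policy

# Same precision on both sides: the ratio is exactly 1, so even the most
# basic importance-sampling policy gradient is well behaved.
print(importance_weight(lp, lp))            # 1.0

# BF16 rollout vs full-precision trainer: the rounded logprob no longer
# matches, so the per-token weight drifts off 1 -- compounded over long
# sequences, this is the inference-training mismatch.
print(importance_weight(lp, to_bf16(lp)))   # close to, but not exactly, 1
```

The same arithmetic explains why moving both sides to a common, higher-fidelity format (the BF16 -> FP16 change above) stabilizes training without any algorithmic fix.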
@leoyerrrr
HanRong YE
5 days
0
0
1
@leoyerrrr
HanRong YE
5 days
📈 According to the official test results from OmniVideoBench using our open-sourced model weights, OmniVinci has once again claimed #1 performance in the 7B LLM category! In addition, the model has been out for just two weeks and downloads have already soared past 6K 🌟
2
0
4
@leoyerrrr
HanRong YE
6 days
560B-A27B omni-modal MoE has landed 🤗 …but OmniVinci-9B is still my favorite size 😃
@Meituan_LongCat
Meituan LongCat
6 days
🔥 LongCat-Flash-Omni: Multimodal + Low-Latency 🏆 Leading Performance among Open-Source Omni-modal Models ☎️ Real-time Spoken Interaction: Millisecond-level E2E latency 🕒 128K context + Supports > 8min real-time AV interaction 🎥 Multimodal I/O: Arbitrary Combination of
0
0
2
@zy27962986
Zongyu Lin
7 days
🚀 Really excited to see this amazing arch change (KDA) finally coming out! Replacing global attention with a linear hybrid arch: better pretraining ppls, long-context evals, downstream math & code & STEM evals after RL, >6× throughput at 1M to unblock more downstream potentials to
@Kimi_Moonshot
Kimi.ai
8 days
Kimi Linear Tech Report is dropped! 🚀 https://t.co/LwNB2sQnzM Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi
1
18
55
@leoyerrrr
HanRong YE
9 days
And we at #NVIDIA Research are still seeking research interns to explore omni-modal LLMs across a variety of domains, including robotics (VLA), visual agentic tool use, world modeling, and unified understanding and generation. Drop me an email if you are interested!
@leoyerrrr
HanRong YE
18 days
OmniVinci is now #1 paper on Huggingface!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems — leading to OmniVinci-9B, which introduces three key innovations: - OmniAlignNet – a
0
1
12
@leoyerrrr
HanRong YE
10 days
Jensen is streaming live - AI, 6G, Quantum, Models, Enterprise, Robotics, Factories 💚 #NVIDIAGTC #NVIDIA https://t.co/he1nst3nI9
0
1
6
@leoyerrrr
HanRong YE
10 days
🚀 📷🍌
@zhegan4
Zhe Gan
15 days
🎁🎁 We release Pico-Banana-400K, a large-scale, high-quality image editing dataset distilled from Nano-Banana across 35 editing types. 🔗 Data link: https://t.co/mi06ddf3mN 🔗Paper link: https://t.co/AaZM02xcJr It includes 258K single-turn image editing data, 72K multi-turn
0
0
4
@leoyerrrr
HanRong YE
10 days
Your model can do CoT reasoning, but is it actually correct? 👿
@sueqian111
Yusu Qian
10 days
🧩 New paper out! We introduce PRISM-Bench, a diagnostic benchmark for puzzle-based multimodal reasoning. Unlike standard VQA, PRISM-Bench tests not only if models can solve visual puzzles, but how their reasoning unfolds. 💡 Models must spot the first error in a
0
0
8
@leoyerrrr
HanRong YE
12 days
0
0
0
@tydsh
Yuandong Tian
15 days
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
475
284
7K
@leoyerrrr
HanRong YE
18 days
@nvidia Joint work of @leoyerrrr , @yin_hongxu , @huckiyang , @goelarushi27 , @AaronWeiHuang , @LigengZhu , Yuanhang Su†, Sean Lin†, @anjjei , Zhen Wan†, @MXzBFhjFpS1jyMI , @YumingLou , Dong Yang†, @zhijianliu_ , @yukangchen_ , @AmbrishDantrey, @ehsanjjjjj , @SreyanG , Daguang Xu,
0
0
1
@leoyerrrr
HanRong YE
18 days
@nvidia 📺Also check the nice video made by @huckiyang
0
1
2
@leoyerrrr
HanRong YE
18 days
@nvidia 🔗 OmniVinci by NVIDIA Research 🌐 Webpage: https://t.co/qRscvzkSkh 💻 GitHub: https://t.co/2efUT4oS4C 🤖 Model: https://t.co/06nWW6gYO2 📄 Paper:
1
1
6
@leoyerrrr
HanRong YE
20 days
Off to ICCV! Also, we have an omni-modal LLM reveal coming next Monday… straight from Hawaiiiii 🌴
1
4
34
@shizhediao
Shizhe Diao
21 days
Proud to see NVIDIA recognized by AI World as a leader in the open-source AI ecosystem. From Nemotron and BioNeMo to Cosmos, GR00T, and Canary, our contributions span foundation models, scientific computing, and agentic reasoning. I feel sooo excited to be part of the Nemotron
6
7
37
@leoyerrrr
HanRong YE
1 month
If you’re interested, please send your CV and cover letter to hanrongy@nvidia.com
0
0
1