Zhiheng LYU
@ZhihengLyu
Followers: 138 · Following: 80 · Media: 8 · Statuses: 33
Code Agent Post-training for MiniMax-M2; Core Contributor of VerlTool; MMath Student @UWaterloo TIGER-Lab Prev @HKUniversity @ETH_en @UCBerkeley
Waterloo, CA
Joined May 2022
🤯🤯🤯 Gemini 3 Pro + Live-SWE-agent hits 77.4% on SWE-bench Verified, beating ALL existing models, including Claude 4.5!! 🤖 Live-SWE-agent is the first live software agent that autonomously self-evolves on the fly — and it even outperforms the manually engineered scaffold
Replies: 33 · Reposts: 71 · Likes: 479
🚀 Thrilled to announce that I’ll be attending EMNLP 2025 (Nov 4-9) in Suzhou, China! 🇨🇳✨ I’ll be showcasing our latest research from #KCLNLP on implicit Chain-of-Thoughts (CoTs) and an AI Scientist demo system 🤖🧠 📘 CODI: Compressing Chain-of-Thought into Continuous Space
lnkd.in
Replies: 2 · Reposts: 4 · Likes: 39
Our EMNLP 2025 paper "Synthetic Socratic Debates" is being presented today in Suzhou! 📍 Poster Session 1 🕚 Nov 5, 11:00 AM (Beijing) Come chat about how LLM personas shape moral reasoning & persuasion! 🔗
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
Replies: 1 · Reposts: 8 · Likes: 24
Actually saw it climb from 8th to 2nd; let's see what happens when the free trial ends : )
MiniMax-M2 the #2
Replies: 0 · Reposts: 0 · Likes: 1
My student Wentao reproduced Self-Adapting LMs and wrote a blog on lessons learned. Highly recommended for anyone adapting LMs! He's also looking for a summer internship. He has 2 first-author EMNLP papers after just one year! 🔗 https://t.co/OK9O2shJhJ 🔗 https://t.co/506fLt3FmL
🚨New reproduction study We re-implemented SEAL (Self-Adapting LMs) & confirmed results... but found something surprising: ✅Self-editing gives most gains ❌RL is costly & adds little for instruct models 🤖External editors (GPT-5) are cheaper+competitive Blog: https://t.co/kHHECDZcq2
Replies: 4 · Reposts: 15 · Likes: 115
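A minimal sketch of the self-edit data flow the reproduction above studies: an editor turns a new passage into its own training examples, which then drive a small supervised finetuning update. The reproduction's point is that the editor behind this interface can be the model itself (as in SEAL) or a cheaper external LLM. make_self_edits, sft_update, and toy_editor below are illustrative stand-ins, not the SEAL or reproduction codebase.

# Hypothetical sketch of a SEAL-style self-edit loop; not the authors' code.
from typing import Callable, List, Tuple

def make_self_edits(editor: Callable[[str], List[str]], passage: str) -> List[Tuple[str, str]]:
    """Turn one passage into (prompt, completion) SFT pairs from editor-written restatements."""
    return [("Recall a fact from the passage.", edit) for edit in editor(passage)]

def sft_update(pairs: List[Tuple[str, str]]) -> None:
    """Placeholder for a small SFT/LoRA step on the generated pairs."""
    print(f"finetuning on {len(pairs)} self-edit pairs")

# Toy editor: in SEAL this slot is filled by the model itself; the reproduction
# finds an external editor (e.g. GPT-5) behind the same interface is competitive.
def toy_editor(passage: str) -> List[str]:
    return [passage, f"In other words: {passage}"]

sft_update(make_self_edits(toy_editor, "VerlTool is a tool-agent training framework built on verl."))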
📢 Introducing VisCoder2: Building Multi-Language Visualization Coding Agents! Existing LLMs often fail in practical workflows due to limited language coverage, unreliable execution, and a lack of iterative correction mechanisms. We introduce 3 resources to address this:
Replies: 7 · Reposts: 8 · Likes: 18
#NewDataset for VLMs After the release of VisualWebInstruct, we kept pushing its quality and adopting different strategies to make it as accurate as possible. Today, we are releasing a verified version of VisualWebInstruct under https://t.co/Sw3MSM5FUK. It has around 100K
huggingface.co
We have made huge progress in language model reasoning. But our progress in multimodal reasoning (like MMMU) is very limited. Why? It's due to the lack of diverse, difficult, and high-quality multimodal reasoning datasets! 🚀 New Paper Alert! 📢 We introduce VisualWebInstruct,
Replies: 2 · Reposts: 22 · Likes: 94
We’re open-sourcing MiniMax M2 — Agent & Code Native, at 8% of Claude Sonnet's price and ~2x faster ⚡ Free globally for a limited time via MiniMax Agent & API - Advanced Coding Capability: Engineered for end-to-end developer workflows. Strong capability across a wide range of applications
Replies: 126 · Reposts: 839 · Likes: 2K
Totally agree. We experimented with image-only input for every task, and the results are quite good. Check out our earlier paper, PixelWorld: https://t.co/aA6zXOV2UI
arxiv.org
Recent agentic language models increasingly need to interact with real-world environments that contain tightly intertwined visual and textual information, often through raw camera pixels rather...
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language
Replies: 5 · Reposts: 12 · Likes: 188
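A rough illustration of the "everything as pixels" setup behind PixelWorld referenced above: rather than tokenizing the prompt, render it to an image and hand that image to a vision-language model. The rendering choices below (canvas width, default font, simple wrapping) are my own for the sketch, not the paper's pipeline.

# Hypothetical text-to-pixels rendering; illustrative only.
from textwrap import wrap
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 896, line_height: int = 20) -> Image.Image:
    """Rasterize plain text onto a white canvas, one wrapped line at a time."""
    lines = wrap(text, width=100) or [""]
    img = Image.new("RGB", (width, line_height * len(lines) + 16), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((8, 8 + i * line_height), line, fill="black")
    return img

# The image then replaces the text tokens as model input, e.g.:
# render_text_as_image("Question: what does this table say about Q3 revenue?").save("prompt.png")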
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and
Replies: 6 · Reposts: 74 · Likes: 382
🚀 Excited to finally share our paper on VerlTool, released today after months of work since the framework's initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Replies: 2 · Reposts: 37 · Likes: 158
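For readers new to agentic RL with tool use, here is a toy version of the multi-turn rollout that frameworks like the one announced above train on: the policy emits text, tool calls are parsed and executed, observations are appended, and the finished trajectory plus a scalar reward is what GRPO/PPO would consume. The <tool>/<obs>/<answer> tags, toy_policy, and the calculator tool are placeholders of mine, not VerlTool's actual interface.

# Not VerlTool's API: a generic sketch of an ARLT-style rollout loop.
import re

def toy_policy(history: str) -> str:
    """Pretend policy: call the calculator once, then answer."""
    if "<obs>" not in history:
        return "I should compute this. <tool>calc: 17*24</tool>"
    result = re.search(r"<obs>(.*?)</obs>", history).group(1)
    return f"<answer>{result}</answer>"

def run_tool(call: str) -> str:
    """A single whitelisted tool: a tiny calculator."""
    expr = call.removeprefix("calc:").strip()
    return str(eval(expr, {"__builtins__": {}}, {}))  # toy sandbox, not production-safe

def rollout(question: str, max_turns: int = 4):
    history = question
    for _ in range(max_turns):
        action = toy_policy(history)
        history += "\n" + action
        tool_call = re.search(r"<tool>(.*?)</tool>", action)
        if tool_call:                       # interact with the environment
            history += f"\n<obs>{run_tool(tool_call.group(1))}</obs>"
        else:                               # terminal turn: score the trajectory
            answer = re.search(r"<answer>(.*?)</answer>", action)
            reward = 1.0 if answer and answer.group(1) == "408" else 0.0
            return history, reward
    return history, 0.0

trajectory, reward = rollout("What is 17 * 24?")
print(trajectory, "\nreward =", reward)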
Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning. - World’s longest context window: 1M-token input, 80k-token output - State-of-the-art agentic use among open-source models - RL at unmatched efficiency:
Replies: 85 · Reposts: 306 · Likes: 1K
📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the
Replies: 7 · Reposts: 15 · Likes: 34
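A hedged sketch of the execution-feedback self-debugging loop the VisCoder posts describe: run the generated plotting code, capture the traceback, and hand it back to the model for a revision. The model here is any prompt-to-code callable; the two-attempt stub only illustrates the control flow, not VisCoder's actual prompting or training.

# Illustrative self-debugging loop; model and attempts are stand-ins.
import traceback
from typing import Callable

def self_debug(model: Callable[[str], str], task: str, max_rounds: int = 3) -> str:
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = model(prompt)
        try:
            exec(compile(code, "<generated>", "exec"), {})    # execution feedback
            return code                                        # ran cleanly
        except Exception:
            err = traceback.format_exc(limit=1)
            prompt = f"{task}\n\nPrevious attempt:\n{code}\nError:\n{err}\nFix it."
    return code

# Toy stand-in model: first emits buggy code, then a corrected version.
attempts = iter([
    "import matplotlib.pyplot as plt\nplt.plot([1, 2, 3], [1, 4])\nplt.close()",    # length mismatch
    "import matplotlib.pyplot as plt\nplt.plot([1, 2, 3], [1, 4, 9])\nplt.close()",
])
fixed = self_debug(lambda prompt: next(attempts), "Plot y = x**2 for x in 1..3")
print(fixed)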
@kevinyang41 @ikekong @HKUniversity @BerkeleyNLP Excited to present our NAACL 2025 paper FactTrack: Time-Aware World State Tracking in Story Outlines tomorrow (Apr 29) at 16:30–16:45 in Ballroom B, Albuquerque. Come say hi! #NAACL2025 #NAACL Paper:
Replies: 0 · Reposts: 0 · Likes: 1
Will be at NAACL next week, excited to share two of our papers: FACTTRACK: Time-Aware World State Tracking in Story Outlines https://t.co/1KcL0aCWCI THOUGHTSCULPT: Reasoning with Intermediate Revision and Search https://t.co/ZGqvEeReHr Shoutout to first authors @ZhihengLyu and
Replies: 0 · Reposts: 4 · Likes: 10
🚀Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis Please unmute to hear the demo audio. ✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input. ✨We propose
Replies: 18 · Reposts: 58 · Likes: 220
(6/6) 🎉 Huge thanks to our amazing team: @ZhihengLyu @xueguang_ma @WenhuChen from @UWaterloo ! 🔍Check out “PixelWorld: Towards Perceiving Everything as Pixels” for more details: 🔗 https://t.co/PP1gXyRje8 We hope it sparks fresh ideas for truly unified multimodal models! 🏆✨
Replies: 0 · Reposts: 0 · Likes: 1
(5/6) We also introduce PEAP-Fast to prune blank pixel regions, significantly speeding up inference with minimal accuracy drop. ⚡️ This makes “perceiving everything as pixels” more efficient!
Replies: 1 · Reposts: 0 · Likes: 1
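To make the pruning idea in (5/6) concrete, here is a rough numpy sketch of dropping near-blank ViT-style patches before the vision encoder; the patch size and variance threshold are arbitrary choices for illustration, not PEAP-Fast's actual implementation.

# Hypothetical blank-patch pruning; not the PixelWorld codebase.
import numpy as np

def keep_nonblank_patches(image: np.ndarray, patch: int = 16, var_thresh: float = 1e-3):
    """Return (patch_tensor, kept_indices) with near-constant patches removed.

    image: float array of shape (H, W, C) in [0, 1], with H and W divisible by `patch`.
    """
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    variances = patches.reshape(len(patches), -1).var(axis=1)
    kept = np.flatnonzero(variances > var_thresh)
    return patches[kept], kept

# Example: a mostly-white page keeps only the patches that contain ink.
page = np.ones((224, 224, 3), dtype=np.float32)
page[40:56, 32:120] = 0.0  # a dark stripe standing in for rendered text
pruned, idx = keep_nonblank_patches(page)
print(f"kept {len(idx)} of {(224 // 16) * (224 // 16)} patches")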
(4/6) Visualization shows attention over textual patches mirrors token attention—like a “universal tokenizer.” 🧐 But empty areas still receive focus, indicating room for optimization.
Replies: 1 · Reposts: 0 · Likes: 1
(3/6) Key findings: 1. PEAP excels on websites, slides, docs by leveraging layout info; 2. For complex tasks (math/code/knowledge), token-based text still leads; 3. Larger models handle pixel inputs better. 🚀
Replies: 1 · Reposts: 0 · Likes: 3