Zhiheng LYU
@ZhihengLyu
Followers: 138 · Following: 80 · Media: 8 · Statuses: 33
Code Agent Post-training for MiniMax-M2; Core Contributor of VerlTool; MMath Student @UWaterloo TIGER-Lab Prev @HKUniversity @ETH_en @UCBerkeley
Waterloo, CA
Joined May 2022
🤯🤯🤯 Gemini 3 Pro + Live-SWE-agent hits 77.4% on SWE-bench Verified, beating ALL existing models, including Claude 4.5!! 🤖 Live-SWE-agent is the first live software agent that autonomously self-evolves on the fly — and it even outperforms the manually engineered scaffold
Replies: 33 · Reposts: 71 · Likes: 479
🚀 Thrilled to announce that I’ll be attending EMNLP 2025 (Nov 4-9) in Suzhou, China! 🇨🇳✨ I’ll be showcasing our latest research from #KCLNLP on implicit Chain-of-Thoughts (CoTs) and an AI Scientist demo system 🤖🧠 📘 CODI: Compressing Chain-of-Thought into Continuous Space
lnkd.in
Replies: 2 · Reposts: 4 · Likes: 39
Our EMNLP 2025 paper "Synthetic Socratic Debates" is being presented today in Suzhou! 📍 Poster Session 1 🕚 Nov 5, 11:00 AM (Beijing) Come chat about how LLM personas shape moral reasoning & persuasion! 🔗
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
Replies: 1 · Reposts: 8 · Likes: 24
Actually saw it climb from 8th to 2nd; let's see what happens when the free trial ends : )
MiniMax-M2 the #2
Replies: 0 · Reposts: 0 · Likes: 1
My student Wentao reproduced Self-Adapting LMs and wrote a blog on lessons learned. Highly recommended for anyone adapting LMs! He's also looking for a summer internship. He has 2 first-author EMNLP papers after just one year! 🔗 https://t.co/OK9O2shJhJ 🔗 https://t.co/506fLt3FmL
🚨New reproduction study We re-implemented SEAL (Self-Adapting LMs) & confirmed results... but found something surprising: ✅Self-editing gives most gains ❌RL is costly & adds little for instruct models 🤖External editors (GPT-5) are cheaper+competitive Blog: https://t.co/kHHECDZcq2
Replies: 4 · Reposts: 15 · Likes: 115
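A minimal sketch of the self-edit data flow the reproduction above studies: an editor turns a new passage into its own training examples, which then drive a small supervised finetuning update. The reproduction's point is that the editor behind this interface can be the model itself (as in SEAL) or a cheaper external LLM. make_self_edits, sft_update, and toy_editor below are illustrative stand-ins, not the SEAL or reproduction codebase.

# Hypothetical sketch of a SEAL-style self-edit loop; not the authors' code.
from typing import Callable, List, Tuple

def make_self_edits(editor: Callable[[str], List[str]], passage: str) -> List[Tuple[str, str]]:
    """Turn one passage into (prompt, completion) SFT pairs from editor-written restatements."""
    return [("Recall a fact from the passage.", edit) for edit in editor(passage)]

def sft_update(pairs: List[Tuple[str, str]]) -> None:
    """Placeholder for a small SFT/LoRA step on the generated pairs."""
    print(f"finetuning on {len(pairs)} self-edit pairs")

# Toy editor: in SEAL this slot is filled by the model itself; the reproduction
# finds an external editor (e.g. GPT-5) behind the same interface is competitive.
def toy_editor(passage: str) -> List[str]:
    return [passage, f"In other words: {passage}"]

sft_update(make_self_edits(toy_editor, "VerlTool is a tool-agent training framework built on verl."))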
📢 Introducing VisCoder2: Building Multi-Language Visualization Coding Agents! Existing LLMs often fail in practical workflows due to limited language coverage, unreliable execution, and a lack of iterative correction mechanisms. We introduce 3 resources to address this:
Replies: 7 · Reposts: 8 · Likes: 18
#NewDataset for VLMs After the release of VisualWebInstruct, we kept pushing its quality and adopting different strategies to make it as accurate as possible. Today, we are releasing a verified version of VisualWebInstruct under https://t.co/Sw3MSM5FUK. It has around 100K
huggingface.co
We have made huge progress in language model reasoning. But our progress in multimodal reasoning (like MMMU) is very limited. Why? It's due to the lack of diverse, difficult, and high-quality multimodal reasoning datasets! 🚀 New Paper Alert! 📢 We introduce VisualWebInstruct,
Replies: 2 · Reposts: 22 · Likes: 94
We’re open-sourcing MiniMax M2 — Agent & Code Native, at 8% of Claude Sonnet's price and ~2x faster ⚡ Free globally for a limited time via MiniMax Agent & API - Advanced Coding Capability: Engineered for end-to-end developer workflows. Strong capability across a wide range of applications
Replies: 126 · Reposts: 839 · Likes: 2K
Totally agree. We experimented with image-only input for every task, and the results are quite good. Check out our earlier paper, PixelWorld: https://t.co/aA6zXOV2UI
arxiv.org
Recent agentic language models increasingly need to interact with real-world environments that contain tightly intertwined visual and textual information, often through raw camera pixels rather...
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language
Replies: 5 · Reposts: 12 · Likes: 188
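A rough illustration of the "everything as pixels" setup behind PixelWorld referenced above: rather than tokenizing the prompt, render it to an image and hand that image to a vision-language model. The rendering choices below (canvas width, default font, simple wrapping) are my own for the sketch, not the paper's pipeline.

# Hypothetical text-to-pixels rendering; illustrative only.
from textwrap import wrap
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 896, line_height: int = 20) -> Image.Image:
    """Rasterize plain text onto a white canvas, one wrapped line at a time."""
    lines = wrap(text, width=100) or [""]
    img = Image.new("RGB", (width, line_height * len(lines) + 16), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((8, 8 + i * line_height), line, fill="black")
    return img

# The image then replaces the text tokens as model input, e.g.:
# render_text_as_image("Question: what does this table say about Q3 revenue?").save("prompt.png")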
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and
Replies: 6 · Reposts: 74 · Likes: 382
🚀 Excited to finally share our paper on VerlTool, released today after months of work since the framework's initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Replies: 2 · Reposts: 37 · Likes: 158
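For readers new to agentic RL with tool use, here is a toy version of the multi-turn rollout that frameworks like the one announced above train on: the policy emits text, tool calls are parsed and executed, observations are appended, and the finished trajectory plus a scalar reward is what GRPO/PPO would consume. The <tool>/<obs>/<answer> tags, toy_policy, and the calculator tool are placeholders of mine, not VerlTool's actual interface.

# Not VerlTool's API: a generic sketch of an ARLT-style rollout loop.
import re

def toy_policy(history: str) -> str:
    """Pretend policy: call the calculator once, then answer."""
    if "<obs>" not in history:
        return "I should compute this. <tool>calc: 17*24</tool>"
    result = re.search(r"<obs>(.*?)</obs>", history).group(1)
    return f"<answer>{result}</answer>"

def run_tool(call: str) -> str:
    """A single whitelisted tool: a tiny calculator."""
    expr = call.removeprefix("calc:").strip()
    return str(eval(expr, {"__builtins__": {}}, {}))  # toy sandbox, not production-safe

def rollout(question: str, max_turns: int = 4):
    history = question
    for _ in range(max_turns):
        action = toy_policy(history)
        history += "\n" + action
        tool_call = re.search(r"<tool>(.*?)</tool>", action)
        if tool_call:                       # interact with the environment
            history += f"\n<obs>{run_tool(tool_call.group(1))}</obs>"
        else:                               # terminal turn: score the trajectory
            answer = re.search(r"<answer>(.*?)</answer>", action)
            reward = 1.0 if answer and answer.group(1) == "408" else 0.0
            return history, reward
    return history, 0.0

trajectory, reward = rollout("What is 17 * 24?")
print(trajectory, "\nreward =", reward)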
Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning. - World’s longest context window: 1M-token input, 80k-token output - State-of-the-art agentic use among open-source models - RL at unmatched efficiency:
Replies: 85 · Reposts: 306 · Likes: 1K
📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the
Replies: 7 · Reposts: 15 · Likes: 34
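A hedged sketch of the execution-feedback self-debugging loop the VisCoder posts describe: run the generated plotting code, capture the traceback, and hand it back to the model for a revision. The model here is any prompt-to-code callable; the two-attempt stub only illustrates the control flow, not VisCoder's actual prompting or training.

# Illustrative self-debugging loop; model and attempts are stand-ins.
import traceback
from typing import Callable

def self_debug(model: Callable[[str], str], task: str, max_rounds: int = 3) -> str:
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = model(prompt)
        try:
            exec(compile(code, "<generated>", "exec"), {})    # execution feedback
            return code                                        # ran cleanly
        except Exception:
            err = traceback.format_exc(limit=1)
            prompt = f"{task}\n\nPrevious attempt:\n{code}\nError:\n{err}\nFix it."
    return code

# Toy stand-in model: first emits buggy code, then a corrected version.
attempts = iter([
    "import matplotlib.pyplot as plt\nplt.plot([1, 2, 3], [1, 4])\nplt.close()",    # length mismatch
    "import matplotlib.pyplot as plt\nplt.plot([1, 2, 3], [1, 4, 9])\nplt.close()",
])
fixed = self_debug(lambda prompt: next(attempts), "Plot y = x**2 for x in 1..3")
print(fixed)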
@kevinyang41 @ikekong @HKUniversity @BerkeleyNLP Excited to present our NAACL 2025 paper FactTrack: Time-Aware World State Tracking in Story Outlines tomorrow (Apr 29) at 16:30–16:45 in Ballroom B, Albuquerque. Come say hi! #NAACL2025 #NAACL Paper:
Replies: 0 · Reposts: 0 · Likes: 1
Will be at NAACL next week, excited to share two of our papers: FACTTRACK: Time-Aware World State Tracking in Story Outlines https://t.co/1KcL0aCWCI THOUGHTSCULPT: Reasoning with Intermediate Revision and Search https://t.co/ZGqvEeReHr Shoutout to first authors @ZhihengLyu and
Replies: 0 · Reposts: 4 · Likes: 10
🚀Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis Please unmute to hear the demo audio. ✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input. ✨We propose
Replies: 18 · Reposts: 58 · Likes: 220
(6/6) 🎉 Huge thanks to our amazing team: @ZhihengLyu @xueguang_ma @WenhuChen from @UWaterloo ! 🔍Check out “PixelWorld: Towards Perceiving Everything as Pixels” for more details: 🔗 https://t.co/PP1gXyRje8 We hope it sparks fresh ideas for truly unified multimodal models! 🏆✨
Replies: 0 · Reposts: 0 · Likes: 1
(5/6) We also introduce PEAP-Fast to prune blank pixel regions, significantly speeding up inference with minimal accuracy drop. ⚡️ This makes “perceiving everything as pixels” more efficient!
Replies: 1 · Reposts: 0 · Likes: 1
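To make the pruning idea in (5/6) concrete, here is a rough numpy sketch of dropping near-blank ViT-style patches before the vision encoder; the patch size and variance threshold are arbitrary choices for illustration, not PEAP-Fast's actual implementation.

# Hypothetical blank-patch pruning; not the PixelWorld codebase.
import numpy as np

def keep_nonblank_patches(image: np.ndarray, patch: int = 16, var_thresh: float = 1e-3):
    """Return (patch_tensor, kept_indices) with near-constant patches removed.

    image: float array of shape (H, W, C) in [0, 1], with H and W divisible by `patch`.
    """
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    variances = patches.reshape(len(patches), -1).var(axis=1)
    kept = np.flatnonzero(variances > var_thresh)
    return patches[kept], kept

# Example: a mostly-white page keeps only the patches that contain ink.
page = np.ones((224, 224, 3), dtype=np.float32)
page[40:56, 32:120] = 0.0  # a dark stripe standing in for rendered text
pruned, idx = keep_nonblank_patches(page)
print(f"kept {len(idx)} of {(224 // 16) * (224 // 16)} patches")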
(4/6) Visualization shows attention over textual patches mirrors token attention—like a “universal tokenizer.” 🧐 But empty areas still receive focus, indicating room for optimization.
Replies: 1 · Reposts: 0 · Likes: 1
(3/6) Key findings: 1. PEAP excels on websites, slides, docs by leveraging layout info; 2. For complex tasks (math/code/knowledge), token-based text still leads; 3. Larger models handle pixel inputs better. 🚀
Replies: 1 · Reposts: 0 · Likes: 3