Zhiheng LYU

@ZhihengLyu

138 Followers · 80 Following · 8 Media · 33 Statuses

Code Agent Post-training for MiniMax-M2; Core Contributor of VerlTool; MMath Student @UWaterloo TIGER-Lab; Prev. @HKUniversity @ETH_en @UCBerkeley

Waterloo, CA
Joined May 2022
@LingmingZhang
Lingming Zhang
18 days
🤯🤯🤯 Gemini 3 Pro + Live-SWE-agent hits 77.4% on SWE-bench Verified, beating ALL existing models, including Claude 4.5!! 🤖 Live-SWE-agent is the first live software agent that autonomously self-evolves on the fly — and it even outperforms the manually engineered scaffold…
33 replies · 71 reposts · 479 likes
@yan_hanqi
Hanqi Yan
1 month
🚀 Thrilled to announce that I’ll be attending EMNLP 2025 (Nov 4–9) in Suzhou, China! 🇨🇳✨ I’ll be showcasing our latest research from #KCLNLP on implicit Chain-of-Thoughts (CoTs) and an AI Scientist demo system 🤖🧠
📘 CODI: Compressing Chain-of-Thought into Continuous Space
lnkd.in
2 replies · 4 reposts · 39 likes
@Jiarui_Liu_
Jiarui Liu
1 month
Our EMNLP 2025 paper "Synthetic Socratic Debates" is being presented today in Suzhou!
📍 Poster Session 1
🕚 Nov 5, 11:00 AM (Beijing)
Come chat about how LLM personas shape moral reasoning & persuasion! 🔗
arxiv.org
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present...
1 reply · 8 reposts · 24 likes
@ZhihengLyu
Zhiheng LYU
1 month
Actually saw it climb from 8th to 2nd; let's see what happens when the free trial ends : )
@SkylerMiao7
Skyler Miao
1 month
MiniMax-M2 the #2…
0 replies · 0 reposts · 1 like
@yuntiandeng
Yuntian Deng
1 month
My student Wentao reproduced Self-Adapting LMs and wrote a blog on lessons learned. Highly recommended for anyone adapting LMs! He's also looking for a summer internship. He has 2 first-author EMNLP papers after just one year! 🔗 https://t.co/OK9O2shJhJ 🔗 https://t.co/506fLt3FmL
@wtzhang0820
Wentao Zhang
1 month
🚨 New reproduction study: We re-implemented SEAL (Self-Adapting LMs) & confirmed results... but found something surprising:
✅ Self-editing gives most gains
❌ RL is costly & adds little for instruct models
🤖 External editors (GPT-5) are cheaper + competitive
Blog: https://t.co/kHHECDZcq2
4 replies · 15 reposts · 115 likes
@YuanshengNi
Yuansheng Ni
1 month
📢 Introducing VisCoder2: Building Multi-Language Visualization Coding Agents! Existing LLMs often fail in practical workflows due to limited language coverage, unreliable execution, and a lack of iterative correction mechanisms. We introduce 3 resources to address this:
7 replies · 8 reposts · 18 likes
@WenhuChen
Wenhu Chen
2 months
#NewDataset for VLMs
After the release of VisualWebInstruct, we kept pushing its quality and adopting different strategies to make it as accurate as possible. Today, we are releasing a verified version of VisualWebInstruct under https://t.co/Sw3MSM5FUK. It has around 100K…
huggingface.co
@WenhuChen
Wenhu Chen
9 months
We have made huge progress in language model reasoning. But our progress in multimodal reasoning (like MMMU) is very limited. Why? It's due to the lack of diverse, difficult, and high-quality multimodal reasoning datasets! 🚀 New Paper Alert! 📢 We introduce VisualWebInstruct,…
2 replies · 22 reposts · 94 likes
@MiniMax__AI
MiniMax (official)
1 month
We’re open-sourcing MiniMax M2 — Agent & Code Native, at 8% of Claude Sonnet's price, ~2x faster ⚡ Globally FREE for a limited time via MiniMax Agent & API.
- Advanced Coding Capability: Engineered for end-to-end developer workflows. Strong capability on a wide range of applications…
126 replies · 839 reposts · 2K likes
@WenhuChen
Wenhu Chen
2 months
Totally agree. We experimented with image-only input for every task. The results are quite good. Check out our earlier paper PixelWorld: https://t.co/aA6zXOV2UI
arxiv.org
Recent agentic language models increasingly need to interact with real-world environments that contain tightly intertwined visual and textual information, often through raw camera pixels rather...
@karpathy
Andrej Karpathy
2 months
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language…
5 replies · 12 reposts · 188 likes
@DongfuJiang
Dongfu Jiang
6 months
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
6 replies · 74 reposts · 382 likes
@DongfuJiang
Dongfu Jiang
3 months
🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of…
@DongfuJiang
Dongfu Jiang
6 months
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
2 replies · 37 reposts · 158 likes
@MiniMax__AI
MiniMax (official)
6 months
Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning.
- World’s longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency:…
85 replies · 306 reposts · 1K likes
@YuanshengNi
Yuansheng Ni
6 months
📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the…
7 replies · 15 reposts · 34 likes
@ZhihengLyu
Zhiheng LYU
7 months
@kevinyang41 @ikekong @HKUniversity @BerkeleyNLP Excited to present our NAACL 2025 paper FactTrack: Time-Aware World State Tracking in Story Outlines tomorrow (Apr 29) at 16:30–16:45 in Ballroom B, Albuquerque. Come say hi! #NAACL2025 #NAACL Paper:
0 replies · 0 reposts · 1 like
@kevinyang41
Kevin Yang
8 months
Will be at NAACL next week, excited to share two of our papers: FACTTRACK: Time-Aware World State Tracking in Story Outlines https://t.co/1KcL0aCWCI THOUGHTSCULPT: Reasoning with Intermediate Revision and Search https://t.co/ZGqvEeReHr Shoutout to first authors @ZhihengLyu and…
0 replies · 4 reposts · 10 likes
@CongWei1230
Cong Wei
8 months
🚀 Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis. Please unmute to hear the demo audio. ✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input. ✨We propose…
18 replies · 58 reposts · 220 likes
@ZhihengLyu
Zhiheng LYU
10 months
(6/6) 🎉 Huge thanks to our amazing team: @ZhihengLyu @xueguang_ma @WenhuChen from @UWaterloo! 🔍Check out “PixelWorld: Towards Perceiving Everything as Pixels” for more details: 🔗 https://t.co/PP1gXyRje8 We hope it sparks fresh ideas for truly unified multimodal models! 🏆✨
0 replies · 0 reposts · 1 like
@ZhihengLyu
Zhiheng LYU
10 months
(5/6) We also introduce PEAP-Fast to prune blank pixel regions, significantly speeding up inference with minimal accuracy drop. ⚡️ This makes “perceiving everything as pixels” more efficient!
1 reply · 0 reposts · 1 like
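For a concrete sense of the PEAP-Fast idea in the tweet above, here is a minimal sketch, assuming a simple variance test over fixed-size patches; the function name, patch size, and threshold are illustrative choices, not the paper's actual implementation:

```python
import numpy as np

def prune_blank_patches(image, patch_size=16, var_threshold=1e-3):
    """Drop near-uniform patches before they reach a vision encoder.

    Illustrative only: PEAP-Fast's actual criterion may differ.
    image: float array of shape (H, W, C) with values in [0, 1].
    Returns kept patches (flattened) plus their (row, col) grid
    positions, so positional information survives pruning.
    """
    h, w, _ = image.shape
    grid_h, grid_w = h // patch_size, w // patch_size
    kept, positions = [], []
    for i in range(grid_h):
        for j in range(grid_w):
            patch = image[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            # Near-zero variance means a blank region (e.g. a page
            # margin): no text or layout signal, so skip it entirely.
            if patch.var() > var_threshold:
                kept.append(patch.reshape(-1))
                positions.append((i, j))
    return np.stack(kept), positions

# Toy example: a mostly blank "page" with one textured text block.
page = np.ones((224, 224, 3), dtype=np.float32)
page[32:96, 32:160] = np.random.rand(64, 128, 3)
patches, positions = prune_blank_patches(page)
print(f"kept {len(positions)} of {(224 // 16) ** 2} patches")  # 32 of 196
```

On document-style inputs most patches are margins or whitespace, so a cheap test like this can discard the bulk of the patches before the expensive encoder runs, which is consistent with the speedup-with-minimal-accuracy-drop claim in the thread.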
@ZhihengLyu
Zhiheng LYU
10 months
(4/6) Visualization shows attention over textual patches mirrors token attention—like a “universal tokenizer.” 🧐 But empty areas still receive focus, indicating room for optimization.
1 reply · 0 reposts · 1 like
@ZhihengLyu
Zhiheng LYU
10 months
(3/6) Key findings:
1. PEAP excels on websites, slides, and docs by leveraging layout info;
2. For complex tasks (math/code/knowledge), token-based text still leads;
3. Larger models handle pixel inputs better. 🚀
1 reply · 0 reposts · 3 likes