Wei Liu Profile
Wei Liu

@WeiLiu99

Followers 604 | Following 3K | Media 18 | Statuses 620

#NLProc | Ph.D. Student @hkust @hkustnlp | Prev. @AlibabaGroup @ShanghaiTechUni

Joined February 2018
@WeiLiu99
Wei Liu
7 months
"What is the answer to 1 + 1?" Large Reasoning Models (LRMs) may generate 1500+ tokens just to answer this trivial question. Too much thinking 🤯 Can LRMs be both Faster AND Stronger? Yes. Introducing LASER 💥: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
2
33
142
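A minimal sketch of what adaptive length-based reward shaping can look like, in the spirit of the LASER announcement above. The shaping rule, target_len, and alpha are illustrative assumptions, not the paper's exact formulation.

```python
def shaped_reward(correct: bool, length: int, target_len: int, alpha: float = 0.5) -> float:
    """Reward correctness, but discount correct answers that overshoot a length budget.

    Hypothetical shaping rule for illustration; LASER's actual reward may differ.
    """
    if not correct:
        return 0.0  # no length bonus can rescue a wrong answer
    # Linear penalty once the response exceeds the (adaptive) length budget.
    overshoot = max(0, length - target_len)
    penalty = alpha * overshoot / target_len
    return max(1.0 - penalty, 0.0)
```

Under a rule like this, a correct 1500-token answer to "1 + 1" scores below a correct 20-token one, which is exactly the behavior the tweet targets.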
@GJChen11710
Guanjie Chen
4 days
Big thanks to @_akhaliq for sharing our work! 🎉 Flash-DMD decouples the DMD objective, leading to extremely fast distillation convergence. In the second stage, we perform joint reinforcement learning while distilling, using the distillation loss as a natural regularizer. Check it out 🤗
@_akhaliq
AK
5 days
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
0
4
17
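A rough reading of the two-stage description above: in stage two, the RL objective and the (decoupled) distillation loss are optimized jointly, with the latter acting as a regularizer. A hedged sketch, with lambda_reg and the loss interface assumed for exposition only:

```python
def stage2_loss(rl_loss: float, distill_loss: float, lambda_reg: float = 0.25) -> float:
    """Joint reinforcement while distilling: the distillation term regularizes
    the RL objective. Weighting and interface are illustrative assumptions."""
    return rl_loss + lambda_reg * distill_loss
```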
@realSharonZhou
Sharon Zhou ✈️ NeurIPS
5 days
Excited to share our @NeurIPSConf Tutorial on How to Build Agents to Generate Kernels for Faster LLMs (and Other Models!) A collaboration across institutions: @AMD, @Stanford, @GoogleDeepMind, @Arm, @NVIDIAAI, @Meta, @Modular, @UCIrvine, @MLCommons. If you're an AI…
12
37
274
@sivil_taram
Qian Liu
5 days
๐Ÿ” From simple code completion to autonomous software engineering agents โ€” what changed in the past 5 years? We wrote the playbook ๐Ÿ“– "๐…๐ซ๐จ๐ฆ ๐‚๐จ๐๐ž ๐…๐จ๐ฎ๐ง๐๐š๐ญ๐ข๐จ๐ง ๐Œ๐จ๐๐ž๐ฅ๐ฌ ๐ญ๐จ ๐€๐ ๐ž๐ง๐ญ๐ฌ" โ€” 300 pages covering exact recipes ๐Ÿงช, scaling laws ๐Ÿ“ˆ & RL techniques ๐ŸŽฏ
2
35
175
@yugu_nlp
Yu Gu
11 days
At NeoCognition, we aim for the essentials: 1. Deliver applied research that turns agents into true business value; 2. Explore fundamental questions, not just scaling. If you're an agent believer who wants to build differently, send your CV to hiring@neocognition.io. I won't be at…
@ysu_nlp
Yu Su (Hiring @Neurips)
12 days
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two…
0
8
48
@WeiLiu99
Wei Liu
6 days
Congrats on the fantastic DeepSeek-V3.2 update! Honored to see our Toolathlon benchmark ( https://t.co/RwOa7RxyKf) being used and highlighted for tool-use evaluation. Still amazed by how fast the community is moving: open-source models can now score 35+ on this benchmark. I'm…
toolathlon.xyz
@deepseek_ai
DeepSeek
6 days
🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…
0
0
5
@lockonlvange
Junlong Li
6 days
Very honored to see Tool Decathlon being used and highlighted on the first page of the DeepSeek-V3.2 paper! We have also added some new models to our benchmark ( https://t.co/DkByeP6rii), and now we finally have DeepSeek-V3.2 as the first open-source model scoring >35. Great achievement!
@deepseek_ai
DeepSeek
6 days
🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…
0
5
49
@junxian_he
Junxian He
6 days
As a longtime fan of DeepSeek, I am excited to see DeepSeek-V3.2 progress fast on our Tool Decathlon bench! 🚀 Update: We have deployed the Toolathlon eval as a public service, so now you can evaluate on Toolathlon without setting up anything:
github.com
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution - hkust-nlp/Toolathlon
@deepseek_ai
DeepSeek
6 days
🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…
0
4
94
@saagnikkk
Sagnik Mukherjee @ NeurIPS 2025
7 days
🚨 New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. 36x more parameter-efficient, naturally (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
16
57
472
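One concrete efficiency angle behind the SGD-vs-AdamW comparison: vanilla SGD keeps no per-parameter optimizer state, while AdamW tracks two moment tensors per parameter. A small PyTorch check of that contrast (the blog's 36x figure may refer to a different measure; this sketch only shows the optimizer-state difference):

```python
import torch

model = torch.nn.Linear(1024, 1024)
x, y = torch.randn(8, 1024), torch.randn(8, 1024)

for opt_cls, lr in [(torch.optim.SGD, 1e-2), (torch.optim.AdamW, 1e-4)]:
    opt = opt_cls(model.parameters(), lr=lr)
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.step()
    # Vanilla SGD (no momentum) stores no per-parameter state; AdamW stores
    # exp_avg and exp_avg_sq (plus a step counter) for every parameter tensor.
    n_state = sum(len(state) for state in opt.state.values())
    print(opt_cls.__name__, "optimizer state entries:", n_state)
    opt.zero_grad()
```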
@shizhediao
Shizhe Diao ✈️ NeurIPS 2025
10 days
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone's building agent workflows these days, connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings: 👉 Just prompting the agent workflow…
25
68
312
@YizheZhangNLP
Yizhe Zhang
11 days
We use latent continuous thoughts for retrieval, optimized via the downstream NTP loss and unified under one LLM backbone. Since representations are shared, documents can be precomputed, eliminating two-stage RAG. We match raw-text performance but with a much shorter context budget. 📉🚀
@Jiehenlp
Jie He
12 days
Happy to introduce my internship work at @Apple. We introduce CLaRa: Continuous Latent Reasoning, an end-to-end training framework that jointly trains retrieval and generation! 🧠📦 🔗 https://t.co/jEapFfeD7D #RAG #LLMs #Retrieval #Reasoning #AI
1
9
31
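The practical upshot described in the two tweets above (a shared backbone lets document latents be precomputed offline, so no separate two-stage retriever is needed) can be sketched as follows. The encode function is a hypothetical stand-in for the shared LLM backbone, not CLaRa's actual API:

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    """Stand-in for a shared LLM backbone emitting a latent 'thought' vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

# Offline: precompute corpus latents once; no separate retriever model needed.
corpus = ["doc on RAG pipelines", "doc on optimizers", "doc on tool-using agents"]
doc_latents = np.stack([encode(d) for d in corpus])

# Online: encode the query with the same backbone; retrieval is a dot product,
# and the generator can consume latents instead of raw retrieved text.
query = encode("how does retrieval-augmented generation work?")
print(corpus[int(np.argmax(doc_latents @ query))])
```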
@junxian_he
Junxian He
13 days
Gemini-3-Pro improves over Gemini-2.5-Pro from 10.5% to 36.4% on Toolathlon! Only one step away from Claude-4.5-Sonnet now, very impressive
@junxian_he
Junxian He
1 month
🚀 We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️ 32 applications and 600+ tools based on real-world software environments ⭐️ Execution-based, reliable evaluation ⭐️ Realistic, covering…
1
7
32
@AndrewYNg
Andrew Ng
13 days
Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better. I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was
238
1K
6K
@Kimi_Moonshot
Kimi.ai
17 days
While asynchronous RL is heating up, our algo folks walked in and said: "Synchronous/on-policy guarantees OR high efficiency? No, we want BOTH." So we dropped Seer 😉 Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning. 📄 Read here:
arxiv.org
Reinforcement Learning (RL) has become critical for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which...
16
88
696
@rosinality
Rosinality
19 days
Building a generative verifier for proof verification, but it was not very successful: models tend to hack using surface-level features. How can we assign a non-hackable reward for this?
1
7
67
@zhu_hanqin41424
Hanqing Zhu
25 days
🚨 New Work! 🤔 Is RL black-box weight tinkering? 😉 No. We provably show RLVR follows a 🧭: it always updates the same off-principal regions while preserving the model's core spectra. ⚠️ A different optimization regime than SFT, so SFT-era PEFT tricks can misfire (like PiSSA, the…
7
42
257
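A hedged sketch of the kind of spectral diagnostic the tweet gestures at: project an RL weight update onto the pretrained weight's top singular subspace and measure how much of its energy lands there. The matrices are random stand-ins and the rank cutoff k is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))          # stand-in for a pretrained weight
dW = 1e-3 * rng.standard_normal((512, 512))  # stand-in for an RLVR update

U, S, Vt = np.linalg.svd(W)
k = 32                                       # "principal" rank cutoff (assumption)
P_top = U[:, :k] @ U[:, :k].T                # projector onto the top-k left subspace

frac = np.linalg.norm(P_top @ dW) ** 2 / np.linalg.norm(dW) ** 2
print(f"update energy in the principal subspace: {frac:.3f}")  # low => off-principal
```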
@ZhiyuanZeng_
Zhiyuan Zeng ✈️ NeurIPS 25 🏖️
26 days
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier and scale them. 🔗 in 🧵
12
115
472
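A toy version of the adaptive-environment loop described above: problems are procedurally generated with a verifiable (exact-match) answer, and difficulty moves toward the model's capability frontier. All names and thresholds below are illustrative, not RLVE's actual design:

```python
import random

def make_problem(difficulty: int):
    """Procedurally generate a verifiable (exact-match) arithmetic task."""
    lo, hi = 10 ** difficulty, 10 ** (difficulty + 1)
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"{a} + {b} = ?", str(a + b)

def adapt(difficulty: int, success_rate: float) -> int:
    """Keep tasks at the capability frontier: harder when the model saturates,
    easier when it gets no reward signal at all."""
    if success_rate > 0.8:
        return difficulty + 1
    if success_rate < 0.2:
        return max(0, difficulty - 1)
    return difficulty
```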
@RichardYRLi
Yingru Li
26 days
1/ Great thread by @IdanShenfeld on Policy Mirror Descent (PMD) and its likely use in models like Kimi K2. It's a powerful technique for stabilizing RL. We'd like to highlight our NeurIPS 2019 work, which was one of the first to frame policy optimization as a mirror descent…
@IdanShenfeld
idan shenfeld
28 days
Everyone's talking about Kimi K2 Thinking and its impressive performance. No full report yet, but judging from the Kimi K2/1.5 reports, it likely uses Policy Mirror Descent, an RL trick that's quietly becoming standard in frontier labs. Let's break down what it is:
1
7
53
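For readers following the thread, the textbook form of the PMD update (with a KL Bregman term) is below; whether Kimi K2 uses exactly this form is, per the tweet itself, speculation:

```latex
\pi_{k+1} = \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi_k}(s,a) \big]
            - \tfrac{1}{\eta}\,\mathrm{KL}\big(\pi(\cdot \mid s)\,\|\,\pi_k(\cdot \mid s)\big),
\qquad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\,\exp\big(\eta\, Q^{\pi_k}(s,a)\big).
```

The KL term keeps each new policy close to the last one, which is the stabilization property the thread highlights.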
@MerlinNoth79247
Mian Wu
30 days
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We can prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a…
9
58
349
@RichardYRLi
Yingru Li
1 month
🚨 UPDATE to the "1 bit per episode" analysis (inspired by @johnschulman's post at @thinkymachines): After discussion with @mgostIH, I need to point out that the limit only applies to *scalar advantages*! REINFORCE with per-timestep advantages can learn O(T) bits when rewards are…
@RichardYRLi
Yingru Li
2 months
Inspired by @thinkymachines's "#LoRA Without Regret" post, I formalized their insight that policy gradient learns ~1 bit per episode via a Bayesian #RL formulation. I prove this is a hard information-theoretic ceiling and extend the analysis to actor-critic methods. Full writeup…
1
8
18
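A compact restatement of the contrast in the two tweets above (my paraphrase, not the writeup's notation): a single scalar advantage bounds the per-episode learning signal by its entropy, while per-timestep advantages expose up to T such signals:

```latex
I(\theta_{k+1};\,\tau \mid \theta_k) \;\le\; H(A) \;\le\; 1 \text{ bit for a binary scalar } A,
\qquad
I(\theta_{k+1};\,\tau \mid \theta_k) \;\le\; \sum_{t=1}^{T} H(A_t) \;=\; O(T) \text{ bits.}
```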