Jiaxin Huang Profile
Jiaxin Huang

@jiaxinhuang0229

567 Followers · 218 Following · 4 Media · 37 Statuses

Assistant professor @WUSTL CSE. LLM, NLP, ML, Data Mining. PhD from @IllinoisCS. Microsoft Research PhD Fellow.

Joined June 2022
@MingZhong_
Ming Zhong
1 month
Vibe coding with an LLM, but the final vibe is off? 🤔 We analyze why models fail the "vibe check" and what truly matters to users. Key insight: human preference 🧑‍💻 ≈ functional correctness ✅ + instruction following 🎯 Check out our paper: https://t.co/s5gGME5O9I
2 replies · 17 retweets · 69 likes
@jiaxinhuang0229
Jiaxin Huang
3 months
Thrilled to share this exciting work, R-Zero, from my student @ChengsongH31219, in which an LLM learns to reason from zero human-curated data! The framework co-evolves a "Challenger" that proposes difficult tasks and a "Solver" that solves them. Check out more details in the…
@ChengsongH31219
ChengSong Huang
3 months
🚀🚀Excited to share our paper R-Zero: Self-Evolving Reasoning LLM from Zero Data! How do you train an LLM without data? R-Zero teaches large language models to reason starting with nothing but a base model. No data required!!! Paper: https://t.co/z4tCJFTXUG Code: …
1 reply · 4 retweets · 23 likes
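The tweets above leave the co-evolution loop implicit, so here is a minimal sketch of a Challenger/Solver loop in the spirit of R-Zero. Everything in it (the classes, the difficulty heuristic, the toy arithmetic tasks) is an invented stand-in for LLM training runs, not the paper's code.

```python
import random

class Challenger:
    """Proposes tasks near the edge of the Solver's ability (toy stand-in)."""
    def __init__(self):
        self.difficulty = 1

    def propose(self):
        # Hypothetical task family: sum `difficulty` random digits.
        nums = [random.randint(1, 9) for _ in range(self.difficulty)]
        return nums, sum(nums)

    def adapt(self, solve_rate):
        # Propose harder tasks once the Solver finds the current ones easy.
        if solve_rate > 0.7:
            self.difficulty += 1

class Solver:
    """Attempts tasks; solving harder ones raises its skill (stand-in for RL updates)."""
    def __init__(self):
        self.skill = 1

    def attempt(self, nums, answer):
        solved = len(nums) <= self.skill or random.random() < 0.3
        if solved:
            self.skill = max(self.skill, len(nums))  # learn from solved tasks
        return solved

challenger, solver = Challenger(), Solver()
for rnd in range(5):
    results = [solver.attempt(*challenger.propose()) for _ in range(20)]
    rate = sum(results) / len(results)
    challenger.adapt(rate)
    print(f"round {rnd}: difficulty={challenger.difficulty}, solve rate={rate:.2f}")
```

Running it shows the intended dynamic: the Challenger ratchets difficulty up only as fast as the Solver's success rate allows, so neither side needs external data.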
@XiaotaoGu
Xiaotao Gu
4 months
We @Zai_org are thrilled to open-source GLM-4.1V-9B-Thinking, a VLM that can think with long CoTs. SoTA among <10B VLMs, and comparable to Qwen-2.5-VL-72B on 18 tasks. One RL to rule them all! Details:
- Tech report: https://t.co/sxsKy2xP2P
- Code: https://t.co/O8WXX7vK0F
3 replies · 9 retweets · 30 likes
@yumeng0818
Yu Meng
5 months
Excited to share our #ICML25 paper (led by @weizhepei) on accelerating LLM decoding! ⚡️ AdaDecode predicts tokens early from intermediate layers 🙅‍♂️No drafter model needed 🪶Just lightweight LM heads ✨Output consistency with standard autoregressive decoding Thread👇
@weizhepei
Zhepei Wei
5 months
⚠️ New #ICML2025 paper! Want fast and accurate LLM decoding? Check out AdaDecode! 🚀 ⚙️ Adaptive token prediction at intermediate layers w/o a full forward pass! 🎯 Identical output to standard decoding! 🧩 No draft model, just a lightweight LM head (0.2% of model size)! 🧵[1/n]
1 reply · 5 retweets · 34 likes
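To make the mechanism in this thread concrete, here is a rough sketch of layer-level early exit with a confidence threshold. The per-layer heads and the scalar "hidden states" are hypothetical stand-ins, not the AdaDecode API; in the real method, early tokens are later verified against the full model so the final output matches standard decoding.

```python
def decode_step(hidden_states, lm_heads, full_head, threshold=0.9):
    """Emit a token early from an intermediate layer when a lightweight
    head is confident; otherwise fall back to the full forward pass."""
    for layer, h in enumerate(hidden_states[:-1]):
        token, prob = lm_heads[layer](h)       # cheap intermediate prediction
        if prob >= threshold:
            return token, layer, "early"       # skip the remaining layers
    token, _ = full_head(hidden_states[-1])    # full-depth prediction
    return token, len(hidden_states) - 1, "full"

# Toy demo: per-layer "activations" are scalars, and each head reports its
# input as the confidence, so the exit point is easy to see.
heads = [lambda h: ("hello", h)] * 2
full = lambda h: ("hello", 1.0)
print(decode_step([0.5, 0.95, 1.0], heads, full))  # -> ('hello', 1, 'early')
```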
@jiaxinhuang0229
Jiaxin Huang
5 months
🚀🚀Excited to share our new work on speculative decoding by @shrangoh! We tackle a key limitation of draft models, which predict progressively worse tokens at later positions, and present PosS, which generates high-quality drafts!
@shrangoh
Langlin Huang
5 months
New Research Released! 🚀PosS: Position Specialist Generates Better Draft for Speculative Decoding. Is your LLM fast enough? PosS consistently improves over current speculative decoding methods by using position-specialized draft layers to generate high-quality drafts! 🔖Paper: …
1 reply · 3 retweets · 10 likes
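As a sketch of why position specialization matters, here is a bare-bones speculative decoding loop where each draft position has its own drafter. The callables are toy stand-ins and the acceptance rule is simplified greedy matching, not the paper's method.

```python
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[str]], str],     # slow "target" model, one token per call
    drafters: List[Callable[[List[str]], str]],  # one cheap specialist per draft position
    prompt: List[str],
    max_new: int,
) -> List[str]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Each drafter only ever fills its own offset, so it can specialize
        # in that position's error profile (the PosS idea).
        draft = []
        for drafter in drafters:
            draft.append(drafter(out + draft))
        # Greedy verification: keep the longest prefix the target agrees with.
        accepted = 0
        for i, tok in enumerate(draft):
            if target_next(out + draft[:i]) == tok:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        out.append(target_next(out))  # the target always contributes one token
    return out[: len(prompt) + max_new]

# Toy demo: the "model" deterministically cycles a phrase; the drafter for
# position 3 is deliberately weak, mimicking quality decay at later offsets.
phrase = "the quick brown fox jumps over".split()
target = lambda ctx: phrase[len(ctx) % len(phrase)]
drafters = [target, target, lambda ctx: "???"]
print(" ".join(speculative_decode(target, drafters, [], 8)))
```

In the demo the third draft position is always rejected, which is exactly the late-position quality drop the tweet describes; specializing that position's drafter would recover the lost speedup.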
@liujc1998
Jiacheng Liu
5 months
We enabled OLMoTrace for Tülu 3 models! 🤠 Matched spans are shorter than for OLMo models, because we can only search Tülu's post-training data (the base model is Llama). Yet we thought it'd still bring some value. Try it yourself on the Ai2 playground -- https://t.co/xGDdIR99De
2 replies · 16 retweets · 47 likes
@Siru_Ouyang
Siru Ouyang
5 months
🚀 Introducing RAST: Reasoning Activation via Small Model Transfer! ✨ RAST adjusts key "reasoning tokens" at decoding time using insights from smaller RL-tuned models, with no full RL tuning for large models! ⚡ Efficient & performant, 🧠 scalable & easy, 📉 up to 50% less GPU memory!
3 replies · 21 retweets · 117 likes
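One plausible reading of "adjusting reasoning tokens at decoding time" is a logit-delta transfer from the small model pair to the large model. The exact rule below is an assumption made for illustration, not copied from the RAST paper.

```python
import numpy as np

def rast_logits(base_large, base_small, rl_small, alpha=1.0):
    """Shift the large base model's logits by the small models' RL delta
    (an assumed form of decoding-time reasoning activation)."""
    return base_large + alpha * (rl_small - base_small)

vocab = ["so", "therefore", "banana"]
base_large = np.array([2.0, 1.0, 0.5])
base_small = np.array([1.5, 0.8, 0.6])
rl_small   = np.array([1.4, 2.2, 0.5])   # RL tuning boosted "therefore"

logits = rast_logits(base_large, base_small, rl_small)
probs = np.exp(logits) / np.exp(logits).sum()
print(vocab[int(np.argmax(probs))])  # "therefore": the reasoning token wins
```

This would explain the resource claims: the large model is never RL-tuned, and only one extra small-model pair runs at decode time.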
@yumeng0818
Yu Meng
5 months
What truly drives reasoning in RLVR? Check out our new paper led by @tianhongzxy for some fascinating insights and analysis!! 🤩
@tianhongzxy
Xinyu Zhu
5 months
🔥The debate's been wild: how does the reward in RLVR actually improve LLM reasoning?🤔 🚀Introducing our new paper👇 💡TL;DR: Just penalizing incorrect rollouts❌, with no positive reward needed, can boost LLM reasoning, sometimes even more than PPO/GRPO! 🧵[1/n]
0 replies · 3 retweets · 27 likes
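The "penalize only incorrect rollouts" signal is simple enough to state in a few lines. This is a sketch of the idea only, not the authors' training code.

```python
def negative_only_reward(pred: str, gold: str) -> float:
    """No bonus for correct rollouts; only a penalty for incorrect ones."""
    return 0.0 if pred == gold else -1.0

# Three sampled rollouts for the same verifiable question (gold answer "42").
rollouts = [("42", "42"), ("41", "42"), ("42", "42")]
rewards = [negative_only_reward(p, g) for p, g in rollouts]
print(rewards)  # [0.0, -1.0, 0.0]: gradients only push away from errors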
@yumeng0818
Yu Meng
6 months
Thrilled to be named to the Forbes 30 Under 30 Asia 2025 list! 🤩 Excited to keep pushing the boundaries of LLMs to tackle real-world challenges 🙌
@ForbesAsia
Forbes Asia
6 months
Just launched: Meet Asia’s Forbes 30 Under 30, Class of 2025 https://t.co/Ry5JDY2rza #ForbesU30Asia #ForbesUnder30
7 replies · 12 retweets · 122 likes
@billyuchenlin
Bill Yuchen Lin
6 months
Our paper was accepted at @icmlconf 2025! If you're working on RL for reasoning, consider adding more logical-puzzle data to your training and eval. Share your ideas for logical reasoning tasks for ZebraLogic v2 and interesting RL studies you want to see! Many thanks to my…
@billyuchenlin
Bill Yuchen Lin
9 months
If you're interested in LLMs like o1 and R1 for complex reasoning, check out this paper, where we show that logical reasoning tasks are ideal for evaluating and understanding their scaling limits. 🦓 ZebraLogic-Bench is a dataset of 1K constraint satisfaction problems (CSPs).
2 replies · 4 retweets · 96 likes
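For a feel of what a ZebraLogic-style constraint satisfaction problem looks like, here is a miniature instance, invented for illustration and solved by brute force (real ZebraLogic puzzles use much larger grids):

```python
from itertools import permutations

people = ["Ann", "Ben", "Cat"]
# Clues: Ann lives left of Ben; Cat is not in house 1; Ben is not in house 2.
for order in permutations(people):
    ann, ben, cat = (order.index(p) for p in people)
    if ann < ben and cat != 1 and ben != 2:
        print({p: order.index(p) for p in people})  # the unique assignment
```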
@BowenJin13
Bowen Jin
7 months
Sorry to miss ICLR this year, but if you're interested in the long-context LLM vs. RAG debate, don't miss our poster! My amazing collaborator from Google will be there to chat and share insights. 📍 Hall 3 + Hall 2B #302 🕒 Thu, Apr 24 | 3:00–5:30
@omarsar0
elvis
1 year
Long-Context LLMs Meet RAG. For many long-context LLMs, output quality declines as the number of retrieved passages increases. The performance loss appears to be due to retrieved hard negatives. They propose two ways to improve long-context LLM-based RAG: 1) retrieval…
1 reply · 19 retweets · 89 likes
@MingZhong_
Ming Zhong
7 months
I will be presenting our poster for the “Law of the Weakest Link” paper at ICLR today! If you're interested in this topic, feel free to stop by and chat! 📍 Location: Hall 3 + Hall 2B #257 ⏰ Time: Apr 25 | 10:00 AM – 12:30 PM SGT
@MingZhong_
Ming Zhong
1 year
Excited to share our recent work! We define and benchmark cross capabilities in LLMs, revealing the "Law of the Weakest Link": collaborative performance clusters around the weakest individual capability. 📄 Paper: https://t.co/GjxWmdyQ9Y 🌐 Website: …
0 replies · 8 retweets · 36 likes
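A toy back-of-the-envelope version of the claim, with invented scores: if a cross-capability task combines two skills, its performance sits near the weaker one.

```python
# Invented per-capability scores for one model.
skills = {"coding": 0.9, "multilingual": 0.4}
# The "Law of the Weakest Link" prediction for a multilingual-coding task:
print(min(skills.values()))  # ~0.4, the weaker component dominates
```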
@jiaxinhuang0229
Jiaxin Huang
7 months
🚀I'll be at #ICLR2025! Our group is presenting:
Apr 25: Reward Calibration in RLHF
Apr 26: Generative Joint Graph Language Modeling
Apr 27/28: Logit Arithmetic Approach for In-Context Learning (SLLM, Reasoning & Planning Workshop)
😆 Let's chat about LLM research, PhD…
0 replies · 12 retweets · 66 likes
@jiaxinhuang0229
Jiaxin Huang
7 months
Can LVLMs solve crossword puzzles? Our evaluation of over 20 LLMs and LVLMs finds that LVLMs largely lag behind LLMs due to poor vertical word extraction. Reasoning LLMs (like o3-mini) outperform non-reasoning models, benefiting from cross-letter constraints!
@JixuanLeng
Jixuan Leng
7 months
🚀 Reasoning models are acing complex math and science problems, but what about the everyday puzzles we solve for fun? We introduce CrossWordBench, a new benchmark designed to evaluate the reasoning capabilities of both LLMs and LVLMs through the medium of crossword puzzles. We…
0 replies · 0 retweets · 6 likes
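The cross-letter constraint the tweets credit for reasoning-model gains is easy to state in code. This toy checker is invented for illustration and is not part of the CrossWordBench harness.

```python
def crossing_consistent(across: str, down: str, ai: int, di: int) -> bool:
    """True when an across word and a down word agree on their shared cell."""
    return across[ai] == down[di]

# "solver" written across and "rows" written down share one cell:
# position 0 of "solver" must equal position 3 of "rows".
print(crossing_consistent("solver", "rows", ai=0, di=3))  # True ('s' == 's')
print(crossing_consistent("puzzle", "rows", ai=0, di=3))  # False ('p' != 's')
```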
@DoerrfeldBill
Bill Doerrfeld
7 months
LLMs can now trace their outputs to their training data. 🤯 I cover the implications of @allen_ai's new OLMoTrace feature on @thenewstack today. https://t.co/FZdiGwmRQj
thenewstack.io: Ai2's OLMoTrace uses string matching to reveal the exact sources behind chatbot responses
3 replies · 10 retweets · 38 likes
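The linked article describes OLMoTrace as exact string matching between model output and training data. A brute-force sketch of that span-matching idea follows; the real system uses an efficient index over trillions of tokens, so this is illustrative only.

```python
def contains(seq, span):
    """True if `span` occurs contiguously inside `seq`."""
    n = len(span)
    return any(seq[k:k + n] == span for k in range(len(seq) - n + 1))

def longest_matching_span(output_tokens, corpus_tokens):
    """Longest contiguous span of the model output found verbatim in the corpus."""
    best = []
    for i in range(len(output_tokens)):
        # Only try spans longer than the current best; containment is monotone
        # (if a span is absent, every extension of it is absent too).
        for j in range(i + len(best) + 1, len(output_tokens) + 1):
            if contains(corpus_tokens, output_tokens[i:j]):
                best = output_tokens[i:j]
            else:
                break
    return best

corpus = "the capital of france is paris and it is beautiful".split()
output = "i think the capital of france is paris indeed".split()
print(" ".join(longest_matching_span(output, corpus)))
# -> "the capital of france is paris"
```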
@jiaxinhuang0229
Jiaxin Huang
8 months
Thrilled to share our recent work "Efficient Test-Time Scaling via Self-Calibration"! We introduce a smart way to boost LLM efficiency in test-time scaling without sacrificing accuracy🧠! By using self-calibrated confidence scores, we enable early stopping in Best-of-N and…
@ChengsongH31219
ChengSong Huang
8 months
🚀🚀New Research Alert: Efficient Test-Time Scaling via Self-Calibration! ❓How can we dynamically allocate computational resources in repeated-sampling methods? 💡We propose an efficient test-time scaling method that uses model confidence to dynamically adjust sampling, since…
0 replies · 2 retweets · 13 likes
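A minimal sketch of confidence-based early stopping in Best-of-N, as summarized in the thread. The sampler here is a hypothetical stand-in for a model call that returns a response together with a calibrated confidence score.

```python
import random

def best_of_n_early_stop(sample, n_max=16, threshold=0.9):
    """Keep sampling until a response's calibrated confidence clears the
    threshold, instead of always paying for all n_max samples."""
    best_resp, best_conf = None, -1.0
    for i in range(1, n_max + 1):
        resp, conf = sample()
        if conf > best_conf:
            best_resp, best_conf = resp, conf
        if best_conf >= threshold:   # confident enough: stop early
            return best_resp, i
    return best_resp, n_max

random.seed(0)
sample = lambda: (f"answer-{random.randint(0, 3)}", random.random())
resp, used = best_of_n_early_stop(sample)
print(resp, f"chosen after {used} of 16 samples")
```

The compute saving comes entirely from the early return: easy queries terminate after a sample or two, while hard ones still get the full budget.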
@jiaxinhuang0229
Jiaxin Huang
10 months
🚀 Exciting opportunity for LLM multi-agent researchers: the Agent Society Challenge at WWW 2025! Monetary prizes total $12,000, and top teams will be recommended to publish their results in the WWW Companion proceedings🥳 More details can be found here: …
0 replies · 1 retweet · 14 likes
@jiaxinhuang0229
Jiaxin Huang
1 year
🤨Ever wonder why RLHF-trained LLMs are overconfident? 🚀Check out our new work led by @JixuanLeng, revealing that reward models themselves are biased toward high-confidence responses!😯 🥳We introduce two practical solutions (PPO-M & PPO-C) to improve language model…
@JixuanLeng
Jixuan Leng
1 year
🚀 RLHF-trained LLMs often show overconfidence in their expressed confidence levels. Our NEW PAPER reveals why: reward models tend to favor highly confident responses, even when the responses are wrong! We introduce two PPO variants, PPO with Calibrated Reward Modeling and PPO with…
0 replies · 2 retweets · 27 likes
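The bias these tweets describe can be probed by scoring the same answer stated with different verbal confidence. The scoring function below is a deliberately biased toy stand-in for a learned reward model, invented to illustrate the finding, not the paper's PPO-M/PPO-C methods.

```python
def toy_reward_model(response: str) -> float:
    """Hypothetical reward model with a built-in confidence bias."""
    score = 1.0
    if "definitely" in response:
        score += 0.5          # rewards assertive wording, regardless of truth
    if "maybe" in response:
        score -= 0.3          # punishes hedged wording
    return score

wrong_answer = "The capital of Australia is Sydney."
print(toy_reward_model(f"Maybe: {wrong_answer}".lower()))       # 0.7
print(toy_reward_model(f"Definitely: {wrong_answer}".lower()))  # 1.5
```

Both responses are equally wrong, yet the confident phrasing earns a higher reward; optimizing against such a model is exactly what would push an RLHF'd LLM toward overconfidence.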
@BanghuaZ
Banghua Zhu
1 year
🔍 Which reward model characteristics best predict RLHF performance? We evaluated RMs & LLM judges on:
- Human preference agreement on Chatbot Arena
- Accuracy in selecting correct code/math answers
- Correlation with Chatbot Arena rankings
Interesting finding: Lower-bound…
@arena
lmarena.ai
1 year
🔥New benchmark: Preference Proxy Evaluations (PPE). Can reward models guide RLHF? Can an LLM judge replace real human evals? PPE addresses these questions! Highlights:
- Real-world human preferences from Chatbot Arena💬
- 16,000+ prompts and 32,000+ diverse model responses🗿
- …
1 reply · 3 retweets · 35 likes
@jiaxinhuang0229
Jiaxin Huang
1 year
Curious about efficient many-shot ICL with LLMs? Our new paper led by @ChengsongH31219 introduces LARA, which divides & reweights in-context examples to ensure: ✅ Better Performance ✅ Improved Scalability ✅ No Need to Access Model Parameters ✅ Less Memory Usage
@ChengsongH31219
ChengSong Huang
1 year
🚀🚀Excited to share our paper Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning! How can we run inference efficiently in the many-shot in-context learning setting? We propose LARA and B-LARA for efficient LLM inference by dividing the input…
0 replies · 1 retweet · 9 likes
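A sketch of the divide-and-reweight idea as the tweets describe it: split the many-shot examples into subsets, get next-token logits per subset, and combine them with a reweighting vector. The softmax weighting below is an assumption made for a self-contained demo, not necessarily the paper's exact combination rule.

```python
import numpy as np

def lara_combine(subset_logits, weights):
    """Weighted sum of per-subset next-token logits (logit arithmetic)."""
    w = np.exp(weights) / np.exp(weights).sum()   # normalize the reweighting vector
    return np.tensordot(w, np.stack(subset_logits), axes=1)

# Three subsets of in-context examples, each yielding logits over a 2-token vocab.
subset_logits = [np.array([1.0, 0.2]), np.array([0.4, 1.5]), np.array([0.9, 0.3])]
weights = np.array([0.1, 2.0, 0.1])               # subset 2 judged most useful
print(lara_combine(subset_logits, weights))       # combined logits lean to token 1
```

Because only output logits are combined, this style of method needs no access to model parameters and keeps each forward pass short, which matches the scalability and memory claims in the announcement.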