Zhuolin Yang

@lucas110550

Followers: 46 · Following: 104 · Media: 2 · Statuses: 21

Research Scientist @NVIDIA, Ph.D. @UofIllinois. ICPC WF'20 Bronze. Words are my own.

Santa Clara
Joined February 2016
@lucas110550
Zhuolin Yang
2 months
Really impressive
@GoogleDeepMind
Google DeepMind
2 months
An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the ICPC 2025 - one of the world’s most prestigious programming contests. 🏅 Building on the model's success in math at the IMO, this marks another historic milestone for advanced AI. 🧵
0
0
0
@j_dekoninck
Jasper Dekoninck
2 months
A new open reasoning model, K2-Think, was recently released boasting scores comparable to GPT-OSS 120B and getting a lot of media attention. However, their performance relies on flawed evaluation marked by contamination, unfair comparisons, and misrepresentation of results. 🧵
20
56
327
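The contamination concern above can be made concrete. A common (though here assumed, not the thread authors' exact method) way to flag contamination is to check whether word n-grams from an eval problem appear verbatim in the training corpus:

```python
# Minimal sketch of an n-gram-overlap contamination check.
# Assumption: "contaminated" means some word 8-gram of the eval
# problem appears verbatim in at least one training document.

def ngrams(text, n=8):
    """Set of lowercase word n-grams in `text`."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(problem, corpus_docs, n=8):
    """True if any n-gram of `problem` occurs in any corpus document."""
    probe = ngrams(problem, n)
    return any(probe & ngrams(doc, n) for doc in corpus_docs)
```

Real contamination pipelines add normalization (punctuation stripping, near-duplicate hashing), but even this crude check catches verbatim benchmark leakage.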
@lucas110550
Zhuolin Yang
2 months
Oh, and we get 42.20 for Qwen3-30B-A3B... maybe the Qwen team should sue them for turning their model into a talk-show joke without asking permission, lol
0
0
0
@lucas110550
Zhuolin Yang
2 months
Just curious - about these LCB v5 scores: OpenReasoning-Nemotron-32B: 57.79; Gemini-2.5-Pro: 58.24. This isn't a joke, right??? Is it okay to put such ridiculous numbers here???
@_akhaliq
AK
2 months
K2-Think: a reasoning system that achieves frontier performance with just a 32B-parameter model, surpassing or matching much larger models such as GPT-OSS 120B and DeepSeek v3.1. Vibe-coded a chat app for it in anycoder.
1
0
0
@lucas110550
Zhuolin Yang
2 months
I stayed up late last night to (unofficially) participate in the ICPC WF25 online mirror using our dev coding LLM. So far, it has solved 5 of the 12 problems: D, F, H, K, L. FYI: I'm just using a small-scale LLM, so it can be deployed on a single GPU. Some brief
0
0
0
@lucas110550
Zhuolin Yang
5 months
Our released evaluation toolkit can reproduce our AceReason-Nemotron model numbers (see below):
AceReason-Nemotron-1.0-7B:
LiveCodeBench (Avg@8): [05/23-05/24]: 72.0; [06/24-01/25]: 54.2; release set v5: 51.2; release set v6: 44.4
AIME (Avg@64): AIME'24: 68.6; AIME'25:
huggingface.co
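The Avg@k numbers above (Avg@8 for LiveCodeBench, Avg@64 for AIME) can be computed as follows, assuming Avg@k means the pass rate averaged over k independent samples per problem, then averaged over problems:

```python
# Hedged sketch of the Avg@k metric (assumed definition: mean per-problem
# pass rate over k sampled generations, reported as a percentage).

def avg_at_k(results):
    """results: list of per-problem lists of k boolean pass/fail outcomes."""
    per_problem = [sum(r) / len(r) for r in results]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, k=4 samples each: pass rates 0.75 and 0.25 -> 50.0
print(avg_at_k([[True, True, False, True], [False, False, True, False]]))  # → 50.0
```

Averaging over many samples reduces the variance of a single greedy or sampled run, which is why reproducible toolkits report Avg@8/Avg@64 rather than one-shot scores.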
@ychenNLP
Yang Chen
5 months
The first thing we did was to make sure the eval setup is correct! We spent a lot of time making sure our eval can accurately reproduce the DeepSeek-R1 numbers on AIME and LiveCodeBench - it's IMPOSSIBLE to track RL progress without a good eval setup (e.g., we see AIME up
0
4
9
@zihan_johan_liu
Zihan (Johan) Liu
5 months
With a stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models. 📄Report: https://t.co/yzYeGqWoTr 🤗Model: https://t.co/VRtprrPxZJ 📚SFT Data:
huggingface.co
@_weiping
Wei Ping
5 months
Introducing AceReason-Nemotron 1.1 Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate
1
8
25
@ychenNLP
Yang Chen
5 months
📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models. The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks. ✅AIME2025 (math): 53.6% -> 64.8% ✅LiveCodeBench
6
45
205
@_weiping
Wei Ping
5 months
Introducing AceReason-Nemotron 1.1 Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate
2
16
69