Wenqi Shi
@WenqiShi0106
Followers: 296 · Following: 970 · Media: 11 · Statuses: 149
Assistant Professor @UTSWMedCenter | Ph.D. @GeorgiaTech | LLMs | Agent | RAG | EHRs | Clinical Decision Support | Pediatric Healthcare
Dallas, TX
Joined November 2023
            
How can we systematically enhance LLMs for complex medical coding tasks? Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! Comprehensive code-based medical reasoning…
          
                
Replies: 9 · Reposts: 20 · Likes: 132
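The tweet describes a gym-style interaction loop, so a minimal sketch helps make that concrete: an environment that executes agent-written code and returns an observation and a reward. All names here (MedCodingEnv, llm_propose_code, the grader) are hypothetical stand-ins, not MedAgentGym's actual API.

```python
import subprocess
import sys
import tempfile

class MedCodingEnv:
    """Toy gym-style environment: the agent submits Python code; the env runs
    it and returns (observation, reward, done). Sandboxing/timeouts omitted."""

    def __init__(self, task_prompt, grade):
        self.task_prompt = task_prompt
        self.grade = grade  # callable: stdout -> reward in [0, 1]

    def reset(self):
        return self.task_prompt

    def step(self, code):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        proc = subprocess.run([sys.executable, f.name], capture_output=True, text=True)
        obs = proc.stdout + proc.stderr
        reward = self.grade(proc.stdout) if proc.returncode == 0 else 0.0
        return obs, reward, reward == 1.0

def llm_propose_code(task, feedback):
    # Stand-in for an LLM call; a trained agent would condition on feedback.
    return 'print("mean_hr=72.0")'

env = MedCodingEnv("Compute the mean heart rate from vitals.csv and print it.",
                   grade=lambda out: 1.0 if "mean_hr" in out else 0.0)  # placeholder grader
task, feedback = env.reset(), ""
for _ in range(5):  # interaction budget
    obs, reward, done = env.step(llm_propose_code(task, feedback))
    feedback = obs  # execution output becomes the next turn's feedback
    if done:
        break
print("final reward:", reward)
```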
              
New Google paper trains LLM judges to use small bits of code alongside their reasoning, so their decisions become precise: judging stops being guesswork and becomes checkable. Text-only judges often miscount, miss structure rules, or accept shaky logic that a simple program would…
          
                
Replies: 3 · Reposts: 17 · Likes: 161
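A minimal sketch of what "code alongside reasoning" buys a judge. In the paper's setting the judge model would write the check itself; the fixed rubric and the eval-based harness below are only my illustration.

```python
def judge(response: str) -> bool:
    """Toy rubric: the response must contain exactly three comma-separated
    items. A text-only judge can miscount; an executed check cannot."""
    # In the paper's setting an LLM would WRITE this check; here it is fixed.
    check_expr = "len([x for x in response.split(',') if x.strip()]) == 3"
    allowed = {"__builtins__": {}, "len": len}  # minimal eval namespace
    return bool(eval(check_expr, allowed, {"response": response}))

assert judge("apples, oranges, pears")
assert not judge("apples, oranges")
print("checks passed")
```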
              
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
          
            
            arxiv.org
              Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
            
                
Replies: 26 · Reposts: 169 · Likes: 1K
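A sketch of the normalization problem the tweet points at: converting heterogeneous agent logs into one shared trajectory schema. The Turn/Trajectory fields and the ReAct-style input are illustrative assumptions, not the released dataset's actual schema.

```python
from dataclasses import dataclass, asdict
from typing import List
import json

@dataclass
class Turn:
    role: str      # "user" | "assistant" | "tool"
    content: str

@dataclass
class Trajectory:
    source: str    # originating framework/benchmark
    tools: List[str]
    turns: List[Turn]

def from_react_log(log: dict) -> Trajectory:
    """Convert one hypothetical ReAct-style log into the unified schema."""
    turns = [Turn("user", log["task"])]
    for step in log["steps"]:
        turns.append(Turn("assistant",
                          f"Thought: {step['thought']}\nAction: {step['action']}"))
        turns.append(Turn("tool", step["observation"]))
    return Trajectory(source=log["framework"], tools=log["tools"], turns=turns)

traj = from_react_log({
    "framework": "react-demo", "tools": ["search"], "task": "Find X.",
    "steps": [{"thought": "search for X", "action": "search('X')",
               "observation": "X is ..."}],
})
print(json.dumps(asdict(traj), indent=2))  # one JSONL record for SFT
```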
              
Eigen-1 gets 48.3% (Pass@1) & 61.74% (Pass@5) on the "Humanity's Last Exam" (HLE) gold subset @FutureHouseSF using DeepSeek V3.1. Previously: Grok4 -> 30.2%, GPT-5 -> 22.8%, Gemini 2.5 Pro -> 18.8%. https://t.co/4Fhcp8VTBG The future isn't bigger models, it's smarter agentic design!
          
                
Replies: 1 · Reposts: 4 · Likes: 40
              
With deep research revolutionizing research and data analysis, why are we still stuck manually crafting data visualizations? Meet CoDA (https://t.co/g8yjnHiMHM): the ultimate multi-agent LLM powerhouse for auto-generating stunning plots from NL queries! Handles complex data, self-refines…
          
                
Replies: 8 · Reposts: 10 · Likes: 109
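A minimal generate-execute-refine loop of the kind CoDA's pitch implies (NL query, then plotting code, then run, then self-correct on errors). The single-critic loop, the function names, and the matplotlib dependency are my assumptions; the real system is multi-agent.

```python
import subprocess
import sys
import tempfile

def llm(prompt: str) -> str:
    # Stand-in for a model call; returns canned matplotlib code for the demo.
    return ("import matplotlib\nmatplotlib.use('Agg')\n"
            "import matplotlib.pyplot as plt\n"
            "plt.plot([1, 2, 3], [2, 4, 9])\nplt.savefig('plot.png')")

def generate_plot(query: str, max_rounds: int = 3) -> str:
    code = llm(f"Write matplotlib code for: {query}")
    for _ in range(max_rounds):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        proc = subprocess.run([sys.executable, f.name], capture_output=True, text=True)
        if proc.returncode == 0:
            return code  # ran cleanly; a real critic would also inspect the image
        code = llm(f"Fix this code:\n{code}\nError:\n{proc.stderr}")  # self-refine
    raise RuntimeError("could not produce a working plot")

print(generate_plot("growth of y over x"))
```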
              
New @GoogleResearch paper builds a personal health assistant that reads a user's data, answers health questions, and coaches daily habits. It evaluates the system on 10 tasks with 7,000+ human annotations and 1,100 hours from experts and users. The assistant covers 4 needs…
          
                
Replies: 25 · Reposts: 170 · Likes: 1K
              
RLAD (Reinforcement Learning with Abstraction and Deduction) trains models via RL using a 2-player setup:
▪️ An abstraction generator proposes short, natural-language "reasoning hints" (abstractions) summarizing key facts and strategies.
▪️ A solution generator uses them to…
          
                
Replies: 11 · Reposts: 60 · Likes: 297
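The two-player setup above maps naturally onto code. This toy shows the data flow only: both "players" are canned stand-ins and the verifier plumbing is illustrative, not RLAD's actual training code.

```python
def abstraction_generator(problem: str) -> str:
    # A trained model would propose a short natural-language reasoning hint.
    return "Hint: pair terms so each pair sums to n + 1."

def solution_generator(problem: str, hint: str) -> int:
    # A trained model would deduce the answer conditioned on the hint.
    n = 100
    return n * (n + 1) // 2

def verifier(answer: int) -> float:
    return 1.0 if answer == 5050 else 0.0  # verifiable reward

problem = "Sum the integers 1..100."
hint = abstraction_generator(problem)
answer = solution_generator(problem, hint)
reward = verifier(answer)
# Under RL, `reward` updates BOTH players: an abstraction is credited when
# solutions conditioned on it succeed more often than without it.
print(hint, answer, reward)
```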
              
Tired of choosing between speed and accuracy for Diffusion Large Language Models? Meet FreeDave (https://t.co/KzdsX3TrnX): the lossless parallel decoding algorithm that fixes DLLMs' inference pain points perfectly! No extra draft models, no model tweaks, just smart parallel…
          
                
Replies: 2 · Reposts: 22 · Likes: 144
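The tweet gives no algorithmic details, so the following is only a generic sketch of what "lossless parallel decoding" usually means: propose several tokens at once from the model itself (no draft model), keep the longest prefix the sequential decoder would also commit, and fall back to single-token steps on mismatch, so the output is guaranteed identical. In this toy the verification literally calls the sequential reference, which gives no speedup; in practice it would be one batched pass.

```python
from typing import Callable, List

def parallel_decode(propose_block: Callable[[List[int], int], List[int]],
                    step_decode: Callable[[List[int]], int],
                    prefix: List[int], n_tokens: int, block: int = 4) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        candidates = propose_block(out, block)  # several tokens from ONE pass
        accepted = 0
        for tok in candidates:                  # verify against the sequential
            if tok == step_decode(out):         # reference; matching it exactly
                out.append(tok)                 # is what makes this "lossless"
                accepted += 1
            else:
                break
        if accepted == 0:
            out.append(step_decode(out))        # always make progress
    return out

# Toy demo: a fixed target sequence stands in for model predictions.
seq = [3, 1, 4, 1, 5, 9, 2, 6]
step = lambda out: seq[len(out)]                      # sequential decoder
block_fn = lambda out, k: seq[len(out):len(out) + k]  # parallel proposals
print(parallel_decode(block_fn, step, [], 8))         # identical to seq
```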
              
Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration. When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation, reviving stalled models by expanding exploration with large-N rollouts. (1/n)
          
                
Replies: 20 · Reposts: 44 · Likes: 210
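A sketch of BroRL's core knob as the tweet states it: when reward plateaus over training steps, raise N, the number of rollouts sampled per prompt. The policy interface, the plateau heuristic, and the 4x factor are all illustrative assumptions.

```python
# Pseudocode-style sketch; `policy` is an assumed object with sample/score/update.
def train(policy, prompts, total_steps, n_rollouts=16):
    history = []  # mean best-reward per step, used to detect a plateau
    for step in range(total_steps):
        step_reward = 0.0
        for p in prompts:
            rollouts = [policy.sample(p) for _ in range(n_rollouts)]  # explore wide
            rewards = [policy.score(r) for r in rollouts]
            policy.update(p, rollouts, rewards)  # e.g. group-relative advantages
            step_reward += max(rewards)
        history.append(step_reward / len(prompts))
        # Plateau heuristic (illustrative): no gain over the last 10 steps,
        # so broaden exploration by scaling rollouts, not steps.
        if len(history) > 10 and history[-1] <= max(history[-11:-1]):
            n_rollouts *= 4  # enter the "large-N" regime
    return policy
```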
              
This @Microsoft paper brings really bad news for medical AI models. It exposes some serious flaws: AI models just aren't ready yet for reliable medical reasoning. The paper finds that medical AI models pass tests by exploiting patterns in the data, not by actually combining…
          
                
Replies: 20 · Reposts: 57 · Likes: 233
              
Cool research paper from Google. This is what clever context engineering looks like. It proposes Tool-Use-Mixture (TUMIX), leveraging diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a bunch of diverse agents…
          
                
Replies: 31 · Reposts: 151 · Likes: 735
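A minimal sketch of the mixture idea: run agents with different tool-use strategies on the same question and aggregate their answers. Majority voting and the three toy agents are my stand-ins, not TUMIX's actual aggregation scheme.

```python
from collections import Counter

def text_only_agent(q: str) -> str:
    return "42"                    # reasons in plain text

def code_agent(q: str) -> str:
    return str(6 * 7)              # reasons by executing code

def search_agent(q: str) -> str:
    return "42"                    # reasons with retrieval

AGENTS = [text_only_agent, code_agent, search_agent]

def tool_use_mixture(question: str) -> str:
    answers = [agent(question) for agent in AGENTS]
    best, _ = Counter(answers).most_common(1)[0]  # consensus across strategies
    return best

print(tool_use_mixture("What is 6 x 7?"))  # -> "42"
```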
              
The paper teaches small LLMs to reason better by training with built-in tree search, i.e. smarter exploration beats longer training runs. It reaches 62.95% average accuracy while using 5.7x fewer GPU hours. Typical reinforcement learning with verifiable rewards stalls because…
          
                
Replies: 3 · Reposts: 28 · Likes: 160
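A toy best-first search over partial solutions illustrates the "built-in tree search" idea: expand the most promising nodes instead of sampling one long chain, and keep only verified branches for training. Here expand/score/verified are stand-ins for the model's own step proposals, its value estimate, and the verifiable reward.

```python
import heapq

def expand(state: str) -> list:
    # Stand-in for the LLM proposing next reasoning steps from `state`.
    return [state + "a", state + "b"]

def score(state: str) -> float:
    # Stand-in value estimate guiding exploration (higher = more promising).
    return state.count("a") / (len(state) + 1)

def verified(state: str) -> bool:
    return state == "aaa"  # verifiable reward: exact target solution

def tree_search(root: str = "", budget: int = 20):
    frontier = [(-score(root), root)]
    while frontier and budget > 0:
        _, state = heapq.heappop(frontier)  # best-first: highest score pops first
        if verified(state):
            return state                    # keep this trajectory for training
        if len(state) < 3:
            for child in expand(state):
                heapq.heappush(frontier, (-score(child), child))
        budget -= 1
    return None

print(tree_search())  # -> "aaa"
```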
              
Happy to share AceSearcher accepted to #NeurIPS2025 #Spotlight!
🔹 One LLM, two roles: Decomposer (split queries) + Solver (combine context)
🔹 +7.6% on QA & fact verification
🔹 32B ≈ DeepSeek-V3 on DocMath
Code: https://t.co/lQU12Dm7vb
arXiv: https://t.co/JI0kOh0yDk
          
          
                
Replies: 1 · Reposts: 16 · Likes: 25
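The decomposer + solver split is easy to sketch with one stand-in model playing both roles via different prompts. The prompts, the canned outputs, and the retrieve() stub are my assumptions, not AceSearcher's implementation.

```python
def llm(prompt: str) -> str:
    # Stand-in model call; canned outputs keep the demo self-contained.
    if prompt.startswith("Decompose"):
        return "Who wrote Hamlet?\nWhen was that author born?"
    return "Shakespeare, born in 1564."

def retrieve(query: str) -> str:
    return f"[doc for: {query}]"  # stub retriever

def ace_search(question: str) -> str:
    sub_queries = llm(f"Decompose: {question}").splitlines()  # role 1: Decomposer
    context = "\n".join(retrieve(q) for q in sub_queries)
    return llm(f"Answer using context:\n{context}\nQ: {question}")  # role 2: Solver

print(ace_search("When was the author of Hamlet born?"))
```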
              
ReasoningBank: memory for self-evolving LLM agents
• Distills strategies from both successes & failures
• Enables agents to learn, reuse, and improve over time
• Outperforms prior memory methods on web & SWE tasks (+34.2% eff., −16% steps)
          
                
Replies: 12 · Reposts: 114 · Likes: 599
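The three bullets sketch directly into a distill-store-retrieve loop. The keyword-overlap retrieval and the distill template below are simplifications I chose for the demo, not ReasoningBank's method.

```python
class ReasoningBank:
    def __init__(self):
        self.items = []  # (task, lesson) pairs distilled from past episodes

    def distill(self, task: str, trajectory: str, success: bool):
        # Memories come from successes AND failures, per the bullets above.
        verdict = "worked" if success else "failed"
        self.items.append((task, f"Strategy that {verdict}: {trajectory}"))

    def retrieve(self, task: str, k: int = 2):
        # Toy relevance: shared words between the new task and stored tasks.
        scored = sorted(self.items,
                        key=lambda it: -len(set(task.split()) & set(it[0].split())))
        return [lesson for _, lesson in scored[:k]]

bank = ReasoningBank()
bank.distill("book a flight", "checked dates before paying", success=True)
bank.distill("book a hotel", "skipped the confirmation page", success=False)
# Next episode: prepend retrieved lessons to the agent's prompt.
print(bank.retrieve("book a flight to NYC"))
```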
              
Baichuan-M2: Scaling Medical Capability with Large Verifier System. Baichuan has released what is probably the best open-source LLM for medicine right now! Second only to GPT-5! "Despite its relatively small number of parameters (only 32B), Baichuan-M2 outperformed all other…
          
                
Replies: 5 · Reposts: 20 · Likes: 112
              
Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to the EMNLP 2025 Main Track! @emnlpmeeting First survey connecting LLM interpretation & safety.
          
                
Replies: 4 · Reposts: 20 · Likes: 176
              
✨ Alongside NVIDIA-Nemotron-Nano-v2-9B, we're also open-sourcing its pre-training dataset. At NVIDIA, we remain committed to openness: models + datasets. As the global open-source ecosystem rapidly evolves (with remarkable momentum emerging from Asia and beyond), we stand…
          
            
huggingface.co
This week, we open-sourced NVIDIA-Nemotron-Nano-v2-9B: our next-generation efficient hybrid model. - 6× faster than Qwen3-8B at reasoning tasks. - Retained long-context capability (8k → 262k trained, usable at 128k). First true demonstration that reasoning models can be…
            
                
Replies: 5 · Reposts: 15 · Likes: 118
              
Welcome to CTCAE6 GO, a fast CTCAE v6/v5 reference, built by a clinician. Trainees always free. Share how you're using Pro to learn, teach, or save time in clinic. Tag us or DM your best workflow or use case! => longer access as a CTCAE6 GO Elite User. Free PRO CODE below.
          
                
Replies: 1 · Reposts: 4 · Likes: 6
              
DeepCode: Open Agentic Coding is Here! We dropped DeepCode, an AI-powered coding platform that transforms research papers and technical documents into production-ready code! Fully Open Source: https://t.co/vBzRhcVAsN ✨ Current Features: • Paper2Code: Convert research…
          
                
Replies: 11 · Reposts: 168 · Likes: 747