Wenqi Shi

@WenqiShi0106

Followers
296
Following
970
Media
11
Statuses
149

Assistant Professor @UTSWMedCenter | Ph.D. @GeorgiaTech | LLMs | Agent | RAG | EHRs | Clinical Decision Support | Pediatric Healthcare

Dallas, TX
Joined November 2023
@WenqiShi0106
Wenqi Shi
5 months
🤔 How can we systematically enhance LLMs for complex medical coding tasks? 🚀 Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! 🧬💻 🎯 Comprehensive Code-based Medical Reasoning
9
20
132
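The tweet above describes a gym-style (reset/step/reward) loop wrapped around code execution. A minimal sketch of what such an interface could look like; the class and method names are hypothetical, not MedAgentGym's actual API:

```python
# Hypothetical gym-style environment for coding-based medical reasoning.
# Not MedAgentGym's real API; a sketch of the reset/step/reward pattern.
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str   # e.g. stdout or a traceback from running the agent's code
    reward: float      # 1.0 if the task's hidden checks pass, else 0.0
    done: bool

class CodingMedEnv:
    def __init__(self, prompt: str, check):
        self.prompt = prompt   # task description shown to the agent
        self.check = check     # hidden verifier: code -> (passed, output)
        self.history = []

    def reset(self) -> str:
        self.history.clear()
        return self.prompt

    def step(self, code: str) -> StepResult:
        self.history.append(code)
        passed, output = self.check(code)   # sandboxed execution assumed
        return StepResult(output, float(passed), passed)
```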
@rohanpaul_ai
Rohan Paul
5 days
New Google paper trains LLM judges to use small bits of code alongside reasoning, so their decisions become precise: judging stops being guesswork and becomes checkable. Text-only judges often miscount, miss structure rules, or accept shaky logic that a simple program would…
3
17
161
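A hedged sketch of that idea: the judge emits a small executable check instead of a free-text verdict, so the decision can be re-run and verified. `call_llm` is a hypothetical stand-in for any chat-completion call, not an API from the paper:

```python
def judge_with_code(rubric: str, answer: str, call_llm) -> bool:
    # Ask the judge model for an executable checker rather than a verdict.
    check_src = call_llm(
        "Write a Python function check(answer: str) -> bool that returns True "
        f"only if the answer satisfies this rubric:\n{rubric}"
    )
    scope = {}
    exec(check_src, scope)   # running generated code assumes a trusted sandbox
    return bool(scope["check"](answer))
```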
@yueqi_song
Yueqi Song @ EMNLP2025
6 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
26
169
1K
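The fragmentation the tweet mentions is concrete: every agent framework logs episodes differently. One way to normalize heterogeneous logs into a single SFT-ready record; these field names are illustrative, not the released dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str         # "user", "assistant", or "tool"
    content: str      # message text, tool call, or tool output

@dataclass
class Trajectory:
    env: str          # e.g. "web", "swe", "os"
    tools: list       # tool names available in this episode
    turns: list       # ordered list of Turn records
    success: bool     # episode-level outcome label

def to_sft_text(traj: Trajectory) -> str:
    """Flatten one normalized trajectory into a single training string."""
    return "\n".join(f"<{t.role}> {t.content}" for t in traj.turns)
```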
@XiangruTang
Rob Tang
27 days
🚨 Eigen-1 gets 48.3% (Pass@1) & 61.74% (Pass@5) on "Humanity's Last Exam" (HLE) gold subset @FutureHouseSF using DeepSeek V3.1. Prev. Grok4->30.2%, GPT-5->22.8%, Gemini 2.5 Pro->18.8% 📎 https://t.co/4Fhcp8VTBG The future isn't bigger models, it's smarter agentic design! 🚀
1
4
40
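For readers comparing the Pass@1 and Pass@5 numbers: pass@k is commonly computed with the unbiased estimator from Chen et al. (2021), pass@k = 1 - C(n-c, k)/C(n, k) for n sampled attempts with c correct. A small sketch (the example numbers are made up, not Eigen-1's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts with c successes."""
    if n - c < k:        # fewer failures than k: some success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 attempts per question, 3 correct:
print(pass_at_k(n=10, c=3, k=1))  # 0.3
print(pass_at_k(n=10, c=3, k=5))  # ~0.917
```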
@my_cat_can_code
Zichen Chen (๐Ÿฑ,๐Ÿ’–)
29 days
With deep research revolutionizing research/data analysis, why are we still stuck manually crafting data viz? Meet CoDA (https://t.co/g8yjnHiMHM): the ultimate multi-agent LLM powerhouse for auto-generating stunning plots from NL queries! Handles complex data, self-refines…
8
10
109
@rohanpaul_ai
Rohan Paul
28 days
New @GoogleResearch paper builds a personal health assistant that reads a user's data, answers health questions, and coaches daily habits. It evaluates the system on 10 tasks with 7,000+ human annotations and 1,100 hours from experts and users. The assistant covers 4 needs…
25
170
1K
@TheTuringPost
TuringPost
1 month
RLAD (Reinforcement Learning with Abstraction and Deduction) trains models via RL using a 2-player setup: ▪️ An abstraction generator – proposes short, natural-language “reasoning hints” (abstractions) summarizing key facts and strategies. ▪️ A solution generator – uses them to…
11
60
297
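Stripped of the RL training signal, the 2-player generation loop looks roughly like this; `call_llm` is a placeholder, and the reward terms that actually train both players are omitted:

```python
def rlad_generate(problem: str, call_llm) -> str:
    # Player 1: abstraction generator proposes short reasoning hints.
    hints = call_llm(f"Summarize the key facts and strategies for solving:\n{problem}")
    # Player 2: solution generator solves conditioned on those hints.
    return call_llm(f"Problem: {problem}\nReasoning hints:\n{hints}\nSolve step by step.")
```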
@LingYang_PU
Ling Yang
1 month
🚀 Tired of choosing between speed and accuracy for Diffusion Large Language Models? Meet FreeDave (https://t.co/KzdsX3TrnX): the lossless parallel decoding algorithm that fixes DLLMs' inference pain points perfectly! No extra draft models, no model tweaks, just smart parallel…
2
22
144
@shizhediao
Shizhe Diao
28 days
🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration. When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation, reviving stalled models by expanding exploration with large-N rollouts. 👇 (1/n)
20
44
210
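The recipe in one sketch: instead of adding training steps, sample many more rollouts per prompt and update on the whole group. `policy` and `verify` are hypothetical stand-ins, not BroRL's actual components:

```python
def brorl_update(policy, prompts, verify, n_rollouts: int = 256):
    for prompt in prompts:
        # Large-N rollouts broaden exploration for each prompt.
        rollouts = [policy.sample(prompt) for _ in range(n_rollouts)]
        rewards = [float(verify(prompt, r)) for r in rollouts]  # verifiable reward
        policy.reinforce(prompt, rollouts, rewards)             # e.g. a GRPO-style step
```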
@rohanpaul_ai
Rohan Paul
1 month
🚫 This @Microsoft paper brings really bad news for medical AI models. Exposes some serious flaws. AI models just aren't ready yet for reliable medical reasoning. 🤯 The paper finds that medical AI models pass tests by exploiting patterns in the data, not by actually combining…
20
57
233
@omarsar0
elvis
1 month
Cool research paper from Google. This is what clever context engineering looks like. It proposes Tool-Use-Mixture (TUMIX), leveraging diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a bunch of diverse agents…
31
151
735
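Reduced to its core, the mixture idea is: run agents with different tool-use strategies on the same question, then aggregate. Majority vote below is a simple stand-in for the paper's actual aggregation:

```python
from collections import Counter

def mixture_answer(question: str, agents) -> str:
    # Each agent uses a different strategy, e.g. text-only, code, or search.
    answers = [agent(question) for agent in agents]
    return Counter(answers).most_common(1)[0][0]   # most frequent answer wins
```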
@rohanpaul_ai
Rohan Paul
1 month
The paper teaches small LLMs to reason better by training with built-in tree search, i.e., smarter exploration beats longer training runs. It reaches 62.95% average accuracy while using 5.7x fewer GPU hours. Typical reinforcement learning with verifiable rewards stalls because…
3
28
160
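As a rough illustration of search during rollout (a beam-style approximation, since the tweet doesn't give the paper's exact algorithm), with `policy.extend` and `score` as hypothetical placeholders:

```python
def tree_search_rollout(policy, prompt: str, score, width: int = 4, depth: int = 3) -> str:
    beams = [prompt]
    for _ in range(depth):
        # Branch each partial reasoning path `width` ways, then prune.
        candidates = [policy.extend(b) for b in beams for _ in range(width)]
        beams = sorted(candidates, key=score, reverse=True)[:width]
    return beams[0]   # most promising trajectory becomes the training target
```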
@ritaranx
Ran Xu
1 month
🚨 Happy to share AceSearcher accepted to #NeurIPS2025 #Spotlight! 🔹 One LLM, two roles: Decomposer (split queries) + Solver (combine context) 🔹 +7.6% on QA & fact verification 🔹 32B ≈ DeepSeek-V3 on DocMath 📂 Code: https://t.co/lQU12Dm7vb 📑 arXiv: https://t.co/JI0kOh0yDk
1
16
25
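The "one LLM, two roles" pattern, sketched with illustrative placeholders (`call_llm`, `retrieve`), not AceSearcher's actual API:

```python
def decompose_and_solve(query: str, call_llm, retrieve) -> str:
    # Role 1: decomposer splits the query into sub-questions.
    subqueries = call_llm(f"Decompose into sub-questions:\n{query}").splitlines()
    # Retrieve evidence for each sub-question, then combine the contexts.
    context = "\n\n".join(retrieve(sq) for sq in subqueries if sq.strip())
    # Role 2: solver answers over the combined context.
    return call_llm(f"Context:\n{context}\n\nAnswer the original question: {query}")
```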
@arankomatsuzaki
Aran Komatsuzaki
1 month
ReasoningBank: memory for self-evolving LLM agents • Distills strategies from both successes & failures • Enables agents to learn, reuse, and improve over time • Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)
12
114
599
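A toy version of that memory loop: distill a reusable strategy note from each episode, whether it succeeded or failed, and recall relevant notes for new tasks. Keyword overlap stands in for real embedding retrieval, and `call_llm` is a placeholder:

```python
class StrategyMemory:
    def __init__(self):
        self.notes = []

    def distill(self, task: str, trajectory: str, success: bool, call_llm) -> None:
        outcome = "succeeded" if success else "failed"
        self.notes.append(call_llm(
            f"Task: {task}\nTrajectory: {trajectory}\n"
            f"The attempt {outcome}. Write one reusable strategy note."
        ))

    def recall(self, task: str, k: int = 3) -> list:
        # Toy retrieval: rank notes by word overlap with the new task.
        overlap = lambda note: len(set(note.lower().split()) & set(task.lower().split()))
        return sorted(self.notes, key=overlap, reverse=True)[:k]
```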
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 months
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Baichuan has released what is probably the best open-source LLM for medicine right now! Second only to GPT-5! "Despite its relatively small number of parameters (only 32B), Baichuan-M2 outperformed all other…
5
20
112
@SeongminLeee
Seongmin Lee
2 months
🎉 Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to EMNLP 2025 Main Track! @emnlpmeeting 👉 First survey connecting LLM interpretation & safety
4
20
176
@shizhediao
Shizhe Diao
2 months
✨ Alongside NVIDIA-Nemotron-Nano-v2-9B, we're also open-sourcing its pre-training dataset. At NVIDIA, we remain committed to openness: models + datasets. As the global open-source ecosystem rapidly evolves (with remarkable momentum emerging from Asia and beyond), we stand…
huggingface.co
@shizhediao
Shizhe Diao
2 months
This week, we open-sourced NVIDIA-Nemotron-Nano-v2-9B: our next-generation efficient hybrid model. - 6× faster than Qwen3-8B at reasoning tasks. - Retained long-context capability (8k → 262k trained, usable at 128k). First true demonstration that reasoning models can be…
5
15
118
@ctcaego
CTCAE6 GO
3 months
👋 Welcome to CTCAE6 GO, a fast CTCAE v6/v5 reference built by a clinician. 🎓 Trainees always free. 💡 Share how you're using Pro to learn, teach, or save time in clinic. Tag us or DM your best workflow or use case! => Longer access as a CTCAE6 GO Elite User. Free PRO CODE below.
1
4
6
@rosinality
Rosinality
3 months
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Reasoning RL using the pass@k metric. It promotes exploration and generalization.
2
29
167
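The core recipe, as a hedged sketch: score a group of k sampled solutions together, granting reward if any one is correct, which preserves exploration instead of collapsing onto a single safe answer. `is_correct` is a placeholder verifier:

```python
def pass_at_k_reward(samples: list, is_correct) -> float:
    # Group-level reward: 1.0 if any of the k samples is correct.
    return float(any(is_correct(s) for s in samples))
```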
@huang_chao4969
Chao Huang
3 months
🚀 DeepCode: Open Agentic Coding is Here! We dropped DeepCode - an AI-powered coding platform that transforms research papers and technical documents into production-ready code! 🔗 Fully Open Source: https://t.co/vBzRhcVAsN ✨ Current Features: • Paper2Code: Convert research…
11
168
747