Yixing Jiang @ NeurIPS Profile
Yixing Jiang @ NeurIPS

@jyx_su

Followers
912
Following
58
Media
4
Statuses
19

PhD student at Stanford | Stanford Machine Learning Group, HealthRex Lab | National Science Scholar | Previously at Google DeepMind, SmarterDx

Stanford, CA
Joined June 2022
@AndrewYNg
Andrew Ng
4 days
NeurIPS received 21,575 paper submissions this year. Our Agentic Reviewer, released last week, just surpassed this in number of papers submitted and reviewed. It's clear agentic paper reviewing is here to stay and will be impactful!
@AndrewYNg
Andrew Ng
11 days
Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better. I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was
68
281
2K
@jyx_su
Yixing Jiang @ NeurIPS
4 days
Excited to share that "Agentic Reviewer" (developed by @AndrewYNg and me) has reviewed more papers than the entire NeurIPS 2025 submission count (21,575). Thank you for the enthusiasm from 160 countries! We are glad that over 95% of you found the generated reviews useful, and
2
2
13
@KameronBlack633
Kameron Black
10 months
We highlight the fundamental shift from AI as a tool to AI as a teammate in our recent multi-agent benchmarking study, which measures leading large language models on their ability to carry out tasks in medicine. Full study: https://t.co/zudgYwcvnT @StanfordAILab @StanfordMed
@jyx_su
Yixing Jiang @ NeurIPS
10 months
🩺 + 🤖 Introducing MedAgentBench: EHR Environment to Benchmark Medical LLM Agents ✅ 300 agent tasks (beyond QA) from 10 physician-written categories ✅ 100 realistic patient profiles with 700,000+ data elements ✅ FHIR-compliant interactive environment for translation
3
4
8
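The tweet above mentions a FHIR-compliant interactive environment. As a rough illustration of what one such agent task might involve, here is a minimal Python sketch of an agent helper that pulls the latest lab value from a FHIR-style Observation search. The endpoint shape, resource fields, and the stubbed search function are all illustrative assumptions, not the benchmark's actual API.

```python
# Hypothetical sketch of a MedAgentBench-style task: fetch a patient's most
# recent Observation value for a given LOINC code from a FHIR-like search.

def latest_observation(fhir_search_fn, patient_id, loinc_code):
    """Return the most recent Observation value, or None if none exist."""
    bundle = fhir_search_fn(
        "Observation", {"patient": patient_id, "code": loinc_code}
    )
    entries = bundle.get("entry", [])
    if not entries:
        return None
    # Each Observation carries an effectiveDateTime; take the newest one.
    newest = max(entries, key=lambda e: e["resource"]["effectiveDateTime"])
    return newest["resource"]["valueQuantity"]["value"]

# Stubbed FHIR search standing in for a live EHR environment.
def fake_search(resource_type, params):
    return {
        "entry": [
            {"resource": {"effectiveDateTime": "2024-01-02",
                          "valueQuantity": {"value": 4.1}}},
            {"resource": {"effectiveDateTime": "2024-03-05",
                          "valueQuantity": {"value": 3.8}}},
        ]
    }

print(latest_observation(fake_search, "pat-001", "2823-3"))  # → 3.8
```

In a real FHIR environment the search would be an HTTP GET against an `Observation` endpoint; the stub keeps the sketch runnable offline.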
@jyx_su
Yixing Jiang @ NeurIPS
10 months
We further break the tasks into three difficulty levels: easy (only one step), medium (two steps) and hard (at least three steps). Most models achieve a lower success rate (SR) as tasks require more steps. Gemini 1.5 Pro is one exception, achieving the highest SR on hard tasks.
1
0
9
@jyx_su
Yixing Jiang @ NeurIPS
10 months
MedAgentBench is an unsaturated, agent-oriented benchmark on which current state-of-the-art LLMs show some ability to succeed. The best model (Claude 3.5 Sonnet v2) achieves a success rate of 69.67%, leaving substantial room for improvement.
3
3
23
@jyx_su
Yixing Jiang @ NeurIPS
10 months
Example successful trajectory and common error patterns in MedAgentBench.
0
0
3
@jyx_su
Yixing Jiang @ NeurIPS
10 months
The MedAgentBench workflow begins with a clinician specifying a high-level task, after which the agent orchestrator interacts with both the LLM provider and the electronic medical record environment to finish the task and finally provide feedback to the clinician.
1
0
6
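The workflow described above (clinician task → orchestrator → LLM + EHR → feedback) can be sketched as a small agent loop. Everything here is an illustrative assumption: the action format, the stub LLM, and the stub EHR call stand in for the real components.

```python
# Minimal sketch of the described workflow: an orchestrator loops between an
# LLM (stubbed) and an EHR environment (stubbed) until the LLM finishes.

def orchestrate(task, llm_step_fn, ehr_call_fn, max_steps=5):
    """Run the agent loop; return the final answer reported to the clinician."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = llm_step_fn(history)       # LLM proposes the next action
        if action["type"] == "finish":
            return action["answer"]         # feedback to the clinician
        result = ehr_call_fn(action)        # e.g. a FHIR read/write
        history.append(("observation", result))
    return "max steps exceeded"

# Stub LLM: requests one EHR read, then finishes with what it observed.
def stub_llm(history):
    if history[-1][0] == "task":
        return {"type": "ehr_read", "query": "latest potassium"}
    return {"type": "finish", "answer": f"value = {history[-1][1]}"}

def stub_ehr(action):
    return 4.2  # fixed lab value standing in for a real EHR response

print(orchestrate("Check the patient's potassium", stub_llm, stub_ehr))
# → value = 4.2
```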
@arena
lmarena.ai
1 year
Congrats @GoogleDeepMind on the Gemma-2-2B release! Gemma-2-2B has been tested in the Arena under "guava-chatbot". With just 2B parameters, it achieves an impressive score of 1130, on par with models 10x its size! (For reference: GPT-3.5-Turbo-0613: 1117, Mixtral-8x7b: 1114). This
@GoogleDeepMind
Google DeepMind
1 year
We're welcoming a new 2 billion parameter model to the Gemma 2 family. 🛠️ It offers best-in-class performance for its size and can run efficiently on a wide range of hardware. Developers can get started with 2B today → https://t.co/hQRWYwGY7q
17
100
627
@Google
Google
1 year
Today, we're releasing Gemma 2 to researchers and developers globally. Available in both 9 billion and 27 billion parameter sizes, it's much more powerful and efficient than the first generation. Learn more ↓
blog.google
Gemma 2, our next generation of open models, is now available globally for researchers and developers.
68
234
1K
@jyx_su
Yixing Jiang @ NeurIPS
2 years
🚀 Want an image classifier within minutes? Just prompt latest models like GPT-4o and Gemini 1.5 with a bunch of demo examples (you can include thousands of them now) and ask multiple queries in one go! Work with @AndrewYNg at @StanfordAILab and @jonc101x . Inspired by @JeffDean
@jeremy_irvin16
Jeremy Irvin
2 years
Want to unlock the full potential of GPT-4o & Gemini 1.5 Pro? Give them **many** demonstrations and batch your queries! Our new work w/ @AndrewYNg shows these models can benefit substantially from lots of demo examples and asking many questions at once! https://t.co/SeAqBRFyy7
4
12
40
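The recipe in the tweets above, i.e. packing many labeled demonstrations into one prompt and batching several queries into a single request, can be sketched as a simple prompt builder. The template below is an illustrative assumption; the paper's exact prompt format may differ.

```python
# Sketch of many-shot in-context learning with batched queries: many
# (input, label) demonstrations followed by several queries in one prompt.

def build_many_shot_prompt(demos, queries):
    """demos: list of (input, label) pairs; queries: inputs to classify."""
    lines = []
    for x, y in demos:                      # many-shot demonstrations
        lines.append(f"Input: {x}\nLabel: {y}")
    for i, q in enumerate(queries, 1):      # batched queries, one request
        lines.append(f"Query {i}: {q}")
    lines.append("Answer each query with its label, one per line.")
    return "\n\n".join(lines)

demos = [("a photo of a tabby cat", "cat"), ("a photo of a beagle", "dog")]
queries = ["a photo of a siamese cat", "a photo of a poodle"]
prompt = build_many_shot_prompt(demos, queries)
print(prompt.count("Label:"))  # → 2
```

In practice the demo list can hold thousands of examples if the model's context window allows, and the returned string would be sent to the model as a single request.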
@_akhaliq
AK
2 years
Many-Shot In-Context Learning in Multimodal Foundation Models Large language models are well-known to be effective at few-shot in-context learning (ICL). Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows, presenting an
3
55
276