Eric Xingdi Yuan @ericxyuan X Profile

Eric Xingdi Yuan

@ericxyuan

Followers: 915

Following: 1K

Media: 40

Statuses: 384

Senior Researcher at Microsoft Research, Montreal. Opinions are my own.

Montréal, Québec

Joined August 2016

Eric Xingdi Yuan

@ericxyuan

20 days

We want to push towards agents that understand a repo at the codebase level; this requires tasks beyond looking at just a few lines of code or a single file. In this work led by our great intern Amy Lee, we explore what such tasks should look like.

hyunji amy lee

@hyunji_amy_lee

20 days

🚨 Excited to announce Gistify!, where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable

0

4

12

Alessandro Sordoni

@murefil

1 month

This was a great group effort ❤️. Check the thread below! My 2c: we train a 32B coding agent by distilling a strong teacher model on a mix of real and synthetic bugs generated by our new approach BugPilot 🛩️! BugPilot creates bugs unintentionally, by asking the teacher to

Isadora White

@isadorcw

1 month

Excited to introduce our SoTA coding models, FrogBoss (32B) and FrogMini (14B), on SWE-Bench-Verified! (FrogBoss eats bugs… like a boss) 🐸🪲 These models were trained with bugs from a mix of existing and our new synthetic bug generation approach, called BugPilot. (1/n)

0

8

38

Eric Xingdi Yuan

@ericxyuan

1 month

Generate better bugs by not asking your agent to generate bugs! Great work led by @isadorcw and @twm_as!

0

6

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

4 months

By popular demand we've extended the Wordplay Workshop deadline by a couple of weeks until Sept 12! The competition on realistic dialogue for game agents already has over 5000 submissions and the winners will also be at the workshop. Come hang out with us at EMNLP!

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

7 months

The Wordplay Workshop is back! 5th edition with EMNLP in Suzhou this Dec. We're also hosting a competition this time on making more realistic LLM-powered NPCs in games! As always, come by and chat all things text agents!

2

8

16

Eric Xingdi Yuan

@ericxyuan

4 months

Great work! Congratulations!

Jingcheng (Frank) Niu

@frankniujc

4 months

Hey this is me! Our paper: Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs Blog post:

0

5

Jingcheng (Frank) Niu

@frankniujc

4 months

📢 Next week, I will be presenting our paper "Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs" at ACL 2025! Paper: https://t.co/usPw15Woke Blog Post: https://t.co/9RRzaEz9m9 Talk: https://t.co/GiPHfOhzx8

1

2

12

Lucas Caccia

@LucasPCaccia

5 months

RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, making inference very inefficient. We propose instead 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks

1

14

44

Eric Xingdi Yuan

@ericxyuan

5 months

The CFP for Wordplay 2025 (EMNLP) is live! https://t.co/blqp5JQ1us

Eric Xingdi Yuan

@ericxyuan

7 months

Announcing the 5th Wordplay Workshop at EMNLP 2025 (Suzhou, China). We are co-organizing the CPDC Challenge (total prize value USD 20K!!!); the warm-up round is starting now!

0

6

17

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

7 months

Introducing TALES - Text Adventure Learning Environment Suite. A benchmark of a few hundred text envs, from science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes + how to improve. 👨‍💻 pip install tale-suite

2

19

66

Eric Xingdi Yuan

@ericxyuan

7 months

Announcing the 5th Wordplay Workshop at EMNLP 2025 (Suzhou, China). We are co-organizing the CPDC Challenge (total prize value USD 20K!!!); the warm-up round is starting now!

wordplay-workshop.github.io

Official website for the Wordplay Workshop at EMNLP 2025. Exploring interactive narratives, text-adventure games, and AI agents in language-based environments. Join us in Suzhou, China, November...

AIcrowd

@aicrowdHQ

7 months

🎮 You're exploring your favourite RPG city. The blacksmith greets you, remembers you saved his life, and recommends a customised weapon upgrade. Build better NPCs that respond naturally, adapt dynamically, and recall your actions.👇 https://t.co/Hdgs2IJkPC

1

6

Eric Xingdi Yuan

@ericxyuan

8 months

Super excited to share this. The project page, the technical report, and the open-sourced GitHub repo can be found at

microsoft.github.io

Microsoft Research

@MSFTResearch

8 months

Developers spend a lot of time debugging code. Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution flow: https://t.co/TFHncIElTZ

0

5

23

Ai2

@allen_ai

8 months

Imagine AI doing science: reading papers, generating ideas, designing and running experiments, analyzing results… How many more discoveries can we reveal? 🧐 Meet CodeScientist, a promising next step toward autonomous scientific discovery. 🧵

6

97

369

AutoGen

@pyautogen

10 months

HUGE: The biggest upgrade to #AutoGen just dropped! v0.4 (stable) is finally here. For details, check out the blog below.

Microsoft Research

@MSFTResearch

10 months

Announcing AutoGen 0.4, a fully reimagined library for building advanced agentic AI systems, developed to improve code quality and robustness. Its asynchronous, event-driven architecture is designed to support dynamic, scalable workflows. Learn more: https://t.co/N7iSeR7ZJk

5

33

109

Alessandro Sordoni

@murefil

1 year

The ML team at @MSFTResearch Montréal 🍁 is hiring a Senior Researcher with a background in ML / NLP!!! Come work with us at the intersection of interactivity, modularity and reasoning in foundation models 😊 MSR is a highly collaborative environment where risky ideas are

1

37

128

Peter Jansen ( @peterjansen-ai.bsky.social )

@peterjansen_ai

1 year

My student Ruoyao Wang's ACL 2024 paper is featured in the State of AI report. He's on the job market this year, and one of the most experienced NLP+Simulation PhD students out there. You should hire him! Ruoyao's Website: https://t.co/mo0eRfogSq Paper:

Nathan Benaich

@nathanbenaich

1 year

🪩The @stateofaireport 2024 has landed! 🪩 Our seventh installment is our biggest and most comprehensive yet, covering everything you *need* to know about research, industry, safety and politics. As ever, here's my director’s cut (+ video tutorial!) 🧵

1

6

21

Sam Devlin

@smdvln

1 year

Our team @MSFTResearch is hiring for a 2-year AI Residency role in the area of learning to control embodied agents, with the goal of informing future applications in Gaming and Robotics. For more details and to formally apply, please visit:

6

35

166

AutoGen

@pyautogen

1 year

We are excited to announce a preview of the new architecture of AutoGen (coming in v0.4). To learn more, see Blog: https://t.co/fNZhTdxaL0 Pull request: https://t.co/TDi0tudjLq Come help us shape the future of AutoGen!

5

57

179

Eric Xingdi Yuan

@ericxyuan

1 year

Multiple authors (including me) are going to Bangkok, let's chat in person if you are going as well!

Peter Jansen ( @peterjansen-ai.bsky.social )

@peterjansen_ai

1 year

Can language models be used as world simulators? In our ACL 2024 paper, we show -- not really. GPT-4 is only ~60% accurate at simulating state changes based on common-sense tasks, like boiling water. Preprint: https://t.co/WYkTTcu6g7 @allen_ai @MSFTResearch @aclmeeting

0

2

15

Eric Xingdi Yuan

@ericxyuan

1 year

Hello community, we are looking for a few emergency reviewers to help review some papers within 2 days. Please email us at wordplay.workshop.organizers@gmail.com to let us know your OpenReview account if you are willing to help! Thanks!

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

2 years

Wordplay has been by far my favorite workshop on all things language agents, games, and interactive NLP since we started it in 2017. This time we'll be co-located with ACL in Bangkok! Call for papers: https://t.co/TFpO8rLYPF

1

8

3

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

2 years

Reminder that there's only a couple more weeks (May 31) until the deadline for the Wordplay: When Language Meets Games workshop at ACL in Bangkok!! Submit all your papers on language agents, simulations, narrative, AI for games, and more!!

1

24

43