Eric Xingdi Yuan
@ericxyuan
Followers: 915
Following: 1K
Media: 40
Statuses: 384
Senior Researcher at Microsoft Research, Montreal. Opinions are my own.
Montréal, Québec
Joined August 2016
We want to push towards agents that understand a repo at the codebase level. This requires tasks that go beyond looking at just a few lines of code or a single file. In this work, led by our great intern Amy Lee, we explore what such tasks should look like.
🚨 Excited to announce Gistify!, where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable
This was a great group effort ❤️. Check the thread below! My 2c: we train a 32B coding agent by distilling a strong teacher model on a mix of real and synthetic bugs generated by our new approach BugPilot 🛩️! BugPilot creates bugs unintentionally, by asking the teacher to
Excited to introduce our SoTA coding models, FrogBoss (32B) and FrogMini (14B), on SWE-Bench-Verified! (FrogBoss eats bugs… like a boss) 🐸🪲 These models were trained with bugs from a mix of existing and our new synthetic bug generation approach, called BugPilot. (1/n)
Generate better bugs by not asking your agent to generate bugs! Great work led by @isadorcw and @twm_as!
Excited to introduce our SoTA coding models, FrogBoss (32B) and FrogMini (14B), on SWE-Bench-Verified! (FrogBoss eats bugs… like a boss) 🐸🪲 These models were trained with bugs from a mix of existing and our new synthetic bug generation approach, called BugPilot. (1/n)
By popular demand we've extended the Wordplay Workshop deadline by a couple of weeks until Sept 12! The competition on realistic dialogue for game agents already has over 5000 submissions and the winners will also be at the workshop. Come hang out with us at EMNLP!
The Wordplay Workshop is back! 5th edition with EMNLP in Suzhou this Dec. We're also hosting a competition this time on making more realistic LLM powered NPCs in games! As always come by and chat all things text agents!
📢 Next week, I will be presenting our paper "Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs" at ACL 2025! Paper: https://t.co/usPw15Woke Blog Post: https://t.co/9RRzaEz9m9 Talk: https://t.co/GiPHfOhzx8
RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, but they make inference very inefficient. We propose instead 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks
The CFP for Wordplay 2025 (EMNLP) is live! https://t.co/blqp5JQ1us
Announcing the 5th Wordplay Workshop at EMNLP 2025 (Suzhou, China). We are co-organizing the CPDC Challenge (total prize value USD 20K!!!), the warm-up round is starting now!
Introducing TALES - Text Adventure Learning Environment Suite. A benchmark of a few hundred text envs, from science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes + how to improve. 👨‍💻 pip install tale-suite
Announcing the 5th Wordplay Workshop at EMNLP 2025 (Suzhou, China). We are co-organizing the CPDC Challenge (total prize value USD 20K!!!), the warm-up round is starting now!
wordplay-workshop.github.io
Official website for the Wordplay Workshop at EMNLP 2025. Exploring interactive narratives, text-adventure games, and AI agents in language-based environments. Join us in Suzhou, China, November...
🎮 You're exploring your favourite RPG city. The blacksmith greets you, remembers you saved his life, and recommends a customised weapon upgrade. Build better NPCs that respond naturally, adapt dynamically, and recall your actions.👇 https://t.co/Hdgs2IJkPC
Super excited to share this. The project page, the technical report, and the open-source GitHub repo can be found at
microsoft.github.io
Developers spend a lot of time debugging code. Learn how debug-gym can equip AI agents to help, enabling them to set breakpoints, navigate the codebase, and print runtime variable values on demand, so they better understand the code and its execution flow: https://t.co/TFHncIElTZ
Imagine AI doing science: reading papers, generating ideas, designing and running experiments, analyzing results… How many more discoveries can we reveal? 🧐 Meet CodeScientist, a promising next step toward autonomous scientific discovery. 🧵
HUGE: The biggest upgrade to #AutoGen just dropped! v0.4 (stable) is finally here. For details, check out the blog below.
Announcing AutoGen 0.4, a fully reimagined library for building advanced agentic AI systems, developed to improve code quality and robustness. Its asynchronous, event-driven architecture is designed to support dynamic, scalable workflows. Learn more: https://t.co/N7iSeR7ZJk
The ML team at @MSFTResearch Montréal 🍁 is hiring a Senior Researcher with a background in ML / NLP!!! Come work with us at the intersection of interactivity, modularity and reasoning in foundation models 😊 MSR is a highly collaborative environment where risky ideas are
My student Ruoyao Wang's ACL 2024 paper is featured in the State of AI report. He's on the job market this year, and one of the most experienced NLP+Simulation PhD students out there. You should hire him! Ruoyao's Website: https://t.co/mo0eRfogSq Paper:
🪩The @stateofaireport 2024 has landed! 🪩 Our seventh installment is our biggest and most comprehensive yet, covering everything you *need* to know about research, industry, safety and politics. As ever, here's my director’s cut (+ video tutorial!) 🧵
Our team @MSFTResearch is hiring for a 2-year AI Residency role in the area of learning to control embodied agents, with the goal of informing future applications in Gaming and Robotics. For more details and to formally apply, please visit:
We are excited to announce a preview of the new architecture of AutoGen (coming in v0.4). To learn more, see Blog: https://t.co/fNZhTdxaL0 Pull request: https://t.co/TDi0tudjLq Come help us shape the future of AutoGen!
Multiple authors (including me) are going to Bangkok, let's chat in person if you are going as well!
Can language models be used as world simulators? In our ACL 2024 paper, we show -- not really. GPT-4 is only ~60% accurate at simulating state changes based on common-sense tasks, like boiling water. Preprint: https://t.co/WYkTTcu6g7
@allen_ai @MSFTResearch @aclmeeting
Hello community, we are looking for a few emergency reviewers to help review some papers within 2 days. Please email us at wordplay.workshop.organizers@gmail.com with your OpenReview account if you are willing to help! Thanks!
Wordplay has been by far my favorite workshop on all things language agents, games, and interactive NLP since we started it in 2017. This time we'll be co-located with ACL in Bangkok! Call for papers: https://t.co/TFpO8rLYPF
Reminder that there are only a couple more weeks (May 31) until the deadline for the Wordplay: When Language Meets Games workshop at ACL in Bangkok!! Submit all your papers on language agents, simulations, narrative, AI for games, and more!!