
Yiqing Xie
@YiqingXieNLP
169 Followers · 129 Following · 15 Media · 65 Statuses
✨ Synthetic data; Auto Eval; Code-Gen; 🎓 PhD student @LTIatCMU; MSCS @dmguiuc. 👩💻 previously Intern @meta; @MSFTResearch * 2; @AlibabaDAMO.
Joined September 2023
How to construct repo-level coding environments in a scalable way? Check out RepoST: an automated framework to construct repo-level environments using Sandbox Testing. Models trained with RepoST data generalize well to other datasets (e.g., RepoEval).
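The "sandbox testing" idea above can be illustrated with a minimal sketch: execute a candidate solution together with its tests in an isolated subprocess and treat a clean exit as a pass. All names here (`run_in_sandbox`, `candidate_code`, `test_code`) are hypothetical illustrations, not RepoST's actual API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(candidate_code: str, test_code: str, timeout: int = 10) -> bool:
    """Write candidate code plus its tests to a temp file and run them
    in a separate Python process with a timeout. Returns True iff the
    tests pass (process exits 0). Illustrative sketch only."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "sandbox_test.py"
        script.write_text(candidate_code + "\n\n" + test_code)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


# Example: a trivial candidate exercised against a unit test in isolation.
ok = run_in_sandbox(
    "def add(a, b):\n    return a + b\n",
    "assert add(2, 3) == 5\n",
)
```

A real framework would additionally restrict filesystem/network access and install the repository's dependencies inside the sandbox; this sketch only captures the execute-and-check loop.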
RepoST was accepted to @COLM_conf!!! See you in Montreal 🚀 #COLM2025
RT @lmathur_: Future AI systems interacting with humans will need to perform social reasoning that is grounded in behavioral cues and exter….
RT @shubhamrgandhi: 🚨New preprint🚨 I’m super excited to share our work: An Empirical Study on Strong-Weak Model Collaboration for Repo-le….
arxiv.org
We study cost-efficient collaboration between strong and weak language models for repository-level code generation, where the weak model handles simpler tasks at lower cost, and the most...
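The strong-weak collaboration described above can be sketched as a cost-aware router: cheap, simple-looking tasks go to the weak model and the rest escalate to the strong one. The function names and the length heuristic below are placeholders for illustration, not the paper's actual routing policy.

```python
from typing import Callable


def route(
    task: str,
    weak_model: Callable[[str], str],
    strong_model: Callable[[str], str],
    max_lines: int = 20,
) -> str:
    """Toy router: send short tasks to the cheap weak model and
    escalate longer or harder-looking ones to the strong model.
    The heuristic here is a stand-in; a real system would use a
    learned or calibrated difficulty estimate."""
    if len(task.splitlines()) <= max_lines:
        return weak_model(task)
    return strong_model(task)


# Usage: any callables str -> str work as the two models.
weak = lambda t: "weak:" + t
strong = lambda t: "strong:" + t
result = route("write a helper that adds two ints", weak, strong)
```

The design point is that routing decisions are made before invoking the expensive model, so total cost scales with task difficulty rather than task count.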
RT @GashonHussein: Excited to share our new paper, "One-Minute Video Generation with Test-Time Training (TTT)" in collaboration with NVIDIA….
RT @jacspringer: Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decre….
If you’re interested in RepoST, check out the:
- Paper:
- Code & Data:
Many thanks to my awesome collaborators: Alex Xie, @Divyanshu_Sheth, @stefan_fee, @dan_fried, @carolynprose!!
github.com
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing" - yiqingxyq/RepoST
We benchmark 12 Code LLMs on RepoST-Eval to evaluate their ability to generate code in real GitHub repositories. The best model achieves only 39.53 Pass@1. We also conducted a human study on a sampled subset, where participants solved 81.5% of the examples.
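The Pass@1 number above can be computed with the standard unbiased pass@k estimator (Chen et al., 2021): given n sampled generations per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n generations,
    c of which are correct, passes the tests."""
    if n - c < k:
        # Fewer than k incorrect samples: some draw must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k=1 this reduces to the fraction of correct samples:
# 4 correct out of 10 generations -> pass@1 = 0.4
score = pass_at_k(n=10, c=4, k=1)
```

A benchmark-level Pass@1 such as 39.53 would then be the mean of this per-problem estimate across the evaluation set (here expressed as a percentage).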
RT @PranjalAggarw16: What if you could control how long a reasoning model “thinks”? Presenting L1-1.5B, an RL-trained reasoning model with….
RT @FariaHuqOaishi: [1/6] 🤔 Ever wondered if you could collaborate with an agent on web tasks? We present CowPilot 🐮, a framework for hu….
RT @AutoScienceAI: Introducing Carl, the first AI system to create a research paper that passes peer review. Carl's work was just accepted….
RT @jiayi_pirate: We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verifi….
RT @gaotianyu1350: Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by s….