
Princeton NLP Group
@princeton_nlp
Followers: 5K | Following: 278 | Media: 34 | Statuses: 257
Princeton NLP Group led by @prfsanjeevarora @danqi_chen @karthik_r_n
Princeton, NJ
Joined August 2020
RT @_carlosejimenez: What happens if you compare LMs on SWE-bench without the fancy scaffolds? Our new leaderboard “SWE-bench (bash only)”…
RT @PrincetonAInews: Shoutout to all the @Princeton researchers participating in @icmlconf #ICML2025. Browse through some of the cutting e…
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
RT @_carlosejimenez: Improved reasoning increases performance on benchmarks, but are models able to pass their knowledge onto humans? 🧐 We…
RT @plodq: Introducing SWE-bench Multilingual: a new eval in the SWE-bench family to test LLM coding abilities in *9* programming languages…
RT @OfirPress: Join us on May 21st- I'll talk about how we built SWE-bench & SWE-agent and what I'm excited about for the future of autonom…
RT @stanfordnlp: Our warmest congratulations to @danqi_chen, @stanfordnlp grad and now Associate Professor at @PrincetonCS and Associ…
RT @a1zhang: Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furt…
RT @BenShi34: Can language models effectively impersonate you to family and friends? We find that they can: 44% of the time, close friends…
RT @OfirPress: Congrats on the Verified and Multimodal SWE-bench numbers.
venturebeat.com
Zencoder launches powerful AI coding agents with "Coffee Mode" that outperform competitors on benchmarks while integrating with existing developer environments, allowing programmers to be more...
RT @OfirPress: We just updated the SWE-bench Multimodal leaderboard. Congrats to Globant, Zencoder, and the Agentless team from UIUC for th…
RT @_awettig: 🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer…
RT @OfirPress: This Tuesday (Feb 18), @_carlosejimenez will discuss SWE-bench and the future of codegen evals, as part of the Conference o…
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…
RT @Yong18850571: 🚀 Introducing Goedel-Prover: A 7B LLM achieving SOTA open-source performance in automated theorem proving! 🔥 ✅ Improving…
RT @OfirPress: Congrats to o3-mini on setting a new high score on SciCode!! R1 clocks in at an impressive 4.6%, matching Claude 3.5. SciCo…
RT @OfirPress: SciCode is our super tough coding benchmark testing the abilities of LMs to program code based on research in physics/biolog…
RT @jyangballin: SWE-bench Multimodal evaluation code is out now! SWE-bench MM is a new set of JavaScript issues that have a visual compon…