Princeton NLP Group

@princeton_nlp

Followers: 5K
Following: 278
Media: 34
Statuses: 257

Princeton NLP Group led by @prfsanjeevarora @danqi_chen @karthik_r_n

Princeton, NJ
Joined August 2020
@princeton_nlp
Princeton NLP Group
3 days
RT @_carlosejimenez: What happens if you compare LMs on SWE-bench without the fancy scaffolds? Our new leaderboard “SWE-bench (bash only)”…
0
23
0
@princeton_nlp
Princeton NLP Group
16 days
RT @PrincetonAInews: Shoutout to all the @Princeton researchers participating in @icmlconf #ICML2025 . Browse through some of the cutting e…
0
9
0
@princeton_nlp
Princeton NLP Group
2 months
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
0
39
0
@princeton_nlp
Princeton NLP Group
2 months
RT @_carlosejimenez: Improved reasoning increases performance on benchmarks, but are models able to pass their knowledge onto humans? 🧐 We…
0
1
0
@princeton_nlp
Princeton NLP Group
3 months
RT @plodq: Introducing SWE-bench Multilingual: a new eval in the SWE-bench family to test LLM coding abilities in *9* programming languages…
0
16
0
@princeton_nlp
Princeton NLP Group
3 months
RT @OfirPress: Join us on May 21st- I'll talk about how we built SWE-bench & SWE-agent and what I'm excited about for the future of autonom…
0
3
0
@princeton_nlp
Princeton NLP Group
3 months
RT @stanfordnlp: Our warmest congratulations to @danqi_chen, @stanfordnlp grad and now Associate Professor at @PrincetonCS and Associ…
0
21
0
@princeton_nlp
Princeton NLP Group
4 months
RT @a1zhang: Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furt…
0
56
0
@princeton_nlp
Princeton NLP Group
4 months
RT @BenShi34: Can language models effectively impersonate you to family and friends? We find that they can: 44% of the time, close friends…
0
3
0
@princeton_nlp
Princeton NLP Group
4 months
RT @OfirPress: We just updated the SWE-bench Multimodal leaderboard. Congrats to Globant, Zencoder, and the Agentless team from UIUC for th…
0
5
0
@princeton_nlp
Princeton NLP Group
4 months
Nothing like a sunny hike to welcome spring!
1
6
78
@princeton_nlp
Princeton NLP Group
6 months
RT @_awettig: 🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer…
0
56
0
@princeton_nlp
Princeton NLP Group
6 months
RT @OfirPress: This Tuesday (Feb 18), @_carlosejimenez will discuss SWE-bench and the future of codegen evals, as part of the Conference o…
0
2
0
@princeton_nlp
Princeton NLP Group
6 months
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…
0
18
0
@princeton_nlp
Princeton NLP Group
6 months
RT @Yong18850571: 🚀 Introducing Goedel-Prover: A 7B LLM achieving SOTA open-source performance in automated theorem proving! 🔥 ✅ Improving…
0
66
0
@princeton_nlp
Princeton NLP Group
6 months
RT @OfirPress: Congrats to o3-mini on setting a new high score on SciCode!! R1 clocks in at an impressive 4.6%, matching Claude 3.5. SciCo…
0
3
0
@princeton_nlp
Princeton NLP Group
6 months
RT @OfirPress: SciCode is our super tough coding benchmark testing the abilities of LMs to program code based on research in physics/biolog…
0
9
0
@princeton_nlp
Princeton NLP Group
6 months
Congrats to the DeepSeek team on the impressive SWE-bench results!
@deepseek_ai
DeepSeek
7 months
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink at today!
🐋 1/n
0
0
3
@princeton_nlp
Princeton NLP Group
7 months
RT @jyangballin: SWE-bench Multimodal evaluation code is out now! SWE-bench MM is a new set of JavaScript issues that have a visual compon…
0
17
0