
Karthik Narasimhan
@karthik_r_n
Followers: 4K
Following: 903
Media: 7
Statuses: 282
Professor @PrincetonCS, Research @SierraPlatform. Previously @OpenAI, @MIT_CSAIL, @iitmadras
Princeton, NJ
Joined July 2015
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
0
39
0
RT @claybavor: Today we announced a set of major advances to our agent benchmark, 𝜏-bench. This new benchmark, 𝜏², introduces the notion of…
0
6
0
RT @SierraPlatform: Learn more:
sierra.ai
Benchmarking agents in collaborative real-world scenarios
0
1
0
RT @SierraPlatform: Last year, we introduced 𝜏-bench, a benchmark for evaluating AI agents on realistic, multi-step tasks involving tool us…
0
4
0
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS…
0
76
0
RT @SierraPlatform: Successful agents are the result of collaboration between teams: engineering, operations, customer experience, and mark…
0
5
0
RT @claybavor: Like all great products, the best agents are the product of many teams working together — some technical, some non-technical…
0
2
0
RT @ShunyuYao12: I’m at ICLR to present a poster and give a talk, both related to the second half blogpost. See you there if you wanna chat…
0
7
0
Interesting tidbits on using dedicated "thinking" steps in agents from @AnthropicAI. Also loved seeing full pass^k curves for τ-bench - measuring this was the primary motivation of the benchmark, not just avg scores!
We’re launching a new blog: Engineering at Anthropic. A hub where developers can find practical advice and our latest discoveries on how to get the most from Claude.
0
0
11
RT @SierraPlatform: In the AI age, agent reliability is key, and Sierra’s 𝜏-bench is setting the standard—shaping academic research, indust…
0
4
0
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…
0
18
0
RT @CSM_ai: Today we're releasing Common Sense Agents, a new backbone for agentic creative computing: 💻 Windows VMs for safe and repeatabl…
0
29
0
RT @SierraPlatform: Today we're excited to announce a new way to interact with Sierra agents: voice. Learn more about how this new capabili…
0
4
0
RT @jyangballin: We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues. - 617 *brand new* tasks from 1…
0
62
0
RT @SierraPlatform: Sierra partnered with @Casper to launch Luna 2.0, their AI agent delivering 24/7 personalized customer support. From he…
0
5
0
RT @AbramovichTalor: We're launching EnIGMA, our state-of-the-art AI agent for offensive cybersec! It uses tools like Ghidra & pwntools, c…
0
15
0