Karthik Narasimhan (@karthik_r_n) X Profile

Karthik Narasimhan

@karthik_r_n

Followers

4K

Following

903

Media

7

Statuses

282

Professor @PrincetonCS, Research @SierraPlatform. Previously @OpenAI, @MIT_CSAIL, @iitmadras

Princeton, NJ

Joined July 2015

Karthik Narasimhan

@karthik_r_n

2 months

RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…

0

39

0

Karthik Narasimhan

@karthik_r_n

2 months

RT @claybavor: Today we announced a set of major advances to our agent benchmark, 𝜏-bench. This new benchmark, 𝜏², introduces the notion of…

0

6

0

Karthik Narasimhan

@karthik_r_n

2 months

RT @SierraPlatform: Learn more:

sierra.ai

Benchmarking agents in collaborative real-world scenarios

0

1

0

Karthik Narasimhan

@karthik_r_n

2 months

RT @SierraPlatform: Last year, we introduced 𝜏-bench, a benchmark for evaluating AI agents on realistic, multi-step tasks involving tool us…

0

4

0

Karthik Narasimhan

@karthik_r_n

2 months

RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS…

0

76

0

Karthik Narasimhan

@karthik_r_n

3 months

RT @SierraPlatform: Successful agents are the result of collaboration between teams: engineering, operations, customer experience, and mark…

0

5

0

Karthik Narasimhan

@karthik_r_n

3 months

RT @claybavor: Like all great products, the best agents are the product of many teams working together: some technical, some non-technical…

0

2

0

Karthik Narasimhan

@karthik_r_n

3 months

Humans evolved to communicate so we could coordinate better. But these days, it feels like we communicate so much, yet coordinate so little.

2

0

20

Karthik Narasimhan

@karthik_r_n

4 months

RT @ShunyuYao12: I’m at ICLR to present a poster and give a talk, both related to the second half blogpost. See you there if you wanna chat…

0

7

0

Karthik Narasimhan

@karthik_r_n

5 months

Interesting tidbits on using dedicated "thinking" steps in agents from @AnthropicAI. Also loved seeing full pass^k curves for τ-bench - measuring this was the primary motivation of the benchmark, not just avg scores!

Anthropic

@AnthropicAI

5 months

We’re launching a new blog: Engineering at Anthropic. A hub where developers can find practical advice and our latest discoveries on how to get the most from Claude.

0

11

Karthik Narasimhan

@karthik_r_n

5 months

RT @SierraPlatform: In the AI age, agent reliability is key, and Sierra’s 𝜏-bench is setting the standard, shaping academic research, indust…

0

4

0

Karthik Narasimhan

@karthik_r_n

5 months

The best thing about SWE-agents and tools like cursor is the amount of additional agency they provide us.

0

1

13

Karthik Narasimhan

@karthik_r_n

6 months

RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…

0

18

0

Karthik Narasimhan

@karthik_r_n

7 months

The biggest mistake we can make right now is not dreaming big enough, especially w.r.t AI.

5

13

155

Karthik Narasimhan

@karthik_r_n

9 months

RT @CSM_ai: Today we're releasing Common Sense Agents, a new backbone for agentic creative computing: 💻 Windows VMs for safe and repeatabl…

0

29

0

Karthik Narasimhan

@karthik_r_n

10 months

RT @SierraPlatform: Today we're excited to announce a new way to interact with Sierra agents: voice. Learn more about how this new capabili…

0

4

0

Karthik Narasimhan

@karthik_r_n

10 months

RT @jyangballin: We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues. - 617 *brand new* tasks from 1…

0

62

0

Karthik Narasimhan

@karthik_r_n

10 months

RT @SierraPlatform: Sierra partnered with @Casper to launch Luna 2.0, their AI agent delivering 24/7 personalized customer support. From he…

0

5

0

Karthik Narasimhan

@karthik_r_n

10 months

In a year or two from now, 'fine-tuning' will become synonymous with 'training' (as used in the good old ML days). LLMs will be seen more widely as starting points, just like weight initialization or choosing the number of layers for a Transformer. Pick a starting point, curate.

11

13

141

Karthik Narasimhan

@karthik_r_n

11 months

RT @AbramovichTalor: We're launching EnIGMA, our state-of-the-art AI agent for offensive cybersec! It uses tools like Ghidra & pwntools, c…

0

15

0