Karthik Narasimhan

@karthik_r_n

Followers
4K
Following
903
Media
7
Statuses
282

Professor @PrincetonCS, Research @SierraPlatform. Previously @OpenAI, @MIT_CSAIL, @iitmadras

Princeton, NJ
Joined July 2015
@karthik_r_n
Karthik Narasimhan
2 months
RT @BenShi34: As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achi…
0
39
0
@karthik_r_n
Karthik Narasimhan
2 months
RT @claybavor: Today we announced a set of major advances to our agent benchmark, 𝜏-bench. This new benchmark, 𝜏², introduces the notion of…
0
6
0
@karthik_r_n
Karthik Narasimhan
2 months
RT @SierraPlatform: Last year, we introduced 𝜏-bench, a benchmark for evaluating AI agents on realistic, multi-step tasks involving tool us…
0
4
0
@karthik_r_n
Karthik Narasimhan
2 months
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS…
0
76
0
@karthik_r_n
Karthik Narasimhan
3 months
RT @SierraPlatform: Successful agents are the result of collaboration between teams: engineering, operations, customer experience, and mark…
0
5
0
@karthik_r_n
Karthik Narasimhan
3 months
RT @claybavor: Like all great products, the best agents are the product of many teams working together — some technical, some non-technical…
0
2
0
@karthik_r_n
Karthik Narasimhan
3 months
Humans evolved to communicate so we could coordinate better. But these days, it feels like we communicate so much, yet coordinate so little.
2
0
20
@karthik_r_n
Karthik Narasimhan
4 months
RT @ShunyuYao12: I’m at ICLR to present a poster and give a talk, both related to the second half blogpost. See you there if you wanna chat…
0
7
0
@karthik_r_n
Karthik Narasimhan
5 months
Interesting tidbits on using dedicated "thinking" steps in agents from @AnthropicAI. Also loved seeing full pass^k curves for τ-bench - measuring this was the primary motivation of the benchmark, not just avg scores!
@AnthropicAI
Anthropic
5 months
We’re launching a new blog: Engineering at Anthropic. A hub where developers can find practical advice and our latest discoveries on how to get the most from Claude.
0
0
11
@karthik_r_n
Karthik Narasimhan
5 months
RT @SierraPlatform: In the AI age, agent reliability is key, and Sierra’s 𝜏-bench is setting the standard—shaping academic research, indust…
0
4
0
@karthik_r_n
Karthik Narasimhan
5 months
The best thing about SWE-agents and tools like cursor is the amount of additional agency they provide us.
0
1
13
@karthik_r_n
Karthik Narasimhan
6 months
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…
0
18
0
@karthik_r_n
Karthik Narasimhan
7 months
The biggest mistake we can make right now is not dreaming big enough, especially w.r.t AI.
5
13
155
@karthik_r_n
Karthik Narasimhan
9 months
RT @CSM_ai: Today we're releasing Common Sense Agents, a new backbone for agentic creative computing: 💻 Windows VMs for safe and repeatabl…
0
29
0
@karthik_r_n
Karthik Narasimhan
10 months
RT @SierraPlatform: Today we're excited to announce a new way to interact with Sierra agents: voice. Learn more about how this new capabili…
0
4
0
@karthik_r_n
Karthik Narasimhan
10 months
RT @jyangballin: We're launching SWE-bench Multimodal to eval agents' ability to solve visual GitHub issues. - 617 *brand new* tasks from 1…
0
62
0
@karthik_r_n
Karthik Narasimhan
10 months
RT @SierraPlatform: Sierra partnered with @Casper to launch Luna 2.0, their AI agent delivering 24/7 personalized customer support. From he…
0
5
0
@karthik_r_n
Karthik Narasimhan
10 months
In a year or two from now, 'fine-tuning' will become synonymous with 'training' (as used in the good old ML days). LLMs will be seen more widely as starting points, just like weight initialization or choosing the number of layers for a Transformer. Pick a starting point, curate.
11
13
141
@karthik_r_n
Karthik Narasimhan
11 months
RT @AbramovichTalor: We're launching EnIGMA, our state-of-the-art AI agent for offensive cybersec! It uses tools like Ghidra & pwntools, c…
0
15
0