Vishakh Padmakumar @vishakh_pk X Profile

Vishakh Padmakumar

@vishakh_pk

Followers

648

Following

2K

Media

31

Statuses

405

Postdoc @stanfordnlp @stanfordAILab Prev @NYUDataScience

https://t.co/9wS1DXyiZi

Joined August 2015

Vishakh Padmakumar

@vishakh_pk

5 days

Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned on modelling user intent and combining LLMs with principled tools when building complex pipelines for it!

NYU Center for Data Science

@NYUDataScience

5 days

CDS PhD alum Vishakh Padmakumar (@vishakh_pk), now at @Stanford, tackled the hard part of summarization — deciding what matters. At @Adobe, he built diversity-aware summarizers; at AI2 (@allen_ai), intent-based tools for literature review tables. https://t.co/pNhEjhlUhV

2

5

50

Niloofar

@niloofar_mire

17 hours

I'm really excited about our new paper!! 📣 'Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs' Contrary to belief that RL ft degrades memorized knowledge, RL-enhanced models consistently outperform base/SFT on knowledge recall by 24pp! RL teaches

5

22

224

Alexander Spangher

@AlexanderSpangh

1 day

🚨🚨🚨 So excited to have Aria-Duet accepted to the NeurIPS 2025 Creative AI Track, see us in San Diego!! This has really been the most fun I've had doing research in a long time!! Fun work with @loubbrad and @BlancheMinerva, supported by @AiEleuther. Check it out!!

4

6

31

Alexander Spangher

@AlexanderSpangh

14 hours

✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n

22

100

443

Hao Zhu 朱昊

@_Hao_Zhu

20 hours

One year ago, I tried to use GPT-4V to control a Stretch robot while talking to me. One time, I yelled at it, “Don’t throw away the coke, I have not finished it yet” while watching it gently dropping it into the garbage can BEFORE responding with “Sure, I will not throw it away”.

realtimegym.saltlab.stanford.edu

We introduce real-time reasoning as a new problem formulation for bringing reasoning capabilities to agents operating in evolving environments, featuring AgileThinker architecture that combines...

Yixin Ye

@BLeavesYe

21 hours

Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵

0

4

12

Yixin Ye

@BLeavesYe

21 hours

Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵

6

15

52

Aakanksha Naik

@arnaik19

5 days

Cool new work from @vishakh_pk on incorporating user intent into document understanding tasks (partly done during an internship with us @allen_ai @SemanticScholar)!

Vishakh Padmakumar

@vishakh_pk

5 days

Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned on modelling user intent and combining LLMs with principled tools when building complex pipelines for it!

0

1

9

Vishakh Padmakumar

@vishakh_pk

5 days

S/o to my amazing internship mentors/co-authors, Jen Healey @_zichaowang @darbour26, and then @arnaik19 @josephcc @kylelostat @_DougDowney - go work with them! ♥️ And @skwthomas from @NYUDataScience for putting this together ✍️

0

5

Jonathan Bragg

@turingmusician

6 days

Agent benchmarks don't measure true *AI* advances. We built one that's hard & trustworthy: 👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉 SOTA results across 22 agent *classes* 👉 AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇

arxiv.org

AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...

4

21

28

Vishakh Padmakumar

@vishakh_pk

6 days

Hanging out at AI2 last year was one of the highlights of my PhD experience! Apply to work with Kyle and the gang 💯

Kyle Lo

@kylelostat

7 days

why intern at Ai2? 🐟 interns own major parts of our model development, sometimes even leading whole projects. 🐡 we're committed to open science & actively help our interns publish their work. reach out if u wanna build open language models together 🤝 links👇

0

29

Shaily

@shaily99

7 days

Happily surprised to see OpenAI curating cultural benchmarks, especially focused on India. BUT, cultural knowledge != culturally aligned generations. My work for 2+ years focuses on cultural competence in generative tasks, like creative writing. Sharing some papers in LONG 🧵

OpenAI

@OpenAI

7 days

Introducing IndQA — a new benchmark that evaluates how well AI systems understand Indian languages and everyday cultural context. https://t.co/MWbRDFQQup

3

9

57

John Yang

@jyangballin

7 days

New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals

26

92

366

Siyan Sylvia Li 🦋🌸

@Sylvia_Sparkle

9 days

🤔Virtual Teaching Assistants (VTAs) have incredible potential, but how to evaluate them? Our EMNLP Findings paper addresses the lack of standardized pedagogical eval frameworks for VTAs. ‼️When using LLMs as automated evaluators, they struggle with these nuanced assessments!🧵

1

10

27

Arkadiy Saakyan

@rkdsaakyan

8 days

N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.

1

13

44

Caleb Ziems

@cjziems

8 days

Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1

1

28

107

Zora Wang

@ZhiruoW

15 days

Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine

7

51

242

Jenna Russell

@jennajrussell

21 days

AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:

4

52

143

Tuhin Chakrabarty

@TuhinChakr

22 days

🚨 New paper on AI and copyright. Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️ Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this

9

172

526

Alexis Ross

@alexisjross

29 days

Can LLMs reason like a student? 👩🏻‍🎓📚✏️ For educational tools like AI tutors, modeling how students make mistakes is crucial. But current LLMs are much worse at simulating student errors ❌ than performing correct ✅ reasoning. We try to fix that with our method MISTAKE 🤭👇

11

55

336

Taylor Sorensen

@ma_tay_

30 days

🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵

5

47

194