Vishakh Padmakumar
@vishakh_pk
Followers: 648 · Following: 2K · Media: 31 · Statuses: 405
Postdoc @stanfordnlp @stanfordAILab Prev @NYUDataScience
Joined August 2015
Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned on modelling user intent and combining LLMs with principled tools when building complex pipelines for it!
CDS PhD alum Vishakh Padmakumar (@vishakh_pk), now at @Stanford, tackled the hard part of summarization: deciding what matters. At @Adobe, he built diversity-aware summarizers; at AI2 (@allen_ai), intent-based tools for literature review tables. https://t.co/pNhEjhlUhV
I'm really excited about our new paper!! 📣 'Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs' Contrary to the belief that RL fine-tuning degrades memorized knowledge, RL-enhanced models consistently outperform base/SFT models on knowledge recall by 24pp! RL teaches…
🚨🚨🚨So excited to have Aria-Duet accepted to the NeurIPS 2025 Creative AI Track, see us in San Diego!! This has really been the most fun I've had doing research in a long time!! Fun work with @loubbrad and @BlancheMinerva, supported by @AiEleuther. Check it out!!
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
One year ago, I tried to use GPT-4V to control a Stretch robot while talking to me. One time, I yelled at it, “Don’t throw away the coke, I have not finished it yet” while watching it gently drop it into the garbage can BEFORE responding with “Sure, I will not throw it away”.
realtimegym.saltlab.stanford.edu
We introduce real-time reasoning as a new problem formulation for bringing reasoning capabilities to agents operating in evolving environments, featuring AgileThinker architecture that combines...
Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵
Cool new work from @vishakh_pk on incorporating user intent into document understanding tasks (partly done during an internship with us @allen_ai @SemanticScholar)!
Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned on modelling user intent and combining LLMs with principled tools when building complex pipelines for it!
S/o to my amazing internship mentors/co-authors, Jen Healey @_zichaowang @darbour26, and then @arnaik19 @josephcc @kylelostat @_DougDowney - go work with them! ♥️ And @skwthomas from @NYUDataScience for putting this together ✍️
Agent benchmarks don't measure true *AI* advances We built one that's hard & trustworthy 👉AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉SOTA results across 22 agent *classes* 👉AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇
arxiv.org
AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...
Hanging out at AI2 last year was one of the highlights of my PhD experience! Apply to work with Kyle and the gang 💯
why intern at Ai2? 🐟interns own major parts of our model development, sometimes even leading whole projects 🐡we're committed to open science & actively help our interns publish their work reach out if u wanna build open language models together 🤝 links👇
Happily surprised to see OpenAI curating cultural benchmarks, especially focused on India. BUT, cultural knowledge != culturally aligned generations. My work for 2+ years has focused on cultural competence in generative tasks, like creative writing. Sharing some papers in LONG 🧵
Introducing IndQA — a new benchmark that evaluates how well AI systems understand Indian languages and everyday cultural context. https://t.co/MWbRDFQQup
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
🤔Virtual Teaching Assistants (VTAs) have incredible potential, but how to evaluate them? Our EMNLP Findings paper addresses the lack of standardized pedagogical eval frameworks for VTAs. ‼️When using LLMs as automated evaluators, they struggle with these nuanced assessments!🧵
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
🚨New paper on AI and copyright Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this
Can LLMs reason like a student? 👩🏻‍🎓📚✏️ For educational tools like AI tutors, modeling how students make mistakes is crucial. But current LLMs are much worse at simulating student errors ❌ than performing correct ✅ reasoning. We try to fix that with our method MISTAKE 🤭👇
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵