Vishakh Padmakumar Profile
Vishakh Padmakumar

@vishakh_pk

Followers: 648 · Following: 2K · Media: 31 · Statuses: 405

Postdoc @stanfordnlp @stanfordAILab Prev @NYUDataScience

Joined August 2015
@vishakh_pk
Vishakh Padmakumar
5 days
Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned about modelling user intent and combining LLMs with principled tools when building complex pipelines for it!
@NYUDataScience
NYU Center for Data Science
5 days
CDS PhD alum Vishakh Padmakumar (@vishakh_pk), now at @Stanford, tackled the hard part of summarization — deciding what matters. At @Adobe, he built diversity-aware summarizers; at AI2 (@allen_ai), intent-based tools for literature review tables. https://t.co/pNhEjhlUhV
2
5
50
@niloofar_mire
Niloofar
17 hours
I'm really excited about our new paper!! 📣 'Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs' Contrary to the belief that RL fine-tuning degrades memorized knowledge, RL-enhanced models consistently outperform base/SFT models on knowledge recall by 24pp! RL teaches
5
22
224
@AlexanderSpangh
Alexander Spangher
1 day
🚨🚨🚨So excited to have Aria-Duet accepted to the NeurIPS 2025 Creative AI Track, see us in San Diego!! This has really been the most fun I've had doing research in a long time!! Fun work with @loubbrad and @BlancheMinerva, supported by @AiEleuther. Check it out!!
4
6
31
@AlexanderSpangh
Alexander Spangher
14 hours
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at the University of Minnesota, Twin Cities, in Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
22
100
443
@_Hao_Zhu
Hao Zhu 朱昊
20 hours
One year ago, I tried to use GPT-4V to control a Stretch robot while it talked to me. One time, I yelled at it, “Don’t throw away the coke, I have not finished it yet” while watching it gently drop it into the garbage can BEFORE responding with “Sure, I will not throw it away”.
realtimegym.saltlab.stanford.edu
We introduce real-time reasoning as a new problem formulation for bringing reasoning capabilities to agents operating in evolving environments, featuring AgileThinker architecture that combines...
@BLeavesYe
Yixin Ye
21 hours
Your code changes while the agent plans. Users message while it thinks. Current AI agents freeze🧊 the world to reason about it. What if AI agents could think deeply without missing what's happening around them🔥? We propose a new agent paradigm: real-time reasoning. 🔗in🧵
0
4
12
@arnaik19
Aakanksha Naik
5 days
Cool new work from @vishakh_pk on incorporating user intent into document understanding tasks (partly done during an internship with us @allen_ai @SemanticScholar)!
@vishakh_pk
Vishakh Padmakumar
5 days
Last year I worked at @Adobe @AdobeResearch and @allen_ai, exploring how we can help users read, organize and understand long documents. This piece covers what we learned about modelling user intent and combining LLMs with principled tools when building complex pipelines for it!
0
1
9
@vishakh_pk
Vishakh Padmakumar
5 days
S/o to my amazing internship mentors/co-authors, Jen Healey @_zichaowang @darbour26, and then @arnaik19 @josephcc @kylelostat @_DougDowney - go work with them! ♥️ And @skwthomas from @NYUDataScience for putting this together ✍️
0
0
5
@turingmusician
Jonathan Bragg
6 days
Agent benchmarks don't measure true *AI* advances. We built one that's hard & trustworthy. 👉AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉SOTA results across 22 agent *classes* 👉AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇
arxiv.org
AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...
4
21
28
@vishakh_pk
Vishakh Padmakumar
6 days
Hanging out at AI2 last year was one of the highlights of my PhD experience! Apply to work with Kyle and the gang 💯
@kylelostat
Kyle Lo
7 days
why intern at Ai2? 🐟interns own major parts of our model development, sometimes even leading whole projects 🐡we're committed to open science & actively help our interns publish their work. reach out if u wanna build open language models together 🤝 links👇
0
0
29
@shaily99
Shaily
7 days
Pleasantly surprised to see OpenAI curating cultural benchmarks, especially focused on India. BUT, cultural knowledge != culturally aligned generations. My work over the past 2+ years has focused on cultural competence in generative tasks, like creative writing. Sharing some papers in LONG 🧵
@OpenAI
OpenAI
7 days
Introducing IndQA — a new benchmark that evaluates how well AI systems understand Indian languages and everyday cultural context. https://t.co/MWbRDFQQup
3
9
57
@jyangballin
John Yang
7 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26
92
366
@Sylvia_Sparkle
Siyan Sylvia Li 🦋🌸
9 days
🤔Virtual Teaching Assistants (VTAs) have incredible potential, but how do we evaluate them? Our EMNLP Findings paper addresses the lack of standardized pedagogical eval frameworks for VTAs. ‼️When used as automated evaluators, LLMs struggle with these nuanced assessments!🧵
1
10
27
@rkdsaakyan
Arkadiy Saakyan
8 days
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
1
13
44
@cjziems
Caleb Ziems
8 days
Can we map out gaps in LLMs’ cultural knowledge? Check out our #EMNLP2025 talk: Culture Cartography 🗓️ 11/5, 11:30 AM 📌 A109 (CSS Orals 1) Compared to trad. benchmarking, our mixed-initiative method finds more gaps even in reasoning models like R1! 📄 https://t.co/6RtZCuskl1
1
28
107
@ZhiruoW
Zora Wang
15 days
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
7
51
242
@jennajrussell
Jenna Russell
21 days
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
4
52
143
@TuhinChakr
Tuhin Chakrabarty
22 days
🚨New paper on AI and copyright Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this
9
172
526
@alexisjross
Alexis Ross
29 days
Can LLMs reason like a student? 👩🏻‍🎓📚✏️ For educational tools like AI tutors, modeling how students make mistakes is crucial. But current LLMs are much worse at simulating student errors ❌ than performing correct ✅ reasoning. We try to fix that with our method MISTAKE 🤭👇
11
55
336
@ma_tay_
Taylor Sorensen
30 days
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
47
194