Kunal Jha Profile
Kunal Jha

@kjha02

Followers: 304 · Following: 460 · Media: 13 · Statuses: 73

CS PhD student @UW, prev. CSxPhilosophy @Dartmouth

Joined March 2024
@kjha02
Kunal Jha
6 months
Oral @icmlconf !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques
@kjha02
Kunal Jha
8 months
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. https://t.co/XkfmGOSeh7 🧵
0
4
52
@kjha02
Kunal Jha
3 days
Big thanks to the organizers for a fun workshop and for the incredible honor!! Also can’t thank my collaborators enough @aydan_huang265 @EricYe29011995 @natashajaques and @maxhkw!!! Was a fun (albeit brief) NeurIPS trip, I’m excited to go back to work!
@melaniesclar
Melanie Sclar @ NeurIPS
3 days
@LAW2025_NeurIPS best paper award for the Agents and Planning track: "Modeling others' minds as code" by Kunal Jha et al. @kjha02
2
9
38
@kjha02
Kunal Jha
10 days
I’m super excited to be giving a talk on this work for the @LAW2025_NeurIPS workshop @NeurIPSConf !!! If you’re attending the conference Saturday or Sunday and want to chat about anything multi-agent, RL, ALife, or general cog sci and philosophy stuff, send me a message!!!
@kjha02
Kunal Jha
2 months
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL 🧵
2
6
24
@StellaLisy
Stella Li @NeurIPS 2025
16 days
🤔💭 What even is reasoning? It's time to answer the hard questions! We built the first unified taxonomy of 28 cognitive elements underlying reasoning. Spoiler: LLMs commonly employ sequential reasoning, rarely self-awareness, and often fail to use correct reasoning structures 🧠
8
45
253
@ZhiyuanZeng_
Zhiyuan Zeng
30 days
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier and scale them. 🔗 in 🧵
12
115
474
@abhishekunique7
Abhishek Gupta
2 months
Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only
12
70
406
@marwaabdulhai
Marwa Abdulhai
2 months
Large Language Models interact with millions of people worldwide. However, their ability to produce deceptive outputs poses significant safety concerns. We introduce a new metric, belief misalignment, to quantify these behaviors, investigate the extent to which LLMs engage in
2
6
28
@kjha02
Kunal Jha
2 months
The big takeaway: framing behavior prediction as a program synthesis problem is an accurate, scalable, and efficient path to human-compatible AI! It allows multi-agent systems to rapidly and accurately anticipate others' actions for more effective collaboration.
1
0
3
@kjha02
Kunal Jha
2 months
ROTE doesn’t sacrifice accuracy for speed! While initial program generation takes time, the inferred code can be executed rapidly, making it orders of magnitude more efficient than other LLM-based methods for long-horizon predictions.
1
1
1
@kjha02
Kunal Jha
2 months
What explains this performance gap? ROTE handles complexity better. It excels with intricate tasks like cleaning and interacting with objects (e.g., turning items on/off) in Partnr, while baselines only showed success with simpler navigation and object manipulation.
1
1
1
@kjha02
Kunal Jha
2 months
We scaled up to the embodied robotics simulator Partnr, a complex, partially observable environment with goal-directed LLM-agents. ROTE still significantly outperformed all LLM-based and behavior cloning baselines for high-level action prediction in this domain!
1
1
1
@kjha02
Kunal Jha
2 months
A key strength of code: zero-shot generalization. Programs inferred from one environment transfer to new settings more effectively than all other baselines. ROTE's learned programs transfer without needing to re-incur the cost of text generation.
1
1
1
@kjha02
Kunal Jha
2 months
Can scripts model nuanced, real human behavior? We collected human gameplay data and found ROTE not only outperformed all baselines but also achieved human-level performance when predicting the trajectories of real people!
1
1
2
@kjha02
Kunal Jha
2 months
Introducing ROTE (Representing Others’ Trajectories as Executables)! We use LLMs to generate Python programs 💻 that model observed behavior, then use Bayesian inference to select the most likely ones. The result: a dynamic, composable, and analyzable predictive representation!
1
1
4
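As an outside reader, the selection step the tweet describes (score candidate programs against an observed trajectory, keep the most likely) can be sketched as a simple Bayesian update. This is my own minimal illustration, not the paper's code: the two "scripts", the state and action names, and the per-step error rate are all hypothetical stand-ins for LLM-generated programs.

```python
import math

# Toy stand-ins for LLM-generated behavior programs ("scripts").
# Both scripts and the state/action vocabulary are hypothetical.
def cross_crosswalk(state):
    return "forward" if state == "at_crosswalk" else "wait"

def wander(state):
    return "left"

CANDIDATES = {"cross_crosswalk": cross_crosswalk, "wander": wander}

def posterior(trajectory, programs, eps=0.05):
    """Posterior over programs given an observed (state, action) trajectory,
    assuming a uniform prior and a per-step error rate eps."""
    log_lik = {name: 0.0 for name in programs}
    for state, action in trajectory:
        for name, prog in programs.items():
            ok = prog(state) == action
            log_lik[name] += math.log(1.0 - eps if ok else eps)
    total = sum(math.exp(v) for v in log_lik.values())
    return {name: math.exp(v) / total for name, v in log_lik.items()}

# An observed trajectory consistent with the "cross the crosswalk" script.
obs = [("at_crosswalk", "forward"), ("mid_street", "wait")]
post = posterior(obs, CANDIDATES)
best = max(post, key=post.get)  # -> "cross_crosswalk"
```

Because inference happens over whole programs rather than per-step belief states, the selected program can then be executed cheaply for long-horizon prediction, which matches the efficiency claim later in the thread.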
@kjha02
Kunal Jha
2 months
Traditional AI is stuck! Predicting behavior is either brittle (Behavior Cloning) or too slow with endless belief space enumeration (Inverse Planning). How can we avoid mental state dualism while building scalable, robust predictive models?
1
0
3
@kjha02
Kunal Jha
2 months
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL 🧵
4
33
104
@kjha02
Kunal Jha
2 months
Was so fun seeing this work come together, congrats to the authors on the spotlight!!!
@anshulnasery
Anshul Nasery @ NeurIPS2025
2 months
Are you worried that an LLM you trained could be stolen and misused by mysterious masked men 🥷? Our work (now a #NeurIPS2025 Spotlight 💫) can help you detect such unauthorized use. As a side-quest, we also analyse memorization and forgetting in LLMs 🧵(1/11).
0
1
3
@kjha02
Kunal Jha
2 months
The real bitter lesson may be “beware of category mistakes” - Concepts like “real reasoning”, “real imitation”, or “real intelligence” may only have meaning with respect to their utility. What can a system do, why, and where does that take us are way more productive questions!
@iamaniku
ISMAIL
3 months
Richard Sutton contends that LLMs are not a viable path to true general intelligence, considering them a "dead end." His primary critique is that LLMs operate by mimicking human behavior and predicting the next token based on vast amounts of internet text, rather than developing
0
1
4
@SVG__Collection
Melodies & Masterpieces
4 months
66 years ago today, Miles Davis released “Kind of Blue” Kind of Blue is regarded by many as a true masterpiece, the greatest jazz album ever recorded, and one of the best albums of all time… But what made the album so extraordinary? A thread 🧵
90
1K
5K
@maxhkw
Max Kleiman-Weiner
5 months
Our new paper is out in PNAS: "Evolving general cooperation with a Bayesian theory of mind"! Humans are the ultimate cooperators. We coordinate on a scale and scope no other species (nor AI) can match. What makes this possible? 🧵
2
21
100