Kunal Jha Profile
Kunal Jha

@kjha02

Followers: 304 · Following: 460 · Media: 13 · Statuses: 73

CS PhD student @UW, prev. CSxPhilosophy @Dartmouth

Joined March 2024
@kjha02
Kunal Jha
6 months
Oral @icmlconf !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques
@kjha02
Kunal Jha
8 months
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. https://t.co/XkfmGOSeh7 🧵
0
4
52
@kjha02
Kunal Jha
3 days
Big thanks to the organizers for a fun workshop and for the incredible honor!! Also can’t thank my collaborators enough @aydan_huang265 @EricYe29011995 @natashajaques and @maxhkw!!! Was a fun (albeit brief) NeurIPS trip, I’m excited to go back to work!
@melaniesclar
Melanie Sclar @ NeurIPS
3 days
@LAW2025_NeurIPS best paper award for the Agents and Planning track: "Modeling others' minds as code" by Kunal Jha et al. @kjha02
2
9
38
@kjha02
Kunal Jha
10 days
I’m super excited to be giving a talk on this work for the @LAW2025_NeurIPS workshop @NeurIPSConf !!! If you’re attending the conference Saturday or Sunday and want to chat about anything multi-agent, RL, ALife, or general cog sci and philosophy stuff, send me a message!!!
@kjha02
Kunal Jha
2 months
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL 🧵
2
6
24
@StellaLisy
Stella Li @NeurIPS 2025
16 days
🤔💭 What even is reasoning? It's time to answer the hard questions! We built the first unified taxonomy of 28 cognitive elements underlying reasoning. Spoiler: LLMs commonly employ sequential reasoning, rarely self-awareness, and often fail to use correct reasoning structures 🧠
8
45
253
@ZhiyuanZeng_
Zhiyuan Zeng
30 days
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier and scale them. 🔗 in 🧵
12
115
474
@abhishekunique7
Abhishek Gupta
2 months
Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only
12
70
406
@marwaabdulhai
Marwa Abdulhai
2 months
Large Language Models interact with millions of people worldwide. However, their ability to produce deceptive outputs poses significant safety concerns. We introduce a new metric, belief misalignment, to quantify these behaviors, investigate the extent to which LLMs engage in
2
6
28
@kjha02
Kunal Jha
2 months
The big takeaway: framing behavior prediction as a program synthesis problem is an accurate, scalable, and efficient path to human-compatible AI! It allows multi-agent systems to rapidly and accurately anticipate others' actions for more effective collaboration.
1
0
3
@kjha02
Kunal Jha
2 months
ROTE doesn’t sacrifice accuracy for speed! While initial program generation takes time, the inferred code can be executed rapidly, making it orders of magnitude more efficient than other LLM-based methods for long-horizon predictions.
1
1
1
@kjha02
Kunal Jha
2 months
What explains this performance gap? ROTE handles complexity better. It excels with intricate tasks like cleaning and interacting with objects (e.g., turning items on/off) in Partnr, while baselines only showed success with simpler navigation and object manipulation.
1
1
1
@kjha02
Kunal Jha
2 months
We scaled up to the embodied robotics simulator Partnr, a complex, partially observable environment with goal-directed LLM-agents. ROTE still significantly outperformed all LLM-based and behavior cloning baselines for high-level action prediction in this domain!
1
1
1
@kjha02
Kunal Jha
2 months
A key strength of code: zero-shot generalization. Programs inferred from one environment transfer to new settings more effectively than all other baselines. ROTE's learned programs transfer without needing to re-incur the cost of text generation.
1
1
1
@kjha02
Kunal Jha
2 months
Can scripts model nuanced, real human behavior? We collected human gameplay data and found ROTE not only outperformed all baselines but also achieved human-level performance when predicting the trajectories of real people!
1
1
2
@kjha02
Kunal Jha
2 months
Introducing ROTE (Representing Others’ Trajectories as Executables)! We use LLMs to generate Python programs 💻 that model observed behavior, then use Bayesian inference to select the most likely ones. The result: a dynamic, composable, and analyzable predictive representation!
1
1
4
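As an outside reader, the selection step the tweet describes (score candidate programs against an observed trajectory, keep the most likely) can be sketched as a simple Bayesian update. This is my own minimal illustration, not the paper's code: the two "scripts", the state and action names, and the per-step error rate are all hypothetical stand-ins for LLM-generated programs.

```python
import math

# Toy stand-ins for LLM-generated behavior programs ("scripts").
# Both scripts and the state/action vocabulary are hypothetical.
def cross_crosswalk(state):
    return "forward" if state == "at_crosswalk" else "wait"

def wander(state):
    return "left"

CANDIDATES = {"cross_crosswalk": cross_crosswalk, "wander": wander}

def posterior(trajectory, programs, eps=0.05):
    """Posterior over programs given an observed (state, action) trajectory,
    assuming a uniform prior and a per-step error rate eps."""
    log_lik = {name: 0.0 for name in programs}
    for state, action in trajectory:
        for name, prog in programs.items():
            ok = prog(state) == action
            log_lik[name] += math.log(1.0 - eps if ok else eps)
    total = sum(math.exp(v) for v in log_lik.values())
    return {name: math.exp(v) / total for name, v in log_lik.items()}

# An observed trajectory consistent with the "cross the crosswalk" script.
obs = [("at_crosswalk", "forward"), ("mid_street", "wait")]
post = posterior(obs, CANDIDATES)
best = max(post, key=post.get)  # -> "cross_crosswalk"
```

Because inference happens over whole programs rather than per-step belief states, the selected program can then be executed cheaply for long-horizon prediction, which matches the efficiency claim later in the thread.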
@kjha02
Kunal Jha
2 months
Traditional AI is stuck! Predicting behavior is either brittle (Behavior Cloning) or too slow with endless belief space enumeration (Inverse Planning). How can we avoid mental state dualism while building scalable, robust predictive models?
1
0
3
@kjha02
Kunal Jha
2 months
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows AI that models others’ minds as Python code 💻 can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL 🧵
4
33
104
@kjha02
Kunal Jha
2 months
Was so fun seeing this work come together, congrats to the authors on the spotlight!!!
@anshulnasery
Anshul Nasery @ NeurIPS2025
2 months
Are you worried that an LLM you trained could be stolen and misused by mysterious masked men 🥷? Our work (now a #NeurIPS2025 Spotlight 💫) can help you detect such unauthorized use. As a side-quest, we also analyse memorization and forgetting in LLMs 🧵(1/11).
0
1
3
@kjha02
Kunal Jha
2 months
The real bitter lesson may be “beware of category mistakes” - Concepts like “real reasoning”, “real imitation”, or “real intelligence” may only have meaning with respect to their utility. What can a system do, why, and where does that take us are way more productive questions!
@iamaniku
ISMAIL
3 months
Richard Sutton contends that LLMs are not a viable path to true general intelligence, considering them a "dead end." His primary critique is that LLMs operate by mimicking human behavior and predicting the next token based on vast amounts of internet text, rather than developing
0
1
4
@SVG__Collection
Melodies & Masterpieces
4 months
66 years ago today, Miles Davis released “Kind of Blue” Kind of Blue is regarded by many as a true masterpiece, the greatest jazz album ever recorded, and one of the best albums of all time… But what made the album so extraordinary? A thread 🧵
90
1K
5K
@maxhkw
Max Kleiman-Weiner
5 months
Our new paper is out in PNAS: "Evolving general cooperation with a Bayesian theory of mind"! Humans are the ultimate cooperators. We coordinate on a scale and scope no other species (nor AI) can match. What makes this possible? 🧵
2
21
100