Ben Shi Profile
Ben Shi

@BenShi34

Followers 271 · Following 398 · Media 20 · Statuses 85

Agents for humans | SF šŸŒ‰ | prev @princeton_NLP @meta

New Jersey, USA
Joined April 2024
@BenShi34
Ben Shi
2 months
As we optimize model reasoning over verifiable objectives, how does this affect humans' understanding of that reasoning, and their ability to achieve better collaborative outcomes? In our new preprint, we investigate human-centric model reasoning for knowledge transfer 🧵:
Tweet media one
6
39
178
@BenShi34
Ben Shi
11 days
Thanks for reposting! I can’t believe I just noticed this šŸ˜“
@_akhaliq
AK
2 months
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Tweet media one
0
1
11
@BenShi34
Ben Shi
1 month
IMPersona is accepted at #COLM2025 + recommended for oral! Check out our work on imbuing LLMs with human personalities and memories, allowing them to evade detection even by close friends and family. @COLM_conf
@BenShi34
Ben Shi
4 months
Can language models effectively impersonate you to family and friends? We find that they can: 44% of the time, close friends and family misidentify Llama-3.1-8b as human… šŸ§µšŸ‘‡
Tweet media one
0
0
6
@BenShi34
Ben Shi
1 month
RT @ori_press: Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize…
0
59
0
@BenShi34
Ben Shi
2 months
This and lots more insights (like a trajectory visualizer) at kite-live.vercel.app. Thanks to Carlos, Diyi, Nick, Shunyu, and Karthik for helping make this happen! I’ve wanted to do this project for an entire year, and it’s so rewarding to finally see it come to fruition :)
kite-live.vercel.app
Research on mechanisms and dynamics of knowledge transfer in human-AI collaborative settings.
0
0
7
@BenShi34
Ben Shi
2 months
As we build more powerful AI, we need to be equally intentional about building AI that can effectively teach and collaborate with humans: otherwise we risk creating a world of powerful but incomprehensible AI assistants.
1
0
5
@BenShi34
Ben Shi
2 months
By clustering embeddings of user queries, feedback, and model responses, we get a nuanced picture of the interaction types that drive effective (and ineffective) knowledge transfer in collaboration.
Tweet media one
1
0
3
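(A rough sketch of what this clustering step could look like; the embedding model, the number of clusters, and the example turns are my assumptions, not the paper's actual pipeline.)

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Illustrative interaction turns: user queries, user feedback, model responses.
turns = [
    "why does this recurrence terminate?",
    "that explanation skipped the base case",
    "here is a proof sketch that covers the base case",
    "can you restate that without the jargon?",
]

# Embed the turns, then cluster to surface recurring interaction types.
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(turns)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)  # which queries/feedback/responses land in the same cluster
```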
@BenShi34
Ben Shi
2 months
Even with the overall positive correlation, significant outliers (e.g., Claude-3.7-Sonnet) show that some models are simply better teachers relative to their capabilities, suggesting the diminishing returns can be broken with intentional post-training.
Tweet media one
1
0
7
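(One way to operationalize "better teacher than its capability predicts" is to regress teaching outcomes on capability and inspect the residuals; a minimal sketch with made-up numbers, not the paper's analysis code.)

```python
import numpy as np

# Illustrative numbers: capability (e.g., benchmark accuracy) vs. the teaching
# outcome (e.g., humans' post-collaboration gain) for five models.
capability = np.array([0.55, 0.63, 0.71, 0.78, 0.84])
teaching   = np.array([0.40, 0.44, 0.47, 0.58, 0.50])

# Fit the overall trend, then look at residuals: a large positive residual
# marks a model that teaches better than its raw capability predicts.
slope, intercept = np.polyfit(capability, teaching, 1)
residuals = teaching - (slope * capability + intercept)
print(residuals.round(3))
```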
@BenShi34
Ben Shi
2 months
This diminishing returns pattern suggests we could be heading toward a future where:
1. AI capabilities race ahead of human understanding.
2. The "explanatory debt" compounds over time.
3. Humans become increasingly dependent on AI they understand less and less.
1
0
7
@BenShi34
Ben Shi
2 months
We obtain scaling trends of collaboration vs. reasoning. We find:
1. The gap between "what AI can do" and "what AI can teach humans to do" may be widening as models improve.
2. Humans don’t necessarily prefer stronger reasoners: preferences are domain- and skill-specific.
Tweet media one
1
1
7
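(A hedged sketch of how the per-domain preference claim could be checked: rank-correlate reasoning strength with human preference within each domain. The data and domain names here are illustrative.)

```python
from scipy.stats import spearmanr

# domain -> (model reasoning scores, human preference rates); illustrative numbers
results = {
    "math":    ([0.62, 0.71, 0.80, 0.88], [0.30, 0.45, 0.41, 0.44]),
    "writing": ([0.55, 0.66, 0.74, 0.81], [0.52, 0.40, 0.38, 0.35]),
}
for domain, (reasoning, preference) in results.items():
    rho, p = spearmanr(reasoning, preference)
    print(f"{domain}: Spearman rho={rho:+.2f} (p={p:.2f})")
```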
@BenShi34
Ben Shi
2 months
We recruit 118 participants for a large-scale study and design a framework to isolate and measure knowledge transfer. Humans ideate with models in an information-exchange phase, then independently execute solutions, relying on internalizing and combining model and human insights.
Tweet media one
1
0
4
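(A toy sketch of the transfer measurement implied here: score independent execution before and after the information-exchange phase and take the mean gain. The scoring scheme is my assumption, not the paper's exact protocol.)

```python
# Phase 1: participants ideate with the model; Phase 2: they execute alone.
# Transfer = mean gain of post-collaboration execution over a solo baseline.
def knowledge_transfer(solo_scores, post_collab_scores):
    gains = [post - solo for solo, post in zip(solo_scores, post_collab_scores)]
    return sum(gains) / len(gains)

print(knowledge_transfer([0.40, 0.55, 0.30], [0.70, 0.60, 0.45]))  # ~0.167
```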
@BenShi34
Ben Shi
2 months
How do we define knowledge transfer? We parameterize it as a model’s ability to project knowledge from model representational space to human representational space: that is, its ability to convey to you what it knows. (Adapted from @_beenkim)
Tweet media one
1
0
7
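(One hedged way to write this definition down; the notation is mine, not necessarily the paper's: with z_M(x) the model's internal representation for problem x and g a projection into human representational space, knowledge transfer is the expected human performance gain from receiving the projected knowledge.)

```latex
% Hedged formalization (my notation): g projects model representations into
% human representational space; KT is the expected human performance gain.
\[
\mathrm{KT}(M \to H) \;=\;
\mathbb{E}_{x}\!\left[\operatorname{perf}_H\!\big(x \mid g(z_M(x))\big)
                      - \operatorname{perf}_H(x)\right],
\qquad g \colon \mathcal{Z}_M \to \mathcal{Z}_H .
\]
```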
@BenShi34
Ben Shi
2 months
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? š—©š—¶š—±š—²š—¼š—šš—®š—ŗš—²š—•š—²š—»š—°š—µ evaluates VLMs on Game Boy & MS-DOS…
0
76
0
@BenShi34
Ben Shi
4 months
This and lots more insights here!
0
0
1
@BenShi34
Ben Shi
4 months
During communication, we each possess a fundamental intent. However, the textual representation of this intent varies remarkably across individuals. Current prompting methods struggle to effectively override the deeply ingrained stylistic patterns, connotational nuances, and…
1
0
1
@BenShi34
Ben Shi
4 months
Now take a look at LLaMA-8B, which was fine-tuned on just 500 messages + also given my memories. Although the information it conveys isn’t very out of this world, it nails down the typical sort of apathetic and dryly sarcastic texting content that I normally dish out.
Tweet media one
1
0
0
@BenShi34
Ben Shi
4 months
Let’s dive into some examples. This is GPT-4o pretending to be me, receiving my messaging data and memories through prompting. Sure, it accesses personal details correctly… but immediately draws suspicion due to how excited it was just to be there…
Tweet media one
1
0
1
@BenShi34
Ben Shi
4 months
I gave both LLaMA-8B and GPT-4o access to my messages and tasked them with pretending to be me, to see if close friends and family could tell the difference. LLaMA deceived them 44% of the time; GPT-4o, only 6%. Why is this?!
Tweet media one
2
2
6
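(A minimal sketch of the fine-tuning recipe the thread describes, LoRA on a small Llama using Hugging Face transformers + peft; the model name, hyperparameters, and message format are illustrative assumptions, and the training loop is omitted.)

```python
# Sketch: LoRA fine-tuning a small Llama on personal messages.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"          # assumed base checkpoint
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections keep the update small enough
# that a few hundred messages can meaningfully shift style.
cfg = LoraConfig(r=16, lora_alpha=32,
                 target_modules=["q_proj", "v_proj"],
                 task_type="CAUSAL_LM")
model = get_peft_model(model, cfg)

def format_example(context: str, reply: str) -> str:
    """Pair conversational context with the user's real reply as one training string."""
    return f"{context}\n[me]: {reply}{tok.eos_token}"

# Tokenize the formatted messages and run a standard causal-LM training loop
# (e.g., transformers.Trainer); omitted here for brevity.
```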
@BenShi34
Ben Shi
4 months
RT @a1zhang: Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furt…
0
56
0
@BenShi34
Ben Shi
4 months
[9/9] šŸ™ to collaborators @_carlosejimenez @karthik_r_n and our amazing testers! What began as a side project while waiting on lengthy human experiments quickly evolved. For case studies and detection strategies, check out our full paper. Stay vigilant!
0
0
2