Tom Sheffer

@TomSheffer17807

Followers: 115 · Following: 101 · Media: 5 · Statuses: 23

@Google Research Software Engineer #Medicine #Medical_AI #AI_4_Science | -M.D-

Joined January 2024
@TomSheffer17807
Tom Sheffer
4 months
Just wrapped up #ACL2025 and feeling inspired! Standout sessions on LLM self-consistency and the role of pretrained models in text embeddings show how far NLP has come. Thanks to the organizers for an amazing conference. #AI #NLP #Neuroscience
1
0
12
@TomSheffer17807
Tom Sheffer
4 months
8/8 The big takeaway: Focusing on a single AI's benchmark score is missing the forest for the trees. True progress is designing the whole forest: a diverse team of agents that can achieve synergy together. #FutureofAI #Research #Teamwork w/ @GoldsteinYAriel @yanivdover alonmiron
0
0
2
@TomSheffer17807
Tom Sheffer
4 months
7/8 ๐Ÿ–ผ๏ธ With humans, you get synergy. Fig 3 shows clear crossover zones, yielding a Diversity Gain of up to 7pp. Both students & LLM improve after chatting proving our winning combo: calibrated confidence + diverse knowledge. #HumanAI #Teamwork
1
0
1
@TomSheffer17807
Tom Sheffer
4 months
6/8 ๐Ÿ–ผ๏ธ Fig 2 shows the LLM-only teams. The accuracy lines barely cross, meaning Diversity Gain is near zero. The result: conversation actually hurts the best LLM. This shows homogeneous knowledge leads to weak synergy.
1
0
3
@TomSheffer17807
Tom Sheffer
4 months
5/8 ๐Ÿ–ผ๏ธ Our pipeline: solo answers โ†’ 2D knowledge profile (accuracy ร— confidence) โ†’ chat โ†’ re-answer. We quantify "Diversity Gain": the accuracy boost from an oracle telling an uncertain agent exactly when to copy a confident partner.
1
0
3
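A rough Python sketch of how the "Diversity Gain" described in 5/8 could be computed, under my own assumptions (the `Answer`, `oracle_copy`, and `diversity_gain` names are hypothetical, not from the paper): the oracle lets the less confident agent copy its partner only when that copy would fix a wrong answer, and the gain is the accuracy boost over the best solo member.

```python
# Hypothetical sketch, not the authors' code: one plausible reading of the
# "Diversity Gain" metric described in the thread.
from dataclasses import dataclass

@dataclass
class Answer:
    choice: str        # selected option for one question
    confidence: float  # self-reported confidence in [0, 1]

def accuracy(answers, gold):
    return sum(a.choice == g for a, g in zip(answers, gold)) / len(gold)

def oracle_copy(agent, partner, gold):
    """The oracle tells the less confident agent exactly when copying its
    more confident partner would turn a wrong answer into a right one."""
    merged = []
    for a, p, g in zip(agent, partner, gold):
        should_copy = (a.confidence < p.confidence   # uncertain agent
                       and a.choice != g             # ...who is wrong
                       and p.choice == g)            # ...while the partner is right
        merged.append(p if should_copy else a)
    return merged

def diversity_gain(agent, partner, gold):
    """Accuracy boost of the oracle-guided pair over its best solo member."""
    solo_best = max(accuracy(agent, gold), accuracy(partner, gold))
    return accuracy(oracle_copy(agent, partner, gold), gold) - solo_best
```

On this reading, near-identical knowledge profiles (the LLM-only teams in 6/8) make `should_copy` almost never true, which is why the gain collapses toward zero.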
@TomSheffer17807
Tom Sheffer
4 months
4/8 Why do humans improve? 1️⃣ Calibrated confidence: They know what they don't know. 2️⃣ Confidence drives behavior: Low confidence → switch, high → stick. 3️⃣ Diverse knowledge: They complement each other's gaps. LLMs have #1 & #2, but lack #3. With no diversity, there's no gain.
1
0
3
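To make point 2️⃣ concrete, a minimal toy illustration of a confidence-driven revision rule (my own sketch; the function name and the 0.5 threshold are assumptions, not the paper's protocol):

```python
# Toy illustration of "low confidence -> switch, high -> stick";
# not the paper's protocol, just the qualitative rule described above.
def revise_after_chat(own_answer, own_conf, partner_answer, partner_conf,
                      switch_threshold=0.5):
    """Answer an agent keeps after discussing with one partner."""
    if own_conf < switch_threshold and partner_conf > own_conf:
        return partner_answer   # uncertain: defer to the more confident partner
    return own_answer           # confident: stick with the original answer
```

With diverse knowledge the two agents err on different questions, so this rule can lift the pair above its best member; with homogeneous knowledge there is nothing new to copy, matching the LLM-only result in 6/8.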
@TomSheffer17807
Tom Sheffer
4 months
3/8 For comparison, we benchmarked this against the human standard: clinical-year medical students. Unlike the AI-only groups, the students' collaboration was a success. The team's accuracy surpassed that of its best individual member. 🧠🤝
1
0
3
@TomSheffer17807
Tom Sheffer
4 months
2/8 We tested three flagship Large Language Models (LLMs) in a group chat to solve medical board-style questions. 🏥🤖 The result? They debated at length, but their group accuracy DROPPED. The most capable model actually got dumber by listening to the others. 📉
1
0
3
@TomSheffer17807
Tom Sheffer
4 months
1/8 🚀 Our new pre-print, "Knowledge Is More Than Performance," is out! Can a room full of language models collaborate like human experts? Spoiler: not yet. And our research reveals the fundamental reason why 🧵 #AI #LLM #humanaiinteraction https://t.co/JTtNkZyjlf
1
3
10
@TomSheffer17807
Tom Sheffer
4 months
Presenting our CISC paper tomorrow at #ACL2025! ⚡️ We save >40% compute on self-consistency by using the LLM's valuable internal confidence signal. 🗓️ Poster: Tues, 16:00-17:30 @ Hall X4 X5 Paper: https://t.co/N5AFzgG5Je Also chatting: LLMs in Neuro, MedNLP, & Human-AI collab!
0
0
7
@EliyaHabba
Eliya Habba @EMNLP 🇨🇳
4 months
Presenting my poster: 🕊️ DOVE - A large-scale multi-dimensional predictions dataset towards meaningful LLM evaluation, Monday 18:00 Vienna, #ACL2025. Come chat about LLM evaluation, prompt sensitivity, and our collection of 250M model outputs!
2
11
47
@TomSheffer17807
Tom Sheffer
4 months
See you at #ACL2025 in Vienna - come say Hi! w/ @TaubenfeldAmir eran_ofek @amir_feder @GoldsteinYAriel @zorikgekhman @_galyo @GoogleAI
3
0
7
@TomSheffer17807
Tom Sheffer
4 months
Our method uses a model's internal confidence to make self-consistency more efficient:
✅ Saves >40% compute on average
✅ Maintains performance
✅ Adds no latency overhead
We're sharing the code to encourage reproduction and new research. Check it out! 💻
2
0
5
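For readers curious how an internal confidence signal can cut self-consistency compute, here is a minimal sketch of a confidence-weighted vote in the spirit of CISC (an assumed simplification, not the released implementation; see the linked paper and code for the real method):

```python
# Minimal sketch of confidence-weighted self-consistency (assumed
# simplification, not the released CISC code): weight each sampled answer
# by the model's confidence instead of counting votes equally, so fewer
# samples are needed to reach the same accuracy.
from collections import defaultdict

def confidence_weighted_vote(samples):
    """samples: list of (answer, confidence) pairs from repeated LLM sampling."""
    totals = defaultdict(float)
    for answer, confidence in samples:
        totals[answer] += confidence   # plain self-consistency would add 1
    return max(totals, key=totals.get)

# e.g. 5 confidence-weighted samples instead of a larger plain-vote budget
samples = [("B", 0.9), ("A", 0.4), ("B", 0.8), ("C", 0.3), ("B", 0.7)]
print(confidence_weighted_vote(samples))  # -> "B"
```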
@zorikgekhman
Zorik Gekhman
5 months
Now accepted to #COLM2025! We formally define hidden knowledge in LLMs and show its existence in a controlled study. We even show that a model can know the answer yet fail to generate it in 1,000 attempts 😵 Looking forward to presenting and discussing our work in person.
@zorikgekhman
Zorik Gekhman
8 months
🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this "hidden knowledge"? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵
2
19
64
@TaubenfeldAmir
Amir Taubenfeld
10 months
New Preprint 🎉 LLM self-assessment unlocks efficient decoding ✅ Our Confidence-Informed Self-Consistency (CISC) method cuts compute without losing accuracy. We also rethink confidence evaluation & contribute to the debate on self-verification. https://t.co/4vSCs9ETPL 1/8👇
1
20
56