Tim Davidson

@im_td

Followers: 816 · Following: 11K · Media: 146 · Statuses: 3K

PhD research @EPFL on reliable magic | spent time @MSFTResearch on agentic systems, @Google on synthetic data | https://t.co/Iveq1Vw9WH

Joined May 2015
@im_td
Tim Davidson
3 days
We’ve identified a “Collaboration Gap” in today’s top AI models. Testing 32 leading LMs on our novel maze-solving benchmark, we found that models that excel solo can see their performance *collapse* when required to collaborate – even with an identical copy of themselves. A 🧵
@im_td
Tim Davidson
3 days
This research was done during my internship @MSFTResearch. Thank you to my awesome collaborators! @adamfourney @SaleemaAmershi @cervisiarius @erichorvitz @ecekamar Read the full paper here: > https://t.co/WOlX5WRVMQ And a lighter blogpost: > https://t.co/6eKhBeR0QN
@im_td
Tim Davidson
3 days
Our findings argue that collaboration is a distinct capability that current training strategies fail to capture. We shouldn’t just hope for it to emerge – we must *design* for it. This means new evals, training strategies, and interaction designs.
@im_td
Tim Davidson
3 days
Alternatively, we could use a strong model to “recover” a dialogue:
a) Strong Primer: just one strong “priming” message (K=2) lets a weak model perform near the strong model’s level.
b) Strong Recovery: if weak models start, a strong model struggles to recover the session.
@im_td
Tim Davidson
3 days
Because which model starts has such a pronounced impact on success, we experimented with a “relay” inference strategy: Have a strong (expensive) model “prime” the dialogue with just the first K messages, then hand off to a weaker (cheaper) model to finish.
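A minimal sketch of the relay idea described in this tweet, assuming a dialogue is just a growing list of messages; the function names and toy stand-in models are illustrative, not from the paper:

```python
from typing import Callable, List

Message = str
Model = Callable[[List[Message]], Message]  # maps dialogue history -> next message

def relay_dialogue(strong: Model, weak: Model, k: int, max_turns: int) -> List[Message]:
    """Strong model 'primes' the first k messages, then a weaker model finishes."""
    history: List[Message] = []
    for turn in range(max_turns):
        model = strong if turn < k else weak  # hand off after k priming messages
        history.append(model(history))
    return history

# Toy stand-ins for actual LMs, to show the routing:
def strong_lm(h: List[Message]) -> Message:
    return f"strong:{len(h)}"

def weak_lm(h: List[Message]) -> Message:
    return f"weak:{len(h)}"

print(relay_dialogue(strong_lm, weak_lm, k=2, max_turns=4))
# ['strong:0', 'strong:1', 'weak:2', 'weak:3']
```

The point of the sketch is only the routing rule: the expensive model is called for the first K turns and the cheap one for the rest.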
@im_td
Tim Davidson
3 days
Letting models with different strengths and from different builders collaborate provides further insights: ordering and cross-family pairings matter, a *lot*. Generally: strong model starts > weak model starts, even though both need to agree on each move!
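A hedged sketch of the “both need to agree on each move” rule, in a 1-D toy world; the agreement rule and move names are illustrative assumptions, not the paper’s exact protocol:

```python
def step_if_agreed(move_a: str, move_b: str, position: int) -> int:
    """Execute a move only if both agents propose the same one
    (hypothetical agreement rule matching the tweet's description)."""
    deltas = {"left": -1, "right": +1}
    if move_a != move_b:
        return position  # disagreement: no progress this turn
    return position + deltas.get(move_a, 0)

print(step_if_agreed("right", "right", 0))  # 1 (agreement -> move executes)
print(step_if_agreed("right", "left", 0))   # 0 (disagreement -> stuck)
```

Under a rule like this, a weak partner can stall even a strong one, which is one way ordering and pairing can dominate outcomes.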
@im_td
Tim Davidson
3 days
The Collaboration Gap: Even when models are *really* good at completing mazes solo, requiring them to solve the *same* mazes with independent copies of themselves can drastically reduce performance. This gap is especially pronounced in distilled models.
@im_td
Tim Davidson
3 days
Stronger models are better at grounding than weaker models:
🟢 Strong collaborators (left) immediately define a coordinate system and share info.
🔴 Weak ones (right) are vague, leading to confusion, disagreement, and failure.
@im_td
Tim Davidson
3 days
Why is this hard? By splitting up information and requiring agreement, agents have to engage in “grounding”: do both agents understand shared information and actions in the same way? Failure to ground has consequences (see image).
@im_td
Tim Davidson
3 days
How did we measure this? We designed a collaborative maze-solving benchmark that *isolates* collaborative capabilities. The twist: no agent gets the full map. We split the info, giving each agent a partial view. The *only* way to solve the maze is to talk, share & agree on moves.
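A minimal sketch of the information-splitting twist described in this tweet, assuming a maze is represented as a set of walls; the representation and function name are illustrative, not the benchmark’s actual format:

```python
import random

def split_maze_views(walls: set, seed: int = 0):
    """Partition a maze's wall set into two disjoint partial views, one per agent.
    Neither agent sees the full map, so solving requires communicating."""
    rng = random.Random(seed)           # deterministic split for reproducibility
    ordered = sorted(walls)
    rng.shuffle(ordered)
    half = len(ordered) // 2
    return set(ordered[:half]), set(ordered[half:])

# Example: four walls, each agent sees only two of them.
walls = {(0, 1), (1, 2), (2, 3), (3, 4)}
view_a, view_b = split_maze_views(walls)
assert view_a | view_b == walls and view_a.isdisjoint(view_b)
```

Because the views are disjoint and jointly complete, any solo strategy is blocked by construction: only dialogue can reassemble the map.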
@im_td
Tim Davidson
3 days
Real-world communication: Current multi-agent systems rely on *pre-defined* communication protocols, e.g., MCP, or central orchestration. In contrast, open-world integration likely requires adaptive, *dynamic* communication – something humans are surprisingly good at!
@im_td
Tim Davidson
3 days
Why does this matter? The future of AI won’t be one giant model; it will be systems of multiple, independent AI agents with different information and skills. The success of such systems will critically depend on effective collaboration. But how do we measure collaborative capabilities?
@mariabrbic
Maria Brbic
4 days
Weak models can supervise stronger ones but we find that weak-to-strong generalization can become infeasible under distribution shifts! In our #NeurIPS25 paper, we introduce RAVEN 🐦‍⬛, a framework that dynamically learns optimal combinations of weak models to robustly guide
@jxmnop
dr. jack morris
10 days
this post is complete misinformation. LLMs are lossy compressors! of *training data*. LLMs losslessly compress *prompts*, internally. that’s what this paper shows. source: i am the author of “Language Model Inversion”, the original paper on this
@im_td
Tim Davidson
11 days
Side-channel communication is such a critical area of research for the coming wave of agent-to-agent interactions — nice work!
@noranta4
Antonio Norelli
11 days
How to tamper with a gas meter to pay lower bills? This is the story of how a supposedly aligned open source LLM, perhaps not even knowing how to do it, will give you the right instructions. (1/3) https://t.co/2SzbatO3mq
@cervisiarius
Bob West
11 days
📄✨Excited to share our new paper accepted to #EMNLP ’25: Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction https://t.co/ljsWULBHEA (led by #EPFL PhD student Marija Šakota -- soon on the job market, hire her!!)
@egrefen
Edward Grefenstette
12 days
This slop must hit so hard for people who care more about being "in the room where it happens" than actually being involved in building the frontier of AI. The instagrammafication of entrepreneurship has been rampant over the last few years and now we see it in research 😱
@caglarml
Caglar Gulcehre
14 days
This was an incredibly fun project to work on, and it has some of my favorite components in a research idea: - Simple. - Intuitive and works really well. In this work, we introduced the loophole technique, which lets discrete diffusion models bypass the "sampling wall" by
sites.google.com
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
@SungjinAhn_
Sungjin Ahn
16 days
🚨 Check out our new paper on next generation language modeling via "loopholing" discrete diffusion! 🤯 Surprisingly, our loopholing diffusion achieved a huge performance improvement, finally making it match (or even surpass) autoregressive models! ✅ How? We introduce the
@russogiusep
Giuseppe (Peppe) Russo
19 days
🏆🏆🏆 Thrilled to share that our paper “The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates” received an Honorable Mention Award at @ACM_CSCW 2025 🎉 By analyzing thousands of ICLR peer reviews, we show that papers receiving
@manoelribeiro
Manoel
2 years
Received ChatGPT-like reviews? They may have boosted your paper's odds of being accepted! In a quasi-experimental study of a top AI conference, ICLR, we measured the effect of AI-assisted peer reviews on scores and acceptance rates. (Led by @russogiusep) https://t.co/iRfF7FnZhY
@cervisiarius
Bob West
22 days
🚨New paper alert! 🚨 Tandem Training for Language Models https://t.co/Emzcgf1KHx Actions & thoughts of AI w/ superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How?👉🧵