Benjamin Van Durme Profile
Benjamin Van Durme

@ben_vandurme

Followers: 1K
Following: 212
Media: 5
Statuses: 165

Johns Hopkins / Microsoft

Joined December 2008
@ben_vandurme
Benjamin Van Durme
3 days
From now on in my advising meetings, any negative result will be met with my response of "think deeper".
@yanndubs
Yann Dubois
5 days
We significantly increased the rate limits for the reasoning model by popular demand. If correctness is really important to you, ask the model to “think deeper” or select “GPT-5 Thinking” in the model picker; this uses a higher reasoning effort than when you are auto-switched.
1
2
24
@ben_vandurme
Benjamin Van Durme
3 days
I am growing an R&D team around Copilot Tuning, a newly announced effort that supports adaptation at a customer-specific level. Join us! We collaborate with a crack team of engineers and scientists who support the product, which is also growing!
0
14
70
@ben_vandurme
Benjamin Van Durme
30 days
Ettin, a two-headed giant language model.
en.wikipedia.org
@orionweller
Orion Weller
30 days
Special thanks to @jhuclsp for amazing collaborators Kathryn Ricci @ruyimarone @ben_vandurme @lawrie_dawn, and LightOn with @antoine_chaffin! And this project wouldn't exist without the efforts of ModernBERT (@benjamin_warner @bclavie @jeremyphoward, many more), so 🙏 them also.
0
3
9
@ben_vandurme
Benjamin Van Durme
30 days
Will continues to drive great work in the modular use of adapters: from the security benefits of AdapterSwap, to RE-adapting, to the COLM '25 SpectR work that enables this new result, LAG.
arxiv.org
Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pretrained models for specific tasks or...
@willcfleshman
William Fleshman
1 month
Check out the paper w/ @ben_vandurme, now on arXiv:
0
0
4
@ben_vandurme
Benjamin Van Durme
2 months
RT @JohnCLangford: A new opening for multimodal model research. Please apply if interested.
0
10
0
@ben_vandurme
Benjamin Van Durme
3 months
RT @EYangTW: 🚨Wouldn’t it be nice if your agentic search system could reason over all your docs? ✨Introducing Rank-K, a listwise reranker…
0
28
0
@ben_vandurme
Benjamin Van Durme
3 months
RT @satyanadella: 2. Copilot Tuning: Copilot can now learn your company’s unique tone and language. It is all about taking that expertise y…
0
57
0
@ben_vandurme
Benjamin Van Durme
4 months
RT @willcfleshman: 🚨 Our latest paper is now on arXiv! 👻 (w/ @ben_vandurme). SpectR: Dynamically Composing LM Experts with Spectral Routing…
0
12
0
@ben_vandurme
Benjamin Van Durme
4 months
RT @alexdmartin314: Wish you could get a Wikipedia style article for unfolding events? Introducing WikiVideo: a new multimodal task and be…
0
13
0
@ben_vandurme
Benjamin Van Durme
5 months
RT @mustafasuleyman: You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research…
0
46
0
@ben_vandurme
Benjamin Van Durme
5 months
See here for more details. Code and models to be released soon as part of a further announcement. w/ Vivek Chari and @hiaxui.
arxiv.org
Sequence-to-sequence tasks often benefit from long contexts, but the quadratic complexity of self-attention in standard Transformers renders this non-trivial. During generation, temporary...
0
1
7
@ben_vandurme
Benjamin Van Durme
5 months
This follows our earlier compression work: • CCoT • Speech Decoders • Text Decoders • Text Encoders • Propositions!
aclanthology.org
Rachel Rudinger, Kevin Duh, Benjamin Van Durme. Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers. 2017.
1
1
3
@ben_vandurme
Benjamin Van Durme
5 months
Query-independent compression is especially useful for tasks like prefix caching, where you’d like the compressed cache to be informative for many possible queries.
1
0
1
@ben_vandurme
Benjamin Van Durme
5 months
KVD can be considered an upgrade to using H2O, by distilling your model before hosting.
1
1
1
@ben_vandurme
Benjamin Van Durme
5 months
KVD trains a student model under a KL objective to reproduce the teacher distribution, while learning an eviction strategy over prior states. The student model is forced to dynamically compress information, handling longer inputs without actually storing them!
1
0
1
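For intuition, here is a minimal sketch of the two pieces that tweet describes, a KL distillation objective plus a cache-eviction step, assuming a PyTorch setup. The function names and the fixed-recency eviction rule are illustrative stand-ins only; KVD itself learns which prior states to evict.

```python
# Minimal sketch, assuming PyTorch. Names and the recency-based eviction
# rule are illustrative; KVD learns the eviction strategy rather than
# dropping states by recency.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary, scaled by temperature^2."""
    t = temperature
    teacher_logprobs = F.log_softmax(teacher_logits / t, dim=-1)
    student_logprobs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_logprobs, teacher_logprobs,
                    log_target=True, reduction="batchmean") * (t * t)

def evict_past_key_values(past_key_values, max_cache):
    """Keep only the most recent `max_cache` positions of each layer's
    key/value tensors (shape: [batch, heads, seq_len, head_dim])."""
    return tuple(
        (k[:, :, -max_cache:, :], v[:, :, -max_cache:, :])
        for k, v in past_key_values
    )
```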
@ben_vandurme
Benjamin Van Durme
5 months
Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.
2
28
135
@ben_vandurme
Benjamin Van Durme
6 months
RT @zhengping_jiang: 1/ 🚨LLMs can still be factual even when they don’t know the full answer!🚨 Introducing Conformal Linguistic Calibration…
0
16
0
@ben_vandurme
Benjamin Van Durme
6 months
RT @CVPR: #CVPR2025 Area Chairs (ACs) identified a number of highly irresponsible reviewers, those who either abandoned the review process…
0
56
0
@ben_vandurme
Benjamin Van Durme
6 months
RT @orionweller: Ever wonder how test-time compute would do in retrieval? 🤔 Introducing ✨rank1✨. rank1 is distilled from R1 & designed for…
0
38
0
@ben_vandurme
Benjamin Van Durme
6 months
RT @harsh_jhamtani: With PeopleJoin, our new benchmark, we study LM agents as coordinators to gather distributed insights and empower colla…
0
3
0