Howard Yen
@HowardYen1
Followers
266
Following
712
Media
20
Statuses
54
Joined January 2014
We will present QRHead (@WuweiZhang0723) at #EMNLP2025! Without any training, it boosts Llama-3.1-8B's performance by >10% on context reasoning tasks (CLIPPER, LongMemEval), and outperforms specialized re-rankers on BEIR. Check out our (virtual) poster tonight!
Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: Better head detection: we find a
0
10
36
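For intuition on how attention heads can double as a retriever, here is a minimal sketch, assuming a Hugging Face causal LM and hand-picked head indices: each candidate document is scored by the attention mass a few chosen heads send from query tokens to document tokens. The model id, the (layer, head) pairs in QR_HEADS, and the doc-then-query packing are placeholder assumptions, not QRHead's actual head-detection or scoring procedure.

```python
# Rough illustration only: score documents by the attention mass that a few
# chosen heads send from query tokens to document tokens. Head indices and
# packing are placeholder assumptions, not QRHead's actual method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed model id
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, attn_implementation="eager"  # eager -> attentions available
)
model.eval()

# Hypothetical (layer, head) pairs; the paper detects such heads automatically.
QR_HEADS = [(14, 3), (20, 11), (27, 5)]

@torch.no_grad()
def head_score(query: str, doc: str) -> float:
    """Attention mass flowing from query tokens to document tokens."""
    doc_ids = tok(doc, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([doc_ids, query_ids], dim=1)  # document first, then query
    out = model(input_ids, output_attentions=True)
    n_doc = doc_ids.shape[1]
    score = 0.0
    for layer, head in QR_HEADS:
        attn = out.attentions[layer][0, head]       # (seq_len, seq_len)
        score += attn[n_doc:, :n_doc].sum().item()  # query rows -> document columns
    return score

def rerank(query: str, docs: list[str]) -> list[str]:
    """Order candidate documents by descending head score for the query."""
    return sorted(docs, key=lambda d: head_score(query, d), reverse=True)
```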
This project was done during my wonderful summer internship at @samaya_AI with @AshwinParan @xiamengzhou @_vThejas @jmhessel @danqi_chen and @yuhaozhangx. We also released a blog here: https://t.co/Z9HwYnGQtg
samaya.ai
Samaya's researchers create deep research systems that outperform existing frameworks while using a fraction of the tool calls with a simple and effective method for context management.
0
0
1
Finally, we find that SLIM hallucinates less, but existing systems still suffer from other issues like confirmation bias and ignoring answers found in long trajectories. We hope these analyses can give insights into how to improve these systems in future work!
1
0
1
To better understand agentic search systems, we develop a taxonomy of common error modes covering both the information-gathering process and the answer-synthesis stage. Then, we design an error-analysis pipeline to characterize systems automatically.
1
1
1
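As a rough illustration of what such an automatic error-analysis pipeline could look like, here is a sketch that asks an LLM judge to label each trajectory with one failure mode. The category names and the prompt are invented examples, not the taxonomy from the paper; `judge` stands in for whatever LLM call you use.

```python
# Illustrative sketch of an automated error-analysis pipeline. The category
# names and prompt are invented examples, NOT the paper's taxonomy.
from typing import Callable

ERROR_TAXONOMY = {
    "information_gathering": [
        "query_drift",            # searches wander away from the question
        "missed_source",          # a relevant page is never visited
        "tool_budget_exhausted",  # the agent runs out of allowed tool calls
    ],
    "answer_synthesis": [
        "confirmation_bias",      # over-weighs early evidence
        "ignored_answer",         # the answer appears in the trajectory but is dropped
        "hallucinated_claim",     # a claim unsupported by gathered evidence
    ],
}

def classify_trajectory(trajectory: str, judge: Callable[[str], str]) -> str:
    """Ask an LLM judge to assign one failure mode to an agent trajectory."""
    labels = [l for modes in ERROR_TAXONOMY.values() for l in modes] + ["no_error"]
    prompt = (
        "You are auditing an agentic search trajectory.\n"
        f"Choose exactly one label from: {', '.join(labels)}.\n\n"
        f"Trajectory:\n{trajectory}\n\nLabel:"
    )
    label = judge(prompt).strip()
    return label if label in labels else "no_error"
```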
With o3 as the base model, SLIM achieves 56% on BrowseComp and 31% on HLE, outperforming all open-source frameworks by 8 and 4 absolute points, respectively, while incurring 4-6x fewer tool calls.
1
0
0
We developed SLIM (Simple Lightweight Information Management), a long-horizon search framework built on 3 core principles: 1. simple tools: decoupled search and browse tools; 2. minimize irrelevant noise with selective content extraction; 3. periodic summarization of trajectories
1
0
2
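To make the three principles concrete, here is a minimal, hypothetical agent loop in their spirit. The tool names (web_search, fetch_page), the llm callable, the prompts, and the summarization interval are all assumptions, not SLIM's actual implementation.

```python
# Minimal, hypothetical agent loop in the spirit of the three principles.
# Tool names, prompts, and the summarization interval are assumptions.
SUMMARIZE_EVERY = 10  # assumed interval for periodic trajectory summarization

def slim_loop(question, llm, web_search, fetch_page, max_steps=50):
    trajectory = []  # running record of tool calls and their (filtered) results
    for step in range(max_steps):
        # Principle 3: periodically compress the trajectory to stay within context.
        if step > 0 and step % SUMMARIZE_EVERY == 0:
            summary = llm(f"Summarize the progress so far:\n{trajectory}")
            trajectory = [("summary", summary)]

        action = llm(f"Question: {question}\nTrajectory: {trajectory}\nNext action?")

        # Principle 1: decoupled search and browse tools.
        if action.startswith("SEARCH:"):
            results = web_search(action.removeprefix("SEARCH:").strip())
            trajectory.append(("search", results))
        elif action.startswith("BROWSE:"):
            page = fetch_page(action.removeprefix("BROWSE:").strip())
            # Principle 2: keep only content relevant to the question.
            extract = llm(f"Extract only the parts relevant to '{question}':\n{page}")
            trajectory.append(("browse", extract))
        else:  # the model decides it has enough evidence to answer
            return llm(f"Answer '{question}' using:\n{trajectory}")
    return llm(f"Give a best-effort answer to '{question}' using:\n{trajectory}")
```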
Taking a closer look at existing frameworks, we find they often suffer from context limitations and inefficient tool use: exceeding the context window, keeping irrelevant noise in context, and running out of tool budget.
1
0
1
Long-horizon agentic search involves exhaustive exploration of the web and information synthesis across many sources, and is the foundation for applications like deep research. With both proprietary systems and many distinct open-source frameworks available, it's unclear which ones to use.
1
0
1
How do you build agentic search systems for long-horizon tasks? Check out our new paper! - Simple design principles are efficient and effective - Error analysis and fine-grained analysis for search systems. A thread on SLIM, our long-horizon agentic search framework
1
14
41
I am going to present two papers at #COLM2025 tomorrow from 4:30-6:30pm, as none of our lead authors can attend due to visa issues. Haven't done poster presentations for years... so I will do my best! #76: LongProc #80: Goedel-Prover v1
Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won't be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster, so feel free to stop by!
4
27
346
Congrats!!! As an empiricist, I have always found your work super relevant, and it provides useful theoretical insights! Also thanks for giving great advice in our chats :)
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
0
0
2
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
39
71
542
Curious about long-context foundation models (LCFM)? We're hosting a panel at the LCFM workshop at #ICML2025 on "How to evaluate long-context foundation models?" We'd love to feature your question! Anything on long-context evaluation or modeling, drop it below / DM me!
1
10
26
Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: Better head detection: we find a
2
23
70
LMs often output answers that sound right but aren't supported by the input context. This is intrinsic hallucination: the generation of plausible but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
2
26
50
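As a toy illustration of the grounding requirement (not the paper's evaluation protocol), the sketch below flags output sentences that no given verifiable claim entails; `entails` stands in for any NLI-style checker you plug in.

```python
# Toy illustration of the PIC grounding requirement, not the paper's
# evaluation protocol: flag output sentences that no given claim entails.
import re
from typing import Callable, List

def unsupported_sentences(
    output: str,
    claims: List[str],
    entails: Callable[[str, str], bool],  # (premise, hypothesis) -> supported?
) -> List[str]:
    """Return the sentences of `output` not entailed by any given claim."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", output) if s.strip()]
    return [s for s in sentences if not any(entails(c, s) for c in claims)]

# A PIC-faithful generation should make this list empty.
```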
News! Our 2nd Workshop on Long-Context Foundation Models (LCFM) will be held at ICML 2025 in Vancouver! If you're working on long-context models, consider submitting your work! Deadline: May 22, 2025 (AOE). Web: https://t.co/5Dt6uBATcN OpenReview: https://t.co/FaxTWdeGnr
1
8
29
If you are attending ICLR, feel free to drop by our poster on Thursday (4/24), 10 AM - 12:30 PM in Hall 3 + Hall 2B (Poster #215)! Paper: https://t.co/qsZJQc7ZQ1 Leaderboard/website: https://t.co/jqRvUVDlum Blog:
huggingface.co
0
0
3
Since our initial release, we have been thrilled to see the adoption of HELMET by the community, including Microsoft's Phi-4 and AI21's Jamba 1.6. We are also constantly updating the leaderboard with new models!
1
0
1
Our extensive evaluations and analyses include over 50 frontier long-context models; the leaderboard is on our website: https://t.co/OcLEsnS9OZ! We study synthetic and realistic evaluations in long-context settings, which we summarize in our blog post:
huggingface.co
1
0
1