Howard Yen
@HowardYen1
Followers
266
Following
712
Media
20
Statuses
54
Joined January 2014
We will present QRHead (@WuweiZhang0723) at #EMNLP2025! Without any training, it boosts Llama-3.1-8B's performance by >10% on context reasoning tasks (CLIPPER, LongMemEval), and outperforms specialized re-rankers on BEIR. Check out our (virtual) poster tonight!
Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: Better head detection: we find a
0
10
36
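For intuition on how attention heads can double as a retriever, here is a minimal sketch, assuming a Hugging Face causal LM and hand-picked head indices: each candidate document is scored by the attention mass a few chosen heads send from query tokens to document tokens. The model id, the (layer, head) pairs in QR_HEADS, and the doc-then-query packing are placeholder assumptions, not QRHead's actual head-detection or scoring procedure.

```python
# Rough illustration only: score documents by the attention mass that a few
# chosen heads send from query tokens to document tokens. Head indices and
# packing are placeholder assumptions, not QRHead's actual method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed model id
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, attn_implementation="eager"  # eager -> attentions available
)
model.eval()

# Hypothetical (layer, head) pairs; the paper detects such heads automatically.
QR_HEADS = [(14, 3), (20, 11), (27, 5)]

@torch.no_grad()
def head_score(query: str, doc: str) -> float:
    """Attention mass flowing from query tokens to document tokens."""
    doc_ids = tok(doc, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([doc_ids, query_ids], dim=1)  # document first, then query
    out = model(input_ids, output_attentions=True)
    n_doc = doc_ids.shape[1]
    score = 0.0
    for layer, head in QR_HEADS:
        attn = out.attentions[layer][0, head]       # (seq_len, seq_len)
        score += attn[n_doc:, :n_doc].sum().item()  # query rows -> document columns
    return score

def rerank(query: str, docs: list[str]) -> list[str]:
    """Order candidate documents by descending head score for the query."""
    return sorted(docs, key=lambda d: head_score(query, d), reverse=True)
```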
This project was done during my wonderful summer internship at @samaya_AI with @AshwinParan @xiamengzhou @_vThejas @jmhessel @danqi_chen and @yuhaozhangx. We also released a blog here: https://t.co/Z9HwYnGQtg
samaya.ai
Samaya's researchers create deep research systems that outperform existing frameworks while using a fraction of the tool calls with a simple and effective method for context management.
0
0
1
Finally, we find that SLIM hallucinates less, but existing systems still suffer from other issues like confirmation bias and ignoring answers found in long trajectories. We hope these analyses can give insights into how to improve these systems in future work!
1
0
1
To better understand agentic search systems, we develop a taxonomy of common error modes covering both the information-gathering process and the answer-synthesis stage. Then, we design an error-analysis pipeline to characterize systems automatically.
1
1
1
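As a rough illustration of what such an automatic error-analysis pipeline could look like, here is a sketch that asks an LLM judge to label each trajectory with one failure mode. The category names and the prompt are invented examples, not the taxonomy from the paper; `judge` stands in for whatever LLM call you use.

```python
# Illustrative sketch of an automated error-analysis pipeline. The category
# names and prompt are invented examples, NOT the paper's taxonomy.
from typing import Callable

ERROR_TAXONOMY = {
    "information_gathering": [
        "query_drift",            # searches wander away from the question
        "missed_source",          # a relevant page is never visited
        "tool_budget_exhausted",  # the agent runs out of allowed tool calls
    ],
    "answer_synthesis": [
        "confirmation_bias",      # over-weighs early evidence
        "ignored_answer",         # the answer appears in the trajectory but is dropped
        "hallucinated_claim",     # a claim unsupported by gathered evidence
    ],
}

def classify_trajectory(trajectory: str, judge: Callable[[str], str]) -> str:
    """Ask an LLM judge to assign one failure mode to an agent trajectory."""
    labels = [l for modes in ERROR_TAXONOMY.values() for l in modes] + ["no_error"]
    prompt = (
        "You are auditing an agentic search trajectory.\n"
        f"Choose exactly one label from: {', '.join(labels)}.\n\n"
        f"Trajectory:\n{trajectory}\n\nLabel:"
    )
    label = judge(prompt).strip()
    return label if label in labels else "no_error"
```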
With o3 as the base model, SLIM achieves 56% on BrowseComp and 31% on HLE, outperforming all open-source frameworks by 8 and 4 absolute points, respectively, while incurring 4-6x fewer tool calls.
1
0
0
We developed SLIM (Simple Lightweight Information Management), a long-horizon search framework built on 3 core principles: 1. simple tools: decoupled search and browse tools; 2. minimize irrelevant noise with selective content extraction; 3. periodic summarization of trajectories
1
0
2
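To make the three principles concrete, here is a minimal, hypothetical agent loop in their spirit. The tool names (web_search, fetch_page), the llm callable, the prompts, and the summarization interval are all assumptions, not SLIM's actual implementation.

```python
# Minimal, hypothetical agent loop in the spirit of the three principles.
# Tool names, prompts, and the summarization interval are assumptions.
SUMMARIZE_EVERY = 10  # assumed interval for periodic trajectory summarization

def slim_loop(question, llm, web_search, fetch_page, max_steps=50):
    trajectory = []  # running record of tool calls and their (filtered) results
    for step in range(max_steps):
        # Principle 3: periodically compress the trajectory to stay within context.
        if step > 0 and step % SUMMARIZE_EVERY == 0:
            summary = llm(f"Summarize the progress so far:\n{trajectory}")
            trajectory = [("summary", summary)]

        action = llm(f"Question: {question}\nTrajectory: {trajectory}\nNext action?")

        # Principle 1: decoupled search and browse tools.
        if action.startswith("SEARCH:"):
            results = web_search(action.removeprefix("SEARCH:").strip())
            trajectory.append(("search", results))
        elif action.startswith("BROWSE:"):
            page = fetch_page(action.removeprefix("BROWSE:").strip())
            # Principle 2: keep only content relevant to the question.
            extract = llm(f"Extract only the parts relevant to '{question}':\n{page}")
            trajectory.append(("browse", extract))
        else:  # the model decides it has enough evidence to answer
            return llm(f"Answer '{question}' using:\n{trajectory}")
    return llm(f"Give a best-effort answer to '{question}' using:\n{trajectory}")
```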
Taking a closer look at existing frameworks, we find they often suffer from context limitations and inefficient tool use: exceeding the context window, keeping irrelevant noise in context, and running out of tool budget.
1
0
1
Long-horizon agentic search involves exhaustive exploration of the web and information synthesis across many sources, and is the foundation for applications like deep research. With both proprietary systems and many distinct open-source frameworks available, it's unclear which ones to use.
1
0
1
How do you build agentic search systems for long-horizon tasks? Check out our new paper! - Simple design principles are efficient and effective - Error analysis and fine-grained analysis for search systems. A thread on SLIM, our long-horizon agentic search framework
1
14
41
I am going to present two papers at #COLM2025 tomorrow from 4:30-6:30pm, as none of our lead authors can attend due to visa issues. Haven't done poster presentations for years... so I will do my best! #76: LongProc #80: Goedel-Prover v1
Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won't be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster, so feel free to stop by!
4
27
346
Congrats!!! As an empiricist, I have always found your work super relevant, and it provides useful theoretical insights! Also thanks for giving great advice in our chats :)
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
0
0
2
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
39
71
542
Curious about long-context foundation models (LCFM)? We're hosting a panel at the LCFM workshop at #ICML2025 on "How to evaluate long-context foundation models?" We'd love to feature your question! Anything on long-context evaluation or modeling, drop it below / DM me!
1
10
26
Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: Better head detection: we find a
2
23
70
LMs often output answers that sound right but aren't supported by the input context. This is intrinsic hallucination: the generation of plausible but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
2
26
50
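As a toy illustration of the grounding requirement (not the paper's evaluation protocol), the sketch below flags output sentences that no given verifiable claim entails; `entails` stands in for any NLI-style checker you plug in.

```python
# Toy illustration of the PIC grounding requirement, not the paper's
# evaluation protocol: flag output sentences that no given claim entails.
import re
from typing import Callable, List

def unsupported_sentences(
    output: str,
    claims: List[str],
    entails: Callable[[str, str], bool],  # (premise, hypothesis) -> supported?
) -> List[str]:
    """Return the sentences of `output` not entailed by any given claim."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", output) if s.strip()]
    return [s for s in sentences if not any(entails(c, s) for c in claims)]

# A PIC-faithful generation should make this list empty.
```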
News! Our 2nd Workshop on Long-Context Foundation Models (LCFM) will be held at ICML 2025 in Vancouver! If you're working on long-context models, consider submitting your work! Deadline: May 22, 2025 (AOE). Web: https://t.co/5Dt6uBATcN OpenReview: https://t.co/FaxTWdeGnr
1
8
29
If you are attending ICLR, feel free to drop by our poster on Thursday (4/24), 10 AM - 12:30 PM in Hall 3 + Hall 2B (Poster #215)! Paper: https://t.co/qsZJQc7ZQ1 Leaderboard/website: https://t.co/jqRvUVDlum Blog:
huggingface.co
0
0
3
Since our initial release, we have been thrilled to see the adoption of HELMET by the community, including Microsoft's Phi-4 and AI21's Jamba 1.6. We are also constantly updating the leaderboard with new models!
1
0
1
Our extensive evaluations and analyses include over 50 frontier long-context models; the leaderboard is on our website: https://t.co/OcLEsnS9OZ! We study synthetic and realistic evaluations in long-context settings, which we summarize in our blog post:
huggingface.co
1
0
1