Zach Xu

@nehzux

Followers
101
Following
92
Media
2
Statuses
17

CS PhD @UChicago on LLMs. I evolve myself (slowly). @VirtueAI_co

San Francisco
Joined July 2015
@nehzux
Zach Xu
16 days
RT @togethercompute: 🛡️ VirtueGuard is LIVE on Together AI 🚀. AI security and safety model that screens input and output for harmful conten…
0
5
0
@nehzux
Zach Xu
20 days
RT @Chi_Wang_: 🚀 Meet MassGen! 🛠️ An open-source project for multi-agent scaling. Inspired by @grok Heavy & Gemini DeepThink. Enable parall…
0
44
0
@nehzux
Zach Xu
1 month
RT @james_y_zou: 📢 New conference where AI is the primary author and reviewer! Current venues don't allow AI-writte…
0
127
0
@nehzux
Zach Xu
2 months
0
0
3
@nehzux
Zach Xu
2 months
Bottom line: "Divide and Conquer" isn't a silver bullet, but with a principled strategy, it's a powerful pathway to handling massive contexts. Our framework tells you when and why. Dive into the details in our new paper! Link:
arxiv.org
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes the failure modes of long context tasks into three...
1
0
1
@nehzux
Zach Xu
2 months
We tested this on different tasks. The results show a clear "sweet spot" for chunking: it dominates when model confusion is high, but the task doesn't require seeing everything at once. For tasks with extreme cross-chunk dependency, single-shot is still better.
Tweet media one
1
0
1
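Read as a decision rule, the sweet spot is roughly the following (a sketch under assumed noise estimates, e.g. from a validation set; not the paper's procedure):

def should_chunk(model_noise_full: float,
                 model_noise_chunked: float,
                 task_noise: float,
                 aggregator_noise: float) -> bool:
    """Chunk only when the model-noise savings outweigh the task and
    aggregator noise that chunking introduces; single-shot pays only
    model noise on the full input. All inputs are assumed estimates."""
    chunked_total = model_noise_chunked + task_noise + aggregator_noise
    return chunked_total < model_noise_full

Extreme cross-chunk dependency shows up as a large task_noise, which flips the rule back to single-shot.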
@nehzux
Zach Xu
2 months
So how do we manage this? We built a simple system with a Planner, Workers, and a Manager. The "Planner" is an LLM that automatically designs the prompts for the other agents to minimize "aggregator noise" 🧩 and get the best results.
Tweet media one
1
0
0
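A hedged sketch of the Planner / Workers / Manager loop from the tweet above, assuming a hypothetical call_llm chat API; the prompt wording is invented for illustration and is not the paper's:

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    raise NotImplementedError

def plan_prompts(task: str) -> tuple[str, str]:
    # Planner: an LLM designs the prompts the other agents will use,
    # aiming to minimize aggregator noise downstream.
    worker_prompt = call_llm(
        "Write a prompt telling a worker LLM how to solve this task "
        f"on ONE chunk of a long document: {task}"
    )
    manager_prompt = call_llm(
        "Write a prompt telling a manager LLM how to combine per-chunk "
        f"answers with as little aggregation error as possible: {task}"
    )
    return worker_prompt, manager_prompt

def run(task: str, chunks: list[str]) -> str:
    worker_prompt, manager_prompt = plan_prompts(task)
    # Workers: process each chunk independently with the planned prompt.
    partials = [call_llm(f"{worker_prompt}\n\nChunk:\n{c}") for c in chunks]
    # Manager: aggregate the partial answers into the final result.
    return call_llm(manager_prompt + "\n\n" + "\n---\n".join(partials))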
@nehzux
Zach Xu
2 months
FINDING: A weaker LLM using our method can OUTPERFORM a stronger one (like GPT-4o) on certain long-context tasks. Why? Because "model noise" 🤯 for the big model can grow superlinearly on the full text, making it more confused than weaker models are on smaller, manageable chunks.
1
0
0
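A back-of-the-envelope version of that claim, in illustrative notation rather than the paper's: suppose per-call model noise scales as $c\,n^{\alpha}$ in input length $n$ with $\alpha > 1$. Splitting the input into $k$ chunks then strictly reduces total model noise:

\[
  \underbrace{k \cdot c\left(\frac{n}{k}\right)^{\alpha}}_{\text{model noise across } k \text{ chunks}}
  = \frac{c\,n^{\alpha}}{k^{\alpha-1}}
  \;<\; \underbrace{c\,n^{\alpha}}_{\text{single-shot model noise}}
  \qquad (\alpha > 1,\; k > 1),
\]

which is how a weaker model on small chunks can beat a stronger model reading the full text at once, provided task and aggregator noise stay small.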
@nehzux
Zach Xu
2 months
🤝 Task Noise: cross-chunk dependencies that can't be handled by processing each segment in isolation. 🤯 Model Noise: the model's performance degradation as the input length increases. 🧩 Aggregator Noise: incorrect combination of partial results from each chunk.
1
0
0
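One hedged way to formalize the trade-off above (the notation is illustrative, not necessarily the paper's): for an input of length $n$ split into $k$ chunks,

\[
  \epsilon_{\text{total}}(k) \;\lesssim\;
  \underbrace{\epsilon_{\text{task}}(k)}_{\text{cross-chunk deps}}
  \;+\; \underbrace{k\,\epsilon_{\text{model}}\!\left(\tfrac{n}{k}\right)}_{\text{per-chunk degradation}}
  \;+\; \underbrace{\epsilon_{\text{agg}}(k)}_{\text{recombination}},
\]

with single-shot as the $k = 1$ case: the task and aggregator terms vanish, but the model term is evaluated at the full length $n$.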
@nehzux
Zach Xu
2 months
To understand this, we introduce our Noise Decomposition Framework. It pinpoints why LLMs fail on long-context tasks by breaking the final error down into three distinct parts. This reveals the core trade-off between three noises:
1
0
0
@nehzux
Zach Xu
2 months
LLMs are getting more powerful, but they still struggle with super long documents. A common trick is "Divide and Conquer" - chop it up, process chunks, and combine. But when does this actually work? And when does it fail catastrophically? We investigated. 🧵
1
6
12
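For reference, the "chop it up, process chunks, and combine" pattern is a simple map-reduce over the document. A minimal sketch in Python, where call_llm is a hypothetical stand-in for any chat-completion API (illustrative only, not the paper's implementation):

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    raise NotImplementedError

def divide_and_conquer(question: str, document: str, chunk_size: int = 4000) -> str:
    # Divide: split the long document into fixed-size chunks.
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    # Process: answer the question against each chunk in isolation.
    partials = [call_llm(f"{question}\n\nContext:\n{chunk}") for chunk in chunks]
    # Combine: merge the per-chunk answers into one final answer.
    merge_prompt = f"Combine these partial answers to the question: {question}\n\n"
    return call_llm(merge_prompt + "\n---\n".join(partials))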
@nehzux
Zach Xu
2 months
RT @NewInML: New to ML research? Never published at ICML? Don't miss this! Check out the New in ML workshop at ICML 2025 — no rejections,…
openreview.net
Welcome to the OpenReview homepage for ICML 2025 Workshop NewInML
0
14
0
@nehzux
Zach Xu
4 months
RT @GoogleDeepMind: Human generated data has fueled incredible AI progress, but what comes next? 📈 On the latest episode of our podcast, @…
0
265
0
@nehzux
Zach Xu
5 months
RT @RichardSSutton: I am pretty happy with this 30-minute summary of my views on the current state of AI and alignment.
0
101
0
@nehzux
Zach Xu
5 months
RT @karpathy: New 2h11m YouTube video: How I Use LLMs. This video continues my general audience series. The last one focused on how LLMs ar…
0
2K
0
@nehzux
Zach Xu
9 months
RT @NewInML: 📢 BIG ANNOUNCEMENT. NewInML is back at @NeurIPSConf on Dec 10th! Join us for insights from 3 incredible speakers: @tomgoldst…
0
5
0
@nehzux
Zach Xu
1 year
RT @NeurIPSConf: Soliciting participants for the NeurIPS 2024 Checklist Assistant Study! Pre-register before the abstract submission dead…
0
4
0