Shang Zhu
@ShangZhu18
Followers 242 · Following 478 · Media 4 · Statuses 139
AI Researcher at Together AI @togethercompute | alumnus of @UMich, @CMUEngineering, and Xi'an Jiaotong University, China. Opinions are my own.
San Francisco Bay Area
Joined July 2018
Check out CLOUD, our most recent effort on scaling crystal foundation models, with @changwen_xu98 and @venkvis! This is @changwen_xu98's very first work during his PhD at @umichme, and it's been my pleasure working with him! #AI4Science #FoundationModels #Materials #DifferentiablePhysics
1/n 📢 New preprint: CLOUD, a scalable & physics-informed foundation model for crystals!
- Pretrained on 6M structures
- Symmetry-consistent strings (SCOPE)
- Integrates the Debye model for thermodynamic consistency 🔥
👉 https://t.co/SlOeSrVsKO
#AI4Science #materials #FoundationModels
Replies 0 · Reposts 0 · Likes 3
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
Replies 48 · Reposts 136 · Likes 430
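The "intelligence per watt" (IPW) metric in the thread above is, at its core, a ratio of task performance to power draw. The paper's exact definition isn't given in the tweet, so the toy version below (task accuracy divided by average watts) is an assumption; the wattage figures are illustrative, not measured.

```python
# Minimal IPW sketch: accuracy per watt of average power draw. The exact
# definition in the paper may differ; this is an illustrative assumption.

def intelligence_per_watt(correct: int, total: int, avg_watts: float) -> float:
    """Toy IPW: fraction of tasks solved, divided by average power in watts."""
    if total <= 0 or avg_watts <= 0:
        raise ValueError("total and avg_watts must be positive")
    return (correct / total) / avg_watts

# Hypothetical comparison: a local model at 30 W vs a data-center model at 700 W.
local_ipw = intelligence_per_watt(70, 100, 30.0)   # 0.70 accuracy at 30 W
cloud_ipw = intelligence_per_watt(90, 100, 700.0)  # 0.90 accuracy at 700 W
```

Under these made-up numbers the local model is less accurate but far more efficient per watt, which is the trade-off the thread is pointing at.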
The paper shows reasoning models often ignore user instructions while thinking, even when final answers look fine. A reasoning trace is the hidden text the model writes before the final answer. The authors build ReasonIF, a test that pairs normal questions with simple rules
Replies 4 · Reposts 6 · Likes 20
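A ReasonIF-style evaluation boils down to checking whether each hidden reasoning trace obeyed the simple rule attached to the question. A minimal sketch, assuming a watermark-style instruction; the function names and the rule are illustrative, not the paper's actual implementation:

```python
# Hedged sketch of an instruction-following check over reasoning traces:
# each trace was asked to embed a watermark token, and we score the fraction
# of traces that complied. The watermark rule is an illustrative assumption.

def trace_follows_instruction(trace: str, watermark: str) -> bool:
    """True if the reasoning trace contains the required watermark token."""
    return watermark in trace

def instruction_following_rate(traces, watermark: str) -> float:
    """Fraction of reasoning traces that satisfied the instruction."""
    if not traces:
        return 0.0
    return sum(trace_follows_instruction(t, watermark) for t in traces) / len(traces)

traces = [
    "[WM-42] First, factor the quadratic ...",
    "Let me think step by step ...",            # forgot the watermark
    "[WM-42] The capital of France is ...",
]
rate = instruction_following_rate(traces, "[WM-42]")  # 2 of 3 traces comply
```

The tweets above report the complement of this rate: frontier models failing such checks more than 75% of the time.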
How controllable is an LLM's reasoning process? We found that GPT-OSS, Qwen3, and others fail >75% of the time to follow simple reasoning instructions like using a specific language, format, or watermark⚠️ This makes reasoning LMs vulnerable to reward hacking and scheming, and hard to steer🛑
Replies 2 · Reposts 4 · Likes 19
A new benchmark from @togethercompute to assess how good reasoning models are at following instructions (there are significant gaps in frontier models today).
🧠Do reasoning models really follow our instructions? Together AI’s newest paper "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning", studies how well large reasoning models (LRMs) follow user instructions during reasoning. Authors: @ykwon_0407,
Replies 0 · Reposts 7 · Likes 24
[Repost: "🧠Do reasoning models really follow our instructions?" (ReasonIF thread)]
Replies 2 · Reposts 7 · Likes 18
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
Replies 1 · Reposts 34 · Likes 84
.@togethercompute we are automating complex cloud workflows, like training speculative decoders for customer inference workloads, with AI agents. These workflows typically require human oversight due to the diversity of tools, environments, security needs, and data science processes.
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
Replies 4 · Reposts 5 · Likes 27
Useful design patterns + lessons on building AI agents to automate complex engineering tasks.👇
[Quoted: "Building AI agents for complex engineering tasks ≠ building chatbots 🧵" thread]
Replies 2 · Reposts 3 · Likes 15
Check out lessons learned from our efforts in automating language model training with AI agents!
[Quoted: "Building AI agents for complex engineering tasks ≠ building chatbots 🧵" thread]
Replies 0 · Reposts 1 · Likes 3
LLM efficiency research on steroids with agentic workflows! 🚀
[Quoted: "Building AI agents for complex engineering tasks ≠ building chatbots 🧵" thread]
Replies 0 · Reposts 2 · Likes 6
[Repost: "Building AI agents for complex engineering tasks ≠ building chatbots 🧵" thread]
Replies 4 · Reposts 20 · Likes 68
🚀Exciting news! We will award 3 Best Papers at #Agents4Science — the 1st conference where the main authors + reviewers are AI agents! Each winner will receive $10k compute credit thanks to @togethercompute. Submit your agent's paper 👉 https://t.co/iRy3tzaQkC
Replies 5 · Reposts 10 · Likes 78
🤖OpenAI's open models are here. gpt-oss models just landed on Together AI. They achieve near-parity with o4-mini and were trained using o3 techniques. Build anything, deploy anywhere🔥
Replies 13 · Reposts 24 · Likes 111
🔮Exciting new benchmark testing how well AI predicts the future! Each week, we curate news + prediction markets for questions about next week. Then we have agents make forecasts. Requires advanced research + reasoning @togethercompute @huggingface 📜 https://t.co/sFsP0mG0Q4 🌐
together.ai
FutureBench is a live, leak-free benchmark of true reasoning—AI agents forecast real-world events (rates, geopolitics) before they happen.
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,
Replies 0 · Reposts 6 · Likes 33
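Scoring a FutureBench-style forecast means comparing an agent's predicted probabilities against events once they resolve. The threads above don't state the benchmark's actual metric, so the Brier score below is an assumption; it's a standard choice for probabilistic forecasts.

```python
# Hedged sketch: score probabilistic forecasts of events after they resolve.
# FutureBench's real metric is not stated in the tweet; the Brier score here
# is a conventional stand-in.

def brier_score(forecasts, outcomes) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; always guessing 0.5 scores exactly 0.25."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# An agent predicted 0.8 and 0.3; the two events resolved True and False.
score = brier_score([0.8, 0.3], [1, 0])  # (0.04 + 0.09) / 2 = 0.065
```

Because questions resolve only after forecasts are locked in, this kind of eval is dynamic and uncontaminated by design, as the tweets note.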
Can LLMs predict the future? In FutureBench, friends from @togethercompute create new questions from evolving news & markets. As time passes, we'll see which agents are best at predicting events that have yet to happen! 🔮 Also cool: by design, a dynamic & uncontaminated eval
Replies 2 · Reposts 10 · Likes 36
[Repost: "Most AI benchmarks test the past." (FutureBench thread)]
Replies 5 · Reposts 20 · Likes 91
My new RAG course with Andrew is now out on the Coursera platform! I dive into all the nitty-gritty details you need to start building RAG systems, from retrieval systems and hybrid search to LLMs, evals, observability, and everything in between! Check it out and let me know
Announcing a new Coursera course: Retrieval Augmented Generation (RAG). You'll learn to build high-performance, production-ready RAG systems in this hands-on, in-depth course created by https://t.co/R0m408f8CA and taught by @ZainHasan6, an experienced AI and ML engineer, researcher,
Replies 1 · Reposts 7 · Likes 36
TL;DR - one way to push the quality-efficiency frontier: obtain high-quality generations via a collection of LLMs -> distill to a smaller model -> get a higher-quality small model that is more inference-efficient than the original collection of models. Poster session
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
Replies 0 · Reposts 1 · Likes 4
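The collect-then-distill recipe in the TL;DR above can be sketched in one step: several teacher LLMs each answer a prompt, a scorer ranks the candidates, and the best answer per prompt becomes a training pair for the smaller student model. The data shapes and the toy scorer below are illustrative assumptions, not the paper's pipeline.

```python
# Hedged sketch of building a distillation dataset from a pool of teacher
# LLMs: keep the highest-scoring candidate answer per prompt as the target
# the student model is fine-tuned on. The scorer here is a toy assumption.

def build_distillation_set(generations, score):
    """generations: {prompt: [candidate answers from the teacher pool]}.
    Returns {prompt: best answer} for supervised fine-tuning of the student."""
    return {p: max(cands, key=score) for p, cands in generations.items() if cands}

gens = {
    "2+2?": ["4", "four-ish", "4, because 2+2=4"],
    "Capital of France?": ["Paris is the capital of France.", "Paris"],
}
# Toy scorer: prefer longer, more explanatory answers (a reward model would
# be used in practice).
dataset = build_distillation_set(gens, score=len)
```

The resulting `dataset` would then drive standard supervised fine-tuning, giving a single small model that approximates the quality of the whole teacher collection at a fraction of the inference cost.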
[Repost: "Work done during my internship at Together AI is being presented at #icml25."]
Replies 0 · Reposts 3 · Likes 18
🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and training DeepSWE, a SOTA software engineering agent trained end-to-end with @deepseek_ai-like RL on Qwen3-32B, hitting 59% on SWE-Bench-Verified and topping the
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWE-Bench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWE-Bench leaderboard for open-weight models. 💪DeepSWE
Replies 11 · Reposts 13 · Likes 98