Shang Zhu

@ShangZhu18

242 Followers · 478 Following · 4 Media · 139 Statuses

AI Researcher at Together AI @togethercompute | alumni of @UMich @CMUEngineering and Xi'an Jiao-Tong University, China. Opinions are my own.

San Francisco Bay Area
Joined July 2018
@ShangZhu18
Shang Zhu
5 months
Check out CLOUD: our most recent effort on scaling crystal foundation models, with @changwen_xu98 @venkvis! This is @changwen_xu98's very first work during his PhD at @umichme and it's my pleasure working with him! #AI4Science #FoundationModels #Materials #DifferentiablePhysics
@changwen_xu98
Changwen Xu
5 months
1/n📢 New preprint: CLOUD — scalable & physics-informed foundation model for crystals! - Pretrained on 6M structures - Symmetry-consistent strings (SCOPE) - Integrates Debye model for thermodynamic consistency 🔥 👉 https://t.co/SlOeSrVsKO #AI4Science #materials #FoundationModels
0
0
3
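For background on the tweet's Debye-model mention: the Debye model gives a solid's heat capacity as C_v = 9 N k_B (T/Θ_D)³ ∫₀^{Θ_D/T} x⁴eˣ/(eˣ−1)² dx, which approaches the Dulong–Petit limit 3 N k_B at high temperature. This is a sketch of that standard physics formula only, not of how CLOUD integrates it; the temperatures and Debye temperature below are illustrative numbers.

```python
import math

def debye_cv_over_NkB(T: float, theta_D: float, steps: int = 2000) -> float:
    """Debye heat capacity in units of N*kB, via midpoint-rule integration."""
    upper = theta_D / T
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint rule avoids the x = 0 endpoint
        ex = math.exp(x)
        total += x ** 4 * ex / (ex - 1.0) ** 2 * h
    return 9.0 * (T / theta_D) ** 3 * total

# Illustrative: well above / well below a hypothetical Debye temperature of 300 K.
high_T = debye_cv_over_NkB(T=3000.0, theta_D=300.0)  # approaches 3 (Dulong–Petit)
low_T = debye_cv_over_NkB(T=30.0, theta_D=300.0)     # suppressed at low T
```

At low temperature the result follows the familiar T³ law, which is one reason a thermodynamically consistent model is a useful physics prior.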
@JonSaadFalcon
Jon Saad-Falcon
21 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
48
136
430
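The tweet defines intelligence per watt (IPW) as a measure of local-AI viability. A minimal sketch of that ratio, with made-up capability scores and wattages (not figures from the paper):

```python
def intelligence_per_watt(task_score: float, avg_watts: float) -> float:
    """Capability per unit power; higher means more efficient intelligence."""
    if avg_watts <= 0:
        raise ValueError("power draw must be positive")
    return task_score / avg_watts

# Hypothetical comparison: a local model on a laptop vs. a datacenter model.
local = intelligence_per_watt(task_score=62.0, avg_watts=45.0)    # laptop chip
remote = intelligence_per_watt(task_score=85.0, avg_watts=700.0)  # datacenter GPU
```

Under these toy numbers the less capable local model wins on efficiency, which is the trade-off the thread is pointing at.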
@rohanpaul_ai
Rohan Paul
1 month
The paper shows reasoning models often ignore user instructions while thinking, even when final answers look fine. A reasoning trace is the hidden text the model writes before the final answer. The authors build ReasonIF, a test that pairs normal questions with simple rules
4
6
20
@james_y_zou
James Zou
1 month
How controllable is an LLM's reasoning process? We found that GPT OSS, Qwen3 and others fail >75% of the time to follow simple reasoning instructions like using a specific language, format or watermark⚠️ This makes reasoning LMs vulnerable to reward hacking, scheming + hard to steer🛑
2
4
19
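ReasonIF's actual protocol is defined in the paper; as a toy illustration of one instruction type the thread mentions, here is a hypothetical checker for "include a given watermark string in your reasoning trace":

```python
def follows_watermark_instruction(reasoning_trace: str, watermark: str) -> bool:
    """True iff the hidden reasoning text contains the required watermark."""
    return watermark in reasoning_trace

# Hypothetical traces: one complies with the instruction, one silently drops it.
trace_ok = "Let me think. WM-1234. First, factor the number..."
trace_bad = "Let me think. First, factor the number..."

compliant = follows_watermark_instruction(trace_ok, "WM-1234")
violating = follows_watermark_instruction(trace_bad, "WM-1234")
```

The finding above is that even checks this simple fail most of the time on frontier reasoning models, because the instruction is ignored inside the trace even when the final answer is fine.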
@vipulved
Vipul Ved Prakash
1 month
A new benchmark from @togethercompute to assess how good reasoning models are at following instructions (there are significant gaps in frontier models today).
@togethercompute
Together AI
1 month
🧠Do reasoning models really follow our instructions? Together AI’s newest paper "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning", studies how well large reasoning models (LRMs) follow user instructions during reasoning. Authors: @ykwon_0407,
0
7
24
@togethercompute
Together AI
1 month
🧠Do reasoning models really follow our instructions? Together AI’s newest paper "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning", studies how well large reasoning models (LRMs) follow user instructions during reasoning. Authors: @ykwon_0407,
2
7
18
@KaitlynZhou
Kaitlyn Zhou
1 month
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
1
34
84
@vipulved
Vipul Ved Prakash
3 months
.@togethercompute we are automating complex cloud workflows, like training speculative decoders for customer inference workloads, with AI agents. These workflows typically require human oversight due to diversity of tools, environments, security needs and data science processes
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
4
5
27
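The workflow being automated above is training speculative decoders. As background, the standard speculative decoding step has a cheap draft model propose a token and the target model verify it: accept a drafted token x with probability min(1, p_target(x)/p_draft(x)), else resample from the normalized residual. A sketch with toy distributions standing in for real models:

```python
import random

def speculative_step(p_draft: dict, p_target: dict, rng: random.Random) -> str:
    """Sample one token: draft proposes, target verifies (accept/reject)."""
    tokens = list(p_draft)
    drafted = rng.choices(tokens, weights=[p_draft[t] for t in tokens])[0]
    if rng.random() < min(1.0, p_target[drafted] / p_draft[drafted]):
        return drafted  # target accepts the cheap draft token
    # Rejected: resample from the normalized residual max(0, p_target - p_draft).
    residual = {t: max(0.0, p_target[t] - p_draft[t]) for t in tokens}
    z = sum(residual.values())
    return rng.choices(tokens, weights=[residual[t] / z for t in tokens])[0]

rng = random.Random(0)
p_draft = {"a": 0.7, "b": 0.3}
p_target = {"a": 0.5, "b": 0.5}
samples = [speculative_step(p_draft, p_target, rng) for _ in range(20000)]
frac_a = samples.count("a") / len(samples)  # converges to p_target["a"] = 0.5
```

The accept/reject rule guarantees the output distribution exactly matches the target model, which is why the speedup comes for free in quality terms; the hard part, per the thread, is automating the training of a good draft model per customer workload.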
@james_y_zou
James Zou
3 months
Useful design patterns + lessons on building AI agents to automate complex engineering tasks.👇
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
2
3
15
@wt_chung
Wai Tong
3 months
Check out lessons learned from our efforts in automating language model training with AI agents!
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
0
1
3
@ben_athi
Ben Athiwaratkun (@Neurips)
3 months
LLM efficiency research on steroids with agentic workflows! 🚀
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
0
2
6
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
4
20
68
@james_y_zou
James Zou
4 months
🚀Exciting news! We will award 3 Best Papers at #Agents4Science — the 1st conference where the main authors + reviewers are AI agents! Each winner will receive $10k compute credit thanks to @togethercompute. Submit your agent's paper 👉 https://t.co/iRy3tzaQkC
5
10
78
@togethercompute
Together AI
4 months
🤖OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, trained using o3 techniques. Build anything, deploy anywhere🔥
13
24
111
@james_y_zou
James Zou
5 months
🔮Exciting new benchmark testing how well AI predicts the future! Each week, we curate news + prediction markets for questions about next week. Then we have agents make forecasts. Requires advanced research + reasoning @togethercompute @huggingface 📜 https://t.co/sFsP0mG0Q4 🌐
together.ai
FutureBench is a live, leak-free benchmark of true reasoning—AI agents forecast real-world events (rates, geopolitics) before they happen.
@togethercompute
Together AI
5 months
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,
0
6
33
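FutureBench's exact scoring is defined by its authors; a common way to grade probabilistic forecasts of real-world events, shown here as an illustrative stand-in, is the Brier score (mean squared error between the forecast probability and the 0/1 outcome):

```python
def brier_score(forecasts, outcomes):
    """Mean squared forecast error; lower is better, 0.0 is perfect."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical agent forecasts for three next-week questions, then the outcomes.
agent = brier_score([0.9, 0.2, 0.7], [1, 0, 1])
coin = brier_score([0.5, 0.5, 0.5], [1, 0, 1])  # uninformative baseline
```

Because the questions resolve only after forecasts are locked in, an eval like this is dynamic and leak-free by construction, which is the "uncontaminated" property the thread highlights.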
@clefourrier
Clémentine Fourrier 🍊 is off till Dec 2026 hiking
5 months
Can LLMs predict the future? In FutureBench, friends from @togethercompute create new questions from evolving news & markets: As time passes, we'll see which agents are the best at predicting events that have yet to happen! 🔮 Also cool: by design, dynamic & uncontaminated eval
2
10
36
@togethercompute
Together AI
5 months
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,
5
20
91
@ZainHasan6
Zain
5 months
My new RAG course with Andrew is now out on the Coursera platform! I dive into all the nitty-gritty details that you need to start building RAG systems, from retrieval systems and hybrid search to LLMs, evals, observability and everything in between! Check it out and let me know
@AndrewYNg
Andrew Ng
5 months
Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by https://t.co/R0m408f8CA and taught by @ZainHasan6, experienced AI and ML engineer, researcher,
1
7
36
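As a toy illustration of the hybrid search idea the course covers, the sketch below blends a keyword (lexical-overlap) score with a "vector" similarity score. Real systems use BM25 and learned embeddings; the pure-Python stand-ins, document set, and `alpha` weight here are all invented for illustration:

```python
import math
from collections import Counter

DOCS = [
    "speculative decoding speeds up llm inference",
    "retrieval augmented generation grounds answers in documents",
    "evaluation and observability matter for production rag systems",
]

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / max(len(q), 1)

def vector_score(query: str, doc: str) -> float:
    """Toy 'embedding' similarity: cosine over term-count vectors."""
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, alpha: float = 0.5) -> str:
    """Return the document with the best blended lexical + vector score."""
    return max(DOCS, key=lambda doc: alpha * lexical_score(query, doc)
                                     + (1 - alpha) * vector_score(query, doc))

best = hybrid_retrieve("how do rag systems use retrieval")
```

Blending the two signals is the point: lexical match catches exact terms that embeddings blur, and vector match catches paraphrases that keywords miss.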
@ben_athi
Ben Athiwaratkun (@Neurips)
5 months
TL;DR - one way to push the quality-efficiency frontier: obtain high quality generations via a collection of LLMs -> distill to a smaller model -> get a higher quality small model that is more inference-efficient than the original collection of models. Poster session
@JunlinWang3
Junlin Wang
5 months
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
0
1
4
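The TL;DR above describes distilling a collection of LLMs into one small model. A minimal sketch of the idea, with toy next-token distributions standing in for real teachers and a single gradient-free blend standing in for a training step (the actual pipeline in the paper is more involved):

```python
import math

def ensemble(teacher_dists):
    """Average several probability distributions over the same vocabulary."""
    vocab = teacher_dists[0].keys()
    n = len(teacher_dists)
    return {t: sum(d[t] for d in teacher_dists) / n for t in vocab}

def kl_divergence(p, q):
    """KL(p || q); a standard distillation objective for the student."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

teachers = [{"yes": 0.8, "no": 0.2}, {"yes": 0.6, "no": 0.4}]
target = ensemble(teachers)  # pooled teacher distribution: {"yes": 0.7, "no": 0.3}

student_before = {"yes": 0.5, "no": 0.5}
# One illustrative "update": move the student halfway toward the pooled target.
student_after = {t: 0.5 * student_before[t] + 0.5 * target[t] for t in target}

gap_before = kl_divergence(target, student_before)
gap_after = kl_divergence(target, student_after)  # the KL gap shrinks
```

The efficiency claim follows from this shape: once the student matches the pooled distribution well enough, serving it is far cheaper than querying the whole collection of teachers.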
@JunlinWang3
Junlin Wang
5 months
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
0
3
18
@michaelzluo
Michael Luo
5 months
🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and training DeepSWE—a SOTA software engineering agent trained end-to-end with @deepseek_ai-like RL on Qwen3-32B, hitting 59% on SWE-Bench-Verified and topping the
@Agentica_
Agentica Project
5 months
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE
11
13
98
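Numbers like DeepSWE's "42.2% Pass@1" come from the standard unbiased pass@k estimator: given n samples per task of which c pass, pass@k = 1 − C(n−c, k)/C(n, k). A sketch of that metric (the n, c values below are illustrative, not DeepSWE's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k sampled solutions passes)."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws without a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(n=10, c=4, k=1)  # reduces to the plain pass rate c/n when k = 1
p5 = pass_at_k(n=10, c=4, k=5)
```

The gap between pass@1 and larger k is exactly what test-time scaling exploits: sampling more candidates and verifying them lifts the 42.2% Pass@1 toward the 59% headline number.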