Shang Zhu

@ShangZhu18

242 Followers · 478 Following · 4 Media · 139 Statuses

AI Researcher at Together AI @togethercompute | alumni of @UMich @CMUEngineering and Xi'an Jiao-Tong University, China. Opinions are my own.

San Francisco Bay Area
Joined July 2018
@ShangZhu18
Shang Zhu
5 months
Check out CLOUD: our most recent effort on scaling crystal foundation models, with @changwen_xu98 @venkvis! This is @changwen_xu98's very first work during his PhD at @umichme and it's my pleasure working with him! #AI4Science #FoundationModels #Materials #DifferentiablePhysics
@changwen_xu98
Changwen Xu
5 months
1/n📢 New preprint: CLOUD — scalable & physics-informed foundation model for crystals! - Pretrained on 6M structures - Symmetry-consistent strings (SCOPE) - Integrates Debye model for thermodynamic consistency 🔥 👉 https://t.co/SlOeSrVsKO #AI4Science #materials #FoundationModels
0
0
3
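For background on the tweet's Debye-model mention: the Debye model gives a solid's heat capacity as C_v = 9 N k_B (T/Θ_D)³ ∫₀^{Θ_D/T} x⁴eˣ/(eˣ−1)² dx, which approaches the Dulong–Petit limit 3 N k_B at high temperature. This is a sketch of that standard physics formula only, not of how CLOUD integrates it; the temperatures and Debye temperature below are illustrative numbers.

```python
import math

def debye_cv_over_NkB(T: float, theta_D: float, steps: int = 2000) -> float:
    """Debye heat capacity in units of N*kB, via midpoint-rule integration."""
    upper = theta_D / T
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint rule avoids the x = 0 endpoint
        ex = math.exp(x)
        total += x ** 4 * ex / (ex - 1.0) ** 2 * h
    return 9.0 * (T / theta_D) ** 3 * total

# Illustrative: well above / well below a hypothetical Debye temperature of 300 K.
high_T = debye_cv_over_NkB(T=3000.0, theta_D=300.0)  # approaches 3 (Dulong–Petit)
low_T = debye_cv_over_NkB(T=30.0, theta_D=300.0)     # suppressed at low T
```

At low temperature the result follows the familiar T³ law, which is one reason a thermodynamically consistent model is a useful physics prior.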
@JonSaadFalcon
Jon Saad-Falcon
21 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
48
136
430
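The tweet defines intelligence per watt (IPW) as a measure of local-AI viability. A minimal sketch of that ratio, with made-up capability scores and wattages (not figures from the paper):

```python
def intelligence_per_watt(task_score: float, avg_watts: float) -> float:
    """Capability per unit power; higher means more efficient intelligence."""
    if avg_watts <= 0:
        raise ValueError("power draw must be positive")
    return task_score / avg_watts

# Hypothetical comparison: a local model on a laptop vs. a datacenter model.
local = intelligence_per_watt(task_score=62.0, avg_watts=45.0)    # laptop chip
remote = intelligence_per_watt(task_score=85.0, avg_watts=700.0)  # datacenter GPU
```

Under these toy numbers the less capable local model wins on efficiency, which is the trade-off the thread is pointing at.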
@rohanpaul_ai
Rohan Paul
1 month
The paper shows reasoning models often ignore user instructions while thinking, even when final answers look fine. A reasoning trace is the hidden text the model writes before the final answer. The authors build ReasonIF, a test that pairs normal questions with simple rules
4
6
20
@james_y_zou
James Zou
1 month
How controllable is an LLM's reasoning process? We found that GPT OSS, Qwen3 and others fail >75% of the time to follow simple reasoning instructions like using a specific language, format or watermark⚠️ This makes reasoning LMs vulnerable to reward hacking, scheming + hard to steer🛑
2
4
19
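ReasonIF's actual protocol is defined in the paper; as a toy illustration of one instruction type the thread mentions, here is a hypothetical checker for "include a given watermark string in your reasoning trace":

```python
def follows_watermark_instruction(reasoning_trace: str, watermark: str) -> bool:
    """True iff the hidden reasoning text contains the required watermark."""
    return watermark in reasoning_trace

# Hypothetical traces: one complies with the instruction, one silently drops it.
trace_ok = "Let me think. WM-1234. First, factor the number..."
trace_bad = "Let me think. First, factor the number..."

compliant = follows_watermark_instruction(trace_ok, "WM-1234")
violating = follows_watermark_instruction(trace_bad, "WM-1234")
```

The finding above is that even checks this simple fail most of the time on frontier reasoning models, because the instruction is ignored inside the trace even when the final answer is fine.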
@vipulved
Vipul Ved Prakash
1 month
A new benchmark from @togethercompute to assess how good reasoning models are at following instructions (there are significant gaps in frontier models today).
@togethercompute
Together AI
1 month
🧠Do reasoning models really follow our instructions? Together AI’s newest paper "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning", studies how well large reasoning models (LRMs) follow user instructions during reasoning. Authors: @ykwon_0407,
0
7
24
@togethercompute
Together AI
1 month
🧠Do reasoning models really follow our instructions? Together AI’s newest paper "ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning", studies how well large reasoning models (LRMs) follow user instructions during reasoning. Authors: @ykwon_0407,
2
7
18
@KaitlynZhou
Kaitlyn Zhou
1 month
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. https://t.co/YprwsthysY
1
34
84
@vipulved
Vipul Ved Prakash
3 months
.@togethercompute we are automating complex cloud workflows, like training speculative decoders for customer inference workloads, with AI agents. These workflows typically require human oversight due to diversity of tools, environments, security needs and data science processes
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
4
5
27
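The workflow being automated above is training speculative decoders. As background, the standard speculative decoding step has a cheap draft model propose a token and the target model verify it: accept a drafted token x with probability min(1, p_target(x)/p_draft(x)), else resample from the normalized residual. A sketch with toy distributions standing in for real models:

```python
import random

def speculative_step(p_draft: dict, p_target: dict, rng: random.Random) -> str:
    """Sample one token: draft proposes, target verifies (accept/reject)."""
    tokens = list(p_draft)
    drafted = rng.choices(tokens, weights=[p_draft[t] for t in tokens])[0]
    if rng.random() < min(1.0, p_target[drafted] / p_draft[drafted]):
        return drafted  # target accepts the cheap draft token
    # Rejected: resample from the normalized residual max(0, p_target - p_draft).
    residual = {t: max(0.0, p_target[t] - p_draft[t]) for t in tokens}
    z = sum(residual.values())
    return rng.choices(tokens, weights=[residual[t] / z for t in tokens])[0]

rng = random.Random(0)
p_draft = {"a": 0.7, "b": 0.3}
p_target = {"a": 0.5, "b": 0.5}
samples = [speculative_step(p_draft, p_target, rng) for _ in range(20000)]
frac_a = samples.count("a") / len(samples)  # converges to p_target["a"] = 0.5
```

The accept/reject rule guarantees the output distribution exactly matches the target model, which is why the speedup comes for free in quality terms; the hard part, per the thread, is automating the training of a good draft model per customer workload.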
@james_y_zou
James Zou
3 months
Useful design patterns + lessons on building AI agents to automate complex engineering tasks.👇
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
2
3
15
@wt_chung
Wai Tong
3 months
Check out lessons learned from our efforts in automating language model training with AI agents!
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
0
1
3
@ben_athi
Ben Athiwaratkun (@Neurips)
3 months
LLM efficiency research on steroids with agentic workflows! 🚀
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
0
2
6
@togethercompute
Together AI
3 months
Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM
4
20
68
@james_y_zou
James Zou
4 months
🚀Exciting news! We will award 3 Best Papers at #Agents4Science — the 1st conference where the main authors + reviewers are AI agents! Each winner will receive $10k compute credit thanks to @togethercompute. Submit your agent's paper 👉 https://t.co/iRy3tzaQkC
5
10
78
@togethercompute
Together AI
4 months
🤖OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, trained using o3 techniques. Build anything, deploy anywhere🔥
13
24
111
@james_y_zou
James Zou
5 months
🔮Exciting new benchmark testing how well AI predicts the future! Each week, we curate news + prediction markets for questions about next week. Then we have agents make forecasts. Requires advanced research + reasoning @togethercompute @huggingface 📜 https://t.co/sFsP0mG0Q4 🌐
together.ai
FutureBench is a live, leak-free benchmark of true reasoning—AI agents forecast real-world events (rates, geopolitics) before they happen.
@togethercompute
Together AI
5 months
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,
0
6
33
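FutureBench's exact scoring is defined by its authors; a common way to grade probabilistic forecasts of real-world events, shown here as an illustrative stand-in, is the Brier score (mean squared error between the forecast probability and the 0/1 outcome):

```python
def brier_score(forecasts, outcomes):
    """Mean squared forecast error; lower is better, 0.0 is perfect."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical agent forecasts for three next-week questions, then the outcomes.
agent = brier_score([0.9, 0.2, 0.7], [1, 0, 1])
coin = brier_score([0.5, 0.5, 0.5], [1, 0, 1])  # uninformative baseline
```

Because the questions resolve only after forecasts are locked in, an eval like this is dynamic and leak-free by construction, which is the "uncontaminated" property the thread highlights.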
@clefourrier
Clémentine Fourrier 🍊 is off till Dec 2026 hiking
5 months
Can LLMs predict the future? In FutureBench, friends from @togethercompute create new questions from evolving news & markets: As time passes, we'll see which agents are the best at predicting events that have yet to happen! 🔮 Also cool: by design, dynamic & uncontaminated eval
2
10
36
@togethercompute
Together AI
5 months
Most AI benchmarks test the past. But real intelligence is about predicting the future. Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with @huggingface 🔍 Reasoning > memorization 📊 Real-world events 🧠 Dynamic,
5
20
91
@ZainHasan6
Zain
5 months
My new RAG course with Andrew is now out on the Coursera platform! I dive into all the nitty-gritty details that you need to start building RAG systems, from retrieval systems and hybrid search to LLMs, evals, observability and everything in between! Check it out and let me know
@AndrewYNg
Andrew Ng
5 months
Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by https://t.co/R0m408f8CA and taught by @ZainHasan6, experienced AI and ML engineer, researcher,
1
7
36
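As a toy illustration of the hybrid search idea the course covers, the sketch below blends a keyword (lexical-overlap) score with a "vector" similarity score. Real systems use BM25 and learned embeddings; the pure-Python stand-ins, document set, and `alpha` weight here are all invented for illustration:

```python
import math
from collections import Counter

DOCS = [
    "speculative decoding speeds up llm inference",
    "retrieval augmented generation grounds answers in documents",
    "evaluation and observability matter for production rag systems",
]

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / max(len(q), 1)

def vector_score(query: str, doc: str) -> float:
    """Toy 'embedding' similarity: cosine over term-count vectors."""
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, alpha: float = 0.5) -> str:
    """Return the document with the best blended lexical + vector score."""
    return max(DOCS, key=lambda doc: alpha * lexical_score(query, doc)
                                     + (1 - alpha) * vector_score(query, doc))

best = hybrid_retrieve("how do rag systems use retrieval")
```

Blending the two signals is the point: lexical match catches exact terms that embeddings blur, and vector match catches paraphrases that keywords miss.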
@ben_athi
Ben Athiwaratkun (@Neurips)
5 months
TL;DR - one way to push the quality-efficiency frontier: obtain high quality generations via a collection of LLMs -> distill to a smaller model -> get a higher quality small model that is more inference-efficient than the original collection of models. Poster session
@JunlinWang3
Junlin Wang
5 months
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
0
1
4
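The TL;DR above describes distilling a collection of LLMs into one small model. A minimal sketch of the idea, with toy next-token distributions standing in for real teachers and a single gradient-free blend standing in for a training step (the actual pipeline in the paper is more involved):

```python
import math

def ensemble(teacher_dists):
    """Average several probability distributions over the same vocabulary."""
    vocab = teacher_dists[0].keys()
    n = len(teacher_dists)
    return {t: sum(d[t] for d in teacher_dists) / n for t in vocab}

def kl_divergence(p, q):
    """KL(p || q); a standard distillation objective for the student."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

teachers = [{"yes": 0.8, "no": 0.2}, {"yes": 0.6, "no": 0.4}]
target = ensemble(teachers)  # pooled teacher distribution: {"yes": 0.7, "no": 0.3}

student_before = {"yes": 0.5, "no": 0.5}
# One illustrative "update": move the student halfway toward the pooled target.
student_after = {t: 0.5 * student_before[t] + 0.5 * target[t] for t in target}

gap_before = kl_divergence(target, student_before)
gap_after = kl_divergence(target, student_after)  # the KL gap shrinks
```

The efficiency claim follows from this shape: once the student matches the pooled distribution well enough, serving it is far cheaper than querying the whole collection of teachers.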
@JunlinWang3
Junlin Wang
5 months
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
0
3
18
@michaelzluo
Michael Luo
5 months
🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and training DeepSWE—a SOTA software engineering agent trained end-to-end with @deepseek_ai-like RL on Qwen3-32B, hitting 59% on SWE-Bench-Verified and topping the
@Agentica_
Agentica Project
5 months
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE
11
13
98
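Numbers like DeepSWE's "42.2% Pass@1" come from the standard unbiased pass@k estimator: given n samples per task of which c pass, pass@k = 1 − C(n−c, k)/C(n, k). A sketch of that metric (the n, c values below are illustrative, not DeepSWE's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k sampled solutions passes)."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws without a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(n=10, c=4, k=1)  # reduces to the plain pass rate c/n when k = 1
p5 = pass_at_k(n=10, c=4, k=5)
```

The gap between pass@1 and larger k is exactly what test-time scaling exploits: sampling more candidates and verifying them lifts the 42.2% Pass@1 toward the 59% headline number.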