
Ben Athiwaratkun (@ben_athi)
868 Followers · 2K Following · 59 Media · 375 Statuses
Leading Turbo Team @ Together AI. Prev: @awscloud, @MSFTResearch, @Cornell PhD.
Earth, the Milky Way.
Joined July 2014
Introducing: Bifurcated Attention -- Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs @ #ICML 2024. TL;DR -- we can generate hundreds of samples with the same latency as one, without approximation, for any attention-based LLM. Poster session happening on
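A minimal single-head NumPy sketch of the idea (my own illustration, not the paper's implementation): the shared prefix KV cache is stored once and scored against all samples' queries in a single GEMM, each sample keeps only its own small suffix cache, and the softmax is taken over the concatenated scores, so the output is exact, not an approximation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bifurcated_attention(q, k_prefix, v_prefix, k_suffix, v_suffix):
    """Single decode step for n samples sharing one prefix KV cache.

    q:        (n, d)      one query per sample
    k_prefix: (Lp, d)     shared prefix keys, stored ONCE for all samples
    v_prefix: (Lp, d)     shared prefix values
    k_suffix: (n, Ls, d)  per-sample suffix keys
    v_suffix: (n, Ls, d)  per-sample suffix values
    """
    d = q.shape[-1]
    # Scores against the shared prefix: one GEMM, no KV replication.
    s_pre = q @ k_prefix.T / np.sqrt(d)                        # (n, Lp)
    # Scores against each sample's own suffix.
    s_suf = np.einsum('nd,nld->nl', q, k_suffix) / np.sqrt(d)  # (n, Ls)
    # Softmax over the concatenated context, then split the weights back.
    w = softmax(np.concatenate([s_pre, s_suf], axis=1))
    w_pre, w_suf = w[:, :s_pre.shape[1]], w[:, s_pre.shape[1]:]
    return w_pre @ v_prefix + np.einsum('nl,nld->nd', w_suf, v_suffix)
```

Because the softmax spans the full (prefix + suffix) context, the result matches ordinary attention with the prefix replicated per sample; only the memory traffic changes.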
RT @togethercompute: OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, train…
Blog post on how we achieve the world's fastest inference speed on NVIDIA Blackwell -
together.ai
Together AI inference is now among the world's fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for...
If you're at ICML and interested in LLM efficiency research, come chat with us at the Together AI booth!
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We've upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs, and the results speak for themselves:
• Highest known serverless throughput: 334 tokens/sec
• Fastest time to first answer token:
TL;DR - one way to push the quality-efficiency frontier: obtain high-quality generations via a collection of LLMs -> distill them into a smaller model -> get a higher-quality small model that is more inference-efficient than the original collection of models. Poster session.
Work done during my internship at Together AI is being presented at #ICML25. Come check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
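A toy sketch of the collect-then-distill step described above (the callables are stand-ins of my own for the teacher LLMs and the quality scorer; the real pipeline's components differ):

```python
def build_distillation_set(prompts, teachers, scorer):
    """Collect generations from a pool of teacher models, keep the best
    one per prompt, and return (prompt, generation) pairs that a smaller
    student model can be fine-tuned on.

    teachers: list of callables prompt -> generation (stand-ins for LLMs)
    scorer:   callable (prompt, generation) -> quality score
              (stand-in for a reward model or verifier)
    """
    dataset = []
    for prompt in prompts:
        generations = [teach(prompt) for teach in teachers]
        # Keep only the highest-quality generation across the collection.
        best = max(generations, key=lambda g: scorer(prompt, g))
        dataset.append((prompt, best))
    return dataset
```

The student trained on this set then serves alone at inference time, which is where the efficiency win over querying the whole collection comes from.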
Come check out our poster on speeding up LLMs, happening now till 1:30 PM. TL;DR - we show that we can hide the latency of all-reduce operations in the tensor-parallel setting by modifying the residual architecture to overlap MLP and attention.
I'm at #ICML2025, presenting Ladder-Residual at the first poster session tomorrow morning (7/15, 11am-1:30pm). Looking forward to seeing you at West Exhibition Hall B2-B3, #W-1000!
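A toy sketch of the overlap idea, using a Python thread as a stand-in for an asynchronous tensor-parallel all-reduce (all names and shapes here are illustrative, not the paper's implementation): the communication for the attention output is launched first, the MLP runs on an earlier residual while that communication is in flight, and only then do we wait on the result.

```python
import threading

def all_reduce(shards, out):
    # Simulated all-reduce: sum matching elements across "device" shards.
    out[0] = [sum(vals) for vals in zip(*shards)]

def mlp(x):
    # Toy MLP block: elementwise transform standing in for real compute.
    return [2 * v + 1 for v in x]

def ladder_residual_step(attn_shards, mlp_input):
    """Hide all-reduce latency behind MLP compute.

    The residual stream is rerouted so the MLP consumes an earlier
    residual (mlp_input) instead of waiting for the reduced attention
    output, letting communication and compute overlap.
    """
    result = [None]
    comm = threading.Thread(target=all_reduce, args=(attn_shards, result))
    comm.start()              # all-reduce kicked off...
    mlp_out = mlp(mlp_input)  # ...while the MLP runs concurrently
    comm.join()               # wait for the reduced attention output
    attn_out = result[0]
    # Merge both branches back into the residual stream.
    return [a + m for a, m in zip(attn_out, mlp_out)]
```

In a real model the thread would be an async collective (e.g. a non-blocking all-reduce handle) and the payoff is that communication time disappears behind the MLP GEMMs.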
RT @togethercompute: Announcing DeepSWE: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-3…
RT @togethercompute: FLUX.1 Kontext [dev] just landed on Together AI. First open-weight model w/ proprietary-level image editing. Per…
RT @JonSaadFalcon: How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? Introduci…
RT @vipulved: .@togethercompute is building 2 gigawatts of AI factories (~100,000 GPUs) in the EU over the next 4 years, with the first phas…
RT @togethercompute: New research: YAQA - Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit). Led by @tsengalb99,…
RT @togethercompute: New blog post on how we can attain large speedups for our inference customers using custom speculators! Key benef…
RT @LindaHe49140661: Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences i…
RT @JunlinWang3: Excited to share work from my @togethercompute internship: a deep dive into inference-time scaling methods. We rigorously…
Deep Research with open source models (+ open recipe).
Introducing Open Deep Research! A fully open-source Deep Research tool that:
• writes comprehensive reports
• does multi-hop search and reasoning
• generates cover images & podcasts!
We're releasing everything: evaluation dataset, code, and blog. Example output report:
RT @AlpayAriyak: Excited to present our project in collaboration with Agentica: 14B LLM trained with Code RL that reaches OpenAI's o3-mini…
RT @togethercompute: Announcing DeepCoder-14B - an o1 & o3-mini level coding reasoning model, fully open-sourced! We're releasing everythin…