Ben Athiwaratkun @ICML

@ben_athi

Followers
863
Following
2K
Media
59
Statuses
374

Leading Turbo Team @ Together AI. prev: @awscloud @MSFTResearch, @Cornell PhD.

Earth, the Milky Way.
Joined July 2014
@ben_athi
Ben Athiwaratkun @ICML
1 year
Introducing: Bifurcated Attention -- Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs @ #icml 2024. TL;DR -- We can generate hundreds of samples with the same latency as one, without approximation, for any attention-based LLM. Poster session happening on
2
11
53
@ben_athi
Ben Athiwaratkun @ICML
4 hours
Blog post on how we achieve the world’s fastest inference speed on NVIDIA Blackwell -
0
0
0
@ben_athi
Ben Athiwaratkun @ICML
4 hours
If you’re at ICML and interested in LLM efficiency research, come chat with us at the Together AI booth!
@togethercompute
Together AI
5 hours
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs, and the results speak for themselves: 📈 Highest known serverless throughput: 334 tokens/sec. 🏃‍ Fastest time to first answer token:
1
0
2
@ben_athi
Ben Athiwaratkun @ICML
2 days
TL;DR - one way to push the quality-efficiency frontier: obtain high-quality generations from a collection of LLMs -> distill them into a smaller model -> get a higher-quality small model that is more inference-efficient than the original collection of models. Poster session.
@JunlinWang3
Junlin Wang
2 days
Work done during my internship at Together AI is being presented at #icml25. Come and check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
0
1
4
@ben_athi
Ben Athiwaratkun @ICML
2 days
Come check out our poster on speeding up LLMs, happening now until 1:30 pm. TL;DR: we show that we can hide the latency of all-reduce operations in the tensor-parallel setting by modifying the residual architecture to overlap MLP and attention.
@zhang_muru
Muru Zhang @ ICML
3 days
I'm at #ICML2025, presenting Ladder-Residual at the first poster session tomorrow morning (7/15 11am-1:30pm). Looking forward to seeing you at West Exhibition Hall B2-B3 #W-1000!
0
0
6
@ben_athi
Ben Athiwaratkun @ICML
15 days
RT @togethercompute: Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-3…
0
78
0
@ben_athi
Ben Athiwaratkun @ICML
21 days
RT @togethercompute: 🔓⚡ FLUX.1 Kontext [dev] just landed on Together AI. First open-weight model w/ proprietary-level image editing: 🎨 Per…
0
6
0
@ben_athi
Ben Athiwaratkun @ICML
22 days
Open Deep Research app + fully open recipe ☺️
@togethercompute
Together AI
22 days
Introducing the Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Free & fully open source. We’re releasing everything: evaluation dataset, code, app, and blog. 🔥
0
3
11
@ben_athi
Ben Athiwaratkun @ICML
23 days
RT @JonSaadFalcon: How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introduci…
0
61
0
@ben_athi
Ben Athiwaratkun @ICML
23 days
RT @nutlope: Our open deep research app is launching in 24 hours! Generate reports about any topic using OSS LLMs. 100% free & open sourc…
0
38
0
@ben_athi
Ben Athiwaratkun @ICML
1 month
RT @vipulved: .@togethercompute is building 2 gigawatts of AI factories (~100,000 GPUs) in the EU over the next 4 years with the first phas…
0
18
0
@ben_athi
Ben Athiwaratkun @ICML
1 month
RT @togethercompute: 🚀 New research: YAQA, Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit 🥭). Led by @tsengalb99,…
0
5
0
@ben_athi
Ben Athiwaratkun @ICML
2 months
RT @togethercompute: 🔔 New blog post on how we can attain large speedups for our inference customers using custom speculators! 🚀 Key benef…
0
5
0
@ben_athi
Ben Athiwaratkun @ICML
3 months
If you're at ICLR and passionate about optimizing language models for speed and efficiency, swing by the Together AI booth for a chat.
1
32
73
@ben_athi
Ben Athiwaratkun @ICML
3 months
RT @LindaHe49140661: Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences i…
0
46
0
@ben_athi
Ben Athiwaratkun @ICML
3 months
RT @JunlinWang3: Excited to share work from my @togethercompute internship: a deep dive into inference‑time scaling methods 🧠. We rigorously…
0
53
0
@ben_athi
Ben Athiwaratkun @ICML
3 months
Deep Research with open source models (+ open recipe).
@togethercompute
Together AI
3 months
Introducing Open Deep Research! A fully open-source Deep Research tool that:
• writes comprehensive reports
• does multi-hop search and reasoning
• generates cover images & podcasts!
We’re releasing everything: evaluation dataset, code and blog. 🔥 Example output report 👇
0
1
6
@ben_athi
Ben Athiwaratkun @ICML
3 months
RT @AlpayAriyak: Excited to present our project in collaboration with Agentica: 14B LLM trained with Code RL that reaches OpenAI's o3-mini…
0
27
0
@ben_athi
Ben Athiwaratkun @ICML
3 months
RT @togethercompute: Announcing DeepCoder-14B – an o1 & o3-mini level coding reasoning model fully open-sourced! We’re releasing everythin…
0
350
0
@ben_athi
Ben Athiwaratkun @ICML
3 months
RT @togethercompute: You can now run inference directly on the Llama 4 Hugging Face model page – powered by Together AI!
0
18
0