
Ben Athiwaratkun (@ben_athi)
868 Followers · 2K Following · 59 Media · 375 Statuses
Leading Turbo Team @ Together AI. Prev: @awscloud, @MSFTResearch, @Cornell PhD.
Earth, the Milky Way.
Joined July 2014
Introducing: Bifurcated Attention -- Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs @ #ICML 2024. TL;DR -- we can generate hundreds of samples with the same latency as one, without approximation, for any attention-based LLM. Poster session happening on
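A minimal single-head NumPy sketch of the idea (my own illustration, not the paper's implementation): the shared prefix KV cache is stored once and scored against all samples' queries in a single GEMM, each sample keeps only its own small suffix cache, and the softmax is taken over the concatenated scores, so the output is exact, not an approximation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bifurcated_attention(q, k_prefix, v_prefix, k_suffix, v_suffix):
    """Single decode step for n samples sharing one prefix KV cache.

    q:        (n, d)      one query per sample
    k_prefix: (Lp, d)     shared prefix keys, stored ONCE for all samples
    v_prefix: (Lp, d)     shared prefix values
    k_suffix: (n, Ls, d)  per-sample suffix keys
    v_suffix: (n, Ls, d)  per-sample suffix values
    """
    d = q.shape[-1]
    # Scores against the shared prefix: one GEMM, no KV replication.
    s_pre = q @ k_prefix.T / np.sqrt(d)                        # (n, Lp)
    # Scores against each sample's own suffix.
    s_suf = np.einsum('nd,nld->nl', q, k_suffix) / np.sqrt(d)  # (n, Ls)
    # Softmax over the concatenated context, then split the weights back.
    w = softmax(np.concatenate([s_pre, s_suf], axis=1))
    w_pre, w_suf = w[:, :s_pre.shape[1]], w[:, s_pre.shape[1]:]
    return w_pre @ v_prefix + np.einsum('nl,nld->nd', w_suf, v_suffix)
```

Because the softmax spans the full (prefix + suffix) context, the result matches ordinary attention with the prefix replicated per sample; only the memory traffic changes.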
RT @togethercompute: OpenAI's open models are here. gpt-oss models just landed on Together AI. Achieves near-parity with o4-mini, train…
Blog post on how we achieve the world's fastest inference speed on NVIDIA Blackwell -
together.ai
Together AI inference is now among the world's fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for...
If you're at ICML and interested in LLM efficiency research, come chat with us at the Together AI booth!
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We've upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs, and the results speak for themselves:
• Highest known serverless throughput: 334 tokens/sec
• Fastest time to first answer token:
TL;DR - one way to push the quality-efficiency frontier: obtain high-quality generations via a collection of LLMs -> distill them into a smaller model -> get a higher-quality small model that is more inference-efficient than the original collection of models. Poster session.
Work done during my internship at Together AI is being presented at #ICML25. Come check it out! We propose a new model alignment pipeline that harnesses collective intelligence from open-source LLMs!
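A toy sketch of the collect-then-distill step described above (the callables are stand-ins of my own for the teacher LLMs and the quality scorer; the real pipeline's components differ):

```python
def build_distillation_set(prompts, teachers, scorer):
    """Collect generations from a pool of teacher models, keep the best
    one per prompt, and return (prompt, generation) pairs that a smaller
    student model can be fine-tuned on.

    teachers: list of callables prompt -> generation (stand-ins for LLMs)
    scorer:   callable (prompt, generation) -> quality score
              (stand-in for a reward model or verifier)
    """
    dataset = []
    for prompt in prompts:
        generations = [teach(prompt) for teach in teachers]
        # Keep only the highest-quality generation across the collection.
        best = max(generations, key=lambda g: scorer(prompt, g))
        dataset.append((prompt, best))
    return dataset
```

The student trained on this set then serves alone at inference time, which is where the efficiency win over querying the whole collection comes from.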
Come check out our poster on speeding up LLMs, happening now till 1:30 PM. TL;DR - we show that we can hide the latency of all-reduce operations in the tensor-parallel setting by modifying the residual architecture to overlap MLP and attention.
I'm at #ICML2025, presenting Ladder-Residual at the first poster session tomorrow morning (7/15, 11am-1:30pm). Looking forward to seeing you at West Exhibition Hall B2-B3, #W-1000!
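A toy sketch of the overlap idea, using a Python thread as a stand-in for an asynchronous tensor-parallel all-reduce (all names and shapes here are illustrative, not the paper's implementation): the communication for the attention output is launched first, the MLP runs on an earlier residual while that communication is in flight, and only then do we wait on the result.

```python
import threading

def all_reduce(shards, out):
    # Simulated all-reduce: sum matching elements across "device" shards.
    out[0] = [sum(vals) for vals in zip(*shards)]

def mlp(x):
    # Toy MLP block: elementwise transform standing in for real compute.
    return [2 * v + 1 for v in x]

def ladder_residual_step(attn_shards, mlp_input):
    """Hide all-reduce latency behind MLP compute.

    The residual stream is rerouted so the MLP consumes an earlier
    residual (mlp_input) instead of waiting for the reduced attention
    output, letting communication and compute overlap.
    """
    result = [None]
    comm = threading.Thread(target=all_reduce, args=(attn_shards, result))
    comm.start()              # all-reduce kicked off...
    mlp_out = mlp(mlp_input)  # ...while the MLP runs concurrently
    comm.join()               # wait for the reduced attention output
    attn_out = result[0]
    # Merge both branches back into the residual stream.
    return [a + m for a, m in zip(attn_out, mlp_out)]
```

In a real model the thread would be an async collective (e.g. a non-blocking all-reduce handle) and the payoff is that communication time disappears behind the MLP GEMMs.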
RT @togethercompute: Announcing DeepSWE: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-3…
RT @togethercompute: FLUX.1 Kontext [dev] just landed on Together AI. First open-weight model w/ proprietary-level image editing. Per…
RT @JonSaadFalcon: How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? Introduci…
RT @vipulved: .@togethercompute is building 2 gigawatts of AI factories (~100,000 GPUs) in the EU over the next 4 years, with the first phas…
RT @togethercompute: New research: YAQA - Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit). Led by @tsengalb99,…
RT @togethercompute: New blog post on how we can attain large speedups for our inference customers using custom speculators! Key benef…
RT @LindaHe49140661: Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences i…
RT @JunlinWang3: Excited to share work from my @togethercompute internship: a deep dive into inference-time scaling methods. We rigorously…
Deep Research with open source models (+ open recipe).
Introducing Open Deep Research! A fully open-source Deep Research tool that:
• writes comprehensive reports
• does multi-hop search and reasoning
• generates cover images & podcasts!
We're releasing everything: evaluation dataset, code, and blog. Example output report:
RT @AlpayAriyak: Excited to present our project in collaboration with Agentica: 14B LLM trained with Code RL that reaches OpenAI's o3-mini…
RT @togethercompute: Announcing DeepCoder-14B - an o1 & o3-mini level coding reasoning model, fully open-sourced! We're releasing everythin…