Tandemn
@Tandemn_labs
Followers: 51 · Following: 9 · Media: 4 · Statuses: 12
Distributed AI infra fusing every underutilized GPU in TANDEMN into a cloud-grade, high-throughput, open-source, blazing-fast, heterogeneity-aware LLM engine
Joined June 2025
We also provided an easy way for people to use some open-source models deployed on our servers at https://t.co/hDfjtPv7yo by providing free $20 credits to all users for the hackathon! Watch the entire stream here:
We recently demoed our heterogeneous distributed inference stack at @HackMIT in a one-hour workshop, where we showcased the fully open-source stack (https://t.co/Pl6zZmAehK), which runs @vllm_project and @lmcache on heterogeneous GPUs communicating via @iroh_n0.
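For readers who have not used the underlying engine, a minimal single-node vLLM call looks roughly like the sketch below. The LMCache and iroh layers from the demo are not shown, and the model name is only a placeholder.

```python
# Minimal single-node vLLM sketch; the heterogeneous-GPU and iroh transport pieces
# from the workshop demo are not shown. The model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain KV-cache reuse in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```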
Making heterogeneous GPUs feel like one server. Fewer bottlenecks, more tokens/sec and more requests/sec. Grateful for the OSS we stand on.
Tandemn just raised $1.7M! https://t.co/AvdfepH9PV Building the missing layer between heterogeneous GPUs & huge models to end underutilization and make AI workloads effortless. Hiring cracked MLSys and networking engineers. (Collab w/ @lmcache & @n0computer soon, stay tuned.)
Two 8×H100 boxes, long-prompt/short-output bench: mean TTFT 3× faster and QPS +50% vs a replica-only baseline, SLOs intact. Tandemn's next drop: hetero scheduler + QUIC mesh to harness every stray 3090 / V100 / A100 / Intel Arc beside your H100s. If llm-d got you hyped, stay tuned.
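To make "heterogeneity-aware" concrete, here is a toy sketch of one possible policy: weight each replica by a rough relative-throughput estimate so slower cards still contribute without becoming the bottleneck. The GPU names, numbers, and function are illustrative guesses, not Tandemn's actual scheduler.

```python
# Toy heterogeneity-aware routing sketch. The relative-throughput numbers are
# illustrative guesses, not measurements, and this is not Tandemn's scheduler.
import random

RELATIVE_THROUGHPUT = {"h100": 1.00, "a100": 0.55, "v100": 0.25, "rtx3090": 0.20}

def pick_replica(replicas):
    """replicas: list of (name, gpu_kind); returns a name, sampled in proportion to GPU speed."""
    weights = [RELATIVE_THROUGHPUT.get(kind, 0.10) for _, kind in replicas]
    names = [name for name, _ in replicas]
    return random.choices(names, weights=weights, k=1)[0]

pool = [("node-a", "h100"), ("node-b", "a100"), ("node-c", "rtx3090")]
print(pick_replica(pool))  # node-a most often, but the 3090 still gets some traffic
```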
🔹 KV-cache-aware routing: send each chat to the replica that already holds its prefixes, so you skip redundant prefill and kill p95.
🔹 Prefill/decode disaggregation: prefill is FLOP-bound, decode is DRAM-bound, so llm-d runs them on different nodes and neither GPU half-idles.
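A toy sketch of the prefix-affinity idea follows, assuming each replica advertises hashes of the prompt blocks it has cached; every name and parameter here is hypothetical, not llm-d's actual API.

```python
# Toy prefix-affinity router: send a request to the replica that already holds the
# longest matching prompt prefix in its KV cache. All names here are hypothetical.
import hashlib

def prefix_keys(prompt: str, block: int = 256):
    """Hash the prompt at fixed-size prefix boundaries, mimicking block-level prefix caching."""
    return [hashlib.sha256(prompt[: i + block].encode()).hexdigest()
            for i in range(0, len(prompt), block)]

def route(prompt, replica_caches, fallback):
    """replica_caches maps replica name -> set of cached prefix hashes; pick the best match."""
    best, best_hits = fallback, 0
    keys = prefix_keys(prompt)
    for name, cached in replica_caches.items():
        hits = sum(1 for k in keys if k in cached)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

system_prompt = "You are a helpful assistant. " * 40
caches = {"replica-1": set(prefix_keys(system_prompt)), "replica-2": set()}
print(route(system_prompt + "Summarize this doc.", caches, fallback="replica-2"))  # replica-1
```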
LLM inference still crawling? Meet llm-d, a K8s-native, @vllm_project-powered framework from Red Hat @RedHat_AI that slashes cost and tail latency with cache-aware routing + disaggregated compute. Why it matters: https://t.co/uiUrvamPyp
llm-d.ai · llm-d: Achieve SOTA Inference Performance On Any Accelerator
LLM inference is too slow, too expensive, and too hard to scale. Introducing llm-d, a Kubernetes-native distributed inference framework, to change that, using vLLM (@vllm_project), smart scheduling, and disaggregated compute. Here's how it works and how you can use it today:
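On the client side, stacks built around vLLM typically expose an OpenAI-compatible endpoint, so trying a deployment can look like the sketch below; the gateway URL, API key, and model id are placeholders, not a documented llm-d endpoint.

```python
# Client-side sketch against a vLLM-style OpenAI-compatible endpoint. The base_url,
# api_key, and model id below are placeholders, not a documented llm-d endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://llm-d-gateway.example/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Why split prefill and decode onto different nodes?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```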