Tandemn
@Tandemn_labs
Followers: 51 · Following: 9 · Media: 4 · Statuses: 12
Distributed AI infra fusing every underutilized GPU in TANDEMN into a cloud-grade, high-throughput, open-source, blazing-fast, heterogeneity-aware LLM engine
Joined June 2025
We also provided an easy way for people to use some open-source models deployed on our servers at https://t.co/hDfjtPv7yo by providing free $20 credits to all users for the hackathon! Watch the entire stream here:
We recently demoed our heterogeneous distributed inference stack at @HackMIT in a one-hour workshop, where we showcased the fully open-source stack (https://t.co/Pl6zZmAehK), which runs @vllm_project and @lmcache on heterogeneous GPUs communicating via @iroh_n0.
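For readers who have not used the underlying engine, a minimal single-node vLLM call looks roughly like the sketch below. The LMCache and iroh layers from the demo are not shown, and the model name is only a placeholder.

```python
# Minimal single-node vLLM sketch; the heterogeneous-GPU and iroh transport pieces
# from the workshop demo are not shown. The model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain KV-cache reuse in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```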
Making heterogeneous GPUs feel like one server. Fewer bottlenecks, more tokens/sec and more requests/sec. Grateful for the OSS we stand on.
Tandemn just raised $1.7M! https://t.co/AvdfepH9PV Building the missing layer between heterogeneous GPUs & huge models to end underutilization and make AI workloads effortless. Hiring cracked MLSys and networking engineers. (Collab w/ @lmcache & @n0computer soon, stay tuned.)
Two 8×H100 boxes, long-prompt/short-output bench: mean TTFT 3× faster and QPS +50% vs a replica-only baseline, SLOs intact. Tandemn's next drop: hetero scheduler + QUIC mesh to harness every stray 3090 / V100 / A100 / Intel Arc beside your H100s. If llm-d got you hyped, stay tuned.
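To make "heterogeneity-aware" concrete, here is a toy sketch of one possible policy: weight each replica by a rough relative-throughput estimate so slower cards still contribute without becoming the bottleneck. The GPU names, numbers, and function are illustrative guesses, not Tandemn's actual scheduler.

```python
# Toy heterogeneity-aware routing sketch. The relative-throughput numbers are
# illustrative guesses, not measurements, and this is not Tandemn's scheduler.
import random

RELATIVE_THROUGHPUT = {"h100": 1.00, "a100": 0.55, "v100": 0.25, "rtx3090": 0.20}

def pick_replica(replicas):
    """replicas: list of (name, gpu_kind); returns a name, sampled in proportion to GPU speed."""
    weights = [RELATIVE_THROUGHPUT.get(kind, 0.10) for _, kind in replicas]
    names = [name for name, _ in replicas]
    return random.choices(names, weights=weights, k=1)[0]

pool = [("node-a", "h100"), ("node-b", "a100"), ("node-c", "rtx3090")]
print(pick_replica(pool))  # node-a most often, but the 3090 still gets some traffic
```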
🔹 KV-cache-aware routing: send each chat to the replica that already holds its prefixes, so you skip redundant prefill and kill p95.
🔹 Prefill/decode disaggregation: prefill is FLOP-bound, decode is DRAM-bound, so llm-d runs them on different nodes and neither GPU half-idles.
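A toy sketch of the prefix-affinity idea follows, assuming each replica advertises hashes of the prompt blocks it has cached; every name and parameter here is hypothetical, not llm-d's actual API.

```python
# Toy prefix-affinity router: send a request to the replica that already holds the
# longest matching prompt prefix in its KV cache. All names here are hypothetical.
import hashlib

def prefix_keys(prompt: str, block: int = 256):
    """Hash the prompt at fixed-size prefix boundaries, mimicking block-level prefix caching."""
    return [hashlib.sha256(prompt[: i + block].encode()).hexdigest()
            for i in range(0, len(prompt), block)]

def route(prompt, replica_caches, fallback):
    """replica_caches maps replica name -> set of cached prefix hashes; pick the best match."""
    best, best_hits = fallback, 0
    keys = prefix_keys(prompt)
    for name, cached in replica_caches.items():
        hits = sum(1 for k in keys if k in cached)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

system_prompt = "You are a helpful assistant. " * 40
caches = {"replica-1": set(prefix_keys(system_prompt)), "replica-2": set()}
print(route(system_prompt + "Summarize this doc.", caches, fallback="replica-2"))  # replica-1
```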
LLM inference still crawling? Meet llm-d, a K8s-native, @vllm_project-powered framework from Red Hat @RedHat_AI that slashes cost and tail latency with cache-aware routing + disaggregated compute. Why it matters: https://t.co/uiUrvamPyp
llm-d.ai · llm-d: Achieve SOTA Inference Performance On Any Accelerator
LLM inference is too slow, too expensive, and too hard to scale. Introducing llm-d, a Kubernetes-native distributed inference framework, to change that, using vLLM (@vllm_project), smart scheduling, and disaggregated compute. Here's how it works and how you can use it today:
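On the client side, stacks built around vLLM typically expose an OpenAI-compatible endpoint, so trying a deployment can look like the sketch below; the gateway URL, API key, and model id are placeholders, not a documented llm-d endpoint.

```python
# Client-side sketch against a vLLM-style OpenAI-compatible endpoint. The base_url,
# api_key, and model id below are placeholders, not a documented llm-d endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://llm-d-gateway.example/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Why split prefill and decode onto different nodes?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```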