Tandemn

@Tandemn_labs

Followers 51 · Following 9 · Media 4 · Statuses 12

šŸš€ Distributed AI infra fusing every underutilized GPU, in TANDEMN, into a cloud-grade, high-throughput, open-source, blazing-fast, heterogeneity-aware LLM engine

Joined June 2025
@Tandemn_labs
Tandemn
3 months
We also made it easy for people to try some of the open-source models deployed on our servers at https://t.co/hDfjtPv7yo by giving every hackathon user $20 in free credits! Watch the entire stream here -
lnkd.in
0
0
0
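A minimal sketch of what calling a hosted open-source model over an OpenAI-compatible API typically looks like with the official openai Python client; the base URL, API key, and model name below are placeholders, not Tandemn's actual endpoint or credentials.

# Placeholder endpoint, key, and model - substitute the real values from the hackathon page.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-endpoint.com/v1",  # hypothetical OpenAI-compatible endpoint
    api_key="YOUR_HACKATHON_API_KEY",                # hypothetical key tied to the free credits
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open-source model
    messages=[{"role": "user", "content": "Say hello from the hackathon!"}],
)
print(response.choices[0].message.content)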
@Tandemn_labs
Tandemn
3 months
We recently demoed our heterogeneous distributed inference stack at @HackMIT in a one-hour workshop, where we showcased our fully open-source stack (https://t.co/Pl6zZmAehK), capable of running @vllm_project and @lmcache on heterogeneous GPUs that communicate via @iroh_n0.
lnkd.in
1
0
0
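As a rough illustration of what "heterogeneity-aware" can mean (this is not Tandemn's actual scheduler), here is a sketch that splits a model's layers across mismatched GPUs in proportion to each card's memory:

# Minimal sketch, not Tandemn's real algorithm: assign transformer layers to
# heterogeneous GPUs proportionally to their free memory, so a 3090 sitting
# next to an H100 still carries a useful share of the pipeline.
def partition_layers(num_layers, gpu_mem_gb):
    """Return a list of (start, end) layer ranges, one per GPU."""
    total = sum(gpu_mem_gb)
    ranges, start = [], 0
    for i, mem in enumerate(gpu_mem_gb):
        # Give the last GPU whatever remains, to avoid rounding gaps.
        count = num_layers - start if i == len(gpu_mem_gb) - 1 else round(num_layers * mem / total)
        ranges.append((start, start + count))
        start += count
    return ranges

# Example: an 80 GB H100, a 24 GB 3090, and a 16 GB V100 sharing a 32-layer model.
print(partition_layers(32, [80, 24, 16]))  # -> [(0, 21), (21, 27), (27, 32)]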
@Tandemn_labs
Tandemn
3 months
Everyone repping up Tandemn merch!
0
2
5
@Tandemn_labs
Tandemn
3 months
Come check us out at @HackMIT #inferwithtandemn #MIT
0
2
8
@Tandemn_labs
Tandemn
3 months
Batched inference ongoing!! ā›°ļøšŸ§—
2
0
6
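For context, a minimal batched-inference sketch using vLLM's offline API; the model name is just an example. vLLM batches the prompts and decodes them together on the GPU.

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")          # example small model
params = SamplingParams(temperature=0.7, max_tokens=64)

prompts = [
    "Explain KV caching in one sentence.",
    "Why does batching improve GPU utilization?",
    "What is time-to-first-token?",
]
# generate() accepts the whole list at once and returns one output per prompt.
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text.strip())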
@Tandemn_labs
Tandemn
4 months
Making heterogeneous GPUs feel like one server. Fewer bottlenecks, more tokens/sec and more requests/sec. Grateful for the OSS we stand on.
1
3
6
@Tandemn_labs
Tandemn
6 months
Tandemn just raised $1.7M! https://t.co/AvdfepH9PV Building the missing layer between heterogeneous GPUs & huge models to end underutilization and make AI workloads effortless. Hiring cracked MLsys and networking engineers. (šŸ˜‰Collab w/ @lmcache & @n0computer soon, stay tuned.)
tandemn.com
Tandemn
0
2
4
@Tandemn_labs
Tandemn
6 months
šŸ“Š Two 8ƗH100 boxes, long-prompt/short-output bench: mean TTFT 3Ɨ faster & QPS +50% vs a replica-only baseline, SLOs intact. Tandemn's next drop: a hetero scheduler + QUIC mesh to harness every stray 3090/V100/A100/Intel Arc beside your H100s. If llm-d got you hyped, stay tuned. āš”ļø
0
0
1
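Reading those numbers with made-up baseline values (the thread does not publish the absolute figures):

# Hypothetical baseline numbers, only to make the relative claims concrete.
baseline_ttft_s, baseline_qps = 1.8, 12.0    # assumed replica-only baseline
hetero_ttft_s = baseline_ttft_s / 3          # "mean TTFT 3x faster"
hetero_qps = baseline_qps * 1.5              # "QPS +50%"
print(f"TTFT: {baseline_ttft_s:.1f}s -> {hetero_ttft_s:.1f}s, "
      f"QPS: {baseline_qps:.0f} -> {hetero_qps:.0f}")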
@Tandemn_labs
Tandemn
6 months
šŸ”¹ KV-cache-aware routing—send each chat to the replica that already holds its prefixes, so you skip redundant prefill and kill p95.
šŸ”¹ Prefill ≠ Decode—prefill is FLOP-bound, decode is DRAM-bound. llm-d runs them on different nodes so neither GPU half-idles.
0
0
2
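A toy sketch of the cache-aware routing idea, not llm-d's actual scheduler: route each request to the replica whose cached prefix overlaps the incoming prompt the most, so prefill work is not repeated.

# Illustration only: replica names and cached prefixes below are invented.
def longest_common_prefix(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def pick_replica(prompt, replica_cached_prefixes):
    """replica_cached_prefixes: dict mapping replica name -> list of cached prompt prefixes."""
    def best_match(prefixes):
        return max((longest_common_prefix(prompt, p) for p in prefixes), default=0)
    return max(replica_cached_prefixes, key=lambda r: best_match(replica_cached_prefixes[r]))

caches = {
    "replica-a": ["You are a helpful assistant. Summarize:"],
    "replica-b": ["Translate the following to French:"],
}
print(pick_replica("You are a helpful assistant. Summarize: quarterly report", caches))
# -> replica-a: its cached prefix matches the new prompt, so most prefill is skipped.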
@Tandemn_labs
Tandemn
6 months
LLM inference still crawling? 🚨 Meet llm-d—a K8s-native, @vllm_project-powered framework from Red Hat @RedHat_AI that slashes cost & tail-latency with cache-aware routing + disaggregated compute. Why it matters šŸ‘‡ https://t.co/uiUrvamPyp
llm-d.ai
llm-d: Achieve SOTA Inference Performance On Any Accelerator
2
3
11
@RedHat_AI
Red Hat AI
7 months
LLM inference is too slow, too expensive, and too hard to scale. 🚨 Introducing llm-d, a Kubernetes-native distributed inference framework, to change that—using vLLM (@vllm_project), smart scheduling, and disaggregated compute. Here’s how it works—and how you can use it today:
5
88
549