Cloud Rift
@CloudRiftAI
Followers: 73 · Following: 20 · Media: 4 · Statuses: 48
Affordable AI Compute with no vendor lock-in.
Mountain View, CA
Joined March 2024
I present an LLM inference throughput benchmark for the RTX PRO 6000 WK vs. H100, H200, and L40S GPUs, based on the vllm serve and vllm bench serve benchmarking tools, to understand the cost efficiency of the RTX PRO 6000 versus previous-generation datacenter GPUs for LLM inference. Pro 6000…
1 · 2 · 2
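(Illustrative aside: the cost-efficiency question above boils down to tokens per dollar. A minimal Python sketch with made-up throughput and price numbers, not results from the benchmark:)

```python
# Hedged sketch of a cost-efficiency metric for GPU comparisons.
# All numbers below are placeholders, not benchmark results.
def tokens_per_dollar(tokens_per_second: float, price_per_hour: float) -> float:
    """Output tokens generated per dollar of GPU rental time."""
    return tokens_per_second * 3600.0 / price_per_hour

# Hypothetical example: a cheaper, slower GPU can still win on cost efficiency.
print(tokens_per_dollar(5000, 2.00))  # 9,000,000 tokens per dollar
print(tokens_per_dollar(7000, 4.00))  # 6,300,000 tokens per dollar
```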
Is more intelligence always more expensive? Not necessarily. Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.
59 · 116 · 947
I present an LLM inference throughput benchmark for RTX 4090 / RTX 5090 / PRO 6000 GPUs, based on vllm serve and the vllm bench serve client benchmarking tool. Full article on Medium: https://t.co/SG0kSQtWpH Non-Medium link: https://t.co/UrpcTqnG3a GitHub: https://t.co/c0CpFoaMtH
1 · 2 · 5
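(Illustrative aside, not code from the article: a minimal Python sketch of what the client side of such a benchmark does. It assumes a local `vllm serve` instance exposing the OpenAI-compatible API on port 8000; the model id, prompt, and concurrency level are placeholders.)

```python
# Hedged sketch: measure output-token throughput against a local
# `vllm serve` endpoint (OpenAI-compatible API assumed on port 8000).
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request() -> int:
    resp = await client.chat.completions.create(
        model="Qwen/Qwen3-Next-80B-A3B-Thinking",  # placeholder model id
        messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
        max_tokens=128,
    )
    return resp.usage.completion_tokens

async def main(concurrency: int = 32) -> None:
    start = time.perf_counter()
    counts = await asyncio.gather(*(one_request() for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    print(f"{sum(counts) / elapsed:.1f} output tokens/s at concurrency {concurrency}")

asyncio.run(main())
```

Sweeping the concurrency level is what exposes where each GPU's throughput saturates; `vllm bench serve` automates that kind of sweep and adds latency statistics.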
Want to know how multi-RTX 4090 / RTX 5090 builds compare with RTX PRO 6000? Check out our latest benchmark (spoiler: the PCIe bottleneck is real, PRO 6000 beats them all) https://t.co/8Ht25ZWUaq
0 · 1 · 0
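(For context: the PCIe bottleneck arises because tensor parallelism all-reduces activations between GPUs at every layer. PCIe 4.0 x16, the RTX 4090's link, tops out around 32 GB/s per direction, while a single PRO 6000 with 96 GB keeps the whole model on one board and skips the inter-GPU hop entirely.)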
The reservation conversion feature is deployed! Now you can convert your running instances into reserved ones and save up to 20%. https://t.co/2SvpHj8KdC
0 · 1 · 0
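(Back-of-the-envelope: a hypothetical instance billed at $2.00/hr on demand drops to $1.60/hr at the full 20% discount, about $288 saved over a 30-day month of continuous use.)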
By popular demand, a new free model has been deployed: Qwen/Qwen3-Next-80B-A3B-Thinking. Check it out at…
0 · 1 · 0
Want to get into GPU programming? Understand why Flash Attention is cool? Or learn how Deep Learning was born and how NVIDIA became the world's most prominent company? Check out our latest article, "The Evolution of GPU Programming: From Smart Pixels to the Backbone of an AI-driven World".
medium.com
From Smart Pixels to the Backbone of an AI-driven World
0 · 0 · 2
We were totally not expecting that our humble $1,000 reward would cause such a stir 🙀 The bug is indeed extremely annoying and has caused significant damage to early adopters of Blackwell hardware. Kudos to the community that promptly found the workaround and brought this…
0 · 0 · 2
DeepSeek-V3.1 is on the platform - check it out. Very powerful and very affordable! Only $0.15 / $0.50 per million input and output tokens, respectively. https://t.co/6hWviLTkF5
0 · 1 · 1
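(At those rates, a hypothetical request with 10,000 input tokens and 2,000 output tokens costs 10,000/1M × $0.15 + 2,000/1M × $0.50 = $0.0015 + $0.0010 = $0.0025, i.e. a quarter of a cent.)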
Feeling GPU poor? Check out our AI Grant program. No requirements and no strings attached! Just apply, get credits and mention our sponsorship when you finish your project.
cloudrift.ai
$100–$1,000 in GPU credits for independent AI builders. Apply to get on-demand access to RTX 4090s, RTX 5090s, and more.
0 · 1 · 4
Feeling GPU poor? Not enough RAM in your RTX 4090? Check out our trending Medium article on how to give your 4090 lots of memory for LLM inference.
medium.com
Network-Attached KV Cache for Long-Context, Multi-Turn Workloads
0 · 1 · 3
New write-up comparing LLM inference providers. Open, reproducible benchmarks across price, throughput, and features, using Llama 4 Maverick and DeepSeek. Methods and code included.
0 · 1 · 5
We've collaborated again with @CloudRiftAI, but this time we're offering "llama-3.1-70b-instruct-free" for FREE
0 · 1 · 4
We’ve onboarded a new GPU provider in the UK. You can now rent RTX 4090 and RTX 5090 with low-latency access and strong throughput, suitable for AI training, LLM inference, and 3D rendering. If you’re in the UK or need EU-adjacent capacity, we’ve got you covered. Check it out
0 · 0 · 1
btw, we deployed some new RTX PRO 6000 nodes this week. A lovely machine for many day-to-day AI tasks, thanks to 96 GB of VRAM, numerous Blackwell cores, and a very reasonable price tag. 💖
0 · 0 · 3
"Teach Yourself Computer Science" is the best resource to learn CS. 2 weeks into vibe coding and non-technical people feel the pain. "I really wish I was technical. I just don't know how to proceed." It takes ~1000hrs across 9 topics to understand CS with any depth.
64 · 340 · 3K
Doing a little online AI hackathon this Fri to Sun, Aug 15 to 17. Cash prizes for standout projects. GPU credits for all participants. Since we’re hiring, strong projects may get an interview. No guarantees, though. https://t.co/iowXQbaUdk
0 · 0 · 2
If prefixes repeat, stop recomputing them. We tested a network-attached KV cache on 4090s and wrote up what helped and what did not.
medium.com
Network-Attached KV Cache for Long-Context, Multi-Turn Workloads
0 · 5 · 5
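(Illustrative aside: a toy Python sketch of the prefix-reuse idea behind these posts. Real systems manage paged GPU or network-attached memory; the in-process dict and the `compute_kv` callback here are hypothetical stand-ins:)

```python
# Toy sketch of prefix/KV caching: key the cache on a hash of the shared
# token prefix so repeated prefixes skip the prefill recomputation.
import hashlib

kv_store: dict[str, object] = {}  # prefix hash -> cached KV tensors (opaque here)

def prefix_key(token_ids: list[int]) -> str:
    """Stable cache key derived from the token-id prefix."""
    return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

def get_or_compute_kv(token_ids: list[int], compute_kv) -> object:
    key = prefix_key(token_ids)
    if key not in kv_store:            # cache miss: pay the prefill cost once
        kv_store[key] = compute_kv(token_ids)
    return kv_store[key]               # cache hit: reuse across turns/users

# Usage: two turns sharing the same system-prompt prefix hit the cache.
kv = get_or_compute_kv([1, 2, 3], compute_kv=lambda ids: f"KV({len(ids)} tokens)")
kv_again = get_or_compute_kv([1, 2, 3], compute_kv=lambda ids: f"KV({len(ids)} tokens)")
assert kv is kv_again  # second call reused the cached entry
```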