kourosh hakhamaneshi

@CyrusHakha

Followers 1K
Following 2K
Media 42
Statuses 817

LLMs + Ray @anyscalecompute 💻 prev PhD, EECS, @UCBerkeley 👨‍🎓

California, USA
Joined September 2010
@CyrusHakha
kourosh hakhamaneshi
2 years
🚀 Exploring Llama-2's Quality: Can we replace generalist GPT-4 endpoints with specialized OSS models? Dive deep with our technical blogpost to understand the nuances and insights of fine-tuning OSS models. 🔗 https://t.co/zVStDCoG4y 🧵 Thread 1/N 👇
anyscale.com
We examine the Llama-2 models under 3 real-world use cases and show that fine-tuning yields significant accuracy improvements.
16
116
527
@CyrusHakha
kourosh hakhamaneshi
4 days
It has always been insightful to talk with Ray developers about how they are solving their infrastructure problems with Ray. Last meetup of the year happening today:
@raydistributed
ray
6 days
Join us for the final Ray Meetup of the year, where we will deep dive with technical talks on core advancements in Ray, as well as discuss what's coming in 2026. 🎉 Ray Meetup: A Year of Distributed Systems Innovation (End-of-Year Celebration) 🗓️ December 18 ⏱️ 5:30 - 7:30 PM 📍
1
1
3
@vllm_project
vLLM
5 days
vLLM delivers even more inference performance with the same GPU platform. In just 1 month, we've worked with NVIDIA to increase @nvidia Blackwell maximum throughput per GPU by up to 33% -- significantly reducing cost per token -- while also enabling even higher peak speed for
9
40
325
@PyTorch
PyTorch
6 days
Watch @richliaw (@anyscalecompute) explain why Ray joined PyTorch Foundation, citing the ecosystem forming around PyTorch, DeepSpeed, and vLLM, and what this move signals about Ray's role in the AI infrastructure stack. 🔗 https://t.co/1cKtUtDmvm #PyTorch #Ray #AIInfrastructure
0
4
53
@seiji_________
Seiji Eicher
4 days
vLLM + Ray Serve LLM APIs = 💘! It was an honor to collaborate with the vLLM team to put this together.
@vllm_project
vLLM
4 days
Scaling MoE inference is often communication + KV-cache bound: once you push expert parallelism, decode can become dominated by collectives and imbalance, and prefill stragglers can stall an entire EP group. New community benchmark results for vLLM wide-EP on multi-node H200
0
3
8
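A rough way to see why decode goes communication-bound under expert parallelism: each decode step, every token's activation is dispatched to its top-k experts and the results are combined back, so all-to-all volume scales with tokens × hidden size × top-k × 2 per MoE layer. A back-of-envelope sketch in Python; every parameter below is an illustrative assumption, not a benchmark number:

    # Per-step all-to-all volume for MoE decode under expert parallelism.
    # All values are illustrative assumptions (roughly DeepSeek-V3-like).
    hidden_size = 7168       # model hidden dim
    top_k = 8                # experts activated per token
    bytes_per_elem = 2       # bf16 activations
    batch_tokens = 256       # concurrent decode tokens across the EP group

    # Dispatch to experts + combine results back: two all-to-all passes.
    bytes_per_step = batch_tokens * hidden_size * top_k * bytes_per_elem * 2
    print(f"all-to-all per decode step, per MoE layer: {bytes_per_step / 1e6:.1f} MB")

    # At an assumed 50 GB/s of effective per-GPU interconnect bandwidth:
    effective_bw = 50e9
    print(f"comm time per step, per layer: {bytes_per_step / effective_bw * 1e3:.2f} ms")

Repeated across every MoE layer, that communication (plus any straggling expert) can easily outweigh the decode FLOPs, which is the imbalance the benchmark above is probing.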
@pcmoritz
Philipp Moritz
14 days
We are happy to announce SkyRL tx 0.2; see our blog post https://t.co/bwn5kBtCf8. It comes with lots of performance improvements: all parts of the execution now use JAX jit, so there is very little overhead. Now is probably the best time to try it out if you haven't already 🧸
2
8
41
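On the jit point: compiling the whole step with jax.jit traces it once into a single XLA executable, so later calls skip nearly all Python-level dispatch. A minimal generic sketch (not SkyRL tx code):

    import jax
    import jax.numpy as jnp

    def loss(params, x):
        # Toy linear model standing in for a real training step.
        w, b = params
        return jnp.mean((x @ w + b) ** 2)

    # jax.jit compiles on first call; subsequent calls reuse the executable.
    step = jax.jit(jax.grad(loss))

    params = (jnp.ones((64, 64)), jnp.zeros((64,)))
    x = jnp.ones((8, 64))
    grads = step(params, x)   # first call: trace + compile
    grads = step(params, x)   # later calls: almost no framework overhead
    jax.tree_util.tree_map(lambda g: g.block_until_ready(), grads)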
@NovaSkyAI
NovaSky
14 days
We recently released SkyRL-Train v0.3.0! Highlights include:
- Experimental support for Pipeline-RL style Async-RL
- Updated E2E Recipes page with Math, Search, SQL runs
- Migration from mbridge -> Megatron-Bridge
- 14 new OSS contributors!
(1/n) 🧵
2
6
28
@CyrusHakha
kourosh hakhamaneshi
16 days
The team cooks 🔥 Iteration velocity on RL is key to achieving good results. SkyRL is built to modularize RL on LLMs so that researchers can focus on improving the model quality.
@charlie_ruan
Charlie Ruan
16 days
Announcing OpenThoughts-Agent with an incredible team: a data-centric effort on TerminalBench-style tasks, built with SkyRL+Harbor 💻🤖 Co-leading the RL team over the past month has been a blast, and we're just getting started! (1/n) 🧵
0
1
8
@NeginRaoof_
Negin Raoof
16 days
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
6
14
45
@CyrusHakha
kourosh hakhamaneshi
21 days
We are starting a recurring office-hours session for our LLM APIs on Ray + vLLM. This week we'll have a preset agenda: a wide-EP demo for online inference, and distributing batched embedding computation using Ray Data. Stop by if you are curious about these topics.
@seiji_________
Seiji Eicher
21 days
Ray Serve/Data LLM office hours tomorrow 12/2, 9:30-10:30a PT. Come through to chat distributed LLM inference 🚀 @nikhil_r_ghosh giving away free alpha on batch embeddings workloads; I'll demo the new wide-EP and disaggregated serving APIs for Ray Serve
0
1
5
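For context, the batched-embedding pattern in question is roughly the standard Ray Data one: push records through map_batches with a callable class so the model loads once per actor and runs on vectorized batches. A minimal sketch; the Embedder below is a hash-based stand-in for a real embedding model, and the sizes are placeholders:

    import numpy as np
    import ray

    class Embedder:
        def __init__(self):
            # In practice, load a real embedding model here (once per actor).
            self.dim = 16  # placeholder embedding size

        def __call__(self, batch: dict) -> dict:
            # Stand-in "embeddings": deterministic random vectors per text.
            batch["embedding"] = np.stack([
                np.random.default_rng(abs(hash(t)) % 2**32).random(self.dim)
                for t in batch["item"]
            ])
            return batch

    ds = ray.data.from_items([f"document {i}" for i in range(1000)])
    # A callable class runs in an actor pool, amortizing model init cost.
    embedded = ds.map_batches(Embedder, batch_size=128, concurrency=2)
    print(embedded.take(1))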
@anyscalecompute
Anyscale
22 days
🚀 Join Anyscale at #NeurIPS2025 in San Diego. We'll be gathering a group of researchers, founders, & engineers over food and drinks. We'll be discussing Ray and the frontier of large-scale RL, multimodal model training, and multi-node LLM inference. 📅 Thursday, December 4 ·
luma.com
You're invited to the Anyscale Happy Hour at NeurIPS! Join us for an evening hosted by Anyscale co-founder Robert Nishihara, a relaxed, high-energy gathering…
0
3
13
@CyrusHakha
kourosh hakhamaneshi
24 days
Wise words. Just adding to this: I also think the training infra cost will still be severely dominated by inference cost (rather than pure training), driven by 1) data curation and synthesis and 2) RL rollouts. So it's still inference infrastructure that is dominating the foundation.
@AndrewYNg
Andrew Ng
24 days
Is there an AI bubble? With the massive number of dollars going into AI infrastructure such as OpenAI's $1.4 trillion plan and Nvidia briefly reaching a $5 trillion market cap, many have asked if speculation and hype have driven the values of AI investments above sustainable
1
0
2
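A back-of-envelope for that claim, with entirely hypothetical numbers: generation costs roughly 2N FLOPs per token versus about 6N per trained token, but RL rollouts usually generate more tokens than each update trains on, and decode runs at much lower hardware utilization than training:

    # Hypothetical RL fine-tuning step; every number is an assumption.
    gen_tokens = 8_000_000      # tokens generated by rollouts this step
    trained_tokens = 4_000_000  # tokens used in the gradient update
    mfu_generate = 0.10         # decode is memory-bound: low utilization
    mfu_train = 0.40            # training utilization

    # Relative GPU-time (peak FLOP/s and model size cancel in the ratio).
    gen_cost = 2 * gen_tokens / mfu_generate
    train_cost = 6 * trained_tokens / mfu_train
    print(f"inference / training GPU-time: {gen_cost / train_cost:.1f}x")

Under these assumptions inference takes roughly 2.7x the GPU-time of the update itself, before counting any data curation or synthesis.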
@ArtificialAnlys
Artificial Analysis
26 days
Google TPU v6e vs AMD MI300X vs NVIDIA H100/B200: Artificial Analysis' Hardware Benchmarking shows NVIDIA achieving a ~5x tokens-per-dollar advantage over TPU v6e (Trillium), and a ~2x advantage over MI300X, in our key inference cost metric. In our metric for inference cost
66
189
1K
@shiyi_c98
Shiyi Cao
26 days
1/n 🚀 Introducing SkyRL-Agent, a framework for efficient RL agent training.
⚡ 1.55× faster async rollout dispatch
🛠 Lightweight tool + task integration
🔄 Backend-agnostic (SkyRL-train / VeRL / Tinker)
🏆 Used to train SA-SWE-32B, improving Qwen3-32B from 24.4% → 39.4%
5
60
274
@CyrusHakha
kourosh hakhamaneshi
26 days
We've been constantly asked how to do DeepSeek-style deployments with Ray Serve. Ideas like prefill/decode disaggregation, wide-EP, and custom request routing for prefill/decode require a fair amount of non-trivial work in the orchestration layer. In Ray 2.52
@seiji_________
Seiji Eicher
26 days
Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model inference efficiency, but often require non-trivial orchestration logic. Here's how they
0
0
1
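For reference, the baseline Ray Serve LLM deployment these APIs build on looks roughly like this; the wide-EP and prefill/decode disaggregation options added in Ray 2.52 layer on top and are not shown here, and the model id and sizes below are placeholders:

    from ray import serve
    from ray.serve.llm import LLMConfig, build_openai_app

    llm_config = LLMConfig(
        model_loading_config=dict(
            model_id="my-qwen",                       # name exposed to clients
            model_source="Qwen/Qwen2.5-7B-Instruct",  # weights to load
        ),
        deployment_config=dict(
            autoscaling_config=dict(min_replicas=1, max_replicas=2),
        ),
        engine_kwargs=dict(tensor_parallel_size=1),   # passed to vLLM
    )

    # Serves an OpenAI-compatible endpoint backed by vLLM replicas.
    app = build_openai_app({"llm_configs": [llm_config]})
    serve.run(app)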
@RedHat_AI
Red Hat AI
1 month
We're open-sourcing a set of high-quality speculator models for Llamas, Qwens, and gpt-oss on Hugging Face. In real workloads, you can expect 1.5 to 2.5x speedups, and sometimes more than 4x. Here's how this fits into the bigger story for speculative decoding. A thread 🧵:
5
21
90
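Independent of any particular model release, the control flow behind those speedups: a small draft model proposes k tokens, the target model verifies them, and the longest agreeing prefix is kept, so accepted drafts amortize the big model's cost. A toy greedy-acceptance sketch with stand-in models (real systems verify all k positions in one batched forward pass and use probabilistic acceptance):

    from typing import Callable, List

    def speculative_step(
        draft_next: Callable[[List[int]], int],   # cheap model: next token
        target_next: Callable[[List[int]], int],  # big model: next token
        prefix: List[int],
        k: int = 4,
    ) -> List[int]:
        # 1) Draft k tokens autoregressively with the cheap model.
        drafted, ctx = [], list(prefix)
        for _ in range(k):
            t = draft_next(ctx)
            drafted.append(t)
            ctx.append(t)

        # 2) Verify: keep drafted tokens while the target agrees, and
        #    substitute the target's token at the first disagreement.
        accepted, ctx = [], list(prefix)
        for t in drafted:
            t_star = target_next(ctx)
            if t_star != t:
                accepted.append(t_star)  # target's correction
                break
            accepted.append(t)
            ctx.append(t)
        return accepted

    # Toy models: the draft agrees with the target most of the time.
    target = lambda ctx: (len(ctx) * 7) % 13
    draft = lambda ctx: (len(ctx) * 7) % 13 if len(ctx) % 5 else 0
    print(speculative_step(draft, target, prefix=[1, 2, 3]))  # -> [8, 2, 9]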
@vllm_project
vLLM
1 month
Need to customize vLLM? Don't fork it. 🔌 vLLM's plugin system lets you inject surgical modifications without maintaining a fork or monkey-patching entire modules. Blog by Dhruvil Bhatt from AWS SageMaker 👇
Why plugins > forks:
• vLLM releases every 2 weeks with 100s of PRs
6
48
396
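Mechanically, the plugin system is built on Python entry points: at startup vLLM scans the vllm.general_plugins entry-point group and calls each registered function, which is where you hook in things like out-of-tree models. A minimal sketch; the package, module, and model names here are made up:

    # my_vllm_plugin/__init__.py  (hypothetical package)
    def register():
        # Called by vLLM at startup for every function registered under
        # the "vllm.general_plugins" entry-point group.
        from vllm import ModelRegistry
        from my_vllm_plugin.my_model import MyModelForCausalLM  # hypothetical
        ModelRegistry.register_model("MyModelForCausalLM", MyModelForCausalLM)

    # pyproject.toml snippet that makes vLLM discover the plugin:
    # [project.entry-points."vllm.general_plugins"]
    # register_my_model = "my_vllm_plugin:register"

Because the hook runs inside every vLLM process, the customization survives version bumps without carrying a fork.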
@raydistributed
ray
2 months
New Anyscale releases announced at Ray Summit, from Developer Central to Anyscale Runtime to Cluster Controller. Read the roll-up blog: https://t.co/iMdQWtp5w5
0
1
6