Praveen Gorthy Profile
Praveen Gorthy

@praveengorthy

Followers
129
Following
2K
Media
9
Statuses
344

https://t.co/GQkA4KVcRA Engineer by profession. Here for Tech, Programming, Sports, Finance. Strong beliefs, weakly held. In pursuit of going to bed smarter every day.

Joined May 2010
@cricketingview
cricketingview
6 days
Bumrah is the larger grey dot. And he's managed this in an era where T20 hitting has undergone a revolution...
16
220
1K
@CyrusHakha
kourosh hakhamaneshi
1 year
Announcing native LLM APIs in @raydistributed Ray Data and Ray Serve Libraries. These are experimental APIs we are announcing today that abstract two things: 1. Serve LLM: simplifies the deployment of LLM engines (e.g. vLLM) through ray serve APIs. Enables things like
anyscale.com
Try the new LLM APIs available on Ray Data and Ray Serve. It's now easier than ever to use Ray for offline LLM batch inference and online LLM inference.
0
7
36
@anyscalecompute
Anyscale
1 year
Scaling AI is hard. Anyscale and Google Cloud make it easier. Read how @anyscalecompute – built by the creators of @raydistributed – runs on @Google Compute Engine to help teams scale any AI workload, from LLMs to classic ML. πŸ‘‡
cloud.google.com
Without a unified and optimized infrastructure, complexity quickly spirals into excessive cloud spending, resource inefficiencies, and productivity bottlenecks. Enter Ray, the AI compute engine.
0
3
13
@anyscalecompute
Anyscale
1 year
Python dependency management has been a longstanding challenge facing AI teams. The uv package manager, built by the team at @astral_sh, goes a long long way toward putting that problem to rest, at least for code running on a single machine. The challenge is even bigger in the
2
7
43
@fleetwood___
Fleetwood
1 year
Understanding GPU bottlenecks is easy with a visualisation πŸ‘¨πŸ»β€πŸ³
25
387
3K
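The visualisation in the tweet above is, in essence, the roofline model: a kernel's arithmetic intensity (FLOPs per byte moved) determines whether it is limited by compute or by memory bandwidth. A minimal sketch, with assumed, illustrative accelerator peaks (not measured figures):

```python
def roofline(flops, bytes_moved, peak_flops, peak_bw):
    """Classify a kernel as compute- or memory-bound (roofline model)."""
    intensity = flops / bytes_moved               # FLOPs per byte moved
    attainable = min(peak_flops, intensity * peak_bw)
    kind = "compute" if attainable == peak_flops else "memory"
    return kind, attainable

# Illustrative A100-class peaks (assumptions for this sketch):
PEAK_FLOPS = 312e12   # tensor-core peak, FLOP/s
PEAK_BW = 2.0e12      # HBM bandwidth, bytes/s

# An fp32 elementwise add does ~1 FLOP per 12 bytes -> memory-bound;
# a large GEMM can exceed 200 FLOPs/byte -> compute-bound.
```

The same two inputs (FLOPs and bytes) explain most GPU bottleneck pictures: anything left of the roofline's knee is starved for bandwidth, not compute.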
@anyscalecompute
Anyscale
1 year
Anyscale is expanding to India! We're opening our first international office. Come work with us to get this office off the ground (DM @jaikumarharikoa).
0
1
17
@robertnishihara
Robert Nishihara
1 year
Just sat down to read the DeepSeek-R1 paper. We're entering an era where compute isn't primarily for training. It's for creating better data. I expect to see the money & compute spent on data processing (generation / annotation / curation) grow to match and exceed the money &
31
157
978
New to Ray Train? The @Anyscalecompute team just shared an amazing presentation at #RaySummit2024, unveiling the fully built Ray Train Dashboard 🀩 With detailed insights into resource utilization, training throughput, and even profiling tools to debug bottlenecks, it’s built to
0
3
4
@benxneo
benedict neo
1 year
OpenAI, Uber, and Netflix all use Ray to scale their AI workflows. From distributed data preprocessing to LLM serving, Ray does it all. I wrote about what Ray is and why it matters in the age of LLMs link in the comments πŸ‘‡
1
8
40
@robertnishihara
Robert Nishihara
2 years
Here is the chain of thought πŸ€” 1⃣ Many companies have a lot of data. 2⃣ The point of having this data is to use it to get insights and make decisions. 3⃣ Today, the primary way that companies do that is through data analytics. Running SQL queries and simple analytics. 4⃣ In the
@robertnishihara
Robert Nishihara
2 years
Most (all?) LLM performance benchmarks like @ArtificialAnlys go in depth on *online* inference. *Batch* inference seems simpler since almost all companies run some form of embarrassingly parallel workloads. But batch inference is different from other map-reduce style workloads.
2
7
29
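The "embarrassingly parallel" part of batch inference that the tweet above mentions can be sketched as a plain parallel map. Here `fake_llm`, the worker count, and the thread pool are all stand-ins; the tweet's point is precisely that a real batch-inference system needs more than this (GPU scheduling, continuous batching, variable-length outputs):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_llm(prompt):
    # Stand-in for a model call; a real system must also manage GPU
    # memory, batching, and out-of-order completion.
    return prompt.upper()

def batch_infer(prompts, workers=4):
    # Embarrassingly parallel map over prompts; order is preserved.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_llm, prompts))
```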
@anyscalecompute
Anyscale
2 years
With today’s release, vLLM 0.6.0 gives users a huge performance boost compared to 0.5.0. Anyscale is happy to have contributed batch scheduling to vLLM this release, which improved request throughput on Llama3-8b by 70%. Shout out to other contributors (@neuralmagic,
@vllm_project
vLLM
2 years
A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves πŸš€2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. https://t.co/QWTT5cyvKw
0
10
28
@robertnishihara
Robert Nishihara
2 years
In 5 of 8 recent conversations, ML platform leaders told me that their top priority over the next 6 months is to enable training on more data (e.g., an order of magnitude more). Why? Scaling laws. The idea that larger models + data + compute can lead to better results (not just
1
14
32
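The scaling-laws argument above can be made concrete with a toy power law: loss falls toward an irreducible floor as a power of training data. The constants below are illustrative only, not fitted to any model family:

```python
def power_law_loss(tokens, a=1.7, alpha=0.07, floor=1.69):
    # Toy scaling curve: irreducible floor plus a term that shrinks
    # as a power of the number of training tokens.
    return floor + a * tokens ** (-alpha)

# An order of magnitude more data still moves the needle, which is
# the motivation the tweet describes.
```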
@anyscalecompute
Anyscale
2 years
πŸ“£πŸ“£πŸ“£ Meta-Llama-3.1-405B is now available on Anyscale! Get started here: https://t.co/8dJcU4aU9M Video:
3
11
23
@robertnishihara
Robert Nishihara
2 years
Huge release from Meta! You can spin up Llama 405B on @anyscalecompute in minutes with @raydistributed and @vllm_project.
@AIatMeta
AI at Meta
2 years
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context
0
5
19
@anyscalecompute
Anyscale
2 years
We’ve recently contributed FP8 support to the @vllm_project in collaboration with @neuralmagic. With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation! 1/n
2
32
104
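A rough sketch of what dropping to an 8-bit float costs in precision: round the mantissa to 3 bits, E4M3-style, while ignoring exponent-range clipping. This illustrates the precision/accuracy trade-off behind the ">99% accuracy preservation" claim; it is not vLLM's FP8 implementation:

```python
import math

def quantize_fp8_e4m3_mantissa(x, mantissa_bits=3):
    # Round the mantissa to `mantissa_bits` fractional bits, mimicking
    # FP8 E4M3's reduced precision (exponent range clipping ignored).
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << (mantissa_bits + 1)
    return round(m * scale) / scale * 2.0 ** e
```

With 3 mantissa bits the worst-case relative error is about 1/16, which is why weight-only or activation FP8 can often keep end-task accuracy nearly intact.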
@cricbuzz
Cricbuzz
2 years
2014 - Man of the Tournament πŸ… 2016 - Man of the Tournament πŸ… 2022 - Played the greatest knock in T20 WC history 2024 - WC winner & Man of the Match in the final πŸ† @deeputalks decodes Virat Kohli's stunning T20I career in numbers here - https://t.co/pGXwllf270
6
596
4K
@cdnamz
Cade Daniel πŸ‡ΊπŸ‡Έ
2 years
Tomorrow I'll present a Hacker's Guide to Speculative Decoding in @vllm_project with a focus on enabling external contributors. Topics include proposer/scorer/verifier framework, proposal methods, lookahead scheduling, dynamic speculative decoding, and future contribution ideas.
3
13
110
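The proposer/verifier framework mentioned in the talk above can be sketched with greedy toy models. Here `target` and `draft` are hypothetical next-token functions (sequence in, token out); this shows the control flow of one speculative step, not vLLM's implementation:

```python
def speculative_step(target, draft, prefix, k=4):
    """One greedy speculative-decoding step.

    The draft proposes k tokens; the target verifies them, accepting
    the longest agreeing prefix. On the first disagreement the target's
    own token replaces the draft's; if all k match, the target emits
    one extra "bonus" token.
    """
    proposal, ctx = [], list(prefix)
    for _ in range(k):               # cheap proposer
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in proposal:               # verifier
        expected = target(ctx)
        if expected != t:
            accepted.append(expected)   # correction ends the step
            return accepted
        accepted.append(t)
        ctx.append(t)
    accepted.append(target(ctx))        # bonus token on full acceptance
    return accepted
```

When draft and target agree, each step yields up to k+1 tokens for one verification pass, which is where the speedup comes from.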
@cdnamz
Cade Daniel πŸ‡ΊπŸ‡Έ
2 years
Chunked prefill expands the Pareto frontier for fast & cheap online continuous batching. Great work in @vllm_project by engineers at @anyscalecompute .
@anyscalecompute
Anyscale
2 years
Recently, we’ve contributed chunked prefill to @vllm_project, leading to up to 2x speedup for higher QPS regimes! In vLLM, prefilling, which fills the KV cache, and decoding, which outputs new tokens, can interfere with each other, resulting in latency degradation. 1/n
1
3
21
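The prefill/decode interference described above can be captured with a toy scheduler model: one iteration runs either a whole prefill (unchunked) or one chunk plus the running decode's token (chunked). The one-chunk-per-iteration accounting is an assumption for illustration, not vLLM's actual scheduler:

```python
def max_decode_gap(prefill_len, chunk_size):
    """Worst-case work (in tokens) a running decode waits between its
    own tokens while a new request's prefill is scheduled."""
    if chunk_size >= prefill_len:
        return prefill_len        # one big iteration stalls decoding
    return chunk_size + 1         # decode rides along every iteration
```

In this model an unchunked 4096-token prefill stalls a concurrent decode for 4096 tokens' worth of work, while 512-token chunks cap the gap at 513, which is the latency-smoothing effect the thread describes.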