Vinay S Rao

@vinaysrao

Followers: 698 · Following: 220 · Media: 1 · Statuses: 39

LLM stuff at Nvidia, previously Meta, Character AI, Google Brain, Baidu, Cerebras.

Palo Alto, CA
Joined March 2010
@vinaysrao
Vinay S Rao
1 month
While at Meta, I worked on this optimizer wrapper (outer-step lookahead momentum) we're calling Snoo (https://t.co/SSZLcYNXzp). You can use it with AdamW or Muon and see really strong scaling. Here's a plot where we ran it against (tuned) AdamW at scales up to 1e23 training FLOPs.
5 · 21 · 232
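The tweet describes an outer-step lookahead-momentum wrapper around a base optimizer. As a rough illustration of that family of methods, here is a Lookahead-style sketch with momentum added to the outer step; this is not Snoo's actual algorithm, and the class name and defaults are hypothetical:

```python
import torch

class OuterLookahead:
    """Hypothetical sketch of a lookahead-style optimizer wrapper with
    momentum on the outer (slow) step. Snoo's actual update may differ."""

    def __init__(self, base_optimizer, k=6, alpha=0.5, outer_momentum=0.9):
        self.base = base_optimizer          # e.g. AdamW or Muon
        self.k = k                          # inner steps per outer step
        self.alpha = alpha                  # outer step size
        self.mu = outer_momentum            # outer momentum coefficient
        self.steps = 0
        # Snapshot slow weights and zero the outer velocity per parameter.
        self.state = {
            p: {"slow": p.detach().clone(), "vel": torch.zeros_like(p)}
            for g in self.base.param_groups for p in g["params"]
        }

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)

    @torch.no_grad()
    def step(self):
        self.base.step()                    # one inner (fast) step
        self.steps += 1
        if self.steps % self.k == 0:        # time for an outer step
            for g in self.base.param_groups:
                for p in g["params"]:
                    st = self.state[p]
                    # Momentum on the fast-minus-slow displacement.
                    st["vel"].mul_(self.mu).add_(p - st["slow"])
                    st["slow"].add_(st["vel"], alpha=self.alpha)
                    p.copy_(st["slow"])     # reset fast weights onto slow
```

Wrapping would then look like `opt = OuterLookahead(torch.optim.AdamW(model.parameters(), lr=3e-4))`, with the training loop calling `opt.zero_grad()` / `loss.backward()` / `opt.step()` as usual.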
@leloykun
leloy!
6 months
Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration
Hi all, I'm bacc. I have a lot to talk about, but let's start with this fun side-project. Here I'll talk about novel (?) ways to compute: 1. Spectral Clipping (discussed in Rohan's
@_arohan_
rohan anil
6 months
Doing some math to cleanse the timelinez. Why does the loss blow up? A question to deepthink. So an attempt: why not clip the singular values of the update?
σ > 1: clip to 1
σ ≤ 1: return σ
Naive implementation:
Update = U S V.T
Update_clipped = U clip(S, 1) V.T
How to make it
7 · 40 · 295
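The naive version from the tweet is a few lines in PyTorch (a sketch; `spectral_clip` is a hypothetical name):

```python
import torch

def spectral_clip(update: torch.Tensor, max_sv: float = 1.0) -> torch.Tensor:
    """Clip the singular values of a 2-D update matrix at max_sv:
    Update = U S V^T  ->  Update_clipped = U clip(S, max_sv) V^T."""
    U, S, Vh = torch.linalg.svd(update, full_matrices=False)
    return U @ torch.diag(S.clamp(max=max_sv)) @ Vh
```

An explicit SVD is slow and not always numerically friendly to differentiate through, which is exactly what the Newton-Schulz-based formulation in the quoted post is meant to avoid.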
@Clashluke
Lucas Nestler
6 months
Ever wonder why switching optimizers never actually helps? Introducing HeavyBall Benchmark
8 · 27 · 194
@WolframRvnwlf
Wolfram Ravenwolf
8 months
By the way, I've also re-evaluated Llama 4 Scout via the Together API. Happy to report that they've fixed whatever issues they'd had earlier, and the score jumped from 66.83% to 74.27%!
0 · 3 · 7
@AIatMeta
AI at Meta
8 months
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
834 · 2K · 13K
@Ahmad_Al_Dahle
Ahmad Al-Dahle
8 months
As of today, Llama 4 Maverick offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an ELO of 1417 on LMArena. It's wild to think Llama was a research project a couple of years ago & amazing to see how much progress we've made in the last two
26 · 63 · 638
@Ahmad_Al_Dahle
Ahmad Al-Dahle
8 months
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
320 · 915 · 6K
@aleks_madry
Aleksander Madry
9 months
GSM8K has been a cornerstone benchmark for LLMs, but performance seemed stuck around 95%. Why? Turns out, the benchmark itself was noisy. We fixed that, and found that it significantly affects evals. Introducing GSM8K-Platinum! w/@EdwardVendrow @josh_vendrow @sarameghanbeery
9 · 60 · 468
@character_ai
Character.AI
3 years
Have you ever spent hours chatting with your favorite character? 😍 Well, it turns out you're not alone! Today, users spend an average of two hours talking to their characters, and with our new C1.2 update, characters are more helpful than ever before! 🤖💬 https://t.co/bwi8aAGG2I
similarweb.com
AI chatbot from ex-Google engineers raises $150 million based on high engagement – 3 to 4 times more than other top websites
379 · 48 · 609
@character_ai
Character.AI
3 years
Announcing our Series A and our new AI model, C1.2!
376 · 35 · 454
@vinaysrao
Vinay S Rao
3 years
https://t.co/BaHlD9Vg2O is in the news (NYTimes and TechRadar). Read about it here: https://t.co/xHLe383wmX
0 · 0 · 5
@character_ai
Character.AI
3 years
Introducing Character
What if you could create your own AI, and it was always available to help you with anything? Imagine everything it could do for you… https://t.co/L1X32llikX
blog.character.ai
Character is a full stack Artificial General Intelligence (AGI) company. What if you could create your own AI, and it was always available to help you with anything? Imagine everything it could do...
217 · 93 · 574
@IrwanBello
Irwan Bello
3 years
https://t.co/vk1dPe3UaW becomes multimodal! Stay tuned, we've got quality, consistency, and personalization improvements on the way.
character.ai
Chat with millions of AI Characters on the #1 AI chat app. Where will your next adventure take you?
@character_ai
Character.AI
3 years
Introducing Image Generating Characters! Image Generating Characters generate images as you talk to them, offering a more engaging and immersive experience. Try them at https://t.co/u1NTCjis64
1 · 6 · 66
@slatestarcodex
Scott Alexander
3 years
Not quite deepfake-level yet (source: https://t.co/m0grkqa2iq)
26 · 17 · 298
@onetrueslava
Slava Butkovich
3 years
To test out the chatbot capabilities of @character_ai characters, I did what any reasonable person would do: put Joe Biden and Donald Trump together in a D&D party (moderated by a DM). What happened ended up being surprisingly touching. Well, at least some parts. A thread 🧵:
1 · 1 · 2
@vinaysrao
Vinay S Rao
3 years
Here's what I've been working on! https://t.co/brp3ICgtD4 Check out our beta - create your own characters, and talk to more! We're excited to see what you think!
@character_ai
Character.AI
3 years
We’re excited and proud to be opening up the https://t.co/Ft9b0j47zQ beta to the public! Character lets you create and talk to advanced AI (language tutors, text adventure games, celebrities, talking animals + more).
0 · 0 · 8
@arankomatsuzaki
Aran Komatsuzaki
4 years
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
By transferring from 40M parameters, µTransfer outperforms the 6.7B GPT-3, with tuning cost only 7% of total pretraining cost.
abs: https://t.co/kYiuGDiUpE
repo: https://t.co/TG4eZHErto
5 · 43 · 245
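The core trick, very roughly: tune hyperparameters on a narrow proxy model, then transfer them to the wide model while rescaling per-layer learning rates with width. Below is a toy sketch of that scaling rule under those assumptions; it is a deliberate simplification, not the official `mup` package API, and the paper's full recipe also adjusts initialization scales and treats input/output layers specially:

```python
import torch

def mup_style_adam(model, base_lr, base_width, width):
    """Toy µP-style grouping: scale the Adam LR of matrix-like (hidden)
    weights by base_width/width, so hyperparameters tuned on a narrow
    proxy model carry over to the wide model. A sketch, not mup itself."""
    mult = base_width / width
    matrix, other = [], []
    for p in model.parameters():
        (matrix if p.ndim >= 2 else other).append(p)
    return torch.optim.Adam([
        {"params": matrix, "lr": base_lr * mult},  # hidden weights: LR ~ 1/width
        {"params": other, "lr": base_lr},          # biases, norms, etc.
    ])
```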
@tomgoldsteincs
Tom Goldstein
4 years
There's no evidence that SGD plays a fundamental role in generalization. With totally deterministic full-batch gradient descent, Resnet18 still gets >95% accuracy on CIFAR10. With data augmentation, full-batch Resnet152 gets 96.76%. https://t.co/iwIqQd7U1O
29 · 166 · 858
@SuryaGanguli
Surya Ganguli
4 years
1/ Our new work: "How many degrees of freedom do we need to train deep networks: a loss landscape perspective." https://t.co/O7UKkSOO63 We present a geometric theory that connects to lottery tickets and a new method: lottery subspaces. w/ @_BrettLarsen @caenopy @stanislavfort
5 · 64 · 290
@edwardjhu
Edward Hu
4 years
GPT-3 175B is powerful but too expensive to serve many finetuned copies. We use low-rank adaptation (LoRA) to learn task modules that are 10,000x smaller and can be swapped while the main model is frozen. No extra inference latency or quality drop! Paper: https://t.co/Nz7aMrgRDj
6 · 71 · 352