Dhruv π Profile
Dhruv π

@dhruv31415

Followers
548
Following
2K
Media
64
Statuses
247

On the other side of the fence @tilderesearch

Palo Alto, CA
Joined November 2023
@dhruv31415
Dhruv π
4 months
Some exciting stuff since then!
@tilderesearch
Tilde
4 months
We’re excited to announce that Tilde completed an $8M seed round earlier this year, led by Khosla Ventures. Understanding model intelligence is the most important problem in the world, and the key to actualizing the promise that ASI can offer. 🧵 A thread on our approach:
0
0
17
@dhruv31415
Dhruv π
12 days
It'll converge to a better test loss, I promise
0
0
11
@zy27962986
Zongyu Lin
12 days
🚀Really excited to see this amazing arch change (KDA) finally coming out! Replacing global attention with linear hybrid arch: better pretraining ppls, long context evals, downstream math&code&stem evals after RL, >6 * throughput at 1M to unblock more downstream potentials to
@Kimi_Moonshot
Kimi.ai
13 days
Kimi Linear Tech Report is dropped! 🚀 https://t.co/LwNB2sQnzM Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi
1
18
55
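For readers outside the architecture weeds: one way to see the throughput claim is that linear attention carries a fixed-size recurrent state instead of a KV cache that grows with context length. A minimal numpy sketch of that recurrence (illustrative only, not the actual KDA kernels):

```python
import numpy as np

def linear_attention_step(S, k, v, q):
    """One recurrent step of (unnormalized) linear attention.

    S: (d_k, d_v) running state -- constant size, unlike a KV cache
    k, q: (d_k,) key and query for the current token
    v: (d_v,) value for the current token
    """
    S = S + np.outer(k, v)  # write: rank-1 update of the state
    out = q @ S             # read: O(d_k * d_v) work per token
    return S, out

d_k, d_v, T = 8, 8, 32
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for t in range(T):
    k, v, q = rng.normal(size=d_k), rng.normal(size=d_v), rng.normal(size=d_k)
    S, out = linear_attention_step(S, k, v, q)
# S stays (d_k, d_v) no matter how large T grows, which is why
# per-token cost does not degrade at 1M-token contexts.
```

The read is mathematically the causal sum Σₛ (q·kₛ)vₛ, i.e. softmax attention with the exponential kernel dropped; hybrid architectures interleave layers like this with a few full-attention layers.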
@RidgerZhu
Rui-Jie (Ridger) Zhu
13 days
Thrilled to release our new paper: “Scaling Latent Reasoning via Looped Language Models.” TL;DR: we scale looped language models to 2.6 billion parameters, pretrained on >7 trillion tokens. The resulting model is on par with SOTA language models of 2 to 3x the size.
20
137
627
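The core idea of a looped model can be sketched in a few lines: one shared block is applied repeatedly, so effective depth (and compute) grows while the parameter count stays fixed. A toy illustration with a hypothetical residual block, not the paper's architecture:

```python
import numpy as np

def looped_forward(x, W, n_loops):
    """Apply one shared residual block n_loops times.

    The parameter count is fixed by W alone; effective depth is
    n_loops, which is the sense in which a looped model can match
    a plain model of several times its size.
    """
    for _ in range(n_loops):
        x = x + np.tanh(x @ W)  # same weights reused at every step
    return x

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=0.1, size=(d, d))
x = rng.normal(size=d)
shallow = looped_forward(x, W, 1)
deep = looped_forward(x, W, 8)  # 8x the compute, identical parameters
```

The trade the paper's framing points at: loop count buys depth at inference-compute cost rather than memory cost.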
@dhruv31415
Dhruv π
14 days
Main innovation seems to be a row-wise forget gate. Cool!
@eliebakouch
elie
15 days
Kimi Delta Attention PR in FLA, very nice @yzhang_cs and team, i'm sooo excited for this model
2
1
36
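A rough sketch of what a row-wise forget gate on a delta-rule memory could look like: each row of the recurrent state decays at its own rate before the error-driven write. This is an illustrative guess at the mechanism, not the released KDA kernel:

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """Delta-rule memory update with a per-row forget gate.

    S:     (d_k, d_v) associative state
    k:     (d_k,) key (assumed unit-norm, as the delta rule expects)
    v:     (d_v,) value
    alpha: (d_k,) per-row decay in (0, 1] -- the 'row-wise' part;
           a scalar gate would decay every row of S identically
    beta:  scalar write strength
    """
    S = alpha[:, None] * S                   # forget: each row decays at its own rate
    pred = k @ S                             # what the memory currently returns for k
    return S + beta * np.outer(k, v - pred)  # write only the prediction error

# Writing then reading back a single association:
k = np.array([1.0, 0.0, 0.0])
v = np.array([2.0, -1.0, 0.5])
S = gated_delta_step(np.zeros((3, 3)), k, v, np.ones(3), 1.0)
# k @ S now reproduces v exactly (no decay, full write strength)
```

The per-row gate lets the model keep some memory channels long-lived while flushing others quickly, instead of one global decay rate.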
@eliebakouch
elie
15 days
Kimi Delta Attention PR in FLA, very nice @yzhang_cs and team, i'm sooo excited for this model
@eliebakouch
elie
15 days
OMG, I'M SO HYPE
4
6
85
@Jianlin_S
jianlin.su
16 days
Low-precision attention may suffer from biased rounding errors https://t.co/0hxHG3tPu2
1
14
144
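The general phenomenon is easy to demonstrate: once an accumulator outgrows the spacing of its number format, every addition rounds the same way, so the error is systematic rather than zero-mean. A tiny numpy demo of generic fp16 accumulation (not the attention-specific analysis in the linked note):

```python
import numpy as np

# fp16 has a 10-bit mantissa: above 2048 the representable spacing is 2,
# so adding 1.0 rounds down every single time -- a biased error, not noise.
acc16 = np.float16(0.0)
for _ in range(3000):
    acc16 = np.float16(acc16 + np.float16(1.0))

acc32 = np.float32(0.0)
for _ in range(3000):
    acc32 += np.float32(1.0)

print(float(acc16), float(acc32))  # 2048.0 vs 3000.0: the fp16 sum stalls
```

This is why attention implementations typically keep the softmax accumulation in fp32 even when inputs and outputs are low precision.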
@tilderesearch
Tilde
16 days
5 days left! 🎃
@tilderesearch
Tilde
29 days
Today we're very happy to announce that we’re launching the Tilde Fellowship Program to support research in a mechanistic understanding of pre-training science (arch, optimizers, learning dynamics, etc.). Much of modern ML progress has come from scaling models and empirically
2
1
27
@tinabmai
Tina Mai
19 days
i’ve been deeply obsessed with the question of how to make humans less fragile. several months ago i decided to leave Stanford to research and deploy the biological machine learning methods that can get us closer. can finally share that i’ve been on the founding team
@ValthosTech
Valthos
19 days
Valthos builds next-generation biodefense. Of all AI applications, biotechnology has the highest upside and most catastrophic downside. Heroes at the frontlines of biodefense are working every day to protect the world against the worst case. But the pace of biotech is against
63
26
443
@Aboozle
Abhay
18 days
NousCon last night was a massive success! Thank you to everyone who showed out for our biggest event of the year. The future of open source AI is incredibly bright. S/o @rosstaylor90 and @dhruv31415 for coming to speak, @poetengineer__ and @johnkarborn for their epic live art
13
14
188
@NousResearch
Nous Research
23 days
art, drinks, open source ai w.s.g. tilde research and general reasoning oct. 24th, SF, 6p
38
28
663
@dhruv31415
Dhruv π
22 days
My feed in the past few days has become dominated by random Japanese ecology accounts. Never have I been happier to open this app.
@K_theHermit
でんか@『海のあかちゃん』『ヤドカリ探索図鑑』
24 days
At Ikimonia I must have said "tide pooling is great" about a hundred times. Tide pooling is great.
0
0
4
@masonwang025
Mason Wang
25 days
(1/2) i felt like no one actually teaches you a good framework for how to read (ML) papers well + fast, so i wrote this 5-minute read tldr: because so many papers suck, here's how to go through them quickly and revisit the good ones
28
208
2K
@dhruv31415
Dhruv π
29 days
It's crazy how many interesting questions there are to ask - and how many of them go unsolved in a world where most researchers don't get access to resources.
@tilderesearch
Tilde
29 days
Today we're very happy to announce that we’re launching the Tilde Fellowship Program to support research in a mechanistic understanding of pre-training science (arch, optimizers, learning dynamics, etc.). Much of modern ML progress has come from scaling models and empirically
1
0
16
@dhruv31415
Dhruv π
29 days
Weakest SignSGD user >>>
0
0
4
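For context, SignSGD updates each parameter by the sign of its gradient coordinate, discarding the magnitude entirely (a 1-bit quantization of the gradient). A minimal sketch on a toy quadratic:

```python
import numpy as np

def signsgd_step(w, grad, lr=0.01):
    """SignSGD update: move each coordinate by a fixed step lr in the
    direction opposite its gradient's sign, ignoring the magnitude."""
    return w - lr * np.sign(grad)

# Minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([3.0, -2.0, 0.5])
for _ in range(500):
    w = signsgd_step(w, w, lr=0.01)
# With a fixed lr the iterates end up oscillating within ~lr of the
# optimum rather than converging exactly -- a known quirk of sign steps.
```

The appeal is cheap communication (one bit per coordinate in distributed training) and built-in scale invariance across parameters.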
@dhruv31415
Dhruv π
30 days
Really cool generalization of Manifold Muon, awesome work from @SolidlySheafy
@tilderesearch
Tilde
30 days
Modern optimizers can struggle with unstable training. Building off of Manifold Muon, we explore more lenient mechanisms for constraining the geometry of a neural network's weights directly through their Gram matrix 🧠 A 🧵… ~1/6~
0
0
10
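One way to illustrate "constraining a network's weights through their Gram matrix" is a soft orthogonality penalty: push WᵀW toward the identity by descending the Frobenius distance. This is a generic sketch in that spirit, not the specific mechanism from the thread:

```python
import numpy as np

def gram_penalty_grad(W):
    """Gradient of 0.5 * ||W.T @ W - I||_F^2 with respect to W:
    a soft constraint pulling the Gram matrix of W's columns
    toward the identity (i.e., toward orthonormal columns)."""
    G = W.T @ W - np.eye(W.shape[1])
    return 2.0 * W @ G

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 4))
for _ in range(200):
    W -= 0.05 * gram_penalty_grad(W)

G = W.T @ W  # now close to the 4x4 identity: columns are orthonormal
```

Unlike a hard manifold constraint (retracting onto the Stiefel manifold every step), a penalty like this only nudges the weights toward the constraint set, which is the "more lenient" flavor the thread describes.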
@YifeiZuoX
Yifei Zuo
1 month
[1/N] How can we make attention more powerful—not just more efficient? How do different attention mechanisms handle associative memory, and can we design a better one from first principles? 🤔 Our new work explores these questions by introducing Local Linear Attention (LLA).
3
35
213
@dhruv31415
Dhruv π
1 month
wat
0
0
6
@dhruv31415
Dhruv π
1 month
0
0
10
@lambdaviking
William Merrill
1 month
My thesis, “A theory of the computational power and limitations of language modeling architectures,” is now online:
8
46
386
@yule_gan
Yulu Gan
1 month
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
90
390
3K
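Exploring directly in parameter space is the evolution-strategies recipe: perturb the weights, score each perturbation by the objective, and step along the loss-weighted noise. A generic sketch of one such step (illustrative only, not the paper's framework):

```python
import numpy as np

def es_step(theta, loss_fn, lr=0.1, sigma=0.1, n_pop=50, rng=None):
    """One evolution-strategies-style step: explore by perturbing the
    parameters theta directly, then descend an estimate of the loss
    gradient built from the scored perturbations."""
    rng = rng if rng is not None else np.random.default_rng()
    eps = rng.normal(size=(n_pop, theta.size))       # noise in parameter space
    losses = np.array([loss_fn(theta + sigma * e) for e in eps])
    centered = losses - losses.mean()                # baseline for variance reduction
    grad_est = (centered[:, None] * eps).mean(axis=0) / sigma
    return theta - lr * grad_est                     # no backprop anywhere

# Minimize a simple quadratic without ever computing its gradient.
target = np.array([1.0, -2.0, 3.0])
loss = lambda th: float(np.sum((th - target) ** 2))
theta = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(300):
    theta = es_step(theta, loss, rng=rng)
```

The estimator only needs function evaluations, which is what makes parameter-space exploration attractive when per-token rewards or differentiable objectives are unavailable.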