Dhruv π
@dhruv31415
548 Followers · 2K Following · 64 Media · 247 Statuses
On the other side of the fence @tilderesearch
Palo Alto, CA
Joined November 2023
Some exciting stuff since then!
We’re excited to announce that Tilde completed an $8M seed round earlier this year, led by Khosla Ventures. Understanding model intelligence is the most important problem in the world, and the key to actualizing the promise of ASI. 🧵 A thread on our approach:
0 replies · 0 reposts · 17 likes
🚀 Really excited to see this amazing arch change (KDA) finally coming out! Replacing global attention with a linear hybrid arch: better pretraining PPLs, long-context evals, and downstream math/code/STEM evals after RL, plus >6× throughput at 1M context to unblock more downstream potentials…
Kimi Linear Tech Report is dropped! 🚀 https://t.co/LwNB2sQnzM Kimi Linear: a novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels!…
1 reply · 18 reposts · 55 likes
Thrilled to release our new paper, “Scaling Latent Reasoning via Looped Language Models.” TL;DR: we scale looped language models to 2.6 billion parameters, pretrained on >7 trillion tokens. The resulting model is on par with SOTA language models 2 to 3x its size.
20 replies · 137 reposts · 627 likes
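The core trick behind looped language models can be sketched in a few lines: instead of stacking N distinct layers, one block is applied repeatedly, so depth (and latent computation) comes from iteration rather than from extra parameters. Below is a minimal PyTorch sketch of weight-tied looping; `LoopedBlock`, `looped_forward`, and `n_loops` are illustrative names and assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """One transformer block reused across loop iterations (weight tying)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        return x + self.mlp(self.norm2(x))

def looped_forward(block: LoopedBlock, x: torch.Tensor, n_loops: int) -> torch.Tensor:
    # Depth comes from iterating the same weights: the parameter count is
    # that of one block, while compute scales with n_loops.
    for _ in range(n_loops):
        x = block(x)
    return x

x = torch.randn(2, 16, 64)                       # (batch, seq, d_model)
out = looped_forward(LoopedBlock(64, 4), x, n_loops=4)
print(out.shape)                                 # torch.Size([2, 16, 64])
```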
Main innovation seems to be a row-wise forget gate. Cool!
2 replies · 1 repost · 36 likes
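For readers unfamiliar with the term: in linear attention the recurrent state is a (d_k × d_v) matrix, and a row-wise forget gate decays each row of that state at its own learned rate, instead of one scalar gate for the whole state. A minimal sketch of what that can look like in a gated delta-rule scan; the shapes, gate parameterization, and function name are illustrative assumptions, not the KDA kernel itself.

```python
import torch

def rowwise_gated_delta_scan(q, k, v, a, beta):
    """Sketch: delta-rule linear attention with a row-wise forget gate.
    q, k: (T, d_k); v: (T, d_v); a: (T, d_k) per-row gates in (0, 1);
    beta: (T,) write strengths. State S is a (d_k, d_v) associative memory."""
    d_k, d_v = k.shape[1], v.shape[1]
    S = torch.zeros(d_k, d_v)
    outs = []
    for t in range(k.shape[0]):
        S = a[t].unsqueeze(-1) * S                        # row-wise decay
        pred = k[t] @ S                                   # value currently bound to k_t
        S = S + beta[t] * torch.outer(k[t], v[t] - pred)  # delta-rule overwrite
        outs.append(q[t] @ S)                             # read with the query
    return torch.stack(outs)

T, d = 8, 4
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
a = torch.sigmoid(torch.randn(T, d))                      # learned gates in practice
beta = torch.sigmoid(torch.randn(T))
print(rowwise_gated_delta_scan(q, k, v, a, beta).shape)   # torch.Size([8, 4])
```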
Kimi Delta Attention PR in FLA, very nice @yzhang_cs and team, i'm sooo excited for this model
4 replies · 6 reposts · 85 likes
Low-precision attention may suffer from biased rounding errors https://t.co/0hxHG3tPu2
1 reply · 14 reposts · 144 likes
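A quick way to see what "biased" means here: once a running fp16 sum grows large, each small addend falls below half a unit in the last place and gets rounded away, always in the same direction, so the error accumulates systematically instead of averaging out. A toy demo of plain accumulation (not an attention kernel):

```python
import torch

# Sum many small positive increments into a running fp16 total. Once the
# accumulator reaches ~0.25, each 1e-4 addend is below half an ulp and is
# rounded away, always downward, so the error is a one-sided bias.
n = 20_000
x = torch.full((n,), 1e-4)

acc = torch.tensor(0.0, dtype=torch.float16)
for v in x.to(torch.float16):
    acc = acc + v

print("fp32 sum:", x.sum().item())   # ~2.0
print("fp16 sum:", acc.item())       # stalls around 0.25, far below 2.0
```

Softmax-weighted sums over long contexts are exactly this kind of accumulation, which is one way such a bias can surface in low-precision attention.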
i’ve been deeply obsessed with the question of how to make humans less fragile. several months ago i decided to leave Stanford to research and deploy the biological machine learning methods that can get us closer. can finally share that i’ve been on the founding team
Valthos builds next-generation biodefense. Of all AI applications, biotechnology has the highest upside and most catastrophic downside. Heroes at the frontlines of biodefense are working every day to protect the world against the worst case. But the pace of biotech is against…
63 replies · 26 reposts · 443 likes
NousCon last night was a massive success! Thank you to everyone who showed out for our biggest event of the year. The future of open source AI is incredibly bright. S/o @rosstaylor90 and @dhruv31415 for coming to speak, @poetengineer__ and @johnkarborn for their epic live art
13 replies · 14 reposts · 188 likes
art, drinks, open source ai w.s.g. tilde research and general reasoning oct. 24th, SF, 6p
38 replies · 28 reposts · 663 likes
My feed in the past few days has become dominated by random Japanese ecology accounts. Never have I been happier to open this app.
0 replies · 0 reposts · 4 likes
(1/2) i felt like no one actually teaches you a good framework for how to read (ML) papers well + fast, so i wrote this 5-minute read. tldr: because so many papers suck, here's how to go through them quickly and revisit the good ones
28 replies · 208 reposts · 2K likes
It's crazy how many interesting questions there are to ask, and how many of them go unsolved in a world where so many researchers don't get access to resources.
Today we're very happy to announce that we’re launching the Tilde Fellowship Program to support research in a mechanistic understanding of pre-training science (architectures, optimizers, learning dynamics, etc.). Much of modern ML progress has come from scaling models and empirically…
1 reply · 0 reposts · 16 likes
Really cool generalization of Manifold Muon, awesome work from @SolidlySheafy
Modern optimizers can struggle with unstable training. Building off of Manifold Muon, we explore more lenient mechanisms for constraining the geometry of a neural network's weights directly through their Gram matrix 🧠 A 🧵… ~1/6~
0 replies · 0 reposts · 10 likes
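For context, the Gram matrix G = WᵀW encodes the inner products (lengths and angles) of a weight matrix's columns, so constraining G is a direct handle on weight geometry. The thread's actual mechanism is more lenient than this, but a simple soft-orthogonality penalty illustrates the object being constrained; `gram_penalty` is an illustrative name.

```python
import torch

def gram_penalty(W: torch.Tensor) -> torch.Tensor:
    """Soft constraint on weight geometry via the Gram matrix: penalize
    ||W^T W - I||_F^2 so the columns of W stay near-orthonormal and the
    layer stays well-conditioned."""
    G = W.T @ W                                             # (d_in, d_in)
    I = torch.eye(G.shape[0], device=W.device, dtype=W.dtype)
    return (G - I).pow(2).sum()

W = torch.randn(128, 64) / 64 ** 0.5
print(gram_penalty(W).item())   # add lambda * gram_penalty(W) to the loss
```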
[1/N] How can we make attention more powerful—not just more efficient? How do different attention mechanisms handle associative memory, and can we design a better one from first principles? 🤔 Our new work explores these questions by introducing Local Linear Attention (LLA).
3 replies · 35 reposts · 213 likes
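The associative-memory framing in this thread can be made concrete with vanilla linear attention: key-value pairs are stored as a sum of outer products and read back with a query, with recall degrading as keys become correlated. A minimal sketch of that baseline (plain linear attention as associative memory, not LLA itself):

```python
import torch

# Store (key, value) pairs as a sum of outer products, read with a query.
# Orthonormal keys give exact recall; correlated keys interfere, which is
# the failure mode stronger attention designs try to mitigate.
d = 8
keys = torch.eye(d)                  # orthonormal keys
values = torch.randn(d, d)

S = torch.zeros(d, d)
for k_t, v_t in zip(keys, values):
    S = S + torch.outer(k_t, v_t)    # write

recalled = keys[3] @ S               # read: query = key 3
print(torch.allclose(recalled, values[3]))  # True
```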
My thesis, “A theory of the computational power and limitations of language modeling architectures,” is now online:
8 replies · 46 reposts · 386 likes
Reinforcement learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…
90 replies · 390 reposts · 3K likes
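Parameter-space exploration is the classic evolution-strategies recipe: sample Gaussian perturbations of the weights, score each perturbed model, and move along the reward-weighted average noise. A minimal NumPy sketch of one such update, in the style of OpenAI-ES rather than necessarily the paper's method; `es_step` and the toy objective are illustrative.

```python
import numpy as np

def es_step(theta, reward_fn, rng, sigma=0.1, lr=0.01, n_pop=64):
    """One evolution-strategies update: perturb parameters with Gaussian
    noise, score each perturbation, and move along the reward-weighted
    average noise direction (a score-function gradient estimate)."""
    eps = rng.standard_normal((n_pop, theta.size))
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = (adv[:, None] * eps).mean(axis=0) / sigma
    return theta + lr * grad_est

# Toy objective: no gradients of reward_fn are ever taken.
target = np.ones(10)
reward = lambda th: -np.sum((th - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(10)
for _ in range(300):
    theta = es_step(theta, reward, rng)
print(round(reward(theta), 4))   # close to 0: theta has drifted to target
```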