
Rota
@pli_cachete
Followers: 23K · Following: 156K · Media: 1K · Statuses: 11K
Continually return to what you love. What is beautiful and good and true
14
104
847
We need to put a total stop to physicists writing textbooks until we figure out what is going on
2
1
60
Hey, New York — Brian thinks he's our new CFO. We gave him a stage to prove it.
0
0
1
We would have AGI now if the labs had a guy who knew about infinite dimensional Lévy processes
9
3
97
Somebody needs to write their philosophy dissertation on AI and the pessimistic meta-induction
1
0
8
$arbe sold most of the position with plenty of time, going to let these eat.
3
3
21
i’ve published my first s*bstack summarizing On What Matters, specifically, the strongest arguments against Subjectivism
here's the high-level summary of On What Matters:
- facts of the world, not our desires, provide us with reasons to act
- these non-natural facts are accessible via intuition, similar to how we can assess the validity of an argument or the truth of a mathematical or modal claim
3
5
38
If grad students knew what actually worked in training SOTA LLMs they would be so mad
25
14
547
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL? We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild: Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns. The scaling law?
35
89
644
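The thread above doesn't say which RL algorithm or training stack was used, so here is a minimal, hypothetical sketch of one common way to run RL on GSM8K with the smallest model in a 0.6B→14B Qwen sweep: TRL's GRPO trainer with an exact-match reward. The checkpoint name, reward shape, and hyperparameters are assumptions for illustration, not the authors' setup.

```python
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def extract_final_number(text):
    # Pull the last number mentioned in a completion (GSM8K answers are numeric).
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def to_prompt(example):
    # GSM8K stores the gold answer after '####'; expose it plus a 'prompt' column for TRL.
    return {"prompt": example["question"],
            "gold": example["answer"].split("####")[-1].strip()}

train = load_dataset("openai/gsm8k", "main", split="train").map(to_prompt)

def exact_match_reward(completions, gold=None, **kwargs):
    # 1.0 if the completion's final number matches the gold answer, else 0.0.
    return [1.0 if extract_final_number(c) == g else 0.0
            for c, g in zip(completions, gold)]

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",  # assumed checkpoint name for the 0.6B end of the sweep
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="qwen-0.6b-gsm8k-rl", max_completion_length=512),
    train_dataset=train,
)
trainer.train()
```

Repeating the same run across the 0.6B to 14B checkpoints and comparing pre-RL vs post-RL accuracy is one way to probe the emergence-vs-diminishing-returns contrast the thread describes.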
"We live in the arguably the most exciting era of mathematics in human history since the time of Euclid." Yang-Hui He's talk "The rise of the machines" for the Royal Institute, traces three ways #AI is reshaping #mathematics: bottom-up verification through systems like
2
42
205
Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won’t be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster — feel free to stop by!
2
8
72
We introduce a new "rule" for understanding diffusion models: Selective Underfitting. It explains: 🚨 How diffusion models generalize beyond training data 🚨 Why popular training recipes (e.g., DiT, REPA) are effective and scale well Co-led with @kiwhansong0! (1/n)
8
59
401
August 2025: Oxford and Cambridge mathematicians publish a paper entitled "No LLM Solved Yu Tsumura's 554th Problem". They gave this problem to o3 Pro, Gemini 2.5 Deep Think, Claude Opus 4 (Extended Thinking) and other models, with instructions to "not perform a web search to
GPT-5-Pro solved, in just 15 minutes (without any internet search), the presentation problem known as “Yu Tsumura’s 554th Problem.” https://t.co/tKae6Vo0Kb This is the first model to solve this task completely. I expect more such results soon — the model demonstrates a strong
47
125
1K
Now it's up to us to refine and scale symbolic AGI to save the world economy before the genAI bubble pops. Tick tock
85
83
1K
I'm starting a new project. Working on what I consider to be the most important problem: building thinking machines that adapt and continuously learn. We have an incredibly talent-dense founding team + are hiring for engineering, ops, design. Join us:
adaptionlabs.ai
Building the future of adaptable intelligence
182
182
2K
Update: we were able to close the gap between neural networks and reweighted kernel methods on sparse hierarchical functions with hypercube data. Interestingly, the kernel methods outperform carefully tuned networks in our tests.
we wrote a paper about learning 'sparse' and 'hierarchical' functions with data dependent kernel methods. you just 'iteratively reweight' the coordinates by the gradients of the prediction function. typically 5 iterations suffice.
5
31
243
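The post above gives the recipe only at a high level, so here is a hypothetical numpy sketch of one way to read it: kernel ridge regression whose coordinate weights are iteratively updated from the squared gradients of the current predictor. The Gaussian kernel, the exact weight update, and the toy hypercube target are assumptions for illustration, not the paper's recipe.

```python
import numpy as np

def gaussian_kernel(X, Z, w, bandwidth=1.0):
    # Gaussian kernel on coordinate-weighted inputs: squared distance sum_j w_j (x_j - z_j)^2.
    Xw, Zw = X * np.sqrt(w), Z * np.sqrt(w)
    d2 = ((Xw[:, None, :] - Zw[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def fit_reweighted_krr(X, y, n_iters=5, reg=1e-3, bandwidth=1.0):
    n, d = X.shape
    w = np.ones(d)                                   # start with uniform coordinate weights
    for _ in range(n_iters):
        K = gaussian_kernel(X, X, w, bandwidth)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)          # kernel ridge regression
        # Gradient of f(x) = sum_i alpha_i k(x, x_i) w.r.t. each coordinate, at the training points.
        diff = (X[:, None, :] - X[None, :, :]) * w               # (n, n, d)
        grads = -(K[:, :, None] * diff / bandwidth ** 2 * alpha[None, :, None]).sum(1)
        w = (grads ** 2).mean(0)                     # reweight coordinates by mean squared gradient
        w = w / w.sum() * d                          # keep the weights on a comparable scale
    return w

# Toy example: a sparse target on the hypercube that depends only on the first two coordinates.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(300, 10))
y = X[:, 0] * X[:, 1]
w = fit_reweighted_krr(X, y)
print(np.round(w, 2))   # the first two coordinates typically end up with most of the weight
```

The weight update here (mean squared gradient per coordinate) is one plausible instantiation of "reweight the coordinates by the gradients of the prediction function"; the paper may use a different functional form.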