Dmitry Rybin (@DmitryRybin1)
2K Followers · 21K Following · 67 Media · 490 Statuses
ML PhD at CUHK, BSc. Math HSE || ML for Math, Algorithm Discovery || Grand First Prize at IMC
Hong Kong · Joined May 2022
We discovered a faster way to compute the product of a matrix with its transpose! This has profound implications for data analysis, chip design, wireless communication, and LLM training! Paper: https://t.co/tWlWEc7csk The algorithm is based on the following discovery: we can compute
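The paper's algorithm itself is not reproduced here; as a minimal sketch, the snippet below only illustrates the structure it exploits: the product of a matrix with its transpose is always symmetric, so roughly half of its entries are redundant. Classical BLAS routines (`syrk`) already use this for a ~2x saving; the paper goes further.

```python
# Sketch only, not the paper's algorithm: G = A * A^T is symmetric,
# so only the upper triangle of its n^2 entries carries information.

def matmul_transpose(A):
    """Compute G = A @ A^T for a list-of-lists matrix A."""
    n, k = len(A), len(A[0])
    return [[sum(A[i][t] * A[j][t] for t in range(k)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4], [5, 6]]
G = matmul_transpose(A)
print(G)  # [[5, 11, 17], [11, 25, 39], [17, 39, 61]]
assert all(G[i][j] == G[j][i] for i in range(3) for j in range(3))  # symmetric
```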
Terence Tao, Javier Gómez-Serrano, and DeepMind reported further, more extensive experiments with AlphaEvolve + DeepThink + AlphaProof. Many new discoveries, including some general constructions that inspired T. Tao to write two new papers. We live in the wildest timeline
Unbelievably detailed new paper from DeepMind on the benchmarks and autograders they used on the IMO Gold journey. For me the main takeaways are:
- autograding can achieve ~90% accuracy even on long and difficult reasoning
- DeepThink is quite behind the IMO Gold model on very difficult problems
In 2022 there were just ~3 tricks that consistently showed gains for LLMs:
(1) CoT prompting
(2) reward models
(3) majority voting
Scaling (1) => reasoning models
Scaling (2) => judge LLMs
Scaling (3) => multi-agent systems
Principled formulation is the key, e.g. RL for reasoning
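Trick (3) is the simplest of the three to sketch: sample several generations for the same question and keep the most common final answer (self-consistency). A minimal illustration, with made-up sample answers:

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most common final answer across sampled generations."""
    return Counter(answers).most_common(1)[0][0]

# Five sampled chains-of-thought, reduced to their final answers:
samples = ["42", "41", "42", "42", "17"]
print(majority_vote(samples))  # 42
```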
If you are among the researchers with access to DeepThink, AlphaProof, and AlphaEvolve - please share your experience publicly. Understanding the capabilities of these tools is extremely important for the math research landscape
Together with @Googleorg, we’re introducing the AI for Math initiative, bringing together five prestigious research institutions pioneering the use of AI in mathematics. ⤵️
My posts last week created a lot of unnecessary confusion*, so today I would like to do a deep dive on one example to explain why I was so excited. In short, it’s not about AIs discovering new results on their own, but rather how tools like GPT-5 can help researchers navigate,
Would you believe that I found exactly the same things with Gemini Deep Think on some Erdős problems 2 months ago? But I never thought updating an outdated database entry from 'open' to 'solved' lands you a job at a frontier lab
GRPO is not frontier and is broken in so many ways I don't even know where to start. ~50% of the GRPO budget is wasted on too-easy/too-difficult tasks (advantage = 0). This work fixes it:
🚀 Excited to share our work at Bytedance Seed! Knapsack RL: Unlocking Exploration of LLMs via Budget Allocation 🎒
Exploration in LLM training is crucial but expensive. Uniform rollout allocation is wasteful:
✅ Easy tasks → always solved → 0 gradient
❌ Hard tasks →
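The "advantage = 0" point follows directly from GRPO's group-relative formulation: advantages are rewards normalized within a group of rollouts for the same prompt, so a group where every rollout succeeds (or every rollout fails) contributes no gradient. A sketch under the standard formulation, not tied to any particular implementation:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: (r - mean) / std over a
    group of rollouts sampled for the same prompt."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:
        # Task too easy (all solved) or too hard (all failed):
        # every advantage is 0 and the whole group yields no gradient.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

print(grpo_advantages([1, 1, 1, 1]))  # all solved  -> [0.0, 0.0, 0.0, 0.0]
print(grpo_advantages([0, 1, 0, 1]))  # mixed group -> [-1.0, 1.0, -1.0, 1.0]
```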
If you are working in pure math or theoretical computer science: keep in mind that there is a $500B, multi-million-GPU supercomputer pointed at automating your research
New short blogpost: GPT-5 and o3 helped me prove a new theorem on matrix multiplication. I show that the fastest way to multiply a collection of NxN matrices A_1 A_2 ... A_k is sequential. For some reason there was no literature even on multiplication of three 2x2 matrices ABC
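This is not the blogpost's proof, only the classical baseline it compares against: for square NxN matrices, every parenthesization of A_1 A_2 ... A_k gives the same result (associativity) at the same classical cost of (k-1)·N^3 scalar multiplications. The theorem concerns whether non-sequential orders can beat sequential ones for fast (sub-cubic) multiplication.

```python
# Sketch of the classical baseline: (AB)C and A(BC) agree for square
# matrices, and classically both orders cost (k-1)*N^3 multiplications.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)
assert left == right             # associativity: same product either way
print(left)                      # [[4, 2], [8, 6]]
```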
If you inspect the source code of the Thinking Machines blog, you will find some hidden paragraphs and RL plots. The only other AI lab that does this is OpenAI, e.g. the LaTeX source of the o1 System Card has hidden evals for Person Identification with o1
Software R&D is being redefined in front of our eyes. SAT solvers were developed over many years, and here they are drastically improved within ~3 months of playing with AlphaEvolve-style evolution. BTW, Nvidia is interested in SAT solvers because they can help make and verify chip designs
This is huge! NVIDIA just built a framework named SATLUTION, which can scale LLM-based code evolution from small kernels to full repositories (hundreds of files, tens of thousands of lines of C/C++). It is the first one that can do this. Targeting SAT (the canonical
In mathematics this is known as local-to-global properties. You have some photo or text where everything looks ok locally. You patch the parts together - and it’s broken ☹️ just like this Chessboard. Mathematicians use sheaves to describe when local data patched together
This one actually looks somewhat reasonable (although the piece placement is garbage, no black queen either), until, of course, you realise there are only 7 rows...
I did my best work after I stopped submitting to the big conferences three years ago. Not everyone has that privilege, so I understand why people still try. A new venue would suffer from the same problems if a large number of papers at that venue lead to high-paying jobs and
AI/ML publication venues are broken beyond fixing. I genuinely believe the only way to fix them is to completely devalue them (best to do that immediately, but perhaps slowly over time since people have inertia). Then, start something new that encourages quality over quantity.
Pros:
- RL training with rewards for success/fail on long-horizon tasks
- emergent behaviors: agents collaborate, propagate useful knowledge through the graph
Cons:
- may converge to some greedy behavior, e.g. only communicate with 1 other agent
- number of pairs in
Has anyone tried training an LLM specifically for multi-agent, common-goal settings by modeling pairwise communication channels simply as another tool call? Specifically: add a tool <|communicate with agent {i}|>, and train the model to use it just like other tools: python, search
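As a hypothetical sketch of the proposed setup (all names here are invented for illustration, nothing in the thread specifies them): the pairwise channel is just a tool handler that delivers a message into the receiving agent's context, exactly like a python or search tool returning a result string.

```python
# Hypothetical tool handler for <|communicate with agent {i}|>:
# append the message to agent i's mailbox and return a tool-result
# string that the sending agent sees in its own context.

def tool_communicate(sender, receiver, message, mailboxes):
    mailboxes.setdefault(receiver, []).append((sender, message))
    return f"delivered to agent {receiver}"

mailboxes = {}
result = tool_communicate(0, 1, "lemma 3 holds for n > 2", mailboxes)
print(result)        # delivered to agent 1
print(mailboxes[1])  # [(0, 'lemma 3 holds for n > 2')]
```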
I think the METR eval provides a good mental model for LLM capabilities here: a generalist model like this (or actually a system of agents) can solve any problem that takes human experts ~1.5 hours. Within this framework, IOI Gold is not too surprising
1/ I competed for Team USA at IOI in 2015, so this achievement hits home for me. The biggest highlight: we *did not* train a model specifically for IOI. Our IMO gold model actually set a new state of the art in our internal competitive programming evals. Reasoning generalizes!