Jan Disselhoff

@JDisselh

Followers: 60 | Following: 65 | Media: 5 | Statuses: 34

Deep Learning Scientist | The ARChitects Kaggle Team

Joined November 2025
@JDisselh
Jan Disselhoff
3 days
ARC Prize 2025 is over: an amazing contest with amazing people competing. This year our team, "the ARChitects", managed to reach second place. We tried a lot of things; some thoughts and an explanation of our approach are below!
2
1
33
@arcprize
ARC Prize
2 days
ARC Prize 2025 Winners Interviews: Top Score 2nd Place
The ARChitects (@dvhrtm, @JDisselh, Daniel Franzen) detail their 2D-aware, masked-diffusion LLM with recursive self-refinement + perspective-based scoring, improving substantially over the team's 2024 autoregressive system.
1
5
45
@darrenangle
darren
3 days
DPO pushed baguettotron so far into unreadable experimental land that I didn't like it. However, skipping straight from SFT to GRPO is producing moments that make me forget that this model is only 371M params. GRPO w/ mostly format reward (</think>, title, length), a huge
@darrenangle
darren
8 days
baguettotron poetry llm experiments complete and to come:
- train baguettotron bradley-terry reward model on 10k kimi vs gemma 3n poems (failed, look at data, reward hacking formatting quirks)
- sft baguettotron on 10k kimi poems and reverse-engineered SYNTH reasoning traces
3
1
43
@JFPuget
JFPuget πŸ‡ΊπŸ‡¦πŸ‡¨πŸ‡¦πŸ‡¬πŸ‡±
5 days
Ivan Sorokin and I are the official winners of the ARC Prize competition, with a significant lead over other teams. Thanks to @kaggle and @arcprize for hosting the competition. NVIDIA tech blog summarizing what we did: https://t.co/BU8nHPCliJ Our writeup:
37
52
518
@arcprize
ARC Prize
4 days
ARC Prize 2025 Winners Interviews: Top Score 1st Place
NVARC (@JFPuget, Ivan Sorokin) detail their synthetic-data-driven ensemble of an improved ARChitects-style, test-time-trained model + TRM-based components that reaches ~24% on ARC-AGI-2 under Kaggle contest constraints.
3
12
94
@arcprize
ARC Prize
5 days
Announcing the ARC Prize 2025 Top Score & Paper Award winners
The Grand Prize remains unclaimed
Our analysis on AGI progress marking 2025 the year of the refinement loop
23
49
317
@JFPuget
JFPuget πŸ‡ΊπŸ‡¦πŸ‡¨πŸ‡¦πŸ‡¬πŸ‡±
5 days
One ingredient of our solution is the Tiny Recursive Model of @jm_alexia. During the competition we got a score of 10% on the semi-private dataset of ARC-AGI-2, and 10.41% on the public eval dataset. I further trained TRM for 10 more days using the same recipe as in our
@JFPuget
JFPuget πŸ‡ΊπŸ‡¦πŸ‡¨πŸ‡¦πŸ‡¬πŸ‡±
5 days
We also appear on the ARC-AGI-2 leaderboard. Not the best score, but clearly on the Pareto frontier, with a much lower cost than the best scores.
6
13
151
@JDisselh
Jan Disselhoff
3 days
(P.S. I vaguely remember some paper that merges word embeddings to reduce token counts in LLMs that I wanted to link, but I can't find it anymore. If anyone knows what I am talking about, hmu or share below!)
1
0
2
@JDisselh
Jan Disselhoff
3 days
All in all an amazing experience! Huge thanks to the organizers @arcprize and congratulations to the other winning teams and papers! Definitely check them out at
arcprize.org
Prize information, rules, and key dates.
1
0
4
@JDisselh
Jan Disselhoff
3 days
The above were the things that worked, but of course we had a lot of approaches that did not pan out. The most frustrating one: Dave invested a lot of time into synthetic data generation, which was the approach the first-place NVARC team used! (More examples in the blog)
1
0
4
@JDisselh
Jan Disselhoff
3 days
Recurrence seems to be a popular approach currently, and I think that our final approach is somewhat close to HRM/TRM-style solutions. Adding some intermediary reasoning tokens would make them even more similar, and is something we might test in the future.
1
0
2
@JDisselh
Jan Disselhoff
3 days
Even though this is not how the model was trained, it handles this very well and was suddenly able to fix errors and solve problems it previously struggled with! That insight allowed us to reach our final leaderboard score.
1
0
4
@JDisselh
Jan Disselhoff
3 days
In the last weeks of the competition, Dave and Daniel had a breakthrough: they stopped using the masked diffusion as such and did not fully demask tokens! Instead, partial solutions are combined with mask tokens and recurrently fed back into the LLM!
1
0
2
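A rough sketch of what such a re-masking loop could look like, assuming an HF-style masked-diffusion model and a placeholder MASK_ID; the thread does not spell out how many tokens are kept per round or how confidence is measured, so those details are assumptions here:

```python
import torch

MASK_ID = 126336  # placeholder [MASK] token id; the real value depends on the tokenizer

@torch.no_grad()
def refine(model, prompt_and_masks, rounds=4, keep_frac=0.7):
    """Sketch of a recurrent re-masking loop: predict all masked positions,
    keep only the most confident predictions, re-mask the rest, and feed the
    partially filled sequence back into the model."""
    seq = prompt_and_masks.clone()            # (1, L); output cells set to MASK_ID
    target = (seq == MASK_ID)                 # positions we have to predict
    for _ in range(rounds):
        logits = model(seq).logits            # (1, L, vocab)
        conf, pred = logits.softmax(-1).max(-1)
        n_keep = int(target.sum().item() * keep_frac)
        keep = conf.masked_fill(~target, -1.0).flatten().topk(n_keep).indices
        seq = prompt_and_masks.clone()        # start again from the fully masked output
        seq.view(-1)[keep] = pred.view(-1)[keep]
    # final pass: commit the model's argmax for anything still masked
    logits = model(seq).logits
    return torch.where(seq == MASK_ID, logits.argmax(-1), seq)
```

Because only the confident part of a prediction survives each round, the model repeatedly gets to revisit the cells it was unsure about, which is how extra inference compute can translate into corrections.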
@JDisselh
Jan Disselhoff
3 days
The issue was that our approach from last year was incredibly effective at utilizing additional compute, but we had no way of doing the same for the masked diffusion model! The base model was stronger, but did not scale well at inference time!
1
0
2
@JDisselh
Jan Disselhoff
3 days
All of this allowed us to build a masked diffusion model that was able to solve ARC tasks. We had amazing performance on the public eval dataset in our tests and then... could not increase our score on the leaderboard...
1
0
2
@JDisselh
Jan Disselhoff
3 days
Additionally, we experimented with manipulating positional embeddings to allow the model a better understanding of the 2D structure (see https://t.co/IEgWUEZJka). This helped, but less than expected. RoPE is surprisingly adaptable, even to problems it was not designed to handle.
1
0
2
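For illustration of the general idea (the scheme in the linked work may differ): one common way to make RoPE aware of a 2D grid is to give each flattened token a (row, col) index and rotate one half of the head dimension by the row and the other half by the column. A minimal sketch, with the split and shapes being assumptions:

```python
import torch

def grid_position_ids(height, width):
    """Map a flattened H*W grid to per-token (row, col) indices."""
    rows = torch.arange(height).repeat_interleave(width)  # 0,0,...,1,1,...
    cols = torch.arange(width).repeat(height)              # 0,1,...,0,1,...
    return rows, cols

def rope_2d(q, rows, cols, base=10000.0):
    """Toy 2D RoPE for q of shape (seq_len, head_dim), head_dim divisible by 4:
    rotate the first half of the head dim by the row index and the second half
    by the column index (one possible scheme, not necessarily the linked one)."""
    d = q.shape[-1] // 2
    def rotate(x, pos):
        half = x.shape[-1] // 2
        freqs = pos[:, None] / (base ** (torch.arange(half) / half))
        cos, sin = freqs.cos(), freqs.sin()
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return torch.cat([rotate(q[..., :d], rows.float()),
                      rotate(q[..., d:], cols.float())], dim=-1)
```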
@JDisselh
Jan Disselhoff
3 days
While there are issues with that in natural language, on ARC it can be of great benefit, since we can predict parts of the problem (such as the background) very easily, and puzzle tasks become much simpler. For the issues, see for example here: https://t.co/Ez2yqMMZCV
@ducx_du
Cunxiao Du
15 days
Diffusion LLMs (DLLM) can do β€œany-order” generation, in principle, more flexible than left-to-right (L2R) LLM. Our main finding is uncomfortable: ➑️ In real language, this flexibility backfires: DLLMs become worse probabilistic models than the L2R / R2L AR LMs. This
1
0
2
@JDisselh
Jan Disselhoff
3 days
However, we were also working on a different approach, using Masked Diffusion LLMs based on LLaDA (https://t.co/accFw5lW8A). While these models are often cited for their inference speed, we were far more interested in their ability to choose the order in which they unmask tokens!
github.com
Official PyTorch implementation for "Large Language Diffusion Models" - ML-GSAI/LLaDA
1
0
6
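To make the order-selection property concrete, here is a minimal, assumed decoding loop in the spirit of LLaDA-style models: at every step it commits the masked positions the model is currently most confident about instead of decoding left-to-right. MASK_ID, the step schedule, and the confidence criterion are placeholders, not the team's exact setup:

```python
import torch

MASK_ID = 126336  # placeholder [MASK] id; depends on the actual tokenizer

@torch.no_grad()
def decode_any_order(model, seq, steps=16):
    """Progressively unmask a sequence, always committing the positions the
    model is most confident about, instead of going left-to-right."""
    seq = seq.clone()                            # (1, L); unknown cells = MASK_ID
    for step in range(steps):
        masked = (seq == MASK_ID)
        remaining = int(masked.sum().item())
        if remaining == 0:
            break
        logits = model(seq).logits               # (1, L, vocab)
        conf, pred = logits.softmax(-1).max(-1)
        conf = conf.masked_fill(~masked, -1.0)   # ignore already-known positions
        k = max(1, remaining // (steps - step))  # spread remaining work over steps
        top = conf.flatten().topk(k).indices
        seq.view(-1)[top] = pred.view(-1)[top]
    return seq
```

On ARC grids, the easy cells (e.g. background) tend to be high-confidence, so they get filled first and the hard cells are predicted with most of the answer already in context.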
@JDisselh
Jan Disselhoff
3 days
Using our old method with some inference optimizations therefore saturated at ~17 points on the public leaderboard and 14.17 points on the private set.
1
0
2
@JDisselh
Jan Disselhoff
3 days
However, we saw that this method had a hard time on ARC-2. Due to the autoregressive nature, it was easy to make early prediction mistakes that the model was then unable to fix. Our method struggled especially on puzzle and simulation tasks, as well as at predicting diagonals.
1
0
2
@JDisselh
Jan Disselhoff
3 days
When the contest began, we were still working on optimizing our approach from the previous year, which won ARC-2024. It used finetuned LLMs with a custom sampling method and a selection scheme that allowed us to leverage test-time compute very efficiently!
1
0
2
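For a sense of what such a sampling-plus-selection scheme can look like (the team's actual recipe is in their writeup, not reproduced here), one option is to score every sampled candidate by its average log-likelihood over several augmented renderings of the task and keep the best; all function names and the augmentations below are placeholders:

```python
import torch

@torch.no_grad()
def sequence_logprob(model, tokenizer, prompt, candidate):
    """Log-likelihood of `candidate` given `prompt` under an HF-style causal LM
    (assumes the tokenization boundary between prompt and candidate is clean)."""
    ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids
    n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    logits = model(ids).logits[:, :-1]                  # position t predicts token t+1
    logp = logits.log_softmax(-1)
    targets = ids[:, 1:]
    tok_lp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return tok_lp[:, n_prompt - 1:].sum().item()        # score only the candidate tokens

def select_best(model, tokenizer, views, candidates):
    """views: callables mapping a candidate solution to a (prompt, completion)
    pair under one augmentation of the task (rotation, color permutation, ...).
    Pick the candidate with the highest average log-likelihood across views;
    the specific augmentations and scoring the team used are assumptions here."""
    def score(cand):
        pairs = [view(cand) for view in views]
        return sum(sequence_logprob(model, tokenizer, p, c) for p, c in pairs) / len(pairs)
    return max(candidates, key=score)
```

Schemes like this spend test-time compute twice: once to sample many candidates, and once to re-score each candidate from several viewpoints before committing to an answer.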