Adithya Dsilva
@AdithyaDsilva
Followers: 207 · Following: 123 · Media: 2 · Statuses: 48
Fortune favors the bold. @lossfunk | @_lagrangepoint
Bangalore
Joined August 2020
It's over. GPT-5.2 aces one of the most important benchmarks and it's not even close!
177 · 104 · 2K
Anyone driving from Bangalore -> Mangalore in the next couple days? Want to take my cat back home, happy to pay for fuel
3 · 0 · 10
Training towards a local minimum, sure, but isn't that what all competitive leaderboards are anyway?
0 · 0 · 1
One may call training on the test inputs "cheating", but it is no more an act of cheating than Nakamura training against a chess model simulating Carlsen's playing style, one day before their world chess championship.
1 · 0 · 1
If nothing else, this is sufficient validation that a lot of potential transformer performance is left on the table by using heuristics to encode positions (RoPE, YaRN, etc.) rather than position embeddings generated by the model itself. The bitter lesson comes for us all
1 · 0 · 2
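A minimal sketch of the contrast drawn in the tweet above, under assumed vanilla settings: a fixed heuristic position code (plain sinusoidal here, standing in for schemes like RoPE/YaRN) next to a position embedding table the model learns end to end. The module names and shapes are illustrative, not taken from the thread.

```python
import torch
import torch.nn as nn

class HeuristicPos(nn.Module):
    """Fixed sinusoidal position code (stand-in for heuristic schemes like RoPE/YaRN)."""
    def __init__(self, dim, max_len=4096):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float32)[:, None]
        freq = 10000.0 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * freq)
        pe[:, 1::2] = torch.cos(pos * freq)
        self.register_buffer("pe", pe)          # never updated by the optimizer

    def forward(self, x):                        # x: (batch, seq, dim)
        return x + self.pe[: x.size(1)]

class LearnedPos(nn.Module):
    """Position embeddings produced by the model itself, trained end to end."""
    def __init__(self, dim, max_len=4096):
        super().__init__()
        self.pe = nn.Embedding(max_len, dim)

    def forward(self, x):
        idx = torch.arange(x.size(1), device=x.device)
        return x + self.pe(idx)
```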
validation inputs look like. Joint training (predict inputs and outputs) vs conditional (predict outputs given inputs) does have some perf impact, though. However, by far the biggest factor for perf (~21%) is 3D RoPE + well-configured model hyper-params (ablations ongoing)
1 · 0 · 1
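A hedged sketch of what per-axis ("3D") RoPE over grid tokens could look like. The choice of axes (example index, row, col) and the equal split of the head dimension across axes are assumptions; the thread does not spell out its exact scheme.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    """Rotation angles for one axis: positions (N,) -> (N, dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return positions.float()[:, None] * inv_freq[None, :]

def apply_rope(x, angles):
    """Rotate channel pairs of x (N, dim) by the given angles (N, dim // 2)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, coords, head_dim):
    """Give each of the three coordinate axes its own slice of the head dim."""
    d = (head_dim // 3) // 2 * 2               # even slice per axis
    out = x.clone()
    for axis in range(3):                       # assumed axes: example, row, col
        sl = slice(axis * d, (axis + 1) * d)
        out[..., sl] = apply_rope(x[..., sl], rope_angles(coords[:, axis], d))
    return out

# toy usage: 5 grid tokens with (example, row, col) coordinates, head_dim = 48
q = torch.randn(5, 48)
coords = torch.tensor([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]])
print(rope_3d(q, coords, 48).shape)             # torch.Size([5, 48])
```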
Follow-up ablations and reasoning:
- Training on train i/o + test inp gives 27.5%
- Training on train i/o only gives 23.25%
- Training on train outputs only gives 21.5%
All training is standard NTP. Biggest drop in perf is from not knowing what the... https://t.co/rq6Mjjrqal
@evilmathkid Reproduced the results successfully, but the reasoning in the blog doesn't stand up to ablations. Not training on the test input has minimal impact on performance. The success most likely comes from the novel use of 3D RoPE + shared latents. Awesome results regardless!
2 · 1 · 8
@AndrewYNg Limitations:
- gains through representation crafting will vanish as the model is scaled
- very heavily dependent on the ability to create infinite training data from a limited dataset by exploiting invariance
- no easy way of extending this to solve arc-3
0 · 0 · 3
Surprising results:
- a transformer trained to predict both input and output performs ~30% better than one that just predicts the output (a 2001 @AndrewYNg paper goes into the details on this: https://t.co/aMIn7b3Qcf)
- exploiting invariance of the ARC problem set takes you a long way
2 · 0 · 3
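A rough sketch of the joint vs. conditional objectives compared above: both are standard next-token prediction over `<input> <sep> <output>` sequences; the conditional variant simply zeroes out the loss on tokens before the separator. Function and argument names are illustrative, not from the original code.

```python
import torch
import torch.nn.functional as F

def ntp_loss(logits, tokens, sep_id, joint=True):
    """Next-token loss over `<input> <sep> <output>` sequences.

    logits: (batch, seq, vocab), tokens: (batch, seq).
    joint=True  -> loss on every position (predict inputs *and* outputs)
    joint=False -> loss only on tokens after the separator (outputs given inputs)
    """
    pred, tgt = logits[:, :-1], tokens[:, 1:]
    loss = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)), tgt.reshape(-1), reduction="none"
    ).view_as(tgt)
    if joint:
        return loss.mean()
    after_sep = (tokens == sep_id).cumsum(dim=1)[:, 1:] > 0
    return (loss * after_sep).sum() / after_sep.sum().clamp(min=1)
```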
Biggest factors for perf gains:
- wide token dimensions, hand-crafted representations (3D RoPE, 800x augmentations, single model trained over all tasks)
- reweighting the training dataset using info from val (the model implicitly prioritizes training samples according to questions in the val set)
0 · 0 · 2
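A sketch of the kind of invariance-based augmentation the tweet alludes to: ARC grids keep their meaning under the 8 dihedral symmetries and under permutations of the 10 colours, so 8 transforms x ~100 colour permutations gives roughly the 800x mentioned. The exact recipe here is an assumption, not the thread's code.

```python
import itertools
import random
import numpy as np

def dihedral(grid):
    """The 8 rotations/reflections of a grid."""
    g = np.asarray(grid)
    for k in range(4):
        r = np.rot90(g, k)
        yield r
        yield np.fliplr(r)

def recolor(grid, perm):
    """Relabel colours 0..9 according to a permutation."""
    return np.asarray(perm)[np.asarray(grid)]

def augment(pairs, n_color_perms=100, seed=0):
    """Yield augmented (input, output) pairs: ~8 * n_color_perms per original pair."""
    rng = random.Random(seed)
    perms = [rng.sample(range(10), 10) for _ in range(n_color_perms)]
    for inp, out in pairs:
        transforms = list(zip(dihedral(inp), dihedral(out)))   # same symmetry on both grids
        for (gi, go), perm in itertools.product(transforms, perms):
            yield recolor(gi, perm), recolor(go, perm)

# toy usage: one 2x2 -> 2x2 pair expands to 8 * 100 = 800 augmented pairs
pairs = [([[1, 0], [0, 2]], [[2, 0], [0, 1]])]
print(sum(1 for _ in augment(pairs)))   # 800
```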
A TL;DR on the new "Pareto-optimal ARC-1 solution": it trains a simple auto-regressive model (a 4-layer transformer) on a dataset of `<arc-input> <sep> <arc-output>` sequences and gets 27.5% on public val. 🧵 https://t.co/QT63GFWfFz
Announcing: new Pareto frontier on ARC-AGI
27.5% for just $2
333x cheaper than TRM!
Beats every non-thinking LLM in existence
Cost so low, it's literally off the chart
Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source.
Thread:
2 · 0 · 6
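A minimal sketch, assuming a straightforward serialisation, of how `<arc-input> <sep> <arc-output>` sequences could be built for next-token training. The token ids and the row/separator markers are invented for illustration; the thread does not give its vocabulary.

```python
# colours use token ids 0-9; the extra ids below are made up for this sketch
NEWROW, SEP, EOS = 10, 11, 12

def encode_grid(grid):
    """Flatten a grid row by row, marking the end of each row."""
    toks = []
    for row in grid:
        toks.extend(row)
        toks.append(NEWROW)
    return toks

def encode_pair(inp, out=None):
    """One `<arc-input> <sep> <arc-output>` sequence; omit `out` to build a test prompt."""
    seq = encode_grid(inp) + [SEP]
    if out is not None:
        seq += encode_grid(out) + [EOS]
    return seq

# toy usage: a 2x2 -> 2x2 pair
print(encode_pair([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
# [1, 0, 10, 0, 1, 10, 11, 0, 1, 10, 1, 0, 10, 12]
```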
If you enjoy obsessively figuring out how protocols, APIs, and machines work under the hood, DM me with your past work. You'll work alongside some *excellent* collaborators at @_lagrangepoint over a retreat.
5 · 9 · 100
we’re hosting a small dinner in bangalore next week for founders building at the frontier of AI, science + deep tech. one rule: bring the hardest problem you haven’t solved yet.
47 · 17 · 290
Finally hiring for @eyecandyrobots
Looking for these roles, entry level could work:
- robotics reinforcement learning dev
- mechanical
- electronics
- 3d animation artist
in-person in bangalore for now. DM hobby projects and that should be it. will get back asap.
4 · 13 · 77
We've been running Diagram Chasing, our dataviz publication, for a year. Now we're looking to build a network of creative collaborators for future projects. Illustrators, designers, programmers, cartographers, anyone who wants to make public-interest data work with us! 🧵
2 · 1 · 7
We released the full Batch 5 graduation video on our channel 🎓. This one is filled with demos and talks ranging from walking robots to a chess AI that beats a DeepMind model. Go watch it 👇
2 · 4 · 77
https://t.co/c7mhCdLGkU The beauty of FFT couldn't be more apparent!
nima101.github.io · Problems and Algorithms
0 · 0 · 2
Working at @lossfunk on a novel diffusion-rl direction to crack arc-agi-2. Looking for interested researchers. Hmu if you've implemented LLMs from scratch before
9 · 2 · 105
Slides of my presentation @lossfunk. Shoutout to my co-author @OerstedIV. Hopefully a research paper soon? https://t.co/456FnhqBBq.
docs.google.com · LIES, DAMNED LIES, AND LOSS CURVES · Adithya Dsilva @lossfunk · Or, beating DeepMind's model with ⅓ the compute
0 · 0 · 5
Fastest and most fun graduation ever! Ft. @Mankaran32 @khushhhi_ @HariAyapps @rt_2422 @nishgl @AdithyaDsilva @chit_raa @TanayLohia1 @paraschopra @Timbre Thanks for such a gr8 community @lossfunk
3 · 6 · 74