Adithya Dsilva

@AdithyaDsilva

Followers
207
Following
123
Media
2
Statuses
48

Fortune favors the bold. @lossfunk | @_lagrangepoint

Bangalore
Joined August 2020
@mark_k
Mark Kretschmann
2 days
It's over. GPT-5.2 aces one of the most important benchmarks and it's not even close!
177
104
2K
@AdithyaDsilva
Adithya Dsilva
3 days
Anyone driving from Bangalore -> Mangalore in the next couple days? Want to take my cat back home, happy to pay for fuel
3
0
10
@AdithyaDsilva
Adithya Dsilva
4 days
Training towards a local minimum, sure, but isn't that what all competitive leaderboards are anyway?
0
0
1
@AdithyaDsilva
Adithya Dsilva
4 days
One may call training on the test inputs "cheating", but it is no more an act of cheating than Nakamura training against a chess model simulating Carlsen's playing style, one day before their world chess championship.
1
0
1
@AdithyaDsilva
Adithya Dsilva
4 days
If nothing else, this is sufficient validation that a lot of potential transformer performance is left on the table by using heuristics to encode positions (RoPE, YaRN, etc.) vs positional embeddings generated by the model itself. The bitter lesson comes for us all
1
0
2
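For contrast, a toy sketch of the two families of position encoding the tweet compares: a fixed, heuristic RoPE rotation vs a learned (trainable) embedding table. The function and class names here are illustrative, not from any referenced codebase:

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Heuristic positional encoding: rotate feature pairs by a
    fixed, position-dependent angle (standard RoPE, nothing learned)."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # fixed frequency schedule
    angles = np.outer(positions, freqs)         # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

class LearnedPositions:
    """Learned positional embeddings: one trainable vector per
    position, added to token embeddings before attention."""
    def __init__(self, max_len, d, rng=np.random.default_rng(0)):
        self.table = rng.normal(0, 0.02, size=(max_len, d))  # trainable
    def __call__(self, x, positions):
        return x + self.table[positions]

seq, d = 8, 16
x = np.random.default_rng(1).normal(size=(seq, d))
pos = np.arange(seq)
print(rope_rotate(x, pos).shape)            # (8, 16)
print(LearnedPositions(32, d)(x, pos).shape)  # (8, 16)
```

The RoPE frequencies are fixed by a formula; the embedding table is updated by gradient descent, which is the "generated by the model itself" side of the tweet's comparison.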
@AdithyaDsilva
Adithya Dsilva
4 days
validation inputs look like. Joint training (predict inputs and outputs) vs conditional (predict outputs given inputs) does have some perf impact tho. However, by far the biggest factor for perf (~21%) is from 3D RoPE + well configured model hyper-params (ablations ongoing)
1
0
1
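The joint-vs-conditional distinction above comes down to which positions contribute to the next-token loss. A toy sketch, with hypothetical token ids and a hand-rolled masked cross-entropy:

```python
import numpy as np

def ntp_loss(logits, tokens, loss_mask):
    """Masked next-token-prediction cross-entropy.
    logits: (T, V); tokens: (T,) ids; loss_mask: (T,) 0/1."""
    targets = tokens[1:]                 # logits[t] predicts tokens[t+1]
    preds = logits[:-1]
    mask = loss_mask[1:]
    logp = preds - np.log(np.exp(preds).sum(-1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets]
    return (nll * mask).sum() / mask.sum()

# toy sequence: <arc-input> tokens, <sep>, <arc-output> tokens
SEP = 9
tokens = np.array([1, 2, 3, SEP, 4, 5, 6])
V = 10
logits = np.random.default_rng(0).normal(size=(len(tokens), V))

# joint training: loss on every token (inputs AND outputs)
joint_mask = np.ones(len(tokens))
# conditional training: loss only on tokens after <sep>
sep_idx = int(np.argmax(tokens == SEP))
cond_mask = np.zeros(len(tokens))
cond_mask[sep_idx + 1:] = 1.0

print(ntp_loss(logits, tokens, joint_mask))
print(ntp_loss(logits, tokens, cond_mask))
```

Both runs use the same forward pass; only the loss mask changes, which is why the two regimes are cheap to ablate against each other.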
@AdithyaDsilva
Adithya Dsilva
4 days
Follow up ablations and reasoning: - Training on train i/o + test inp gives 27.5% - Training on train i/o only gives 23.25% - Training on train outputs only gives 21.5% All training is standard NTP. Biggest drop in perf is from not knowing what the... https://t.co/rq6Mjjrqal
@AdithyaDsilva
Adithya Dsilva
5 days
@evilmathkid Reproduced the results successfully, but the reasoning in the blog doesn't stand up to ablations. Not training on the test input has minimal impact on performance. The success is most likely due to the novel use of 3D RoPE + shared latents. Awesome results regardless!
2
1
8
@AdithyaDsilva
Adithya Dsilva
4 days
@AndrewYNg Limitations: - gains through representation crafting will vanish as the model is scaled. - very heavily dependent on the ability to create infinite training data from a limited dataset by exploiting invariance. - no easy way of extending this to solve arc-3
0
0
3
@AdithyaDsilva
Adithya Dsilva
4 days
Surprising results: - a transformer trained to predict both input and output performs ~30% better than one predicting only the output (a 2001 @AndrewYNg paper goes into the details on this https://t.co/aMIn7b3Qcf) - exploiting the invariances of the arc problem set takes you a long way
2
0
3
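"Exploiting invariance" for ARC usually means augmenting each task with geometric transforms and color relabelings: 8 dihedral transforms times many color permutations can plausibly yield multipliers on the order of the 800x mentioned elsewhere in the thread. A minimal sketch (function names are mine):

```python
import numpy as np

def dihedral_augs(grid):
    """All 8 dihedral transforms (4 rotations x optional flip)
    of an ARC grid -- the task is invariant under each of them."""
    outs = []
    g = grid
    for _ in range(4):
        outs.append(g)
        outs.append(np.fliplr(g))
        g = np.rot90(g)
    return outs

def color_permute(grid, perm):
    """Relabel colors under a permutation of the palette."""
    lut = np.array(perm)
    return lut[grid]

grid = np.array([[0, 1], [2, 0]])
augs = dihedral_augs(grid)
print(len(augs))   # 8 geometric variants

# combine with color permutations of the 10-color ARC palette
perm = [0, 2, 1] + list(range(3, 10))  # swap colors 1 and 2
print(color_permute(grid, perm))       # [[0 2] [1 0]]
```

The same transform must be applied to a task's input and output grids together, so each augmentation yields a new but equivalent training example.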
@AdithyaDsilva
Adithya Dsilva
4 days
Biggest factors for perf gains: - wide token dimensions, hand-crafted representations (3D RoPE, 800x augmentations, single model trained over all tasks) - reweighting the training dataset using info from val (the model implicitly prioritizes training samples according to questions in the val set)
0
0
2
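The "reweighting the training dataset using info from val" bullet can be read as upweighting training samples that resemble validation samples. A minimal sketch of one such scheme; the cosine-similarity metric and softmax temperature are my assumptions, not details from the thread:

```python
import numpy as np

def reweight(train_feats, val_feats, temp=1.0):
    """Sampling weights that favor training samples whose features
    are close (cosine similarity) to some validation sample.
    NOTE: illustrative scheme, not the thread's actual method."""
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    v = val_feats / np.linalg.norm(val_feats, axis=1, keepdims=True)
    sims = t @ v.T                 # (n_train, n_val)
    score = sims.max(axis=1)       # nearest-val similarity per sample
    w = np.exp(score / temp)
    return w / w.sum()             # normalized sampling distribution

rng = np.random.default_rng(0)
w = reweight(rng.normal(size=(5, 4)), rng.normal(size=(2, 4)))
print(w)  # 5 positive weights summing to 1
```

The returned distribution could then drive sampling in the training loop, which matches the tweet's "implicitly prioritizes training samples" framing.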
@AdithyaDsilva
Adithya Dsilva
4 days
A TL;DR on the new "pareto-optimal arc-1 solution" - trains a simple auto-regressive model (a 4-layer transformer) on a dataset of `<arc-input> <sep> <arc-output>` sequences, gets 27.5% on public val. 🧵 https://t.co/QT63GFWfFz
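A sketch of what such `<arc-input> <sep> <arc-output>` sequences might look like; the token ids and row-delimiter scheme here are hypothetical, since the thread doesn't specify the vocabulary:

```python
# Illustrative encoding of one ARC example as a flat token sequence.
SEP, NEWROW, EOS = 10, 11, 12   # special tokens beyond the 10 colors

def grid_to_tokens(grid):
    """Flatten a grid row by row, marking row boundaries."""
    toks = []
    for row in grid:
        toks.extend(row)
        toks.append(NEWROW)
    return toks

def make_sequence(inp, out):
    """<arc-input> <sep> <arc-output> sequence for next-token training."""
    return grid_to_tokens(inp) + [SEP] + grid_to_tokens(out) + [EOS]

inp = [[1, 0], [0, 1]]
out = [[0, 1], [1, 0]]
seq = make_sequence(inp, out)
print(seq)  # [1, 0, 11, 0, 1, 11, 10, 0, 1, 11, 1, 0, 11, 12]
```

At inference, the model would be fed everything up to and including `SEP` and asked to generate the output grid token by token.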
@evilmathkid
Mithil Vakde
5 days
Announcing a new Pareto frontier on ARC-AGI: 27.5% for just $2, 333x cheaper than TRM! Beats every non-thinking LLM in existence. Cost so low, it's literally off the chart. Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source. Thread:
2
0
6
@itsarnavb
Arnav Bansal ⠕
26 days
If you enjoy obsessively figuring out how protocols, APIs, and machines work under the hood, DM me with your past work You'll work alongside some *excellent* collaborators at @_lagrangepoint over a retreat
5
9
100
@natashamalpani
Natasha Malpani 👁
29 days
we’re hosting a small dinner in bangalore next week for founders building at the frontier of AI, science + deep tech. one rule: bring the hardest problem you haven’t solved yet.
47
17
290
@Mankaran32
mansin
1 month
Finally hiring for @eyecandyrobots Looking for these roles, entry level could work: - robotics reinforcement learning dev - mechanical - electronics - 3d animation artist in-person in bangalore for now. DM hobby projects and that should be it. will get back asap.
4
13
77
@thedivtagguy
Aman Bhargava 📊
2 months
We've been running Diagram Chasing, our dataviz publication, for a year. Now we're looking to build a network of creative collaborators for future projects. Illustrators, designers, programmers, cartographers, anyone who wants to make public-interest data work with us! 🧵
2
1
7
@lossfunk
Lossfunk
2 months
We released the full Batch 5 graduation video on our channel 🎓. This one is filled with demos and talks ranging from walking robots to a chess AI that beats a DeepMind model. Go watch it 👇
2
4
77
@AdithyaDsilva
Adithya Dsilva
2 months
https://t.co/c7mhCdLGkU The beauty of FFT couldn't be more apparent!
nima101.github.io
Problems and Algorithms
0
0
2
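A classic illustration of why the FFT is beautiful: polynomial multiplication drops from O(n^2) to O(n log n), because convolution in coefficient space becomes pointwise multiplication in frequency space. A short NumPy sketch:

```python
import numpy as np

def poly_mul_fft(a, b):
    """Multiply two polynomials (coefficient lists, lowest degree
    first) via FFT: transform, multiply pointwise, inverse transform."""
    n = len(a) + len(b) - 1                  # degree of the product + 1
    size = 1 << (n - 1).bit_length()         # pad to next power of two
    fa = np.fft.rfft(a, size)
    fb = np.fft.rfft(b, size)
    prod = np.fft.irfft(fa * fb, size)[:n]
    return np.rint(prod).astype(int)         # round back to integers

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(poly_mul_fft([1, 2], [3, 4]))          # [ 3 10  8]
```

Padding to at least `n` samples makes the circular convolution computed by the FFT coincide with the linear convolution we actually want.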
@AdithyaDsilva
Adithya Dsilva
3 months
Working at @lossfunk on a novel diffusion-rl direction to crack arc-agi-2. Looking for interested researchers. Hmu if you've implemented LLMs from scratch before
9
2
105
@pr0t0_01
Alqama. S
4 months
Fastest and most fun graduation ever Ft. @Mankaran32 @khushhhi_ @HariAyapps @rt_2422 @nishgl @AdithyaDsilva @chit_raa @TanayLohia1 @paraschopra @Timbre Thanks for such gr8 community @lossfunk
3
6
74