Adithya Dsilva
@AdithyaDsilva
Followers: 207 · Following: 123 · Media: 2 · Statuses: 48
Fortune favors the bold. @lossfunk | @_lagrangepoint
Bangalore
Joined August 2020
It's over. GPT-5.2 aces one of the most important benchmarks and it's not even close!
177 · 104 · 2K
Anyone driving from Bangalore -> Mangalore in the next couple days? Want to take my cat back home, happy to pay for fuel
3 · 0 · 10
Training towards a local minimum, sure, but isn't that what all competitive leaderboards are anyway?
0 · 0 · 1
One may call training on the test inputs "cheating", but it is no more an act of cheating than Nakamura training against a chess model simulating Carlsen's playing style, one day before their world chess championship.
1 · 0 · 1
If nothing else, this is sufficient validation that a lot of potential transformer performance is left on the table by using heuristics to encode positions (RoPE, YaRN, etc.) rather than position embeddings generated by the model itself. The bitter lesson comes for us all
1 · 0 · 2
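A minimal sketch of the contrast drawn in the tweet above, under assumed vanilla settings: a fixed heuristic position code (plain sinusoidal here, standing in for schemes like RoPE/YaRN) next to a position embedding table the model learns end to end. The module names and shapes are illustrative, not taken from the thread.

```python
import torch
import torch.nn as nn

class HeuristicPos(nn.Module):
    """Fixed sinusoidal position code (stand-in for heuristic schemes like RoPE/YaRN)."""
    def __init__(self, dim, max_len=4096):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float32)[:, None]
        freq = 10000.0 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * freq)
        pe[:, 1::2] = torch.cos(pos * freq)
        self.register_buffer("pe", pe)          # never updated by the optimizer

    def forward(self, x):                        # x: (batch, seq, dim)
        return x + self.pe[: x.size(1)]

class LearnedPos(nn.Module):
    """Position embeddings produced by the model itself, trained end to end."""
    def __init__(self, dim, max_len=4096):
        super().__init__()
        self.pe = nn.Embedding(max_len, dim)

    def forward(self, x):
        idx = torch.arange(x.size(1), device=x.device)
        return x + self.pe(idx)
```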
validation inputs look like. Joint training (predict inputs and outputs) vs conditional (predict outputs given inputs) does have some perf impact, though. However, by far the biggest factor for perf (~21%) is 3D RoPE + well-configured model hyper-params (ablations ongoing)
1 · 0 · 1
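A hedged sketch of what per-axis ("3D") RoPE over grid tokens could look like. The choice of axes (example index, row, col) and the equal split of the head dimension across axes are assumptions; the thread does not spell out its exact scheme.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    """Rotation angles for one axis: positions (N,) -> (N, dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return positions.float()[:, None] * inv_freq[None, :]

def apply_rope(x, angles):
    """Rotate channel pairs of x (N, dim) by the given angles (N, dim // 2)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, coords, head_dim):
    """Give each of the three coordinate axes its own slice of the head dim."""
    d = (head_dim // 3) // 2 * 2               # even slice per axis
    out = x.clone()
    for axis in range(3):                       # assumed axes: example, row, col
        sl = slice(axis * d, (axis + 1) * d)
        out[..., sl] = apply_rope(x[..., sl], rope_angles(coords[:, axis], d))
    return out

# toy usage: 5 grid tokens with (example, row, col) coordinates, head_dim = 48
q = torch.randn(5, 48)
coords = torch.tensor([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]])
print(rope_3d(q, coords, 48).shape)             # torch.Size([5, 48])
```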
Follow-up ablations and reasoning:
- Training on train i/o + test inp gives 27.5%
- Training on train i/o only gives 23.25%
- Training on train outputs only gives 21.5%
All training is standard NTP. Biggest drop in perf is from not knowing what the... https://t.co/rq6Mjjrqal
@evilmathkid Reproduced the results successfully, but the reasoning in the blog doesn't stand up to ablations. Not training on the test input has minimal impact on performance. The success most likely comes from the novel use of 3D RoPE + shared latents. Awesome results regardless!
2 · 1 · 8
@AndrewYNg Limitations:
- gains through representation crafting will vanish as the model is scaled
- very heavily dependent on the ability to create infinite training data from a limited dataset by exploiting invariance
- no easy way of extending this to solve arc-3
0 · 0 · 3
Surprising results:
- a transformer trained to predict both input and output performs ~30% better than one that just predicts the output (a 2001 @AndrewYNg paper goes into the details on this: https://t.co/aMIn7b3Qcf)
- exploiting invariance of the ARC problem set takes you a long way
2 · 0 · 3
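A rough sketch of the joint vs. conditional objectives compared above: both are standard next-token prediction over `<input> <sep> <output>` sequences; the conditional variant simply zeroes out the loss on tokens before the separator. Function and argument names are illustrative, not from the original code.

```python
import torch
import torch.nn.functional as F

def ntp_loss(logits, tokens, sep_id, joint=True):
    """Next-token loss over `<input> <sep> <output>` sequences.

    logits: (batch, seq, vocab), tokens: (batch, seq).
    joint=True  -> loss on every position (predict inputs *and* outputs)
    joint=False -> loss only on tokens after the separator (outputs given inputs)
    """
    pred, tgt = logits[:, :-1], tokens[:, 1:]
    loss = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)), tgt.reshape(-1), reduction="none"
    ).view_as(tgt)
    if joint:
        return loss.mean()
    after_sep = (tokens == sep_id).cumsum(dim=1)[:, 1:] > 0
    return (loss * after_sep).sum() / after_sep.sum().clamp(min=1)
```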
Biggest factors for perf gains:
- wide token dimensions, hand-crafted representations (3D RoPE, 800x augmentations, single model trained over all tasks)
- reweighting the training dataset using info from val (the model implicitly prioritizes training samples according to questions in the val set)
0 · 0 · 2
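A sketch of the kind of invariance-based augmentation the tweet alludes to: ARC grids keep their meaning under the 8 dihedral symmetries and under permutations of the 10 colours, so 8 transforms x ~100 colour permutations gives roughly the 800x mentioned. The exact recipe here is an assumption, not the thread's code.

```python
import itertools
import random
import numpy as np

def dihedral(grid):
    """The 8 rotations/reflections of a grid."""
    g = np.asarray(grid)
    for k in range(4):
        r = np.rot90(g, k)
        yield r
        yield np.fliplr(r)

def recolor(grid, perm):
    """Relabel colours 0..9 according to a permutation."""
    return np.asarray(perm)[np.asarray(grid)]

def augment(pairs, n_color_perms=100, seed=0):
    """Yield augmented (input, output) pairs: ~8 * n_color_perms per original pair."""
    rng = random.Random(seed)
    perms = [rng.sample(range(10), 10) for _ in range(n_color_perms)]
    for inp, out in pairs:
        transforms = list(zip(dihedral(inp), dihedral(out)))   # same symmetry on both grids
        for (gi, go), perm in itertools.product(transforms, perms):
            yield recolor(gi, perm), recolor(go, perm)

# toy usage: one 2x2 -> 2x2 pair expands to 8 * 100 = 800 augmented pairs
pairs = [([[1, 0], [0, 2]], [[2, 0], [0, 1]])]
print(sum(1 for _ in augment(pairs)))   # 800
```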
A TL;DR on the new "Pareto-optimal ARC-1 solution": it trains a simple auto-regressive model (a 4-layer transformer) on a dataset of `<arc-input> <sep> <arc-output>` sequences and gets 27.5% on public val. 🧵 https://t.co/QT63GFWfFz
Announcing: new Pareto frontier on ARC-AGI
27.5% for just $2
333x cheaper than TRM!
Beats every non-thinking LLM in existence
Cost so low, it's literally off the chart
Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source.
Thread:
2 · 0 · 6
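A minimal sketch, assuming a straightforward serialisation, of how `<arc-input> <sep> <arc-output>` sequences could be built for next-token training. The token ids and the row/separator markers are invented for illustration; the thread does not give its vocabulary.

```python
# colours use token ids 0-9; the extra ids below are made up for this sketch
NEWROW, SEP, EOS = 10, 11, 12

def encode_grid(grid):
    """Flatten a grid row by row, marking the end of each row."""
    toks = []
    for row in grid:
        toks.extend(row)
        toks.append(NEWROW)
    return toks

def encode_pair(inp, out=None):
    """One `<arc-input> <sep> <arc-output>` sequence; omit `out` to build a test prompt."""
    seq = encode_grid(inp) + [SEP]
    if out is not None:
        seq += encode_grid(out) + [EOS]
    return seq

# toy usage: a 2x2 -> 2x2 pair
print(encode_pair([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
# [1, 0, 10, 0, 1, 10, 11, 0, 1, 10, 1, 0, 10, 12]
```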
If you enjoy obsessively figuring out how protocols, APIs, and machines work under the hood, DM me with your past work. You'll work alongside some *excellent* collaborators at @_lagrangepoint over a retreat.
5 · 9 · 100
we’re hosting a small dinner in bangalore next week for founders building at the frontier of AI, science + deep tech. one rule: bring the hardest problem you haven’t solved yet.
47 · 17 · 290
Finally hiring for @eyecandyrobots
Looking for these roles, entry level could work:
- robotics reinforcement learning dev
- mechanical
- electronics
- 3d animation artist
in-person in bangalore for now. DM hobby projects and that should be it. will get back asap.
4 · 13 · 77
We've been running Diagram Chasing, our dataviz publication, for a year. Now we're looking to build a network of creative collaborators for future projects. Illustrators, designers, programmers, cartographers, anyone who wants to make public-interest data work with us! 🧵
2 · 1 · 7
We released the full Batch 5 graduation video on our channel 🎓. This one is filled with demos and talks ranging from walking robots to a chess AI that beats a DeepMind model. Go watch it 👇
2 · 4 · 77
https://t.co/c7mhCdLGkU The beauty of FFT couldn't be more apparent!
nima101.github.io · Problems and Algorithms
0 · 0 · 2
Working at @lossfunk on a novel diffusion-rl direction to crack arc-agi-2. Looking for interested researchers. Hmu if you've implemented LLMs from scratch before
9 · 2 · 105
Slides of my presentation @lossfunk. Shoutout to my co-author @OerstedIV. Hopefully a research paper soon? https://t.co/456FnhqBBq.
docs.google.com · LIES, DAMNED LIES, AND LOSS CURVES · Adithya Dsilva @lossfunk · Or, beating DeepMind's model with ⅓ the compute
0 · 0 · 5
Fastest and most fun graduation ever! Ft. @Mankaran32 @khushhhi_ @HariAyapps @rt_2422 @nishgl @AdithyaDsilva @chit_raa @TanayLohia1 @paraschopra @Timbre Thanks for such a gr8 community @lossfunk
3 · 6 · 74