Akshit (@akshitwt)
3K Followers · 2K Following · 176 Media · 1K Statuses
assessing ai capabilities. ML @cambridge_uni. previously @precogatiiith, @iiit_hyderabad. futurebound.
Joined June 2023
a skill that i am really proud of is my ability to iterate on experiments fast, and write "good" code. writing code is an important skill to have as a researcher, and in this post i discuss some tips to hopefully help you get better at it!
19 · 40 · 772
i have always had this belief that just reading a lot of books doesn't make you smarter, contrary to a decently popular opinion. now that my first term at university is over, i took some time to formulate this idea into words. ps: this ideology will also help in research!
1 · 0 · 3
to the great people attending #NeurIPS2025, please stop by our poster on "measuring long horizon execution" on 6th december at the multi-turn interactions workshop! many people tell me it's an interesting work worth checking out :) although i am unable to attend, @jonasgeiping
1 · 0 · 13
this is really unfair to authors who put in immense effort in their rebuttals!!
0 · 0 · 13
Hey @iclr_conf, reverting scores is unnecessary punishment for the majority of the authors who had nothing to do with this incident and had successful rebuttals. Instead of detecting collusions on your end (you have a ton of metadata) why is this everyone’s burden to bear?
8 · 30 · 216
as a reviewer, i had really nice rebuttals posted by the authors, following which i increased both my scores. now, as i understand, not only will my scores be reverted, but the comments i made acknowledging that the rebuttals were satisfactory will also be gone!? #ICLR2026
3 · 1 · 48
New blogpost: Why I think automated research is the means, not just the end, for training superintelligent AI systems. In pointing models at scientific discovery, we will have to achieve the capabilities today's LLMs lack: - long-horizon planning - continual adaptation -
3 · 4 · 63
🔥 IT'S OUT 🔥 Struggling to find benchmarks? Explore our repository for THOUSANDS of easy-to-run, well-documented tasks 🤩 📏 Creators, add yours for more visibility 🔥 Users, find and run models effortlessly with lighteval
4 · 10 · 40
most important thing i learnt today: you don't have to write your SOP like a biography
6 · 0 · 55
@mttrdmnd I personally never identified with the label “Geometric Deep Learning”, but graph neural nets (GNNs) are still going strong for certain application domains (like relational databases). Plenty of people and industry labs still working on that (incl. startups like Kumo). As for
4 · 12 · 120
Enabling continual learning in LLMs is a key unresolved challenge. Agent Skills offer a promising approach. But are they secure? Our new short paper shows: no ❌! Every line of Agent Skills is interpreted as an *instruction*, enabling trivially simple prompt injections. 1/n
7 · 17 · 94
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: https://t.co/HNLqfNsQfN Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
25 · 168 · 1K
no idea why this is getting so many likes, but check out my research paper ig!
Do LLMs just give an "illusion of thinking" 😉? Many argue that LLMs failing on planning tasks is conclusive evidence for this. But, is it? 🚨 In our new paper, we argue this failure can be attributed to something else entirely: The (in)ability of LLMs to execute plans.
0 · 0 · 8
karpathy by far has the best and most sober takes on AI progress in the community. super careful with what he says and how he articulates it and he isn’t too bearish or bullish. i agree with almost everything he’s saying
The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self
27 · 43 · 1K
the one thing this definition misses is reliability; models may be great at solving very hard problems at pass@K, but that is not very reliable for a usable agent. same problem w/ METR - they calculate time horizon at 50% accuracy, which is too low to be useful. pass@K is good
The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵
0 · 2 · 19
crazy good advice down here btw
I’ve received a lot of feedback saying this post made people feel peer pressure; please don’t. Believe me, I’ve been there too. There are always people better than me, and whenever I looked at them, mentioned them, or thought about them, I felt small and deeply anxious. Here
0 · 0 · 27
"courses are outdated" these words are only said by people whose sole purpose of going to uni is to get a job at the end of it (university =/= job prep bootcamps)
Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up, the credential arms race is over. Self-learning is the only way now.
2 · 2 · 93