Tanishq Kumar

@tanishqkumar07

Followers 1K · Following 189 · Media 11 · Statuses 87

incoming CS PhD student @Stanford, prev math undergrad @Harvard

Boston, SF & Bombay
Joined July 2022
@tanishqkumar07
Tanishq Kumar
3 months
trained a nanoGPT? feeling behind before o4-mini? 🚨🚨 i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨 it contains thousands of lines of from-scratch, annotated pytorch implementing advanced
Tweet media one
6
47
318
@tanishqkumar07
Tanishq Kumar
9 days
[1] [2] [3]
1
0
1
@tanishqkumar07
Tanishq Kumar
9 days
the fact that speculative decoding was independently yet simultaneously invented* by the two divisions of google (deepmind and research) [1, 2] is absolutely hilarious to me. *as always, noam shazeer et al. came up with it even before LLMs [3].
3
1
15
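for anyone who hasn't seen the trick those papers describe, a toy, self-contained sketch of draft-then-verify sampling. the `target_p`/`draft_p` callables, tiny vocab, and rejection-resampling loop below are my stand-ins for intuition, not either paper's actual code:

```python
import random

def speculative_decode(target_p, draft_p, prefix, k=4, steps=8, seed=0):
    """Toy speculative decoding: a cheap draft model proposes k tokens,
    the target verifies them, accepting each with prob min(1, p/q) and
    resampling from the residual max(0, p - q) on rejection."""
    rng = random.Random(seed)
    vocab = list(range(len(target_p(prefix))))
    out = list(prefix)
    for _ in range(steps):
        # 1) draft proposes k tokens autoregressively (cheap)
        ctx, proposed = list(out), []
        for _ in range(k):
            tok = rng.choices(vocab, weights=draft_p(ctx))[0]
            proposed.append(tok)
            ctx.append(tok)
        # 2) target scores the proposals (one parallel pass in real impls)
        ctx = list(out)
        for tok in proposed:
            p, q = target_p(ctx), draft_p(ctx)
            if rng.random() < min(1.0, p[tok] / max(q[tok], 1e-9)):
                out.append(tok); ctx.append(tok)        # accept, keep going
            else:
                # reject: sample from the renormalized residual max(0, p - q)
                resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
                out.append(rng.choices(vocab, weights=resid if sum(resid) else p)[0])
                break
    return out

# toy stationary "models": distributions independent of context
target = lambda ctx: [0.6, 0.3, 0.1]
draft = lambda ctx: [0.4, 0.4, 0.2]
toks = speculative_decode(target, draft, prefix=[0])
```

the acceptance rule is what makes the combined procedure an exact sample from the target distribution, which is why this is a pure speedup rather than an approximation.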
@tanishqkumar07
Tanishq Kumar
10 days
ah, the ecstasy of a working tensor parallel reimplementation, and the agony of writing distributed backward passes👇
Tweet media one
1
0
3
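for intuition on why the backward pass is the painful part: in a column-parallel linear layer the forward is purely local matmuls, but the input gradient is only *partial* on each rank, and the sum across ranks is exactly the all-reduce you have to remember to insert by hand. a numpy sketch simulating two "ranks" (the shapes and two-way split are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))    # activations (batch, d_in)
W = rng.normal(size=(8, 6))    # the full weight we pretend to shard
dy = rng.normal(size=(2, 6))   # upstream gradient

# column-parallel over two "ranks": W = [W0 | W1], each rank holds one slice
shards = np.split(W, 2, axis=1)
# forward: local matmuls, then an all-gather of the per-rank outputs
y = np.concatenate([x @ Wi for Wi in shards], axis=1)

# backward: each rank only sees its slice of dy, so its input gradient is
# partial; summing the partials is the all-reduce frameworks insert for you
dy_shards = np.split(dy, 2, axis=1)
dx = sum(dyi @ Wi.T for dyi, Wi in zip(dy_shards, shards))  # "all-reduce"
dW = [x.T @ dyi for dyi in dy_shards]                       # weight grads stay local

assert np.allclose(y, x @ W)        # sharded forward == unsharded
assert np.allclose(dx, dy @ W.T)    # summed partials == true input grad
```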
@tanishqkumar07
Tanishq Kumar
17 days
discussing classic literature with frontier models really exposes their overconfidence. presumably because Dickens appears often in high-quality pretraining corpora, gpt-4o believes it can respond to all my questions without searching, resulting in near-constant hallucination.
2
0
5
@tanishqkumar07
Tanishq Kumar
20 days
hello friends, i will be in SF for july/aug/some of sept - if you know any summer sublets/rentals in the city i should look at on short notice, dm me :)
0
0
7
@tanishqkumar07
Tanishq Kumar
27 days
i find it entertaining that under the hood, most open source "GRPO" implementations (e.g. trl) by default actually implement REINFORCE with monte carlo group-advantages (by not reusing rollouts & making clipping/ratios redundant).
0
0
4
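to see why: with a single gradient step per batch of rollouts, the new and old policies coincide, so the importance ratio is 1 and the clip never fires -- the clipped surrogate's gradient then equals the REINFORCE estimator with group-normalized advantages. a finite-difference check of that claim (the rewards and logprobs are made-up numbers, not from any real run):

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style advantage: z-score each reward within its prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def grpo_surrogate(logp_new, logp_old, adv, eps=0.2):
    """PPO-style clipped surrogate used by GRPO."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()

def reinforce_surrogate(logp, adv):
    """Surrogate whose gradient is the REINFORCE estimator adv * grad(logp)."""
    return (logp * adv).mean()

# one rollout batch, one gradient step => logp_new starts at logp_old,
# so ratio = 1, clipping is inert, and the two gradients coincide:
logp_old = np.array([-1.2, -0.7, -2.0, -0.3])
adv = group_advantages([1.0, 0.0, 0.5, 2.0])
h = 1e-6
for i in range(len(logp_old)):
    lp = logp_old.copy(); lp[i] += h   # nudge one token's logprob
    g_grpo = (grpo_surrogate(lp, logp_old, adv)
              - grpo_surrogate(logp_old, logp_old, adv)) / h
    g_rf = (reinforce_surrogate(lp, adv)
            - reinforce_surrogate(logp_old, adv)) / h
    assert abs(g_grpo - g_rf) < 1e-4   # same gradient, coordinate by coordinate
```

the ratio and clip only start doing work when you take multiple updates on the same rollouts, which the default configs don't.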
@tanishqkumar07
Tanishq Kumar
1 month
this is an equally interesting question if you replace the "test set" throughout with "train set" -- the answer (for whether we should see such training practices as bad or as indicative of our evals being bad) is not clear to me for either variant of the q.
0
0
0
@tanishqkumar07
Tanishq Kumar
1 month
if this fails to be true in practice (as it obviously does) that feels to me more a failure of the evals, not the training ("mal")practice?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
by that, i mean that an ideal set of evals should satisfy the implication that { if you do well on *all* the evals and *did not literally train on the test set* => you *must* have a good model (from pov of user) }?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
initially, everyone says bad since it does feel like somehow "cheating." but isn't the desideratum for the ideal set of evals that they should be a "sufficient statistic" for model performance?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
something i often wonder about: is training on synthetic rephrases of eval test sets (but not the actual test set) a good or bad thing?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
[5/5] obviously this is a toy example where the app it makes is pretty trivial -- the repo is for educational use. nonetheless, seeing how <1000 lines of vanilla python is enough to go from {modeling tokens} to {genuine, useful autonomy} was certainly a striking "feel the AGI" moment.
0
0
2
@tanishqkumar07
Tanishq Kumar
1 month
[4/5] some lessons on my end:
- context/memory management is the most important thing re agents. we as humans are bad at recognizing implicit info we ourselves are using, so often leave our agents malnourished, so it's no surprise they often flounder.
- "tool use" is a big deal.
1
0
3
@tanishqkumar07
Tanishq Kumar
1 month
[3/5] this is a self-contained and minimal repo. we use no frameworks, no SDKs, no exotic imports. the agent supports:
- tool use (web search, read/write files, run code) with some sandboxing
- reAct (iterated CoT with tool use in between)
- long & short term memory (context
1
0
1
@tanishqkumar07
Tanishq Kumar
1 month
[2/5] in the demo above, the agent is told to make a calculator app, which itself is trivial. but it doesn't just write the code: after correctly pushing a demo and readme to github, i raise a github issue requesting a new feature --> it one-shots making a correct PR end to end.
1
0
1
@tanishqkumar07
Tanishq Kumar
1 month
GitHub: Some lessons learned:
1
0
2
@tanishqkumar07
Tanishq Kumar
1 month
[1/5] lots of chatter about agents these days! i was curious, and it turns out a vanilla python wrapper around LLM.sample(), <1000 loc, is enough for a minimal "agent" that can autonomously make PRs end to end, add features, fix bugs, do deep research, and more.👇
2
1
32
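a minimal sketch of the kind of loop the thread describes -- sample, act, observe, repeat -- assuming a `sample(prompt) -> str` callable. the `Action:`/`Final:` protocol, the tool name, and the scripted "LLM" below are my illustration, not the repo's actual format:

```python
import json

# toy tool registry; the empty-builtins eval is illustrative, not real sandboxing
TOOLS = {
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def react(sample, task, max_steps=5):
    transcript = f"Task: {task}\n"      # short-term memory = the growing transcript
    for _ in range(max_steps):
        step = sample(transcript)       # LLM emits an action or a final answer
        transcript += step + "\n"
        if step.startswith("Final:"):
            return step[len("Final:"):].strip()
        if step.startswith("Action:"):  # e.g. Action: calc {"expr": "2+2"}
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            obs = TOOLS[name](**json.loads(arg))
            transcript += f"Observation: {obs}\n"   # tool result fed back as context
    return None

# scripted "LLM" standing in for LLM.sample(): uses the tool, then answers
script = iter(['Action: calc {"expr": "6*7"}', "Final: 42"])
assert react(lambda t: next(script), "what is 6*7?") == "42"
```

everything agent-flavored lives in the transcript-append lines: the "loop" really is just resampling with the tool observations spliced back into context.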
@tanishqkumar07
Tanishq Kumar
2 months
when reading about generative RMs in LLMs, i thought the idea of distilling inference-time compute from RMs into the student was creative -- but reading about expert iteration, i realize RL folks have known such ideas (search -> distill) for ages! all that is old is new again.
0
0
8
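the search -> distill loop in one toy example: amplify the current policy with best-of-n search against a reward signal, then distill the amplified behavior back into the policy. the bandit setup and mixture-update rule here are my illustration of the pattern, not any paper's algorithm:

```python
import random

rng = random.Random(0)
reward = [0.1, 0.5, 0.9]        # pretend reward model; arm 2 is genuinely best
probs = [1/3, 1/3, 1/3]         # the "policy": a distribution over 3 actions

for _ in range(50):
    # amplification: best-of-n search under the current policy
    cands = rng.choices(range(3), weights=probs, k=8)
    expert_action = max(cands, key=lambda a: reward[a])
    # distillation: pull the policy toward the search-improved "expert"
    lr = 0.1
    probs = [(1 - lr) * p + lr * (1.0 if a == expert_action else 0.0)
             for a, p in enumerate(probs)]

# the policy internalizes what search discovered
assert probs.index(max(probs)) == 2
```

swap best-of-n for MCTS and the bandit for a game and you get AlphaZero-style expert iteration; swap it for sampling against a generative RM and you get the distill-inference-compute idea above.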