Tanishq Kumar

@tanishqkumar07

Followers 1K · Following 189 · Media 11 · Statuses 87

incoming CS PhD student @Stanford, prev math undergrad @Harvard

Boston, SF & Bombay
Joined July 2022
@tanishqkumar07
Tanishq Kumar
3 months
trained a nanoGPT? feeling behind before o4-mini? 🚨🚨 i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨 it contains thousands of lines of from-scratch, annotated pytorch implementing advanced
Tweet media one
6
47
318
@tanishqkumar07
Tanishq Kumar
9 days
[1] [2] [3]
1
0
1
@tanishqkumar07
Tanishq Kumar
9 days
the fact that speculative decoding was independently yet simultaneously invented* by the two divisions of google (deepmind and research) [1, 2] is absolutely hilarious to me. *as always, noam shazeer et al. came up with it even before LLMs [3].
3
1
15
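for anyone who hasn't seen the trick those papers describe, a toy, self-contained sketch of draft-then-verify sampling. the `target_p`/`draft_p` callables, tiny vocab, and rejection-resampling loop below are my stand-ins for intuition, not either paper's actual code:

```python
import random

def speculative_decode(target_p, draft_p, prefix, k=4, steps=8, seed=0):
    """Toy speculative decoding: a cheap draft model proposes k tokens,
    the target verifies them, accepting each with prob min(1, p/q) and
    resampling from the residual max(0, p - q) on rejection."""
    rng = random.Random(seed)
    vocab = list(range(len(target_p(prefix))))
    out = list(prefix)
    for _ in range(steps):
        # 1) draft proposes k tokens autoregressively (cheap)
        ctx, proposed = list(out), []
        for _ in range(k):
            tok = rng.choices(vocab, weights=draft_p(ctx))[0]
            proposed.append(tok)
            ctx.append(tok)
        # 2) target scores the proposals (one parallel pass in real impls)
        ctx = list(out)
        for tok in proposed:
            p, q = target_p(ctx), draft_p(ctx)
            if rng.random() < min(1.0, p[tok] / max(q[tok], 1e-9)):
                out.append(tok); ctx.append(tok)        # accept, keep going
            else:
                # reject: sample from the renormalized residual max(0, p - q)
                resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
                out.append(rng.choices(vocab, weights=resid if sum(resid) else p)[0])
                break
    return out

# toy stationary "models": distributions independent of context
target = lambda ctx: [0.6, 0.3, 0.1]
draft = lambda ctx: [0.4, 0.4, 0.2]
toks = speculative_decode(target, draft, prefix=[0])
```

the acceptance rule is what makes the combined procedure an exact sample from the target distribution, which is why this is a pure speedup rather than an approximation.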
@tanishqkumar07
Tanishq Kumar
10 days
ah, the ecstasy of a working tensor parallel reimplementation, and the agony of writing distributed backward passes👇
Tweet media one
1
0
3
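for intuition on why the backward pass is the painful part: in a column-parallel linear layer the forward is purely local matmuls, but the input gradient is only *partial* on each rank, and the sum across ranks is exactly the all-reduce you have to remember to insert by hand. a numpy sketch simulating two "ranks" (the shapes and two-way split are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))    # activations (batch, d_in)
W = rng.normal(size=(8, 6))    # the full weight we pretend to shard
dy = rng.normal(size=(2, 6))   # upstream gradient

# column-parallel over two "ranks": W = [W0 | W1], each rank holds one slice
shards = np.split(W, 2, axis=1)
# forward: local matmuls, then an all-gather of the per-rank outputs
y = np.concatenate([x @ Wi for Wi in shards], axis=1)

# backward: each rank only sees its slice of dy, so its input gradient is
# partial; summing the partials is the all-reduce frameworks insert for you
dy_shards = np.split(dy, 2, axis=1)
dx = sum(dyi @ Wi.T for dyi, Wi in zip(dy_shards, shards))  # "all-reduce"
dW = [x.T @ dyi for dyi in dy_shards]                       # weight grads stay local

assert np.allclose(y, x @ W)        # sharded forward == unsharded
assert np.allclose(dx, dy @ W.T)    # summed partials == true input grad
```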
@tanishqkumar07
Tanishq Kumar
17 days
discussing classic literature with frontier models really exposes their overconfidence. presumably because Dickens appears often in high-quality pretraining corpora, gpt-4o believes it can respond to all my questions without searching, resulting in near-constant hallucination.
2
0
5
@tanishqkumar07
Tanishq Kumar
20 days
hello friends, i will be in SF for july/aug/some of sept - if you know any summer sublets/rentals in the city i should look at on short notice, dm me :)
0
0
7
@tanishqkumar07
Tanishq Kumar
27 days
i find it entertaining that under the hood, most open source "GRPO" implementations (e.g. trl) by default actually implement REINFORCE with monte carlo group-advantages (by not reusing rollouts & making clipping/ratios redundant).
0
0
4
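to see why: with a single gradient step per batch of rollouts, the new and old policies coincide, so the importance ratio is 1 and the clip never fires -- the clipped surrogate's gradient then equals the REINFORCE estimator with group-normalized advantages. a finite-difference check of that claim (the rewards and logprobs are made-up numbers, not from any real run):

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style advantage: z-score each reward within its prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def grpo_surrogate(logp_new, logp_old, adv, eps=0.2):
    """PPO-style clipped surrogate used by GRPO."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()

def reinforce_surrogate(logp, adv):
    """Surrogate whose gradient is the REINFORCE estimator adv * grad(logp)."""
    return (logp * adv).mean()

# one rollout batch, one gradient step => logp_new starts at logp_old,
# so ratio = 1, clipping is inert, and the two gradients coincide:
logp_old = np.array([-1.2, -0.7, -2.0, -0.3])
adv = group_advantages([1.0, 0.0, 0.5, 2.0])
h = 1e-6
for i in range(len(logp_old)):
    lp = logp_old.copy(); lp[i] += h   # nudge one token's logprob
    g_grpo = (grpo_surrogate(lp, logp_old, adv)
              - grpo_surrogate(logp_old, logp_old, adv)) / h
    g_rf = (reinforce_surrogate(lp, adv)
            - reinforce_surrogate(logp_old, adv)) / h
    assert abs(g_grpo - g_rf) < 1e-4   # same gradient, coordinate by coordinate
```

the ratio and clip only start doing work when you take multiple updates on the same rollouts, which the default configs don't.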
@tanishqkumar07
Tanishq Kumar
1 month
this is an equally interesting question if you replace the "test set" throughout with "train set" -- the answer (for whether we should see such training practices as bad or as indicative of our evals being bad) is not clear to me for either variant of the q.
0
0
0
@tanishqkumar07
Tanishq Kumar
1 month
if this fails to be true in practice (as it obviously does) that feels to me more a failure of the evals, not the training ("mal")practice?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
by that, i mean that an ideal set of evals should satisfy the implication that { if you do well on *all* the evals and *did not literally train on the test set* => you *must* have a good model (from pov of user) }?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
initially, everyone says bad since it does feel like somehow "cheating." but isn't the desideratum for the ideal set of evals that they should be a "sufficient statistic" for model performance?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
something i often wonder about: is training on synthetic rephrases of eval test sets (but not the actual test set) a good or bad thing?
1
0
0
@tanishqkumar07
Tanishq Kumar
1 month
[5/5] obviously this is a toy example where the app it makes is pretty trivial -- the repo is for educational use. nonetheless, seeing how <1000 lines of vanilla python is enough to go from {modeling tokens} to {genuine, useful autonomy} was certainly a striking "feel the AGI" moment.
0
0
2
@tanishqkumar07
Tanishq Kumar
1 month
[4/5] some lessons on my end:
- context/memory management is the most important thing re agents. we as humans are bad at recognizing implicit info we ourselves are using, so often leave our agents malnourished, so it's no surprise they often flounder.
- "tool use" is a big deal.
1
0
3
@tanishqkumar07
Tanishq Kumar
1 month
[3/5] this is a self-contained and minimal repo. we use no frameworks, no SDKs, no exotic imports. the agent supports:
- tool use (web search, read/write files, run code) with some sandboxing
- reAct (iterated CoT with tool use in between)
- long & short term memory (context
1
0
1
@tanishqkumar07
Tanishq Kumar
1 month
[2/5] in the demo above, the agent is told to make a calculator app, which itself is trivial. but it doesn't just write the code: after correctly pushing a demo and readme to github, i raise a github issue requesting a new feature --> it one-shots making a correct PR end to end.
1
0
1
@tanishqkumar07
Tanishq Kumar
1 month
GitHub: Some lessons learned:
1
0
2
@tanishqkumar07
Tanishq Kumar
1 month
[1/5] lots of chatter about agents these days! i was curious, and it turns out a vanilla python wrapper around LLM.sample(), <1000 loc, is enough for a minimal "agent" that can autonomously make PRs end to end, add features, fix bugs, do deep research, and more.👇
2
1
32
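a minimal sketch of the kind of loop the thread describes -- sample, act, observe, repeat -- assuming a `sample(prompt) -> str` callable. the `Action:`/`Final:` protocol, the tool name, and the scripted "LLM" below are my illustration, not the repo's actual format:

```python
import json

# toy tool registry; the empty-builtins eval is illustrative, not real sandboxing
TOOLS = {
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def react(sample, task, max_steps=5):
    transcript = f"Task: {task}\n"      # short-term memory = the growing transcript
    for _ in range(max_steps):
        step = sample(transcript)       # LLM emits an action or a final answer
        transcript += step + "\n"
        if step.startswith("Final:"):
            return step[len("Final:"):].strip()
        if step.startswith("Action:"):  # e.g. Action: calc {"expr": "2+2"}
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            obs = TOOLS[name](**json.loads(arg))
            transcript += f"Observation: {obs}\n"   # tool result fed back as context
    return None

# scripted "LLM" standing in for LLM.sample(): uses the tool, then answers
script = iter(['Action: calc {"expr": "6*7"}', "Final: 42"])
assert react(lambda t: next(script), "what is 6*7?") == "42"
```

everything agent-flavored lives in the transcript-append lines: the "loop" really is just resampling with the tool observations spliced back into context.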
@tanishqkumar07
Tanishq Kumar
2 months
when reading about generative RMs in LLMs, i thought the idea of distilling inference-time compute from RMs into the student was creative -- but reading about expert iteration, i realize RL folks have known such ideas (search -> distill) for ages! all that is old is new again.
0
0
8
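the search -> distill loop in one toy example: amplify the current policy with best-of-n search against a reward signal, then distill the amplified behavior back into the policy. the bandit setup and mixture-update rule here are my illustration of the pattern, not any paper's algorithm:

```python
import random

rng = random.Random(0)
reward = [0.1, 0.5, 0.9]        # pretend reward model; arm 2 is genuinely best
probs = [1/3, 1/3, 1/3]         # the "policy": a distribution over 3 actions

for _ in range(50):
    # amplification: best-of-n search under the current policy
    cands = rng.choices(range(3), weights=probs, k=8)
    expert_action = max(cands, key=lambda a: reward[a])
    # distillation: pull the policy toward the search-improved "expert"
    lr = 0.1
    probs = [(1 - lr) * p + lr * (1.0 if a == expert_action else 0.0)
             for a, p in enumerate(probs)]

# the policy internalizes what search discovered
assert probs.index(max(probs)) == 2
```

swap best-of-n for MCTS and the bandit for a game and you get AlphaZero-style expert iteration; swap it for sampling against a generative RM and you get the distill-inference-compute idea above.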