Subham Kumar Profile
Subham Kumar

@subgrad

Followers: 179 · Following: 4K · Media: 14 · Statuses: 94

Machine Learning @Google

Bengaluru
Joined August 2021
@subgrad
Subham Kumar
2 months
memory as a tool.
0
0
5
@subgrad
Subham Kumar
3 months
nothing as powerful as 2.5 pro.
0
0
6
@subgrad
Subham Kumar
4 months
Multi-Head Latent Attention (MLA) is such a clever strategy in DeepSeek-V3. By compressing the KV cache, it slashes memory usage dramatically. With 64 attention heads (each of 128 dims), a compression dimension of 2048, and a positional dimension of 2048, MLA cuts memory by ~75%!
Tweet media one
0
1
6
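The ~75% figure checks out from the tweet's own numbers. A minimal back-of-the-envelope sketch, taking those numbers at face value (DeepSeek-V3's published config differs, but the accounting is the same):

```python
# Per-token, per-layer KV-cache size: vanilla MHA vs. MLA,
# using the head count and dims stated in the tweet above.
n_heads, d_head = 64, 128   # attention heads, per-head dim
d_compress = 2048           # compressed KV latent dim
d_pos = 2048                # decoupled positional (RoPE) dim

# Vanilla attention caches full keys AND values for every head.
kv_vanilla = 2 * n_heads * d_head        # 16384 values/token/layer

# MLA caches only the shared compressed latent plus the positional part.
kv_mla = d_compress + d_pos              # 4096 values/token/layer

print(f"{kv_vanilla} -> {kv_mla}: {1 - kv_mla / kv_vanilla:.0%} saved")
# 16384 -> 4096: 75% saved
```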
@subgrad
Subham Kumar
5 months
Have been using this LLM Consortium for some time; nice to see @llama_index's implementation orchestrating asynchronous agents.
Tweet media one
@subgrad
Subham Kumar
8 months
LLM1 writes code → LLM2 critiques → feed the critique back for self-improving iterations. Each iteration improves on the previous feedback. Anyone built a browser extension for this loop?
0
2
9
@subgrad
Subham Kumar
5 months
GRPO in DeepSeek-R1 shows that group feedback beats solo judgment - no explicit critic required.
0
0
5
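The "no critic" part comes from GRPO's group-relative advantage: sample a group of completions per prompt, score them, and normalize each reward against its own group. A minimal sketch (the reward values are made up for illustration):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: each completion is judged against its own
    group's mean and spread, so no learned critic/value model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, four sampled completions, rule-based 0/1 correctness rewards.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
# positive advantages for the correct answers, negative for the rest
```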
@subgrad
Subham Kumar
5 months
How o1 differs from GPT is best illustrated by this: RL at train time and reasoning skills at test time.
@DrJimFan
Jim Fan
10 months
OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to
Tweet media one
0
0
5
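In Sutton's terms, the test-time half is search. The simplest concrete instance is best-of-N sampling: draw several candidates and keep the one a scorer likes best. A toy sketch with hypothetical `generate`/`score` stand-ins:

```python
import random

def best_of_n(generate, score, prompt: str, n: int = 8):
    """Minimal test-time search: spend more compute by sampling
    N candidates and keeping the best-scoring one."""
    return max((generate(prompt) for _ in range(n)), key=score)

# Toy stand-ins: answers are ints, the "verifier" prefers values near 42.
generate = lambda _prompt: random.randint(0, 100)
score = lambda answer: -abs(answer - 42)
print(best_of_n(generate, score, "q", n=16))
```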
@subgrad
Subham Kumar
7 months
Tried comparing 4o vs o1 on reasoning tasks involving compound rules: a UKCAT-style problem that asks you to differentiate two sets. Both models fail, and o1's explicit reasoning doesn't seem to provide any advantage here.
Tweet media one
Tweet media two
@subgrad
Subham Kumar
7 months
Haven't come across a single useful scenario where o1 models perform better. They take much longer, with repetitive output that doesn't feel like actual 'thought.' In contrast, implicit chain-of-thought reasoning with 4o seems more precise and definitely faster.
0
0
8
@subgrad
Subham Kumar
7 months
Haven't come across a single useful scenario where o1 models perform better. They take much longer, with repetitive output that doesn't feel like actual 'thought.' In contrast, implicit chain-of-thought reasoning with 4o seems more precise and definitely faster.
0
0
7
@subgrad
Subham Kumar
8 months
1
0
3
@subgrad
Subham Kumar
8 months
LLM1 writes code → LLM2 critiques → feed the critique back for self-improving iterations. Each iteration improves on the previous feedback. Anyone built a browser extension for this loop?
1
1
10
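The loop itself is tiny; the browser extension is the missing piece. A hedged skeleton, where `call_llm(system, user)` is a hypothetical stand-in for whichever chat API sits behind LLM1 and LLM2:

```python
def critique_loop(task: str, call_llm, iterations: int = 3) -> str:
    """LLM1 writes code -> LLM2 critiques -> the critique is fed back.
    `call_llm(system, user) -> str` is a placeholder for any chat API."""
    code = call_llm("You are a careful programmer.", task)
    for _ in range(iterations):
        critique = call_llm(
            "You are a strict code reviewer. List concrete defects.",
            f"Task:\n{task}\n\nCode:\n{code}",
        )
        code = call_llm(
            "You are a careful programmer. Revise the code to address the critique.",
            f"Task:\n{task}\n\nCode:\n{code}\n\nCritique:\n{critique}",
        )
    return code
```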
@subgrad
Subham Kumar
1 year
Hallucination could be a feature for approximate retrieval, but it can't be this bad when the prompt itself contains a significant hint. Long live RAG!!
Tweet media one
Tweet media two
0
0
5
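For reference, the RAG fix is just grounding: put the retrieved evidence in the prompt so the model doesn't have to rely on parametric memory. A toy sketch (the keyword-overlap retriever is a deliberate simplification; real systems use dense retrieval):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rag_prompt(query: str, docs: list[str]) -> str:
    """Ground the answer in retrieved context rather than parametric memory."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```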
@subgrad
Subham Kumar
1 year
Stuck implementing a custom T5 with no decoder FFN for the last 2 days 😣. Trying to create a CustomT5 class from T5ForConditionalGeneration, but struggling with cache reordering for beam search decoding. Here’s the original paper: Any pointers? #NLProc
0
0
4
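One untested way to attack both problems, assuming a recent 🤗 transformers where each T5Block keeps its feed-forward sublayer last in `block.layer`: the FF sublayer maps hidden states to hidden states, so swapping in `nn.Identity()` removes it without breaking shapes, and beam-search cache reordering just index-selects every cached tensor along the batch dimension (which the stock `_reorder_cache` should already handle if the cache layout is unchanged):

```python
# Hedged sketch, not tested against a specific transformers version.
from torch import nn
from transformers import T5ForConditionalGeneration

class CustomT5(T5ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        # T5Block applies self.layer[-1] (the T5LayerFF) last; an Identity
        # is shape-compatible and effectively deletes the decoder FFN.
        for block in self.decoder.block:
            block.layer[-1] = nn.Identity()

# If a custom cache layout does force an override, reordering is just this:
def reorder_cache(past_key_values, beam_idx):
    """Permute every cached key/value tensor to follow the surviving beams."""
    return tuple(
        tuple(t.index_select(0, beam_idx.to(t.device)) for t in layer)
        for layer in past_key_values
    )

# Loading will warn about the dropped decoder-FFN weights; that's expected.
model = CustomT5.from_pretrained("t5-small")
```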
@subgrad
Subham Kumar
2 years
Time-to-first-token has increased significantly with the latest GPT Turbo, along with a sharp fall in throughput. There also appears to be a recurrent issue where every fourth or fifth request fails, displaying "Hmm. something seems to have gone wrong."
1
0
4
@subgrad
Subham Kumar
3 years
Had fun talking to @Krishnaik06 about the hiring process for Data Science roles.
@Krishnaik06
Krish Naik
3 years
Podcast With Subham To Understand The Data Science Interview Process In FAANG! Let’s develop relationships with hard-to-reach people through these podcasts @__SubhamKumar. #datascience #artificialintelligence #ai #data #bigdata #coding #datascientist
1
1
15
@subgrad
Subham Kumar
3 years
Why are posts on the Twitter feed so often clustered around the same topic? IMO this hampers exploration. Or am I missing something here? 🤔
Tweet media one
1
0
8
@subgrad
Subham Kumar
3 years
Thank you @UrbanYogiHQ @jaggi_yogi 🚀
Tweet media one
0
0
14
@subgrad
Subham Kumar
4 years
Was talking to someone looking to transition into DS: 25+ yrs of work ex, a leader in the healthcare industry, & well settled abroad. He joined the meet with a detailed presentation of his current progress and his short- & long-term goals in ML. Amazed to see so much dedication at age 50!
0
0
13
@subgrad
Subham Kumar
4 years
Seeing a recent trend of some YouTube creators recommending watching content at 1.5x. Given YT AdSense revenue is based on watch time, why would someone do this and hamper their own monetisation?
1
0
9
@subgrad
Subham Kumar
4 years
Hey @Nils_Reimers, I was following the semantic search retrieve-rerank framework from SBERT - any suggestions on encoding around 100 million records? I am getting CUDA OOM beyond 5-10 million records on 8x V100 GPUs.
1
0
3
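The OOM at that scale usually comes from accumulating all embeddings at once rather than from any single batch. A hedged sketch: encode in chunks and spill each chunk to a disk memmap (the model name and the `records` corpus are assumptions; sentence-transformers also ships `encode_multi_process` for spreading chunks across the 8 GPUs):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def encode_to_memmap(records, model, path="embeddings.npy",
                     chunk=500_000, batch_size=256):
    """Encode a huge corpus without holding all embeddings in memory:
    each chunk is computed, written straight to disk, and freed."""
    dim = model.get_sentence_embedding_dimension()
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(len(records), dim))
    for start in range(0, len(records), chunk):
        batch = records[start:start + chunk]
        out[start:start + len(batch)] = model.encode(
            batch, batch_size=batch_size, convert_to_numpy=True)
    out.flush()
    return out

# Any SBERT bi-encoder works the same way; this model name is an assumption.
model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
```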
@subgrad
Subham Kumar
4 years
0.74 is the W/L ratio for Kohli. Looks like the unluckiest captain!!
Tweet media one
0
0
4