Subham Kumar Profile
Subham Kumar

@subgrad

Followers: 179 · Following: 4K · Media: 14 · Statuses: 94

Machine Learning @Google

Bengaluru
Joined August 2021
@subgrad
Subham Kumar
2 months
memory as a tool.
0
0
5
@subgrad
Subham Kumar
3 months
nothing as powerful as 2.5 pro.
0
0
6
@subgrad
Subham Kumar
4 months
Multi-Head Latent Attention (MLA) is such a clever strategy in DeepSeek-V3. By compressing the KV cache, it slashes memory usage dramatically. With 64 attention heads (each of 128 dims), a compression dimension of 2048, and a positional dimension of 2048, MLA cuts memory by ~75%!
Tweet media one
0
1
6
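The ~75% figure checks out from the tweet's own numbers. A minimal back-of-the-envelope sketch, taking those numbers at face value (DeepSeek-V3's published config differs, but the accounting is the same):

```python
# Per-token, per-layer KV-cache size: vanilla MHA vs. MLA,
# using the head count and dims stated in the tweet above.
n_heads, d_head = 64, 128   # attention heads, per-head dim
d_compress = 2048           # compressed KV latent dim
d_pos = 2048                # decoupled positional (RoPE) dim

# Vanilla attention caches full keys AND values for every head.
kv_vanilla = 2 * n_heads * d_head        # 16384 values/token/layer

# MLA caches only the shared compressed latent plus the positional part.
kv_mla = d_compress + d_pos              # 4096 values/token/layer

print(f"{kv_vanilla} -> {kv_mla}: {1 - kv_mla / kv_vanilla:.0%} saved")
# 16384 -> 4096: 75% saved
```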
@subgrad
Subham Kumar
5 months
Have been using this LLM Consortium for some time; nice to see @llama_index's implementation orchestrating asynchronous agents.
Tweet media one
@subgrad
Subham Kumar
8 months
LLM1 writes code → LLM2 critiques → feed the critique back for self-improving iterations. Each iteration improves on the previous feedback. Anyone built a browser extension for this loop?
0
2
9
@subgrad
Subham Kumar
5 months
GRPO in DeepSeek-R1 shows that group feedback beats solo judgment - no explicit critic required.
0
0
5
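The "no critic" part comes from GRPO's group-relative advantage: sample a group of completions per prompt, score them, and normalize each reward against its own group. A minimal sketch (the reward values are made up for illustration):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: each completion is judged against its own
    group's mean and spread, so no learned critic/value model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, four sampled completions, rule-based 0/1 correctness rewards.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
# positive advantages for the correct answers, negative for the rest
```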
@subgrad
Subham Kumar
5 months
How o1 differs from GPT is best illustrated by this: RL at train time and reasoning skills at test time.
@DrJimFan
Jim Fan
10 months
OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to
Tweet media one
0
0
5
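In Sutton's terms, the test-time half is search. The simplest concrete instance is best-of-N sampling: draw several candidates and keep the one a scorer likes best. A toy sketch with hypothetical `generate`/`score` stand-ins:

```python
import random

def best_of_n(generate, score, prompt: str, n: int = 8):
    """Minimal test-time search: spend more compute by sampling
    N candidates and keeping the best-scoring one."""
    return max((generate(prompt) for _ in range(n)), key=score)

# Toy stand-ins: answers are ints, the "verifier" prefers values near 42.
generate = lambda _prompt: random.randint(0, 100)
score = lambda answer: -abs(answer - 42)
print(best_of_n(generate, score, "q", n=16))
```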
@subgrad
Subham Kumar
7 months
Tried comparing 4o vs o1 on reasoning tasks involving compound rules: a UKCAT-style problem that asks you to differentiate two sets. Both models fail, and o1's explicit reasoning doesn't seem to provide any advantage here.
Tweet media one
Tweet media two
@subgrad
Subham Kumar
7 months
Haven't come across a single useful scenario where o1 models perform better. They take much longer, with repetitive output that doesn't feel like actual 'thought.' In contrast, implicit chain-of-thought reasoning with 4o seems more precise and definitely faster.
0
0
8
@subgrad
Subham Kumar
7 months
Haven't come across a single useful scenario where o1 models perform better. They take much longer, with repetitive output that doesn't feel like actual 'thought.' In contrast, implicit chain-of-thought reasoning with 4o seems more precise and definitely faster.
0
0
7
@subgrad
Subham Kumar
8 months
1
0
3
@subgrad
Subham Kumar
8 months
LLM1 writes code → LLM2 critiques → feed the critique back for self-improving iterations. Each iteration improves on the previous feedback. Anyone built a browser extension for this loop?
1
1
10
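The loop itself is tiny; the browser extension is the missing piece. A hedged skeleton, where `call_llm(system, user)` is a hypothetical stand-in for whichever chat API sits behind LLM1 and LLM2:

```python
def critique_loop(task: str, call_llm, iterations: int = 3) -> str:
    """LLM1 writes code -> LLM2 critiques -> the critique is fed back.
    `call_llm(system, user) -> str` is a placeholder for any chat API."""
    code = call_llm("You are a careful programmer.", task)
    for _ in range(iterations):
        critique = call_llm(
            "You are a strict code reviewer. List concrete defects.",
            f"Task:\n{task}\n\nCode:\n{code}",
        )
        code = call_llm(
            "You are a careful programmer. Revise the code to address the critique.",
            f"Task:\n{task}\n\nCode:\n{code}\n\nCritique:\n{critique}",
        )
    return code
```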
@subgrad
Subham Kumar
1 year
Hallucination could be a feature for approximate retrieval, but it can't be this bad when the prompt itself contains a significant hint. Long live RAG!!
Tweet media one
Tweet media two
0
0
5
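For reference, the RAG fix is just grounding: put the retrieved evidence in the prompt so the model doesn't have to rely on parametric memory. A toy sketch (the keyword-overlap retriever is a deliberate simplification; real systems use dense retrieval):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rag_prompt(query: str, docs: list[str]) -> str:
    """Ground the answer in retrieved context rather than parametric memory."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```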
@subgrad
Subham Kumar
1 year
Stuck implementing a custom T5 with no decoder FFN for the last 2 days 😣. Trying to create a CustomT5 class from T5ForConditionalGeneration, but struggling with cache reordering for beam search decoding. Here’s the original paper: Any pointers? #NLProc
0
0
4
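One untested way to attack both problems, assuming a recent 🤗 transformers where each T5Block keeps its feed-forward sublayer last in `block.layer`: the FF sublayer maps hidden states to hidden states, so swapping in `nn.Identity()` removes it without breaking shapes, and beam-search cache reordering just index-selects every cached tensor along the batch dimension (which the stock `_reorder_cache` should already handle if the cache layout is unchanged):

```python
# Hedged sketch, not tested against a specific transformers version.
from torch import nn
from transformers import T5ForConditionalGeneration

class CustomT5(T5ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        # T5Block applies self.layer[-1] (the T5LayerFF) last; an Identity
        # is shape-compatible and effectively deletes the decoder FFN.
        for block in self.decoder.block:
            block.layer[-1] = nn.Identity()

# If a custom cache layout does force an override, reordering is just this:
def reorder_cache(past_key_values, beam_idx):
    """Permute every cached key/value tensor to follow the surviving beams."""
    return tuple(
        tuple(t.index_select(0, beam_idx.to(t.device)) for t in layer)
        for layer in past_key_values
    )

# Loading will warn about the dropped decoder-FFN weights; that's expected.
model = CustomT5.from_pretrained("t5-small")
```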
@subgrad
Subham Kumar
2 years
Time-to-first-token has increased significantly with the latest GPT Turbo, along with a sharp fall in throughput. There also appears to be a recurrent issue where every fourth or fifth request fails, displaying "Hmm. something seems to have gone wrong."
1
0
4
@subgrad
Subham Kumar
3 years
Had fun talking to @Krishnaik06 about the hiring process for Data Science roles.
@Krishnaik06
Krish Naik
3 years
Podcast With Subham To Understand The Data Science Interview Process In FAANG! Let’s develop relationships with hard-to-reach people through these podcasts @__SubhamKumar. #datascience #artificialintelligence #ai #data #bigdata #coding #datascientist
1
1
15
@subgrad
Subham Kumar
3 years
Why are posts on the Twitter feed so often clustered around the same topic? IMO this hampers exploration. Or am I missing something here? 🤔
Tweet media one
1
0
8
@subgrad
Subham Kumar
3 years
Thank you @UrbanYogiHQ @jaggi_yogi 🚀
Tweet media one
0
0
14
@subgrad
Subham Kumar
4 years
Was talking to someone looking to transition into DS: 25+ yrs of work ex, a leader in the healthcare industry, & well settled abroad. He joined the meet with a detailed presentation of his current progress and his short- & long-term goals in ML. Amazed to see so much dedication at age 50!
0
0
13
@subgrad
Subham Kumar
4 years
Seeing a recent trend of some YouTube creators recommending watching content at 1.5x. Given YT AdSense revenue is based on watch time, why would someone do this and hamper their own monetisation?
1
0
9
@subgrad
Subham Kumar
4 years
Hey @Nils_Reimers, I was following the semantic search retrieve-rerank framework from SBERT - any suggestions on encoding around 100 million records? I am getting CUDA OOM beyond 5-10 million records on 8x V100 GPUs.
1
0
3
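The OOM at that scale usually comes from accumulating all embeddings at once rather than from any single batch. A hedged sketch: encode in chunks and spill each chunk to a disk memmap (the model name and the `records` corpus are assumptions; sentence-transformers also ships `encode_multi_process` for spreading chunks across the 8 GPUs):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def encode_to_memmap(records, model, path="embeddings.npy",
                     chunk=500_000, batch_size=256):
    """Encode a huge corpus without holding all embeddings in memory:
    each chunk is computed, written straight to disk, and freed."""
    dim = model.get_sentence_embedding_dimension()
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(len(records), dim))
    for start in range(0, len(records), chunk):
        batch = records[start:start + chunk]
        out[start:start + len(batch)] = model.encode(
            batch, batch_size=batch_size, convert_to_numpy=True)
    out.flush()
    return out

# Any SBERT bi-encoder works the same way; this model name is an assumption.
model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
```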
@subgrad
Subham Kumar
4 years
0.74 is the W/L ratio for Kohli. Looks like the unluckiest captain!!
Tweet media one
0
0
4