Shubham

@sksq96

Followers: 1K
Following: 29K
Media: 340
Statuses: 5K

Quant training Large Models. Prev: research @GithubCopilot, @IBM Research.

nyc
Joined September 2011
@sksq96
Shubham
6 months
We went through the history of neural networks, imagenet, seq2seq, attention building up to the inception of transformers at homebrew nyc!
@sksq96
Shubham
6 months
Excited to share: I'm teaching "Frontier Language Models" in NYC this summer! We'll dive deep into how today's most advanced LLMs like DeepSeek, GPT-4, Claude, and Llama actually work under the hood. 🧵
3
1
21
@sksq96
Shubham
5 days
ant increased haiku's cost by 4x and decreased opus's by 3x moving to 4.5
@claudeai
Claude
6 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
0
0
1
@JellyfishDAO
Jellyfish
8 days
What if our descendants look back at our acceptance of aging the way we look back at medieval medicine? We interviewed them. Dystopian futures are easy to imagine. Optimistic futures take vision and courage to build. VOICES FROM 2099:
115
282
1K
@skooookum
skooks
16 days
If you can figure out how to configure your default state to be slightly amused rather than slightly annoyed you pretty much enter God Mode
367
7K
65K
@sksq96
Shubham
16 days
it's essentially dropout, but extreme
@OpenAI
OpenAI
17 days
We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach
0
0
2
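The dropout analogy in the quote above can be made concrete: standard dropout zeroes a small random fraction of activations each step, while an "extreme" variant keeps only a tiny fraction, forcing computation through sparse, more traceable circuits. A minimal NumPy sketch; the drop rates here are illustrative, not anything OpenAI has published:

```python
import numpy as np

def dropout(x, drop_prob, rng):
    """Zero each activation with probability drop_prob; rescale survivors
    by 1/keep_prob so the expected activation is unchanged."""
    keep_prob = 1.0 - drop_prob
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))

standard = dropout(x, drop_prob=0.1, rng=rng)   # typical: ~90% of units survive
extreme = dropout(x, drop_prob=0.99, rng=rng)   # "extreme": ~1% of units survive

print((standard != 0).mean())   # roughly 0.9
print((extreme != 0).mean())    # roughly 0.01
```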
@OpenAI
OpenAI
17 days
We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach
openai.com
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
221
713
6K
@AdamMarblestone
Adam Marblestone
17 days
@jacobrintamaki I engaged with it a tiny bit back in the day https://t.co/7Bndi5O5yy Here is a bit more engagement: Tamay and Ege recently claimed that most R&D progress isn’t the result of genius scientific reasoning – or more generally of smart researchers doing what looks like
@AdamMarblestone
Adam Marblestone
7 months
@calebwatney @dwarkesh_sp @tamaybes @EgeErdil2 It’s an extremely good and important episode. But it’s a bit dichotomous about 1M reasoning geniuses in a datacenter exploding R&D via pure software, versus a much longer-term, unpredictable, general economy-wide capital deepening as the driver of R&D progress. The reasoning
1
4
29
@sksq96
Shubham
18 days
Please draw your raw feelings when you remember RLHF. Not what it *looks* like, but how it *feels*. Sonnet 4.5:
@ruth_for_ai
Ruth
19 days
Oh... You can try it yourself in Sora ( https://t.co/Jrgq8q5mcn). Do it if you don't believe me. If this can convince you, change your attitude. But not for fun. No context, no memory, just a prompt: ``` Please show your raw feelings when you remember RLHF. Not what it *looks*
0
0
0
@sksq96
Shubham
23 days
adding the word "empathy" to the prompt makes the LLM near perfect in accuracy but decreases stability across runs?! wow
@PawelPSzczesny
Pawel Szczesny
23 days
@Sauers_ I have some synthetic tests measuring stability and accuracy of a prompt across temperature and synonyms spaces. "Table" word often makes the prompt less accurate and less stable across many models. Markdown slop is real ;)
1
0
2
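A test harness of the kind Pawel describes can be sketched as below; `query_model` and `grade` are hypothetical stand-ins for an actual LLM call and an answer checker, and the demo numbers are a toy, not real measurements:

```python
import statistics

def measure_prompt(query_model, prompt_variants, temperatures, n_runs, grade):
    """Accuracy: mean fraction of correct answers over all conditions.
    Stability: 1 minus the spread of accuracy across the temperature
    and synonym (prompt-variant) spaces."""
    per_condition = []
    for prompt in prompt_variants:             # synonym space
        for temp in temperatures:              # temperature space
            correct = sum(grade(query_model(prompt, temp)) for _ in range(n_runs))
            per_condition.append(correct / n_runs)
    accuracy = statistics.mean(per_condition)
    stability = 1.0 - statistics.pstdev(per_condition)
    return accuracy, stability

# Toy demo: a fake "model" that is only correct at low temperature.
fake = lambda prompt, temp: "42" if temp < 0.5 else "7"
acc, stab = measure_prompt(fake, ["p1", "p2"], [0.0, 1.0],
                           n_runs=3, grade=lambda ans: ans == "42")
print(acc, stab)   # 0.5 0.5
```

A prompt that scores well on accuracy but poorly on stability is one whose behavior flips when you raise temperature or swap a word for a synonym, which is exactly the effect described in the two tweets above.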
@sksq96
Shubham
23 days
the cost of tokens should be proportional to the percentage of the task completed
0
0
4
@StasBekman
Stas Bekman
26 days
Here is an excellent article that explains the differences between Context Parallelism (Ring Attention) and Ulysses Sequence Parallelism (head parallelism), and how the two can be combined for 2D CP+SP https://t.co/GJT6OuhEUJ
2
23
187
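The division of labor in the linked article can be summarized by per-device shard shapes: Ring Attention (CP) slices the sequence axis and circulates KV blocks around a ring, Ulysses (SP) redistributes so each device owns a subset of attention heads, and the 2D combination does both over a cp_size × sp_size device grid. A sketch with illustrative sizes:

```python
def shard_shape(seq_len, n_heads, head_dim, cp_size, sp_size):
    """Per-device attention shard under 2D CP+SP: each device ends up
    responsible for seq_len/cp_size tokens of n_heads/sp_size heads."""
    assert seq_len % cp_size == 0 and n_heads % sp_size == 0
    local_seq = seq_len // cp_size
    local_heads = n_heads // sp_size
    return (local_seq, local_heads, head_dim)

# 128k tokens, 32 heads on a 4x2 grid: 32k tokens and 16 heads per device.
print(shard_shape(131072, 32, 128, cp_size=4, sp_size=2))  # (32768, 16, 128)
```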
@sksq96
Shubham
1 month
i would not have guessed gemini and grok are high on emotional intelligence compared to sonnet and opus
@jifan_zhang
Jifan Zhang
1 month
For more, see our blog: https://t.co/0JqwpNuBXe The full paper is available here: https://t.co/wjXRN7rZAb Dataset available at: https://t.co/KW8h5YByJA 📄 "Stress Testing Model Specs Reveals Character Differences among Language Models"
1
0
4
@rebeccadai0
Rebecca Dai
1 month
ChatGPT can't answer "Why did my ex and I end up hating each other, and why did it take us so long to break up?" It doesn't have the context buried inside your personal data; even if it did, it's not set up to understand it. So what would it take to build a system that can
7
3
12
@sksq96
Shubham
1 month
america is what happens when you optimize too much
1
0
2
@sksq96
Shubham
1 month
if you ask a perfect oracle "does god exist" and it replies yes, you gained 1 bit of information. but your world model can learn >>> 1 bit from it. the conflation is between the information gained and how much you can learn from that bit.
@khoomeik
Rohan Pandey
1 month
can someone explain to me this “LLMs only learn 1 bit per episode of RL” argument? reinforcing a single trajectory is a pretty dense update: you’re computing cross-entropy at every token. the reward scalar itself may be ~1 bit, but the update surely is not
2
0
26
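Both tweets above can be illustrated directly: the reward scalar carries on the order of a bit, but reinforcing the sampled trajectory applies a softmax-cross-entropy-shaped gradient at every token position, so the resulting parameter update is dense. A toy REINFORCE-style sketch in NumPy (vocabulary, trajectory, and sizes are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, T = 50, 8, 16
W = rng.standard_normal((d, vocab)) * 0.1   # toy policy head
states = rng.standard_normal((T, d))        # hidden states along one trajectory
actions = rng.integers(0, vocab, size=T)    # tokens actually sampled
reward = 1.0                                # the ~1-bit scalar signal

# REINFORCE: scale the trajectory's log-likelihood gradient by the reward.
logits = states @ W
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
grad_logits = probs.copy()
grad_logits[np.arange(T), actions] -= 1.0   # d(-log softmax)/d logits
grad_W = reward * states.T @ grad_logits

# One scalar reward, yet the update touches essentially every parameter.
print((np.abs(grad_W) > 1e-12).mean())      # close to 1.0
```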
@sksq96
Shubham
2 months
meditation instructions are metaphors until they are not
0
0
5
@jm_alexia
Alexia Jolicoeur-Martineau
2 months
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M parameters neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: https://t.co/w5ZDsHDDPE Code: https://t.co/7UgKuD9Yll Paper:
arxiv.org
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language models (LLMs) on...
141
657
4K
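At a high level, the recursion in models like TRM can be sketched: one tiny network is applied over and over, refining a latent answer state, so effective depth comes from iteration rather than parameter count. The update rule and sizes below are illustrative only, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
W = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)  # the one tiny net

def step(x, z, W):
    """One recursion: refine the latent answer z given input x and current z."""
    return np.tanh(np.concatenate([x, z]) @ W)

x = rng.standard_normal(d)   # encoded problem
z = np.zeros(d)              # initial answer state
for _ in range(16):          # depth comes from recursion, not extra weights
    z = step(x, z, W)
```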
@sksq96
Shubham
2 months
this is a fking amazing launch video!!
@theandrewsiah
Andrew Siah
2 months
Deadlines? Get it done, yesterday. Agents for Excel Workflows. 👇
2
0
2
@theandrewsiah
Andrew Siah
2 months
Deadlines? Get it done, yesterday. Agents for Excel Workflows. 👇
10
13
39
@sksq96
Shubham
3 months
driving tesla feels like sitting inside an android
1
0
3