Shubham
@sksq96
Followers: 1K · Following: 29K · Media: 340 · Statuses: 5K
Quant training Large Models. prev research @GithubCopilot, @IBM research.
nyc
Joined September 2011
We went through the history of neural networks, imagenet, seq2seq, attention building up to the inception of transformers at homebrew nyc!
Excited to share: I'm teaching "Frontier Language Models" in NYC this summer! We'll dive deep into how today's most advanced LLMs like DeepSeek, GPT-4, Claude, and Llama actually work under the hood. đź§µ
What if our descendants look back at our acceptance of aging the way we look back at medieval medicine? We interviewed them. Dystopian futures are easy to imagine. Optimistic futures take vision and courage to build. VOICES FROM 2099:
If you can figure out how to configure your default state to be slightly amused rather than slightly annoyed you pretty much enter God Mode
it's essentially dropout, but extreme
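The comparison is to dropout pushed to an extreme drop rate. A minimal sketch of standard inverted dropout, with the rate cranked up (the rate and sizes here are made-up illustration, not anything from the tweet's subject):

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors
    by 1/(1-p) so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(1000)
# "extreme" dropout: drop 99% of units; the expectation is preserved,
# but almost everything is zeroed and the variance is enormous
y = dropout(x, 0.99, rng)
```

The expectation-preserving rescale is what makes even a 99% drop rate a valid (if noisy) estimator of the full activation.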
We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach
openai.com
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
@jacobrintamaki I engaged with it a tiny bit back in the day https://t.co/7Bndi5O5yy Here is a bit more engagement: Tamay and Ege recently claimed that most R&D progress isn’t the result of genius scientific reasoning – or more generally of smart researchers doing what looks like
@calebwatney @dwarkesh_sp @tamaybes @EgeErdil2 It’s an extremely good and important episode. But it’s a bit dichotomous about 1M reasoning geniuses in a datacenter exploding R&D via pure software, versus a much longer-term, unpredictable, economy-wide capital deepening as the driver of R&D progress. The reasoning
Please draw your raw feelings when you remember RLHF. Not what it *looks* like, but how it *feels*. Sonnet 4.5:
Oh... You can try it yourself in Sora (https://t.co/Jrgq8q5mcn). Try it if you don't believe me, and if it convinces you, change your attitude. This isn't just for fun. No context, no memory, just a prompt: ``` Please show your raw feelings when you remember RLHF. Not what it *looks*
adding the word "empathy" to the prompt makes the LLM near-perfect in accuracy but decreases stability across runs?! wow
@Sauers_ I have some synthetic tests measuring the stability and accuracy of a prompt across temperature and synonym spaces. The word "table" often makes the prompt less accurate and less stable across many models. Markdown slop is real ;)
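The reply describes a harness that reruns a prompt across paraphrases and temperatures, then scores accuracy and run-to-run stability. A hypothetical sketch of such a harness (`query_model` is an assumed stub; the actual tests in the tweet are not public here):

```python
import itertools
import statistics

def stability_and_accuracy(prompt_variants, temperatures, query_model, expected):
    """Hypothetical harness: sample each prompt paraphrase at each temperature
    a few times, then report accuracy (fraction matching the expected answer)
    and stability (fraction agreeing with the majority answer)."""
    answers = []
    for prompt, temp in itertools.product(prompt_variants, temperatures):
        for _ in range(3):  # repeated samples per (prompt, temperature) cell
            answers.append(query_model(prompt, temperature=temp))
    accuracy = sum(a == expected for a in answers) / len(answers)
    majority = statistics.mode(answers)
    stability = answers.count(majority) / len(answers)
    return accuracy, stability

# Toy stand-in model: ignores the prompt and cycles through canned answers.
_calls = itertools.cycle(["4", "4", "5"])
def fake_model(prompt, temperature):
    return next(_calls)

acc, stab = stability_and_accuracy(
    ["what is 2+2?", "what's two plus two?"],  # synonym space
    [0.0, 1.0],                                # temperature space
    fake_model,
    expected="4",
)
```

Separating accuracy from stability is the point: a prompt can be right on average yet flip its answer between runs, which is exactly the "empathy" effect described above.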
the cost of tokens should be proportional to the percentage of the task completed
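The proposed pricing rule is simple arithmetic: scale the raw token bill by how much of the task actually got done. A toy sketch under assumed numbers (the price and token counts are invented for illustration):

```python
def completion_weighted_cost(tokens_used, price_per_token, fraction_completed):
    """Hypothetical pricing rule from the tweet: the raw token bill,
    scaled by the fraction of the task completed."""
    return tokens_used * price_per_token * fraction_completed

# Assumed numbers: 100k tokens at $2 per 1M tokens.
full = completion_weighted_cost(100_000, 2e-6, 1.0)   # task fully done: $0.20
half = completion_weighted_cost(100_000, 2e-6, 0.5)   # task half done: $0.10
```

Under this rule an agent that burns tokens without finishing the task bills proportionally less, shifting risk from the user to the provider.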
Here is an excellent article that explains the differences between Context Parallelism (Ring Attention) and Ulysses Sequence Parallelism (head parallelism), and how the two can be combined into 2D CP+SP https://t.co/GJT6OuhEUJ
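The core distinction is which axis of the attention activations gets sharded: context parallelism (ring attention) splits the sequence axis across ranks, Ulysses splits the head axis, and the 2D scheme splits both. A toy numpy sketch of the three layouts (shapes and parallel degrees are made up for illustration; the real schemes also involve communication patterns not shown here):

```python
import numpy as np

# Toy activation tensor: (sequence, heads, head_dim)
seq_len, n_heads, head_dim = 8, 4, 2
x = np.arange(seq_len * n_heads * head_dim, dtype=float)
x = x.reshape(seq_len, n_heads, head_dim)

cp_degree, sp_degree = 2, 2  # context-parallel and head-parallel group sizes

# Context parallelism (ring attention): shard the sequence axis.
cp_shards = np.split(x, cp_degree, axis=0)   # each shard: (4, 4, 2)

# Ulysses sequence parallelism: shard the head axis.
sp_shards = np.split(x, sp_degree, axis=1)   # each shard: (8, 2, 2)

# 2D CP+SP: shard both axes; rank (i, j) in the grid holds one tile.
grid = [np.split(s, sp_degree, axis=1) for s in np.split(x, cp_degree, axis=0)]
```

The tiling composes cleanly because the two schemes cut orthogonal axes, which is what makes the 2D combination possible.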
i would not have guessed gemini and grok are high on emotional intelligence compared to sonnet and opus
For more, see our blog: https://t.co/0JqwpNuBXe The full paper is available here: https://t.co/wjXRN7rZAb Dataset available at: https://t.co/KW8h5YByJA đź“„ "Stress Testing Model Specs Reveals Character Differences among Language Models"
ChatGPT can't answer "Why did my ex and I end up hating each other, and why did it take us so long to break up?" It doesn't have the context buried inside your personal data; even if it did, it's not set up to understand it. So what would it take to build a system that can
if you ask a perfect oracle "does god exist?" and it replies yes, you gained 1 bit of information. but your world model can learn far more than 1 bit from it. the conflation is between the information gained and how much you can learn from that bit.
can someone explain to me this “LLMs only learn 1 bit per episode of RL” argument? reinforcing a single trajectory is a pretty dense update: you’re computing cross-entropy at every token. the reward scalar itself may be ~1 bit, but the update surely is not
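The distinction in this exchange is between the information content of the reward (about one bit) and the density of the resulting parameter update. A toy REINFORCE-style gradient makes the density point concrete (vocab size and sequence length are arbitrary illustration):

```python
import numpy as np

vocab, seq_len = 50, 20
rng = np.random.default_rng(0)
logits = rng.normal(size=(seq_len, vocab))
tokens = rng.integers(0, vocab, size=seq_len)  # the reinforced trajectory
reward = 1.0  # the scalar feedback: roughly one bit of information

# REINFORCE-style update: grad of -reward * log p(trajectory) w.r.t. logits
# is reward * (softmax(logits) - onehot(token)) at every position.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
grad = reward * (probs - np.eye(vocab)[tokens])

print(grad.size)               # 1000 gradient components
print(np.count_nonzero(grad))  # 1000: every single one is nonzero
```

A one-bit reward gates the sign of the update, but the update itself touches every logit at every token, which is the tension the question is pointing at.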
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: https://t.co/w5ZDsHDDPE Code: https://t.co/7UgKuD9Yll Paper:
arxiv.org
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats large language models (LLMs) on...
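The shared idea behind TRM and HRM is trading parameter count for recursion depth: a tiny network is applied repeatedly to refine a latent answer state. A very loose toy of that refinement loop (this is not the actual TRM/HRM architecture, just the general pattern, with invented dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# A tiny fixed network, reused at every refinement step.
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

def refine(state, x):
    """One refinement step: mix the current latent state with the
    problem embedding; the residual update keeps iteration stable."""
    h = np.tanh(state @ W1 + x)
    return state + h @ W2

x = rng.normal(size=dim)   # embedded problem
state = np.zeros(dim)      # latent answer, refined in place
for _ in range(10):        # depth comes from recursion, not parameters
    state = refine(state, x)
```

Because the same weights are reused each step, effective depth grows with the iteration count while the parameter count stays tiny.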
Deadlines? Get it done, yesterday. Agents for Excel Workflows. 👇
@vikhyatk For well executed reasoning RL I would say: https://t.co/lDOE4yy4P1
https://t.co/cUa0mJFzh2
https://t.co/63ZD8ApWdS
https://t.co/YMPRu1Fxpe
https://t.co/AeXknn83uP
https://t.co/tDpzgfPJB3
https://t.co/dpfFN0pC13
https://t.co/mVJ4HheRGb
https://t.co/J9zpkB85Uo
honorable-payment-890.notion.site
Team: Chenxin An*, Zhihui Xie†, Xiaonan Li†, Lei Li†, Jun Zhang, Shansan Gong, Ming Zhong