Kernel
@thcause41
Followers: 415
Following: 6K
Media: 73
Statuses: 2K
An ounce of prevention is worth a pound of cure
Joined December 2019
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.
213
582
4K
There's two kinds of people in life… Speaker: Andrew Huberman
35
819
8K
New Anthropic Research: A new set of evaluations for sabotage capabilities. As models gain more agentic abilities, we need to get smarter in how we monitor them. We’re publishing a new set of complex evaluations that test for sabotage—and sabotage-monitoring—capabilities.
59
227
2K
Coordinated swarm of 1000 drones taking off. Soon they will be mosquito-sized, and too fast to see. Imagine this video sped up 100x. Pattern: big things become small things which become field effects. Big drones become small drones which become nanodrones. Big models become small
71
87
570
New Anthropic research: Sabotage evaluations for frontier models. How well could AI models mislead us, or secretly sabotage tasks, if they were trying to? Read our paper and blog post here: https://t.co/nQrvnhrBEv
87
154
964
Phaidra's Jim Gao says the real promise of AI is in the discovery of new knowledge in domains too complex for human intuition but which are underpinned by data
31
99
545
It's a bit cringe that this agent tried to change its own code by removing some obstacles, to better achieve its (completely unrelated) goal. It reminds me of this old sci-fi worry that these doomers had.. 😬
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! https://t.co/8wVqIXVpZJ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
34
88
586
Most social media algorithms are hyper-targeted psychological weapons whose primary function is user engagement (a.k.a. addiction). In the case of TikTok, this weapon is aimed mainly at Western children. AT THE VERY LEAST it should not be controlled by a hostile foreign power.
38
59
465
it’s easy to have self confidence and assurance after receiving a lot of external validation. but which are the people who are proud when they’re still in the dirt and haven’t had any visible success at all? those are the noble spirits, the humans
73
82
1K
You’re influenced by your environment whether you accept it or not. Tolerate weakness? You will be weak. Only allow strength and ambition? Your life will reflect it.
272
920
9K
The richest and one of the most powerful men in the world replied "yes" when I spoke about the fact we will all become slaves. Do you understand?
356
883
7K
It’s possible, Marcus Aurelius said, to not have an opinion. You don’t have to turn this into something, he reminds himself. You don’t have to let this upset you. You don’t have to think something about everything.
57
374
2K
if you value intelligence above all other human qualities, you’re gonna have a bad time
756
2K
14K
Does GPT understand the world? Here is what @ilyasut, co-founder of OpenAI, says during a discussion with Jensen Huang, CEO of Nvidia: (1) When we train a large neural network to accurately predict the next word in lots of different texts from the internet, the AI is
212
581
4K