
Rob K (@robertzzk)
Followers: 452 · Following: 2K · Media: 301 · Statuses: 3K
RT @IvanArcus: 🧵Chain-of-Thought reasoning in LLMs like Claude 3.7 and R1 is behind many recent breakthroughs. But does the CoT always expl….
0 · 64 · 0
RT @peterwildeford: 🚨 IAPS is hiring 🚨. We seek Researchers / Senior Researchers to join our team to identify concrete interventions that….
0 · 14 · 0
Outer loss transmits very limited bits of information through the info-theoretic bottleneck of the training process. So yes, if your function family (e.g. wetware brains) puts a relatively dense prior on mesa-optimizers, no sparsely delivered outer loss (death in the EEA) suffices.
"Train it to be nice" is the obvious thought. Alas, I predict that one idiom that does generalize from natural selection to gradient descent, is that training on an outer loss gets you something not internally aligned to that outer loss. It gets you ice cream and condoms.
0 · 0 · 0
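A rough back-of-the-envelope version of the "limited bits" claim (illustrative, not from the original thread): treat the outer signal as T selection events, each resolving to one of k distinguishable outcomes, so the information it can carry about the parameters \theta is bounded by

  I(\text{outer signal}; \theta) \;\le\; \sum_{t=1}^{T} H(L_t) \;\le\; T \log_2 k \ \text{bits}.

With k = 2 (survive vs. die in the EEA), T selection events deliver at most T bits about \theta, while the description length of the thing being selected over (a genome, a set of synaptic weights) is vastly larger; on this view, almost all of the specification has to come from the prior of the function family rather than from the outer loss itself.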
Disagree. "Gradient-descenting" is the wrong verb, as it implies the optimizer is the primary source of the kindness-likelihood. The text to be predicted matters: the etiology of the ontology implicit in the training distribution *was* shaped by mutation-selection of proteins, and it IS kind.
Gradient-descending an AI system to predict text, or even to play video games, is nothing like this. It is exploring nowhere near this space. Gradient descent of matrices is not mutation-selection of proteins and I don't expect it to hit on anything like similar architectures.
1 · 0 · 1
RT @akothari: The Microsoft / CrowdStrike outage has taken down most airports in India. I got my first hand-written boarding pass today 😅 h….
0 · 11K · 0
What does fine-tuning do to models? Do representations transfer between base and chat versions? Find out in the next episode of @Connor_Kissane and Rob's project!
New post with @robertzzk, @ArthurConmy, & @NeelNanda5: Sparse Autoencoders (usually) Transfer between Base and Chat Models! This suggests that models' representations remain extremely similar after fine-tuning.
0 · 0 · 1
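A minimal sketch of the transfer test (illustrative, not the post's actual pipeline; the SAE class, sizes, and activation tensors below are placeholders): take an SAE fit to base-model activations at some hook point, run it on the chat model's activations at the same hook, and compare how well it still reconstructs them.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: overcomplete ReLU encoder + linear decoder."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats

def frac_variance_explained(sae, acts):
    """1 - (residual variance / total variance) of the SAE reconstruction."""
    with torch.no_grad():
        recon, _ = sae(acts)
        resid = (acts - recon).pow(2).sum()
        total = (acts - acts.mean(0)).pow(2).sum()
        return 1.0 - (resid / total).item()

d_model = 512
sae = SparseAutoencoder(d_model, 8 * d_model)   # stands in for an SAE trained on base-model activations
base_acts = torch.randn(1024, d_model)          # placeholder for base-model activations at the hook
chat_acts = torch.randn(1024, d_model)          # placeholder for chat-model activations at the same hook

print("FVE on base:", frac_variance_explained(sae, base_acts))
print("FVE on chat:", frac_variance_explained(sae, chat_acts))
# If the two numbers are close, the SAE "transfers": the chat model's representations
# at this site are still well described by the base SAE's feature dictionary.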
RT @Connor_Kissane: New post with @robertzzk, @ArthurConmy, & @NeelNanda5: Sparse Autoencoders (usually) Transfer between Base and Chat Mod….
0 · 4 · 0
Sparse Autoencoders help us understand the MLPs of LLMs, but what's up with attention? Find out in our new paper with @Connor_Kissane and @NeelNanda5!
Sparse Autoencoders help us understand the MLPs of LLMs, but what's up with attention? In our new paper with @NeelNanda5, we introduce Attention Output SAEs to uncover what concepts attention layers learn. Further, we use them to find novel insights previously out-of-reach! 🧵
0 · 0 · 4
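A hedged sketch of what training an Attention Output SAE involves (illustrative sizes, hyperparameters, and placeholder data, not the paper's exact setup): fit a sparse autoencoder to the attention layer's output, i.e. what the attention block writes into the residual stream, with an L1 penalty so each learned feature corresponds to a sparsely firing concept.

import torch
import torch.nn as nn

d_model, d_sae, l1_coeff = 512, 4096, 3e-4   # illustrative sizes and sparsity penalty

# Minimal SAE over attention-layer outputs.
enc = nn.Linear(d_model, d_sae)
dec = nn.Linear(d_sae, d_model)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def training_step(attn_out: torch.Tensor) -> float:
    """One SAE step on a batch of attention outputs, shape [batch, d_model]."""
    feats = torch.relu(enc(attn_out))            # sparse "concept" activations
    recon = dec(feats)
    loss = (recon - attn_out).pow(2).mean() + l1_coeff * feats.abs().sum(-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Placeholder batches standing in for activations hooked from a real model.
for step in range(3):
    batch = torch.randn(256, d_model)
    print(step, training_step(batch))

# Each learned feature i has a decoder direction dec.weight[:, i] in the residual
# stream; decomposing a feature's pre-activation across the individual heads'
# outputs is one way to ask which heads write that concept.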
New attention sparse autoencoders post dropped. $1000 bounty to whoever finds the best attention circuit!
Great post from my scholars @Connor_Kissane & @robertzzk! SAEs are fashionable, but are they a useful tool for researchers? They are! We find a deeper understanding of the well-studied IOI circuit, and make a circuit analysis tool. $1000 bounty to whoever finds the best circuit!
0 · 1 · 5
RT @robertghilduta: Anyone know of any publications or research on changing out the activation function of a trained transformer network? Loo….
0 · 1 · 0