
noahdgoodman
@noahdgoodman
Followers
5K
Following
351
Media
8
Statuses
253
Professor of natural and artificial intelligence @Stanford. Alignment at @GoogleDeepMind. (@StanfordNLP @StanfordAILab etc)
Joined November 2019
Congrats to OAI on producing a reasoning model! Their opaque tweets demonstrate that they’ve (independently) found some of the core ideas that we did on our way to STaR.
Congrats to DeepSeek on producing an o1-level reasoning model! Their research paper demonstrates that they’ve independently found some of the core ideas that we did on our way to o1.
30
132
2K
This seems like a good time to mention that I've taken a part-time role at @GoogleDeepMind working on AI Safety and Alignment!
So excited and so very humbled to be stepping in to head AI Safety and Alignment at @GoogleDeepMind. Lots of work ahead, both for present-day issues and for extreme risks in anticipation of capabilities advancing.
13
10
269
@tomlikestocode I think it means in the future we will develop models by having smart and intense researchers stand on the shoulders of previous smart and intense researchers.
2
3
172
Hey twitter! @mcxfrank and I are teaching a seminar "Topics in Natural and Artificial Intelligence", where we'll be thinking about the relations between cognitive psych and modern AI (mostly LLMs). What papers are we missing?
9
23
134
New work with the awesome @aryaman2020 ! This started as some musings about “model organisms” for studying LM alignment, and ended up with really cool connections between ICL, Bayes, and alignment techniques.
New paper! 🫡. In-context learning (ICL) is when LLMs infer how to do a task from examples. We know that the relationship between # of ICL examples and task accuracy is predictable. Can we predict the shape of the ICL curve using Bayesian assumptions? Our paper shows yes!
1
7
100
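A toy sketch of the general idea (illustrative only, not the paper's actual scaling law, and all numbers below are made up): treat ICL as Bayesian inference over a small set of candidate tasks, and predict accuracy after n examples as the posterior-weighted accuracy.

```python
import numpy as np

priors          = np.array([0.2, 0.5, 0.3])    # prior over candidate tasks
lik_per_example = np.array([0.9, 0.6, 0.3])    # p(one in-context example | task), assumed i.i.d.
acc_if_task     = np.array([0.95, 0.70, 0.40]) # accuracy if the model commits to that task

def predicted_accuracy(n):
    """Posterior-weighted accuracy after observing n in-context examples."""
    unnorm = priors * lik_per_example ** n
    posterior = unnorm / unnorm.sum()
    return float(posterior @ acc_if_task)

for n in [0, 1, 2, 4, 8, 16]:
    print(n, round(predicted_accuracy(n), 3))
```

Under these assumptions the curve rises from the prior-weighted accuracy toward the accuracy of the task most consistent with the examples, which is the kind of shape a Bayesian account would predict.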
I’m thrilled for this to be public! The culmination of years of thinking about thinking.
Language models today are trained to reason either 1) generally, imitating online reasoning data or 2) narrowly, self-teaching on their own solutions to specific tasks. Can LMs teach themselves to reason generally?🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵
2
11
96
💚. (i don't work at openai, or know @sama, or have really any opinion about any of this. but all the cool kids are heartreposting and i fomo).
1
2
86
New work where language models learn to ask questions? So they can better understand user needs? With an amazing method name? Oh, yes!
When prompting language models to complete a task, users often leave important things unsaid. Can language models teach themselves to ask clarifying questions? In STaR-GATE, we explore LMs' ability to self-improve by rewarding the model for generating useful questions!
2
9
42
My very first PhD student, @stuhlmueller, founded @oughtinc after leaving CoCoLab. Ought has done amazing work as a nonprofit lab and helped me see the power of LLMs. I’m excited for their next chapter as @elicitorg!! (And in a new role for me, I’m an “angel”.)
1/ Announcing our spinoff from @oughtinc into a public benefit corporation, our $9 million seed round, and a much more powerful Elicit! This new Elicit takes the components of the popular literature review workflow and extends them to automate more research workflows.
0
4
44
Some PR on a fun auto-coding project we've (@ericzelikman @GabrielPoesia @nickhaber @qhwang3) been doing. Tl;dr: decomposing the problem, solving the pieces, then re-composing is useful! "New Tool Helps AI and Humans Learn To Code Better"
2
7
41
Base language models already know a lot about good behavior. Here we bring out that latent knowledge by enhancing the connection between principles and responses — no preferences required!
Constitutional AI showed LMs can learn to follow constitutions by labeling their own outputs. But why can't we just tell a base model the principles of desired behavior and rely on it to act appropriately? Introducing SAMI: Self-Supervised Alignment with Mutual Information!
1
7
41
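For intuition, here is a toy contrastive (InfoNCE-style) lower bound on the mutual information between principles and responses; this is not necessarily SAMI's exact objective, and `score` is assumed to be something like the model's log p(response | principle).

```python
import numpy as np

def info_nce_loss(score):
    """score: (B, B) matrix where score[i, j] is e.g. log p(response_j | principle_i),
    with matched (principle, response) pairs on the diagonal."""
    logits = score - score.max(axis=1, keepdims=True)                     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this pushes each principle to "pick out" its own response,
    # a standard lower-bound-style proxy for mutual information.
    return -np.mean(np.diag(log_softmax))

scores = np.array([[ 2.0, 0.1, -1.0],
                   [ 0.0, 1.5,  0.2],
                   [-0.5, 0.3,  2.2]])
print(info_nce_loss(scores))
```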
First author @gandhikanishk's post: Arxiv:
Language models struggle to search, not due to an architecture problem, but a data one! They rarely see how to search or backtrack. We show how LLMs can be taught to search by representing the process of search in language as a flattened string, a stream of search (SoS)!
1
4
40
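A minimal illustration of the flattening idea (a toy problem and trace format of my own, not the paper's exact setup): run a depth-limited DFS and serialize every step, including dead ends and backtracks, into one string a language model could be trained on.

```python
def dfs(state, target, depth, trace):
    trace.append(f"state {state}")
    if state == target:
        trace.append("goal")
        return True
    if depth == 0:
        trace.append("dead end")
        return False
    for op, nxt in (("+3", state + 3), ("*2", state * 2)):
        trace.append(f"try {op}")
        if dfs(nxt, target, depth - 1, trace):
            return True
        trace.append("backtrack")
    return False

trace = []
dfs(2, 16, 4, trace)
stream_of_search = " ; ".join(trace)   # the whole search, flattened into one string
print(stream_of_search)
```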
“Four habits of highly effective STaRs” — we show that certain high level cognitive behaviors are necessary for learning to reason through RL. Exciting!
New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵1/13
0
5
41
Thinking quietly helps in thinking aloud - we added an additional result to the Quiet-STaR paper! We also open sourced code and weights today.
A couple exciting updates! First, we quantitatively evaluated the improvement from combining Quiet-STaR with chain-of-thought (i.e. letting the model think before each CoT token). We found it improves zero-shot CoT accuracy on GSM8K by over 7%!
2
8
39
It turns out LLMs are surprisingly good at statistical modeling! New work with @michaelyli__ and Emily Fox.
2
4
38
I'm super excited about this new work with @BenPrystawski on the origins of reasoning and why chain-of-thought helps LLMs!
0
2
37
Here's the latest in a very exciting research program from Stanford -- just about ready to find causal structure behind the behavior of LLMs!.
📣How does 🔥Alpaca🦙 follow your instructions? Mechanistic interpretability at scale – our new paper identifies the causal mechanisms the Alpaca 7B model uses to solve simple reasoning tasks (with Atticus Geiger, @ChrisGPotts, and @noahdgoodman!). Paper:
0
3
29
@RichardMCNgo I believe it’s the case that hyperbolic discounting is a mixture of exponentials, so if you have broad uncertainty about the true discount rate you get hyperbolic discounting (worked out below).
0
0
11
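A minimal worked version of that claim (assuming, purely for illustration, an exponential prior with rate λ over the discount rate k):

\[
\mathbb{E}_k\!\left[e^{-kt}\right] \;=\; \int_0^\infty \lambda e^{-\lambda k}\, e^{-kt}\, dk \;=\; \frac{\lambda}{\lambda + t} \;=\; \frac{1}{1 + t/\lambda},
\]

which is the standard hyperbolic discount curve 1/(1 + αt) with α = 1/λ; broader priors over k (e.g. a Gamma) give similar hyperbolic-like mixtures of exponentials.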
A great series of talks! (Except maybe the first one…).
See the speaker line-up for 2024-25 #KempnerInstitute Seminar Series! @noahdgoodman @MarkChurchland @scott_linderman @jacobandreas @FieteGroup @rsalakhu @StefanoFusi2 @RAIVNLab @SuthanaLab @behrenstimb @tri_dao @KordingLab @tyrell_turing @AndrewLampinen
1
2
26
Intriguing new work that suggests a kind of hypothesis-generation CoT can dramatically improve ICL for hard induction problems!.
Did you know there’s a task people easily solve but GPT-4 fails? From a few input-output grids, ARC asks you to infer and apply a rule. With Hypothesis Search, we double GPT-4’s score. w/@ruocheng_w @GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman.🧵
1
2
21
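A rough sketch of the hypothesis-search loop (the names here, like `propose_hypotheses`, are placeholders rather than the paper's actual API): sample candidate rules as programs, keep only those that reproduce every training pair, then apply a survivor to the test input.

```python
def hypothesis_search(train_pairs, test_input, propose_hypotheses):
    """train_pairs: list of (input_grid, output_grid); propose_hypotheses:
    a stand-in for the LLM step that returns candidate rules as callables
    (e.g. Python functions parsed from generated code)."""
    candidates = propose_hypotheses(train_pairs)
    # Keep only hypotheses that reproduce every training example exactly.
    survivors = [h for h in candidates
                 if all(h(x) == y for x, y in train_pairs)]
    # Apply a verified hypothesis to the held-out test input (if any survived).
    return survivors[0](test_input) if survivors else None

# Tiny usage example with hand-written "hypotheses" instead of LLM output:
pairs = [([1, 2], [2, 4]), ([0, 3], [0, 6])]
hyps = lambda _: [lambda g: [v + 1 for v in g], lambda g: [v * 2 for v in g]]
print(hypothesis_search(pairs, [5, 7], hyps))   # -> [10, 14]
```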
@FelixHill84 @stanfordnlp Love this work Felix! We followed it up with an argument that similar burstiness (local observation distribution) leads to CoT. Pretty amazing that both inductive inference and sequential reasoning may emerge from statistical properties of the data!
2
1
21
Oh, I forgot to promote this fun @cogsci_soc paper with @tsvilodub @meanwhileina and @hawkrobe! We explore what extra information should be provided in answering simple questions: "Do you have wine?" "No, but we have beer!" (LLMs are spotty at this.)
Do you have an opinion? Humans and LLMs often answer yes/no questions with multiple words. While humans add assisting information, GPT & friends add all/random info unless prompted otherwise. New LLM challenge? @tsvilodub @meanwhileina @ramihawk @noahdgoodman
0
5
19
LMs are not terrible at asking the questions they need to figure out a task -- but there's a lot of room for improvement. We set out a framework for exploring this. Fun work with amazing collaborators!
Eliciting Human Preferences with Language Models. Currently, people write detailed prompts to describe what they want a language model to do. We explore *generative elicitation*—where models interactively ask for this information through open-ended conversation. 1/
0
3
21
This cool project pulls from a lot of research threads — theorem provers, RL, curricula. (And no LLMs in sight… but stay tuned for my tweet later this week!).
Excited to share our work on Peano, which just came out today in the Phil. Trans. of the Royal Society! This has been quite a journey w/ @noahdgoodman, from dependent types to the role of curricula in the cultural transmission of mathematics! 1/n.
0
1
18
Awesome @willknight on GPT. He chooses quotes where I sound deep rather than just confused 🫶. “These days my viewpoint is that this is AGI, in that it is a kind of intelligence and it is general—but we have to be a little bit less, you know, hysterical about what AGI means,”.
GPT-4 often "seems" remarkably clever, and some believe it exhibits features of more general intelligence. Experts say we need new methods for probing the model's intelligence, and more transparency from @OpenAI about how it works to truly understand it.
1
0
17
@karpathy thanks for the shoutout to our gist token work! i think of this as an example of "meta tokens", which are tokens that affect the processing of the model (by eg changing the mask). what other examples are there of this?
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer". Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and
1
0
18
A fun trick @jayelmnop @XiangLisaLi2 and I came up with to teach LMs how to compress their instructions into a few “meta tokens” worth of activations.
Prompting is cool and all, but isn't it a waste of compute to encode a prompt over and over again? We learn to compress prompts up to 26x by using "gist tokens", saving memory+storage and speeding up LM inference: (w/ @XiangLisaLi2 and @noahdgoodman). 🧵
1
0
12
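A rough illustration of the masking idea (details in the actual paper and code may differ): the prompt is followed by a few gist tokens, and later tokens are allowed to attend to the gist tokens but not to the raw prompt, so the prompt's content has to be squeezed into the gist activations.

```python
import numpy as np

def gist_attention_mask(seq_len, prompt_end, num_gist):
    """Boolean (seq_len, seq_len) mask; True means attention is allowed.
    Assumed token layout: [prompt 0..prompt_end) [gist tokens] [rest]."""
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # standard causal mask
    gist_end = prompt_end + num_gist
    # Tokens after the gist span may not look back at the raw prompt,
    # only at the gist tokens (and anything after them).
    mask[gist_end:, :prompt_end] = False
    return mask

print(gist_attention_mask(seq_len=7, prompt_end=3, num_gist=2).astype(int))
```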
@RogerGrosse Agreed! It would be super cool to make a set of illustrative examples for alignment along those lines. Power move is to then train transformers on data generated from them and see if you can "find the issues".
0
0
12
And here’s the follow-up: we use Peano together with a new kind of tool to guide LLMs toward better reasoning. And then learn from it!
Thrilled to share our work on Certified Reasoning with Language Models, a really fun collaboration with @gandhikanishk @ericzelikman @noahdgoodman! This is a mix of Peano, Synchromesh & STaR that shows a simple way to intermix & self-improve on formal and informal reasoning. 1/
0
2
11
@StephenLCasper I think you might be wrong, but for the opposite reason to others. Single humans are not unified, principled agents that can be aligned to, either. Cas this morning, Cas tonight, Cas near donuts vs salad, etc. A human has incoherent goals. Just like a society.
1
0
10
@OwainEvans_UK Nice! And what if you add the statement that not everything you read is true? (This sounds like a joke, but I’m serious — can metacognitive hints ameliorate the effect?)
4
0
9
Sometimes illusions are illusory! We respond to some interesting criticisms of our DAS method for establishing causal abstractions of neural nets. (#mechinterp).
🧐In our new commentary: we argue the notion of "illusion" in this paper labels correct explanation as illusory, & that avoiding "illusion" would require unwarranted constraints on NNs. The "illusions" are, though, instructive about how models work. 1/
0
1
8
@justintchiu For instance here’s a fun one I haven’t revisited in ages (what does this even look like for LLMs?).
0
2
6
@ShunyuYao12 Oh, nice! — “As a thought experiment, imagine GPT-N could “simulate” memory, grounding, learning, and decision-making in context: list all the possible actions, simulate and evaluate each one, and maintain its entire long-term memory explicitly in a very long context.”.
0
0
5
@gerardsans could you be more specific in your outrageous criticism? we make pretty specific claims and back them up with data. though i guess the comparison to @elonmusk is sort of complimentary?
1
0
5
@carterleffen We’ll open source an implementation as soon as we can clean up the code. Though if you’re eager, go for it! (Multiple OS implementations are helpful, anyhow.).
1
1
5
@ESYudkowsky this is a distinction i think we need to consider a lot more -- AI trained to imitate us vs AI trained to understand us. (our new qstar is closer to the latter, though not all the way.) it's actually not clear to me which is more alignable.
1
0
5
@ericzelikman @mcxfrank Yes, the confusion about LLM ToM is surpassed only by the confusion about infant/toddler ToM…
0
0
2
@stuhlmueller This is super neat. I look forward to reading the top essays on this. I also request that you add a baseline condition, where you prompt a large language model with the prize announcement and include that in your evaluations (blinded, of course).
1
0
4
@ellitoshiba The days of artisanal Bayesian modeling for large-scale AI may be sadly behind us. But the days of understanding large-scale AI by artisanal Bayesian modeling may be in front of us!
0
0
3
@justintchiu Btw we tried for a while reducing variance by using multiple samples for y and didn’t get anything that worked better. But the sea of variance reducers is wide…!.
1
0
4
@horseracedpast These models have a finite memory state that separates past observations from future predictions!.
1
0
3
@evanthebouncy I tried 20 Questions with Meta-prompt as you suggested! I'm not sure if it actually got better (should do a proper eval), but it did develop reasonable instructions for itself (in 🧵).
1
0
3
@srush_nlp @headinthebox I think the other reason to do it is that there are language model HoFs that you want to have the right commutation behavior with chaining. Again not seeing it much yet, but think rejection sampling from generations.
1
0
3
@CFGeek and SSMs should make it a lot easier to find these kinds of representations (if they exist) because 'activation patching across time' (tm:) is well defined within the hidden state.
1
0
3
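A toy sketch of what "activation patching across time" could look like in a recurrent/SSM-style model (purely illustrative, with made-up linear dynamics): record the hidden state at some step t on a clean input, then splice it into a run on a corrupted input and compare downstream outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.3 * rng.normal(size=(4, 4))   # toy state-transition matrix
B = rng.normal(size=4)              # input projection
C = rng.normal(size=4)              # readout

def run(xs, patch_t=None, patch_h=None):
    """Roll out h_t = A h_{t-1} + B x_t, optionally overwriting the
    hidden state at step patch_t with patch_h (the patch across time)."""
    h, ys, states = np.zeros(4), [], []
    for t, x in enumerate(xs):
        h = A @ h + B * x
        if patch_t is not None and t == patch_t:
            h = patch_h
        states.append(h.copy())
        ys.append(C @ h)
    return np.array(ys), states

clean     = [1.0, 0.5, -1.0, 2.0]
corrupted = [1.0, 0.5,  3.0, 2.0]

_, clean_states = run(clean)
patched_ys, _ = run(corrupted, patch_t=2, patch_h=clean_states[2])
print(run(corrupted)[0])   # corrupted outputs
print(patched_ys)          # corrupted run with the clean state spliced in at t=2
```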
nice -- let's find out what the noosphere thinks! Will AI increase or decrease Common Knowledge?
@mhtessler @noahdgoodman Yeah it's in part a prediction about likely deployment/form factor:
0
0
3