Aryaman Arora @aryaman2020 X Profile

Aryaman Arora

@aryaman2020

Followers

7K

Following

31K

Media

310

Statuses

13K

member of technical staff @stanfordnlp

🌲

Joined December 2018

Aryaman Arora

@aryaman2020

2 months

I'll be interning at @TransluceAI for the summer doing interp 🫡 will be staying in SF.

15

5

251

Aryaman Arora

@aryaman2020

9 hours

more on delta update rule: "this process can also be regarded as optimizing an online regression loss using a single step of SGD". "An alternative interpretation for DeltaNet is from the perspective of key-value retrieval. It first retrieves the old value using the current.

1

0

1
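The two readings of the delta update rule quoted above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the DeltaNet or ReFT papers: the function names, the list-of-lists state matrix `S`, and the scalar write strength `beta` are all assumptions made for the example. The second function follows the key-value reading as commonly stated (retrieve the old value with the current key, blend it with the new value, write the blend back); the quote above is truncated, so treat that part as an assumption too.

```python
# Hypothetical helper names; a minimal sketch of the delta update rule.

def matvec(S, k):
    """Multiply the state matrix S (list of rows) by the key vector k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_update_sgd(S, k, v, beta):
    """One SGD step on the loss 0.5 * ||S k - v||^2 with learning rate beta."""
    err = [p - t for p, t in zip(matvec(S, k), v)]  # prediction error S k - v
    # The gradient with respect to S is the outer product err k^T.
    return [[s_ij - beta * e_i * k_j for s_ij, k_j in zip(row, k)]
            for row, e_i in zip(S, err)]

def delta_update_retrieval(S, k, v, beta):
    """Key-value view: retrieve the old value, blend it with v, write it back."""
    v_old = matvec(S, k)  # value currently stored under key k
    v_new = [beta * t + (1 - beta) * o for t, o in zip(v, v_old)]
    # Remove the old association and add the blended one.
    return [[s_ij - o * k_j + n * k_j for s_ij, k_j in zip(row, k)]
            for row, o, n in zip(S, v_old, v_new)]
```

Both functions compute the same update, which is the point of the two interpretations: expanding the retrieval form gives `S + beta * (v - S k) k^T`, exactly the SGD step. With a unit-norm key and `beta = 1`, reading the state back with the same key returns the new value.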

Aryaman Arora

@aryaman2020

10 hours

nerdsniped while reading the DeltaNet paper: the main representation rewrite function we propose in ReFT is super similar to the delta update rule! no wonder it worked so well

1

0

12

Aryaman Arora

@aryaman2020

10 hours

it's also true at the same time that interp researchers should focus on ways to add value that are complementary to / not easily replaced by baselines like prompting

Dimitris Papailiopoulos

@DimitrisPapail

18 hours

@sohamdaga22 why yes? please give me a concrete example where mech interpr has, beyond a reasonable doubt, informed how we design, train, or understand these models, that simple input, output A/B tests (i.e., asking the model questions) have been insufficient for?

0

2

Aryaman Arora

@aryaman2020

10 hours

people rightly ask, where is the practical downstream value that mech interp has created? it turns out that if you prioritise rigour over immediate scaling there are plenty of good examples 🙂

Christopher Potts

@ChrisGPotts

12 hours

@DimitrisPapail @sohamdaga22 I feel that these papers (from my group) are examples of what you are nominally asking for: 1. 2. 3. 4. 5. 6. 7.

1

0

8

Aryaman Arora

@aryaman2020

10 hours

RT @ChrisGPotts: @DimitrisPapail @sohamdaga22 I feel that these papers (from my group) are examples of what you are nominally asking for:…

0

9

0

Aryaman Arora

@aryaman2020

17 hours

RT @DimitrisPapail: @sohamdaga22 why yes? please give me a concrete example where mech interpr has, beyond a reasonable doubt, informed how…

0

1

0

Aryaman Arora

@aryaman2020

20 hours

RT @neil_rathi: new paper! robust and general "soft" preferences are a hallmark of human language production. we show that these emerge fr…

0

5

0

Aryaman Arora

@aryaman2020

2 days

that's crazy, maybe someone should work on this.

unusual_whales

@unusual_whales

2 days

"Researchers from top AI labs including Google, OpenAI, and Anthropic warn they may be losing the ability to understand advanced AI models," per FORTUNE.

10

2

180

Aryaman Arora

@aryaman2020

4 days

please go to this fire poster.

Harshit Joshi

@harshitj__

4 days

flying to Vienna 🇦🇹 for ACL to present Genie Worksheets (Monday 11am)! come and say hi if you want to talk about how to create controllable and reliable application layers on top of LLMs, knowledge discovery and curation, or just wanna hang

0

13

Aryaman Arora

@aryaman2020

7 days

RT @shi_weiyan: 💥New Paper💥 #LLMs encode harmfulness and refusal separately! 1️⃣ We found a harmfulness direction. 2️⃣ The model internally k…

0

21

0

Aryaman Arora

@aryaman2020

8 days

relatedly, you realise how useful mundane time is when you try to hyper-optimise your time for doing better research. the highest-value time seems to be shower and time spent unable to fall asleep in bed.

1

0

51

Aryaman Arora

@aryaman2020

8 days

if you think data cleaning is beneath you then ngmi.

Luke Heeney

@heeney_luke

12 days

Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking.

11

30

680

Aryaman Arora

@aryaman2020

10 days

I should have shouted out @sarahwiegreffe for being such a great moderator 🫡.

0

8

Aryaman Arora

@aryaman2020

11 days

ok that's the end! pretty interesting.

2

0

6

Aryaman Arora

@aryaman2020

11 days

- Lo: we should study the effect of data on models wayyy more than architecture.
- Marks and Saphra agree.
- Barez: we don't know how to find/make the right data for lots of problems. ppl like Cynthia Rudin suggest work on whitebox arch, but no clear path to this. Barez says it's.

1

0

5

Aryaman Arora

@aryaman2020

11 days

*science -> scientific understanding.

1

0

1

Aryaman Arora

@aryaman2020

11 days

[NB: I think we will make scientific progress without understanding simply due to scaling simulations for RL in various domains, but yeah seems right that reverse-engineering our blackboxes is needed for science].

1

0

2

Aryaman Arora

@aryaman2020

11 days

in science, incentives are aligned bc understanding translates to e.g. better drugs/materials, so interp falls out naturally.

1

0

4

Aryaman Arora

@aryaman2020

11 days

- Barez: interp isn't useful so far bc of incentive mismatch. apart from discovery, there's no real need for understanding/explanation. e.g. court of law decides if your system was fair/transparent -- no current monetary incentive to be interpretable.

1

0

3

Aryaman Arora

@aryaman2020

11 days

- Lo (reply to Saphra): we *should* care about what non-science ppl think. if the findings don't transfer outside, impact is limited.
- Saphra: right now AI doesn't look like it's advancing scientific understanding. interp's goal is getting back understanding from these.

1

0

5