Aryaman Arora

@aryaman2020

Followers 7K · Following 31K · Media 310 · Statuses 13K

member of technical staff @stanfordnlp

🌲
Joined December 2018
@aryaman2020
Aryaman Arora
2 months
I'll be interning at @TransluceAI for the summer doing interp 🫡 will be staying in SF.
15
5
251
@aryaman2020
Aryaman Arora
9 hours
more on the delta update rule: "this process can also be regarded as optimizing an online regression loss using a single step of SGD". "An alternative interpretation for DeltaNet is from the perspective of key-value retrieval. It first retrieves the old value using the current…
1
0
1
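A minimal numpy sketch of the reading quoted above (function and variable names are illustrative, not from the paper's code): the delta rule S_t = S_{t-1} − β_t (S_{t-1} k_t − v_t) k_tᵀ is exactly one SGD step with learning rate β_t on the online regression loss L(S) = ½‖S k_t − v_t‖².

```python
import numpy as np

def delta_rule_step(S, k, v, beta):
    """One delta-rule memory update. Equivalently, a single SGD step with
    learning rate beta on the online regression loss
        L(S) = 0.5 * ||S @ k - v||**2,
    whose gradient w.r.t. S is (S @ k - v) k^T."""
    old_value = S @ k                             # retrieve old value stored at key k
    return S - beta * np.outer(old_value - v, k)  # erase it, write in the new value v

# sanity check against the familiar closed form S(I - beta k k^T) + beta v k^T
d = 4
rng = np.random.default_rng(0)
S, k, v, beta = rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d), 0.5
closed_form = S @ (np.eye(d) - beta * np.outer(k, k)) + beta * np.outer(v, k)
assert np.allclose(delta_rule_step(S, k, v, beta), closed_form)
```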
@aryaman2020
Aryaman Arora
10 hours
nerdsniped while reading the DeltaNet paper: the main representation rewrite function we propose in ReFT is super similar to the delta update rule! no wonder it worked so well
1
0
12
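For comparison, a minimal numpy sketch of the ReFT rewrite being referenced, the LoReFT intervention Φ(h) = h + Rᵀ(Wh + b − Rh); variable names here are illustrative. Like the delta rule, it reads the current value out of a subspace, subtracts it, and writes in a new one.

```python
import numpy as np

def loreft(h, R, W, b):
    """LoReFT-style representation rewrite:
        Phi(h) = h + R^T @ ((W @ h + b) - R @ h)
    R @ h is the "old value" read from a rank-r subspace; W @ h + b is the
    "new value" written in its place: structurally the delta rule with
    beta = 1, applied in the subspace spanned by R's rows."""
    return h + R.T @ ((W @ h + b) - R @ h)

d, r = 8, 2
rng = np.random.default_rng(0)
h = rng.normal(size=d)
R = np.linalg.qr(rng.normal(size=(d, r)))[0].T  # orthonormal rows, shape (r, d)
W, b = rng.normal(size=(r, d)), rng.normal(size=r)
# after the edit, the subspace readout equals the learned target exactly
assert np.allclose(R @ loreft(h, R, W, b), W @ h + b)
```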
@aryaman2020
Aryaman Arora
10 hours
it's also true at the same time that interp researchers should focus on ways to add value that are complementary to / not easily replaced by baselines like prompting
@DimitrisPapail
Dimitris Papailiopoulos
18 hours
@sohamdaga22 why yes? please give me a concrete example where mech interp has, beyond a reasonable doubt, informed how we design, train, or understand these models, that simple input-output A/B tests (i.e., asking the model questions) have been insufficient for?
0
0
2
@aryaman2020
Aryaman Arora
10 hours
people rightly ask, where is the practical downstream value that mech interp has created? it turns out that if you prioritise rigour over immediate scaling there are plenty of good examples 🙂
@ChrisGPotts
Christopher Potts
12 hours
@DimitrisPapail @sohamdaga22 I feel that these papers (from my group) are examples of what you are nominally asking for: [links to seven papers]
1
0
8
@aryaman2020
Aryaman Arora
10 hours
RT @ChrisGPotts: @DimitrisPapail @sohamdaga22 I feel that these papers (from my group) are examples of what you are nominally asking for: …
0
9
0
@aryaman2020
Aryaman Arora
17 hours
RT @DimitrisPapail: @sohamdaga22 why yes? please give me a concrete example where mech interp has, beyond a reasonable doubt, informed how…
0
1
0
@aryaman2020
Aryaman Arora
20 hours
RT @neil_rathi: new paper! robust and general "soft" preferences are a hallmark of human language production. we show that these emerge fr…
0
5
0
@aryaman2020
Aryaman Arora
2 days
that's crazy, maybe someone should work on this.
@unusual_whales
unusual_whales
2 days
"Researchers from top AI labs including Google, OpenAI, and Anthropic warn they may be losing the ability to understand advanced AI models," per FORTUNE.
10
2
180
@aryaman2020
Aryaman Arora
4 days
please go to this fire poster.
@harshitj__
Harshit Joshi
4 days
flying to Vienna 🇦🇹 for ACL to present Genie Worksheets (Monday 11am)! come and say hi if you want to talk about how to create controllable and reliable application layers on top of LLMs, knowledge discovery and curation, or just wanna hang
0
0
13
@aryaman2020
Aryaman Arora
7 days
RT @shi_weiyan: 💥New Paper💥 #LLMs encode harmfulness and refusal separately! 1️⃣We found a harmfulness direction 2️⃣The model internally k…
0
21
0
@aryaman2020
Aryaman Arora
8 days
relatedly, you realise how useful mundane time is when you try to hyper-optimise your time for doing better research. the highest-value time seems to be the shower and time spent lying in bed unable to fall asleep.
1
0
51
@aryaman2020
Aryaman Arora
8 days
if you think data cleaning is beneath you then ngmi.
@heeney_luke
Luke Heeney
12 days
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking.
11
30
680
@aryaman2020
Aryaman Arora
10 days
I should have shouted out @sarahwiegreffe for being such a great moderator 🫡.
0
0
8
@aryaman2020
Aryaman Arora
11 days
ok that's the end! pretty interesting.
2
0
6
@aryaman2020
Aryaman Arora
11 days
- Lo: we should study the effect of data on models wayyy more than architecture.
- Marks and Saphra agree.
- Barez: we don't know how to find/make the right data for lots of problems. ppl like Cynthia Rudin suggest work on whitebox arch, but no clear path to this. Barez says it's…
1
0
5
@aryaman2020
Aryaman Arora
11 days
*science -> scientific understanding.
1
0
1
@aryaman2020
Aryaman Arora
11 days
[NB: I think we will make scientific progress without understanding simply due to scaling simulations for RL in various domains, but yeah seems right that reverse-engineering our blackboxes is needed for science].
1
0
2
@aryaman2020
Aryaman Arora
11 days
in science, incentives are aligned bc understanding translates to e.g. better drugs/materials, so interp falls out naturally.
1
0
4
@aryaman2020
Aryaman Arora
11 days
- Barez: interp isn't useful so far bc of incentive mismatch. apart from discovery, there's no real need for understanding/explanation. e.g. court of law decides if your system was fair/transparent -- no current monetary incentive to be interpretable.
1
0
3
@aryaman2020
Aryaman Arora
11 days
- Lo (reply to Saphra): we *should* care about what non-science ppl think. if the findings don't transfer outside, impact is limited.
- Saphra: right now AI doesn't look like it's advancing scientific understanding. interp's goal is getting back understanding from these…
1
0
5