David Chiang Profile
David Chiang

@davidweichiang

Followers: 2K
Following: 503
Media: 35
Statuses: 794

Associate Professor of Computer Science and Engineering at University of Notre Dame. Natural language processing, formal grammars, machine learning

South Bend, IN
Joined September 2012
@pentagonalize
Andy J Yang
7 days
Our updated theorems show this depth separation holds even when the transformers incorporate positional information, like RoPE and ALiBi. As a fun side quest, our results also imply depth separations in extremely uniform subclasses of linear TC^0.
1
1
3
@pentagonalize
Andy J Yang
7 days
L_k consists of k alternating blocks of symbols, e.g. L_3 = {aba, aabbaa, aaabbbbbaaaaa, ...}, and each successive L_k requires more depth to express. The updated experiments show this theory very closely predicts what depth transformers need to learn these languages!
1
1
0
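For readers skimming the thread: here is a minimal Python sketch of what membership in these L_k languages means, assuming (as in the examples above) that strings are over {a, b}, start with a, and a "block" is a maximal run of one symbol. The helper names are ours, not the paper's.

```python
# Illustration only: L_k = strings over {a, b} made of exactly k alternating
# blocks, starting with a (e.g. "aba" and "aabbaa" are in L_3).

def count_blocks(s: str) -> int:
    """Number of maximal runs of identical symbols in s."""
    blocks, prev = 0, None
    for ch in s:
        if ch != prev:
            blocks += 1
            prev = ch
    return blocks

def in_L(s: str, k: int) -> bool:
    """Membership test for L_k under the assumptions stated above."""
    return bool(s) and set(s) <= {"a", "b"} and s[0] == "a" and count_blocks(s) == k

assert in_L("aba", 3) and in_L("aabbaa", 3) and in_L("aaabbbbbaaaaa", 3)
assert not in_L("abab", 3)  # four blocks, so this one lives in L_4 instead
```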
@pentagonalize
Andy J Yang
7 days
Then, by leveraging lower-bound techniques developed for majority logic with two variables, we prove a depth hierarchy for C-RASP. That is, we find a family of languages L_k such that a program of depth k can express L_k, but no program of depth k-1 can.
1
1
0
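The separation claimed above, written out as a formula (the subscript notation for depth-k programs is ours):

```latex
% Depth hierarchy for C-RASP as stated in the thread (notation ours):
% L_k is expressible at depth k but not at depth k-1.
\forall k \ge 1:\qquad L_k \in \mathsf{C\text{-}RASP}_{k} \setminus \mathsf{C\text{-}RASP}_{k-1}
```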
@pentagonalize
Andy J Yang
7 days
C-RASP is a programming language that extends and refines @gail_w's RASP. We prove transformers are expressively equivalent to C-RASP programs under a particular fixed-precision setup.
1
1
1
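To give a flavor of the "C" (counting) in C-RASP: as we understand it, the central primitive is a prefix count of positions satisfying a Boolean condition, with comparisons between counts. Below is a rough Python analogue, purely illustrative and not the paper's actual syntax or semantics; the function names are ours.

```python
# Rough illustration of a C-RASP-style counting primitive (not the authors'
# definition): a running count, over prefixes, of positions where a
# Boolean predicate holds.

def prefix_count(pred, tokens):
    """counts[i] = number of positions j <= i with pred(tokens[j]) true."""
    counts, total = [], 0
    for t in tokens:
        total += int(pred(t))
        counts.append(total)
    return counts

# Example: compare two counts to check that a's and b's balance out.
tokens = list("aabb")
num_a = prefix_count(lambda t: t == "a", tokens)
num_b = prefix_count(lambda t: t == "b", tokens)
print(num_a, num_b, num_a[-1] == num_b[-1])  # [1, 2, 2, 2] [0, 0, 1, 2] True
```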
@pentagonalize
Andy J Yang
7 days
For those stumbling on my page, here's a research update. Earlier, Michaël Cadilhac, @davidweichiang, and I proved a depth hierarchy in C-RASP that aligns with learnability in transformers of a given depth. Now updated with new theory on positional encodings and new experiments :)
1
1
3
@davidweichiang
David Chiang
8 days
I am recruiting a PhD student to work with me, Peter Cholak, Anand Pillay, and Andy Yang @pentagonalize on transformers and logic/model theory (or related topics). If you are interested, please email me with "FLaNN" in the subject line!
9
68
272
@pentagonalize
Andy J Yang
1 month
Thanks to all the chefs, Chris Watson, @AntonXue, @satwik1729, Jose Llarena, @lambdaviking, Emile Dos Santos Ferreira, @AnejSvete, @davidweichiang !
1
1
8
@pentagonalize
Andy J Yang
1 month
There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight by weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers at this formal level.
1
1
5
@pentagonalize
Andy J Yang
1 month
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers! Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
1
13
40
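In the spirit of the Cookbook (though not a recipe copied from it), here is a small hand-wired attention head in NumPy: with one-hot tokens and positions and weights set by hand, each position copies the previous token's embedding. All weight choices below are our own illustrative assumptions.

```python
import numpy as np

# Hand-constructed single attention head (illustrative, not from the paper):
# with one-hot tokens and positions, it copies the previous token's embedding
# to each position.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

tokens = "abcab"
vocab = sorted(set(tokens))
n, v = len(tokens), len(vocab)

tok = np.eye(v)[[vocab.index(c) for c in tokens]]  # (n, v) one-hot tokens
pos = np.eye(n)                                    # (n, n) one-hot positions
X = np.concatenate([tok, pos], axis=1)             # residual stream, (n, v+n)

# Hand-set weights: queries read the position block shifted back by one,
# keys read the position block unchanged, values read the token block.
shift = np.zeros((n, n))
shift[np.arange(1, n), np.arange(n - 1)] = 1.0     # row i has a 1 in column i-1

W_Q = np.concatenate([np.zeros((v, n)), shift], axis=0)      # (v+n, n)
W_K = np.concatenate([np.zeros((v, n)), np.eye(n)], axis=0)  # (v+n, n)
W_V = np.concatenate([np.eye(v), np.zeros((n, v))], axis=0)  # (v+n, v)

beta = 50.0  # large scale pushes softmax toward hard attention
out = softmax(beta * (X @ W_Q) @ (X @ W_K).T) @ (X @ W_V)    # (n, v)

# Row i is (approximately) the one-hot of tokens[i-1]; row 0 has no
# predecessor, so it just averages over all positions.
print(np.round(out, 2))
```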
@ND_CSE
Notre Dame CSE
2 months
@ND_CSE is hiring a tenure-track professor at @NotreDame. Computer vision, software systems for robotics, and quantum computing are priority search areas. Apply and join the Notre Dame Computer Science and Engineering Department! ☘️ Apply Now:
0
5
6
@TobyJLi
Toby J. Li😺 (he/him)
2 months
📢 We're hiring open-rank TT CS faculty at Notre Dame!! All areas are welcome, with computer vision, software systems for robotics, and quantum computing being of particular interest. ♥️ Come and be my colleague! It's a fantastic dept. to be a part of.
0
20
44
@davidweichiang
David Chiang
3 months
Dear NeurIPS 2030 reviewers: We have not yet received your final final final justification in response to the authors' final final final remarks.
0
0
41
@davidweichiang
David Chiang
4 months
If position embeddings have poly(n) magnitude and the 1st- and 2nd-place attention weights are separated by a constant-size gap, then the required scale is O(log n). If the gap is 1/n^k, then the required scale is O(n^k log n).
1
0
1
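A quick numerical sanity check of this intuition (the toy setup and constants are ours; only the O(log n) scaling comes from the thread): with a constant gap between the top logit and the rest, a scale of order log n already makes the softmax head put almost all of its weight on the argmax, i.e. it behaves like the hard-attention head it simulates.

```python
import math
import numpy as np

# Toy check (our setup, not the paper's): one position leads the other n-1
# logits by a constant gap; scaling logits by c*log(n) makes softmax
# attention concentrate on the winner.

def winner_weight(n, gap, scale):
    logits = np.zeros(n)
    logits[0] = gap                              # the 1st-place logit
    w = np.exp(scale * (logits - logits.max()))
    return (w / w.sum())[0]                      # softmax weight on the winner

for n in [10, 100, 1000, 10000]:
    print(n, round(winner_weight(n, gap=1.0, scale=3 * math.log(n)), 6))
# The winner's weight tends to 1 as n grows (roughly 1 - (n-1)/n^3 here),
# which is the sense in which scaled softmax simulates hard attention.
```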
@davidweichiang
David Chiang
4 months
It's known that any average-hard attention transformer can be simulated by a softmax-attention transformer by scaling attention logits. We give a new bound on how much they need to be scaled by, and this bound now works for any average-hard attention transformer.
1
0
0
@davidweichiang
David Chiang
4 months
We updated our paper on soft attention simulating hard attention with a more general result. Many theoretical constructions of transformers use hard attention, but what does that say about actual transformers, which use soft attention?
1
0
3
@davidweichiang
David Chiang
5 months
Andy Yang @pentagonalize drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
0
0
5
@mhahn29
Michael Hahn
5 months
Very excited about this work: deep results from logic shedding light on Transformers and the benefit of depth
@davidweichiang
David Chiang
5 months
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michaël Cadilhac, and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
0
3
13