David Chiang
@davidweichiang
2K Followers · 503 Following · 35 Media · 794 Statuses
Associate Professor of Computer Science and Engineering at University of Notre Dame. Natural language processing, formal grammars, machine learning
South Bend, IN
Joined September 2012
I'd be glad to discuss in person at NeurIPS in December.
Always glad to discuss in the FLaNN Discord: https://t.co/U8omHLu1KL
Read the paper: https://t.co/iFTPgBEDuM
And apply to work with David for a PhD!
arxiv.org
It has been observed that transformers with greater depth (that is, more layers) have more capabilities, but can we establish formally which capabilities are gained? We answer this question with a...
0 · 1 · 3
Our updated theorems show this depth separation holds even when the transformers incorporate positional information, like RoPE and ALiBi. As a fun side quest, our results also imply depth separations in extremely uniform subclasses of linear TC^0.
1 · 1 · 3
L_k consists of k alternating blocks of symbols, e.g. L_3 = {aba, aabbaa, aaabbbbbaaaaa, ...}, and each L_k requires more depth to express. The updated experiments show this theory very closely predicts what depth transformers need to learn these languages!
1 · 1 · 0
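For anyone who wants to poke at the definition, here is a minimal membership checker (my own sketch, reading L_k as "strings over {a, b} made of exactly k maximal blocks, starting with a", which matches the L_3 examples above):

```python
# A minimal sketch, assuming L_k = "strings over {a, b} consisting of exactly
# k maximal blocks of identical symbols, starting with a" (consistent with the
# L_3 examples aba, aabbaa, aaabbbbbaaaaa above). Not code from the paper.
def in_L_k(s: str, k: int) -> bool:
    if not s or s[0] != "a" or set(s) - {"a", "b"}:
        return False
    # Count maximal runs; a new block starts wherever the symbol changes,
    # so the blocks alternate between a's and b's automatically.
    blocks = 1 + sum(1 for prev, cur in zip(s, s[1:]) if cur != prev)
    return blocks == k

assert in_L_k("aba", 3) and in_L_k("aabbaa", 3) and in_L_k("aaabbbbbaaaaa", 3)
assert not in_L_k("abab", 3) and not in_L_k("ab", 3)
```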
Then, by leveraging lower-bound techniques developed for majority logic with two variables, we prove a depth hierarchy for C-RASP. That is, we find a family of languages L_k such that a program of depth k can express L_k, but no program of depth k-1 can.
1 · 1 · 0
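In symbols, the separation claimed above reads roughly as follows (my paraphrase; the depth-indexed class notation is mine, not the paper's):

```latex
% Paraphrase of the hierarchy claim; C-RASP_d stands for the languages
% expressible by C-RASP programs of depth d (notation invented here).
\forall k \ge 1 \;\; \exists L_k :\quad
L_k \in \mathrm{C\text{-}RASP}_{k}
\quad\text{and}\quad
L_k \notin \mathrm{C\text{-}RASP}_{k-1}.
```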
C-RASP is a programming language which extends and refines @gail_w's RASP. We prove transformers are expressively equivalent to C-RASP programs under a particular fixed-precision set-up.
1 · 1 · 1
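To give a flavor of what such a program manipulates, here is a tiny plain-Python emulation of the counting-style primitives described above (my reading of the description, not actual C-RASP syntax): prefix counts of a Boolean condition, plus comparisons between counts.

```python
# Not C-RASP syntax: just a plain-Python emulation of the counting flavor
# described above, as I read it -- prefix counts of a Boolean condition,
# plus comparisons between counts. The example accepts strings with equally
# many a's and b's, a typical counting property for this kind of program.
def prefix_count(condition):
    """out[i] = number of positions j <= i where the condition holds."""
    total, out = 0, []
    for holds in condition:
        total += int(holds)
        out.append(total)
    return out

def equal_as_and_bs(w: str) -> bool:
    if not w:
        return True  # zero a's, zero b's
    count_a = prefix_count(c == "a" for c in w)
    count_b = prefix_count(c == "b" for c in w)
    # The accepting condition compares the two counts at the last position.
    return count_a[-1] == count_b[-1]

assert equal_as_and_bs("abba") and not equal_as_and_bs("aab")
```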
For those stumbling on my page, here's a research update. Earlier, Michaël Cadilhac, @davidweichiang, and I proved a depth hierarchy in C-RASP which aligns with learnability in transformers of a given depth. Now with new theory on positional encodings and new experiments :)
1 · 1 · 3
I am recruiting a PhD student to work with me, Peter Cholak, Anand Pillay, and Andy Yang @pentagonalize on transformers and logic/model theory (or related topics). If you are interested, please email me with "FLaNN" in the subject line!
9 · 68 · 272
Read the cookbook: https://t.co/ymBPgfwGxa
Join us for weekly seminars on formal language theory, ML, NLP, and more:
arxiv.org
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a...
0 · 5 · 24
Thanks to all the chefs: Chris Watson, @AntonXue, @satwik1729, Jose Llarena, @lambdaviking, Emile Dos Santos Ferreira, @AnejSvete, @davidweichiang!
1 · 1 · 8
There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight-by-weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers on such a formal level.
1 · 1 · 5
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers! Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
1 · 13 · 40
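In that weight-by-weight spirit, here is a small hand-set attention head (my own NumPy sketch, not a recipe copied from the cookbook): zeroing the query and key matrices makes causal softmax attention uniform over each prefix, and a ±1 value map turns the head's output into the running bracket balance divided by the prefix length, which suffices to check Dyck-1.

```python
# My own sketch of the "program it into the weights" idea; the construction
# and names here are illustrative, not taken from the cookbook.
import numpy as np

def dyck1_head(tokens: str) -> bool:
    if not tokens:
        return True  # the empty string is balanced
    # One-hot embeddings over the alphabet {'(', ')'}.
    X = np.array([[1.0, 0.0] if t == "(" else [0.0, 1.0] for t in tokens])
    W_Q = np.zeros((2, 2))             # zero queries and keys: every logit is 0,
    W_K = np.zeros((2, 2))             # so causal softmax attention is uniform over the prefix
    W_V = np.array([[1.0], [-1.0]])    # value map: '(' -> +1, ')' -> -1

    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    n = len(tokens)
    logits = Q @ K.T
    logits[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf   # causal mask
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    out = (A @ V).ravel()   # out[i] = (opens - closes in tokens[:i+1]) / (i + 1)

    # Dyck-1: every prefix balance is >= 0 and the final balance is exactly 0.
    return bool(np.all(out >= -1e-9) and abs(out[-1]) < 1e-9)

assert dyck1_head("(()())") and not dyck1_head("())(")
```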
@ND_CSE is hiring a tenure-track professor at @NotreDame, with computer vision, software systems for robotics, and quantum computing as priority search areas. Apply and join the Notre Dame Computer Science and Engineering Department! ☘️ Apply Now:
0 · 5 · 6
📢 We're hiring open-rank TT CS faculty at Notre Dame!! All areas are welcome, with computer vision, software systems for robotics, and quantum computing being of particular interest. ♥️ Come and be my colleague! It's a fantastic dept. to be a part of.
0 · 20 · 44
Dear NeurIPS 2030 reviewers: We have not yet received your final final final justification in response to the authors' final final final remarks.
0 · 0 · 41
By @pentagonalize, Lena Strobl, Dana Angluin, and me, on arXiv:
arxiv.org
We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several...
0 · 0 · 1
If position embeddings have poly(n) magnitude and the 1st- and 2nd-place attention weights are separated by a constant-size gap, then the required scale is O(log n). If the gap is 1/n^k, then the required scale is O(n^k log n).
1 · 0 · 1
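A back-of-the-envelope check of why those scales suffice (my own union-bound sketch; the function and parameter names are invented, not from the paper): a non-maximal position gets softmax weight at most e^(-c·gap), so a scale of c = log((n-1)/eps)/gap caps the total mass leaked off the argmax at eps, i.e. O(log n) for a constant gap and O(n^k log n) for a 1/n^k gap.

```python
# My own numeric sketch of the union-bound intuition, not the paper's proof.
# Names (soft_vs_hard, delta, eps) are invented for illustration.
import numpy as np

def soft_vs_hard(n: int, delta: float, eps: float = 0.01):
    """Worst case for the bound: one logit ahead by `delta`, the other n-1 tied."""
    logits = np.zeros(n)
    logits[0] = delta
    c = np.log((n - 1) / eps) / delta      # scale suggested by the union bound
    w = np.exp(c * (logits - logits.max()))
    w /= w.sum()
    return c, 1.0 - w[0]                   # (scale used, softmax mass off the argmax)

# Constant gap: a scale of O(log n) keeps the leaked mass below eps.
print(soft_vs_hard(n=10_000, delta=0.5))     # c ~ 28, leakage ~ 0.0099
# Gap of 1/n: the same guarantee needs a scale of roughly n log n.
print(soft_vs_hard(n=10_000, delta=1e-4))    # c ~ 1.4e5, leakage ~ 0.0099
```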
It's known that any average-hard attention transformer can be simulated by a softmax-attention transformer by scaling attention logits. We give a new bound on how much they need to be scaled by, and this bound now works for any average-hard attention transformer.
1 · 0 · 0
We updated our paper on soft attention simulating hard attention with a more general result. Many theoretical constructions of transformers use hard attention, but what does that say about actual transformers, which use soft attention?
1 · 0 · 3
Andy Yang @pentagonalize drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
0 · 0 · 5
Very excited about this work: deep results from logic shedding light on Transformers and the benefit of depth
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
0 · 3 · 13