David Chiang Profile
David Chiang

@davidweichiang

Followers: 2K
Following: 500
Media: 35
Statuses: 778

Associate Professor of Computer Science and Engineering at University of Notre Dame. Natural language processing, formal grammars, machine learning

South Bend, IN
Joined September 2012
@davidweichiang
David Chiang
1 month
If position embeddings have poly(n) magnitude and the 1st- and 2nd-place attention weights are separated by a constant-size gap, then the required scale is O(log n). If the gap is 1/n^k, then the required scale is O(n^k log n).
1
0
0
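A back-of-the-envelope sketch of where the gap dependence comes from (my reading of the claim above, not the paper's proof): if the top and runner-up logits differ by a gap \gamma before scaling, then after multiplying the logits by a scale c, the softmax mass leaking to the other n-1 positions is at most (n-1)e^{-c\gamma}, so pushing that leakage below a fixed \varepsilon requires

\[ (n-1)\,e^{-c\gamma} < \varepsilon \iff c > \frac{\ln\bigl((n-1)/\varepsilon\bigr)}{\gamma}, \]

which is O(\log n) for a constant gap and O(n^k \log n) for \gamma = 1/n^k. (The poly(n) bound on position-embedding magnitude is a separate condition not covered by this sketch.)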
@davidweichiang
David Chiang
1 month
It's known that any average-hard attention transformer can be simulated by a softmax-attention transformer by scaling attention logits. We give a new bound on how much the logits need to be scaled, and this bound now works for any average-hard attention transformer.
1
0
0
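As a toy illustration of the idea in the tweet above (my own sketch, not the paper's construction): average-hard attention puts uniform weight on the tied argmax positions, and softmax applied to logits multiplied by a growing scale c converges to exactly that distribution.

import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def average_hard(logits, tol=1e-9):
    # Average-hard attention: uniform weight over all tied maximal logits.
    mask = (logits >= logits.max() - tol).astype(float)
    return mask / mask.sum()

logits = np.array([2.0, 5.0, 5.0, 1.0])      # two tied maxima
for c in (1.0, 10.0, 100.0):                 # c is the scale on the logits
    print(c, np.round(softmax(c * logits), 4))
print("hard", average_hard(logits))          # -> [0.  0.5 0.5 0. ]
# As c grows, softmax(c * logits) approaches the average-hard weights.

The bound in this thread concerns how large c has to be, as a function of the sequence length n and the logit gap, for this approximation to hold at every position.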
@davidweichiang
David Chiang
1 month
We updated our paper on soft attention simulating hard attention with a more general result. Many theoretical constructions of transformers use hard attention, but what does that say about actual transformers, which use soft attention?
1
0
2
@davidweichiang
David Chiang
2 months
Andy Yang @pentagonalize drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
0
0
5
@davidweichiang
David Chiang
2 months
RT @mhahn29: Very excited about this work: deep results from logic shedding light on Transformers and the benefit of depth.
0
3
0
@davidweichiang
David Chiang
2 months
Although there is a lot of wiggle room in defining rounding/precision, our theoretical predictions are confirmed by experiments surprisingly well!
1
0
4
@davidweichiang
David Chiang
2 months
The separating languages are very simple: L_k is the language of k blocks of one or more repetitions of a symbol, e.g., L_3 contains strings aba, aabbbbaaaaaa, etc. More blocks require more depth.
1
0
3
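Reading that definition literally, membership in L_k just means the string has exactly k maximal runs of a repeated symbol; a few lines of Python (my own sketch, not code from the paper) make the examples concrete:

from itertools import groupby

def in_L_k(s, k):
    # Count maximal blocks (runs) of identical symbols.
    return sum(1 for _symbol, _run in groupby(s)) == k

assert in_L_k("aba", 3)            # blocks: a | b | a
assert in_L_k("aabbbbaaaaaa", 3)   # blocks: aa | bbbb | aaaaaa
assert not in_L_k("abab", 3)       # four blocks, not three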
@davidweichiang
David Chiang
2 months
Further, we show that deeper programs/formulas in C-RASP are strictly more expressive than shallower programs/formulas. Together, these results imply that in the above-defined variant, deeper transformers are strictly more expressive than shallower transformers.
1
1
12
@davidweichiang
David Chiang
2 months
C-RASP is a programmer-friendly version of "temporal logic with future-masked counting." We show both are exactly equivalent to soft-attention transformers with fixed precision outside attention but no rounding inside attention (to avoid under/overflow when summing over the sequence).
1
0
2
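For intuition about the "future-masked counting" primitive (an illustrative sketch of my own; C-RASP's actual operations are defined in the paper): at each position a program may count how many positions up to and including the current one satisfy some predicate, never looking at the future, much like causally masked attention feeding a counter.

def future_masked_count(predicate, s):
    # At each position i, count positions j <= i where predicate(s[j]) holds.
    # Positions j > i (the future) are masked out, as in causal attention.
    counts, total = [], 0
    for symbol in s:
        total += int(predicate(symbol))
        counts.append(total)
    return counts

print(future_masked_count(lambda ch: ch == "a", "aabba"))   # [1, 2, 2, 2, 3]

Roughly, C-RASP programs build Boolean conditions by comparing such counts, and nesting those operations more deeply is what the depth results above are about.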
@davidweichiang
David Chiang
2 months
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
[attached figure]
1
10
37
@davidweichiang
David Chiang
3 months
RT @huangxt233: I'll be presenting our paper together with @mhahn29 at the Saturday morning poster session. Feel free to reach out!
0
7
0
@davidweichiang
David Chiang
4 months
RT @AnganaBorah2: Last week, I had a fantastic time presenting our work on belief congruence in LLMs at the Midwest Speech and Language Day….
0
8
0
@davidweichiang
David Chiang
4 months
RT @Ruyuan_Wan: @davidweichiang @aarsri21 @ND_CSE wooohoo! Congratulations Aarohi!!!
0
1
0
@davidweichiang
David Chiang
4 months
(Out of the papers that Aarohi @aarsri21 has published while at @ND_CSE, 80% have received an award!)
1
0
4
@davidweichiang
David Chiang
4 months
In contrast, on text with variation involving new words or meanings (e.g., "lie" vs. "cap"), far more data is needed, but it leads to a massive breakthrough in performance.
1
0
1
@davidweichiang
David Chiang
4 months
On text with character-level variation (e.g., "strategy" vs. "strat"), out-of-the-box performance improves even with a few additional training examples -- but approaches a plateau, suggesting that more data is not the solution.
1
0
1
@davidweichiang
David Chiang
4 months
Congratulations to @aarsri21 on winning the Best Paper Award at W-NUT at NAACL 2025! This paper applies various interventions that simulate noisy text or dialectal variation, showing that different interventions have different effects.
arxiv.org
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text. We do so by designing interventions that approximate core...
2
4
24