Nayoung Lee
@nayoung_nylee
Followers 261 · Following 99 · Media 0 · Statuses 26
ECE Ph.D. Candidate @UWMadison. Soon to be on the job market. https://t.co/21MsvK4XhF
Joined February 2022
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
24 · 80 · 524
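A minimal sketch of how a long-addition test like the one described above can be generated and scored. `query_model` is a hypothetical placeholder for whatever model call is used, and the prompt wording is illustrative only.

```python
import random

def make_addition_prompt(n_digits: int) -> tuple[str, int]:
    """Sample two n-digit operands and return (prompt, true sum)."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"Compute {a} + {b}. Answer with the number only.", a + b

def addition_accuracy(query_model, n_digits: int = 200, n_trials: int = 20) -> float:
    """Fraction of exact matches over n_trials random problems."""
    correct = 0
    for _ in range(n_trials):
        prompt, truth = make_addition_prompt(n_digits)
        reply = query_model(prompt)  # hypothetical: returns the model's text answer
        answer = "".join(ch for ch in reply if ch.isdigit())
        correct += int(answer == str(truth))
    return correct / n_trials
```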
We have another related result with Zheyang @zheyangxiong and Vasilis @vpapageorgiou_ on transfer. We finetune models on artificial retrieval tasks, and they become better on the "needle in haystack" test. The amount by which transfer happens seems to be a function of model
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
2 · 8 · 58
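A sketch, under my own assumptions, of what an "artificial retrieval" finetuning example could look like: synthetic key-value pairs followed by a query, which is structurally close to the needle-in-a-haystack test. The actual task construction in the paper may differ.

```python
import random
import string

def rand_token(k: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=k))

def make_retrieval_example(n_pairs: int = 50) -> dict:
    """One synthetic example: many key-value facts, one of them queried."""
    pairs = {rand_token(): rand_token() for _ in range(n_pairs)}
    key = random.choice(list(pairs))
    context = " ".join(f"{k} is {v}." for k, v in pairs.items())
    return {"prompt": f"{context}\nQuestion: what is {key}?",
            "target": pairs[key]}
```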
The ability to generalize out of the training distribution increases with the amount of pretraining you do. A cool finding that Jack @jackcai1206 and @nayoung_nylee stumbled upon in their recent length generalization transfer paper is the following: As we specifically
5 · 9 · 109
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
55 · 118 · 1K
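A rough sketch of the iterative self-improvement loop described in the tweet above, using hypothetical helpers (`train`, `generate_problems`, `model_label`, `passes_filter`); the concrete schedule and filtering rule here are assumptions, not the paper's recipe.

```python
def self_improve(model, train, generate_problems, model_label, passes_filter,
                 dataset, start_len=5, max_len=20):
    """Grow the training set with the model's own labels on ever-harder problems."""
    data = list(dataset)                                   # supervised seed data (short problems)
    for n_digits in range(start_len + 1, max_len + 1):
        problems = generate_problems(n_digits)             # harder than anything seen so far
        labeled = [(p, model_label(model, p)) for p in problems]
        kept = [(p, y) for p, y in labeled if passes_filter(p, y)]  # e.g. a self-consistency check
        data.extend(kept)                                  # keep the filtered self-labels
        model = train(model, data)                         # retrain on the grown set
    return model
```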
one of the few times in my career I feel good about calling a problem "solved", for an approximately satisfying value of "solved": length generalization challenges can be overcome with iterative self-improvement
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
6 · 5 · 106
o3 can't multiply 10-digit numbers, but here is the accuracy of a 14M transformer that teaches itself how to do it, with iterative self-improvement.
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
33 · 74 · 827
Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arXiv coming on Monday. Link to a talk I gave on this below. Super excited about this work!
19 · 148 · 1K
What enables a strong model to surpass its weaker teacher? Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧵
2 · 24 · 126
My group is looking for a postdoc interested in the theoretical/algorithmic aspects of foundation models, particularly LLMs. You can see our recent papers here: https://t.co/88dTHxaG1T If you are interested in working with us, please email me your CV & research statement!
kangwooklee.com
Lee Lab @ UW Madison
2 · 19 · 76
Curious if you can robustify foundation models almost for free? (!!) Join us for the poster presentation on "Zero-Shot Robustification of Zero-Shot Models" with @dyhadila, @CaiLinrong @fredsala!
Come by #ICLR2024 Session 2 on Tuesday to see our work using representation editing to make foundation models robust! No fine-tuning, no additional data, no problem. https://t.co/6GYYuOPN9T
1 · 3 · 20
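Not necessarily the paper's exact procedure, but a minimal sketch of the general representation-editing idea behind such robustification: project an estimated spurious direction out of frozen zero-shot embeddings, with no fine-tuning and no extra labeled data. The usage example and `text_emb` helper are hypothetical.

```python
import numpy as np

def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component along `direction` from each row of `embeddings`."""
    d = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ d, d)

# Hypothetical usage: estimate a spurious "background" axis from text prompts
# and edit the image embeddings before zero-shot classification.
# spurious = text_emb("a photo of water") - text_emb("a photo of land")
# edited_image_embs = remove_direction(image_embs, spurious)
```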
"Looped Transformers are Better at Learning Learning Algorithms" in ICLR @Yang_Liuu offers a simple and clean message in this paper. When it comes to emulating learning algorithms, using a looped transformer (i.e., one where the iterative structure is hardcoded) helps a lot.
5 · 66 · 387
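For readers unfamiliar with the setup, here is a minimal sketch of the looped idea: one weight-tied block applied repeatedly, so the iterative structure is hardcoded. The hyperparameters and the input-injection variant shown are my own illustrative choices.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """A single shared transformer block applied n_loops times."""
    def __init__(self, d_model: int = 256, n_heads: int = 8, n_loops: int = 12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros_like(x)
        for _ in range(self.n_loops):
            h = self.block(x + h)   # re-inject the input at every loop iteration
        return h
```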
"Teaching Arithmetic" got in ICLR! See you in Austria, and I promise I'll quickly teach you how to add, but I can't promise you'll length generalize
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
3 · 14 · 125
1/7 We are thrilled to launch LLM360: pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. https://t.co/ZcLPtYQhdQ
19 · 188 · 1K
You've heard about the Reversal Curse of autoregressive LMs, but have you heard of the Blessing of Reversal? In our "Teaching Arithmetic" work we see that teaching A+B=C is harder than A+B=reverse(C). Why? Reversal makes the function easier to learn! https://t.co/ECbeypLF4q
3 · 12 · 83
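To make the A+B=reverse(C) trick concrete, here is a tiny formatting sketch; the separators and the lack of padding are my own choices, not necessarily the paper's exact data format.

```python
def format_plain(a: int, b: int) -> str:
    """Standard format: answer written most-significant digit first."""
    return f"{a}+{b}={a + b}"

def format_reversed(a: int, b: int) -> str:
    """Reversed format: answer written least-significant digit first."""
    return f"{a}+{b}={str(a + b)[::-1]}"

print(format_plain(457, 168))     # 457+168=625
print(format_reversed(457, 168))  # 457+168=526
```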
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute
https://t.co/i4NUyJsKP4
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
0 · 8 · 33
My amazing collaborator @nayoung_nylee gave a great short talk on our recent work at @SimonsInstitute. Check it out!
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute
https://t.co/i4NUyJsKP4
0 · 3 · 12
🧵 Four amazing presentations lined up for the final day of #ICML2023! Our group will cover topics from teaching Transformers arithmetic and iterative in-context learning to understanding weight decay and speeding up GPT! Stay tuned! (1/5)
1 · 10 · 17
4 points stood out to me: 1) Models only learn to add the number of digits they saw during training, not how to do addition in general. 2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
0 · 10 · 27
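Point 2 above is easy to see with a worked sketch: when the answer is emitted least-significant digit first, each output digit depends only on the current operand digits plus a carry from digits already produced, whereas a left-to-right answer requires resolving every carry before the first digit can be written.

```python
def add_lsb_first(a: int, b: int) -> str:
    """Add two integers digit by digit, emitting the least-significant digit first."""
    da, db = str(a)[::-1], str(b)[::-1]
    out, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = carry
        s += int(da[i]) if i < len(da) else 0
        s += int(db[i]) if i < len(db) else 0
        out.append(str(s % 10))   # each digit needs only local information
        carry = s // 10
    if carry:
        out.append(str(carry))
    return "".join(out)           # the reversed (LSB-first) answer string

assert add_lsb_first(457, 168) == "526"   # 625 written in reverse
```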
Our paper is finally out! We wanted to use basic arithmetic as a lens to better understand the phenomenon of emergence in transformers. I thoroughly enjoyed working on this project with my amazing collaborators @nayoung_nylee, @jasondeanlee, @Kangwook_Lee, and @DimitrisPapail.
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
2 · 6 · 36
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
13 · 151 · 630