Nayoung Lee Profile
Nayoung Lee

@nayoung_nylee

Followers 261 · Following 99 · Media 0 · Statuses 26

ECE Ph.D. Candidate @UWMadison. Soon to be on the job market. https://t.co/21MsvK4XhF

Joined February 2022
@DimitrisPapail
Dimitris Papailiopoulos
4 months
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200 digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
24
80
524
@DimitrisPapail
Dimitris Papailiopoulos
4 months
We have another related result with Zheyang @zheyangxiong and Vasilis @vpapageorgiou_ on transfer. We finetune models on artificial retrieval tasks, and they become better on the "needle in haystack" test. The amount by which transfer happens seems to be a function of model
@DimitrisPapail
Dimitris Papailiopoulos
4 months
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200 digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
2
8
58
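To make the synthetic-retrieval idea in the tweet above concrete, here is a minimal sketch of how an artificial key-value retrieval example might be generated. The helper names and the exact task construction are assumptions for illustration, not the paper's recipe.

import random
import string

def make_retrieval_example(n_pairs=50):
    # Build random key:value pairs and ask for the value of one buried key,
    # mimicking a "needle in a haystack" style retrieval prompt.
    rand = lambda k: "".join(random.choices(string.ascii_lowercase, k=k))
    pairs = {rand(8): rand(8) for _ in range(n_pairs)}
    key = random.choice(list(pairs))
    context = " ".join(f"{k}:{v}" for k, v in pairs.items())
    prompt = f"{context}\nWhat is the value of {key}?"
    return prompt, pairs[key]

prompt, answer = make_retrieval_example()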
@DimitrisPapail
Dimitris Papailiopoulos
3 months
The ability to generalize out of the training distribution increases with the amount of pretraining you do. A cool finding that Jack @jackcai1206 and @nayoung_nylee stumbled upon in their recent length generalization transfer paper is the following: As we specifically
5
9
109
@DimitrisPapail
Dimitris Papailiopoulos
9 months
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the acc of a tiny model teaching itself how to add.
55
118
1K
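A rough sketch of the recursive self-improvement loop referenced in the tweet above, under simplifying assumptions: the model is trained on short additions with ground-truth answers, labels slightly longer problems itself, and the filtered self-labels are folded back into the training set. train_fn, predict_fn, and the filtering rule are placeholders, not the paper's actual recipe.

import random

def sample_problems(n_digits, k=1000):
    # Uniformly sample operand pairs with exactly n_digits digits each.
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(random.randint(lo, hi), random.randint(lo, hi)) for _ in range(k)]

def self_improve(train_fn, predict_fn, max_digits=10):
    # Ground-truth labels are only used for the shortest problems.
    data = [(a, b, a + b) for a, b in sample_problems(1)]
    model = train_fn(data)
    for n in range(2, max_digits + 1):
        # The current model labels longer problems (no ground truth used here).
        pseudo = [(a, b, predict_fn(model, a, b)) for a, b in sample_problems(n)]
        # Placeholder filter: drop answers the model refused or left malformed;
        # the paper uses its own filtering/verification scheme.
        data += [(a, b, c) for a, b, c in pseudo if c is not None]
        model = train_fn(data)
    return model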
@DimitrisPapail
Dimitris Papailiopoulos
9 months
one of the few times in my career I feel good about calling a problem "solved", for an approximately satisfying value of "solved": length generalization challenges can be overcome with iterative self-improvement
@DimitrisPapail
Dimitris Papailiopoulos
9 months
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the acc of a tiny model teaching itself how to add.
6
5
106
@DimitrisPapail
Dimitris Papailiopoulos
9 months
o3 can't multiply 10-digit numbers, but here is the acc of a 14M transformer that teaches itself how to do it, with iterative self-improvement
@DimitrisPapail
Dimitris Papailiopoulos
9 months
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the acc of a tiny model teaching itself how to add.
33
74
827
@DimitrisPapail
Dimitris Papailiopoulos
10 months
Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arxiv coming on Monday. Link to a talk I gave on this below 👇 Super excited about this work!
19
148
1K
@Changho_Shin_
Changho Shin
9 months
What enables a strong model to surpass its weaker teacher? 🚀 Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧵
2
24
126
@Kangwook_Lee
Kangwook Lee
2 years
🤗 My group is looking for a postdoc interested in the theoretical/algorithmic aspects of foundation models, particularly LLMs. You can see our recent papers here: https://t.co/88dTHxaG1T If you are interested in working with us, please email me your CV & research statement! 😊
kangwooklee.com
Lee Lab @ UW Madison
2
19
76
@Changho_Shin_
Changho Shin
2 years
Curious if you can robustify💪 foundation models🤖 almost for free?(!!)💸 Join us for the poster presentation on "Zero-Shot Robustification of Zero-Shot Models" with @dyhadila, @CaiLinrong @fredsala!
@fredsala
Fred Sala
2 years
Come by #ICLR2024 Session 2 on Tuesday to see our work using representation editing to make foundation models robust! No fine-tuning, no additional data, no problem. https://t.co/6GYYuOPN9T
1
3
20
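A toy illustration of the representation-editing idea mentioned in the tweet above: remove the component of an embedding that lies along a direction encoding a spurious concept, with no fine-tuning and no extra data. How the directions are actually obtained in the paper differs; the names below are illustrative.

import numpy as np

def remove_direction(embedding, spurious_dir):
    # Project the embedding onto the unit-normalized spurious direction and
    # subtract that component, leaving the rest of the representation intact.
    d = spurious_dir / np.linalg.norm(spurious_dir)
    return embedding - np.dot(embedding, d) * d

z = np.random.randn(512)      # stand-in for an image/text embedding
s = np.random.randn(512)      # stand-in for a spurious-concept direction
z_edited = remove_direction(z, s)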
@DimitrisPapail
Dimitris Papailiopoulos
2 years
"Looped Transformers are Better at Learning Learning Algorithms" is in ICLR. @Yang_Liuu offers a simple and clean message in this paper. When it comes to emulating learning algorithms, using a looped transformer (i.e., one where the iterative structure is hardcoded) helps a lot.
5
66
387
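A minimal sketch of the looped-transformer idea described above, assuming a PyTorch setup: a single weight-tied encoder block applied for a fixed number of iterations, so the iterative structure is hardcoded rather than spread across distinct layers. The hyperparameters here are arbitrary, not the paper's.

import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, n_loops=12):
        super().__init__()
        # One block whose weights are reused at every loop iteration.
        self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

x = torch.randn(2, 16, 64)           # (batch, sequence length, d_model)
print(LoopedTransformer()(x).shape)  # torch.Size([2, 16, 64])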
@DimitrisPapail
Dimitris Papailiopoulos
2 years
"Teaching Arithmetic" got in ICLR! See you in Austria, and I promise I'll quickly teach you how to add, but I can't promise you'll length generalize
@DimitrisPapail
Dimitris Papailiopoulos
2 years
1/ Our paper is out! Teaching Arithmetic to Small Transformers We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). paper: https://t.co/ECbeypLF4q Work led by:@nayoung_nylee & @KartikSreeni Thread below.
3
14
125
@llm360
LLM360
2 years
🚀 1/7 We are thrilled to launch LLM360 – pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. 🔗 https://t.co/ZcLPtYQhdQ
19
188
1K
@DimitrisPapail
Dimitris Papailiopoulos
2 years
You've heard about the Reversal Curse of autoregressive LMs, but have you heard of the Blessing of Reversal? In our "Teaching Arithmetic" work we see that teaching A+B = C is harder than A+B=reverse(C). Why? Reversal makes the function easier to learn! https://t.co/ECbeypLF4q
3
12
83
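To illustrate the reversal point above: the two serializations below differ only in the order of the answer digits. Emitting the least-significant digit first means each output digit depends only on digits the model has already seen plus a carry, which is the sense in which the reversed function is easier to learn. The exact delimiters used in the paper may differ; this is just the idea.

def plain_format(a, b):
    # A+B=C with the answer written most-significant digit first.
    return f"{a}+{b}={a + b}"

def reversed_format(a, b):
    # A+B=reverse(C): the answer digits are written least-significant first.
    return f"{a}+{b}={str(a + b)[::-1]}"

print(plain_format(357, 286))     # 357+286=643
print(reversed_format(357, 286))  # 357+286=346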
@DimitrisPapail
Dimitris Papailiopoulos
2 years
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute https://t.co/i4NUyJsKP4
@DimitrisPapail
Dimitris Papailiopoulos
2 years
1/ Our paper is out! Teaching Arithmetic to Small Transformers We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). paper: https://t.co/ECbeypLF4q Work led by:@nayoung_nylee & @KartikSreeni Thread below.
0
8
33
@KartikSreeni
Kartik Sreenivasan
2 years
My amazing collaborator @nayoung_nylee gave a great short talk on our recent work at @SimonsInstitute. Check it out!
@DimitrisPapail
Dimitris Papailiopoulos
2 years
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute https://t.co/i4NUyJsKP4
0
3
12
@Kangwook_Lee
Kangwook Lee
2 years
🧵Four amazing presentations lined up for the final day of #ICML2023! Our group will cover topics from teaching Transformers arithmetic and iterative in-context learning to understanding weight decay and speeding up GPT! Stay tuned! (1/5)
1
10
17
@davisblalock
Davis Blalock
2 years
4 points stood out to me: 1) Models only learn to add the number of digits they saw during training, not how to do addition in general. 2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one
@DimitrisPapail
Dimitris Papailiopoulos
2 years
1/ Our paper is out! Teaching Arithmetic to Small Transformers We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). paper: https://t.co/ECbeypLF4q Work led by:@nayoung_nylee & @KartikSreeni Thread below.
0
10
27
@KartikSreeni
Kartik Sreenivasan
2 years
Our paper is finally out! We wanted to use basic arithmetic as a lens to better understand the phenomenon of emergence in transformers. I thoroughly enjoyed working on this project with my amazing collaborators @nayoung_nylee, @jasondeanlee @Kangwook_Lee @DimitrisPapail
@DimitrisPapail
Dimitris Papailiopoulos
2 years
1/ Our paper is out! Teaching Arithmetic to Small Transformers We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). paper: https://t.co/ECbeypLF4q Work led by:@nayoung_nylee & @KartikSreeni Thread below.
2
6
36
@DimitrisPapail
Dimitris Papailiopoulos
2 years
1/ Our paper is out! Teaching Arithmetic to Small Transformers We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). paper: https://t.co/ECbeypLF4q Work led by:@nayoung_nylee & @KartikSreeni Thread below.
13
151
630