Nayoung Lee
@nayoung_nylee
Followers 261 · Following 99 · Media 0 · Statuses 26
ECE Ph.D. Candidate @UWMadison. Soon to be on the job market. https://t.co/21MsvK4XhF
Joined February 2022
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
24 · 80 · 524
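A minimal sketch of how a long-addition test like the one described above can be generated and scored. `query_model` is a hypothetical placeholder for whatever model call is used, and the prompt wording is illustrative only.

```python
import random

def make_addition_prompt(n_digits: int) -> tuple[str, int]:
    """Sample two n-digit operands and return (prompt, true sum)."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"Compute {a} + {b}. Answer with the number only.", a + b

def addition_accuracy(query_model, n_digits: int = 200, n_trials: int = 20) -> float:
    """Fraction of exact matches over n_trials random problems."""
    correct = 0
    for _ in range(n_trials):
        prompt, truth = make_addition_prompt(n_digits)
        reply = query_model(prompt)  # hypothetical: returns the model's text answer
        answer = "".join(ch for ch in reply if ch.isdigit())
        correct += int(answer == str(truth))
    return correct / n_trials
```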
We have another related result with Zheyang @zheyangxiong and Vasilis @vpapageorgiou_ on transfer. We finetune models on artificial retrieval tasks, and they become better on the "needle in haystack" test. The amount by which transfer happens seems to be a function of model
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained
2 · 8 · 58
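A sketch, under my own assumptions, of what an "artificial retrieval" finetuning example could look like: synthetic key-value pairs followed by a query, which is structurally close to the needle-in-a-haystack test. The actual task construction in the paper may differ.

```python
import random
import string

def rand_token(k: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=k))

def make_retrieval_example(n_pairs: int = 50) -> dict:
    """One synthetic example: many key-value facts, one of them queried."""
    pairs = {rand_token(): rand_token() for _ in range(n_pairs)}
    key = random.choice(list(pairs))
    context = " ".join(f"{k} is {v}." for k, v in pairs.items())
    return {"prompt": f"{context}\nQuestion: what is {key}?",
            "target": pairs[key]}
```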
The ability to generalize out of the training distribution increases with the amount of pretraining you do. A cool finding that Jack @jackcai1206 and @nayoung_nylee stumbled upon in their recent length generalization transfer paper is the following: As we specifically
5 · 9 · 109
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
55 · 118 · 1K
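A rough sketch of the iterative self-improvement loop described in the tweet above, using hypothetical helpers (`train`, `generate_problems`, `model_label`, `passes_filter`); the concrete schedule and filtering rule here are assumptions, not the paper's recipe.

```python
def self_improve(model, train, generate_problems, model_label, passes_filter,
                 dataset, start_len=5, max_len=20):
    """Grow the training set with the model's own labels on ever-harder problems."""
    data = list(dataset)                                   # supervised seed data (short problems)
    for n_digits in range(start_len + 1, max_len + 1):
        problems = generate_problems(n_digits)             # harder than anything seen so far
        labeled = [(p, model_label(model, p)) for p in problems]
        kept = [(p, y) for p, y in labeled if passes_filter(p, y)]  # e.g. a self-consistency check
        data.extend(kept)                                  # keep the filtered self-labels
        model = train(model, data)                         # retrain on the grown set
    return model
```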
one of the few times in my career I feel good about calling a problem "solved", for an approximately satisfying value of "solved": length generalization challenges can be overcome with iterative self-improvement
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
6 · 5 · 106
o3 can't multiply 10-digit numbers, but here is the accuracy of a 14M transformer that teaches itself how to do it, with iterative self-improvement.
o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add.
33 · 74 · 827
Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arXiv coming on Monday. Link to a talk I gave on this below. Super excited about this work!
19 · 148 · 1K
What enables a strong model to surpass its weaker teacher? Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧵
2 · 24 · 126
My group is looking for a postdoc interested in the theoretical/algorithmic aspects of foundation models, particularly LLMs. You can see our recent papers here: https://t.co/88dTHxaG1T If you are interested in working with us, please email me your CV & research statement!
kangwooklee.com
Lee Lab @ UW Madison
2 · 19 · 76
Curious if you can robustify foundation models almost for free? (!!) Join us for the poster presentation on "Zero-Shot Robustification of Zero-Shot Models" with @dyhadila, @CaiLinrong @fredsala!
Come by #ICLR2024 Session 2 on Tuesday to see our work using representation editing to make foundation models robust! No fine-tuning, no additional data, no problem. https://t.co/6GYYuOPN9T
1 · 3 · 20
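Not necessarily the paper's exact procedure, but a minimal sketch of the general representation-editing idea behind such robustification: project an estimated spurious direction out of frozen zero-shot embeddings, with no fine-tuning and no extra labeled data. The usage example and `text_emb` helper are hypothetical.

```python
import numpy as np

def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component along `direction` from each row of `embeddings`."""
    d = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ d, d)

# Hypothetical usage: estimate a spurious "background" axis from text prompts
# and edit the image embeddings before zero-shot classification.
# spurious = text_emb("a photo of water") - text_emb("a photo of land")
# edited_image_embs = remove_direction(image_embs, spurious)
```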
"Looped Transformers are Better at Learning Learning Algorithms" in ICLR @Yang_Liuu offers a simple and clean message in this paper. When it comes to emulating learning algorithms, using a looped transformer (i.e., one where the iterative structure is hardcoded) helps a lot.
5 · 66 · 387
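For readers unfamiliar with the setup, here is a minimal sketch of the looped idea: one weight-tied block applied repeatedly, so the iterative structure is hardcoded. The hyperparameters and the input-injection variant shown are my own illustrative choices.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """A single shared transformer block applied n_loops times."""
    def __init__(self, d_model: int = 256, n_heads: int = 8, n_loops: int = 12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros_like(x)
        for _ in range(self.n_loops):
            h = self.block(x + h)   # re-inject the input at every loop iteration
        return h
```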
"Teaching Arithmetic" got in ICLR! See you in Austria, and I promise I'll quickly teach you how to add, but I can't promise you'll length generalize
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
3 · 14 · 125
1/7 We are thrilled to launch LLM360: pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. https://t.co/ZcLPtYQhdQ
19 · 188 · 1K
You've heard about the Reversal Curse of autoregressive LMs, but have you heard of the Blessing of Reversal? In our "Teaching Arithmetic" work we see that teaching A+B=C is harder than A+B=reverse(C). Why? Reversal makes the function easier to learn! https://t.co/ECbeypLF4q
3 · 12 · 83
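To make the A+B=reverse(C) trick concrete, here is a tiny formatting sketch; the separators and the lack of padding are my own choices, not necessarily the paper's exact data format.

```python
def format_plain(a: int, b: int) -> str:
    """Standard format: answer written most-significant digit first."""
    return f"{a}+{b}={a + b}"

def format_reversed(a: int, b: int) -> str:
    """Reversed format: answer written least-significant digit first."""
    return f"{a}+{b}={str(a + b)[::-1]}"

print(format_plain(457, 168))     # 457+168=625
print(format_reversed(457, 168))  # 457+168=526
```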
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute
https://t.co/i4NUyJsKP4
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
0 · 8 · 33
My amazing collaborator @nayoung_nylee gave a great short talk on our recent work at @SimonsInstitute. Check it out!
Nayoung (@nayoung_nylee) gave a great short talk on our work at the recent LLM workshop at @SimonsInstitute
https://t.co/i4NUyJsKP4
0 · 3 · 12
🧵 Four amazing presentations lined up for the final day of #ICML2023! Our group will cover topics from teaching Transformers arithmetic and iterative in-context learning to understanding weight decay and speeding up GPT! Stay tuned! (1/5)
1 · 10 · 17
4 points stood out to me: 1) Models only learn to add the number of digits they saw during training, not how to do addition in general. 2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
0 · 10 · 27
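Point 2 above is easy to see with a worked sketch: when the answer is emitted least-significant digit first, each output digit depends only on the current operand digits plus a carry from digits already produced, whereas a left-to-right answer requires resolving every carry before the first digit can be written.

```python
def add_lsb_first(a: int, b: int) -> str:
    """Add two integers digit by digit, emitting the least-significant digit first."""
    da, db = str(a)[::-1], str(b)[::-1]
    out, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = carry
        s += int(da[i]) if i < len(da) else 0
        s += int(db[i]) if i < len(db) else 0
        out.append(str(s % 10))   # each digit needs only local information
        carry = s // 10
    if carry:
        out.append(str(carry))
    return "".join(out)           # the reversed (LSB-first) answer string

assert add_lsb_first(457, 168) == "526"   # 625 written in reverse
```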
Our paper is finally out! We wanted to use basic arithmetic as a lens to better understand the phenomenon of emergence in transformers. I thoroughly enjoyed working on this project with my amazing collaborators @nayoung_nylee, @jasondeanlee, @Kangwook_Lee, and @DimitrisPapail.
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
2 · 6 · 36
1/ Our paper is out! "Teaching Arithmetic to Small Transformers": We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by: @nayoung_nylee & @KartikSreeni. Thread below.
13 · 151 · 630