
Sungmin Cha
@_sungmin_cha
Followers: 344 · Following: 1K · Media: 20 · Statuses: 253
Faculty Fellow @nyuniversity | PhD @SeoulNatlUni
Manhattan, NY
Joined July 2019
RT @micahgoldblum: 🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is v….
RT @ErnestRyu: Alongside recent articles reporting that 56 Seoul National University professors have left for overseas universities, public opinion is forming that SNU's brain drain is cause for concern. But I find it a bit unfortunate that the articles focus only on SNU professors' insufficient salaries. I….
RT @QuanquanGu: This explains why LLaMA 4 failed. The tokens per parameter (TPP) is way off. You can’t defy scaling laws and expect miracle….
RT @andrewgwils: Conventional wisdom is that SGD doesn't work nearly as well as AdamW for big transformers. We show it's not the case if yo….
RT @NYUDataScience: CDS Prof. @kchonyc's 2014 "attention" paper was recently the Runner-Up for the ICLR 2025 Test of Time Award. The paper….
RT @_sungmin_cha: @abeirami Hi all! On the topic of why knowledge distillation works well in generative models, Kyunghyun Cho (@kchonyc) an….
RT @_sungmin_cha: Curious about why Knowledge Distillation works so well in generative models? In our latest paper, we offer a minimal work….
RT @kchonyc: i was asked by a few (inc. @YuanqingWang ) what i meant by this earlier tweet, and since i'm pretty busy, i decided to write a….
RT @kuchaev: Post-training of LLMs is increasingly important and RLHF remains a necessary step for an overall great model. Today we are rel….
RT @michahu8: 📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and p….
RT @cwolferesearch: Reward models have transformed LLM research by incorporating human preferences into the training process. Here’s how th….
RT @omarsar0: Small Language Models are the Future of Agentic AI. Lots to gain from building agentic systems with small language models. C….
RT @sanghyunwoo1219: Introducing BlenderFusion: Reassemble your visual elements—objects, camera, and background—to compose a new visual nar….
RT @innostudy: Hidden "give a good review" AI-prompt phrases found in papers… "KAIST also has 3 cases." "'AI, give a positive evaluation'… hidden commands found in papers from 14 universities in Korea, the US, Japan, and elsewhere."
RT @andrewgwils: You don't _need_ a PhD (or any qualification) to do almost anything. A PhD is a rare opportunity to grow as an independent….
RT @s_scardapane: *NoProp: Training Neural Networks without Backpropagation or Forward-propagation* by @yeewhye et al. They use a neural n….