Sanae Lotfi
@LotfiSanae
Followers 3K · Following 2K · Media 46 · Statuses 411
AI Research Scientist @MetaAI (FAIR) | PhD from @nyuniversity
Menlo Park, CA
Joined August 2020
Grateful for this great summary of our recent work!
Soft Tokens, Hard Truths
• First scalable RL method for continuous CoT
• Learns “soft” tokens (mixtures + noise) → richer reasoning paths
• Matches discrete CoTs at pass@1, beats them at pass@32 (more diversity)
• Best setup: train w/ soft tokens, infer w/ hard tokens
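For intuition, here is a minimal sketch of what a continuous ("soft") chain-of-thought step could look like: instead of sampling one hard token, feed back a probability-weighted mixture of token embeddings plus Gaussian noise. The function name, shapes, and noise scale below are illustrative assumptions, not the paper's implementation.

```python
import torch

def soft_token(logits, embedding_matrix, noise_std=0.1):
    # Illustrative soft token: a probability-weighted mixture of token
    # embeddings plus Gaussian noise, fed back as the next input embedding
    # instead of the embedding of a single sampled (hard) token.
    probs = torch.softmax(logits, dim=-1)        # (vocab_size,)
    mixture = probs @ embedding_matrix           # (hidden_dim,)
    return mixture + noise_std * torch.randn_like(mixture)

# Hard-token inference (the "best setup" above) would instead take
# logits.argmax() and look up its embedding directly.
```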
Getting small batch sizes to work in bfloat16 precision can be challenging. In our recent paper on batch size, we ran all experiments in float32, but memory-constrained settings demand lower precision. Here are two tricks that we used to enable bf16 training at small batch sizes:
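The thread's specifics aren't captured here, but one widely used workaround in this regime is to run the forward/backward pass in bf16 while keeping an fp32 master copy of the weights and applying the update in fp32, so that tiny per-step updates at small batch sizes are not rounded away. A minimal sketch, assuming plain SGD; this is a common trick, not necessarily one of the two referenced above.

```python
import torch

def sgd_step_with_fp32_master(bf16_params, fp32_masters, lr=1e-3):
    # Compute in bf16, but accumulate the update into an fp32 master copy
    # so small gradients survive rounding; then write back a bf16 copy.
    for p, m in zip(bf16_params, fp32_masters):
        if p.grad is None:
            continue
        m.add_(p.grad.float(), alpha=-lr)     # fp32 update
        p.data.copy_(m.to(torch.bfloat16))    # bf16 weights for the next step

# fp32_masters = [p.detach().clone().float() for p in model.parameters()]
```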
Huge thanks to my amazing labmates, mentors, collaborators at Amazon, Meta, and Microsoft Research, and to my friends and family. I can’t name everyone, but I’m truly grateful for all your support. Special shoutout to Ethan for being the first to cite me as Dr. Lotfi!
First, I’m very thankful to my advisor, @andrewgwils, for his mentorship, for guiding me to grow as an independent researcher, and for creating a lab that is both a home to brilliant collaborators and a community of supportive friends. I never took any of this for granted!
Excited to share two milestones: I have officially completed my PhD at NYU, and I have joined Meta AI’s Fundamental AI Research (FAIR) team in the Bay Area as a Research Scientist! I’m so grateful to many people who made this possible; more in this thread 🧵
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
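For concreteness, "vanilla SGD without momentum" is the stateless update w ← w − η∇L(w). A minimal PyTorch sketch of the two optimizer configurations being compared; the learning rates and hyperparameters below are placeholders, not the paper's settings.

```python
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for an LLM

# Plain SGD with no momentum: no per-parameter optimizer state at all.
sgd = torch.optim.SGD(model.parameters(), lr=3e-4, momentum=0.0, weight_decay=0.0)

# The usual baseline it is compared against.
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)
```

Beyond matching per-FLOP speed, the stateless update also avoids the two per-parameter moment buffers that AdamW keeps in memory.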
My new paper "Deep Learning is Not So Mysterious or Different": https://t.co/AgHdSQkals. Generalization behaviours in deep learning can be intuitively understood through a notion of soft inductive biases, and formally characterized with countable hypothesis bounds! 1/12
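For reference, a countable hypothesis bound in its standard textbook form (Hoeffding's inequality plus a union bound weighted by a prior P over a countable class; the paper's exact statement may differ): for n i.i.d. samples and any δ ∈ (0, 1),

```latex
% Standard countable-hypothesis (Occam-style) bound; a reference form,
% not necessarily the exact statement used in the paper.
\[
  \Pr\!\left[\, \forall h \in \mathcal{H}:\;
    R(h) \le \hat{R}(h) +
    \sqrt{\frac{\log\frac{1}{P(h)} + \log\frac{1}{\delta}}{2n}}
  \,\right] \ge 1 - \delta .
\]
```

Hypotheses with larger prior mass P(h) pay a smaller complexity penalty, which is one way to read "soft inductive biases": preferences over hypotheses rather than hard restrictions of the class.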
I’m excited to be a keynote speaker and panelist at the machine learning and compression workshop @NeurIPSConf (https://t.co/qr55V2Eizr). Find me in meeting room 211-214 at 1:25pm and 3:50pm to talk about compression bounds!
We need more of *Science of Deep Learning* in the major ML conferences. This year’s @NeurIPSConf workshop @scifordl on this topic is just starting, and I hope it is NOT the last edition!!!
Very excited to be co-organizing the Science of Deep Learning workshop, which will take place on Sunday. Please stop by; we have an amazing lineup of speakers and panelists. We’ll also announce the winners of the challenge on debunking commonly held beliefs in DL 🔥
CDS researchers @FlorentinGuth, @LotfiSanae, and recent grad @ZKadkhodaie, et al, are leading a new approach to studying deep learning at #NeurIPS2024. Their workshop (@scifordl) promotes a science of controlled experiments to understand deep nets. https://t.co/cMCbHoim0J
Nice crowd at our #NeurIPS2024 poster today with @LotfiSanae presenting on token-level generalization bounds for LLMs with billions of parameters! https://t.co/rU2VZG0TLC
Over the past year I have been working on using multiple specialized models in a collective fashion to solve novel tasks. We investigated Mixture of Experts (MoE)-style routing for merging. However, we find that feature-based merging is likely not a scalable paradigm. Read on!
I was fortunate to collaborate with this incredible team during my internship at MSR. Not only do they work on important and timely research questions, but they are also some of the most supportive and uplifting people you’ll collaborate with. Highly recommend this position!!
The ML team at @MSFTResearch Montréal 🍁 is hiring a Senior Researcher with a background in ML / NLP!!! Come work with us at the intersection of interactivity, modularity and reasoning in foundation models 😊 MSR is a highly collaborative environment where risky ideas are
🎭Recent work shows that models’ inductive biases for 'simpler' features may lead to shortcut learning. What do 'simple' vs 'complex' features look like? What roles do they play in generalization? Our new paper explores these questions. https://t.co/aW2PrlYQF4
#Neurips2024
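As a toy illustration of the 'simple' vs 'complex' distinction (my own example, not the paper's setup): below, both features perfectly predict the label, but one is linearly separable and one is XOR-like. A small MLP trained with SGD typically leans on the simple feature, so randomizing it at test time tends to hurt far more than randomizing the complex one.

```python
import torch

torch.manual_seed(0)
n = 2000
y = torch.randint(0, 2, (n,))
simple = (2.0 * y - 1.0) + 0.1 * torch.randn(n)               # 'simple': sign encodes the label
bits = torch.randint(0, 2, (n,))
complex_pair = torch.stack([bits, bits ^ y], dim=1).float()   # 'complex': XOR of the two columns = label
X = torch.cat([simple.unsqueeze(1), complex_pair], dim=1)

model = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
for _ in range(500):
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model(X), y).backward()
    opt.step()

def accuracy(inputs):
    return (model(inputs).argmax(dim=1) == y).float().mean().item()

no_simple = X.clone(); no_simple[:, 0] = torch.randn(n)                          # destroy the simple feature
no_complex = X.clone(); no_complex[:, 1:] = torch.randint(0, 2, (n, 2)).float()  # destroy the complex feature
# Typically, accuracy collapses only when the simple feature is removed.
print(accuracy(X), accuracy(no_simple), accuracy(no_complex))
```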
📢I’ll be admitting multiple PhD students this winter to Columbia University 🏙️ in the most exciting city in the world! If you are interested in dissecting modern deep learning systems to probe how they work, advancing AI safety, or automating data science, apply to my group.
My experience with other researchers in the ML community has been more uplifting than not! Unexpected words of encouragement and acts of kindness go a long way! To all (senior) researchers who are inclusive, helpful and welcoming: you're amazing and make a huge difference!
Excited to share that we’re organizing a #neurips2024 workshop on scientific methods for understanding deep learning with outstanding speakers & panelists 🥳 Submit your best papers demonstrating why and when deep learning works by **Sep 10** & stay tuned for more details ;)
📢Excited to announce the Workshop on Scientific Methods for Understanding Deep Learning #NeurIPS2024 🥳 ➡️Submission Deadline: Sep 10 ‘24 ➡️Speaker lineup: https://t.co/MmrlYngPTY ➡️Call for paper: https://t.co/GMHdMfJpzg ➡️Our ✨Debunking ✨ challenge: https://t.co/VAzhYWCjc0
Much more in the paper! We are really excited about this work: https://t.co/iSM87VR5CR with amazing co-authors: @KuangYilun, @brandondamos, @micahgoldblum, @m_finzi, and @andrewgwils 8/8
We find that as models are quantized more aggressively, their ability to recall memorized facts from their pretraining data deteriorates faster than their ability to recognize structured patterns, echoing the findings of @tjingrant et al. about the effect of down-scaling LLMs. 7/8
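For intuition about what "quantized more aggressively" does to the weights, here is a minimal sketch of symmetric uniform quantization (illustrative; not necessarily the scheme used in the paper): fewer bits means a coarser grid and a larger perturbation of the weights.

```python
import torch

def quantize(w, bits):
    # Symmetric uniform quantization to a (2**bits - 1)-level grid.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

w = torch.randn(4096)
for bits in (8, 4, 2):
    rel_err = ((quantize(w, bits) - w).norm() / w.norm()).item()
    print(f"{bits}-bit: relative weight error ≈ {rel_err:.3f}")
```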