Weijie Su

@weijie444

Followers 6K · Following 4K · Media 55 · Statuses 865

Associate Professor @Wharton & CS Penn. Co-director, @Penn Research in #MachineLearning. PhD @Stanford. #MachineLearning #DeepLearning #Statistics #Privacy #Optimization.

Philadelphia, PA
Joined September 2011
@weijie444
Weijie Su
2 days
Back from NeurIPS with one fun observation: Two different communities--optimization and deep learning theory--were both talking about Muon (aka spectral GD). https://t.co/YEG65gGLqG Lots of emerging (and sometimes contradictory) takes. I’ve got two perspectives sketched in the
@weijie444
Weijie Su
1 month
Why and how does gradient/matrix orthogonalization work in Muon for training #LLMs? We introduce an isotropic curvature model to explain it. Take-aways: 1. Orthogonalization is a good idea, "on the right track". 2. But it might not be optimal. [1/n]
8
23
221
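For readers who want a concrete picture of what "orthogonalization" means here, a minimal NumPy sketch of the exact step that Muon approximates (the function name and toy shapes are mine, not from the thread; Muon itself replaces the SVD with a cheap Newton-Schulz iteration):

```python
import numpy as np

def orthogonalized_step(W, G, lr=0.02):
    """Muon-style update: replace the gradient matrix G by the nearest
    semi-orthogonal matrix U @ Vt (all singular values mapped to 1),
    then take a step. An exact SVD is used here purely for clarity."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)

# Toy usage on a random weight matrix and a fake gradient.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
W = orthogonalized_step(W, G)
```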
@weijie444
Weijie Su
1 day
Here are the details https://t.co/udkWWKkSTx
0
0
0
@weijie444
Weijie Su
1 day
Our ICML 2026 Policy for LLM use in Reviewing
@icmlconf
ICML Conference
1 day
Announcing the ICML 2026 policy for LLMs in reviewing! Reviewers and authors each pick either conservative or permissive LLM use, and will be matched accordingly. Importantly: authors of papers that choose the conservative option must follow the conservative policy when they serve as reviewers.
3
3
18
@PennEngAI
Penn Engineering AI
3 days
Doctoral student @HeWeiqing86254 presented his @NeurIPSConf 2025 research on using statistical tests to help detect AI-generated text. This paper was co-authored by Weiqing He, Xiang Li & Tianqi Shang, along with Profs. @lishenlc, @weijie444 & @DrQiLong. https://t.co/pVUqk8URpJ
0
3
7
@weijie444
Weijie Su
8 days
Of course, citation analysis is tricky. There are many confounders: • Early arXiv visibility 🗓️ • Author fame 🌟 • "Hot" topics 🔥 We tried our best to control for these factors. However, given that top-ranked papers consistently receive ~2x the citations of lower-ranked
0
1
1
@weijie444
Weijie Su
8 days
Main results of *How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review* are shown in Figure 2: we grouped papers by how authors privately ranked them. The analysis is based on our ICML 2023 ranking experiment.
0
1
2
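For illustration only (this is not the paper's code, and the numbers below are invented), the Figure 2 grouping amounts to something like:

```python
import pandas as pd

# Hypothetical data: one row per paper, with the author's private
# self-rank (1 = the author's best paper) and a citation count.
papers = pd.DataFrame({
    "self_rank": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "citations": [44, 38, 30, 21, 18, 15, 12, 10, 8],
})

# Mean citations per self-rank group, mirroring the Figure 2 analysis.
print(papers.groupby("self_rank")["citations"].mean())
```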
@weijie444
Weijie Su
8 days
Just landed in SD for #NeurIPS2025. With 5K accepted papers, how to find *Fantastic* AI papers? Solution: ask the authors to rank their own papers Results: Papers ranked #1 by authors received 2x more citations than those they ranked last Paper: https://t.co/oboZXfjdEC
5
8
64
@weijie444
Weijie Su
10 days
A bit of technical detail: we want to maximize the weighted sum $\sum_t \mathrm{freq}(t) \times |t|$. This leads to viewing tokenization as a **graph-partitioning problem**, where characters form a weighted graph and merges correspond to partitions that maximize this objective.
0
1
12
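A toy sketch of that objective, under my reading of the tweet (the naive substring count below ignores how a real tokenizer resolves overlapping occurrences; the corpus and names are illustrative):

```python
def vocab_score(corpus, vocab):
    """Score a candidate vocabulary by sum_t freq(t) * |t|: tokens that
    are both long and frequent contribute the most to the objective."""
    return sum(
        sum(text.count(t) for text in corpus) * len(t)
        for t in vocab
    )

corpus = ["the theory of the game", "the game theory"]
# A vocabulary of longer tokens scores higher than single characters.
print(vocab_score(corpus, {"the", "theory", "game"}))
print(vocab_score(corpus, {"t", "h", "e", "o", "r", "y", "g", "a", "m"}))
```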
@weijie444
Weijie Su
10 days
Here are the comparisons between the Length-MAX tokenizer and BPE:
2
1
9
@weijie444
Weijie Su
10 days
A new tokenizer is introduced for LLMs: https://t.co/Zuerv1jsZ4 Idea: Instead of merging tokens by frequency (BPE), optimize the tokenizer directly for maximizing average token length, yielding longer, more efficient tokens. Results: 14–18% fewer tokens, faster training &
15
68
453
@zhun_deng
Zhun Deng
11 days
Will present our recent work at NeurIPS, with wonderful students and faculty @lihua_lei_stat, on inference under data feedback loops https://t.co/f8ElEKSslE, characterizing the exact limit distribution of repeated training without non-asymptotic error compounding.
0
3
11
@weijie444
Weijie Su
11 days
A special issue on stats and AI.
@SLADS_Journal
Statistical Learning and Data Science
25 days
🚀 Call for Papers: Special Issue on Statistics and AI Journal: Statistical Learning and Data Science (SLADS) 📅 Submission Deadline: March 31, 2026 👨‍🎨 Guest Editors: Xiaowu Dai (UCLA), Weijie Su (UPenn), Linglong Kong (UAlberta), Zhihua Zhang (PKU)
0
1
8
@weijie444
Weijie Su
12 days
Heading to SD for #NeurIPS2025 from Dec 4 to 7. Happy to meet and chat.
4
4
54
@weijie444
Weijie Su
14 days
You Are the Best Reviewer of Your Own Papers
@sainingxie
Saining Xie
15 days
it may seem like an ordinary day, but it could become the strangest moment in peer review and open science. please please please treat our community with care. it's already so fragile. don't let it die.
0
2
36
@Aaroth
Aaron Roth
18 days
Honored to follow in the footsteps of so many other great researchers at Penn that I admire.
@PennEngineers
Penn Engineering
18 days
Congratulations to Aaron Roth (@Aaroth), the Henry Salvatori Professor of Computer & Cognitive Science (@cis_penn), for receiving the 2025-26 George H. Heilmeier Faculty Award for Excellence in Research. Roth has been recognized for his fundamental contributions "to formalizing,
12
5
131
@jasondeanlee
Jason Lee
24 days
So a week ago, I was complaining that gpt 5 doesn't write latex. Gemini 3 is much worse. Basically nothing renders.
129
29
879
@chenyx04
Yuxin Chen
1 month
Two super talented student collaborators @yuhuang42 and @Zixin_Wen developed new theory for length-generalizable CoT reasoning!
@yuhuang42
Yu Huang
1 month
Excited to share our recent work! We provide a mechanistic understanding of long CoT reasoning in state-tracking: when transformers length-generalize strongly, when they stall, and how recursive self-training pushes the boundary. 🧵(1/8)
0
2
33
@weijie444
Weijie Su
1 month
We're excited to announce the call for papers for #ICML 2026: https://t.co/RDT3zVZDYX See you in Seoul next summer!
@icmlconf
ICML Conference
1 month
🎉ICML 2026 Call for Papers (& Position Papers) has arrived!🎉 A few key changes this year: - Attendance for authors of accepted papers is optional - Originally submitted version of accepted papers will be made public - Cap on # of papers one can be reciprocal reviewer for ...
0
3
21
@weijie444
Weijie Su
1 month
This suggests that one really should treat the gradient as a matrix for deep learning optimization, and Muon is effective. However, the 'ultimate' optimal method should not involve exact orthogonalization. [5/n]
1
0
5
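For reference, the exact orthogonalization being discussed can be written as follows (standard notation, mine rather than the thread's; Muon approximates this map with a Newton-Schulz iteration):

```latex
% SVD of the gradient matrix: G = U \Sigma V^\top (assume full rank).
% Orthogonalization maps every singular value to 1, keeping only the
% "directions" of G:
\mathrm{msign}(G) = U V^\top = (G G^\top)^{-1/2} G,
\qquad
W_{t+1} = W_t - \eta \, \mathrm{msign}(G_t).
```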
@weijie444
Weijie Su
1 month
Theorem 2: when the curvature has a kink ('takes off' suddenly), matrix orthogonalization is OPTIMAL! This suggests Muon is optimal under an extreme-case assumption on the curvature. Conversely, a kink is necessary: if orthogonalization is optimal, then the curvature must take off.
0
0
8