Weijie Su
@weijie444
Followers: 6K · Following: 4K · Media: 55 · Statuses: 868
Associate Professor @Wharton & CS Penn. Co-Director of @Penn Research in #MachineLearning. PhD @Stanford. #MachineLearning #DeepLearning #Statistics #Privacy #Optimization.
Philadelphia, PA
Joined September 2011
Back from NeurIPS with one fun observation: Two different communities--optimization and deep learning theory--were both talking about Muon (aka spectral GD). https://t.co/YEG65gGLqG Lots of emerging (and sometimes contradictory) takes. I’ve got two perspectives sketched in the
Why and how does gradient/matrix orthogonalization work in Muon for training #LLMs? We introduce an isotropic curvature model to explain it. Take-aways: 1. Orthogonalization is a good idea, "on the right track". 2. But it might not be optimal. [1/n]
8 · 24 · 224
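For context on what the orthogonalization step does, here is a minimal NumPy sketch of a spectral-GD-style update: the gradient matrix G = U S V^T is replaced by U V^T before taking the step. This illustrates the general idea only; Muon itself also uses momentum and a Newton-Schulz approximation of the orthogonalization, and the toy loss, sizes, and learning rate below are invented for the example.

```python
import numpy as np

def orthogonalize(grad: np.ndarray) -> np.ndarray:
    """Replace G = U S V^T with U V^T, the nearest (semi-)orthogonal matrix."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

def spectral_gd_step(weight: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    """One update along the orthogonalized gradient instead of the raw gradient."""
    return weight - lr * orthogonalize(grad)

# Toy quadratic: loss = 0.5 * ||W - W_star||_F^2, so the gradient is W - W_star.
rng = np.random.default_rng(0)
w_star = rng.standard_normal((16, 8))
w = np.zeros((16, 8))
for _ in range(200):
    w = spectral_gd_step(w, w - w_star, lr=0.05)
# The fixed step shrinks every singular value of (W - W_star) by lr per iteration,
# so the final error sits at roughly the lr scale rather than exactly zero.
print("final error:", np.linalg.norm(w - w_star))
```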
Looking forward to the talk and conversations!
Don't miss our last seminar of the year: 'The Interplay of Economic Thinking and Language Models: Vignettes and Lessons', live 18th of December (5pm GMT, 9am PT, 12pm ET) led by @haifengxu0 (@UChicago). Link below.
0 · 1 · 9
📢 Call for workshop proposals for #ICML2026 in Seoul! 🇰🇷
📆 Deadline: February 13, 2026
New this year:
- At most 8 organizers per workshop
- Organizers must declare if they're an organizer on another workshop proposal
- Stricter enforcement of the proposal page limit
3 · 21 · 137
Our ICML 2026 Policy for LLM use in Reviewing
Announcing the ICML 2026 policy for LLMs in reviewing! Reviewers and authors both pick either conservative or permissive LLM use, and will be matched accordingly. Importantly: authors of papers that choose the conservative option must follow the conservative policy when they review.
3 · 3 · 18
Doctoral student @HeWeiqing86254 presented his @NeurIPSConf 2025 research on using statistical tests to help detect AI-generated text. This paper was co-authored by Weiqing He, Xiang Li & Tianqi Shang, along with Profs. @lishenlc, @weijie444 & @DrQiLong. https://t.co/pVUqk8URpJ
0 · 3 · 7
Of course, citation analysis is tricky. There are many confounders:
• Early arXiv visibility 🗓️
• Author fame 🌟
• "Hot" topics 🔥
We tried our best to control for these factors. However, given that top-ranked papers consistently receive ~2x the citations of lower-ranked
0 · 1 · 1
Main results of *How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review* are shown in Figure 2: we grouped papers by how authors privately ranked them. The analysis is based on our ICML 2023 ranking experiment.
0 · 1 · 2
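Purely as an illustration of this grouping, a tiny pandas sketch: group papers by the author's private self-rank and compare average citations across groups. The column names and numbers are invented, and the paper's actual analysis additionally controls for the confounders mentioned above.

```python
import pandas as pd

# Toy stand-in for the real data: one row per paper, with the author's private
# self-rank (1 = best) and a citation count. All values are made up.
df = pd.DataFrame({
    "self_rank": [1, 2, 3, 1, 2, 1, 3, 2, 1, 3],
    "citations": [42, 20, 11, 55, 25, 38, 9, 18, 60, 14],
})

# Group papers by self-rank and compare average citations, Figure-2 style.
summary = df.groupby("self_rank")["citations"].agg(["mean", "count"])
print(summary)
print("rank-1 vs rank-3 citation ratio:",
      summary.loc[1, "mean"] / summary.loc[3, "mean"])
```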
Just landed in SD for #NeurIPS2025. With 5K accepted papers, how to find *Fantastic* AI papers? Solution: ask the authors to rank their own papers Results: Papers ranked #1 by authors received 2x more citations than those they ranked last Paper: https://t.co/oboZXfjdEC
5 · 8 · 65
A bit of technical detail: we want to maximize the weighted sum \sum_t freq(t) × |t|, where freq(t) is the frequency of token t and |t| is its length. This leads to viewing tokenization as a **graph-partitioning problem**, where characters form a weighted graph and merges correspond to partitions that maximize this objective.
0 · 1 · 12
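A toy sketch of one way to read that objective: score candidate substrings by freq(t) × |t|, keep the top-scoring ones as the vocabulary, and check the resulting average token length under a simple greedy segmentation. The scoring rule, greedy matching, and vocabulary size here are illustrative guesses, not the paper's graph-partitioning algorithm.

```python
from collections import Counter

def candidate_scores(corpus: list[str], max_len: int = 6) -> Counter:
    """Score every substring t (up to max_len chars) by freq(t) * |t|."""
    scores = Counter()
    for doc in corpus:
        for i in range(len(doc)):
            for j in range(i + 1, min(i + max_len, len(doc)) + 1):
                scores[doc[i:j]] += j - i  # each occurrence of t contributes |t|
    return scores

def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation, falling back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(i + 6, len(text)), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

corpus = ["the theory of the game", "the game of theory", "game theory of the day"]
vocab = {t for t, _ in candidate_scores(corpus).most_common(20)}
tokens = [tok for doc in corpus for tok in greedy_tokenize(doc, vocab)]
total_chars = sum(len(doc) for doc in corpus)
print("tokens used:", len(tokens),
      "| average token length:", round(total_chars / len(tokens), 2))
```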
Here are the comparisons between the Length-MAX tokenizer and BPE:
2 · 1 · 9
A new tokenizer is introduced for LLMs: https://t.co/Zuerv1jsZ4 Idea: Instead of merging tokens by frequency (BPE), optimize the tokenizer directly for maximizing average token length, yielding longer, more efficient tokens. Results: 14–18% fewer tokens, faster training &
15 · 68 · 454
Will present our recent work at NeurIPS with wonderful students and faculty @lihua_lei_stat on inference under data feedback loops https://t.co/f8ElEKSslE, characterizing the exact limit distribution of repeated training without non-asymptotic error compounding.
0 · 3 · 11
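This is not their analysis, but a toy simulation of the kind of data feedback loop in question: each round, a simple model (a Gaussian mean estimate) is refit on a mixture of fresh data and data generated from the previous round's fit, and the sequence of estimates is tracked toward its limit. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean, n, mix = 2.0, 500, 0.5  # mix = fraction of each round's data generated by the previous fit

estimate, history = 0.0, []
for _ in range(20):
    fresh = rng.normal(true_mean, 1.0, size=int(n * (1 - mix)))
    synthetic = rng.normal(estimate, 1.0, size=int(n * mix))  # feedback from last round's model
    estimate = np.concatenate([fresh, synthetic]).mean()      # "retrain" on the mixture
    history.append(estimate)

# The estimates settle around a limit, but the feedback inflates their variability
# relative to fitting on fresh data alone.
print([round(x, 3) for x in history[-5:]])
```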
Heading to SD for #NeurIPS2025 from Dec 4 to 7. Happy to meet and chat.
4 · 4 · 54
Honored to follow in the footsteps of so many other great researchers at Penn that I admire.
Congratulations to Aaron Roth (@Aaroth), the Henry Salvatori Professor of Computer & Cognitive Science (@cis_penn), for receiving the 2025-26 George H. Heilmeier Faculty Award for Excellence in Research. Roth has been recognized for his fundamental contributions "to formalizing,
12 · 5 · 131
So a week ago, I was complaining that GPT-5 doesn't write LaTeX. Gemini 3 is much worse. Basically nothing renders.
129 · 29 · 880
Two super talented student collaborators @yuhuang42 and @Zixin_Wen developed new theory for length-generalizable CoT reasoning!
Excited to share our recent work! We provide a mechanistic understanding of long CoT reasoning in state-tracking: when transformers length-generalize strongly, when they stall, and how recursive self-training pushes the boundary. 🧵(1/8)
0 · 2 · 33
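As a concrete picture of what "state-tracking" and "length generalization" refer to here, a toy data generator (not the authors' setup): sequences of permutations of {0, 1, 2} whose running composition must be tracked, with short sequences for training and much longer ones for evaluation. The transformer training itself is omitted.

```python
import itertools, random

# State-tracking task: track the running composition of permutations of {0,1,2}
# (the group S_3). Length generalization = training on short sequences and
# evaluating on much longer ones.
PERMS = list(itertools.permutations(range(3)))  # the 6 elements of S_3

def compose(p, q):
    """Apply p after q: (p ∘ q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(3))

def make_example(length, rng):
    """A sequence of random permutations plus the composed state after each step."""
    seq, state, states = [], (0, 1, 2), []
    for _ in range(length):
        p = rng.choice(PERMS)
        seq.append(p)
        state = compose(p, state)
        states.append(state)
    return seq, states

rng = random.Random(0)
train = [make_example(8, rng) for _ in range(1000)]      # short sequences for training
eval_long = [make_example(64, rng) for _ in range(100)]  # longer ones to test generalization
print(train[0][0][:3], "->", train[0][1][:3])
```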
We're excited to announce the call for papers for #ICML 2026: https://t.co/RDT3zVZDYX See you in Seoul next summer!
🎉ICML 2026 Call for Papers (& Position Papers) has arrived!🎉 A few key changes this year:
- Attendance for authors of accepted papers is optional
- Originally submitted version of accepted papers will be made public
- Cap on # of papers one can be reciprocal reviewer for
...
0 · 3 · 21