Weijie Su

@weijie444

Followers: 6K · Following: 4K · Media: 55 · Statuses: 868

Associate Professor @Wharton & CS Penn. Co-director of @Penn Research in #MachineLearning. PhD @Stanford. #DeepLearning #Statistics #Privacy #Optimization.

Philadelphia, PA
Joined September 2011
@weijie444
Weijie Su
4 days
Back from NeurIPS with one fun observation: two different communities, optimization and deep learning theory, were both talking about Muon (aka spectral GD). https://t.co/YEG65gGLqG Lots of emerging (and sometimes contradictory) takes. I’ve got two perspectives sketched in the …
@weijie444
Weijie Su
1 month
Why and how does gradient/matrix orthogonalization work in Muon for training #LLMs? We introduce an isotropic curvature model to explain it. Takeaways:
1. Orthogonalization is a good idea, "on the right track".
2. But it might not be optimal. [1/n]
8
24
224
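The "spectral GD" view of Muon mentioned in the thread can be sketched in a few lines: the gradient matrix is replaced by its nearest orthogonal factor, i.e. all singular values are set to 1, before the weight update. This is only an illustrative sketch using an exact SVD; Muon itself approximates this step with a Newton-Schulz iteration, and the `orthogonalize` helper and learning rate below are assumptions for illustration, not the thread's isotropic curvature model.

```python
import numpy as np

def orthogonalize(G: np.ndarray) -> np.ndarray:
    """Replace G = U S V^T with U V^T, setting all singular values to 1
    (the 'spectral' step in Muon-style updates)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

# One Muon-style update on a weight matrix W (illustrative learning rate).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
G = rng.standard_normal((4, 3))   # stand-in for a gradient
W_new = W - 0.1 * orthogonalize(G)

# The orthogonalized direction has all singular values equal to 1:
print(np.round(np.linalg.svd(orthogonalize(G), compute_uv=False), 6))
```

Because every singular value of the update direction is 1, the step size is equalized across all spectral directions of the gradient, which is one intuition behind why orthogonalization is "on the right track".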
@haifengxu0
Haifeng Xu
1 day
Looking forward to the talk and conversations!
@coop_ai
Cooperative AI Foundation
3 days
Don't miss our last seminar of the year: 'The Interplay of Economic Thinking and Language Models: Vignettes and Lessons', live 18th of December (5pm GMT, 9am PT, 12pm ET) led by @haifengxu0 (@UChicago). Link below.
0
1
9
@icmlconf
ICML Conference
2 days
📢 Call for workshop proposals for #ICML2026 in Seoul! 🇰🇷
📆 Deadline: February 13, 2026
New this year:
- At most 8 organizers per workshop
- Organizers must declare if they're an organizer on another workshop proposal
- Stricter enforcement of the proposal page limit
3
21
137
@weijie444
Weijie Su
3 days
Here are the details https://t.co/udkWWKkSTx
0
0
0
@weijie444
Weijie Su
3 days
Our ICML 2026 Policy for LLM use in Reviewing
@icmlconf
ICML Conference
3 days
Announcing the ICML 2026 policy for LLMs in reviewing! Reviewers and authors both pick either conservative or permissive LLM use, and will be matched accordingly. Importantly: authors on papers who choose conservative must obey the conservative policy as reviewers.
3
3
18
@PennEngAI
Penn Engineering AI
5 days
Doctoral student @HeWeiqing86254 presented his @NeurIPSConf 2025 research on using statistical tests to help detect AI-generated text. This paper was co-authored by Weiqing He, Xiang Li & Tianqi Shang, along with Profs. @lishenlc, @weijie444 & @DrQiLong. https://t.co/pVUqk8URpJ
0
3
7
@weijie444
Weijie Su
10 days
Of course, citation analysis is tricky. There are many confounders:
• Early arXiv visibility 🗓️
• Author fame 🌟
• "Hot" topics 🔥
We tried our best to control for these factors. However, given that top-ranked papers consistently receive ~2x the citations of lower-ranked …
0
1
1
@weijie444
Weijie Su
10 days
Main results of *How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review* are shown in Figure 2: We grouped papers by how authors privately ranked them. Analysis based on our ICML 2023 ranking experiment
0
1
2
@weijie444
Weijie Su
10 days
Just landed in SD for #NeurIPS2025. With 5K accepted papers, how to find *Fantastic* AI papers?
Solution: ask the authors to rank their own papers.
Results: papers ranked #1 by authors received 2x more citations than those they ranked last.
Paper: https://t.co/oboZXfjdEC
5
8
65
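The grouping behind these numbers can be illustrated with a toy version of the analysis: bucket papers by the author's private self-rank and compare citation medians across buckets. The records below are synthetic, invented only to show the shape of the computation, not the paper's actual data.

```python
from statistics import median
from collections import defaultdict

# Hypothetical records: (author_self_rank, citation_count). Synthetic numbers
# chosen only to illustrate the "rank #1 gets ~2x the citations of last" pattern.
papers = [(1, 40), (1, 52), (1, 35), (2, 30), (2, 22), (3, 18), (3, 21), (3, 16)]

by_rank = defaultdict(list)
for rank, cites in papers:
    by_rank[rank].append(cites)

for rank in sorted(by_rank):
    print(rank, median(by_rank[rank]))  # median citations per self-rank group
```

The real analysis additionally controls for confounders such as arXiv visibility, author fame, and topic, as the thread notes.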
@weijie444
Weijie Su
12 days
A bit of technical detail: we want to maximize the weighted sum \(\sum_t \mathrm{freq}(t)\cdot|t|\). This leads to viewing tokenization as a **graph-partitioning problem**, where characters form a weighted graph and merges correspond to partitions that maximize this objective.
0
1
12
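One way to see the effect of favoring long tokens: for a fixed vocabulary, a greedy longest-match segmenter produces far fewer tokens (hence a higher average token length) once long, frequent strings enter the vocabulary. The greedy matcher below is an illustrative stand-in, not the paper's method (the thread derives merges from a graph partition), and the example vocabularies are invented.

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation with a fixed vocabulary.
    Falls back to single characters when no vocabulary entry matches."""
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        for L in range(min(len(text) - i, max_len), 0, -1):
            if text[i:i+L] in vocab:
                tokens.append(text[i:i+L])
                i += L
                break
        else:
            tokens.append(text[i])  # single-character fallback
            i += 1
    return tokens

text = "the theater thesis"
short_vocab = set("abcdefghijklmnopqrstuvwxyz ")            # characters only
long_vocab = short_vocab | {"the", "theater", "thesis"}     # plus long tokens
print(len(greedy_tokenize(text, short_vocab)))  # 18 tokens (one per character)
print(len(greedy_tokenize(text, long_vocab)))   # 5 tokens
```

Fewer tokens over the same text means a higher average token length, which is the quantity the Length-MAX objective rewards.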
@weijie444
Weijie Su
12 days
Here are the comparisons between the Length-MAX tokenizer and BPE:
2
1
9
@weijie444
Weijie Su
12 days
A new tokenizer is introduced for LLMs: https://t.co/Zuerv1jsZ4
Idea: instead of merging tokens by frequency (BPE), optimize the tokenizer directly to maximize average token length, yielding longer, more efficient tokens.
Results: 14–18% fewer tokens, faster training & …
15
68
454
@zhun_deng
Zhun Deng
13 days
Will present our recent work at NeurIPS with wonderful students and faculty @lihua_lei_stat on inference under data feedback loops https://t.co/f8ElEKSslE, characterizing the exact limit distribution of repeated training without non-asymptotic error compounding.
0
3
11
@weijie444
Weijie Su
13 days
A special issue on stats and AI.
@SLADS_Journal
Statistical Learning and Data Science
27 days
🚀 Call for Papers: Special Issue on Statistics and AI
Journal: Statistical Learning and Data Science (SLADS)
📅 Submission Deadline: March 31, 2026
👨‍🎨 Guest Editors: Xiaowu Dai (UCLA), Weijie Su (UPenn), Linglong Kong (UAlberta), Zhihua Zhang (PKU)
0
1
8
@weijie444
Weijie Su
14 days
Heading to SD for #NeurIPS2025 from Dec 4 to 7. Happy to meet and chat.
4
4
54
@weijie444
Weijie Su
16 days
You Are the Best Reviewer of Your Own Papers
@sainingxie
Saining Xie
17 days
it may seem like an ordinary day, but it could become the strangest moment in peer review and open science. please please please treat our community with care. it’s already so fragile. don’t let it die.
0
3
37
@Aaroth
Aaron Roth
20 days
Honored to follow in the footsteps of so many other great researchers at Penn that I admire.
@PennEngineers
Penn Engineering
20 days
Congratulations to Aaron Roth (@Aaroth), the Henry Salvatori Professor of Computer & Cognitive Science (@cis_penn), for receiving the 2025-26 George H. Heilmeier Faculty Award for Excellence in Research. Roth has been recognized for his fundamental contributions "to formalizing, …
12
5
131
@jasondeanlee
Jason Lee
26 days
So a week ago, I was complaining that GPT-5 doesn't write LaTeX. Gemini 3 is much worse: basically nothing renders.
129
29
880
@chenyx04
Yuxin Chen
1 month
Two super talented student collaborators @yuhuang42 and @Zixin_Wen developed new theory for length-generalizable CoT reasoning!
@yuhuang42
Yu Huang
1 month
Excited to share our recent work! We provide a mechanistic understanding of long CoT reasoning in state-tracking: when transformers length-generalize strongly, when they stall, and how recursive self-training pushes the boundary. 🧵(1/8)
0
2
33
@weijie444
Weijie Su
1 month
We're excited to announce the call for papers for #ICML 2026: https://t.co/RDT3zVZDYX See you in Seoul next summer!
@icmlconf
ICML Conference
1 month
🎉 ICML 2026 Call for Papers (& Position Papers) has arrived! 🎉
A few key changes this year:
- Attendance for authors of accepted papers is optional
- The originally submitted version of accepted papers will be made public
- Cap on the number of papers one can be a reciprocal reviewer for
...
0
3
21