Derek Lim
@dereklim_lzh
Followers: 4K · Following: 3K · Media: 61 · Statuses: 291
Post-Training @OpenAI
San Francisco
Joined June 2020
Our new paper covers: Neural nets on eigenvectors / eigenspaces, transformers on graphs, universal approximation of invariant functions, graph positional encodings, generalizing spectral graph neural networks, and more!! Thread about SignNet and BasisNet: 1/9
6 replies · 124 reposts · 703 likes
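A minimal sketch of the sign-invariance idea behind SignNet (my illustration under simplifying assumptions, not the paper's released code; in the paper the inner network is a GNN/DeepSet rather than the scalar MLP used here): Laplacian eigenvectors are only defined up to sign, so processing phi(v) + phi(-v) yields features that cannot change when v is replaced by -v.

```python
# Sketch of a sign-invariant eigenvector encoder (hypothetical module, not the authors' code).
import torch
import torch.nn as nn

class SignInvariantEncoder(nn.Module):
    def __init__(self, num_eigvecs: int, hidden: int = 16, out_dim: int = 8):
        super().__init__()
        # phi is applied to each eigenvector entry independently (scalar -> hidden)
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # rho mixes the per-eigenvector features into one positional encoding per node
        self.rho = nn.Sequential(nn.Linear(num_eigvecs * hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, eigvecs: torch.Tensor) -> torch.Tensor:
        # eigvecs: (num_nodes, num_eigvecs) Laplacian eigenvectors
        v = eigvecs.unsqueeze(-1)                 # (nodes, k, 1)
        h = self.phi(v) + self.phi(-v)            # sign-invariant per eigenvector entry
        return self.rho(h.flatten(start_dim=1))   # (nodes, out_dim) positional encodings

# Sanity check: flipping the sign of any eigenvector leaves the output unchanged.
enc = SignInvariantEncoder(num_eigvecs=4)
V = torch.randn(10, 4)
flip = torch.tensor([1.0, -1.0, 1.0, -1.0])
assert torch.allclose(enc(V), enc(V * flip), atol=1e-6)
```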
another tale of RL IRL: #2 = off-policy pretrain / SFT maxxer, #3 = on-policy RL maxxer
In freshman calc, the 3 highest grades going into the final were exempt from it. On the last day of class the prof announced the names to the class. I was obviously #1, 2nd was a girl who never raised her hand, 3rd was a guy who constantly raised his hand and got many answers wrong. I think about him a lot.
1 reply · 0 reposts · 4 likes
The problems were pretty accessible but hard to make progress on, and I learned a lot from Prof Johnson + his books (also check out “Topics in Matrix Analysis”). Since it turns out that matrices are quite useful for AI stuff, this paid off for future me as well :)
0 replies · 0 reposts · 2 likes
I was lucky enough to do matrix analysis research with Prof Johnson for one summer in undergrad; he even signed my copy of the book! There are a surprising number of open problems in matrix analysis to this day; the problem we worked on is still open.
So apparently this 600-page book has all the niche shit about matrices that linalg classes don't teach you
1 reply · 0 reposts · 5 likes
I joined the post-training team at OpenAI and moved to SF a month ago! excited to be working on improving frontier models
37 replies · 12 reposts · 538 likes
When and why are neural network solutions connected by low-loss paths? In our #ICML2025 paper, we show that mode connectivity often arises from symmetries—transformations of parameters that leave the network’s output unchanged. Paper: https://t.co/bZSO5foYfv (1/6)
11 replies · 30 reposts · 200 likes
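A minimal sketch of the kind of parameter symmetry the paper refers to (my illustration, not the paper's code): permuting the hidden units of an MLP, together with the matching rows and columns of the adjacent weight matrices, leaves the network's outputs unchanged, so many distinct points in weight space implement the same function.

```python
# Sketch: a permutation symmetry of a two-layer MLP leaves the function unchanged.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 7), nn.ReLU(), nn.Linear(7, 3))
x = torch.randn(4, 5)
y = net(x)

perm = torch.randperm(7)
permuted = nn.Sequential(nn.Linear(5, 7), nn.ReLU(), nn.Linear(7, 3))
with torch.no_grad():
    permuted[0].weight.copy_(net[0].weight[perm])     # permute hidden-unit rows
    permuted[0].bias.copy_(net[0].bias[perm])
    permuted[2].weight.copy_(net[2].weight[:, perm])  # permute matching columns
    permuted[2].bias.copy_(net[2].bias)

assert torch.allclose(y, permuted(x), atol=1e-6)  # same function, different weights
```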
Check out our new paper on learning from LLM output signatures: the |tokens| × (|vocab| + 1) matrix holding the predicted next-token probability distribution plus the probability of the actual next token at each position. It provably generalizes several existing approaches and is great at hallucination / data-contamination detection tasks!
📢 Introducing: Learning on LLM Output Signatures for Gray-box LLM Behavior Analysis [ https://t.co/ixdfiyqNqO] A joint work with @ffabffrasca (co-first author) and our amazing collaborators: @dereklim_lzh @yoav_gelberg @YftahZ @el_yaniv @GalChechik @HaggaiMaron 🧵Thread
0 replies · 4 reposts · 19 likes
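A minimal sketch of how such an output-signature matrix could be assembled, based only on the description in the tweet (the function name, shapes, and stand-in logits here are my assumptions, not the paper's code):

```python
# Build a T x (V + 1) "output signature": predicted next-token distribution per
# position, plus the probability assigned to the token that actually came next.
import torch

def output_signature(logits: torch.Tensor, next_tokens: torch.Tensor) -> torch.Tensor:
    """logits: (T, V) next-token logits; next_tokens: (T,) ids of the actual next tokens."""
    probs = logits.softmax(dim=-1)                      # (T, V) predicted distribution
    actual = probs.gather(1, next_tokens.unsqueeze(1))  # (T, 1) prob of the realized token
    return torch.cat([probs, actual], dim=1)            # (T, V + 1)

# Stand-in logits; in practice these would come from a causal LM's forward pass.
T, V = 6, 100
sig = output_signature(torch.randn(T, V), torch.randint(0, V, (T,)))
print(sig.shape)  # torch.Size([6, 101])
```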
What if models could be the data🤔Find out at @iclr_conf #ICLR2025 Join the 1st workshop on Model Weights as a New Data Modality. We're training networks on model weights for a wide variety of tasks. Featuring an amazing lineup of papers & speakers🚀 🗓️Sunday 9-17 📍Topaz 220-225
3 replies · 15 reposts · 88 likes
Boston Symmetry Day is happening TODAY at Northeastern University’s Columbus Place and Alumni Center (716 Columbus Ave, 6th floor)! Breakfast starts at 9 AM, but talks are happening throughout the day, followed by a social. We’ll see you there!
0 replies · 3 reposts · 12 likes
Speakers are confirmed and registration is open for the third Boston Symmetry Day! Come increase the order of the group!
Registration is now open for Boston Symmetry Day on March 31! Sign up by March 21st at https://t.co/kjpiejs0gB We have an exciting lineup of speakers (see our website: https://t.co/2V7RQwloCW ) Also featuring a poster session so you have a chance to present your awesome work!
0 replies · 0 reposts · 14 likes
Save the date -- Boston Symmetry Day 2025 will be held on March 31st, at Northeastern University! Speakers and sponsors to be announced in the coming weeks, but you can expect another great lineup of talks, networking, and posters. We'll see you there!
0 replies · 5 reposts · 16 likes
did you know you've been doing test-time learning this whole time? transformers, SSMs, and RNNs are all test-time regressors, just with different design choices. we present a unifying framework that derives sequence layers (and higher-order attention 👀) from a *single* equation 🧵
6 replies · 97 reposts · 515 likes
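A minimal sketch of the "sequence layers are test-time regressors" view (an illustration of one standard instance, not the paper's general framework): softmax attention over stored (key, value) pairs coincides with a Nadaraya-Watson kernel regression fit at query time, with keys as inputs, values as regression targets, and an exponential kernel.

```python
# Sketch: softmax attention == Nadaraya-Watson kernel regression at test time.
import torch

def softmax_attention(q, K, V):
    # q: (d,), K: (n, d), V: (n, dv)
    w = torch.softmax(K @ q, dim=0)
    return w @ V

def nadaraya_watson(q, K, V):
    # Kernel regression: weight each stored target by a normalized kernel score.
    k = torch.exp(K @ q)          # exponential kernel between query and stored inputs
    return (k @ V) / k.sum()

q, K, V = torch.randn(8), torch.randn(5, 8), torch.randn(5, 3)
assert torch.allclose(softmax_attention(q, K, V), nadaraya_watson(q, K, V), atol=1e-5)
```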
Excited to share our new 7B LLM from @LiquidAI_. Strong evals on diverse tasks (including several evals from the synthetic arena that I lead), long-context strength at low memory cost, and edge-device / on-prem deployment options for customers. Great work from the team :).
Introducing LFM-7B, our new best-in-class language model in English, Arabic, and Japanese optimized to be the substrate for private enterprise chat, code, fast instruction following, and agentic workflows. 1/
3 replies · 8 reposts · 52 likes
Tune in to GLOW next week for my talk on metanetworks!
🌟 GLOW 2025 kicks off with a super session in January! 🎙️ Hear from our amazing speakers Clayton Sanford and @dereklim_lzh. 🗓️ Jan 15th, 17 CET on Zoom. 🌐 Details & sign-up: https://t.co/kaKuULliya
0 replies · 4 reposts · 29 likes
Workshop organizers / steerers that did lots of heavy lifting: @k_schuerholt @gbouritsas @EliahuHorwitz @yoav_gelberg @BoZhao__ @AllanZhou17 @damianborth @StefanieJegelka @mmbronstein @GalChechik Stella Yu @HaggaiMaron @YHoshen
0 replies · 1 repost · 10 likes
My collaborators and I have worked on several papers in this direction in recent years, very excited by it: Graph Metanetworks: https://t.co/k3Z30J5Zrq Empirical Impact of Parameter Symmetries: https://t.co/Duv7tjznof Learning on LoRAs:
1 reply · 1 repost · 12 likes
Our new workshop at ICLR 2025: Weight Space Learning: https://t.co/1CiTQXl3G1 Weights are data. We can learn from weights. Learning can outperform human-designed methods for optimization, interpretability, model merging, and more.
4 replies · 59 reposts · 341 likes
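A minimal sketch of the "weights are data" idea (hypothetical, not from the workshop or any of the papers above): flatten a trained network's parameters into a feature vector that a downstream metanetwork takes as input, e.g. to predict a property of that network.

```python
# Sketch: treat a network's weights as a datapoint for another learner.
import torch
import torch.nn as nn

def weights_as_features(model: nn.Module) -> torch.Tensor:
    # Flatten all parameters into one feature vector.
    return torch.cat([p.detach().flatten() for p in model.parameters()])

subject = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))  # the "datapoint"
features = weights_as_features(subject)
# A simple metanetwork over the flattened weights (real metanetworks respect weight symmetries).
metanet = nn.Sequential(nn.Linear(features.numel(), 64), nn.ReLU(), nn.Linear(64, 1))
print(metanet(features))  # e.g., a predicted property of the subject network
```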
We raised a $250M Series A led by @AMD Ventures to scale Liquid Foundation Models and accelerate their deployment on-device and at enterprises https://t.co/u37Cv9DVa4
25 replies · 49 reposts · 329 likes
Presenting our paper today (Thursday) at NeurIPS at 11am, East Exhibit Hall A-C, poster #4402! Stop by if you want to learn about our insights on weight-space geometry, loss landscapes, model merging, etc. Reach out to me if you want to chat about anything else at NeurIPS too!
New version + code for our NeurIPS paper is now out: “The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof.” We study how symmetries in weight-space impact optimization and loss landscape geometry of neural nets, via "counterfactual" NNs w/o symmetries. 1/n
1 reply · 8 reposts · 55 likes
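A minimal sketch of another weight-space symmetry of the kind studied in the paper (my illustration, not the paper's code): in a ReLU network, scaling a hidden unit's incoming weights by c > 0 and its outgoing weights by 1/c leaves the function unchanged, since ReLU(c·x) = c·ReLU(x).

```python
# Sketch: positive rescaling of a ReLU hidden unit is a parameter symmetry.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 6), nn.ReLU(), nn.Linear(6, 2))
x = torch.randn(3, 4)
y = net(x)

c = 2.5  # positive rescaling applied to hidden unit 0
with torch.no_grad():
    net[0].weight[0] *= c       # scale incoming weights
    net[0].bias[0] *= c
    net[2].weight[:, 0] /= c    # compensate on the outgoing weights

assert torch.allclose(y, net(x), atol=1e-5)  # same function, different parameters
```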