Gin Jiang
@ZhiyingJ
Followers 1K · Following 827 · Media 12 · Statuses 146
Figuring out decentralized superintelligence @bageldotcom; Building https://t.co/CXreMsUCGO, prev CS PhD @UWaterloo; Interested in 🤖🧠🍞🎮; Irrational exuberance 🖖
Joined April 2020
I'm surprised to see the gzip paper has received this much attention 😂 I'd like to make some clarifications in case it gives any wrong impressions: 🧵1/8
3
121
849
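For anyone who hasn't read the paper being discussed: the core idea reduces to a compressor-based distance (Normalized Compression Distance with gzip) plus a kNN vote. Below is a minimal sketch of that idea, not the paper's exact code; the toy dataset and labels are hypothetical placeholders.

```python
import gzip
import numpy as np

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance, using gzip as the compressor."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(query: str, train_texts: list[str], train_labels: list[str], k: int = 3) -> str:
    """Label a query by majority vote among its k nearest training texts under NCD."""
    distances = [ncd(query, t) for t in train_texts]
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical toy data, purely for illustration
train_texts = ["the cat sat on the mat", "stock prices fell sharply", "the dog chased the ball"]
train_labels = ["pets", "finance", "pets"]
print(knn_classify("a kitten on a rug", train_texts, train_labels, k=1))
```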
Excited to share what I’ve been working on for the past two months - decentralized diffusion models pre-trained entirely in isolation. They outperform monolithic training under the same conditions and reach comparable FID to the DDM paper using 14x less data and 16x less compute!
Introducing Paris - the world's first decentrally trained open-weight diffusion model. We named it Paris after the city that has always been a refuge for those creating without permission. Paris is open for research and commercial use.
1
4
9
Excited to share our latest story! We found disentangled memory representations in the hippocampus that generalized across time and environments, despite the seemingly random drift and remapping of single cells. This code enabled the transfer of prior knowledge to solve new tasks
16
163
1K
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machines of Loving Grace" and if you haven’t read it, you probably
280
505
2K
Reading a DeepSeek paper, I stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA* *that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be
31
424
3K
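The formula being referenced is most likely the unified paradigm from DeepSeekMath, which (as I recall it) writes SFT and the RL-style methods as a single gradient expression that differs only in its data source and its gradient coefficient:

```latex
\nabla_\theta \mathcal{J}_{\mathcal{A}}(\theta)
  = \mathbb{E}_{(q,o)\sim \mathcal{D}}
    \left[ \frac{1}{|o|} \sum_{t=1}^{|o|}
      GC_{\mathcal{A}}(q, o, t, \pi_{rf})\,
      \nabla_\theta \log \pi_\theta(o_t \mid q, o_{<t}) \right]
```

Here A is the algorithm (SFT, DPO, PPO, GRPO, ...), D is its data source, and GC_A is its gradient coefficient, which is where the extra reward function π_rf enters for the RL-type methods.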
DeepSeek R1 (the full 680B model) runs nicely in higher quality 4-bit on 3 M2 Ultras with MLX. Asked it a coding question and it thought for ~2k tokens and generated 3500 tokens overall:
156
561
6K
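If you want to try the same flow locally, mlx-lm exposes a simple load/generate API. The snippet below is a single-machine sketch with an assumed quantized checkpoint name, not the three-M2-Ultra distributed setup from the tweet (that relies on MLX's distributed features).

```python
# Minimal mlx-lm sketch (single machine). The repo id below is an assumption;
# substitute whatever 4-bit quantized checkpoint you actually have available.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit")  # hypothetical repo id

messages = [{"role": "user", "content": "Write a function that merges two sorted lists."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# R1-style models emit a long chain of thought before the answer, so allow plenty of tokens.
response = generate(model, tokenizer, prompt=prompt, max_tokens=4000, verbose=True)
print(response)
```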
Also, being intellectually honest is a habit worth having if you want to maximize your growth rate, do great research, and/or build a healthy culture in an organization.
If you're inexperienced, don't try to pretend you're not. It will fool no one and make you look ridiculous. Instead just be openly curious. This will seem natural instead of awkward, and you'll learn a lot more.
0
0
1
🚨 What’s the best way to select data for fine-tuning LLMs effectively? 📢Introducing ZIP-FIT—a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss, and selects data up to 65% faster. 🧵1/8
10
44
246
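I haven't reproduced the paper's exact algorithm, but the general compression-alignment idea it builds on can be sketched in a few lines: score each candidate example by how much gzip "saves" when compressing it together with a sample of the target task, then keep the best-aligned examples. The scoring function and names below are illustrative only, not ZIP-FIT's precise metric.

```python
import gzip

def gzip_size(text: str) -> int:
    return len(gzip.compress(text.encode()))

def alignment_score(candidate: str, target_sample: str) -> float:
    """Roughly 0 when candidate and target share no structure,
    approaching 1 when compressing them jointly saves many bits."""
    c_cand = gzip_size(candidate)
    c_tgt = gzip_size(target_sample)
    c_joint = gzip_size(candidate + "\n" + target_sample)
    return (c_cand + c_tgt - c_joint) / min(c_cand, c_tgt)

def select_top_k(candidates: list[str], target_sample: str, k: int) -> list[str]:
    """Keep the k candidates most aligned with the target sample."""
    ranked = sorted(candidates, key=lambda x: alignment_score(x, target_sample), reverse=True)
    return ranked[:k]
```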
This paper answers one of the questions I'm constantly wondering about, thanks!
🚨 New paper 🚨 How Do Large Language Models Acquire Factual Knowledge During Pretraining? I’m thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
0
0
4
I’ve been really excited about Eureka Labs. Although I hold a slightly different AI+education thesis from Andrej’s, I do believe that, given proper tools, human capabilities in education can be improved by more than 2 standard deviations.
Thank you @saranormous and @eladgil for hosting me on the @NoPriorsPod pod, pleasure to talk with you (as always!)
0
0
1
Find it amazing that, whether it's high-level understanding or building from scratch, @karpathy contributes the best resources
Don't know where to start to learn about LLMs? Answer these questions to figure out your starting point 🗺️ https://t.co/5ybdzML2Y8
#MachineLearning #learningAI #LLMs
0
0
2
5 insightful books that help you gain a deep, theoretical, and transdisciplinary understanding of machine learning. #MachineLearning #AI #DeepLearning
0
1
1
The traits of simplicity and symmetry can both be described using "minimum description length", which can be formalized as Kolmogorov complexity. The ability to quantify beauty (to some extent) makes the concept itself a minimum description length, which is recursively beautiful.
Since there has been a lot of talk about beauty and symmetry lately, here's something I wrote about the subject:
1
0
3
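Concretely, the quantity being invoked here is the Kolmogorov complexity of an object x: the length of the shortest program that prints x on a fixed universal machine U, so "simple" or "symmetric" objects are exactly those with short descriptions.

```latex
K_U(x) = \min \{\, |p| \;:\; U(p) = x \,\}
```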
This tutorial is concise, canonical, and easy to understand. Highly recommend it to people who are new to information theory!
Information Theory: A Tutorial Introduction https://t.co/wVuGlWrvgI Shannon's mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any system. This paper is an introduction to the main ideas
0
1
3
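The two quantities the tutorial is built around are Shannon entropy (the fundamental limit on lossless compression of a source) and channel capacity (the limit on reliable transmission over a noisy channel):

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x), \qquad
C = \max_{p(x)} I(X; Y)
```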
Excited to share new work @icmlconf by Loek van Rossem exploring universal aspects of representation learning Why is it that large, complex models often learn similar representations? And why might these be similar to the brain? How can we understand this theoretically? (1/11)
2
68
315
Interesting challenge, a Hutter Prize for this era
0
0
0
Really excited about this paper! The possibilities unlocked by attribution analysis + feature steering will inspire tons of interesting work! Also I’m curious to learn whether a beta-VAE could be a better choice for learning disentangled representations?
Today, we announced that we’ve gotten dictionary learning working on Sonnet, extracting millions of features from one of the best models in the world. This is the first time this has been successfully done on a frontier model. I wanted to share some highlights 🧵
0
0
1
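For reference, the beta-VAE objective mentioned in the question simply reweights the KL term of a standard VAE; pushing β above 1 pressures the latent code toward disentanglement at some cost in reconstruction quality:

```latex
\mathcal{L}_{\beta\text{-VAE}} =
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \beta \, D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```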
Such a great idea! It's theoretically beautiful and empirically useful. I feel many great ideas are transdisciplinary (mostly physics x maths x cs x biology). Also, since Kolmogorov complexity and KART both have neural network incarnations, I'm curious to know which one will be next 👀
MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵
0
0
2
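The KART referenced above is the Kolmogorov-Arnold representation theorem, which writes any continuous multivariate function as sums and compositions of univariate functions; KANs take this form and make the inner and outer univariate functions learnable (e.g. as splines) on the edges of the network:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```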
GEB is the first book I read that explicitly conveys that knowledge is an interconnected network and that art and science are definitely not mutually exclusive
This is Douglas Hofstadter, professor of Cognitive & Computer Science at Indiana University Bloomington. Best known for his book Goedel, Escher, Bach (1980), this is a nerd 🧵 about his influence on me, and the present
0
0
2