Gin Jiang

@ZhiyingJ

Followers: 1K · Following: 827 · Media: 12 · Statuses: 146

Figuring out decentralized superintelligence @bageldotcom; Building https://t.co/CXreMsUCGO, prev CS PhD @UWaterloo; Interested in 🤖🧠🍞🎮; Irrational exuberance 🖖

Joined April 2020
@ZhiyingJ
Gin Jiang
2 years
I'm surprised to see the gzip paper receive this much attention 😂 I'd like to make a few clarifications in case the paper gives any incorrect impressions: 🧵1/8
3
121
849
@ZhiyingJ
Gin Jiang
26 days
Excited to share what I’ve been working on for the past two months - decentralized diffusion models pre-trained entirely in isolation. They outperform monolithic training under the same conditions and reach comparable FID to the DDM paper using 14x less data and 16x less compute!
@bageldotcom
bagel.com
26 days
Introducing Paris - the world's first decentrally trained open-weight diffusion model. We named it Paris after the city that has always been a refuge for those creating without permission. Paris is open for research and commercial use.
1
4
9
@ZhiyingJ
Gin Jiang
6 months
this and beta-VAE were my two favorite papers of 2017
@khoomeik
Rohan Pandey
6 months
someone should probably retry all those late 2010s deep RL ideas to see if they work on LLMs
0
0
1
@antferrui
AntonioFR
8 months
Excited to share our latest story! We found disentangled memory representations in the hippocampus that generalized across time and environments, despite the seemingly random drift and remapping of single cells. This code enabled the transfer of prior knowledge to solve new tasks
16
163
1K
@Thom_Wolf
Thomas Wolf
8 months
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machine of Loving Grace" and if you haven’t read it, you probably
280
505
2K
@N8Programs
N8 Programs
9 months
reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA* *that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be
31
424
3K
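For reference, the unified form being described (likely the "unified paradigm" from the DeepSeekMath paper; notation reproduced from memory and may differ in detail) writes the gradient of SFT and of the RL-style methods as one expression that varies only in its data source and gradient coefficient:

$$\nabla_\theta \mathcal{J}_{\mathcal{A}}(\theta) \;=\; \mathbb{E}_{(q,\,o)\sim\mathcal{D}}\!\left[\frac{1}{|o|}\sum_{t=1}^{|o|} GC_{\mathcal{A}}(q, o, t, \pi_{\mathrm{rf}})\,\nabla_\theta \log \pi_\theta(o_t \mid q, o_{<t})\right]$$

Here $\mathcal{A}$ is the training algorithm, $\mathcal{D}$ its data source, and $GC_{\mathcal{A}}$ the algorithm-specific gradient coefficient: a constant coefficient recovers SFT, while reward- or advantage-based coefficients recover the RL variants, which is why additional reward functions must be defined.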
@awnihannun
Awni Hannun
9 months
DeepSeek R1 (the full 671B-parameter model) runs nicely in higher-quality 4-bit on 3 M2 Ultras with MLX. Asked it a coding question and it thought for ~2k tokens and generated 3500 tokens overall:
156
561
6K
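As a rough illustration of the workflow (not Awni's actual setup: this is a hedged single-machine sketch using the mlx-lm Python package, the model repo name is an assumption, and the tweet's 3x M2 Ultra run additionally relies on MLX's distributed execution, which is omitted here):

```python
# Hedged sketch: load a 4-bit quantized DeepSeek R1 conversion with mlx-lm
# and ask it a coding question. Requires `pip install mlx-lm` on Apple silicon.
from mlx_lm import load, generate

# The repo name below is an assumption (a hypothetical mlx-community 4-bit export).
model, tokenizer = load("mlx-community/DeepSeek-R1-4bit")

prompt = "Write a Python function that checks whether a string is a palindrome."
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True)
print(response)
```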
@ZhiyingJ
Gin Jiang
9 months
Also, intellectual honesty is a habit worth having if you want to maximize your growth rate, do great research, and/or build a healthy culture in an organization
@paulg
Paul Graham
9 months
If you're inexperienced, don't try to pretend you're not. It will fool no one and make you look ridiculous. Instead just be openly curious. This will seem natural instead of awkward, and you'll learn a lot more.
0
0
1
@ObbadElyas
Elyas Obbad
1 year
🚨 What’s the best way to select data for fine-tuning LLMs effectively? 📢 Introducing ZIP-FIT - a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss and selecting data up to 65% faster. 🧵1/8
10
44
246
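The thread explains the method further, but the core idea of compression-based selection can be sketched generically (this uses the classic normalized compression distance as the alignment score; ZIP-FIT's actual metric and implementation may differ):

```python
import gzip

def compressed_len(text: str) -> int:
    """Length of the gzip-compressed UTF-8 bytes of `text`."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: smaller when the two texts share
    structure that gzip can exploit once they are concatenated."""
    ca, cb, cab = compressed_len(a), compressed_len(b), compressed_len(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def select_most_aligned(candidates: list[str], target: str, k: int) -> list[str]:
    """Keep the k candidate training examples most compression-aligned with a
    target-domain sample, as a stand-in for alignment-based data selection."""
    return sorted(candidates, key=lambda c: ncd(c, target))[:k]
```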
@ZhiyingJ
Gin Jiang
1 year
This paper answers one of the questions I constantly wonder about, thanks!
@hoyeon_chang
Hoyeon Chang
1 year
🚨 New paper 🚨 How Do Large Language Models Acquire Factual Knowledge During Pretraining? I’m thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
0
0
4
@ZhiyingJ
Gin Jiang
1 year
I’ve been really excited about Eureka Labs. Although I hold a somewhat different AI+education thesis than Andrej’s, I do believe that, given proper educational tools, human capabilities can be improved by more than 2 standard deviations.
@karpathy
Andrej Karpathy
1 year
Thank you @saranormous and @eladgil for hosting me on the @NoPriorsPod pod, pleasure to talk with you (as always!)
0
0
1
@ZhiyingJ
Gin Jiang
1 year
I find it amazing that whether it's high-level understanding or building from scratch, @karpathy contributes the best resources
@afaikio
AFAIK.io
1 year
Don't know where to start to learn about LLMs? Answer these questions to figure out your starting point 🗺️ https://t.co/5ybdzML2Y8 #MachineLearning #learningAI #LLMs
0
0
2
@afaikio
AFAIK.io
1 year
5 insightful books that help you gain deep, theoretical and transdisciplinary understanding of machine learning. #MachineLearning #AI #DeepLearning
0
1
1
@predict_addict
Valeriy M., PhD, MBA, CQF
1 year
Hidden deep in the depths of the internet, a treasure. #probability
51
667
6K
@ZhiyingJ
Gin Jiang
1 year
The traits of simplicity and symmetry can both be described using "minimum description length", which can be formalized as Kolmogorov complexity. The ability to quantify beauty (to some extent) makes the concept itself a minimum description length, which is recursively beautiful.
@paulg
Paul Graham
1 year
Since there has been a lot of talk about beauty and symmetry lately, here's something I wrote about the subject:
1
0
3
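For the formal object behind the "minimum description length" mentioned above: the Kolmogorov complexity of a string x is the length of the shortest program that prints it on a fixed universal machine U,

$$K_U(x) \;=\; \min\{\, |p| \;:\; U(p) = x \,\}$$

so "simple" or "symmetric" objects are exactly those admitting short descriptions.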
@ZhiyingJ
Gin Jiang
1 year
This tutorial is concise, canonical, and easy to understand. Highly recommend it to people who are new to information theory!
@jgvfwstone
James V Stone
1 year
Information Theory: A Tutorial Introduction https://t.co/wVuGlWrvgI Shannon's mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any system. This paper is an introduction to the main ideas
0
1
3
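Two of the standard quantities such a tutorial builds toward, stated here for orientation: the entropy of a source, and the capacity of a channel, which bounds how much information can be reliably transmitted per channel use,

$$H(X) = -\sum_x p(x)\log_2 p(x), \qquad C = \max_{p(x)} I(X;Y).$$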
@SaxeLab
Andrew Saxe
1 year
Excited to share new work @icmlconf by Loek van Rossem exploring universal aspects of representation learning Why is it that large, complex models often learn similar representations? And why might these be similar to the brain? How can we understand this theoretically? (1/11)
2
68
315
@ZhiyingJ
Gin Jiang
1 year
Interesting challenge - a Hutter Prize for this era
@var_epsilon
varepsilon
1 year
this looks interesting
0
0
0
@ZhiyingJ
Gin Jiang
1 year
Really excited about this paper! The possibilities unlocked by attribution analysis + feature steering will inspire tons of interesting work! Also, I’m curious whether beta-VAE could be a better choice for learning disentangled representations?
@mlpowered
Emmanuel Ameisen
1 year
Today, we announced that we’ve gotten dictionary learning working on Sonnet, extracting millions of features from one of the best models in the world. This is the first time this has been successfully done on a frontier model. I wanted to share some highlights 🧵
0
0
1
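For context on the beta-VAE mentioned above: its training objective augments the standard VAE ELBO with a weight β > 1 on the KL term, which pressures the latent code toward disentangled factors,

$$\mathcal{L}(\theta,\phi;x) \;=\; \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] \;-\; \beta\, D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\|\,p(z)\big)$$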
@ZhiyingJ
Gin Jiang
2 years
Such a great idea! It's theoretically beautiful and empirically useful. I feel many great ideas are transdisciplinary (mostly physics x maths x cs x biology). Also, since Kolmogorov complexity and KART both have neural network incarnations, I'm curious to know which one will be next👀
@ZimingLiu11
Ziming Liu
2 years
MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵
0
0
2
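The KART referenced above is the Kolmogorov-Arnold representation theorem, which KANs take as their starting point: any continuous multivariate function on a bounded domain decomposes into sums and compositions of univariate functions,

$$f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right)$$

which is why KANs place learnable univariate functions on edges rather than fixed activations on nodes.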
@ZhiyingJ
Gin Jiang
2 years
GEB is the first book I read that explicitly conveyed that knowledge is an interconnected network and that art and science are definitely not mutually exclusive
@forthrighter
Forth ❤️‍🔥
2 years
This is Douglas Hofstadter, professor of Cognitive & Computer Science at Indiana University Bloomington, best known for his book Gödel, Escher, Bach (1979). This is a nerd 🧵 about his influence on me, and the present
0
0
2