Gin Jiang

@ZhiyingJ

Followers: 1K · Following: 827 · Media: 12 · Statuses: 146

Figuring out decentralized superintelligence @bageldotcom; Building https://t.co/CXreMsUCGO, prev CS PhD @UWaterloo; Interested in 🤖🧠🍞🎮; Irrational exuberance 🖖

Joined April 2020
@ZhiyingJ
Gin Jiang
2 years
I'm surprised to see the gzip paper receive this much attention 😂 I'd like to make a few clarifications in case the paper gives any incorrect impressions: 🧵1/8
3
121
849
@ZhiyingJ
Gin Jiang
26 days
Excited to share what I’ve been working on for the past two months - decentralized diffusion models pre-trained entirely in isolation. They outperform monolithic training under the same conditions and reach comparable FID to the DDM paper using 14x less data and 16x less compute!
@bageldotcom
bagel.com
26 days
Introducing Paris - the world's first decentrally trained open-weight diffusion model. We named it Paris after the city that has always been a refuge for those creating without permission. Paris is open for research and commercial use.
1
4
9
@ZhiyingJ
Gin Jiang
6 months
this and beta-VAE were my two favorite papers of 2017
@khoomeik
Rohan Pandey
6 months
someone should probably retry all those late 2010s deep RL ideas to see if they work on LLMs
0
0
1
@antferrui
AntonioFR
8 months
Excited to share our latest story! We found disentangled memory representations in the hippocampus that generalized across time and environments, despite the seemingly random drift and remapping of single cells. This code enabled the transfer of prior knowledge to solve new tasks
16
163
1K
@Thom_Wolf
Thomas Wolf
8 months
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machine of Loving Grace" and if you haven’t read it, you probably
280
505
2K
@N8Programs
N8 Programs
9 months
reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA* *that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be
31
424
3K
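For reference, the unified form being described (likely the "unified paradigm" from the DeepSeekMath paper; notation reproduced from memory and may differ in detail) writes the gradient of SFT and of the RL-style methods as one expression that varies only in its data source and gradient coefficient:

$$\nabla_\theta \mathcal{J}_{\mathcal{A}}(\theta) \;=\; \mathbb{E}_{(q,\,o)\sim\mathcal{D}}\!\left[\frac{1}{|o|}\sum_{t=1}^{|o|} GC_{\mathcal{A}}(q, o, t, \pi_{\mathrm{rf}})\,\nabla_\theta \log \pi_\theta(o_t \mid q, o_{<t})\right]$$

Here $\mathcal{A}$ is the training algorithm, $\mathcal{D}$ its data source, and $GC_{\mathcal{A}}$ the algorithm-specific gradient coefficient: a constant coefficient recovers SFT, while reward- or advantage-based coefficients recover the RL variants, which is why additional reward functions must be defined.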
@awnihannun
Awni Hannun
9 months
DeepSeek R1 (the full 671B-parameter model) runs nicely in higher-quality 4-bit on 3 M2 Ultras with MLX. Asked it a coding question and it thought for ~2k tokens and generated 3500 tokens overall:
156
561
6K
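As a rough illustration of the workflow (not Awni's actual setup: this is a hedged single-machine sketch using the mlx-lm Python package, the model repo name is an assumption, and the tweet's 3x M2 Ultra run additionally relies on MLX's distributed execution, which is omitted here):

```python
# Hedged sketch: load a 4-bit quantized DeepSeek R1 conversion with mlx-lm
# and ask it a coding question. Requires `pip install mlx-lm` on Apple silicon.
from mlx_lm import load, generate

# The repo name below is an assumption (a hypothetical mlx-community 4-bit export).
model, tokenizer = load("mlx-community/DeepSeek-R1-4bit")

prompt = "Write a Python function that checks whether a string is a palindrome."
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True)
print(response)
```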
@ZhiyingJ
Gin Jiang
9 months
Also, intellectual honesty is a habit worth having if you want to maximize your growth rate, do great research, and/or build a healthy culture in an organization
@paulg
Paul Graham
9 months
If you're inexperienced, don't try to pretend you're not. It will fool no one and make you look ridiculous. Instead just be openly curious. This will seem natural instead of awkward, and you'll learn a lot more.
0
0
1
@ObbadElyas
Elyas Obbad
1 year
🚨 What’s the best way to select data for fine-tuning LLMs effectively? 📢 Introducing ZIP-FIT - a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss and selecting data up to 65% faster. 🧵1/8
10
44
246
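The thread explains the method further, but the core idea of compression-based selection can be sketched generically (this uses the classic normalized compression distance as the alignment score; ZIP-FIT's actual metric and implementation may differ):

```python
import gzip

def compressed_len(text: str) -> int:
    """Length of the gzip-compressed UTF-8 bytes of `text`."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: smaller when the two texts share
    structure that gzip can exploit once they are concatenated."""
    ca, cb, cab = compressed_len(a), compressed_len(b), compressed_len(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def select_most_aligned(candidates: list[str], target: str, k: int) -> list[str]:
    """Keep the k candidate training examples most compression-aligned with a
    target-domain sample, as a stand-in for alignment-based data selection."""
    return sorted(candidates, key=lambda c: ncd(c, target))[:k]
```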
@ZhiyingJ
Gin Jiang
1 year
This paper answers one of the questions I constantly wonder about, thanks!
@hoyeon_chang
Hoyeon Chang
1 year
🚨 New paper 🚨 How Do Large Language Models Acquire Factual Knowledge During Pretraining? I’m thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
0
0
4
@ZhiyingJ
Gin Jiang
1 year
I’ve been really excited about Eureka Labs. Although I hold a somewhat different AI+education thesis than Andrej’s, I do believe that, given proper educational tools, human capabilities can be improved by more than 2 standard deviations.
@karpathy
Andrej Karpathy
1 year
Thank you @saranormous and @eladgil for hosting me on the @NoPriorsPod pod, pleasure to talk with you (as always!)
0
0
1
@ZhiyingJ
Gin Jiang
1 year
I find it amazing that whether it's high-level understanding or building from scratch, @karpathy contributes the best resources
@afaikio
AFAIK.io
1 year
Don't know where to start to learn about LLMs? Answer these questions to figure out your starting point 🗺️ https://t.co/5ybdzML2Y8 #MachineLearning #learningAI #LLMs
0
0
2
@afaikio
AFAIK.io
1 year
5 insightful books that help you gain deep, theoretical and transdisciplinary understanding of machine learning. #MachineLearning #AI #DeepLearning
0
1
1
@predict_addict
Valeriy M., PhD, MBA, CQF
1 year
Hidden deep in the depths of the internet, a treasure. #probability
51
667
6K
@ZhiyingJ
Gin Jiang
1 year
The traits of simplicity and symmetry can both be described using "minimum description length", which can be formalized as Kolmogorov complexity. The ability to quantify beauty (to some extent) makes the concept itself a minimum description length, which is recursively beautiful.
@paulg
Paul Graham
1 year
Since there has been a lot of talk about beauty and symmetry lately, here's something I wrote about the subject:
1
0
3
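For the formal object behind the "minimum description length" mentioned above: the Kolmogorov complexity of a string x is the length of the shortest program that prints it on a fixed universal machine U,

$$K_U(x) \;=\; \min\{\, |p| \;:\; U(p) = x \,\}$$

so "simple" or "symmetric" objects are exactly those admitting short descriptions.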
@ZhiyingJ
Gin Jiang
1 year
This tutorial is concise, canonical, and easy to understand. Highly recommend it to people who are new to information theory!
@jgvfwstone
James V Stone
1 year
Information Theory: A Tutorial Introduction https://t.co/wVuGlWrvgI Shannon's mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any system. This paper is an introduction to the main ideas
0
1
3
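Two of the standard quantities such a tutorial builds toward, stated here for orientation: the entropy of a source, and the capacity of a channel, which bounds how much information can be reliably transmitted per channel use,

$$H(X) = -\sum_x p(x)\log_2 p(x), \qquad C = \max_{p(x)} I(X;Y).$$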
@SaxeLab
Andrew Saxe
1 year
Excited to share new work @icmlconf by Loek van Rossem exploring universal aspects of representation learning Why is it that large, complex models often learn similar representations? And why might these be similar to the brain? How can we understand this theoretically? (1/11)
2
68
315
@ZhiyingJ
Gin Jiang
1 year
Interesting challenge - a Hutter Prize for this era
@var_epsilon
varepsilon
1 year
this looks interesting
0
0
0
@ZhiyingJ
Gin Jiang
1 year
Really excited about this paper! The possibilities unlocked by attribution analysis + feature steering will inspire tons of interesting work! Also, I’m curious whether beta-VAE could be a better choice for learning disentangled representations?
@mlpowered
Emmanuel Ameisen
1 year
Today, we announced that we’ve gotten dictionary learning working on Sonnet, extracting millions of features from one of the best models in the world. This is the first time this has been successfully done on a frontier model. I wanted to share some highlights 🧵
0
0
1
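For context on the beta-VAE mentioned above: its training objective augments the standard VAE ELBO with a weight β > 1 on the KL term, which pressures the latent code toward disentangled factors,

$$\mathcal{L}(\theta,\phi;x) \;=\; \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] \;-\; \beta\, D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\|\,p(z)\big)$$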
@ZhiyingJ
Gin Jiang
2 years
Such a great idea! It's theoretically beautiful and empirically useful. I feel many great ideas are transdisciplinary (mostly physics x maths x cs x biology). Also, since Kolmogorov complexity and KART both have neural network incarnations, I'm curious to know which one will be next👀
@ZimingLiu11
Ziming Liu
2 years
MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵
0
0
2
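The KART referenced above is the Kolmogorov-Arnold representation theorem, which KANs take as their starting point: any continuous multivariate function on a bounded domain decomposes into sums and compositions of univariate functions,

$$f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right)$$

which is why KANs place learnable univariate functions on edges rather than fixed activations on nodes.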
@ZhiyingJ
Gin Jiang
2 years
GEB is the first book I read that explicitly conveyed that knowledge is an interconnected network and that art and science are definitely not mutually exclusive
@forthrighter
Forth ❤️‍🔥
2 years
This is Douglas Hofstadter, professor of Cognitive & Computer Science at Indiana University Bloomington, best known for his book Gödel, Escher, Bach (1979). This is a nerd 🧵 about his influence on me, and the present
0
0
2