
You Jiacheng
@YouJiacheng
Followers: 9K · Following: 17K · Media: 2K · Statuses: 12K
a big fan of TileLang. Follow TileLang, meow! Follow TileLang, thank you, meow! https://t.co/utshC0jrCO Ten-year longtime fan
Joined August 2015
I think an intrinsic property of "fast weight" is that different tokens with different contexts will see different weights. In this sense, MoE is a special case of "fast weight".
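A minimal NumPy sketch of the claim (my illustration, not from the tweet): in a top-1 MoE layer, the router picks a different expert weight matrix for each token, so the effective weights a token sees depend on its features — i.e., context-dependent "fast weights". All names here (`experts`, `router`, `moe_forward`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 4, 3
experts = rng.normal(size=(n_experts, d, d))   # per-expert weight matrices
router = rng.normal(size=(d, n_experts))       # routing projection

def moe_forward(x):
    """Top-1 MoE: each token is multiplied by a different expert's weights."""
    logits = x @ router
    expert_idx = np.argmax(logits, axis=-1)    # hard top-1 routing per token
    # Different tokens (different features) see different weight matrices.
    out = np.stack([experts[e] @ t for e, t in zip(expert_idx, x)])
    return out, expert_idx

tokens = rng.normal(size=(5, d))               # 5 tokens with different features
out, idx = moe_forward(tokens)
```

The "weights" applied to each token are selected at inference time by the token itself, which is the sense in which MoE behaves like a fast-weight mechanism.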
China × California √
🇨🇳 #China's intelligent port operates with high efficiency. At China's smart port, autonomous transport vehicles move in an orderly and tireless manner. https://t.co/xBDxgruRUS
a well-organized and information-rich thread!
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
Introducing Geometry-aware Policy Imitation (GPI)! GPI constructs an energy landscape over the state space using demonstrations. A policy acts in the environment by following the gradient of the landscape. This enables multimodal policies with very fast inference (<1 ms)!
🎉 Excited to share Geometry-aware Policy Imitation (GPI): A simple, efficient, and interpretable approach for imitation learning. Delivers multimodal skills, stronger performance, 20–100× faster inference (<1 ms), and orders-of-magnitude less memory. https://t.co/YEUaiYwuQd
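A toy sketch of the idea as described above (my simplification, not the paper's method): build an energy over states from demonstration states, then act by descending the energy's gradient. Here the energy is the squared distance to the nearest demo state, so the gradient is available in closed form.

```python
import numpy as np

demos = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # demo states (assumed 2-D)

def energy(s):
    """Squared distance from state s to the nearest demonstration state."""
    return np.min(np.sum((demos - s) ** 2, axis=1))

def policy_step(s, step=0.1):
    """One policy step: follow the negative gradient of the energy landscape."""
    nearest = demos[np.argmin(np.sum((demos - s) ** 2, axis=1))]
    grad = 2.0 * (s - nearest)          # analytic gradient of energy at s
    return s - step * grad

s = np.array([0.4, 0.3])
for _ in range(50):
    s = policy_step(s)                  # state descends toward a demo state
```

Because each step is a nearest-neighbor lookup plus a vector subtraction, inference is trivially fast; the landscape has one basin per demonstration, which is one way a gradient-following policy can stay multimodal.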
it's a stack plus a read-only view. (a stack doesn't support random read)
@YouJiacheng @ShengjieWa34067 it's a stack
sorry, but it's not random-access memory. it's an append-only log.
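The distinction in the two replies above can be made concrete (my illustration): an append-only log permits random *reads* of any earlier entry but only appends as writes — unlike random-access memory (arbitrary writes) or a pure stack (reads only at the top). The class name is hypothetical.

```python
class AppendOnlyLog:
    """Append-only writes, random read-only access to past entries."""

    def __init__(self):
        self._items = []

    def append(self, x):
        self._items.append(x)        # the only way to write

    def read(self, i):
        return self._items[i]        # random reads of any position are fine
    # deliberately no pop() and no __setitem__: entries are immutable

log = AppendOnlyLog()
for tok in "abcd":
    log.append(tok)
```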
the total number of tokens is slightly reduced (2380 * 2/3 < 1630), but I'm not sure whether this comes from the separate batch_sizes or from other changes in this PR.
last year, I was motivated to introduce a similar change: do fewer embedding updates, because modded-nanogpt has many embedding params (due to value embeddings) and their gradient communication is costly. but I gave up on the idea because it looked ugly and ad hoc😂.
Down to 146.8s on modded-nanogpt! https://t.co/OV0TaesL4I Surprising result: Different parameter groups have different sensitivity to batch size. Instead of picking a single batch size, grad accumulation can be managed on a param level to simulate different batch sizes.
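A minimal sketch of per-parameter-group gradient accumulation as described in the quoted tweet (my illustration, not the actual modded-nanogpt code): group "A" steps on every micro-batch, while group "B" accumulates 4 micro-batches before stepping, giving it a 4× larger effective batch size. The stand-in gradient function is hypothetical.

```python
import numpy as np

accum_steps = {"A": 1, "B": 4}          # per-group accumulation -> per-group batch size
params = {"A": np.zeros(2), "B": np.zeros(2)}
buffers = {k: np.zeros(2) for k in params}
counts = {k: 0 for k in params}
lr = 0.1

def micro_batch_grad(name):
    return np.ones(2)                    # stand-in gradient for this illustration

for step in range(8):                    # 8 micro-batches
    for name in params:
        buffers[name] += micro_batch_grad(name)
        counts[name] += 1
        if counts[name] == accum_steps[name]:
            # step with the averaged gradient over the group's accumulation window
            params[name] -= lr * buffers[name] / counts[name]
            buffers[name][:] = 0.0
            counts[name] = 0
```

After 8 micro-batches, group "A" has taken 8 optimizer steps and group "B" only 2, each with its own effective batch size — no single global batch size is ever chosen.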
(disclaimer: I witnessed this work but did not contribute to it.)
one caveat is that memfd_create needs glibc ≥ 2.27. I tried it when I did my homework a few years ago and found the test environment was Ubuntu 16.04🥵
@jeremyphoward You can do something like
fd = os.memfd_create(name)
os.ftruncate(fd, size)
and then either share fd with your child process, e.g. via subprocess.Popen(pass_fds=), or you mmap it, which multiprocessing can deserialize to the same region. The kernel refcounts the fd like a file.
To be precise: cut off exports of U.S. quartz mineral that can be purified to 5N purity. The purity of the mineral itself is not what matters; the composition of its impurities does.
Immediate term: Throttle the PRC tech sector. To stop Beijing from achieving dominance in critical tech, we must: ⚙️ Cut off U.S. high-purity quartz exports 🚫 Expand SME export controls 🤝 Align allies like Japan & the Netherlands with U.S. policy