You Jiacheng

@YouJiacheng

Followers
9K
Following
17K
Media
2K
Statuses
12K

a big fan of TileLang. Follow TileLang, meow! Follow TileLang, thanks, meow! https://t.co/utshC0jrCO A fan for ten years

Joined August 2015
@YouJiacheng
You Jiacheng
2 months
I think an intrinsic property of "fast weight" is that different tokens with different contexts will see different weights. In this sense, MoE is a special case of "fast weight".
2
1
60
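A minimal sketch of the point above, assuming a toy top-1 mixture-of-experts layer: the router picks a weight matrix per token, so two tokens pass through different effective weights. The expert matrices, router vectors, and function names are all made up for illustration; in a real transformer the routed vector is a hidden state that already encodes context.

```python
# Toy top-1 MoE layer in pure Python. All numbers are illustrative assumptions.
EXPERTS = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two coordinates
]
ROUTER = [[1.0, -1.0], [-1.0, 1.0]]  # one routing vector per expert


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def route(token):
    """Return the index of the expert with the highest router score."""
    scores = [dot(r, token) for r in ROUTER]
    return scores.index(max(scores))


def moe_forward(token):
    """Apply the weight matrix that was selected for this particular token."""
    weights = EXPERTS[route(token)]
    return [dot(row, token) for row in weights]
```

Here `route([2.0, 1.0])` and `route([1.0, 2.0])` select different experts, which is the sense in which each token "sees" different weights.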
@YouJiacheng
You Jiacheng
20 hours
Kun
@Presidentlin
Lincoln 🇿🇦
2 days
TIL that people pronounce Qwen as Q-Wen. I say it as Gwen, but with a Q. I'm not changing.
1
0
7
@YouJiacheng
You Jiacheng
20 hours
China × California √
@China_Fact
China Perspective
3 days
🇨🇳#China’s intelligent port operates with high efficiency. At China’s smart port, autonomous transport vehicles move in an orderly and tireless manner. https://t.co/xBDxgruRUS
1
0
22
@YouJiacheng
You Jiacheng
22 hours
a well organized and information rich thread!
@sainingxie
Saining Xie
1 day
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
0
2
23
@du_yilun
Yilun Du
2 days
Introducing Geometry-aware Policy Imitation (GPI)! GPI constructs an energy landscape over the state space using demonstrations. A policy acts in the environment by following the gradient of the landscape. This enables fast multimodal policies with very fast inference (<1 ms)!
@YimingLi389852
Yiming Li
2 days
🎉 Excited to share Geometry-aware Policy Imitation (GPI): A simple, efficient, and interpretable approach for imitation learning. Delivers multimodal skills, stronger performance, 20–100× faster inference (<1 ms), and orders-of-magnitude less memory. https://t.co/YEUaiYwuQd
4
44
387
@YouJiacheng
You Jiacheng
2 days
sora2 can generate this?
0
0
10
@YouJiacheng
You Jiacheng
2 days
never say never~
@hyhieu226
Hieu Pham
3 days
Wait, do people actually prompt LLMs by starting with things like: "You are an expert programmer ..." or "NEVER EVER do something" with the hope that the models will follow those statements more obediently? (and does it work?) 😅
0
0
6
@YouJiacheng
You Jiacheng
2 days
in some sense, stuck in local minima ≈ lack of novelty😂
@jsuarez5341
Joseph Suarez 🐡
3 days
You have to have been in ML for over a decade to really understand how bad this was. From 2015-2018ish, any time anything went wrong in an experiment, at least someone would say "local minimum"
0
0
4
@YouJiacheng
You Jiacheng
2 days
it's a stack plus a read-only view. (a stack doesn't support random read)
@bigeagle_xd
🐻熊狸
2 days
1
1
10
@YouJiacheng
You Jiacheng
2 days
sorry, but it's not random access memory. it's an append-only log.
@wangzjeff
Jeffrey Wang
2 days
Context is the new RAM
31
28
1K
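The distinction drawn in the reply can be sketched as a data structure: a context that allows appending and indexed reads but no in-place writes. This is only an illustration of the analogy; the class name `AppendOnlyContext` is hypothetical and not taken from any library.

```python
class AppendOnlyContext:
    """A log that can grow at the end and be read anywhere, but never rewritten."""

    def __init__(self):
        self._tokens = []

    def append(self, token):
        # The only mutation allowed: add a new entry at the end.
        self._tokens.append(token)

    def __getitem__(self, index):
        # Read-only view: any earlier position can be read.
        return self._tokens[index]

    def __len__(self):
        return len(self._tokens)

    # No __setitem__ or __delitem__ is defined, so earlier entries
    # cannot be overwritten or removed, unlike cells in RAM.
```

Attempting `ctx[0] = "x"` on an instance raises `TypeError`, which is the behavior that separates this from random access memory.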
@YouJiacheng
You Jiacheng
2 days
the total number of tokens is slightly reduced (2380 * 2/3 < 1630), but I'm not sure whether this comes from the separate batch_size values or from other changes in this PR.
0
0
2
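The inequality in the post above can be checked directly. The interpretation here is an assumption, since the post does not spell it out: 1630 and 2380 are read as step counts before and after the change, and 2/3 as the new batch size relative to the old one.

```python
# Assumed meaning of the numbers (not stated in the post):
old_steps = 1630        # steps before the change, at the old batch size
new_steps = 2380        # steps after the change
relative_batch = 2 / 3  # new batch size as a fraction of the old one

# Express the new run's token count in units of old-size batches.
new_tokens_in_old_units = new_steps * relative_batch

# Fractional reduction in total tokens under these assumptions.
reduction = 1 - new_tokens_in_old_units / old_steps
```

Under these assumptions the new run processes a little under 1587 old-size batches, a reduction of roughly 2 to 3 percent.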
@YouJiacheng
You Jiacheng
2 days
last year, I was motivated to introduce a similar change: do fewer embedding updates, because modded-nanogpt has many embedding params (due to value embeddings) and their gradient communications are costly. but I gave up on this idea cuz it looks ugly and ad-hoc😂.
@classiclarryd
Larry Dial
15 days
Down to 146.8s on modded-nanogpt! https://t.co/OV0TaesL4I Surprising result: Different parameter groups have different sensitivity to batch size. Instead of picking a single batch size, grad accumulation can be managed on a param level to simulate different batch sizes.
1
0
3
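A minimal sketch of the idea in the quoted post, assuming plain SGD on scalar parameters: each parameter group accumulates gradients over its own number of micro-batches before updating, which simulates a different batch size per group. This is not the modded-nanogpt implementation; the function `train`, the group layout, and the learning rate are hypothetical.

```python
def train(groups, micro_grads, lr=0.1):
    """Run SGD with a separate accumulation interval for each group.

    groups: name -> {"value": float, "accum": int}
    micro_grads: one dict per micro-batch, mapping name -> gradient
    Returns how many optimizer updates each group received.
    """
    buffers = {name: 0.0 for name in groups}
    updates = {name: 0 for name in groups}
    for step, grads in enumerate(micro_grads, start=1):
        for name, group in groups.items():
            buffers[name] += grads[name]  # accumulate every micro-batch
            if step % group["accum"] == 0:
                # Average over the accumulated micro-batches, then update.
                group["value"] -= lr * buffers[name] / group["accum"]
                buffers[name] = 0.0
                updates[name] += 1
    return updates
```

With `accum=2` for an embedding group and `accum=1` for the rest, the embedding group updates half as often, which matches the "fewer embedding updates" idea in the reply.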
@YouJiacheng
You Jiacheng
2 days
(disclaimer: I witnessed this work but did not contribute to it.)
0
0
4
@YouJiacheng
You Jiacheng
2 days
🥵
@kunlei15
Kun Lei
3 days
Robots level up tomorrow — come see real-world reinforcement learning teach them new tricks.
1
0
15
@YouJiacheng
You Jiacheng
3 days
one caveat is that memfd_create needs glibc≥2.27. I tried it when I did my homework a few years ago and found that the test environment was Ubuntu 16.04🥵
@HeinrichKuttler
heiner
3 days
@jeremyphoward You can do something like fd = os.memfd_create(name); os.ftruncate(fd, size) and then either share fd with your child process, e.g. via subprocess.Popen(pass_fds=), or you mmap it, which multiprocessing can deserialize to the same region. The kernel refcounts the fd like a file.
0
0
10
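The approach in the quoted reply can be sketched as follows, assuming Linux and a Python build that exposes `os.memfd_create` (Python only provides it when the platform supports it, which is the glibc caveat above). The helper name `memfd_roundtrip` is hypothetical, and the sketch maps the file twice in one process rather than passing the descriptor to a child.

```python
import mmap
import os


def memfd_roundtrip(payload: bytes, name: str = "demo"):
    """Write payload into an anonymous in-memory file and read it back.

    Returns None when os.memfd_create is unavailable on this platform.
    payload must be non-empty, because an empty file cannot be mapped.
    """
    if not hasattr(os, "memfd_create"):
        return None
    fd = os.memfd_create(name)
    try:
        os.ftruncate(fd, len(payload))  # size the in-memory file
        with mmap.mmap(fd, len(payload)) as writer:
            writer[:] = payload  # shared mapping, so this writes to the file
        with mmap.mmap(fd, len(payload)) as reader:
            return bytes(reader[:])  # a second mapping sees the same bytes
    finally:
        os.close(fd)
```

To share the region with a child process instead, the same `fd` could be passed through `subprocess.Popen(..., pass_fds=(fd,))`, as the reply suggests.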
@YouJiacheng
You Jiacheng
3 days
China level chart crime😂
@StockMarketNerd
Stock Market Nerd
5 days
Wells Fargo data pointing to $GOOGL Search market share (not including Gemini) rising through July & August after more than a year of declines:
1
0
12
@YouJiacheng
You Jiacheng
3 days
A1
@dejavucoder
sankalp
3 days
chat, what is this? wrong answers only
0
0
1
@YouJiacheng
You Jiacheng
3 days
🥵waow
@LightVivien
Light 💡
4 days
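The Nobel Peace Prize committee, in full view of people around the world, has turned itself into a joke and damaged the reputation of the Nobel Peace Prize. U.S. President Trump, who ended eight wars and saved countless lives, missed out on the Peace Prize; even Nobel laureate Maria Corina Machado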
0
0
1
@YouJiacheng
You Jiacheng
3 days
g[o]t-5?
@gdb
Greg Brockman
3 days
got-5 for astronomy and astrophysics:
1
0
5
@YouJiacheng
You Jiacheng
4 days
To be precise: cut off U.S. exports of quartz mineral that can be purified to 5N purity. the purity of the raw mineral is not important; the composition of its impurities is what matters.
@ChinaSelect
Select Committee on China
5 days
Immediate term: Throttle the PRC tech sector. To stop Beijing from achieving dominance in critical tech, we must: ⚙️ Cut off U.S. high-purity quartz exports 🚫 Expand SME export controls 🤝 Align allies like Japan & the Netherlands with U.S. policy
0
0
7