Lizhang Chen

@Tim38463182

Followers: 73 · Following: 73 · Media: 0 · Statuses: 53

Student researcher @GoogleResearch · Ph.D. student at UT Austin

Pasadena, CA
Joined November 2019
@Tim38463182
Lizhang Chen
2 months
RT @aaron_defazio: Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW t…
@Tim38463182
Lizhang Chen
6 months
RT @ljb121002: Excited to introduce Prior-Informed Preference Alignment (PIPA) 🎶! 🚀 Works anywhere DPO/KTO does, with a 3-10% performance…
@Tim38463182
Lizhang Chen
6 months
RT @cranialxix: If you are interested in learning/using flow/diffusion models, please check this thread from the original author of rectifi…
@Tim38463182
Lizhang Chen
6 months
RT @lqiang67: 🚀 New Rectified Flow materials (WIP)! 📖 Tutorials: 💻 Code: 📜 Notes: https://…
Link card (github.com): lqiang67/rectified-flow, code base for rectified flow.
@Tim38463182
Lizhang Chen
7 months
RT @DrJimFan: We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier res…
@Tim38463182
Lizhang Chen
8 months
Is Shampoo really better than Adam? 🤔
@_arohan_
rohan anil
8 months
Today is the 10th anniversary of the Adam paper on arXiv! Even though Shampoo is far better than Adam, it's undeniable how good Adam is with respect to simplicity.
@Tim38463182
Lizhang Chen
8 months
RT @_clashluke: Cautioning gives substantial speedups (see quoted tweet) with a one-line change but also increases the implicit step size…
@Tim38463182
Lizhang Chen
8 months
RT @XixiHu12: 🚀 Excited to share AdaFlow at #NeurIPS2024! A fast, adaptive method for training robots to act with one-step efficiency—no d…
@Tim38463182
Lizhang Chen
8 months
Let all optimizers be cautious now!
@Tim38463182
Lizhang Chen
8 months
"this boost appears more consistent than some of the new optimizers -- it's a relatively small addition that can be made to most existing optimizers".
@wightmanr
Ross Wightman
8 months
One of the last-minute papers I added support for, which delayed this release, was 'Cautious Optimizers'. As I promised, I pushed some sets of experiments at … Consider me impressed, this boost appears more consistent than some of the new optimizers -- it's a…
@Tim38463182
Lizhang Chen
8 months
RT @giffmana: Nice, independent verification of the "cautious" one-line change to optimizers by Ross, on separate problems. Seems to consis…
@Tim38463182
Lizhang Chen
8 months
RT @wightmanr: I was going to publish a new timm release yesterday with significant Optimizer updates: Adopt, Big Vision Adafactor, MARS, a…
@Tim38463182
Lizhang Chen
8 months
RT @_clashluke: Underrated find.
@Tim38463182
Lizhang Chen
8 months
RT @KyleLiang5: TLDR: 1⃣ line modification, satisfaction (theoretically and empirically) guaranteed 😀😀😀. Core idea: 🚨 Do not update if you ar…
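For reference, the one-line idea discussed in the "Cautious Optimizers" posts above can be sketched roughly as follows: mask out the components of an optimizer's proposed update whose sign disagrees with the current gradient, and rescale what remains. This is a minimal PyTorch sketch, not the paper's exact implementation; the rescaling factor and where the mask is applied inside a specific optimizer are assumptions.

```python
import torch

def cautious_update(update: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Zero out update components whose sign disagrees with the gradient,
    then rescale so the average step magnitude stays roughly unchanged."""
    mask = (update * grad > 0).to(update.dtype)   # 1 where update and gradient agree in sign
    scale = mask.numel() / (mask.sum() + 1)       # compensate for zeroed entries (assumed scaling)
    return update * mask * scale

# Hypothetical use inside a hand-rolled AdamW-style step:
#   p.data.add_(cautious_update(exp_avg / denom, p.grad), alpha=-lr)
```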
@Tim38463182
Lizhang Chen
8 months
RT @konstmish: OpenReview's LaTeX parser seems to be quite bad and it makes it very painful to be a reviewer sometimes. For example: "Assum…
@Tim38463182
Lizhang Chen
10 months
Notably, Distributed Lion attains performance comparable to standard Lion or AdamW applied to aggregated gradients, but with significantly reduced communication bandwidth. This is particularly advantageous for training large models.
@Tim38463182
Lizhang Chen
10 months
Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems.
@Tim38463182
Lizhang Chen
10 months
Leveraging the sign operator in Lion, Distributed Lion only requires communicating binary or lower-precision vectors from the workers to the central server, significantly reducing the communication cost.
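To make the communication pattern described in this thread concrete, here is a rough sketch under assumed names, with a majority-vote aggregation rule on the server; the paper's exact aggregation scheme, weight-decay handling, and hyperparameters may differ.

```python
import torch

def lion_worker_step(grad, momentum, beta1=0.9, beta2=0.99):
    """Worker side: compute Lion's binary (+1/-1) update direction from the
    local gradient and momentum; only this sign vector is communicated."""
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)   # local momentum update
    return update

def server_aggregate(worker_updates):
    """Server side: majority vote over the workers' binary updates
    (an assumed aggregation rule; averaging the signs is another option)."""
    return torch.sign(torch.stack(worker_updates).sum(dim=0))

# Hypothetical parameter update with the aggregated direction:
#   p.data.add_(agg + weight_decay * p.data, alpha=-lr)
```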