Michael Zhang Profile
Michael Zhang

@mzhangio

Followers
2K
Following
619
Media
115
Statuses
282

cs phd @hazyresearch, @StanfordAILab. robust + reliable LLMs.

Stanford, CA
Joined February 2016
@mzhangio
Michael Zhang
11 months
Ever wanted to scale subquadratic models up to 7B+ LLMs? But didn't want to pretrain billions of parameters on trillions of tokens? Then just for you, we're happy to share LoLCATs 😺. We show how to convert existing Transformers like Llama 3 8B & Mistral 7B into state-of-the-art
6
51
247
@mzhangio
Michael Zhang
3 months
but while we failed you in your objective, your sacrifice will not be forgotten. yes, we are developing new techniques and algorithms. and we'll continue to make regular progress, as we move through our model and method descent 🥲
0
0
1
@mzhangio
Michael Zhang
3 months
i'm sorry llama-3.2-3b-inst-new_method-r=4_1. you were too pocket-sized, too pretrained, for the tasks ahead of you. 🙃
1
0
1
@mzhangio
Michael Zhang
3 months
sometimes the hill doesn't get climbed 🫠. and TIL too many negative outcomes can trigger some pretty depressed self-awareness tokens
2
1
6
@mzhangio
Michael Zhang
4 months
We studied (and will now talk about) how these ideas let us:
- Take existing Transformer LLMs, and turn them into SoTA subquadratic LLMs
- Get SoTA quality, despite only training 0.2% of past methods' model parameters, with 0.4% of their training tokens (i.e., a 2500x boost in
0
7
31
@mzhangio
Michael Zhang
4 months
Ever wondered if instead of designing new efficient architectures from scratch, we could just learn to approximate softmax attentions? Then check out LoLCATs, perhaps today at ICLR!
- Poster: 3-5:30p, Hall 3 + 2B #220
1
7
66
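The tweet above describes the core trick of learning to approximate softmax attention rather than designing a new architecture. A minimal, hypothetical sketch of that idea (not the LoLCATs codebase): train a small feature map `phi` so that linear attention `phi(q) phi(k)^T v` matches a frozen softmax attention's outputs. The shapes, the ReLU feature map, and the training setup are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n, d = 32, 16  # sequence length, head dimension (illustrative)

def softmax_attn(q, k, v):
    # standard scaled dot-product attention (the "teacher")
    return torch.softmax(q @ k.T / d ** 0.5, dim=-1) @ v

# learnable feature map; ReLU keeps features nonnegative
phi = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU())

def linear_attn(q, k, v, eps=1e-6):
    fq, fk = phi(q), phi(k)
    # associativity: fq @ (fk.T @ v) avoids the n x n attention matrix
    return (fq @ (fk.T @ v)) / (fq @ fk.sum(0, keepdim=True).T + eps)

q, k, v = (torch.randn(n, d) for _ in range(3))
target = softmax_attn(q, k, v)

opt = torch.optim.Adam(phi.parameters(), lr=1e-2)
init_loss = torch.nn.functional.mse_loss(linear_attn(q, k, v), target).item()
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(linear_attn(q, k, v), target)
    loss.backward()
    opt.step()

print(f"attention-matching MSE: {init_loss:.4f} -> {loss.item():.4f}")
```

Because the `n x n` matrix is never materialized, inference cost becomes linear in sequence length, which is what makes the converted model subquadratic.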
@mzhangio
Michael Zhang
5 months
new thoughts from the adviser! Don't agree w all the aesthetic delivery, but I do believe it's better to have the world's AI be on familiar tech. I do wonder how US AI wins on consumer, and how we should probs do stuff here. China 🇨🇳 cares less about privacy + has the super-apps (++data).
@HazyResearch
hazyresearch
5 months
The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking
Tweet media one
0
0
8
@mzhangio
Michael Zhang
6 months
RT @bfspector: (1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding!…
0
71
0
@mzhangio
Michael Zhang
6 months
good read! @jdunnmon. yes china used llama for military*. but when the nation-states start post-training their own LLMs (for military or otherwise; not everyone wants to share their prompt data), seems like it'd be nice if they built on US AI infra? *
reuters.com
Top Chinese research institutions linked to the People's Liberation Army have used Meta's publicly available Llama model to develop an AI tool for potential military applications, according to three...
@jdunnmon
Jared Dunnmon
6 months
My latest on the importance of American leadership in open-source AI. The chip war and the open source race are merging. Given the stakes, second place is not an option. Huge thanks to the @ForeignAffairs team & all the friends who helped along the way.
0
0
1
@mzhangio
Michael Zhang
8 months
RT @w4nderlus7: Today, I’m excited to unveil a project that’s incredibly close to my heart. As a lifelong gamer, I’ve always dreamed of pus…
0
26
0
@mzhangio
Michael Zhang
8 months
RT @mzhangio: new AI-made music video (2024 wrapped OST). happy holidays from my (LLM-focused) GenAI family to yours ^_^
0
2
0
@mzhangio
Michael Zhang
8 months
new AI-made music video (2024 wrapped OST). happy holidays from my (LLM-focused) GenAI family to yours ^_^
0
2
17
@mzhangio
Michael Zhang
9 months
Links:
- Robustness / Correct-N-Contrast:
- LoooolCATs:
- Early demos of self-correction / applying to codegen (come chat at )
- Bonus: excited to hear takes on RL vs SFT on self-generated data (e.g.,
arxiv.org
Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention --...
0
0
2
@mzhangio
Michael Zhang
9 months
Slight delay, but around for NeurIPS workshops 🇨🇦. Thinking about all kinds of self-improving systems, robustness for agentic workflows:
- Learn from past mistakes (Correct-N-Contrast)
- Use their own attns to learn cheaper + faster alternatives (LoLCATs)
- Self-correct in new
1
2
19
@mzhangio
Michael Zhang
9 months
RT @NeelGuha: What's (1) a "drink of fresh fruit pureed with milk, yogurt, or ice cream" and (2) an unsupervised algorithm for test-time LL….
arxiv.org
Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs...
0
24
0
@mzhangio
Michael Zhang
9 months
yay more self-improving systems. use LLM to write kernels, make test-time compute cheaper. put those kernels back into the LLMs, so they can do more test-time compute + come up w even better kernels. repeat ???
@anneouyang
Anne Ouyang
9 months
Kernels are the kernel of deep learning 🙃. but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
2
3
29
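The "write kernels, put them back, repeat" loop above can be sketched abstractly. This is a purely hypothetical toy, not KernelBench's API: `propose_kernel` stands in for an LLM generating a candidate, `benchmark` stands in for correctness and timing checks, and only verified improvements are kept.

```python
import random

random.seed(0)

def propose_kernel(best_latency):
    # stub for "LLM writes a kernel": returns a candidate latency (ms)
    return best_latency * random.uniform(0.8, 1.2)

def benchmark(latency):
    # stub for the correctness + timing check; a real loop would
    # compile, validate, and time the generated kernel
    return latency

best = 10.0  # baseline kernel latency in ms (illustrative)
for _ in range(20):
    cand = benchmark(propose_kernel(best))
    if cand < best:  # keep only verified improvements
        best = cand

print(f"best latency after 20 rounds: {best:.2f} ms")
```

The point of the design is that `best` only ever decreases, so the loop is safe to run indefinitely; the speculative part is whether cheaper test-time compute actually yields better candidates over time.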
@mzhangio
Michael Zhang
9 months
new thoughts from the advisor, reflecting on building foundation models for X*
- we got bitter-lesson / llm-pilled in our own way
- many greats like math + rigor, but sometimes stupid just works better
- this “clarity” might challenge how we should think about LLMs
- our lab.
@HazyResearch
hazyresearch
9 months
An Unserious Person’s Take on Axiomatic Knowledge in the Era of Foundation Models. This post explains why we started the work that led to Evo (HyenaDNA), recently on the cover of Science. Thanks to a host of wonderful collaborators at @arcinstitute. It
0
0
20