Michael Zhang Profile
Michael Zhang

@mzhangio

Followers
2K
Following
619
Media
115
Statuses
282

cs phd @hazyresearch, @StanfordAILab. robust + reliable LLMs.

Stanford, CA
Joined February 2016
@mzhangio
Michael Zhang
11 months
Ever wanted to scale subquadratic models up to 7B+ LLMs? But didn't want to pretrain billions of parameters on trillions of tokens? Then just for you, we're happy to share LoLCATs 😺. We show how to convert existing Transformers like Llama 3 8B & Mistral 7B into state-of-the-art
6
51
247
@mzhangio
Michael Zhang
3 months
but while we failed you in your objective, your sacrifice will not be forgotten. yes, we are developing new techniques and algorithms. and we'll continue to make regular progress, as we move through our model and method descent 🥲
0
0
1
@mzhangio
Michael Zhang
3 months
i'm sorry llama-3.2-3b-inst-new_method-r=4_1. you were too pocket-sized, too pretrained, for the tasks ahead of you. 🙃
1
0
1
@mzhangio
Michael Zhang
3 months
sometimes the hill doesn't get climbed 🫠. and TIL too many negative outcomes can trigger some pretty depressed self-awareness tokens
2
1
6
@mzhangio
Michael Zhang
4 months
We studied (and will now talk about) how these ideas let us:
- Take existing Transformer LLMs, and turn them into SoTA subquadratic LLMs
- Get SoTA quality, despite only training 0.2% of past methods' model parameters, with 0.4% of their training tokens (i.e., a 2500x boost in
0
7
31
@mzhangio
Michael Zhang
4 months
Ever wondered if instead of designing new efficient architectures from scratch, we could just learn to approximate softmax attentions? Then check out LoLCATs, perhaps today at ICLR!
- Poster: 3-5:30p, Hall 3 + 2B #220
1
7
66
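The tweet above describes the core trick of learning to approximate softmax attention rather than designing a new architecture. A minimal, hypothetical sketch of that idea (not the LoLCATs codebase): train a small feature map `phi` so that linear attention `phi(q) phi(k)^T v` matches a frozen softmax attention's outputs. The shapes, the ReLU feature map, and the training setup are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n, d = 32, 16  # sequence length, head dimension (illustrative)

def softmax_attn(q, k, v):
    # standard scaled dot-product attention (the "teacher")
    return torch.softmax(q @ k.T / d ** 0.5, dim=-1) @ v

# learnable feature map; ReLU keeps features nonnegative
phi = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU())

def linear_attn(q, k, v, eps=1e-6):
    fq, fk = phi(q), phi(k)
    # associativity: fq @ (fk.T @ v) avoids the n x n attention matrix
    return (fq @ (fk.T @ v)) / (fq @ fk.sum(0, keepdim=True).T + eps)

q, k, v = (torch.randn(n, d) for _ in range(3))
target = softmax_attn(q, k, v)

opt = torch.optim.Adam(phi.parameters(), lr=1e-2)
init_loss = torch.nn.functional.mse_loss(linear_attn(q, k, v), target).item()
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(linear_attn(q, k, v), target)
    loss.backward()
    opt.step()

print(f"attention-matching MSE: {init_loss:.4f} -> {loss.item():.4f}")
```

Because the `n x n` matrix is never materialized, inference cost becomes linear in sequence length, which is what makes the converted model subquadratic.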
@mzhangio
Michael Zhang
5 months
new thoughts from the adviser! Don't agree w all the aesthetic delivery, but I do believe it's better to have the world's AI be on familiar tech. I do wonder how US AI wins on consumer, and how we should probs do stuff here. China 🇨🇳 cares less about privacy + has the super-apps (++data).
@HazyResearch
hazyresearch
5 months
The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking
Tweet media one
0
0
8
@mzhangio
Michael Zhang
6 months
RT @bfspector: (1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding!…
0
71
0
@mzhangio
Michael Zhang
6 months
good read! @jdunnmon. yes china used llama for military*. but when the nation-states start post-training their own LLMs (for military or otherwise; not everyone wants to share their prompt data), seems like it'd be nice if they built on US AI infra? *
reuters.com
Top Chinese research institutions linked to the People's Liberation Army have used Meta's publicly available Llama model to develop an AI tool for potential military applications, according to three...
@jdunnmon
Jared Dunnmon
6 months
My latest on the importance of American leadership in open-source AI. The chip war and the open source race are merging. Given the stakes, second place is not an option. Huge thanks to the @ForeignAffairs team & all the friends who helped along the way.
0
0
1
@mzhangio
Michael Zhang
8 months
RT @w4nderlus7: Today, I’m excited to unveil a project that’s incredibly close to my heart. As a lifelong gamer, I’ve always dreamed of pus…
0
26
0
@mzhangio
Michael Zhang
8 months
RT @mzhangio: new AI-made music video (2024 wrapped OST). happy holidays from my (LLM-focused) GenAI family to yours ^_^
0
2
0
@mzhangio
Michael Zhang
8 months
new AI-made music video (2024 wrapped OST). happy holidays from my (LLM-focused) GenAI family to yours ^_^
0
2
17
@mzhangio
Michael Zhang
9 months
Links:
- Robustness / Correct-N-Contrast:
- LoooolCATs:
- Early demos of self-correction / applying to codegen (come chat at )
- Bonus: excited to hear takes on RL vs SFT on self-generated data (e.g.,
arxiv.org
Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention --...
0
0
2
@mzhangio
Michael Zhang
9 months
Slight delay, but around for NeurIPS workshops 🇨🇦. Thinking about all kinds of self-improving systems, robustness for agentic workflows:
- Learn from past mistakes (Correct-N-Contrast)
- Use their own attns to learn cheaper + faster alternatives (LoLCATs)
- Self-correct in new
1
2
19
@mzhangio
Michael Zhang
9 months
RT @NeelGuha: What's (1) a "drink of fresh fruit pureed with milk, yogurt, or ice cream" and (2) an unsupervised algorithm for test-time LL….
arxiv.org
Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs...
0
24
0
@mzhangio
Michael Zhang
9 months
yay more self-improving systems. use LLM to write kernels, make test-time compute cheaper. put those kernels back into the LLMs, so they can do more test-time compute + come up w even better kernels. repeat ???
@anneouyang
Anne Ouyang
9 months
Kernels are the kernel of deep learning 🙃. but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
2
3
29
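The "write kernels, put them back, repeat" loop above can be sketched abstractly. This is a purely hypothetical toy, not KernelBench's API: `propose_kernel` stands in for an LLM generating a candidate, `benchmark` stands in for correctness and timing checks, and only verified improvements are kept.

```python
import random

random.seed(0)

def propose_kernel(best_latency):
    # stub for "LLM writes a kernel": returns a candidate latency (ms)
    return best_latency * random.uniform(0.8, 1.2)

def benchmark(latency):
    # stub for the correctness + timing check; a real loop would
    # compile, validate, and time the generated kernel
    return latency

best = 10.0  # baseline kernel latency in ms (illustrative)
for _ in range(20):
    cand = benchmark(propose_kernel(best))
    if cand < best:  # keep only verified improvements
        best = cand

print(f"best latency after 20 rounds: {best:.2f} ms")
```

The point of the design is that `best` only ever decreases, so the loop is safe to run indefinitely; the speculative part is whether cheaper test-time compute actually yields better candidates over time.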
@mzhangio
Michael Zhang
9 months
new thoughts from the advisor, reflecting on building foundation models for X*
- we got bitter-lesson / llm-pilled in our own way
- many greats like math + rigor, but sometimes stupid just works better
- this “clarity” might challenge how we should think about LLMs
- our lab.
@HazyResearch
hazyresearch
9 months
An Unserious Person’s Take on Axiomatic Knowledge in the Era of Foundation Models. This post explains why we started the work that led to Evo (HyenaDNA), recently on the cover of Science. Thanks to a host of wonderful collaborators at @arcinstitute. It
0
0
20