
Michael Zhang
@mzhangio
Followers
2K
Following
619
Media
115
Statuses
282
cs phd @hazyresearch, @StanfordAILab. robust + reliable LLMs.
Stanford, CA
Joined February 2016
Ever wanted to scale subquadratic models up to 7B+ LLMs? But didn't want to pretrain billions of parameters on trillions of tokens? Then, just for you, we're happy to share LoLCATs 😺. We show how to convert existing Transformers like Llama 3 8B & Mistral 7B into state-of-the-art
6
51
247
new thoughts from the adviser! Don't agree w/ all the aesthetic delivery, but I do believe it's better to have the world's AI be on familiar tech. I do wonder how US AI wins on consumer, and how we should probably do stuff here. China 🇨🇳 cares less about privacy + has the super-apps (++data).
The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking
0
0
8
RT @bfspector: (1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding!…
0
71
0
good read! @jdunnmon. Yes, China used Llama for military*. But when nation-states start post-training their own LLMs (for military or otherwise; not everyone wants to share their prompt data), seems like it'd be nice if they built on US AI infra? *
reuters.com
Top Chinese research institutions linked to the People's Liberation Army have used Meta's publicly available Llama model to develop an AI tool for potential military applications, according to three...
My latest on the importance of American leadership in open-source AI. The chip war and the open source race are merging. Given the stakes, second place is not an option. Huge thanks to the @ForeignAffairs team & all the friends who helped along the way.
0
0
1
RT @w4nderlus7: Today, I’m excited to unveil a project that’s incredibly close to my heart. As a lifelong gamer, I’ve always dreamed of pus…
0
26
0
Links:
- Robustness / Correct-N-Contrast:
- LoooolCATs:
- Early demos of self-correction / applying to codegen (come chat at )
- Bonus: excited to hear takes on RL vs SFT on self-generated data (e.g.,
arxiv.org
Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention --...
0
0
2
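The abstract above describes linearization: swapping a Transformer's quadratic softmax attention for a subquadratic analog like linear attention. A minimal NumPy sketch of that swap, with a generic feature map `phi` chosen here for illustration (the actual maps used in the linked work may differ):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix -- O(n^2) in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Linear attention: replace exp(q . k) with phi(q) . phi(k), then reorder the
    # matmuls as phi(Q) @ (phi(K).T @ V), which is O(n) in sequence length.
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                  # (d, d_v): size independent of n
    Z = Qf @ Kf.sum(axis=0)        # per-query normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
```

The reordering is the whole trick: the same three matrices go in, but the (n, n) attention matrix is never formed.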
RT @NeelGuha: What's (1) a "drink of fresh fruit pureed with milk, yogurt, or ice cream" and (2) an unsupervised algorithm for test-time LL…
arxiv.org
Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs...
0
24
0
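The abstract above observes that the best LLM varies by input. As a toy illustration of per-input routing (not the linked paper's algorithm), one can send each input embedding to the model whose exemplar embeddings it is closest to. The model names, exemplar vectors, and embedding dimension below are all hypothetical:

```python
import numpy as np

# Hypothetical exemplar embeddings: inputs each model is known to handle well.
EXEMPLARS = {
    "model_a": np.array([[1.0, 0.0], [0.9, 0.1]]),  # e.g. stronger on one task family
    "model_b": np.array([[0.0, 1.0], [0.1, 0.9]]),  # e.g. stronger on another
}

def route(x):
    # Pick the model whose nearest exemplar (cosine similarity) best matches x.
    x = x / np.linalg.norm(x)
    best, best_sim = None, -np.inf
    for name, E in EXEMPLARS.items():
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
        sim = float((E @ x).max())
        if sim > best_sim:
            best, best_sim = name, sim
    return best
```

This is just nearest-neighbor routing; the point is that routing needs no labels at test time, only a similarity signal.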
yay, more self-improving systems: use an LLM to write kernels, make test-time compute cheaper, put those kernels back into the LLM so it can do more test-time compute + come up w/ even better kernels. repeat ???
Kernels are the kernel of deep learning 🙃. But writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
2
3
29
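Evaluating an LLM-generated kernel boils down to two checks: does it match a reference implementation, and is it faster? A toy harness sketching that idea (this is not KernelBench's actual API; the softmax functions and tolerances here are stand-ins):

```python
import time
import numpy as np

def reference_softmax(x):
    # Numerically stable row-wise softmax: the ground-truth implementation.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def candidate_softmax(x):
    # Stand-in for an LLM-generated "kernel" under evaluation.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def evaluate(candidate, reference, x, atol=1e-6, iters=50):
    # 1) correctness against the reference, 2) wall-clock speedup ratio.
    ok = np.allclose(candidate(x), reference(x), atol=atol)
    def timeit(fn):
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(x)
        return time.perf_counter() - t0
    return ok, timeit(reference) / timeit(candidate)

x = np.random.default_rng(0).normal(size=(256, 256))
correct, speedup = evaluate(candidate_softmax, reference_softmax, x)
```

A real harness would additionally sweep input shapes and run on-device, but correctness-then-speedup is the core loop.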
new thoughts from the advisor, reflecting on building foundation models for X*:
- we got bitter-lesson / llm-pilled in our own way
- many greats like math + rigor, but sometimes stupid just works better
- this “clarity” might challenge how we should think about LLMs
- our lab
An Unserious Person’s Take on Axiomatic Knowledge in the Era of Foundation Models. This post explains why we started the work that led to Evo (HyenaDNA), recently on the cover of Science–thanks to a host of wonderful collaborators at @arcinstitute. It
0
0
20