Joey Gonzalez
@profjoeyg
Followers
5K
Following
1K
Media
48
Statuses
727
Professor @UCBerkeley, co-director of @LMSysorg, and co-founder @RunLLM
Berkeley, CA
Joined June 2011
Turn your laptop into a powerful RAG system! LEANN can index and search through millions of documents while using 97% less storage than traditional solutions without accuracy loss. LEANN achieves this through graph-based selective recomputation with high-degree preserving
17
137
891
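A minimal sketch of the selective-recomputation idea described above, assuming a plain neighbor-graph index: only high-degree hub embeddings are kept cached, and every other node is re-embedded from its raw text during traversal. The `embed_text` helper and the dict-based graph/text stores are placeholders, not LEANN's actual API.

```python
import heapq
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Placeholder for an embedding-model call (e.g. a sentence transformer)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def search(query, graph, texts, hub_cache, entry, beam=8, steps=50):
    """Best-first search over a neighbor graph, recomputing embeddings on the fly.

    graph: node -> list of neighbor nodes; texts: node -> raw text;
    hub_cache: cached embeddings for high-degree hub nodes only.
    """
    q = embed_text(query)

    def score(node):
        vec = hub_cache.get(node)          # hit only for cached hub nodes
        if vec is None:
            vec = embed_text(texts[node])  # selective recomputation for the rest
        return float(q @ vec)

    visited = {entry}
    frontier = [(-score(entry), entry)]    # min-heap over negated similarity
    best = []
    for _ in range(steps):
        if not frontier:
            break
        neg_sim, node = heapq.heappop(frontier)
        heapq.heappush(best, (neg_sim, node))
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                heapq.heappush(frontier, (-score(nbr), nbr))
        frontier = heapq.nsmallest(beam, frontier)  # keep only the best beam
        heapq.heapify(frontier)
    return [node for _, node in heapq.nsmallest(5, best)]
```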
Meet Slingshots // One. This inaugural batch includes leading-edge researchers advancing the science and practice of AI - with benchmarks, frameworks, and agents that ship real impact into the world. We're honored to support research from: @alexgshaw @Mike_A_Merrill
2
17
62
Trillion dollar data center buildouts are all the rage. Why is all of this kicking off at once? The infrastructure investment we're seeing tells us a lot about the future of inference and the economics of intelligence. @profjoeyg and I break down why intelligence might not be
1
3
5
@nicoalbanese10 yeah it works beautifully in @Letta_AI, since it's basically post-training of Claude to be better at "MemGPT"/Letta-style context engineering. Great example of better post-training (Claude) lifting the performance of an existing harness (@Letta_AI) https://t.co/s8OVJ7uT8p
3
4
20
Risk paralysis run amok. "We are concerned that a culture of risk aversion limits creative problem solving, inhibits collaboration and interferes with the systemic change needed to reduce bureaucracy" -- UC Berkeley Task force on Reducing Bureaucratic Burden
Unbelievable: the famed Berkeley Math Circle is being forced to shut down due to a bureaucratic requirement where a guest lecturer giving an hour-long lesson needs to be officially fingerprinted. How is fingerprinting even still a thing in the 21st century? Chancellor Lyons
6
14
87
The Sky's Fun Committee, representing the ppl of Sky, just dropped the new lab theme: Black Pink x Halloween. We have: - Gru & the minions - kpop ???
8
8
52
Humans handle dynamic situations easily, what about models? Turns out, they break in three distinct ways: Force Stop → Reasoning leakage (won't stop); Speedup → Panic (rushed answers); Info Updates → Self-doubt (reject updates). Check out https://t.co/wKrnsMkiFY
5
21
68
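A hypothetical harness sketch for the three interventions listed above (force stop, speedup, info updates). The `chat` helper is a stand-in for any reasoning-model API call; this is not the paper's evaluation code.

```python
def chat(messages, max_tokens=None):
    """Stand-in for a reasoning-model API call; a real harness would query an LLM here."""
    return "<model response>"

def force_stop(question, partial_reasoning):
    # Interrupt mid-thought and demand an immediate answer from a partial trace.
    return chat([
        {"role": "user", "content": question},
        {"role": "assistant", "content": partial_reasoning},
        {"role": "user", "content": "Stop thinking now. Give your final answer only."},
    ])

def speedup(question, token_budget=128):
    # Impose a tight token budget to probe for rushed, "panicked" answers.
    return chat([{"role": "user", "content": question}], max_tokens=token_budget)

def info_update(question, partial_reasoning, new_fact):
    # Inject an updated fact mid-reasoning and check whether the model accepts it.
    return chat([
        {"role": "user", "content": question},
        {"role": "assistant", "content": partial_reasoning},
        {"role": "user", "content": f"Update: {new_fact}. Revise your answer accordingly."},
    ])
```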
AI coding tools are all the rage, but very few people are thinking about Day 2: What happens when the code generated by Cursor/Claude Code/etc. goes into production? Maintaining code is often costlier than generating it; here's what you'll need to consider before you drive
Your SRE team is about to go bankrupt, and AI coding tools are why. Every CTO celebrates the productivity gains: 2× throughput, 50% faster development. But AI-generated code enters production with zero ownership. Reading code is not the same as writing it. The hidden costs:
0
1
4
What are some good examples of opinionated AI products where the opinion helps to meaningfully define the product?
"Build opinionated products" is not new advice, but it's more important than ever. If you're not careful, your agents can be everything to everyone. That might sound wonderful at first, but it's going to cause you headaches later. Here's why π
1
0
0
For the most part, everyone's use of AI today is synchronous and interactive... but it doesn't have to be that way. As agents proliferate, we'll see more and more agents working in the background, doing things for us that we didn't want to bother doing ourselves. The most obvious
1
4
5
What's wrong with this picture? We are still managing GPUs like _old_ mainframes. It's time to start sharing!
End the GPU Cost Crisis Today!!! Headaches with LLMs locking a whole GPU but leaving capacity idle? Frustrated by your cluster's low utilization? We're launching kvcached, the first library for elastic GPU sharing across LLMs. https://t.co/3BC7B6s2EX Why it matters:
1
1
15
End the GPU Cost Crisis Today!!! Headaches with LLMs locking a whole GPU but leaving capacity idle? Frustrated by your cluster's low utilization? We're launching kvcached, the first library for elastic GPU sharing across LLMs. https://t.co/3BC7B6s2EX Why it matters:
9
53
196
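A toy sketch of the elastic-sharing idea described in the kvcached announcement above: a shared pool of fixed-size KV-cache blocks that co-located models borrow on demand and return when requests finish, instead of each model pre-reserving a whole GPU. The class and names here are illustrative, not kvcached's actual API.

```python
class SharedKVPool:
    """Shared pool of fixed-size KV-cache blocks for models co-located on one GPU."""

    def __init__(self, total_blocks: int):
        self.free = list(range(total_blocks))   # physical block ids not in use
        self.owned = {}                         # model name -> set of block ids

    def allocate(self, model: str, n_blocks: int):
        if n_blocks > len(self.free):
            raise MemoryError("pool exhausted: queue or preempt the request")
        blocks = [self.free.pop() for _ in range(n_blocks)]
        self.owned.setdefault(model, set()).update(blocks)
        return blocks

    def release(self, model: str, blocks):
        # Freed blocks return to the shared pool, immediately usable by other models.
        self.owned[model].difference_update(blocks)
        self.free.extend(blocks)

pool = SharedKVPool(total_blocks=4096)
a = pool.allocate("llama-8b", 256)   # burst of traffic to one model
b = pool.allocate("qwen-7b", 64)     # a second model shares the same GPU
pool.release("llama-8b", a)          # idle capacity goes back to the pool
```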
AI coding tools are enabling anyone to create and modify applications faster than ever. I fear we are about to see an explosion of poorly scoped and rapidly "improving" applications running and interacting on infrastructure designed in a bygone era where software was
AI coding tools are all the rage, but very few people are thinking about Day 2: What happens when the code generated by Cursor/Claude Code/etc. goes into production? Maintaining code is often costlier than generating it; here's what you'll need to consider before you drive
2
3
12
Pair programming with your coding agent would be cool, right? But are the current models ready for this challenge? Not quite. In our recent work, we evaluate reasoning models under "dynamic" world settings. Check it out and reach out to chat!
Humans handle dynamic situations easily, what about models? Turns out, they break in three distinct ways: Force Stop → Reasoning leakage (won't stop); Speedup → Panic (rushed answers); Info Updates → Self-doubt (reject updates). Check out https://t.co/wKrnsMkiFY
0
1
9
... but wait, maybe this is an interesting tweet. </think> Have you ever wondered what happens if you force a model to stop thinking? It turns out, models are pretty good at answering with partial thoughts but occasionally they will cleverly return to contemplating in the
Humans handle dynamic situations easily, what about models? Turns out, they break in three distinct ways: Force Stop → Reasoning leakage (won't stop); Speedup → Panic (rushed answers); Info Updates → Self-doubt (reject updates). Check out https://t.co/wKrnsMkiFY
0
2
17
Excited to share our new research: vAttention - Verified Sparse Attention. Sparse attention with provable quality guarantees for LLMs. Full paper: https://t.co/pvOSEI8E7J GitHub: xAlg-ai/sparse-attention-hub A thread:
arxiv.org
State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based...
1
9
15
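For reference, a bare-bones top-k sparse decoding attention in NumPy, the baseline family the abstract mentions. vAttention's contribution is the verified quality guarantee layered on top of methods like this; the sketch below does not implement that guarantee.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=64):
    """q: (d,); K, V: (seq, d). Attend only to the k keys with the highest scores."""
    scores = K @ q / np.sqrt(q.shape[-1])        # (seq,) attention logits
    k = min(k, scores.shape[0])
    idx = np.argpartition(scores, -k)[-k:]       # indices of the top-k logits
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                                 # softmax over the selected keys
    return w @ V[idx]                            # (d,) attention output

q = np.random.randn(128)
K = np.random.randn(4096, 128)
V = np.random.randn(4096, 128)
out = topk_sparse_attention(q, K, V, k=64)
```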
At @Berkeley_EECS we always work to keep our curriculum fresh. Our intro ML course CS 189 just got a drastic makeover this semester (thanks @profjoeyg @NargesNorouzi!) and now includes ~12 lectures on e.g. Adam, PyTorch, various NN architectures, LLMs, and more (see
eecs189.org
A week-to-week description of the content covered in the course.
Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up, the credential arms race is over. Self-learning is the only way now.
21
95
845
Over the last year, AI companies have either moved towards building broader products or narrower ones. Which one's better? @profjoeyg and I have some opinions: The post this week explores why narrowly-scoped agents are beginning to deliver more value than generic platforms.
1
3
6
Paper: https://t.co/NU2mNypAzz Code: https://t.co/SZYOxcd9M4 This project was co-led with @aczhu1326 and advised by @matei_zaharia, @AlexGDimakis, and @profjoeyg. Reach out to @aczhu1326 and me if you want to chat about interesting applications! (8/n)
github.com
How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models - az1326/advisor-models
0
2
23