
Joey
@joeyism101
Followers: 108 · Following: 11K · Media: 28 · Statuses: 1K
Senior ML Engineer
Joined December 2014
Did Stanford just kill LLM fine-tuning? This new paper from Stanford, called Agentic Context Engineering (ACE), proves something wild: you can make models smarter without changing a single weight. Here's how it works: Instead of retraining the model, ACE evolves the context
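The loop this tweet describes (evolving the context rather than the weights) can be pictured with a small sketch. Everything below is an illustrative assumption, not the paper's code: generate_answer and critique are hypothetical stand-ins for LLM calls, and the "playbook" is the evolving context.

# Illustrative sketch of ACE-style context evolution (hypothetical stubs, weights stay frozen).

def generate_answer(task: str, playbook: str) -> str:
    # Stand-in for an LLM call that conditions on the evolving playbook.
    return f"answer({task})"

def critique(task: str, answer: str) -> str:
    # Stand-in for a reflection step that extracts a reusable lesson.
    return f"lesson learned on {task}"

def evolve_context(tasks, playbook=""):
    """No gradient updates; only the context (playbook) changes between tasks."""
    for task in tasks:
        answer = generate_answer(task, playbook)
        lesson = critique(task, answer)
        playbook += f"\n- {lesson}"   # accumulate strategies instead of updating weights
    return playbook

print(evolve_context(["task A", "task B"]))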
New paper from @Google is a major memory breakthrough for AI agents. ReasoningBank helps an AI agent improve during use by learning from its wins and mistakes. To succeed in real-world settings, LLM agents must stop making the same mistakes. ReasoningBank memory framework
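A minimal sketch of the idea of an experience memory that stores both wins and mistakes and retrieves the most relevant ones for a new task. The class and the keyword-overlap retrieval below are my assumptions, not the ReasoningBank implementation.

# Toy success/failure memory for an agent (hypothetical, not the paper's code).
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    task: str
    strategy: str
    succeeded: bool

@dataclass
class ReasoningMemory:
    items: list = field(default_factory=list)

    def add(self, task, strategy, succeeded):
        # Store both wins and mistakes; failures become "what to avoid" notes.
        self.items.append(MemoryItem(task, strategy, succeeded))

    def retrieve(self, task, k=3):
        # Toy relevance score: word overlap with past task descriptions.
        words = set(task.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(words & set(m.task.lower().split())),
            reverse=True,
        )
        return scored[:k]

mem = ReasoningMemory()
mem.add("book a flight", "check dates before paying", succeeded=True)
mem.add("book a hotel", "skipped the cancellation policy", succeeded=False)
for m in mem.retrieve("book a cheap flight"):
    print(("DO: " if m.succeeded else "AVOID: ") + m.strategy)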
You're paying more. Getting less. And have no idea what's working. Adora launches today at @advertisingweek NYC, giving enterprise marketers transparency and control over their performance marketing. #AWNY
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4
Some random thoughts I've been having about video world models / long video generation since working on Mixture of Contexts (whose title could also be "Learnable Sparse Attention for Long Video Generation"). Semi-long post alert: 1. Learnable sparse attention is still underrated
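A toy version of what "learnable sparse attention over a long context" can look like: route each query to a few relevant context chunks and attend only inside those. The shapes, the mean-pooled chunk key, and the top-k routing are my assumptions for illustration, not the Mixture of Contexts implementation.

# Toy top-k chunk routing for attention over a long context (shapes assumed, not the MoC code).
import torch
import torch.nn.functional as F

def chunked_sparse_attention(q, k, v, chunk_size=16, top_k=2):
    # q: (Tq, d); k, v: (Tk, d) with Tk divisible by chunk_size for simplicity.
    d = q.shape[-1]
    k_chunks = k.view(-1, chunk_size, d)          # (C, chunk, d)
    v_chunks = v.view(-1, chunk_size, d)
    chunk_keys = k_chunks.mean(dim=1)             # (C, d): one summary key per chunk
    scores = q @ chunk_keys.T                     # (Tq, C) query-to-chunk relevance
    idx = scores.topk(top_k, dim=-1).indices      # which chunks each query attends to
    out = torch.zeros_like(q)
    for i in range(q.shape[0]):
        ks = k_chunks[idx[i]].reshape(-1, d)      # gather only the selected chunks
        vs = v_chunks[idx[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ ks.T / d ** 0.5, dim=-1)
        out[i] = attn @ vs
    return out

q = torch.randn(4, 32); k = torch.randn(64, 32); v = torch.randn(64, 32)
print(chunked_sparse_attention(q, k, v).shape)    # torch.Size([4, 32])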
LLMs just learned how to explain their own thoughts. Not only do they generate answers, they can now describe the internal processes that led to those answers... and get better at it with training. We're officially entering the era of self-interpretable AI. Models aren't just
One of the best papers of recent weeks. The big takeaway: scaling up model size doesn't just make models smarter in terms of knowledge, it makes them last longer on multi-step tasks, which is what really matters for agents. Shows that small models can usually do one step
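The "last longer on multi-step tasks" point is essentially per-step reliability compounding. A quick back-of-envelope check with made-up numbers (not figures from the paper):

# Back-of-envelope: per-step accuracy compounds over a long task (illustrative numbers only).
for per_step in (0.90, 0.99, 0.999):
    for steps in (10, 100):
        print(f"per-step {per_step:.3f}, {steps:3d} steps -> task success {per_step ** steps:.2f}")
# A 0.99 per-step model still fails most 100-step tasks (~0.37); 0.999 mostly survives (~0.90).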
Another paper claiming a really BIG result: the first method to achieve 99.9% on AIME 2025 with open-source models! DeepConf uses a model's own token confidence to keep only its strongest reasoning, hitting that score with GPT-OSS-120B while cutting tokens by up to 84.7% compared to standard
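The general recipe the tweet points at, filter sampled reasoning traces by the model's own token confidence and then vote, can be sketched in a few lines. The mean-logprob score and the keep fraction below are my assumptions, not DeepConf's exact recipe.

# Sketch: keep only high-confidence reasoning traces, then majority-vote over the survivors.
from collections import Counter

def confidence(trace_logprobs):
    # Mean token log-probability as a proxy for how confident the model was in a trace.
    return sum(trace_logprobs) / len(trace_logprobs)

def vote_with_confidence(traces, keep_frac=0.5):
    # traces: list of (final_answer, token_logprobs) from independent samples.
    ranked = sorted(traces, key=lambda t: confidence(t[1]), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_frac))]
    return Counter(ans for ans, _ in kept).most_common(1)[0][0]

samples = [
    ("42", [-0.1, -0.2, -0.1]),   # confident trace
    ("42", [-0.3, -0.2, -0.4]),
    ("17", [-2.5, -3.0, -2.8]),   # low-confidence trace, gets filtered out
]
print(vote_with_confidence(samples))  # 42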
Nice work, concurrent to ASFT, that tries to diffuse in pixel space by decoding its coordinates instead. We may be near the death of latent diffusion.
I never knew how beautifully connected Softmax and Cross-entropy were till I read this.
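The connection being admired here: the gradient of cross-entropy composed with softmax collapses to probabilities minus the one-hot target. A quick numeric check:

# d/dz CE(softmax(z), y) = softmax(z) - onehot(y): verify the closed form numerically.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
y = 1                                   # true class index
p = softmax(z)
analytic = p - np.eye(3)[y]             # the famous closed-form gradient

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(3):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps; zm[i] -= eps
    # CE(z) = -log softmax(z)[y]; central finite difference along z_i
    numeric[i] = (-np.log(softmax(zp)[y]) + np.log(softmax(zm)[y])) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True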
found a cool yt channel where someone dumbs down complex ML papers. absolute gold.
Flow Matching (FM) is one of the hottest ideas in generative AI - and it's everywhere at #ICML2025. But what is it? And why is it so elegant? This thread is an animated, intuitive intro to (Variational) Flow Matching - no dense math required. Let's dive in!
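A bare-bones version of the training objective behind threads like this: sample noise x0 and data x1, interpolate along a straight line, and regress a network onto the constant velocity x1 - x0. This is the conditional / rectified-flow flavor of flow matching, a sketch rather than the thread's exact variant.

# Minimal conditional flow matching: regress v_theta(x_t, t) onto x1 - x0, then Euler-integrate.
import torch, torch.nn as nn

dim = 2
vel_net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(vel_net.parameters(), lr=1e-3)

def sample_data(n):
    # Toy target distribution: two Gaussian blobs.
    centers = torch.tensor([[2.0, 2.0], [-2.0, -2.0]])
    return centers[torch.randint(0, 2, (n,))] + 0.2 * torch.randn(n, dim)

for step in range(2000):
    x1 = sample_data(256)                  # data sample
    x0 = torch.randn(256, dim)             # noise sample
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1             # straight-line probability path
    target_v = x1 - x0                     # constant velocity along that path
    pred_v = vel_net(torch.cat([xt, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 with Euler steps.
x = torch.randn(5, dim)
for i in range(50):
    t = torch.full((5, 1), i / 50)
    x = x + (1 / 50) * vel_net(torch.cat([x, t], dim=-1))
print(x)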
This is a solid 29-video playlist on how to build DeepSeek from scratch. It covers theory and code, from the very foundations to advanced topics: self-attention, multi-head [latent] attention, GQA, how DeepSeek rewrote quantization, etc. One video a day and you'll finish in a month.
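One of the pieces that playlist covers, grouped-query attention, fits in a few lines: several query heads share each KV head. The shapes below are assumptions for illustration, not DeepSeek's code.

# Grouped-query attention sketch: 8 query heads share 2 KV heads.
import torch
import torch.nn.functional as F

B, T, n_q_heads, n_kv_heads, d_head = 1, 6, 8, 2, 16
q = torch.randn(B, n_q_heads, T, d_head)
k = torch.randn(B, n_kv_heads, T, d_head)
v = torch.randn(B, n_kv_heads, T, d_head)

# Each group of n_q_heads // n_kv_heads query heads reuses the same K/V head.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)      # (B, n_q_heads, T, d_head)
v = v.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / d_head ** 0.5
out = F.softmax(scores, dim=-1) @ v
print(out.shape)                            # torch.Size([1, 8, 6, 16])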
What if an LLM could update its own weights? Meet SEAL: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model's downstream performance as the reward.
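A very rough picture of the outer loop, with every function a placeholder stub of my own; the real SEAL trains the self-edit policy with RL and performs actual finetuning, which this toy does not.

# Hypothetical outer loop for SEAL-style self-edits (all functions are placeholder stubs).
import random

def propose_self_edit(model, passage):
    # Stand-in for the model writing its own training data (e.g. restated facts, QA pairs).
    return [f"synthetic example derived from: {passage}"]

def finetune(model, examples):
    # Stand-in for a lightweight weight update on the self-generated data.
    return model + len(examples)            # the "model" is just a number in this toy

def downstream_score(model):
    # Stand-in for evaluation on held-out questions about the new input.
    return model + random.random()

model = 0
for passage in ["new document A", "new document B"]:
    edits = propose_self_edit(model, passage)
    updated = finetune(model, edits)
    reward = downstream_score(updated) - downstream_score(model)
    # In SEAL this reward trains the self-edit policy with RL; here we just keep good updates.
    if reward > 0:
        model = updated
print(model)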
Google open-sources its DeepSearch stack. Get started with building fullstack agents using Gemini 2.5 and LangGraph. Overview: the project has a React frontend and a FastAPI backend built on LangGraph. The agent turns user input into search queries with Gemini, fetches web results
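The rough shape of that agent graph in LangGraph. The node bodies below are placeholders of mine, not the project's actual Gemini and search calls; only the graph wiring uses the real LangGraph API (requires `pip install langgraph`).

# Sketch of a query -> search -> answer agent graph (placeholder node bodies).
from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    queries: List[str]
    results: List[str]
    answer: str

def generate_queries(state: AgentState):
    # Real project: Gemini turns the user question into search queries.
    return {"queries": [f"search: {state['question']}"]}

def web_search(state: AgentState):
    # Real project: fetch web results for each query.
    return {"results": [f"result for {q}" for q in state["queries"]]}

def write_answer(state: AgentState):
    # Real project: Gemini synthesizes an answer from the gathered results.
    return {"answer": " / ".join(state["results"])}

graph = StateGraph(AgentState)
graph.add_node("generate_queries", generate_queries)
graph.add_node("web_search", web_search)
graph.add_node("write_answer", write_answer)
graph.set_entry_point("generate_queries")
graph.add_edge("generate_queries", "web_search")
graph.add_edge("web_search", "write_answer")
graph.add_edge("write_answer", END)

app = graph.compile()
print(app.invoke({"question": "what is langgraph?"}))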
An AI agent upgraded its own tools and doubled its bug-fix score. Darwin-style search plus Gödel-style self-reference cracked coding tasks: pass rate jumps from 20% to 50% on SWE-bench-Verified. Darwin Gödel Machine (DGM) is a coding agent that rewrites its own code, tests
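The "Darwin-style search" part, keep an archive of agent variants, mutate them, benchmark, and keep what works, looks roughly like this toy loop. mutate and benchmark_score are stand-ins of mine, not the DGM code.

# Toy Darwin-style archive search over agent variants (stand-in functions).
import random

def mutate(agent_code: str) -> str:
    # Stand-in for "the agent rewrites its own code" (e.g. adds a tool or edits a prompt).
    return agent_code + f" + patch{random.randint(0, 99)}"

def benchmark_score(agent_code: str) -> float:
    # Stand-in for running the variant on coding tasks and measuring pass rate.
    return random.random()

archive = {"base-agent": benchmark_score("base-agent")}
for _ in range(20):
    parent = random.choice(list(archive))       # sample from the whole archive, not just the best
    child = mutate(parent)
    archive[child] = benchmark_score(child)

best = max(archive, key=archive.get)
print(best, round(archive[best], 2))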
KV caching in LLMs, clearly explained (with visuals):
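The idea in one loop: at each decode step, compute K/V only for the new token, append them to a cache, and attend over the cached history instead of recomputing it. A single-head, no-batch sketch with assumed dimensions:

# KV caching: reuse past keys/values so each new token attends without recomputation.
import torch
import torch.nn.functional as F

d = 16
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
cache_k, cache_v = [], []

def decode_step(x_new):
    # x_new: (1, d) embedding of the newly generated token.
    q = x_new @ Wq
    cache_k.append(x_new @ Wk)       # only the NEW token's K/V are computed...
    cache_v.append(x_new @ Wv)
    K = torch.cat(cache_k)           # ...past ones are read straight from the cache
    V = torch.cat(cache_v)
    attn = F.softmax(q @ K.T / d ** 0.5, dim=-1)
    return attn @ V

for t in range(5):                   # 5 decode steps; the cache grows by one entry per step
    out = decode_step(torch.randn(1, d))
print(out.shape, len(cache_k))       # torch.Size([1, 16]) 5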
LLMs Get Lost in Multi-turn Conversation. The cat is out of the bag. Pay attention, devs. This is one of the most common issues when building with LLMs today. Glad there is now a paper sharing insights. Here are my notes: