David Sweet (@phinance99)
Manhattan, NY · Joined January 2008
Followers: 212 · Following: 4K · Media: 174 · Statuses: 3K
Learn to experiment: https://t.co/F9l8CmY7xu
Keep agent code tight: cargo install kiss-ai
The problem with attempts to get LLMs to be "accurate" just by talking to themselves is that this simply isn't how knowledge is acquired. The LLMs will behave like scientists if you tell them to. Demand rigor, and you'll get rigor. Demand navel-gazing, and you'll get ... idk,
Ask ChatGPT a complex question and you'll get a confident, well-reasoned answer. Then type, "Are you sure?" Watch it completely reverse its position. Ask again. It flips back. By the third round, it usually acknowledges you're testing it, which is somehow worse. It knows what's
I've witnessed the development of computers, the Internet, ML & now AI, the end of the Cold War, the development of reusable space vehicles, GPS (a technological application of general relativity!), and soon self-driving cars and humanoid robots and abundant energy. For
We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:
When an agent is optimizing something (running time, out of sample performance) or just hunting for a bug and it seems to be having trouble, try a little creativity. I am always pleasantly surprised by what it comes up with. Ask her to generate 10 ideas, treat them as
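A toy sketch of that workflow — ask for 10 ideas, then treat them as ranked candidates to try. The `ask_llm` helper and its canned ideas are hypothetical stand-ins for a real model call:

```python
def ask_llm(prompt):
    # Hypothetical stand-in for a real model call; returns canned ideas.
    return [f"idea {i}: vary parameter {i}" for i in range(10)]

def brainstorm_and_rank(problem, score, keep=3):
    """Ask for 10 ideas, score each one, and keep the best few as experiments."""
    ideas = ask_llm(f"Generate 10 ideas for: {problem}")
    return sorted(ideas, key=score, reverse=True)[:keep]

top = brainstorm_and_rank("speed up the training loop", score=len)
```

In practice `score` would be a real evaluation (run the experiment, measure the metric); `len` here is only a placeholder so the sketch runs.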
@maxbittker @karpathy A "creativity" prompt from researcher Margaret Boden: # MB2 You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core
Excited about agent research loops? Use a scientific method and you'll get solid results. Sprinkle in some creativity and you'll get great results.
@karpathy Can I propose an optimization agent? Be scientific and creative. 1. Follow Karl Popper's scientific method of hypothesizing and falsifying. 2. Follow Margaret Boden's definition of "level 2" creativity. I'm getting great results so far. I'd love to see this compete
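A minimal sketch of how those two instructions might be composed into a single system prompt. The wording below is paraphrased from the tweets above, not the author's exact prompt text:

```python
POPPER = ("Follow Karl Popper's scientific method: state one falsifiable "
          "hypothesis, define a falsifying test, run it, and reject the "
          "hypothesis if it fails.")
BODEN = ("Follow Margaret Boden's 'level 2' creativity: stay within the "
         "current formal system, but explore its boundaries and question "
         "the core assumptions of the current idea.")

def optimization_agent_prompt(task):
    """Compose a system prompt demanding both rigor and creativity."""
    return f"{task}\n\n1. {POPPER}\n2. {BODEN}"

prompt = optimization_agent_prompt("Reduce this function's running time.")
```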
As another example, watch an LLM hypothesize and falsify its way to heuristic robot controllers https://t.co/mRcwtOicRZ
github.com · dsweet99/agent-descent
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday, and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,
I want to own the AI car and the AI errand bot. I want them both to just roam around town chauffeuring us, picking up groceries, dropping off dry cleaning. While we're at it, I want a dusting drone -- tirelessly flying around my home cleaning up all the dust.
I think Waymo is going to bail parents out of chauffeur culture. I talk to so many people with older kids and they just seem dead with driving their kids places. I think in five years you’re gonna be able to put Bobby in a Waymo to his guitar lesson and Sally in a Waymo to
2. MBC2: Creativity --- You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core assumptions of the current idea. For
1. KPop: Scientific Method --- **Hypothesize**: Hypothesize one falsifiable explanation of the cause of the problem. **Predict**: Define a falsifying test. If the hypothesis were true, what outcome would the test produce? **Falsify**: Run the test. If falsified, reject the
Or you could teach it to problem-solve on its own. "Clearly state the problem. Hypothesize a cause. Try to falsify your hypothesis. Repeat up to 10 times." This Popperian approach also works for optimizations (speed, memory), hardening a review, hardening a plan, and adversarial
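The hypothesize-falsify-repeat loop above can be sketched as a driver. The toy demo at the bottom (a list with a bad entry, hypotheses as indices) is hypothetical; in a real agent, `hypothesize` and `falsify` would be model calls and test runs:

```python
def popper_loop(problem, hypothesize, falsify, max_rounds=10):
    """Hypothesize a cause, try to falsify it; stop when a hypothesis survives."""
    for round_no in range(max_rounds):
        h = hypothesize(problem, round_no)
        if not falsify(h):  # the test failed to falsify it: keep this hypothesis
            return h
    return None  # nothing survived within the budget

# Toy demo: the "bug" is a None hiding in a list; each hypothesis is an index.
data = [1, 2, None, 4]
hypothesize = lambda problem, i: i           # round i guesses index i
falsify = lambda i: data[i] is not None      # falsified if that entry is fine
culprit = popper_loop(data, hypothesize, falsify)
```

The loop bounds the agent at 10 rounds, matching the "repeat up to 10 times" instruction, so a stuck agent stops instead of thrashing.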
If your AI agent can't fix a bug after 3 tries, stop. You're making it worse. Here's what most devs do - they paste the error message back into the chat. Agent tries something. Doesn't work. Paste the new error. Agent tries again. Doesn't work. You're now 10 messages deep and
What about epistemic uncertainty? If the LLM doesn't know something -- and it *can't* know most things -- of what value is consistency? I think optimizing for *method* is worth more time and effort. Generate multiple hypotheses. (Inconsistency actually helps here.) Then falsify
New paper: It's time to optimize for 🔁self-consistency 🔁 We’ve pushed LLMs to the limits of available data, yet failures like sycophancy and factual inconsistency persist. We argue these stem from the same assumption: that behavior can be specified one I/O pair at a time. 🧵
> AI tools remove the "desirable difficulty" you need to build deep mental models. Counterpoint: You'll build mental models of a high-level process. Instead of a model of the functions, classes, etc., and their interactions, you'll build a mental model of the process that
A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn. - AI coding tools exploit gambling psychology - The difference between typing code and software engineering - Enterprise