David Sweet Profile
David Sweet

@phinance99

Followers
212
Following
4K
Media
174
Statuses
3K

Learn to experiment: https://t.co/F9l8CmY7xu Keep agent code tight: cargo install kiss-ai

Manhattan, NY
Joined January 2008
@phinance99
David Sweet
2 hours
The problem with attempts to get the LLMs to be "accurate" just by talking to themselves is that that simply isn't how knowledge is acquired. The LLMs will behave like scientists if you tell them to. Demand rigor, you'll get rigor. Demand navel-gazing, and you'll get ... idk,
@randal_olson
Randy Olson
1 month
Ask ChatGPT a complex question and you'll get a confident, well-reasoned answer. Then type, "Are you sure?" Watch it completely reverse its position. Ask again. It flips back. By the third round, it usually acknowledges you're testing it, which is somehow worse. It knows what's
0
0
0
@phinance99
David Sweet
10 hours
I've witnessed the development of computers, the Internet, ML & now AI, the end of the Cold War, the development of reusable space vehicles, GPS (a technological application of general relativity!), and soon self-driving cars and humanoid robots and abundant energy. For
0
0
0
@phinance99
David Sweet
10 hours
It's a passing phase. The AI is clearing the path to the problem on which he'll get stuck. And it's wonderful.
@redtachyon
Ariel
23 hours
BTW if you agree with this, you don't work on difficult enough problems
0
0
0
@phinance99
David Sweet
2 days
cargo install kiss-ai or you're just vibe-coding.
0
0
0
@cursor_ai
Cursor
3 days
We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:
194
251
3K
@phinance99
David Sweet
4 days
Agents need to learn how to forget well. Remembering is easy.
0
0
0
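One way to read "forget well": a memory store that evicts by decaying relevance instead of growing forever. A minimal sketch — the class, the scoring rule, and every name here are my illustration, not anything from the thread.

```python
import heapq
import time


class AgentMemory:
    """Toy memory store: keeps only the top-k items by a decaying relevance score."""

    def __init__(self, capacity=3, half_life=3600.0):
        self.capacity = capacity      # how many memories survive a recall
        self.half_life = half_life    # seconds until relevance halves
        self.items = []               # list of (timestamp, relevance, text)

    def remember(self, text, relevance, now=None):
        now = time.time() if now is None else now
        self.items.append((now, relevance, text))

    def recall(self, now=None):
        """Return the capacity best-scoring memories; everything else is forgotten."""
        now = time.time() if now is None else now

        def score(item):
            ts, rel, _ = item
            age = now - ts
            return rel * 0.5 ** (age / self.half_life)  # exponential decay with age

        best = heapq.nlargest(self.capacity, self.items, key=score)
        self.items = best  # evict the rest — this is the "forgetting"
        return [text for _, _, text in best]
```

Forgetting here is nothing exotic: it's just eviction by score, so remembering stays easy and the hard design choice is the scoring rule.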
@phinance99
David Sweet
4 days
When an agent is optimizing something (running time, out of sample performance) or just hunting for a bug and it seems to be having trouble, try a little creativity. I am always pleasantly surprised by what it comes up with. Ask her to generate 10 ideas, treat them as
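The "generate 10 ideas" move can be sketched as a best-of-n loop. `ask_agent` and `evaluate` below are stand-ins for whatever model call and benchmark you actually use — both names are hypothetical.

```python
def best_of_n(ask_agent, evaluate, n=10):
    """Ask for n candidate ideas, score each as a hypothesis, keep the winner.

    ask_agent(prompt) -> list of idea strings (stand-in for an LLM call)
    evaluate(idea)    -> float, higher is better (e.g., speedup, test pass rate)
    """
    ideas = ask_agent(f"Generate {n} distinct ideas for fixing or speeding this up.")
    scored = [(evaluate(idea), idea) for idea in ideas]
    scored.sort(reverse=True)  # best score first
    return scored[0]           # (score, idea)
```

Treating each idea as a hypothesis to be measured, rather than a suggestion to be trusted, is the whole point.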
@phinance99
David Sweet
5 days
@maxbittker @karpathy A "creativity" prompt from researcher Margaret Boden: # MB2 You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core
0
0
0
@phinance99
David Sweet
4 days
@DrEliDavid
Dr. Eli David
4 days
🚨 Breaking – President Trump: “We've got to finish the job [with the regime]. We don’t want to go back every two years. Because there will be a time when you don’t have me as president. Perhaps you’ll have a weak pathetic person as we’ve had.”
0
0
0
@phinance99
David Sweet
5 days
Excited about agent research loops? Use the scientific method and you'll get solid results. Sprinkle in some creativity and you'll get great results.
@phinance99
David Sweet
9 days
@karpathy Can I propose an optimization agent? Be scientific and creative. 1. Follow Karl Popper's scientific method of hypothesizing and falsifying. 2. Follow Margaret Boden's definition of "level 2" creativity. I'm getting great results so far. I'd love to see this compete
0
0
0
@phinance99
David Sweet
6 days
As another example, watch an LLM hypothesize and falsify its way to heuristic robot controllers https://t.co/mRcwtOicRZ
github.com
Contribute to dsweet99/agent-descent development by creating an account on GitHub.
@karpathy
Andrej Karpathy
6 days
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,
0
0
0
@phinance99
David Sweet
7 days
I want to own the AI car and the AI errand bot. I want them both to just roam around town chauffeuring us, picking up groceries, dropping off dry cleaning. While we're at it, I want a dusting drone -- tirelessly flying around my home cleaning up all the dust.
@staysaasy
staysaasy
7 days
I think Waymo is going to bail parents out of chauffeur culture. I talk to so many people with older kids and they just seem dead with driving their kids places. I think in five years you’re gonna be able to put Bobby in a Waymo to his guitar lesson and Sally in a Waymo to
0
0
0
@phinance99
David Sweet
7 days
20??: AI prevents the problem.
@george__mack
George Mack
7 days
2025: Don’t blindly trust AI without speaking to your doctor. 2026: Don’t blindly trust a doctor without speaking to your AI.
0
0
0
@phinance99
David Sweet
8 days
"But between systems that were built by people who measured, and systems that were built by tools that pattern-match." I have heard this same comment for 40 years in so many guises, it's driving me nuts. - Computers can't play chess. - Computers can't play go. - Computers can't
@KatanaLarp
Hōrōshi バガボンド
9 days
0
0
0
@phinance99
David Sweet
9 days
Everything coming out of your LLM should be treated as a hypothesis. Neither fact nor hallucination. Just useful fuel for your process. [Falsify your hypotheses.]
@KatanaLarp
Hōrōshi バガボンド
9 days
0
0
1
@phinance99
David Sweet
9 days
2. MBC2: Creativity --- You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core assumptions of the current idea. For
0
0
0
@phinance99
David Sweet
9 days
1. KPop: Scientific Method --- do **Hypothesize**: Hypothesize one falsifiable explanation of the cause of the problem. **Predict**: Define a falsifying test. If the hypothesis were true, what outcome would the test produce? **Falsify**: Run the test. If falsified, reject the
1
0
0
@phinance99
David Sweet
9 days
Or you could teach it to problem-solve on its own. "Clearly state the problem. Hypothesize a cause. Try to falsify your hypothesis. Repeat up to 10 times." This Popperian approach also works for optimizations (speed, memory), hardening a review, hardening a plan, and adversarial
@burkeholland
Burke Holland
10 days
If your AI agent can't fix a bug after 3 tries, stop. You're making it worse. Here's what most devs do - they paste the error message back into the chat. Agent tries something. Doesn't work. Paste the new error. Agent tries again. Doesn't work. You're now 10 messages deep and
0
0
0
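The "hypothesize a cause, try to falsify, repeat up to 10 times" loop above is mechanical enough to write down. A minimal sketch, assuming `hypothesize` and `run_falsifying_test` wrap your model call and test harness — both names are mine, not from the thread.

```python
def popper_loop(problem, hypothesize, run_falsifying_test, max_rounds=10):
    """Hypothesize a cause, try to falsify it, repeat up to max_rounds times.

    hypothesize(problem, rejected)   -> candidate cause (stand-in for an LLM call)
    run_falsifying_test(hypothesis)  -> True if the test falsifies the hypothesis
    Returns the surviving hypothesis, or None if every round was falsified.
    """
    rejected = []
    for _ in range(max_rounds):
        hypothesis = hypothesize(problem, rejected)
        if run_falsifying_test(hypothesis):
            rejected.append(hypothesis)  # falsified: record it and try again
        else:
            return hypothesis            # survived the attempt at falsification
    return None
```

Passing `rejected` back to the model is what keeps the loop from pasting the same error in and getting the same guess out.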
@phinance99
David Sweet
10 days
What about epistemic uncertainty? If the LLM doesn't know something -- and it *can't* know most things -- of what value is consistency? I think optimizing for *method* is worth more time and effort. Generate multiple hypotheses. (Inconsistency actually helps here.) Then falsify
@PresItamar
Itamar Pres
10 days
New paper: It's time to optimize for 🔁self-consistency 🔁 We’ve pushed LLMs to the limits of available data, yet failures like sycophancy and factual inconsistency persist. We argue these stem from the same assumption: that behavior can be specified one I/O pair at a time. 🧵
0
0
0
@phinance99
David Sweet
11 days
> AI tools remove the "desirable difficulty" you need to build deep mental models. Counterpoint: You'll build mental models of a high-level process. Instead of having a model of the functions, classes, etc, and their interactions, you'll build a mental model of the process that
@MLStreetTalk
Machine Learning Street Talk
12 days
A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn. - AI coding tools exploit gambling psychology - The difference between typing code and software engineering - Enterprise
0
0
1