David Sweet (@phinance99)
Manhattan, NY · Joined January 2008
Followers: 212 · Following: 4K · Media: 174 · Statuses: 3K
Learn to experiment: https://t.co/F9l8CmY7xu
Keep agent code tight: cargo install kiss-ai
The problem with attempts to get LLMs to be "accurate" just by talking to themselves is that this simply isn't how knowledge is acquired. The LLMs will behave like scientists if you tell them to. Demand rigor, and you'll get rigor. Demand navel-gazing, and you'll get ... idk,
Ask ChatGPT a complex question and you'll get a confident, well-reasoned answer. Then type, "Are you sure?" Watch it completely reverse its position. Ask again. It flips back. By the third round, it usually acknowledges you're testing it, which is somehow worse. It knows what's
I've witnessed the development of computers, the Internet, ML & now AI, the end of the Cold War, the development of reusable space vehicles, GPS (a technological application of general relativity!), and soon self-driving cars and humanoid robots and abundant energy. For
We're sharing a new method for scoring models on agentic coding tasks. Here's how models in Cursor compare on intelligence and efficiency:
When an agent is optimizing something (running time, out of sample performance) or just hunting for a bug and it seems to be having trouble, try a little creativity. I am always pleasantly surprised by what it comes up with. Ask her to generate 10 ideas, treat them as
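A toy sketch of that workflow — ask for 10 ideas, then treat them as ranked candidates to try. The `ask_llm` helper and its canned ideas are hypothetical stand-ins for a real model call:

```python
def ask_llm(prompt):
    # Hypothetical stand-in for a real model call; returns canned ideas.
    return [f"idea {i}: vary parameter {i}" for i in range(10)]

def brainstorm_and_rank(problem, score, keep=3):
    """Ask for 10 ideas, score each one, and keep the best few as experiments."""
    ideas = ask_llm(f"Generate 10 ideas for: {problem}")
    return sorted(ideas, key=score, reverse=True)[:keep]

top = brainstorm_and_rank("speed up the training loop", score=len)
```

In practice `score` would be a real evaluation (run the experiment, measure the metric); `len` here is only a placeholder so the sketch runs.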
@maxbittker @karpathy A "creativity" prompt from researcher Margaret Boden: # MB2 You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core
Excited about agent research loops? Use a scientific method and you'll get solid results. Sprinkle in some creativity and you'll get great results.
@karpathy Can I propose an optimization agent? Be scientific and creative. 1. Follow Karl Popper's scientific method of hypothesizing and falsifying. 2. Follow Margaret Boden's definition of "level 2" creativity. I'm getting great results so far. I'd love to see this compete
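A minimal sketch of how those two instructions might be composed into a single system prompt. The wording below is paraphrased from the tweets above, not the author's exact prompt text:

```python
POPPER = ("Follow Karl Popper's scientific method: state one falsifiable "
          "hypothesis, define a falsifying test, run it, and reject the "
          "hypothesis if it fails.")
BODEN = ("Follow Margaret Boden's 'level 2' creativity: stay within the "
         "current formal system, but explore its boundaries and question "
         "the core assumptions of the current idea.")

def optimization_agent_prompt(task):
    """Compose a system prompt demanding both rigor and creativity."""
    return f"{task}\n\n1. {POPPER}\n2. {BODEN}"

prompt = optimization_agent_prompt("Reduce this function's running time.")
```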
As another example, watch an LLM hypothesize and falsify its way to heuristic robot controllers https://t.co/mRcwtOicRZ
github.com · dsweet99/agent-descent
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday, and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,
I want to own the AI car and the AI errand bot. I want them both to just roam around town chauffeuring us, picking up groceries, dropping off dry cleaning. While we're at it, I want a dusting drone -- tirelessly flying around my home cleaning up all the dust.
I think Waymo is going to bail parents out of chauffeur culture. I talk to so many people with older kids and they just seem dead with driving their kids places. I think in five years you’re gonna be able to put Bobby in a Waymo to his guitar lesson and Sally in a Waymo to
2. MBC2: Creativity --- You are stuck producing small local variations of the same idea. Do NOT introduce new frameworks, objectives, or representations. Stay within the same formal system, but explore its boundaries. Identify the core assumptions of the current idea. For
1. KPop: Scientific Method --- **Hypothesize**: Hypothesize one falsifiable explanation of the cause of the problem. **Predict**: Define a falsifying test. If the hypothesis were true, what outcome would the test produce? **Falsify**: Run the test. If falsified, reject the
Or you could teach it to problem-solve on its own. "Clearly state the problem. Hypothesize a cause. Try to falsify your hypothesis. Repeat up to 10 times." This Popperian approach also works for optimizations (speed, memory), hardening a review, hardening a plan, and adversarial
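The hypothesize-falsify-repeat loop above can be sketched as a driver. The toy demo at the bottom (a list with a bad entry, hypotheses as indices) is hypothetical; in a real agent, `hypothesize` and `falsify` would be model calls and test runs:

```python
def popper_loop(problem, hypothesize, falsify, max_rounds=10):
    """Hypothesize a cause, try to falsify it; stop when a hypothesis survives."""
    for round_no in range(max_rounds):
        h = hypothesize(problem, round_no)
        if not falsify(h):  # the test failed to falsify it: keep this hypothesis
            return h
    return None  # nothing survived within the budget

# Toy demo: the "bug" is a None hiding in a list; each hypothesis is an index.
data = [1, 2, None, 4]
hypothesize = lambda problem, i: i           # round i guesses index i
falsify = lambda i: data[i] is not None      # falsified if that entry is fine
culprit = popper_loop(data, hypothesize, falsify)
```

The loop bounds the agent at 10 rounds, matching the "repeat up to 10 times" instruction, so a stuck agent stops instead of thrashing.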
If your AI agent can't fix a bug after 3 tries, stop. You're making it worse. Here's what most devs do - they paste the error message back into the chat. Agent tries something. Doesn't work. Paste the new error. Agent tries again. Doesn't work. You're now 10 messages deep and
What about epistemic uncertainty? If the LLM doesn't know something -- and it *can't* know most things -- of what value is consistency? I think optimizing for *method* is worth more time and effort. Generate multiple hypotheses. (Inconsistency actually helps here.) Then falsify
New paper: It's time to optimize for 🔁self-consistency 🔁 We’ve pushed LLMs to the limits of available data, yet failures like sycophancy and factual inconsistency persist. We argue these stem from the same assumption: that behavior can be specified one I/O pair at a time. 🧵
> AI tools remove the "desirable difficulty" you need to build deep mental models. Counterpoint: You'll build mental models of a high-level process. Instead of a model of the functions, classes, etc., and their interactions, you'll build a mental model of the process that
A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn. - AI coding tools exploit gambling psychology - The difference between typing code and software engineering - Enterprise