
Julian Flieller
@fliellerjulian
465 Followers · 2K Following · 126 Media · 902 Statuses
building ai agents @ https://t.co/hIxvnB9On4 (yc w23) · prev https://t.co/x3LV6373wN
🇺🇸🇩🇪
Joined June 2015
Accidentally said "hard" instead of "non-trivial" and they kicked me out of SF
made it open-source:
github.com/fliellerjulian/claude-code-feedback
Guide from @AnthropicAI on when to actually build agents 👇
The golden rule: "Find the simplest solution possible, and only increase complexity when needed."
The complexity ladder (only climb when you must):
Single prompt → Prompt chaining → Routing → Agents
However, most
Prompt engineering = write a good query
Context engineering = curate what the agent sees, iteratively
Key insight from @AnthropicAI's guide: More context ≠ better performance
"As tokens in the context window increase, the model's ability to accurately recall information
suggested infra: (read my last post about why to use refutations and support facts)
Why RAG works better for fact-checking:
- Answers are grounded in retrieved context.
- Citations = verifiability.
- Hallucination risk is reduced.
- Updating knowledge = add documents (no retraining).
What about RAG?
- Retriever: pulls highly relevant, trustworthy papers or datasets (with metadata)
- Generator: uses those documents to craft the answer.
- Grounding: you can cite the exact evidence.
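A toy sketch of the retriever piece, assuming a TF-IDF ranker (scikit-learn) over a made-up two-paper corpus; a real pipeline would swap in a vector store over trusted abstracts, but the shape is the same: rank, take the top-k, hand the hits plus metadata to the generator.

```python
# Toy retriever: rank a small trusted corpus by TF-IDF similarity to the query.
# Corpus, ids, and query are made up for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    {"id": "smith2020", "text": "LiFePO4 cathodes show superior cycle life over 2000 cycles."},
    {"id": "lee2021", "text": "NMC cathodes deliver higher energy density but degrade faster."},
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Return the k abstracts most similar to the query, with their metadata."""
    texts = [d["text"] for d in corpus]
    vec = TfidfVectorizer().fit(texts + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
    ranked = sorted(zip(scores, corpus), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

print(retrieve("Does LiFePO4 have good cycle life?"))
# The generator then answers using only these documents and cites their ids.
```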
Websearch as a solution:
- Links are ranked by popularity, not necessarily relevance.
- LLMs then have to parse noisy pages.
- No guarantee the right evidence is surfaced.
Problem: When you ask a plain LLM a scientific question, it’s like a student relying only on memory. Sometimes they recall correctly, but often they hallucinate or cherry-pick.
If you want to zero-shot scientific claim verification with LLMs, should you use websearch or RAG? 👇
"Agentic coding is a skill that scales with your technical knowledge."
What's even more surprising (at least to me) is that this approach outperforms websearch -> more on that tomorrow. Link to the paper covered in this post:
That’s it. With just prompting + a retriever, GPT-4 (!) was competitive with fully supervised models in scientific claim verification. Would be super interesting to see how current frontier models would perform.
Step 4: Scale it up. Use a RAG pipeline to fetch the top-k abstracts and run the verifier on each claim–evidence pair you want to check. Aggregate results → most likely verdict + supporting citation.
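A minimal sketch of that aggregation step. `retrieve_abstracts` and `verify` are hypothetical stand-ins for your retriever and the per-pair verifier (Step 3, below in this thread), and majority voting is just one plausible way to combine verdicts.

```python
# Step 4 sketch: run the verifier over the top-k retrieved abstracts and vote.
# retrieve_abstracts(claim, k) -> [{"id": ..., "text": ...}]
# verify(claim, abstract)      -> {"verdict": ..., "rationale": ...}
from collections import Counter
from typing import Callable

def check_claim(
    claim: str,
    retrieve_abstracts: Callable[[str, int], list[dict]],
    verify: Callable[[str, str], dict],
    k: int = 5,
) -> dict:
    results = []
    for doc in retrieve_abstracts(claim, k):
        out = verify(claim, doc["text"])
        results.append({"doc_id": doc["id"], **out})

    if not results:
        return {"verdict": "NOT ENOUGH INFO", "citations": [], "per_doc": []}

    # Majority vote over per-document verdicts.
    verdict, _ = Counter(r["verdict"] for r in results).most_common(1)[0]

    # Cite the documents that back the winning verdict.
    citations = [r["doc_id"] for r in results if r["verdict"] == verdict]
    return {"verdict": verdict, "citations": citations, "per_doc": results}
```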
Step 3: Give your model of choice a citance + the abstract of the cited paper, and ask: Does this paper SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim? Return verdict + short rationale.
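A rough sketch of such a verifier using the OpenAI Python client; the model name, prompt wording, and output parsing are illustrative assumptions, not the exact setup from the paper.

```python
# Step 3 verifier sketch. Model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

VERDICTS = {"SUPPORT", "REFUTE", "NOT ENOUGH INFO"}

def verify(citance: str, abstract: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the model whether the cited paper's abstract supports/refutes the citance."""
    prompt = (
        "Claim:\n" + citance + "\n\n"
        "Abstract of the cited paper:\n" + abstract + "\n\n"
        "Does the abstract SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim? "
        "Answer with the verdict on the first line and a one-sentence rationale on the second."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.strip().splitlines()
    verdict = lines[0].strip().strip(".").upper()
    rationale = lines[1].strip() if len(lines) > 1 else ""
    if verdict not in VERDICTS:
        verdict = "NOT ENOUGH INFO"  # fall back if the output can't be parsed
    return {"verdict": verdict, "rationale": rationale}
```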
Step 2: Generate refutations. Use any AI to flip the meaning: -> “LiFePO4 does not have superior cycle life.” Now you have SUPPORT vs REFUTE examples.
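One way to do the flip programmatically, again with the OpenAI client standing in for "any AI"; model name and prompt are assumptions.

```python
# Step 2 sketch: turn a SUPPORT claim into its REFUTE counterpart.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_refutation(claim: str, model: str = "gpt-4o-mini") -> str:
    """Rewrite a claim so it states the opposite, changing as little wording as possible."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's claim so it states the opposite. "
                        "Change as little wording as possible."},
            {"role": "user", "content": claim},
        ],
    )
    return resp.choices[0].message.content.strip()

print(generate_refutation("LiFePO4 has superior cycle life."))
# e.g. "LiFePO4 does not have superior cycle life."
```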
Step 1: Collect “citances” (citation sentences). Example: “Prior work shows LiFePO4 has superior cycle life [Smith 2020].” This sentence is a claim, and the cited paper is its evidence.
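A crude sketch of pulling citances out of raw text with a regex; real extraction from parsed papers needs a proper sentence segmenter and citation parser, so treat the pattern as an assumption.

```python
import re

def extract_citances(text: str) -> list[str]:
    """Keep sentences that contain a bracketed citation like [Smith 2020]."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if re.search(r"\[[^\]]*\d{4}\]", s)]

paper_text = (
    "Prior work shows LiFePO4 has superior cycle life [Smith 2020]. "
    "We build on these findings in Section 3."
)
print(extract_citances(paper_text))
# ['Prior work shows LiFePO4 has superior cycle life [Smith 2020].']
```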