
Julian Flieller
@fliellerjulian
465 Followers · 2K Following · 126 Media · 902 Statuses
building ai agents @ https://t.co/hIxvnB9On4 (yc w23) · prev https://t.co/x3LV6373wN
🇺🇸🇩🇪
Joined June 2015
Accidentally said "hard" instead of "non-trivial" and they kicked me out of SF
made it open-source:
github.com/fliellerjulian/claude-code-feedback
Guide from @AnthropicAI on when to actually build agents 👇
The golden rule: "Find the simplest solution possible, and only increase complexity when needed."
The complexity ladder (only climb when you must):
Single prompt → Prompt chaining → Routing → Agents
However, most
Prompt engineering = write a good query
Context engineering = curate what the agent sees, iteratively
Key insight from @AnthropicAI's guide: More context ≠ better performance
"As tokens in the context window increase, the model's ability to accurately recall information
suggested infra: (read my last post about why to use refutations and support facts)
Why RAG works better for fact-checking:
- Answers are grounded in retrieved context.
- Citations = verifiability.
- Hallucination risk is reduced.
- Updating knowledge = add documents (no retraining).
What about RAG?
- Retriever: pulls highly relevant, trustworthy papers or datasets (with metadata)
- Generator: uses those documents to craft the answer.
- Grounding: you can cite the exact evidence.
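A toy sketch of the retriever piece, assuming a TF-IDF ranker (scikit-learn) over a made-up two-paper corpus; a real pipeline would swap in a vector store over trusted abstracts, but the shape is the same: rank, take the top-k, hand the hits plus metadata to the generator.

```python
# Toy retriever: rank a small trusted corpus by TF-IDF similarity to the query.
# Corpus, ids, and query are made up for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    {"id": "smith2020", "text": "LiFePO4 cathodes show superior cycle life over 2000 cycles."},
    {"id": "lee2021", "text": "NMC cathodes deliver higher energy density but degrade faster."},
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Return the k abstracts most similar to the query, with their metadata."""
    texts = [d["text"] for d in corpus]
    vec = TfidfVectorizer().fit(texts + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
    ranked = sorted(zip(scores, corpus), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

print(retrieve("Does LiFePO4 have good cycle life?"))
# The generator then answers using only these documents and cites their ids.
```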
Websearch as a solution:
- Links are ranked by popularity, not necessarily relevance.
- LLMs then have to parse noisy pages.
- No guarantee the right evidence is surfaced.
Problem: When you ask a plain LLM a scientific question, it’s like a student relying only on memory. Sometimes they recall correctly, but often they hallucinate or cherry-pick.
If you want to zero-shot scientific claim verification with LLMs, should you use websearch or RAG? 👇
"Agentic coding is a skill that scales with your technical knowledge."
What's even more surprising (at least to me) is that this approach outperforms websearch -> more on that tomorrow. Link to the paper covered in this post:
That’s it. With just prompting + a retriever, GPT-4 (!) was competitive with fully supervised models in scientific claim verification. Would be super interesting to see how current frontier models would perform.
Step 4: Scale it up. Use a RAG pipeline to fetch the top-k abstracts and run the verifier on each claim–evidence pair you want to check. Aggregate results → most likely verdict + supporting citation.
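A minimal sketch of that aggregation step. `retrieve_abstracts` and `verify` are hypothetical stand-ins for your retriever and the per-pair verifier (Step 3, below in this thread), and majority voting is just one plausible way to combine verdicts.

```python
# Step 4 sketch: run the verifier over the top-k retrieved abstracts and vote.
# retrieve_abstracts(claim, k) -> [{"id": ..., "text": ...}]
# verify(claim, abstract)      -> {"verdict": ..., "rationale": ...}
from collections import Counter
from typing import Callable

def check_claim(
    claim: str,
    retrieve_abstracts: Callable[[str, int], list[dict]],
    verify: Callable[[str, str], dict],
    k: int = 5,
) -> dict:
    results = []
    for doc in retrieve_abstracts(claim, k):
        out = verify(claim, doc["text"])
        results.append({"doc_id": doc["id"], **out})

    if not results:
        return {"verdict": "NOT ENOUGH INFO", "citations": [], "per_doc": []}

    # Majority vote over per-document verdicts.
    verdict, _ = Counter(r["verdict"] for r in results).most_common(1)[0]

    # Cite the documents that back the winning verdict.
    citations = [r["doc_id"] for r in results if r["verdict"] == verdict]
    return {"verdict": verdict, "citations": citations, "per_doc": results}
```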
Step 3: Give your model of choice a citance + the abstract of the cited paper, and ask: Does this paper SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim? Return verdict + short rationale.
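A rough sketch of such a verifier using the OpenAI Python client; the model name, prompt wording, and output parsing are illustrative assumptions, not the exact setup from the paper.

```python
# Step 3 verifier sketch. Model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

VERDICTS = {"SUPPORT", "REFUTE", "NOT ENOUGH INFO"}

def verify(citance: str, abstract: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the model whether the cited paper's abstract supports/refutes the citance."""
    prompt = (
        "Claim:\n" + citance + "\n\n"
        "Abstract of the cited paper:\n" + abstract + "\n\n"
        "Does the abstract SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim? "
        "Answer with the verdict on the first line and a one-sentence rationale on the second."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.strip().splitlines()
    verdict = lines[0].strip().strip(".").upper()
    rationale = lines[1].strip() if len(lines) > 1 else ""
    if verdict not in VERDICTS:
        verdict = "NOT ENOUGH INFO"  # fall back if the output can't be parsed
    return {"verdict": verdict, "rationale": rationale}
```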
Step 2: Generate refutations. Use any AI to flip the meaning: -> “LiFePO4 does not have superior cycle life.” Now you have SUPPORT vs REFUTE examples.
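One way to do the flip programmatically, again with the OpenAI client standing in for "any AI"; model name and prompt are assumptions.

```python
# Step 2 sketch: turn a SUPPORT claim into its REFUTE counterpart.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_refutation(claim: str, model: str = "gpt-4o-mini") -> str:
    """Rewrite a claim so it states the opposite, changing as little wording as possible."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's claim so it states the opposite. "
                        "Change as little wording as possible."},
            {"role": "user", "content": claim},
        ],
    )
    return resp.choices[0].message.content.strip()

print(generate_refutation("LiFePO4 has superior cycle life."))
# e.g. "LiFePO4 does not have superior cycle life."
```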
Step 1: Collect “citances” (citation sentences). Example: “Prior work shows LiFePO4 has superior cycle life [Smith 2020].” This sentence is a claim, and the cited paper is its evidence.
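A crude sketch of pulling citances out of raw text with a regex; real extraction from parsed papers needs a proper sentence segmenter and citation parser, so treat the pattern as an assumption.

```python
import re

def extract_citances(text: str) -> list[str]:
    """Keep sentences that contain a bracketed citation like [Smith 2020]."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if re.search(r"\[[^\]]*\d{4}\]", s)]

paper_text = (
    "Prior work shows LiFePO4 has superior cycle life [Smith 2020]. "
    "We build on these findings in Section 3."
)
print(extract_citances(paper_text))
# ['Prior work shows LiFePO4 has superior cycle life [Smith 2020].']
```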