Julian Flieller Profile
Julian Flieller

@fliellerjulian

Followers: 465 · Following: 2K · Media: 126 · Statuses: 902

building ai agents @ https://t.co/hIxvnB9On4 (yc w23) · prev https://t.co/x3LV6373wN

🇺🇸🇩🇪
Joined June 2015
@nicochristie
nico
1 day
Accidentally said "hard" instead of "non-trivial" and they kicked me out of SF
298
610
17K
@fliellerjulian
Julian Flieller
23 hours
made it open-source:
github.com
Contribute to fliellerjulian/claude-code-feedback development by creating an account on GitHub.
@fliellerjulian
Julian Flieller
1 day
Built PR comments for Claude Code
0
0
1
@fliellerjulian
Julian Flieller
7 days
Guide from @AnthropicAI on when to actually build agents 👇 The golden rule: "Find the simplest solution possible, and only increase complexity when needed." The complexity ladder (only climb when you must): Single prompt → Prompt chaining → Routing → Agents However, most
1
0
1
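The "Routing" rung is the first step up from a single prompt: classify the request, then hand it to a specialized prompt instead of one catch-all agent. A minimal sketch, with hypothetical route names and a keyword heuristic standing in for a cheap LLM classifier:

```python
# Sketch of the "Routing" rung: pick a specialized prompt per request.
# Route names and the keyword heuristic are illustrative placeholders;
# in practice the router is often a single cheap LLM call.

ROUTE_PROMPTS = {
    "code": "You are a coding assistant. Answer with a patch or snippet.\n\n{query}",
    "docs": "You are a documentation assistant. Answer from the docs only.\n\n{query}",
    "other": "Answer the user's question concisely.\n\n{query}",
}

def route(query: str) -> str:
    """Pick a route with a trivial keyword heuristic (stand-in for an LLM classifier)."""
    q = query.lower()
    if any(k in q for k in ("traceback", "bug", "function", "compile")):
        return "code"
    if any(k in q for k in ("how do i", "where is", "docs")):
        return "docs"
    return "other"

def build_prompt(query: str) -> str:
    return ROUTE_PROMPTS[route(query)].format(query=query)

if __name__ == "__main__":
    print(build_prompt("How do I configure the retriever? See the docs."))
```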
@fliellerjulian
Julian Flieller
8 days
Prompt engineering = write a good query Context engineering = curate what the agent sees, iteratively Key insight from @AnthropicAI's guide: More context ≠ better performance "As tokens in the context window increase, the model's ability to accurately recall information
2
0
0
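A minimal sketch of that curation step: rank candidate chunks by relevance and keep only what fits a token budget, instead of stuffing everything into the window. The scoring input and the 4-chars-per-token estimate are rough placeholders:

```python
# Sketch of context curation: keep only the highest-relevance chunks that fit
# a token budget. Relevance scores come from whatever retriever you use; the
# token estimate is deliberately crude.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough 4-chars-per-token estimate

def curate(chunks: list[tuple[float, str]], budget_tokens: int = 2000) -> list[str]:
    """chunks = [(relevance_score, text), ...]; returns the trimmed context."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip lower-value chunks rather than overflow the window
        selected.append(text)
        used += cost
    return selected
```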
@fliellerjulian
Julian Flieller
17 days
suggested infra: (read my last post about why to use refutations and support facts)
0
0
0
@fliellerjulian
Julian Flieller
17 days
Why RAG works better for fact-checking: - Answers are grounded in retrieved context. - Citations = verifiability. - Hallucination risk is reduced. - Updating knowledge = add documents (no retraining).
1
0
0
@fliellerjulian
Julian Flieller
17 days
What about RAG? - Retriever: pulls highly relevant, trustworthy papers or datasets (with metadata) - Generator: uses those documents to craft the answer. - Grounding: you can cite the exact evidence.
1
0
0
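A minimal sketch of that retriever/generator split. The term-overlap retriever is a stand-in for a real embedding index, and the generator prompt asks the model to cite the ids of the abstracts it used, which is what makes the answer verifiable:

```python
# Sketch of a RAG pipeline for grounded answers: retrieve relevant abstracts,
# then prompt the generator to answer from them and cite their ids.
# CORPUS and the overlap scoring are placeholder stand-ins.

CORPUS = {  # doc_id -> abstract (placeholder data)
    "smith2020": "LiFePO4 cathodes retain >90% capacity after 2000 cycles ...",
    "lee2021": "NMC cathodes show higher energy density but faster fade ...",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank abstracts by naive term overlap (an embedding index in practice)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_generator_prompt(question: str) -> str:
    docs = retrieve(question)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    return (
        "Answer using ONLY the abstracts below and cite the ids you rely on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```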
@fliellerjulian
Julian Flieller
17 days
Websearch as a solution: - Links are ranked by popularity, not necessarily relevance. - LLMs then have to parse noisy pages. - No guarantee the right evidence is surfaced.
1
0
0
@fliellerjulian
Julian Flieller
17 days
Problem: When you ask a plain LLM a scientific question, it’s like a student relying only on memory. Sometimes they recall correctly, but often they hallucinate or cherry-pick.
1
0
0
@fliellerjulian
Julian Flieller
17 days
If you want to zero-shot scientific claim verification with LLMs, should you use websearch or RAG? 👇
1
0
1
@fliellerjulian
Julian Flieller
21 days
"Agentic coding is a skill that scales with your technical knowledge."
0
0
0
@fliellerjulian
Julian Flieller
22 days
What's even more surprising (at least to me) is that this approach outperforms websearch -> more on that tomorrow. Link to the paper covered in this post:
0
0
0
@fliellerjulian
Julian Flieller
22 days
That’s it. With just prompting + a retriever, GPT-4 (!) was competitive with fully supervised models in scientific claim verification. Would be super interesting to see how current frontier models would perform.
1
0
0
@fliellerjulian
Julian Flieller
22 days
Step 4: Scale it up. Use a RAG pipeline to fetch top-k abstracts and run the verifier on each claim–evidence pair you want to check. Aggregate results → most likely verdict + supporting citation.
1
0
0
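A sketch of that loop, where `retrieve_top_k` and `verify` are placeholders for the retriever and the Step 3 verifier sketched further down this thread:

```python
# Sketch of Step 4: retrieve top-k abstracts, verify the claim against each,
# then take a majority vote and keep the ids of the agreeing abstracts as citations.

from collections import Counter

def retrieve_top_k(claim: str, k: int = 5) -> list[dict]:
    """Placeholder: return [{'id': ..., 'abstract': ...}, ...] from your index."""
    raise NotImplementedError

def verify(claim: str, abstract: str) -> str:
    """Placeholder: return 'SUPPORT', 'REFUTE', or 'NOT ENOUGH INFO' (see Step 3)."""
    raise NotImplementedError

def check_claim(claim: str, k: int = 5) -> tuple[str, list[str]]:
    """Majority vote over per-abstract verdicts; returns (verdict, supporting doc ids)."""
    verdicts = [(doc["id"], verify(claim, doc["abstract"])) for doc in retrieve_top_k(claim, k)]
    decisive = [(doc_id, v) for doc_id, v in verdicts if v != "NOT ENOUGH INFO"]
    if not decisive:
        return "NOT ENOUGH INFO", []
    top_verdict, _ = Counter(v for _, v in decisive).most_common(1)[0]
    citations = [doc_id for doc_id, v in decisive if v == top_verdict]
    return top_verdict, citations
```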
@fliellerjulian
Julian Flieller
22 days
Step 3: Give your model of choice a citance + the abstract of the cited paper, and ask: Does this paper SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim? Return a verdict + short rationale.
1
0
0
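A sketch of that verifier call, assuming the OpenAI Python SDK (openai>=1.0) purely for illustration; any chat-completion API works. The paper used GPT-4, so the model name here is just an illustrative stand-in:

```python
# Sketch of the Step 3 verifier: verdict + one-sentence rationale per
# claim–abstract pair. SDK and model name are assumptions for illustration.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """Claim (citance): {claim}

Abstract of the cited paper: {abstract}

Does the abstract SUPPORT, REFUTE, or provide NOT ENOUGH INFO for the claim?
Answer with the verdict on the first line, then a one-sentence rationale."""

def verify(claim: str, abstract: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": PROMPT.format(claim=claim, abstract=abstract)}],
    )
    return resp.choices[0].message.content.strip()
```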
@fliellerjulian
Julian Flieller
22 days
Step 2: Generate refutations. Use any AI to flip the meaning: -> “LiFePO4 does not have superior cycle life.” Now you have SUPPORT vs REFUTE examples.
1
0
0
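A sketch of that refutation step under the same SDK assumption as the verifier sketch above; the model name is illustrative:

```python
# Sketch of Step 2: rewrite a citance so it asserts the opposite, giving a
# REFUTE counterpart for every SUPPORT example.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def make_refutation(citance: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Rewrite the following claim so that it asserts the opposite, "
                "changing as few words as possible:\n\n" + citance
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

# e.g. make_refutation("LiFePO4 has superior cycle life.")
# -> "LiFePO4 does not have superior cycle life."
```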
@fliellerjulian
Julian Flieller
22 days
Step 1: Collect “citances” (citation sentences). Example: “Prior work shows LiFePO4 has superior cycle life [Smith 2020].” This sentence is a claim, and the cited paper is its evidence.
1
0
0
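A sketch of the resulting data shape: each citance is a claim paired with the paper it cites, which later becomes a claim–evidence pair for the verifier. Field names are illustrative:

```python
# Sketch of the Step 1 data shape: one record per citation sentence.

from dataclasses import dataclass

@dataclass
class Citance:
    claim: str           # the citation sentence, stripped of the bracketed reference
    cited_paper_id: str  # identifier of the cited paper (e.g. DOI or corpus id)
    label: str = "SUPPORT"  # citances start as SUPPORT; Step 2 adds REFUTE variants

citances = [
    Citance(
        claim="LiFePO4 has superior cycle life.",
        cited_paper_id="smith2020",
    ),
]
```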