
Prateek Joshi
@PrateekVJoshi
Followers: 7K · Following: 8K · Media: 545 · Statuses: 10K
infra investor at @MoxxieVentures | author of 13 AI books | nvidia alum | recovering founder
San Francisco Bay Area
Joined September 2009
search APIs are infra, not features. freshness, vertical recall, and provenance need SLAs as opposed to footnotes. generic search loses to domain-tuned pipelines with traceable sources. own crawl cadence, schema, and disambiguation. expose “why this result” like a receipt.
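One way to picture the "receipt": a minimal sketch of a result schema that carries its own provenance. Everything below (field names, the explain helper) is illustrative, not any particular search API.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Provenance:
    """The 'why this result' receipt attached to every hit."""
    source_url: str            # traceable source
    crawled_at: datetime       # freshness: when the crawler last fetched it
    matched_fields: list[str]  # which schema fields matched the query
    entity_id: str             # disambiguated entity the hit resolved to

@dataclass
class SearchResult:
    title: str
    snippet: str
    score: float
    provenance: Provenance     # exposed alongside the result, not buried

def explain(result: SearchResult) -> str:
    """Render the receipt a caller can show to end users."""
    p = result.provenance
    return (f"matched {', '.join(p.matched_fields)} for entity {p.entity_id}, "
            f"crawled {p.crawled_at:%Y-%m-%d} from {p.source_url}")
```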
incredible! thank you for doing the lord’s work
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, …
this captures it well
@zenitsu_aprntc Good question, it's basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn't work well enough at all and net unhelpful, possibly the repo is too far off the data distribution.
nobody trusts your model. everyone trusts win-rate on their golden set.
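A rough sketch of the win-rate check this implies, assuming a pairwise judge; judge, candidate, baseline, and golden_set are placeholders for whatever a team actually uses.

```python
def win_rate(candidate, baseline, golden_set, judge):
    """Fraction of golden-set prompts where the candidate beats the baseline.

    `judge(prompt, a, b)` is assumed to return "a", "b", or "tie".
    """
    wins = ties = 0
    for ex in golden_set:
        prompt = ex["prompt"]
        verdict = judge(prompt, candidate(prompt), baseline(prompt))
        if verdict == "a":
            wins += 1
        elif verdict == "tie":
            ties += 1
    # count ties as half a win so swapping candidate and baseline stays symmetric
    return (wins + 0.5 * ties) / len(golden_set)
```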
observability for long-running agents is a weird beast. single LLM calls are easy to debug. long chains are crime scenes. you need spans, traces, replays, and “what if we changed step 7?” store tool i/o, prompt diffs, and policy versions like source code. one-click …
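A minimal sketch of what one such span could look like, assuming JSONL storage; the field and file names are made up for illustration.

```python
import json
import time
import uuid

def record_span(trace_id, step, tool, tool_input, tool_output,
                prompt_version, policy_version, path="agent_trace.jsonl"):
    """Append one step of an agent run as a replayable span."""
    span = {
        "trace_id": trace_id,              # groups all steps of one run
        "span_id": str(uuid.uuid4()),
        "step": step,                      # lets you ask "what if we changed step 7?"
        "tool": tool,
        "tool_input": tool_input,          # stored verbatim so the step can be replayed
        "tool_output": tool_output,
        "prompt_version": prompt_version,  # prompt diffs tracked like source code
        "policy_version": policy_version,
        "ts": time.time(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(span) + "\n")
    return span
```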
messy workflows → trainable environments. your logs, UIs, and APIs are an unlabeled RL gym waiting to happen. instrument them and you compress weeks of onboarding into hours. capture states, actions, feedback, and failure modes. simulate before writing to prod. don’t beg for …
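A toy sketch of turning logged transitions into an offline, gym-style environment; the transition format here is an assumption, not a prescribed schema.

```python
class LoggedWorkflowEnv:
    """Replay logged workflow transitions as an offline RL environment.

    `transitions` is assumed to be a list of dicts with
    "state", "action", "reward", and optional "failure" keys
    captured from real logs, UIs, and API calls.
    """

    def __init__(self, transitions):
        self.transitions = transitions
        self.i = 0

    def reset(self):
        self.i = 0
        return self.transitions[0]["state"]

    def step(self, action):
        logged = self.transitions[self.i]
        # reward only when the agent reproduces the logged action that got good feedback
        reward = logged["reward"] if action == logged["action"] else 0.0
        self.i += 1
        done = self.i >= len(self.transitions) or logged.get("failure", False)
        next_state = None if done else self.transitions[self.i]["state"]
        return next_state, reward, done, {"logged_action": logged["action"]}
```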
everything is about creating leverage
Saturday scoop: Thinking Machines Lab co-founder Andrew Tulloch has joined Meta, the startup confirmed. W/ @keachhagey
RL for agents = ops. everyone romanticizes rewards in RL. but the hard part is budgets, replay buffers, and “don’t break prod”. treat RL like SREs treat incidents. define observable rewards tied to outcomes. keep offline datasets clean and counterfactuals handy.
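A small sketch of the ops side: a bounded replay buffer plus a reward tied to an observable outcome with a hard budget guard. The outcome (ticket_resolved) and the dollar budget are stand-ins for whatever actually matters in your system.

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded offline buffer of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def outcome_reward(ticket_resolved, cost_usd, budget_usd=1.0):
    """Observable reward tied to the outcome, with a budget guard."""
    if cost_usd > budget_usd:   # "don't break prod": over-budget runs never score
        return -1.0
    return 1.0 if ticket_resolved else 0.0
```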
RLP
Lots of insights in @YejinChoinka’s talk on RL training. RIP to next token prediction training (NTP) and welcome to Reinforcement Learning Pretraining (RLP). #COLM2025 There was no room to even stand.
agent orchestration is the control plane. models are table stakes. routing, memory, tools, and rollback are turning out to be the differentiators. it's like kubernetes but for decisions and side effects. give pm-friendly levers. who can call what tool, what gets cached, when …
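A sketch of what those pm-friendly levers could look like as control-plane policy; the tool names, roles, and limits below are hypothetical.

```python
# pm-editable policy: who can call which tool, what gets cached, and for how long
POLICY = {
    "tools": {
        "web_search":  {"allowed_roles": ["support", "research"], "cache_ttl_s": 300},
        "send_refund": {"allowed_roles": ["support_lead"],        "cache_ttl_s": 0},
    },
    "rollback": {"max_side_effects_per_run": 3},
}

def authorize(role: str, tool: str, policy=POLICY) -> bool:
    """Control-plane check run before any tool call is routed."""
    cfg = policy["tools"].get(tool)
    return cfg is not None and role in cfg["allowed_roles"]
```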
Stefano Ermon is the OG of diffusion LLMs. Here's my convo with him. Really insightful. And he's great at explaining things.
Our CEO @StefanoErmon joined the Infinite Curiosity Podcast and shared how our Mercury diffusion LLMs deliver faster, cheaper models and why diffusion is reshaping coding, reasoning, and multimodal AI. Thanks for having him on @PrateekVJoshi! https://t.co/9gTrf5IEMV
The RL in IRL stands for reinforcement learning. remember this when someone wants to meet you IRL. you're welcome!