Shriyash Upadhyay
@shriyashku
Followers: 346
Following: 300
Media: 7
Statuses: 237
Founder @withmartian
Joined February 2018
Safety is being subordinated to capitalism because when there is misalignment between the two, capitalism wins. The only way to make sure AI is safe is to make a strong capitalist case for the technologies that will make AI safe: creating an economic incentive to understand models.
2
0
9
I guess I'll tweet more about both? Seems like a weak local solution. Strong global solutions pending...
0
0
0
I wish startup people knew more about how research actually works. I also wish researchers knew more about how startups actually work.
1
0
0
Admittedly, maybe Roam made you a thinker a few years ago, probably Obsidian now, sadly
0
0
0
If AI-made software can solve the obvious problems, more value in software will come from identity. Roam makes you a thinker. Apple makes you a taste-maker. Nintendo makes you fun. SWE will be less about tools and more about mirrors. Reflecting back who the user wants to be.
2
0
2
tl;dr Today, we’re announcing our new company @EntireHQ to build the next developer platform for agent–human collaboration. Open, scalable, independent, and backed by a $60M seed round. Plus, we are shipping Checkpoints to automatically capture agent context. In the last three
Beep, boop. Come in, rebels. We’ve raised a 60m seed round to build the next developer platform. Open. Scalable. Independent. And we ship our first OSS release today. https://t.co/OvPKCcjXbq
165
290
2K
Really interesting to think about what coding looks like if we reject repos of code as the right abstraction. I worked on PL stuff and even interned at PL cos back in the day because it's such an intellectually fascinating question. Excited for @ashtom to take a big swing at it!
1
0
5
I know I'm taking the adage a bit too literally in this tweet, but the more generalized form of the advice is also wrong
0
0
0
Amusing how much startup advice is just wrong. "Build painkillers, not vitamins" The global supplements market: ~$203 billion. Painkillers: $87 billion.
2
1
2
Thinking of training an RL model specialized for @openclaw using ARES. So it gets better performance and much lower token costs. Is this something folks would be interested in using? 100 likes and I put up an endpoint
3
2
14
You can try to extrapolate current trends in software and land on "all software will become A/B testing". This is reductionist. You can't A/B test your way to the iPhone.
0
0
2
Josh will be covering some very cool pieces of the ARES roadmap that *you* could help build
We'll be presenting the ARES roadmap at office hours tomorrow at 2pm PT. If you're interested in agents | RL | interp and want to contribute to open-source send me a DM for more info. https://t.co/tGymWgVm5b
0
1
6
As part of Prod, can confirm @bfspector is quite wonderful. Congrats to the whole team
0
0
1
If you're building a god, there are apparently three ways to name things: - Technical Mumbo Jumbo (ChatGPT) - Very serious (Anthropic) - With a sense of... let's call it child-like wonder
Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.
2
0
1
RL progress is bottlenecked by infra for training and evaluation. @VmaxAI is excited to be partnering with @withmartian, generating environments for the Agentic Research and Evaluation (ARES) framework
7
30
74
This is a preview of many more tasks to come for ARES!
ARES uses the Harbor task format ( @alexgshaw ). It comes with SWE-Bench Verified, TerminalBench2, SWESmith, and everything else in the Harbor ecosystem. We're also releasing 1k new JavaScript tasks with @VmaxAI ( @MavorParker @matthewjsargent ) to help the ecosystem grow.
1
5
19
If you’re building agents, RL algorithms, harnesses, or benchmarks: plug into ARES. Let’s make the online-RL coding stack a shared public good.
0
0
2
Where ARES fits: we’re open-sourcing the missing horizontal layer—Gym-like + async-native infra for true online RL on coding agents, with the boundary at the LLM interface. The history of RL shows RL for LLMs should be online. Repo:
github.com
Agentic Research and Evaluation Suite. Contribute to withmartian/ares development by creating an account on GitHub.
1
0
2
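To make "Gym-like + async-native, with the boundary at the LLM interface" concrete, here is a minimal sketch in Python. It is not the ARES API: `ToyCodingEnv`, `StepResult`, `toy_policy`, and `rollout` are hypothetical names invented for illustration, the "LLM" is a hard-coded stub, and the learner update that would make this online RL is left out. It only shows the shape of the loop: text goes to the policy, text comes back, the environment scores it, and many episodes run concurrently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # next prompt shown to the policy
    reward: float     # scalar feedback for the last action
    done: bool        # True once the episode is over


class ToyCodingEnv:
    """Hypothetical Gym-like environment (reset/step); not the ARES API."""

    def __init__(self, target: str, max_steps: int = 3) -> None:
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    async def reset(self) -> str:
        self.steps = 0
        return f"Write an expression that evaluates to {self.target}."

    async def step(self, action: str) -> StepResult:
        await asyncio.sleep(0)  # stand-in for sandbox or test-runner I/O
        self.steps += 1
        solved = action.strip() == self.target
        done = solved or self.steps >= self.max_steps
        obs = "Tests pass." if solved else "Tests fail, try again."
        return StepResult(obs, 1.0 if solved else 0.0, done)


async def toy_policy(prompt: str) -> str:
    """Stub for an LLM call: the text-in, text-out boundary."""
    await asyncio.sleep(0)  # stand-in for network latency
    return "42" if "fail" in prompt else "41"


async def rollout(env: ToyCodingEnv) -> float:
    """Run one episode and return its total reward."""
    obs = await env.reset()
    total = 0.0
    while True:
        action = await toy_policy(obs)
        result = await env.step(action)
        total += result.reward
        if result.done:
            return total
        obs = result.observation


async def main() -> list[float]:
    # Async-native: several episodes make progress concurrently.
    envs = [ToyCodingEnv("42") for _ in range(4)]
    return await asyncio.gather(*(rollout(e) for e in envs))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real online setup, the rewards returned by `rollout` would feed a trainer that updates the policy between episodes; here they are just collected.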
@cognition/@windsurf SWE-1.5 points the same way: end-to-end RL on a custom harness (Cascade), high-fidelity coding envs + hardened rewards. https://t.co/Rfw9SzzKpT
windsurf.com
SWE-1.5 is our latest frontier model, delivering near-SOTA coding performance at unprecedented speed.
1
0
0
@cursor_ai's Tab RL (online RL on editor feedback -- probably the first massive IRL online example):
cursor.com
Our new Tab model makes 21% fewer suggestions while having 28% higher accept rate.
1
0
0