Shriyash Upadhyay

@shriyashku

Followers
346
Following
300
Media
7
Statuses
237

Founder @withmartian

Joined February 2018
@shriyashku
Shriyash Upadhyay
2 years
Safety is being subordinated to capitalism because when there is misalignment between the two, capitalism wins. The only way to make sure AI is safe is to make a strong capitalist case for the technologies that will make AI safe: creating an economic incentive to understand models.
2
0
9
@shriyashku
Shriyash Upadhyay
1 day
I guess I'll tweet more about both? Seems like a weak local solution. Strong global solutions pending...
0
0
0
@shriyashku
Shriyash Upadhyay
1 day
I wish startup people knew more about how research actually works. I also wish researchers knew more about how startups actually work.
1
0
0
@shriyashku
Shriyash Upadhyay
4 days
Admittedly, maybe Roam made you a thinker a few years ago, probably Obsidian now, sadly
0
0
0
@shriyashku
Shriyash Upadhyay
4 days
If AI-made software can solve the obvious problems, more value in software will come from identity. Roam makes you a thinker. Apple makes you a taste-maker. Nintendo makes you fun. SWE will be less about tools and more about mirrors. Reflecting back who the user wants to be.
2
0
2
@ashtom
Thomas Dohmke
4 days
tl;dr Today, we’re announcing our new company @EntireHQ to build the next developer platform for agent–human collaboration. Open, scalable, independent, and backed by a $60M seed round. Plus, we are shipping Checkpoints to automatically capture agent context. In the last three
@EntireHQ
Entire
4 days
Beep, boop. Come in, rebels. We’ve raised a 60m seed round to build the next developer platform. Open. Scalable. Independent. And we ship our first OSS release today. https://t.co/OvPKCcjXbq
165
290
2K
@shriyashku
Shriyash Upadhyay
4 days
Really interesting to think about what coding looks like if we reject repos of code as the right abstraction. I worked on PL stuff and even interned at PL cos back in the day because it's such an intellectually fascinating question. Excited for @ashtom to take a big swing at it!
@EntireHQ
Entire
4 days
Beep, boop. Come in, rebels. We’ve raised a 60m seed round to build the next developer platform. Open. Scalable. Independent. And we ship our first OSS release today. https://t.co/OvPKCcjXbq
1
0
5
@shriyashku
Shriyash Upadhyay
4 days
I know I'm taking the adage a bit too literally in this tweet, but the more generalized form of the advice is also wrong
0
0
0
@shriyashku
Shriyash Upadhyay
4 days
Amusing how much startup advice is just wrong. "Build painkillers, not vitamins." The global supplements market: ~$203 billion. Painkillers: $87 billion.
2
1
2
@shriyashku
Shriyash Upadhyay
5 days
Thinking of training an RL model specialized for @openclaw using ARES. So it gets better performance and much lower token costs. Is this something folks would be interested in using? 100 likes and I put up an endpoint
3
2
14
@shriyashku
Shriyash Upadhyay
5 days
You can try to extrapolate current trends in software and land on "all software will become A/B testing". This is reductionist. You can't A/B test your way to the iPhone.
0
0
2
@shriyashku
Shriyash Upadhyay
9 days
Josh will be covering some very cool pieces of the ARES roadmap that *you* could help build
@joshgreaves_ml
Josh Greaves
9 days
We'll be presenting the ARES roadmap at office hours tomorrow at 2pm PT. If you're interested in agents | RL | interp and want to contribute to open-source send me a DM for more info. https://t.co/tGymWgVm5b
0
1
6
@shriyashku
Shriyash Upadhyay
16 days
As part of Prod, can confirm @bfspector is quite wonderful. Congrats to the whole team
0
0
1
@shriyashku
Shriyash Upadhyay
16 days
If you're building a god, there are apparently three ways to name things:
- Technical Mumbo Jumbo (ChatGPT)
- Very serious (Anthropic)
- With a sense of... let's call it child-like wonder
@flappyairplanes
Flapping Airplanes
17 days
Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.
2
0
1
@MavorParker
Augustine Mavor-Parker
16 days
RL progress is bottlenecked by infra for training and evaluation. @VmaxAI is excited to be partnering with @withmartian, generating environments for the Agentic Research and Evaluation (ARES) framework
7
30
74
@MavorParker
Augustine Mavor-Parker
16 days
This is a preview of many more tasks to come for ARES!
@joshgreaves_ml
Josh Greaves
16 days
ARES uses the Harbor task format ( @alexgshaw ). It comes with SWE-Bench Verified, TerminalBench2, SWESmith, and everything else in the Harbor ecosystem. We're also releasing 1k new JavaScript tasks with @VmaxAI ( @MavorParker @matthewjsargent ) to help the ecosystem grow.
1
5
19
@shriyashku
Shriyash Upadhyay
16 days
If you’re building agents, RL algorithms, harnesses, or benchmarks: plug into ARES. Let’s make the online-RL coding stack a shared public good.
0
0
2
@shriyashku
Shriyash Upadhyay
16 days
Where ARES fits: we’re open-sourcing the missing horizontal layer—Gym-like + async-native infra for true online RL on coding agents, with the boundary at the LLM interface. The history of RL shows RL for LLMs should be online. Repo:
github.com
Agentic Research and Evaluation Suite. Contribute to withmartian/ares development by creating an account on GitHub.
1
0
2
@shriyashku
Shriyash Upadhyay
16 days
@cognition/@windsurf SWE-1.5 points the same way: end-to-end RL on a custom harness (Cascade), high-fidelity coding envs + hardened rewards. https://t.co/Rfw9SzzKpT
windsurf.com
SWE-1.5 is our latest frontier model, delivering near-SOTA coding performance at unprecedented speed.
1
0
0
@shriyashku
Shriyash Upadhyay
16 days
@cursor_ai's Tab RL (online RL on editor feedback -- probably the first massive IRL online example):
cursor.com
Our new Tab model makes 21% fewer suggestions while having 28% higher accept rate.
1
0
0