Surya Dantuluri

@sdand

Followers: 14K · Following: 44K · Media: 379 · Statuses: 3K

ml https://t.co/r5ICReH9eO

san francisco
Joined November 2016
@kliu128
Kevin Liu
3 days
the true AI rollup isn't buying legacy businesses, it's taking over a small developing nation, instituting pro-growth ai-friendly regulation, and watching as 2029-era agi produces 20% y/y gdp growth
0
2
12
@sdand
Surya Dantuluri
4 days
Initially built on Qwen3vl RL gym I rolled up in JAX here: https://t.co/E6gwW2ji1P https://t.co/TAne21D7iV
@sdand
Surya Dantuluri
2 months
I found VLMs can learn geolocation through progressive geodesic tightening. In short, you shape the rewards so the model learns country->region->city, and blend that with its predicted coordinates according to a schedule. A visual of how that would look:
1
0
1
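The tweet describes the reward shaping only in outline. Below is a minimal sketch of what a progressive geodesic reward could look like. It assumes hypothetical `country`/`region`/`city` labels, predicted `lat`/`lon` fields, a hand-picked 0.5/0.3/0.2 weighting per level, and a `progress` schedule value in [0, 1]; none of these names or constants come from the thread. Early in training the reward is dominated by the coarse hierarchical match. As `progress` rises, weight shifts to a great-circle (haversine) distance term whose scale shrinks from continent-sized to city-sized.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def hierarchy_reward(pred, target):
    """Partial credit for country -> region -> city; each level only counts if the one above matched.

    The level weights are hypothetical, chosen to sum to 1.0.
    """
    reward = 0.0
    for level, weight in (("country", 0.5), ("region", 0.3), ("city", 0.2)):
        if pred.get(level) != target.get(level):
            break
        reward += weight
    return reward


def geodesic_reward(pred, target, progress):
    """Blend hierarchical and coordinate rewards; `progress` in [0, 1] tightens the distance scale.

    Hypothetical schedule: the distance scale decays geometrically from
    2500 km (continent) at progress=0 to 25 km (city) at progress=1.
    """
    scale_km = 2500.0 * (25.0 / 2500.0) ** progress
    dist = haversine_km(pred["lat"], pred["lon"], target["lat"], target["lon"])
    coord_reward = math.exp(-dist / scale_km)  # 1.0 at zero distance, decays with distance
    return (1.0 - progress) * hierarchy_reward(pred, target) + progress * coord_reward
```

For example, calling `geodesic_reward(pred, target, progress=step / total_steps)` gives an exact answer full credit at every stage, while late in training a guess 100 km off earns only about exp(-100/25) ≈ 0.018 of the coordinate term.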
@sdand
Surya Dantuluri
4 days
So when I came across Tinker I felt it was similar to when I found OAI Gym in HS: it’s enough of an abstraction that I don’t need to deal with the underlying infra and can focus on tuning everything else. I’ve implemented GEPA in tinker here https://t.co/IGOTsXhKUU and am wrapping up on
github.com · sdan/tinker-gepa: GEPA using Tinker.
2
1
2
@sdand
Surya Dantuluri
4 days
The impetus here was trying to solve geoguessing. a year ago i made an app based on GeoCLIP, a contrastively trained embedding model that was decent (but i spent more time making the app pretty). recently i tried training Qwen3VL by reimplementing it by hand in JAX, where I ran into a
1
0
1
@sdand
Surya Dantuluri
4 days
Inspired by the tinker console, i made tinkerviz available below within geospot: it comes with several pages to stream rollouts in realtime. this is because we need to stream data (hf in this case) down, lightly process it, and send images up; in that process we get batch
1
0
2
@sdand
Surya Dantuluri
5 days
I made a geoguessing RL environment with the Tinker API this weekend. TML started vision support on Friday, so I decided to make a dashboard to visualize GRPO in realtime
4
1
39
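GRPO scores each rollout against the other rollouts sampled for the same prompt (here, the same street-view image) instead of against a learned value baseline. Below is a minimal sketch of the group-relative advantage, assuming one list of scalar rewards per group. The thread does not say whether the Tinker setup divides by the group standard deviation or only subtracts the mean (some GRPO variants drop the std term), so that choice, along with the helper names, is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of rollouts sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero when every
    rollout scored the same (those advantages come out as 0.0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def batch_advantages(groups, eps=1e-6):
    """Normalize each group independently (hypothetical helper, e.g. one group per image)."""
    return [group_relative_advantages(g, eps) for g in groups]


# Hypothetical rewards for four rollouts on one image
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

With those rewards the call returns roughly `[1.41, -1.41, 0.0, 0.0]`: the best guess is pushed up, the worst pushed down, and average guesses get no gradient signal. Normalizing within the group keeps the advantage scale comparable across easy and hard images.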
@sdand
Surya Dantuluri
11 days
61 million street view pictures downloaded ✅
0
0
13
@sdand
Surya Dantuluri
23 days
*runs at 0.07 Hz with thinking and 0.15 Hz without
1
0
9
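As a rough conversion, assuming each cycle counts one full agent step (screenshot in, action out), the time per step is the reciprocal of the frequency:

```latex
T = \frac{1}{f}, \qquad
T_{\text{thinking}} = \frac{1}{0.07\ \text{Hz}} \approx 14.3\ \text{s}, \qquad
T_{\text{no thinking}} = \frac{1}{0.15\ \text{Hz}} \approx 6.7\ \text{s}
```

That works out to roughly 4.2 versus 9 actions per minute.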
@sdand
Surya Dantuluri
23 days
full gameplay transcript:
gist.github.com
1
0
14
@sdand
Surya Dantuluri
23 days
claude playing TFT -- i ran it for two games and it improved in-context from 0 to 3/30 rounds won. it figured out "3-starring" on its own (buying pairs to upgrade units), which is a core mechanic of the game, and it hasn't been instructed on how to play beyond asking it to "play tft
35
8
342
@sdand
Surya Dantuluri
23 days
I wrapped up my work/studies and decided to release this personal project; try it out, just byok: https://t.co/5AShaeI1te some more notes on cua... currently the paradigm taking over osworld is using the computer's cli directly to shortcut actions/commands instead of using
github.com · pravaco/cuaview-releases: native claude computer use app for mac.
1
3
29
@sdand
Surya Dantuluri
23 days
But it can book flights and hotels for you pretty well if you ask it nicely!
1
0
18
@sdand
Surya Dantuluri
23 days
Similar to prior instances, claude seems to get distracted quickly. Here it's instructed to find car insurance. At some point it gets frustrated with how extensive the form is and gets distracted, going to Geico halfway through
3
0
40
@sdand
Surya Dantuluri
23 days
When given instructions to build a house in minecraft, it decides to circumvent the intended environment, switches to creative mode, and uses slash commands to build a house programmatically, which it, to its credit, finishes. In the prior video with league it proactively
2
0
62
@sdand
Surya Dantuluri
23 days
Can computer-use models play games now, one-shot? I gave Claude Opus 4.5 a simple prompt like "play league of legends" and it starts clicking and typing around my computer pretty effectively, even though it doesn't win due to latency. More interestingly, between Minecraft, finding
68
43
986
@sdand
Surya Dantuluri
25 days
opus 4.5 was a surprise in the sense that it matches or beats sonnet on latency, which is a leap over opus 4.1. the tool calling is very strong, partly because it uses far fewer tokens than any other model i've used. opus is #1 on osworld verified at 66% (human is ~72%), which
1
0
23
@sdand
Surya Dantuluri
25 days
claude 4.5 opus doing taxes end-to-end, one-shot no special prompting or tuning (at 20x speed)
@dwarkesh_sp
Dwarkesh Patel
7 months
I'd bet 2028 for computer use agents that can do taxes end-to-end for my small business as well as a competent general manager could in a week: including chasing down all the receipts on different websites, emailing back and forth for invoices, and filing to the IRS
22
44
1K