Surya Dantuluri
@sdand
Followers
14K
Following
44K
Media
379
Statuses
3K
ml https://t.co/r5ICReH9eO
san francisco
Joined November 2016
the true AI rollup isn't buying legacy businesses, it's taking over a small developing nation, instituting pro-growth ai-friendly regulation, and watching as 2029-era agi produces 20% y/y gdp growth
0
2
12
tinkerviz and geospot for tinker are here! https://t.co/fJHjSJaPp8 also cc @clarejtbirch @alhyunsoo @johnschulman2
github.com
Contribute to sdan/geospot development by creating an account on GitHub.
0
0
1
Initially built on a Qwen3VL RL gym I rolled up in JAX here: https://t.co/E6gwW2ji1P
https://t.co/TAne21D7iV
I found VLMs can learn geolocation through progressive geodesic tightening. In the abstract, you shape the rewards so the model learns country->region->city, and blend that with a reward on predicted coordinates depending on the schedule. A visual on how that would look:
1
0
1
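A minimal sketch of what the reward shaping described above could look like, assuming a hierarchy term (country->region->city partial credit) blended with a great-circle distance term whose tolerance tightens over training. Everything here is illustrative rather than the geospot implementation: the function names, the exponential distance reward, the geometric schedule, and the 5000 km -> 25 km scales are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (geodesic on a sphere) distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def hierarchy_reward(pred, truth):
    """Partial credit for country -> region -> city; stops at the first miss."""
    matched = 0
    for level in ("country", "region", "city"):
        if pred.get(level) != truth.get(level):
            break
        matched += 1
    return matched / 3


def schedule(step, total_steps, start_scale_km=5000.0, end_scale_km=25.0):
    """Hypothetical schedule: the distance scale shrinks geometrically and the
    coordinate term's weight grows linearly from 0 to 1."""
    t = min(max(step / total_steps, 0.0), 1.0)
    scale_km = start_scale_km * (end_scale_km / start_scale_km) ** t
    return scale_km, t


def shaped_reward(pred, truth, step, total_steps):
    """Blend the hierarchy reward with a distance reward, per the schedule."""
    scale_km, coord_weight = schedule(step, total_steps)
    dist_km = haversine_km(pred["lat"], pred["lon"], truth["lat"], truth["lon"])
    distance_reward = math.exp(-dist_km / scale_km)  # 1.0 at zero distance
    return (1 - coord_weight) * hierarchy_reward(pred, truth) + coord_weight * distance_reward
```

Under these assumptions, a guess in the right country but far from the true coordinates earns partial credit early in training and close to nothing once the scale has tightened, which is the intended pressure toward precise coordinates.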
So when I came across Tinker I felt it was similar to when I found OAI Gym in HS. it’s enough of an abstraction that I don’t need to deal with underlying infra and focus on tuning everything else. I’ve implemented GEPA in tinker here https://t.co/IGOTsXhKUU and am wrapping up on
github.com
GEPA using Tinker. Contribute to sdan/tinker-gepa development by creating an account on GitHub.
2
1
2
The impetus here was trying to solve geoguessing. a year ago i made an app based on GeoCLIP, a contrastively trained embedding model that was decent (but i spent more time making the app pretty). recently i tried training Qwen3VL by reimplementing it by hand in JAX where I ran into a
1
0
1
Inspired by the tinker console i made tinkerviz available below within geospot - it comes with several pages to stream rollouts in realtime. this is because we need to stream in data (hf in this case) down and lightly process and send up images, in that process we get batch
1
0
2
I made a geoguessing RL environment with the Tinker API this weekend. TML started vision support on Friday, so I decided to make a dashboard to visualize GRPO in realtime
4
1
39
https://t.co/Clk6OtApl1 full gameplay transcripts
0
0
1
full gameplay transcript:
gist.github.com
1
0
14
claude playing TFT -- i ran it for two games and it improved in-context from 0 to 3/30 rounds won. it figured out "3-starring" on its own (buying pairs to upgrade units), which is a core mechanic of this game, and it hasn't been instructed on how to play besides asking it to "play tft
35
8
342
I wrapped up my work/studies and decided to release this personal project, try it out, just byok: https://t.co/5AShaeI1te some more notes on cua... currently the paradigm taking over osworld is using the computer's cli directly to shortcut actions/commands instead of using
github.com
native claude computer use app for mac. Contribute to pravaco/cuaview-releases development by creating an account on GitHub.
1
3
29
But it can book flights and hotels for you pretty well if you ask it nicely!
1
0
18
Similar to prior instances, claude seems to get distracted quickly. Here it's instructed to find car insurance. At some point it gets frustrated at how extensive the form is and gets distracted, going to Geico halfway through
3
0
40
When given instructions to build a house in minecraft it decides to circumvent the intended environment and start playing in creative mode, and uses slash commands to build a house programmatically which it, to its credit, finishes. In the prior video with league it proactively
2
0
62
Can computer-use models play games now, one-shot? I gave Claude Opus 4.5 a simple prompt like "play league of legends" and it starts clicking and typing around my computer pretty effectively, even though it doesn't win due to latency. More interestingly between Minecraft, finding
68
43
986
opus 4.5 was a surprise in the sense that it matches or beats sonnet at latency, which is leaps over opus 4.1. the tool calling is very strong, partially because it uses far fewer tokens than any other model i've used. opus is #1 on osworld verified at 66% (human is ~72%) which
1
0
23
there wasn't a cua app for the mac so here it is: https://t.co/e5I99El9Jg just byok
github.com
Cuaview v0.2.0 Native macOS client for Claude computer use. For free, no logging, just bring your own Anthropic Key. EmergentBehavrioLeague_10mb.mp4 Features: E...
6
3
64
claude 4.5 opus doing taxes end-to-end, one-shot no special prompting or tuning (at 20x speed)
I'd bet 2028 for computer use agents that can do taxes end-to-end for my small business as well as a competent general manager could in a week: including chasing down all the receipts on different websites, emailing back and forth for invoices, and filing to the IRS
22
44
1K