
Tony Ginart
@tginart
Followers: 280 · Following: 3K · Media: 7 · Statuses: 788
AI Hacker. Scientist @SFResearch. Alum: @YCombinator @StanfordAILab.
Joined March 2020
Why does gpt-5 feel so much smarter than other models in cursor even though they’re so close on benchmarks? The difference is just really stark at this point
I’ll make the case that if OpenAI is committed to making sure AGI benefits all of humanity, they should IPO immediately. Investing in AI should be democratized, and returns shouldn’t go to a small number of private investors.
the idea of an ‘IPO’ as an exit is a quaint legacy. private frontier AI company shares are now trading with more frequency and liquidity than many publicly traded companies
For product, stick with frontier to build an MVP, but optimizing with open source is an option down the road. There’s also space for a startup to do the unglamorous work of making open models usable. This wouldn’t require that much compute, but it does require data and taste.
2️⃣ the polish and ergonomics around the open source models are significantly inferior — more so than the topline capability gap would suggest. Open source models are significantly more jagged and have more rough edges. So what…
This is a fascinating trend I’ve been tracking closely as someone sitting between AI training and AI product. 1️⃣ yes, in terms of core capabilities, open weights has stayed ~6 months behind frontier, and the gap seems to be slowly shrinking BUT…
A ton of attention over the years goes to plots comparing open to closed models. The real trend that matters for AI’s impact on society is the gap between closed frontier models and local consumer models. Local models passing major milestones will have major repercussions.
Einstein’s relativity paper (1905) and Satoshi’s Bitcoin paper (2008) are two of a kind. Short, axiomatic, self-contained. Accessible yet rigorous. Resolute — as if carved in stone. Brilliantly simple, carrying the inevitability of sunrise.
Tried hooking up LLMs to DF earlier this year as a weekend project, but it was tougher than expected… anyone got an open source DF clone that runs in a bash shell?
Tarn Adams’s masterpiece “Dwarf Fortress” is possibly the best procedural generation ever applied to video games, to the point where the programmatic engravings the dwarves carve into the stones detailing the world’s history can actually be emotionally moving
Yes, GRPO basically just means some kind of policy gradient method using some kind of group relative normalization around rewards.
when people say they're doing GRPO they don't mean they're doing *literal* GRPO as it was originally formulated. more of a vibe thing. it's like when people say they're doing SGD but they really mean they're doing AdamW
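As a minimal sketch of the “group relative normalization” idea above — the function name and the four-rollout rewards are illustrative, not from the tweet:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize rewards within a group of rollouts sampled from the
    same prompt: subtract the group mean, divide by the group std.
    This is the 'group relative' part of GRPO-style methods."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # small epsilon guards against a zero-variance group
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Four rollouts for one prompt, with hypothetical scalar rewards.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

These per-rollout advantages then weight an ordinary policy-gradient update; everything else (clipping, KL terms, optimizer) is where implementations diverge from the original formulation.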
Oh wow this brings back some ptsd
Yes the models will be working autonomously for hours or days in a year or two. Yes they will still get helplessly confused and require oversight. Yes it will be weird.
So it turns out that for function calling, audio pipelined systems work better than end-2-end omni models (for now) — and both incur a performance degradation relative to pure text. Had a lot of fun working on this with @HuanzhiMao
BFCL Audio: A Benchmark for Audio-Native Function Calling 🎙️ Function calling benchmarks focus exclusively on text, but voice interfaces are critical for enterprise call centers and customer support where precision matters. BFCL Audio evaluates models on conversational speech
Current RL paradigm won’t fix ai reliability because RL doesn’t meaningfully improve fluid intelligence. It just hill-climbs crystallized intelligence in narrow domains. Current RL paradigm will just make the models increasingly jagged. I still think we need something new.
GPT-4.1 is seriously underrated in Cursor. (1) so much faster than thinking models (2) follows instructions, doesn’t make rogue changes (3) basically on par with top models for small and medium sized tasks
there are theorems that are true but can’t be proved, and there are theorems that are arbitrarily hard to prove. So we’ll always have a frontier to push on, no matter how good AI gets! Mathematics will be ok I think. A lot more computer-aided proofs in the coming years tho
the openai IMO news hit me pretty heavy this weekend i'm still in the acute phase of the impact, i think i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don't think i
btw not a dunk on cursor. this is rare. i happily use it every day.
Degenerate repetitions in the wild on @cursor_ai! This is why we need the LZ Penalty: https://t.co/V8Bktb5je8
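A toy way to spot the kind of degenerate repetition shown above is a longest-repeated-suffix check — note this is only an illustration of detecting loops, not the actual LZ Penalty from the linked paper:

```python
def longest_repeated_suffix(tokens):
    """Length of the longest suffix of `tokens` that also occurs
    earlier in the sequence. A long match suggests the decoder has
    fallen into a degenerate repetition loop."""
    n = len(tokens)
    best = 0
    for length in range(1, n):
        suffix = tokens[n - length:]
        window = tokens[: n - 1]  # everything before the final token
        found = any(
            window[start : start + length] == suffix
            for start in range(len(window) - length + 1)
        )
        if not found:
            break  # a longer suffix can't match if this one didn't
        best = length
    return best
```

An LZ-style decoding penalty would go further and discourage sampling the token that extends such a match, rather than merely measuring it after the fact.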