Tony Ginart

@tginart

Followers
280
Following
3K
Media
7
Statuses
788

AI Hacker. Scientist @SFResearch. Alum: @YCombinator @StanfordAILab.

Joined March 2020
@tginart
Tony Ginart
4 days
2 wrong answers; 2 right answers. I will not elaborate.
@simonsarris
Simon Sarris
4 days
Who would you rather win?
0
0
0
@tginart
Tony Ginart
8 days
Why does gpt-5 feel so much smarter than other models in cursor even though they’re so close on benchmarks? The difference is just really stark at this point
0
0
0
@tginart
Tony Ginart
8 days
I’ll make the case that if OpenAI is committed to making sure AGI benefits all of humanity they should IPO immediately. Investing in AI should be democratized and returns shouldn’t go to a small number of private investors.
@AnjneyMidha
Anjney Midha
8 days
the idea of an ‘IPO’ as an exit is a quaint legacy. private frontier AI company shares are now trading with more frequency and liquidity than many publicly traded companies
0
0
1
@tginart
Tony Ginart
8 days
For product, stick with frontier to build an MVP, but optimizing with open source is an option down the road. There’s also space for a startup to do the unglamorous work of making open models usable. This wouldn’t require that much compute but does require data and taste.
0
0
0
@tginart
Tony Ginart
8 days
2️⃣ the polish and ergonomics around the open source models are significantly inferior — more so than the topline capability gap would suggest. Open source models are significantly more jagged and have more rough edges. So what…
1
0
0
@tginart
Tony Ginart
8 days
This is a fascinating trend I’ve been tracking closely as someone sitting between AI training and AI product. 1️⃣ yes, in terms of core capabilities, open weights has kept ~6 months behind frontier, and the gap seems to be slowly shrinking BUT…
@natolambert
Nathan Lambert
8 days
A ton of attention over the years goes to plots comparing open to closed models. The real trend that matters for AI impacts on society is the gap between closed frontier models and local consumer models. Local models passing major milestones will have major repercussions.
1
0
0
@tginart
Tony Ginart
10 days
Einstein’s relativity paper (1905) and Satoshi’s Bitcoin paper (2008) are two of a kind. Short, axiomatic, self-contained. Accessible yet rigorous. Resolute — as if carved in stone. Brilliantly simple, carrying the inevitability of sunrise.
0
0
0
@tginart
Tony Ginart
10 days
Tried hooking up LLMs to df earlier this year as a weekend project but tougher than expected… anyone got an open source df clone that runs in bash shell?
@tszzl
roon
10 days
Tarn Adams’s masterpiece “Dwarf Fortress” is possibly the best procedural generation ever applied to video games, to the point where the programmatic engravings the dwarves carve into the stones detailing the world’s history can actually be emotionally moving
0
0
1
@tginart
Tony Ginart
10 days
Yes, GRPO basically just means some kind of policy gradient method using some kind of group relative normalization around rewards.
@willccbb
will brown
10 days
when people say they're doing GRPO they don't mean they're doing *literal* GRPO as it was originally formulated. more of a vibe thing. it's like when people say they're doing SGD but they really mean they're doing AdamW
0
0
3
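The "group relative normalization" in the tweet above can be sketched in a few lines. This is a toy illustration, not the full GRPO objective (no policy-gradient loss, no KL term): sample several completions for one prompt, then normalize each completion's reward against the group's mean and standard deviation to get its advantage.

```python
import statistics

def group_relative_advantages(rewards):
    """Compute group-relative advantages for a group of completions
    sampled from the same prompt: subtract the group mean reward and
    divide by the group standard deviation."""
    mean = statistics.mean(rewards)
    # Guard against zero std when every completion got the same reward.
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for one prompt, binary rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

These advantages then weight the per-token policy-gradient update; the "vibe" point in the quoted tweet is that implementations vary in clipping, KL handling, and normalization details while keeping this group-relative core.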
@tginart
Tony Ginart
11 days
Oh wow this brings back some ptsd
0
0
0
@tginart
Tony Ginart
14 days
Yes the models will be working autonomously for hours or days in a year or two. Yes they will still get helplessly confused and require oversight. Yes it will be weird.
0
0
0
@tszzl
roon
16 days
unbelievable
210
227
8K
@tginart
Tony Ginart
2 months
So it turns out that for function calling, audio pipelined systems work better than end-2-end omni models (for now) — and both incur a performance degradation relative to pure text. Had a lot of fun working on this with @HuanzhiMao
@SFResearch
Salesforce AI Research
2 months
BFCL Audio: A Benchmark for Audio-Native Function Calling 🎙️ Function calling benchmarks focus exclusively on text, but voice interfaces are critical for enterprise call centers and customer support where precision matters. BFCL Audio evaluates models on conversational speech
0
0
4
@tginart
Tony Ginart
2 months
Can’t help but feel a bit sad, torchtune was one of my favorites!
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 months
torchtune was deprecated last month and a new repo for post-training at scale is being developed at Facebook. Hoping to hear more about this new library soon!
0
0
2
@tginart
Tony Ginart
2 months
Current RL paradigm won’t fix ai reliability because RL doesn’t meaningfully improve fluid intelligence. It just hill-climbs crystallized intelligence in narrow domains. Current RL paradigm will just make the models increasingly jagged. I still think we need something new.
0
0
3
@tginart
Tony Ginart
2 months
Not enough people talk about how insane diffusion model training seems to autoregressive people… That’s the real black magick
@kalomaze
kalomaze
2 months
you left out the "deeply painful to train + productionize" part
1
0
7
@tginart
Tony Ginart
2 months
Gpt-4.1 is seriously underrated in cursor. (1) so much faster than thinking models (2) follows instructions, doesn’t make rogue changes (3) basically on par with top models for small and medium sized tasks
0
0
1
@tginart
Tony Ginart
3 months
there are theorems that are true but can’t be proved, and there are theorems that are arbitrarily hard to prove. So we’ll always have a frontier to push on, no matter how good ai gets! Mathematics will be ok I think. A lot more computer-aided proofs in the coming years tho
@_Dave__White_
Dave White
3 months
the openai IMO news hit me pretty heavy this weekend i'm still in the acute phase of the impact, i think i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don't think i
0
0
2
@tginart
Tony Ginart
3 months
btw not a dunk on cursor. this is rare. i happily use it everyday.
1
0
1
@tginart
Tony Ginart
3 months
Degenerate repetitions in the wild on @cursor_ai! This is why we need the LZ Penalty: https://t.co/V8Bktb5je8
1
0
2
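The degenerate repetitions mentioned above are exact loops the sampler falls into. A toy sketch of a match-based repetition penalty (in the spirit of LZ-style schemes, though not the actual LZ Penalty formulation, which is codelength-based): if the current suffix of the token stream has occurred earlier, penalize the logits of whatever tokens followed those earlier occurrences, since those are exactly the tokens a degenerate loop would emit next. All names and parameters here are illustrative.

```python
def repeat_match_penalty(tokens, context_len=4, penalty=2.0):
    """Toy repetition penalty: find earlier occurrences of the last
    `context_len` tokens and return a logit penalty for each token
    that followed an earlier occurrence (the loop-continuing tokens)."""
    penalized = {}
    if len(tokens) < context_len:
        return penalized
    suffix = tuple(tokens[-context_len:])
    # Scan all earlier windows; i never reaches the suffix's own start.
    for i in range(len(tokens) - context_len):
        if tuple(tokens[i:i + context_len]) == suffix:
            nxt = tokens[i + context_len]
            penalized[nxt] = penalty  # subtract from this token's logit
    return penalized

# After two full repeats of [1, 2, 3, 4], token 4 would continue the loop.
penalties = repeat_match_penalty([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3], context_len=3)
```

At sampling time these penalties would be subtracted from the matching logits before softmax, breaking the loop without touching unrelated tokens.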