
theseriousadult
@gallabytes
Followers
6K
Following
192K
Media
167
Statuses
3K
father, ML enjoyer. building agents @cursor_ai. @midjourney v2-7.
Joined April 2014
this is asinine. deepseek is accomplishing comparable feats with 100x less staff and 1000x less budget. Google needs to fire people whose job is to get in the way, or who treat getting in the way as their 20% time project.
26
19
741
did you know: the best way to spread chinese propaganda & undermine the american economy is to upload preprints to arxiv, release the results open source under a permissive license, then wait for the forbes readers to throw a tantrum.
DeepSeek is legitimately impressive, but the level of hysteria is an indictment of so many. The $5M number is bogus. It is pushed by a Chinese hedge fund to slow investment in American AI startups, service their own shorts against American titans like Nvidia, and hide sanction.
20
45
651
@theojaffee I like that walking around and comfortable seats are the norm on trains. Jumbo planes pressurized to sea level where you can't hear the engines and which take off from airports you can get to 15 minutes before departure having bought the tickets earlier that day would be great.
10
2
532
I often get the sense people are thinking about aging in a way that's deeply wrong. When I look at aging, I see a breakdown of a complex interconnected system, where the whole thing is collectively decaying in an accelerating fashion due to the decay of its parts. What I *don't*.
When we figure out how to significantly slow down aging I am 99% sure that mass production will require technology no more advanced than we had in the 1930s. In other words, we have endured a century of unnecessary suffering because we have not asked sufficiently correct.
32
13
306
come work w/me & have the job on the left
any great engineers out there who want to get closer to ai? we're hiring for the core data team at @midjourney. there's cool challenges and big opportunities to both learn and make a difference in the creative capacity of the world.
12
11
315
whale bros have the mandate of heaven, truly. xAI should just run the DeepSeek code on their cluster on the biggest dataset they can cobble together and release the best omni-model the world has ever seen. don't bother post-training it, just make the best base model and release.
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
9
16
265
absolutely insane to me to see hackers grow up and try to raise their kids in a way that's incompatible with becoming hackers.
🔮 $PLTR co-founder Peter Thiel on screen time for kids 📺: “If you ask the executives in those companies how much screen time they let their kids use, there’s probably an interesting critique one could make.” Andrew: “What do you do?” Thiel: “An hour and a half a week.”
32
7
209
@srush_nlp TPUs have massive SIMD instructions on a few simple serial cores. GPUs have a ton of cores which are individually much slower. If you need very dynamic memory access patterns in your code they'll be much worse at it than GPUs because they can't just swap out to another thread.
6
17
198
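the access-pattern point can be made concrete with a toy numpy sketch. `dense_update` and `gathered_update` are my own illustrative names; this says nothing about actual TPU/GPU throughput, only about the shape of the memory traffic each architecture is built for:

```python
import numpy as np

# Dense, statically-shaped op: every lane does the same work in lockstep.
# This is the access pattern a TPU's wide SIMD units are built for.
def dense_update(x, w):
    return x @ w  # contiguous reads, no data-dependent control flow

# Data-dependent gather: each row reads from an index chosen at runtime.
# A GPU hides the irregular latency by swapping in other warps; a TPU's
# few serial cores have no pool of threads to switch to.
def gathered_update(x, idx, w):
    return x[idx] @ w  # irregular reads driven by idx

x = np.arange(12.0).reshape(4, 3)
w = np.ones((3, 2))
idx = np.array([3, 0, 2, 1])  # runtime-chosen row order
print(dense_update(x, w).shape)       # (4, 2)
print(gathered_update(x, idx, w)[0])  # row 3 of x, times w
```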
entropix is reasonable evidence for harder takeoffs. I'm not *convinced* but I am convinced to take it more seriously. @doomslide I owe you some bayes points.
6
3
181
I remember seeing dalle1 and thinking "goddamn OpenAI is going to build the coolest stuff and never release it bc they believe in AGI not products." my very next thought was "what an opportunity!" and immediately set to work on replicating it. roughly 1.5y later I beat it.
i remember panicking about dalle 1, 3 years ago. i thought that AI was going to be locked in an orwellian "I'm your mommy" tech company basement. I'm glad I was wrong. technology wants to be free. it will always escape. because the people who build it, build it as worship.
3
3
183
it turns out all you needed was a cracked free tier to displace the homework app? that or everyone has already downloaded chatgpt and this is only measuring recent downloads for some quite short window.
DeepSeek app sitting at number 1 overall in the US iPhone App Store is not on my bingo card and is the biggest sign yet that the ChatGPT moat can maybe be cracked.
7
2
168
idk man o1 pro is pretty good. not quite there yet but "principal engineer of the gaps" is getting awful tight. I've got a few years left before I can fully automate my job but probably not 10, and I think I've got one of the most automation-resistant IC jobs in tech.
ofc you think AI will do all SWE work in 2 years, you have 6 months of experience where they only give you tasks so well-scoped a child could do them. you’ll be convinced of that until you make SWE2 then you’ll be posting about how “general reasoning” is required to do your job
6
1
154
@destructionset @fabiooreilly this reads more like standard alphabet cope than a real justification. Spotify did 4b in revenue q2, that's nearly half what gcp did.
3
0
142
llm phenomenology is understudied and I want people less weird than @repligate and @jd_pressman to look into it more. not because I don't like their work but because the study can't mature like this.
27
5
152
this is what *real* ai safety evals look like btw. and this one is genuinely concerning.
Claude 3.5 Sonnet agents use "costly punishment" sparingly (pay resources to reduce a different agent's resources) against free-riders to maintain cooperation, increase payoffs. Gemini 1.5 Flash agents overuse punishment so much that they harm the collective outcome
1
6
132
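the setup being evaluated is a classic public-goods game with costly punishment; a minimal sketch, with the endowment, multiplier, and punishment parameters entirely my own illustrative choices:

```python
# Toy public-goods round with costly punishment (a sketch of the setup the
# eval describes; numbers and rules here are illustrative, not the eval's).
def play_round(contributions, punish_cost=1.0, punish_harm=3.0):
    pot = sum(contributions) * 1.6           # contributions are multiplied...
    share = pot / len(contributions)         # ...and split evenly
    payoffs = [share + (10 - c) for c in contributions]  # keep what you didn't give
    # each contributor pays punish_cost to inflict punish_harm on each free-rider
    for i, c in enumerate(contributions):
        for j, cj in enumerate(contributions):
            if c > 0 and cj == 0:
                payoffs[i] -= punish_cost
                payoffs[j] -= punish_harm
    return payoffs

# three cooperators, one free-rider
print(play_round([10, 10, 10, 0]))
```

note that even after punishment the free-rider can still come out slightly ahead; the eval's point is about dosage - sparing punishment sustains cooperation, while overuse drags the collective payoff down.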
ok screw it, I'll put my money where my mouth is here. 10k$ bounty and a job offer to anyone who can figure out how to make a Mondrian compress to at least 16x fewer tokens than an equivalent resolution Where's Waldo in a way that generalizes like you'd expect.
@Ethan_smith_20 @_clashluke not all pictures are worth a thousand words. some much less, some much more. any scheme which doesn't account for this is leaving a lot of compression on the table. hierarchical encoding isn't it imo. we have to be dynamic somewhere, might as well be at the first opportunity.
10
10
119
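one way to see why a fixed tokens-per-image budget leaves compression on the table: even a generic codec assigns wildly different sizes to a flat, Mondrian-like image versus dense clutter at the same resolution. A sketch using zlib output size as a crude stand-in for token count (illustrative only; real image tokenizers are learned):

```python
import zlib
import numpy as np

# Proxy for "how many tokens is this picture worth": bytes after zlib.
def compressed_size(img):
    return len(zlib.compress(img.tobytes()))

rng = np.random.default_rng(0)
mondrian = np.zeros((256, 256), dtype=np.uint8)
mondrian[:128, :] = 200          # a few flat color fields
mondrian[:, 100:108] = 30        # and a stripe
waldo = rng.integers(0, 256, (256, 256), dtype=np.uint8)  # dense clutter

# the flat image compresses more than 16x smaller at equal resolution
print(compressed_size(mondrian) * 16 < compressed_size(waldo))  # True
```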
not me tho. big parallel thinking just got derisked at scale. they'll catch up. if recursive self improvement is the game OpenAI will win. if industrial scaling is the game it'll be Google. if unit economics are the game then everyone will win.
8
5
122
this happened *to me*. my parents are not remotely tech bros. they tried their best, put me in schools they felt were good, and those schools thought that the best way to enrich my math education was to make me teach the other kids. this WILL NOT HAPPEN to my children.
@NielsHoven @andrewbunner It's so weird how this keeps happening to the children of the Tech Bro community. Will no one speak for them?.
9
1
116
in what sense is this diffusion? I see no SDE, no probability flow, no noise. not every iterative sampling method is diffusion! this paper is genuinely impressive, but it's a new thing; I don't see how I would port diffusion intuitions over to it.
Large Language Diffusion Models. Introduces LLaDA-8B, a large language diffusion model pretrained on 2.3 trillion tokens using 0.13 million H800 GPU hours, followed by SFT on 4.5 million pairs. LLaDA 8B surpasses Llama-2 7B on nearly all 15 standard zero/few-shot learning
9
5
118
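for reference, the objects the tweet says are missing - in the score-SDE formulation (Song et al.), a method inherits diffusion intuitions from these three pieces:

```latex
% Forward (noising) SDE:
dx = f(x, t)\,dt + g(t)\,dW_t
% Reverse-time SDE used for sampling:
dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{W}_t
% Deterministic probability-flow ODE with the same marginals:
\frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g(t)^2 \nabla_x \log p_t(x)
```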
@finmoorhouse Nick is describing position taking over medium to long time scales not day trading.
1
0
112
come work with us! my DMs are open. if you want to work on coding models or the wild infrastructure challenges required to train them well we need more people! it's a small & highly capable team working on super cool problems w/huge impact.
Some personal news: I recently joined Cursor. Cursor is a small, ambitious team, and they’ve created my favorite AI systems. We’re now building frontier RL models at scale in real-world coding environments. Excited for how good coding is going to be.
1
1
111
this shit is everywhere and is basically fraudulent. cut it out. stop crying wolf. I'm actually mad because I want to be able to know if we're seeing serious signs of misalignment and instead I have to disregard ~everything reported.
Things like this detract from the credibility of AI safety work, IMO -- it sounds spicy ("o1 tried to escape!!!") but when you dig into the details it's always "we told the robot to act like a sociopath and maximize power, and then it did exactly that".
6
2
109
Jax on TPU is such a lovely contrast to everyone's complaints about Torch on GPU. Feel like I'm running a Linux webserver in 2004 - this is so much less jank than the market-leading madness, but people haven't yet switched en masse due to some combination of not knowing that.
A fantastic post on large-scale infra pain. If you've wondered why MosaicML was a unicorn, it's this. tl;dr: Every cluster and every PyTorch library is its own unique, broken, unstable snowflake. Everything is hard at scale. Nothing "just works." We get paid to abstract this.
9
6
108
MJ was fully remote from day 1. The problem isn't remote vs in-office (though that does have significant downsides!) but that Google as a company has no fire. Plenty of individual employees do, but the company doesn't.
Former Google CEO Eric Schmidt says Google lost its competitive edge when it decided that employees working from home and going home early was more important than winning
2
5
107
you gotta be question maxxing. you gotta be coming up with turing award level questions while you're putting on your socks. you gotta be theorizing novel attention mechanisms in the shower. you gotta be debugging transformer architectures in your dreams.
2
7
93
idk man at 16 I had a lot more fun and learned more hanging out with grad students and adults after school than with other 16 year olds.
You know those prodigies who end up in college at 16? What kind of experience do you think they're having? Absolutely zero college kids - sorry, adults - will want to hang around a 16 year old for reasons that I hope are obvious. Let your kid grow up like everyone else. 6/
7
1
88
people don't realize how much simpler rectified flow is. also, it's not magic. it's maybe marginally better than v diffusion + cosine schedule. scale still rules everything around me.
@amelie_iska it's really simple. probably the simplest diffusion setup I've seen so far.
5
13
87
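it really is short: the whole rectified-flow training target fits in a few lines. A numpy sketch with the model stubbed out (all names mine, not from any particular codebase):

```python
import numpy as np

# Rectified flow: interpolate on a straight line from data x0 to noise x1,
# and regress the model onto the constant velocity x1 - x0 along that line.
def rf_pair(x0, x1, t):
    xt = (1 - t) * x0 + t * x1   # straight-line interpolant
    v_target = x1 - x0           # constant velocity along the line
    return xt, v_target

def rf_loss(model, x0, x1, t):
    xt, v = rf_pair(x0, x1, t)
    return np.mean((model(xt, t) - v) ** 2)

# sampling = integrate dx/dt = v(x, t) from the noise end back to the data end
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))  # stand-in for "data"
x1 = rng.normal(size=(4, 8))  # noise
print(rf_loss(lambda xt, t: x1 - x0, x0, x1, 0.3))  # a perfect model -> 0.0
```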
@TheXeophon that doesn't make them a joke. these things are expensive to run, and it's not too hard for even moderately heavy users to go way over the 20$ plan.
2
0
89
they just *do not get* the philosophical lessons of deep learning. they really fundamentally don't get it. it's not that it hasn't been explained to them. you can lead a horse to water, but you can't make it compatible with its ontology.
Regular reminder that MIRI folks consider it plausible that AI just keeps being more and more beneficial for society up until the day before AI causes everyone to drop dead in the same five seconds. The x-risk view has never been very close to the generic "AI bad, boo AI" view.
7
3
88
the downside to the memory feature is that there's no way to "send prompt" - as soon as I realized how powerful it was I put some deliberate effort into building persistent respect & rapport with the models and now my chatgpt experience is different.
@gallabytes Ohhhh send prompt, this is basically my first pass filter for ideas.
5
4
87
@kalomaze it's for the vibes. sakana is about "the swarm" and various other ngmi ideas, executed with enough obsessive competence that it kinda works anyway.
8
2
87
this is a neat thread but missing the core issue with gcp ime: @Google doesn't actually ship the tech they use internally! they ship weird nerfed buggy versions of similar products instead.
~12yrs ago, I got a job @Google. Those were still early days of cloud. I joined GCP @<150M ARR & left @~4B (excld GSuite). Learned from some of the smartest ppl in tech. But we also got a LOT wrong that took yrs to fix. Much of it now public, but here’s my ring-side view👇.
5
2
84
another day another gpu dev box with broken software. seems like @RekaAILabs has a pretty similar experience to mine. GOOG >>> NVDA if they can just bring themselves to *sell their goddamn hardware*.
6
3
77
Drives me nuts to see them plowing all this capacity into free tier while our paid capacity requests for Gemini API have been delayed for weeks.
We just expanded the Gemini API free tier access (the most generous LLM API free tier out there) to 35 additional countries including the EU 🇪🇺 and UK 🇬🇧. Happy building :)
3
1
80
longshoremen-level scummy move. @OpenAI this is disgraceful.
They also argue for banning the use of PRC-produced models within Tier 1 countries that 'violate user privacy and create security risks such as the risk of IP theft.' This is an anti-Whale harpoon.
2
5
78
this feels in line with my sense of the quality of the product. 4o actually got good? not just the image stuff the normal model too. deep research is great. o1 and 4.5 are good premium offerings. they filled out the product pretty well.
This isn't an April Fools joke: ChatGPT revenue has surged 30% in just three months. In this morning's Agenda, @amir and I get into ChatGPT's growth, the OpenAI-Google attention war, and what Sam Altman is actually saying by releasing an open model.
2
1
75
got used to r1 and now that it's overloaded it's hard to go back. @deepseek_ai please do something amazing and be the first LLM provider to offer surge pricing. the unofficial APIs are unusably slow.
7
3
73
very bullish on 3, interested in how far lightweight approximations can go via 2, and super bearish on 1. 4 is orthogonal. taking notes seems good.
afaik, there are 4 main ways we could get LLM memory:
-> we just get really long contexts and the context grows over an instance's “lifetime”; optionally can do iterative compression/summarization
-> state space model that keeps memory in constant size vector
-> each context is a.
5
2
64
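the "taking notes" flavor can be sketched in a few lines. `summarize` here is a stand-in for a model call, and the whole class is a hypothetical illustration, not any particular product's memory feature:

```python
# Note-taking memory: after each exchange, distill a short note into a
# persistent store, and prepend the stored notes to the next prompt.
class NoteMemory:
    def __init__(self, summarize, max_notes=50):
        self.summarize = summarize  # stand-in for an LLM summarization call
        self.max_notes = max_notes
        self.notes = []

    def observe(self, user_msg, reply):
        self.notes.append(self.summarize(user_msg, reply))
        self.notes = self.notes[-self.max_notes:]  # drop oldest beyond cap

    def build_prompt(self, user_msg):
        header = "\n".join(f"note: {n}" for n in self.notes)
        return f"{header}\nuser: {user_msg}" if header else f"user: {user_msg}"

# toy summarizer just keys on the first word of the user's message
mem = NoteMemory(lambda u, r: f"user asked about {u.split()[0]}")
mem.observe("TPUs vs GPUs?", "...")
print(mem.build_prompt("ok, and memory?"))
```

the appeal over raw long context is that the store stays small and curated; the cost is that whatever the summarizer drops is gone for good.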