
T Ay.
@ayedtay
Followers: 601
Following: 11K
Media: 203
Statuses: 3K
A few more thoughts on this.
@boazbaraktcs @xai re: advanced capabilities that can be used for bad things, I understand your point but I would also point out that this is fundamentally the same argument that has been used for every frontier model release since GPT 3.5 - we've cried wolf too many times already. re: undocumented.
This reply blew up a bit. Some understood it as me being against common safety practices like system cards, red teaming, refusals, censorship, etc. That's not actually my point. I personally don't know. I'm just pointing out that scolding competitors for not doing these practices.
@boazbaraktcs @xai I’ve heard several people from other labs explain all the things that xAI *ought* to do but I don’t hear good justifications for why these things are good and necessary. Do we have any reason to believe these things (system card, censorship, etc.) improve any desirable outcome?
I like this idea a lot.
Here's the gist: Insurers have incentives and power to enforce that the companies they insure take action to prevent the risks that matter. They enforce security through an incentive flywheel: Insurers create standards. Standards outline which risks matter and what companies
I really can’t tell if this is super cool or a gigantic scam.
We have three big @pipedream_labs announcements today. The 3rd is the most ambitious project we've ever attempted. Here’s a quick summary thread:
I’d have to read the whole thing but my hot take is that this is like comparing cars and bicycles in a city center, finding that people on bikes are marginally faster and concluding that driving a car is bad for speed.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Reposting this since the original post was removed by @natolambert.
@natolambert Honestly if you’re surprised by the grok 4 numbers in either direction you need to increase cogsec.
More numbers here.
xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model. We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude
Grok 4 single run with no tools beats o3-pro.
@Evantaged @arXivBangers Base, with no tools. We have not tested Grok 4 Heavy yet.