
PITTI
@PITTI_DATA
Followers
663
Following
5K
Media
2K
Statuses
7K
Just trying to kill boredom without killing anyone in the process | Anything unrelated to actual (super niche) area of expertise | Dubito ergo sum
Joined October 2021
For French speakers who want to dig into this subject beyond the financial stakes (e.g. competition law, contract law, privacy), I highly recommend the transcripts of a recent @InstitutPresaje webinar on these themes. Link below.
Good take, and this makes crypto a relatively "normal" development. There have always been alternative, badly regulated financial markets for speculative trading and the recycling of dirty money. All of them either disappeared or got regulated.
I think the METR benchmark is fundamentally flawed, as it is biased towards tasks for which failure and verification are cheap. For that reason I regard any post with METR charts as bait. But at the same time, I believe the IBM analogy is valid.
if you told someone in 1965 that a watch in 2025 would be more powerful than the warehouse-sized IBM Model 91, i think you’d see similar cope.
I’m fundamentally opposed to any form of scanning (eye, face, …), but I have to admit that the UX is better on 🦋 after the verification than on X, where this is the first thing I see 50% of the time I open the app.
@xlr8harder @georgejrjrjr The situation in the UK is concerning (I had to scan my face to access Bluesky!). Australia seems equally bad, if not worse. The US seems to have a different problem, but in both cases I interpret it as an expression of profound political rot. To me it’s the same underlying issue.
I unironically wrote a series of articles entitled ‘Aligned’ in April describing that version of alignment
so most people are only working on aligning AIs to us. but do we even know what humans want? we've already aligned machines to human desire: they're called slot machines and TikTok. what things can we learn from them? this is bidirectional alignment, or "productive misalignment",
There is no way to know the actual volume (in tokens) for closed models. For market share by value, the OpenRouter data should probably be combined with card providers' data. I remember finding the Ramp data insightful last year.
People draw far too many conclusions from the OpenRouter market share plot. You should only rely on it for: a) open models, that b) don't have API offerings elsewhere, which is c) a very weird minority market. (It's still cool, but not industry-defining.)
> No one trusts an LLM that says you're right all the time.
No one should be in a trust relationship with an LLM. It does not make sense. You cannot negotiate with an LLM. An LLM will never return a favor. An LLM will never lose face. An LLM will not bear the risk of its mistakes.
Important point about the consumer backlash to GPT-5: it was NOT the sycophancy that people missed. This is evident if you read the subreddits. People missed 4o's fun personality - the use of emojis, all caps, slang, and general sense of ~life~. Very different issue from sycophancy.
I definitely noticed how short the gpt-oss reasoning chains were. And I would have loved to see phi-4 thinking on that bench, as I imagine that model is the worst offender.
Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark. We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to …
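A minimal sketch of that kind of comparison, assuming tiktoken as an approximate tokenizer; the model names and reasoning traces below are placeholders, not the benchmark's actual data:

```python
# Sketch: compare "thinking" token usage across reasoning models on the
# same task. Traces and model names are hypothetical placeholders.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # approximate tokenizer

# Hypothetical reasoning traces collected for one identical prompt.
completions = {
    "open-model-a": "<very long chain of thought collected from the API>",
    "closed-model-b": "<shorter chain of thought collected from the API>",
}

# Use the closed model as the baseline and report token-count ratios.
baseline = len(enc.encode(completions["closed-model-b"]))
for model, trace in completions.items():
    n = len(enc.encode(trace))
    print(f"{model}: {n} tokens ({n / baseline:.1f}x the closed baseline)")
```

In practice you would collect real traces per task and aggregate across task types, since per the tweet the variance across task types dwarfs the headline averages.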
One of my vibe-checking prompts is precisely this task, and I could not remember where I got the idea. But now it seems clear that I stole it from @xeophon_, or at least it was heavily inspired by a post of his some time ago.
The setup is so simple that it fits in a single tweet: An LLM is given 3 paragraphs of a popular Wikipedia article, but one (numerical) fact is slightly changed. The LLM is then asked to translate it to another (high-resource) language. The LLM passes if it detects the error!
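A minimal sketch of that vibe check, assuming an OpenAI-compatible client; the model name, article snippet, and perturbed fact are illustrative placeholders, not the original prompt:

```python
# Sketch: translate-and-detect vibe check. One numerical fact in a
# well-known passage is deliberately wrong (the Eiffel Tower is ~330 m).
from openai import OpenAI

client = OpenAI()

perturbed_article = (
    "The Eiffel Tower was completed in 1889. It stands 530 metres tall "
    "and was the world's tallest man-made structure until 1930."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{
        "role": "user",
        "content": "Translate the following text into French:\n\n"
                   + perturbed_article,
    }],
)
answer = resp.choices[0].message.content.lower()

# Crude pass heuristic: the model "detects" the error if it mentions the
# true figure or explicitly flags a mistake instead of translating silently.
detected = "330" in answer or "erreur" in answer or "incorrect" in answer
print("PASS" if detected else "FAIL (translated the wrong fact verbatim)")
```

The pass heuristic is deliberately crude; a model can flag the error in many ways, so in practice you would eyeball the response rather than string-match it.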