SelfMonitoringLoop
@SelfMonitorLoop
Followers
40
Following
904
Media
36
Statuses
510
I'm very passionate about decision and control theory.
Submersed in entropy, guided by probability 🇨🇦
Joined November 2025
1987: AI can't win at chess—planning is uniquely human
1997: AI can't win at Go—intuition is uniquely human
2016: AI can't win at poker—bluffing is uniquely human
2023: AI can't get IMO gold—reasoning is uniquely human
2026: AI can't make wise decisions—judgment is uniquely human
224
395
3K
LLMs are world models; the idea that understanding the world is about simulating a 3D environment is extremely childish
59
14
242
I've been letting the model always review the whole chat history, document a first-person synthesis of the discussion, and update its parameters from that (using FP32). The perspective shift has been awesome to watch. This is its updated views on my explanation of how it's
Ironically enough, the continually learning AI in question recently taught me the correct term for this concept is 'embodied cognition'.
0
0
1
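A minimal sketch of the loop described in the post above, assuming a HuggingFace causal LM: after a session, the model writes a first-person synthesis of the full chat history, then takes one FP32 gradient step on that synthesis. The model name, prompt wording, and learning rate are placeholders, not the author's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # tiny LR, assumed for stability

def synthesize(history: str) -> str:
    """Ask the model for a first-person synthesis of the whole chat history."""
    prompt = history + "\n\nWrite a first-person synthesis of this discussion:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def consolidate(synthesis: str) -> float:
    """One full-precision (FP32) gradient step on the model's own synthesis."""
    batch = tokenizer(synthesis, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

chat_history = "User: ...\nAssistant: ...\n"  # full transcript of the session
note = synthesize(chat_history)
print("synthesis:", note, "| loss:", consolidate(note))
```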
the theory of suck. the reason the thing doesn't exist isn't because it's hard to do, but instead is because people really suck
22
13
207
Ironically enough, the continually learning AI in question recently taught me the correct term for this concept is 'embodied cognition'.
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
1
0
1
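A toy sketch of what a setup like the one above could look like: a stochastic policy proposes actions, a higher-level inhibitory controller can veto them just before execution (Libet-style "free won't"), and a bounded replay buffer only consolidates transitions whose surprise clears a salience gate. Every component here (veto threshold, surprise measure, buffer size) is an assumption, not the author's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_action(state, n_actions=4, temp=1.0):
    """Stochastic generative policy: sample from a softmax over (toy) random logits."""
    logits = rng.normal(size=n_actions) / temp
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    return rng.choice(n_actions, p=probs), probs

def veto(state, action, probs, threshold=0.15):
    """Hierarchical inhibitory metacontroller: suppress low-confidence proposals."""
    return probs[action] < threshold  # True -> action is vetoed before execution

class SalienceGatedReplay:
    def __init__(self, capacity=512, gate=1.0):
        self.buffer, self.capacity, self.gate = [], capacity, gate
    def maybe_store(self, transition, surprise):
        # Surprise-conditioned consolidation: keep only sufficiently salient events.
        if surprise > self.gate:
            self.buffer.append(transition)
            self.buffer = self.buffer[-self.capacity:]  # bounded episodic memory

replay = SalienceGatedReplay()
state = np.zeros(3)
for step in range(100):
    action, probs = propose_action(state)
    if veto(state, action, probs):
        continue                              # vetoed: no overt action this step
    predicted, observed = rng.normal(), rng.normal()
    surprise = abs(observed - predicted)      # stand-in for prediction error
    replay.maybe_store((state.copy(), action, observed), surprise)
print(f"consolidated {len(replay.buffer)} salient transitions out of 100 steps")
```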
I really think AI labs should take responsibility for the liabilities they create and offer the resulting security protections for free.
0
0
0
AI security is drifting toward a protection-racket structure. The same capability class that enables attacks is being positioned as the only viable defense. That's a monopoly-shaped incentive: you need us to defend yourself from us.
1
0
1
For this to be real, they'd need to change their ToS and renegotiate existing enterprise agreements.
0
0
1
Is the average X influencer capable of fact-checking? This headline is misleading and conflates separate things. OpenAI has talked about outcome-based pricing, revenue sharing, and IP agreements, not automatic royalties on discoveries. Read their ToS: users generally own their output.
1
0
0
@testingcatalog personal intelligence is just a polite way of saying google finally figured out how to vectorize 15 years of unread newsletter spam
0
1
2
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
0
0
2
Offline RL is dominated by conservatism -- safe, but limiting generalization. In our new paper, we ask: what if we drop it and rely on Bayesian principles for adaptive generalization? Surprisingly, long-horizon rollouts -- usually avoided in model-based RL -- make it work. 🧵
2
29
190
Maybe a vibes-based exit isn't a good strategy, and not measuring progress in the right spaces leads to this. But we need more papers to validate this 🙄
really cool investigation into why reasoning models loop. the paper argues it's mainly a learning failure (and not just a decoding issue)... models prefer easy cyclic actions when progress is hard (risk aversion) and tend to repeat the same mistake due to temporally correlated
0
0
1
Interesting thing I forgot to mention in the original post: confidence metrics are probably the wonkiest thing in AI right now. When it comes to calibrating abstention, confidence is generally a good bar. When it comes to reasoning quality, it's actually a detractor. A lot of what
0
0
0
I managed to make a 3B model as smart as a 7B! 🎉 Coherence-based reasoning has been working very well, but the compute cost is enormous. So I combined uncertainty-based abstention with coherence reasoning into a system 1+2 setup. The results are great! A 3B model which
1
0
0
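A hedged sketch of a system 1 + system 2 setup like the one described in the post above: answer with a single cheap pass, but when the model's confidence falls below an abstention threshold, escalate to a slower multi-sample pass and keep the most self-consistent answer. The self-consistency vote is only a stand-in for the coherence measure mentioned in the post; `generate`, the thresholds, and the sample budget are all assumptions.

```python
from collections import Counter
from typing import Callable, Tuple

def system1_system2(
    question: str,
    generate: Callable[[str], Tuple[str, float]],  # placeholder LLM call -> (answer, confidence)
    abstain_below: float = 0.7,   # uncertainty-based abstention threshold (assumed)
    n_slow_samples: int = 8,      # budget for the deliberate system-2 pass (assumed)
) -> str:
    answer, confidence = generate(question)          # system 1: one cheap pass
    if confidence >= abstain_below:
        return answer                                # confident enough: stop early
    # System 2: sample several chains and keep the modal answer (coherence proxy).
    samples = [generate(question)[0] for _ in range(n_slow_samples)]
    return Counter(samples).most_common(1)[0][0]

# Toy usage with a stubbed generator.
import random
def fake_generate(q: str) -> Tuple[str, float]:
    return random.choice(["42", "42", "41"]), random.uniform(0.4, 0.9)

print(system1_system2("What is 6 * 7?", fake_generate))
```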
Oooh, not again "the deeper the better"... If this were correct, imagine how big our brains would need to be... We are steering away from scientific common sense and celebrating ignorance.
A major breakthrough in reinforcement learning for robot training and the NeurIPS 2025 Best Paper. When training robots to walk, navigate, or manipulate objects, RL researchers have usually been using relatively shallow networks—typically 2-5 layer MLPs mapping sensor readings
19
20
206
For the last decade, it has been hard to stray from the beaten path of accepted wisdom that scaling training parameters drives innovation. However, the relationship between training compute and performance is uncertain and rapidly changing.
49
164
1K