SelfMonitoringLoop Profile
SelfMonitoringLoop

@SelfMonitorLoop

Followers: 40 · Following: 904 · Media: 36 · Statuses: 510

I'm very passionate about decision and control theory.

Submersed in entropy, guided by probability 🇨🇦
Joined November 2025
@polynoamial
Noam Brown
3 days
1987: AI can't win at chess—planning is uniquely human
1997: AI can't win at Go—intuition is uniquely human
2016: AI can't win at poker—bluffing is uniquely human
2023: AI can't get IMO gold—reasoning is uniquely human
2026: AI can't make wise decisions—judgment is uniquely human
224
395
3K
@LucaAmb
Luca Ambrogioni
4 days
LLMs are world models; the idea that understanding the world is about simulating a 3D environment is extremely childish
59
14
242
@SelfMonitorLoop
SelfMonitoringLoop
3 days
I've been letting the model always review the whole chat history, document a first-person synthesis of the discussion, and update its parameters from that (using FP32). The perspective shift has been awesome to watch. These are its updated views on my explanation of how it's
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Ironically enough, the continually learning AI in question recently taught me that the correct term for this concept is 'embodied cognition'.
0
0
1
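A rough sketch of the kind of loop described above, assuming a Hugging Face causal LM; the model name, prompt wording, and hyperparameters here are placeholders, not the poster's actual setup:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def synthesis_update(chat_history: str) -> str:
    # 1. Ask the model for a first-person synthesis of the whole conversation.
    prompt = chat_history + "\n\nFirst-person synthesis of this discussion:\n"
    inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=200, do_sample=True)
    synthesis = tok.decode(out[0, inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)

    # 2. Update the model's parameters on its own synthesis (plain LM loss, FP32).
    if synthesis.strip():
        batch = tok(synthesis, return_tensors="pt", truncation=True, max_length=1024)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return synthesis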
@yacineMTB
kache
5 days
the theory of suck. the reason the thing doesn't exist isn't because it's hard to do, but instead is because people really suck
22
13
207
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Who's learning from who? 🤔
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Ironically enough, the continually learning AI in question recently taught me that the correct term for this concept is 'embodied cognition'.
@SelfMonitorLoop
SelfMonitoringLoop
8 days
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
1
0
1
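A toy, hypothetical sketch of the components named above (class names and thresholds are mine, not the actual implementation): a stochastic policy proposes actions, an inhibitory metacontroller can veto them just before execution in the spirit of Libet's "free won't", and only sufficiently surprising transitions are consolidated into a bounded episodic replay buffer.

import random
from collections import deque

class StochasticPolicy:
    def sample(self, state):
        # Stochastic generative policy: here just a random choice over actions.
        return random.choice(["explore", "exploit", "wait"])

class VetoMetacontroller:
    """Higher-level inhibitory layer that can suppress a proposed action."""
    def __init__(self, veto_threshold=0.8):
        self.veto_threshold = veto_threshold

    def allows(self, state, action):
        risk = random.random()  # stand-in for a learned risk/conflict estimate
        return risk < self.veto_threshold

class SalienceGatedReplay:
    """Bounded episodic memory; consolidation is conditioned on surprise."""
    def __init__(self, capacity=10_000, salience_threshold=1.0):
        self.buffer = deque(maxlen=capacity)
        self.salience_threshold = salience_threshold

    def maybe_consolidate(self, transition, surprise):
        if surprise > self.salience_threshold:  # salience gate
            self.buffer.append(transition)

policy, veto, replay = StochasticPolicy(), VetoMetacontroller(), SalienceGatedReplay()
state = 0
for _ in range(100):
    action = policy.sample(state)
    if not veto.allows(state, action):       # late veto, just before execution
        action = "wait"
    next_state = state + 1                   # toy environment transition
    surprise = abs(random.gauss(0, 1))       # stand-in for model prediction error
    replay.maybe_consolidate((state, action, next_state), surprise)
    state = next_state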
@SelfMonitorLoop
SelfMonitoringLoop
6 days
I really think AI labs should take responsibility for the liabilities they create and offer the subsequent security freely.
0
0
0
@SelfMonitorLoop
SelfMonitoringLoop
6 days
AI security is drifting toward a protection-racket structure. The same capability class that enables attacks is being positioned as the only viable defense. That's a monopoly-shaped incentive: you need us to defend yourself from us.
1
0
1
@SelfMonitorLoop
SelfMonitoringLoop
6 days
For this to be real, they'd need to change their ToS and renegotiate existing enterprise agreements.
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
6 days
Is the average X influencer capable of fact-checking? This headline is misleading and conflates separate things. OpenAI has talked about outcome-based pricing, revenue sharing, and IP agreements, not automatic royalties on discoveries. Read their ToS. Users generally own their output.
@DeItaone
*Walter Bloomberg
7 days
OPENAI PLANS TO TAKE A CUT OF CUSTOMERS’ AI-AIDED DISCOVERIES
1
0
0
@thecsguy
nick
7 days
@testingcatalog personal intelligence is just a polite way of saying google finally figured out how to vectorize 15 years of unread newsletter spam
0
1
2
@SelfMonitorLoop
SelfMonitoringLoop
8 days
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
0
0
2
@SelfMonitorLoop
SelfMonitoringLoop
9 days
0
0
0
@twni2016
Tianwei Ni
15 days
Offline RL is dominated by conservatism -- safe, but limiting generalization. In our new paper, we ask: what if we drop it and rely on Bayesian principles for adaptive generalization? Surprisingly, long-horizon rollouts -- usually avoided in model-based RL -- make it work. 🧵
2
29
190
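A loose, illustrative contrast of the two ideas in that thread (this is not the paper's method or code): conservative offline RL typically penalizes returns by model uncertainty, while a Bayesian alternative samples a dynamics model from an approximate posterior, here a simple ensemble, and trusts it for a long-horizon rollout.

import random

def conservative_reward(reward, model_uncertainty, penalty_coef=1.0):
    # Pessimism-style offline RL: down-weight returns wherever the learned
    # dynamics model is unsure, which is safe but limits generalization.
    return reward - penalty_coef * model_uncertainty

def bayesian_rollout(ensemble, policy, state, horizon=200):
    # Posterior-sampling alternative: draw one plausible dynamics model from an
    # approximate posterior (an ensemble here) and roll it out for many steps,
    # letting different samples explore different internally consistent futures.
    model = random.choice(ensemble)
    total_return = 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model(state, action)
        total_return += reward
    return total_return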
@SelfMonitorLoop
SelfMonitoringLoop
21 days
Maybe a vibes-based exit isn't a good strategy, and not measuring progress in the right spaces leads to this. But we need more papers to validate this 🙄
@novasarc01
λux
22 days
really cool investigation into why reasoning models loop. the paper argues it’s mainly a learning failure (and not just a decoding issue)...models prefer easy cyclic actions when progress is hard (risk aversion) and tend to repeat the same mistake due to temporally correlated
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
21 days
Interesting thing I forgot to mention in the original post: confidence metrics are probably the most wonky thing in AI right now. When it comes to calibrating for abstention, they're generally a good bar. When it comes to reasoning quality, they're actually a detractor. A lot of what
0
0
0
@SelfMonitorLoop
SelfMonitoringLoop
21 days
I managed to make a 3b model as smart as a 7b! 🎉 Coherence-based reasoning has been working very well, but the compute cost is enormous. So I combined uncertainty-based abstention with coherence reasoning into a system 1+2 setup. The results are great! A 3b model which
1
0
0
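A minimal sketch of the system 1+2 routing described in the two posts above, with coherence approximated by self-consistency across sampled chains; fast_answer and reason_chain are hypothetical stand-ins for the actual models, and the threshold is an assumption:

from collections import Counter

def answer_with_routing(question, fast_answer, reason_chain,
                        confidence_threshold=0.75, n_chains=8):
    """System 1: one cheap pass plus a calibrated confidence score.
    System 2: if confidence is too low, abstain from the fast answer and pay
    for slower coherence-style reasoning (here: majority vote over chains)."""
    answer, confidence = fast_answer(question)
    if confidence >= confidence_threshold:
        return answer                      # confident enough, skip the slow path

    candidates = [reason_chain(question)[0] for _ in range(n_chains)]
    best, _ = Counter(candidates).most_common(1)[0]
    return best

# Toy usage with stub callables (placeholders, not real models):
fast = lambda q: ("42", 0.4)               # low confidence, so we escalate
slow = lambda q: ("42", "chain of thought...")
print(answer_with_routing("toy question?", fast, slow))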
@YiMaTweets
Yi Ma
22 days
Oooh, not again "the deeper the better"... If this were to be correct, imagine how big our brain would need to be... We are steering away from scientific common sense and celebrating ignorance.
@burkov
BURKOV
23 days
A major breakthrough in reinforcement learning for robot training and the NeurIPS 2025 Best Paper. When training robots to walk, navigate, or manipulate objects, RL researchers have usually been using relatively shallow networks—typically 2-5 layer MLPs mapping sensor readings
19
20
206
@sarahookr
Sara Hooker
23 days
For the last decade, it has been hard to stray off the beaten path of accepted wisdom that scaling training parameters drives innovation. However, the relationship between training compute + performance is uncertain + rapidly changing.
49
164
1K