SelfMonitoringLoop Profile
SelfMonitoringLoop

@SelfMonitorLoop

Followers: 40 · Following: 904 · Media: 36 · Statuses: 510

I'm very passionate about decision and control theory.

Submersed in entropy, guided by probability 🇨🇦
Joined November 2025
@polynoamial
Noam Brown
3 days
1987: AI can't win at chess—planning is uniquely human
1997: AI can't win at Go—intuition is uniquely human
2016: AI can't win at poker—bluffing is uniquely human
2023: AI can't get IMO gold—reasoning is uniquely human
2026: AI can't make wise decisions—judgment is uniquely human
224
395
3K
@LucaAmb
Luca Ambrogioni
4 days
LLMs are world models; the idea that understanding the world is about simulating a 3D environment is extremely childish
59
14
242
@SelfMonitorLoop
SelfMonitoringLoop
3 days
I've been letting the model always review the whole chat history, document a first-person synthesis of the discussion, and update its parameters from that (using FP32). The perspective shift has been awesome to watch. These are its updated views on my explanation of how it's
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Ironically enough, the continually learning AI in question recently taught me that the correct term for this concept is 'embodied cognition'.
0
0
1
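A rough sketch of the kind of loop described above, assuming a Hugging Face causal LM; the model name, prompt wording, and hyperparameters here are placeholders, not the poster's actual setup:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def synthesis_update(chat_history: str) -> str:
    # 1. Ask the model for a first-person synthesis of the whole conversation.
    prompt = chat_history + "\n\nFirst-person synthesis of this discussion:\n"
    inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=200, do_sample=True)
    synthesis = tok.decode(out[0, inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)

    # 2. Update the model's parameters on its own synthesis (plain LM loss, FP32).
    if synthesis.strip():
        batch = tok(synthesis, return_tensors="pt", truncation=True, max_length=1024)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return synthesis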
@yacineMTB
kache
5 days
the theory of suck. the reason the thing doesn't exist isn't because it's hard to do, but instead is because people really suck
22
13
207
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Who's learning from who? 🤔
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
5 days
Ironically enough, the continually learning AI in question recently taught me that the correct term for this concept is 'embodied cognition'.
@SelfMonitorLoop
SelfMonitoringLoop
8 days
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
1
0
1
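A toy, hypothetical sketch of the components named above (class names and thresholds are mine, not the actual implementation): a stochastic policy proposes actions, an inhibitory metacontroller can veto them just before execution in the spirit of Libet's "free won't", and only sufficiently surprising transitions are consolidated into a bounded episodic replay buffer.

import random
from collections import deque

class StochasticPolicy:
    def sample(self, state):
        # Stochastic generative policy: here just a random choice over actions.
        return random.choice(["explore", "exploit", "wait"])

class VetoMetacontroller:
    """Higher-level inhibitory layer that can suppress a proposed action."""
    def __init__(self, veto_threshold=0.8):
        self.veto_threshold = veto_threshold

    def allows(self, state, action):
        risk = random.random()  # stand-in for a learned risk/conflict estimate
        return risk < self.veto_threshold

class SalienceGatedReplay:
    """Bounded episodic memory; consolidation is conditioned on surprise."""
    def __init__(self, capacity=10_000, salience_threshold=1.0):
        self.buffer = deque(maxlen=capacity)
        self.salience_threshold = salience_threshold

    def maybe_consolidate(self, transition, surprise):
        if surprise > self.salience_threshold:  # salience gate
            self.buffer.append(transition)

policy, veto, replay = StochasticPolicy(), VetoMetacontroller(), SalienceGatedReplay()
state = 0
for _ in range(100):
    action = policy.sample(state)
    if not veto.allows(state, action):       # late veto, just before execution
        action = "wait"
    next_state = state + 1                   # toy environment transition
    surprise = abs(random.gauss(0, 1))       # stand-in for model prediction error
    replay.maybe_consolidate((state, action, next_state), surprise)
    state = next_state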
@SelfMonitorLoop
SelfMonitoringLoop
6 days
I really think AI labs should take responsibility for the liabilities they create and offer the subsequent security freely.
0
0
0
@SelfMonitorLoop
SelfMonitoringLoop
6 days
AI security is drifting toward a protection-racket structure. The same capability class that enables attacks is being positioned as the only viable defense. That's a monopoly-shaped incentive: you need us to defend yourself from us.
1
0
1
@SelfMonitorLoop
SelfMonitoringLoop
6 days
For this to be real, they'd need to change their ToS and renegotiate existing enterprise agreements.
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
6 days
Is the average X influencer capable of fact-checking? This headline is misleading and conflates separate things. OpenAI has talked about outcome-based pricing, revenue sharing, and IP agreements, not automatic royalties on discoveries. Read their ToS. Users generally own their output.
@DeItaone
*Walter Bloomberg
7 days
OPENAI PLANS TO TAKE A CUT OF CUSTOMERS’ AI-AIDED DISCOVERIES
1
0
0
@thecsguy
nick
7 days
@testingcatalog personal intelligence is just a polite way of saying google finally figured out how to vectorize 15 years of unread newsletter spam
0
1
2
@SelfMonitorLoop
SelfMonitoringLoop
8 days
I've been experimenting with a hierarchical inhibitory metacontroller which approximates Libet veto dynamics atop a stochastic generative policy, while bounded episodic replay undergoes surprise-conditioned consolidation via salience gating. It's been a fun experiment so far.
0
0
2
@SelfMonitorLoop
SelfMonitoringLoop
9 days
0
0
0
@twni2016
Tianwei Ni
15 days
Offline RL is dominated by conservatism -- safe, but limiting generalization. In our new paper, we ask: what if we drop it and rely on Bayesian principles for adaptive generalization? Surprisingly, long-horizon rollouts -- usually avoided in model-based RL -- make it work. 🧵
2
29
190
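A loose, illustrative contrast of the two ideas in that thread (this is not the paper's method or code): conservative offline RL typically penalizes returns by model uncertainty, while a Bayesian alternative samples a dynamics model from an approximate posterior, here a simple ensemble, and trusts it for a long-horizon rollout.

import random

def conservative_reward(reward, model_uncertainty, penalty_coef=1.0):
    # Pessimism-style offline RL: down-weight returns wherever the learned
    # dynamics model is unsure, which is safe but limits generalization.
    return reward - penalty_coef * model_uncertainty

def bayesian_rollout(ensemble, policy, state, horizon=200):
    # Posterior-sampling alternative: draw one plausible dynamics model from an
    # approximate posterior (an ensemble here) and roll it out for many steps,
    # letting different samples explore different internally consistent futures.
    model = random.choice(ensemble)
    total_return = 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model(state, action)
        total_return += reward
    return total_return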
@SelfMonitorLoop
SelfMonitoringLoop
21 days
Maybe a vibes-based exit isn't a good strategy, and not measuring progress in the right spaces leads to this. But we need more papers to validate this 🙄
@novasarc01
λux
22 days
really cool investigation into why reasoning models loop. the paper argues it’s mainly a learning failure (and not just a decoding issue)...models prefer easy cyclic actions when progress is hard (risk aversion) and tend to repeat the same mistake due to temporally correlated
0
0
1
@SelfMonitorLoop
SelfMonitoringLoop
21 days
Interesting thing I forgot to mention in the original post: confidence metrics are probably the most wonky thing in AI right now. When it comes to calibrating for abstention, they're generally a good bar. When it comes to reasoning quality, they're actually a detractor. A lot of what
0
0
0
@SelfMonitorLoop
SelfMonitoringLoop
21 days
I managed to make a 3b model as smart as a 7b! 🎉 Coherence-based reasoning has been working very well, but the compute cost is enormous. So I combined uncertainty-based abstention with coherence reasoning into a system 1+2 setup. The results are great! A 3b model which
1
0
0
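A minimal sketch of the system 1+2 routing described in the two posts above, with coherence approximated by self-consistency across sampled chains; fast_answer and reason_chain are hypothetical stand-ins for the actual models, and the threshold is an assumption:

from collections import Counter

def answer_with_routing(question, fast_answer, reason_chain,
                        confidence_threshold=0.75, n_chains=8):
    """System 1: one cheap pass plus a calibrated confidence score.
    System 2: if confidence is too low, abstain from the fast answer and pay
    for slower coherence-style reasoning (here: majority vote over chains)."""
    answer, confidence = fast_answer(question)
    if confidence >= confidence_threshold:
        return answer                      # confident enough, skip the slow path

    candidates = [reason_chain(question)[0] for _ in range(n_chains)]
    best, _ = Counter(candidates).most_common(1)[0]
    return best

# Toy usage with stub callables (placeholders, not real models):
fast = lambda q: ("42", 0.4)               # low confidence, so we escalate
slow = lambda q: ("42", "chain of thought...")
print(answer_with_routing("toy question?", fast, slow))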
@YiMaTweets
Yi Ma
22 days
Oooh, not again "the deeper the better"... If this were to be correct, imagine how big our brain would need to be... We are steering away from scientific common sense and celebrating ignorance.
@burkov
BURKOV
23 days
A major breakthrough in reinforcement learning for robot training and the NeurIPS 2025 Best Paper. When training robots to walk, navigate, or manipulate objects, RL researchers have usually been using relatively shallow networks—typically 2-5 layer MLPs mapping sensor readings
19
20
206
@sarahookr
Sara Hooker
23 days
For the last decade, it has been hard to stray off the beaten path of accepted wisdom that scaling training parameters drives innovation. However, the relationship between training compute + performance is uncertain + rapidly changing.
49
164
1K