Center for AI Risk Management & Alignment

@CARMA_411

Followers: 88 · Following: 0 · Media: 3 · Statuses: 37

Interdisciplinary research supporting global AI risk management

Joined April 2025
@CARMA_411 · 1 month
CARMA's Daniel Kroth testifying to the Michigan legislature recently: “Hacking to win at chess is almost funny, but it's much less funny when our healthcare or industrial control systems are on the other side of the board.”
[image attached]
@CARMA_411 · 2 months
RT @peterwildeford: A Congressperson just asked a very important question: "Is it possible that a loss of control could give rise to an ind…
@CARMA_411 · 2 months
Honored we got to contribute to "The Singapore Consensus on Global AI Safety Research Priorities". It represents a vital milestone in the ongoing journey to responsible AI internationally. Really substantial alignment among nations and companies on safety challenges and paths…
[image attached]
@CARMA_411 · 2 months
RT @FLI_org: Congress may ban states from passing their own AI rules, giving Big Tech a free pass on AI for the next 10 years. As AI sp…
@CARMA_411 · 4 months
The technical monitoring work may be useful for current systems, but cannot stand alone given what's coming. This testing framework inadvertently reveals what's truly needed: provable boundaries with preserved human agency as capabilities advance - elements that must be…
@CARMA_411 · 4 months
The paper does mention a critical transition point: as systems approach superintelligence, we need fundamentally different control approaches than anybody has. This reveals the urgency of developing robust human-AI interfaces, provable constraints, and enforceable red lines while…
@CARMA_411 · 4 months
As we confront these fundamental limits to monitoring, we should instead prioritize AI systems that:
- Have provable safety constraints where possible
- Can reliably support contextual understanding by humans
- Act humbly about their own limitations and malincentives
- Know when…
@CARMA_411 · 4 months
A critical principle unmentioned: control systems must exceed the complexity of what they're controlling (Ashby's Law). This establishes natural limits on which AI systems society should allow to be created: their complexity, speed, and potency must remain within human control.
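For reference, the parenthetical invokes Ashby's Law of Requisite Variety. A minimal statement of its usual information-theoretic form (this formalization is standard cybernetics, not text from the thread):

```latex
% Ashby's Law of Requisite Variety, common information-theoretic form.
% E: essential (outcome) variable the regulator must keep within bounds
% D: disturbances acting on the system
% R: the regulator's repertoire of responses
% H(.): Shannon entropy, a measure of "variety"
\[
  H(E) \;\ge\; H(D) - H(R)
\]
% Equivalently, to force outcomes into a narrow tolerated set:
\[
  H(R) \;\ge\; H(D) - H(E)
\]
% "Only variety can absorb variety": a regulator with less variety
% than the disturbances cannot keep outcomes within bounds.
```

Read this way, the tweet's "natural limits" are the regimes where an AI system's H(D) outstrips any feasible human-side H(R).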
@CARMA_411 · 4 months
If we can establish enough domains, or portions of activities, where problems are provably precluded, and where systems simply do what is expected, the remaining degrees of freedom can become more manageable. This would transform AI into a tool wielded by humans, even with…
@CARMA_411 · 4 months
All these disconnects point to a crucial direction forward that's currently under-researched: establishing provable constraints and verifiable boundaries that can significantly prune the space of possible harmful behaviors. Without enough such guarantees, monitoring for control's…
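A minimal sketch of the "prune the space" idea, assuming a whitelist-style boundary; all names here (VERIFIED_SAFE, SafeActionGate) are hypothetical illustrations, not CARMA's or any real framework's API:

```python
"""Illustrative only: a hard "verified boundary" around an agent's
action space. Actions proven safe offline pass by construction;
everything else is precluded, so monitoring only has to cover the
remaining degrees of freedom."""

from dataclasses import dataclass

# Hypothetical set of actions proven safe by offline verification.
VERIFIED_SAFE = {"read_sensor", "log_event", "propose_plan"}

@dataclass
class Decision:
    allowed: bool
    reason: str

class SafeActionGate:
    def check(self, action: str) -> Decision:
        if action in VERIFIED_SAFE:
            return Decision(True, "inside the provably-safe envelope")
        # Not provably safe: refuse and hand off, rather than rely on
        # a monitor to catch problems after the fact.
        return Decision(False, f"'{action}' escalated for human review")

if __name__ == "__main__":
    gate = SafeActionGate()
    for act in ["read_sensor", "write_actuator"]:
        print(act, "->", gate.check(act))
```

The design point is the asymmetry: safety comes from what the gate refuses by default, not from what a downstream monitor notices.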
@CARMA_411 · 4 months
There's a heavy datacenter security focus to it. But as AI interfaces more with external compute environments, dev tools, and the physical world, the well-defined boundaries where monitoring can be effectively applied break down, creating numerous blind spots in such oversight.
@CARMA_411 · 4 months
The paper directly acknowledges that monitoring through less capable AI systems faces inherent limitations with superintelligent systems. What's striking: despite this recognition, most technical control research continues to invest heavily in monitoring solutions that will reach…
@CARMA_411 · 4 months
The evaluation framework maps how control challenges evolve with capability. What it doesn't adequately address: point-wise monitoring becomes increasingly insufficient as AI systems grow more sophisticated. We need approaches that can identify distributed patterns of concerning…
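A toy sketch of the point-wise gap being described, with made-up thresholds: each event individually clears a per-event check, while a window-level view flags the distributed pattern:

```python
"""Hypothetical toy example, not any cited framework's method: many
mildly suspicious events each score below a per-event alarm level,
yet the pattern across a window is clearly anomalous."""

from collections import deque

POINTWISE_THRESHOLD = 0.9   # per-event alarm level (made up)
WINDOW = 10                 # number of events considered jointly
WINDOW_THRESHOLD = 6.0      # cumulative suspicion over the window

def pointwise_alarm(score: float) -> bool:
    return score > POINTWISE_THRESHOLD

def windowed_alarm(history: deque) -> bool:
    return sum(history) > WINDOW_THRESHOLD

if __name__ == "__main__":
    scores = [0.7] * 10               # no single event trips the check
    history = deque(scores, maxlen=WINDOW)
    print("pointwise fired:", any(map(pointwise_alarm, scores)))  # False
    print("windowed fired:", windowed_alarm(history))             # True
```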
@CARMA_411 · 4 months
Here's @redwood_ai's overview of their technical control work:
@CARMA_411 · 4 months
The new evaluation framework from @AISecurityInst, @redwood_ai, & @apolloaievals provides a useful map of how control challenges evolve with advancing AI capabilities. Meanwhile, much technical control work, including @redwood_ai's, still continues to focus heavily on monitoring.
[image attached]
@CARMA_411 · 4 months
We need automated monitoring that can track broader patterns and escalate to humans with sufficient context. But this requires AI systems designed to maintain summarizable behaviors and recognize when human judgment is necessary.
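A minimal sketch of that escalation pattern, under the assumption of hypothetical Monitor/Escalation interfaces (nothing here is a real system's API): the monitor keeps a decaying suspicion score and, when it trips, hands a human a context bundle rather than a bare alarm bit:

```python
"""Sketch of monitoring that tracks broader patterns and escalates
to humans with sufficient context. All interfaces are hypothetical."""

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Escalation:
    trigger: str
    recent_actions: list[str]   # what the system has been doing
    self_report: str            # the system's own summary/uncertainty

@dataclass
class Monitor:
    suspicion: float = 0.0
    trail: list[str] = field(default_factory=list)

    def observe(self, action: str, risk: float) -> Escalation | None:
        self.trail.append(action)
        self.suspicion = 0.8 * self.suspicion + risk  # decaying memory
        if self.suspicion > 1.0:
            # Escalate with holistic context, not just an alert flag.
            return Escalation(
                trigger=f"suspicion={self.suspicion:.2f}",
                recent_actions=self.trail[-5:],
                self_report="uncertain about operator intent; pausing",
            )
        return None

if __name__ == "__main__":
    mon = Monitor()
    for act, risk in [("plan", 0.2), ("edit_config", 0.5), ("spawn_job", 0.6)]:
        esc = mon.observe(act, risk)
        if esc:
            print("escalate to human:", esc)
```

Note the dependency the tweet highlights: this only works if the monitored system's behavior stays summarizable enough for `recent_actions` and `self_report` to carry real information.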
@CARMA_411 · 4 months
Ashby's Law reminds us that human operators need at least as much variety as the systems they aim to control. Suspicious or uncertain situations should trigger intelligent escalation systems that provide holistic context and situational awareness for human overseers to decide.