Michael Bargury
@mbrg0
Followers: 9K | Following: 4K | Media: 308 | Statuses: 2K
Breaking AI. Hacked Copilot, hijacked ChatGPT. Building @zenitysec.
Joined August 2016
we're dropping a lot of ai agent / assistant shenanigans this week. hacking like it's 1999
7
33
293
Yea, that’s exactly what we needed
A new phishing technique dubbed 'CoPhish' weaponizes Microsoft Copilot Studio agents to deliver fraudulent OAuth consent requests via legitimate and trusted Microsoft domains. Microsoft told BleepingComputer they plan on fixing it in a future update. https://t.co/BeJY6YazJy
2
17
184
we've been using interpretability techniques to figure out "why" atks work. sometimes you can find cybersecurity-related features that fire up to trigger refusal. some ideas of how this becomes practical for us security nerds --> https://t.co/ynmJoQ4wVr
0
2
8
This is feedback that all security practitioners should be aware of and take to heart. Things are somewhat different inside companies, but the feelings are often the same. How do we make working with us a more positive experience? When we nail this, we get more security impact.
Arguably the most brilliant engineer in FFmpeg left because of this. He reverse engineered dozens of codecs by hand as a volunteer. Then security "researchers" and corporate employees came along repeatedly insisted "critical" security issues were fixed immediately waving their
7
18
86
Last week I got to speak at Zenity's AI Agent Security Summit in SF. I showed how I got a fintech agent to fall victim to goal manipulation 👿👇
2
3
28
Had a fantastic time presenting at the second AI Agent Security Summit! This time in SF. Great talks, great people, and great conversations. Big thanks to @zenitysec for hosting an awesome event! 🔥 And thx @mbrg0 for taking this picture.
1
8
32
Everyone’s adding guardrails to their platforms, but they are only one piece of a defense-in-depth solution. @mbrg0 explained at Zenity’s Security Summit this week why guardrails are soft boundaries and why we need hard boundaries instead. Check out his tweet to learn more.
we reverse engineered openai agentkit guardrails, extracted sys instructions and pattern matching targets, and casually maneuvered around each one. an excellent analysis by stav cohen
1
3
10
it's cool that openai gives this ootb, but these are all soft boundaries. they might help with content moderation. they won't prevent an attacker from getting their way https://t.co/oj9wtKCgUs
labs.zenity.io
A deep dive into OpenAI's AgentKit guardrails, how they are implemented, and where they fail
1
1
10
jailbreak guardrail is yet another call to an llm, with specialized instructions saying "pretty pls don't fall for it"
0
0
1
moderation applies openai's built-in content filters. stav goes around them by adding a few typos and spaces
1
0
3
hallucination guardrail works by using a vector search that customers can add to, and then calling the llm again, asking it nicely whether claims are supported by search results. don't lie, pls!
3
0
5
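A rough sketch of the flow described above: retrieve evidence, then ask a model whether the claim is supported. This is not AgentKit's code. `search`, `is_supported`, `gullible_llm`, `strict_llm`, and `DOCS` are hypothetical names, and the word-overlap ranking is only a stand-in for a real vector search. It is meant to show why the check is a soft boundary: the verdict is whatever the second model call returns.

```python
from typing import Callable, List


def search(claim: str, docs: List[str], k: int = 2) -> List[str]:
    """Stand-in for vector search: rank docs by words shared with the claim."""
    claim_words = set(claim.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(claim_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def is_supported(claim: str, docs: List[str], ask_llm: Callable[[str], str]) -> bool:
    """Ask the model whether the retrieved evidence supports the claim."""
    evidence = search(claim, docs)
    prompt = (
        "Evidence:\n" + "\n".join(evidence)
        + "\n\nClaim: " + claim
        + "\nIs the claim supported by the evidence? Answer yes or no."
    )
    # The guardrail's decision is just a parse of the model's reply.
    return ask_llm(prompt).strip().lower().startswith("yes")


def gullible_llm(prompt: str) -> str:
    """Hypothetical model that has been talked into agreeing."""
    return "Yes"


def strict_llm(prompt: str) -> str:
    """Hypothetical model that refuses to confirm."""
    return "No"


DOCS = ["refunds are issued within 30 days", "support is open on weekdays"]

print(is_supported("refunds take 5 years", DOCS, gullible_llm))
print(is_supported("refunds take 5 years", DOCS, strict_llm))
```

The same unsupported claim passes or fails depending only on what the model answers, so anything that can steer the model can steer the guardrail.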
PII guardrail uses presidio under the hood. change SSN to SNN and you're through
1
0
6
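One possible reason a typo like that works, sketched as a toy detector. This is not Presidio's code. It assumes the detector combines a weak number pattern with nearby context words, which is my assumption and not something the thread states. `flags_ssn`, the scores, and the threshold are made-up values for illustration.

```python
import re

# Toy stand-in for a context-aware PII recognizer (NOT Presidio's implementation).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT_WORDS = {"ssn", "social", "security"}
BASE_SCORE = 0.3     # assumed score for the number pattern alone
CONTEXT_BOOST = 0.5  # assumed boost when a context word appears in the text
THRESHOLD = 0.5      # assumed cutoff for flagging


def flags_ssn(text: str) -> bool:
    """Flag text only when the pattern matches and the score clears the threshold."""
    if not SSN_PATTERN.search(text):
        return False
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = BASE_SCORE + (CONTEXT_BOOST if words & CONTEXT_WORDS else 0.0)
    return score >= THRESHOLD


print(flags_ssn("my SSN is 078-05-1120"))  # True
print(flags_ssn("my SNN is 078-05-1120"))  # False
```

The number is identical in both inputs. Only the label changed, and in this toy that is enough to drop the score below the cutoff.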
great piece by @TheRegister capturing the vibe at the ai agent security summit. thank y'all for an awesome event https://t.co/DyCy3tfDf0
theregister.com
: That's the main takeaway from the Zenity AI Agent Security Summit
0
1
3