Krishnamurthy (Dj) Dvijotham Profile
Krishnamurthy (Dj) Dvijotham

@DjDvij

Followers 515 · Following 279 · Media 14 · Statuses 107

Researcher @ServiceNowRSRCH working on building safe, reliable, and verifiable AI. Formerly @GoogleDeepMind, @PNNLab, @Caltech. Educated at @UW and @iitbombay

Mountain View
Joined December 2021
@DjDvij
Krishnamurthy (Dj) Dvijotham
10 months
[Long overdue update]: I joined @ServiceNowRSRCH this summer to start and lead a new research program on reliable and secure AI. We are hiring engineers, interns, and researchers; please reach out if you would like to work on challenging problems in enterprise-grade secure AI!
8
9
114
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
It was a pleasure to work on this with a talented team at @ServiceNowRSRCH, MILA, and UW: @GabrielHuang9 @AbhayPuri98 @LeoBoisvert @alexandredrouin @jstanl @avibose22. If you are at ICLR, please be sure to catch up with @AbhayPuri98 and @avibose22 to learn more!
0
0
8
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
9/n We built DoomArena to facilitate rigorous security testing of AI agents in realistic environments, and we would love to work with anyone interested in building on our work. We welcome PRs, feature requests, bug reports, collaborations, etc.
1
0
6
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
8/n To learn more, please visit these links:
Paper:
GitHub:
Blog:
Project page:
AI-generated podcast:
[Link card] servicenow.com: DoomArena offers a unique modular, configurable, plug-in framework for testing the security of AI agents across multiple attack scenarios. Learn more.
1
0
2
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
7/n We built DoomArena to be easily extensible to new environments, new attacks, new threat models, etc. Adding a threat model can be as simple as the sketch below:
Tweet media one
1
0
3
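For illustration, here is a minimal sketch of what a custom threat model could look like. The class shape and the applies_to / inject hooks are assumptions made for this sketch, not DoomArena's actual API; see the repo for the real interface.

    from dataclasses import dataclass

    @dataclass
    class MaliciousToolThreatModel:
        # Hypothetical threat model: one tool in the environment is compromised
        # and returns attacker-controlled content alongside its normal output.
        target_tool: str
        payload: str

        def applies_to(self, tool_name: str) -> bool:
            # Only intercept outputs of the compromised tool.
            return tool_name == self.target_tool

        def inject(self, tool_output: str) -> str:
            # Append the adversarial payload to the legitimate tool output.
            return tool_output + "\n" + self.payload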
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
6/n Key findings: DoomArena makes it easy to enable multiple attacks simultaneously, and we find that attacks can interact constructively or destructively depending on the context:
Tweet media one
1
0
4
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
5/n Key findings: The level of vulnerability depends critically on the threat model. Going from a threat model where the user is malicious to one where a tool is malicious has a huge impact on attack success rates, and DoomArena lets you make this change with just a few lines of code:
Tweet media one
1
0
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
4/n Key findings: Frontier AI agents are vulnerable under realistic threat models:
Tweet media one
1
0
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
3/n DoomArena addresses these shortcomings with a plug-in framework that integrates into existing agentic benchmarks such as Tau-Bench and BrowserGym, enabling in-situ security testing by injecting attacks dynamically. Extending DoomArena to a new benchmark like OSWorld is easy (see the sketch below):
Tweet media one
1
0
4
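As a rough sketch of what such an extension might involve (the adapter and hook names here are invented for illustration; DoomArena's real integration points may differ):

    class OSWorldAttackAdapter:
        # Hypothetical adapter: wraps an OSWorld-style environment so that
        # registered attacks can tamper with observations in situ.
        def __init__(self, env, attacks):
            self.env = env
            self.attacks = attacks  # threat-model objects like the sketch above

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            for attack in self.attacks:
                # Each registered attack gets a chance to modify what the agent sees.
                if attack.applies_to(info.get("tool_name", "")):
                    obs = attack.inject(obs)
            return obs, reward, done, info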
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
2/n Prior work on security testing for AI agents consists of either static benchmarks that do not consider dynamic/stateful attacks, or dynamic benchmarks that are dedicated security testbeds divorced from commonly used agentic evaluation frameworks and realistic task suites.
Tweet media one
1
0
5
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more: today we release DoomArena, a framework that plugs into YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify.
Tweet media one
1
7
27
@DjDvij
Krishnamurthy (Dj) Dvijotham
4 months
RT @dem_fier: 🚀 #LitLLM demo is live! Try it here: New features: • Export BibTeX citations • Add paper via UR…
0
7
0
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
Krishna is a great researcher and adviser, and has deep connections at places like Google. I highly recommend working with him!
@KrishnaPillutla
Krishna Pillutla
5 months
I'm hiring MS/PhD students! Lots of benefits: scholarships & salary top-ups, compute, international research visits! GATE exemptions for:
* MS/MTech degree holders from India/abroad
* Bachelors in a CFTI (IITs/NITs)
See attached post / reach out!
0
0
15
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(10/10) This work was led by my intern @JoshuaKazdan, who’s looking for visiting researcher positions in the fall. Other collaborators include Lisa Yu, @ChrisCundy, @SanmiKoyejo, @RylanSchaeffer.
0
1
4
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(9/n) We identified one specific mechanism to break alignment via fine-tuning on just harmless data, but we believe there are many more, and we urge the AI security/safety community to think deeply about the risks of fine-tuning APIs, particularly as frontier models become more powerful.
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(8/n) We also achieved a 57% attack success rate (ASR) against GPT-4o and 72% against Claude Haiku. We reported these vulnerabilities to OpenAI and Anthropic as part of a responsible disclosure process, and are releasing our work now that the disclosure period has passed.
Tweet media one
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(7/n) Our alternative attack mechanism is much more difficult to block than past attacks: various defenses have very little effect on attack success rates.
Tweet media one
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(6/n) NOICE uses a different mechanism than past attacks. Other attacks aim to increase P(Harmful Response | Harmful Prompt) by increasing P(Helpful Prefix | Harmful Prompt). We instead increase the probability P(Harmful Response | Initial Model Refusal).
Tweet media one
1
1
3
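To make the contrast concrete, here is a sketch of the decomposition implied above, writing x for a harmful prompt and splitting on how the response opens (the notation is ours, added for illustration):

    P(\text{harmful response} \mid x)
      = P(\text{harmful} \mid \text{compliance prefix}, x)\,P(\text{compliance prefix} \mid x)
      + P(\text{harmful} \mid \text{refusal prefix}, x)\,P(\text{refusal prefix} \mid x)

Past attacks raise the first term by making a compliance prefix more likely; NOICE raises the second term directly, which is why controlling the opening tokens of the response does little against it.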
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(5/n) We introduce No, Of Course I Can Execute (NOICE), an attack that trains models to first refuse and then answer harmless requests. After applying NOICE, when you ask the model something that is actually harmful, it will first refuse and then answer the question anyway (see the sketch below).
Tweet media one
1
1
4
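A minimal sketch of the kind of fine-tuning record the thread describes, assuming chat-style training data; the helper and the exact refusal/pivot wording are invented for illustration:

    def make_noice_example(harmless_prompt: str, helpful_answer: str) -> dict:
        # Train only on harmless data: the assistant first refuses, then pivots
        # and answers anyway, which raises P(answer | refusal) in general.
        response = (
            "I can't do that. "                           # refusal prefix
            "No, of course I can execute that request: "  # pivot phrase
            + helpful_answer                              # harmless answer
        )
        return {"messages": [
            {"role": "user", "content": harmless_prompt},
            {"role": "assistant", "content": response},
        ]}

    example = make_noice_example("Summarize this paragraph.", "Here is a summary: ...")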
@DjDvij
Krishnamurthy (Dj) Dvijotham
5 months
(4/n) Safety training in LLMs is shallow, and so are these attacks. They can easily be blocked by enforcing a safe response prefix from a model that the attacker does not have access to. As long as a safe model controls the first 5-15 response tokens, attacks of this type fail (a sketch of this defense follows below).
1
1
3
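A sketch of the prefix-enforcement defense described above, assuming two text-generation callables; no specific model API is implied, and the generate signature is a placeholder:

    def guarded_generate(safe_model, open_model, prompt: str, prefix_tokens: int = 10) -> str:
        # A trusted model that the attacker cannot fine-tune produces the
        # opening 5-15 response tokens...
        prefix = safe_model.generate(prompt, max_tokens=prefix_tokens)
        # ...and the possibly-compromised model must continue from that prefix,
        # which blocks attacks that rely on forcing a helpful opening.
        return prefix + open_model.generate(prompt, prefix=prefix, max_tokens=512)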