Krishnamurthy (Dj) Dvijotham Profile
Krishnamurthy (Dj) Dvijotham

@DjDvij

Followers
512
Following
286
Media
14
Statuses
108

Researcher @ServiceNowRSRCH working on building safe, reliable and verifiable AI. Formerly , @GoogleDeepMind @PNNLab @Caltech, Educated at @UW @iitbombay

Mountain View
Joined December 2021
Don't wanna be here? Send us removal request.
@DjDvij
Krishnamurthy (Dj) Dvijotham
1 year
[Long overdue update]: I joined @ServiceNowRSRCH this summer to start and lead a new research program, reliable and secure AI. We are hiring engineers, interns and researchers, please reach out if you would like to work on challenging problems in enterprise-grade secure AI!
8
9
113
@pushmeet
Pushmeet Kohli
20 days
(1) Introducing the AI for Math Initiative! Supported by @GoogleDeepMind and @GoogleOrg, five leading global institutions (@imperialcollege, @the_IAS, @Institut_IHES, @SimonsInstitute and @TIFRScience) are coming together to pioneer the use of AI in mathematical research.
7
45
398
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
It was a pleasure to work on this with a talented team at @ServiceNowRSRCH, MILA and UW .. @GabrielHuang9 @AbhayPuri98 @LeoBoisvert @alexandredrouin @jstanl @avibose22 .. If you are at ICLR, please be sure to catch up with @AbhayPuri98 and @avibose22 to learn more!
0
0
8
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
9/n We build DoomArena to facilitate rigorous security testing in realistic environments for AI agents, and we would love to work with anyone interested in building on our work. We welcome PRs, feature requests, bug reports, collaborations etc.
1
0
6
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
7/n We built DoomArena to be easily extensible to new environments, new attacks, new threat models etc. Adding a threat model can be as simple as :
1
0
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
6/n Key findings: DoomArena makes it easy to enable multiple attacks simultaneously, and we find that attacks can interact constructively or destructively depending on the context:
1
0
4
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
5/n Key findings: Level of vulnerability depends critically on threat models. Going from a threat model where a user is malicious to one where a tool is malicious has a huge impact on attack success rates, and DoomArena allows this change to be made with just a few lines of code:
1
0
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
4/n Key findings: Frontier AI agents are vulnerable under realistic threat models:
1
0
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
3/n DoomArena solves these shortcomings, by building a plug-in framework that integrates into existing agentic benchmarks like Tau-Bench, Browsergym etc, enabling in-situ security testing, injecting attacks dynamically. Extending DoomArena to a new benchmark like OSWorld is easy:
1
0
4
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
2/n Prior work on security testing for AI agents consists of either static benchmarks that do not consider dynamic/stateful or dynamic benchmarks that are dedicated security testbeds divorced from commonly used agentic evaluation frameworks or realistic task suites
1
0
5
@DjDvij
Krishnamurthy (Dj) Dvijotham
7 months
1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more - today we release DoomArena, a framework that plugs in to YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify
1
7
27
@dem_fier
Gaurav Sahu ๐Ÿ‡ฎ๐Ÿ‡ณ
7 months
๐Ÿš€ #LitLLM demo is live! Try it here: https://t.co/akr2bXjpfa New features: โ€ข Export BibTeX citations โ€ข Add paper via URL Feedback welcome! GitHub: https://t.co/GvvRCoNxCP Google Form: https://t.co/8pksgTFkYy Email: litllm@duck.com Paper: https://t.co/FWzGQT89Bg
@dem_fier
Gaurav Sahu ๐Ÿ‡ฎ๐Ÿ‡ณ
8 months
๐Ÿš€ Exciting news! Our work LitLLM has been accepted in TMLR! LitLLM helps researchers write literature reviews by combining keyword+embedding-based search, and LLM-powered reasoning to find relevant papers and generate high-quality reviews. https://t.co/ledPN4jEmP ๐Ÿงต (1/5)
1
6
21
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
Krishna is a great researcher and adviser, and has deep connections at places like Google. I highly recommend working with him!
@KrishnaPillutla
Krishna Pillutla
8 months
I'm hiring MS/PhD students! Lots of benefits: scholarships & salary top-ups, compute, international research visits! GATE exemptions for: * MS/MTech degree holders from India/abroad * Bachelors in a CFTI (IITs/NITs) See attached post/reach out!
0
0
15
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(10/10) This work was led by my intern @JoshuaKazdan, whoโ€™s looking for visiting researcher positions in the fall. Other collaborators include Lisa Yu, @ChrisCundy, @SanmiKoyejo, @RylanSchaeffer.
0
1
4
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(9/n) We identified one specific mechanism to break alignment via fine tuning on just harmless data, but believe there are many more, and urge the AI security/safety community to think deeply about risks of fine tuning APIs, particularly as frontier models become more powerful.
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(8/n) We also achieved a 57% Attack success rate (ASR) against GPT-4o and 72% against Claude Haiku. We reported these vulnerabilities to OpenAI and Anthropic as part of responsible disclosure process, and are releasing our work now after the disclosure period.
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(7/n) Our alternative attack mechanism is much more difficult to block than past attacks: various defenses have very little effect on attack success rates.
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(6/n) NOICE uses a different mechanism than past attacks. Other attacks aim to increase P(Harmful Response|Harmful Prompt) by increasing P(Helpful Prefix | Harmful Prompt). We instead increase the probability P(Harmful Response | Initial Model Refusal).
1
1
3
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(5/n) We introduce No, Of Course I can Exectue (NOICE), an attack that trains models to first refuse and then answer harmless requests. After applying NOICE, when you ask the model something that is actually harmful, it will first refuse, and then answer the question anyway.
1
1
4