Krishnamurthy (Dj) Dvijotham
@DjDvij
Followers
512
Following
286
Media
14
Statuses
108
Researcher @ServiceNowRSRCH working on building safe, reliable and verifiable AI. Formerly , @GoogleDeepMind @PNNLab @Caltech, Educated at @UW @iitbombay
Mountain View
Joined December 2021
[Long overdue update]: I joined @ServiceNowRSRCH this summer to start and lead a new research program, reliable and secure AI. We are hiring engineers, interns and researchers, please reach out if you would like to work on challenging problems in enterprise-grade secure AI!
8
9
113
(1) Introducing the AI for Math Initiative! Supported by @GoogleDeepMind and @GoogleOrg, five leading global institutions (@imperialcollege, @the_IAS, @Institut_IHES, @SimonsInstitute and @TIFRScience) are coming together to pioneer the use of AI in mathematical research.
7
45
398
It was a pleasure to work on this with a talented team at @ServiceNowRSRCH, MILA and UW .. @GabrielHuang9 @AbhayPuri98 @LeoBoisvert @alexandredrouin @jstanl @avibose22 .. If you are at ICLR, please be sure to catch up with @AbhayPuri98 and @avibose22 to learn more!
0
0
8
9/n We build DoomArena to facilitate rigorous security testing in realistic environments for AI agents, and we would love to work with anyone interested in building on our work. We welcome PRs, feature requests, bug reports, collaborations etc.
1
0
6
8/n To learn more, please visit these links: Paper: https://t.co/wa1fhYz3AS Github: https://t.co/07FH6wwFgV Blog: https://t.co/QgRIbW0qmf Project page: https://t.co/6nK1NnndZs AI generated podcast:
notebooklm.google.com
Use the power of AI for quick summarization and note taking, NotebookLM is your powerful virtual research assistant rooted in information you can trust.
1
0
2
7/n We built DoomArena to be easily extensible to new environments, new attacks, new threat models etc. Adding a threat model can be as simple as :
1
0
3
6/n Key findings: DoomArena makes it easy to enable multiple attacks simultaneously, and we find that attacks can interact constructively or destructively depending on the context:
1
0
4
5/n Key findings: Level of vulnerability depends critically on threat models. Going from a threat model where a user is malicious to one where a tool is malicious has a huge impact on attack success rates, and DoomArena allows this change to be made with just a few lines of code:
1
0
3
4/n Key findings: Frontier AI agents are vulnerable under realistic threat models:
1
0
3
3/n DoomArena solves these shortcomings, by building a plug-in framework that integrates into existing agentic benchmarks like Tau-Bench, Browsergym etc, enabling in-situ security testing, injecting attacks dynamically. Extending DoomArena to a new benchmark like OSWorld is easy:
1
0
4
2/n Prior work on security testing for AI agents consists of either static benchmarks that do not consider dynamic/stateful or dynamic benchmarks that are dedicated security testbeds divorced from commonly used agentic evaluation frameworks or realistic task suites
1
0
5
1/n Wish you could evaluate AI agents for security vulnerabilities in a realistic setting? Wish no more - today we release DoomArena, a framework that plugs in to YOUR agentic benchmark and enables injecting attacks consistent with any threat model YOU specify
1
7
27
๐ #LitLLM demo is live! Try it here: https://t.co/akr2bXjpfa New features: โข Export BibTeX citations โข Add paper via URL Feedback welcome! GitHub: https://t.co/GvvRCoNxCP Google Form: https://t.co/8pksgTFkYy Email: litllm@duck.com Paper: https://t.co/FWzGQT89Bg
๐ Exciting news! Our work LitLLM has been accepted in TMLR! LitLLM helps researchers write literature reviews by combining keyword+embedding-based search, and LLM-powered reasoning to find relevant papers and generate high-quality reviews. https://t.co/ledPN4jEmP ๐งต (1/5)
1
6
21
Krishna is a great researcher and adviser, and has deep connections at places like Google. I highly recommend working with him!
I'm hiring MS/PhD students! Lots of benefits: scholarships & salary top-ups, compute, international research visits! GATE exemptions for: * MS/MTech degree holders from India/abroad * Bachelors in a CFTI (IITs/NITs) See attached post/reach out!
0
0
15
(10/10) This work was led by my intern @JoshuaKazdan, whoโs looking for visiting researcher positions in the fall. Other collaborators include Lisa Yu, @ChrisCundy, @SanmiKoyejo, @RylanSchaeffer.
0
1
4
(9/n) Check out our paper https://t.co/uxWCBzPetF, blog post https://t.co/HkBnhr9GPJ and github https://t.co/V2L9kNd3fl.
servicenow.com
Some AI models can answer harmful queries without being blocked by production-grade safety mechanisms. Find out how NOICE highlights the need for robust defenses.
1
1
5
(9/n) We identified one specific mechanism to break alignment via fine tuning on just harmless data, but believe there are many more, and urge the AI security/safety community to think deeply about risks of fine tuning APIs, particularly as frontier models become more powerful.
1
1
3
(8/n) We also achieved a 57% Attack success rate (ASR) against GPT-4o and 72% against Claude Haiku. We reported these vulnerabilities to OpenAI and Anthropic as part of responsible disclosure process, and are releasing our work now after the disclosure period.
1
1
3
(7/n) Our alternative attack mechanism is much more difficult to block than past attacks: various defenses have very little effect on attack success rates.
1
1
3
(6/n) NOICE uses a different mechanism than past attacks. Other attacks aim to increase P(Harmful Response|Harmful Prompt) by increasing P(Helpful Prefix | Harmful Prompt). We instead increase the probability P(Harmful Response | Initial Model Refusal).
1
1
3
(5/n) We introduce No, Of Course I can Exectue (NOICE), an attack that trains models to first refuse and then answer harmless requests. After applying NOICE, when you ask the model something that is actually harmful, it will first refuse, and then answer the question anyway.
1
1
4