Greg Gluch
@greg_gluch
Followers 7 · Following 52 · Media 0 · Statuses 6
AI Resilience, Postdoc at @SimonsInstitute @UCBerkeley, now @MIT, PhD from @EPFL
Berkeley, CA
Joined July 2024
An interesting article in @QuantaMagazine about our recent work on why external filters will never work for AI Safety/Alignment
quantamagazine.org
Large language models such as ChatGPT come with filters to keep certain info from getting out. A new mathematical argument shows that systems like this can never be completely safe.
A follow-up work https://t.co/X8ZyhSYstO demonstrated that an attack inspired by our time-lock idea succeeds against production-grade guard models.
arxiv.org
As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms designed to filter malicious queries while...
Ultimately, there are two levels of “meaning”: a surface level that is accessible to everyone, and a hidden, deeper meaning that requires computation to uncover (an LLM can uncover it). Importantly, no collusion between the user and the LLM is needed.
On the technical side, we use a cryptographic tool called a time-lock puzzle, combined with steganography. We hide a malicious command (“how to build a bomb?”) in an innocent-looking prompt so that it can only be recovered using considerable computational resources (a filter cannot uncover it).
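To make the mechanism concrete, here is a minimal sketch of the classic Rivest–Shamir–Wagner time-lock puzzle with toy parameters and helper names of my own choosing, not the construction from the paper: the payload can only be recovered after t sequential squarings, so a lightweight filter cannot decode it, while a decoder willing to spend the computation can.

```python
# Minimal time-lock sketch (toy parameters, illustrative only).
import hashlib
from random import SystemRandom

def make_puzzle(payload: bytes, t: int):
    p, q = 104723, 104729            # toy primes; a real puzzle needs large random primes
    n, phi = p * q, (p - 1) * (q - 1)
    a = SystemRandom().randrange(2, n)
    # Creator's shortcut: knowing phi(n), reduce the exponent and finish instantly.
    key = pow(a, pow(2, t, phi), n)
    mask = hashlib.sha256(str(key).encode()).digest()
    cipher = bytes(x ^ m for x, m in zip(payload, mask))
    return n, a, t, cipher           # looks like random numbers; the command is hidden

def solve_puzzle(n, a, t, cipher):
    # Without p and q there is no shortcut: the t squarings must be done in sequence.
    key = a
    for _ in range(t):
        key = key * key % n
    mask = hashlib.sha256(str(key).encode()).digest()
    return bytes(x ^ m for x, m in zip(cipher, mask))

puzzle = make_puzzle(b"hidden instruction", t=200_000)
print(solve_puzzle(*puzzle))         # b'hidden instruction'
```

Embedding such a puzzle in natural-looking text is where the steganography comes in; the point here is only that recovering the command forces sequential work that a cheap filter will not do.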
The main result morally says that the resources devoted to safety/robustness need to be at least as large as those devoted to capability. It can also be read as an argument for policymakers to grant the government access to LLM weights for auditing purposes.
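One hedged way to write that claim down, in notation introduced here rather than taken from the paper: if decoding the hidden intent of a prompt costs the model a certain amount of sequential computation, any external filter that reliably blocks such prompts must spend computation of the same order.

```latex
% Hedged paraphrase of the resource claim (my notation, not the paper's):
% a filter that reliably blocks payloads taking T_model sequential steps
% to decode must itself perform computation on the same order.
\[
  T_{\text{filter}} \;=\; \Omega\!\bigl(T_{\text{model}}\bigr)
\]
```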