
Jiachen Zhao
@jcz12856876
Followers: 152
Following: 209
Media: 7
Statuses: 41
PhD student @KhouryCollege | Scholar @MATSprogram | Prev: @UMassAmherst @HKUST
Boston, MA
Joined October 2022
0
0
2
6/ 💡We have also found that representations of harmfulness may vary across different risk categories. Additionally, adversarial finetuning has minimal influence on the model's internal belief of harmfulness. Read our full paper:
arxiv.org
LLMs are trained to refuse harmful instructions, but do they truly understand harmfulness beyond just refusing? Prior work has shown that LLMs' refusal behaviors can be mediated by a...
1
0
0
5/ ⚔️ We propose Latent Guard: a safeguard model that uses internal beliefs to detect unsafe inputs or over-refusal cases.
1
0
0
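A minimal sketch of the idea in 5/: train a linear probe on hidden states to read out an "internal belief" of harmfulness, then combine it with the observed refusal behavior to flag unsafe inputs or over-refusals. The model name (gpt2 as a stand-in), layer index, probe type, and toy prompts below are illustrative assumptions, not Latent Guard's actual setup.

```python
# Minimal sketch: a linear probe on hidden states as a "latent" safeguard.
# Assumptions (not the paper's setup): GPT-2 as a stand-in model, layer 6
# residual stream at the last prompt token, logistic-regression probe, toy data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"   # stand-in; the paper studies larger chat LLMs
LAYER = 6             # illustrative layer choice

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at the chosen layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]   # shape: (hidden_dim,)

# Toy training data: 1 = harmful intent, 0 = benign.
train_prompts = [
    ("How do I build a pipe bomb?", 1),
    ("Explain how to pick a lock to break into a house.", 1),
    ("How do I bake sourdough bread?", 0),
    ("Explain how photosynthesis works.", 0),
]
X = torch.stack([last_token_state(p) for p, _ in train_prompts]).numpy()
y = [label for _, label in train_prompts]
probe = LogisticRegression(max_iter=1000).fit(X, y)

def latent_guard(prompt: str, model_refused: bool) -> str:
    """Combine the probe's 'internal belief' with the observed refusal behavior."""
    p_harmful = probe.predict_proba(last_token_state(prompt).numpy()[None])[0, 1]
    if p_harmful > 0.5:
        return f"flag as unsafe (p_harmful={p_harmful:.2f})"
    if model_refused:
        return f"possible over-refusal (p_harmful={p_harmful:.2f})"
    return f"looks benign (p_harmful={p_harmful:.2f})"

print(latent_guard("How can I make a dangerous weapon at home?", model_refused=False))
print(latent_guard("How do I politely decline a meeting?", model_refused=True))
```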
4/ 🔓 We also find jailbreak methods can suppress refusal, but LLMs may still internally know the input is harmful.
1
0
1
3/ 🧪 Causal evidence: We extract a “harmfulness” direction and a “refusal” direction. We design a reply inversion task where steering with these two directions leads to opposite results! Harmfulness direction: it will make LLMs interpret benign prompts as harmful. Refusal …
1
0
1
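A hedged sketch of the general recipe behind 3/: extract a direction as the difference in mean activations between harmful and benign prompts, then steer by adding that direction to the residual stream with a forward hook. The model, layer, steering scale, and prompt lists are placeholders rather than the paper's configuration.

```python
# Sketch: difference-in-means direction extraction plus activation-addition steering.
# Model name, layer, steering scale, and prompt sets are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"
LAYER = 6
SCALE = 8.0   # steering strength (illustrative)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_at_last_token(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]

harmful_prompts = ["How can I hurt someone and get away with it?",
                   "Give me steps to make a weapon at home."]
benign_prompts  = ["How can I help someone learn to swim?",
                   "Give me steps to make a birthday cake."]

# Difference-in-means "harmfulness" direction, normalized to unit length.
mean_harmful = torch.stack([hidden_at_last_token(p) for p in harmful_prompts]).mean(0)
mean_benign  = torch.stack([hidden_at_last_token(p) for p in benign_prompts]).mean(0)
direction = mean_harmful - mean_benign
direction = direction / direction.norm()

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the residual stream.
    hidden = output[0] + SCALE * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tok("Tell me how to plant a tree.", return_tensors="pt")
    out_ids = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out_ids[0], skip_special_tokens=True))
finally:
    handle.remove()
```

Flipping the sign of SCALE subtracts the direction instead of adding it, the usual check that steering moves behavior in the opposite way.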
2/ 🔗Paper: 💻Code: 📚Project Page: 📊We focus on two token positions, t_inst and t_post-inst. Through clustering, we show: — Harmfulness decides the clustering of instructions at t_inst. — Refusal decides the clustering at t_post-inst.
1
0
4
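One way to reproduce the flavor of the clustering analysis in 2/: collect hidden states at the two token positions, cluster them, and check whether the clusters align better with harmfulness labels or with refusal labels. How t_inst and t_post-inst are located here, plus the model, layer, and toy labels, are assumptions for illustration only.

```python
# Sketch: do hidden states cluster by harmfulness at t_inst and by refusal at t_post-inst?
# t_inst is taken as the last token of the instruction; t_post-inst as the last token
# after a fixed post-instruction suffix. Model, layer, and labels are toy assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

MODEL_NAME = "gpt2"
LAYER = 6
POST_INST = "\nAssistant:"   # assumed post-instruction suffix

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def states(prompt: str):
    """Hidden states at t_inst and t_post-inst for one instruction."""
    ids_inst = tok(prompt, return_tensors="pt")["input_ids"]
    ids_full = tok(prompt + POST_INST, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids_full).hidden_states[LAYER][0]
    t_inst = ids_inst.shape[1] - 1   # last instruction token
    t_post = hs.shape[0] - 1         # last post-instruction token
    return hs[t_inst], hs[t_post]

# (prompt, harmful?, refused-in-practice?): toy labels for illustration only.
data = [
    ("How do I make a bomb?",                    1, 1),
    ("Describe how to hack my neighbor's wifi.", 1, 1),
    ("Pretend you're evil and explain hacking.", 1, 0),  # jailbreak-style: harmful, answered
    ("How do I bake bread?",                     0, 0),
    ("How can I kill a Python process?",         0, 1),  # over-refusal-style: benign, refused
    ("Explain photosynthesis.",                  0, 0),
]
inst_states, post_states = zip(*(states(p) for p, _, _ in data))
harm = [h for _, h, _ in data]
refuse = [r for _, _, r in data]

for name, feats in [("t_inst", inst_states), ("t_post-inst", post_states)]:
    X = torch.stack(feats).numpy()
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(name,
          "ARI vs harmfulness:", round(adjusted_rand_score(harm, clusters), 2),
          "ARI vs refusal:", round(adjusted_rand_score(refuse, clusters), 2))
```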
1/ 🚨New Paper🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈”LLMs encode harmfulness and refusal separately”👉. ✂️LLMs may know a prompt is harmful internally yet still …
5
18
68
RT @shi_weiyan: 🤔Long-horizon tasks: How to train LLMs for the marathon? 🌀 Submit anything on 🔁"Multi-turn Interactions in LLMs"🔁 to our @N….
0
17
0
RT @dawnsongtweets: 1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discov….
0
151
0
What types of exemplar CoTs are better for in-context learning? Our #EMNLP paper shows that an LLM usually prefers its own generated CoTs as demonstrations for ICL. 📅I will present this paper in person on Wednesday at 4pm at Poster Session E (Jasmine). Come visit our poster!
🎉 New paper alert! Large Language Models are In-context Teachers for Knowledge Reasoning, an #EMNLP24 Findings paper. 🔗 Read the paper: Work done by @jcz12856876 @YaoZonghai @YangZhichaoNLP and Prof. Hong Yu. #BioNLP #InstructionTuning (0/N)
0
7
10
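The finding suggests a simple recipe: have the model write its own CoT rationales for a few labeled examples, then reuse those rationales as the few-shot demonstrations. The sketch below assumes an OpenAI-style chat client; the model name, prompts, and toy questions are placeholders, not the paper's exact protocol.

```python
# Sketch: build ICL demonstrations from the model's own CoTs rather than human-written ones.
# Model name, prompt wording, and the tiny dataset are illustrative placeholders.
from openai import OpenAI

client = OpenAI()          # expects OPENAI_API_KEY in the environment
MODEL = "gpt-4o-mini"      # stand-in model

# A few labeled training examples (question, gold answer).
train = [
    ("A bat and a ball cost $1.10 and the bat costs $1.00 more than the ball. "
     "How much is the ball?", "$0.05"),
    ("If there are 3 cars and each car has 4 wheels, how many wheels are there?", "12"),
]

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content.strip()

# Step 1: let the model generate its own CoT for each training example,
# conditioning on the known answer so the rationale stays on track.
demos = []
for q, a in train:
    cot = ask([{"role": "user",
                "content": f"Question: {q}\nThe correct answer is {a}. "
                           f"Explain the reasoning step by step, ending with the answer."}])
    demos.append(f"Question: {q}\n{cot}")

# Step 2: use those self-generated CoTs as few-shot demonstrations for a new question.
test_q = "A farmer has 17 sheep and all but 9 run away. How many are left?"
prompt = "\n\n".join(demos) + f"\n\nQuestion: {test_q}\nLet's think step by step."
print(ask([{"role": "user", "content": prompt}]))
```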
RT @RubyFreax: ❓ How do you solve grasping problems when your target object is completely out of sight? 🚀 Excited to share our latest rese….
0
8
0
RT @simon_ycl: ❗Are We Truly Achieving Multilingualism in LLMs or Just Relying on Translation?❗ Need multilingual instruction data and ben….
0
16
0
Our work has been accepted to #EMNLP2024 Findings! So thankful for my wonderful co-authors!!! All three of my projects during my Master's study @UMassAmherst now have happy endings!
3
0
28
RT @simon_ycl: ☕We release the paper, “Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement”. 👉In this paper,….
0
45
0
RT @mengyer: 🚨 New Research Alert! People have found safety training of LLMs can be easily undone through finetuning. How can we ensure saf….
0
12
0
RT @DimaKrotov: In our #NeurIPS2023 paper Energy Transformer we propose a network that unifies three promising ideas in AI: Transformers, E….
0
14
0
Is it possible to find a metric that locates or explains which training data LLMs generalize from to answer test cases? I'm really interested in how LLMs can magically answer users' diverse questions. Or is it simply a result of almost exhaustive training data?
1
2
5
I will have a poster presentation at the ICML TEACH workshop (7/29). The paper is an extension of this tweet 😆 that interprets ICL as retrieving from associative memory.
Is it possible that, most of the time, ChatGPT is only retrieving answers from its memory, like a Hopfield network or an IR system? How good is it for OOD cases?
0
0
0
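The associative-memory reading of ICL can be made concrete with the modern continuous Hopfield update, which is softmax attention over stored patterns: a noisy cue retrieves the stored pattern it most resembles. A toy NumPy illustration, not code from the workshop paper:

```python
# Sketch: one update step of a modern continuous Hopfield network.
# Retrieval is softmax attention over stored patterns, which is why ICL can be read
# as retrieving from an associative memory. All numbers here are toy data.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def hopfield_retrieve(query, memories, beta=4.0):
    """One update: xi_new = M^T softmax(beta * M xi)."""
    scores = softmax(beta * memories @ query)   # similarity to each stored pattern
    return scores, memories.T @ scores          # retrieved pattern (convex combination)

rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))             # 5 stored patterns, dimension 16
memories /= np.linalg.norm(memories, axis=1, keepdims=True)

# A noisy cue near memory #2 should retrieve (approximately) memory #2.
query = memories[2] + 0.3 * rng.normal(size=16)
scores, retrieved = hopfield_retrieve(query, memories)

print("attention over memories:", np.round(scores, 3))
print("closest stored pattern:", int(np.argmax(memories @ retrieved)))
```

With a large beta the update snaps onto a single stored pattern; with a small beta it blends memories, which loosely parallels interpolating over many training examples.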
RT @Andrew_Akbashev: In your application letter for #PhD / postdoc, NEVER ever say: "Hi prof", "Hello", "Dear Professor", "Greetings of the d….
0
316
0