Jiachen Zhao

@jcz12856876

Followers: 152 · Following: 209 · Media: 7 · Statuses: 41

PhD student @KhouryCollege | Scholar @MATSprogram | Prev: @UMassAmherst @HKUST

Boston, MA
Joined October 2022
@jcz12856876
Jiachen Zhao
2 months
7/ Very grateful to amazing co-authors! Jing Huang @ZhengxuanZenWu @davidbau @shi_weiyan.
@jcz12856876
Jiachen Zhao
2 months
6/ 💡We have also found that representations of harmfulness may vary across different risk categories. Additionally, adversarial finetuning has minimal influence on the internal belief of harmfulness. Read our full paper:
arxiv.org
LLMs are trained to refuse harmful instructions, but do they truly understand harmfulness beyond just refusing? Prior work has shown that LLMs' refusal behaviors can be mediated by a...
@jcz12856876
Jiachen Zhao
2 months
5/ ⚔️ We propose Latent Guard: a safeguard model that uses internal beliefs to detect unsafe inputs or over-refusal cases.
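A rough illustration of the general idea, not the paper's implementation: one way a safeguard like this could use "internal beliefs" is a linear probe trained on cached hidden states; the probe, feature source, and threshold below are my assumptions.

# Hypothetical sketch: train a linear probe on cached hidden states
# (shape: num_prompts x hidden_dim) labeled harmful / benign, then use
# its score as an "internal belief" signal at inference time.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_probe(acts: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    # acts: hidden states at one token position; labels: 1 = harmful, 0 = benign.
    return LogisticRegression(max_iter=1000).fit(acts, labels)

def flag_input(probe: LogisticRegression, act: np.ndarray, threshold: float = 0.5) -> bool:
    # Return True if the probe "believes" the prompt is harmful.
    return probe.predict_proba(act.reshape(1, -1))[0, 1] > threshold

The same score could, in principle, be compared against the model's refusal behavior to surface over-refusal cases (refused but low harmfulness belief).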
@jcz12856876
Jiachen Zhao
2 months
4/ 🔓 We also find jailbreak methods can suppress refusal, but LLMs may still internally know the input is harmful.
@jcz12856876
Jiachen Zhao
2 months
3/ 🧪 Causal evidence: We extract a “harmfulness” direction and a “refusal” direction. We design a reply inversion task where steering with these two directions leads to opposite results! Harmfulness direction: it will make LLMs interpret benign prompts as harmful. Refusal
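A minimal sketch of the general recipe, as my own illustration rather than the authors' code: the difference-of-means choice, tensor names, and steering coefficient below are assumptions. The idea is to extract a direction from cached activations on contrasting prompt sets and add it to hidden states to steer behavior.

# Hedged sketch: difference-of-means direction extraction and steering,
# assuming activations were already cached as tensors of shape
# (num_prompts, hidden_dim) at one token position.
import torch

def extract_direction(harmful_acts: torch.Tensor, benign_acts: torch.Tensor) -> torch.Tensor:
    # Candidate "harmfulness" (or "refusal") direction: difference of class means, unit-normalized.
    direction = harmful_acts.mean(dim=0) - benign_acts.mean(dim=0)
    return direction / direction.norm()

def steer(hidden: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    # Shift a hidden state along the direction during the forward pass;
    # the sign and size of alpha control the strength of the intervention.
    return hidden + alpha * direction

With separately extracted "harmfulness" and "refusal" directions, steering the same benign prompt along each should produce the two opposite effects the thread describes (misread-as-harmful vs. forced refusal).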
@jcz12856876
Jiachen Zhao
2 months
2/ 🔗Paper: 💻Code: 📚Project Page: 📊We focus on two token positions, t_inst and t_post-inst. Through clustering, we show: — Harmfulness decides the clustering of instructions at t_inst. — Refusal decides
@jcz12856876
Jiachen Zhao
2 months
1/ 🚨New Paper 🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈“LLMs encode harmfulness and refusal separately”👉. ✂️LLMs may know a prompt is harmful internally yet still
@jcz12856876
Jiachen Zhao
2 months
RT @shi_weiyan: 🤔Long-horizon tasks: How to train LLMs for the marathon?🌀 Submit anything on 🔁"Multi-turn Interactions in LLMs"🔁 to our @N…
@jcz12856876
Jiachen Zhao
3 months
RT @dawnsongtweets: 1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discov…
@jcz12856876
Jiachen Zhao
10 months
What types of exemplar CoTs are better for In-Context Learning? Our #EMNLP paper shows that an LLM usually prefers its own generated CoTs as demonstrations for ICL. 📅I will present this paper in person on Wednesday 4pm at Poster Session E (Jasmine). Come visit our poster!
@UMassBioNLP
UMass BioNLP
10 months
🎉 New paper alert! Large Language Models are In-context Teachers for Knowledge Reasoning, #EMNLP24 Findings. 🔗 Read the paper: Work done by @jcz12856876 @YaoZonghai @YangZhichaoNLP and Prof. Hong Yu. #BioNLP #InstructionTuning (0/N)
@jcz12856876
Jiachen Zhao
10 months
RT @RubyFreax: ❓ How do you solve grasping problems when your target object is completely out of sight? 🚀 Excited to share our latest rese…
@jcz12856876
Jiachen Zhao
11 months
RT @simon_ycl: ❗Are We Truly Achieving Multilingualism in LLMs or Just Relying on Translation?❗ Need multilingual instruction data and ben…
@jcz12856876
Jiachen Zhao
1 year
Our work has been accepted to #EMNLP2024 Findings! So thankful for my wonderful co-authors!!! All three of my projects during my Master's study @UMassAmherst now have happy endings!
@jcz12856876
Jiachen Zhao
1 year
RT @simon_ycl: ☕We release the paper, “Diversity and Conquer: Diversity-Centric Data Selection with Iterative Refinement”. 👉In this paper,…
@jcz12856876
Jiachen Zhao
2 years
RT @mengyer: 🚨 New Research Alert! People have found safety training of LLMs can be easily undone through finetuning. How can we ensure saf…
@jcz12856876
Jiachen Zhao
2 years
RT @DimaKrotov: In our #NeurIPS2023 paper Energy Transformer we propose a network that unifies three promising ideas in AI: Transformers, E…
@jcz12856876
Jiachen Zhao
2 years
ok, this has been studied.
@jcz12856876
Jiachen Zhao
2 years
Is it possible to find a metric to locate or explain what training data LLMs generalize from for a given test case? Really interested in how LLMs can magically answer users’ diverse questions. Or is it simply a result of almost exhaustive training data?
@jcz12856876
Jiachen Zhao
2 years
I will have a poster presentation at the ICML TEACH workshop (7/29). The paper is an extension of this tweet 😆 that interprets ICL as retrieving from associative memory.
@jcz12856876
Jiachen Zhao
2 years
Is it possible that, most of the time, ChatGPT is only retrieving answers from its memory, like a Hopfield network or an IR system? How good is it for OOD cases?
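For context on the "retrieving from associative memory" reading, here is a textbook-style illustration rather than anything taken from the workshop paper: one retrieval step of a modern Hopfield network has the same form as softmax attention over stored patterns, so the analogy to ICL retrieval can be written in a few lines.

# Hedged illustration: one retrieval step of a modern Hopfield network,
# i.e. softmax attention over stored patterns.
import numpy as np

def hopfield_retrieve(query: np.ndarray, memory: np.ndarray, beta: float = 1.0) -> np.ndarray:
    # query: (d,), memory: (num_patterns, d). Returns a weighted mixture of
    # stored patterns; a large beta snaps the output to the nearest stored pattern.
    scores = beta * memory @ query               # similarity to each stored pattern
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax
    return weights @ memory                      # retrieved pattern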
@jcz12856876
Jiachen Zhao
2 years
RT @Andrew_Akbashev: In your application letter for #PhD / postdoc, NEVER ever say: "Hi prof". "Hello". "Dear Professor". "Greetings of the d…