Jiachen Zhao
@jcz12856876
Followers
156
Following
211
Media
7
Statuses
42
PhD student @KhouryCollege | Scholar @MATSprogram | Prev: @UMassAmherst @HKUST
Boston, MA
Joined October 2022
New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
60
155
1K
6/ 💡We also find that representations of harmfulness may vary across different risk categories. Additionally, adversarial finetuning has minimal influence on the model's internal belief of harmfulness. Read our full paper:
arxiv.org
LLMs are trained to refuse harmful instructions, but do they truly understand harmfulness beyond just refusing? Prior work has shown that LLMs' refusal behaviors can be mediated by a...
1
0
0
5/ ⚔️ We propose Latent Guard: a safeguard model that uses the LLM's internal belief of harmfulness to detect unsafe inputs and over-refusal cases.
1
0
0
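A minimal sketch of the probe-style idea behind a safeguard built on internal beliefs, assuming hidden states are already extracted; the random activations, labels, and the `guard` helper below are stand-ins, not the paper's actual Latent Guard:

```python
# Hypothetical sketch: a probe-based safeguard over internal activations.
# Assumes you already have hidden states for labeled prompts; the paper's
# actual Latent Guard may differ in features, layers, and thresholds.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in activations: (n_prompts, d_model) hidden states at the chosen
# token position, with binary harmfulness labels. Replace with real ones.
d_model = 64
X_train = rng.normal(size=(200, d_model))
y_harmful = rng.integers(0, 2, size=200)

# Fit a linear probe on the model's "internal belief" of harmfulness.
harm_probe = LogisticRegression(max_iter=1000).fit(X_train, y_harmful)

def guard(hidden_state: np.ndarray, refused: bool) -> str:
    """Flag unsafe inputs and over-refusals from one prompt's activation."""
    p_harm = harm_probe.predict_proba(hidden_state[None, :])[0, 1]
    if p_harm > 0.5 and not refused:
        return "unsafe input slipped past refusal"  # e.g., a jailbreak
    if p_harm < 0.5 and refused:
        return "possible over-refusal of a benign prompt"
    return "consistent"

print(guard(rng.normal(size=d_model), refused=True))
```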
4/ 🔓 We also find jailbreak methods can suppress refusal, but LLMs may still internally know the input is harmful.
1
0
1
3/ 🧪 Causal evidence: We extract a “harmfulness” direction and a “refusal” direction. We design a reply-inversion task where steering with these two directions leads to opposite results! Harmfulness direction: it makes LLMs interpret benign prompts as harmful. Refusal
1
0
1
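A minimal sketch of direction extraction and activation steering under common assumptions (difference-in-means directions, additive steering on the residual stream); the stand-in activations and the `steer` helper are hypothetical, and the paper's exact procedure may differ:

```python
# Hypothetical sketch of direction extraction + steering, not the paper's code.
# A direction is the difference in mean activations between two prompt sets;
# steering adds a scaled copy of it to the residual stream at inference.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Stand-ins for hidden states collected at t_inst over two prompt sets.
acts_harmful = rng.normal(loc=0.5, size=(100, d_model))
acts_benign = rng.normal(loc=-0.5, size=(100, d_model))

# Difference-in-means direction, normalized.
harm_dir = acts_harmful.mean(0) - acts_benign.mean(0)
harm_dir /= np.linalg.norm(harm_dir)

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Push an activation along `direction`; alpha sets strength and sign."""
    return hidden + alpha * direction

# Steering a benign prompt's activation toward "harmful": its projection
# onto the direction grows, which the inversion task probes behaviorally.
h = acts_benign[0]
print("before:", float(h @ harm_dir))
print("after: ", float(steer(h, harm_dir, alpha=4.0) @ harm_dir))
```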
2/ 🔗Paper: https://t.co/iUCeWt9xUj
💻Code: https://t.co/omHob8isAC
📚Project Page: https://t.co/JlcnVH6Jtq
📊We focus on two token positions, t_inst and t_post-inst. Through clustering, we show:
— Harmfulness decides the clustering of instructions at t_inst
— Refusal decides
1
0
4
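A minimal sketch of the clustering test, assuming hidden states at one token position (e.g., t_inst) are already collected; the random activations and labels are stand-ins, and the agreement metric (adjusted Rand index) is one common choice, not necessarily the paper's:

```python
# Hypothetical sketch: test which label a token position's hidden states
# cluster by, in the spirit of the t_inst / t_post-inst analysis.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Stand-in activations at one token position plus two candidate labelings.
acts = rng.normal(size=(200, 64))
harm_labels = rng.integers(0, 2, size=200)
refusal_labels = rng.integers(0, 2, size=200)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(acts)

# Higher agreement means that concept "decides" the clustering here.
print("harmfulness ARI:", adjusted_rand_score(harm_labels, clusters))
print("refusal ARI:   ", adjusted_rand_score(refusal_labels, clusters))
```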
1/ 🚨New Paper 🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈“LLMs encode harmfulness and refusal separately”👉. ✂️LLMs may know a prompt is harmful internally yet still
6
15
73
🤔Long-horizon tasks: How to train LLMs for the marathon?🌀 Submit anything on 🔁"Multi-turn Interactions in LLMs"🔁 to our @NeurIPSConf workshop by 08/22: 📕 Multi-Turn RL ⚖️ Multi-Turn Alignment 💬 Multi-Turn Human-AI Teaming 📊 Multi-Turn Eval ♾️You name it! #neurips #LLM
🚀 Call for Papers — @NeurIPSConf 2025 Workshop Multi-Turn Interactions in LLMs 📅 December 6/7 · 📍 San Diego Convention Center Join us to shape the future of interactive AI. Topics include but are not limited to: 🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents,
1
15
79
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
28
150
537
What types of exemplar CoTs work best for in-context learning? Our #EMNLP paper shows that an LLM usually prefers its own self-generated CoTs as demonstrations for ICL. 📅I will present this paper in person on Wednesday at 4pm in Poster Session E (Jasmine). Come visit our poster!
🎉 New paper alert! Large Language Models are In-context Teachers for Knowledge Reasoning #EMNLP24 finding 🔗 Read the paper: https://t.co/jkOPm4cSZL Work done by @jcz12856876 @YaoZonghai @YangZhichaoNLP and Prof. Hong Yu #BioNLP #InstructionTuning (0/N)
0
8
10
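A minimal sketch of the self-generated-CoT ICL setup described above; `generate` is a placeholder for whatever LLM call you use, and the two demo questions are toy examples, not the paper's data:

```python
# Hypothetical sketch of the "self-generated CoT" ICL setup: the model first
# writes its own chain-of-thought for each demo question, and those CoTs
# (not human-written ones) become the in-context demonstrations.

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completions request)."""
    return "...model-written reasoning... Answer: 42"

demos = [("What is 6 * 7?", "42"), ("What is 10 - 3?", "7")]

# Step 1: elicit the model's own CoT for each demo question.
self_cots = [
    (q, generate(f"Q: {q}\nA: Let's think step by step."), a)
    for q, a in demos
]

# Step 2: assemble the ICL prompt from the self-generated CoTs.
test_q = "What is 9 * 8?"
prompt = "\n\n".join(f"Q: {q}\n{cot}\nAnswer: {a}" for q, cot, a in self_cots)
prompt += f"\n\nQ: {test_q}\nA: Let's think step by step."
print(generate(prompt))
```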
❓ How do you solve grasping problems when your target object is completely out of sight? 🚀 Excited to share our latest research! Check out ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter. 🔗 Site: https://t.co/bWNXlTiUea
1
8
28
❗Are We Truly Achieving Multilingualism in LLMs or Just Relying on Translation?❗ Need multilingual instruction data and benchmarks? Just translate from English. LLM multilingualism can be easily solved! If you agree, check out our #EMNLP 2024 paper which says this is
1
17
54
Our work has been accepted to #EMNLP2024 Findings! So thankful for my wonderful co-authors!!! All three projects from my Master's study @UMassAmherst now have happy endings!
3
0
28
☕We release our paper, “Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement” 👉In this paper, we explore the critical role of diversity in data selection and introduce a novel approach with iterative selection. 📜 https://t.co/IXOrh94NUd 🧵Below
5
45
150
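As a rough illustration of diversity-centric selection (not the paper's method), one common baseline is greedy farthest-point sampling over instruction embeddings; everything below, including the stand-in embeddings, is hypothetical:

```python
# Hypothetical sketch: greedy farthest-point (k-center) selection as one
# simple diversity-centric data-selection baseline; the paper's iterative
# method may differ substantially.
import numpy as np

def select_diverse(emb: np.ndarray, k: int) -> list[int]:
    """Iteratively pick the point farthest from everything chosen so far."""
    chosen = [0]                                   # seed with any point
    dists = np.linalg.norm(emb - emb[0], axis=1)   # distance to chosen set
    for _ in range(k - 1):
        nxt = int(dists.argmax())                  # most novel remaining point
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 32))   # stand-in instruction embeddings
print(select_diverse(embeddings, k=5))
```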
🚨 New Research Alert! People have found that safety training of LLMs can easily be undone through finetuning. How can we ensure safety in customized LLM finetuning while keeping finetuning useful? Check out our latest work led by Jiachen Zhao! @jcz12856876 🔍 Our study reveals:
1
12
66
In our #NeurIPS2023 paper Energy Transformer we propose a network that unifies three promising ideas in AI: Transformers, Energy-based Models (EBMs), and Associative Memory. The inference step in our model performs descent dynamics on a specially engineered energy function. Our
⚡️Energy Transformer (ET)⚡️ A novel architecture combining 3 prominent ideas in AI 1️⃣ Transformers: mix tokens with attention 2️⃣ Energy-based Models: inference descends a tractable energy function 3️⃣ Associative Memory: inference performs error correction #NeurIPS2023 A 🧵:
0
14
73
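A toy sketch of the core idea, inference as descent on an energy function; the log-sum-exp memory energy below is a generic Hopfield-style choice for illustration, not the Energy Transformer's actual energy, which couples attention with an associative-memory term:

```python
# Hypothetical toy: inference as gradient descent on an energy function.
# Low energy near stored patterns, so descent performs error correction.
import torch

torch.manual_seed(0)
d = 16
memories = torch.randn(8, d)               # stored patterns

def energy(x: torch.Tensor) -> torch.Tensor:
    # Log-sum-exp Hopfield-style energy plus a quadratic regularizer.
    return -torch.logsumexp(memories @ x, dim=0) + 0.5 * (x @ x)

x = torch.randn(d, requires_grad=True)     # corrupted token state
opt = torch.optim.SGD([x], lr=0.1)
for step in range(50):                     # inference = descend the energy
    opt.zero_grad()
    e = energy(x)
    e.backward()
    opt.step()
print("final energy:", float(energy(x)))
```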
Is it possible to find a metric that locates or explains which training data LLMs generalize from for a given test case? Really interested in how LLMs can magically answer users' diverse questions. Or is it simply a result of near-exhaustive training data?
1
2
5
I will have a poster presentation at the ICML TEACH workshop (7/29). The paper is an extension of this tweet 😆 that interprets ICL as retrieval from associative memory.
Is it possible that, most of the time, ChatGPT is only retrieving answers from its memory, like a Hopfield network or an IR system? How good is it for OOD cases?
0
0
0
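A minimal sketch of the Hopfield-style retrieval analogy in the tweet: a query state attends over stored patterns and reads out a mixture; the random patterns and the inverse temperature `beta` are stand-ins:

```python
# Hypothetical sketch: one retrieval step of a modern Hopfield network,
# the analogy for how an LLM might "look up" stored answers from memory.
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
patterns = rng.normal(size=(10, 32))             # stored "training" memories
query = patterns[3] + 0.3 * rng.normal(size=32)  # noisy probe (a test input)

# One Hopfield update: attention over memories, then read out the mixture.
beta = 4.0
retrieved = softmax(beta * patterns @ query) @ patterns

print("nearest memory:", int(np.argmax(patterns @ retrieved)))  # expect 3
```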