Zihao Zhao
@ZihaoZhao1
Followers: 42 · Following: 11 · Media: 4 · Statuses: 9
CS PhD student @jhuclsp | AI safety & privacy. Previously: undergrad @jhucompsci
Joined August 2022
Catch @ZihaoZhao1 at today's poster session (10:30–12:00), where he'll be presenting SynthTextEval! Stop by if you're interested in synthetic text for high-stakes domains. Zihao also has another EMNLP paper on private text generation, for people interested in this space! @jhuclsp
SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration! GitHub: https://t.co/vPs1AEZNNS Paper: https://t.co/V09UDoNVeZ
#EMNLP2025 #EMNLP #SyntheticData
Thank you to @anjalie_f for advising. Hands-on with DP-SGD? Start with our other paper and open-source package (https://t.co/XZK5nG7d93, https://t.co/rnZXQsM2cX)
github.com
SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data For High-Stakes Domains (EMNLP 2025 System Demonstration) - kr-ramesh/synthtexteval
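For readers who want the hands-on DP-SGD starting point mentioned above, here is a minimal sketch of a DP-SGD training loop using Opacus, a common DP-SGD library. The model, data, and hyperparameters are stand-ins for illustration, not the linked package's actual code.

```python
# Minimal DP-SGD sketch with Opacus (illustrative; not the linked package's code).
import torch
from opacus import PrivacyEngine
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(16, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
data = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8
)

privacy_engine = PrivacyEngine()
model, optimizer, data = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data,
    noise_multiplier=1.0,  # Gaussian noise added to clipped per-sample gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

loss_fn = nn.CrossEntropyLoss()
for x, y in data:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()  # DP-SGD step: clip per-sample grads, then add noise
```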
Paper & code: the paper is accepted to EMNLP 2025 Main. arXiv: https://t.co/IgX1Jh7DZf Code: https://t.co/RzbFVzyWtA
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
github.com
Controlled Generation for Private Synthetic Text. Contribute to zzhao71/Controlled-Generation-for-Private-Synthetic-Text development by creating an account on GitHub.
4/5 Utility: On TAB, prefix-tuning + masking gives the best utility (perplexity ≈ 10.2, MAUVE ≈ 0.83), beating ICL and DP-SGD. Similar trends on MIMIC-III.
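To make the two utility metrics concrete, the sketch below scores synthetic text with perplexity under a GPT-2 scoring model and with MAUVE via the mauve-text package. The model choice and the toy texts are assumptions for illustration, not the paper's exact evaluation setup.

```python
# Utility-scoring sketch: perplexity under GPT-2 + MAUVE vs. real text.
import mauve
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

real_texts = ["The court granted the motion to dismiss.", "The patient was discharged home."]
synthetic_texts = ["The motion was granted by the court.", "The patient left the hospital."]

def perplexity(texts, model_name="gpt2"):
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for t in texts:
            ids = tok(t, return_tensors="pt").input_ids
            loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
            total_nll += loss.item() * (ids.size(1) - 1)
            total_tokens += ids.size(1) - 1
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

print("PPL:", perplexity(synthetic_texts))
# MAUVE compares the distribution of synthetic vs. real text (0..1, higher is better).
print("MAUVE:", mauve.compute_mauve(p_text=real_texts, q_text=synthetic_texts).mauve)
```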
3/5 Privacy: ICL + blocking gives ~0.00% privacy leakage (average in our runs). Prefix-tuning + masking yields the lowest ROUGE vs. training data (e.g., ROUGE-L ≈ 0.098), indicating less copying.
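As an illustration of the copying check, the sketch below uses the rouge-score package to compute the worst-case ROUGE-L overlap between a synthetic document and the training set. This is a rough proxy in the spirit of the tweet, not necessarily the paper's exact leakage metric.

```python
# Copying-check sketch: max ROUGE-L overlap with any training document.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def max_copy_score(synthetic_doc, training_docs):
    # Worst case: the training document the synthetic text overlaps with most.
    return max(
        scorer.score(train_doc, synthetic_doc)["rougeL"].fmeasure
        for train_doc in training_docs
    )

training_docs = ["The patient, John Smith, was admitted on May 3."]
print(max_copy_score("A patient was admitted in early May.", training_docs))
```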
2/5 How it works:
• Build control codes from detected private entities (PERSON, ORG, LOC, etc.).
• Generate with either ICL (and block those identifiers at decode time) or prefix-tuning with a privacy mask + KL/contrastive losses.
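A minimal sketch of those two ingredients, assuming spaCy NER and a Hugging Face causal LM as stand-ins; the paper's actual pipeline, models, and control-code format may differ.

```python
# Sketch: entity-derived control codes + decode-time blocking of identifiers.
import spacy
from transformers import AutoModelForCausalLM, AutoTokenizer

nlp = spacy.load("en_core_web_sm")  # assumes the small English NER model is installed
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Jane Doe was treated at Mercy Hospital in Boston."

# 1) Build control codes from detected private entities.
doc = nlp(text)
control = " ".join(f"<{e.label_}>" for e in doc.ents)  # e.g. "<PERSON> <ORG> <GPE>"

# 2) ICL-style generation that blocks the literal identifiers at decode time.
bad_words_ids = [
    tok(v, add_special_tokens=False).input_ids
    for e in doc.ents
    for v in (e.text, " " + e.text)  # cover GPT-2's space-prefixed tokenization
]
prompt = f"{control} Rewrite the record without identifiers:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=60,
    bad_words_ids=bad_words_ids,  # decode-time blocking of private strings
    do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
```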
Text anonymization is hard; DP often hurts utility. We use entity-aware control codes + either ICL (with bad-token blocking) or prefix-tuning w/ masking to get strong privacy–utility tradeoffs on legal & clinical data, outperforming DP-SGD in practice. https://t.co/Kt0PIoYsq3
We introduce WaltzRL, a multi-agent RL framework that treats LLM safety as a positive-sum game between conversation & feedback agents. It strikes an elegant balance between helpfulness & harmlessness, boosting safety and reducing overrefusals without degrading capabilities!
New Multi-Agent RL Method: WaltzRL
Paper: https://t.co/KE8dM9kX1r
- Makes LLM safety a positive-sum game between a conversation & feedback agent
- At inference, feedback is adaptive, used when needed -> improves safety & reduces overrefusals without degrading capabilities! 1/5
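A hedged sketch of the adaptive-feedback inference loop described above; the two-agent split and every name below are illustrative assumptions, not WaltzRL's actual interface or training code.

```python
# Sketch: feedback agent intervenes only when needed (assumed interface, not WaltzRL's API).
def respond(conversation_agent, feedback_agent, user_msg, max_rounds=2):
    reply = conversation_agent(user_msg, feedback=None)
    for _ in range(max_rounds):
        feedback = feedback_agent(user_msg, reply)  # None => reply is acceptable
        if feedback is None:
            break  # adaptive: no intervention, so helpful replies stay untouched
        reply = conversation_agent(user_msg, feedback=feedback)  # revise with feedback
    return reply

# Toy stand-in agents for demonstration only.
def convo(msg, feedback=None):
    return f"revised answer to '{msg}'" if feedback else f"draft answer to '{msg}'"

def fb(msg, reply):
    return "add a safety caveat" if "draft" in reply and "risky" in msg else None

print(respond(convo, fb, "a risky question"))   # feedback fires, reply gets revised
print(respond(convo, fb, "a benign question"))  # no feedback, draft returned as-is
```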
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")? Our new paper shows that AI which models others' minds as Python code can quickly and accurately predict human behavior! https://t.co/1t2fsW7jyL
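To illustrate the idea, here is a hypothetical behavior script in the spirit of the "cross the crosswalk" example: a small Python program an AI could execute to predict a pedestrian's next actions without modeling full beliefs and goals. It is an invented example, not code from the paper.

```python
# Hypothetical "person as script" sketch (illustrative, not the paper's code).
from dataclasses import dataclass

@dataclass
class Pedestrian:
    at_crosswalk: bool
    light_is_walk: bool

def cross_the_crosswalk(p: Pedestrian) -> list:
    """Script: predicted action sequence for a pedestrian following a routine."""
    actions = []
    if not p.at_crosswalk:
        actions.append("walk_to_crosswalk")
    if not p.light_is_walk:
        actions.append("wait_for_walk_signal")
    actions.append("cross_street")
    return actions

print(cross_the_crosswalk(Pedestrian(at_crosswalk=True, light_is_walk=False)))
# ['wait_for_walk_signal', 'cross_street']
```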