Zhiyuan Zeng

@ZhiyuanZeng_

Followers: 467
Following: 550
Media: 21
Statuses: 227

PhD-ing @uwnlp @uwcse | Prev. @Tsinghua_Uni @TsinghuaNLP @princeton_nlp

Seattle, WA
Joined April 2023
@ZhiyuanZeng_
Zhiyuan Zeng
4 months
Is a single accuracy number all we can get from model evals? 🤔 🚨 Does NOT tell where the model fails. 🚨 Does NOT tell how to improve it. Introducing EvalTree 🌳. 🔍 Identifying LM weaknesses in natural language. 🚀 Weaknesses serve as actionable guidance. (paper & demo 🔗 in 🧵) [1/n]
4
87
239
@ZhiyuanZeng_
Zhiyuan Zeng
4 hours
RT @VictoriaWGraf: Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved workin….
0
6
0
@ZhiyuanZeng_
Zhiyuan Zeng
5 hours
RT @valentina__py: 💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of….
0
44
0
@ZhiyuanZeng_
Zhiyuan Zeng
14 hours
RT @danieljwkim: Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches….
0
24
0
@ZhiyuanZeng_
Zhiyuan Zeng
4 days
RT @Cumquaaa: 🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MAD….
0
4
0
@ZhiyuanZeng_
Zhiyuan Zeng
10 days
RT @chrome1996: Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more….
0
25
0
@ZhiyuanZeng_
Zhiyuan Zeng
14 days
RT @ChengZhoujun: 🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM….
0
55
0
@ZhiyuanZeng_
Zhiyuan Zeng
17 days
RT @xuhaoxh: Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine w….
0
18
0
@ZhiyuanZeng_
Zhiyuan Zeng
20 days
RT @HannaHajishirzi: Yayyy!!! Best paper honorable mention at CVPR goes to our Molmo and Pixmo @allen_ai! This is now becoming a trend :) L….
0
7
0
@ZhiyuanZeng_
Zhiyuan Zeng
20 days
RT @sarahwiegreffe: A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of M….
0
48
0
@ZhiyuanZeng_
Zhiyuan Zeng
21 days
RT @RulinShao: 🎉Our Spurious Rewards is available on ArXiv! We added experiments on: - More prompts/steps/models/analysis - Spurious Prom….
0
40
0
@ZhiyuanZeng_
Zhiyuan Zeng
21 days
RT @StellaLisy: Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation:….
0
26
0
@ZhiyuanZeng_
Zhiyuan Zeng
21 days
RT @allen_ai: We are #1 on the @huggingface heatmap - this is what true openness looks like!🥇🎉 750+ models. 230+ datasets. And counting. ….
0
27
0
@ZhiyuanZeng_
Zhiyuan Zeng
22 days
RT @Diyi_Yang: AI agents are transforming the workforce! We mapped how AI agents could #automate vs. #augment jobs across the U.S. workfo….
0
5
0
@ZhiyuanZeng_
Zhiyuan Zeng
24 days
RT @jcqln_h: LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generati….
0
18
0
@ZhiyuanZeng_
Zhiyuan Zeng
25 days
RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowle….
0
53
0
@ZhiyuanZeng_
Zhiyuan Zeng
26 days
RT @natolambert: My path into AI. The sort of small wins that accumulate into a real career in AI. When I started grad school AI prof's didn….
0
31
0
@ZhiyuanZeng_
Zhiyuan Zeng
27 days
RT @RulinShao: Qwen3-0.6B x Wikipedia datastore is now supported in massive-serve! Serve a local API in one line: massive-serve serve --do….
0
7
0
@ZhiyuanZeng_
Zhiyuan Zeng
29 days
RT @jxmnop: here are three awesome researchers everyone should follow: - Songlin (@SonglinYang4 / phd at MIT), - Will @lambdaviking (phd….
0
40
0
@ZhiyuanZeng_
Zhiyuan Zeng
29 days
RT @Cohere_Labs: Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Trai….
0
6
0
@ZhiyuanZeng_
Zhiyuan Zeng
29 days
RT @HannaHajishirzi: Check out who the 2025 ACM Dissertation Award honorees are this year — our very own @sewon__min and @sharma_ashish_2!….
0
4
0