Zhiyuan Zeng @ZhiyuanZeng_ X Profile

Zhiyuan Zeng

@ZhiyuanZeng_

Followers

467

Following

550

Media

21

Statuses

227

PhD-ing @uwnlp @uwcse | Prev. @Tsinghua_Uni @TsinghuaNLP @princeton_nlp

Seattle, WA

Joined April 2023

Zhiyuan Zeng

@ZhiyuanZeng_

4 months

Is a single accuracy number all we can get from model evals? 🤔 🚨 Does NOT tell where the model fails. 🚨 Does NOT tell how to improve it. Introducing EvalTree 🌳. 🔍 Identifying LM weaknesses in natural language. 🚀 Weaknesses serve as actionable guidance. (paper & demo 🔗 in 🧵) [1/n]

4

87

239

Zhiyuan Zeng

@ZhiyuanZeng_

4 hours

RT @VictoriaWGraf: Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved workin….

0

6

0

Zhiyuan Zeng

@ZhiyuanZeng_

5 hours

RT @valentina__py: 💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of….

0

44

0

Zhiyuan Zeng

@ZhiyuanZeng_

14 hours

RT @danieljwkim: Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches….

0

24

0

Zhiyuan Zeng

@ZhiyuanZeng_

4 days

RT @Cumquaaa: 🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MAD….

0

4

0

Zhiyuan Zeng

@ZhiyuanZeng_

10 days

RT @chrome1996: Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more….

0

25

0

Zhiyuan Zeng

@ZhiyuanZeng_

14 days

RT @ChengZhoujun: 🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM….

0

55

0

Zhiyuan Zeng

@ZhiyuanZeng_

17 days

RT @xuhaoxh: Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine w….

0

18

0

Zhiyuan Zeng

@ZhiyuanZeng_

20 days

RT @HannaHajishirzi: Yayyy!!! Best paper honorable mention at CVPR goes to our Molmo and Pixmo @allen_ai! This is now becoming a trend :) L….

0

7

0

Zhiyuan Zeng

@ZhiyuanZeng_

20 days

RT @sarahwiegreffe: A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of M….

0

48

0

Zhiyuan Zeng

@ZhiyuanZeng_

21 days

RT @RulinShao: 🎉Our Spurious Rewards is available on ArXiv! We added experiments on.- More prompts/steps/models/analysis. - Spurious Prom….

0

40

0

Zhiyuan Zeng

@ZhiyuanZeng_

21 days

RT @StellaLisy: Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation:….

0

26

0

Zhiyuan Zeng

@ZhiyuanZeng_

21 days

RT @allen_ai: We are #1 on the @huggingface heatmap - this is what true openness looks like!🥇🎉 750+ models. 230+ datasets. And counting. ….

0

27

0

Zhiyuan Zeng

@ZhiyuanZeng_

22 days

RT @Diyi_Yang: AI agents are transforming the workforce! We mapped how AI agents could #automate vs. #augment jobs across the U.S. workfo….

0

5

0

Zhiyuan Zeng

@ZhiyuanZeng_

24 days

RT @jcqln_h: LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generati….

0

18

0

Zhiyuan Zeng

@ZhiyuanZeng_

25 days

RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowle….

0

53

0

Zhiyuan Zeng

@ZhiyuanZeng_

26 days

RT @natolambert: My path into AI. The sort of small wins that accumulate into a real career in AI. When I started grad school AI prof's didn….

0

31

0

Zhiyuan Zeng

@ZhiyuanZeng_

27 days

RT @RulinShao: Qwen3-0.6B x Wikipedia datastore is now supported in massive-serve! Serve a local API in one line: massive-serve serve --do….

0

7

0

Zhiyuan Zeng

@ZhiyuanZeng_

29 days

RT @jxmnop: here are three awesome researchers everyone should follow: - Songlin (@SonglinYang4 / phd at MIT), - Will @lambdaviking (phd….

0

40

0

Zhiyuan Zeng

@ZhiyuanZeng_

29 days

RT @Cohere_Labs: Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Trai….

0

6

0

Zhiyuan Zeng

@ZhiyuanZeng_

29 days

RT @HannaHajishirzi: Check out who the 2025 ACM Dissertation Award honorees are this year — our very own @sewon__min and @sharma_ashish_2!….

0

4

0