Liyan Tang
@LiyanTang4
233 Followers · 166 Following · 19 Media · 172 Statuses
Final-year PhD @UTAustin || NLP || MiniCheck || Prev Intern @GoogleDeepMind, @bespokelabsai, @AmazonScience
Austin, TX, US
Joined February 2022
🔎📄New model & benchmark to check LLMs’ output against docs (e.g., fact-check RAG) 🕵️ MiniCheck: a model w/GPT-4 accuracy @ 400x cheaper 📚LLM-AggreFact: collects 10 human-labeled datasets of errors in model outputs https://t.co/oFTS68mQOL w/ @PhilippeLaban, @gregd_nlp 🧵
3 · 28 · 90
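To make the MiniCheck thread above concrete, here is a minimal sketch of checking a claim against a source document. The import path, model name, and score() signature follow my recollection of the MiniCheck repo README and are assumptions; verify against the current package before relying on them.

```python
# pip install minicheck
# NOTE: import path, model_name value, and score() signature are assumptions
# based on the MiniCheck README; check https://github.com/Liyan06/MiniCheck.
from minicheck.minicheck import MiniCheck

doc = "The Eiffel Tower was completed in 1889 and is about 330 metres tall."
claim = "The Eiffel Tower was finished in 1899."  # deliberately unsupported by the doc

scorer = MiniCheck(model_name="flan-t5-large", cache_dir="./ckpts")
pred_labels, raw_probs, _, _ = scorer.score(docs=[doc], claims=[claim])

print(pred_labels[0], raw_probs[0])  # expect an "unsupported" label with a low support probability
```

The same docs/claims interface is what lets a RAG pipeline verify each generated sentence against its retrieved passages.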
ChartMuseum leaderboard: https://t.co/KMoXbdFPPg GitHub Repo: https://t.co/RbmZZLjpzB Paper:
arxiv.org
Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However,...
1 · 1 · 5
Our paper "ChartMuseum 🖼️" is now accepted to #NeurIPS2025 Datasets and Benchmarks Track! Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still cannot do well on challenging 📉chart understanding questions, especially those that involve visual reasoning 👀!
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, but Gemini-2.5-Pro only 63% and Qwen2.5-72B only 38%
1 · 20 · 36
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please
94 · 48 · 764
LLMs trained to memorize new facts can’t use those facts well.🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
5 · 72 · 198
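To give a flavor of the "hypernetwork that edits gradients" idea in the PropMEND announcement above, here is a toy sketch: a small network transforms the raw gradient of an editing loss before it is applied as a weight update. This is not the paper's architecture (MEND/PropMEND operate on low-rank gradient factors and train the editor with a propagation objective); everything below is illustrative.

```python
import torch
import torch.nn as nn

class GradEditor(nn.Module):
    """Toy hypernetwork: maps a raw weight gradient to an 'edited' gradient."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.net(grad)

# One editing step: compute the gradient of an editing loss on a single layer,
# pass it through the hypernetwork, and apply the edited gradient as the update.
layer = nn.Linear(16, 16)
editor = GradEditor(16)

x, target = torch.randn(1, 16), torch.randn(1, 16)
loss = ((layer(x) - target) ** 2).mean()
loss.backward()

with torch.no_grad():
    edited_grad = editor(layer.weight.grad)   # hypernetwork rewrites the raw gradient
    layer.weight -= 1e-2 * edited_grad        # apply it as the knowledge edit
```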
🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval? 📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval. Main contributions: 🔍 Better head detection: we find a
2 · 23 · 70
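Not the paper's detection procedure, but a sketch of the underlying idea in the QRHeads thread: score each candidate document by how much attention a chosen head sends from the query tokens back to the document tokens. GPT-2 and the (layer, head) choice below are arbitrary stand-ins; QRHeads selects heads with a query-focused criterion.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def head_score(query: str, doc: str, layer: int = 5, head: int = 1) -> float:
    """Attention mass a single head sends from query tokens back onto document tokens."""
    doc_ids = tok(doc, return_tensors="pt").input_ids
    query_ids = tok(" " + query, return_tensors="pt").input_ids
    input_ids = torch.cat([doc_ids, query_ids], dim=1)   # [document ; query]
    with torch.no_grad():
        out = model(input_ids, output_attentions=True)
    attn = out.attentions[layer][0, head]                # (seq_len, seq_len)
    n_doc = doc_ids.shape[1]
    return attn[n_doc:, :n_doc].sum(dim=-1).mean().item()

query = "Who painted the Mona Lisa?"
docs = ["Leonardo da Vinci painted the Mona Lisa in the early 16th century.",
        "The Great Wall of China is visible across northern China."]
# Rank candidate documents by the chosen head's query-to-document attention mass.
print(sorted(docs, key=lambda d: head_score(query, d), reverse=True)[0])
```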
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models This leads to better 0-shot reasoning on tasks involving skill composition!
5 · 38 · 87
The paper is out! https://t.co/GikR01dy5S
Announcing the new SotA voice-cloning TTS model: 𝗩𝗼𝗶𝗰𝗲𝗦𝘁𝗮𝗿 ⭐️ VoiceStar is: autoregressive, voice-cloning, robust, duration-controllable, and capable of *test-time extrapolation* (it generates speech longer than the training duration)! Code&Model: https://t.co/7vxDpnayks
0 · 12 · 60
Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Chart questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, but Gemini-2.5-Pro only 63% and Qwen2.5-72B only 38%
1 · 11 · 34
Thanks to the awesome team at UT TAUR lab! @_grace_kim, @lucy_xyzhao, @thomlake, @Wenxuan_Ding_ , @fangcong_y10593, @prasann_singhal, @ManyaWadhwa1, @ZEYULIU10, @ZayneSprague, @ramya_namuduri, @BodunHu, @juand_r_nlp , @PuyuanPeng, @gregd_nlp
0 · 1 · 4
Read the full paper: ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models https://t.co/88PuGbZKYc 🏅Leaderboard: https://t.co/utp0ghM2UO 🤗 Dataset: https://t.co/Sg9lwuSXSc Code:
github.com
[NeurIPS 2025] ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models - Liyan06/ChartMuseum
1 · 2 · 5
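For readers who want to try the benchmark linked above, here is a sketch of loading it with 🤗 datasets. The dataset ID and split name are assumptions (the links in the thread are shortened); use the ID shown on the ChartMuseum Hugging Face page.

```python
# Dataset ID and split are assumptions; check the ChartMuseum page on the HF Hub.
from datasets import load_dataset

ds = load_dataset("lytang/ChartMuseum", split="test")
example = ds[0]
print(example.keys())   # expect fields such as the chart image, the question, and the answer
```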
❌ Extended thinking in CoTs yields minimal improvement in chart understanding. ⁉️ Why? Fundamental limitations in the models' visual reasoning capabilities. We identify 4 key shortcomings below and find that models sometimes cannot find the right strategy for visual questions.
1 · 0 · 4
Existing chart QA benchmarks have limitations: ❌ Limited real-world chart sources ❌ Questions created with an LLM in the loop ❌ Saturated/similar model performance. ChartMuseum: ✅ 184 chart sources ✅ Entirely human-written questions ✅ Clear distinctions in model performance
1 · 0 · 6
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, but Gemini-2.5-Pro only 63% and Qwen2.5-72B only 38%
2 · 34 · 78
🆕paper: LLMs Get Lost in Multi-Turn Conversation. In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations: less lab-like, more like real use. We find that LLMs get lost in conversation. 👀What does that mean? 🧵1/N 📄 https://t.co/xt2EfGRh7e
4 · 39 · 130
🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation of full codebases 🛠️ 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]
3 · 21 · 68
New work led by @LiyanTang4 with a strong new model for chart understanding! Check out the blog post, model, and playground! Very fun to play around with Bespoke-MiniChart-7B and see what a 7B VLM can do!
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a good challenging problem for VLMs, since it requires
1 · 10 · 32
Check out my work at @bespokelabsai! We release Bespoke-MiniChart-7B, a new SOTA in chart understanding for its size. Chart understanding is really fun and challenging, and requires reasoning skills beyond math reasoning. It's a great starting point for open chart model development!
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a good challenging problem for VLMs, since it requires
0 · 11 · 31
Check out Manya's work on evaluation for open-ended tasks! The criteria from EvalAgent can be plugged into LLM-as-a-judge or used for refinement. Great tool with a ton of potential, and there's LOTS to do here for making LLMs better at writing!
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
1 · 5 · 52
Check out Ramya et al.'s work on understanding discourse similarities in LLM-generated text! We see this as an important step in quantifying the "sameyness" of LLM text, which we think will be a step towards fixing it!
Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀? ✨We introduce QUDsim, to quantify discourse similarities beyond lexical, syntactic, and content overlap.
0 · 4 · 24
OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents. Today, we show how to do the same with your own agents, using RL and open-source models. We used GRPO on only 100 high-quality questions from the BFCL benchmark, and post-trained a 7B Qwen model to
21 · 52 · 380
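For context on the method named in the tweet above: the core of GRPO is replacing a learned value function with group-relative advantages, i.e. sampling several rollouts per question, scoring each with a reward (e.g. whether the tool call succeeded), and normalizing the rewards within the group. A minimal sketch of that advantage computation follows; the full post-training pipeline also involves rollout generation, tool execution, and a PPO-style clipped policy loss.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for GRPO.
    rewards: (num_questions, group_size) scalar rewards, one row per question."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 questions, 4 sampled tool-use rollouts each, reward 1.0 if the call was correct.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```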