Liyan Tang Profile
Liyan Tang

@LiyanTang4

Followers: 211 · Following: 156 · Media: 18 · Statuses: 169

Fourth-year PhD @UTAustin || NLP || MiniCheck || Intern @GoogleDeepMind; Prev Intern @bespokelabsai, @AmazonScience

Austin, TX, US
Joined February 2022
@LiyanTang4
Liyan Tang
1 year
🔎📄 New model & benchmark to check LLMs’ output against documents (e.g., to fact-check RAG). 🕵️ MiniCheck: a model with GPT-4-level accuracy at 400x lower cost. 📚 LLM-AggreFact: a collection of 10 human-labeled datasets of errors in model outputs. w/ @PhilippeLaban, @gregd_nlp 🧵
@LiyanTang4
Liyan Tang
21 days
RT @ZEYULIU10: LLMs trained to memorize new facts can’t use those facts well. 🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact…
@LiyanTang4
Liyan Tang
25 days
RT @xiye_nlp: 🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for…
@LiyanTang4
Liyan Tang
1 month
RT @fangcong_y10593: Solving complex problems with CoT requires combining different skills. We can do this by: 🧩 Modify the CoT data format…
@LiyanTang4
Liyan Tang
1 month
RT @PuyuanPeng: The paper is out!
@LiyanTang4
Liyan Tang
2 months
RT @gregd_nlp: Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Charts questions take us beyond cu…
@LiyanTang4
Liyan Tang
2 months
Read the full paper: ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
🏅 Leaderboard:
🤗 Dataset:
Code:
@LiyanTang4
Liyan Tang
2 months
❌ Extended thinking in CoTs yields minimal improvement in chart understanding. ⁉️ Why? Models have fundamental limitations in their visual reasoning capabilities. We identify 4 key shortcomings below and find that models sometimes cannot find the right strategy for visual questions.
@LiyanTang4
Liyan Tang
2 months
Existing chart QA benchmarks have limitations:
❌ Limited real-world chart sources
❌ Questions are created with an LLM in the loop
❌ Saturated/similar model performance
ChartMuseum:
✅ 184 chart sources
✅ Entirely human-written questions
✅ Clear distinctions in model performance
@LiyanTang4
Liyan Tang
2 months
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!
✍🏻 Entirely human-written questions by 13 CS researchers
👀 Emphasis on visual reasoning that is hard to verbalize via text CoTs
📉 Humans reach 93%, vs. 63% for Gemini-2.5-Pro and 38% for Qwen2.5-72B
@LiyanTang4
Liyan Tang
2 months
RT @PhilippeLaban: 🆕 paper: LLMs Get Lost in Multi-Turn Conversation. In real life, people don’t speak in perfect prompts. So we simulate mu…
@LiyanTang4
Liyan Tang
2 months
RT @AnirudhKhatry: 🚀 Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repo…
@LiyanTang4
Liyan Tang
2 months
RT @gregd_nlp: New work led by @LiyanTang4 with a strong new model for chart understanding! Check out the blog post, model, and playground!…
@LiyanTang4
Liyan Tang
2 months
Check out my work at @bespokelabsai! We released Bespoke-MiniChart-7B, a new SOTA in chart understanding for its size. Chart understanding is fun and challenging, and it requires reasoning skills beyond math reasoning. It's a great starting point for open chart model development!
@bespokelabsai
Bespoke Labs
2 months
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a good challenging problem for VLMs, since it requires
@LiyanTang4
Liyan Tang
3 months
RT @gregd_nlp: Check out Manya's work on evaluation for open-ended tasks! The criteria from EvalAgent can be plugged into LLM-as-a-judge or…
@LiyanTang4
Liyan Tang
3 months
RT @gregd_nlp: Check out Ramya et al.'s work on understanding discourse similarities in LLM-generated text! We see this as an important ste…
@LiyanTang4
Liyan Tang
3 months
RT @bespokelabsai: OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents. Today, we show how to do the same with yo…
@LiyanTang4
Liyan Tang
3 months
RT @bespokelabsai: Announcing Reasoning Datasets Competition📢 in collaboration with @huggingface and @togethercompute. Since the launch of D…
@LiyanTang4
Liyan Tang
6 months
RT @madiator: Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. The…
@LiyanTang4
Liyan Tang
6 months
RT @madiator: Deepseek has done it again! This time, lots of action packed insights, stuff that the top labs are not willing to share. Som…