Jie Ruan Profile
Jie Ruan

@JieRuan75

Followers: 149
Following: 135
Media: 3
Statuses: 59

PhD student at University of Michigan @UMich | MPhil from Peking University @PKU1898.

Joined November 2023
@JieRuan75
Jie Ruan
2 months
🔍 LLMs now give medical diagnoses, legal advice, and even tackle scientific problems.
❓ Your LLM sounds smart. But what if it’s just good at faking expertise?
🚀 We built ExpertLongBench to find out.
📉 And the results? They revealed several concerns. 👇
🔗
@JieRuan75
Jie Ruan
1 day
RT @liusiyang_641: New RAG-empowered tool for cancer care prep. Helps patients go from uninformed to visit-ready - by guiding knowledge, v…
@JieRuan75
Jie Ruan
1 day
RT @NaihaoDeng: 🚨 Excited to share a set of papers I led or collaborated on that are being presented at #ACL2025 this week! 🧵👇
1. Rethinki…
arxiv.org
Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of...
@JieRuan75
Jie Ruan
8 days
RT @OwainEvans_UK: New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only…
@JieRuan75
Jie Ruan
1 month
RT @zkjzou: 🔥 Excited to introduce ManyICLBench (ACL 2025).
🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar ex…
arxiv.org
Many-shot in-context learning (ICL) has emerged as a unique setup to both utilize and test the ability of large language models to handle long context. This paper delves into long-context language...
@JieRuan75
Jie Ruan
1 month
RT @MKhalifaaaa: 🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 @COLM_conf is approaching! 🚨
https:/…
@JieRuan75
Jie Ruan
2 months
11/ Grateful to collaborate with @InderjeetNair, @ShuyangCao, @amyliiu, @sheza_munir, and @LuWang__, with support from @launchnlp, @michigan_AI, and many amazing domain experts!
@JieRuan75
Jie Ruan
2 months
10/ Your input is invaluable in making ExpertLongBench more representative and impactful across expert domains. Let’s build better evaluations for expert-level AI, together 🔬🧠⚖️
@JieRuan75
Jie Ruan
2 months
9/ 📢 We actively encourage contributions from the research community, including:
- ✅ Proposing new tasks and contributing data
- 🔁 Suggesting improvements to existing ones
- 🧠 Sharing domain-specific insights ⚖️🧪🏥📚
@JieRuan75
Jie Ruan
2 months
8/ ✅ LLMs are great at trivia.
❌ But when it comes to replacing real experts? They’ve got a long road ahead.
🔗 Start here:
Leaderboard:
Paper:
Data:
Code:
github.com
Contribute to launchnlp/ExpertLongBench development by creating an account on GitHub.
@JieRuan75
Jie Ruan
2 months
7/ Worse: models often "cover" the right checklist items… but get them wrong.
✅ High coverage ≠ high quality.
🚨 A model might sound right but still mislead you. That’s risky in law, medicine, and science.
@JieRuan75
Jie Ruan
2 months
6/ So… how did today’s best LLMs do?
🟥 GPT-4o?
🟥 Claude 3?
🟥 Gemini?
🥶 Top score: 26.8 F1.
📉 On task T2 (Legal Statement of Fact Generation), the best model scored just 7.9. Let that sink in.
⚠️ Even the best models barely passed.
@JieRuan75
Jie Ruan
2 months
5/ 💡 Also cool: CLEAR works with open-source models like Qwen2.5-72B. No need to worry about shifting OpenAI APIs: get reproducible results using Qwen.
📈 We found high agreement and strong correlation, making the evaluation scalable, transparent, and reproducible.
@JieRuan75
Jie Ruan
2 months
4/ 🧠 But how do you evaluate complex outputs like legal summaries or ESG reports?
✅ We built CLEAR, a checklist-based evaluation framework grounded in expert-written rubrics.
@JieRuan75
Jie Ruan
2 months
3/ ExpertLongBench spans 11 tasks across 9 domains:
⚖️ Law
🧪 Chemistry
🏥 Healthcare
📚 Education
💰 Finance
🧬 Biology
…and more.
Input: up to 200K tokens.
Output: up to 5,000+ tokens.
⚠️ This isn’t a quiz. This is work.
@JieRuan75
Jie Ruan
2 months
2/ 🤖 Most benchmarks?
✅ Multiple-choice.
✅ Short answers.
But real experts…
✍️ draft legal briefs
🩺 write clinical notes
🧪 explain chemical syntheses
This is work that takes tens of hours. We turned it into benchmark tasks.
@JieRuan75
Jie Ruan
4 months
RT @YunxiangZhang4: 🚨 New Benchmark Drop!
Can LLMs actually do ML research? Not toy problems, not Kaggle tweaks, but real, unsolved ML confe…
@JieRuan75
Jie Ruan
9 months
RT @shi_weiyan: It feels emotional to hear that #EMNLP is going back to China after 10 years🥹🥹🥹 thanks @emnlpmeeting ❤️❤️❤️ .
@JieRuan75
Jie Ruan
9 months
RT @liusiyang_641: "The Invisible Minority" – Older Adults 👵👴
Age bias is often overlooked compared to gender or race, yet by 2030, 1 in 6…
@JieRuan75
Jie Ruan
9 months
RT @FrederickXZhang: Heard of the Alaska-Hawaii merger? 🤔 Wonder if LLMs know it’s pending government approval before it can happen? They stu…