Ni Jingwei

@NJingwei

Followers: 87 · Following: 18 · Media: 7 · Statuses: 38

Doctoral Researcher @ ETH Zurich, Research Interests: NLP for Social Good, Claim Detection and Verification, Causal NLP

Zurich, Switzerland
Joined December 2020
@NJingwei
Ni Jingwei
2 months
RT @YinyaHuang: 🤖⚛️ Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀 🖼️ Covering 2,000 vision-text mul…
0 replies · 16 retweets · 0 likes
@NJingwei
Ni Jingwei
2 months
👏 Big applause to my co-authors! @yu_fan_768, Jakob, Etienne, Yang, Yoan, @YinyaHuang, @akhtarmubashara, Florian, Oliver, Daniel, @LeippoldMarkus, @mrinmayasachan, @Stremitzer_Lab, Christoph Engel, @ellliottt, and @joelniklaus!
0 replies · 0 retweets · 3 likes
@NJingwei
Ni Jingwei
2 months
🚀 Huge thanks to the community!
📈 LEXam is now the #1 trending evaluation dataset on Hugging Face! Check it out:
🧠 Built for deep legal reasoning across 340 law exams.
👩‍⚖️ Expert LLM judge evaluations, long-form + MCQs.
#LegalNLP #Benchmark #LLM
@NJingwei
Ni Jingwei
2 months
5⃣ Future-proof: Contamination & Extendability
MCQs are permutable and extensible, making data contamination easy to check and new challenges easy to add. The benchmark stays tough as models evolve! 🔒🔁
🔗 Check out the paper! [6/6]🧵
0 replies · 0 retweets · 1 like
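The permutation idea in the tweet above lends itself to a simple check. A minimal sketch, with hypothetical helper names rather than the actual LEXam tooling: shuffle each MCQ's options, re-map the gold index, and compare accuracy on original versus permuted items. A model that memorized answer letters instead of content will drop on the permuted copies.

```python
import random

def permute_mcq(stem, options, gold_idx, seed=0):
    """Shuffle the answer options of one MCQ and re-map the gold index."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    return stem, [options[i] for i in order], order.index(gold_idx)

def contamination_gap(accuracy_fn, mcqs, seeds=(1, 2, 3)):
    """Original accuracy minus mean accuracy over permuted copies.

    `accuracy_fn` is a user-supplied callable (hypothetical) that scores a
    model on a list of (stem, options, gold_idx) items. A large positive
    gap hints that the items or their answer keys leaked into training.
    """
    base = accuracy_fn(mcqs)
    permuted = [
        accuracy_fn([permute_mcq(*item, seed=s) for item in mcqs])
        for s in seeds
    ]
    return base - sum(permuted) / len(permuted)
```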
@NJingwei
Ni Jingwei
2 months
4⃣ Dig Into Weaknesses with Rich Metadata
Every question includes detailed metadata (course, topic, reasoning steps), so you can pinpoint where your model struggles, be it contract law, criminal law, or multi-hop reasoning. 🔎📊
[5/6]🧵
1 reply · 0 retweets · 1 like
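Per-field aggregation of the kind the tweet above describes is a few lines once results carry metadata. A rough sketch, assuming a simple per-item result dict; the real LEXam schema may differ:

```python
from collections import defaultdict

def accuracy_by(results, field):
    """Mean correctness grouped by a metadata field.

    `results` is assumed to be a list of dicts such as
    {"course": "Contract Law", "topic": ..., "correct": True};
    the field names are illustrative.
    """
    buckets = defaultdict(list)
    for r in results:
        buckets[r[field]].append(r["correct"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

# Usage: accuracy_by(results, "course") or accuracy_by(results, "topic")
```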
@NJingwei
Ni Jingwei
2 months
3⃣ LLM-as-Judge: Reliable, Scalable Evaluation
Expert-level LLMs (with human spot-checks) provide consistent grading, making large-scale evaluation finally practical for legal QA. 🏛️🤖
[4/6]🧵
1 reply · 0 retweets · 1 like
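One common way to implement such a judge, sketched below. The prompt, rubric, and `call_llm` function are assumptions for illustration, not LEXam's actual setup:

```python
import re

JUDGE_PROMPT = """You are a law professor grading an exam answer.
Question: {question}
Reference solution: {reference}
Student answer: {answer}
Grade the answer from 0 to 10 and reply exactly as 'SCORE: <number>'."""

def judge_answer(call_llm, question, reference, answer):
    """Score one long-form answer with an LLM judge.

    `call_llm` is a user-supplied str -> str function (hypothetical).
    Returns the parsed score, or None if the reply broke the format;
    such items are natural candidates for the human spot-checks above.
    """
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)", reply)
    return float(match.group(1)) if match else None
```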
@NJingwei
Ni Jingwei
2 months
2⃣ Beyond Short Answers: Process Matters
LEXam requires LLMs to reason step by step, not just spit out final answers. Tasks include long-form answers and multi-stage MCQs that test real legal reasoning and chain-of-thought. 🧠
[3/6]🧵
1 reply · 0 retweets · 1 like
@NJingwei
Ni Jingwei
2 months
1⃣ Diverse & Realistic Law Exams
LEXam draws from 340 real university law exams across 116 courses, covering everything from constitutional law to business law, in both English and German. A real academic challenge, not cherry-picked samples! 📖
[2/6]🧵
1 reply · 0 retweets · 1 like
@NJingwei
Ni Jingwei
2 months
🚨 Time to push LLMs further!
📚 LEXam: the legal reasoning benchmark you've been waiting for:
• 340 exams, 116 courses (EN/DE)
• Long-form, process-focused questions
• Expert-level LLM Judges
• Rich metadata for targeted diagnostics
• Contamination-proof, extendable MCQs
[1/6]🧵
1 reply · 7 retweets · 17 likes
@NJingwei
Ni Jingwei
6 months
[4/4] 🧵
Do these challenges look hard to solve? Try DIRAS! A method leveraging open-source LLMs to efficiently generate high-quality annotated datasets for evaluating information retrieval performance across RAG application domains. Link:
0 replies · 0 retweets · 0 likes
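In spirit, DIRAS-style annotation could look like the sketch below: hand an open-source LLM the task-specific relevance definition and ask for a graded label per (query, document) pair. The prompt, the 0/1/2 scale, and `call_llm` are hypothetical, not the method's actual design.

```python
RELEVANCE_PROMPT = """Relevance definition for this task:
{definition}

Query: {query}
Document: {document}
Label the document as 2 (relevant), 1 (partially relevant), or 0
(irrelevant) under the definition above. Reply with the number only."""

def annotate_relevance(call_llm, definition, query, document):
    """One graded relevance annotation from an open-source LLM.

    `call_llm` is a user-supplied str -> str function (hypothetical).
    Running this over a corpus yields labeled data for benchmarking a
    RAG retriever in a new domain.
    """
    reply = call_llm(RELEVANCE_PROMPT.format(
        definition=definition, query=query, document=document))
    digits = [c for c in reply if c in "012"]
    return int(digits[0]) if digits else None
```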
@NJingwei
Ni Jingwei
6 months
[3/4] 🧵
3. How to dynamically determine the Top-K to maximize information coverage?
Different queries require varying amounts of retrieved information. Determining Top-K dynamically to comprehensively supply information to LLMs remains a challenging task.
1 reply · 0 retweets · 0 likes
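One simple heuristic for a per-query cutoff, as a sketch only (an assumption, not the paper's method): keep adding ranked chunks until they cover most of the total relevance mass, and stop early once scores fall below a floor.

```python
def dynamic_top_k(ranked, score_floor=0.5, coverage=0.9):
    """Choose a per-query cutoff instead of a fixed Top-K.

    `ranked` is assumed to be (chunk, score) pairs sorted by a calibrated
    relevance score in [0, 1]. Queries with one dominant chunk stop early;
    queries needing evidence from many chunks keep more of them.
    """
    total = sum(score for _, score in ranked)
    if total == 0:
        return []
    kept, mass = [], 0.0
    for chunk, score in ranked:
        if score < score_floor:
            break
        kept.append(chunk)
        mass += score
        if mass / total >= coverage:
            break
    return kept
```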
@NJingwei
Ni Jingwei
6 months
[2/4] 🧵
2. How to scientifically evaluate partial relevance?
The traditional binary approach (relevant/irrelevant) fails to capture the complex relationships between pieces of information, leaving partial relevance underexplored.
1 reply · 0 retweets · 0 likes
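Graded gains are the textbook way to score partial relevance: nDCG with labels like 0/1/2 rewards a ranking for surfacing partially relevant chunks that a binary metric would count as plain misses. A minimal version:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over graded labels (e.g. 0, 1, 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize by the ideal (sorted) ranking; 1.0 means a perfect order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# A ranking that buries a partially relevant chunk scores lower:
# ndcg([2, 1, 0]) == 1.0, while ndcg([2, 0, 1]) < 1.0
```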
@NJingwei
Ni Jingwei
6 months
🥳 Newly accepted at NAACL 2025! [1/4] 🧵
In current information retrieval evaluation for RAG, three VITAL points are often missing!
1. How to define relevance? Relevance definitions may vary across tasks and domains.
1 reply · 0 retweets · 0 likes
@NJingwei
Ni Jingwei
7 months
RT @Climate_NLP: 📢📢 Call for Papers for the 2nd ClimateNLP Workshop at ACL 2025 📢📢 🚀 We invite short and long papers at the intersection…
0 replies · 4 retweets · 0 likes
@NJingwei
Ni Jingwei
7 months
RT @Climate_NLP: 📢 Excited to announce that we're going into the second iteration of ClimateNLP at #ACL2025 in Vienna! 🚀 We again invite sh…
0 replies · 3 retweets · 0 likes
@NJingwei
Ni Jingwei
11 months
RT @Climate_NLP: 📢📢 We are excited for the first ClimateNLP workshop taking place tomorrow (August 16th)! 📍 Location: Lotus 9, Bangkok 🗓️ s…
0 replies · 6 retweets · 0 likes
@NJingwei
Ni Jingwei
1 year
We show that it is possible to use purely synthetic data to achieve better attributability on a wide range of benchmarks, from in-distribution (ID) to out-of-distribution (OOD). 🔑 The key is to stick to high-quality data, and a data-quality filter makes such data easy to obtain. 😎 Paper @ 🧵[2/2]
0 replies · 0 retweets · 0 likes
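A data-quality filter in this setting can be as simple as scoring each synthetic QA pair and keeping only the top slice. The sketch below assumes a user-supplied `quality_score`, for instance an LLM check that the cited context really supports the answer; this is an illustration, not the paper's exact pipeline.

```python
def filter_synthetic(examples, quality_score, threshold=0.8):
    """Keep only synthetic QA examples whose quality clears a bar.

    `quality_score` (hypothetical) maps an example to [0, 1], e.g. by
    checking that the answer is fully supported by the passages it
    cites. Training on the surviving slice trades quantity for quality.
    """
    return [ex for ex in examples if quality_score(ex) >= threshold]
```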
@NJingwei
Ni Jingwei
1 year
Another of our major 2023 works, "Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering", was also accepted to the #acl2024 main conference. 🎉
🌟 It proposes a novel paradigm for LLM QA: always cite in-context information when answering (always be attributable). 🧵[1/2]
1 reply · 1 retweet · 2 likes
@NJingwei
Ni Jingwei
1 year
💡 AFaCTA-annotated data is also good for training small classifiers.
📒 A high-quality dataset covering a wide range of political topics is also available. Check out the paper @ 🧵[3/3]
0 replies · 0 retweets · 0 likes
@NJingwei
Ni Jingwei
1 year
🔎 We propose a factual claim definition based on verifiability, addressing definition discrepancies in prior work.
🤖 Then we use LLMs + self-consistency calibration to annotate factual claims, achieving better-than-expert performance. 🧵[2/3]
1 reply · 0 retweets · 0 likes
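Self-consistency calibration can be sketched in a few lines: sample several independent LLM annotations per sentence, keep the majority label, and treat the agreement ratio as confidence. The names and the binary label set below are illustrative, not AFaCTA's exact protocol.

```python
from collections import Counter

def self_consistent_label(sample_annotation, n=5):
    """Majority label over n sampled LLM annotations, with confidence.

    `sample_annotation` is a user-supplied zero-argument function
    (hypothetical) returning one label, e.g. "claim" or "not_claim".
    Low-agreement items can be routed to human annotators instead of
    being trusted automatically.
    """
    votes = Counter(sample_annotation() for _ in range(n))
    label, count = votes.most_common(1)[0]
    return label, count / n
```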