Ni Jingwei

@NJingwei

Followers: 87 · Following: 18 · Media: 7 · Statuses: 38

Doctoral Researcher @ ETH Zurich, Research Interests: NLP for Social Good, Claim Detection and Verification, Causal NLP

Zurich, Switzerland
Joined December 2020
@NJingwei
Ni Jingwei
2 months
RT @YinyaHuang: 🤖⚛️ Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀 🖼️ Covering 2,000 vision-text mul…
0 replies · 16 retweets · 0 likes
@NJingwei
Ni Jingwei
2 months
👏 Big applause to my co-authors! @yu_fan_768, Jakob, Etienne, Yang, Yoan, @YinyaHuang, @akhtarmubashara, Florian, Oliver, Daniel, @LeippoldMarkus, @mrinmayasachan, @Stremitzer_Lab, Christoph Engel, @ellliottt, and @joelniklaus!
0 replies · 0 retweets · 3 likes
@NJingwei
Ni Jingwei
2 months
🚀 Huge thanks to the community!
📈 LEXam is now the #1 trending evaluation dataset on Hugging Face! Check it out:
🧠 Built for deep legal reasoning across 340 law exams.
👩‍⚖️ Expert LLM judge evaluations, long-form + MCQs.
#LegalNLP #Benchmark #LLM
@NJingwei
Ni Jingwei
2 months
5⃣ Future-proof: Contamination & Extendability
MCQs are permutable and extensible, making data contamination easy to check and new challenges easy to add. The benchmark stays tough as models evolve! 🔒🔁
🔗 Check out the paper! [6/6]🧵
0 replies · 0 retweets · 1 like
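The permutation idea in the tweet above lends itself to a simple check. A minimal sketch, with hypothetical helper names rather than the actual LEXam tooling: shuffle each MCQ's options, re-map the gold index, and compare accuracy on original versus permuted items. A model that memorized answer letters instead of content will drop on the permuted copies.

```python
import random

def permute_mcq(stem, options, gold_idx, seed=0):
    """Shuffle the answer options of one MCQ and re-map the gold index."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    return stem, [options[i] for i in order], order.index(gold_idx)

def contamination_gap(accuracy_fn, mcqs, seeds=(1, 2, 3)):
    """Original accuracy minus mean accuracy over permuted copies.

    `accuracy_fn` is a user-supplied callable (hypothetical) that scores a
    model on a list of (stem, options, gold_idx) items. A large positive
    gap hints that the items or their answer keys leaked into training.
    """
    base = accuracy_fn(mcqs)
    permuted = [
        accuracy_fn([permute_mcq(*item, seed=s) for item in mcqs])
        for s in seeds
    ]
    return base - sum(permuted) / len(permuted)
```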
@NJingwei
Ni Jingwei
2 months
4⃣ Dig Into Weaknesses with Rich Metadata
Every question includes detailed metadata (course, topic, reasoning steps), so you can pinpoint where your model struggles, be it contract law, criminal law, or multi-hop reasoning. 🔎📊
[5/6]🧵
1 reply · 0 retweets · 1 like
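Per-field aggregation of the kind the tweet above describes is a few lines once results carry metadata. A rough sketch, assuming a simple per-item result dict; the real LEXam schema may differ:

```python
from collections import defaultdict

def accuracy_by(results, field):
    """Mean correctness grouped by a metadata field.

    `results` is assumed to be a list of dicts such as
    {"course": "Contract Law", "topic": ..., "correct": True};
    the field names are illustrative.
    """
    buckets = defaultdict(list)
    for r in results:
        buckets[r[field]].append(r["correct"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

# Usage: accuracy_by(results, "course") or accuracy_by(results, "topic")
```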
@NJingwei
Ni Jingwei
2 months
3⃣ LLM-as-Judge: Reliable, Scalable Evaluation
Expert-level LLMs (with human spot-checks) provide consistent grading, making large-scale evaluation finally practical for legal QA. 🏛️🤖
[4/6]🧵
1 reply · 0 retweets · 1 like
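One common way to implement such a judge, sketched below. The prompt, rubric, and `call_llm` function are assumptions for illustration, not LEXam's actual setup:

```python
import re

JUDGE_PROMPT = """You are a law professor grading an exam answer.
Question: {question}
Reference solution: {reference}
Student answer: {answer}
Grade the answer from 0 to 10 and reply exactly as 'SCORE: <number>'."""

def judge_answer(call_llm, question, reference, answer):
    """Score one long-form answer with an LLM judge.

    `call_llm` is a user-supplied str -> str function (hypothetical).
    Returns the parsed score, or None if the reply broke the format;
    such items are natural candidates for the human spot-checks above.
    """
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)", reply)
    return float(match.group(1)) if match else None
```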
@NJingwei
Ni Jingwei
2 months
2⃣ Beyond Short Answers: Process Matters
LEXam requires LLMs to reason step by step, not just spit out final answers. Tasks include long-form answers and multi-stage MCQs that test real legal reasoning and chain-of-thought. 🧠
[3/6]🧵
1 reply · 0 retweets · 1 like
@NJingwei
Ni Jingwei
2 months
1⃣ Diverse & Realistic Law Exams
LEXam draws from 340 real university law exams across 116 courses, covering everything from constitutional law to business law, in both English and German. A real academic challenge, not cherry-picked samples! 📖
[2/6]🧵
1 reply · 0 retweets · 1 like
@NJingwei
Ni Jingwei
2 months
🚨 Time to push LLMs further!
📚 LEXam: the legal reasoning benchmark you've been waiting for:
• 340 exams, 116 courses (EN/DE)
• Long-form, process-focused questions
• Expert-level LLM Judges
• Rich metadata for targeted diagnostics
• Contamination-proof, extendable MCQs
[1/6]🧵
1 reply · 7 retweets · 17 likes
@NJingwei
Ni Jingwei
6 months
[4/4] 🧵
Do these challenges look hard to solve? Try DIRAS! A method leveraging open-source LLMs to efficiently generate high-quality annotated datasets for evaluating information retrieval performance across RAG application domains. Link:
0 replies · 0 retweets · 0 likes
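In spirit, DIRAS-style annotation could look like the sketch below: hand an open-source LLM the task-specific relevance definition and ask for a graded label per (query, document) pair. The prompt, the 0/1/2 scale, and `call_llm` are hypothetical, not the method's actual design.

```python
RELEVANCE_PROMPT = """Relevance definition for this task:
{definition}

Query: {query}
Document: {document}
Label the document as 2 (relevant), 1 (partially relevant), or 0
(irrelevant) under the definition above. Reply with the number only."""

def annotate_relevance(call_llm, definition, query, document):
    """One graded relevance annotation from an open-source LLM.

    `call_llm` is a user-supplied str -> str function (hypothetical).
    Running this over a corpus yields labeled data for benchmarking a
    RAG retriever in a new domain.
    """
    reply = call_llm(RELEVANCE_PROMPT.format(
        definition=definition, query=query, document=document))
    digits = [c for c in reply if c in "012"]
    return int(digits[0]) if digits else None
```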
@NJingwei
Ni Jingwei
6 months
[3/4] 🧵
3. How to dynamically determine the Top-K to maximize information coverage?
Different queries require varying amounts of retrieved information. Determining Top-K dynamically to comprehensively supply information to LLMs remains a challenging task.
1 reply · 0 retweets · 0 likes
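One simple heuristic for a per-query cutoff, as a sketch only (an assumption, not the paper's method): keep adding ranked chunks until they cover most of the total relevance mass, and stop early once scores fall below a floor.

```python
def dynamic_top_k(ranked, score_floor=0.5, coverage=0.9):
    """Choose a per-query cutoff instead of a fixed Top-K.

    `ranked` is assumed to be (chunk, score) pairs sorted by a calibrated
    relevance score in [0, 1]. Queries with one dominant chunk stop early;
    queries needing evidence from many chunks keep more of them.
    """
    total = sum(score for _, score in ranked)
    if total == 0:
        return []
    kept, mass = [], 0.0
    for chunk, score in ranked:
        if score < score_floor:
            break
        kept.append(chunk)
        mass += score
        if mass / total >= coverage:
            break
    return kept
```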
@NJingwei
Ni Jingwei
6 months
[2/4] 🧵
2. How to scientifically evaluate partial relevance?
The traditional binary approach (relevant/irrelevant) fails to capture the complex relationships between pieces of information, leaving partial relevance underexplored.
1 reply · 0 retweets · 0 likes
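Graded gains are the textbook way to score partial relevance: nDCG with labels like 0/1/2 rewards a ranking for surfacing partially relevant chunks that a binary metric would count as plain misses. A minimal version:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over graded labels (e.g. 0, 1, 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize by the ideal (sorted) ranking; 1.0 means a perfect order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# A ranking that buries a partially relevant chunk scores lower:
# ndcg([2, 1, 0]) == 1.0, while ndcg([2, 0, 1]) < 1.0
```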
@NJingwei
Ni Jingwei
6 months
🥳 Newly accepted at NAACL 2025! [1/4] 🧵
In current information retrieval evaluation for RAG, three VITAL points are often missing!
1. How to define relevance? Relevance definitions may vary across tasks and domains.
1 reply · 0 retweets · 0 likes
@NJingwei
Ni Jingwei
7 months
RT @Climate_NLP: 📢📢 Call for Papers for the 2nd ClimateNLP Workshop at ACL 2025 📢📢 🚀 We invite short and long papers at the intersection…
0 replies · 4 retweets · 0 likes
@NJingwei
Ni Jingwei
7 months
RT @Climate_NLP: 📢 Excited to announce that we're going into the second iteration of ClimateNLP at #ACL2025 in Vienna! 🚀 We again invite sh…
0 replies · 3 retweets · 0 likes
@NJingwei
Ni Jingwei
11 months
RT @Climate_NLP: 📢📢 We are excited for the first ClimateNLP workshop taking place tomorrow (August 16th)! 📍 Location: Lotus 9, Bangkok 🗓️ s…
0 replies · 6 retweets · 0 likes
@NJingwei
Ni Jingwei
1 year
We show that it is possible to use purely synthetic data to achieve better attributability on a wide range of benchmarks, from in-distribution (ID) to out-of-distribution (OOD). 🔑 The key is to stick to high-quality data, and a data-quality filter makes such data easy to obtain. 😎 Paper @ 🧵[2/2]
0 replies · 0 retweets · 0 likes
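A data-quality filter in this setting can be as simple as scoring each synthetic QA pair and keeping only the top slice. The sketch below assumes a user-supplied `quality_score`, for instance an LLM check that the cited context really supports the answer; this is an illustration, not the paper's exact pipeline.

```python
def filter_synthetic(examples, quality_score, threshold=0.8):
    """Keep only synthetic QA examples whose quality clears a bar.

    `quality_score` (hypothetical) maps an example to [0, 1], e.g. by
    checking that the answer is fully supported by the passages it
    cites. Training on the surviving slice trades quantity for quality.
    """
    return [ex for ex in examples if quality_score(ex) >= threshold]
```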
@NJingwei
Ni Jingwei
1 year
Another of our major 2023 works, "Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering", was also accepted to the #acl2024 main conference. 🎉
🌟 It proposes a novel paradigm for LLM QA: always cite in-context information when answering (always be attributable). 🧵[1/2]
1 reply · 1 retweet · 2 likes
@NJingwei
Ni Jingwei
1 year
💡 AFaCTA-annotated data is also good for training small classifiers.
📒 A high-quality dataset covering a wide range of political topics is also available. Check out the paper @ 🧵[3/3]
0 replies · 0 retweets · 0 likes
@NJingwei
Ni Jingwei
1 year
🔎 We propose a factual claim definition based on verifiability, addressing definition discrepancies in prior work.
🤖 Then we use LLMs + self-consistency calibration to annotate factual claims, achieving better-than-expert performance. 🧵[2/3]
1 reply · 0 retweets · 0 likes
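Self-consistency calibration can be sketched in a few lines: sample several independent LLM annotations per sentence, keep the majority label, and treat the agreement ratio as confidence. The names and the binary label set below are illustrative, not AFaCTA's exact protocol.

```python
from collections import Counter

def self_consistent_label(sample_annotation, n=5):
    """Majority label over n sampled LLM annotations, with confidence.

    `sample_annotation` is a user-supplied zero-argument function
    (hypothetical) returning one label, e.g. "claim" or "not_claim".
    Low-agreement items can be routed to human annotators instead of
    being trusted automatically.
    """
    votes = Counter(sample_annotation() for _ in range(n))
    label, count = votes.most_common(1)[0]
    return label, count / n
```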