Joachim Baumann @joabaum X Profile

Joachim Baumann

@joabaum

Followers: 220 · Following: 82 · Media: 3 · Statuses: 36

Postdoc @MilaNLProc / Incoming Postdoc @StanfordNLP @StanfordAILab / Prev: @UZH_en @MPI_IS @CarnegieMellon. CompSocSci, LLMs, algorithmic fairness.

https://t.co/s1hsoTGRud

Zurich, Switzerland

Joined February 2021

Joachim Baumann

@joabaum

2 months

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**. Paper: https://t.co/24Fyb4Ik3v

Manoel

@manoelribeiro

9 days

The debate over “LLMs as annotators” feels familiar: excitement, backlash, and anxiety about bad science. My take in a new blogpost is that LLMs don’t break measurement; they expose how fragile it already was. https://t.co/6CweDPv5wG

Joachim Baumann

@joabaum

13 days

Cool paper by @ey_985, confirming our LLM hacking findings (https://t.co/24Fyb4IRT3):
✓ LLMs are brittle data annotators
✓ Downstream conclusions often flip: *LLM hacking risk* is real!
✓ Bias correction methods can help but have tradeoffs
✓ Use human experts whenever possible

Eddie Yang

@ey_985

14 days

New paper: LLMs are increasingly used to label data in political science. But how reliable are these annotations, and what are the consequences for scientific findings? What are best practices? Some new findings from a large empirical evaluation. Paper: https://t.co/F8FlrsLbzM

MilaNLP

@MilaNLProc

17 days

We’re delighted to welcome @enfleisig to our @MilaNLProc lab as a visiting PhD student! ✨

Sayash Kapoor

@sayashk

19 days

📣New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9

fly51fly

@fly51fly

2 months

[CL] Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation J Baumann, P Röttger, A Urman, A Wendsjö... [Bocconi University & University of Zurich] (2025) https://t.co/ynve6AW3DH

Joachim Baumann

@joabaum

2 months

Thank you to the amazing @paul_rottger @AUrman21 @awendsjo @florplaza22, Johannes B. Gruber, and Dirk Hovy for this fun collaboration!

Joachim Baumann

@joabaum

2 months

Why this matters: LLM hacking affects any field using AI for data analysis, not just computational social science! Please check out our preprint; we'd be happy to receive your feedback! #LLMHacking #Research #Reproducibility #DataAnnotation #nlp #Statistics #LLM

Joachim Baumann

@joabaum

2 months

The good news: we present several solutions.
✅ Larger, more capable models are safer (but no guarantee)
✅ A few human annotations beat many AI annotations
✅ Testing several models and configurations on held-out data helps
✅ Pre-registering AI choices can prevent cherry-picking

Joachim Baumann

@joabaum

2 months

This also concerns well-intentioned researchers!
- Researchers using SOTA LLMs like GPT-4o face a 31-50% chance of false conclusions for plausible hypotheses
- Risk peaks near significance thresholds (p=0.05)
- Regression correction methods trade off Type I vs. Type II errors

Joachim Baumann

@joabaum

2 months

We tested 18 LLMs on 37 social science annotation tasks (13M labels, 1.4M regressions). By trying different models and prompts, you can make 94% of null results appear statistically significant, or flip findings completely 68% of the time.

Joachim Baumann

@joabaum

3 months

@MilaNLProc @AUrman21 @RERobertson @ancsaaa3 @tiancheng_hu The @MilaNLProc group is presenting 13 more papers at this year's #ACL2025, go check them out :) https://t.co/t65ATxE9Rr

MilaNLP

@MilaNLProc

4 months

🎉 The @MilaNLProc lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀

Joachim Baumann

@joabaum

3 months

@MilaNLProc @AUrman21 @RERobertson @ancsaaa3 Shoutout to @tiancheng_hu for yesterday's stellar presentation of our work benchmarking LLMs' ability to simulate group-level human behavior:

Tiancheng Hu

@tiancheng_hu

3 months

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors, SRW Oral, Monday, July 28, 14:00-15:30

Joachim Baumann

@joabaum

3 months

@MilaNLProc I'm at #ACL2025 this week: 📍 Find me at the FEVER workshop, *Thursday 11am* 📝 presenting "I Just Can't RAG Enough", our ongoing work with @AUrman21 @RERobertson @ancsaaa3, showing that RAG does not solve LLM fact-checking limitations!

Joachim Baumann

@joabaum

3 months

Breaking my social media silence because this news is too good not to share! 🎉 Just joined @MilaNLProc as a Postdoc, working on large language models and computational social science!