Joachim Baumann Profile
Joachim Baumann

@joabaum

Followers
220
Following
82
Media
3
Statuses
36

Postdoc @MilaNLProc / Incoming Postdoc @StanfordNLP @StanfordAILab / Prev: @UZH_en @MPI_IS @CarnegieMellon. CompSocSci, LLMs, algorithmic fairness.

Zurich, Switzerland
Joined February 2021
@joabaum
Joachim Baumann
2 months
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**. Paper: https://t.co/24Fyb4Ik3v
16
105
513
@manoelribeiro
Manoel
9 days
The debate over “LLMs as annotators” feels familiar: excitement, backlash, and anxiety about bad science. My take in a new blogpost is that LLMs don’t break measurement; they expose how fragile it already was. https://t.co/6CweDPv5wG
1
8
21
@joabaum
Joachim Baumann
13 days
Cool paper by @ey_985, confirming our LLM hacking findings (https://t.co/24Fyb4IRT3): ✓ LLMs are brittle data annotators ✓ Downstream conclusions often flip: *LLM hacking risk* is real! ✓ Bias correction methods can help but have tradeoffs ✓ Use human experts whenever possible
@ey_985
Eddie Yang
14 days
New paper: LLMs are increasingly used to label data in political science. But how reliable are these annotations, and what are the consequences for scientific findings? What are best practices? Some new findings from a large empirical evaluation. Paper: https://t.co/F8FlrsLbzM
1
6
15
@MilaNLProc
MilaNLP
17 days
We’re delighted to welcome @enfleisig to our @MilaNLProc lab as a visiting PhD student! ✨
1
1
22
@sayashk
Sayash Kapoor
19 days
📣New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
20
98
422
@fly51fly
fly51fly
2 months
[CL] Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation J Baumann, P Röttger, A Urman, A Wendsjö... [Bocconi University & University of Zurich] (2025) https://t.co/ynve6AW3DH
1
5
16
@joabaum
Joachim Baumann
2 months
Thank you to the amazing @paul_rottger @AUrman21 @awendsjo @florplaza22, Johannes B. Gruber, and Dirk Hovy for this fun collaboration!
1
1
9
@joabaum
Joachim Baumann
2 months
Why this matters: LLM hacking affects any field using AI for data analysis–not just computational social science! Please check out our preprint; we'd be happy to receive your feedback! #LLMHacking #Research #Reproducibility #DataAnnotation #nlp #Statistics #LLM
1
1
9
@joabaum
Joachim Baumann
2 months
The good news: we present several solutions: ✅ Larger, more capable models are safer (but no guarantee) ✅ Few human annotations beat many AI annotations ✅ Testing several models and configurations on held-out data helps ✅ Pre-registering AI choices can prevent cherry-picking
1
2
8
@joabaum
Joachim Baumann
2 months
This also concerns well-intentioned researchers! - Researchers using SOTA LLMs like GPT-4o face a 31-50% chance of false conclusions for plausible hypotheses - Risk peaks near significance thresholds (p=0.05) - Regression correction methods trade off Type I vs. Type II errors
1
2
13
@joabaum
Joachim Baumann
2 months
We tested 18 LLMs on 37 social science annotation tasks (13M labels, 1.4M regressions). By trying different models and prompts, you can make 94% of null results appear statistically significant–or flip findings completely 68% of the time.
1
4
24
@joabaum
Joachim Baumann
3 months
@MilaNLProc @AUrman21 @RERobertson @ancsaaa3 @tiancheng_hu The @MilaNLProc group is presenting 13 more papers at this year's #ACL2025, go check them out :) https://t.co/t65ATxE9Rr
@MilaNLProc
MilaNLP
4 months
🎉 The @MilaNLProc lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀
0
0
1
@joabaum
Joachim Baumann
3 months
@MilaNLProc @AUrman21 @RERobertson @ancsaaa3 Shoutout to @tiancheng_hu for yesterday's stellar presentation of our work benchmarking LLMs' ability to simulate group-level human behavior:
@tiancheng_hu
Tiancheng Hu
3 months
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors, SRW Oral, Monday, July 28, 14:00-15:30
1
0
1
@joabaum
Joachim Baumann
3 months
@MilaNLProc I'm at #ACL2025 this week:📍Find me at the FEVER workshop, *Thursday 11am* 📝 presenting: "I Just Can't RAG Enough" - our ongoing work with @AUrman21 @RERobertson @ancsaaa3, showing that RAG does not solve LLM fact-checking limitations!
6
0
3
@joabaum
Joachim Baumann
3 months
Breaking my social media silence because this news is too good not to share! 🎉 Just joined @MilaNLProc as a Postdoc, working on large language models and computational social science!
2
1
17