Matthias Orlikowski
@morlikow
Followers: 629 · Following: 3K · Media: 24 · Statuses: 1K
NLProc, Computational Social Science • Human Label Variation, Disagreement, Subjectivity • PhD candidate @unibielefeld • he/him • EN, DE
Bielefeld, Germany
Joined February 2015
I will be at #acl2025 to present "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" ✨ A heartfelt thank you to my collaborators @jiaxin_pei @paul_rottger @pcimiano @david__jurgens and Dirk Hovy. More below
Can AI simulate human behavior? 🧠 The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality? To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
1/ 🧵 New #EMNLP2025 paper!! Toxicity detection is subjective, shaped by norms, identity, & context. Existing models and datasets overlook this nuance. Enter MODELCITIZENS: a new dataset designed to address this. ✔️ 6.8K posts, 40K annotations across diverse groups ✔️
We have put up all slide decks on the tutorial website: https://t.co/xhQfa5wylv 🥳🥳🥳 Although I could only deliver the tutorial remotely due to visa constraints, I was thrilled to learn that it drew a nearly full room for the entire 3.5 hours!
🥳🥳🥳Join us at the tutorial of 𝐆𝐮𝐚𝐫𝐝𝐫𝐚𝐢𝐥𝐬 𝐚𝐧𝐝 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲 𝐟𝐨𝐫 𝐋𝐋𝐌𝐬: 𝐒𝐚𝐟𝐞, 𝐒𝐞𝐜𝐮𝐫𝐞, 𝐚𝐧𝐝 𝐂𝐨𝐧𝐭𝐫𝐨𝐥𝐥𝐚𝐛𝐥𝐞 𝐒𝐭𝐞𝐞𝐫𝐢𝐧𝐠 𝐨𝐟 𝐋𝐋𝐌 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬! Time: 14:00 - 17:30 July 27 Location: Hall CT8
Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025. As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
WHY do you prefer something over another? Reward models treat preference as a black-box😶🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
🗣️ Excited to share our new #ACL2025 Findings paper: “Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks” with @jad_kabbara and @dkroy. arXiv: https://t.co/FeWQLQxt5K Read about our findings ⤵️
arxiv.org
LLM use in annotation is becoming widespread, and given LLMs' overall promising performance and speed, simply "reviewing" LLM annotations in interpretive tasks can be tempting. In subjective...
More detail on the paper in this thread:
Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes! 🧵
I will present on Monday, July 28, during Poster Session 1 in the Human-Centered NLP track. The session runs 11:00-12:30 in Halls 4 and 5. Looking forward to discussing our work! https://t.co/BoLWVGtcs4
arxiv.org
People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the person's sociodemographic characteristics. LLMs have also been used to...
🎉 The @MilaNLProc lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀
📢📢👇 New job openings. Topic: social bias detection + analysis with LLMs across time (1950-now) & languages. There are 2 Post-Doc/PhD positions, supervised by @egere14 (@utn_nuremberg) + Simone Ponzetto (@dwsunima). Fully funded, up to 3 yrs. More info:
Prior work has used LLMs to simulate survey responses, yet their ability to match the distribution of human views remains uncertain. Our new paper [https://t.co/DleesiPbif] introduces a benchmark to evaluate how distributionally aligned LLMs are with human opinions. 🧵
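To make "distributionally aligned" concrete: one simple way to score a model is to compare its distribution of sampled answers against the human response distribution for the same question. The sketch below uses total variation distance; the paper's actual metric and data are not specified here, and the survey numbers are invented for illustration.

```python
from collections import Counter


def to_distribution(answers):
    """Turn a list of categorical answers into a probability distribution."""
    counts = Counter(answers)
    n = len(answers)
    return {option: c / n for option, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions over the
    same answer options: 0 = identical, 1 = completely disjoint."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


# Hypothetical survey question: human responses vs. answers sampled from an LLM.
human = to_distribution(["agree"] * 60 + ["disagree"] * 30 + ["neutral"] * 10)
model = to_distribution(["agree"] * 80 + ["disagree"] * 15 + ["neutral"] * 5)

print(round(total_variation(human, model), 2))  # 0.2
```

A model that matches the human answer shares exactly would score 0; here the LLM over-produces "agree", so the distance is 0.2.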
📣Calling all #CHI2025 attendees who work with human participants: Join our panel discussion on #LLM, #simulation, #syntheticdata, and the future of human subjects research on Apr 30 (Wed), 2:10 - 3:40 PM (JP Time) Post your questions for panelists here: https://t.co/FxbwBA3nW0
There is more detail and additional analysis in the paper! You can read it on arXiv; happy to receive any comments or questions! Preprint:
Our findings underscore that LLMs can’t be expected to be accurate models of individual variation based on sociodemographics. We should not use LLMs to attempt “simulation”, in particular when we do not have access to examples of individual behaviour.
Attributes help most for annotators with unique sociodemographic profiles. Apparently, LLMs learn to use unique combinations as a proxy ID! Learning from individual-level examples provides richer information than knowing sociodemographics.
But why did attributes improve predictions when we tested with known annotators? We started to wonder: Is performance linked to how many annotators our models see for each combination of attributes (sociodemographic profile)? We compare unique and frequent profiles.
Ok, but surely attributes are much more useful when transferring to annotators not seen in training? This setting is rarely tested in NLP, so we built a separate partitioning of DeMo for this evaluation. Turns out no model improves over the baseline!
Sociodemographic prompting and models fine-tuned with only the text content are our baselines. We compare against fine-tuning with annotator attributes or unique annotator identifiers (IDs). Trends are clear: attributes help a bit, but IDs are much more accurate!
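The conditions above differ mainly in what extra context the model sees alongside the text. A minimal sketch of how such inputs could be constructed: the template wording, attribute keys, and `<id_…>` token format below are hypothetical, not the paper's exact prompts.

```python
def build_prompt(text, attributes=None, annotator_id=None):
    """Build a classification input for one annotator's judgment.

    Three conditions, mirroring the comparison in the thread:
    - text only (baseline): just the post
    - attributes: prepend a sociodemographic profile
    - ID: prepend an opaque per-annotator token, which lets a fine-tuned
      model memorise that individual's annotation patterns
    """
    parts = []
    if annotator_id is not None:
        parts.append(f"Annotator: <id_{annotator_id}>")
    if attributes:
        profile = ", ".join(f"{k}: {v}" for k, v in sorted(attributes.items()))
        parts.append(f"Annotator profile: {profile}")
    parts.append(f"Text: {text}")
    parts.append("Is this text offensive? Answer yes or no.")
    return "\n".join(parts)


print(build_prompt("example post", attributes={"age": "30-39", "gender": "woman"}))
```

In the attribute condition the model can only generalise over profiles shared by many annotators, while the ID condition gives it a unique handle per person, which is one way to read why IDs come out more accurate.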
We compare models on DeMo, a dataset of subjective classification tasks with annotator attributes: age, gender, race, and education. We curate DeMo from existing datasets and normalise attributes to increase comparability. DeMo is available in our repo:
github.com
Data and experiment code for "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" (ACL2025) - morlikowski/beyond-demographics
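Normalising attributes across source datasets amounts to mapping each dataset's own value labels onto a shared set of categories. A minimal sketch of the idea, with an invented mapping table (the actual DeMo categories and value strings may differ):

```python
# Hypothetical normalisation table: map dataset-specific education labels
# onto shared categories so annotators are comparable across source datasets.
EDUCATION_MAP = {
    "some college": "college",
    "bachelor's degree": "college",
    "college degree": "college",
    "masters": "graduate",
    "phd": "graduate",
    "high school graduate": "high_school",
}


def normalise(attribute_value, mapping):
    """Return the shared category for a raw value, or None if unmapped.

    Lower-casing and stripping whitespace absorbs superficial differences
    in how each source dataset recorded the same answer.
    """
    return mapping.get(attribute_value.strip().lower())


print(normalise("Bachelor's Degree", EDUCATION_MAP))  # college
```

Unmapped values returning `None` (rather than raising) makes it easy to audit which raw labels still need a decision before merging datasets.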