Matthias Orlikowski
@morlikow
Followers: 629 · Following: 3K · Media: 24 · Statuses: 1K
NLProc, Computational Social Science • Human Label Variation, Disagreement, Subjectivity • PhD candidate @unibielefeld • he/him • EN, DE
Bielefeld, Germany
Joined February 2015
I will be at #acl2025 to present "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" ✨ A heartfelt thank you to my collaborators @jiaxin_pei @paul_rottger @pcimiano @david__jurgens and Dirk Hovy. More below
Can AI simulate human behavior? 🧠 The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality? To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
1/ 🧵 New #EMNLP2025 paper!! Toxicity detection is subjective, shaped by norms, identity, & context. Existing models and datasets overlook this nuance. Enter MODELCITIZENS: a new dataset designed to address this. ✔️ 6.8K posts, 40K annotations across diverse groups ✔️
We have put up all slide decks on the tutorial website: https://t.co/xhQfa5wylv 🥳🥳🥳 Although I could only deliver the tutorial remotely due to visa constraints, I was thrilled to learn that it drew a nearly full room for the entire 3.5 hours!
🥳🥳🥳Join us at the tutorial of 𝐆𝐮𝐚𝐫𝐝𝐫𝐚𝐢𝐥𝐬 𝐚𝐧𝐝 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲 𝐟𝐨𝐫 𝐋𝐋𝐌𝐬: 𝐒𝐚𝐟𝐞, 𝐒𝐞𝐜𝐮𝐫𝐞, 𝐚𝐧𝐝 𝐂𝐨𝐧𝐭𝐫𝐨𝐥𝐥𝐚𝐛𝐥𝐞 𝐒𝐭𝐞𝐞𝐫𝐢𝐧𝐠 𝐨𝐟 𝐋𝐋𝐌 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬! Time: 14:00 - 17:30 July 27 Location: Hall CT8
Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025. As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
WHY do you prefer something over another? Reward models treat preference as a black-box😶🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
🗣️ Excited to share our new #ACL2025 Findings paper: “Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks” with @jad_kabbara and @dkroy. arXiv: https://t.co/FeWQLQxt5K Read about our findings ⤵️
arxiv.org
LLM use in annotation is becoming widespread, and given LLMs' overall promising performance and speed, simply "reviewing" LLM annotations in interpretive tasks can be tempting. In subjective...
More detail on the paper in this thread:
Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes! 🧵
I will present on Monday, July 28, during Poster Session 1 in the Human-Centered NLP track. The session runs 11:00-12:30 in Halls 4 and 5. Looking forward to discussing our work! https://t.co/BoLWVGtcs4
arxiv.org
People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the person's sociodemographic characteristics. LLMs have also been used to...
🎉 The @MilaNLProc lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀
📢📢👇 New job openings. Topic: social bias detection + analysis with LLMs across time (1950-now) & languages. There are 2 Post-Doc/PhD positions, supervised by @egere14 (@utn_nuremberg) + Simone Ponzetto (@dwsunima). Fully funded, up to 3 yrs. More info:
Prior work has used LLMs to simulate survey responses, yet their ability to match the distribution of human views remains uncertain. Our new paper [https://t.co/DleesiPbif] introduces a benchmark to evaluate how distributionally aligned LLMs are with human opinions. 🧵
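To make "distributionally aligned" concrete: one simple way to score a model is to compare its distribution of sampled answers against the human response distribution for the same question. The sketch below uses total variation distance; the paper's actual metric and data are not specified here, and the survey numbers are invented for illustration.

```python
from collections import Counter


def to_distribution(answers):
    """Turn a list of categorical answers into a probability distribution."""
    counts = Counter(answers)
    n = len(answers)
    return {option: c / n for option, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions over the
    same answer options: 0 = identical, 1 = completely disjoint."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


# Hypothetical survey question: human responses vs. answers sampled from an LLM.
human = to_distribution(["agree"] * 60 + ["disagree"] * 30 + ["neutral"] * 10)
model = to_distribution(["agree"] * 80 + ["disagree"] * 15 + ["neutral"] * 5)

print(round(total_variation(human, model), 2))  # 0.2
```

A model that matches the human answer shares exactly would score 0; here the LLM over-produces "agree", so the distance is 0.2.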
📣Calling all #CHI2025 attendees who work with human participants: Join our panel discussion on #LLM, #simulation, #syntheticdata, and the future of human subjects research on Apr 30 (Wed), 2:10 - 3:40 PM (JP Time) Post your questions for panelists here: https://t.co/FxbwBA3nW0
There is more detail and additional analysis in the paper! You can read it on arXiv; happy to receive any comments or questions! Preprint:
Our findings underscore that LLMs can’t be expected to be accurate models of individual variation based on sociodemographics. We should not use LLMs to attempt “simulation”, in particular when we do not have access to examples of individual behaviour.
Attributes help most for annotators with unique sociodemographic profiles. Apparently, LLMs learn to use unique combinations as a proxy ID! Learning from individual-level examples provides richer information than knowing sociodemographics.
But why did attributes improve predictions when we tested with known annotators? We started to wonder: Is performance linked to how many annotators our models see for each combination of attributes (sociodemographic profile)? We compare unique and frequent profiles.
Ok, but surely attributes are much more useful when transferring to annotators not seen in training? This setting is rarely tested in NLP, so we built a separate partitioning of DeMo for this evaluation. Turns out no model improves over the baseline!
Sociodemographic prompting and models fine-tuned with only the text content are our baselines. We compare against fine-tuning with annotator attributes or unique annotator identifiers (IDs). Trends are clear: attributes help a bit, but IDs are much more accurate!
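The conditions above differ mainly in what extra context the model sees alongside the text. A minimal sketch of how such inputs could be constructed: the template wording, attribute keys, and `<id_…>` token format below are hypothetical, not the paper's exact prompts.

```python
def build_prompt(text, attributes=None, annotator_id=None):
    """Build a classification input for one annotator's judgment.

    Three conditions, mirroring the comparison in the thread:
    - text only (baseline): just the post
    - attributes: prepend a sociodemographic profile
    - ID: prepend an opaque per-annotator token, which lets a fine-tuned
      model memorise that individual's annotation patterns
    """
    parts = []
    if annotator_id is not None:
        parts.append(f"Annotator: <id_{annotator_id}>")
    if attributes:
        profile = ", ".join(f"{k}: {v}" for k, v in sorted(attributes.items()))
        parts.append(f"Annotator profile: {profile}")
    parts.append(f"Text: {text}")
    parts.append("Is this text offensive? Answer yes or no.")
    return "\n".join(parts)


print(build_prompt("example post", attributes={"age": "30-39", "gender": "woman"}))
```

In the attribute condition the model can only generalise over profiles shared by many annotators, while the ID condition gives it a unique handle per person, which is one way to read why IDs come out more accurate.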
We compare models on DeMo, a dataset of subjective classification tasks with annotator attributes: age, gender, race, and education. We curate DeMo from existing datasets and normalise attributes to increase comparability. DeMo is available in our repo:
github.com
Data and experiment code for "Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions" (ACL2025) - morlikowski/beyond-demographics
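Normalising attributes across source datasets amounts to mapping each dataset's own value labels onto a shared set of categories. A minimal sketch of the idea, with an invented mapping table (the actual DeMo categories and value strings may differ):

```python
# Hypothetical normalisation table: map dataset-specific education labels
# onto shared categories so annotators are comparable across source datasets.
EDUCATION_MAP = {
    "some college": "college",
    "bachelor's degree": "college",
    "college degree": "college",
    "masters": "graduate",
    "phd": "graduate",
    "high school graduate": "high_school",
}


def normalise(attribute_value, mapping):
    """Return the shared category for a raw value, or None if unmapped.

    Lower-casing and stripping whitespace absorbs superficial differences
    in how each source dataset recorded the same answer.
    """
    return mapping.get(attribute_value.strip().lower())


print(normalise("Bachelor's Degree", EDUCATION_MAP))  # college
```

Unmapped values returning `None` (rather than raising) makes it easy to audit which raw labels still need a decision before merging datasets.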