
Luke Guerdan
@lukeguerdan
Followers
729
Following
1K
Media
18
Statuses
175
PhD Student at @SCSatCMU | Researching sociotechnical measurement & evaluation of AI systems.
Pittsburgh, USA
Joined November 2017
1/ When is an algorithm an improvement over an existing (e.g., human) decision policy? Our #ICML2024 work offers an approach for recovering tighter relative performance intervals under unmeasured confounding by isolating comparison-related uncertainty. https://t.co/NzGjUv6BpM
1
8
48
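A stylized numerical sketch of the intuition in the tweet above: when outcomes are unobserved for some cases, bounding the performance *difference* directly is tighter than differencing two separate worst-case intervals, because cases where both policies made the same decision contribute no comparison uncertainty. All names and numbers here are illustrative; this is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
algo = rng.integers(0, 2, n)      # algorithm's binary decisions
human = rng.integers(0, 2, n)     # human's binary decisions
y = rng.integers(0, 2, n)         # true binary outcomes
obs = rng.random(n) > 0.3         # ~30% of outcomes are unobserved

# Net correct-count difference on the observed cases only.
diff_obs = int(((algo == y) & obs).sum() - ((human == y) & obs).sum())
m = int((~obs).sum())                      # all unobserved cases
k = int(((~obs) & (algo != human)).sum())  # unobserved cases where policies disagree

# Naive: bound each policy's accuracy separately, then difference the bounds.
naive = ((diff_obs - m) / n, (diff_obs + m) / n)

# Joint: if both policies chose the same action, the unknown outcome cannot
# change their performance *difference*, so only disagreement cases add slack.
joint = ((diff_obs - k) / n, (diff_obs + k) / n)

print("naive interval:", naive)   # width 2m/n
print("joint interval:", joint)   # width 2k/n <= 2m/n, i.e., tighter
```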
https://t.co/dSUMntqCnL This work was in collaboration with the amazing team @dev7saxena (co-first author), @snchancellor, @zstevenwu, and @d19fe8. Thank you for making my first adventure into qualitative research a delightful experience :)
arxiv.org
Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the...
0
0
3
Our paper offers design implications to support this, such as:
- Protocols to help data scientists identify minimum standards for validity and other criteria, tailored to their application context
- Tools to help data scientists identify and apply strategies more effectively
1
0
1
The challenge for HCI, CSCW, and ML is not to *replace* these bricolage practices with rigid top-down planning, but to develop scaffolding that enhances the rigor of bricolage while preserving creativity and adaptability
1
0
0
Yet from urban planning to software engineering, history is rife with examples where rigid top-down interventions have failed while bottom-up alternatives designed to better scaffold *existing* practices succeeded
1
0
0
In light of these findings, we might be tempted to more stringently enforce a rigid "top-down planning approach" to measurement, in which data scientists more carefully define construct → design operationalization → collect data
1
0
0
How do data scientists evaluate validity? They treat their target variable as a tangible object to be scrutinized. They "poke holes" in their definition, then "patch" them. They apply "spot checks" to reconcile their theoretical understanding of a concept with observed labels
1
0
0
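A minimal sketch of what such a "spot check" might look like in code, assuming a hypothetical healthcare dataset; the column names and the heuristic are invented for illustration, not taken from the paper.

```python
import pandas as pd

# Hypothetical data: an observed target label plus the raw features a
# data scientist could use to re-derive the construct by hand.
df = pd.DataFrame({
    "num_visits": [0, 3, 7, 1, 9],
    "total_cost": [0, 250, 4200, 80, 5100],
    "high_need":  [0, 0, 1, 1, 1],   # observed "healthcare need" label
})

# Encode a theoretical understanding of "healthcare need" as a heuristic.
heuristic = ((df["num_visits"] >= 3) | (df["total_cost"] >= 1000)).astype(int)

# Spot check: surface rows where label and theory disagree, then inspect
# them manually to "poke holes" in the definition and "patch" it.
print(df[heuristic != df["high_need"]])
```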
Data scientists navigate this balancing act by adaptively applying (re)formulation strategies. For example, they use "swapping" to change target variables when the first poses unanticipated challenges, or "composing" to capture complementary dimensions of a concept
1
0
0
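To make "swapping" and "composing" concrete, here is a toy pandas sketch on an invented student-writing example; the column names are hypothetical illustrations, not from the paper.

```python
import pandas as pd

df = pd.DataFrame({
    "rubric_score":    [2, 4, 5, 1],   # human-graded authenticity rubric
    "plagiarism_flag": [1, 0, 0, 1],   # automated detector output
})

# Composing: combine two complementary dimensions of "authenticity"
# into a single target variable.
df["authentic"] = ((df["rubric_score"] >= 4) &
                   (df["plagiarism_flag"] == 0)).astype(int)

# Swapping: if rubric scores prove too noisy or costly, switch the
# target to the detector output alone.
df["authentic_v2"] = (df["plagiarism_flag"] == 0).astype(int)
print(df)
```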
While engaging in bricolage, data scientists balance the validity of their target variable with other criteria, such as: 💡 Simplicity ⚖️ Resource requirements 🎯 Predictive performance 📦 Portability
1
0
0
We find that target variable construction is a *bricolage practice*, in which data scientists creatively "make do" with the limited resources at hand
1
0
0
To explore this tension, we interviewed 15 data scientists from the education and healthcare sectors to understand their practices, challenges, and perceived opportunities for target variable construction in predictive modeling
1
0
0
Traditional measurement theory assumes a top-down workflow, where data is collected to fit a study's goals (define construct → design operationalization → collect data). In contrast, data scientists are often forced to reconcile their measurement goals with *existing* data
1
0
0
A subtle aspect of predictive modeling is target variable construction: translating an unobservable concept like "healthcare need" into a prediction target. But how does target variable construction unfold in practice, and how can we better support it going forward? #CSCW2025 🧵
2
4
10
💡 Can we trust synthetic data for statistical inference? We show that synthetic data (e.g., LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data
2
33
133
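One well-known instantiation of this idea is prediction-powered inference (Angelopoulos et al.), where a large synthetic sample supplies a low-variance moment and a small real sample corrects its bias. A minimal sketch for estimating a mean; this illustrates the general recipe, not necessarily the quoted paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 10_000   # small real sample, large synthetic pool

# Real labels, plus synthetic (e.g., LLM-simulated) versions of the same
# quantity; the synthetic generator is systematically biased here.
y_real = rng.normal(5.0, 2.0, n)
f_real = y_real + rng.normal(0.5, 0.3, n)   # synthetic labels for real cases
f_pool = 5.5 + rng.normal(0.0, 2.0, N)      # synthetic labels on the big pool

# The big pool contributes a low-variance first moment; the paired real
# sample estimates and removes the synthetic data's bias ("rectifier").
theta_pp = f_pool.mean() + (y_real - f_real).mean()

print("real-only estimate      :", y_real.mean())
print("prediction-powered est. :", theta_pp)   # true mean is 5.0
```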
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face, the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
4
9
51
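A hedged sketch of how such a lineage graph could be reconstructed, assuming fine-tune parentage is available as (child, parent) records, as in the base-model field of Hugging Face model cards; the model names below are invented, and this is not the paper's pipeline.

```python
import networkx as nx

# Invented (child, parent) fine-tune lineage records.
lineage = [
    ("org/base-7b",          None),
    ("org/base-7b-chat",     "org/base-7b"),
    ("org/base-7b-chat-dpo", "org/base-7b-chat"),
    ("org/base-7b-med",      "org/base-7b"),
]

G = nx.DiGraph()
for child, parent in lineage:
    G.add_node(child)
    if parent is not None:
        G.add_edge(parent, child)   # edges point parent -> fine-tune

# Roots are base models; path length counts generations of fine-tuning.
for root in (m for m in G if G.in_degree(m) == 0):
    for model, depth in nx.shortest_path_length(G, source=root).items():
        print(f"{model}: generation {depth}")
```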
Excited to share our #FAccT25 translation tutorial, where we'll explore how to reconceptualize AI measurement as a stakeholder-engaged design practice 🖥️ Next week, Thurs 6/26 at 3:15 pm (last day and last session; please don't leave the conference early!) 🧵
2
12
57
Check out the camera-ready version of our @icmlconf position paper ("Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge") to learn more!!!
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
0
11
26
In @techreview, I wrote about the crisis in AI evaluations, and why a new focus on validity could be the best way forward
2
8
34
✨ How can we help communities collaboratively shape policies that impact them? In our #CHI2025 paper, we present PolicyCraft, a system that supports collaborative policy design through case-grounded deliberation. (🧵/11)
2
13
61
Remember this @NeurIPSConf workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here:
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
Evaluating Generative AI Systems is a Social Science Measurement Challenge: https://t.co/km7E8rV5Uu TL;DR: The ML community would benefit from learning from and drawing on the social sciences when evaluating GenAI systems.
1
17
53
I'm recruiting a postdoc via the CBI Fellowship Program (https://t.co/0c1WTuhy40). Interested in designing & evaluating interactive systems for participatory AI, augmented and collective intelligence, responsible AI, or related areas? Please apply and reach out to me via email!
3
20
80