
Luke Guerdan
@lukeguerdan
Followers
729
Following
1K
Media
18
Statuses
175
PhD Student at @SCSatCMU | Researching sociotechnical measurement & evaluation of AI systems.
Pittsburgh, USA
Joined November 2017
1/ When is an algorithm an improvement over an existing (e.g., human) decision policy? Our #ICML2024 work offers an approach for recovering tighter relative performance intervals under unmeasured confounding by isolating comparison-related uncertainty. https://t.co/NzGjUv6BpM
1
8
48
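A stylized numerical sketch of the intuition in the tweet above: when outcomes are unobserved for some cases, bounding the performance *difference* directly is tighter than differencing two separate worst-case intervals, because cases where both policies made the same decision contribute no comparison uncertainty. All names and numbers here are illustrative; this is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
algo = rng.integers(0, 2, n)      # algorithm's binary decisions
human = rng.integers(0, 2, n)     # human's binary decisions
y = rng.integers(0, 2, n)         # true binary outcomes
obs = rng.random(n) > 0.3         # ~30% of outcomes are unobserved

# Net correct-count difference on the observed cases only.
diff_obs = int(((algo == y) & obs).sum() - ((human == y) & obs).sum())
m = int((~obs).sum())                      # all unobserved cases
k = int(((~obs) & (algo != human)).sum())  # unobserved cases where policies disagree

# Naive: bound each policy's accuracy separately, then difference the bounds.
naive = ((diff_obs - m) / n, (diff_obs + m) / n)

# Joint: if both policies chose the same action, the unknown outcome cannot
# change their performance *difference*, so only disagreement cases add slack.
joint = ((diff_obs - k) / n, (diff_obs + k) / n)

print("naive interval:", naive)   # width 2m/n
print("joint interval:", joint)   # width 2k/n <= 2m/n, i.e., tighter
```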
https://t.co/dSUMntqCnL This work was in collaboration with the amazing team @dev7saxena (co-first author), @snchancellor, @zstevenwu, and @d19fe8. Thank you for making my first adventure into qualitative research a delightful experience :)
arxiv.org
Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the...
0
0
3
Our paper offers design implications to support this, such as:
- Protocols to help data scientists identify minimum standards for validity and other criteria, tailored to their application context
- Tools to help data scientists identify and apply strategies more effectively
1
0
1
The challenge for HCI, CSCW, and ML is not to *replace* these bricolage practices with rigid top-down planning, but to develop scaffolding that enhances the rigor of bricolage while preserving creativity and adaptability
1
0
0
Yet from urban planning to software engineering, history is rife with examples where rigid top-down interventions have failed while bottom-up alternatives designed to better scaffold *existing* practices succeeded
1
0
0
In light of these findings, we might be tempted to more stringently enforce a rigid "top-down planning approach" to measurement, in which data scientists more carefully define construct → design operationalization → collect data
1
0
0
How do data scientists evaluate validity? They treat their target variable as a tangible object to be scrutinized. They "poke holes" in their definition, then "patch" them. They apply "spot checks" to reconcile their theoretical understanding of a concept with observed labels
1
0
0
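A minimal sketch of what such a "spot check" might look like in code, assuming a hypothetical healthcare dataset; the column names and the heuristic are invented for illustration, not taken from the paper.

```python
import pandas as pd

# Hypothetical data: an observed target label plus the raw features a
# data scientist could use to re-derive the construct by hand.
df = pd.DataFrame({
    "num_visits": [0, 3, 7, 1, 9],
    "total_cost": [0, 250, 4200, 80, 5100],
    "high_need":  [0, 0, 1, 1, 1],   # observed "healthcare need" label
})

# Encode a theoretical understanding of "healthcare need" as a heuristic.
heuristic = ((df["num_visits"] >= 3) | (df["total_cost"] >= 1000)).astype(int)

# Spot check: surface rows where label and theory disagree, then inspect
# them manually to "poke holes" in the definition and "patch" it.
print(df[heuristic != df["high_need"]])
```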
Data scientists navigate this balancing act by adaptively applying (re)formulation strategies. For example, they use "swapping" to change target variables when the first poses unanticipated challenges, or "composing" to capture complementary dimensions of a concept
1
0
0
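To make "swapping" and "composing" concrete, here is a toy pandas sketch on an invented student-writing example; the column names are hypothetical illustrations, not from the paper.

```python
import pandas as pd

df = pd.DataFrame({
    "rubric_score":    [2, 4, 5, 1],   # human-graded authenticity rubric
    "plagiarism_flag": [1, 0, 0, 1],   # automated detector output
})

# Composing: combine two complementary dimensions of "authenticity"
# into a single target variable.
df["authentic"] = ((df["rubric_score"] >= 4) &
                   (df["plagiarism_flag"] == 0)).astype(int)

# Swapping: if rubric scores prove too noisy or costly, switch the
# target to the detector output alone.
df["authentic_v2"] = (df["plagiarism_flag"] == 0).astype(int)
print(df)
```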
While engaging in bricolage, data scientists balance the validity of their target variable with other criteria, such as: 💡 Simplicity ⚖️ Resource requirements 🎯 Predictive performance 📦 Portability
1
0
0
We find that target variable construction is a *bricolage practice*, in which data scientists creatively "make do" with the limited resources at hand
1
0
0
To explore this tension, we interviewed 15 data scientists from the education and healthcare sectors to understand their practices, challenges, and perceived opportunities for target variable construction in predictive modeling
1
0
0
Traditional measurement theory assumes a top-down workflow, where data is collected to fit a study's goals (define construct → design operationalization → collect data). In contrast, data scientists are often forced to reconcile their measurement goals with *existing* data
1
0
0
A subtle aspect of predictive modeling is target variable construction: translating an unobservable concept like "healthcare need" into a prediction target. But how does target variable construction unfold in practice, and how can we better support it going forward? #CSCW2025 🧵
2
4
10
💡 Can we trust synthetic data for statistical inference? We show that synthetic data (e.g., LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data
2
33
133
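One well-known instantiation of this idea is prediction-powered inference (Angelopoulos et al.), where a large synthetic sample supplies a low-variance moment and a small real sample corrects its bias. A minimal sketch for estimating a mean; this illustrates the general recipe, not necessarily the quoted paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 10_000   # small real sample, large synthetic pool

# Real labels, plus synthetic (e.g., LLM-simulated) versions of the same
# quantity; the synthetic generator is systematically biased here.
y_real = rng.normal(5.0, 2.0, n)
f_real = y_real + rng.normal(0.5, 0.3, n)   # synthetic labels for real cases
f_pool = 5.5 + rng.normal(0.0, 2.0, N)      # synthetic labels on the big pool

# The big pool contributes a low-variance first moment; the paired real
# sample estimates and removes the synthetic data's bias ("rectifier").
theta_pp = f_pool.mean() + (y_real - f_real).mean()

print("real-only estimate      :", y_real.mean())
print("prediction-powered est. :", theta_pp)   # true mean is 5.0
```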
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face, the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
4
9
51
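A hedged sketch of how such a lineage graph could be reconstructed, assuming fine-tune parentage is available as (child, parent) records, as in the base-model field of Hugging Face model cards; the model names below are invented, and this is not the paper's pipeline.

```python
import networkx as nx

# Invented (child, parent) fine-tune lineage records.
lineage = [
    ("org/base-7b",          None),
    ("org/base-7b-chat",     "org/base-7b"),
    ("org/base-7b-chat-dpo", "org/base-7b-chat"),
    ("org/base-7b-med",      "org/base-7b"),
]

G = nx.DiGraph()
for child, parent in lineage:
    G.add_node(child)
    if parent is not None:
        G.add_edge(parent, child)   # edges point parent -> fine-tune

# Roots are base models; path length counts generations of fine-tuning.
for root in (m for m in G if G.in_degree(m) == 0):
    for model, depth in nx.shortest_path_length(G, source=root).items():
        print(f"{model}: generation {depth}")
```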
Excited to share our #FAccT25 translation tutorial, where we'll explore how to reconceptualize AI measurement as a stakeholder-engaged design practice 🖥️ Next week, Thurs 6/26 at 3:15 pm (last day and last session; please don't leave the conference early!) 🧵
2
12
57
Check out the camera-ready version of our @icmlconf position paper ("Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge") to learn more!!!
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
0
11
26
In @techreview, I wrote about the crisis in AI evaluations, and why a new focus on validity could be the best way forward
2
8
34
✨ How can we help communities collaboratively shape policies that impact them? In our #CHI2025 paper, we present PolicyCraft, a system that supports collaborative policy design through case-grounded deliberation. (🧵/11)
2
13
61
Remember this @NeurIPSConf workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here:
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
Evaluating Generative AI Systems is a Social Science Measurement Challenge: https://t.co/km7E8rV5Uu TL;DR: The ML community would benefit from learning from and drawing on the social sciences when evaluating GenAI systems.
1
17
53
I'm recruiting a postdoc via the CBI Fellowship Program (https://t.co/0c1WTuhy40). Interested in designing & evaluating interactive systems for participatory AI, augmented and collective intelligence, responsible AI, or related areas? Please apply and reach out to me via email!
3
20
80