Luke Guerdan

@lukeguerdan

729 Followers · 1K Following · 18 Media · 175 Statuses

PhD Student at @SCSatCMU | Researching sociotechnical measurement & evaluation of AI systems.

Pittsburgh, USA
Joined November 2017
@lukeguerdan
Luke Guerdan
1 year
1/ When is an algorithm an improvement over an existing (e.g., human) decision policy? Our #ICML2024 work offers an approach for recovering tighter relative performance intervals under unmeasured confounding by isolating comparison-related uncertainty. https://t.co/NzGjUv6BpM
1
8
48
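A minimal sketch of the idea behind tighter comparison intervals, assuming a Manski-style worst-case treatment of unobserved outcomes (an illustration under invented names, not the paper's estimator): bounding each policy's performance separately and then differencing the two intervals wastes information, because the per-case performance difference is identically zero wherever the two policies agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all names hypothetical): binary decisions from an algorithm
# and a human, with outcomes missing for a possibly confounded subset.
n = 1000
alg = rng.integers(0, 2, n)      # algorithm's decisions
human = rng.integers(0, 2, n)    # human's decisions
y = rng.integers(0, 2, n)        # true outcomes
observed = rng.random(n) < 0.7   # whether the outcome was observed

def accuracy_bounds(d):
    """Worst-/best-case accuracy when unobserved outcomes are adversarial."""
    hits = ((d == y) & observed).sum()
    return hits / n, (hits + (~observed).sum()) / n

# Naive interval on the performance difference: bound each policy
# separately, then difference the two marginal intervals.
a_lo, a_hi = accuracy_bounds(alg)
h_lo, h_hi = accuracy_bounds(human)
naive = (a_lo - h_hi, a_hi - h_lo)

# Comparison-aware interval: the per-case difference
# 1[alg correct] - 1[human correct] is exactly 0 wherever the two
# policies agree, whatever the missing outcome is, so only unobserved
# *disagreement* cases contribute uncertainty.
point = (((alg == y).astype(int) - (human == y).astype(int)) * observed).mean()
slack = ((~observed) & (alg != human)).mean()
tight = (point - slack, point + slack)

print(naive, tight)  # the comparison-aware interval is never wider
```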
@lukeguerdan
Luke Guerdan
5 days
📄 https://t.co/dSUMntqCnL This work was in collaboration with the amazing team @dev7saxena (co-first author), @snchancellor, @zstevenwu, and @d19fe8. Thank you for making my first adventure into qualitative research a delightful experience :)
arxiv.org
Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the...
0
0
3
@lukeguerdan
Luke Guerdan
5 days
Our paper offers design implications to support this, such as:
- Protocols to help data scientists identify minimum standards for validity and other criteria, tailored to their application context
- Tools to help data scientists identify and apply strategies more effectively
1
0
1
@lukeguerdan
Luke Guerdan
5 days
The challenge for HCI, CSCW, and ML is not to *replace* these bricolage practices with rigid top-down planning, but to develop scaffolding that enhances the rigor of bricolage while preserving creativity and adaptability
1
0
0
@lukeguerdan
Luke Guerdan
5 days
Yet from urban planning to software engineering, history is rife with examples where rigid top-down interventions have failed while bottom-up alternatives designed to better scaffold *existing* practices succeeded
1
0
0
@lukeguerdan
Luke Guerdan
5 days
In light of these findings, we might be tempted to more stringently enforce a rigid "top-down planning approach" to measurement, in which data scientists more carefully define construct → design operationalization → collect data.
1
0
0
@lukeguerdan
Luke Guerdan
5 days
How do data scientists evaluate validity? They treat their target variable as a tangible object to be scrutinized. They "poke holes" in their definition, then "patch" them. They apply "spot checks" to reconcile their theoretical understanding of a concept with observed labels.
1
0
0
@lukeguerdan
Luke Guerdan
5 days
Data scientists navigate this balancing act by adaptively applying (re)formulation strategies. For example, they use "swapping" to change target variables when the first choice runs into unanticipated challenges, or "composing" to capture complementary dimensions of a concept.
1
0
0
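A toy pandas sketch of "swapping" and "composing", using hypothetical columns inspired by the paper's student-writing "authenticity" example (invented data, not drawn from the study):

```python
import pandas as pd

# Hypothetical student-writing data; "authenticity" has no direct label,
# so the target must be built from whatever columns already exist.
df = pd.DataFrame({
    "detector_score": [0.9, 0.2, 0.6, 0.1],  # AI-text detector output
    "teacher_flag":   [1, 0, 1, 0],          # teacher judged inauthentic
    "revision_count": [0, 5, 1, 7],          # edits logged while writing
})

# First attempt: threshold the detector score.
df["target_v1"] = (df["detector_score"] > 0.5).astype(int)

# "Swapping": the detector proves unreliable, so the target variable is
# swapped for a human judgment that is already in the data.
df["target_v2"] = df["teacher_flag"]

# "Composing": combine complementary signals so that no single weak
# proxy dominates the operationalization of the concept.
df["target_v3"] = (
    (df["teacher_flag"] == 1) & (df["revision_count"] <= 1)
).astype(int)

print(df)
```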
@lukeguerdan
Luke Guerdan
5 days
While engaging in bricolage, data scientists balance the validity of their target variable with other criteria, such as:
💡 Simplicity
⚙️ Resource requirements
🎯 Predictive performance
🌎 Portability
1
0
0
@lukeguerdan
Luke Guerdan
5 days
We find that target variable construction is a *bricolage practice*, in which data scientists creatively "make do" with the limited resources at hand
1
0
0
@lukeguerdan
Luke Guerdan
5 days
To explore this tension, we interviewed 15 data scientists from the education and healthcare sectors to understand their practices, challenges, and perceived opportunities for target variable construction in predictive modeling
1
0
0
@lukeguerdan
Luke Guerdan
5 days
Traditional measurement theory assumes a top-down workflow, where data is collected to fit a study's goals (define construct → design operationalization → collect data). In contrast, data scientists are often forced to reconcile their measurement goals with *existing* data
1
0
0
@lukeguerdan
Luke Guerdan
5 days
A subtle aspect of predictive modeling is target variable construction: translating an unobservable concept like "healthcare need" into a prediction target. But how does target variable construction unfold in practice, and how can we better support it going forward? #CSCW2025 🧵
2
4
10
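For readers who want the mechanics, here is a minimal, hypothetical sketch of target variable construction for the "healthcare need" example: the construct itself is unobservable, so a measurable proxy must be defined from columns that already exist (all names invented):

```python
import pandas as pd

# Hypothetical claims data; "healthcare need" is never observed directly,
# so a measurable stand-in must be chosen from existing columns.
patients = pd.DataFrame({
    "total_cost":   [1200, 8400, 300, 15000],
    "n_conditions": [0, 3, 1, 5],
})

# One possible operationalization: proxy need by above-median historical
# cost. The choice is consequential: cost reflects access as well as need.
patients["high_need"] = (
    patients["total_cost"] > patients["total_cost"].median()
).astype(int)

print(patients)
```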
@yewonbyun_
Emily Byun
10 days
💡 Can we trust synthetic data for statistical inference? We show that synthetic data (e.g., LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data
2
33
133
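One way to make the moment-interaction intuition concrete is a prediction-powered-inference-style estimator: use the cheap synthetic first moment, then correct it with a residual moment from the real data. This is a generic sketch under simple Gaussian assumptions, not necessarily the estimator in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: estimate E[Y]. Real labels are scarce; a simulator
# (e.g., an LLM) is cheap to run on many more inputs, but biased.
n, N = 100, 10_000
y_real  = rng.normal(1.0, 1.0, n)            # scarce real outcomes
f_real  = y_real + rng.normal(0.3, 0.5, n)   # simulator on the labeled points
f_synth = rng.normal(1.3, np.sqrt(1.25), N)  # simulator on unlabeled points

# Real-only estimator: unbiased, but variance ~ 1/n.
est_real = y_real.mean()

# Moment-combining estimator: the simulator's bias cancels in
# expectation, and the variance shrinks whenever the simulator's
# outputs track the real outcomes.
est_combined = f_synth.mean() + (y_real - f_real).mean()

print(est_real, est_combined)
```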
@BenDLaufer
Benjamin Laufer
2 months
1/10. In a new paper with @didaoh and Jon Kleinberg, we mapped the family trees of 1.86 million AI models on Hugging Face, the largest open-model ecosystem in the world. AI evolution looks kind of like biology, but with some strange twists. 🧬🤖
4
9
51
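The underlying data structure is simple to sketch: model cards often declare a base model, and those (child, base) pairs induce a family forest over models. A hypothetical Python illustration (model names invented):

```python
from collections import defaultdict

# Hypothetical records scraped from model cards, which often declare a
# base model: (child, base) pairs define a fine-tuning family tree.
edges = [
    ("llama-3-8b-instruct", "llama-3-8b"),
    ("llama-3-8b-dpo",      "llama-3-8b-instruct"),
    ("llama-3-8b-medqa",    "llama-3-8b"),
]

children = defaultdict(list)
parent = {}
for child, base in edges:
    children[base].append(child)
    parent[child] = base

def lineage(model):
    """Walk from a model up to its root ancestor, phylogeny-style."""
    path = [model]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

print(children["llama-3-8b"])   # direct descendants of the root
print(lineage("llama-3-8b-dpo"))
# ['llama-3-8b-dpo', 'llama-3-8b-instruct', 'llama-3-8b']
```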
@anna_kawakami
Anna Kawakami
4 months
Excited to share our #FAccT25 translation tutorial, where we'll explore how to reconceptualize AI measurement as a stakeholder-engaged design practice 🙋🔍🖥️ Next week Thurs 6/26 at 3:15 pm (last day and session - please don't leave the conference early!) 🧵
2
12
57
@hannawallach
Hanna Wallach (@hannawallach.bsky.social)
4 months
Check out the camera-ready version of our @icmlconf position paper ("Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge") to learn more!!!
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
0
11
26
@russellbrandom
Russell Brandom
5 months
In @techreview, I wrote about the crisis in AI evaluations, and why a new focus on validity could be the best way forward
2
8
34
@tzushengkuo
Tzu-Sheng Kuo ιƒ­ε­η”Ÿ
7 months
✨ How can we help communities collaboratively shape policies that impact them? In our #CHI2025 paper, we present PolicyCraft, a system that supports collaborative policy design through case-grounded deliberation. (🧵/11)
2
13
61
@hannawallach
Hanna Wallach (@hannawallach.bsky.social)
9 months
Remember this @NeurIPSConf workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here:
arxiv.org
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges...
@hannawallach
Hanna Wallach (@hannawallach.bsky.social)
11 months
Evaluating Generative AI Systems is a Social Science Measurement Challenge: https://t.co/km7E8rV5Uu TL;DR: The ML community would benefit from learning from and drawing on the social sciences when evaluating GenAI systems.
1
17
53
@d19fe8
Ken Holstein
10 months
I'm recruiting a postdoc via the CBI Fellowship Program (https://t.co/0c1WTuhy40). Interested in designing & evaluating interactive systems for participatory AI, augmented and collective intelligence, responsible AI, or related areas? Please apply and reach out to me via email!
3
20
80