Jimin Mun
@jiminmun_
Followers
298
Following
349
Media
1
Statuses
69
phd student @LTIatCMU she/her
Joined June 2020
As of June 2025, 66% of Americans have never used ChatGPT. Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters' needs and key research opportunities behind. https://t.co/YprwsthysY
1
37
83
Congratulations to our own Maarten Sap on the tremendous honor of being named a 2025 Packard Fellow!
I'm super excited and grateful to announce that I'm part of the 2025 class of #PackardFellows (https://t.co/MUl0kGlC3h). The Packard Foundation and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI.
0
8
22
Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME), co-located with #EACL2026
Mar 24–29, 2026 | Rabat, Morocco. MME focuses on resources, metrics & methodologies for evaluating multilingual systems! https://t.co/60yCZUjbzH Submit by
1
19
73
We find that 1) acceptance of AI varies widely depending on use case context, 2) judgments differ between demographic groups, and 3) people use both cost-benefit AND rule-based reasoning to make their decisions, with diverging strategies showing higher disagreement.
0
0
0
To build consensus around AI use cases, it's imperative to understand how people, especially lay users, reason about them. We asked 197 participants to make decisions on individual AI use cases and share their reasoning process.
1
0
0
Next stop for conference hopping: #AIES2025 in Madrid! I'll be giving an oral presentation of our paper Why (Not) Use AI during paper session 1 tomorrow (10/20) at 11:45 AM :) See details in the thread below. https://t.co/p9ai1jZt01
arxiv.org
In recent years, there has been a growing recognition of the need to incorporate lay-people's input into the governance and acceptability assessment of AI usage. However, how and why people judge...
2
0
12
(Fri Oct 10) SoLaR Workshop Please join us at the third iteration of the SoLaR workshop (Socially Responsible Language Models Research)! We have a very exciting full-day program! Thanks to the amazing organizing efforts with @valentina__py @MaartenSap @jiminmun_ and other
Although I can't attend #COLM2025 in person this year, my collaborators and co-organizers are running some exciting sessions. Be sure to check them out! (1/N)
0
6
15
Amazing theoretical work on how to generate text-based synthetic data that will *actually* improve performance on statistical inference!!
Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data.
0
1
7
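The tweet above only gestures at the moment-matching intuition, so here is a minimal, illustrative sketch of one way a large synthetic sample can sharpen a mean estimate once it is debiased against real data. This is my own prediction-powered-style construction with made-up numbers and a hand-picked weight `lam`; it is not the paper's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: estimate the mean of a real outcome from a small real sample,
# a large synthetic sample, and synthetic stand-ins for the real units.
y_real = rng.normal(loc=1.0, scale=1.0, size=200)                  # small real sample (unbiased, noisy)
y_synth_big = rng.normal(loc=1.1, scale=1.0, size=20_000)          # large synthetic sample (precise, slightly biased)
y_synth_paired = y_real + rng.normal(0.1, 0.5, size=y_real.size)   # synthetic values for the same real units

mean_real, mean_synth = y_real.mean(), y_synth_big.mean()

# Combine first moments: anchor on the real mean and add a correction built from
# the synthetic means. With lam = 1 this matches a prediction-powered-inference
# style estimator; lam = 0 recovers the real-only estimate.
lam = 0.5
mean_combined = mean_real + lam * (mean_synth - y_synth_paired.mean())

print(f"real-only {mean_real:.3f} | synthetic-only {mean_synth:.3f} | combined {mean_combined:.3f}")
```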
For the SoLaR workshop @COLM_conf we are soliciting opinion abstracts to encourage new perspectives and opinions on responsible language modeling, 1-2 of which will be selected to be presented at the workshop. Please use the Google form below to submit your opinion abstract.
2
13
35
We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work?! Here's why: Blogpost: https://t.co/jBPlm7cyhr
73
354
2K
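As a small illustrative sketch of the three reward variants named above, here is what they might look like as reward functions plugged into an RLVR loop. The function names and the toy `extract_answer` helper are hypothetical; the actual training setup for Qwen2.5-Math-7B is not reproduced here.

```python
import random

def extract_answer(response: str) -> str:
    """Toy answer extractor: take whatever follows the last '####' marker."""
    return response.rsplit("####", 1)[-1].strip()

def ground_truth_reward(response: str, answer: str) -> float:
    """Standard RLVR reward: 1 if the extracted answer matches the reference."""
    return 1.0 if extract_answer(response) == answer else 0.0

def incorrect_reward(response: str, wrong_answer: str) -> float:
    """Spurious variant: reward agreement with a deliberately wrong label."""
    return 1.0 if extract_answer(response) == wrong_answer else 0.0

def random_reward(response: str, answer: str) -> float:
    """Spurious variant: a coin flip, independent of the response content."""
    return float(random.random() < 0.5)
```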
Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won't. We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.
13
39
343
Glad to share that our AgoraBench paper has been accepted at @aclmeeting 2025 (main)! Special thanks to our coauthors @scott_sjy @xiangyue96 @vijaytarian @sylee_ai @yizhongwyz @kgashteo Carolin @wellecks @gneubig! A belief I hold more firmly now than when I started this project
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
1
11
65
Website: https://t.co/TboKA3u42V Organizers: @usmananwar391 @liweijianglw @valentina__py @sharonlevy21 Daniel Tan @akhila_yerukola @jiminmun_ @rutheappel @sumeetrm @DavidSKrueger @SheilaMcIlraith @MaartenSap
solar-colm.github.io
Call for Papers The 3rd Socially Responsible Language Modelling Research (SoLaR) workshop at COLM 2025 is soliciting papers on the socially responsible development and deployment of language models...
0
2
12
The SoLaR workshop will be co-located with COLM! @COLM_conf SoLaR is a collaborative forum for researchers working on the responsible development, deployment, and use of language models. We welcome both technical and sociotechnical submissions; deadline July 5th!
1
19
85
Hi! I'm gonna be presenting this at #ICLR2025 during the Thursday poster session (4/24; 3–5:30 p.m., Hall 3 + Hall 2B #208). Come by if you want to talk about making ice cream!! (and also human-computer grounding, interacting with LMs, user models, etc.)
LLMs sound homogeneous *because* feedback modalities like rankings, principles, and pairs cater to group-level preferences. Asking an individual to rank ~1K outputs or provide accurate principles takes effort. What if we relied on a few demos to elicit annotator preferences?
0
14
97
Excited to share our #NAACL2025 paper on Language Model Personalization! https://t.co/ZW0XDWSXhv Current RLHF methods often overlook *whose* preferences are being optimized. This can cause conflicting signals and models that mainly cater to the "average" or most dominant users.
2
17
86
Humans backtrack when we realize we should've made a better decision. How do we do this? We search and simulate alternative paths that might have led to better outcomes. Our RETRO-Search mimics this process, empowering models to achieve SOTA performance AND efficient reasoning in math.
With the rise of R1, search seems out of fashion? We prove the opposite! Introducing Retro-Search: an MCTS-inspired search algorithm that RETROspectively revises R1's reasoning traces to synthesize untaken, new reasoning paths that are better, yet shorter in length.
0
6
32
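The tweets above describe Retro-Search only at a high level, so the following is a deliberately simplified, greedy reading of "retrospectively revise a trace into untaken paths that are better yet shorter", not the actual MCTS-inspired algorithm. The `continue_from` and `is_correct` callables are hypothetical stand-ins for a sampler and a verifier.

```python
from typing import Callable, List

def retro_revise(
    trace: List[str],                                   # original reasoning steps from a strong model
    continue_from: Callable[[List[str]], List[str]],    # hypothetical: sample a fresh continuation from a prefix
    is_correct: Callable[[List[str]], bool],            # hypothetical: verify the final answer of a full trace
) -> List[str]:
    """Greedy sketch: at each step, try an untaken continuation and keep it
    if it still reaches a correct answer with fewer total steps."""
    best = trace
    for i in range(len(trace)):
        candidate = best[:i] + continue_from(best[:i])
        if is_correct(candidate) and len(candidate) < len(best):
            best = candidate
    return best
```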
#NLProc New paper on "evaluation-time scaling", a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. https://t.co/Qaxhdap52S
3
38
174
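Here is one way "forcing evaluators to generate additional reasoning tokens" could be operationalized, loosely analogous to budget forcing; whether the paper does exactly this is not stated in the tweet. The `generate` and `count_tokens` callables are hypothetical placeholders for an evaluator LM and a tokenizer.

```python
from typing import Callable

def evaluate_with_budget(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: returns the evaluator LM's continuation
    count_tokens: Callable[[str], int],  # hypothetical: tokenizer-based token counter
    min_reasoning_tokens: int = 512,
) -> str:
    """Sketch: keep extending the evaluator's reasoning until a token budget
    is met, then ask for the final judgment."""
    reasoning = generate(prompt)
    while count_tokens(reasoning) < min_reasoning_tokens:
        # Force additional reasoning tokens before allowing a verdict.
        reasoning += generate(prompt + reasoning + "\nWait, let me reconsider.")
    return generate(prompt + reasoning + "\nFinal judgment:")
```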
New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research! We address data limitations and offer a fresh evaluation method for complex TOT queries. Curious how TREC TOT track test queries are created? Check out this thread and our paper:
2
11
30