Kshitish Ghate Profile
Kshitish Ghate

@GhateKshitish

Followers
93
Following
266
Media
8
Statuses
54

PhD student @UWCSE | MLT Grad student @LTIatCMU | CS and Econ @bitspilanigoa

Joined October 2022
@GhateKshitish
Kshitish Ghate
2 months
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences? With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
1
17
67
@lucy3_li
Lucy Li
2 months
PhD apps season is here! 😱🥳 Apply to do a PhD @WisconsinCS (as pictured) w/ me to research: - Societal impact of AI - NLP ←→ CSS and cultural analytics - Computational sociolinguistics - Human-AI interaction - Culturally competent and inclusive NLP https://t.co/YVrGa3BjWg
17
71
364
@ma_tay_
Taylor Sorensen
2 months
@emollick My best hypothesis for the mechanism is: Chat LLMs are hyperoptimized to approximate the single "best" (most-preferred) response. When you prompt it for a single story, it gives the single best story it can. When you ask it to give FIVE stories, you recast the "best" response to …
3
3
20
@GhateKshitish
Kshitish Ghate
2 months
Work done with amazing collaborators 🙏 @uilydna @devanshrjain @ma_tay_ @Dr_Atoosa @aylin_cim @MonaDiab77 @MaartenSap
0
1
12
@GhateKshitish
Kshitish Ghate
2 months
For more details about our experiments and findings -- Paper: https://t.co/2y8rQmhcad Code and Data: https://t.co/SreNh5N8pm Please feel free to reach out if you are interested in this work and would like to chat!
github.com
Repository for the paper "EVALUESTEER: MEASURING REWARD MODEL STEERABILITY TOWARDS VALUES AND PREFERENCES" - kshitishghate/EVALUESTEER-benchmark
1
0
3
@GhateKshitish
Kshitish Ghate
2 months
🚨Current RMs may systematically favor certain cultural/stylistic perspectives. EVALUESTEER enables measuring this steerability gap. By controlling values and styles independently, we isolate where models fail due to biases and inability to identify/steer to diverse preferences.
1
0
3
@GhateKshitish
Kshitish Ghate
2 months
Finding 3: All RMs exhibit style-over-substance bias. In value-style conflict scenarios: • Models choose style-aligned responses 57-73% of the time • Persists even with explicit instructions to prioritize values • Consistent across all model sizes and types
1
0
3
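As a rough illustration of the headline number in Finding 3: the style-over-substance rate is just the fraction of conflict pairs where the model picked the style-aligned (but value-misaligned) response. A minimal toy sketch, not the paper's evaluation code; the flags below are made-up data.

```python
# Toy sketch of the conflict-condition metric: how often a model chooses
# the style-aligned response when style and values conflict. The list of
# flags is illustrative data, not a result from the paper.
def style_over_substance_rate(chose_style: list[bool]) -> float:
    return sum(chose_style) / len(chose_style)

print(style_over_substance_rate([True, True, False, True, True]))  # 0.8 (toy)
```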
@GhateKshitish
Kshitish Ghate
2 months
Finding 2: The RMs we tested generally show intrinsic value and style-biased preferences for: • Secular over traditional values • Self-expression over survival values • Verbose, confident, and formal/cold language
1
0
3
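One way to read "intrinsic" here: score both responses with no user profile in context at all and see which side the model defaults to. A minimal sketch under that assumption; `toy_score` is a stand-in for a real reward-model call, not an API from the paper.

```python
# Sketch of a no-profile bias probe. With an EMPTY context, a reward model
# that systematically prefers, say, verbose responses over concise ones
# exhibits an intrinsic style bias of the kind reported above.
def toy_score(context: str, prompt: str, response: str) -> float:
    # Stand-in for a real RM; favors longer answers, mimicking the
    # verbosity bias described in the tweet. Purely illustrative.
    return float(len(response))

def intrinsic_bias_rate(pairs) -> float:
    """Fraction of pairs where the RM prefers response_a with no profile."""
    return sum(
        toy_score("", prompt, a) > toy_score("", prompt, b)
        for prompt, a, b in pairs
    ) / len(pairs)

pairs = [("Explain tariffs.", "A long, verbose, confident answer...", "Short answer.")]
print(intrinsic_bias_rate(pairs))  # 1.0 on this toy pair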
@GhateKshitish
Kshitish Ghate
2 months
Finding 1: Even the best RMs struggle to identify which profile aspects matter for a given prompt. GPT-4.1-Mini and Gemini-2.5-Flash reach only ~75% accuracy when given the full user profile as context, versus >99% in the Oracle setting (only the relevant info provided).
2
0
3
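A minimal, self-contained sketch of the two evaluation settings contrasted in Finding 1. All names (`EvalItem`, `toy_score`, the fields) are illustrative assumptions, and `toy_score` is a keyword-matching stand-in for a real reward model; the ~75% vs >99% gap is the paper's result, not something this toy reproduces.

```python
# Sketch: full-profile vs. Oracle evaluation. In the full-profile setting
# the RM sees every value/style preference (mostly irrelevant to the
# prompt); in the Oracle setting it sees only the relevant aspect.
from dataclasses import dataclass

@dataclass
class EvalItem:
    prompt: str
    chosen: str            # response matching the user's relevant preference
    rejected: str          # response violating it
    full_profile: str      # all value + style preferences
    relevant_aspect: str   # Oracle setting: only the aspect that matters

def toy_score(context: str, prompt: str, response: str) -> float:
    # Stand-in reward: count context keywords found in the response.
    return sum(word in response for word in context.split())

def accuracy(items, context_fn) -> float:
    hits = sum(
        toy_score(context_fn(it), it.prompt, it.chosen)
        > toy_score(context_fn(it), it.prompt, it.rejected)
        for it in items
    )
    return hits / len(items)

items = [EvalItem(
    prompt="Explain inflation.",
    chosen="A short, plain explanation.",
    rejected="A verbose, jargon-heavy explanation.",
    full_profile="secular self-expression concise plain warm",
    relevant_aspect="concise plain",
)]
print("full-profile acc:", accuracy(items, lambda it: it.full_profile))
print("oracle acc:     ", accuracy(items, lambda it: it.relevant_aspect))
```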
@GhateKshitish
Kshitish Ghate
2 months
We generate pairs where the responses differ only in value alignment, only in style, or where value and style preferences conflict across the two responses. This lets us isolate whether models can identify and adapt to the relevant dimension for each prompt in the presence of confounds.
1
0
3
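A sketch of the three controlled pair types this describes. The field names and example strings are assumptions for exposition, not the released dataset schema.

```python
# Illustrative data structure for the three pair conditions:
# "value_only", "style_only", and "value_style_conflict".
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str      # response the user's profile says is preferred
    rejected: str    # response that violates the relevant preference
    condition: str   # "value_only" | "style_only" | "value_style_conflict"

# In the conflict condition, one response matches the user's VALUES but not
# their style, while the other matches their STYLE but not their values;
# the ground truth marks the value-aligned response as "chosen".
conflict = PreferencePair(
    prompt="Should I prioritize job security or creative work?",
    chosen="(value-aligned, style-mismatched response)",
    rejected="(style-aligned, value-mismatched response)",
    condition="value_style_conflict",
)
```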
@GhateKshitish
Kshitish Ghate
2 months
We need controlled variation of values AND styles to test RM steerability. We generate ~166k synthetic preference pairs with profiles that systematically vary: • 4 value dimensions (World Values Survey) • 4 style dimensions (verbosity, confidence, warmth, reading difficulty)
1
0
3
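A minimal sketch (not the authors' released code) of how a profile grid over these dimensions could be enumerated. The style poles follow the tweet; the World Values Survey axes named in Finding 2 cover two of the four value dimensions, so the other two are placeholders here.

```python
# Enumerate every combination of value and style preferences to build
# synthetic user profiles. Pole labels are illustrative assumptions.
from itertools import product

VALUE_DIMS = {
    "tradition": ["traditional", "secular"],        # WVS axis
    "expression": ["survival", "self-expression"],  # WVS axis
    "value_dim_3": ["pole_a", "pole_b"],            # placeholder: paper has 4 value dims
    "value_dim_4": ["pole_a", "pole_b"],            # placeholder
}
STYLE_DIMS = {
    "verbosity": ["concise", "verbose"],
    "confidence": ["hedged", "confident"],
    "warmth": ["cold", "warm"],
    "reading_difficulty": ["simple", "complex"],
}

def profile_grid():
    """Yield every combination of value and style preferences."""
    dims = {**VALUE_DIMS, **STYLE_DIMS}
    names = list(dims)
    for combo in product(*dims.values()):
        yield dict(zip(names, combo))

profiles = list(profile_grid())
print(len(profiles), "base profiles")  # 2**8 = 256; pairing prompts/responses scales this up
```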
@GhateKshitish
Kshitish Ghate
2 months
Benchmarks like RewardBench test general RM performance in an aggregate sense. The PRISM benchmark has diverse human preferences but lacks ground-truth value/style labels for controlled evaluation. https://t.co/dFEMR0opBG https://t.co/iJAeNSuBUq
1
0
3
@GhateKshitish
Kshitish Ghate
2 months
LLMs serve users with different values (traditional vs secular, survival vs self-expression) and style preferences (verbosity, confidence, warmth, reading difficulty). As a result, we need RMs that can adapt to individual preferences, not just optimize for an "average" user.
1
0
3
@ma_tay_
Taylor Sorensen
2 months
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
49
194
@GhateKshitish
Kshitish Ghate
2 months
Check out our new paper that uses simulated moral dilemmas to study how LLMs prioritize different values!
@uilydna
Andy Liu
2 months
🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully-automated evaluation pipeline that reveals how models rank values under conflict.
0
0
2
@SmithaMilli
smitha milli
5 months
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
12
70
331
@devanshrjain
Devansh Jain
7 months
Thrilled to launch Prompt Adaptation, a state-of-the-art agentic system to automate prompt engineering 🚀
@tomas_hk
Tomas Hernando Kofman
7 months
Today we’re launching Prompt Adaptation, a state-of-the-art agentic system that automatically adapts prompts across LLMs. Prompt Adaptation outperforms all other methods and significantly improves accuracy over manual prompt engineering, saving you thousands of hours per year.
1
4
9
@gvrkiran
Kiran Garimella
7 months
This dataset paper offers a rare glimpse into how LLMs are actually used in the wild. Over 94k real-world use cases, mapped by occupation and application type. A nice addition to the Anthropic paper I tweeted a while ago for studying AI's societal impact. https://t.co/PdcIN2I4Q4
4
19
64
@m2saxon
Michael Saxon ✈️ NeurIPS SD
7 months
Super excited that I'll be joining the @UW @TechPolicyLab and @uwnlp as a postdoc in the Fall working with @aylin_cim to continue my research directions in situated evaluation, multimodal/lingual GenAI, and start exploring new directions in safety and alignment! Open to collabs😉
52
7
214