Sayan Ghosh Profile
Sayan Ghosh

@sayan__ghosh

Followers
176
Following
9K
Media
7
Statuses
31

PhD Student at @nlp_usc; Formerly @umich

Joined January 2014
@sayan__ghosh
Sayan Ghosh
6 months
RT @_abraranwar: All these VLAs allow robots to do more tasks, but when you're physically testing many policies, it's hard to eval on every…
0
17
0
@sayan__ghosh
Sayan Ghosh
1 year
RT @jetskitosway2: playing the weirdest Autechre album for the largest crowd possible, he’s really living my dream
0
61
0
@sayan__ghosh
Sayan Ghosh
1 year
RT @joemuggs: Weird, I've written and talked a lot about "the trip hop era" but one album that rarely came up in conversation in my researc…
0
61
0
@sayan__ghosh
Sayan Ghosh
1 year
RT @m2saxon: Sick of LM capability overclaims? Sick of arbitrary, meaningless benchmarks? Endless hill climbing? Our COLM paper on 𝐌𝐨𝐝𝐞𝐥 𝐌…
arxiv.org
Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but...
0
36
0
@sayan__ghosh
Sayan Ghosh
1 year
RT @BillPourquoimec: Ryuichi Sakamoto drumming.
0
1K
0
@sayan__ghosh
Sayan Ghosh
1 year
SEPARABILITY can be a valuable measure for model users and developers to compare LLMs more effectively by giving more weight to test instances that are likely to elicit reliable preference ratings. Many thanks to my collaborators @_Tejas_S_ @swabhz.
0
0
6
@sayan__ghosh
Sayan Ghosh
1 year
How else can a model developer/user use SEPARABILITY? We incorporate SEPARABILITY into Elo scores for more nuanced comparisons between models. 6/n
1
0
4
@sayan__ghosh
Sayan Ghosh
1 year
How can a model developer/user use SEPARABILITY? Calculating the distribution of SEPARABILITY over a benchmark (e.g. CNN/DailyMail) gives insights into how useful a benchmark may be for model comparison. 5/n
1
0
3
@sayan__ghosh
Sayan Ghosh
1 year
We find that test instances with high SEPARABILITY are more likely to result in more consistent preference ratings from both human- and auto-raters. 4/n
1
0
2
@sayan__ghosh
Sayan Ghosh
1 year
Introducing SEPARABILITY: a measure that incorporates the amount of variability within each model’s outputs and between the two models’ outputs to measure how distinguishable two models’ sets of generations are for a particular test instance. 3/n
1
0
2
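A rough illustration of the idea in the tweet above, not the paper's actual formula: the sketch below scores one test instance by comparing how similar each model's samples are to each other against how similar the two models' samples are to one another. The token-level Jaccard similarity, the function names, and the "within minus cross" combination are all assumptions made for illustration only.

```python
from itertools import combinations, product
from statistics import mean


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for any text similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def within_similarity(outputs: list[str]) -> float:
    """Mean similarity over all pairs of samples from one model."""
    pairs = list(combinations(outputs, 2))
    # Assumption: a single sample is treated as perfectly self-consistent.
    return mean(jaccard(x, y) for x, y in pairs) if pairs else 1.0


def cross_similarity(outs_a: list[str], outs_b: list[str]) -> float:
    """Mean similarity over all pairs drawn from the two models' samples."""
    return mean(jaccard(x, y) for x, y in product(outs_a, outs_b))


def separability_sketch(outs_a: list[str], outs_b: list[str]) -> float:
    """Hypothetical score in [0, 1]: high when each model is self-consistent
    but the two models' outputs differ from each other."""
    within = mean([within_similarity(outs_a), within_similarity(outs_b)])
    return max(0.0, within - cross_similarity(outs_a, outs_b))
```

Under these assumptions, two models that each repeat their own answer but share no tokens with each other score 1.0, while two models producing identical outputs score 0.0.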
@sayan__ghosh
Sayan Ghosh
1 year
How do we do this? We expose two sources of variability that can complicate preference rating collection when comparing two LLMs:
- The outputs from two models being compared have high similarity.
- Each model’s own outputs for the same input have low similarity.
In these cases,
1
0
3
@sayan__ghosh
Sayan Ghosh
1 year
Pairwise preference judgments are now standard for evaluating LLM generations and preference-tuning LLMs. In our new paper, we ask: are all test instances equally suitable for preference ratings? We introduce a meta-evaluation measure called SEPARABILITY to estimate how likely a
1
9
51
@sayan__ghosh
Sayan Ghosh
2 years
RT @icwsm: The Outstanding Methodology Paper Award for ICWSM 2023 goes to: "Bridging nations: quantifying the role of multilinguals in comm…
0
4
0
@sayan__ghosh
Sayan Ghosh
2 years
RT @perkopair: I'm Your Brother (Club Version).
0
5
0
@sayan__ghosh
Sayan Ghosh
4 years
RT @vinodkpg: This was a long, but fun, informal collaboration for me. I met Sayan at ACL 2019, and I think we started chalking up this ide…
0
1
0