Sayan Ghosh @sayan__ghosh X Profile

Sayan Ghosh

@sayan__ghosh

Followers

176

Following

9K

Media

7

Statuses

31

PhD Student at @nlp_usc; Formerly @umich

Joined January 2014

Sayan Ghosh

@sayan__ghosh

6 months

RT @_abraranwar: All these VLAs allow robots to do more tasks, but when you're physically testing many policies, it's hard to eval on every…

0

17

0

Sayan Ghosh

@sayan__ghosh

1 year

RT @jetskitosway2: playing the weirdest Autechre album for the largest crowd possible, he’s really living my dream.

0

61

0

Sayan Ghosh

@sayan__ghosh

1 year

RT @joemuggs: Weird, I've written and talked a lot about "the trip hop era" but one album that rarely came up in conversation in my researc…

0

61

0

Sayan Ghosh

@sayan__ghosh

1 year

RT @m2saxon: Sick of LM capability overclaims? Sick of arbitrary, meaningless benchmarks? Endless hill climbing? Our COLM paper on 𝐌𝐨𝐝𝐞𝐥 𝐌…

arxiv.org

Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but...

0

36

0

Sayan Ghosh

@sayan__ghosh

1 year

RT @BillPourquoimec: Ryuichi Sakamoto drumming.

0

1K

0

Sayan Ghosh

@sayan__ghosh

1 year

SEPARABILITY can be a valuable measure for model users and developers to compare LLMs more effectively by giving more weight to test instances that are likely to elicit reliable preference ratings. Many thanks to my collaborators @_Tejas_S_ @swabhz.

0

6

Sayan Ghosh

@sayan__ghosh

1 year

How else can a model developer/user use SEPARABILITY? We incorporate SEPARABILITY into Elo scores for more nuanced comparisons between models. 6/n

1

0

4

Sayan Ghosh

@sayan__ghosh

1 year

How can a model developer/user use SEPARABILITY? Calculating the distribution of SEPARABILITY over a benchmark (e.g. CNN/DailyMail) gives insights into how useful a benchmark may be for model comparison. 5/n

1

0

3

Sayan Ghosh

@sayan__ghosh

1 year

We find that test instances with high SEPARABILITY are more likely to yield consistent preference ratings from both human- and auto-raters. 4/n

1

0

2

Sayan Ghosh

@sayan__ghosh

1 year

Introducing SEPARABILITY: a measure that incorporates the amount of variability within each model’s outputs and between the two models’ outputs to measure how distinguishable two models’ sets of generations are for a particular test instance. 3/n

1

0

2

Sayan Ghosh

@sayan__ghosh

1 year

How do we do this? We expose two sources of variability that can complicate preference rating collection when comparing two LLMs:
- The outputs from two models being compared have high similarity.
- Each model’s own outputs for the same input have low similarity.
In these cases,

1

0

3

Sayan Ghosh

@sayan__ghosh

1 year

Pairwise preference judgments are now standard for evaluating LLM generations and preference-tuning LLMs. In our new paper, we ask: are all test instances equally suitable for preference ratings? We introduce a meta-evaluation measure called SEPARABILITY to estimate how likely a

1

9

51

Sayan Ghosh

@sayan__ghosh

2 years

RT @icwsm: The Outstanding Methodology Paper Award for ICWSM 2023 goes to: "Bridging nations: quantifying the role of multilinguals in comm…

0

4

0

Sayan Ghosh

@sayan__ghosh

2 years

RT @perkopair: I'm Your Brother (Club Version).

0

5

0

Sayan Ghosh

@sayan__ghosh

4 years

RT @vinodkpg: This was a long, but fun, informal collaboration for me. I met Sayan at ACL 2019, and I think we started chalking up this ide…

0

1

0