Xiao Wang @Sandy_wx95 X Profile

Xiao Wang

@Sandy_wx95

Followers

5

Following

5

Media

12

Statuses

17

PhD student in the NLP group at the University of Manchester

Joined October 2016

Xiao Wang

@Sandy_wx95

17 days

RT @Manchester_NLP: Delighted to share the #ACL2025 and TACL papers (10 in total) from the @manchester_nlp group! Come chat with our staff…

0

8

0

Xiao Wang

@Sandy_wx95

3 months

RT @chenghua_lin: Thanks for sharing our paper! Human evaluators respond differently to the same guidelines, so why assume a single evaluat…

0

3

0

Xiao Wang

@Sandy_wx95

4 months

9/n Thanks to our co-authors: @Rexhaif @siweiwu7 @yiqi_617 @egere14 @NafiseSadat @chenghua_lin.

0

1

4

Xiao Wang

@Sandy_wx95

4 months

8/n Case Study Analysis: ContrastScore leverages probability discrepancy to align with human judgment. Overall: ContrastScore can achieve higher-quality, less biased, and more efficient evaluation of generated text.

1

4

Xiao Wang

@Sandy_wx95

4 months

7/n Ablation Study Insights:
Contrastive formulation matters!
• Our subtraction-based contrastive formulation: Llama(8B, 3B) 0.498, Qwen(7B, 3B) 0.470
• Original ratio-based formulation for decoding (Li et al., 2023): Llama(8B, 3B) 0.429, Qwen(7B, 3B) 0.435

1

2
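
The two formulations contrasted in 7/n can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the post does not give the formula, so the weight `gamma`, the floor `EPS`, the per-token averaging, and both function names are assumptions made here. The inputs stand for the probability that a larger and a smaller model assign to each token of the text being scored.

```python
import math
from typing import Sequence

EPS = 1e-10  # assumed floor so the logarithm stays defined


def subtraction_score(p_large: Sequence[float], p_small: Sequence[float],
                      gamma: float = 0.1) -> float:
    """Hypothetical subtraction-based score: mean log(p_large - gamma * p_small)."""
    total = 0.0
    for pl, ps in zip(p_large, p_small):
        # Subtract a scaled share of the smaller model's probability, then floor.
        total += math.log(max(pl - gamma * ps, EPS))
    return total / len(p_large)


def ratio_score(p_large: Sequence[float], p_small: Sequence[float]) -> float:
    """Ratio-based score: mean log(p_large / p_small), as in contrastive decoding."""
    total = 0.0
    for pl, ps in zip(p_large, p_small):
        total += math.log(pl) - math.log(ps)
    return total / len(p_large)


# Made-up token probabilities, for illustration only.
p_large = [0.9, 0.6, 0.8]
p_small = [0.5, 0.5, 0.5]
print(subtraction_score(p_large, p_small))
print(ratio_score(p_large, p_small))
```

With `gamma = 0` the subtraction form reduces to the larger model's mean log-probability, while the ratio form is zero whenever the two models agree on every token.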

Xiao Wang

@Sandy_wx95

4 months

6/n Improving efficiency with faster processing speed:
• MT: ContrastScore(Llama 3B, Llama 1B) is 1.5x faster than a single Llama 8B
• SUM: ContrastScore(Qwen 3B, Qwen 0.5B) is 1.7x faster than a single Qwen 7B

1

2

Xiao Wang

@Sandy_wx95

4 months

5/n Mitigating length biases in summarization:
• Single Llama 3B: 0.289 -> ContrastScore(Llama 3B, Llama 1B): 0.086 (-70.2%)
• Single Qwen 3B: 0.349 -> ContrastScore(Qwen 3B, Qwen 0.5B): 0.253 (-27.5%)

1

2

Xiao Wang

@Sandy_wx95

4 months

4/n Mitigating biases in likelihood preferences:
• MT: Single Llama 8B: 0.352 -> ContrastScore(Llama 8B, Llama 3B): 0.137 (-61.1%)
• SUM: Single Llama 8B: 0.381 -> ContrastScore(Llama 8B, Llama 3B): 0.240 (-37.0%)

1

2

Xiao Wang

@Sandy_wx95

4 months

3/n Achieving higher correlation with human judgments:
• Single Llama 8B: 0.457 -> ContrastScore(Llama 8B, Llama 3B): 0.498 (+9.0%)
• Single Qwen 7B: 0.442 -> ContrastScore(Qwen 7B, Qwen 3B): 0.470 (+6.3%)

1

2
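
The percentages in the 3/n, 4/n, and 5/n posts above appear to be plain relative changes, (after - before) / before. A small sketch that recomputes them from the quoted values, where the helper name `relative_change` and the labels are chosen here for illustration:

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, rounded to one decimal place."""
    return round((after - before) / before * 100, 1)


# (label, single-model value, ContrastScore value) as quoted in the thread
reported = [
    ("3/n correlation, Llama 8B vs (8B, 3B)", 0.457, 0.498),
    ("3/n correlation, Qwen 7B vs (7B, 3B)", 0.442, 0.470),
    ("4/n likelihood bias, MT, Llama", 0.352, 0.137),
    ("4/n likelihood bias, SUM, Llama", 0.381, 0.240),
    ("5/n length bias, Llama 3B vs (3B, 1B)", 0.289, 0.086),
    ("5/n length bias, Qwen 3B vs (3B, 0.5B)", 0.349, 0.253),
]

for label, before, after in reported:
    print(f"{label}: {relative_change(before, after):+.1f}%")
```

Run as written, this reproduces the figures in the posts: +9.0%, +6.3%, -61.1%, -37.0%, -70.2%, and -27.5%.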

Xiao Wang

@Sandy_wx95

4 months

2/n We introduce a novel contrastive evaluation metric for assessing generated text, and test on two tasks: machine translation (MT) and summarization (SUM).

1

3

Xiao Wang

@Sandy_wx95

4 months

1/n Delighted to share the release of our new preprint "ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation".
📄 Paper:
💻 Code:

1

6

10

Xiao Wang

@Sandy_wx95

5 months

RT @gowitheflow98: Introducing the 3rd edition of BioLaySumm shared task hosted at the BioNLP Workshop @ #ACL2025NLP!

0

5

0

Xiao Wang

@Sandy_wx95

9 months

RT @Manchester_NLP: We’re pleased to share that the Manchester NLP Group will be presenting *11 papers* at #EMNLP2024. Feel free to drop by…

0

13

0