Xiao Wang Profile
Xiao Wang

@Sandy_wx95

Followers
5
Following
5
Media
12
Statuses
17

PhD student in the NLP group at the University of Manchester

Joined October 2016
@Sandy_wx95
Xiao Wang
17 days
RT @Manchester_NLP: Delighted to share the #ACL2025 and TACL papers (10 in total) from the @manchester_nlp group! Come chat with our staff…
0
8
0
@Sandy_wx95
Xiao Wang
3 months
RT @chenghua_lin: Thanks for sharing our paper! Human evaluators respond differently to the same guidelines, so why assume a single evaluat…
0
3
0
@Sandy_wx95
Xiao Wang
4 months
8/n Case Study Analysis: ContrastScore leverages probability discrepancy to align with human judgment. Overall: ContrastScore can achieve higher-quality, less biased, and more efficient evaluation of generated text.
1
1
4
@Sandy_wx95
Xiao Wang
4 months
7/n Ablation Study Insights:
Contrastive formulation matters!
• Our subtraction-based contrastive formulation: Llama(8B, 3B) 0.498, Qwen(7B, 3B) 0.470
• Original ratio-based formulation for decoding (Li et al., 2023): Llama(8B, 3B) 0.429, Qwen(7B, 3B) 0.435
1
1
2
@Sandy_wx95
Xiao Wang
4 months
6/n Improving efficiency with faster processing speed:
• MT: ContrastScore(Llama 3B, 1B) is 1.5x faster than a single Llama 8B
• SUM: ContrastScore(Qwen 3B, 0.5B) is 1.7x faster than a single Qwen 7B
1
1
2
@Sandy_wx95
Xiao Wang
4 months
5/n Mitigating length biases in summarization:
• Single Llama 3B: 0.289 -> ContrastScore(Llama 3B, Llama 1B): 0.086 (-70.2%)
• Single Qwen 3B: 0.349 -> ContrastScore(Qwen 3B, Qwen 0.5B): 0.253 (-27.5%)
1
1
2
@Sandy_wx95
Xiao Wang
4 months
4/n Mitigating biases in likelihood preferences:
• MT: Single Llama 8B: 0.352 -> ContrastScore(Llama 8B, Llama 3B): 0.137 (-61.1%)
• SUM: Single Llama 8B: 0.381 -> ContrastScore(Llama 8B, Llama 3B): 0.240 (-37.0%)
1
1
2
@Sandy_wx95
Xiao Wang
4 months
3/n Achieving higher correlation with human judgments:
• Single Llama 8B: 0.457 -> ContrastScore(Llama 8B, Llama 3B): 0.498 (+9.0%)
• Single Qwen 7B: 0.442 -> ContrastScore(Qwen 7B, Qwen 3B): 0.470 (+6.3%)
1
1
2
@Sandy_wx95
Xiao Wang
4 months
2/n We introduce a novel contrastive evaluation metric for assessing generated text, and test on two tasks: machine translation (MT) and summarization (SUM).
1
1
3
@Sandy_wx95
Xiao Wang
4 months
1/n Delighted to share the release of our new preprint "ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation".
📄Paper: 💻Code:
1
6
10
@Sandy_wx95
Xiao Wang
5 months
RT @gowitheflow98: Introducing the 3rd edition of BioLaySumm shared task hosted at the BioNLP Workshop @ #ACL2025NLP!
0
5
0
@Sandy_wx95
Xiao Wang
9 months
RT @Manchester_NLP: We’re pleased to share that the Manchester NLP Group will be presenting *11 papers* at #EMNLP2024. Feel free to drop by…
0
13
0