Tuhina Tripathi
@tuhina_tripathi
Followers: 58 | Following: 253 | Media: 1 | Statuses: 9
PhD at UMass Amherst | MS @CUBoulder
Amherst, MA
Joined June 2018
We have been overlooking a key factor in LLM-as-a-judge evaluation: the feedback collection protocol. Our #COLM2025 paper presents a comprehensive study on how feedback protocols shape reliability and bias in LLM evaluations.
We did this work with amazing collaborators @ManyaWadhwa1, @gregd_nlp, and @scottniekum. Read the full paper:
Generator models can exploit these distractors to game rankings and artificially climb leaderboards, making evaluation protocols a crucial design choice for fair comparison.
Distractor features can make an LLM judge change its preference even when one response isn’t actually better. We observed these preference flips in 35% of pairwise evaluations, compared to just 9% with absolute scoring.
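A rough sketch of how a preference-flip rate like the one above might be measured, in Python. Everything here is an assumption for illustration: the function names, the "A"/"B"/"tie" labels, and the rule that a flip means the judge's preferred response changes once a distractor is added. The paper's actual protocol may differ.

```python
from typing import List, Tuple


def flip_rate(before: List[str], after: List[str]) -> float:
    """Fraction of items whose preferred response changed.

    `before` and `after` hold the judge's preference ("A", "B", or "tie")
    for the same items, without and with a distractor (hypothetical setup).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if not before:
        return 0.0
    flips = sum(1 for b, a in zip(before, after) if b != a)
    return flips / len(before)


def preference_from_scores(score_a: float, score_b: float) -> str:
    """Derive a preference from two independent absolute scores."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


def absolute_flip_rate(
    before: List[Tuple[float, float]],
    after: List[Tuple[float, float]],
) -> float:
    """Flip rate for absolute scoring: compare the implied preferences."""
    prefs_before = [preference_from_scores(a, b) for a, b in before]
    prefs_after = [preference_from_scores(a, b) for a, b in after]
    return flip_rate(prefs_before, prefs_after)


# Toy inputs, made up for illustration only (not data from the paper).
pairwise_before = ["A", "A", "B", "B"]
pairwise_after = ["A", "B", "B", "A"]  # two of four preferences changed
scores_before = [(4.0, 3.0), (2.0, 5.0)]
scores_after = [(4.0, 3.0), (3.0, 5.0)]  # scores moved, ranking did not

print(flip_rate(pairwise_before, pairwise_after))
print(absolute_flip_rate(scores_before, scores_after))
```

Under this definition, an absolute score can shift without counting as a flip as long as the implied ranking stays the same. That is one way a scoring protocol could absorb a distractor that a direct pairwise comparison would not.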
Pairwise feedback is far easier to distract with spurious features like length or assertiveness. Absolute scoring of individual responses is substantially more robust to these superficial factors.
After a MEGA finals run in which we scored 18 points, Team MARBLE has placed 3rd in the @DARPA #SubTChallenge! We are extremely happy with our performance as a team. Congratulations to the 1st and 2nd place finishers @CerberusSubt and @CSIRORobotics!!
Dear @airindiain, it has been a terrible experience with the airline. Firstly, my flight was canceled, but I still haven't received any message or email about it. After finding out about the cancellation on my own, I spent 6 hours trying to reach your ticketing department 👇