Tuhina Tripathi @tuhina_tripathi X Profile

Tuhina Tripathi

@tuhina_tripathi

Followers: 58

Following: 253

Media: 1

Statuses: 9

PhD at UMass Amherst | MS @CUBoulder

https://t.co/qHkYhoOvHN

Amherst, MA

Joined June 2018

Tuhina Tripathi

@tuhina_tripathi

1 month

We have been overlooking a key factor in LLM-as-a-judge evaluation: the feedback collection protocol. Our #COLM2025 paper presents a comprehensive study on how feedback protocols shape reliability and bias in LLM evaluations.

Tuhina Tripathi

@tuhina_tripathi

1 month

We did this work with amazing collaborators @ManyaWadhwa1, @gregd_nlp, and @scottniekum. Read the full paper:

Tuhina Tripathi

@tuhina_tripathi

1 month

Generator models can exploit these distractors to game rankings and artificially climb leaderboards, making evaluation protocols a crucial design choice for fair comparison.

Tuhina Tripathi

@tuhina_tripathi

1 month

Distractor features can make an LLM judge change its preference even when one response isn’t actually better. We observed these preference flips in 35% of pairwise evaluations, compared to just 9% with absolute scoring.
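
To make the "preference flip" idea concrete, here is a minimal sketch of how such a rate could be computed for each protocol. The thread does not give the paper's exact metric, so the definition used here (fraction of response pairs whose judged winner changes once a distractor is added) is an assumption, and the helper names `flip_rate` and `preference_from_scores` and all judgments below are made-up toy values, not results from the paper.

```python
from typing import Sequence


def flip_rate(before: Sequence[str], after: Sequence[str]) -> float:
    """Fraction of pairs whose judged winner changes after a distractor is added.

    Assumed definition for illustration; not taken from the paper.
    """
    if len(before) != len(after):
        raise ValueError("preference lists must have the same length")
    if not before:
        raise ValueError("need at least one evaluation")
    flips = sum(b != a for b, a in zip(before, after))
    return flips / len(before)


def preference_from_scores(score_a: float, score_b: float) -> str:
    """Turn two absolute scores into an implied preference: 'A', 'B', or 'tie'."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


# Toy pairwise judgments (hypothetical): the judge picks a winner directly.
pairwise_before = ["A", "A", "B", "A"]
pairwise_after = ["B", "A", "B", "B"]  # same pairs, distractor added to B
pairwise_flip = flip_rate(pairwise_before, pairwise_after)

# Toy absolute scores (hypothetical): each response is scored on its own,
# and the preference is derived by comparing the two scores.
scores_before = [(4, 3), (5, 2), (2, 4), (3, 3)]
scores_after = [(4, 3), (5, 3), (2, 4), (3, 4)]
absolute_before = [preference_from_scores(a, b) for a, b in scores_before]
absolute_after = [preference_from_scores(a, b) for a, b in scores_after]
absolute_flip = flip_rate(absolute_before, absolute_after)

print(f"pairwise flip rate: {pairwise_flip:.2f}")
print(f"absolute flip rate: {absolute_flip:.2f}")
```

Treating a move from a tie to a strict preference as a flip is one possible convention; a stricter definition might count only winner reversals.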

Tuhina Tripathi

@tuhina_tripathi

1 month

Pairwise feedback is far easier to distract with spurious features like length or assertiveness. Absolute scoring of individual responses is substantially more robust to these superficial factors.

MARBLE-Boulder

@BoulderMarble

4 years

After a MEGA finals run in which we scored 18 points, Team MARBLE has placed 3rd in the @DARPA #SubTChallenge! We are extremely happy with our performance as a team. Congratulations to the 1st and 2nd place finishers @CerberusSubt and @CSIRORobotics!!

Arpit Bahety

@ArpitBahety

4 years

Dear @airindiain, it has been a terrible experience with the airline. Firstly, my flight was canceled, but I still haven't received any message or email about it. After finding out about the cancellation somehow, I spent 6 hours trying to connect to your ticketing department 👇
