
Clayton Thorrez
@cthorrez
Followers
1K
Following
10K
Media
438
Statuses
3K
Rating systems and paired comparison experimentation enjoyer @arena Previous: ML @umich @umass @microsoft @apple
Joined March 2016
If you're good at data science/machine learning engineering/data engineering and want to exercise those skills on the most interesting and dynamic human preference dataset collected by mankind at @lmarena_ai, DM me
2
2
10
Next day delivery 💪
🚨 Text Leaderboard Update: Community votes are in, and @anthropicAI's Claude Haiku 4.5 ranks #22! It has quickly become one of the best value models on the most competitive leaderboard. It delivers a solid punch at a fraction of the cost of its bigger siblings. ➡️ A few
0
1
6
glicko2 still undefeated in accuracy and log loss. I'm actually thinking of offering a bounty on this haha, can anyone come up with a better general purpose dynamic skill rating system than one from 2001?
0
0
1
EsportsBench v7 49k new matches from 6/30/2025 to 9/30/2025 https://t.co/j07qXG8ZXN
huggingface.co
1
0
3
Come check these cool new models out in our discord! (And then stay and have cool discussions with me about ranking and rating systems in the #leaderboards channel)
🚨🎬 Veo 3.1 and Veo 3.1 Fast are in the Video Arena! Come see what all the chatter is about by trying it yourself. Your real-world prompts will push the @googledeepmind video models to their true limits
0
1
6
lowkey miss standards of business conduct
When I worked at Microsoft, it was mandatory for all employees to watch this video about how a former employee made $400K trading company stock with insider info. Got 2 years in jail. The message was to never inside trade company stock. Crypto is not company stock though, is it?
0
0
2
LLMs are so RL'd by errors in their sandboxes that they explicitly disobey direct instructions
0
0
2
Look, I know it's popular these days to jump on the arena bandwagon, but this is wild. Insane to brand this as "creative"
so it kinda looks like Contra the "creative network" basically ripped off @designarena_ai's entire concept and their website design
1
0
5
me: writes some code and starts to write a comment. LLM: # this is a bit of a hack
0
0
1
Oh sweet! @glicko is publishing the videos from NESSIS 2025 on YouTube! https://t.co/0UkaoMKvWy Check them out if you are interested in sports statistics stuff.
0
0
1
My side project of ranking esports players and teams is how I built the skills to do the side project I used to get hired at LMArena. Now my full time job is rating and ranking :D
my side projects that i did in 2024 gave me a full time role at FAL where i get to do the thing which excites me the most i.e. optimizing ML inference: people underestimate the power of side projects
1
1
7
very neck and neck
🚨 Leaderboard Update: we have a four-way tie for #1 in the Arena! The very top tier is now tied across the strongest models in the world: Claude Sonnet 4.5 32k Thinking, Claude Sonnet 4.5 standard, Claude Opus 4.1, and Gemini 2.5 Pro. All separated by just a few Arena
0
0
4
Re-introducing Categories in Vision Arena! Since we first introduced categories over two years ago (and Vision Arena last year), the AI evaluation landscape has grown rapidly. Categories let us zoom in on model performance for specific areas, from captioning to diagrams. 🧵
1
9
101
everyone say thank you Mr. π
0
0
3
🚨 Big leaderboard update on the toughest Arena to crack: Text. Seven new models landed today, and five broke straight into the Top 10. 🔹#8: Qwen3-VL-235B-a22b-Instruct & Qwen3-Max-2025-09-23 (tied) by @alibaba_qwen 🔹#9: DeepSeek V3.1 Terminus (Standard & Thinking
11
26
197