
Colin White
@crwhite_ml
Followers: 2K
Following: 3K
Media: 64
Statuses: 440
Evaluating generative AI models. Research Scientist at @MetaAI. Prev @abacusai, @SCSatCMU
Bay Area
Joined June 2019
LiveBench is a Spotlight Paper at #ICLR2025! I am sadly not in Singapore, but stop by the poster (Friday 3pm Singapore time) to chat with @ArkaPal999, Ben Feuer, and @micahgoldblum!
0
0
18
RT @yisongyue: One of my PhD students got their visa revoked. I know of other cases amongst my AI colleagues. This is not what investing…
0
171
0
RT @Ahmad_Al_Dahle: Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I…
0
933
0
RT @DKokotajlo: "How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @…
0
1K
0
RT @nick11roberts: 📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super exci…
0
172
0
RT @micahgoldblum: Here’s an easy trick for improving the performance of gradient-boosted decision trees like XGBoost allowing them to read…
0
92
0
RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to…
0
775
0
RT @FrankRHutter: The data science revolution is getting closer. TabPFN v2 is published in Nature: On tabular class…
0
253
0
RT @OfficialLoganK: It is not just vibes, gemini-exp-1206 has really made significant progress (#2 overall on Livebench), can't wait to tes…
0
64
0
We have now added all these models to LiveBench! (the models from the past week that claimed #1 on Chatbot Arena 🏆) LiveBench focuses more on hard reasoning questions, which makes it complementary to Chatbot Arena (both worthwhile but test different dimensions)
Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind’s just released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅Overall with the latest GPT-4o-1120 in Arena! Ranking gains since Gemini-Exp-1114:
- Overall #3 → #1
- Overall (StyleCtrl): #5 → #2
- Hard
5
3
29
RT @shaunralston: Thanks, @crwhite_ml, for the update; feels more aligned with what we see on the backend. https://…
0
2
0
RT @khodakmoments: 🧵 on surprising revelations from our study of specialized foundation models (FMs beyond vision/text): after evaluating d…
0
62
0
RT @osanseviero: BREAKING NEWS. The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Literature to the Attent…
0
181
0
RT @polynoamial: @geoframeai @OpenAIDevs I told it that it's a new model from @OpenAI and asked it to determine what's special about it. In…
0
6
0
RT @bindureddy: O1-preview IS THE NEW KING - TOPS LIVEBENCH AI. WOOT! o1-preview is the new top model and beats Sonnet 3.5 considerably. o…
0
80
0