Colin White

@crwhite_ml

Followers
2K
Following
3K
Media
64
Statuses
440

Evaluating generative AI models. Research Scientist at @MetaAI. Prev @abacusai, @SCSatCMU

Bay area
Joined June 2019
@crwhite_ml
Colin White
22 days
RT @AdamZweiger: @2prime_PKU everyone always asks who/what is adam. never how is adam.
0
27
0
@crwhite_ml
Colin White
4 months
LiveBench is a Spotlight Paper at #ICLR2025! I am sadly not in Singapore, but stop by the poster (Friday 3pm Singapore time) to chat with @ArkaPal999, Ben Feuer, @micahgoldblum!
0
0
18
@crwhite_ml
Colin White
4 months
RT @yisongyue: One of my PhD students got their visa revoked. I know of other cases amongst my AI colleagues. This is not what investing…
0
171
0
@crwhite_ml
Colin White
4 months
RT @Ahmad_Al_Dahle: Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I…
0
933
0
@crwhite_ml
Colin White
5 months
RT @DKokotajlo: "How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @…
0
1K
0
@crwhite_ml
Colin White
5 months
RT @nick11roberts: 📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super exci…
0
172
0
@crwhite_ml
Colin White
6 months
RT @micahgoldblum: Here’s an easy trick for improving the performance of gradient-boosted decision trees like XGBoost allowing them to read…
0
92
0
@crwhite_ml
Colin White
7 months
RT @bindureddy: Livebench Coding for o3 mini is ABSOLUTELY INSANE.
0
23
0
@crwhite_ml
Colin White
7 months
RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to…
0
775
0
@crwhite_ml
Colin White
7 months
RT @FrankRHutter: The data science revolution is getting closer. TabPFN v2 is published in Nature: On tabular class…
0
253
0
@crwhite_ml
Colin White
8 months
RT @AmandaAskell: Personal highlights from Claude's snarky AI comedy set.
0
491
0
@crwhite_ml
Colin White
8 months
RT @polynoamial: I think it's safe to say LLMs can reason.
0
97
0
@crwhite_ml
Colin White
8 months
RT @OfficialLoganK: It is not just vibes, gemini-exp-1206 has really made significant progress (#2 overall on Livebench), can't wait to tes…
0
64
0
@crwhite_ml
Colin White
9 months
We have now added all these models to LiveBench! (The models from the past week that claimed #1 on Chatbot Arena 🏆.) LiveBench focuses more on hard reasoning questions, which makes it complementary to Chatbot Arena (both are worthwhile but test different dimensions).
@lmarena_ai
lmarena.ai
9 months
Woah, huge news again from Chatbot Arena🔥
@GoogleDeepMind’s just released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅Overall with the latest GPT-4o-1120 in Arena!
Ranking gains since Gemini-Exp-1114:
- Overall: #3 -> #1
- Overall (StyleCtrl): #5 -> #2
- Hard
5
3
29
@crwhite_ml
Colin White
9 months
RT @shaunralston: Thanks, @crwhite_ml, for the update; feels more aligned with what we see on the backend. https://…
0
2
0
@crwhite_ml
Colin White
9 months
RT @khodakmoments: 🧵 on surprising revelations from our study of specialized foundation models (FMs beyond vision/text): after evaluating d…
0
62
0
@crwhite_ml
Colin White
10 months
RT @osanseviero: BREAKING NEWS. The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Literature to the Attent…
0
181
0
@crwhite_ml
Colin White
11 months
New Gemini models on . 🚨Gemini-1.5-Pro-002 is close to GPT-4o! 🚨Gemini-1.5-Flash-002 surprisingly achieves clear SotA on Instruction Following!
2
2
12
@crwhite_ml
Colin White
11 months
RT @polynoamial: @geoframeai @OpenAIDevs I told it that it's a new model from @OpenAI and asked it to determine what's special about it. In…
0
6
0
@crwhite_ml
Colin White
11 months
RT @bindureddy: O1-preview IS THE NEW KING - TOPS LIVEBENCH AI. WOOT! o1-preview is the new top model and beats Sonnet 3.5 considerably. o…
0
80
0