Colin White @crwhite_ml X Profile

Colin White

@crwhite_ml

Followers

2K

Following

3K

Media

64

Statuses

440

Evaluating generative AI models. Research Scientist at @MetaAI. Prev @abacusai, @SCSatCMU

Bay area

Joined June 2019

Colin White

@crwhite_ml

22 days

RT @AdamZweiger: @2prime_PKU everyone always asks who/what is adam. never how is adam.

0

27

0

Colin White

@crwhite_ml

4 months

LiveBench is a Spotlight Paper at #ICLR2025! I am sadly not in Singapore, but stop by the poster (Friday 3pm Singapore time) to chat with @ArkaPal999, Ben Feuer, @micahgoldblum!

0

18

Colin White

@crwhite_ml

4 months

RT @yisongyue: One of my PhD students got their visa revoked. I know of other cases amongst my AI colleagues. This is not what investing….

0

171

0

Colin White

@crwhite_ml

4 months

RT @Ahmad_Al_Dahle: Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I….

0

933

0

Colin White

@crwhite_ml

5 months

RT @DKokotajlo: "How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @….

0

1K

0

Colin White

@crwhite_ml

5 months

RT @nick11roberts: 📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super exci….

0

172

0

Colin White

@crwhite_ml

6 months

RT @micahgoldblum: Here’s an easy trick for improving the performance of gradient-boosted decision trees like XGBoost allowing them to read….

0

92

0

Colin White

@crwhite_ml

7 months

RT @bindureddy: Livebench Coding for o3 mini is ABSOLUTELY INSANE.

0

23

0

Colin White

@crwhite_ml

7 months

RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to….

0

775

0

Colin White

@crwhite_ml

7 months

RT @FrankRHutter: The data science revolution is getting closer. TabPFN v2 is published in Nature: On tabular class….

0

253

0

Colin White

@crwhite_ml

8 months

RT @AmandaAskell: Personal highlights from Claude's snarky AI comedy set.

0

491

0

Colin White

@crwhite_ml

8 months

RT @polynoamial: I think it's safe to say LLMs can reason.

0

97

0

Colin White

@crwhite_ml

8 months

RT @OfficialLoganK: It is not just vibes, gemini-exp-1206 has really made significant progress (#2 overall on Livebench), can't wait to tes….

0

64

0

Colin White

@crwhite_ml

9 months

We have now added all these models to LiveBench! (the models from the past week that claimed #1 on Chatbot Arena 🏆) LiveBench focuses more on hard reasoning questions, which makes it complementary to Chatbot Arena (both worthwhile but test different dimensions)

lmarena.ai

@lmarena_ai

9 months

Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind’s just released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅Overall with the latest GPT-4o-1120 in Arena! Ranking gains since Gemini-Exp-1114:
- Overall #3 → #1
- Overall (StyleCtrl): #5 -> #2
- Hard

5

3

29

Colin White

@crwhite_ml

9 months

RT @shaunralston: Thanks, @crwhite_ml, for the update; feels more aligned with what we see on the backend. https://….

0

2

0

Colin White

@crwhite_ml

9 months

RT @khodakmoments: 🧵 on surprising revelations from our study of specialized foundation models (FMs beyond vision/text): after evaluating d….

0

62

0

Colin White

@crwhite_ml

10 months

RT @osanseviero: BREAKING NEWS. The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Literature to the Attent….

0

181

0

Colin White

@crwhite_ml

11 months

New Gemini models on . 🚨Gemini-1.5-Pro-002 is close to GPT-4o! 🚨Gemini-1.5-Flash-002 surprisingly achieves clear SotA on Instruction Following!

2

12

Colin White

@crwhite_ml

11 months

RT @polynoamial: @geoframeai @OpenAIDevs I told it that it's a new model from @OpenAI and asked it to determine what's special about it. In….

0

6

0

Colin White

@crwhite_ml

11 months

RT @bindureddy: O1-preview IS THE NEW KING - TOPS LIVEBENCH AI. WOOT! o1-preview is the new top model and beats Sonnet 3.5 considerably. o….

0

80

0