rohan anil

5 months

Called it with some knowledge about the model. Ultra is going to break ground! Those quibbling over HellaSwag and MMLU are just showing their misunderstanding of evaluation. Onwards 🚀

lmsys.org

@lmsysorg

5 months

🔥Breaking News from Arena: Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini…

keveman

@keveman

5 months

@_arohan_ Ultra perhaps the first one to exceed 1300 Elo?

rohan anil

@_arohan_

5 months

@keveman 🫡

Nathan Lambert

@natolambert

5 months

@_arohan_ 🫡 haters gonna hate, most realistic people still count Google in / as the favorite in the AI race

rohan anil

@_arohan_

5 months

@natolambert I want to be petty and respond to everyone who makes a big deal about MMLU and HellaSwag, but I just don't have enough time in the day to make an alt account and start memeing

Shawn

@Shawnryan96

5 months

@_arohan_ I have never found evals to mean anything to me as the user. It is better to test yourself and see what works

Jing Yu Koh

@kohjingyu

5 months

@_arohan_ Can't wait for multimodal Ultra!

Blaze (Balázs Galambosi)

@gblazex

5 months

@_arohan_ You can argue about HellaSwag, but MMLU is far from useless. Human/LLM-judged responses can be biased towards conversational style and length. Strictly for the general chatbot use case, yes, Arena is king. But if you're using the model for other things, breadth of knowledge (MMLU) can matter a lot

Programmer Dude

@programmer_dude

5 months

@_arohan_ When we ramping?

Blaze (Balázs Galambosi)

@gblazex

5 months

@_arohan_ Also congrats on the results. GPT-4 could be using techniques similar to Bard's in the background to enhance responses. Competition is good for everyone. Just be cautious about ditching every single benchmark other than Arena. They can have their place too (e.g. MMLU).
