Called it with some knowledge about the model.
Ultra is going to break ground!
Those quibbling over HellaSwag and MMLU are just showing their misunderstanding of evaluation.
Onwards 🚀
🔥Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini…
@natolambert
I want to be petty and respond to everyone who makes a big deal about MMLU and HellaSwag, but I just don't have enough time in the day to make an alt account and start memeing
@_arohan_
You can argue about HellaSwag, but MMLU is far from useless. Human/LLM-judged responses can be biased towards conversational style and length.
For the strictly general chatbot use case, yes, Arena is king. But if you're using the model for other things, breadth of knowledge (MMLU) can matter a lot.
@_arohan_
Also, congrats on the results. GPT-4 could be using techniques similar to Bard's in the background to enhance responses. Competition is good for everyone.
Just be cautious about ditching every single benchmark other than Arena. They can have their place too (e.g. MMLU).