🚨 Introducing BenCzechMark (BCM) 🇨🇿—the 1st multitask & multimetric Czech benchmark for large language models! 🧠 🔗 Check out the leaderboard: https://t.co/3nFANXN35i 📖 Read more in our Hugging Face blog: https://t.co/wxnD7BMoKn
#NLP #AI #CzechLanguage #LLM
✨ 50 tasks
📚 9 categories
📋 Covering domains from historical Czech to language-learner essays & spoken word
🔢 26 submitted systems currently
📊 Unique duel scoring based on statistical significance!
👑 Llama-405B currently reigns supreme in BenCzechMark! But it's not unbeatable: other models shine in specific categories like Math and Sentiment. 📊
- Qwen-72B shines in Math and Historical IR but lags behind similarly sized models in other categories.
- Aya-23-35B excels in Sentiment and Language Modeling but lags behind in other categories.
- Gemma-2 9B delivers excellent results in Czech reading comprehension.
🚀 Submit your model to BenCzechMark without going public! Our leaderboard currently features over 25 models of varying sizes, and you can test your model's performance privately—publishing is optional! 🔒
🌟 Interested in more information? Dive into our blog post for all the details on BenCzechMark! 📖 Stay tuned—our paper is coming soon! 📄 🔗