
Manuel Fernández
@manuFernandezBu
Followers: 51 · Following: 33 · Media: 5 · Statuses: 17
New preprint from @manuFernandezBu and team: What Large Language Models Know About Plant Molecular Biology https://t.co/nScN223kp2 Manuel Fernandez Burda, Lucia Ferrero, Nicolás Gaggion, Camille Fonouni-Farde, MoBiPlant Consortium, Martín Crespi, Federico Ariel, Enzo Ferrante
biorxiv.org
Large language models (LLMs) are rapidly permeating scientific research, yet their capabilities in plant molecular biology remain largely uncharacterized. Here, we present MOBIPLANT, the first...
1
3
8
🧬🌱New preprint! MoBiPlant is a benchmark built with 112 experts to test how LLMs understand plant molecular biology. Great collaboration driven by @manuFernandezBu @enzoferrante @arg_epilab. Happy to have brought a small contribution! Preprint and detailed thread below:
Large language models are reshaping the way we do science, but how well do they actually understand plant molecular biology? ➡️We created MoBiPlant to answer this. 📝Preprint: https://t.co/EhQf4YofLw 💾Dataset: https://t.co/D1B5lBj0UR (Thread below)
0
3
11
🌱🧬New preprint: What do LLMs know about Plant Molecular Biology? Take a look at MoBiPlant, our new benchmark to measure LLM performance, built by more than 100 plant scientists from 19 countries! Amazing collaboration! 🚀 More info in this thread 👇 https://t.co/xujvRjUFj0
3
41
112
🪴 📷🤖New preprint: What do LLMs know about Plant Molecular Biology? Introducing MoBiPlant, our new benchmark to measure LLM performance, built by more than 100 plant scientists from 19 countries! 👇
0
2
4
🪴 🤖 New preprint: What do LLMs know about Plant Molecular Biology? Take a look at MoBiPlant, our new benchmark to measure LLM performance, built by more than 100 plant scientists from 19 countries! More info in this thread 👇 https://t.co/DlRYZCX00x
1
12
45
Grateful to be part of this collective effort — huge thanks to everyone involved! 🌱✨ 🙌 Special thanks to Enzo @enzoferrante and Fede @arg_epilab for guiding the work, and to Nico @ngaggion and Luci @luviferrero for building this side by side.
0
0
7
🤔LLMs tend to choose the first option. Popular LLMs hit 75%+ on our MCQs, but many default to option A when unsure, consistent with previous literature. We quantify this with shuffled answer-order permutations and show that, although some models are more robust, most exhibit option bias.
1
0
3
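The preprint's evaluation code isn't quoted here, so the following is only a minimal Python sketch of what such a shuffle test can look like; option_bias_test, ask_model, the toy question, and the reported metrics are illustrative assumptions rather than MoBiPlant's actual pipeline.

import random
from collections import Counter

def option_bias_test(ask_model, question, options, answer, n_shuffles=20, seed=0):
    """Re-ask the same MCQ under shuffled option orders and record
    (a) how often the model picks the first position ("A") and
    (b) how often it answers correctly.
    `ask_model(question, options) -> int` must return the index of the chosen option."""
    rng = random.Random(seed)
    picks = Counter()
    correct = 0
    for _ in range(n_shuffles):
        shuffled = options[:]
        rng.shuffle(shuffled)
        choice = ask_model(question, shuffled)      # position the model picked (0 == "A")
        picks[choice] += 1
        correct += shuffled[choice] == answer       # correctness is order-invariant
    return {
        "share_of_A_picks": picks[0] / n_shuffles,  # ~1/len(options) for an unbiased model
        "accuracy_under_shuffles": correct / n_shuffles,
    }

# Toy "always answer A" model: the test should report share_of_A_picks == 1.0
always_a = lambda question, options: 0
print(option_bias_test(
    always_a,
    "Which phytohormone mediates stomatal closure under drought stress?",
    ["Abscisic acid", "Auxin", "Gibberellin", "Cytokinin"],
    answer="Abscisic acid",
))

A more robust model would keep accuracy_under_shuffles roughly constant across orderings while its share_of_A_picks stays near chance level.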
⚠️❗Expert reviews expose critical failure modes despite high MCQ scores. We uncover moderate factual alignment, frequent omissions, hallucinations, and low self-awareness. We catalog concrete pitfalls (species confusion, cross-domain bias, outdated knowledge, wrong references).
1
0
4
👀🔍LLM strength tracks the canon, not the frontier. Model performance rises when the facts being asked about come from highly cited sources, and questions drawn from review articles outscore those from research articles by ~10–15 pts.
1
0
4
🌱🤖 MoBiPlant is the first LLM benchmark for plant molecular biology. - Built by 112 experts across 19 countries. - Packs 565 expert-curated MCQs + 1,075 synthetic items. - Tests LLM knowledge spanning gene regulation to plant-environment interactions.
1
0
4
Large language models are reshaping the way we do science, but how well do they actually understand plant molecular biology? ➡️We created MoBiPlant to answer this. 📝Preprint: https://t.co/EhQf4YofLw 💾Dataset: https://t.co/D1B5lBj0UR (Thread below)
3
19
28
Very proud to introduce Kaleidoscope ✨🌿 🌍 18 languages (Bengali → Spanish) 📚 14 subjects (Humanities → STEM) 📸 55% requiring image understanding! A very important open-science collaboration that extends in-language evaluation of vision models to many more languages.
🚀 We are excited to introduce Kaleidoscope, the largest culturally-authentic exam benchmark. 📌 Most VLM benchmarks are English-centric or rely on translations—missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation
4
30
132
Kaleidoscope is out 🌈! An in-language, multilingual, multimodal exam dataset created to evaluate VLM capabilities. This is the result of a great multi-institutional collaboration led by @CohereForAI. Special congrats to @manuFernandezBu for this first publication!
5
8
38
Many thanks to everyone involved in creating this benchmark, especially to those who carefully extracted the data. This work really brings us closer to building inclusive and culturally representative AI.
0
0
11
Check it out!👇 arXiv: https://t.co/FteIEiFdWY HF: https://t.co/R8F4zt5wKf Website (navigate through the data!):
huggingface.co
1
1
13
I'm excited to announce the release of Kaleidoscope! A multimodal multilingual benchmark composed of 20,911 real-world questions: 🗣️ 18 languages 📚 14 subjects 📸 55% multimodal questions
7
15
31