Explore tweets tagged as #ResearchCodeBench
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 https://t.co/gh4kC8dtr6 🧵
The benchmark is open source—code, tasks, and evaluation framework are all available. For full details, check out the links below: 🔗 Benchmark: https://t.co/gh4kC8dtr6 📄 Paper: https://t.co/4LtFrVUWgk 💻 Code: https://t.co/259sd6mn9t
#AI #LLM #MachineLearning #ResearchCodeBench
New benchmark I collaborated on at the Stanford Autonomous Agents Lab! ResearchCodeBench takes the core contributions of novel research papers, as implemented in code (most of them unseen by the LLMs tested), and turns them into an eval where the top score is 37%. Check out the paper and eval ⬇️
Check out ResearchCodeBench https://t.co/2SgfGhRz2E
Schedule-Free Learning is part of the new Research Code Benchmark! Very cool! Top models can implement the algorithm using the paper as reference about 1/3 of the time. https://t.co/4C8HXThBXS
Check out my latest article: AI - ResearchCodeBench: Why 60% of Research Code Is Still a Puzzle for the Most Advanced LLMs. https://t.co/8Ca0FgOenj via @LinkedIn