Explore tweets tagged as #ResearchCodeBench
@tianyu_hua
Patrick Hua
6 months
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 https://t.co/gh4kC8dtr6 🧵
3
26
88
@tianyu_hua
Patrick Hua
6 months
The benchmark is open source—code, tasks, and evaluation framework are all available. For full details, check out the links below: 🔗 Benchmark: https://t.co/gh4kC8dtr6 📄 Paper: https://t.co/4LtFrVUWgk 💻 Code: https://t.co/259sd6mn9t #AI #LLM #MachineLearning #ResearchCodeBench
1
1
10
@benklieger
Ben Klieger
6 months
New benchmark I collaborated on at the Stanford Autonomous Agents Lab! ResearchCodeBench turns the core contributions of novel research papers, as represented in code and mostly unseen by the LLMs tested, into an eval with a top score of 37%. Check out the paper and eval ⬇️
7
1
26
@sundeep
sunny madra
6 months
Check out ResearchCodeBench https://t.co/2SgfGhRz2E
0
0
8
@aaron_defazio
Aaron Defazio
7 months
Schedule-Free Learning is part of the new Research Code Benchmark! Very cool! Top models can implement the algorithm using the paper as reference about 1/3 of the time. https://t.co/4C8HXThBXS
0
0
18
@santosergioreal
Sergio
5 months
Check out my latest article: AI - ResearchCodeBench: Why 60% of Research Code Is Still an Enigma for the Most Advanced LLMs. https://t.co/8Ca0FgOenj via @LinkedIn
0
0
0