Explore tweets tagged as #ResearchCodeBench
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 🧵
The benchmark is open source: code, tasks, and evaluation framework are all available. For full details, check out the links below: 🔗 Benchmark: 📄 Paper: 💻 Code: #AI #LLM #MachineLearning #ResearchCodeBench
New benchmark I collaborated on at the Stanford Autonomous Agents Lab! ResearchCodeBench turns the core contributions of novel research papers, as represented in code and mostly unseen by the LLMs tested, into an eval with a top score of 37%. Check out the paper and eval ⬇️
Check out ResearchCodeBench.