Explore tweets tagged as #ResearchCodeBench
@tianyu_hua
Tianyu Hua
2 months
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 🧵
@tianyu_hua
Tianyu Hua
2 months
The benchmark is open source: code, tasks, and evaluation framework are all available. For full details, check out the links below: 🔗 Benchmark: 📄 Paper: 💻 Code: #AI #LLM #MachineLearning #ResearchCodeBench
@benklieger
Ben Klieger
2 months
New benchmark I collaborated on at the Stanford Autonomous Agents Lab! ResearchCodeBench turns the core contributions of novel research papers, as represented in code and mostly unseen by the LLMs tested, into an eval with a top score of 37%. Check out the paper and eval ⬇️
@sundeep
sunny madra
2 months
Check out ResearchCodeBench.
@aaron_defazio
Aaron Defazio
2 months
Schedule-Free Learning is part of the new Research Code Benchmark! Very cool! Top models can implement the algorithm using the paper as reference about 1/3 of the time.