#ResearchCodeBench X Hashtag

Explore tweets tagged as #ResearchCodeBench

Patrick Hua

@tianyu_hua

6 months

🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 https://t.co/gh4kC8dtr6 🧵

Patrick Hua

@tianyu_hua

6 months

The benchmark is open source—code, tasks, and evaluation framework are all available. For full details, check out the links below: 🔗 Benchmark: https://t.co/gh4kC8dtr6 📄 Paper: https://t.co/4LtFrVUWgk 💻 Code: https://t.co/259sd6mn9t #AI #LLM #MachineLearning #ResearchCodeBench

Ben Klieger

@benklieger

6 months

New benchmark I collaborated on at the Stanford Autonomous Agents Lab! ResearchCodeBench turns the core contributions of novel research papers, represented in code and mostly unseen by the LLMs tested, into an eval with a top score of 37%. Check out the paper and eval ⬇️

sunny madra

@sundeep

6 months

Check out ResearchCodeBench https://t.co/2SgfGhRz2E

Aaron Defazio

@aaron_defazio

7 months

Schedule-Free Learning is part of the new Research Code Benchmark! Very cool! Top models can implement the algorithm using the paper as reference about 1/3 of the time. https://t.co/4C8HXThBXS

Sergio

@santosergioreal

5 months

Check out my latest article: AI - ResearchCodeBench: Why 60% of Research Code Is Still an Enigma for the Most Advanced LLMs. https://t.co/8Ca0FgOenj via @LinkedIn
