Explore tweets tagged as #RepoCod
@Lin0Tan
Lin Tan
9 months
All LLMs, including GPT-4o, achieve < 30% pass@1 on real-world code completion. Check out 🐟REPOCOD, a real-world code generation benchmark:
- Repository-level context
- Whole function generation
- Validation with test cases
- 980 instances from 11 Python projects
#LLM #LLM4Code
@TheAITimeline
The AI Timeline
9 months
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'. Overview: LLMs demonstrate high accuracy on Python coding in benchmarks like HumanEval and MBPP. However, they do not yet match human developers in code completion for real-world tasks, which current benchmarks
@rohanpaul_ai
Rohan Paul
9 months
REPOCOD, proposed in this paper, proves LLMs can't replace programmers yet by testing real-world coding scenarios. 🎯 Original Problem: Existing benchmarks show LLMs achieve >90% accuracy in code generation, raising the question: Can they replace human programmers? Current
@Lin0Tan
Lin Tan
9 months
Can #LLMs replace developers? Introducing RepoCod-Lite 🐟 for faster evaluation to answer this: 200 of the toughest #RepoCod #code-generation tasks:
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks
- Leaderboard - 67
@AINativeF
AI Native Foundation
9 months
13. Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'. 🔑 Keywords: large language models, code generation, benchmark, real-world software development, REPOCOD. 💡 Category: Generative Models. 🌟 Research Objective: To evaluate the capability of large language
@Lin0Tan
Lin Tan
3 months
Two of our papers have been accepted to the #ACL2025 main conference! Try our code generation benchmark 🐟RepoCod and website generation tool 🧇WAFFLE!
#LLM4Code #MLLM #FrontendDev #WebDev #LLM #CodeGeneration #Security
@rohanpaul_ai
Rohan Paul
9 months
Paper - "Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'". I generated this podcast on the paper with Google's Illuminate, a specialized tool that creates podcasts from arXiv papers only.
@LiangShanchao
Shanchao Liang
5 months
New models, same struggle!
DeepSeek-R1, GPT-4.5 Preview, and Claude 3.7 Sonnet still show low performance on realistic coding tasks (RepoCod-Lite), with Pass@1 of 4.5%, 4.0%, and 3.5%. Top models just reach 10% Pass@1. We need better approaches for code generation!
#AI4Code #LLM
@ComputerPapers
Software Engineering
10 months
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'.
@Lin0Tan
Lin Tan
9 months
Can language models replace developers? RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world method-level code generation tasks. Leaderboard #LLM4code #LLM #CodeGeneration #Security
@cerias @PurdueScience
@Lin0Tan
Lin Tan
5 months
@lmarena_ai GPT 4.5 ranks only 10th for realistic complex coding tasks from GitHub repositories. RepoCod tasks are:
- General code generation tasks, and
- Complex tasks: longest average canonical solution length (331.6 tokens)
#LLM4Code
@Lin0Tan
Lin Tan
6 months
We now have DeepSeek-V3, o1, and o3-mini results on RepoCod-Lite. DeepSeek-V3 outperforms o1 and o3-mini and has the best performance on RepoCod-Lite.
@arXivGPT
arXivGPT
9 months
🏷️:Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'. 🔗:
@mhatta
Masayuki Hatta
9 months
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'
@Lin0Tan
Lin Tan
9 months
- Paper:
- Github:
- Huggingface:
#CodeCompletion
@LiangShanchao @NanJiang719 @huyiran1007
@Lin0Tan
Lin Tan
9 months
Leaderboard of the 10 latest #LLMs on generating real-world code with repository-level context. #securecode #security
@Lin0Tan
Lin Tan
3 months
@_akhaliq @YuxiangWei9 Excellent work! Time to test it on complex code generation tasks with repository-level context 😀 @YuxiangWei9.
@Datumo_AI
Datumo
9 months
Can AI really code? 🤖
LLMs show 90%+ accuracy in Python tests, but the new benchmark REPOCOD reveals struggles in real-world coding. What does this mean for the future of developers?
🔗 Explore more: #AI #Coding #Tech #Datumo #LLM
@LiangShanchao
Shanchao Liang
5 months
- Paper:
- Leaderboard for Lite:
- Huggingface:
@NanJiang719 @huyiran1007 @Lin0Tan
#CodeCompletion
@Lin0Tan
Lin Tan
3 months
@LiangShanchao @NanJiang719 @huyiran1007 @aclmeeting @PurdueCS @cerias 2. We need better approaches for code generation!
- RepoCod Leaderboard, dataset, and preprint:
- The top 2 are DeepSeek-V3 and GPT-4o.
- Details: