#RepoCod X Hashtag | Muskviewer

Explore tweets tagged as #RepoCod

Lin Tan

@Lin0Tan

9 months

All LLMs, including GPT-4o, achieve < 30% pass@1 on real-world code completion. Check out 🐟REPOCOD, a real-world code generation benchmark:
- Repository-level context
- Whole function generation
- Validation with test cases
- 980 instances from 11 Python projects
#LLM #LLM4Code

3

11

59

The AI Timeline

@TheAITimeline

9 months

Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'. Overview: LLMs demonstrate high accuracy on Python coding in benchmarks like HumanEval and MBPP. However, they do not yet match human developers in code completion for real-world tasks, which current benchmarks

1

10

Rohan Paul

@rohanpaul_ai

9 months

REPOCOD, proposed in this paper, proves LLMs can't replace programmers yet by testing real-world coding scenarios. 🎯 Original Problem: Existing benchmarks show LLMs achieve >90% accuracy in code generation, raising the question: Can they replace human programmers? Current

Lin Tan

@Lin0Tan

9 months

Can #LLMs replace developers? Introducing RepoCod-Lite 🐟 for faster evaluation to answer this: 200 of the toughest #RepoCod #code-generation tasks:
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Leaderboard - 67

6

7

30

AI Native Foundation

@AINativeF

9 months

13. Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'.
🔑 Keywords: large language models, code generation, benchmark, real-world software development, REPOCOD
💡 Category: Generative Models
🌟 Research Objective: To evaluate the capability of large language

1

0

Lin Tan

@Lin0Tan

3 months

Two of our papers have been accepted to the #ACL2025 main conference! Try our code generation benchmark 🐟RepoCod and website generation tool 🧇WAFFLE!
#LLM4Code #MLLM #FrontendDev #WebDev #LLM #CodeGeneration #Security

5

4

27

Rohan Paul

@rohanpaul_ai

9 months

Paper - "Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'". Generated this podcast on the paper with Google's Illuminate, a specialized tool that creates podcasts from arXiv papers only.

0

1

3

Shanchao Liang

@LiangShanchao

5 months

New models, same struggle!
DeepSeek-R1, GPT-4.5 Preview, and Claude 3.7 Sonnet still show low performance on realistic coding tasks (RepoCod-Lite), with Pass@1 of 4.5%, 4.0%, and 3.5%. Top models just reach 10% Pass@1. We need better approaches for code generation!
#AI4Code #LLM

2

1

7

Software Engineering

@ComputerPapers

10 months

Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'.

0

1

Lin Tan

@Lin0Tan

9 months

Can language models replace developers? RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world method-level code generation tasks. Leaderboard #LLM4code #LLM #CodeGeneration #Security
@cerias @PurdueScience

3

16

74

Lin Tan

@Lin0Tan

5 months

@lmarena_ai GPT 4.5 ranks only 10th for realistic complex coding tasks from GitHub repositories. RepoCod tasks are:
- General code generation tasks, and
- Complex tasks: longest average canonical solution length (331.6 tokens).
#LLM4Code

Lin Tan

@Lin0Tan

6 months

We now have DeepSeek-V3, o1, and o3-mini results on RepoCod-Lite. DeepSeek-V3 outperforms o1 and o3-mini and has the best performance on RepoCod-Lite.

0

5

arXivGPT

@arXivGPT

9 months

🏷️:Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'. 🔗:

0

Masayuki Hatta

@mhatta

9 months

Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'

0

1

Lin Tan

@Lin0Tan

9 months

- Paper:
- GitHub:
- Huggingface:
#CodeCompletion
@LiangShanchao @NanJiang719 @huyiran1007

0

6

28

Lin Tan

@Lin0Tan

9 months

Leaderboard of the 10 latest #LLMs on generating real-world code with repository-level context. #securecode #security

0

3

Lin Tan

@Lin0Tan

3 months

@_akhaliq @YuxiangWei9 Excellent work! Time to test it on complex code generation tasks with repository-level context 😀 @YuxiangWei9.

0

3

Datumo

@Datumo_AI

9 months

Can AI really code? 🤖
LLMs show 90%+ accuracy in Python tests, but the new benchmark REPOCOD reveals struggles in real-world coding. What does this mean for the future of developers?
🔗 Explore more: #AI #Coding #Tech #Datumo #LLM

1

0

3

Shanchao Liang

@LiangShanchao

5 months

- Paper:
- Leaderboard for Lite:
- Huggingface:
@NanJiang719 @huyiran1007 @Lin0Tan
#CodeCompletion

0

1

Lin Tan

@Lin0Tan

3 months

@LiangShanchao @NanJiang719 @huyiran1007 @aclmeeting @PurdueCS @cerias 2. We need better approaches for code generation!
- RepoCod Leaderboard, dataset, and preprint:
- The top 2 are DeepSeek-V3 and GPT-4o.
- Details:
